Welcome to MSDN Blogs Sign in | Join | Help

How We Did It - Nintex Reporting

I'd like to introduce today's guest blogger - Mike Fitzmaurice, ex-Microsoft SharePoint Product Manager and currently the Vice-President of Product Technology at Nintex.  As a Microsoft Gold Partner, Nintex aligns with Microsoft's strategic and architectural direction to develop products on the SharePoint platform. 

A common request for many SharePoint deployments is to provide more advanced reporting about a SharePoint farm than what is currently available out-of-box.  For today's blog, Mike explains the reporting needs and then dives into how Nintex Reporting works by providing insight into their technical approach and rational around a few of their design decisions such as using SQL Server's data warehousing capabilities to incorporating Silverlight into their web parts.  I hope Mike's article helps you to envision solutions that can be built on the SharePoint platform.  So over to Mike...

Dave Pae
SharePoint Technical Product Manager

How We Did It: Nintex Reporting

The name indeed merits a moment of clarification: Nintex Reporting is all about content/collaboration analytics reports for anyone who owns/manages any SharePoint assets.  It’s reporting about SharePoint farms, not reporting for SharePoint farms.  Nintex Reporting will tell you things like:

  1. The size of your site collections, sites, lists, and libraries
  2. Which documents are being viewed/downloaded/edited/checked out, etc.
  3. Relative amounts of activity at different points of the day/week/month/year, broken down by type of activity if desired
  4. What content you have in your document libraries, broken down by file type, content type, etc.
  5. Which features have been activated in which sites
  6. How content is distributed across content databases
  7. Which discussion boards (or which lists of any kind) are being regularly used
  8. Which users are contributing list content and/or authoring document/page content
  9. Which documents have been checked out for a long period of time.
  10. Which sites have gone unused for too long
  11. Who’s using search, which queries are being run, and from which scopes
  12. How often content is viewed vs. updated

It’s actually a long list.  There are 75+ reports in the box, many of which have drill-down report definitions.  And if you want to just install it and use it, go for it – it’s designed for a happy experience that way.

But it’s how it works that we think is interesting, and on the assumption that you’d find that interesting as well…

Background

Remember a while back when Mike Fitzmaurice warned people to stay out of SharePoint configuration and content databases? Part of the feedback received was “but I need to be able to report on what’s going on in my SharePoint farm”.  It’s a legitimate issue. If you’re not querying the database, the remaining option is to query the object model, a la SPSReport, not that such a technique is fast or prone to a lot of ad-hoc examination and analysis.

There’s a better way that combines the best of both worlds, and that’s Nintex Reporting.  The process, upon which we’ll elaborate in a moment, goes like this:

  1. Collectors crawl the SharePoint object model regularly and log what they find to a data warehouse.
  2. The data warehouse holds information collected in an easy-to-query ROLAP schema (snowflaked dimensions and multiple fact tables, to be exact).
  3. A data management service runs reports on a scheduled basis and saves their results to a result cache database.  This lets us build up a set of report results that can be quickly retrieved over and over again without having to rerun the query every time.
  4. The user interface, a custom SharePoint site application that makes heavy use of Web Parts containing custom Silverlight controls, queries and displays data from the result cache (most of the time) and the data warehouse itself (when someone wants to re-run a report and/or drill down into the details).

Philosophically, the collectors (one for the SharePoint object model and one for Windows Performance Monitor counters) are the only things that really come in contact with a SharePoint farm.  Everything else is out of band; even the UI is essentially a SharePoint site with Web Parts that query a database.

So let’s look at each of these building blocks in turn…

Collectors

These are lightweight, managed code services that run on one or more Web front ends (WFEs) in a SharePoint farm.  We’ve created a standard API for writing collectors, and we ship with two of them, one for Performance Monitor counter data, and one for collecting data from the SharePoint object model.

The SharePoint Collector, on a throttle-able schedule, navigates through the SharePoint farm hierarchy to inventory every web application, site collection, site, list/library, and item/document.  Whatever it finds, it logs into the data warehouse.

Structure and content are easy, but what about activity?  Well, as it turns out, the only way to get truly useful activity info is to activate the auditing policy on lists/libraries for which you want activity data.  Even if you only have Windows SharePoint Services, auditing policies are there; there’s just no exposed UI to get to them.  We can turn on auditing everywhere programmatically.  We can even prune the audit logs as we harvest data from them to keep content databases from filling up.

We looked at plenty of alternatives – this is the one that provided the most useful data with the least risk of negative impact.  For example:

  • Culling the IIS logs won’t tell you much about what’s happening.  You’ll know what is happening to (if it’s a page or a document), but you won’t really know what’s going on.  IIS logs cannot distinguish between draft save, check-in, or document publish activities.  Almost nothing but auditing can.
  • Sticking a JavaScript token into every page isn’t an option for SharePoint assets that aren’t pages (e.g., documents), plus we’d prefer not to require users to modify every master page.  Plus, oftentimes it’s not about pages; when you’re editing a task list item, what’s the salient thing you want to measure, the fact that are you changing an item or the fact that you’re posting a page?
  • Writing a HTTP Handler could work, but that would be extra intrusive and potentially destabilizing.  One should not tread lightly down this path.
  • Culling the output from WSS/MOSS usage analysis processing was an option we also considered, and would have used but for the fact that (a) there’s some information it doesn’t provide, and (b) it pre-aggregates too much information, leaving us without raw data to slice/dice by user, by type of request, etc.

Trumping all of this is the fact that auditing might well already be turned on to address compliance concerns, and in such scenarios Nintex Reporting represents zero additional overhead.  Moreover, Microsoft has already done what it can to minimize the impact of auditing policies on a farm’s behavior (it’s non-zero, to be sure, but all techniques are non-zero).

Collectors log gathered facts directly to the data warehouse database.  If collectors are installed on multiple Web Front Ends, they make use of semaphore-like information in the warehouse to ensure that no two collectors grab the same data.

Finally, you can have the collectors ignore certain kinds of activity (e.g., the MOSS search gatherer, the portal administrator) and certain kinds of content (e.g., CSS files).

Data Warehouse

Nintex Reporting actually makes use of two databases, one purely for storing the actual data warehouse data, and another database for everything else, which includes configuration data and cached results of previously-run reports.

The data warehouse revolves around three kinds of fact tables, specifically devoted to performance monitor counters, search executions, and content audit events.  List Items/Documents are actually a dimension table, although when it comes to reports about content or structure (as opposed to activity), they behave like a de facto fact table.

In addition, a report (which should actually be thought of as a report definition) is actually a SQL Server stored procedure.  To make authoring/compilation a bit easier, we keep these reports in the data warehouse database as well.

Actually, there’s just a little bit more data in the warehouse to help the collector services as well, primarily a place to read/post collection information so multiple collectors don’t attempt to collect the same data.  It’s kept in this database so the collectors only need to maintain one database connection.

Configuration/Cache Databases

The configuration/caching database is responsible for, well, everything else.  It holds the set of report schedules (which report to run on which day at which time with which parameters), the access control lists that determine which users can view/edit/execute which report schedules, layout information for formatting dashboard statistics, subscription information, and a lot more.

Of extreme importance is the presence of one output cache table for every report definition.  When scheduled (as opposed to ad-hoc) reports are executed by the data management service, their results are redirected into these output cache tables so they can be quickly retrieved as needed without having to rerun the entire report.  It also allows one to rummage through the entire history of results from previous executions.

Data Management Service

The Data Management Service is responsible for checking the report execution schedule and running reports as needed. It also maintains the data warehouse and config/cache databases, purging old warehouse data and/or report results according to administrator preferences.

While the Data Management service could be installed on any machine, the work it does is out of band from standard user requests. As such, we recommend that it be installed on whichever machine you have running WSS/MOSS timer jobs and other scheduled tasks.

User Interface

There are three key parts to the Nintex Reporting interface, a single report view, a dashboard (which shows several reports at once), and the administrative UI. We deliver these inside of a Nintex Reporting SharePoint site, which you can create in the location of your choice.

The view of a single-report schedule (again, one report definition may have many schedules, each with its own parameters, execution frequency, and permissions) is made up of four Web Parts, and looks like this:

clip_image002[4]

What you’re seeing above consists of Web Parts that display (1) the schedule’s parameters and execution history (in case you want to see results from an earlier point in time), (2) the Silverlight-based graphical view of the report, (3) the tabular view of that same data, and (4) possible actions that can be performed on the report (e.g., run now, edit schedule, export to PDF or Excel, subscribe).

The Web Parts are capable of retrieving their data independently of each other, but they’re connectable, and by default receive info on which set of report results to retrieve from Web Part 1 in the above figure (Parameters/Archived Reports). When multiple parts on a page attempt to retrieve data, they will make use of pooled/cached Object Data Sources whenever possible.

The graphical Web Parts make use of Silverlight 1.0 controls of our own creation, which retrieve data by calling an ASP.NET page that returns ready-to-load XAML data from the aforementioned Object Data Source. We opted for Silverlight because it allowed us to build a rich, interactive visualization method that would scale down to WSS and up to MOSS with one codebase, leverage our .NET development skills in the process, and allowed us to offload graphics processing to the browser to keep the load on the server down. We didn’t look at more recent versions of Silverlight because they weren’t shipping at the time.

Dashboard pages employ a set of graphical Web Parts, so while each makes use of an Object Data Source for retrieving/caching data, there will be more data being retrieved/cached for a dashboard page than a single-report page.

We keep track of which report schedules are being displayed in which dashboard, in case a user wishes to subscribe to a dashboard as opposed to an individual report. Subscription checking/notifications, by the way, are handled by a custom SharePoint Scheduled Task job.

Delivering the UI in a Nintex Report Center site serves two purposes. First of all, it puts all of the elements into a single location for navigation, configuration, etc. Second of all, it allows administrators to secure reports by only allowing specific users/groups of that site to edit/run/view specific report schedules. We needed to make use of SharePoint users/groups because we didn’t want to depend on an Active Directory-integrated security model being in effect.

Any and all report schedules can be viewed in Web Parts placed in any other site in any other site collection (e.g., a user’s My Site), but the report schedule will still be subject to the permissions assigned to it in the Nintex Reporting site.

Summary and Upcoming Developments

The goal of Nintex Reporting is to ensure that users get the most information possible as responsively as possible with the lowest impact on IT-managed resources as possible. It’s business intelligence technology applied to content/usage analytics for collaborative content.

What’s particularly significant is that it treats a SharePoint farm as something special. SharePoint sites are not just a set of Web pages, but rather a set of collaborative assets (Web content management sites are both, and we’ll get to that in a moment). What’s more important, a page that renders list items or the items themselves? The solution delivered in the box focuses on the latter case, although the architecture of Nintex Reporting allows both.

We’re delighted to have the product in the marketplace, but we’re by no means resting. In the months to come, Nintex will be focused on three key areas of innovation for Nintex Reporting:

Software Development Kit. This should be ready very, very shortly, actually. It will be primarily focused on showing interested parties how to add new and modify existing report definitions to meet their needs. It will include thorough descriptions of our data warehouse so you’ll know what data is available to be queried, and several examples of custom reports. SQL Server stored procedure development skills will be needed to do this, but new reports can be exported to XML and imported elsewhere, allowing for build-test-deploy scenarios.

Updates to the SDK will include how to create custom collectors to populate the warehouse with entirely new kinds of data, as well as examples of how to make use of the warehouse with SQL Server Analysis Services and Reporting Services.

Per-Item Usage Inspection. Most of our ready-to-use reports support being able to drill down into individual activities for inspection purposes, but this will bring the capability down into individual documents, list items, etc. The data for this already exists; this is strictly an addition to the UI to make more of the data warehouse investment.

Automatic Provisioning of Site-Specific Reports. It’s already possible to create report schedules scoped to specific sites, as well as being able to create new Web Part pages in those sites that contain Web Parts that display those report schedules. This will allow the reports to be provisioned and the page to be added to the affected site in one action (activating a feature on that site). It will make it much easier for IT to be able to offer a set of Nintex Reports as a service to individual site owners.

Reporting on Site/List/Item Security. We do not yet report on permission changes (although they are captured by auditing and can be placed in our warehouse), but an update in the medium term will introduce such reports to our repertoire, as well as reports on who can do what with which sites/lists/items.

Reporting on Web Page Analytics. The above comment about paying attention to what SharePoint sites contain rather than how the application renders it remains true, but Web publishing sites are a separate case. We are at work on an alternate means of gathering usage data for such sites that will be more in line with other Web analytic solutions so Nintex Reporting users will be able to look at page hits, click-through's, etc. Because of the separation of collection from data warehousing from report generation from presentation, however, this is likely to be a matter of a new custom collector as opposed to a radical enhancement to the product’s design.

We actually think you’ll still want to use both the current means of audit-based activity collection for sites where content is authored, edited, and prepared for testing. The combination of approaches will give you a picture into, for example, who’s developing content (which we already do) and how much of that content is being read by the public (which classic Web analytic products do now and we’ll do relatively soon).

Longer-term, we’ll look at taking further advantage of the data warehouse to create entirely new solutions leveraging data mining technology. Compliance monitoring, threshold-based alerts, and administrative actions triggered by observed patterns are all scenarios to which the architecture lends itself – but that’s a story for another time.

Published Thursday, December 04, 2008 3:24 PM by sptblog

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

# Your Publicity To You » Blog Archive » Microsoft SharePoint Team Blog : How We Did It - Nintex Reporting

# SharePoint Daily for December 5, 2008

Top News Stories Microsoft Empowers IT Professionals by Providing Training on Office SharePoint Server

Friday, December 05, 2008 9:50 AM by SharePoint Daily

# Museum Quality Model Ships

The interview took place late last month as a result of the book that Van der Eem is having an American journalist write. The maker of the programme Un Dia Den Bida said before De Telegraaf: “ I have taken two cameras with me. I had one turned off after

Monday, March 16, 2009 1:08 AM by Museum Quality Model Ships

Leave a Comment

(required) 
required 
(required) 

  
Enter Code Here: Required
 
Page view tracker