Eventually we all run into a situation that makes us cringe: the users want to design their own reports. If this were something as simple as giving users a color picker and some visual layout tools, that would be one thing. Most of the time it’s more complex. Business users know their problem domain very well; they want to be able to view widget sales for the last three quarters broken up by sales person, location, and four other factors that you couldn’t possibly dream of, but you know you have the data for. What are the best options? How do you get the job done?
It’s no secret that I’m a fan of dynamic SQL and when you’re building ad hoc reports, dynamic SQL can be the quickest way to get the job done. For those of you unfamiliar with the term, dynamic SQL is SQL that has been built in the database in response to inputs from some outside source. Dynamic SQL is frequently used to respond to two separate code paths. For example:
IF @SearchDate < DATEADD(yyyy, -10, GETDATE())
SET @sql = @sql + 'FROM Archive.SalesOrderHeader';
SET @sql = @sql + 'FROM Sales.SalesOrderHeader' ;
Depending on the age of the data, the code will read from one of two tables. The idea, of course, is to keep our implementation invisible from the user. This doesn’t tell us how we can use dynamic SQL to respond to ad hoc queries from users.
In order to use dynamic SQL to respond to user reports, the features must meet some pretty strict requirements – we need to be able to base all of our ad hoc reports on a stable core query. “Stable core query” is just my way of saying “all of these reports join to the same base set of tables.” This makes it easy to dynamically add join conditions to the end of the
FROM clause. In effect, we’re just shoving strings together in the database to get data back to the users.
Unfortunately, dynamic SQL doesn’t always produce the best results. It’s difficult to build true ad hoc querying capacity with dynamic SQL. Users can be given a reasonable facsimile of dynamic SQL by providing many canned choices, but creating this SQL may be time consuming, error prone, and ultimately it can become difficult to maintain. The more options that users have, the more complex the T-SQL becomes. Clearly this pattern doesn’t work for large applications.
Lucene and SOLR
Apache Lucene is an open source search engine written in Java. SOLR adds functionality like document indexing (PDF, Word documents, etc), database integration, geospatial search, and faceted search. Both of these products are free and scale independently of SQL Server. Another benefit is that Lucene and SOLR can be fed documents to index in many formats – XML, JSON, PDF, or Word docs are all acceptable. Developers merely have to point the documents at SOLR and they will be indexed.
If you think this sounds an awful lot like full text search, you’d be right. Lucene and SOLR share a lot with SQL Server’s full text search feature. One of the big differences is that they exist outside of the database. Another difference is that you can query SOLR directly and return all of the fields that you want without having to read from tables in a database.
What’s more, SOLR has a very fluent search syntax making it possible to many types of complex queries. SOLR’s rich syntax makes it very easy to construct complex queries according to user needs without having to worry about writing SQL that can perform the underlying search. Finding sales data for the third quarter of the last five years where the sale contained headphones would look like this:
items_sold:headphones sale_date:[NOW-5YEARS/DAY TO *] quarter:3. It’s certainly not something that business users could write, but neither is SQL.
I’m not a huge fan of OLAP databases because I fear all things I don’t understand.
Okay, that last paragraph is a lie. SQL Server Analysis Services is a fantastic way to build crazy reports, the likes of which the world has never seen. I hadn’t even thought of this mechanism until I talked with a client recently and they told me what they were doing to deliver ad hoc reporting. New or modified data is marked with a timestamp. The data is periodically fed into an Analysis Service cube, right now this is once a day. The users get ad hoc reporting capabilities through the cube. The data currently isn’t real time enough, but the cube processing is fast enough on the current hardware that the frequency could be increased dramatically.
Here’s the best part: many different tools can be pointed at the cube to make this happen. Right now, users are generating reports with Excel and pivot tables, but nothing says that additional front ends can’t be stood up. Excel, Power Pivot, and SQL Server Reporting Services all make it easy for users to create their own reports and get to data in ways that the business can use.
What’s Your Secret?
There are as many ways to implement ad hoc reporting as there are developers implementing it. What are some of the ways that you’ve seen ad hoc reporting implemented. Sound off in the comments.