Scenario

It is a pretty common situation to have transaction date-time stored in a DATETIME field. The problems start with the fact most applications used GETDATE() or some such equivalent at the client side to record the order date-time stamp. So a typical entry for an OrderDate would actually end up with a time component as well. In versions prior to SQL Server 2008, there was only this option - no specific options like the DATE data type.

Problem

While INSERT code worked pretty adequately in this case, the problem starts with running reports of some kind. Most reports take parameters with the date (or date range) for which the reports are required. The (anti-)pattern around the WHERE clause predicates in a typical report query (based on the AdventureWorks database) would look like this:

SELECT *
FROM Sales.SalesOrderHeader
WHERE CONVERT (varchar(10), OrderDate, 112) = '20040731'

OR

SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate LIKE '31 Jul 2004%'

Both these predicates are non-optimal for usage of any index (and hopefully it would be a clustered index, to support range scans efficiently) on OrderDate:

  • The first query fails to use an index (and hence scans) because it explicitly converts the OrderDate field to a ISO-format date (varchar) in order to perform the comparison:

image

  • The second query has an implicit conversion to a varchar data type and hence again fails to use any indexes on OrderDate:

image

Workaround

From a design perspective there are potentially 2 problems rolled into 1 in this scenario:

  • The fact that we are storing date and time values in the same datetime field, which is required for an OLTP system
  • The fact that the report is running on the same database as the OLTP system

In many cases the 2nd aspect (reporting and OLTP on the same database) is unavoidable due to hardware constraints. In such cases, the possible workarounds could be:

Option 1: Use explicit timestamps in the literal portion of the predicate in the WHERE clause

This one is perhaps the most practical workaround for most workloads. Here the query is re-written as follows:

SELECT *
FROM Sales.SalesOrderHeader
WHERE OrderDate BETWEEN '31 Jul 2004 00:00:00' AND '31 Jul 2004 23:59:59:997'

The ensuing query plan looks more efficient. Of course the cost of the key lookup is due to the 'SELECT *' usage, and that is something which can be overcome by either:

  • Specifying only the relevant fields in the output list OR
  • Considering a clustered index on OrderDate (this probably makes sense in a reporting-only database)

Another thing to note is the usage of 997 milliseconds in the above query. Why not 999 milliseconds? Because then SQL Server would round it up to '1 Aug 2004 00:00:00'. Check this linked KB article for more details.

image

Option 2: Store only the date portion

If the time portion of the order timestamp is not important, have the application INSERT the CONVERTed varchar (or better still, perform that conversion in the application) and store only the date time portion. If you are in a position to upgrade to SQL Server 2008 then the DATE datatype may be of interest here.

Option 3: Have separate fields for datetime and date portion

Modify the database schema to have separate fields for the OrderDateTime and OrderDate. Modify the INSERT code to insert the raw date-timestamp into OrderDateTime, and the a CONVERTed date portion into the OrderDate field. Use the OrderDate field for reports and OrderDateTime for other queries which need it.

Conclusion

Separating reporting from OLTP processing is the key design aspect to always be kept in mind. In situations where that is not possible, some refactoring of the query can help. Table denormalization, coupled with minor changes in the application can also help in some situations to optimize query execution.

There may be other innovative solutions to the problem, and I am eager to hear from any readers about those comments!