Compensating Records

What are compensating records?

First a little background. Whenever a work item is updated a new revision is created. These revisions are moved over to the Warehouse and stored in the various Work Item Dimension and Fact tables. Each revision will be in the Warehouse – however each date will not have a revision. Something like (ignoring actual schema for now):

ID

Revision

Date

Remaining Work

1

1

4/10/2009

50

1

2

4/15/2009

40

1

3

4/20/2009

30

1

4

4/25/2009

20

Now if you wanted to ask a question about work items on a specific date then one way you can answer your question is to get the latest revision of the work item created before that date.

For example say you want to know the Remaining Work for a Work item on 4/22/2009. You could get all the work items whose Date is <= 4/22/2009 (Revisions 1, 2, and 3); then from these find the one with the largest Revision (Revision 3). This gives you a Remaining Work of 30.

However this requires you to do a MAX in your query and doesn’t work particularly well in the Cube. So we have added compensating records. These records are primarily used for running sum computations. Each one effectively negates, or compensates, the previous revision of that work item.

For example, using the same data from above:

1. 4/10/2009 – Work Item 1 is added.

ID

Revision

Date

Remaining Work

Record Count

Is Compensating

1

1

4/10/2009

50

1

No

2. 4/15/2009 – Work Item 1 is revised.

ID

Revision

Date

Remaining Work

Record Count

Is Compensating

1

1

4/10/2009

50

1

No

1

1

4/15/2009

-50

-1

Yes

1

2

4/15/2009

40

1

No

3. 4/20/2009 and 4/25/2009 – Work Item 1 is revised.

ID

Revision

Date

Remaining Work

Record Count

Is Compensating

1

1

4/10/2009

50

1

No

1

1

4/15/2009

-50

-1

Yes

1

2

4/15/2009

40

1

No

1

2

4/20/2009

-40

-1

Yes

1

3

4/20/2009

30

1

No

1

3

4/25/2009

-30

-1

Yes

1

4

4/25/2009

20

1

No

Now you can answer the same question as above by getting all the revisions of work items older than 4/22/2009 (Revisions 1, 1’, 2, 2’, and 3) and summing their Remaining Work. This gives you a Remaining Work of (50 + -50 + 40 + -40 + 30) = 30. This should be faster than without the compensating records (and it makes the Cube happy!).

What about that Record Count column?

You can apply the same running sum logic to the Record Count to get the number of work items as of a date. So you could ask how many work items were there on 4/22/2009. Again we sum up the values for all rows with a Date <= 4/22/2009 and we get (1 + -1 + 1 + -1 + 1) = 1. Not terribly exciting with our data, but you can see how if we had more work items this could be useful.

Again – with TFS

Alright so let’s look at some example using the TFS 2008 schema. Say I want to get the number of Active and Resolved Work Items assigned to me as of a given date. First thing I need to know is what tables I should look at:

Info needed in query

Warehouse Table

Active and Resolved Work Items

[Work Item] filtered on [System_State]

Assigned to me

[Work Item History] or [Current Work Item] joined to [Person] on [Work Item History].[Person] = [Person].[__ID] or [Current Work Item].[Person] = [Person].[__ID]

Work Items as of a date

[Work Item History] filtered on [Date]

Putting this together I see I need [Work Item History], [Work Item], and [Person]. So let’s build the query:

SELECT       wi.[System_State]

            ,SUM([Record Count]) AS [Count]

FROM        [Work Item History] AS wih

LEFT JOIN   [Work Item] AS wi

    ON  wi.[__ID] = wih.[Work Item]

LEFT JOIN   [Person] AS p

    ON  p.[__ID] = wih.[Assigned To]

WHERE   [Date] < CONVERT(DATETIME, '2009-03-19', 126) AND

        p.[Person] = N'Nick Ericson' AND

        wi.[System_State] IN ('Active', 'Resolved')

GROUP BY    wi.[System_State]

System_State

Count

Resolved

2

Active

6

I can change the query slightly to get a better query plan (please send any further optimizations along – I’ll add them):

    SELECT       wi.[System_State]

                ,SUM([Record Count]) AS [Count]

    FROM        [Work Item History] AS wih

    LEFT JOIN   [Work Item] AS wi

        ON  wi.[__ID] = wih.[Work Item]

    WHERE   wih.[Assigned To] = (SELECT TOP 1 [__ID] FROM [Person] WHERE [Person] = N'Nick Ericson') AND

            wih.[Date] < CONVERT(DATETIME, '2009-03-19', 126) AND           

            wi.[System_State] IN ('Active', 'Resolved')

    GROUP BY    wi.[System_State]

One thing I should note that can be confusing is that I am passing in 2009-03-19 00:00:00.000 for the Date. Most people would consider this not to be 19th, but instead the 18th (or so barely the 19th that we shouldn’t use it for determining how many bugs were Active on the 19th). However in the Warehouse we truncate the times off of the Date field. So if a work item was marked active at 6 PM on the 19th the Date field in the Warehouse will have 2009-03-19 00:00:00.000. This is a long way of saying that 2009-03-19 is the 19th, not the 18th.

Alright getting the counts for someone on a given day was fairly straight forward. You may want to add in some logic to return Active and Resolved counts of 0 when the query returns NULLs – but this depends on how you will build your report.

Trending

How about a trend report? You basically need to execute the queries from the section above over a range of dates. We can use a Cross Apply here (again send along your optimized queries):

SELECT      d.[Date] AS [Date], ca.*

FROM        [Date] AS d

CROSS APPLY

(

    SELECT       wi.[System_State]

                ,SUM([Record Count]) AS [Count]

    FROM        [Work Item History] AS wih

    LEFT JOIN   [Work Item] AS wi

        ON  wi.[__ID] = wih.[Work Item]

    WHERE   wih.[Assigned To] = (SELECT TOP 1 [__ID] FROM [Person] WHERE [Person] = N'Nick Ericson') AND

            wih.[Date] < d.[__ID] AND           

            wi.[System_State] IN ('Active', 'Resolved')

    GROUP BY    wi.[System_State]

) AS ca

WHERE       d.[__ID] BETWEEN CONVERT(DATETIME, '2009-03-19', 126) AND CONVERT(DATETIME, '2009-03-21', 126)

Date

System_State

Count

3/19/2009

Active

6

3/19/2009

Resolved

2

3/20/2009

Active

7

3/20/2009

Resolved

2

3/21/2009

Active

7

3/21/2009

Resolved

2