Recently, I started a personal project for analyzing stock market trends and calculating the optimal trading strategies. I've learned a lot of new stuff about LINQ and SQL 2008 from this. Based on that I'm opening a new series on data warehouse tips.
In my project, one of the things I quickly realized after calculating that I would need to generate about 1 billion rows in order to have sufficient detail for analysis of 5,000 publicly traded stocks with 1 year of history was that I needed to consolidate the data. It was obvious I needed a way to summarize data into periods and use those as dimensions for analyzing the performance, rather than calculating all of this from the detailed data.
What would be nice, would be to have periods that could overlap and align. For example, I want to be able to go back and look at last month, last quarter, last 6 months, and last year. However each one of these requires a different set of transactions because even though the closing date is the same, the opening date is different. The variation in open date affects the price at which a stock would be purchased at (or short-sold at), thus leading to a different set of transactions based on an entry/exit strategy. In order to focus on the technical aspect of this, I won't get into all of the application design considerations. Suffice to say, that discrete, yet aligned periods were needed.
So I immediately wrote a stored procedure (final version below) to try to insert my periods and decided to leverage the merge capability.
Below is the table definition of the Period Table
CREATE TABLE [dbo].[Period]( [PeriodId] [smallint] NOT NULL, [StartDate] [date] NOT NULL, [EndDate] [date] NOT NULL, [MonthsInPeriod] AS (datediff(month,[StartDate],dateadd(day,(1),[EndDate]))), CONSTRAINT [PK_Period] PRIMARY KEY CLUSTERED ( [PeriodId] ASC)) ON [PRIMARY]
Note, that we can calculate the number of months in the period (assumes all periods are in 1 month increments) easily using a computed field with datediff
And the Merging (upsert) stored procedure
CREATE PROCEDURE [dbo].[UpsertMonthlyPeriod] @PeriodDate date, @Months intAS BEGIN DECLARE @EndDate date SET @EndDate = DATEADD(DD,-1,DATEADD(MM,@Months, @PeriodDate)) MERGE INTO Period AS Target USING (VALUES (@PeriodDate, @EndDate)) AS Source (NewStartDate, NewEndDate) ON Target.StartDate = Source.NewStartDate AND Target.EndDate =Source.NewEndDate WHEN NOT MATCHED BY TARGET THEN INSERT (StartDate, EndDate) VALUES (NewStartDate, NewEndDate);END
This is very simple example of the new Merge functionality. The USING portion identifies the values to use for mapping the source data to the target, which in this case was simply start-date and end-date (remember I am basically allowing overlapping periods for analysis purposes).
The below stored procedure then actually generates the periods by calling the Upsert stored proc:
CREATE PROCEDURE [dbo].[SetupPeriods] @StartDate date, @Periods int, @CreateCummulative bit AS BEGIN DECLARE @x int = 0 DECLARE @PeriodDate date SET @PeriodDate = @StartDate
WHILE @x < @Periods BEGIN -- Create Monthly periods DECLARE @PeriodMonths int EXEC dbo.UpsertMonthlyPeriod @PeriodDate = @PeriodDate, @Months=1 -- Create Quarerly periods IF @x % 3 = 0 OR @CreateCummulative =1 BEGIN EXEC dbo.UpsertMonthlyPeriod @PeriodDate = @PeriodDate, @Months=3 END -- Create Bi-Annual periods IF @x % 6 = 0 OR @CreateCummulative =1 BEGIN EXEC dbo.UpsertMonthlyPeriod @PeriodDate = @PeriodDate, @Months=6 END -- Create Annual periods IF @x % 12 = 0 OR @CreateCummulative =1 BEGIN EXEC dbo.UpsertMonthlyPeriod @PeriodDate = @PeriodDate, @Months=12 END SET @x = @x+1 SET @PeriodDate = DATEADD(MM,@x,@StartDate) ENDEND
The end result is the data shown below:
PeriodId StartDate EndDate MonthsInPeriod8041 2008-04-01 2008-04-30 18043 2008-04-01 2008-06-30 38046 2008-04-01 2008-09-30 68040 2008-04-01 2009-03-31 128051 2008-05-01 2008-05-31 18053 2008-05-01 2008-07-31 38056 2008-05-01 2008-10-31 68050 2008-05-01 2009-04-30 128061 2008-06-01 2008-06-30 18063 2008-06-01 2008-08-31 38066 2008-06-01 2008-11-30 68071 2008-07-01 2008-07-31 18073 2008-07-01 2008-09-30 38076 2008-07-01 2008-12-31 68081 2008-08-01 2008-08-31 18083 2008-08-01 2008-10-31 38086 2008-08-01 2009-01-31 68091 2008-09-01 2008-09-30 18093 2008-09-01 2008-11-30 38096 2008-09-01 2009-02-28 68101 2008-10-01 2008-10-31 18103 2008-10-01 2008-12-31 38106 2008-10-01 2009-03-31 68111 2008-11-01 2008-11-30 18113 2008-11-01 2009-01-31 38116 2008-11-01 2009-04-30 68121 2008-12-01 2008-12-31 18123 2008-12-01 2009-02-28 39011 2009-01-01 2009-01-31 19013 2009-01-01 2009-03-31 39021 2009-02-01 2009-02-28 19023 2009-02-01 2009-04-30 39031 2009-03-01 2009-03-31 19041 2009-04-01 2009-04-30 1
But, wait! Where did the PeriodId come from? It's not an identity field and the numbers appear to follow a pattern.
Can you guess the pattern?
Yes, that's right - last 2 digits of year + 2 digit month + length of period (except that 12 month period is considered "0"). This was so that it could still fit in smallint field. I've got millions of rows linking back to the period and want to keep the index small, so didn't want the extra digit.
The trick for this was an insert/update trigger shown below:
CREATE TRIGGER [dbo].[Period_IU_Trig] ON [dbo].[Period] AFTER INSERT, UPDATEAS BEGIN-- Set Period Id to be meaningful (YY + MM + Period Length)
SET NOCOUNT ON; UPDATE Period SET PeriodId = CAST(CAST(DATEPART(YY,INSERTED.StartDate) - 2000 AS VARCHAR(2)) + CAST(RIGHT(100 + (DATEPART(MM,INSERTED.StartDate)),2) AS VARCHAR(2)) + CAST((CASE WHEN INSERTED.MonthsInPeriod = 12 THEN 0 ELSE INSERTED.MonthsInPeriod END)AS CHAR(1)) AS SMALLINT) FROM INSERTED WHERE INSERTED.PeriodId = Period.PeriodId
Note, I normally don't advocate having "meaningful" surrogate keys (an oxymoron), but in this case, its very handy in testing to quickly identify the period without going back to the source table.
To present this to the users, I provide a view that formats the dates to show the date range in descending order of the period along with the length in months. For this scenario, this is easier to work with than a calendar control and works great for a drop down as you can see below
The stored procedure for formatting this is shown below:
CREATE procedure [Reports].[SelectCurrentPeriod]ASSELECT TOP (100) PERCENT PeriodId, CAST(StartDate AS VARCHAR(10)) + ' To ' + CAST(EndDate AS VARCHAR(10)) + ' (' + CAST(MonthsInPeriod AS VARCHAR(2)) + ' Months)' as PeriodDescFROM dbo.PeriodWHERE (EndDate < (SELECT MAX(LoadDate) AS Expr1 FROM dbo.LoadHistory))
There you have it, a period table and some supporting procedures. For our next tip, we'll look at how the period table actually comes into play when linking up with the data.