Official blog of the development team of StreamInsight.
At first glance, hopping windows seem fairly innocuous and obvious. They organize events into windows with a simple periodic definition: the windows have some duration d (e.g. a window covers 5 second time intervals), an interval or period p (e.g. a new window starts every 2 seconds) and an alignment a (e.g. one of those windows starts at 12:00 PM on March 15, 2012 UTC).
Logically, there is a window with start time a + np and end time a + np + d for every integer n. That’s a lot of windows. So why doesn’t the following query (always) blow up?
A few users have asked why StreamInsight doesn’t produce output for empty windows. Primarily it’s because there is an infinite number of empty windows! (Actually, StreamInsight uses DateTimeOffset.MaxValue to approximate “the end of time” and DateTimeOffset.MinValue to approximate “the beginning of time”, so the number of windows is lower in practice.)
That was the good news. Now the bad news. Events also have duration. Consider the following simple input:
Because the event has no explicit end edge, it lasts until the end of time. So there are lots of non-empty windows if we apply a hopping window to that single event! For this reason, we need to be careful with hopping window queries in StreamInsight. Or we can switch to a custom implementation of hopping windows that doesn’t suffer from this shortcoming.
The alternate window implementation produces output only when the input changes. We start by breaking up the timeline into non-overlapping intervals assigned to each window. In figure 1, six hopping windows (“Windows”) are assigned to six intervals (“Assignments”) in the timeline. Next we take input events (“Events”) and alter their lifetimes (“Altered Events”) so that they cover the intervals of the windows they intersect. In figure 1, you can see that the first event e1 intersects windows w1 and w2 so it is adjusted to cover assignments a1 and a2. Finally, we can use snapshot windows (“Snapshots”) to produce output for the hopping windows. Notice however that instead of having six windows generating output, we have only four. The first and second snapshots correspond to the first and second hopping windows. The remaining snapshots however cover two hopping windows each! While in this example we saved only two events, the savings can be more significant when the ratio of event duration to window duration is higher.
Figure 1: Timeline
The implementation of this strategy is straightforward. We need to set the start times of events to the start time of the interval assigned to the earliest window including the start time. Similarly, we need to modify the end times of events to the end time of the interval assigned to the latest window including the end time. The following snap-to-boundary function that rounds a timestamp value t down to the nearest value t' <= t such that t' is a + np for some integer n will be useful. For convenience, we will represent both DateTime and TimeSpan values using long ticks:
How do we find the earliest window including the start time for an event? It’s the window following the last window that does not include the start time assuming that there are no gaps in the windows (i.e. duration < interval), and limitation of this solution. To find the end time of that antecedent window, we need to know the alignment of window ends:
Using the window end alignment, we are finally ready to describe the start time selector:
To find the latest window including the end time for an event, we look for the last window start time (non-inclusive):
Bringing it together, we can define the translation from events to ‘altered events’ as in Figure 1:
Finally, we can describe our custom hopping window operator:
By switching from HoppingWindow to HoppingWindow2 in the following example, the query returns quickly rather than gobbling resources and ultimately failing!
As you can see, instead of producing a massive number of windows for the open start edge e0, a single window is emitted from 12:00:15 AM until the end of time:
Adjusted Events
StartTime
EndTime
Payload
6/28/2012 12:00:01 AM
12/31/9999 11:59:59 PM
e0
6/28/2012 12:00:03 AM
6/28/2012 12:00:07 AM
e1
6/28/2012 12:00:05 AM
6/28/2012 12:00:15 AM
e2
6/28/2012 12:00:11 AM
e3
Query
1
2
3
Regards, The StreamInsight Team