Official blog of the development team of StreamInsight.
I tend to partition StreamInsight operators into three classes: the basic “temporal/relational” operators; the “time travel” operators, and; the “elephant” operators. The basic operators illustrate the rules. The time travel operators bend the rules. The scan operators break them altogether. While I’d like to focus on time travel today, we can’t bend the rules before knowing what they are. Let’s begin with the basics!
A StreamInsight query describes how the results of a normal relational query change over time. The start time of an event indicates the time that a row is inserted into the relational input. The end time indicates when that row is deleted. StreamInsight’s basic temporal/relational operators can be understood in those terms. For instance, the StreamInsight ‘join’ operator produces output for coincident input tuples matching some predicate. Tuples are coincident if their time intervals overlap. Another way of looking at it: all matching tuples across all time logically produce output, but that output may have an empty temporal intersection. The basic temporal/relational operators:
Time travel operators bend the rules because they allow an input event to contribute to results outside of their original time interval! The time travel operators:
The fundamental time travel operator is alter event lifetime because the others can be expressed in terms of it. (I’ll leave this as an exercise for the reader). A hint for hopping window operators: one way of thinking about a hopping window is that events belonging to the same window are reassigned to the same time interval so that they can contribute to the same output time interval!) While there are several variations on the alter lifetime operator, they all boil down to the following components:
If you do modify start times and durations, make sure that the modifications to start and end times are monotonic. In other words, make sure that the changes do not affect the relative order of start and end edge events. Otherwise, you risk introducing CTI violations. CTI violations bring down the query: they indicate that a new event has arrived affecting a result that has already been committed! Like traveling through time to alter the past, violating CTIs causes (can I even use that word here?) bad things to happen.
Shifting all events forward or backward by a fixed amount is safe because the CTIs are moving with the events*:
Making time pass faster or slower doesn’t violate the monotonicity requirement, but depending on the nature of the input can introduce CTI violations. If the input stream contains end edge events, then they could also violate CTIs if not modified, as in the following example which attempts to double the speed at which time passes:
We can address the failure by ensuring that end edges also advance at double-speed:
There is a very important loophole that can be exploited for duration modifications. If the duration selector returns a fixed value for every event, StreamInsight no longer needs to produce end edges internally, and those non-existent end edges no longer pose a CTI violation risk:
In summary, there are several completely safe forms of time travel in StreamInsight:
While it may be safe to stray from these rules for specific data and query combinations, I suggest playing it safe!
* Notice that the start time selector function does not allow behavior to be affected by event payload. This is because CTI events, which are also modified by the selector, do not carry payloads.
Regards, Colin Meek/The StreamInsight Team