Hi Folks,
As I mentioned a few posts ago, I recently moved to the StreamInsight team. Well, we’ve just released the new version, StreamInsight 1.2, so it’s time to start talking about what we’ve been up to.
One of the big features in the new version is resiliency, which helps you build highly available systems that use StreamInsight. This post gives a high-level, no-code overview of what this functionality does and how it can be used. More detailed posts will follow.
To motivate the feature, consider a few use cases:
The form of resiliency introduced in StreamInsight 1.2 allows StreamInsight (SI) to take and restore checkpoints. A checkpoint is a serialized form of all of the internal state SI keeps for a query. When SI restarts, it automatically restarts a previously running query and restores its state from the checkpoint, effectively returning the query to the known point at which the checkpoint was taken.
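To make the idea concrete, here is a minimal sketch in Python of what "serialize the state, then restore it later" means. This is not the StreamInsight API: the RunningAverageQuery class, its methods, and the JSON format are all hypothetical stand-ins I made up for illustration.

```python
import json


class RunningAverageQuery:
    """Toy stand-in for a standing query; NOT a StreamInsight type."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def on_event(self, reading):
        # Fold one incoming reading into the query's internal state.
        self.count += 1
        self.total += reading

    def average(self):
        return self.total / self.count if self.count else None

    def checkpoint(self):
        # Serialize all of the internal state the query keeps.
        state = {"count": self.count, "total": self.total}
        return json.dumps(state).encode("utf-8")

    @classmethod
    def restore(cls, blob):
        # Rebuild a query whose state is exactly what it was at checkpoint time.
        state = json.loads(blob.decode("utf-8"))
        query = cls()
        query.count = state["count"]
        query.total = state["total"]
        return query


query = RunningAverageQuery()
for reading in (20.0, 22.0):
    query.on_event(reading)
saved = query.checkpoint()   # state as of two readings
query.on_event(30.0)         # arrives after the checkpoint
del query                    # simulate a crash: in-memory state is gone

recovered = RunningAverageQuery.restore(saved)
print(recovered.count, recovered.average())
```

Note that the third reading is simply gone after the restore: the recovered query only knows what it knew when the checkpoint was taken.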
You can control when checkpoints are taken: take them more frequently and you’ll lose less in a crash; take them less frequently and you’ll keep the impact on the system low. Let’s consider how this would be used in our first use case:
Perhaps all of this is fine for a silly temperature reading, but what if someone were to tweet a Bieber sighting during the outage? To avoid missing this critical information, StreamInsight needs some help: something needs to keep track of the events that occurred during the outage, as well as those that occurred before the outage but after the last checkpoint, and be ready to send them along when SI comes back up.
That something is the input adapter—or more properly, whatever the input adapter is connected to. In other words, there has to be a component sitting in front of StreamInsight that keeps track of recent events and can replay them for SI. And this component has to be independent and highly available as well.
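As a rough sketch of what such a component might look like, here is a small Python replay log. Everything in it is an assumption for illustration: the ReplayLog name, the sequence numbers, and the idea that the checkpoint records the last sequence number it covered. A real version would also need to be durable and highly available, which this in-memory toy is not.

```python
class ReplayLog:
    """Hypothetical buffer that sits in front of the engine and remembers recent events."""

    def __init__(self):
        self._events = []    # list of (sequence_number, payload)
        self._next_seq = 0

    def append(self, payload):
        # Record an incoming event and hand back its sequence number.
        seq = self._next_seq
        self._events.append((seq, payload))
        self._next_seq += 1
        return seq

    def replay_after(self, last_checkpointed_seq):
        # Everything the engine has not captured in its last checkpoint.
        return [(s, p) for s, p in self._events if s > last_checkpointed_seq]

    def trim_through(self, seq):
        # Once a checkpoint covers events up to seq, they no longer need to be kept.
        self._events = [(s, p) for s, p in self._events if s > seq]


log = ReplayLog()
for payload in ("a", "b", "c", "d"):
    log.append(payload)

# Assume the last checkpoint covered events 0 and 1, and then the engine went down.
to_resend = log.replay_after(1)
print(to_resend)
```

The same call covers both kinds of lost events: those that arrived after the last checkpoint but before the outage, and those that arrived during it.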
If we have this in place, we’re ready to tackle our second use case:
And that’s not bad. Unless events have consequences—which is certainly the case with stock trades. To eliminate these duplicates, StreamInsight needs a little more help, this time on the output. As SI produces events, the output adapter—again, more properly, whatever the output adapter is connected to—has to remember them.
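Here is a minimal sketch of an output side that remembers what it has already seen. It assumes each output event carries a stable identifier that is the same when the event is produced again after a restart; that identifier, the DeduplicatingSink class, and its methods are all hypothetical, not part of StreamInsight.

```python
class DeduplicatingSink:
    """Hypothetical downstream consumer that acts on each event at most once."""

    def __init__(self):
        self._seen = set()    # identifiers of events already acted on
        self.acted_on = []    # payloads that actually had consequences

    def receive(self, event_id, payload):
        # Return True if the event was acted on, False if it was a duplicate.
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        self.acted_on.append(payload)
        return True


sink = DeduplicatingSink()

# Events produced before the outage.
for event_id, payload in [(1, "buy 100"), (2, "sell 50"), (3, "buy 10")]:
    sink.receive(event_id, payload)

# After the restart, events 2 and 3 are produced again, followed by a new one.
results = [sink.receive(i, p) for i, p in [(2, "sell 50"), (3, "buy 10"), (4, "sell 5")]]
print(results)
print(sink.acted_on)
```

In a real system the remembered identifiers would have to survive an outage of the output side too, and would need trimming so the set does not grow forever.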
When StreamInsight comes back up after an outage, it will essentially tell its output adapters the last time it remembers. It also guarantees that every event that it produced after that time will be produced again. Since the output adapter knows what these events were, it can remove the duplicates when it sees them instead of acting on them. And this is what we need in our third use case:
Hopefully this gives a feel for what’s possible with checkpointing. I’ll have more to say on specifics over the next few weeks. In the meantime, be sure to pick up the release and take a look at the documentation. And there’s a full end-to-end demo available on CodePlex as well.
Cheers, -Isaac