Today I’m starting on another spike project. So far there are only questions to investigate, no answers. I’m sharing my plans with you because I believe doing so might help some of you and I’m hopeful that you might share your insights with me as we seek to solve this problem together.
But aren't you solving versioning in the next release?
Why yes. If you saw my PDC10 session, Windows Workflow Futures, you know that we are working on solutions for the next release. However, you have projects that are happening right now, and we want to provide answers for you today.
What is "Versioning"?
"Versioning is the creation and management of multiple releases of a product, all of which have the same general function but are improved, upgraded or customized." – SearchSoftwareQuality.TechTarget.com
When developing solutions with Windows Workflow Foundation, developers have two elements which must be versioned together as the system evolves over time: the Workflow Definition and the Workflow Instance State.
The Workflow Runtime uses a Workflow Definition to create a Workflow Instance. The Workflow Instance creates and updates the Workflow Instance State at runtime. This state includes information about the activities that were executing and the state of variables in the workflow, as well as other internal data.
When the Workflow Instance is persisted, the Workflow Instance State is stored in an Instance Store. At some later time, the Workflow Runtime will create a new Workflow Instance and load the Workflow Instance State from the Instance Store into the Workflow Instance.
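To make the moving parts concrete, here is a minimal sketch in Python. It is a toy model and not WF4 code: `WorkflowDefinition`, `InstanceState`, `InstanceStore` and `resume` are hypothetical names, and the assumption that persisted state simply records the name of the executing activity is mine. It only illustrates why state written under one definition may not fit a changed definition.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WorkflowDefinition:
    """Toy stand-in for a workflow definition: an ordered tuple of activity names."""
    activities: tuple


@dataclass
class InstanceState:
    """Toy stand-in for persisted state: the executing activity plus variables."""
    current_activity: str
    variables: dict = field(default_factory=dict)


class InstanceStore:
    """In-memory stand-in for an instance store, keyed by instance id."""

    def __init__(self):
        self._rows = {}

    def save(self, instance_id, state):
        self._rows[instance_id] = state

    def load(self, instance_id):
        return self._rows[instance_id]


def resume(definition, state):
    """Return the activities still to run, or raise if the state does not fit."""
    if state.current_activity not in definition.activities:
        raise LookupError(
            f"persisted activity {state.current_activity!r} is not in the definition"
        )
    position = definition.activities.index(state.current_activity)
    return definition.activities[position:]


store = InstanceStore()
v1 = WorkflowDefinition(("ReceiveOrder", "WaitForApproval", "Ship"))
store.save("instance-1", InstanceState("WaitForApproval", {"orderId": 42}))

# Same definition: the state lines up and the instance can continue.
remaining_v1 = resume(v1, store.load("instance-1"))

# Changed definition: the activity the state points at no longer exists.
v2 = WorkflowDefinition(("ReceiveOrder", "AutoApprove", "Ship"))
try:
    resume(v2, store.load("instance-1"))
    v2_loaded = True
except LookupError:
    v2_loaded = False
```

In this toy model the instance resumes under `v1` but cannot be loaded into `v2`, because the state refers to an activity that the new definition removed.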
Windows Workflow Foundation in .NET 4 does not provide explicit support for versioning. Yet we know that any serious implementation must provide a solution for versioning. The purpose of this spike project is to investigate the various dimensions of the versioning problem and to propose solutions.
Activity Library Versioning
This deals with how workflows resolve dependencies to assemblies which contain types that they require. This has been previously investigated and the results published in the following blog posts. This area is out of scope for this spike.
Service/Data Contract Versioning
Versioning Workflow Services is very similar to the general problem of versioning Web Services with regard to Service / Data Contracts. There are well-known techniques and resources dealing with this problem, so this area is out of scope for this project.
Workflow Versioning
While any activity can be considered a workflow, for the purposes of this project a Workflow is defined as the activity which is invoked by the workflow runtime via WorkflowApplication, WorkflowInvoker or WorkflowServiceHost.
These are questions that need to be investigated
To test this scenario
Try changing a Workflow Definition in various ways
Make a list of the results when various kinds of changes are made that cause incorrect behavior
Investigate options for adding version information to the persistence store
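One option for the last item could be sketched like this, in Python rather than .NET. This is only an illustration under my own assumptions: `VersionedStore` and `definition_fingerprint` are hypothetical names, not part of any WF4 instance store, and the definition is treated as plain text (for example the XAML) that can be hashed.

```python
import hashlib


def definition_fingerprint(definition_text):
    """Hash the definition text so that any edit yields a different fingerprint."""
    return hashlib.sha256(definition_text.encode("utf-8")).hexdigest()


class VersionedStore:
    """In-memory store that stamps each persisted state with a definition fingerprint."""

    def __init__(self):
        self._rows = {}

    def save(self, instance_id, state, definition_text):
        self._rows[instance_id] = {
            "state": state,
            "definition_fingerprint": definition_fingerprint(definition_text),
        }

    def load(self, instance_id, definition_text):
        row = self._rows[instance_id]
        if row["definition_fingerprint"] != definition_fingerprint(definition_text):
            raise ValueError("instance was persisted by a different definition")
        return row["state"]


# Hypothetical definition text; real XAML would be much longer.
xaml_v1 = "<Sequence><Receive /><Delay /><Send /></Sequence>"
xaml_v2 = "<Sequence><Receive /><WriteLine /><Delay /><Send /></Sequence>"

store = VersionedStore()
store.save("instance-1", {"bookmark": "Delay"}, xaml_v1)

same_version_state = store.load("instance-1", xaml_v1)

try:
    store.load("instance-1", xaml_v2)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

A content hash flags every edit, including harmless ones such as whitespace changes, so a real solution would probably want an explicit version number or a smarter comparison. The sketch only shows where the version information would live and when it would be checked.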
These are the target scenarios for investigation in this spike. The output of the spike will be working scenarios that I can demo, showing techniques that achieve the goals.
Given
When
Then
So That
Questions
Ultimately this project is for you. My goal is to help you create robust, long-lived solutions, and getting versioning right is a key element of that. Perhaps you have thoughts about my project. Are there areas I should investigate that are missing? Are there solutions that you have found worked well for you? Just leave a comment and let me know.
Omg. Ron, I just have no words. Just want to say thank you for all the work you have done. Looking forward to new posts.
Warning. Bad English.
My opinion on versioning: I think it may be enough in some cases to introduce some kind of workflow key-bookmarks (or regions, or containers). When the workflow reaches such a bookmark, the workflow instance should be aware of where it has stopped. At that moment we have an idle instance with data about the place where that particular workflow instance has actually stopped. Then we can decide whether the workflow can be safely updated (we can safely update the part which has not executed yet if we work with a simple sequential workflow, for example). Then we compare the two versions of the XAML, update the parts between key-bookmarks in a Subversion-like manner, and take care of update conflicts (variable deleted/introduced, etc.).
For this system to work, workflow definitions should be saved with versioning data in the persistence database. (Is it already done?)
These are quite obvious things, and I believe you have already thought about the problem in this way. But anyway, I just want to deliver feedback, because this is the least I can do to thank you for providing a much smoother way to dive into the WF4 world.
Hi Ron,
I am so glad that somebody at MS is asking this question!
I am usually tasked with making the versioning recommendations within our organisation, and getting some solid direction from Microsoft (and the community) will make my job a lot easier.
I personally have found that in developing a versioning practice, the options available tend to lay themselves out along a linear range of diametric oppositions.
One end represents high granularity (large number of loosely coupled elements), which gives the developer massive power to change the product, but also increases the development cost of maintaining all these 'versioning boundaries'. The other end represents low granularity, fewer components, lower development cost when making a change, but higher customer impact, an increased likelihood that a given change will break compatibility with some element of the product - resulting in a lot of "You just can't use those components together, you have to upgrade X to use Y".
Finding the best position along this scale is a question of the resources available for maintaining infrastructure and releases, the available tooling, and the complexity of the product itself.
I don't see much difference between managing the versioning of compiled binaries in an extensible product, and Xaml in a distributed workflow system.
From a functional perspective, we have come to the same conclusion you have. When rolling out new components, we have one of two goals: either to repair an issue rendering existing components non-functional, or to supersede an operational component to change functionality.
The former has strict rules regarding what may be changed, to support 'hot-swap'. The latter is a complete reversion, and while it must continue to interoperate with the same contractual interfaces that the previous component was expected to, it brings with it all of its tightly coupled dependencies, so the component and those directly supporting it are considered to belong to a separate lifecycle.
If we look at hot-fixing, the ideal scenario would be that applying a hot-fix does not reboot the AppDomain and fail running instances. Instead, all running instances are persisted; once that is achieved, the AppDomain is recycled and instances are rehydrated if necessary. Any that fail to rehydrate into the hot-fix because of versioning rule conflicts (breaking changes) are displayed in the AppFabric dashboard.
The hot-fix should apply to all active workflow instances, because, as you have mentioned, there may be a bug that requires the fix to be applied to workflows that have already started.
Non-hot-swap upgrades should be achieved "side by side" - exactly as you have described.
We have considered using a WCF routing mechanism to ensure that messages are always routed to the version of the workflow that was responsible for creating the workflow instance. That would require that the serialized workflow store information about its version, and that the workflow expose multiple endpoints, for example:
http://localhost/workflow1/v1.0/ This endpoint specifically points to v1.0 endpoint
http://localhost/workflow1/v1.1/ This endpoint specifically points to v1.1 endpoint
http://localhost/workflow1/ This endpoint will always redirect to the 'newest' endpoint, allowing new workflows to be started as v1.1 (in this case)
Admittedly, everyone could just post messages to the http://localhost/workflow1/ endpoint, and if it's not a message that should start a new workflow, then it must contain a correlation id, and that id could be used to determine which endpoint version to route to, and those 'endpoint versions' could be private, not accepting direct messages (to prevent people from starting old workflow versions once they have been superseded).
This would be pretty painless: it would have no impact on how clients interact with AppFabric, and it would allow the service contract to change between versions because correlated messages will be redirected to the appropriate version of the workflow.
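The routing idea above could be sketched roughly as follows. This is plain Python rather than WCF, and `VersionRouter`, `publish`, `start`, `route` and the correlation ids are all hypothetical names used for illustration. The only assumption is that the router remembers which version started each correlated instance.

```python
class VersionRouter:
    """Routes messages to the workflow version that started the instance."""

    def __init__(self, base_url, first_version):
        self._base_url = base_url.rstrip("/")
        self._newest = first_version
        self._version_by_correlation = {}

    def publish(self, version):
        """Make a new version the target for newly started workflows."""
        self._newest = version

    def _endpoint(self, version):
        return f"{self._base_url}/{version}/"

    def start(self, correlation_id):
        """A message that starts a workflow always goes to the newest version."""
        self._version_by_correlation[correlation_id] = self._newest
        return self._endpoint(self._newest)

    def route(self, correlation_id):
        """A correlated message goes to the version that created the instance."""
        return self._endpoint(self._version_by_correlation[correlation_id])


router = VersionRouter("http://localhost/workflow1/", "v1.0")
started_on = router.start("order-1")   # started while v1.0 is newest
router.publish("v1.1")                 # v1.1 supersedes v1.0
follow_up = router.route("order-1")    # still handled by v1.0
new_start = router.start("order-2")    # new instances land on v1.1
```

Here the follow-up message for `order-1` keeps going to the v1.0 endpoint after v1.1 is published, while `order-2` starts on v1.1. Clients only ever see the base URL.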
While the concept of allowing a workflow state to 'rehydrate' into a newer workflow version is a wonderful thought - it's not as simple as inspecting the workflow to determine if a workflow instance can be rehydrated into a newer workflow pattern.
Some activity that has yet to be executed may be expecting that a prior activity has manipulated some external resource in a specific way - and that 'manipulation' may have changed between versions. So, while the workflow might simply be passing a different argument to some external resource, which won't affect the rehydration of the workflow negatively, the external resource has been 'initialized' in such a way that is only compatible with the prior version of workflow design... there's no way you can know this.
As a rule of thumb, I suppose if your workflow makes any calls to external resources, and the arguments (or the rules that build the arguments) to that external call change (some mighty tricky static analysis there), then you have to fail the 'in-place upgrade'.
Adam Langley
www.winscribe.com
@Adam - Wow - great feedback. You raise some excellent points I will have to consider.
@Dmitry - thanks - keep the ideas coming!
I love your blog posts, and check the RSS feed daily. Great work, I've been waiting for something exactly like this (second only to a date on the vNext stuff ;).
I wonder if there's a "hacky" interim solution similar to what you plan for the full vNext where something vaguely similar to Xamlinjector is used to modify the in-store view of the persistence data based on the deltas between the old and new workflow.
It wouldn't even have to be fully automated - basically I know the difference between these two workflow definitions, what changes do I need to make to the persistence data to reconcile them. For example, I know I added a WriteLine between these two activities, so I'm going to have a program troll through persistence and do an "Insert(something)" between activities X and Y.
Sounds like something MS would probably be loath to officially support, though, and I understand the reasons why.
@jvrobert I've got some ideas that I'm looking into. One thing is certain: this investigation has been very revealing so far and will certainly drive some new thinking into our next release as well.