A team member and I found an interesting problem yesterday that I thought I'd share. We found the problem by luck, and the fix was weird. Perhaps there is an easier fix out there.
The problem manifested itself this way:
We needed to build our five different components into different MSI files (don't ask). Each of the five components refers to one or two "base class" assemblies that are included in each MSI. Previously, we had a single solution for each component that creates the assembly and then builds the MSI. Most of the assemblies end up in the GAC.
We were running into problems where we would end up accidentally installing two copies of a base class component into the GAC.
Our solution was to create a single solution file that builds all of the assemblies and builds all of the MSI files. This way, we could use project references and we'd only get one version of a dependent assembly in any MSI file.
The MSI for installing Assembly A is very similar to the MSI for installing Assembly B, because A and B are very similar. They both inherit from the same base objects. The problem was this: After creating the new solution file, and carefully checking every MSI, it appeared that we had it right: MSI-A would install Assembly A, while MSI-B would install Assembly B.
We saved the project and checked it into version control. Then we ran our build script. MSI-A had Assembly A, and MSI-B had Assembly A as well. Assembly B was not included in any MSI at all!
Opening the project back up showed that, sure enough, MSI-B was defined to use the project output from project A, even though we specifically told it to use B. Fixing the reference using Visual Studio didn't help. The moment we saved and reopened the solution, the MSI would once again show that it refers to the wrong Assembly.
The cause, it turned out, went back to how project B was created. The programmer made a copy of all of the files of project A and put them into another directory. He changed the names a little and ran with it. It never occurred to him to open up the project file and change the Project GUID for the new project.
The project GUID is a unique ID for each project. It is stored in the project file, but the solution files and the install projects use it as well. Since we had two projects in the same solution that used the same GUID, VS would just pick the first project with that GUID when building the MSIs. As a result, we had two MSIs with Assembly A and none with Assembly B.
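A check along these lines might catch the collision before it reaches a build. This is only a sketch: it assumes the GUID appears in the project file either as a `ProjectGuid = "{...}"` attribute or as a `<ProjectGuid>{...}</ProjectGuid>` element, and the function name and its input (a dictionary of file name to file contents) are made up for illustration.

```python
import re
from collections import defaultdict

# Assumption: the GUID is written either as an attribute
# (ProjectGuid = "{...}") or as an element (<ProjectGuid>{...}</ProjectGuid>).
GUID_PATTERN = re.compile(
    r'ProjectGuid\s*(?:=\s*"|>)\s*(\{[0-9A-Fa-f-]{36}\})'
)


def find_duplicate_guids(project_texts):
    """Return {guid: [project names]} for every GUID used by more than one project.

    project_texts is a hypothetical input: project file name -> file contents.
    """
    seen = defaultdict(list)
    for name, text in project_texts.items():
        match = GUID_PATTERN.search(text)
        if match:
            # GUIDs are case-insensitive, so normalize before comparing.
            seen[match.group(1).upper()].append(name)
    return {guid: names for guid, names in seen.items() if len(names) > 1}
```

Running it over every project file in a solution and failing the build on a non-empty result would be one way to use it.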
The fix we settled on was to open one of the two project files in Notepad and change the Project GUID. Then we went through every solution file that referenced that project file and changed the referencing GUID value. We had to be careful with the solution file that contained both projects, so that we left one project alone and changed only the other.
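The manual edit could be scripted roughly as follows. Treat it as a sketch under assumptions, not the procedure we actually ran: it assumes each project entry in the solution file sits on a single line beginning with `Project(` that contains the project's path, and `reassign_guid` and its parameters are hypothetical names. It does not touch any other place the old GUID may appear (configuration sections, install projects), which would still need a manual look.

```python
import uuid


def reassign_guid(project_text, solution_text, project_path, old_guid, new_guid=None):
    """Give one project a fresh GUID and update only its own solution entry.

    Returns (new project text, new solution text, the GUID that was used).
    """
    if new_guid is None:
        new_guid = "{" + str(uuid.uuid4()).upper() + "}"

    # Assumes old_guid is written in the project file with the same casing.
    new_project = project_text.replace(old_guid, new_guid)

    new_lines = []
    for line in solution_text.splitlines():
        # Only rewrite the entry that names this project's file, so the other
        # project that keeps old_guid is left alone.
        if line.startswith("Project(") and project_path in line:
            line = line.replace(old_guid, new_guid)
        new_lines.append(line)
    return new_project, "\n".join(new_lines), new_guid
```

Matching on the project path is the design choice that matters here: when two entries share a GUID, the GUID alone cannot tell them apart.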
This worked. The effect was odd. I thought I'd post the problem and our solution in case anyone else makes the mistake of creating an entire project by copying everything from another project, and then putting them both in the same solution file.
Adding only management to a flawed software development process makes it worse.
I was having a discussion the other day about the reasons for using SOA. If the likelihood of defects in a system is logarithmically proportional to the complexity of the system, I noted, then SOA is useful because you can create a collaboration of interacting systems, where each system is as simple as possible, and some logic moves to the collaboration or orchestration between them.
To which my friend replied: so if a team has 10 members, and one is not functional, the rest of the team can adapt, but if a team has 10 members, but communication is screwed up, then the team itself is dysfunctional. That's worse. So, can SOA create dysfunctional collaborations? Can we create a "team" of systems that hate each other?
What if one system is best served by mistakes that show up in another? Can that system engage in passive-aggressive behavior with another system? What about codependency? Can two systems behave in a manner that is counterproductive to both, but makes both of them look effective from the outside?
Do our test plans need to start including common team dysfunctional behaviors as test scenarios?
I was reminded recently of the fact that long running transactions, especially those involving multiple databases, cannot be made to follow the ACID rules of database transactions. On its face, this is completely true. However, I'm thinking that there are mechanisms that could be used to allow the positive effects of ACID to remain, even when the actual implementation is not available in the automated manner we are used to.
As a refresher: A is atomicity (which means that the entire transaction has to occur or not occur... failure means to roll it back). C is consistency (if part of a transaction breaks a rule, then the entire transaction fails), I is isolation (two people performing actions on the data should not affect one another), and D is durability (committed transactions are not lost when power fails or other adverse events occur).
So if a long running transaction causes a change in Database D1, and is then transmitted to a remote system where, the next day, it affects Database D2 (where it could fail), then we lose both Atomicity (because the transaction was committed to D1 before it is known to be successful at D2) and Isolation (since a user could ask both databases for info in the meantime and get two different answers).
However, the positive effects of ACID come when viewed from the viewpoint of the user. The user is not a concept. He or she is real. They have a goal and a purpose for using the database. If you can present ACID-like interactions to them, then these flaws can be minimized.
In order to do this, I'd suggest that a "system of record" is kept separate from the systems interacting in the transaction. An interaction with the "system of record" would occur at the last step of the long running transaction. That interaction would only occur if all prior interactions were successful. All users who want the "correct" information would be encouraged to check there. This gives you a kind of atomicity, since a change would not occur in this system until all parts of the transaction are complete.
As with Atomicity, Isolation can be met from this location as well, since queries to this system would not return different results depending on the status of in-flight transactions. The results change only once a transaction has completed and updated the system.
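The "system of record" idea can be sketched in a few lines. Everything here is illustrative: `SystemOfRecord`, `run_long_transaction`, and the convention that a step raises an exception on failure are assumptions, and a real long running transaction would span days and systems rather than a single function call. The sketch also makes no attempt to undo the earlier steps when a later one fails.

```python
class SystemOfRecord:
    """Hypothetical store that users query for the 'correct' answer."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key)

    def publish(self, key, value):
        self._data[key] = value


def run_long_transaction(steps, record, key, value):
    """Run each step in order; publish to the system of record only if all succeed.

    steps: callables that take the value and raise on failure (assumed interface).
    Returns True if the change was published.
    """
    for step in steps:
        try:
            step(value)
        except Exception:
            # The participating systems may now disagree with each other, but
            # the system of record still shows the last fully completed state.
            return False
    record.publish(key, value)
    return True
```

A reader of the system of record sees either the old value or the new one, never the in-between state that D1 and D2 can be in while the transaction is still running.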
So while long running transactions don't meet the ACID test, systems that support and detect long running transactions can be set up to provide the benefits of ACID transactions fairly readily.
I'm specifically looking for feedback on a workflow component I am working on.
We have implemented a workflow engine that maintains a distinction between ResponsibleFor and AssignmentTo. We have ambiguity around the combination of this concept with group assignment. I'd like to know what you think.
Here's the conundrum:
A workflow item has the alias of the person who is responsible for it in a particular stage of the workflow. Responsibility can span stages. Each stage is (ideally) aligned with a business task within an overall process.
Within a stage, we have steps. This "containment" allows us to model two of the three levels of abstraction within a single workflow model. Multiple people can be assigned to different steps at the same time, and the workflow language has constructs for split and join to allow these activities to be coordinated. The interesting thing to note about this is that assignment does NOT span from one stage to another. When you leave a stage, all assignments are wiped clean, and the next stage gets an assignment to whoever is responsible for the item. This provides a way for the workflow modeler to use ad-hoc workflow, while constraining the users from messing up their own business process.
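To make the responsible-versus-assigned distinction concrete, here is a minimal model. It is a sketch, not our engine: the class, its method names, and the idea that a stage begins with a single named first step are all assumptions made for illustration, and split and join are left out.

```python
class WorkItem:
    """Toy model: responsibility spans stages, assignments do not."""

    def __init__(self, responsible):
        self.responsible = responsible  # alias of the responsible person
        self.stage = None
        self.assignments = {}           # step name -> alias, current stage only

    def enter_stage(self, stage, first_step):
        # Leaving the previous stage wipes every assignment; the new stage
        # starts out assigned to whoever is responsible for the item.
        self.stage = stage
        self.assignments = {first_step: self.responsible}

    def assign(self, step, alias):
        # Several people can hold different steps of the same stage at once.
        self.assignments[step] = alias
```

For example, after `enter_stage("review", "triage")` and a couple of `assign` calls, a later `enter_stage("approve", "sign-off")` leaves only the sign-off step, assigned to the responsible person.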
Now add the notion of "assign to group." Our customers have asked for a way to indicate that a group of people should be made responsible for a work item, and any one member of the group can "take responsibility" for the item. The taking of responsibility is not an event. It simply occurs. In our app, a person visits a web page, sees the items assigned to the group, selects the item he or she wants to own, and clicks "assign to me".
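The "assign to me" interaction might look something like this. Again, a sketch with invented names (`GroupWorkItem`, `assign_to_me`, `group_queue`). It treats an item as held by the group until someone claims it, and, matching the description above, claiming it raises no workflow event.

```python
class GroupWorkItem:
    """Toy model of an item that a group holds until one member claims it."""

    def __init__(self, title, group_name, members):
        self.title = title
        self.group_name = group_name
        self.members = set(members)
        self.owner = None  # None means the item is still held by the group

    def assign_to_me(self, alias):
        """Claim the item. Returns False if someone else already took it."""
        if alias not in self.members:
            raise PermissionError(alias + " is not in group " + self.group_name)
        if self.owner is not None:
            return False
        # No event is raised here; ownership simply changes.
        self.owner = alias
        return True


def group_queue(items, group_name):
    """Items a group member would see on the page: unclaimed items for that group."""
    return [i for i in items if i.group_name == group_name and i.owner is None]
```

Note what the model leaves undecided: whether `owner` records responsibility or only assignment, which is exactly the question below.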
My questions, your thoughts:
In your organization, would it be more correct to say that a "group" is responsible for an activity, or that a person is responsible and that the person has delegated assignment to the group? In other words, if a group is responsible, who is accountable if the item isn't worked? If the item is ultimately escalated to someone, wasn't that person responsible all along?
If you go with the idea of group responsibility: does it span stages? If we accept that a group is responsible for a set of stages, then one person can take responsibility at one stage... does the responsibility revert to the group when the next stage begins? In effect, can the group be responsible but an individual be assigned?