A friend of mine pointed out an interesting post by Scott Hanselman that used a clever phrase: "having a High Bus Factor" which is to say: if the original developer of a bit of code is ever hit by a bus, you are toast.
The example that Scott gave was a particular regular expression that I just have to share. To understand the context, read his blog.
private static Regex regex = new Regex(@"\<[\w-_.: ]*\>\<\!\[CDATA\[\]\]\>\</[\w-_.: ]*\>|\<[\w-_.: ]*\>\</[\w-_.: ]*\>|<[\w-_.: ]*/\>|\<[\w-_.: ]*[/]+\>|\<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""\>\</[\w-_.: ]*\>|<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""[\s]*/\>|\<[\w-_.: ]*[\s]xmlns[:\w]*=""[\w-/_.: ]*""\>\<\!\[CDATA\[\]\]\>\</[\w-_.: ]*\>",RegexOptions.Compiled);
I must admit to having developed code, in the (now distant) past, that had a similarly high bus factor. Nothing as terse as the above example, thank goodness, but something kinda close. On two occasions, actually. I look back and hope that I have learned, but I'm not certain that I have.
The trick here is that I do not know the developer who follows me. He or she will know some basic and common things. The problem lies deeper... It is where my expertise exceeds the ability of a maintenance developer to understand my code... that is where the break occurs.
So how do we avoid this? How does a good developer keep from creating code with a High Bus Factor?
It isn't documentation. I have been using regular expressions for decades (literally) and the above code is wildly complicated, even for me. No amount of documentation would make that chunk of code simple for me to read or maintain.
Pithy advice, like "use your tools wisely" won't help either. One could argue that regular expressions were not being appropriately used in this case, and in fact, the blog entry describes replacing it because it wasn't performing well when larger files were being scanned. That isn't the point.
I would state that any sufficiently powerful technique (whether regex, or the use of an advanced design pattern, or the use of SQL XML in some clever way, etc.) presents the risk of exceeding the ability of another developer to understand, and therefore maintain, it.
Where does the responsibility lie for ensuring that a dev team, brought in to maintain a bit of code, is able to understand it? Is it the responsibility of the development manager? The dev lead? The original developers? The architects or code quality gurus? The unit tests?
Is it incumbent upon the original dev team to make sure that their code does not have a High Bus Factor? If so, how?
I'm not certain. But it is an interesting issue.
We have an easy notion of the data dictionary: a description of the data at rest in an OLTP system. But what about the data in motion? That's where the Business Event Schema comes in.
More than a simple XML schema, a business event schema is a description that contains the following elements:
This is, IMHO, a key deliverable for any architect attempting to describe a business process and how the systems involved in that process can be integrated with one another in a real-time fashion. This event-driven integration goes hand-in-hand with service oriented architecture, in that the systems involved are loosely coupled, with explicit boundaries, using well defined data schemas, and at a coarse-grained level of interaction.
A team member and I found an interesting problem yesterday that I thought I'd share. We found the problem by luck, and the fix was weird. Perhaps there is an easier fix out there.
The problem manifested itself this way:
We needed to build our five different components into different MSI files (don't ask). Each of the five components refers to one or two "base class" assemblies that are included in each MSI. Previously, we had a single solution for each component that creates the assembly and then builds the MSI. Most of the assemblies end up in the GAC.
We were running into problems where we would end up accidentally installing two copies of a base class component into the GAC.
Our solution was to create a single solution file that builds all of the assemblies and builds all of the MSI files. This way, we could use project references and we'd only get one version of a dependent assembly in any MSI file.
The MSI for installing Assembly A is very similar to the MSI for installing Assembly B, because A and B are very similar. They both inherit from the same base objects. The problem was this: After creating the new solution file, and carefully checking every MSI, it appeared that we had it right: MSI-A would install Assembly A, while MSI-B would install Assembly B.
We saved the project and checked it into version control. Then we ran our build script. MSI-A would have Assembly A, and MSI-B would have Assembly A as well. Assembly B was not included in any MSI at all!
Opening the project back up showed that, sure enough, MSI-B was defined to use the project output from project A, even though we specifically told it to use B. Fixing the reference using Visual Studio didn't help. The moment we saved and reopened the solution, the MSI would once again show that it refers to the wrong Assembly.
When project B was created, the programmer made a copy of all of the files of project A, and put them into another directory. He changed the names a little and ran with it. It never occurred to him to open up the project file and change the Project GUID for the new project.
The Project GUID is a unique id for each project. It is stored in the project file, but the solution files and the install projects use it as well. Since we had two projects in the same solution that used the same GUID, VS would just pick the first project with that GUID when building the MSIs. As a result, we had two MSIs with Assembly A and none with Assembly B.
The fix we settled on was to open one of the two project files in Notepad and change the Project GUID. Then we went through every solution file that referenced that project file and changed the referencing GUID value. We had to be careful with the solution file that contained both projects, so that we left one project alone and changed only the other.
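A check like the following could catch the duplicate before it reaches a build. This is only a rough sketch, not the procedure we actually followed: the function name `find_duplicate_project_guids` and the file patterns are made up for illustration, and it assumes the GUID appears in the project file either as a `ProjectGuid = "{...}"` attribute or as a `<ProjectGuid>{...}</ProjectGuid>` element.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: the GUID is written either as an attribute
# (ProjectGuid = "{...}") or as an element (<ProjectGuid>{...}</ProjectGuid>).
GUID_PATTERN = re.compile(
    r'ProjectGuid\s*(?:=\s*"|>)\s*\{([0-9A-Fa-f-]{36})\}'
)


def find_duplicate_project_guids(root, patterns=("*.csproj", "*.vbproj")):
    """Return {guid: [project files]} for every GUID used by more than one file."""
    guid_to_files = defaultdict(list)
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            text = path.read_text(encoding="utf-8", errors="ignore")
            match = GUID_PATTERN.search(text)
            if match:
                # GUIDs are case-insensitive, so normalize before comparing.
                guid_to_files[match.group(1).upper()].append(path)
    return {
        guid: files
        for guid, files in guid_to_files.items()
        if len(files) > 1
    }
```

Calling `find_duplicate_project_guids(".")` from the top of the source tree returns an empty dictionary when every project has its own GUID. Anything else lists the project files that were probably created by copying one another.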
This worked. The effect was odd. I thought I'd post the problem and our solution in case anyone else makes the mistake of creating an entire project by copying everything from another project, and then putting them both in the same solution file.
Adding only management to a flawed software development process makes it worse.
I was having a discussion the other day about the reasons for using SOA. If the likelihood of defects in a system is logarithmically proportional to the complexity of the system, I noted, then SOA is useful because you can create a collaboration of interacting systems, where each system is as simple as possible, and some logic moves to the collaboration or orchestration between them.
To which my friend replied: so if a team has 10 members, and one is not functional, the rest of the team can adapt, but if a team has 10 members, but communication is screwed up, then the team itself is dysfunctional. That's worse. So, can SOA create dysfunctional collaborations? Can we create a "team" of systems that hate each other?
What if one system is best served by mistakes that show up in another? Can that system engage in passive-aggressive behavior with another system? What about codependency? Can two systems behave in a manner that is counterproductive to both, but makes both of them look effective from the outside?
Do our test plans need to start including common team dysfunctional behaviors as test scenarios?
I was reminded recently of the fact that long running transactions, especially those involving multiple databases, cannot be made to follow the ACID rules of database transactions. On its face, this is completely true. However, I'm thinking that there are mechanisms that could be used to allow the positive effects of ACID to remain, even when the actual implementation is not available in the automated manner we are used to.
As a refresher: A is atomicity (which means that the entire transaction has to occur or not occur... failure means to roll it back). C is consistency (if part of a transaction breaks a rule, then the entire transaction fails), I is isolation (two people performing actions on the data should not affect one another), and D is durability (committed transactions are not lost when power fails or other adverse events occur).
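For anyone who wants to see the first two letters in action, here is a small sketch using Python's built-in `sqlite3` module. The `accounts` table, the account names, and the `transfer` helper are all invented for the illustration. The CHECK constraint stands in for a business rule (consistency), and the rollback of the already-applied credit shows atomicity. Isolation and durability are not demonstrated.

```python
import sqlite3


def make_accounts_db():
    """Create an in-memory database with a rule: balances may not go negative."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates happen or neither."""
    try:
        # 'with conn' commits on success and rolls back on any exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit breaks the CHECK rule, the credit above is undone.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

In this sketch the credit is deliberately applied before the debit, so that a failed transfer has something to roll back. Inside a single database the engine does that for free, which is exactly what is lost once the two updates live in different systems.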
So if a long running transaction causes a change in Database D1, then is transmitted to a remote system where, the next day, it affects Database D2 (where it could fail), then we lose both Atomicity (because the transaction was committed to D1 even before it is known to be successful at D2) and Isolation (since a user could ask both databases for info in the meantime and get two different answers).
However, the positive effects of ACID come when viewed from the viewpoint of the user. The user is not a concept. He or she is real. They have a goal and a purpose for using the database. If you can present ACID-like interactions to them, then these flaws can be minimized.
In order to do this, I'd suggest that a "system of record" is kept separate from the systems interacting in the transaction. An interaction with the "system of record" would occur at the last step of the long running transaction. That interaction would only occur if all prior interactions were successful. All users who want the "correct" information would be encouraged to check there. This gives you a kind of atomicity, since a change would not occur in this system until all parts of the transaction are complete.
Similarly to Atomicity, Isolation can be met from this location as well, since queries to this system would not return different results depending on the status of various transactions, until those transactions completed and updated the system.
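To make the idea concrete, here is a rough sketch of it. Everything in it is hypothetical: `SystemOfRecord`, `run_long_transaction`, and the step functions are names invented for the example, and a real long running transaction would span days and message queues rather than a loop. It also does not attempt to undo work already committed in the participating systems.

```python
class SystemOfRecord:
    """The one place users are encouraged to read 'correct' data from."""

    def __init__(self):
        self._data = {}

    def read(self, key):
        return self._data.get(key)

    def publish(self, key, value):
        self._data[key] = value


def run_long_transaction(steps, record, key, value):
    """Run each step in order; touch the system of record only at the very end.

    Each step is a callable taking (key, value) and returning True on success.
    If any step fails, the system of record is left exactly as it was.
    """
    for step in steps:
        if not step(key, value):
            # Earlier steps may already have committed in their own
            # databases (D1, say); this sketch does not compensate for that.
            return False
    record.publish(key, value)
    return True
```

If the step that updates D1 succeeds and the step that updates D2 fails, D1 holds the new value but anyone reading from the system of record still sees the old one. They only see the new value once every step has succeeded.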
So while long running transactions don't meet the ACID test, systems that support and detect long running transactions can be set up to provide the benefits of ACID transactions fairly readily.