Welcome to The Metaverse

Navigating the service-oriented, identity aware metaverse

News

  • Disclaimer:
    The contents of this blog are my own personal opinions and do not necessarily represent Microsoft's position, commitments or strategy. In addition, my thoughts and opinions often change, and as a weblog is intended to provide a semi-permanent point-in-time snapshot, you should not consider out-of-date posts to reflect my current thoughts and opinions.




Response to feedback on my "Developing Distributed Services Today" whitepaper

There's an interesting discussion thread ensuing over on The Server Side about my "Developing Distributed Services Today" whitepaper and I thought I'd take a little time to discuss some of the comments made in that thread.

First, let me just provide clarity on the purpose of the paper.

Ever since PDC 2003, where Joe Long first stood up and presented our early guidance on aligning with Indigo, we have been inundated with requests for clear, published guidance on how and when to most appropriately use Microsoft's current distributed systems technologies in order to minimize the amount of code that is impacted by the introduction of Indigo. Customers were not asking how they should use individual features of, for example, COM+; they were asking "should we be using COM+ at all?", "what about .NET Remoting?", and so on. Despite the fact that I and many others have been providing much of this guidance on our blogs, in newsgroups, at conferences and during briefings with many, many customers here in Redmond and at customer sites all over the world, the message still wasn't getting out there. Inside Microsoft too, other product groups have been asking the same questions.

In order to deal with the sheer volume of requests, I began writing my blog shortly after PDC 2003, summarizing our guidance in order to get the word out as broadly as possible. I then began writing the whitepaper, giving very, very detailed guidance on what these service things are anyway, how we thought of services and the benefits they'd provide, and then presenting in detail the guidance you see summarized in this paper. However, it took over a year, 5 complete rewrites and cutting the whitepaper to 1/3 of its original size (and depth) to get it through internal review before it was published in the form that you see. Personally, I'd rather have given you all the detailed version, but we are where we are.

The paper essentially summarizes a concise set of guidance, helping you make decisions that are right for you while deciding which of Microsoft's distributed systems technologies to use and where.

Now, on to some of the specific comments raised in the thread I mentioned above:

System.Transactions
System.Transactions is a brand-new technology which provides a very powerful promotable transaction capability for the first time on any platform. System.Transactions neither depends upon nor requires DTC or COM+ from an architectural perspective, but does promote management of a transaction to the DTC when a transaction *must* be promoted from an in-proc local transaction or a machine-wide transaction to span resources beyond the machine boundary. It is this notion of seamless promotion that is (for me at least) most exciting about System.Transactions. System.Transactions is a feature of .NET Framework 2.0 and is used by both Indigo and COM+ for local, machine-wide and distributed transactions on the Microsoft platform. What Indigo adds to this picture is the ability to engage in WS-AtomicTransaction for cross-platform and (potentially) cross-organizational transaction negotiations.
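To make the idea of promotion concrete, here is a toy model in Python. It is only a sketch of the concept described above and is not the System.Transactions API: the `Scope` levels, the `PromotableTransaction` class and its `enlist` method are all hypothetical names invented for illustration. The point it shows is that the transaction starts out coordinated in-process and escalates only when a resource outside its current scope is enlisted.

```python
from enum import IntEnum


class Scope(IntEnum):
    """Where coordination of the transaction currently lives (hypothetical levels)."""
    IN_PROCESS = 1
    MACHINE = 2
    DISTRIBUTED = 3


class PromotableTransaction:
    """Toy transaction that escalates its coordinator only when it must."""

    def __init__(self):
        self.scope = Scope.IN_PROCESS
        self.resources = []
        self.promotions = []  # record of each escalation, for inspection

    def enlist(self, name, required_scope):
        """Enlist a resource; promote only if it lies beyond the current scope."""
        if required_scope > self.scope:
            self.promotions.append((self.scope, required_scope))
            self.scope = required_scope
        self.resources.append(name)


tx = PromotableTransaction()
tx.enlist("in-memory cache", Scope.IN_PROCESS)   # no promotion needed
tx.enlist("local queue", Scope.MACHINE)          # promoted to machine-wide
tx.enlist("remote database", Scope.DISTRIBUTED)  # promoted to distributed
print(tx.scope.name, len(tx.promotions))
```

A transaction that only ever touches in-process resources never pays for a heavier coordinator, which is the appeal of the promotable design.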

Distributed Transactions
The nature of distributed transactions is broad, complex and varied. In the whitepaper, I defined transactions in this way:

A transaction in this context is a two-phase commit transaction often associated with databases such as SQL Server or other transacted resources including MSMQ. The purpose of transactions is to ensure that two or more changes applied across two or more transacted resources are completed atomically, or if there is a failure or problem detected during the transaction, that all changes are reversed so that no partially updated data is left in the system.

On the thread, Peter Lin & StarTrooper (are you guys reading this?) seemed to be rather upset by my definition of a transaction. Here's what they have to say:

I don't like this definition of transactions from the article. The purpose of transactions in a distributed environment isn't just a simple commit/rollback or atomicity. In real world distributed transactions, the systems have different transaction thresholds and states. The point of distributed transactions is to provide a normalized way of handling difference in state and manage different transaction thresholds.

I'm sorry if you guys don't like my definition of a transaction, but if you'd be so kind as to go and read any of the books on transaction processing, like Jim Gray's classic "Transaction Processing: Concepts and Techniques" or Philip Bernstein & Eric Newcomer's "Principles of Transaction Processing for the Systems Professional", you'll find very clear and explicit definitions of the terms and concepts. Yes, there are other definitions and other transaction types and concepts that exist (compensating transactions, queued transactions, etc.). Yes, transactions can be nested. Yes, you can override default isolation levels to let parts of your app read partially committed or uncommitted data if you really want to. Yes, you can build a system from compensating transactions. Yes, you can build transactional queued apps. Yes, the term "transaction" has been horribly overloaded and misused - how many people have seen the term transaction being used where operation or action would have been more appropriate?

HOWEVER, by and large, when we talk about transactions and distributed transactions we are really talking about keeping data, state and information consistent. Consistent Data == Two Phase Commit transactions == Distributed Transactions == Local Transactions == Machine-wide Transactions. And before you flame me again: == is (= x 2), which means "equivalent to" or "related to"!

And that's what I stated clearly in my paper - I specifically defined the type of transactions I was discussing and the fact that transactions are most often associated with databases or data stores such as SQL Server. I did this because I was trying to convey that many developers use transactions unnecessarily and that they incur locks where no locks are needed, significantly harming their systems' scalability and often introducing deadlocks and livelocks where none should exist. Finally on this subject, this paper was not an exhaustive treatise on transactions - it deals with some important scenarios and some general guidance that applies to many (if not most), but not all, possible scenarios out there in the wild. In the context in which I presented the discussion, I stand firmly behind my words.

Posted: Friday, July 29, 2005 10:30 AM by RichTurner666

Comments

StarTrooper said:

You say:

"System.Transactions is a brand-new technology which provides a very powerful promotable transaction capability for the first time on any platform. System.Transactions neither depends upon nor requires DTC or COM+ from an architectural perspective, but does promote management of a transaction to the DTC when a transaction *must* be promoted from an in-proc local transaction or a machine-wide transaction to span resources beyond the machine boundary. It is this notion of seamless promotion that is (for me at least) most exciting about System.Transactions. System.Transactions is a feature of .NET Framework 2.0 and is used by both Indigo and COM+ for local, machine-wide and distributed transactions on the Microsoft platform. What Indigo adds to this picture is the ability to engage in WS-AtomicTransaction for cross-platform and (potentially) cross-organizational transaction negotiations."

Interesting wording but that concept was already implemented by Forte (1997), a distributed computing development environment now owned by SUN that proved very popular years ago...till the SUN came ;-)

Using that tool, you had the concept of transactional objects and distributable transactional objects running transactions across several nodes (servers), platforms (hardware) and OSes. Which is basically what you say about being able to run from within your transaction manager (whatever it may be).

So I guess the "brand new technology tag" is ok... as long as it is used for the MS camp. It's been around for quite a while already in other camps though.

As for your transaction definition comments, I'll answer them after finishing my big supersized beef-quarter delight McDonald's. ;-)
# July 31, 2005 11:29 PM

peter lin said:

I honestly haven't read through Jim Gray's book from cover to cover. I've only read the parts of it referenced in journals and other articles. I agree with you that the textbook definition is what you stated, but I dislike textbook definitions. Real world development hardly ever fits neatly into textbook definitions.

I'm definitely not an expert or remotely close. Most of my experience with distributed transactions has been in integration environments, so I can only state what I know first hand. In most cases, consistent data in terms of the database is assumed. Given that most databases like SQL Server, Oracle, DB2 and Sybase all provide ACID support, that should not be an issue.

Take Microsoft's MSN for example. Say I call MSN's system to delete an account. That means my system is using Passport 3.0 for authentication, which then redirects me to the actual subscription server. I send a delete account call, which means in MSN's system the account is marked for deletion. In reality, the account stays in the system and isn't deleted for a while. It is up to the discretion of MSN to decide when that account is purged from the database.

Say a customer signs up with one MSN partner. Some time later, the customer moves to another city, so they cancel the account with the ISP, which sends a delete account call to MSN's system. A month later the customer signs up with a new ISP and wants to use the same MSN account. If the customer created the account directly through MSN, they are OK. On the other hand, if the customer account was created with an ISP-specific brand, should the customer be allowed to re-use the same username?

The data in the database is consistent. But the customer will be rather angry at MSN and the ISP if they can't re-use the previous username. Textbook definitions are fine for marketing material, but they don't teach you how to really build distributed systems.

My apologies if my criticism was rude or overly harsh, but the real world is far more complex than textbook definitions.

peter
# August 1, 2005 11:28 AM

RichTurner666 said:

Thanks for replying guys.

StarTrooper: Sure, Forte was an interesting system, particularly at the time Borland bought them, prior to selling it to Sun. I am pretty sure (but correct me if I am wrong) that you didn't have the ability to have several objects inside a single project that coordinated ACID transactions between one another via an in-proc transaction manager ... which, when necessary, automatically promoted the transaction to a machine-wide transaction manager ... which, when necessary, automatically promoted the transaction to be coordinated by a distributed transaction manager such as DTC.

Oh, and you'd better watch those Supersized McMeals - you *know* it'll catch up with you in the end! ;)

Peter: What you describe in this scenario is a very real system, yes, but not a transactional system. Had the "close account" and "delete hotmail account" been bound in a transaction, then both changes would have been submitted atomically, or both rolled back. Now, having said that, it's up to each transacted resource to actually perform the deletion and each will usually have an SLA for completing such a task. If we're talking about a system that supports real 2-phase commit and "true" transactions, that system will have to perform the actual account deletion immediately, or else break data integrity rules by permitting an account to remain in existence whilst its state is "deleted".

There are many, many reasons why systems defer updates to a point in time, which I am sure we're all familiar with. It's also true that many such systems break data integrity rules. However, this is a failing of the design/implementation of the system, not of "transactions" per se. I can show you many instances of huge-scale systems that correctly execute transactions, observe data integrity rules and maintain good, solid, reliable, performant levels of operation. It's all down to how you design and build large-scale or distributed systems - if you design for correctness, your system will have a much higher chance of success than if you don't!
# August 1, 2005 2:10 PM

About Forte... said:

"StarTrooper: Sure, Forte was an interesting system, particularly at the time Borland bought them, prior to selling it to Sun..."

I think we are talking about different products... which Forte are you talking about?
As for the one I am talking about... I was able to promote transactions across server/platform/OS boundaries... and all of it was handled by a unique environment manager.
# August 1, 2005 7:42 PM

Peter Lin said:

Thanks for responding. It's clear my explanation was poor and rather obtuse, so I'll attempt to explain it better.

<i>Peter: What you describe in this scenario is a very real system, yes, but not a transactional system.</i>

I would agree with you that my integration scenario is beyond database transactions or 2-phase database transactions. Having said that, very few real world cases I've worked on first hand fit the simple 2-phase commit definition. Say I work on an internet service. The business only provides the physical connection and everything else is co-branded through partners. My business partners have direct customers and co-branded customers. Let's say the signup process goes like this:

1. a customer picks a username
2. the system must check that the username is available in all partner systems
3. if the username is available, the system should reserve the username
4. each system is independent and the business processes for activation, deletion, suspension and unsuspension are different
5. the cancellation of the internet access results in a termination call to all partner systems.

<i>Had the "close account" and "delete hotmail account" been bound in a transaction, then both changes would have been submitted atomically, or both rolled back.</i>

Using the case I just described, my system makes a call to MSN, which changes the account from active to terminated. Say my system deletes the account after 3 months for business reasons. Let's say MSN's system doesn't actually allow me to delete an account and only allows me to change the account's status to terminated. The account might actually stay around indefinitely for legal reasons. By legal reasons, I mean government regulations may require portal providers to retain the emails in the event the data is needed for an active investigation.

<i>Now, having said that, it's up to each transacted resource to actually perform the deletion and each will usually have an SLA for completing such a task. If we're talking about a system that supports real 2-phase commit and "true" transactions, that system will have to perform the actual account deletion immediately else break data integrity rules by permitting an account to remain in existence, whilst its state would be "deleted".
</i>

Now, let's say MSN agrees to send a delete confirmation to my system when deletion is complete. Since the elapsed time between my original "terminate account" transaction and the confirmation may be months or a year, my system needs to retain the state information. If my business only worked with MSN, the scenario would be done at this point. Since my system integrates with other systems, the second part of the 2-phase transaction becomes rather difficult.

To make it more realistic, say a customer cancels the account and signs up with a competitor. After 2 weeks, the customer returns and wants to use the same username. I have a new promotion with a different portal and the user chooses it. From the perspective of my system, I ignore MSN altogether. This makes sense, since the customer doesn't want MSN, so the account should still be deleted. When my system gets a delete confirmation, I have several choices.

1. simply respond "success" even though the account is actually active with a different portal
2. track the state of MSN and the new portal. This creates a huge overhead, so I don't like this option
3. ignore the confirmation, and hope it doesn't cause MSN to throw an exception

On the other hand, if the customer decided to go with MSN, I could send a rollback to MSN, and change the state of the account from terminated to active.

<i>
There are many, many reasons why systems defer updates to a point in time, which I am sure we're all familiar with. It's also true that many such systems break data integrity rules.</i>

Real-life systems break academic rules all the time. I can't think of a single system that I have worked on or know of which fits the simple form of 2-phase commit.

<i>However, this is a failing of the design/implementation of the system, not of "transactions" per se.</i>

I don't consider this a failing of design. No one is clairvoyant, so these imperfections and shortcomings are a fact of life.

<i>I can show you many instances of huge-scale systems that correctly execute transactions, observe data integrity rules and maintain good, solid, reliable, performant levels of operation.</i>

I know of plenty of systems that do handle 1 and 2 phase transactions correctly within internal systems. Once a business starts integrating with other businesses and systems, things get complex very fast.

<i>It's all down to how you design and build large-scale or distributed systems - if you design for correctness, your system will have a much higher chance of success than if you don't!</i>

Frankly, that's not possible. If anything, large scale global systems deal with dozens of complex integrations. That's why it costs so much to propagate a transaction throughout the internal systems and out to all partners.

Really, only small systems can stay true to the textbook definition of 2-phase commit. Probably the simplest example is the classic travel booking scenario. A travel agent reserves a flight for the customer. If the customer likes the travel plan and decides to take it, the agent sends the second part of the transaction. If not, the agent sends a cancel. In practice, things don't work that cleanly. If 2-phase commit worked like textbook definitions, flights would never be overbooked.

peter lin
# August 1, 2005 9:38 PM

StarTrooper said:

I think the point Peter is trying to make is that transactions are more than just 2PC stuff, but unfortunately the term is associated by the vast majority with database transactions. The real world says that transactions and their context relate directly to the domain they are being executed in and therefore relate closely to the application handling the whole thing.

My personal view is that the definition of transaction you have used, "rollback/commit", is just a leftover from the old days when everything was DB driven and stateless. In a real distributed scenario, you should be able to consistently keep and synchronize DATA and the STATE of the objects holding that data across several boundaries, not just platform related but also application and domain related. The interesting thing here is that we have added one additional level to the whole transaction thing: the control of the data and its state (which may be in memory or across several servers) + the persistence mechanism used for storing that data (and maybe its state). If I remember well, that was the limitation of having transactions spanning several COM components, where they were stateless and didn't have any retry/failover capabilities or control over the whole thing.

Sorry for my messy wording, but English is not my mother tongue.
# August 2, 2005 12:50 AM

RichTurner666 said:

Hey Peter - thanks for the reply. The scenario you're describing above is not a single transaction. You have several transactions bound within the scope of a larger-grained business process which has to take various remedial actions upon a failure - forms of "compensation" if you will.

It is infeasible to assume that a transaction "must always" be applicable over every business operation. Transactions are mechanisms that help one ensure the integrity or "correctness" of the state of any set of data - regardless of where it's stored - in-memory, on disk, on tape, on punched card, etc. The guarantee that a transaction provides is based upon the notions of ACID and ensures that if you add/delete/change several things within several stores, then all of those changes are applied atomically, or all are rolled back atomically. That's all it guarantees. Transactions require locks in order to prevent multiple simultaneous updates to records within a scope that is already being changed, so that data is not overwritten or incorrectly reversed. You don't want these locks in place for long periods of time, so we try (through careful design) to keep transactions as short-lived as possible. Can/should transactions be used for all data modification scenarios? Heck no - the world would come to a stand-still if we even thought about trying to.

The mechanism that each transacted resource uses in order to abide by the commitments it makes is up to the resource manager. Some use logs - some are linear, some are circular. Some are persisted, some are not. Each has strengths and weaknesses.

However, the notions of a two-phase-commit transaction remain consistent: Phase 1 - "Can everyone commit this?" ... if so, then phase 2 - "Do it!". Yes, there are places this breaks down, but in general, this is an essential mechanism which has stood the test of time ever since it was first introduced in the 1970s, based upon Jim Gray's (Turing Award winning) research on Transaction Processing.
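The two phases can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how DTC or any real transaction manager is implemented: the `Participant` class and the `two_phase_commit` function are hypothetical, and the sketch ignores logging, timeouts and coordinator failure.

```python
class Participant:
    """Hypothetical transacted resource that can vote, commit, or roll back."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote. A real resource would also record enough here
        # to honour a later commit or rollback.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1: "Can everyone commit this?"
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2: "Do it!"
        for p in participants:
            p.commit()
        return True
    # At least one participant refused, so nobody keeps their changes.
    for p in participants:
        p.rollback()
    return False


database, queue = Participant("database"), Participant("queue")
print(two_phase_commit([database, queue]))  # True
```

If any participant is constructed with `can_commit=False`, the function returns `False` and every participant ends in the "rolled back" state, which is the all-or-nothing guarantee discussed above.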

Regardless of your beliefs about, or experience of, the validity of 2PC transactions, in the context of the whitepaper the definition of a transaction remains the same, and the applicability of transactions as a tool to help ensure the integrity of data is inescapable.
# August 2, 2005 5:27 PM

RichTurner666 said:

Hey StarTrooper - thanks for taking the time to reply.

As I mentioned in my earlier reply to Peter, the definition of a transaction in the context of this paper was accurate - we're talking about two-phase commit transactions in a distributed landscape.

And for the record, you can build very powerful and complex systems from COM+ components that support transactions and maintain the correct transactional semantics. It takes care, but it can be done. If a component (or its hosting machine) fails and doesn't respond during the 2PC commit process, then all other changes will be removed. If the component had begun making changes in other transacted resources, those changes will not yet be committed and the transaction will (should) time out and the part-changed records be removed from the resource. In this way, we still have consistent data.

But as I pointed out above, 2PC transactions aren't always applicable. Often, other forms of compensation are required.

Take, for example, what happens when you buy a car. If you class the transaction as running from the moment you sign the purchasing papers to the instant your car is delivered, then what happens if you cancel your order while your car is getting painted in the production factory? You can't disassemble the car back into its components! In this case, the manufacturer would probably shuffle orders, using your part-completed vehicle for another customer's order or just sending it off as unclaimed stock. Either way, that's not a scenario where 2PC transactions can help you - that's where compensation comes into play.
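Compensation can be sketched in Python as a list of steps, each paired with an action that undoes its business effect. This is an illustrative sketch only: `run_with_compensation` and the step names are hypothetical, and a real system would also have to persist its progress and cope with compensations that themselves fail.

```python
def run_with_compensation(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, run the compensations of the steps that already
    completed, in reverse order, then re-raise the error.
    """
    completed = []
    try:
        for action, compensation in steps:
            action()
            completed.append(compensation)
    except Exception:
        for compensation in reversed(completed):
            compensation()
        raise


log = []


def paint_car():
    # Hypothetical failure point: the order is cancelled mid-production.
    raise RuntimeError("customer cancelled during painting")


steps = [
    (lambda: log.append("reserve production slot"),
     lambda: log.append("release production slot")),
    (lambda: log.append("assemble car"),
     lambda: log.append("reassign car to another order")),
    (paint_car, lambda: None),
]

try:
    run_with_compensation(steps)
except RuntimeError:
    pass

print(log)
```

Unlike a rollback, nothing here is physically undone: the car stays assembled and is simply reassigned, which is why this is compensation rather than a 2PC abort.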

But that's not what we defined, and not what we should be arguing about in this case.

The point of this discussion was whether I defined 2PC transactions - distributed or otherwise - incorrectly. In the context of what I wrote in my paper, I did not - 2PC transactions are 2PC transactions, regardless of whether you like them or not!
# August 2, 2005 5:36 PM

Mitch L. said:

Thanks for your white paper. To me it answers the questions I had regarding best technology choice to minimize Indigo migration costs.

I would venture to guess you were not trying to ascertain the definition of transactions; it does not seem to be core to the subject you are trying to address. Please correct me if I'm wrong, but the question you are trying to answer is what set of technologies - providing that you have a choice and are not committed to one already - should you use today in order to minimize the costs of migrating to Indigo?

I guess it’s kind of hard to stay on point with us technical guys sometimes :)

I attended the Tampa Indigo Road Show you and Ami conducted and it was great. Especially the code-first, show-slides-later approach.

By the way, I personally liked the name Indigo better than Windows Communication Foundation, but I guess you eventually had to let the marketing guys come in and have their fun :)

# August 12, 2005 12:26 PM

Anelia said:

Good blog
# September 1, 2005 4:22 PM