Inside Architecture

Notes on Enterprise Architecture, Business Alignment, Interesting Trends, and anything else that interests me this week...

November, 2004

Posts
  • Storing configuration settings for your DLL to use

    • 5 Comments

    One common complaint about the .NET Framework is that there is only one config file for the application, even if there are many assemblies (EXEs and DLLs).  This post contains advice for how the author of a DLL can keep the configuration settings for that DLL separate.

    The config file gets its name from the .EXE, not the DLLs.  Therefore, if your U/I is the EXE (which it usually is), then your DLLs will be getting their settings from the config file for the EXE.

    It is often the case, however, that the DLL provides services that can be configured, and that you would want to configure those services differently for each application that uses the DLL.

    On the other hand, if you want to provide a set of "common services" using your object layer, then you can create an XML file that the DLLs will use. So, how does the DLL find its XML file?

    I've answered this question many different ways in different apps, trying things out to see what works.  I've got three answers:

    1. Put the name of the XML file into the registry during install.  The DLL looks to the registry to get the config file name.  This is useful if you are writing a DLL that needs to run from BizTalk or SharePoint, since you cannot control either their executable or the directory in which they are installed.
    2. Give the XML file a fixed name, hardcoded into the DLL itself.  Have the DLL look into the application directory (where the EXE lives) to find the XML file.
      • Variation: in the app.config for the EXE, provide a separate section for the DLL to use.  That section will contain the name of the XML config file.  If no name is given, use the hardcoded name.  If neither is found, use basic default settings or raise an error in the constructor of your classes.  (A sketch of this lookup appears after this list.)
    3. During install of your app, create a specific directory where nothing but the XML config file will live.  When it comes time to go looking for the file, take the first XML file that you find that lives in that directory.
      This is convenient when transferring your app from dev to test to production, because you can have three files: dev.cml, test.cml, and prod.cml (I renamed xml to cml on purpose). 
      When you install the app, all three are placed in the directory.  The next step in the install is to ask the person doing the install "what environment is this" and, using their response, rename the proper file to the "xml" extension.
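
    For option 2 and its variation, a minimal C# sketch of the lookup might look like the code below.  The appSettings key, the file name, and the class name are all invented for illustration; none of them come from the framework or from a specific product.

    using System;
    using System.Configuration;
    using System.IO;

    public class DllSettingsLocator
    {
        // Hardcoded fallback name (option 2); purely illustrative.
        private const string DefaultFileName = "MyLibrary.settings.xml";

        public static string FindConfigFile()
        {
            // Variation on option 2: the EXE's app.config can override the name with
            // a hypothetical appSettings key such as:
            //   <add key="MyLibrary.ConfigFile" value="MyLibrary.prod.xml" />
            string fileName = ConfigurationSettings.AppSettings["MyLibrary.ConfigFile"];
            if (fileName == null || fileName.Length == 0)
                fileName = DefaultFileName;

            // Look in the application directory, where the EXE lives.
            string path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, fileName);
            if (!File.Exists(path))
                throw new FileNotFoundException("DLL configuration file not found", path);

            return path;
        }
    }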

    In all three cases, loading an XML file is not as difficult as it first appears.

    Take a look at the XSD.EXE tool that is delivered with the .NET SDK (a free download from Microsoft).  Using the XSD tool, you can point it at an XML file and the tool will generate a class that the XML will deserialize into. 

    Using this class, and the XML deserialization methods built into the framework, you can very easily load the entire XML file into an object that contains other objects.  Now, you can use that object "tree" to inspect your settings very easily. 

    In fact, if you change the values in the settings, it is very easy to save those changes back to the XML config file by simply using the serialization functions (something that is a bit more difficult with the config files).
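
    As a quick sketch, loading and saving such a file with XmlSerializer takes only a few lines.  The LibrarySettings class below stands in for whatever class xsd.exe generates from your own XML; its property names are invented for illustration.

    // Two steps with xsd.exe to get the class:
    //   xsd settings.xml              -> infers settings.xsd from a sample file
    //   xsd settings.xsd /classes     -> generates a serializable class
    using System.IO;
    using System.Xml.Serialization;

    // Stand-in for the class xsd.exe generates; your real class mirrors your XML.
    public class LibrarySettings
    {
        public string ConnectionString;
        public int TimeoutSeconds;
    }

    public class SettingsLoader
    {
        public static LibrarySettings Load(string path)
        {
            XmlSerializer serializer = new XmlSerializer(typeof(LibrarySettings));
            using (FileStream stream = File.OpenRead(path))
            {
                // The whole file becomes an object "tree" you can walk like any other.
                return (LibrarySettings)serializer.Deserialize(stream);
            }
        }

        public static void Save(LibrarySettings settings, string path)
        {
            XmlSerializer serializer = new XmlSerializer(typeof(LibrarySettings));
            using (FileStream stream = File.Create(path))
            {
                // Saving changed settings back is just the reverse operation.
                serializer.Serialize(stream, settings);
            }
        }
    }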

    I hope this provides useful information.

  • SOA and BLOBs -- using SOA principles for block-oriented data transfer (Updated)

    • 4 Comments

    Abstract: What happens when a business transaction, in Service Oriented Architecture, is too big to fit into a single SOAP transaction?  This (updated) article describes a problem of this nature and the solution that allows block-oriented data transfer to work in an SOA-based application.

    Introduction

    Some things should simply not be done. 

    If I have a large batch file, with ten thousand records in it, and I want to transfer it from point A to point B, using an SOA model to transfer each record, one at a time, is really dumb.  The folks who believe that "all things must be done in XML" will not gain any points with me on this.

    On the other hand, sometimes a single record (a single business transaction) is big... big enough to consider block-oriented data transfer.  This article is about one such situation and how I am proposing to address it.

    The big business transaction

    I deal with documents.  Some of them are in "source" format (Word documents, PowerPoint presentations, InfoPath forms, even PDF documents), while others are simply scanned images of a document (mostly in TIFF).  These documents, the metadata that describes them, and the relationships that bind them, can form the basis for a "set of documents" that the business can understand.  A good example would be the papers you have to sign when you buy a house.  There are deeds and warranty trusts and loan papers and all kinds of stuff.  I don't know what they all are, but I do remember that it took hours for my wife and me to sign them all.

    Together, these documents make a package.

    And now for the problem: we want someone to be able to submit all or part of a package of documents, from one party to another, over the web. 

    Doesn't sound so hard, does it?  Surely, we aren't the only folks to deal with something like this, but I haven't seen many examples of how this is done in other XML-based solutions.  Not even legal e-filing, where this seems a natural requirement.  Perhaps I just missed it. 

    A "business document package" contains header information and many documents.  The list of documents changes over time.  In other words, I can create a set with four documents, add a fifth, then replace the third.  Each document can be a TIFF or another large-format document file (too big to fit in a SOAP message on HTTP). 

    The SOA Mismatch

    Service oriented architectures usually present the notion of a "business document" or "business transaction."  For the sake of clarity, I will use "business transaction" since my transactions themselves contain binary objects that just happen to contain documents... it would be too confusing to describe any other way.

    So we have a business transaction.  This can be implemented in many ways.  SOA says that a document is self-contained and self-defining.  Therefore, the document set must be self-contained and self-defining.

    Normally, in the SOA world, if a business transaction is updated, we could simply replace the entire transaction with entirely new values.  So, if a transaction is an invoice, we would find the existing invoice header, delete all the rows associated with it, replace the values in the header, and add in the rows from the document.  All this is done as "data on the inside." 

    The problem is that the entire contents of the business transaction are huge.  Our self-contained transaction contains the header information and all of the scanned documents.  If each document is 2 MB, and we have 14 of them, then a 28 MB SOAP message starts to seriously stretch the capabilities of the protocol.  It is literally too big to fit into a SOAP message without serious risk of HTTP timeouts. 

    So, we need the concept of an "incomplete" sub-transaction... and that's where the solution lies. 

    (Note from Nick: we decided to go a different direction; I've added details at the end of this posting.)

    The SOA solution

    In our interaction, we have two computers: the sending side, where the transaction originates, and the receiving side, which needs to end up with all the data.  Both sides are computer applications with database support underneath. 

    The new transaction is created by the sending side.  It will send the document header and enough information for the receiver to know which documents survive into the final form.  Any existing documents that changed will be deleted from the receiving side.  All documents that don't exist on the receiving side, when this process is done, are represented as "incomplete" records in the receiving end's database, along with some size data.

    Now, the sending side asks the receiving side for the id of a document that is marked as "incomplete".  The receiving side responds with a message stating that "SubDocument 14332 in document set AB44F is incomplete.  We have block 9 of 12".

    The sending side will then go to the database and extract enough data to send just one block... in this case block 10.  That could be simply 100K in size.  Wrap that up in a SOAP message and send it.  The receiving side will get the message, which contains a complete header document and the contents of this block.  The interaction is done, and will start over with the sending side asking for the id of a document that is marked as incomplete.
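
    To make the conversation concrete, here is a sketch of the sending side in C#.  The proxy interface, the method names, and the field names are hypothetical stand-ins for the real web service plumbing; they are not from any actual implementation.

    using System;

    public interface IReceiver
    {
        // Returns null when every document in the set is complete.
        IncompleteInfo GetIncompleteDocument(Guid documentSetId);
        void SendBlock(Guid documentSetId, Guid documentId, int blockNumber, string base64Payload);
    }

    public class IncompleteInfo
    {
        public Guid DocumentId;
        public int BlocksReceived;   // "we have block 9 of 12"  ->  9
        public int TotalBlocks;      //                              12
    }

    public class BlockSender
    {
        private const int BlockSize = 50000;   // bytes per block, per the example above

        // readBlock stands in for whatever database access the sender really uses.
        public delegate byte[] ReadBlockDelegate(Guid documentId, int blocksAlreadySent, int blockSize);

        public void SendDocumentSet(IReceiver receiver, Guid documentSetId, ReadBlockDelegate readBlock)
        {
            while (true)
            {
                // "Which document is incomplete, and how many blocks do you already have?"
                IncompleteInfo info = receiver.GetIncompleteDocument(documentSetId);
                if (info == null)
                    break;   // everything is complete; the loop terminates

                // Pull just the next block from our side and Base64-encode it.
                byte[] block = readBlock(info.DocumentId, info.BlocksReceived, BlockSize);
                string payload = Convert.ToBase64String(block);

                // One header plus one block per SOAP message keeps each call small.
                receiver.SendBlock(documentSetId, info.DocumentId, info.BlocksReceived + 1, payload);
            }
        }
    }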

    The conversation

    So, it looks like this:

    Sender sends:

    <MyDocumentSet id="849C751C-FF5C-4438-A3F0-055B9EE786E3" >
       <Metadata Filer="Nick Malik" CaseNumber="ABC123" ---other stuff --- />
       <Contents>
          <Document id="EBDE445D-5C26-43da-A142-E12A350EC1B6" name="MyDocument1.pdf" --- other header info --- />
          <Document id="9E4F8C83-B2D1-4aee-8C53-B235D026CD1E" name="Document2.doc" --- other header info --- />
          <Document id="05B10DAA-2A01-406b-AAB0-6BAEEF98F7A8" name="MyDocument3.ppt" --- other header info --- />
          <Document id="7135612A-CE48-4371-ABFC-F8EF70DF76CF" name="MyDocument4.pdf" --- other header info --- />
       </Contents>
    </MyDocumentSet>

    The receiver gets the message and checks to see if that document set already exists.  If it does not, it simply creates the document set on the receiver side with four incomplete documents.  A much more interesting case occurs if the document set already exists on the receiver side... so let's look at that.

    The receiver looks up document set 849C751C-FF5C-4438-A3F0-055B9EE786E3 and sees that it currently contains five documents.  The first three documents in the existing document set are named in the list above.  The fourth document above doesn't exist in the existing document set, so it is an addition.  The other two documents in the destination document set must be deletions.

    So we delete the two extra documents on the receiver side and add a document for MyDocument4.pdf, and flag it as incomplete.
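
    A sketch of that receiver-side bookkeeping might look like this.  The interface and method names are mine, invented for illustration; detecting documents that changed (rather than simply disappeared) is left out for brevity.

    using System;

    public interface IDocumentTable
    {
        void Delete(Guid documentId);          // remove a document from the receiver's set
        void AddIncomplete(Guid documentId);   // placeholder row, flagged incomplete
    }

    public class DocumentSetReconciler
    {
        // existingIds: documents the receiver already holds for this document set.
        // incomingIds: documents named in the header message just received.
        public void Reconcile(Guid[] existingIds, Guid[] incomingIds, IDocumentTable table)
        {
            // Anything the sender no longer lists is a deletion.
            foreach (Guid existingId in existingIds)
            {
                if (Array.IndexOf(incomingIds, existingId) < 0)
                    table.Delete(existingId);
            }

            // Anything listed that we do not yet hold becomes an incomplete record.
            foreach (Guid incomingId in incomingIds)
            {
                if (Array.IndexOf(existingIds, incomingId) < 0)
                    table.AddIncomplete(incomingId);
            }
        }
    }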

    Now, the sender asks the receiver for the id of any incomplete documents.  The receiver replies with the id of the fourth row above: 7135612A-CE48-4371-ABFC-F8EF70DF76CF and the fact that no blocks of data have been successfully stored.

    The sender side gets this response and decides to send block one of that document.  It goes to the database, gets the first 50,000 bytes of data, encodes it with Base64 encoding, and sends it back to the receiver as the following:

    <MyDocumentSet id="849C751C-FF5C-4438-A3F0-055B9EE786E3" >
       <Metadata Filer="Nick Malik" CaseNumber="ABC123" ---other stuff --- />
       <DocumentBlock id="7135612A-CE48-4371-ABFC-F8EF70DF76CF" name="MyDocument4.pdf" --- other header info --- >
          <Block totalblocks="12" thisblock="1" size="50000">
    FZGl0OzI0NTk2MDs+Pjs+Ozs+O3...a really long string of base64 characters ... 
          </Block>
       </DocumentBlock>
    </MyDocumentSet>

    The receiver now appends this data to the current document on the receiving end.  Note that the receiver "knows" that, even though this message is complete, the document is not complete, because this is block 1 of 12 (see the <Block> tag above).
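
    On the receiving side, handling one block is little more than a Base64 decode and an append.  A sketch follows, with invented storage and status types; the real implementation would write to the database rather than a working file.

    using System;
    using System.IO;

    public interface IDocumentStatus
    {
        void MarkComplete();                                      // last block has arrived
        void RecordProgress(int blocksReceived, int totalBlocks); // still incomplete
    }

    public class BlockReceiver
    {
        public void ReceiveBlock(string workingFilePath, string base64Payload,
                                 int thisBlock, int totalBlocks, IDocumentStatus status)
        {
            // Decode the payload and append it to whatever has arrived so far.
            byte[] data = Convert.FromBase64String(base64Payload);
            using (FileStream stream = new FileStream(workingFilePath, FileMode.Append))
            {
                stream.Write(data, 0, data.Length);
            }

            if (thisBlock == totalBlocks)
                status.MarkComplete();                          // the document is now whole
            else
                status.RecordProgress(thisBlock, totalBlocks);  // keep answering "incomplete"
        }
    }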

    The sender then asks again: what documents are not complete.

    The receiver responds again: 
    Document 7135612A-CE48-4371-ABFC-F8EF70DF76CF is not complete... we only have one block of 12. 

    The sender sends block 2... and on it goes until the last block is sent.  At this point, the receiver gets the final block, marks the document as complete, and appends the last set of data to the database.  The next time the sender asks "what is not complete," the receiver responds "everything is complete."

    The loop terminates.

    The motivation for doing block-oriented data transfer this way

    Certainly, we could use FTP or some other mechanism for file transfer.  This method, though, has some characteristics that are interesting.  First off, this protocol is stateless.  That means that, at any time, the sender could stop asking about the status of documents on the receiver side, and nothing is lost.  The sender can go offline, or go to sleep, or lose connectivity, and nothing bad happens.

    Secondly, because the block sizes are relatively small, SOAP doesn't time out.  We can handle extraordinarily large files this way (theoretically in the terabyte range).

    Thirdly, the sender doesn't have to know much about the receiver.  It doesn't have to know if the document set already exists in the database on the receiver side, because the header data is sent with every block.  Therefore, no Commands are being sent.  (See my previous blog on "commandless" documents).

    Pros and Cons (updated)

    At the time of my first posting, this idea was being floated to our development team.  There are pros and cons to this solution that I can discuss in more detail now.

    The advantage of this model is that the receiving side is not getting any data that it doesn't want or know what to do with.  The sending side asks "what do you need," and the receiving side responds with "file X Block 10".  However, this is still a communication protocol.  If the sending side decides not to ask, the receiving side has no option but to leave the content of its database incomplete. 

    This is (a) counter-intuitive, and therefore hard to explain to business users and the development team alike (as I have discovered), and (b) it mixes the details of data transmission with the details of data representation.  I hadn't thought carefully about this when I first wrote it, but, in hindsight, it's a bad idea.

    An SOA transaction should be complete, self-describing, and self-contained.  The process above saves us from sending the same bits more than once over a wire.  That's its biggest advantage.  But that's not our biggest cost.  All of the wires that I care about, in my application, are owned by my company, and they are utilized at a fairly low rate.  Therefore, we don't save any measurable dollars by making the data transfer process efficient.

    On the other hand, if we separate out the data transmission from the data representation, then we can test each separately.  I can test data transmission of a 20 GB file by transmitting any 20 GB file and comparing the results with the original.  I can test data representation by creating the business document on one end and copying it to the other using sneaker-net (walking it over) from one dev machine to another.  This test isolation is important for reducing complexity, and that will save measurable dollars... real money from my bottom line.

    The forces that led us to SOA still exist: we want to decouple the sides from each other and we must transfer these large transactions over HTTP or HTTPS connections.

    The new interim solution

    We decided to separate the data transmission from the data representation.  Therefore, we will create an envelope schema that simply provides a transaction id, the current block number, the total number of blocks, and a data field. 

    So a transmission could look like this:

    <Transmission id="39B2A4DD-AD68-4ae9-AA68-FCC6A48A0FFA">
       <Block totalblocks="12" thisblock="1" size="50000">
    FZGl0OzI0NTk2MDs+Pjs+Ozs+O3...a really long string of base64 characters ... 
       </Block>
    </Transmission>

    What goes in the Block?  A base-64 encoded form of the entire business transaction itself (possibly compressed).

    The receiving side will collect all the blocks, assemble the actual stream, decode it, and load it into an XML object.  From that, we can extract the embedded documents.
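
    A sketch of how the envelope payload could be produced and consumed follows.  The block size and class names are illustrative, not prescribed by the schema above; the real code would also fill in the Transmission id and block numbers.

    using System;
    using System.Text;

    public class TransmissionEnvelope
    {
        private const int BlockSize = 50000;

        // Sender side: turn one large XML business transaction into block payloads.
        public static string[] SplitIntoBlocks(byte[] businessTransaction)
        {
            string encoded = Convert.ToBase64String(businessTransaction);
            int totalBlocks = (encoded.Length + BlockSize - 1) / BlockSize;
            string[] blocks = new string[totalBlocks];
            for (int i = 0; i < totalBlocks; i++)
            {
                int start = i * BlockSize;
                int length = Math.Min(BlockSize, encoded.Length - start);
                blocks[i] = encoded.Substring(start, length);
            }
            return blocks;
        }

        // Receiver side: once all blocks for a transmission id have arrived,
        // concatenate them in order and decode to recover the original bytes.
        public static byte[] Reassemble(string[] blocksInOrder)
        {
            StringBuilder buffer = new StringBuilder();
            foreach (string block in blocksInOrder)
                buffer.Append(block);
            return Convert.FromBase64String(buffer.ToString());
        }
    }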

    This data is not optimized for transmission

    We get a lot of data inefficiency in the data format here.  If we haven't thought seriously about compression before, it's starting to become important now.  Here's why:

    Say the uploaded document, in PDF form, is a page of text.  Notepad would represent it as about 1K.  In PDF, it would be about 5K, because PDF includes things like fonts and formatting.  That's fine. 

    In our business document, that 5K becomes 6.7K, because, in our business document, we are embedding it as Base64 text.  Base64 represents every three bytes (24 bits) of input as four characters; each character carries six bits of data but occupies a full byte, so the encoded form is about a third larger.  Add about 2K of header information (to make our document complete) and the business transaction size hits 8.7K.  At this point, we take that 8.7K transaction and encode it, again, as Base64 for the sake of block transfer.  We now get 11.6K.

    Our PDF went from 5K to 11.6K.  That's more than double its original size, and that's assuming UTF-8 encoding in the XML.  If we go with UTF-16 encoding, the XML files can hit 20K. 

    On the other hand, if we compress just before we pack the data into blocks for transmission, we can take that 8.7K document and compress it down to just over 5K (even though it is character data, it will not compress much further than that, because the underlying PDF content is essentially random, which removes most of the advantage of compression).  We take that 5K document, encode it in Base64, and go back up to 6.7K.  Now, that is efficient for data transmission.

    The receiving side has to decompress, of course, but this may be worth it.
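
    For illustration, here is one way to do the compress-then-encode step in C#, using GZipStream from System.IO.Compression.  Any stream compressor would serve; the class and method names below are not prescribed by the design above.

    using System;
    using System.IO;
    using System.IO.Compression;

    public class TransmissionEncoder
    {
        // Compress the whole business transaction first, then Base64-encode it
        // so it can be cut into blocks and carried as character data.
        public static string CompressAndEncode(byte[] businessTransaction)
        {
            using (MemoryStream compressed = new MemoryStream())
            {
                using (GZipStream gzip = new GZipStream(compressed, CompressionMode.Compress))
                {
                    gzip.Write(businessTransaction, 0, businessTransaction.Length);
                }
                // Base64 adds about a third again, but on top of the compressed size.
                return Convert.ToBase64String(compressed.ToArray());
            }
        }

        // The receiver reverses the steps: decode, then decompress.
        public static byte[] DecodeAndDecompress(string payload)
        {
            byte[] compressed = Convert.FromBase64String(payload);
            using (GZipStream gzip = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
            using (MemoryStream result = new MemoryStream())
            {
                byte[] buffer = new byte[4096];
                int read;
                while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
                    result.Write(buffer, 0, read);
                return result.ToArray();
            }
        }
    }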

    Conclusion

    After reviewing the initial proposal to embed the data transmission mechanism directly into the data representation structure, we rejected the idea in favor of a mechanism that wraps the data representation structure with a data transmission structure.  This allows us to test data transmission separately from data representation.  It also allows us to stick to the original idea of keeping all of the business data together in a single business transaction, regardless of how large it grows to be.

  • Three levels of abstraction in BPM - Part 1: Business Unit Level

    • 3 Comments

    I identified, in an earlier post, that I believe there are three levels of abstraction in business process modelling.  The highest level (what is often called the 30,000 foot view) is the Business Unit Level.

    This blog entry will discuss some of the attributes of models at the Business Unit level and some of the rules that should be followed to keep your models clear, consistent, and useful.

    To see the overview, visit:
    http://blogs.msdn.com/nickmalik/archive/2004/09/14/229641.aspx

    At the business unit level, you are modelling the interactions between business units involved in a set of related business processes.  A business unit may be a department within a company, or it may represent a partner relationship or a customer relationship.  One example I am somewhat familiar with is the flow of medical claims data from a medical provider to one or more insurers who may (or may not) provide some payment against the claim. 

    It is important to note that different departments in a single organization may very well be part of the model.  This level is not restricted to showing only the boundaries created by actual business ownership.  However, to keep this level of abstraction clean, you need to know the boundary for where to start, and where to end. 

    The boundary for the business unit level begins when a business document is completely defined.  For example, a medical claim may contain information about the ambulance charges, room charges, durable medical equipment, and doctor's fees for procedures provided through the institution.  An entire process may take place to create the medical claim (it usually does), and an ongoing process may take place to validate, modify, and/or supersede it.  However, this process is outside the scope of our business unit level model until the business document is created.

    In some scenarios, the fact that a document has been communicated to a partner actually defines its existence.  For example, if I send a claim to an insurer, I will assign a claim number that is unique to me.  In some systems, the claim number is not assigned until I send the claim, and at that point, it becomes a claim.  Before that, no claim exists.  Additional charges for the same patient would be sent on another claim.  In this sense, the fact that the document needed to be transmitted stands as the "moment of creation" for the document.  When defining the business unit level processes, this is a boundary.

    As I mentioned in my overview, the business unit level of workflow description is characterized by a single set of steps, one "thread" of interaction.  Of course, business doesn't really follow a single pathway.  However, the interaction will follow a single "typical" thread which can be modified by the addition of messages.

    So, in our example:

      • The hospital sends the insurance claim to Blue Cross. 
      • Five days later, the hospital sends a message to Blue Cross asking them for status of the claim. 
      • Two days after that, Blue Cross sends a message asking for more information. 
      • The next day, a reply arrives with the results of a particular test (for example). 
      • A week later, a message is sent from Blue Cross to the hospital providing information on a payment that the insurer is making to the institution to cover the charges.  Blue Cross sends a document to their bank instructing payment to be forwarded to the hospital.

    Some definitions:

    Business Document: A business document is a self-contained, uniquely identifiable document that is a snapshot of data as it existed at a point in time.  

    Message: A message is a self-contained request or command that is passed from one party to another with the expectation that the message itself, or the response to it, will affect the business processes in one of the parties involved.  Messages refer to documents.  They do NOT refer to people, or entries in a database (both of which change over time).  Messages refer to a document that does not change. 

    Some considerations:

    The life-span of the workflow model matches the lifespan of the document.  If a document continues to live, on one side or another, after the primary interaction is through, then the workflow model continues to exist as well.  As a result, a great many document management issues, as well as information archiving and retrieval, can creep into workflow.  Remember that, at this level of detail, we deal with documents and messages.  The methods that a business unit may employ to fulfill their commitment to respond to a message are not useful details at this level of abstraction.

    One document may give birth to another document.  In our example, a claim gives birth to a payment.  Note that a payment is not a message.  It stands alone from a claim. A document may be sent from the insurer to the hospital informing them of the existence of a payment, along with messages that tie the payment to specific claims. These two documents may (normally will) have different lifecycles.

    So, in conclusion, the Business Unit Level illustrates the interaction between business units, is tied together with the creation of a document and continues through its lifespan, and describes the interactions in terms of the document and the messages that refer to it.

    [minor edits made in Aug of 2008]
