Sign In
Gilles' WebLog
Not actually a Blog
Translate This Page
Translate this page
Powered by
Microsoft® Translator
Options
Blog Home
About
Email Blog Author
Share this
RSS for posts
Atom
RSS for comments
Search
Advanced search options...
Search In:
Everything
Blogs
Forums
People
Groups
Places
Pages
Date range:
All Time
Last Year
Last 6 Months
Last 3 Months
Last Month
Last Week
Last Two Days
Tags
BizTalk 2004
BizTalk 2006
Code
Japan
Non computer
Archive
Archives
June 2009
(1)
November 2007
(1)
September 2007
(1)
June 2007
(1)
July 2006
(2)
June 2006
(3)
May 2006
(1)
April 2006
(1)
March 2006
(4)
June 2005
(3)
January 2005
(1)
September 2004
(10)
August 2004
(7)
July 2004
(9)
June 2004
(18)
May 2004
(2)
The eXcel files: Do not use Office Object Model when processing Office documents with BizTalk 2004
MSDN Blogs
>
Gilles' WebLog
>
The eXcel files: Do not use Office Object Model when processing Office documents with BizTalk 2004
The eXcel files: Do not use Office Object Model when processing Office documents with BizTalk 2004
Gilles Zunino
28 Jun 2004 9:34 AM
Comments
2
Every once in a while, someone
posts
on microsoft.public.biztalk a question about processing Office files in BizTalk. Sometimes it is Word documents, sometimes it is Excel workbooks. This post explains a typical trap you should not fall in and why.
What are the options?
In general, you have at least the following options:
Do not process the file itself but process a
reference
to the file (i.e. URL to the file for instance),
Arrange for the file to be saved in XML before submission to BizTalk 2004,
Arrange for the file to be saved in a comma separated format (or any other delimited format) and use the flat file disassembler to produce XML messages in the receive pipeline,
Write a custom receive pipeline component to crack open the Office file and build an XML message yourself.
These options are sorted by complexity: the reference processing is indeed the simplest way and does not offer much flexibility. Writing a custom pipeline component is the most complex solution with high flexibility.
Option 1
is very good if your intent is to perform some kind of workflow on the document (i.e. document workflow process). You will save large amounts of bandwidth on your network and improve performances on the BizTalk server by
not
sending the actual document to BizTalk for processing. Instead, send a small XML document that contains all the informations you need and add a reference (an URL for instance) to the actual file. The file can be stored in Windows Sharepoint Services, on a file share ... Of course this option may not always be possible. Often, business processes require data from within the document to execute properly.
Option 2
suggests that you put the conversion burden on whoever submits the file to your BizTalk server. The conversion happens before submitting the document to BizTalk so the document is already XML when you receive it. Microsoft Office 2003 allows to save documents in XML so it might be easy to write a quick VBA Excel macro or an Office AddIn (COM or C# / VB.NET) that converts the file to XML and then submits the result to BizTalk. Users could even be presented with a new button on the toolbar "Submit ..." that triggers the macro and the submission to the back end system for processing.
Option 3
takes advantage of the fact that in practice, Comma Separated files have been the data exchange format between businesses for a very long time. If your Excel documents happen to be stored as comma separated files (or if you can arrange for them to be saved in that format easily), then you can use the BizTalk flat file disassembler to create XML messages. In fact, files do not have to be comma separated. The flat file disassembler is highly customizable and can parse most of delimited files as well as most positional files. For more informations, you can refer to
this page
.
Option 4
requires you to write a BizTalk 2004 custom component (a receive pipeline disassembler component to be precise) that will take an Office file and will convert it to an XML document. I have already
outlined how a custom disassembler can be implemented
. Office documents can get fairly large so the disassembler should be able to operate without loading the whole document in memory.
However, this approach requires a way to understand the binary format of the specific Office document you are parsing. For instance, there are numerous documents on the internet which describe the
Excel file format
and one might even find libraries allowing to manipulate Office files programmatically. Another way is to use the appropriate Office application's Object Model to load the document and manipulate it.
There are caveats in using Office applications to load the document on the server: Office applications are client applications and were never tested in a server environment. This means that Office applications may or may not behave appropriately under stress and/or performances may or may not degrade when the number of documents to load grows. Also, Word or Excel are quite heavy in memory.
This web page
on Microsoft web site explains with more details why it is no a good idea to run an Office application on the server side.
In general, no one should be using client components on a server. For instance,
XMLHTTP
is a client component and
should not be used on the server
side (because it relies on WinInet, a client side component).
2 Comments
Blog - Comment List MSDN TechNet
Comments
Loading...