Out of the Angle Brackets
Xml is ubiquitous. No doubt about it. It is used almost everywhere and by almost everyone, including places where huge amounts of data are processed. This means the Xml files (or streams) used there are huge too. And the bigger the Xml file, the harder it is to process. The two biggest problems are:
These are problems indeed, but there is a good chance they are solvable. First, take a look at the structure of the source Xml. Then look at your XPath expressions or Xslt stylesheet. How much of the information in the source Xml are you actually using? Chances are that the bigger the file and the more complex its structure, the smaller the fraction of its data you actually use.
So, if you don't use some data, what's the point of even loading it? Filter it out. You can do this in a streaming fashion: instead of using the stock XmlReader from the Xml API, implement your own reader that reports only the stuff you really need and ignores everything else (i.e. projects the document). Depending on how much you need, you can save a lot. With the unused data gone, your document may now fit in memory, and queries or transformations will be faster because they don't have to process nodes or attributes that are never used. If you don't feel like writing your own reader, you can try using XPathReader http://msdn.microsoft.com/en-us/library/ms950778.aspx (note that the article is dated and may use some old APIs, but the basic idea is the same).
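To make the projection idea concrete, here is a minimal sketch. It is written in Python for illustration, so it swaps .NET's XmlReader for the standard library's xml.etree.ElementTree.iterparse, which likewise walks the document as a stream of start/end events; it is not the XmlReader or XPathReader API. The project function, the sample document and the sets of needed tags and attributes are hypothetical; in real code you would derive them from your XPath expressions or stylesheet. Unneeded elements are skipped, unused attributes are dropped, and each source element is detached from its parent once it has been read, so mainly the small projected tree stays in memory.

```python
import io
import xml.etree.ElementTree as ET


def project(source, needed_tags, needed_attrs):
    """Stream `source` and return a tree that holds only the needed nodes.

    Hypothetical helper: the document element is always kept, other elements
    are kept only if their tag is in `needed_tags` (needed descendants of a
    skipped element are attached to the nearest kept ancestor), and only the
    attributes listed in `needed_attrs[tag]` are copied.
    """
    root = None
    kept_stack = []    # projected copies of kept elements that are still open
    source_stack = []  # (source element, kept?) for every open source element
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            # Attributes are available at "start"; text is not yet reliable.
            keep = not source_stack or elem.tag in needed_tags
            source_stack.append((elem, keep))
            if keep:
                wanted = needed_attrs.get(elem.tag, ())
                attrs = {k: v for k, v in elem.attrib.items() if k in wanted}
                projected = ET.Element(elem.tag, attrs)
                if kept_stack:
                    kept_stack[-1].append(projected)
                else:
                    root = projected
                kept_stack.append(projected)
        else:
            # At "end" the element's text is complete, so copy it now.
            _, keep = source_stack.pop()
            if keep:
                kept_stack.pop().text = elem.text
            # Detach the source element from its parent so it can be freed.
            if source_stack:
                source_stack[-1][0].remove(elem)
    return root


# Hypothetical input: the query only needs order ids and totals.
SAMPLE = b"""<orders>
  <order id="1" customer="Ann"><note>gift</note><total>42</total></order>
  <order id="2" customer="Bob"><note>rush</note><total>7</total></order>
</orders>"""

small = project(io.BytesIO(SAMPLE), {"order", "total"}, {"order": {"id"}})
print([o.get("id") for o in small.findall("order")])            # ['1', '2']
print(sum(int(t.text) for t in small.iterfind("order/total")))  # 49
```

The queries then run against the projected tree only. Here ElementTree's limited XPath subset (findall/iterfind) stands in for a full XPath or Xslt engine; the `note` elements and `customer` attributes never make it into memory-resident results.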
If the above steps don't help, you may try splitting your one big task into a few smaller tasks that you can run sequentially. Hopefully this will get you to your goal.
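As an illustration, here is a hedged Python sketch (again using ElementTree.iterparse in place of .NET's XmlReader) for a hypothetical task: find the names of customers who placed an order above some threshold, where customers and orders sit in different sections of the document. Instead of one query that needs both sections loaded at once, it runs two smaller passes one after the other: the first keeps only a set of customer ids, the second only the matching names. The stream and big_spenders helpers and the sample data are made up for this example.

```python
import io
import xml.etree.ElementTree as ET


def stream(source, tag):
    """Yield each complete `tag` element, then drop it from the parsed tree.

    Every other element is detached as soon as it ends (unless it sits inside
    a `tag` element that is still open), so memory stays roughly bounded.
    """
    open_elems = []  # source elements that are currently open, innermost last
    depth = 0        # how many of the open elements are `tag` elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elems.append(elem)
            if elem.tag == tag:
                depth += 1
            continue
        open_elems.pop()
        if elem.tag == tag:
            depth -= 1
            yield elem
        if depth == 0 and open_elems:
            open_elems[-1].remove(elem)


def big_spenders(open_source, threshold):
    """Two small sequential tasks instead of one query over the whole file."""
    # Task 1: stream the orders and keep only the ids of big spenders.
    ids = {o.get("customer") for o in stream(open_source(), "order")
           if float(o.get("total")) > threshold}
    # Task 2: stream the document again and pick up just those names.
    return [c.text for c in stream(open_source(), "customer")
            if c.get("id") in ids]


# Hypothetical document: customers and orders live in separate sections.
SAMPLE = b"""<shop>
  <customers>
    <customer id="c1">Ann</customer>
    <customer id="c2">Bob</customer>
    <customer id="c3">Cid</customer>
  </customers>
  <orders>
    <order customer="c1" total="250"/>
    <order customer="c2" total="40"/>
    <order customer="c3" total="120"/>
  </orders>
</shop>"""

print(big_spenders(lambda: io.BytesIO(SAMPLE), 100))  # ['Ann', 'Cid']
```

The price of this design is reading the input twice, which is often acceptable when the alternative is not being able to process the file at all.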