Binary Files and the File System Object Do Not Mix

OK, back to scripting today.

But before I get back to scripting issues, one brief correction. An attentive reader noted that "The Well-Tempered Clavier" was in fact designed to sound good on a "well tempered" instrument, not an "equally tempered" instrument. The difference is that a "well" temperament is designed so that every key sounds good, but is allowed to have some badly-out-of-tune intervals that must be avoided. (Traditionally these are called "wolf intervals".)

There was considerable controversy when equal temperament was introduced in Europe. I suppose it was the ridiculous "what is the One True Bracing Style?" issue of its day.

Another commenter pointed out that you could translate my wav-writing program into VBScript by using the File System Object to write out the bytes. To simplify their code down to a program that writes out individual bytes:

' DO NOT DO THIS
Set FSO=CreateObject("Scripting.FileSystemObject")
Set File=FSO.CreateTextFile("c:\test.bin", True)
For i = 0 to 255
  File.Write Chr(i)
Next
File.Close

And sure enough, this writes out a binary file consisting of those bytes.

Please don't do that. See that line that says "CreateTextFile"? We wrote that method to create a text file, not a binary file. Though this code might appear to work, it actually does not. Text files are more than just binary files that can be interpreted as text. Text files have to conform to certain rules to ensure that they can be sensibly interpreted as text in the local code page. If that's not 100% clear to you, read Joel's article on the subject before we go on.

Let me give you an example that clearly fails. What does this program do?

Set FSO=CreateObject("Scripting.FileSystemObject")
Set File=FSO.CreateTextFile("c:\test.bin", True)
For i = 0 to 255
  File.Write Chr(&hE0)
Next
File.Close

If you said "it writes out a binary file consisting of 256 E0 bytes," bzzt! Sorry, try again. The correct answer is "it writes out a binary file consisting of 256 E0 bytes on any operating system where the user's default ANSI code page does not define E0 as a lead byte in a DBCS encoding, like, say, Japanese, in which case it writes out 256 zeros."

In the Japanese code page, E0 is a lead byte: it only means something when a trail byte follows it. Just-plain Chr(&hE0) is therefore not even a legal character, so Chr will turn it into a zero.
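
To see the lead-byte problem without a Japanese installation of Windows, here is a minimal sketch in Python rather than VBScript. It assumes Python's cp932 codec is a fair stand-in for the Japanese ANSI code page and cp1252 for a Western European one; the helper name is_standalone_char is made up for illustration. It only checks whether a lone byte is a complete character in a given code page; it does not reproduce what Chr itself returns.

```python
def is_standalone_char(byte_value, code_page):
    """Return True if the single byte decodes to a character on its own."""
    try:
        bytes([byte_value]).decode(code_page)
        return True
    except UnicodeDecodeError:
        # The byte is either undefined or a lead byte with no trail byte.
        return False


if __name__ == "__main__":
    for code_page in ("cp1252", "cp932"):
        print(code_page, is_standalone_char(0xE0, code_page))
```

On a standard CPython build, the E0 byte should be reported as a complete character under cp1252 but not under cp932, which is the situation the second script above runs into.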

If I were whipping up a little one-off program on my own to write out a binary file -- well, I'd personally do it in C, but I can see how some people might want to do it in script. But there's a big difference between writing a one-off program that you're going to delete in five minutes and writing a general-purpose utility program that you expect people around the world will use. That's an entirely different standard of robustness and portability. Do not use the FSO to read or write binary files; you're just asking for a world of hurt as soon as someone in DBCS-land runs your code.

I have been asked many times over the years if I know of a scriptable object that can read and write true binary files in all locales. I do not. Anyone have any suggestions? I would have thought, given the number of people who have asked me, that some third party would have come up with something decent by now.

  • Eric, I've had a lot of luck using the ADO Stream object.
  • I have used this in the past, but I don't know if it works in all locales. I have never had to deal with that.

    Function SaveBinaryData(FileName, ByteArray)
        ' ByteArray is expected to be a true array of bytes, not an array of variants.
        Const adTypeBinary = 1
        Const adSaveCreateOverWrite = 2

        ' Create the Stream object
        Dim BinaryStream
        Set BinaryStream = CreateObject("ADODB.Stream")

        ' Specify the stream type: we want to save binary data
        BinaryStream.Type = adTypeBinary

        ' Open the stream and write the binary data to it
        BinaryStream.Open
        BinaryStream.Write ByteArray

        ' Save the binary data to disk, then release the stream
        BinaryStream.SaveToFile FileName, adSaveCreateOverWrite
        BinaryStream.Close
    End Function
  • Google says:
    http://www.google.com/search?q=vbscript+binary+file

    And those paths generally lead to the Adodb.Stream solution.
  • I've heard that, and I've also heard from people that it doesn't work well from script, so I don't know who to believe. How do you create the binary array? VBScript only supports creation of arrays of variants.
  • I found that text mode works OK like so.
    Only for writing, though.
    -----------------------------------------

    //JScript version

    var str = WScript.CreateObject("ADODB.Stream");
    str.type = 2; //adTypeText
    str.charset = "iso-8859-1";
    str.open();

    for (var i = 0; i < 0x100; i++) {
        str.writeText(String.fromCharCode(i));
    }
    str.saveToFile("c:\\temp\\bin.bin", 2);
    str.close();
    str = null;

    'VBScript version

    dim str
    set str = WScript.CreateObject("adodb.stream")
    str.type = 2
    str.charset = "iso-8859-1"
    str.open

    for i = 0 to &hff
        str.writeText(ChrW(i)) 'uses ChrW
    next
    str.saveToFile "c:\temp\bin.bin", 2
    str.close

    -----------------------------------------

    There still is a problem when you try to read some of the byte values 0x80 - 0x9f: when you read them they turn into completely different values. I guess this also relates to encoding.

    I heard you could acquire an array of bytes like this (haven't tried myself):

    Set DM = CreateObject("Microsoft.XMLDOM")
    Set EL = DM.createElement("tmp")
    EL.DataType = "bin.hex"
    EL.Text = [some text in hex format]
    bin = EL.NodeTypedValue
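
The 0x80 to 0x9f observation in the comment above is consistent with a Windows-1252 style mapping being applied somewhere on the read path, although nothing in this thread confirms that this is what ADODB.Stream actually does. As a minimal sketch in Python (not VBScript, and with hypothetical function names), this is where true ISO-8859-1 and Windows-1252 disagree:

```python
def bytes_where_encodings_disagree(enc_a="latin-1", enc_b="cp1252"):
    """List the byte values that decode differently in the two encodings."""
    differing = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" maps undefined bytes to U+FFFD instead of raising.
        char_a = raw.decode(enc_a, errors="replace")
        char_b = raw.decode(enc_b, errors="replace")
        if char_a != char_b:
            differing.append(value)
    return differing


def latin1_round_trips():
    """ISO-8859-1 maps each byte to the code point with the same value."""
    all_bytes = bytes(range(256))
    return all_bytes.decode("latin-1").encode("latin-1") == all_bytes


if __name__ == "__main__":
    print([hex(v) for v in bytes_where_encodings_disagree()])
    print(latin1_round_trips())
```

If the two encodings get mixed between writing and reading, only the bytes in that one range change, which would match the symptom described.
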
  • Further to this, is there a reason why Binary read/write was left out of the File System Object? I would have thought given the number of people that have asked you, that microsoft would have come up with something decent by now :)

    Surely it's just a simpler version of the FSO.OpenTextFile code?
  • We certainly considered it. However, there are two main factors. First, and most important, we decided that the Script Team wanted to be in the business of building the script engines themselves, not the objects that those engines would script. We looked around the company and realized that other teams were working on object models for administration (WMI), email (CDONTS), database access (ADO), web servers (IIS), etc. Our tiny team could never do as good a job as those fully staffed and dedicated teams, and to try would have taken away time from stuff that _wasn't_ a massive duplication of effort. So we finished off the FSO and called it done. (This also explains why we did not add any features to the WScript.Network object, etc, when we inherited the WSH codebase.)

    Second, adding binary file reading/writing is not as straightforward as you might think. Exposing a straightforward array of bytes on disk is only the very first step. To do it right and make it usable, we'd want to provide things like default serialization of all simple data types -- strings, ints, doubles, singles, currencies, etc. But once you bite that off -- big endian or little endian? Length prefixed? How do you handle seeking? What if the user is reading a file that has a DBCS string embedded in it and wants to translate it into a Unicode string?

    You have to think about the real-world problems that people are going to have to solve with this tool, and there are a LOT of different scenarios for binary files. We didn't want to bite that off. It didn't seem like a very "scripty" scenario.
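
To give a feel for the serialization questions raised in the reply above, here is a minimal sketch in Python rather than script. The function names, the 2-byte length prefix, and the default encoding are all arbitrary choices made for illustration; they are not anything the FSO or any other Microsoft object does.

```python
import struct


def pack_int32(value, little_endian=True):
    """Serialize a 32-bit signed integer in the chosen byte order."""
    fmt = "<i" if little_endian else ">i"
    return struct.pack(fmt, value)


def pack_length_prefixed(text, encoding="utf-16-le"):
    """Serialize a string as a 2-byte little-endian byte count plus payload."""
    payload = text.encode(encoding)
    # The prefix counts bytes, not characters; that is one more decision
    # the designer of a binary file API would have to make.
    return struct.pack("<H", len(payload)) + payload


if __name__ == "__main__":
    print(pack_int32(1), pack_int32(1, little_endian=False))
    print(pack_length_prefixed("A"), pack_length_prefixed("A", "cp932"))
```

The same integer and the same one-character string each come out as different bytes depending on choices the caller has to make, and a general-purpose object would need a sensible answer for every one of them.
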
  • You can use SoftArtisans' FileManager. It's like FSO, but can handle binary files.

    http://fileup.softartisans.com/fileup-120.aspx
  • Even though writing binary files is not a 'scripty' scenario, it is something that people will want to do now and then. I don't see any reason why you couldn't have added simple binary read/write functions so you don't have to muck around with ADO stream objects or CreateTextFile.
  • I've been reading & writing binary data with the "ADODB.Stream" object for years without any problems, but then again I haven't been using anything other than the UK codepage. But since it's got native binary handling, surely in this particular case binary is binary is binary?!
  • ADODB.Recordset is quite popular in... certain communities. Unfortunately it requires that you have created an ADODB.Stream object, and I don't know how you go about populating that with arbitrary content.
  • JSDB based on spidermonkey
    can read/write binary files

    see www.jsdb.org

    and much more than that: database connection, socket server, E4X etc..

    when I feel WSH is limited by something I automatically move to JSDB, both running ECMAScript code, portability made easy :).
  • I use Perl right now to read files and don't have any problems at all with binary. It doesn't require any special methods or variable types or even library includes. It's native to the language itself. It's unfortunate that Perl (an ancient scripting language by any comparison) has always allowed working with binary files, even across different platforms, and yet VBScript and JScript developers are left without access to such basic routines as working with binary file data, even through the use of ActiveX controls (because there apparently aren't any).

    Keep in mind that ADODB.Stream is not a solution because it's disabled on most Windows machines now, due to the security vulnerabilities it exposed through Internet Explorer. You know, it's always nice when a workaround is suggested and then it's not even really available, which I guess defeats the purpose.

    --Randall
  • "However, there are two main factors. First, and most important, we decided that the Script Team wanted to be in the business of building the script engines themselves, not the objects that those engines would script."

    Ok, that's reasonable. Add a few functions to the FSO to support binary byte reads and writes and y'all are done. Simple, eh? :-)

    "Second, adding binary file reading/writing is not as straightforward as you might think. Exposing a straightforward array of bytes on disk is only the very first step. To do it right and make it usable,... "

    Well, that's ONE approach. Another, quite simple approach is to NOT be the end-all and just support reading and writing a series of bytes. If someone needs to make it more "usable", they can do it themselves - that's the cost of dealing with BLOB data. And that's exactly why y'all won't know whether it's big-endian, Unicode, or dollars. Just let me get at the bytes and I'll do whatever is necessary to interpret/manipulate the data. :-)

    As it is now, I seem to be left with a choice of: a) moving to another language, or b) limiting functionality. Neither is a good solution.

    Thanks.
  • Why is this such a big deal, MS?? UN*X systems have been doing this from day 1!! I guess it all stems from MS (or CP/M) making the decision years ago that text files NEED CR/LF pairs to be called text files, rather than the UN*X philosophy that files are a stream of bytes and it's up to the end user (or script developer) to decide how to interpret the bytes (or words or quadwords ...).