Larry Osterman's WebLog

Confessions of an Old Fogey

Riffing on Raymond - FindFirst/FindNext


As I mentioned, I've been Riffing on Raymond a lot - yesterday's post from Raymond got me thinking about FindFirst and FindNext in MS-DOS.

As Raymond pointed out:

That's because the MS-DOS file enumeration functions maintained all their state in the find structure. The FAT file system was simple enough that the necessary search state fit in the reserved bytes and no external locking was necessary to accomplish the enumeration. (If you did something strange like delete a directory while there was an active enumeration on it, then the enumeration would start returning garbage. It was considered the program's responsibility not to do that. Life is a lot easier when you are a single-tasking system.)

The interesting thing about MS-DOS keeping its state in the reserved bytes of the find structure was that a bunch of apps figured this out.  They then realized that they could suspend and resume their searches by simply saving away the 21 reserved bytes at the start of the structure and copying them back into a single, fixed find first structure.

So a program would do a depth-first traversal of the tree, and at each level of the tree, instead of saving the entire 43-byte FindFirst structure, it could save 22 bytes per level of the hierarchy by keeping just the first 21 bytes of the structure.  In fact, some of them were even more clever: they realized that they could save just the part of the reserved area that they thought was important (something like 8 bytes/level).
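To make the arithmetic concrete, here is a minimal Python sketch of the trick. The 21-byte reserved prefix and the 43-byte total come from the post; the breakdown of the remaining 22 bytes (attribute, time, date, size, 13-byte name) follows the commonly published description of the structure and should be treated as an assumption here. The helper names are hypothetical, and nothing in this sketch talks to a real DOS.

```python
import struct

# Assumed layout: 21 reserved bytes, then attribute (1), time (2), date (2),
# size (4) and a 13-byte NUL-padded 8.3 name. "<" disables padding.
FIND_DATA = struct.Struct("<21sBHHI13s")
RESERVED_LEN = 21


def save_search_state(find_data: bytes) -> bytes:
    """Keep only the reserved prefix, as the apps in the post did."""
    if len(find_data) != FIND_DATA.size:
        raise ValueError("expected a 43-byte find structure")
    return find_data[:RESERVED_LEN]


def restore_search_state(saved: bytes, scratch: bytearray) -> None:
    """Copy a saved prefix back into the one shared find structure."""
    if len(saved) != RESERVED_LEN or len(scratch) != FIND_DATA.size:
        raise ValueError("unexpected buffer size")
    scratch[:RESERVED_LEN] = saved


def bytes_saved_per_level() -> int:
    """Bytes not stored per directory level when only the prefix is kept."""
    return FIND_DATA.size - RESERVED_LEN
```

A traversal would call `save_search_state` before descending into a subdirectory and `restore_search_state` on the way back up, so only the reserved prefix is stored per level.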

And that's just what they did...

Needless to say, that caused kittens when the structures used for search had to change - these apps looked into the internal data structures and assumed they knew what they did...
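The failure mode can be sketched with a toy model. Everything below is hypothetical: the one-byte cursor, its offsets, and the function names are invented for illustration and do not reflect how MS-DOS actually laid out the reserved bytes. The point is only that an app keeping a trimmed 8-byte slice works as long as the enumerator's state happens to live inside that slice, and quietly breaks once the state moves.

```python
RESERVED_LEN = 21


def make_enumerator(cursor_offset):
    """Build a toy find_first/find_next pair that keeps its position
    in a single reserved byte at cursor_offset (purely illustrative)."""

    def find_first(names, state):
        state[:] = bytes(RESERVED_LEN)  # start a fresh search
        return find_next(names, state)

    def find_next(names, state):
        index = state[cursor_offset]
        if index >= len(names):
            return None  # no more entries
        state[cursor_offset] = index + 1
        return names[index]

    return find_first, find_next


def resume_with_partial_state(find_first, find_next, names, kept=8):
    """Mimic an app that saves only the first `kept` reserved bytes."""
    state = bytearray(RESERVED_LEN)
    first = find_first(names, state)
    saved = bytes(state[:kept])      # the "clever" trimmed save
    state[:] = bytes(RESERVED_LEN)   # structure reused for a subdirectory
    state[:kept] = saved             # resume the parent search
    return first, find_next(names, state)
```

With the cursor inside the saved slice (`make_enumerator(0)`), resuming continues with the second entry. With the cursor moved outside it (`make_enumerator(12)`), the resumed search starts over and returns the first entry again.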

 

  • Did you do anything as a workaround for that? Do these misbehaving applications still run under XP?

    How often did Microsoft programs take advantage of internal knowledge like this that was not guaranteed between implementations?

    Just curious.
  • Kristoffer, did I say that these were Microsoft apps? They weren't (full disclosure: one of the apps was the MS-DOS system utility TREE, which was totally allowed to look at internal data structures, since it was a part of the OS).

    And the answer is that we figured out how to make them work - this was a big deal for the DOS redirector, but fortunately, we did have 5 bytes to work with, and the original redirector authors had figured out how to make this work.

  • Hilarious... What about creating a RoR category among your postings? And then we'll see when Raymond gets around to creating his RoL counter...
  • When I was working at Alfa on Mainlan for Sage, I remember the amount of cursing caused by Xtree Gold which IIRC did directory searches using FCB style (11h/12h) calls rather than FindFirst/FindNext(4eh/4fh).
    I seem to remember folks doing an awful lot of debugging of Xtree to make it work at all.
  • I'm curious, in the days before the net, how did undocumented info like this get circulated enough to make such an impact when the structures changed? Or did every company just reverse-engineer the structures on its own?
  • Mike, I have no idea. I've wondered that myself.

    My suspicion is that they reverse engineered it - you do a find first, then a find next and notice that only 6 bytes of the structure had changed (it was a structure in your app, after all)...

    But I really don't know for sure.
  • People used to do this on AmigaDOS too. The version of the AmigaDOS manual for the v2.0 OS warned that some programs only preserved the 32-bit key field between directory enumeration calls. That, coupled with AmigaDOS's BCPL heritage, must have driven filesystem authors nuts.
  • Very few people seemed to actually reverse engineer the DOS structures. It was much easier to reverse engineer the programs that those people wrote.

    Plus, it wasn't like the world was totally isolated before the WWW came about...
  • Back in those days, the first place I would have looked for this info was Ralf Brown's Interrupt List. I used to get it off FidoNet BBSes, but of course it's now available on the Internet:

    http://www-2.cs.cmu.edu/~ralf/files.html

    Indeed, Ralf Brown has documented much (though not all) of the contents of the data structure in question. The relevant entry is available online:

    http://www.ctyme.com/intr/rb-2977.htm#Table1626
  • Mike Dunn> The net is nothing new. Back in the late eighties and early nineties, before I ever saw the internet, I was on the fringes of the BBS scene. You could chat on message boards, send personal messages (email) and download/upload files there quite happily.

    If you wanted bigger than a local BBS, systems existed. CompuServe, Prodigy, AOL and (here in the UK) CIX all had tens of thousands or even millions of users.

    And then there was FidoNet. That was massive, and had a distinct "Internet feel" about it. It linked small BBSes into a massive network for mail and discussion exchange, and was mightily impressive.

    I recall getting Ralf Brown's Interrupt List from these sorts of places. All sorts of technical documentation could be gotten from BBSes - even reverse engineered details on the latest Tseng accelerated video cards, 3Com network cards and so forth.

    (FidoNet didn't just do technical stuff, though. Lots of topics could be discussed in the echoes - the FidoNet equivalent of newsgroups on the internet.)

    Continent or world-spanning communications via computer is nothing new. What is new with the Internet is the sheer ubiquity. FidoNet united BBSes because it had very open standards, allowing many clients to be written and any BBS to join the network. The internet had pretty much the same advantage - its standards were open, and required no licences. Recall that even though Microsoft owned the desktop, their own CompuServe-like closed network (MSN) flopped hugely when it came to its original goals - to crush AOL/CompuServe/Prodigy and The Internet. That's probably partly also down to a lack of existing users and material on MSN, but the fact is that the open standards of the Internet eventually crushed the closed, non-interoperating networks of AOL/CompuServe/Prodigy/MSN...

    Given how ubiquitous a common communication medium is now (newsgroups, email, web forums), I'm actually surprised that so little undocumented stuff gets out of companies when compared to those internet days. Maybe undocumented tidbits are floating around in some dark, dank corner of the internet - but with Microsoft and others making good information available via MSDN and various vendor SDKs, I'd imagine that the need for digging deeply has probably mostly gone away. It's SO much easier to find out API information these days than it was in the pre-Internet days...
  • See http://www.ctyme.com/intr/rb-2977.htm :)

    There was already a community behind those things too, mainly running around in BBSs. Often their results would've been published in specialized papers. I remember when I was just 13/14 y.o. I used to buy every number with Ralf Brown's Interrupt List :)

    If you have a good collection of these papers, you can see how the world changed when Windows 95 came out. In a few months the average article moved from "how to query block transfer status using IDE I/O ports" to OLE and mostly high-level stuff. Suddenly knowing every function of INT 21 and INT 2F was useless at best. In a few weeks, TopView, DESQview, QEMM, 386MAX, XMS, EMS, HMA and a number of other technologies simply were not of any interest anymore. Suddenly DirectX meant that knowing how to call INT 10 for 130 different VGA cards was just a memory.

    Fascinating :)
  • Reverse engineering isn't as hard as people think. It is all about basic detective work, finding the tiny important facts in a mass of noise. I got my job at BioWare by reverse engineering their NWN script compiler and creating my own compiler.
  • "TREE ... was totally allowed to look at internal data structures."
    Reminds me of Personal CP/M-86. The RENAME command accesses system variable 82h (documented system variables go from 1 to 5) and it turns out what it's doing is saving the current process's FindFirst status and later recalling it. (PCP/M keeps at least some FindFirst/FindNext information in the process table rather than in the calling process).
  • (I'm sorry this is off-topic, but there's no more commenting on the relevant entry)

    After following a link here, I've been reading most (if not all) of your posts with much interest. One not too long ago mentioned the Microsoft antispyware beta, which I just downloaded and ran. I couldn't find a proper link for feedback on it, so I thought you might be able to forward a typo:

    In the Spyware Threat Details section, there is a typo in the following description (I know the html won't work):

    About Adware Bundler: A bundler is a software program that installs adware on your computer either with your permission or without. Most of the software classified as a bundler requires that the adware program(s) be installed in order for the actual software to complete installation or run. In addition in most cases if the adware is removed the software will <b>seize</b> to function as well.

    "seize" in the last sentence should properly be "cease". On an aesthetic note, "About Adware Bundler" should probably be "About Program (Adware Bundler)", and a comma after "In addition".

    I've read the comment policy, and I have no problem with you moderating this - though I do apologize if you feel the need to do so.

    (And while I'm on a roll with stuff you probably care little about - has anyone at Microsoft [that you know of] given serious thought to Windows on Linux? That is, running all the API's, the GUI and such on top of the Linux kernel [or even the BSD one, due to its commercially-friendly license]? Could be a whole new way to 'end' the Windows vs. Linux battle...)
  • <<My suspicion is that they reverse engineered it - you do a find first, then a find next and notice that only 6 bytes of the structure had changed (it was a structure in your app, after all)... >>

    Nothing of that sort. You just went out and bought an 'Undocumented DOS' (or later 'Undocumented Windows') book which had all these things described in it. Then you used your judgement as to which of the structures are expected to make it to the next version of DOS/Windows unchanged.

    Sometimes you wrote something like 'If it is DOS ver. A, assume that this structure is X, if it is B, assume the structure to be Y', and so on.

    Life was indeed colorful back then.

