Larry Osterman's WebLog

Confessions of an Old Fogey

How did we make the DOS redirector take up only 256 bytes of memory?

In one of my early posts, I mentioned a status review we had with BillG for the DOS Lan Manager redirector (network filesystem).

I also talked to Robert Scoble about this in the last of my Channel9 videos.  One thing that somehow got missed in both the original article (later updated) and the video was our reaction to Bill's feedback.

The simple answer is that we fixed the problem.  My team didn't do much with the transports and network drivers (because they were out of our scope), but we were able to do something about the footprint of the redir.exe program (it was a T&SR application).

When we were done with it, I had shrunk the redirector's running size below 640K to 256 bytes, and beyond that I couldn't figure out how to go.

The question that obviously comes up is: How did you manage to do that?  Raymond, please forgive me for what I'm about to disclose, for within this tale lie dragons.  This discussion is for historical purposes ONLY.  I don't recommend it as a practice.

The MS-DOS redirector was actually originally written as a part of the MSDOS.SYS (IBMDOS.COM) binary.  For obvious reasons (not every user in the world had a network card, especially in 1984), the redirector was split out from the DOS binary after the product shipped.  In fact, when I took over the redirector project, linking the binary produced hundreds of unresolved external symbol errors (because the redirector linked with some, but not all, of the MS-DOS object files).  One of the first things that I did while working on the project was to clean this up so that the redirector would link cleanly without relying on the MS-DOS objects.  But being a part of MS-DOS, it was written in the "tiny" memory model: the code and data were commingled internally.

The first thing I did when trying to shrink the footprint of the redirector was to separate the code and data segments.  Today, this seems utterly obvious, but in 1986, it was a relatively radical idea, especially for real-mode software.  Once I had split the data and code, I was able to make the data segment relocatable.  This change was critical, because it enabled us to do a boatload of things to reduce our footprint.  One thing to keep in mind about the redirector was that even though the data (and eventually code) was relocatable, the motion wasn't dynamic.

The first thing I did was to lay out the redirector's code and data as follows:

Code

Initialization Code

Data

Initialization Data/Dynamic data (allocated after startup)

By laying out the code and data this way, I could slide the data over the initialization code after the initialization was done.  It wasn't much of a real savings, however, since the original redirector simply started the dynamic data at the start of the initialization code (and left it uninitialized).
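A minimal sketch of the slide in C, assuming a flat byte buffer stands in for the loaded image. The `image_t` type and `discard_init_code` function are hypothetical illustrations, not the redirector's actual code, and the trailing dynamic data is left out for brevity. The important detail is `memmove`: whenever the data is larger than the initialization code, the data's new location overlaps its old one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical image layout, in the order described above:
   [code][initialization code][data]                          */
typedef struct {
    unsigned char *base;  /* start of the loaded image               */
    size_t code_len;      /* resident code                           */
    size_t init_len;      /* initialization code, discardable later  */
    size_t data_len;      /* data that must stay resident            */
} image_t;

/* Once initialization is finished, slide the data down over the
   initialization code and return the new resident size. memmove is used
   because source and destination overlap when data_len > init_len. */
size_t discard_init_code(image_t *img)
{
    unsigned char *init = img->base + img->code_len;
    unsigned char *data = init + img->init_len;

    memmove(init, data, img->data_len);
    img->init_len = 0;
    return img->code_len + img->data_len;
}
```

Because the data moves, anything that refers to it needs a relocatable base (in real mode, a new data segment value), which is why separating the data segment from the code had to come first.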

The next thing to do was to take advantage of a quirk in the 286 processor.  The 8086 could only address one megabyte of memory, and on the PC, all the memory above 640K was reserved for system ROMs. (A quick aside: DOS could (and did) take advantage of more than 640K of RAM.  DOS could address up to 1M of RAM, which is all the processor could have supported if it weren't for the system ROMs.  In particular, there were several 3rd party memory cards that allowed mapping memory between 640K and 0xB0000, which was the start of video memory.)

With the 286 processor, the machine could finally address more than 1M of RAM.  It turns out that if the machine had more than 640K of RAM, most systems mapped the memory above 640K to above 1M.  Unfortunately, a number of applications depended on the fact that the 8086 could only address 1M of RAM, and performed address arithmetic that assumed that physical address 0xFFFF0+0x30 wrapped around to 0x00020.  To control this, the PC/AT and its successors defined a software-controllable gate on address line 20, called the "A20 line": if it was disabled, memory accesses between 1M and 1M+64K wrapped around to 0; if it was enabled, they went to real memory.  This is really complicated, but the effect was that if you enabled the A20 line, an application could have access to 64K of additional memory that didn't impact any running MS-DOS applications!  This 64K was known as the "High Memory Area", or HMA.
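The wrap-around becomes concrete with real-mode address arithmetic: a segment:offset pair maps to segment × 16 + offset, so the 0xFFFF0 + 0x30 case is the pair FFFF:0030. This small C sketch (function names are illustrative) models the A20 gate as forcing address bit 20 to zero. It also shows why the HMA is really 16 bytes short of 64K: the highest reachable address, FFFF:FFFF, is 0x10FFEF.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode linear address: segment * 16 + offset. The result can reach
   0xFFFF0 + 0xFFFF = 0x10FFEF, which needs 21 address bits. */
uint32_t linear_address(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Address that actually reaches memory. With the A20 gate disabled,
   bit 20 is forced to 0, reproducing the 8086's wrap at 1M. */
uint32_t physical_address(uint16_t seg, uint16_t off, int a20_enabled)
{
    uint32_t lin = linear_address(seg, off);
    return a20_enabled ? lin : (lin & ~(UINT32_C(1) << 20));
}
```

With A20 disabled, `physical_address(0xFFFF, 0x0030, 0)` yields 0x00020, the value the old applications relied on; with it enabled, the same pair reaches 0x100020 in the HMA.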

Because the powers that be knew that this would be a highly contentious piece of real estate (everyone would want to party on it), Microsoft (or Intel, or IBM, I'm not sure who) wrote a specification and a driver called HIMEM.SYS.  HIMEM.SYS's purpose was to arbitrate access to that 64K chunk of RAM.

Well, for the DOS Lanman redirector, we wanted to use that area, so if we were able to reserve the region via himem.sys, we moved the data (both dynamic and static) up to that memory.  On every entry to the redirector, we enabled the A20 line (via himem.sys), and on every exit, we disabled the A20 line.
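The enable-on-entry, disable-on-exit pattern can be sketched in C as a toy model. Assumptions: a byte array plays the part of physical memory, and `enable_a20`/`disable_a20` are hypothetical stand-ins for the requests the redirector made through HIMEM.SYS (the real driver is called through its own entry point). Disabling the gate again on exit restores the 1M wrap that the older applications described above depended on.

```c
#include <assert.h>
#include <stdint.h>

/* Toy machine: 1M of conventional memory plus the 65,520-byte HMA. */
#define HMA_BASE 0x100000u
#define MEM_SIZE (HMA_BASE + 0xFFF0u)

static uint8_t memory[MEM_SIZE];
static int a20_enabled;

/* Hypothetical stand-ins for the HIMEM.SYS A20 requests. */
static void enable_a20(void)  { a20_enabled = 1; }
static void disable_a20(void) { a20_enabled = 0; }

/* Resolve a linear address; with A20 off, it wraps at 1M. */
static uint8_t *phys(uint32_t linear)
{
    if (!a20_enabled)
        linear &= 0xFFFFFu;
    return &memory[linear];
}

/* Hypothetical redirector entry point whose state lives in the HMA,
   so it is only reachable between enable_a20() and disable_a20(). */
uint8_t redir_entry(uint8_t request)
{
    enable_a20();
    uint8_t *state = phys(HMA_BASE);
    *state += request;
    uint8_t result = *state;
    disable_a20();  /* give other DOS programs their wrap back */
    return result;
}
```

Note that outside the bracket, the same HMA address resolves to physical byte 0, which is exactly why the data could only be touched while the gate was open.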

That saved about 30K of the 60K MS-DOS footprint.  So far, so good.

The next step in the process was to remove our dependencies on himem.sys.  Around this time, Lotus, Intel and Microsoft had defined a specification for an expanded memory manager, known as LIM.  This allowed a 3rd party memory card to bank swap memory into the 0xA0000->0xFFFFF memory region.  Marlin Eller joined the team about that time, and he wrote the code to move the data segment for the DOS redirector into LIM (if himem.sys wasn't available, and LIM was).  After finishing that work, he moved on to other projects within Microsoft.  That's where things stood for Lan Manager 1.5: the data had been moved out of low memory, but nothing else had.  A HUGE improvement, but we weren't satisfied.
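Bank swapping means the program sees a fixed window in upper memory while the card decides which page of its own memory shows up there. The C model below is a sketch under assumptions: the eight-page card and the `map_page`/`frame_byte` names are hypothetical, and a real program would ask the expanded memory manager (via INT 67h) to do the mapping rather than assign a pointer. The 16K page size and the four-page, 64K frame follow the usual LIM EMS 3.x arrangement.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE     16384u  /* LIM EMS pages are 16K                  */
#define FRAME_PAGES   4       /* 64K page frame = four physical pages   */
#define LOGICAL_PAGES 8       /* hypothetical amount of memory on board */

static uint8_t card[LOGICAL_PAGES][PAGE_SIZE]; /* memory on the EMS card        */
static uint8_t *frame[FRAME_PAGES];            /* page shown in each frame slot */

/* Bank-switch: make logical page `logical` of the card visible in
   physical page `physical` of the frame. Returns 0, or -1 if out of range. */
int map_page(int physical, int logical)
{
    if (physical < 0 || physical >= FRAME_PAGES ||
        logical < 0 || logical >= LOGICAL_PAGES)
        return -1;
    frame[physical] = card[logical];
    return 0;
}

/* Address a byte through the frame. `offset` must be below 64K and the
   physical page containing it must already be mapped. */
uint8_t *frame_byte(uint32_t offset)
{
    return &frame[offset / PAGE_SIZE][offset % PAGE_SIZE];
}
```

Writing through the same frame address lands in whichever card page is currently mapped, so the redirector's data stayed put on the card while only the window moved.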

So far, we had just been moving the data around; we hadn't done anything to deal with the 30K or so of code.

The next thing we did was to split the redirector up still further:

"Low" code
"Low" data

Code

Initialization Code

Data

Initialization Data/Dynamic data (allocated after startup)

We added a low code and data segment.  The "low" code segment contained all the external hooks into the redirector (interrupt handlers, etc), and code to enable the HMA and LIM segments.  We then moved the data into LIM memory, and the code into the HMA.  This was a bit trickier, but we managed.

So we now had a low code segment that was about 2K or so, and the code and data had been moved above the 640K boundary.  Normally, I'd be satisfied with this, but I love a challenge.

The next step was to look long and hard at the low code.  It turns out that most of the low code didn't really NEED to be low; it was just convenient.  Since the code had been moved into the HMA, all I needed was a low-memory stub with just enough code to enable the HMA and dispatch to the corresponding function in high memory.
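Conceptually, the low-memory stub reduces to a table of pointers plus a thunk that brackets each call with the HMA enable and disable. The sketch below models that in C under obvious assumptions: `hma_enabled` stands in for the A20 state, and the two handlers are hypothetical placeholders for the redirector's real entry points, which lived in the HMA.

```c
#include <assert.h>

/* Stand-in for the A20 state that the stub toggles. */
static int hma_enabled;

/* Hypothetical "high" routines. In the redirector these lived in the HMA,
   so they could only run while the HMA was enabled. */
static int high_entry0(int ax) { assert(hma_enabled); return ax + 1; }
static int high_entry1(int ax) { assert(hma_enabled); return ax * 2; }

/* Pointers to the high routines, kept in low memory with the stub. */
typedef int (*handler_t)(int);
static handler_t const high_table[] = { high_entry0, high_entry1 };

/* The whole resident low-memory part: enable the HMA, dispatch to the
   matching high routine, disable the HMA, and return its result. */
int low_stub(int entry, int ax)
{
    hma_enabled = 1;
    int result = high_table[entry](ax);
    hma_enabled = 0;
    return result;
}
```

In the real stub each entry point was an interrupt handler rather than a C function, but the shape (enable, indirect call through a stored pointer, disable) is the same.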

The other thing I realized was that the MS-DOS PSP (Program Segment Prefix, the equivalent of a task in MS-DOS) contained 128 bytes of OS stuff, and 128 bytes of command line (this is where Raymond starts cringing).  Since the redirector didn't use the command line, I figured I could reuse those 128 bytes of memory for my stub to enable the high memory area.  And that's what I did: I used the 128ish bytes of command line to hold the interrupt dispatch routines for all the entry points to the redirector (there were about 4 of them), the pointers to the corresponding routines in the high memory area, and the code to enable (and disable) the HMA.
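The 128/128 split of the PSP can be written down as a struct. This is a simplified sketch: it lumps all of DOS's own fields into one 128-byte array, while keeping the standard placement of the command-tail length byte at offset 0x80 with the text after it. `install_stub` is a hypothetical helper that, as an assumption, treats the whole second half (length byte included) as free once the program no longer needs its command line; the redirector's actual stub bytes are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified 256-byte PSP: 128 bytes of DOS bookkeeping, then the
   command tail (length byte at 0x80, text from 0x81 onward). */
typedef struct {
    uint8_t dos_area[0x80];
    uint8_t tail_len;
    uint8_t tail[0x7F];
} psp_t;

static_assert(sizeof(psp_t) == 256, "a PSP is 256 bytes");
static_assert(offsetof(psp_t, tail_len) == 0x80, "command tail starts at 0x80");

#define TAIL_AREA (sizeof(psp_t) - offsetof(psp_t, tail_len))  /* 128 bytes */

/* Copy resident stub bytes over the command-tail area.
   Returns 0 on success, or -1 if the stub would spill past the PSP. */
int install_stub(psp_t *psp, const uint8_t *stub, size_t len)
{
    if (len > TAIL_AREA)
        return -1;
    memcpy((uint8_t *)psp + offsetof(psp_t, tail_len), stub, len);
    return 0;
}
```

Anything that later reads this PSP expecting a command line gets the stub bytes instead, which is the process-enumeration side effect mentioned below.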

And voilà, I had a 0 footprint redirector.  The only downside was that applications that enumerated the "running" processes didn't handle the "code-in-the-command-line" thing.

Btw, the work I did here was pretty much totally clean.  I used the linker to define the segments that were relocated, I didn't do any of the other sleazy things that MS-DOS programmers did to make their code small (like combining multiple instructions together relying on the relative offset of a jump instruction to form the first byte of a different instruction).  It was actually a pretty cool piece of work.

Oh, and this description doesn't really give the full flavor of what had to be done to get this to work.  A simple example: because the data had to be moved over the very code that was performing the move, I needed to first move the initialization code out of the way (past the end of the data), jump to the moved initialization code, move the data over the original initialization code, and then terminate the application.
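The ordering problem can be checked with a toy model. In the C sketch below (all names hypothetical, and the "jump" simulated by changing which region is treated as running), a move is refused if its destination would overwrite the code that is currently executing. Sliding the data straight down over the running initialization code fails that check; copying the initialization code past the data first, then running from the copy, lets the data slide down safely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A byte range inside the toy memory buffer. */
typedef struct { size_t start, len; } region_t;

static int overlaps(region_t a, region_t b)
{
    return a.start < b.start + b.len && b.start < a.start + a.len;
}

/* Move `src` so it starts at `dst_start`, unless that would overwrite the
   region currently executing. On refusal, clears *ok and returns `src`. */
static region_t move_region(unsigned char *mem, region_t src, size_t dst_start,
                            region_t running, int *ok)
{
    region_t dst = { dst_start, src.len };
    if (overlaps(dst, running)) {
        *ok = 0;
        return src;
    }
    memmove(mem + dst.start, mem + src.start, src.len);
    return dst;
}

/* Layout before the shuffle: [init code][data][free space >= init.len].
   Returns 1 if the data ends up where the init code used to be. */
int shuffle(unsigned char *mem, region_t init, region_t data)
{
    int ok = 1;
    /* 1. Copy the (running) init code past the end of the data. */
    region_t copy = move_region(mem, init, data.start + data.len, init, &ok);
    /* 2. "Jump" to the copy: from here on, `copy` is the running region. */
    /* 3. Slide the data down over the original init code. */
    data = move_region(mem, data, init.start, copy, &ok);
    /* 4. The real program would now terminate and stay resident. */
    return ok && data.start == init.start;
}
```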

But we eventually (for Lan Manager 2.2) had a 0 footprint redirector.  It took some time, and it didn't work for every configuration, but we DID make it work.

 

  • I just read your earlier post that you pointed to, and I had a question: does Bill *really* curse like that? I've only ever seen him on TV or whatever, but he seems so mild-mannered to me :)

    Great story, by the way. Back when code was code, and groovy was groovy!
  • Back in 1986, he cursed like that. He doesn't any more (raising kids will do that to you).

    It was fun writing this, actually - dredging up all the things I had to do to make this puppy work was a fascinating experience.
  • I think you could actually move it farther into the OS area if you really needed to. The end of the OS structures in the PSP was the default FCBs, which would no longer have been used by that time. If you really want to live dangerously, before the FCBs are some reserved areas, an unneeded interrupt call, and the environment segment (not using any environment variables, right?). I think you could move data all the way back to where DOS stored the interrupt vectors, which was only about 20 bytes in.
  • Nicholas, you're right, but I figured I'd gone about as far as I could go - I could have overwritten the handle table too, but it really wasn't worth the effort. 256 bytes should be good enough for anyone.
  • > separate the code and data segments. [...]
    > in 1986, it was a relatively radical idea,
    > especially for real-mode software.

    It wasn't radical in embedded systems. Often the data size remained under 64KB but the total code size exceeded 64KB, which is why Intel defined the compact model the way they did.

    > The 8086 could only address one megabyte of
    > memory,

    Yes. In 2002 I had occasion to read part of the Intel 8086 processor manual, where Intel said that with 1MB of address space it was unlikely that anyone would ever have a problem with address space limitations. I laughed just as much as when reading it for the first time in 1980.

    > all the memory above 640K was reserved for
    > system ROMs.

    That was not a limitation of the 8086. Maybe IBM set it that way when they entered the PC market, or maybe other PC makers had already settled on it, but the 8086 was used in lots of other equipment besides PCs and it did not have that limitation.
  • Norman,
    In 1984, a person who is currently considered a visionary in the industry (no, he doesn't work for Microsoft) said that no properly designed piece of software would ever require more than 128K (that's kilobytes) of RAM.

    The 0xA0000 line was a restriction that the IBM hardware engineers came up with. It wasn't an MS-DOS or an 8086 limitation.

    The world was different back then.
  • Larry,

    cool.

    WM_THX
    thomas woelfer
  • It's incredible how much all this has changed...

    I started my career as a developer just a few years ago, Windows UI stuff, mainly.
    When we (my former software company and I) moved to .NET, I believe the most important argument against this decision was performance.
    Now I am quite a good .NET developer; I've got my MS certifications and keep learning about this new technology, trying to get the best from a "managed" way of writing applications.

    My father (he was a developer too, but a few decades ago) told me that one of his greatest satisfactions had been to make a program fit into just a few HUNDRED bytes of memory.
    Now (yes, just now, a few minutes before typing this comment) I'm fighting to reduce my middle-tier memory footprint below 30 MILLION bytes!

    > The world was different back then

    I could not agree more !
  • Once upon a time, I too wrote a DOS redirector for a PC network (we also did the server, stack and other bits, also the hardware. But mostly I did the redirector. And server. And stack). This was back in the days of DOSs 4 and 5, when windows were things you looked through as you tried to puzzle out where your stack was going.

    We too swore mightily, but mostly at MS, sometimes at Intel. Oh, the joys we had at untangling the re-entrancy rules. The undocumented flags (InDOS. "You don't need to know about that" said MS, primly. "But we can't make our software work without it." "Then you shouldn't be writing that software."). The strange case of the utilities shipped with DOS that didn't use DOS - I never truly understood what was going on in GWBASIC (or was it QBASIC?)'s custom keyboard handler, nor why it was there. FCBs. Wildcard expansions. Mystery API calls. Determining video modes from write-only registers and incomplete BIOS support.

    Code footprints were the least of our worries, although since some of us (OK, me) had grown up shoehorning stuff into 1K Z80 machines, the gentle art of byteshaving and T-state saving came naturally. Good programming practice, less so. First we had to disassemble DOS, then we disassembled SideKick, then we tried to fit everything back together again - only then could we start to introduce our *own* bugs.

    Then we decided to do a 386 version. Oh, and Windows. But that, as they say, is a story for another day...

    R
  • Actually I believe that InDOS was eventually documented... But the others weren't.

  • 11/8/2004 6:00 PM Larry Osterman

    > In 1984, a person who is currently
    > considered a visionary in the industry (no,
    > he doesn't work for Microsoft) said that no
    > properly designed piece of software would
    > ever require more than 128K (that's
    > kilobytes) of RAM.

    I thought that before 1984 I was already reading predictions of databases containing more than 1GB of data. Surely there was some anticipation of doing nontrivial operations on those data.

    Also before 1984 I designed and partly coded an 8086-based system which served 50 terminals, with about 2KB of data being displayed on each terminal and 4KB being read into buffers in anticipation of being displayed. Oops, more than 128KB.

    But anyway, sure, as I mentioned, Intel designed the compact model the way they did because most embedded systems had requirements opposite to the above, usually the amount of RAM really was under 64KB but the amount of code was larger.

    > The world was different back then.

    And the world back then was already different from the world back then...

    11/9/2004 8:35 AM Rupert

    > [...] "Then you shouldn't be writing that
    > software."

    Did they back that up with offers for refunds on DOS? Intel's RMX-86 already had multitasking for years, and it wasn't even that commie red free source movement (though it was open source if you bought it).
  • Your comment:

    --------------------
    Unfortunately, there were a number of applications that depended on the fact that the 8086 could only address 1M of RAM, and performed arithmetic that assumed that physical address 0xFFFF0+0x30=0x000020.
    --------------------

    has what I think is some interesting (in a twisted sort of way) background.

    The main reason for this problem being a somewhat common problem is that MS-DOS maintained some semblance of CP/M compatibility. CP/M system calls were made by jumping to a particular low-memory address (0x005), and MS-DOS officially supported this method of making system calls.

    MS-DOS supported this by placing a far jump at that offset in the process PSP. When the process performed a near jump to the 'magic' CP/M address, the far jump coded there would vector to MS-DOS.

    So, the far call instruction had to be at offset 5 of the PSP, but offset 6 of the PSP contained some other data - something to do with memory size information (another holdover from CP/M, but I forget exactly what). This forced part of the far call address to a particular value, so MS-DOS had to rely on segment wrap to be able to use something in the other part of the far address which would still evaluate to the memory location that was the real target of the call.

    Anyway, this falls into the class of very ugly hacks that Microsoft (actually, this may have been done by Tim Paterson for QDOS - the precursor to MS-DOS) had to do in the name of backward compatibility.

    Maybe Raymond should schedule a blog entry for this topic... then again, maybe it should be forgotten.
  • Mike,
    That's not totally relevant in this case. The thing about the "CALL 5" programming convention is that it only worked for COM files - and COM files were limited to 64K in size (this isn't really true, but it suffices for the purposes of this discussion). There was no way for a program that used CALL 5 as its system call mechanism to access more than 64K of RAM. So DOS did rely on segment wrap, but not on the 1M physical memory wrap.

    The 1M physical memory wrap issue came about because applications thought they could access FFFF:10 and access physical byte 0 of RAM.
  • Larry,

    I understand that the CALL 5 interface was for use by COM style programs which were generally limited to 64k of code, but the mechanism used by DOS to implement it depended on A20 wrap. As you say, it wasn't a mechanism for the process to access additional memory.

    Pulling out some ancient notes and poking about in a DOS VM I have lying around for some reason, I found that the far jump at PSP:5 looked like:

    -u ES:5 ; ES points to the PSP

    1165:0005 9AEEFE1DF0 CALL F01D:FEEE ; the FEEE is at offset 6 of the PSP, which
    ; had some meaning to CP/M programs - something to
    ; do with the amount of memory in the
    ; 'Transient Program Area (TPA)'

    It turns out that F01D:FEEE in the himem area contains a jump to 116:10D0, just as the code at 0:00BE (which is F01D:FEEE wrapped at 1 meg) contains a jump to 116:10D0, so the CALL 5 will work even when A20 is enabled; however, it doesn't do this if DOS is not loaded into the HIMEM area, so if DOS=LOW and A20 happens to be enabled, CALL 5 will crash.

    Anyway, this is just a trivial bit of history which for some reason stuck in my head (no doubt because I thought it was quite clever when I learned about it). Another bit of trivia regarding CALL 5 is that apparently it has had a bug since DOS 2.0 that caused it to jump to a location 2 bytes shy of where it really should have jumped, rendering it generally useless regardless of the 1 meg wrap situation.
