Larry Osterman's WebLog

Confessions of an Old Fogey

What's the big deal with the Moore's law post?

In yesterday's article, Jeff made the following comment:

I don't quite get the argument. If my applications can't run on current hardware, I'm dead in the water. I can't wait for the next CPU.

The thing is, that's the way people have worked for the past 20 years.  A little story goes a long way toward describing how the mentality works.

During the NT 3.1 ship party, a bunch of us were standing around Dave Cutler while he was expounding on something (aside: Have you ever noticed this phenomenon, where everybody at a party clusters around the bigwig?  Sycophancy at its finest).  The topic at hand at the time (1993) was Windows NT's memory footprint.

When we shipped Windows NT, the minimum memory requirement for the system was 8M, the recommended was 12M, and it really shone at somewhere between 16M and 32M of memory.

The thing was that Windows 3.1 and OS/2 2.0 both were targeted at machines with between 2M and 4M of RAM.  We were discussing why NT4 was so big.

Cutler's response was something like "It doesn't matter that NT uses 16M of RAM - computer manufacturers will simply start selling more RAM, which will put pressure on the chip manufacturers to drive their RAM prices down, which will make this all moot".  And the thing is, he was right - within 18 months of NT 3.1's shipping, memory prices had dropped to the point where it was quite reasonable for machines to come out with 32M and more RAM.  Of course, the fact that we put NT on a severe diet for NT 3.5 didn't hurt (NT 3.5 was almost entirely about performance enhancements).

It hasn't been uncommon for application vendors to ship applications that only ran well on cutting-edge machines, on the assumption that most of their target customers would upgrade their machines within the lifetime of the application - 3 to 6 months for games (games are special: gaming customers tend to have bleeding-edge machines, since games have always pushed the envelope), 1 to 2 years for productivity applications, and 3 to 5 years for server applications - and thus it wouldn't matter if their app was slow on current machines.

It's a bad tactic, IMHO - an application should run well on both the current generation and the previous generation of computers (and so should an OS, btw).  I previously mentioned one tactic that was used (quite effectively) to ensure this - for the development of Windows 3.0, the development team was required to use 386/20's, even though most of the company was using 486s.

But the point of Herb's article is that this tactic is no longer feasible.  From now on, single-threaded CPU performance won't continue to improve exponentially.  Instead, CPUs will improve in power by getting more and more parallel (and by having more and more cache, etc.).  Hyper-threading will continue to improve, and while the OS will be able to take advantage of this, applications won't unless they're modified.
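
To make this concrete, here's a minimal sketch of the kind of change that's required (plain C with pthreads, purely for illustration - parallel_sum and sum_chunk are my names, not anybody's shipping code).  The serial version of this computation is a single loop, and a single loop gets no benefit from extra cores; carve the work into independent chunks, and it does:

    #include <pthread.h>
    #include <stddef.h>

    #define NTHREADS 4   /* assumption: sized to the machine's core count */

    typedef struct {
        const int *data;   /* this thread's slice of the input */
        size_t     count;  /* number of elements in the slice */
        long long  sum;    /* per-thread partial result */
    } chunk_t;

    /* Each thread sums its own slice - no shared mutable state, no locks. */
    static void *sum_chunk(void *arg)
    {
        chunk_t *c = arg;
        c->sum = 0;
        for (size_t i = 0; i < c->count; i++)
            c->sum += c->data[i];
        return NULL;
    }

    long long parallel_sum(const int *data, size_t n)
    {
        pthread_t threads[NTHREADS];
        chunk_t   chunks[NTHREADS];
        size_t    per = n / NTHREADS;
        long long total = 0;

        for (int t = 0; t < NTHREADS; t++) {
            chunks[t].data  = data + t * per;
            chunks[t].count = (t == NTHREADS - 1) ? n - t * per : per;
            pthread_create(&threads[t], NULL, sum_chunk, &chunks[t]);
        }
        for (int t = 0; t < NTHREADS; t++) {
            pthread_join(threads[t], NULL);
            total += chunks[t].sum;   /* combine after join - no races */
        }
        return total;
    }

Of course, summing an array is embarrassingly parallel; the hard part is that most real application work isn't carved up so easily.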

Interestingly (and quite coincidentally) enough, it's possible that this performance wall will affect *nix applications more than it will affect Windows applications (and it will especially affect *nix derivatives that don't have a preemptive kernel and fully asynchronous I/O, as current versions of Linux do).  Since threading has been built into Windows from day one, most of the high concurrency application space is already multithreaded.  I'm not sure that that's the case for *nix server applications - for example, applications like the UW IMAP daemon (and other daemons that run under inetd) may have quite a bit of difficulty being ported to a multithreaded environment, since they were designed to be single threaded (other IMAP daemons (like Cyrus) don't have this limitation, btw).  Please note that platforms like Apache don't have this restriction, since (as far as I know) Apache fully supports threads.
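
For readers who haven't looked at inetd: the structural difference between the two models shows up right at the accept loop.  Here's a rough, hypothetical sketch of a thread-per-connection server (no real daemon is this simple; handle_client and serve are made-up names).  An inetd-style daemon has no such loop at all - inetd does the accept and hands each connection to a freshly spawned single-threaded process, with the socket as its stdin/stdout:

    #include <pthread.h>
    #include <stdint.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Per-connection worker - this is where the protocol (IMAP, POP3,
       whatever) would actually be spoken. */
    static void *handle_client(void *arg)
    {
        int fd = (int)(intptr_t)arg;
        /* ... read commands, send responses ... */
        close(fd);
        return NULL;
    }

    void serve(int listen_fd)
    {
        for (;;) {
            int fd = accept(listen_fd, NULL, NULL);
            if (fd < 0)
                continue;
            /* One thread per connection.  An inetd-style daemon never
               sees this loop; inetd spawns a fresh single-threaded
               process per connection instead. */
            pthread_t t;
            if (pthread_create(&t, NULL, handle_client,
                               (void *)(intptr_t)fd) == 0)
                pthread_detach(t);
            else
                close(fd);
        }
    }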

This posting is provided "AS IS" with no warranties, and confers no rights.

  • I'm not sure what you mean about it affecting Unix apps more. Unix processes can run on a hardware thread or an extra core just as easily as they run on an SMP machine today. If you mean that it will be harder to port Unix apps to take advantage of a *lot* of cores/threads, then yes, maybe. But it is quite possible that when a lot of cores per chip is the norm, compilers will be able to do the work (e.g. speculative execution, parallelization, yada yada).

    Multithreaded programming is really hard as it is. Mind-numbingly, hair-tearingly hard. I hope we get better language and compiler support for it. C# and Java are only a few steps along a very long path.

    Windows has tons of single threaded baggage itself for GUI apps. Not very easy to take advantage of threading in the GUI except at a very coarse level.
  • I think the disagreement is between ISVs producing shrink-wrap software, and ISVs/VARs selling a solution that has specific performance goals to hit now. In the shrink-wrap ISV case, you can get away with barely adequate performance on lower-end current configurations and good perf on new ones.

    In the contract software market, if you have performance goals to meet, the customer cares that you hit them - today.
  • Good point Mike.
  • I guess this is bad news for Longhorn. It seemed designed to be something that, much like OS X, would be too heavyweight for the commodity hardware of the day and would gradually speed up over successive generations.

    As Mr. Box put it:
    http://pluralsight.com/blogs/dbox/archive/2004/12/17/3956.aspx
    The startup time of our new OS must be fixed
    As it seems as if running a 286
  • rk: The whole point here is that single threaded performance isn't going to improve.

    That means that single threaded applications aren't going to get any faster, even if the throughput on the CPU increases. So your IMAP SEARCH verb (typically a CPU intensive operation that involves searching large flat files) isn't likely to get any faster.

    In the past, that wouldn't be the case - as CPUs got faster, the SEARCH would get faster. That will no longer be the case without significant work to rewrite the code.

    On traditional *nix systems it was OK for you to have single threaded daemon processes - since the traditional *nix kernel was non-preemptive, and traditional file I/O was synchronous, it really didn't matter. And as CPUs improved in speed, the performance of the applications improved at the same rate (this is true of Win32 applications as well).

    The thing is that the Windows SERVER applications were already written to be asynchronous, since asynchronous I/O and threading were built into the OS from day one. And server applications took advantage of it.

    It didn't hurt that Windows process creation overhead is significantly greater than Windows thread overhead, while *nix's per-process overhead is significantly less. As a result, Windows servers were almost always written from scratch to take advantage of threads and async I/O.

    *nix applications didn't have these abilities available in the OS, so they weren't written to take advantage of them (this, by the way, is also why naive ports of *nix applications to Windows almost always suck).

    So in the server/daemon space, Windows apps have a leg up over traditional *nix ones.

    Remember: I'm not talking about Apache CGI scripts, or Java Beans, etc. I'm talking about daemons like IMAPD, QPOPPER, FTPD, etc.
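
    To make the SEARCH example concrete, here's a minimal sketch of what that rewrite might look like (plain C with pthreads; parallel_search is a made-up name, memmem is a GNU/BSD extension, and a real IMAP server would be considerably more careful). The trick worth noting: each region overlaps the next by the needle length minus one, so a match straddling a chunk boundary isn't missed:

        #define _GNU_SOURCE        /* memmem is a GNU/BSD extension */
        #include <pthread.h>
        #include <stddef.h>
        #include <string.h>

        #define NTHREADS 4

        typedef struct {
            const char *base;      /* start of this thread's region */
            size_t      len;       /* region length, including overlap */
            const char *needle;
            int         found;
        } region_t;

        static void *scan(void *arg)
        {
            region_t *r = arg;
            r->found = memmem(r->base, r->len, r->needle,
                              strlen(r->needle)) != NULL;
            return NULL;
        }

        /* Returns nonzero if needle occurs anywhere in text[0..n). */
        int parallel_search(const char *text, size_t n, const char *needle)
        {
            pthread_t t[NTHREADS];
            region_t  r[NTHREADS];
            size_t    per = n / NTHREADS, nl = strlen(needle);
            int       found = 0;

            for (int i = 0; i < NTHREADS; i++) {
                size_t start = i * per;
                /* Overlap each region by nl - 1 bytes so a match that
                   straddles a chunk boundary isn't missed. */
                size_t len = (i == NTHREADS - 1) ? n - start : per + nl - 1;
                if (start + len > n)
                    len = n - start;
                r[i] = (region_t){ text + start, len, needle, 0 };
                pthread_create(&t[i], NULL, scan, &r[i]);
            }
            for (int i = 0; i < NTHREADS; i++) {
                pthread_join(t[i], NULL);
                found |= r[i].found;
            }
            return found;
        }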
  • Edward,
    The Longhorn you knew isn't the Longhorn that is. In addition, the alpha LH tree (what was distributed last year) had absolutely NO perf tuning done on it - it was an utterly bloated pig of an OS, and everyone working on it knew that. The cycle of development on the alpha LH was "write code now, tune later". We've changed that model as part of the Longhorn restart work, so everything about that build no longer applies.

    Also, it's pretty clear from the post that Don's not talking about the current LH builds - they're really quite zippy.
  • What about non-blocking servers such as the Zeus Web Server (http://www.zeus.com/products/zws/) that achieve parallelization by using multiple processes? You don't always need multiple threads. Also, inetd-style applications have always worked well with multiple CPUs, since each connection is handled by a separate process.
  • Yes, it's certainly true that a single IMAP search operation won't get any quicker - but things like imapd and ftpd are all multi-processing - there's a separate process for each client they're serving. In many respects this is already a head start on many Windows-based server applications that are single-process-based. UNIX-style systems have typically thrived on multiprocessor platforms for precisely this reason. It's not until a single "atomic" operation is too slow for comfort that such things will start to take advantage of multiple threads.

    Most of the time, naive ports of UNIX applications to Windows suck because they're written with the assumption that processes are cheap; Windows threads are a lot closer to UNIX processes than Windows processes are. UNIX applications fork() and fire off another process when they need to multiprocess an operation; Windows applications create another thread. Fundamentally, they're doing the same thing - and from a scheduling point of view (especially over multiple CPUs) they do it pretty similarly. You're effectively left with a situation where you have to use different tools on different systems to achieve almost the same result.

    This is the big reason why you don't often see heavily-threaded stuff on UNIX - there just isn't the same need as there is on Windows. The old mantra was always "don't use threads unless you know what you're doing, and know that you need to". On UNIX it actually makes sense for this to continue to apply; it doesn't preclude breaking down intensive operations into chunks which can be parallelized as necessary, though - and that's something I'd like to see more of.

    Don't get me wrong - none of this should be construed as an argument - I'd love to see an imapd that was multiprocessing in the traditional sense, but multithreaded for heavy operations like searches. On UNIX-style systems, this is the ideal - all the existing benefits of separate processes (which I won't go into here) are retained, whilst the added bonus of parallelism for individual operations can be enjoyed.
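
    To put the parallel in code - a minimal, hypothetical sketch of the two idioms side by side (error handling and the actual work elided; handle_request and handle_request_thread are made-up names):

        /* UNIX idiom (needs <unistd.h>): one process per unit of work. */
        pid_t pid = fork();
        if (pid == 0) {
            handle_request(req);    /* hypothetical worker function */
            _exit(0);               /* child exits when the work is done */
        }

        /* Windows idiom (needs <windows.h>): one thread per unit of work.
           handle_request_thread is a hypothetical DWORD WINAPI wrapper
           around the same work. */
        HANDLE h = CreateThread(NULL, 0, handle_request_thread, req, 0, NULL);
        if (h != NULL)
            CloseHandle(h);   /* fire and forget; the thread runs to completion */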
  • David,
    The thing is that these single threaded processes are going to hit a hard wall in performance. The other big issue that single threaded inetd-style apps are likely to hit is unexpected serialization (like serialization in the filesystem or network driver) - without an asynchronous processing model, their ability to improve their individual instance performance will be limited.
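
    For anyone who hasn't seen the asynchronous model, here's a minimal Win32 overlapped I/O sketch (illustrative only - read_without_blocking is a made-up name, and a production server would use an I/O completion port rather than an event per request):

        #include <windows.h>

        /* hFile must have been opened with FILE_FLAG_OVERLAPPED. */
        void read_without_blocking(HANDLE hFile)
        {
            OVERLAPPED ov = {0};
            ov.hEvent = CreateEvent(NULL, TRUE, FALSE, NULL);
            char buf[4096];

            /* Issue the read and keep going; the kernel completes it in
               the background instead of serializing us behind the disk. */
            if (!ReadFile(hFile, buf, sizeof(buf), NULL, &ov) &&
                GetLastError() != ERROR_IO_PENDING) {
                CloseHandle(ov.hEvent);
                return;   /* real failure */
            }

            /* ... service other clients while the read is in flight ... */

            DWORD bytesRead;
            GetOverlappedResult(hFile, &ov, &bytesRead, TRUE);  /* wait */
            CloseHandle(ov.hEvent);
        }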
  • Btw, Mo, that's exactly my point - the old mantra is about to hit a wall (at least in performance).

    The overall throughput of these systems is likely to continue to improve (albeit not as fast, since a CPU with 2 (or 4 or 16 or 32) cores isn't going to be as fast as a machine with 2 (or 4 or 16 or 32) CPUs (due to shared caches etc)). But the performance of individual instances won't. Which means that my IMAP client isn't going to benefit from rolling out a newer, more powerful machine.

    My server might be able to service more clients, but the clients aren't going to see any benefit.
  • Set the Wayback Machine for 1995...
    I distinctly remember that the price of RAM plummeted (like, from $500 to $200 for the same quantity of memory) shortly after Win 95 was released and people started buying RAM to be able to run it at a usable speed. Until that point, it wasn't unheard of to spend $1000 on RAM that today you wouldn't even bother giving away (like 64 megs).
  • I started my programming career on Alpha VMS systems where everything was run as a separate process, and synchronisation was done by kernel objects and messages. They were nice and clean, and the other processes only knew what they needed to know in order to do their job right. As such, I've always seen multi-threaded apps as a bit of a hack made necessary by the high process startup cost on Windows. OK, fine, you get your own stack, but you have all these lock and synchronisation issues caused by sharing the heap that simply do not occur if you write discrete processes to do discrete tasks. If you needed to share memory between processes you would explicitly create a named shared memory area and put in place a series of locks on different parts of it to keep it safe. If you needed to process multiple requests you'd simply fork a new process to handle each one, because the startup cost was so low and the benefit of simplicity was so great. Of course there were costs too. IPC mostly consisted of creating a data structure and passing it by value through a mailbox message, which involved a lot of copying of data around, but it was safe and reliable so long as the process reading the mailbox didn't get stuck waiting for I/O to complete or something.
  • I remember that when Microsoft Bob was released -- 1995, I believe -- it required 8 MB of RAM at a time when the bulk of the consumer installed base had only 4 MB. This wasn't realized (or at least acknowledged) until late in the product cycle, so when the product boxes left manufacturing, they had to add a "requires 8 MB" sticker to every box.
  • Hi Larry,

    >> We were discussing why NT4 was so big.

    Did you discuss why NT4 would be so big as early as at the NT3.1 launch party or is this just a typo?
  • Yeah, it's a typo :)