Larry Osterman's WebLog

Confessions of an Old Fogey

Riffing on Raymond - Network performance...



I keep on doing this; clearly it's evidence of a lack of imagination on my part...

Raymond's post a while ago discussed some of the problems with network latency (no, I'm not going to touch that particular can of worms).

It's amazing how many people don't understand how big a deal this problem is.  When I joined the Exchange team back in the mid 1990s, the perf team was spending a HUGE amount of time analyzing the Exchange store RPC traces, trying to figure out ways of squeezing every single byte out of the RPC traffic.

They'd defined compressed forms of Exchange EntryIDs, and they were considering encoding Unicode strings using some neutral encoding (UTF8 hadn't been invented at that point, so they were trying to roll their own).

I came on the team and looked at what they were doing and was astounded.  They were sweating bricks trying to figure out how to squeeze out individual bytes of data from each packet.

The reality was that, for the vast majority of cases, all that work didn't actually make a difference.

The reason has to do with the basic nature of Ethernet-based networking (token ring and ATM have different characteristics, but my comments here apply to them as well; it's just that the numbers and behavior are slightly different).

In general, on a LAN it takes essentially the same time to send one byte of data as it does to send 1K of data.  Once you start sending more than 1K, the numbers start to grow (because you're sending more than one packet), but even then, the overhead of sending 10K of data isn't significantly higher than that of sending 1K.

On the other hand, round trips will KILL your performance.  So if you've got a choice between sending 100 messages with 1K in each message and 1 message with a 100K payload, you want to send the single 100K message every time.

Needless to say, I'm MASSIVELY glossing over the issues associated with sending data across a network; the above is simply a reasonable rule of thumb - the round trips are what matter, not the bytes being sent.
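
To put some numbers behind that rule of thumb, here's a minimal back-of-the-envelope sketch.  The 1 ms round-trip time and 100 Mbit/s bandwidth are illustrative assumptions, not measurements of anything Exchange actually did:

```python
# Back-of-the-envelope model: time = round trips * RTT + bytes / bandwidth.
RTT_SECONDS = 0.001                    # ~1 ms LAN round trip (assumed)
BANDWIDTH_BYTES_PER_SEC = 12_500_000   # ~100 Mbit/s Ethernet (assumed)

def transfer_time(round_trips, total_bytes):
    """Crude estimate of the wall-clock time for a request/response exchange."""
    return round_trips * RTT_SECONDS + total_bytes / BANDWIDTH_BYTES_PER_SEC

# 100 requests carrying 1K each vs. one request carrying the same 100K payload.
many_small = transfer_time(round_trips=100, total_bytes=100 * 1024)
one_big = transfer_time(round_trips=1, total_bytes=100 * 1024)

print(f"100 x 1K messages: {many_small * 1000:.1f} ms")  # ~108 ms, dominated by round trips
print(f"1 x 100K message:  {one_big * 1000:.1f} ms")     # ~9 ms, dominated by raw bandwidth
```

Shaving a few bytes off each of the 100 small messages barely moves the first number; eliminating 99 round trips changes it by an order of magnitude.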


Now, having said all that, when you're dealing with dial-up networks, the rules are completely different.  On a 9600 baud connection, it takes roughly one millisecond to send one byte, which means that every single byte counts.  In the Exchange case, since Exchange was designed for corporations with wired networks, it made sense to design the client/server protocol for the LAN environment.  But when you're designing a feature that's intended to be used over dial-up, the rules are totally different.  Among other things to consider, on a dial-up network the modems themselves do compression, so compressing the data before transmission isn't always a benefit: compressing already-compressed data tends to increase its size (assuming the compression algorithm's worth its salt).
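
For what it's worth, the "one millisecond per byte" figure falls out of the serial framing.  Assuming the usual 8-N-1 setup (one start bit, eight data bits, one stop bit), every byte costs ten bit-times on the wire:

```python
# 9600 baud with 8-N-1 framing (assumed): 1 start + 8 data + 1 stop = 10 bits per byte.
BAUD = 9600
BITS_PER_BYTE_ON_WIRE = 10

ms_per_byte = 1000 / (BAUD / BITS_PER_BYTE_ON_WIRE)
print(f"{ms_per_byte:.2f} ms per byte")                        # ~1.04 ms per byte
print(f"1K payload: {1024 * ms_per_byte / 1000:.2f} seconds")  # ~1.07 s just for 1K of data
```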

  • [Why is this under "Programmer Hubris"? Because it's about developers who find "an easy fix" and apply...
  • > When I joined the Exchange team back in the mid 1990s,
    [...]
    > (UTF8 hadn't been invented at that point

    UTF8 had been invented around 10 years before that point.

However, UTF8 was already incompatible with national character set encodings (except for one nation's encoding).  UTF16 hadn't been invented yet.  So it's actually understandable that someone might try to roll their own encoding at that point.
  • Maybe the two of you could work out a topic to both blog about ahead of time and you could do "trading fours with Raymond" instead.
  • According to Rob Pike (http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt), utf-8 was invented at the end of 1992. It was first announced to the public in 1993, but not standardized until 1996 (when UTF-16 was also standardized). It was likely that those Exchange developers didn't know about it back in 1994-5 when I presume Larry was noting this activity.

    Of course the point is moot because compressing utf-16 Unicode strings by encoding in utf-8 is pointless -- while ASCII gets shorter, most characters get longer.
  • No, about 3 years before that point.

    http://en.wikipedia.org/wiki/UTF-8#History
  • Also - interesting essay on latency: "It's the latency, stupid!"

    http://www.stuartcheshire.org/rants/Latency.html
  • Larry,

    I agree that it's amazing how many developers don't understand this.

    I also think it's amazing that I've never seen this information written down in such a drop-dead, straightforward manner until now.

    Thanks for taking the time to do it.
  • So yesterday, I commented on how I was glossing over lots of details about how to make a client/server...
  • Two people have written that UTF8 was invented in 1992 rather than in the mid-1980's.  Probably I stand corrected.

    In the mid-1980's I read a published paper about an encoding scheme.  Years later when I read the description of an encoding scheme called UTF8 the scheme looked very familiar but the name was not.  I assumed that it was just the same scheme and that it had been given a name.  Now I must assume it wasn't the same scheme.  I must have jumped to a conclusion based on inadequate recollections of the 1980's paper.  I apologize.
  • > assuming the compression algorithm's worth its salt

I thought only an encryption algorithm's worth was based on its salt ;).

