Larry Osterman's WebLog

Confessions of an Old Fogey

Sometime soon, this internet thing is going to be really big, ya know.

But sometimes I wonder if it’s getting TOO big.

I don’t normally try to do two rants in quick succession, but there was a recent email discussion on an internal mailing list that sparked this rant.

There’s a trend I’ve been seeing in recent years that makes me believe that people somehow think that there’s something special about the technologies that make up the WWW.  People keep trying to use WWW technologies in places that they don’t really fit.

Ever since Bill Gates sent out the “Internet Tidal Wave” memo 9 years ago, people seem to believe that every technology should be framed by the WWW. 

Take RPC, for example.  Why does RPC have to run over HTTP (or rather SOAP layered over HTTP)?  It turns out that HTTP isn’t a particularly good protocol for RPC, especially for connection oriented RPC like Exchange uses.  About the only thing that RPC-over-HTTP-over-TCP brings beyond the capabilities of RPC-over-TCP is that HTTP is often opened through firewalls.  But the downside is that HTTP is typically not connection oriented.  Which means that you either have to re-authenticate the user on every RPC, or the server has to cache the client’s IP address and verify the client that way (or require a unique cookie of some kind).
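
To make that concrete, here’s a minimal sketch of what the choice looks like to a Win32 RPC client – the server name (“mailserver”) and endpoint below are made up for illustration, and a real client would get its interface from MIDL-generated stubs.  The only client-visible difference between plain connection-oriented RPC and RPC tunneled over HTTP is the protocol sequence in the string binding:

    #include <windows.h>
    #include <rpc.h>
    #include <stdio.h>

    #pragma comment(lib, "rpcrt4.lib")

    // Compose and print a string binding for a hypothetical RPC server.
    static void show_binding(const wchar_t *protseq, const wchar_t *endpoint)
    {
        RPC_WSTR binding = NULL;
        RPC_STATUS status = RpcStringBindingComposeW(
            NULL,                     // no object UUID
            (RPC_WSTR)protseq,        // "ncacn_ip_tcp" or "ncacn_http"
            (RPC_WSTR)L"mailserver",  // hypothetical server name
            (RPC_WSTR)endpoint,       // hypothetical endpoint (port)
            NULL,                     // an ncacn_http client would typically pass
                                      // an "RpcProxy=..." option string here
            &binding);
        if (status == RPC_S_OK)
        {
            wprintf(L"%s\n", (wchar_t *)binding);
            RpcStringFreeW(&binding);
        }
    }

    int main(void)
    {
        show_binding(L"ncacn_ip_tcp", L"6001"); // conventional connection-oriented RPC
        show_binding(L"ncacn_http",   L"6001"); // the same interface, tunneled over HTTP
        return 0;
    }

Everything else – authentication, connection lifetime, knowing when the client goes away – depends on what’s actually underneath that one string.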

Why does .Net remoting even support an HTTP protocol?  Why not just a UDP and a TCP protocol (and I have serious questions about the wisdom of supporting a UDP protocol)?  Again, what does HTTP bring to .Net remoting?  Firewall pass-through?  .Net remoting doesn’t support security at all natively; do you really want unsecured data going through your firewall?  At least HTTP/RPC provides authentication.  And it turns out that supporting connection-less protocols like HTTP caused some rather interesting design decisions in .Net remoting – for instance, it’s not possible to determine if a .Net remoting client has gone away without providing your own ping mechanism.  At least with a connection oriented transport, you can have deterministic connection rundown.
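
For what it’s worth, here’s a rough sketch (plain Winsock, arbitrary port, nothing .Net-remoting-specific about it) of the deterministic connection rundown a connection-oriented transport gives you for free: the server finds out that the client has gone away the moment recv() returns zero or fails, with no application-level ping protocol required.

    #include <winsock2.h>
    #include <stdio.h>

    #pragma comment(lib, "ws2_32.lib")

    int main(void)
    {
        WSADATA wsa;
        if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0)
            return 1;

        SOCKET listener = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
        struct sockaddr_in addr = { 0 };
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(5150);        // arbitrary port for the example

        bind(listener, (struct sockaddr *)&addr, sizeof(addr));
        listen(listener, 1);

        SOCKET client = accept(listener, NULL, NULL);

        char buffer[512];
        for (;;)
        {
            int bytes = recv(client, buffer, sizeof(buffer), 0);
            if (bytes > 0)
                continue;                   // a real server would process the request here
            if (bytes == 0)
                printf("client closed the connection\n");
            else
                printf("connection aborted, error %d\n", WSAGetLastError());
            break;                          // either way, we know the client is gone
        }

        closesocket(client);
        closesocket(listener);
        WSACleanup();
        return 0;
    }

Over a connectionless channel there’s no equivalent signal, which is why you end up building your own ping mechanism.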

Why does every identifier in the world need to be a URI?  As a case in point, one of our multimedia libraries needed a string to represent the source and destination of media content – the source was typically a file on the disk (but it could be a resource on the net).  The destination was almost always a local device (think of it as the PnP identifier of the device interface for the rendering pin – it’s not, but close enough).  Well, the multimedia library decided that the format of the strings that they were using was to be a URI.  For both the source and destination.  So, when the destinations didn’t fit the IETF syntax for URIs (they had % characters in them, I believe, and our destination strings didn’t have a URI scheme prefix) they started filing bugs against our component to get the names changed to fit the URI syntax.  But why were they URIs in the first place?  The strings were never parsed, they were never cracked into prefix and object.

Now here’s the thing.  URIs are great for referencing networked resources.  They really are, especially if you’re using HTTP as your transport mechanism.  But they’re not the solution for every problem.  The guys writing this library didn’t really want URIs, they really wanted opaque strings to represent locations.  It wasn’t critical that their identifiers meet the URI format; they weren’t ever going to install a URI handler for them – all they needed to be were strings.
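
As an illustration (everything in this fragment – the struct, the field names, and the strings – is hypothetical), this is essentially all the library ever needed to do with those identifiers: store them and compare them as opaque blobs.  Nothing ever cracks them into scheme, authority, or path, so nothing is gained by forcing them into URI syntax:

    #include <wchar.h>
    #include <stdio.h>

    // A hypothetical source/destination pair for a piece of media content.
    typedef struct {
        const wchar_t *source;       // usually a file on disk, sometimes a net resource
        const wchar_t *destination;  // a device-interface-style string, '%' signs and all
    } MEDIA_ROUTE;

    // The only operation ever performed on the identifiers: an exact,
    // opaque comparison.  No parsing, no scheme handling, no URI rules.
    static int same_destination(const MEDIA_ROUTE *a, const MEDIA_ROUTE *b)
    {
        return wcscmp(a->destination, b->destination) == 0;
    }

    int main(void)
    {
        MEDIA_ROUTE r1 = { L"c:\\music\\track01.wma",    L"\\\\?\\sw#%7brender-pin-0%7d" };
        MEDIA_ROUTE r2 = { L"http://example.com/stream", L"\\\\?\\sw#%7brender-pin-0%7d" };
        wprintf(L"same rendering destination: %d\n", same_destination(&r1, &r2));
        return 0;
    }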

But since URIs are used on the internet, and the internet by definition is a “good thing” they wanted to use URIs.

Another example of an over-used internet technology: XML.  For some reason, XML is considered to be the be-all and end-all solution to every problem.  People seem to have decided that the data that’s being represented isn’t what’s important; what matters is that it’s represented in XML.  But XML is all ABOUT the data.  It’s a data representation format, for crying out loud.  Now, XML is a very, very nice data representation.  It has some truly awesome features that make representing structured data a snap, and it’s brilliantly extensible.  But if you’re rolling out a new structured document, why is XML the default choice?  Is XML really always the best choice?  I don’t think so.  Actually, Dare Obasanjo proposed a fascinating XML litmus test here; it makes sense to me.

When the Exchange team decided to turn Exchange from an email system into a document storage platform that also did email, they decided that the premier mechanism for accessing documents in the store was to be HTTP/DAV.  Why?  Because it was an internet technology.  Not because it was the right solution for Exchange.  Not because it was the right solution for our customers.  But because it was an internet technology.  Btw, Exchange also supported OLEDB access to the store, which, in my opinion, made a lot more sense as an access technology for our data store.

At every turn, I see people deploying internet technologies, even when they’re not appropriate.

There ARE times when it’s appropriate to use an internet technology.  If you’re writing an email store that’s going to interoperate with 3rd party clients, then your best bet is to use IMAP (or if you have to, POP3).  This makes sense.  But it doesn’t have to be your only solution.  There’s nothing WRONG with providing a higher-capability non-internet solution if the internet solution doesn’t provide enough functionality.  But if you go the high-fidelity client route without going the standards-based route, then you’d better be prepared to write those clients for LOTS of platforms.

It makes sense to use HTTP when you’re retrieving web pages.  You want to use a standardized internet protocol in that case, because you want to ensure that 3rd party applications can play with your servers (just like having IMAP and POP3 support in your email server is a good idea as mentioned above). 

URLs make perfect sense when describing resource location over the network.  They even make sense when determining if you want to compose email (mailto:foo@bar.com) or if you want to describe how to access a particular email in an IMAP message store (imap://mymailserver/public%20folders/mail%20from%20me).  But do they make sense when identifying the rendering destination for multimedia content? 

So internet technologies DO make sense when describing resources on the internet.  But they aren’t always the right solution to every problem.

 

  • >Ever since Bill Gates sent out the “Internet Tidal Wave” memo 9 years ago, people seem to believe that every technology should be framed by the WWW. <

    Boy are you Microsoft people Microsoft-centric. Good god. That Microsoft-internal memo wasn't the start of the web, you know.
  • My point, actually, isn't to point up your ignorance of the world outside Microsoft, or in fact to be insulting at all. It's to say that you're not the first person to realize this. I find your posts very insightful on occasion, but you would do very well to start googling outside the insular world of Microsoft.com. This particular rant is not original, nor, in 2004, is it insightful. It was insightful in 1998. Know what I mean?
  • Well, RPC-over-HTTP was a godsend in the Outlook/Exchange case! Without that feature you couldn't use Outlook/Exchange at all in any situation where you have a notebook and plug into networks at different locations (i.e. companies), because every company I know blocks outgoing traffic on the ports that are normally used for Outlook-Exchange communication. So, REALLY excellent feature there!!! And don't talk about things like UPnP. That is nowhere deployed. I think every client app should communicate via HTTP with servers, otherwise there are just SOO many hurdles to overcome to use it in certain places.

    The ideal solution seems to be Groove: It always picks the correct way, the user NEVER has to worry about anything. It just ALWAYS works :)
  • David, you're right, being able to tunnel Exchange over HTTP out through corporate firewalls IS good, so maybe that wasn't a totally reasonable example. But just because RPC over HTTP was a good idea, why is .Net remoting over HTTP a good idea?
  • Going over HTTP is quite important for deploying applications to consumers. It is not uncommon for ISPs and companies to install firewalls that block certain types of outgoing traffic, while HTTP traffic is usually let through. HTTP is a well-documented protocol too, so everyone knows how to work with it (i.e. how and when to cache it or not, how to modify it, etc).

    HTTP's popularity is the declaration that the transport protocol is not important to the application -- only the payload is. So HTTP is used as the easiest way to support getting the payload from point A to point B.

    In addition, things like WinInet made HTTP even more popular by making it extremely easy to write Windows apps that talk to a web server.

    The emergence of XML is also simply folks stumbling upon a format that became accepted as a good-enough way of sending/storing structured data. The fact that it might be suboptimal for certain use cases is often outweighed by the advantages of having it be in a format for which lots of tools and development frameworks already exist.
  • Ilya, I had a much longer version of this, but my KVM switch ate it. So you get the short form.

    I've already conceded that HTTP was a good choice for Outlook's RPC protocol because of the firewall bypass.

    But XML is another story. Just because tools exist to parse and generate XML files doesn't mean that XML is the right choice for a data format.

    I'll pick on Exchange again. When Exchange was deciding to do a text representation for an NT security descriptor for its HTTP/DAV and OLEDB clients, why did Exchange choose to use a proprietary XML format (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/e2k3/e2k3/_exch2k_web_storage_system_security.asp)? The existing SDDL format (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/security/security/security_descriptor_definition_language.asp) was sufficient, and equally interoperable.

    It was because XML was "internet" and by definition "internet" was better than "proprietary".

    Look at Dare's litmus test. Think about the uses of XML that you know of. Do all of them fit the test? I didn't think so. The XML security descriptor doesn't, IMHO - there are no non-Windows clients that can possibly make reasonable use of the data (since they don't support NT's security model). .Net remoting's use of XML as a serialization format MIGHT, if .Net remoting is going to interoperate with a non-Windows .Net remoting implementation. But I don't believe that interoperability with non-Windows platforms was a design goal of .Net remoting.
  • >>"About the only thing that RPC-over-HTTP-over-TCP brings beyond
    >>the capabilities of RPC-over-TCP is that HTTP is
    >>often opened through firewalls."

    But is it a good thing? If an administrator has blocked the ports used by RPC, it means that he/she doesn't want RPC calls to go through his/her network. There were several buffer overflows in RPC, so it's understandable why an administrator would want to do that. Using HTTP because HTTP traffic is allowed to go through firewalls is essentially circumvention of the security policy. If the network owner doesn't want his/her network to be used for anything other than browsing web pages, then this policy should be respected and it should be possible to enforce such a policy. Doing "everything over HTTP" makes it hard to separate different kinds of traffic.

    There is a somewhat similar issue with Browser Helper Objects. Trojans and Spyware often register a Browser Helper Object and use it to communicate with the server, so it appears as if the traffic originates from IE. Home users, who are most vulnerable to Trojans and Spyware, usually use software firewalls like ZoneAlarm or Kerio. If the Trojan/Spyware is a standalone .exe, then the user gets a message like "adpoper.exe tries to communicate with ads.evilserver.com." If the malicious piece of software is a Browser Helper Object, the user doesn't get such a message.

    I don't claim to be a security expert (I'm just a student :-)). Those are just my first thoughts.
  • Dmitriy: it's an arms race between the firewall developers (and the system administrators) and the general software developers. To handle the fact that everything goes over HTTP, the firewall hardware/software does content filtering.

    So we start using 'standard' thing-over-HTTP protocols such as SOAP. Again, the content filter has to get smarter to determine if what's being sent over SOAP is sensible. And so on.

    The channel gets narrower, but we try to pump more data through it.
  • Larry: the whole XML being used for Everything really does worry me at times. It's one of those things where I really think that the UNIX way of doing things has taken over from the Proper Software Engineering way of doing things.

    For example, the UNIX way is to have everything as a plain text file (these days, if you're lucky, it's UTF-7 or UTF-8). Mail protocols? They're text-based. They're CR/LF delimited. Computers are much happier when they can be given the length of the data before the data itself, but no... we'll terminate with a randomly placed CR/LF at the end of a line.

    This goes even further with XML - let's have a storage format which takes perfectly nice BINARY data and expands it by a factor of 20 or more to create TEXT data!

    The usual argument for this is human readability and interoperability. (And the fact that with compression you don't have to worry about how much space it takes over the wire. Me, personally, I worry about how much energy compression and decompression takes that could be put to better use - like not powering a computer).

    Human readability? For a computer file format? Excuse me while I throw up. Computers read binary. Humans read text. Write a program to turn the binary into text. The data will be used faster by the apps that consume it, people will be able to edit it... everyone's happy.


    The final irony though? The fact that XML is nothing new. Back in the 80s there was another standard for passing data around which worked much better than XML - mainly because it had all the "benefits", plus it was a binary format from the get go - and it was called ASN.1.

    I hate XML. I really truly cannot see a good reason to use it for 90% of applications - certainly not as many as actually do use it. I really think that it's a cop out to avoid thinking on the part of the architect.
  • Hey Frodo - lighten up. He's talking about what has happened/is happening at Microsoft. Of course it's Microsoft-centric.

    He doesn't think the world revolves around Bill Gates, or that the WWW started with it - that's just your own biased spin on what he's saying.
  • This guy is really arrogant! Who the hell does he think he is? Arrogant and a miser! Guys like Brian Valentine and David Treadwell have become SVPs and GMs. This guy still thinks he's a hot shot! Bullshit!
