IEInternals

A look at Internet Explorer from the inside out. @EricLaw left Microsoft in 2012, but was named an IE MVP in '13 & an IE userAgent (http://useragents.ie) in '14

Downloading ZIP-Based Formats


More and more file formats are based on the ZIP format. The Open Packaging Conventions use ZIP as a base format, and that means frameworks like .NET’s System.IO.Packaging also generate files that are valid ZIP files. The Office 2007+ formats are ZIP-based, and more personally, Fiddler’s SAZ Format is ZIP-based.

Unfortunately, this trend toward ZIP-based packaging causes a problem for file types that are not registered in the server’s configuration. When sending unknown types, a simple server will typically send a Content-Type: application/octet-stream header, indicating very generically that the download is binary without saying anything more specific. Internet Explorer’s MIME-sniffing code kicks in and says, “Hey, I see that you’ve provided a generic type. Lemme check that content and see if I know what it is.”

Now, the sniff for ZIP formats is dead-simple: Does the file start with 0x50 0x4B (aka ‘PK’)? If so, then it’s probably a ZIP file. And in the case of ZIP-based formats, the browser’s technically right, but behaviorally wrong. If the server didn’t specify a filename in a Content-Disposition: attachment header, Internet Explorer will promptly rename the file away from its original extension to .ZIP. The browser will then consult with Windows and determine that the .ZIP file should be opened by a MIME Handler.
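To make the check concrete, here is a minimal Python sketch of a two-byte ‘PK’ sniff of the kind described above. It is not Internet Explorer’s actual code; the names looks_like_zip and make_zip_based_package are made up for illustration, and the package built here is just a stand-in for any ZIP-based format.

```python
import io
import zipfile

# The two signature bytes described above: 0x50 0x4B, i.e. ASCII "PK".
ZIP_MAGIC = b"PK"


def looks_like_zip(data: bytes) -> bool:
    """Hypothetical sniff: True if the payload starts with the 'PK' signature."""
    return data[:2] == ZIP_MAGIC


def make_zip_based_package() -> bytes:
    """Build a tiny in-memory ZIP, standing in for any ZIP-based format."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.txt", "hello")
    return buf.getvalue()


if __name__ == "__main__":
    package = make_zip_based_package()
    # The sniff cannot tell a plain ZIP from a format that merely uses
    # ZIP as its container; both begin with the same two bytes.
    print(looks_like_zip(package))
    print(looks_like_zip(b"%PDF-1.4"))
```

The point of the sketch is that the signature alone says nothing about which ZIP-based format the file really is, which is why the rename to .ZIP is “technically right, but behaviorally wrong.”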

For instance, downloading from http://webdbg.com/dl/saz.saz results in the following modal prompt:

[Image: the modal download prompt]

If you choose Open, the MIME Handler is invoked and shows the guts of the ZIP file:

[Image: the MIME Handler showing the contents of the ZIP file]

If you choose Save, the file will be saved to your downloads folder as a .ZIP. This is generally not what you want.

As a mitigation for this problem, Internet Explorer 9 included an exemption list for the most popular ZIP-based formats of 2010; downloads whose URLs bore the following extensions are not renamed:

.accdt; .crtx; .docm; .docx; .dotm; .dotx; .gcsx; .glox; .gqsx; .potm; .potx; .ppam; .ppsm; .ppsx; .pptm; .pptx; .sldx; .thmx; .vdw; .xlam; .xlsb; .xlsm; .xlsx; .xltm; .xltx; .zipx
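As a rough illustration of how an extension-based exemption like this could work, the following Python sketch checks whether a download URL ends in one of the listed extensions. It is only a sketch under assumptions: the function names are hypothetical, the case-insensitive comparison and the decision to ignore the query string are guesses, and none of this is taken from Internet Explorer’s implementation.

```python
import posixpath
from urllib.parse import urlsplit

# The exemption list quoted above, as a set for quick lookup.
EXEMPT_EXTENSIONS = frozenset(
    """.accdt .crtx .docm .docx .dotm .dotx .gcsx .glox .gqsx .potm .potx
    .ppam .ppsm .ppsx .pptm .pptx .sldx .thmx .vdw .xlam .xlsb .xlsm
    .xlsx .xltm .xltx .zipx""".split()
)


def url_extension(url: str) -> str:
    """Return the lowercased extension of the URL's path ('' if none)."""
    path = urlsplit(url).path  # drops the query string and fragment
    return posixpath.splitext(path)[1].lower()


def would_keep_extension(url: str) -> bool:
    """Hypothetical check: is this URL's extension on the exemption list?"""
    return url_extension(url) in EXEMPT_EXTENSIONS


if __name__ == "__main__":
    print(would_keep_extension("http://example.com/files/report.docx"))
    print(would_keep_extension("http://webdbg.com/dl/saz.saz"))
```

Under this sketch a .docx URL is exempt, while a .saz URL is not and would still be subject to the rename.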

To avoid this problem for all ZIP-based types, servers have two options:

  1. Send a specific MIME-type identifying the file’s type
  2. Use a Content-Disposition header to specify the filename
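A server-side sketch of the two options might look like the following Python helper, which builds the response headers for a download. The helper name download_headers is hypothetical, and the sketch assumes a plain ASCII filename with no quotes or other characters that need escaping; the SAZ MIME type is the one used in this post.

```python
from typing import Dict, Optional


def download_headers(mime_type: Optional[str] = None,
                     filename: Optional[str] = None) -> Dict[str, str]:
    """Build response headers for a file download (illustrative helper).

    Option 1: pass a specific mime_type instead of the generic default.
    Option 2: pass a filename to emit a Content-Disposition header.
    Assumes filename is simple ASCII without quotes or control characters.
    """
    headers = {"Content-Type": mime_type or "application/octet-stream"}
    if filename is not None:
        headers["Content-Disposition"] = f'attachment; filename="{filename}"'
    return headers


if __name__ == "__main__":
    # Option 1: a specific MIME type for Fiddler's SAZ format.
    print(download_headers(mime_type="application/x-fiddler-session-archive"))
    # Option 2: keep the generic type but name the file explicitly.
    print(download_headers(filename="saz.saz"))
```

Either option gives the browser something more specific than a bare application/octet-stream to go on; sending both is also possible.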

For instance, when the server is reconfigured to send a Content-Type: application/x-fiddler-session-archive header, the user gets the expected Download Manager notification, and the file extension is untouched:

[Image: the Download Manager notification]

The changing web suggests that it probably makes sense to get out of the business of sniffing ZIP files, as such sniffing is likely now causing more problems than it solves.

-Eric

  • I had this problem with Internet Explorer 8 on Windows XP, regarding Office 2007 formats and Outlook Web Access.

    My dad saved a Word 2007 or Excel 2007 file attachment from a message and it opened as a ZIP file, driving him mad, as he does not know what a ZIP file is, or what to do with it when all he did was download his attachment.

    What solved it for me, weirdly enough, was installing the Microsoft .NET Framework Version 1.1 Redistributable Package.

    www.microsoft.com/.../details.aspx

    By the way, can you explain why that solved it?

  • > the most popular ZIP-based formats of 2010

    And .JAR wasn't listed? How was 'most popular' determined?

    (for reference, .zipx has 71,000 results on Google, whereas .jar has 3 million results)

  • Can you explain why this renaming behaviour is desirable in any situation?  When the download already has an extension trying to guess and override it seems like a bad idea.

  • @PhistucK: Fascinating. I don't think installing .NET1.1 was meant to have any intended impact on this scenario. Having said that, IIRC, that version *did* install a MIME Filter for application/octet-stream, so it's possible that there was some side-effect to having it installed.

    @fr: There's an incorrect premise in your question. As described in the post, in the scenario I describe, the file only has the name inferred from the URL; there's no such thing as an "extension" per-se. Would you expect a ZIP file delivered from a .PHP or ASPX page to retain that extension?

    @RichB: It's an excellent question. I don't recall exactly, but my expectation is that JAR wasn't on this list because it wasn't a "new" format-- it's existed since the 1990s, and thus most sites that host JARs already worked properly. In addition, JAR isn't a common end-user-download format (e.g. it's mostly used by webpages in the operation of applets) and users who do happen to directly download a JAR are more likely to be developers who would know what to do with it, even if it were renamed.
