Comments on "Please Sir May I Have a Linker?"

Several folks forwarded to me Joel Spolsky's post Please Sir May I Have a Linker, in which he outlines the problems with distributing the .NET Framework redist and calls for a simpler solution:  a linker that could take your managed application and produce one atomic exe combining only the frameworks and engine components you needed to run.  With such a tool, deployment is simpler and smaller, config issues are far less worrisome, and you don't need to track tons of versions of the CLR coming out.  It's an appealing idea. Such technologies have been around for 50 years now, and we've played with some of them in our own space for prototyping purposes.

But do you really want a linker?

As is usually the case with this kind of technology there are lots of pros and cons.  Joel's post covers most of the pros so I won't repeat them.  I admit I have been tempted to write such a tool myself.  I was the lead developer for the Metadata team in V1, wrote the file format spec (along with SusanRS), and helped design the CLR portion of the MC++ compiler ("IJW" -- yes, I own my fair share of blame for that damn loader lock bug -- Whidbey is on its way).  Although I don't code full time any more, I know I could create such a tool and it wouldn't even be that hard.  I personally would use it to isolate our build tools from interim breaking changes because compiling yourself with an older version of yourself can get tricky.

But consider the down sides:

Intellectual Property - There will be people left, right, and center on this one.  I don't want to provoke that debate with this post.  Suffice it to say it would have to be thought out.  Honestly it isn't even the worst of my worries, at least for the core engine -- we already give away the lion's share of the code through Rotor (SSCLI).  Let's acknowledge it's an issue and move on.

Working Set - Say this kind of tool was wildly successful, and the majority of applications out there using the CLR wound up deploying this way.  Each of those processes would wind up with its own copy of the code used to run itself, using its own address ranges, because it's a linker's job to merge pieces into a new thing.  There would be no sharing of pages whatsoever between processes.  This would drive up system wide working set, making all of us want to go to Fry's for more memory, cursing on the way there about what a pig Windows had become now that the CLR is so popular.  There are legitimate working set issues we are addressing now; this makes a tough job even harder.
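
To make the working set argument concrete, here is a back-of-the-envelope sketch in Python. Every number in it (10 processes, 8 MB of runtime code, 3 MB of private data per app) is a made-up assumption for illustration, not a CLR measurement. A real linker would also pull in only part of the runtime, so the per-process cost would be lower than the full figure used here. The sketch only shows how the total scales once runtime pages stop being shared.

```python
def system_working_set_mb(process_count, runtime_mb, app_private_mb, shared):
    """Estimate total resident memory (MB) across all managed processes.

    shared=True:  runtime code pages are mapped once and shared by every process.
    shared=False: each process carries its own private copy (the linked case).
    """
    if process_count == 0:
        return 0
    runtime_total = runtime_mb if shared else runtime_mb * process_count
    return runtime_total + app_private_mb * process_count


# Hypothetical figures, chosen only for illustration.
PROCESSES = 10
RUNTIME_MB = 8
APP_PRIVATE_MB = 3

shared_total = system_working_set_mb(PROCESSES, RUNTIME_MB, APP_PRIVATE_MB, shared=True)
linked_total = system_working_set_mb(PROCESSES, RUNTIME_MB, APP_PRIVATE_MB, shared=False)

print(f"shared runtime:    {shared_total} MB")
print(f"statically linked: {linked_total} MB")
```

With shared pages the runtime cost is paid once; with linked copies it grows linearly with the number of running applications.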

Servicing - I have a love/hate relationship with this one.  I hate the fact that there is no way for me to patch the code with new bug fixes that may be making the system more unstable than it should be.  But as an app writer I love the fact that fixes other people apply to my machine don't muck up my perfectly working application.  You now understand why I am so conflicted about technologies like the GAC.  Call this one a wash.

Security - But here's the real kicker: security.  Put aside stereotypes and flames you might be tempted to hurl for a minute and think rationally about the problem.  Say you used a tool like this and produced a little P2P file share utility à la iTunes sharing music between computers.  You have this program resident on the Start bar on all your machines, listening on a port for any friends to come along.  Now along comes a virus that Microsoft needs to patch in a hurry.  How do I do that?  You've statically linked the code with the defect I need to patch into your P2P app and who knows how many other such apps?  The vulnerable app might have simply been copied to the disk, making it harder to find.  How do I go patch that thing and not leave your machine vulnerable?  Such issues already exist for template libraries, where updates can only occur through recompilation by everyone who has ever written and deployed a program using them.

There are more potential cons but you get the idea.  We could have a good and spirited debate about mitigation strategies, how in a perfect world security bugs could never exist, and you could point out that Lawn Darts were only dangerous for those who didn't know what they were doing.  There may even come a point where enough mitigation techniques can be brought to bear that I could be convinced it was ok to do this.  But I feel there is a lot of risk associated with such a tool, and even though it is appealing in a number of ways, it isn't necessarily the right thing to do.

  • The difference between this post and Mr. Spolsky's is that you respectfully approached the issue and he did not. Thank you for explaining the problem and separating out the emotion and politics associated with this issue. Posts like this help to explain why I read you and why I don't read Mr. Spolsky.
  • Excellent explanation.
  • It's still an issue in statically linking C++ libraries, yet this is still common practice on many platforms. While I understand the security argument, it's certainly not the trump card that you've played it as.
  • >It's still an issue in statically linking C++ libraries, yet this is still common practice on many platforms.

    Let's say a .NET app uses things from System, System.Windows.Forms, System.Xml etc, and statically links to all these assemblies. In the non-managed world that would be equivalent to statically linking to CRT, MFC, ole32, msxml etc. This is definitely not common practice (actually only CRT and MFC can be statically linked, and even that is strongly discouraged, for the reasons stated above).

    Linking in a few functions from a library might be OK but statically linking to the entire runtime is a whole different thing.
  • There was one other point that seemed fairly obvious to me.

    He suggested that his programs would be around 5-6 MB each. The .Net framework is around 22MB. That is 3-4 times the size of his exe. So if he released 3 versions (or even patches) for any one version of .Net then the download of the .Net runtime becomes insignificant.

    Add to that the fact there will be MANY applications from other vendors all being 5-6 MB or even more and you'll very quickly be over 22MB.
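
The arithmetic in the comment above can be sketched in a few lines of Python. The 22 MB framework size is the figure quoted in the comment; the 5.5 MB app size is an assumed midpoint of the 5 to 6 MB range quoted there. The sketch counts each linked download at its full size and ignores the size the app would have without the linked-in runtime, so it is only a rough illustration of the break-even point.

```python
import math


def releases_until_break_even(framework_mb, linked_app_mb):
    """Smallest number of linked-app downloads whose combined size
    reaches the size of one framework download."""
    return math.ceil(framework_mb / linked_app_mb)


FRAMEWORK_MB = 22    # figure quoted in the comment
LINKED_APP_MB = 5.5  # assumed midpoint of the quoted 5 to 6 MB range

count = releases_until_break_even(FRAMEWORK_MB, LINKED_APP_MB)
print(f"{count} downloads of a {LINKED_APP_MB} MB app match one {FRAMEWORK_MB} MB framework download")
```

The same count applies whether the downloads are successive releases from one vendor or separate applications from several vendors.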
  • Funny, this is an issue I have been playing with the past few weeks. My problem is slightly different from what is portrayed here, but could be solved using the same solution.

    I write lots of little tool programs (mostly commandline tools) for internal use in the research institute where I work. These tools tend to reuse some fragments of code I wrote before (e.g. commandline option handling). I really want to distribute these apps not via an installer, not via XCopy deployment, but via Copy deployment: just copy the one single executable file somewhere on the path and it works.

    The problem is that in the .NET framework, when you develop modularly, you also have to deploy modularly. What I miss, more than just a linker, is support for a .Net equivalent of 'static libraries', allowing modular development, but monolithic deployment.

    Unlike suggested in the original message, I have no need to link in any parts of the .Net framework; requiring the .Net framework to be installed on the client machine is perfectly acceptable. What I need is a way to fuse my own exe assembly with my own dll assemblies. And yes, for now it is perfectly ok if that just works with purely managed assemblies (excluding support for Managed C++).

    Note that the security argument in the original mail doesn't really apply to this scenario anymore.

    Just for information, I have been playing with a few avenues to get a monolithic exe.

    - One way is to use the 'source linking' option in VS.Net: When 'adding an existing item' to a C# project, pay attention to the dropdown arrow of the OK button of the dialog box, and change it to 'Link'. Just refer to the 'library' source files for each project you want to use the 'library' code. This is far from perfect, and may cause maintenance pains, but it works.

    - Another way is using Ildasm on all compiled assemblies, doing some voodoo to glue the .il files together, and next using ilasm to create a monolithic executable.

    - Yet other ways involve simulating a monolithic executable: package the library dlls into the executable as resources, or use your homebrew methods to append them to the executable (similar to self-extracting archives), and do some voodoo involving the AppDomain.AssemblyResolve event and one of the Assembly.Load(byte[]) methods to load the dll from your 'archive' instead of from a file.
  • I for one would rather see a system that could piecemeal install the framework and perhaps assemblies that exist in the GAC, something we could link into our apps directly.
    Pseudocode (native?) like:

    if (!clrIsInstalled)
        DoClrInstall();

    if (!neededAssembliesInstalled)
        DoAssemblyInstall();

    or what have you.

    There are obviously some problems with this, the major one being it would likely be *A LOT* more work than linking assemblies together. It would also require some kind of server side support (a distribution system), etc, and would really be a fair sized installer system that can be patched into an executable. It'd be quite a bit better than 5 or 6 meg assemblies though, I would suspect.

    The upside is, of course, most of the issues with the linker are circumvented (IP may be an issue still, and it does open new security issues: what if the *installer* code has an exploit?).
  • With respect to security being the "trump card", Pavel is right on here. You have to consider many factors when using this kind of technique, including (a) what scenarios will the code be exposed to and how much damage could it do in those scenarios, (b) what is the surface area of the code I will allow to manifest itself elsewhere, and (c) if I were exposed to a serious security issue, how would I react and protect my users? Since I own the CLR, my job is to try and make sure our code can't enable any nightmare scenarios, hence my caution.

    Steven - Good point on the aggregated overhead to the network at large.

    Luc - I know precisely what you are referring to here. There are engineering advantages in going this route: (a) you can limit your /r's while compiling your code to avoid picking up an even larger set of dependencies, (b) putting together smaller dll's that are related into fewer larger dlls can eliminate extra OS loader overhead and system working set, and (c) the scenario you mention. If you squint at a managed dll it kinda looks like an .obj file doesn't it? This is actually the technology I was referring to above that we had prototyped in the past. Take a look at this PDC deck {http://www.gotdotnet.com/team/pdc/4076/tls401.ppt&e=7421} (scroll to "MSIL Linking") and I think you'll be very happy to hear this is on the way! This technology is very useful in this way once you've vetted it with the checklist I mentioned.

    Daniel - I'm curious how close the new ClickOnce technology in Whidbey comes to your scenario? It doesn't really solve the .NET FX download (you'd have to add your own unmanaged shim). But it does strive to bring updates/missing files from a central server to your local machine for execution. {http://longhorn.msdn.microsoft.com/lhsdk/ndp/cpovrclickoncedeploymentoverview.aspx}
  • The issue here will go away in two or three years when most users have the framework installed. However, at present one of the main decisions facing a software developer is: "Can I afford to develop in a .NET language when this means I will lose a certain number of customers who are not able or willing to download 21MB just to try something out?".

    I am faced with this choice. I am developing an app. that will be launched from a single button on the IE toolbar. It has one dialog box and performs a very simple function. I want to use C#, which I know, but I may have to go back to C++ (which I don't like much or know as well) so that the download size doesn't put potential customers off.

    Your explanations for a linker being a bad idea are very clear and make a good case. Is there another solution then? Could MS divide the framework into 2MB chunks and download a bit with each of the next security patches? Could it be "download on demand" for the parts of the framework that an app uses? I know those are daft ideas but is there no choice but to wait until everyone has the framework?

    Joel's secondary point is that we have already had two versions of the framework. V1.1 wasn't a minor patch to V1.0, it was a complete new 21MB. How often is this going to happen?
  • Jason: I am curious exactly how far ClickOnce goes, I've yet to have had the time to play with it. My initial understanding was that ClickOnce was more oriented towards web delivery, I'll have to read up on it a bit more.
    Although, while it apparently does provide necessary assemblies, patches, whatever, as you noted it doesn't help with the framework or runtime installation. An interim solution for installing the runtime itself piece by piece would be of value, just well beyond my skillset to achieve. Similarly a MS hosted distribution point for the framework would be important.

    In regard to Julian's point, a download on demand system is much of what I was talking about. You'd need enough native code to check for the JIT, GC, and other core services to get the code up and running. We even get the luck of not having to consider portability (a .NET exe isn't going to run on anything but Windows *without* a CLR being installed). However I don't even begin to understand what it would take to dynamically check for and then install the framework core components (try to bind to mscoree.dll, if that fails initiate an install?). This is, though, an area that ClickOnce should have considered (and probably did). To really achieve this Microsoft would have to create a much more miniature packaging system for the framework, something that includes only the core engine, not the libraries or asp.net or compilers, etc.
  • Good questions. Right now of course we have v1.0 and v1.1. We are working on the Whidbey release, which you saw at the PDC -- besides a new Visual Studio it is the power behind Yukon's SQL/CLR integration. And then finally we will have a version of the CLR that runs Longhorn (also seen at the PDC). Right now those last two are built from the same tree/source. Each of these new builds is (or will be) its own thing with a separate redist. We actually started supporting parts of some XP SKUs and Windows 2003 with v1.1, so it comes with the OS in those cases. You can expect us to keep going that route.

    We're starting to veer into deployment and app compat, which is worthy of some extra details in and of itself. Let me write something up a little more thorough than this edit box will allow and post that (stay tuned). One bottom line parting thought until then: we want people to write managed code on today's runtime, and we will do everything in our power to make that investment easier to deploy over time and work on the newer versions as they come out.
  • A complete non-issue within the Java world, see:

    http://www.manageability.org/manageabilityWiki/WholeProgramOptimization/view
  • You know, I'm not so much worried about a linker for the actual runtime. I'd like a linker for my own code. I don't want to have to include the source code files in multiple projects. Yet, I don't want to include references to many assemblies. I'd like the happy medium where my utilities can live somewhere in limbo and get brought into my assembly at the IL level.

    That way, I don't have to remember which file has the one method/class that I want to include in my project. You know, the one that does that thing like back in the day. What's it called? Oh yeah. Nope. Let's look over there. Nah...
  • Carlos - You should separate the architecture from the current implementation. There is nothing inherent about MSIL binaries that disallows the kinds of features you are talking about. If you were willing to forgo richness like cross-assembly inlining of methods, reflection, and other data-described technologies that require the IL and/or metadata, then you could strip all of the above. Check out that MC++ PDC deck I included a link to above for more details about the Whidbey product, which in fact has many new features in this direction.

    Martin - I believe you are also describing the MSIL linker in Whidbey I mentioned above. You compile up your utility code into a netmodule (e.g., a dll), and then link many of them together into a deployable unit.
  • Martin, Luc: have you looked at ILMerge (at http://research.microsoft.com/~mbarnett/ilmerge.aspx)?

    Obviously this is a research rather than a production tool, so you should probably treat it with a little caution.