SimpleScript Part One: DllMain is Boring


In talking with our support engineer it's just become more muddled.  I'm pretty sure now actually that the customer does not want to build a script engine, but whether they want to build a script editor, a script host or a script debugger is unclear.

Before I go on, let me take this opportunity to say that anyone who wants to create a script debugger from scratch really is crazy.  It took us many programmer-years to implement the Microsoft Script Debugger.  There are dozens of interfaces that you need to get right.  A debugger, even a simple, basic one like the MSD, is a very complex piece of software.  I don't understand how the MSD works, and therefore I'm not about to try to explain it to anyone else!  Unless you have a lot of experience developing debugging technologies and understand them inside out, I'd recommend against going anywhere near the script debugger interfaces from the debugger side.  From the engine and host sides, maybe, but not the debugger side.

I'm going to forge boldly ahead with my plan to develop a script engine from scratch.  And if that works out, maybe we'll do a script host as well. This will take some time, be warned.

I'm going to post the code so far as "articles" so that they don't show up in the feed, and then call out any "interesting" parts of the code as posts.  I've just dropped in the first few files.  So far we can register and unregister the script engine, but not actually create the engine yet.

The code should compile against VC6 or above, and I'm only testing it on Windows XP. 

I don't normally like to get all legal, but I think this would be a good time to point out that standard disclaimers apply.  This code is provided as a public service.  I'm writing it from scratch in my spare time, and I don't have a phalanx of Microsoft testers ensuring that everything works.  Code is provided as-is, with no warranties expressed or implied; if you compile it and run it and the world ends, that's not my problem, etc, etc, etc.  Aside from that, you're free to do whatever you want to this code.

OK, now that we've got that out of the way, let's take a look at what's going on here.  So far this is just your basic Windows DLL.  This is pretty boring, but hey, we're just getting going here.

In simplescript.def we've got some boilerplate export code that says that this DLL exports the four standard functions to start up, register, unregister and shut down the DLL.

In guids.h I define some useful guids.  So far the only thing there is the class ID for our new engine, because we'll need that when we register it.  Since the guid is a structure, it must both be declared as a symbol in the header, and somewhere actually turned into object code.  The very short guids.cpp performs the latter task.
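To make the declare-versus-define split concrete, here is a minimal sketch. It uses a hypothetical `Guid` struct in place of the Windows `GUID` type so that it compiles anywhere, and the value shown is a placeholder, not the engine's real class ID. The names `Guid`, `CLSID_SimpleScriptSketch` and `IsSimpleScriptSketch` are assumptions made for illustration only.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the Windows GUID structure, so that this
// sketch compiles without <windows.h>.
struct Guid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::uint8_t  data4[8];
};

// --- What would live in the header: a declaration only, no storage. ---
extern const Guid CLSID_SimpleScriptSketch;

// --- What would live in the .cpp file: the single definition that ---
// --- actually emits object code. Placeholder value, NOT a real CLSID. ---
const Guid CLSID_SimpleScriptSketch = {
    0x00000000, 0x0000, 0x0000,
    { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01 }
};

// Any file that includes the header can now refer to the symbol.
bool IsSimpleScriptSketch(const Guid& g) {
    return std::memcmp(&g, &CLSID_SimpleScriptSketch, sizeof(Guid)) == 0;
}
```

If the definition were missing, every file would still compile, but the link step would fail with an unresolved external symbol.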

In headers.h we've got your straightforward "include the world" header file.  It might be wise to turn this into a precompiled header when this thing starts getting huge, but we'll cross that bridge when we come to it.

I am paranoid about putting assertions into my code.  The point of an assertion is to document in an active way what is known to be true about a program -- not what we hope is true, not what is true most of the time, but what must logically be true.  Assertions are better than comments because assertions will tell you if they are ever violated -- comments will not!  In assert.h you'll see that I define a macro that determines whether a given condition is true, and some special-purpose macros that check for things like "is this memory valid?"  More on those in a minute.

I sense the outrage -- didn't I just say that I hated macros? Indeed I do, but in this case, I really like the ability to automatically determine the file and line number, and I want this stuff to be zero-impact in the retail build, so in this case they're worth it.  Notice also that I am using macros in a very specific way.  I'm not introducing new control flow primitives, etc.  No one is ever going to use this thing in an expression.  Also note that, where possible, the macro immediately calls a method which can be debugged.
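Here is a minimal sketch of that style of assertion macro. It is not the code from assert.h: the names `SKETCH_ASSERT`, `ReportAssertionFailure` and `g_failedAssertions` are made up, and it assumes that `NDEBUG` marks the retail build, which may not be the flag the real code uses.

```cpp
#include <cstdio>

// Hypothetical counter so the effect of a failed assertion can be observed.
static int g_failedAssertions = 0;

// An ordinary function: you can set a breakpoint here, which you cannot
// do inside a macro expansion.
void ReportAssertionFailure(const char* condition, const char* file, int line) {
    ++g_failedAssertions;
    std::fprintf(stderr, "Assertion failed: %s (%s, line %d)\n",
                 condition, file, line);
}

#ifdef NDEBUG
// Retail build (assumed flag): the condition is never evaluated.
#define SKETCH_ASSERT(cond) do { } while (0)
#else
// Debug build: the macro only captures the condition text, the file and
// the line number, then immediately hands off to the function above.
#define SKETCH_ASSERT(cond)                                          \
    do {                                                             \
        if (!(cond)) {                                               \
            ReportAssertionFailure(#cond, __FILE__, __LINE__);       \
        }                                                            \
    } while (0)
#endif
```

The `do { } while (0)` wrapper makes the macro behave like a single statement, so it cannot be used inside an expression by accident.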

If you take a look at those methods in assert.cpp you'll notice a few oddities.  First off, you'll notice that I do not call IsBadWritePtr or IsBadReadPtr, because these guys are incredibly gross.  Basically, the way that IsBadFooPtr works is that it simply attempts to do the action, wraps the attempt in a try-catch, and returns true if an exception is thrown.  If the memory really is bad because you're at the end of the stack then this can potentially fault the stack in a bad way.  If you're in a multi-threaded scenario then IsBadWritePtr can mess up your program due to race conditions. 

Now, I realize that in this case we're talking about some debug-only code here -- if either of these ever returns true then there's a major bug that's got to be fixed!  But still, they give me the shivers.  I would much rather have my assertions actually ask the operating system whether that's a good pointer than "try it and see" the way IsBadFooPtr does.  (And in fact there is some code in the script engines that needs to check whether a pointer is bad, and it is a main-line common scenario that it will be -- in a not-debug-only scenario like that you cannot rely on the IsBad methods.  Why we have to do that is another story, which I may blog about some time.)

And finally we come to the first bit of code that actually has semantic import -- the DLL startup, shutdown and registration code in dllmain.cpp.

The shutdown code is straightforward -- as you will see in the next episode, we will maintain a reference count on the DLL and only shut it down when there are no outstanding references.  The startup code is even more straightforward -- we simply cache the module handle in a global variable.

It is extremely important that the startup and shutdown code be boring, boring, boring.  If you don't understand why that is, or you feel tempted to put something cool in there, read this, this, this, this, this and this.

The registration code is the only code that's interesting at all here, and even it is pretty boring.  First off, take a look at the coding style of DllRegisterServer.  Except in some specific exceptions, the code I write in this sample is going to be brain-dead obvious insofar as the error code paths and object lifetimes are concerned.  Almost every method I write will follow this pattern:

  • If internal method, assert arguments are good.  If public method, check arguments for validity.
  • Initialize everything that needs to be freed later to NULL.
  • Initialize out parameters to NULL.
  • Do stuff.  If you get an error, go to the bottom of the function.
  • Fill in “out” parameters.
  • Free everything.
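The pattern above can be sketched in portable C++ as follows. This is not DllRegisterServer itself: the `Status` codes stand in for HRESULTs, and `DuplicateTwice` is a made-up function whose only job is to acquire two resources so that the cleanup path has something to do.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical status codes standing in for HRESULTs.
enum Status { kOk = 0, kInvalidArg = 1, kOutOfMemory = 2 };

// Writes `text` repeated twice ("ab" -> "abab") into a newly allocated
// string. On success the caller owns *result and must free() it.
Status DuplicateTwice(const char* text, char** result) {
    Status status = kOk;
    // Everything that needs to be freed later starts out NULL.
    char* scratch = NULL;
    char* combined = NULL;
    std::size_t length = 0;

    // Public method: check the arguments rather than asserting.
    if (result == NULL) return kInvalidArg;
    // Out parameters start out NULL.
    *result = NULL;
    if (text == NULL) return kInvalidArg;

    // Do stuff. On any error, go to the bottom of the function.
    length = std::strlen(text);

    scratch = static_cast<char*>(std::malloc(length + 1));
    if (scratch == NULL) { status = kOutOfMemory; goto LCleanup; }
    std::memcpy(scratch, text, length + 1);

    combined = static_cast<char*>(std::malloc(2 * length + 1));
    if (combined == NULL) { status = kOutOfMemory; goto LCleanup; }
    std::memcpy(combined, scratch, length);
    std::memcpy(combined + length, scratch, length + 1);

    // Fill in the out parameter and give up ownership of the buffer.
    *result = combined;
    combined = NULL;

LCleanup:
    // Free everything. free(NULL) is a no-op, so this is safe on every path.
    std::free(scratch);
    std::free(combined);
    return status;
}
```

All locals are declared before the first `goto` because C++ does not allow a jump to skip over an initialized declaration. That restriction fits the "initialize everything at the top" rule nicely.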

Glancing through the code now, I see that there are a few places where I forgot to assert that input strings were valid.  I'll fix those up later.

Does this technique produce long, boring routines that emphasize the error handling and make it harder to see what the function is doing in the mainline case?  Yes.  But error handling and cleanup is sufficiently important that it is worth calling out -- I need to make sure that it is right, and the best way I know to do that is to make it obvious what everything is doing.  And besides, I'm going to keep the routines short enough that it should be fairly easy to see what they're doing even if every line of “main line” code has three lines of error handling to go with it.

The registration code creates these keys:

HKCR\SimpleScript\(Default) = "SimpleScript Language Engine"
HKCR\SimpleScript\CLSID\(Default) = "{...}"
HKCR\CLSID\{...}\(Default) = "SimpleScript Language Engine"
HKCR\CLSID\{...}\ProgID\(Default) = "SimpleScript"
HKCR\CLSID\{...}\InprocServer32\(Default) = "c:\simplescript.dll"
HKCR\CLSID\{...}\InprocServer32\ThreadingModel = "Both"

as well as some category keys that say "this thing is a script engine".  The "OLEScript" tags are for backwards compatibility with VERY old script hosts, like in the Win16 timeframe -- strictly speaking I would imagine that they are no longer necessary, but it doesn't hurt.
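To show how the six values listed above hang together, here is a small sketch that builds them as plain data from a class ID string and a DLL path. It does not touch the registry and is not the code in dllmain.cpp; `RegEntry` and `BuildRegistrationEntries` are hypothetical names, and the category keys are left out.

```cpp
#include <string>
#include <vector>

// One registry value: key path under HKEY_CLASSES_ROOT, value name
// (empty string means the default value), and the string data.
struct RegEntry {
    std::string key;
    std::string name;
    std::string data;
};

// Builds the ProgID and CLSID entries for the engine. `clsid` is the
// braced class ID string and `dllPath` is the full path to the DLL.
std::vector<RegEntry> BuildRegistrationEntries(const std::string& clsid,
                                               const std::string& dllPath) {
    const std::string progId = "SimpleScript";
    const std::string description = "SimpleScript Language Engine";
    const std::string clsidKey = "CLSID\\" + clsid;

    std::vector<RegEntry> entries;
    // ProgID -> CLSID mapping.
    entries.push_back({progId, "", description});
    entries.push_back({progId + "\\CLSID", "", clsid});
    // CLSID -> ProgID mapping, plus where and how to load the server.
    entries.push_back({clsidKey, "", description});
    entries.push_back({clsidKey + "\\ProgID", "", progId});
    entries.push_back({clsidKey + "\\InprocServer32", "", dllPath});
    entries.push_back({clsidKey + "\\InprocServer32", "ThreadingModel", "Both"});
    return entries;
}
```

Keeping the entries in a table like this would let registration and unregistration walk the same list, though that is a design choice of the sketch rather than something the sample is known to do.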

The only thing interesting about the unregister code is that it fails silently.  Actually writing code that checks the error conditions, determines whether the keys couldn't be deleted because (a) they're not there, (b) they are there but you can't see them because you don't have access or (c) they are there but you don't have access to delete them is a pain in the rear.  It's not great that this fails silently when a non-admin tries to unregister the engine, but I'm not going to lose any sleep over it.

Next time, I'll build on this foundation a bit by adding a class factory and the skeleton of a script engine.  Then we can get into the actual script interfaces.

  • I don't think I'm ever going to have a need for writing my own script engine, but I'm going to enjoy reading your adventures over the next weeks/months. Especially if at the end of the day you try to do this in C# too.

  • Title says it all.
  • > Why not use ATL?

    You obviously haven't been reading this blog very long!

    I am not a big fan of ATL. I understand COM, I'm good at writing COM programs, and I don't like the abstraction that ATL provides. I don't like the way it adds new syntax to C++, I don't like smart pointers, I don't like templates, I don't like debugging ATL memory leaks through a dozen indirection layers.

    Read my past entries in the AAARGH! series or, way back, the "Smart pointers are too smart" entry for more details.
  • Just an observation:

    I know many programmers say goto is evil, but I think you have shown a purpose for its existence.

    It really cleaned up the error handling where it was used.
  • Leaving the general pros/cons of ATL aside, I think if the ultimate goal is to do this (mostly) in C#, it will be easier to get there from native C++/COM.
  • "goto" gets a bad rap. Yes, it can be misused, and when it is misused, it is horrible. But just because something can be misused is no reason to eschew using it entirely!

    Basically what I'm doing here is exception handling on the cheap. After all, if we were writing this code in exception handling style, the semantics wouldn't change:

    declare blah
    clean up blah

    All that is doing is making the gotos invisible, but they're still there. Remember, when it comes right down to it, an exception is a "goto" that carries state and transfers control to a cleanup block.

    I like the benefits of exception handling, but the exception handling style doesn't mix well with COM programming. Therefore I'm going to apply some discipline to my functions and get some of the benefits of exception handling without actually using exception handling.
  • > I don't like the way it adds new syntax to
    > C++, I don't like smart pointers, I don't
    > like templates, I don't like debugging ATL
    > memory leaks through a dozen indirection
    > layers


    > "goto" gets a bad rap

    Now I get it - you're a C programmer!
  • Yes, exactly. When I write COM programs, I'm essentially a C programmer. I use only a very small subset of the features of C++ when writing COM programs.

    When I move away from the directly-interfacing-with-COM parts of the engine and into the more "computer sciency" parts like the lexer/parser/generator I'll start using more and more C++ features -- inheritance, etc.
  • Actually I have been reading your Blog for a while and quite enjoying it:) I was almost hoping for an insight into whether using ATL was a personal preference or something more company wide.

    I have implemented COM objects using both ATL and straight C++, and while ATL doesn't buy you too much when implementing anything not supported by the *Impl mixins, it still reduces the initial burden. And while I almost always use ATL, I almost never use smart pointers unless the scope of usage is very small.
  • "Personal preference" vs "company wide" isn't really the right way to look at it. It's very important to use the right tool for the job and not fall into the temptation of using tools just because they're the latest cool thing. People often ask me whether we use C# because Bill issued some proclamation that thou shalt use C# -- but it isn't like that at all.

    Some teams use a lot of ATL, some don't. And in fact, we have a mixture of ATL, "straight" C++ and C# on my team, depending on what works best for a given task.
  • >>Yes, exactly. When I write COM programs, I'm essentially a C programmer. I use only a very small subset of the features of C++ when writing COM programs<<

    Soooo, I guess this begs the question. Are you a fan of OOP in general? It would be interesting to hear about which features of C++ you DO use. Any MI, templates? Or do you mold your C code around object oriented principles?

  • That's a big question which deserves an entry on its own. Let me just sum up by saying that OOP is a means, not an end. I am a big fan of OOP when it achieves my goals -- the efficient construction of large-scale software -- and opposed to it when it impedes those goals.

    As you know if you read my recent entries, I am very concerned about Object Happiness, the disease that strikes (typically novice) OOP programmers. It makes people write "Hello World" programs using virtual abstract base classes.

    To actually answer your question -- my team uses single inheritance a lot when modelling things that have clear IS A relationships. We are beginning to use C# generics more and more, but mostly as consumers of, not producers of parametrized types.

    I use multiple inheritance of concrete classes only very rarely and C++ templates even less often. Those things don't solve problems that I've got very well.
  • Also, pet peeve: to "beg the question" is to ANSWER a question in a circular manner, NOT to speak in such a way that encourages questions.

    For example, if you said "Why does Cindy Crawford look so good on those magazine covers?" and I answered "Obviously, because she's so photogenic!" then I have begged the question. I've answered the question in a way that because of its circularity, imparts no new information.
  • Unfortunately, "begs the question" is so commonly used to mean "speak in such a way that encourages questions" that for most situations, that is what it means. In like manner, "inflammable" should mean not flammable, but due to improper usage, its definition has changed.
  • That's proper usage. You are making an incorrect deduction based on a misleading ambiguity in Latin.

    In Latin, the prefix "in" usually means "opposite". This comes to us in English as the prefix "un". Most English opposites use the English form "un", obviously, but a few use the Latin form. ("That's indecent!" for example.)

    But "inflammable" clearly comes from "inflame", which does NOT mean the opposite of "flame". You think that if you get an inflammation of the ankle that it's going to feel cool, or hot?

    What's up with that? Where does this come from?

    It comes from the Latin PREPOSITION "in", which is used to INTENSIFY a word or phrase. So "inflammable" means "more intensely flammable".

    Leaving your incorrect etymology aside, more generally your position is known as the "descriptivist" position -- that the meaning of a word or phrase depends on what people intend it to mean, and that therefore you really can't get it "wrong". The opposite position is "prescriptivist" -- the belief that words have meanings and that they can be used wrong.

    Though I certainly appreciate that English is an evolving language with a rich and flexible idiom (see the Wordzguy blog of my colleague Mike Pope for some great examples), I'm also of the belief that C-A-T refers to a small housepet and E-L-E-P-H-A-N-T does not EVEN IF I say that "my elephant is getting into fights with the other neighbourhood elephants."

    There is value in preserving the distinction between subjects and objects, between flout and flaunt, between infer and imply, between disinterested and uninterested, enormity and enormousness...