SimpleScript Part Two: Class Factories Are Also Boring

Before I get into it, a Lambda poster pointed me at the NullScript project, which is a very interesting illustration of how reverse engineering works. It's an implementation of a "null" script engine -- an engine with no language -- in ATL, which the intrepid developer created in order to try and understand how ASP works. Typical programmer! They could have just asked me, but where's the fun in that? :)

I want to go well beyond the scope of the NullScript project, in several ways. 

First off, someone asked why I'm not using ATL. Aside from my general distaste for the ATL style of programming, the more fundamental reason is that ATL is all about hiding the details of how COM works from you, and one of the points of this exercise is to show you exactly those details. In ATL, you see

    COM_INTERFACE_ENTRY(IDispatch)

And what does that tell you?  It's just a black box, and when you open it up, it's full of weird macros that I don't understand.  I'd much rather show you guys how this works at a much less abstract level.
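For comparison, here is roughly what an interface map entry like COM_INTERFACE_ENTRY(IDispatch) accomplishes when you write it by hand: a QueryInterface that compares the requested interface ID against the ones the object supports, hands back a suitably cast this pointer, and AddRefs it. This is a portable sketch, not ATL's actual expansion (ATL builds a table that a shared helper walks). HRESULT, IID, IUnknown and IDispatch are simplified stand-ins for the Windows SDK types (real IIDs are 128-bit GUIDs), and ScriptObject is a hypothetical class.

```cpp
#include <cstdint>

// Portable stand-ins for Windows SDK types (assumption: not <windows.h>).
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_NOINTERFACE = static_cast<HRESULT>(0x80004002u);
constexpr HRESULT E_POINTER = static_cast<HRESULT>(0x80004003u);

// Toy interface IDs; real COM uses GUIDs.
enum class IID { IUnknown, IDispatch, IActiveScript };

struct IUnknown {
    virtual HRESULT QueryInterface(IID riid, void** ppv) = 0;
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
};
struct IDispatch : IUnknown { /* dispatch methods omitted */ };

// Hypothetical object that supports IUnknown and IDispatch only.
class ScriptObject final : public IDispatch {
    unsigned long refs_ = 1;  // the creator holds the first reference
public:
    HRESULT QueryInterface(IID riid, void** ppv) override {
        if (ppv == nullptr)
            return E_POINTER;
        *ppv = nullptr;  // a failed QI must leave the out pointer null
        // One comparison per supported interface: this is the hand-written
        // equivalent of the entries in an ATL interface map.
        if (riid == IID::IUnknown || riid == IID::IDispatch)
            *ppv = static_cast<IDispatch*>(this);
        else
            return E_NOINTERFACE;
        AddRef();  // the caller owns the returned reference
        return S_OK;
    }
    unsigned long AddRef() override { return ++refs_; }
    unsigned long Release() override {
        unsigned long remaining = --refs_;
        if (remaining == 0)
            delete this;
        return remaining;
    }
};
```

Nothing here is hidden: the set of supported interfaces is just the list of comparisons, and the reference-counting rule (AddRef on success, null out pointer on failure) is visible in the code.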

Second, the point of NullScript was to have the smallest feature set that still worked, because it was being used as a probing tool, not a language tool. I want to actually talk about practical concerns here, like good language design and how to implement IDispatch correctly, not just write a logging tool.

Today, more boilerplate code.  I've added a class factory.

For those of you who don't know much about COM internals, you might wonder how instantiating a COM object works.  Basically, it goes like this.

First, there's a progid -- the human-readable string that describes the object you want to create.  As I mentioned yesterday, we create registry keys for the progid that map it to a class id…

HKCR\SimpleScript\CLSID\(Default) = "{...}"

… and then map that class id to a DLL…

HKCR\CLSID\{...}\InprocServer32\(Default) = "c:\simplescript.dll"

Thus the registry holds everything COM needs to get the code into memory: the unique identifier for the class and the location of the code that implements it.
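The two lookups can be sketched as a toy model in which a std::map stands in for HKEY_CLASSES_ROOT, keyed by path and holding each key's (Default) value. ResolveServer is a hypothetical helper, and any CLSID string you feed it is a placeholder, not SimpleScript's real class ID. On Windows, the first hop is what CLSIDFromProgID performs.

```cpp
#include <map>
#include <optional>
#include <string>

// Toy stand-in for HKEY_CLASSES_ROOT: key path -> (Default) value.
using Registry = std::map<std::string, std::string>;

// Follows progid -> CLSID -> InprocServer32 and returns the DLL path,
// or std::nullopt if either mapping is missing.
std::optional<std::string> ResolveServer(const Registry& hkcr,
                                         const std::string& progid) {
    // First hop: HKCR\<progid>\CLSID (Default value)
    auto clsid = hkcr.find(progid + "\\CLSID");
    if (clsid == hkcr.end())
        return std::nullopt;
    // Second hop: HKCR\CLSID\<clsid>\InprocServer32 (Default value)
    auto server = hkcr.find("CLSID\\" + clsid->second + "\\InprocServer32");
    if (server == hkcr.end())
        return std::nullopt;
    return server->second;
}
```

Loaded with the two entries from the text (and a made-up CLSID), ResolveServer(hkcr, "SimpleScript") yields the path c:\simplescript.dll, while an unregistered progid yields no path at all.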

To create the object, COM loads the DLL and calls DllGetClassObject in dllmain.cpp. It doesn't ask for an object instance, as you might expect. Instead, it asks for a class factory -- an object that creates objects. The class factory in classfac.cpp then knows how to create the actual object. Why this indirection? Because DllGetClassObject is fundamentally not extensible, but COM objects are -- you can add any interface you like to them -- so the convention is to create a COM object that does the creation work rather than put more smarts into the entry point. For example, the class factory might implement IClassFactory2, which enables licensing semantics.
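The indirection itself fits in a few lines. In this portable sketch, CLSIDs are strings, ownership is modeled with std::unique_ptr instead of AddRef/Release, and Engine, ClassFactory and CreateViaFactory are hypothetical. The real DllGetClassObject takes a REFCLSID, a REFIID and a void** and hands back an interface pointer; this stand-in keeps only the shape: the entry point returns a factory, and the caller asks the factory for the object. The factory here succeeds so that the whole path runs; in the code as described, engine creation still returns E_NOTIMPL.

```cpp
#include <cstdint>
#include <memory>
#include <string>

using HRESULT = std::int32_t;  // portable stand-in
constexpr HRESULT S_OK = 0;
constexpr HRESULT CLASS_E_CLASSNOTAVAILABLE = static_cast<HRESULT>(0x80040111u);

using CLSID = std::string;                                 // toy CLSID
const CLSID kEngineClsid = "{placeholder-engine-clsid}";   // hypothetical value

struct Engine {};  // hypothetical engine object

// Hypothetical factory: the only thing that knows how to build an Engine.
struct ClassFactory {
    HRESULT CreateInstance(std::unique_ptr<Engine>& out) {
        out = std::make_unique<Engine>();
        return S_OK;
    }
};

// Entry point stand-in: hands out a factory, never an engine.
HRESULT DllGetClassObject(const CLSID& rclsid, std::unique_ptr<ClassFactory>& out) {
    out.reset();
    if (rclsid != kEngineClsid)
        return CLASS_E_CLASSNOTAVAILABLE;
    out = std::make_unique<ClassFactory>();
    return S_OK;
}

// The two steps a CoCreateInstance-style caller performs.
HRESULT CreateViaFactory(const CLSID& rclsid, std::unique_ptr<Engine>& out) {
    std::unique_ptr<ClassFactory> factory;
    HRESULT hr = DllGetClassObject(rclsid, factory);
    if (hr < 0)  // FAILED(hr)
        return hr;
    return factory->CreateInstance(out);
}
```

Because the factory is an ordinary object, features like licensing can be added as extra interfaces on it without touching the DLL entry point.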

If you take a look at the implementation, you'll see that we have your basic class factory going here.  There's nothing fancy.

You might wonder what the locking mechanism (IClassFactory::LockServer) is for. It's for the scenario where you know that you're going to be creating a whole bunch of objects and need to ensure that the DLL isn't unloaded unnecessarily in between. That's an extremely unlikely scenario for us, but I've implemented the code anyway because it's cheap, easy and the right thing to do.
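A minimal sketch of how a lock count typically interacts with unloading, assuming module-wide counters for outstanding locks and live objects (g_cLocks and g_cObjects are hypothetical names). LockServer here is a free-function stand-in for the IClassFactory method, and std::atomic replaces the Interlocked functions a Windows build would use. DllCanUnloadNow reports S_OK only when nothing is locked or alive, S_FALSE otherwise.

```cpp
#include <atomic>
#include <cstdint>

using HRESULT = std::int32_t;  // portable stand-in
constexpr HRESULT S_OK = 0;
constexpr HRESULT S_FALSE = 1;

// Hypothetical module-wide counters.
std::atomic<long> g_cLocks{0};    // LockServer(true) calls not yet balanced
std::atomic<long> g_cObjects{0};  // live objects, factories included

// Stand-in for IClassFactory::LockServer(BOOL fLock).
HRESULT LockServer(bool fLock) {
    if (fLock)
        ++g_cLocks;
    else
        --g_cLocks;
    return S_OK;
}

// COM calls this to ask whether the DLL may be unloaded.
HRESULT DllCanUnloadNow() {
    return (g_cLocks == 0 && g_cObjects == 0) ? S_OK : S_FALSE;
}
```

A client that plans to create many objects calls LockServer(true) once, creates them, and calls LockServer(false) when done, so the DLL stays loaded in between even when no objects happen to be alive.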

As you can see, all of the objects are thread safe so far.  We'll get into the threading model of the script engine in more detail later, but as a refresher, you might want to read this.
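For reference-counted objects, thread safety mostly comes down to making AddRef and Release atomic, so that two threads releasing at once cannot both miss (or both see) the transition to zero. A portable sketch, with std::atomic in place of InterlockedIncrement and InterlockedDecrement; RefCounted and HammerRefCount are hypothetical names, and the live counter exists only so the behavior can be checked.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical COM-style reference-counted object.
class RefCounted final {
    std::atomic<unsigned long> refs_{1};  // the creator holds the first reference
public:
    static inline std::atomic<int> live{0};  // objects not yet destroyed
    RefCounted() { ++live; }
    ~RefCounted() { --live; }

    unsigned long AddRef() { return ++refs_; }  // atomic read-modify-write
    unsigned long Release() {
        unsigned long remaining = --refs_;  // each thread sees its own result
        if (remaining == 0)
            delete this;                    // exactly one thread observes zero
        return remaining;
    }
};

// Runs balanced AddRef/Release pairs on several threads at once.
void HammerRefCount(RefCounted* obj, int threads, int iterations) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t)
        workers.emplace_back([obj, iterations] {
            for (int i = 0; i < iterations; ++i) {
                obj->AddRef();
                obj->Release();
            }
        });
    for (auto& w : workers)
        w.join();
}
```

With a plain unsigned long counter, the concurrent increments and decrements could lose updates, freeing the object early or leaking it; the atomic version leaves exactly the creator's reference behind.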

The class factory does everything except actually create the engine; that part still returns E_NOTIMPL.
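At this stage, a factory's creation method can be little more than argument checking followed by E_NOTIMPL. The sketch below mirrors the parameter shape of IClassFactory::CreateInstance (outer unknown, requested interface ID, out pointer) using portable stand-in types; CreateEngineInstance is a hypothetical name, and refusing aggregation with CLASS_E_NOAGGREGATION is an assumption about the engine, not something stated above.

```cpp
#include <cstdint>

// Portable stand-ins (assumption): real types come from the Windows SDK.
using HRESULT = std::int32_t;
constexpr HRESULT E_NOTIMPL = static_cast<HRESULT>(0x80004001u);
constexpr HRESULT E_POINTER = static_cast<HRESULT>(0x80004003u);
constexpr HRESULT CLASS_E_NOAGGREGATION = static_cast<HRESULT>(0x80040110u);
struct IUnknown;                              // opaque: only pointers are passed
enum class IID { IUnknown, IActiveScript };   // toy interface IDs

// Same parameter shape as IClassFactory::CreateInstance.
HRESULT CreateEngineInstance(IUnknown* pUnkOuter, IID riid, void** ppvObject) {
    if (ppvObject == nullptr)
        return E_POINTER;
    *ppvObject = nullptr;              // leave the out pointer in a known state
    if (pUnkOuter != nullptr)
        return CLASS_E_NOAGGREGATION;  // assumption: no aggregation support
    (void)riid;                        // will select the interface once the engine exists
    return E_NOTIMPL;                  // the engine itself is not written yet
}
```

The host sees the E_NOTIMPL from this last line, which is why the WSH run stops with an "unimplemented function" error.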

How far can we get now?  If we compile up the code, register the DLL and run it through WSH:

<job>
<script language="SimpleScript">
Testing
</script>
</job>

Then we get the DLL loaded, the class factory created, and a call to create the engine, which fails:

Windows Script Host: An unimplemented function was called : SimpleScript

We're doing pretty well so far, but we're still a long way from computing 2 + 2 or writing "hello world".

Next time, I'll talk about the engine and site interfaces, engine state, and a skeleton of the engine interfaces, which I'll then flesh out over several entries.

  • Great series of articles. Keep them coming:)
  • These entries come just when I'm thinking about implementing an engine. I'll be following these to completion, much appreciated :)
  • I have something of a "newbie" question about the class factory interface. I realize that the function signatures in ClassFactory are probably predetermined by COM, but...why are there so many parameters of type (ClassFactory **) and (void **) instead of (ClassFactory *&) and (void *&)?

    I realize that using references instead of pointers for "out" parameters wouldn't eliminate the need to call AssertOutPtr all over the place, but it should at least discourage other programmers from calling these functions with bad parameters, shouldn't it? And it would make the implementations a bit more straightforward to boot.
  • It is very, very rare to see C++ references in COM classes. I'm not sure why that is, but I could make some educated guesses.

    The first thing that comes to mind is that since COM is a binary interface, you have to be able to make it work with both C and C++. We know that under the covers, a reference is a pointer which is constrained by the compiler semantics to always point to valid memory. How does one generate a C struct which enforces that constraint? You can't.

    Really, the notion of "this argument must be a non-null pointer" should have been baked into the COM type system in the beginning, but unfortunately it wasn't.

    Like I said in a comment to another posting, basically I think of writing COM code as though I'm writing C code with a few C++ features thrown in for convenience.
  • >We know that under the covers, a reference is a pointer which is constrained by the compiler semantics to always point to valid memory.

    Oh, if only. Just for kicks, compile the following (legal, unless I'm mistaken) C++ code...

    int foo(int & intref){
        return intref;
    }
    int * pint = NULL;
    int bar = foo(*pint);

    ...and see where the segfault occurs: at the call to foo() (where it should in an ideal world) or in the implementation of foo().

    I have no illusions that C++ references are any safer than C++ pointers, I just think they make for slightly nicer code.

    >Like I said in a comment to another posting, basically I think of writing COM code as though I'm writing C code with a few C++ features thrown in for convenience.

    Point taken.
  • Ignore my useless comment on Part 1... it was DllGetClassObject that was not doc-ed on the day(s) I checked...
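To make the pointer-versus-reference question in the comments concrete, here is a sketch of the usual COM out-parameter discipline next to the reference-based signature the commenter proposes. Widget, GetWidget and GetWidgetRef are hypothetical, and HRESULT and E_POINTER are portable stand-ins. The pointer version can detect a null argument and report E_POINTER; the reference version has nothing it can check, and it cannot be declared from C at all.

```cpp
#include <cstdint>

using HRESULT = std::int32_t;  // portable stand-in
constexpr HRESULT S_OK = 0;
constexpr HRESULT E_POINTER = static_cast<HRESULT>(0x80004003u);

struct Widget { int value = 42; };  // hypothetical object type

// COM style: the callee validates the out pointer before writing through it.
HRESULT GetWidget(Widget** ppWidget) {
    if (ppWidget == nullptr)
        return E_POINTER;
    *ppWidget = new Widget();
    return S_OK;
}

// Reference style: cleaner to call from C++, but no null check is possible,
// and a C client has no way to express this signature.
void GetWidgetRef(Widget*& pWidget) {
    pWidget = new Widget();
}
```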
