How Do Script Engines Implement Object Identity?

I've talked a few times in this blog about the semantics of the equality operators in various languages. (Such as here, and here.) Recently a reader asked me how JScript implements object identity. That is, given two objects, how do we know if they are "the same object" or not?

For the remainder of this discussion I'm going to assume that we've got two non-null objects that we're comparing for identity. The details of how objects are compared to strings, numbers, etc., will have to wait for another blog entry.

Note also that in VBScript, the comparison operator for objects is Is, not =. Also, == and === in JScript behave identically if both arguments are objects; essentially the implementations of Is, == and === are all the same as far as object comparison goes.

Behind the scenes, everything in a VBScript/JScript program is represented as a variant. Most of the time, variants containing objects will have their variant type field set to VT_DISPATCH, and their pdispVal field set with a pointer to the object. (Occasionally we'll have a VT_UNKNOWN with a punkVal field instead, but let's ignore that corner case for now.) Suppose we want to compare two such objects for equality. We could pass the pdispVal of each to a helper function and compare the pointers.

bool AreObjectsEqual(IDispatch * pdisp1, IDispatch * pdisp2)
{
  if (pdisp1 == pdisp2)
    return true;
  return false;
}

And we're done. Wow, that was easy.

Except that it's wrong. In VBScript, those objects could be two different dispatch pointers to the same object because VBScript supports non-default dispatches. JScript does not support non-default dispatches, but in JScript there are also situations in which you can have two different dispatch pointers to the same object; we'll come to that later.

If it's not clear why the two pointers to the same object could be numerically different, read Raymond's article on how COM objects are laid out in memory, and you'll see why. There can be many dispatch vtables.

How can we ever determine when two COM objects are the same? We have to rely upon one of the most important rules of COM: if an object multiply inherits IUnknown then calling QueryInterface for IUnknown on any of those implementations must always give back the same pointer. As you can see in Raymond's example of multiple inheritance, there are two IUnknown vtables. It is illegal for each of them to return a pointer to itself when QI'd for IUnknown; the implementation must consistently pick one of them every time.
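Here's a toy sketch of that rule with no real COM in it, just plain C++ multiple inheritance. Everything in it is made up for illustration: Unknown, DispatchA, DispatchB and Widget are stand-ins rather than real interfaces, and QueryUnknown plays the role of calling QueryInterface for IID_IUnknown. It shows two numerically different pointers into one object, and how a consistently chosen "canonical" pointer restores identity.

```cpp
#include <cassert>

// Hypothetical stand-in for IUnknown. QueryUnknown() models
// QueryInterface(IID_IUnknown, ...) and returns the canonical pointer.
struct Unknown {
    virtual Unknown* QueryUnknown() = 0;
    virtual ~Unknown() = default;
};

// Two hypothetical "dispatch" interfaces, each deriving from Unknown.
struct DispatchA : Unknown {};
struct DispatchB : Unknown {};

// One object that exposes both, so it contains two Unknown subobjects.
struct Widget : DispatchA, DispatchB {
    Unknown* QueryUnknown() override {
        // The rule: always pick the same subobject, no matter which
        // interface the caller came in through. Here we pick DispatchA's.
        return static_cast<DispatchA*>(this);
    }
};

// The naive comparison: wrong when the pointers are different
// interfaces on the same object.
bool SameRawPointer(Unknown* a, Unknown* b) { return a == b; }

// The comparison that relies on the canonical-pointer rule.
bool SameIdentity(Unknown* a, Unknown* b) {
    return a->QueryUnknown() == b->QueryUnknown();
}

// Small self-check showing the difference between the two comparisons.
void DemoIdentity() {
    Widget w;
    Unknown* viaA = static_cast<DispatchA*>(&w);
    Unknown* viaB = static_cast<DispatchB*>(&w);
    assert(!SameRawPointer(viaA, viaB)); // different addresses, one object
    assert(SameIdentity(viaA, viaB));    // canonical pointers agree
}
```

If Widget::QueryUnknown returned whichever subobject it was called through, SameIdentity would be no better than SameRawPointer, which is exactly why COM forbids that.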

So now it's pretty clear what we must do:

bool AreObjectsEqual(IDispatch * pdisp1, IDispatch * pdisp2)
{
  IUnknown * punk1 = NULL;
  IUnknown * punk2 = NULL;
  if (pdisp1 == pdisp2)
    return true;
  if (pdisp1 == NULL || pdisp2 == NULL)
    return false;
  pdisp1->QueryInterface(IID_IUnknown, (void**)&punk1);
  pdisp2->QueryInterface(IID_IUnknown, (void**)&punk2);
  // This should never fail, but better safe than sorry.
  if (punk1 != NULL)
    punk1->Release();
  if (punk2 != NULL)
    punk2->Release();
  // We're not dereferencing the pointers, so it is OK to use the
  // locals even after their release.
  if (punk1 == punk2)
    return true;
  return false;
}

Unfortunately, that's not the whole story either.

For security reasons, Internet Explorer sometimes creates proxy objects -- little objects that detect if they are being used safely, and if they are, then they forward the call to the real implementation, on a separate object. You can run into situations in IE where you have two proxy objects to the same "real" object -- as far as COM is concerned, all three are different objects, but as far as IE is concerned, that's an implementation detail of the security system which we would like to hide from the script users.

To solve this problem, IE implements a special interface that knows when two IUnknown objects are both proxies for the same object. The script engines actually do something like this:

bool AreObjectsEqual(IDispatch * pdisp1, IDispatch * pdisp2) {
  IUnknown * punk1 = NULL;
  IUnknown * punk2 = NULL;
  // Must start out NULL, because it is checked during cleanup below.
  IObjectIdentity * pObjectIdentity = NULL;
  bool fRet = false;
  HRESULT hr;
  if (pdisp1 == pdisp2)
    fRet = true;
  else if (pdisp1 != NULL && pdisp2 != NULL) {
    // This must always succeed.
    pdisp1->QueryInterface(IID_IUnknown, (void**)&punk1);
    pdisp2->QueryInterface(IID_IUnknown, (void**)&punk2);
    if (punk1 == punk2)
      fRet = true;
    else if (punk1 != NULL && punk2 != NULL) { // This should never be false.
      // Objects that do not implement IObjectIdentity fail this QI,
      // and we fall through with fRet still false.
      hr = punk1->QueryInterface(IID_IObjectIdentity, (void **)&pObjectIdentity);
      if (SUCCEEDED(hr) && pObjectIdentity != NULL) {
        // Anything other than S_OK is treated as "not equal".
        hr = pObjectIdentity->IsEqualObject(punk2);
        if (hr == S_OK)
          fRet = true;
      }
    }
  }
  if (pObjectIdentity != NULL)
    pObjectIdentity->Release();
  if (punk1 != NULL)
    punk1->Release();
  if (punk2 != NULL)
    punk2->Release();
  return fRet;
}

Which is a heck of a lot of code just to determine if two pointers are equal, but that's the tax you pay for object-oriented programming. Of course, you don't need to do any of this rigamarole with IObjectIdentity unless you're writing code that messes around with IE objects.
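If the proxy case is hard to picture, here's a toy model of it in plain C++, again with no real COM or IE code. Object, RealObject and Proxy are hypothetical names, and the virtual IsEqualObject method only loosely imitates what IObjectIdentity does; I'm assuming a proxy simply remembers which real object it forwards to. Treat it as a sketch of the idea, not a description of IE's actual implementation.

```cpp
#include <cassert>

// Hypothetical base type. IsEqualObject() loosely models asking an
// object whether it considers itself "the same" as another object.
struct Object {
    virtual ~Object() = default;
    virtual bool IsEqualObject(const Object* other) const {
        return this == other; // ordinary objects only equal themselves
    }
};

struct RealObject : Object {};

// A proxy forwards to a real object (forwarding itself is omitted here).
struct Proxy : Object {
    explicit Proxy(const RealObject* target) : target_(target) {}

    bool IsEqualObject(const Object* other) const override {
        if (this == other)
            return true;
        // Two distinct proxies are "equal" if they wrap the same target.
        const Proxy* p = dynamic_cast<const Proxy*>(other);
        return p != nullptr && p->target_ == target_;
    }

private:
    const RealObject* target_;
};

// Same shape as the article's helper: pointer check first, then ask
// the object itself.
bool AreObjectsEqual(const Object* a, const Object* b) {
    if (a == b)
        return true;
    if (a == nullptr || b == nullptr)
        return false;
    return a->IsEqualObject(b);
}

// Small self-check: two proxies to one real object compare equal.
void DemoProxies() {
    RealObject real;
    Proxy p1(&real), p2(&real);
    assert(&p1 != &p2);                // distinct objects in memory
    assert(AreObjectsEqual(&p1, &p2)); // but the same object to a script
}
```

The point of the design is that only the proxies know what they wrap, so the comparison has to be delegated to them rather than decided from the outside.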

  • Ah, so *that's* how to use IObjectIdentity... Gotta love those obscure interfaces that are "documented" but so minimally they're hardly even usable...
  • Yeah, the doc is pretty unclear. That's what I'm here for!
  • Compared to the plain C version, return lhs == rhs;, you're paying 2 lines of tax to C++ and 25 lines of tax to COM. More if you want to use { } for single line if statements. That's a heck of a lot for a 1 line function! Automatic memory management would let you reclaim 8 lines paid to COM.

    I wonder how many processor cycles this translates into in the end.
  • OK, how would you implement object equality in a world with C++-style multiple inheritance?

    The COM part of the tax is in managing the reference counts, but having some kind of mechanism whereby the default vtable can be determined seems to me to be inherent in multiple inheritance languages.
  • I'd relocate to a nicer world! One with tropical beaches, no or single inheritance, and simple == object equality. I may miss out on some neat features of course, but I enjoy seeing the sun set over the palm trees.

    The common user scenario for JScript of simple scripts in web pages can be done really simply. In C, or Haskell, or Fortran IV, or whatever language floats one's boat. Enterprise administrators controlling machine policy by scripting ActiveX controls is the fringe scenario. If you want that, you have to live in the C++/COM world with all its complexity. The last 1% is taking an extra 2700% work here (and probably more to design and debug). I'm sure Microsoft has thick binders detailing the exciting business reasons for doing this, but the point I'm trying to make is that it's a lot of work to do so.

    In the article you said "that's the tax you pay for object oriented programming". But that's not a tax you're paying to OOP (necessarily). It's possible to write wonderfully OOPy code that just runs simple scripts in web pages and gets away with == equality. You're paying because you want to support big features.
  • "that's the tax you pay for object oriented programming".

    That's the tax you pay for using COM, which is allowing such things as "could be two different dispatch pointers to the same object".
