VBScript Terminators, Part Two


You guys came up with good answers to three of my four questions, which is about what I expected; question 2 was pretty hard.

To sum up:

QUESTION #1: Why does the termination logic go terminate, terminate, terminate, clear, clear, clear, instead of terminate and clear, terminate and clear, terminate and clear?

Because if the second object to be terminated has a terminator that accesses a property of the first object to be terminated and cleared, it will fail, which seems bad. We want to run all the terminators while the objects are still in good shape, and then blow them all away.
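To make the ordering argument concrete, here is a minimal C++ sketch. The ScriptObject type, its fields and both teardown functions are hypothetical stand-ins invented for illustration, not the engine's actual code. Each object's terminator records whether the peer object it reads was still intact when the terminator ran.

```cpp
#include <vector>

// Hypothetical stand-in for a script object; not the engine's real type.
struct ScriptObject {
    bool cleared = false;           // true once the object's state is blown away
    ScriptObject* peer = nullptr;   // object whose property this terminator reads
    bool sawIntactPeer = false;     // what the terminator observed

    void RunTerminator() { sawIntactPeer = (peer != nullptr) && !peer->cleared; }
    void Clear() { cleared = true; }
};

// Interleaved order: terminate and clear each object in turn.
void TeardownInterleaved(const std::vector<ScriptObject*>& objects) {
    for (ScriptObject* o : objects) {
        o->RunTerminator();
        o->Clear();
    }
}

// Two-phase order: run every terminator first, then clear everything.
void TeardownTwoPhase(const std::vector<ScriptObject*>& objects) {
    for (ScriptObject* o : objects) o->RunTerminator();
    for (ScriptObject* o : objects) o->Clear();
}
```

With two objects where the second one's terminator reads the first, the interleaved order lets the second terminator see an already-cleared peer, while the two-phase order does not.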

QUESTION #3: Why do we want to ensure that terminators don’t run twice?

Imagine a terminator which writes a "logging complete" message to a log file; you don't want it to run twice.
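One simple way to get the "runs at most once" guarantee is a flag that is set before the terminator body is invoked. The sketch below is an assumption about how such a guard could look; TerminatorGuard, RunOnce and m_fRan are made-up names, not the engine's.

```cpp
#include <functional>
#include <utility>

// Hypothetical run-once wrapper around a terminator body.
class TerminatorGuard {
public:
    explicit TerminatorGuard(std::function<void()> body)
        : m_body(std::move(body)) {}

    // Runs the terminator body at most once; returns true only if it ran.
    bool RunOnce() {
        if (m_fRan) return false;
        m_fRan = true;   // set before the call so a re-entrant call is also a no-op
        if (m_body) m_body();
        return true;
    }

private:
    std::function<void()> m_body;
    bool m_fRan = false;
};
```

Setting the flag before calling the body matters: if the body somehow triggers another attempt to terminate the same object, that nested attempt does nothing.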

QUESTION #4: Why do we run the garbage collector at the end of every statement, instead of only at the end of every procedure/global block?

Because, though local variables are not going to go out of scope, temporary anonymous slots are. Consider

x = MyFunc("a" + s, new Foo)

That's going to allocate one temporary slot for the string and one for the object. The temporary string can be cleared whenever, but the temporary object should be released ASAP so that its terminator runs ASAP. We therefore clean up all temporaries after every statement.

That leaves

QUESTION #2: In what scenario can a bad implementation crash the process and/or terminate the object twice?

When I first wrote the termination logic, the release code looked like this:

ULONG VBSClassInstance::Release(){
  --this->m_cRef;
  if (this->m_cRef == 0)
  {
    this->RunTerminator();
    delete this;
    return 0;
  }
  return this->m_cRef;
}

Looks like a perfectly straightforward implementation, right? But what if some bozo does this?

Dim Global
Class Foo
  Private Sub Class_Terminate()
    Set Global = Me
  End Sub
End Class
Sub Blah
  Dim Local
  Set Local = New Foo
End Sub
Blah
Set Global = Nothing

When Local goes out of scope, the terminator runs and sets the Global variable to the object which has just been terminated! Therefore it must live. We must write the VBSClassInstance::RunTerminator method to ensure that the terminator doesn't run twice, but that's the least of our problems. Look at the implementation of Release above carefully. When the terminator runs, the ref count will go back up to one, but we still delete the object! The script engine now has a global variable containing a pointer to deleted memory; this will crash the process, corrupt the heap, who knows what?

OK, so what if we go

ULONG VBSClassInstance::Release(){
  --this->m_cRef;
  if (this->m_cRef == 0)
  {
    this->RunTerminator();
  }
  if (this->m_cRef == 0)
  {
    delete this;
    return 0;
  }
  return this->m_cRef;
}

Is that better? Well, sure, it's better, but it's still wrong. Forget globals; consider this:

  Private Sub Class_Terminate()
    Dim TermLocal
    Set TermLocal = Me
  End Sub


Now what happens? The "final" release when Local goes out of scope sets the ref count to zero and calls the terminator. The terminator increases the ref count to one by assigning Me to TermLocal. Then when TermLocal goes out of scope, it calls Release on the object. We've now got a re-entrant Release method! The "inner" Release detects that the ref count has gone to zero and deletes the object. Then the "outer" Release reads from the now-invalid "this" pointer. Assuming that doesn't crash, it then probably corrupts the heap by deleting the object a second time.

The correct logic looks something like this:

ULONG VBSClassInstance::Release(){
  --this->m_cRef;
  if (this->m_cRef == 0)
  {
    ++this->m_cRef;
    // protects against re-entrant final release

    this->RunTerminator();
    --this->m_cRef;
    if (this->m_cRef == 0)
    {
      delete this;
      return 0;
    }
  }

  return this->m_cRef;
}
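To see that logic survive both troublesome terminators, here is a self-contained sketch that can be compiled and run outside the engine. Everything around Release is an assumption made for illustration: the Instance class, the callback standing in for Class_Terminate, the static counters, g_global and the two terminator functions are all hypothetical, and the run-once flag is one guess at how RunTerminator could guard itself.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for VBSClassInstance; the callback plays the role of
// the script's Class_Terminate.
class Instance {
public:
    explicit Instance(std::function<void(Instance*)> terminator)
        : m_terminator(std::move(terminator)) {}

    unsigned long AddRef() { return ++m_cRef; }

    unsigned long Release() {
        --m_cRef;
        if (m_cRef == 0) {
            ++m_cRef;          // guard reference: a nested Release cannot hit zero
            RunTerminator();
            --m_cRef;          // drop the guard
            if (m_cRef == 0) {
                delete this;
                return 0;
            }
        }
        return m_cRef;
    }

    static int s_terminatorRuns;   // instrumentation for this sketch only
    static int s_deletes;

private:
    ~Instance() { ++s_deletes; }

    void RunTerminator() {
        if (m_fTerminated) return;  // never run the terminator twice
        m_fTerminated = true;
        ++s_terminatorRuns;
        if (m_terminator) m_terminator(this);
    }

    unsigned long m_cRef = 1;       // the creator holds the first reference
    bool m_fTerminated = false;
    std::function<void(Instance*)> m_terminator;
};

int Instance::s_terminatorRuns = 0;
int Instance::s_deletes = 0;

Instance* g_global = nullptr;       // plays the role of the script's Global

// Terminator that does "Set TermLocal = Me" and lets TermLocal go out of scope.
void TermLocalTerminator(Instance* me) {
    me->AddRef();
    me->Release();   // re-entrant Release; the guard keeps the count above zero
}

// Terminator that does "Set Global = Me", resurrecting the object.
void GlobalTerminator(Instance* me) {
    me->AddRef();
    g_global = me;
}
```

Releasing an object created with TermLocalTerminator runs the terminator once and deletes the object once, from the outer Release only. Releasing one created with GlobalTerminator leaves it alive with a count of one; the later release of g_global deletes it without running the terminator again.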

Writing correct shutdown logic is surprisingly tricky!

 

  • But the question is, how do you signal to the script engine in a host environment, as WScript.Quit does, that you want to clean up all global variables by running their terminators and then freeing their memory?
  • So if you didn't have compatibility to worry about, how would you like to handle object resurrection in a scripting language? Prohibit it entirely, finalizer called once per object, finalizer called every time object becomes unreachable? Is there an alternative more "scripty" than the others?
  • Well, like all language features, finalizers are tools. You pick your tool based on what job you want to accomplish.

    The people who designed the .NET GC had the problem of designing a GC that would work well with any language, but were particularly interested in application implementation languages like C# and VB.

    The .NET logic isn't exactly the same as the VBScript logic -- finalizers for object Beta can run even if Beta has a ref on Alpha and Alpha has already been finalized. But it is similar in several regards, including the fact that resurrected objects are only finalized once.

    This page has lots of interesting information on .NET finalizers.

    http://msdn.microsoft.com/library/en-us/cpref/html/frlrfSystemObjectClassFinalizeTopic.asp

    If I were designing a scripting language today, I wouldn't include object finalization at all. You have a resource that needs cleaning up, script your own darn cleanup code.

    Heck, if I were designing a scripting language today it probably wouldn't even be object-based if I could help it. ML-style lists and tuples are a pretty darn scripty way to write programs.
  • Enigma2e: The finalizers are triggered when the engine is closed or reset. That is, when:

    * IActiveScript::Close is called

    * SetScriptState to UNINITIALIZED

    * SetScriptState to INITIALIZED

    So there you go. WScript.Quit closes the engine because of course it is not going to use it again. If you want to re-use the engine you can do a full or partial reset of the engine and it will throw away the existing object state as best it can.
  • Ugh, what a funny way of thinking about it.

    // We run the terminator when the last reference is released.
    if (m_cRef == 1)
      RunTerminator();

    // Now release the object.
    int refs = --m_cRef;
    if (!refs)
      delete this;
    return refs;
  • Indeed, that would be a cleaner way to write this code. Clearly what happened to this code was that it evolved exactly as I described -- it was written wrong several times and then patched up, and is now a little crufty as a result.

    Which illustrates a good point: when you fix a bug, consider not just whether the fix is locally correct, but also whether re-jigging the function to make the whole thing cleaner is warranted. Sometimes it is, sometimes the added risk isn't worth it.

    In this particular case, the resulting code isn't so bad that it's an unreadable bug farm. When I started on the VBScript team, the DateDiff code had a bug in it. I investigated the history of changes to that function and discovered that there had been six bugs entered against it over the years (in the VB runtime library code base) and sure enough, the function had six special cases in it. Every time someone found an anomaly, they just stuck an if-then in there to make it better.

    Rather than stick a seventh, I threw away all the code and started over with a completely different approach, reducing it to four lines of obviously correct code instead of a dozen lines of unreadable mess. And now I ask how to implement that function as an interview question!
  • I admit that's a very tough call to make. It depends not just on the cost of testing the "more extensive" change, but also on the savings from not doing it until later (the "interest" accumulated on the unspent effort or from investing the effort elsewhere, if you will) vs. the risk that someone will take longer to understand the code next time.

    I was looking at it from another perspective though. Developers in general (myself included) are notorious for "solving the solution" when we really need to solve the problem. By thinking of it as you described ("how do we fix the case where ...") the answer was "we have to check the reference count after the operation instead". It would have helped to recognise that the problem was "We're running a method on an object that's effectively been deleted!" and reorder the code to avoid that. Maybe the distinction's only in my mind of course :)

    (When I first read the article, I wondered if the backstory was just artistic license - most people don't seem to write so fluently when they're recounting true stories instead of setting the scene for something fictional. Sorry for the unnecessary comments and thanks for clarifying.)
  • What *are* "ML-style lists and tuples"??? and what is an example of them?
  • New Jersey Standard ML is that strange beast, the practical research language. It's statically typed and has a very powerful type inference system (so you don't NEED to declare specific types when you declare, say, arguments to a function, but you still get the benefits of compile-time types). It's garbage collected. It supports both functional and imperative programming. It's designed for programming in the large.

    So far, that sounds like JScript .NET. One of the interesting things that sets NJSML apart is that the language strongly encourages recursive solutions by providing a very natural syntax for pattern-matching recursive data structures.

    Anyway, one of the basic patterns for making types in NJSML is the tuple:

    val triple : int * real * string = (2, 3.14, "hello")

    Here we have a variable "triple" of type "tuple of int, real and string" that is bound to a specific member of that tuple type.

    Tuples are like structs in C -- they have a strict hierarchical structure:

    val pair_pair : (int * real) * (string * string) = ((1,2.0), ("hello", "goodbye"))

    This is not a quadruple, this is a pair of pairs. The structure matters.

    Notice that, unlike C, the members of the structure can be anonymous -- you can drill down into the contents of the tuple via pattern matching. We could name them if we wanted. And we could also have a "names only" tuple which is called a "record":

    type complex = { re : real, im : real }
    val myComplex : complex = {re=1.2, im=2.1}

    A given tuple _always_ has the same number of members, and those members can be of any given type. An NJSML list, by contrast, has any number of members all of the same type. The empty list is written "nil".

    Lists are specified recursively: a list is either nil, or a head element followed by a tail list. You can therefore define pattern matching overloads for functions that manipulate lists. Suppose you want to compute the length of a list. The length of the nil list is zero, and the length of the list (h::t) -- that is, a head followed by a tail list -- is the length of the tail plus 1 for the head:

    fun length nil = 0
    | length (h::t) = 1 + length t

    So if you then called the function

    length [2,4]

    the pattern matcher would say that's 2::[4], so add 1 to length [4]; that's 4::nil, so add 1 to length nil; that's zero, so the answer is 2.

    Notice that this function has only partially specified type. It takes any list, and the compiler deduces that it always returns integers, so that + is well-defined.

    NJSML is a really neat language, and I've just barely touched on it. There are lots of interesting things you can do with its type system.



