T4: VBScript and the Terminator

I had some free time on the flight to Canada and my laptop with me, so maybe I will do a little blogging on my vacation after all.

A week ago or so a reader asked me to talk a bit about how class termination works in VBScript.  Let’s start by looking at a simple case:

Class Foo
   Public Name
   Public Other  
   Private Sub Class_Terminate
       MsgBox Name & ": Goodbye, world!"
   End Sub
End Class

As you can see, the class has an “event handler” which is executed when an instance is just about to be released.  (I put scare quotes on there because, though it has the form of an event handler, in fact the underlying mechanism that runs the terminator is completely distinct from the other event handling code in VBScript.)

JScript, as you may know, has a nondeterministic mark-and-sweep garbage collector.  It’s “nondeterministic” because it runs pretty much when it feels like it. There is a scheduling algorithm, but from the perspective of the program, collections are effectively random; it’s quite tricky to predict when one is going to run.  And it’s “mark and sweep” because on every collection it runs through every script-allocated memory block, checks whether it is still accessible by the script, and frees it if it isn’t.  This makes it mostly immune to the “circular reference” problem, where two or more otherwise dead objects reference each other.

VBScript’s garbage collector is completely different.  It runs at the end of every statement and procedure, and does not do a search of all memory.  Rather, it keeps track of everything allocated in the statement or procedure; if anything has gone out of scope, it frees it immediately.  We can see this in action by watching terminators run.

Let’s consider a simple example.

Sub Bar
   Dim X
   Set X = New Foo
   X.Name = "X"
End Sub
MsgBox "Starting Bar"
Bar
MsgBox "Done Bar"
 
outputs, as you’d expect:

Starting Bar
X: Goodbye, world!
Done Bar

As soon as Bar’s End Sub runs, X goes out of scope.  Since that’s its last reference, the terminator runs.

It’s that “last reference” part that’s the rub.  Objects are reference counted in COM – every object keeps track of how many other objects are keeping it alive, and when that drops to zero, the object deletes itself.
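To make the counting concrete, here is a minimal sketch. It is written in Python rather than VBScript because script code cannot see its own reference counts. The RefCounted class and its add_ref and release methods are hypothetical stand-ins for COM's AddRef and Release; this is a toy model, not the engine's code.

```python
class RefCounted:
    """Toy stand-in for a COM-style reference-counted object."""

    def __init__(self, name, log):
        self.name = name
        self.log = log      # collects what MsgBox would have shown
        self.refs = 0

    def add_ref(self):
        self.refs += 1

    def release(self):
        self.refs -= 1
        if self.refs == 0:
            # Last reference gone: run the terminator; the object then dies.
            self.log.append(self.name + ": Goodbye, world!")


log = []
x = RefCounted("X", log)
x.add_ref()     # models: Set X = New Foo
x.release()     # models: End Sub, where X goes out of scope
```

In this model the terminator message is appended exactly when the count reaches zero, which mirrors what happens when Bar returns.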

The trouble is, what if two objects are each keeping a ref on each other, but are otherwise dead?  The “circular reference” problem wasn’t so much a problem before we added classes to VBScript. After all, without the ability to define new objects in script, the only objects you really need to worry about are external ActiveX objects – and not even JScript can break circular references involving ActiveX objects, because the ActiveX objects know nothing about JScript’s garbage collector, and vice versa.

But with objects in VBScript, suddenly it becomes quite bad.  You can very easily write scripts that consume memory in the form of objects that reference each other circularly, and they won’t go away when they go out of scope:

Sub XYZ
   Dim Alpha, Bravo
   Set Alpha = New Foo
   Alpha.Name = "Alpha"
   Set Bravo = New Foo
   Bravo.Name = "Bravo"
   ' Each object now holds a reference to the other: a circular reference.
   Set Alpha.Other = Bravo
   Set Bravo.Other = Alpha
End Sub

MsgBox "Starting XYZ"
XYZ
MsgBox "Done XYZ"

outputs

Starting XYZ
Done XYZ

Uh oh.  Neither of those guys was terminated when their variables went out of scope.  All the memory associated with them is still reserved by the heap. But they’re not reachable now.  There are no variables that reference them.  There’s nothing the script programmer can do about it now.

It would be unfortunate indeed if such circular references really did live until the end of the process, when the whole heap is cleaned up, both because we’d leak memory, and because those terminators might be doing something interesting.

Fortunately, we can do slightly better than that. If you run the code above in Windows Script Host, you’ll see that the terminators run as soon as the script ends; they are cleaned up eventually.  In IE, the terminators run when the page is torn down.  And in ASP, the terminators run when the page is served.  (Though of course in ASP you don’t want to be putting up message boxes, or accessing the ASP object model – the page is just about to get served, don’t try to mess with it!)

How was this implemented?  And how does this solve the memory leak problem?

It’s pretty straightforward. Every time a new instance of a VBScript class is created, it adds itself to a special list maintained by the engine, and whenever one is destroyed, it removes itself from the list.  When the VBScript engine itself is closed by the host, the engine runs down the list and runs all the terminators of all the VBScript objects left on the list. Then it runs down the list a second time and clears every field of the object.  This breaks the circular reference. Better late than never.
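The bookkeeping described above can be sketched as a small simulation. This is a hedged toy model in Python, not the engine's actual code: the Engine and Instance classes, their method names, and the use of a plain list are all assumptions made for illustration. The "terminated" flag models the run-once check that is discussed under Question #2 below.

```python
class Engine:
    """Toy model of the engine's list of live class instances."""

    def __init__(self):
        self.live = []      # instances created but not yet destroyed
        self.output = []    # stands in for MsgBox

    def close(self):
        survivors = list(self.live)
        # Pass 1: run the terminator of everything still on the list.
        for obj in survivors:
            obj.terminate()
        # Pass 2: clear every field, which breaks circular references.
        for obj in survivors:
            obj.clear_fields()


class Instance:
    def __init__(self, engine, name):
        self.engine = engine
        self.name = name
        self.other = None
        self.refs = 0
        self.terminated = False
        engine.live.append(self)            # register on creation

    def add_ref(self):
        self.refs += 1

    def release(self):
        self.refs -= 1
        if self.refs == 0:
            self.terminate()
            self.clear_fields()
            self.engine.live.remove(self)   # unregister on destruction

    def terminate(self):
        if self.terminated:                 # never run the terminator twice
            return
        self.terminated = True
        self.engine.output.append(self.name + ": Goodbye, world!")

    def clear_fields(self):
        held, self.other = self.other, None
        if held is not None:
            held.release()

    def set_other(self, target):
        # Models "Set Me.Other = target"; assumes the field was empty.
        target.add_ref()
        self.other = target


engine = Engine()
alpha = Instance(engine, "Alpha")
alpha.add_ref()                 # local variable Alpha
bravo = Instance(engine, "Bravo")
bravo.add_ref()                 # local variable Bravo
alpha.set_other(bravo)
bravo.set_other(alpha)
alpha.release()                 # End Sub: locals go out of scope
bravo.release()
before_close = list(engine.output)   # nothing has been terminated yet
engine.close()                       # host closes the engine
```

In this model nothing is terminated when the locals are released, because each object still holds a count on the other. Both terminators run only when close is called, and the list is empty afterwards. The order in which the model runs the terminators is simply list order; nothing here should be read as a claim about the order the real engine uses.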

In Windows Script Host, this really isn’t much help because the engine isn’t closed until the process is about to be torn down anyway. But in IE and even more importantly, ASP, script engines are closed much more aggressively. It was very important that we not add a new feature to the language that made it really easy to write ASP pages which leaked memory and forced you to reboot your server frequently.

There are some wrinkles that we ran into when implementing the shutdown logic. Let’s see if you can deduce what they are; I’ll give the answers in my next post.

QUESTION #1: When I first wrote the VBScript class code, that’s not exactly the shutdown sequence I wrote. The original logic ran down the list of objects once, going terminate, clear, terminate, clear, terminate, clear.  Soon after that I rewrote the shutdown sequence to the present terminate, terminate, terminate, clear, clear, clear.  Why did I make the change to a two-pass implementation? What was wrong with the original one-pass implementation?

QUESTION #2: The code that implements VBScript classes checks to see if the instance has been terminated already before it calls the Class_Terminate method.  It should be clear from the foregoing why that is – if an object is in a circular reference and manages to survive until the engine shuts down, then the object will probably be terminated while there are still references held on it.  The later “clear all fields” phase of the shutdown sequence could then release all the references, triggering the “we’ve just released the last reference, so call the terminator” logic.  We therefore have to protect against the terminator running twice.

There’s another way that an object can be terminated twice that we protect against.  In this scenario, not only can the object be terminated twice, but an incorrect implementation of the object deletion code can crash the process.  What is it?

Hint: the scenario does not involve circular references. It has just a single object being terminated because it is going out of scope and has no other references.

QUESTION #3: Speaking of which, why do we want to ensure that terminators don’t run twice?

QUESTION #4: This is not a question about termination per se, but there is a termination angle in the answer.  Why do we run the garbage collector at the end of every statement, instead of only at the end of every procedure/global block?  After all, no local variable is going to go out of scope until the end of its procedure! Doesn’t that seem like a lot of work for no possible gain?

"I'll be back" in the new year -- have a good festive holiday season everyone!


 

  • Q4) Because VBScript programmers liked to write very very long procedures - and only collecting at the end of a procedure would be a very long wait. These statements may not even be in a procedure - they could be global.

    BTW - It cannot have been "very important" to add the circ ref cleanup code to VBScript - otherwise the same code would have appeared in VB6 and my colleague wouldn't have spent the past couple of weeks tracing memory leaks in code written by long-forgotten developers...
  • Q3: You'd get problems when you try to clean up certain resources twice (like closing connections or something) when the programmer of the class does not expect his terminator to run more than once.
  • Q1: If you terminated and then cleaned in object 1, then object 2's terminate would error out if it had a reference to object 1 and tried to call an obj1 method (for whatever reason).
  • > only collecting at the end of a procedure would be a very long wait

    But only collecting WHAT? Local variables do not go out of scope until the end of the procedure, so what would be collected? Why isn't every collection after a statement a no-op?

    > These statements may not even be in a procedure - they could be global.

    OK, same objection. Global variables don't go out of scope until the engine is destroyed, so why do collections at all in global scope? In JScript, you do collections to destroy circular-referenced objects that are otherwise dead. But VB's collector does not detect circular references. So why do we bother ever doing a collection except to free up local variables?

    > It cannot have been "very important" to add the circ ref cleanup code to VBScript - otherwise the same code would have appeared in VB6

    This is a specious argument. VB6 and VBScript have a lot in common in terms of the language they parse, but they have almost NOTHING in common in terms of their usage cases.

    In VB6, the "program" does not end before the process ends. Therefore VB6 relies on the destruction of the entire process heap to clean up circular references when the program ends.

    But in script, the "program" ends all the time -- script is all about starting up little programs, running them, ending them, and starting another. That's how ASP works! If we waited for the process to go down, every script that created circularly ref'd VBScript class instances would leak memory, and eventually take down the ASP process.

    There are lots of things we added to script that were important for script but not in VB6. The "class" statement, Eval and Execute, a runtime that can compile and run programs in chunks, before the whole program is downloaded -- these were necessary for script, but not for VB6.

  • PatternGuru: Correct, we ensure that the terminator runs exactly once so that the author can rely upon that invariant.
  • Steve: Correct! We need to ensure that an object is not cleaned until everyone is done with its properties.
  • Eric> But only collecting WHAT [after a statement]?

    Maybe to let "= Nothing" actually do something. Or maybe VBS creates a bunch of anonymous objects (ex: perhaps with the string addition operator).
  • RJ beat me to it with the anonymous objects. Also Mr. Lippert beat me to it in the original posting, with Question 2 being the answer to Question 1.

    Meanwhile, the original posting talks twice about terminators executing under Internet Explorer, and at least once about the possibility that terminators might do something interesting. Surely you'd better not depend on terminators doing anything interesting under Internet Explorer? Some days my Internet Explorer windows don't crash, some days they offer multiple times to send crash reports to Microsoft, and there are still some occasions when they crash without offering to send reports. The average is still more than once per day. Whatever someone might be using a terminator for in IE, you'd better be prepared for it not getting executed.

    By the way, did this post cause Arnold Schwarzenegger a lot of confusion over his job execution?
  • Setting an object to Nothing does immediately clean it up if that's the last reference.

    "Anonymous" objects are the thing to think about here. A statement can create arbitrarily many temporary strings and objects that need to be cleaned up.

  • Here's a shot at Q4:

    Since VBScript is an interpreted language, statements are executed one line at a time. Therefore, whenever the IL (can't think of the proper name) is executed it makes sense to clean up after every call because when the statement is finished the interpreter could come across objects with zero references (ie the "Anonymous" objects Eric refers to).

    Think about the size of the script if the interpreter DID NOT deallocate all of the anonymous strings you may use during execution. String concatenation would be a lot more expensive than it is now in terms of taking up memory.
  • Quick correction:

    Think about the size of the script if the interpreter DID NOT deallocate all of the anonymous strings you may use during execution.

    I meant deallocate anonymous strings at the end of the method or global block.