SimpleScript Part Four: Finite State Machines and Script Engines

SimpleScript Part Four: Finite State Machines and Script Engines

  • Comments 11

Last time I said that I'd discuss finite state machines (also sometimes called finite state automata, same thing.)  The FSM is a fundamental idea in theoretical computer science because it models computing machinery in a very simple, abstract and general way.  Basically it goes like this:  an FSM has a finite number of states (duh).  Each state accepts a finite number of inputs.  Each state has rules which describe the action of the machine for every input.  An input may cause the machine to change state.

That's extremely general, I know.  Let's consider a specific example -- a machine which has two inputs: a slot that can take in quarters, and a button.  It has two outputs: quarters and gumballs.  And it has two states: GOTQUARTER and NOQUARTER.  We can chart the actions given a certain input:

 

Insert Quarter

Press Button

NOQUARTER

Change state to GOTQUARTER

Do nothing

GOTQUARTER

Output the quarter

Output gumball,

Change state to NOQUARTER

That's a pretty darn simple machine.  Modern CPU's are also finite state machines -- after all, every bit in a machine can only be in two states, and there are only so many possible binary inputs from mice, keyboards, hard disks, and so on.  The number of possible states a modern computer can be in is mind-bogglingly huge -- far more than the total number of particles in the universe -- but because the rules for the state transitions are very well defined it all works out nicely.  (Because the number of states is so large, it is often more helpful to model machines as "Turing machines", which can have unlimited internal storage.  But I'm digressing.)

The script engines use this idea of finite state machines explicitly.  A script engine has six possible states, and it performs certain actions as it transitions between those states based on "inputs".  In this case, the inputs are via calls to the various methods on the script engine.  The six states are Uninitialized, Initialized, Started, Connected, Disconnected and Closed.  A script engine is created by a host -- Windows Script Host, Internet Explorer, Active Server Pages, whatever.  The host is responsible for changing the script engine state as it sees fit, within the rules I'll describe over the next few entries.

The script engine begins life Uninitialized.  An uninitialized engine does not know anything about its host, and therefore cannot call any methods provided by the host.  There's very little you can do to an uninitialized engine, but what little you can do, you can do from any thread, subject to the rental model.  I'll talk more about threading issues later.

There's only one way to initialize an engine, and that's to pass a pointer to the host into SetScriptSite.  Once an engine is Initialized it becomes apartment threaded, with a few exceptions that I'll talk about later.  There is a lot more you can do with an initialized engine -- add named items (again, more on those in the future), add script source code, and so on.  However, the code cannot actually run yet.  At this point, the most commonly used input that affects the script engine state is the appropriately named SetScriptState method.  More on that below.

A Started engine actually runs the script code, but does not hook up event handlers.  (I discussed implicit event binding last time.)  Moving the engine into Connected state hooks up the event handlers.  If the host wants the event handlers to temporarily stop handling events, the hose can move the engine into Disconnected state.

That's the usual order in which things happen at startup.  To tear down the engine, the IActiveScript::Close method tears down event handlers, throws away compiled state, throws away the reference to the host and moves the engine to Closed state. 

The actions associated with SetScriptState are easiest to summarize in a table.  (Except for illegal calls, this also sets the engine state to the new value.)

 

Old State

New State passed to SetScriptState

Uninitialized

Initialized

Started

Connected

Disconnected

Closed

Uninitialized

Do nothing

Error -- use SetScriptSite

Error - can't start an uninitialized engine

Error

Error -- can't disconnect an unconnected engine

Error

Initialized

Discard some named items

Throw away host information

Do nothing

Execute script code

Execute script code

Bind events

Error

Discard all named items and script blocks

Throw away host information

Started

Discard some script blocks, and as above

Discard some script blocks

Do nothing

Bind Events

Error

As above

Connected

Discard event bindings, and as above

Discard event bindings, and as above

Discard event bindings, execute code

Do nothing

Suspend event handling

Discard event bindings, and as above

Disconnected

As above

As above

As above

Resume event handling

Do nothing

As above

Closed

Error

Error -- use SetScriptSite

Error

Error

Error

Do nothing

As you can see, I've written methods that implement these state semantics in engine.cpp, though plenty of them are still stubbed out.

Next time, I want to talk more about the threading model, explain away a few oddities, and then we'll change gears for a bit and implement a simple language.  That'll get us into language design, how to expose syntax errors, and so on.

  • Eric,

    I'm a little confused about the Unitialized and Closed states. They seem to have the same permitted operations. Transition to same state allowed, transitions to other states disallowed, SetScriptSite to initialize engine. The difference seems to be that moving to Uninitialized kills only some of the scripts and named items. Moving to Closed kills all of these. "Some" is probably going to get elaboration in the future I guess.

    What confuses me is their relation to the starting state. Logically, the engine is going to start with no scripts or named items. Moving to Unitialized from a running engine might leave you with scripts or named items. This seems different from the starting state. Moving to Close leaves you with no scripts or named items. Why isn't the starting state equivalent to Closed?
  • Excellent question. Your logic is impeccable -- there really does seem to be very little difference between Uninitialized and Closed except for the fact that an Uninitialized engine might be holding on to persistent scripts/items and a closed engine never does. (There are other differences as well having to do with debugging semantics and other advanced features, but they are not germane to your point.)

    There are two reasons why a freshly initialized engine starts in Uninitialized state. First, it would just be bizarre to create a brand new engine, ask it what its state was, and get "closed", don't you think? You didn't close it, so why is it closed?

    Second, though technically it is legal to take a closed engine and initialize it again, the intention of the Close method is that you call it right before you are about to do the final release on the engine.

    Therefore it would again be a little weird to start the engine in closed state, because the only thing you're supposed to do with a closed engine is destroy it!

    Looking at this design decision now, I think it would have been clearer to have another state in there -- Closed, Uninitialized, Reset, Started. Make "Closed" REALLY mean "Closed, we're not kidding", don't try to initialize the engine again. Make "Uninitialized" mean "fully uninitialized", "Reset" mean "non-persisting items discarded". Then if you wanted to fully uninitialize an engine to recycle it you could do so without having to Close it and then initialize it again.

    That would have been a more self-documenting design, but we're stuck with what we've got now.
  • A bit belatedly I got around to look at your engine code (it's Passover here in Israel). My first question is: are you going to put all these files in a zip for easier download and viewing?

    My second question comes from looking at the ScriptEngine members: why do you explicitly use this-> for almost all member access? Especially since you do prefix data members with m_. I also use the m_ prefix for members (s_ for static members) so I don't use this->. Do distinguish between member and non-member functions I use :: for non-members. Is the explicit use of this a habit you picked up from using JScript? Do you also code this way in C#?

    Third question (sort of related): why don't you use initializer lists in the constructor?
  • I hope you're having a pleasant Passover. Easter Sunday was absolutely gorgeous in Seattle.

    1) Good idea. Next time I do an update, I'll zip them up. (I might also start posting the change logs.)

    2) You're right -- with the m_ convention it is redundant.

    In general I'm pretty agnostic on stylistic things like "this->", brace positioning, etc. I don't have any really strong arguments one way or the other -- I try to just pick something that looks good and go with it.

    Different codebases do different things. The original vbscript/jscript codebase generally doesn't use this-> . The WSH codebase uses :: everywhere for non-members. (WSH was not originally written by the script team and was in a completely different coding style.)

    Your conjecture that I picked up this habit from C# and JScript is correct. If I were to make some kind of weak justification, I'd say that I prefer code to be as explicit as possible. If it's dereferencing the "this" pointer, I like that dereference called out in the code.

    I suppose I could get in the habit of :: to call out non-member functions as well, but I probably won't.

    3) Two reasons. First, it feels really weird to me to put a breakpoint on an item in a list. Second, it looks really weird to me to assign a value by using the syntax used for calling a constructor on something that doesn't have a constructor function. This just looks wrong to me, always has:

    CFoo::CFoo() : m_pBlah(NULL) {

    What? m_pBlah isn't a function that takes an argument. Call me crazy, but I want my assignments to look like assignments. I understand that this is really nice if you need to initialize members that have constructor functions, but I think that extending that syntax to things that are not objects leads to code that just looks weird.

    Maybe that's not a great reason, but hey, I'm writing this thing, so my irrational prejudices rule!
  • > I hope you're having a pleasant Passover

    Thank you, I have (it's almost over). Glad to hear you enjoyed Easter.

    Personally I'm too lazy for a coding convention that requires an explicit use of this. It's one of the things I dislike about JavaScript. I even used with for a while to avoid it, but with is just too broken.

    As for initializer lists, I use them because it means that the same initialization works just as well for both regular pointers and smart pointers (yes, I know you don't like them). Or more generally, the same initialization works just as well for built-in types and user defined types.
  • Another question about the Closed state - why does it exist at all? As far as I can tell the only thing you can really do in this state is dispose of the script engine instance. So why no just do that? I can understand the need for explicit closing in a GC environment like .NET but not in a reference counted environment like COM. Does it have to do with circular ref counting (with the host)?

    Looking at your code I see that you do call Close in your destructor. I seem to recall the script engine that shipped with IE4 did not do this while the version that shipped with IE5 did. The reason I recall this is that I had to debug strange behavior in an application that used the script engine. The problem turned out to be that in certain scenarios the script engine wasn't closed before it was released. This is example of the point I previously made about why it's problematic to put the onus of proper shutdown on the client instead of on the service provider (see my post about determinate finalization).
  • Yes, it has to do with circular references. The host has a ref to the engine, the engine has a ref to the site if it is initialized. The host tells the engine to break the circular ref by closing the engine.

    I agree that it is unfortunate that the host is responsible for the correct use of a complex interface like the script engine. That's why we wrote the Script Control, which does all the complicated stuff for you.

    I'm not sure whether the engine used to implicitly Close on the final release or not -- I'm at home right now and I don't have access to the logs. I'll check when I get into work.
  • The engine destructor called Close in the May 24th 1996 build, which is as far back as the logs I have on my machine go. That predates me, and certainly predates IE4!

    You must be thinking of some other bug that we fixed between IE4 and IE5 -- there were a lot of them...

  • That thing sure spits out a lot of quarters.... (isn't it meant to change the state to NOQUARTER when it has output one?)
  • D'oh, sorry, not paying enough attention. This is what a cold does to you :(
  • Eric, please correct the sentence:

    "Because the number of states is so large, it is often more helpful to model machines as "Turing machines", which can have unlimited internal storage"

    A Turing machine is a Turing machine because it can read/write its storage in random order. This makes it recognize languages of type a^nb^nc^n. Take away the write capability, it reduces to a FSA, capable of only recognizing a^n. Strip away the random access, languages recognized reduce to a^nb^n.

    Turing machines never have infinite states. No model for a computing machine has infinite states. What can be infinite is

    1- input set (can be infinite in any case)

    2- output set (can be infinite in any case)

    3- memory (aka. internal storage, is infinite only starting from push down automata)

    Consider the problem of having infinite states in Raymond style: What would happen if a machine could be constructed which had infinite states?

    1- There would be no way to encode those states in any base. A simple state machine can be encoded in ceil( log2(states) ) number of bits. This number is always infinite for infinite states.

    2- The machine's control would alone require infinite storage.

Page 1 of 1 (11 items)