The JScript Type System, Part Three: If It Walks Like A Duck...


A reader asks "can you explain the logic that a string is not always a String but a regexp is always a RegExp? What is the recommended way of determining if a value is a string?"

 

Indeed, you are correct:

 

print(/foo/ instanceof RegExp);             // true

print(new RegExp("foo") instanceof RegExp); // true

print("bar" instanceof String);             // false

print(new String("bar") instanceof String); // true

print(typeof("bar"));                       // string

print(typeof(new String("bar")));           // object

 

Why's that?

 

First, the question about strings.  In JScript there is this bizarre feature where primitive values -- Booleans, strings, numbers -- can be "wrapped up" into objects.  Doing so leads to some odd situations.  As you note, the type of a wrapped primitive is always an object type, not a primitive type.  Also, comparing two wrapped primitives uses object equality, not value equality.

 

print(new String("bar") == new String("bar")); // false
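If you do end up holding wrapped strings, one way to compare them by value is to unwrap them first. The following is only a minimal sketch; `sameStringValue` is a hypothetical helper name, not something from JScript itself.

```javascript
// Hypothetical helper (not part of JScript): compare two strings by value,
// whether each one is a primitive or a String object.
function sameStringValue(a, b) {
  // valueOf() on a String object returns the primitive it wraps;
  // on a primitive string it returns the string itself.
  return a.valueOf() === b.valueOf();
}

var left = new String("bar");
var right = new String("bar");

var objectsEqual = (left == right);            // false: two distinct objects
var mixedEqual = (left == "bar");              // true: the object is converted to a primitive
var valuesEqual = sameStringValue(left, right); // true: the wrapped values are compared
```

Note that `==` only falls back to object identity when both operands are objects; as soon as one side is a primitive, the wrapper is converted.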

 

I highly recommend against using wrapped primitives.  Why do they exist?  Well, the reasoning has kind of been lost in the mists of time, but one good reason is to make the prototype inheritance system consistent.  If "bar" is not an object then how is it possible to say

 

print("bar".toUpperCase());

 

? Well, actually, from the point of view of the specification, this is just syntactic sugar for

 

print((new String("bar")).toUpperCase());

 

Now, of course as an implementation detail we do not actually cons up a new object every time you access a property or call a method on a value type!  That would be a performance nightmare.  The runtime engine is smart enough to realize that it has a value type and that it ought to pass it as the "this" object to the appropriate method on String.prototype and everything just kind of works out.
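The wrapping is observable from inside the called function. The sketch below assumes a modern ECMAScript engine, where strict mode (which postdates this post) passes the primitive through unwrapped, while ordinary code still sees an object. The Function constructor is used only so that each function's mode is fixed by its own body text; the variable names are illustrative.

```javascript
// Each function's strictness is decided by its own body, so the result does
// not depend on whether the surrounding file happens to be strict.
var sloppyTypeOfThis = new Function("return typeof this;");
var strictTypeOfThis = new Function("'use strict'; return typeof this;");

// Ordinary (non-strict) code: the primitive is boxed before the call.
var sloppyResult = sloppyTypeOfThis.call("bar"); // "object"

// Strict code (assumption: ES5 or later): the primitive arrives as-is.
var strictResult = strictTypeOfThis.call("bar"); // "string"
```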

 

This also explains why it is possible to stick properties onto value types that magically disappear.  When you say

 

var bar = "bar";

bar.hello = "hello";

print(bar.hello); // nada!

 

of course what is happening is logically equivalent to:

 

var bar = "bar";

(new String(bar)).hello = "hello";

print((new String(bar)).hello); // nada!

 

See, the magical temporary object is just that -- magical and temporary.  Once you've used it, poof, it disappears.
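For contrast, an explicitly created String object is not temporary, so a property stuck onto it stays put. This is a small sketch; `attachAndRead` is a hypothetical helper, and the try/catch is there on the assumption that the code might run in strict mode on a modern engine, where writing a property to a primitive throws instead of silently doing nothing.

```javascript
// Hypothetical helper: try to attach a property, then read it back.
function attachAndRead(value) {
  try {
    value.hello = "hello";
  } catch (e) {
    // Assumption: strict mode on a modern engine throws a TypeError here
    // for primitives rather than discarding the write.
  }
  return value.hello;
}

var primitiveResult = attachAndRead("bar");             // undefined: the temporary is gone
var wrappedResult = attachAndRead(new String("bar"));   // "hello": a real object keeps it
```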

 

But this magical temporary object does not appear when the typeof or instanceof operators are involved.  The instanceof operator says "hey, this thing isn't even an object, so it can't possibly be an instance of anything".  For both consistency and usability, it would have been nice if "bar" instanceof String created a temporary object and hence said yes, it is an instance of String.  But for whatever reason, that's not the specification that the committee came up with.
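If you want the answer that instanceof would have given had it created the temporary object, you can do the wrapping yourself. This is a sketch rather than a recommendation; `isInstanceAfterBoxing` is a hypothetical name, and it relies on calling `Object` as a function to wrap a primitive.

```javascript
// Hypothetical helper: box the value first, then ask instanceof.
function isInstanceAfterBoxing(value, ctor) {
  // Object(value) wraps a primitive in its object type and returns
  // objects unchanged. For null or undefined it returns a fresh empty
  // object, so those are never instances of String, Number or Boolean.
  return Object(value) instanceof ctor;
}

var plainCheck = ("bar" instanceof String);              // false: no boxing happens
var boxedCheck = isInstanceAfterBoxing("bar", String);   // true
var wrongType = isInstanceAfterBoxing("bar", Number);    // false
```

It inherits every weakness of instanceof, so it still only says whether String.prototype is on the prototype chain.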

 

Second, your question about regular expressions is easily answered now that we know what is going on with strings.  The difference between regular expressions and strings is that regular expressions are not primitives.  Just because you have the ability to express a regular expression as a literal does not mean that it is a primitive!  That thing is always an object, so there is no behaviour difference between the compile-time-literal syntax and the runtime syntax.
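A quick way to see that a regular expression literal is a real object is to repeat the property experiment from the string case. The variable names here are illustrative, and the typeof result assumes a standards-conforming engine.

```javascript
var literal = /foo/;
var constructed = new RegExp("foo");

// No magical temporary is involved: the literal already is an object,
// so a property stuck onto it is still there afterwards.
literal.note = "sticks";

var keptProperty = literal.note;                         // "sticks"
var literalType = typeof literal;                        // "object" (assumption: conforming engine)
var samePattern = (literal.source === constructed.source); // true: both describe "foo"
```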

 

Third, your question about how to determine whether something is a string is surprisingly tricky.  If typeof returns "string" then obviously it is a string, end of story.  But what if typeof returns "object" -- how can you tell if that thing is a wrapped string? 

 

It's not easy.   instanceof String doesn't tell you whether that thing is a string, it tells you whether String.prototype is on the prototype chain.  There's nothing stopping you from saying

 

function MyString() {}

MyString.prototype = String.prototype;

var s = new MyString();

print(s.constructor == String);            // true

print(s instanceof String);                // true

print(String.prototype.isPrototypeOf(s));  // true

 

So now what are you going to do?  JScript is excessively dynamic!  Basically you can't rely on any object being what it says it is.  JScript forces people to be operationalists.  (Operationalism is the philosophical belief that if it walks like a duck and quacks like a duck, it is a duck.)  In the face of the kind of weirdness described above, all you can do is try to use the thing like a string, and if it acts like a string, it's a string.
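One way to "try to use the thing like a string" is to hand it to a built-in that only works on genuine strings and see whether it complains. The sketch below leans on String.prototype.valueOf, which the ECMAScript specification says throws a TypeError unless its "this" is a string or a String object. The names `quacksLikeString` and `FakeString` are hypothetical, and this is one possible test, not the definitive one.

```javascript
// Hypothetical helper: does the value behave like a string when used as one?
function quacksLikeString(value) {
  if (typeof value === "string") {
    return true; // primitive string, end of story
  }
  try {
    // Throws a TypeError unless value is a genuine String object.
    return typeof String.prototype.valueOf.call(value) === "string";
  } catch (e) {
    return false;
  }
}

// An impostor with String.prototype on its prototype chain.
function FakeString() {}
FakeString.prototype = String.prototype;
var fake = new FakeString();

var fakeIsInstance = (fake instanceof String);              // true: fooled
var fakeQuacks = quacksLikeString(fake);                    // false: it cannot be used as a string
var wrappedQuacks = quacksLikeString(new String("bar"));    // true
```

An object that merely supplies its own toString() method does not pass this check either, because the test never calls toString().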

 

  • Very interesting. I had most of that already but was thrown off by the "magic temporary" construction. Thanks!
  • Eric writes, "...cons up a new object..." Busted. We knew there has to be a Lisp geek hiding in there.
  • Yeah, I knew someone would bust me. In truth, I have never written any nontrivial programs in Lisp, though I did a fair amount of Scheme programming when I was at UW. I just use the expression "cons up" to get geek cred from hard core lisp freaks. Whoops, did I say that out loud?
  • > all you can do is try to use the thing like a string, and if it acts like a string, it's a string. Only problem here is that most anything can come out looking like a string, given the toString() method. Maybe ECMAScript should just go the way of TCL - declare everything a string and be done with it ;-) Oh, and also thanks for the "magic temporary". Just when you think you know everything about JavaScript, it throws you a curveball. BTW, this "magic temporary" sort of brings C++ anonymous temporaries to mind.
  • My comment above - again Remember Me doesn't seem to work. After some more thought I should have realized the "magic temporary" behavior. If for nothing else then the example I gave for part one: Number.prototype.showType = function() { alert(typeof(this)); }; (3).showType(); I even made a comment about auto-boxing. The reason I did not think through the full implications of this is that I was caught up in the concept that in ECMAScript everything is an object. Must say I'm somewhat disappointed that this isn't really the case (even though in most cases you really can't tell the difference). BTW anybody who has used Netscape's LiveConnect (at least version 4.x) knows the situation there is even worse. Strings returned by applets are of the type java.lang.String, which JavaScript identifies only as objects, and not even string objects. You need to apply the String function explicitly to these objects if you want to manipulate them using JavaScript. Rhino OTOH handles this scenario very well.
  • As I've pointed out in a comment to a previous post, JavaScript uses duck typing, that is it matches functionality by name (of property). In this respect the "real" type of an object doesn't matter very much, only the functionality it exposes. Consider the following example: function foo() { return { hello : "world" , bye : "everybody" }; } var a = foo(); var b = foo(); var c = new Object(); Do the objects referenced by a, b and c have the same type? If your answer is "yes, they all have the type Object" you miss the fact that a and b share a common structure. However, by any other criteria mentioned in the posts so far, a and b have nothing in common. To take it further, what if I tack another property on to b: b.x = "y"; The fact that JavaScript objects are so malleable, and can be modified both internally and externally after they have been created, makes much of the type related info irrelevant. It is a shame, however, that ECMAScript does not provide a means to do operator overloading or create special properties like String's length. This would make it possible to create ADTs that are functionally equivalent to the internal types.
  • > duck typing Maybe we should read this as "ducks out of typing" ;-) > much of the type related info irrelevant So why don't we just call JScript an untyped (not-typed) language? Languages without data types are called "typeless" (BCPL, MCPL). JScript seems to have types but is not typed.
  • > if it acts like a string, it’s a string. And if it acts almost, but not entirely, unlike a string, it’s something that is almost, but not entirely, unlike a string, but will be mistaken for a string if the feature that is tested happens to fall in that “not entirely” exception category :)