The JScript Type System, Part One

I thought I might spend a few days talking about the JScript and JScript .NET type systems, starting with some introductory material.

Consider a JScript variable:

var myVar;

Now think about the possible values you could store in the variable. A variable may contain any number, any string or any object. It can also be true or false or null or even undefined. This is a rather large set of possible values. In fact, the set of all legal values is infinite: countably infinite, and in practice limited by available memory, but in theory there is no upper limit.

A type is characterized by two things, a set and a rule. First, a type consists of a subset (possibly infinitely large) of the set of all possible values. Second, a type defines a rule for transforming values outside the set into values in the set. (This rule may specify that certain values are not convertible and hence produce "type mismatch" errors.)

For example, String is a type. The set of all possible strings is an (infinite) subset of the set of all possible values, and there are rules for determining how all non-string values are converted into strings.
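To make the "rule" half of that definition concrete, here is a minimal sketch of the String conversion rule applied to a few non-string values. The variable name `converted` is just an illustrative assumption; `String(...)` used as a function is the standard way to ask for the conversion explicitly.

```javascript
// Each non-string value is mapped into the set of strings by a fixed rule.
var converted = [
  String(42),         // "42"
  String(true),       // "true"
  String(null),       // "null"
  String(undefined),  // "undefined"
  String([1, 2, 3])   // "1,2,3" (arrays join their elements with commas)
];
```

Every value in this sample has a string form, so none of these conversions produces a type mismatch error.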

JScript Classic is a dynamically typed language. This means that any value of any type may be assigned to any variable without restriction. It is often said -- inaccurately -- that "JScript has only one type". This is true only in the sense that JScript has no restrictions on what data may be assigned to any variable, and in that sense every variable is "the same type" – namely, the "any possible value" type. However, the statement is misleading because it implies that JScript supports no types at all, when in fact it supports six built-in types.

JScript .NET, by contrast, is an optionally statically-typed language. A JScript .NET variable may be given a type annotation which restricts the values which may be stored in the variable. This annotation is optional; an unannotated variable acts like a JScript variable and may be assigned any value.

JScript has the property that a value can always describe its own type at runtime. This is not true in, say, C, where you can have a void* and no way of asking it "are you pointing to an integer or a string?" In JScript, you can always ask a value what its type is and it will tell you.
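As a small sketch of both points (any value may be assigned to any variable, and the value can report its own type), the following reuses the variable from the top of this entry and records what the `typeof` operator says after each assignment. The array name `reported` is an assumption made for illustration.

```javascript
var myVar;                                           // uninitialized
var reported = [typeof myVar];                       // "undefined"
myVar = 3.14;           reported.push(typeof myVar); // "number"
myVar = "pi";           reported.push(typeof myVar); // "string"
myVar = false;          reported.push(typeof myVar); // "boolean"
myVar = { name: "pi" }; reported.push(typeof myVar); // "object"
```

None of the assignments fails, and at every step the current value can be asked what it is.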

The concept of subtyping is not particularly important in JScript Classic though it will become quite useful when we discuss JScript .NET classes later. Essentially a type T1 is a subtype of another type T2 if T1's set of values is a subset of T2's set of values. A type consisting of the set of all integers might be a subtype of a type consisting of all the numbers, for instance. (This is not how the integers are traditionally construed; the C type system makes integers and floats disjoint types, where the integer 1 and the float 1.0 are different values that happen to compare as equal -- but comparison across types is a subject for a later blog entry.)
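The subset idea can be sketched with two membership tests. JScript Classic has no built-in integer type, so `isNumber` and `isInteger` below are hypothetical helpers written only to illustrate a type as a set of values.

```javascript
// Membership test for the set of all Number values.
function isNumber(value) {
  return typeof value === "number";
}

// Membership test for a hypothetical "integer" type: a subset of the numbers.
function isInteger(value) {
  return isNumber(value) && isFinite(value) && Math.floor(value) === value;
}

var samples = [1, 1.5, -7, NaN, "1", null];
var numbers = [];
var integers = [];
for (var i = 0; i < samples.length; i++) {
  if (isNumber(samples[i])) numbers.push(samples[i]);
  if (isInteger(samples[i])) integers.push(samples[i]);
}
// numbers holds 1, 1.5, -7 and NaN; integers holds only 1 and -7.
```

Because `isInteger` can only succeed where `isNumber` does, every integer is a number, which is exactly the subset relationship described above.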

Anyway, JScript Classic has six built-in types, all of which are disjoint. They are as follows:

The Number type contains all floating-point numbers as well as positive Infinity, negative Infinity and a special Not-a-Number ("NaN") value. It may seem odd that "Not-a-Number" is a Number but this does in fact make sense. NaN is the value returned when an operation logically must return a number but no actual number makes sense. For example, when trying to convert the string "banana" to a Number, NaN is the result. Because numbers in JScript are actually represented by a 64-bit floating-point number, there are a finite number of possible Number values. The number of numbers is very large (in fact there are 18437736874454810627 possible numbers, which is just shy of 2^64). Numbers have approximately fifteen decimal digits of precision, and normalized values range in magnitude from as tiny as about 2.2 x 10^-308 to as large as about 1.8 x 10^308.
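A short sketch of these properties follows. The object name `facts` and its property names are assumptions for illustration; `Number`, `isNaN` and `Number.MAX_VALUE` are standard. The self-comparison line goes slightly beyond the text above: NaN is the one Number value that does not compare equal to itself.

```javascript
var notANumber = Number("banana");   // no sensible number exists, so NaN

var facts = {
  nanIsNumberTyped: typeof notANumber === "number",  // true
  nanEqualsItself: notANumber === notANumber,        // false
  detectedByIsNaN: isNaN(notANumber),                // true
  divideByZero: 1 / 0,                               // Infinity
  largest: Number.MAX_VALUE,                         // about 1.8 x 10^308
  // Past 2^53 adjacent integers can no longer be told apart.
  precisionLost: 9007199254740992 + 1 === 9007199254740992  // true
};
```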

The String type contains all Unicode strings of any length (including zero-length empty strings.) The string type is for all practical purposes infinite, as the length of a string is limited only by the ability of the operating system to allocate enough memory to hold it.

The Boolean type has two values: true and false.

The Null type has one value: null.

The Undefined type has one value: undefined. All uninitialized JScript variables are automatically set to undefined.

The Object type has an infinite number of values. An object is essentially a collection of named properties where each property can be a value of any type. In JScript many things are objects: functions, dates, arrays and regular expressions are all objects.
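The six types above can be told apart at runtime with a small helper. This is a sketch, not a built-in: `typeName` is a hypothetical function, and it assumes the six-type model described here. It needs two adjustments because the `typeof` operator reports null as "object" and reports functions as "function" rather than "object".

```javascript
// Map any value onto one of the six built-in type names.
function typeName(value) {
  if (value === null) return "Null";   // typeof null is "object", so test first
  switch (typeof value) {
    case "undefined": return "Undefined";
    case "number":    return "Number";
    case "string":    return "String";
    case "boolean":   return "Boolean";
    default:          return "Object"; // objects, arrays, dates, functions...
  }
}

var names = [
  typeName(undefined),         // "Undefined"
  typeName(null),              // "Null"
  typeName(NaN),               // "Number"
  typeName(""),                // "String"
  typeName(false),             // "Boolean"
  typeName([1, 2, 3]),         // "Object"
  typeName(function () {})     // "Object"
];
```

Since the types are disjoint, each value lands in exactly one branch.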

Types are not themselves "first class" objects in JScript, though they are in JScript .NET. I'll discuss that, along with the differences between prototype and class inheritance, in later entries.

  • One thing that really bugs me about JavaScript, a.k.a. JScript Classic, is the myriad yet incompatible methods to determine object type. The main problem is the various ways to define what constitutes a unique JavaScript type. For example, one could argue that the definition you have presented is too weak in that the Object group is not sufficiently specific. The obvious way to determine object type is the typeof operator. This operator returns a different set of values than the one you enumerated: "number," "string," "boolean," "object," "function," and "undefined." That is, null is identified as an object, yet JScript functions are magically identified as being something more than just any old object. Also, a few may know that JScript's typeof can also return "unknown" in certain weird cases. Add to this confusion that typeof(3) is "number" but typeof(new Number(3)) is "object". I can understand why this is, but it can be confusing. For example, what would be the output from the following code: Number.prototype.showType = function() { alert(typeof(this)); } (3).showType(); Automatic boxing, I know. Another way to determine types is using the constructor property. This allows you to determine if an object is an Array for example: if ( x.constructor === Array ) ... or you can use the instanceof operator. You might also care about the subclass of a particular type, as determined by its prototype. In this case you would use the isPrototypeOf method. Finally, since JavaScript uses duck-typing, what might actually be relevant is not an object's exact type, but which properties it implements. In this case you would use the in operator, or maybe even propertyIsEnumerable. The situation is even worse when you involve objects that are external to JScript. For example, DOM methods are not identified as functions and DOM collections may look a bit like arrays but aren't.
  • I've seen slightly different uses of the vocabulary that your article uses. Type systems can usefully be classified along two axes, namely static / dynamic and strong / weak. Static typing means variables have associated (compile-time) type information and can only be bound to values of some type. By your examples, JScript classic is dynamically typed, since a variable can be bound to values of different types during its lifetime. This would not be possible in a statically typed language, such as C. Strong typing essentially means that there's a typeof operator - i.e., a value (not the variable, but the thing it is bound to) holds information about its type. C is weakly typed: a value is just a bit pattern in memory without any explicit type information, and there's nothing forbidding anyone from reinterpreting a `long' value as a pointer to a structure. This would not be possible in a strongly typed language.
  • I'll second Rudi on this. The definitions that he mentions are becoming more pervasive. If Strong/Weak and Static/Dynamic aren't seen as orthogonal then you have the situation where Smalltalk and early C are considered to have the same kind of type system, and that's nearly a criminal misrepresentation. It equates having well defined errors detected at runtime with silent memory corruption that crashes a system if you are lucky, or just makes things randomly and silently wrong over the life of a system, if you aren't.
  • "implies that JScript supports no types at all" Much of the confusion and ambiguity in the terms we use seems to stem from fear of negative connotations. It's unclear to me why we cannot work with the terms defined by Luca Cardelli (in Handbook of Computer Science and Engineering, Chapter 103. CRC Press, 1997; http://citeseer.nj.nec.com/cardelli97type.html ) . typed (untyped) "A program variable can assume a range of values during the execution of a program. An upper bound of such a range is called a type of the variable. Languages that do not restrict the range of variables are called untyped languages." safe (unsafe) "A program fragment is safe if it does not cause untrapped errors to occur. Languages where all program fragments are safe are called safe languages." So, Smalltalk is untyped and safe (no untrapped errors), and C is typed and unsafe. (Of course, he also defines explicit/implicit types, static checking/dynamic checking, strongly checked/weakly checked...)
  • You describe a subtype in terms of a subset of the possible values of a supertype. But in practice (in almost all examples I have seen, in textbooks and so forth) the opposite is true... This convention is also enforced by some language rules (which I will get to). For example the class java.util.jar.JarFile extends java.util.zip.ZipFile, but it represents both a zip file AND a jar file, so clearly its set of possible values is larger. But it is also a subtype of ZipFile.... How can this be? Does it violate the rules of subtyping? And if it did, why does the compiler not reject it? Because almost all languages have subtypes inherit attributes of the supertype, it is easy for developers to make these kinds of mistakes.
  • >It's unclear to me why we cannot work with the terms defined by Luca >Cardelli (in Handbook of Computer Science and Engineering, Chapter 103. >CRC Press, 1997; http://citeseer.nj.nec.com/cardelli97type.html >typed (untyped) >"A program variable can assume a range of values during the execution of a >program. An upper bound of such a range is called a type of the variable. >Languages that do not restrict the range of variables are called untyped >languages." It sounds like he is talking about variables, and that's fine, but in some OO languages you don't have data variables, you have references to objects. Calling languages like these untyped doesn't make much sense because the objects themselves obviously have types. In other words, the language's features are at a level of abstraction where that definition isn't very relevant. Another example: in some languages you don't have to be particularly concerned if you attempt to increment an integer beyond its range. The object adjusts its range dynamically. Now which is the variable? The object or the reference that holds it? If you say it is the reference, then all references have the same type, but if you say it is the object, well, what is its type really? And, do you ever assign an object to another? In many OO languages you can't; you can only assign references.
  • > You describe a subtype in terms of a subset of the possible values of a supertype. But in practice (in almost all examples I have seen, in textbooks and so forth) the opposite is true > For example the class java.util.jar.JarFile extends java.util.zip.ZipFile, but it represents both a zip file AND a jar file, so clearly its set of possible values is larger. If JarFile is a subtype of ZipFile then every instance of JarFile is also an instance of ZipFile. But that's the definition of "subset" -- if every member of X is a member of Y then X is a subset of Y. Therefore, I'm not following your train of thought here. The set of all objects that are ZipFiles contains all JarFiles plus all the non-JarFile instances of ZipFiles, so how could JarFiles be a larger set?
  • You seem to have confused the "set of possible values." With type systems here we're concerned about the number of possible values a variable can hold. A variable of type ZipFile can reference any ZipFile object, and any JarFile object. But a variable of type JarFile can reference only JarFile objects, not ZipFile objects that are not JarFile objects. Thus, a JarFile variable holds a subset of the values that can be held in a ZipFile variable.
  • Eric Lippert writes: "I'm not following your train of thought here...if every member of X is a member of Y then X is a subset of Y." Thank you for clearing that up for me. My confusion was in thinking that increasing the functionality in the subclass increases the values it can represent... like a TitledBorder with a Border superclass. I just assumed that the Border type does not include a border with a title: because it does not properly handle the title (it has no operators for the title), it should not include values with titles... only the TitledBorder type should include that value. But a TitledBorder can also handle a border with no title, as it is also a border. So I just assumed that TitledBorder would represent a superset. But as you note, the Border type should include borders that have titles, even if it cannot represent those titles properly. It just seemed silly that a border without a title is not a valid TitledBorder (or a zipfile without a manifest is a valid jar file), but the reverse is true, even though I have no way of representing/manipulating the title/manifest. But I see my fault. :)
  • > It sounds like he is talking about variables, and that's fine Yes, the definition is in terms of the range of values that can be assigned to a variable. > but in some OO languages you don't have data variables, > you have references to objects The distinction between value types and reference types is unimportant to this definition. In a typed language we may restrict a variable to hold references to a particular type - such as POINTER TO ZipFile. In an untyped language we don't do that. > Now which is the variable? The object or the reference that holds it? > If you say it is the reference, then all references have the same type I'm unsure what you mean. Let's be more concrete (Smalltalk): | myVariable | myVariable := OrderedCollection new. Now myVariable holds a reference to an instance of OrderedCollection. myVariable := Dictionary new. Now myVariable holds a reference to an instance of Dictionary. myVariable is "the variable". Any value can be assigned to myVariable. The language is untyped. In some other language ArrayList myVariable = new ArrayList(); Now myVariable holds a reference to an instance of ArrayList. myVariable = new HashMap(); ***Fails - the range of values that myVariable may hold is restricted to references to ArrayList instances. The language is typed. This kind of definition has been called misleading because languages without typed variables do have values of different types. OTOH being *untyped* is the easy explanation for why we have ad-hoc polymorphism and loose-coupling in these languages. They got rid of type restrictions - proud to be untyped and safe. In that universe, "JScript Classic" would be untyped, and "JScript .NET" would be typed (presumably "untyped" variables are "restricted" to some universal type).
  • >This kind of definition has been called misleading because languages >without typed variables do have values of different types. >OTOH being *untyped* is the easy explanation for why we have ad-hoc >polymorphism and loose-coupling in these languages. They got rid of type >restrictions - proud to be untyped and safe. Except they didn't. myVariable := Dictionary new. myVariable floogle. Sending floogle to myVariable is a type error, it is just one detected at run-time.
  • Can we find a better forum for this discussion? >> They got rid of type restrictions > Except they didn't. Wasn't it clear from the context that "type restrictions" referred to type restrictions on variables? Ain't none of those in Smalltalk ;-) I'm a bit surprised that there's anything controversial about this - back in 1990, Justin Graver & Ralph Johnson were quite clear that "Smalltalk is untyped", see "A Type System for Smalltalk". http://citeseer.nj.nec.com/graver90type.html > Sending floogle to myVariable is a type error It might be if Dictionary was a type, but it isn't - it's a class. Implementation is inherited, specification is ignored (notably in Dictionary being a subclass of Set - see the paper). So "Sending floogle to myVariable" is just messageNotUnderstood ;-)