Fabulous Adventures In Coding

Eric Lippert's Blog

The JScript Type System, Part One

I thought I might spend a few days talking about the JScript and JScript .NET type systems, starting with some introductory material. 

 

There is a lot of terminology associated with type systems. What exactly is weak typing? What is a subtype? Just what is a type anyway? These terms are often bandied about and seldom actually defined precisely.

 

Consider a JScript variable:

 

var myVar;

 

Now think about all of the possible values you could store in myVar. A variable may contain any number, any string or any object. It can also be true or false or null or undefined. This is a rather large set of possible values. In fact, the set of all legal values is infinite.  (Countably infinite, and in practice limited by available memory, but in theory there is no upper limit.)

 

A type is characterized by two things, a set and a rule. First, a type consists of a subset (possibly infinitely large) of the set of all possible values. Second, a type defines a rule for transforming values outside the set into values in the set. (This rule may specify that certain values are not convertible and hence produce "type mismatch" errors.)

 

For example, string is a type. The set of all possible strings is an (infinite) subset of the set of all possible values, and there are rules for determining how all non-string values are converted into strings.
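
The string type's conversion rule can be observed directly. What follows is only a minimal sketch, assuming a modern JavaScript engine such as Node.js rather than a classic JScript host; the names converted and mismatchSeen are invented for illustration.

```javascript
// Each value below lies outside the set of strings. String() applies the
// string type's conversion rule to bring it into the set.
var converted = [
  String(42),         // "42"
  String(true),       // "true"
  String(null),       // "null"
  String(undefined),  // "undefined"
  String([1, 2, 3]),  // "1,2,3"
  String({})          // "[object Object]"
];

// Some conversions are not defined at all. null cannot be converted to an
// object, so trying to use it as one raises a type error.
var mismatchSeen = false;
try {
  null.toString();
} catch (e) {
  mismatchSeen = e instanceof TypeError;
}
```

The try block shows the other half of the rule: a value that is not convertible produces an error rather than a member of the target set.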

 

JScript Classic is a weakly typed language. This means that any value of any type may be assigned to any variable without restriction. It is often said -- inaccurately -- that "JScript has only one type". This is true only in the sense that JScript has no restrictions on what data may be assigned to any variable, and in that sense every variable is "the same type" – namely, the "any possible value" type. However, the statement is misleading because it implies that JScript supports no types at all, when in fact it supports six built-in types.  The "weak" in "weakly typed" refers to the weak restrictions on variables, not on the type system per se.
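
To make the "weak restrictions on variables" point concrete, here is a small sketch. It assumes a host that provides console.log (a classic JScript host would use something like WScript.Echo or alert instead), and the name seen is invented for illustration.

```javascript
// One variable, many types: none of the assignments below is rejected.
var myVar;                     // starts out holding undefined
var seen = [typeof myVar];

myVar = 42;                    // a Number
seen.push(typeof myVar);

myVar = "forty-two";           // a String
seen.push(typeof myVar);

myVar = true;                  // a Boolean
seen.push(typeof myVar);

myVar = { n: 42 };             // an Object
seen.push(typeof myVar);

console.log(seen.join(", "));
// prints "undefined, number, string, boolean, object"
```

The variable never changes, but the type of the value it holds does, which is why "only one type" describes the variables and not the values.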

 

JScript .NET, by contrast, is an optionally strongly typed language. A JScript .NET variable may be given a type annotation which restricts the values which may be stored in the variable. This annotation is optional; an unannotated variable acts like a JScript variable and may be assigned any value.

 

The concept of subtyping is not particularly important in JScript Classic though it will become quite useful when we discuss JScript .NET classes later. Essentially a type T1 is a subtype of another type T2 if T1's set of values is a subset of T2's set of values. A type consisting of the set of all integers might be a subtype of a type consisting of all the numbers, for instance.  (This is not how the integers are traditionally construed; the C type system makes integers and floats disjoint types, where the integer 1 and the float 1.0 are different values that happen to compare as equal -- but comparison across types is a subject for a later blog entry.)
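
The subset view of subtyping can be sketched with membership tests. JScript Classic has no built-in integer type, so isIntegerValue and isNumberValue below are hypothetical helpers written only for this illustration, and the sample values are arbitrary.

```javascript
// Hypothetical membership tests: each answers "is v in this type's set?"
function isNumberValue(v) {
  return typeof v === "number";
}
function isIntegerValue(v) {
  return isNumberValue(v) && isFinite(v) && Math.floor(v) === v;
}

var samples = [0, 1, -7, 2.5, NaN, Infinity, "1", null];
var integerCount = 0;
var numberCount = 0;
var subsetHolds = true;

for (var i = 0; i < samples.length; i++) {
  if (isNumberValue(samples[i])) {
    numberCount++;
  }
  if (isIntegerValue(samples[i])) {
    integerCount++;
    // Subset property: membership in the subtype implies membership
    // in the supertype.
    if (!isNumberValue(samples[i])) {
      subsetHolds = false;
    }
  }
}

// Unlike in C, 1 and 1.0 here are the same Number value rather than
// two values drawn from disjoint types.
var sameValue = (1 === 1.0);
```

Every sample accepted by the integer test is also accepted by the number test, which is all that "subtype" means under this definition.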

 

Anyway, JScript Classic has six built-in types, all of which are disjoint. They are:

 

The Number type contains all floating-point numbers as well as positive Infinity, negative Infinity and a special Not-a-Number (NaN) value. It may seem odd that "Not-a-Number" is a Number, but this does in fact make sense. NaN is the value returned when an operation logically must return a number but no actual number makes sense. For example, when trying to convert the string "banana" to a Number, NaN is the result. Because numbers in JScript are represented as 64-bit floating-point values, there are only finitely many possible Number values. The number of numbers is very large (in fact there are 18437736874454810627 possible numbers, which is just shy of 2^64). Numbers have approximately fifteen decimal digits of precision, and normalized values range in magnitude from as tiny as about 2.2 x 10^-308 to as large as about 1.8 x 10^308.

 

The String type contains all Unicode strings of any length (including zero-length empty strings.) The string type is for all practical purposes infinite, as the length of a string is limited only by the ability of the operating system to allocate enough memory to hold it.

 

The Boolean type has two values: true and false.

 

The Null type has one value: null.

 

The Undefined type has one value: undefined. All uninitialized JScript variables are automatically set to undefined.

 

The Object type has an infinite number of values. An object is essentially a collection of named properties where each property can be a value of any type. In JScript many things are objects: functions, dates, arrays and regular expressions are all objects.
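
A few of the claims in the list above can be checked with a short script. This is a sketch under the assumption of a modern JavaScript engine rather than a classic JScript host, and every variable name in it is invented for illustration.

```javascript
// Number: NaN and the infinities are ordinary members of the type.
var banana = Number("banana");
var numberFacts = {
  bananaIsNaN: isNaN(banana),
  nanIsANumber: typeof banana === "number",
  overflow: Number.MAX_VALUE * 2,
  // Beyond 2^53, consecutive integers can no longer be told apart.
  precisionLost: 9007199254740992 + 1 === 9007199254740992
};

// String: the zero-length string is a member too.
var emptyLength = "".length;

// Undefined: a declared but uninitialized variable holds undefined.
var uninitialized;
var startsUndefined = (uninitialized === undefined);

// Object: functions, dates, arrays and regular expressions are all
// objects, so each one accepts a new named property.
var things = [function () {}, new Date(0), [1, 2, 3], /ab+c/];
var allAreObjects = true;
for (var i = 0; i < things.length; i++) {
  things[i].label = "item " + i;
  if (!(things[i] instanceof Object) || things[i].label !== "item " + i) {
    allAreObjects = false;
  }
}
```

The precisionLost check is one way to see the limited precision of a 64-bit float: adding 1 to 2^53 yields a value indistinguishable from 2^53.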

 

Types are not themselves "first class" objects in JScript, though they are in JScript .NET.  I'll discuss that, along with the differences between prototype and class inheritance, in later entries.

Published Wednesday, November 05, 2003 1:38 PM by Eric Lippert

Comments

 

Dan Shappir said:

One thing that really bugs me about JavaScript, a.k.a. JScript Classic, is the myriad yet incompatible methods to determine object type. The main problem is the various ways to define what constitutes a unique JavaScript type. For example, one could argue that the definition you have presented is too weak in that the Object group is not sufficiently specific.

The obvious way to determine object type is the typeof operator. This operator returns a different set of values than the one you enumerated: "number," "string," "boolean," "object," "function," and "undefined." That is, null is identified as an object, yet JScript functions are magically identified as being something more than just any old object. Also, a few may know that JScript's typeof can also return "unknown" in certain weird cases.

Add to this confusion that typeof(3) is "number" but typeof(new Number(3)) is "object". I can understand why this is, but it can be confusing. For example, what would be the output from the following code:

Number.prototype.showType = function() { alert(typeof(this)); };
(3).showType();

Automatic boxing, I know.

Another way to determine types is using the constructor property. This allows you to determine if an object is an Array for example:

if ( x.constructor === Array ) ...

or you can use the instanceof operator. You might also care about the subclass of a particular type, as determined by its prototype. In this case you would use the isPrototypeOf method.

Finally, since JavaScript uses duck-typing, what might actually be relevant is not an object's exact type, but which properties it implements. In this case you would use the in operator, or maybe even propertyIsEnumerable.

The situation is even worse when you involve objects that are external to JScript. For example, DOM methods are not identified as functions and DOM collections may look a bit like arrays but aren't.
November 6, 2003 4:40 AM
 

Rudi Schlatte said:

I've seen slightly different uses of the vocabulary that your article uses. Type systems can usefully be classified along two axes, namely static / dynamic and strong / weak. Static typing means variables have associated (compile-time) type information and can only be bound to values of some type. By your examples, JScript Classic is dynamically typed, since a variable can be bound to values of different types during its lifetime. This would not be possible in a statically typed language, such as C. Strong typing essentially means that there's a typeof operator - i.e., a value (not the variable, but the thing it is bound to) holds information about its type. C is weakly typed: a value is just a bit pattern in memory without any explicit type information, and there's nothing forbidding anyone from reinterpreting a `long' value as a pointer to a structure. This would not be possible in a strongly typed language.
November 6, 2003 9:19 AM
 

Michael Feathers said:

I'll second Rudi on this. The definitions that he mentions are becoming more pervasive. If Strong/Weak and Static/Dynamic aren't seen as orthogonal then you have the situation where Smalltalk and early C are considered to have the same kind of type system, and that's nearly a criminal misrepresentation. It equates having well defined errors detected at runtime with silent memory corruption that crashes a system if you are lucky, or just makes things randomly and silently wrong over the life of a system, if you aren't.
November 6, 2003 9:36 AM
 

Isaac said:

"implies that JScript supports no types at all" Much of the confusion and ambiguity in the terms we use seems to stem from fear of negative connotations. It's unclear to me why we cannot work with the terms defined by Luca Cardelli (in Handbook of Computer Science and Engineering, Chapter 103. CRC Press, 1997; http://citeseer.nj.nec.com/cardelli97type.html ) . typed (untyped) "A program variable can assume a range of values during the execution of a program. An upper bound of such a range is called a type of the variable. Languages that do not restrict the range of variables are called untyped languages." safe (unsafe) "A program fragment is safe if it does not cause untrapped errors to occur. Languages where all program fragments are safe are called safe languages." So, Smalltalk is untyped and safe (no untrapped errors), and C is typed and unsafe. (Of course, he also defines explicit/implicit types, static checking/dynamic checking, strongly checked/weakly checked...)
November 6, 2003 12:42 PM
 

Just some guy said:

You describe a subtype in terms of a subset of the possible values of a super type. But in practice (in almost all examples I have seen, in textbooks and so forth) the opposite is true... This convention is also enforced by some language rules (which I will get to). For example the class java.util.zip.JarFile extends java.util.zip.ZipFile, but it represents both a zip file AND a jar file, so clearly its set of possible values is larger. But it is also a subtype of ZipFile.... How can this be? Does it violate the rules of subtyping? And if it did, why does the compiler not reject it? Because almost all languages have subtypes inherit attributes of the supertype, it is easy for developers to make these kinds of mistakes.
November 6, 2003 8:20 PM
 

Michael Feathers said:

> It's unclear to me why we cannot work with the terms defined by Luca
> Cardelli (in Handbook of Computer Science and Engineering, Chapter 103.
> CRC Press, 1997; http://citeseer.nj.nec.com/cardelli97type.html
> typed (untyped)
> "A program variable can assume a range of values during the execution of a
> program. An upper bound of such a range is called a type of the variable.
> Languages that do not restrict the range of variables are called untyped
> languages."

It sounds like he is talking about variables, and that's fine, but in some OO languages you don't have data variables, you have references to objects. Calling languages like these untyped doesn't make much sense because the objects themselves obviously have types. In other words, the language's features are at a level of abstraction where that definition isn't very relevant.

Another example. In some languages you don't have to be particularly concerned if you attempt to increment an integer beyond its range. The object adjusts its range dynamically. Now which is the variable? The object or the reference that holds it? If you say it is the reference, then all references have the same type, but if you say it is the object, well, what is its type really? And, do you ever assign an object to another? In many OO languages you can't; you can only assign references.
November 7, 2003 12:11 AM
 

Eric Lippert said:

> You describe a subtype in terms of a subset of the possible values of a super type. But in practice (in most all exaples I have seen , in textbooks and so forth) the opposite is true

> For example the class java.util.zip.JarFile extends java.util.zip.ZipFile, but it represents both a zip file AND a jar file, so clearly it's set of possible values is larger.

If JarFile is a subtype of ZipFile then every instance of JarFile is also an instance of ZipFile. But that's the definition of "subset" -- if every member of X is a member of Y then X is a subset of Y. Therefore, I'm not following your train of thought here. The set of all objects that are ZipFiles contains all JarFiles plus all the non-JarFile instances of ZipFiles, so how could JarFiles be a larger set?
November 7, 2003 1:20 AM
 

Curt Sampson said:

You seem to have confused the "set of possible values." With type systems here we're concerned about the number of possible values a variable can hold. A variable of type ZipFile can reference any ZipFile object, and any JarFile object. But a variable of type JarFile can reference only JarFile objects, not ZipFile objects that are not JarFile objects. Thus, a JarFile variable holds a subset of the values that can be held in a ZipFile variable.
November 7, 2003 3:34 AM
 

Just some guy said:

Eric Lippert writes: "I'm not following your train of thought here...if every member of X is a member of Y then X is a subset of Y." Thank you for clearing me up. My confusion was in thinking that increasing the functionality in the subclass increases the values it can represent... like a TitledBorder with a Border superclass. I just assumed that the Border type does not include a border with a title; because it does not properly handle the title (it has no operators for title), it should not include values with titles.... Only the TitledBorder type should include that value. But a TitledBorder can also handle a border with no title, as it is also a border. So I just assumed that TitledBorder would represent a superset. But as you note, the Border type should include borders that have titles, even if it cannot represent those titles properly. It just seemed silly that a border without a title is not a valid TitledBorder (or a zipfile without a manifest is a valid jar file).. but the reverse is true, even though I have no way of representing/manipulating the title/manifest. But I see my fault. :)
November 7, 2003 3:11 PM
 

Isaac said:

> It sounds like he is talking about variables, and that's fine

Yes, the definition is in terms of the range of values that can be assigned to a variable.

> but in some OO languages you don't have data variables,
> you have references to objects

The distinction between value types and reference types is unimportant to this definition. In a typed language we may restrict a variable to hold references to a particular type - such as POINTER TO ZipFile. In an untyped language we don't do that.

> Now which is the variable? The object or the reference that holds it?
> If you say it is the reference, then all references have the same type

I'm unsure what you mean. Let's be more concrete (Smalltalk):

| myVariable |
myVariable := OrderedCollection new.

Now myVariable holds a reference to an instance of OrderedCollection.

myVariable := Dictionary new.

Now myVariable holds a reference to an instance of Dictionary. myVariable is "the variable". Any value can be assigned to myVariable. The language is untyped.

In some other language:

ArrayList myVariable = new ArrayList();

Now myVariable holds a reference to an instance of ArrayList.

myVariable = new HashMap();

***Fails - the range of values that myVariable may hold is restricted to references to ArrayList instances. The language is typed.

This kind of definition has been called misleading because languages without typed variables do have values of different types. OTOH being *untyped* is the easy explanation for why we have ad-hoc polymorphism and loose-coupling in these languages. They got rid of type restrictions - proud to be untyped and safe.

In that universe, "JScript Classic" would be untyped, and "JScript .NET" would be typed (presumably "untyped" variables are "restricted" to some universal type).
November 7, 2003 5:02 PM
 

Michael Feathers said:

> This kind of definition has been called misleading because languages
> without typed variables do have values of different types.
> OTOH being *untyped* is the easy explanation for why we have ad-hoc
> polymorphism and loose-coupling in these languages. They got rid of type
> restrictions - proud to be untyped and safe.

Except they didn't.

myVariable := Dictionary new.
myVariable floogle.

Sending floogle to myVariable is a type error, it is just one detected at run-time.
November 7, 2003 5:19 PM
 

Isaac said:

Can we find a better forum for this discussion?

>> They got rid of type restrictions
> Except they didn't.

Wasn't it clear from the context that "type restrictions" referred to type restrictions on variables? Ain't none of those in Smalltalk ;-)

I'm a bit surprised that there's anything controversial about this - back in 1990, Justin Graver & Ralph Johnson were quite clear that "Smalltalk is untyped", see "A Type System for Smalltalk". http://citeseer.nj.nec.com/graver90type.html

> Sending floogle to myVariable is a type error

It might be if Dictionary was a type, but it isn't - it's a class. Implementation is inherited, specification is ignored (notably in Dictionary being a subclass of Set - see the paper). So "Sending floogle to myVariable" is just messageNotUnderstood ;-)
November 7, 2003 9:21 PM
 


About Eric Lippert

Eric Lippert is a senior developer on the Microsoft C# compiler team. Before that he worked on the framework of Visual Studio Tools For Office. Before that, he worked on the compilers, runtimes and tools for VBScript, JScript, Windows Script Host and other Microsoft Scripting technologies. He lives in Seattle and spends his free time editing books about programming languages, playing the piano, and trying to keep his tiny sailboat upright in Puget Sound.

© 2009 Microsoft Corporation. All rights reserved.