The JScript Type System, Part Five: More On Arrays In JScript .NET



As I was saying the other day, CLR arrays and JScript arrays are totally different beasts. It is hard to imagine two things being so different and yet both called the same thing. Why did the CLR designers and the JScript designers start with the same desire -- create an array system -- and come up with completely different implementations?

 

Well, the CLR implementers knew that dense, nonassociative hard-typed arrays are easy to make fast and efficient. Furthermore, such arrays encourage the programmer to keep homogeneous data in strictly bounded tables, which makes large programs that do lots of data manipulation easier to understand. That is why languages such as C++, C# and Visual Basic have arrays like this, and why they are the basic built-in array type in the CLR.

 

Sparse, associative, soft-typed arrays are not particularly fast but they are far more dynamic and flexible than Visual Basic-style arrays. They make it easy to store heterogeneous data in any table without worrying about picky details like exactly how big that table is. In other words, they are scripty. Languages such as JScript and Perl have arrays like this.
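To make "sparse, associative, soft-typed" concrete, here is a small sketch in plain JScript. It uses nothing .NET-specific, so it also runs unchanged in a JavaScript engine; the variable names are illustrative.

```javascript
// A JScript array has no declared size and no element type.
var table = [];
table[0] = 42;            // a number at index 0
table[1000] = "sparse";   // a string far past the end; slots 1 to 999 are never created
table["name"] = true;     // a non-integer key is stored as an ordinary property

// for-in visits only the keys that actually exist: "0", "1000" and "name".
var count = 0;
for (var key in table)
  count++;

var hasSlotOne = (1 in table);   // false: the array really is sparse
var len = table.length;          // 1001: one more than the highest integer index
```

By contrast, a CLR array allocated with new double[1001] reserves all 1001 slots up front and holds only doubles.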

 

JScript .NET has both very dynamic, scripty arrays and more strict CLR arrays, making it suitable for both rapid development of scripts and programming in the large. But like I said, making these two very different kinds of arrays work well together is not trivial.

 

JScript .NET supports the creation of multidimensional hard-typed arrays. As with single-dimensional arrays, the array size is not part of the type. To annotate a variable as containing a hard-typed multidimensional array, follow the element type with brackets containing commas. For example, to annotate a variable as containing a two-dimensional array of Strings you would say:

 

var multiarr : String[,];

 

The rank of the array is one more than the number of commas between the brackets. (By this definition, if there are no commas between the brackets then it is a rank-one array, as we have already seen.)
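A quick way to check the comma rule is to allocate arrays of increasing rank and ask each one for its Rank, a property every CLR array inherits from System.Array. This is a sketch assuming a console program compiled with jsc; the variable names are illustrative.

```jscript
import System;

var one   : String[]   = new String[3];      // no commas:  rank 1
var two   : String[,]  = new String[3,4];    // one comma:  rank 2
var three : String[,,] = new String[3,4,5];  // two commas: rank 3

// Any CLR array can be viewed as a System.Array, which exposes Rank.
var a : System.Array;
a = one;   Console.WriteLine(a.Rank);        // 1
a = two;   Console.WriteLine(a.Rank);        // 2
a = three; Console.WriteLine(a.Rank);        // 3
```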

 

A multidimensional array is allocated with the new keyword as you might expect:

 

multiarr = new String[4,5];

multiarr[0,0] = "hello";

 

Notice that hard-typed array elements are always accessed with a comma-separated list of integer indices, and there must always be exactly one index for each dimension of the array. You can't use the ragged-array syntax multiarr[0][0].
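The difference in indexing syntax is easiest to see side by side. This sketch, assuming a jsc console program with illustrative names, contrasts a hard-typed rectangular array with a JScript array of arrays, which is how you get a "two-dimensional" structure out of JScript's own rank-one arrays.

```jscript
import System;

// Hard-typed rectangular array: one bracket pair, comma-separated indices.
var grid : String[,] = new String[2,3];
grid[1,2] = "rectangular";
// grid[1][2] = "oops";   // not allowed: a rank-2 CLR array needs both indices in one bracket pair

// JScript array of arrays: each row is its own array, rows can differ
// in length, and each bracket pair indexes one level.
var ragged = [["a", "b"], ["c", "d", "e"]];
ragged[1][2] = "ragged";

Console.WriteLine(grid[1,2]);     // rectangular
Console.WriteLine(ragged[1][2]);  // ragged
```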

 

There are certain situations in which you know that a variable or function argument will refer to a hard-typed CLR array but you do not actually know the element type or the rank, just that it is an array. Should you find yourself in one of these (rather rare) situations, there is a special annotation for a CLR array of unknown element type and rank:

 

var sysarr : System.Array;

sysarr = new String[4,5];

sysarr = new double[10];

 

As you can see, a variable of type System.Array may hold any CLR array of any type and rank. However, there is a drawback. Variables of type System.Array may not be indexed directly because the rank is not known. This is illegal:

 

var sysarr : System.Array;

sysarr = new String[4,5];

sysarr[1,2] = "hello";  // ILLEGAL, System.Arrays are not indexable

 

Rather, to index a System.Array you must call the GetValue and SetValue methods with an array of indices:

 

var sysarr : System.Array;

sysarr = new String[4,5];

sysarr.SetValue("hello", [1,2]);
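Reading an element back works the same way through GetValue. The sketch below assumes a jsc console program (idx, i and j are illustrative names) and also builds the index array at run time, which is the usual situation when the rank is not known in advance. System.Array additionally has GetValue and SetValue overloads that take one, two or three separate int indices; the last call uses one of them, with the indices typed as int so that the Int32 overload is the obvious match.

```jscript
import System;

var sysarr : System.Array = new String[4,5];

// Write and read back through an index list, one entry per dimension.
sysarr.SetValue("hello", [1,2]);
Console.WriteLine(sysarr.GetValue([1,2]));   // hello

// The index array can also be built at run time as a typed int[].
var idx : int[] = new int[2];
idx[0] = 3;
idx[1] = 4;
sysarr.SetValue("world", idx);
Console.WriteLine(sysarr.GetValue(idx));     // world

// For ranks 1 to 3, GetValue and SetValue also accept separate int indices.
var i : int = 1;
var j : int = 2;
Console.WriteLine(sysarr.GetValue(i, j));    // hello
```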

 

The rank of a System.Array is given by its Rank property, and the bounds of each dimension by the GetLowerBound and GetUpperBound methods; the Length property gives the total number of elements.
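Putting Rank, GetLowerBound and GetUpperBound together, you can visit every element of an array whose rank is unknown until run time. Here is a sketch of one way to do it, assuming a jsc console program; dumpArray is a hypothetical helper, not a library function. It treats the index array like an odometer: bump the last index, and carry leftward whenever a dimension wraps.

```jscript
import System;

// Hypothetical helper: visits every element of a CLR array whose element
// type and rank are unknown until run time. Prints "[i,j,...] = value" for
// each element and returns how many elements it visited.
function dumpArray(arr : System.Array) : int {
  if (arr.Length == 0)
    return 0;                                // no elements to visit
  var rank : int = arr.Rank;
  var indices : int[] = new int[rank];
  var visited : int = 0;
  var d : int;
  for (d = 0; d < rank; d++)
    indices[d] = arr.GetLowerBound(d);       // start at the first element
  while (true) {
    var label : String = "";
    for (d = 0; d < rank; d++)
      label += (d == 0 ? "" : ",") + indices[d];
    Console.WriteLine("[" + label + "] = " + arr.GetValue(indices));
    visited++;
    // Odometer step: reset each trailing index already at its upper bound,
    // then bump the rightmost index that is not.
    d = rank - 1;
    while (d >= 0 && indices[d] == arr.GetUpperBound(d)) {
      indices[d] = arr.GetLowerBound(d);
      d--;
    }
    if (d < 0)
      break;                                 // every index wrapped: done
    indices[d]++;
  }
  return visited;
}

var grid : String[,] = new String[2,3];
grid[1,2] = "hello";
dumpArray(grid);            // one line per element, six in all
dumpArray(new double[3]);   // three lines
```

Using the reported bounds rather than assuming 0 through Length - 1 matters because a System.Array can describe arrays with nonzero lower bounds, such as those created with Array.CreateInstance.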

 

Thinking about this a bit now, I suppose that we could have detected at compile time that a System.Array was being indexed, and constructed the call to GetValue or SetValue appropriately for you, behind the scenes. But apparently we didn't. Oh well.

 

Next time: mixing and matching JScript and CLR arrays.

  • So far the only potential downside I see in this design is that you cannot write a generic function that works for both JScript arrays and CLR arrays if the arrays have a rank higher than 1. I assume the following function will work on both types of arrays: function sum(arr, length) { var result = 0; for (var i = 0; i < length; ++i) result += arr[i]; return result; } Would have been even nicer if CLR arrays in JScript were somehow made to support the length property if they were one dimensional. Anyway, you could not write such a function if arr was, say, two-dimensional. BTW, I can't resist: using the BeyondJS JavaScript library you could write the above code as: var sum = arr.fold("+");
  • > I assume the following function will work on both types of arrays: Indeed. > you cannot write a generic function that works for both JScript arrays and CLR arrays if the arrays have a rank higher than 1 Yep, but there are no JScript arrays with rank higher than one, so basically this is saying that you can't write a generic function that handles arrays of different ranks -- but wait a minute, that is what System.Array is for! That is, for those rare cases where you don't know the rank at compile time. > Would have been even nicer if CLR arrays in JScript were somehow made to support the length property if they were one dimensional. Dude, wait for it. I said I'd discuss interoperability in my NEXT blog! :-) > var sum = arr.fold("+"); I assume that your fold operator calls eval if the thing passed in is not a function object?
  • > Yep, but there are no JScript arrays with rank higher than one Technically you are correct, but practically you simply create an array of arrays. And the resulting syntax looks just like C++ or Java. That was my point actually, that for a 2D JScript array you write a[1][2] while for a CLR array you write a[1,2]. > that is what System.Array is for You misunderstood me. I wasn't looking to write a function that would work for any rank. I was looking for a function that would work for, say, a 2D JScript array and a 2D CLR array. While I fully understand the reasons you chose the indexing syntax used for multi-dimensional CLR arrays, I simply pointed out that as a result they are not polymorphic with multi-dimensional JScript arrays. > Dude, wait for it. You caught me, I'm the impatient type ;-) > I assume that your fold operator calls eval if the thing passed in is not a function object? BeyondJS implements a mechanism of converting strings to functions: "+".toFunction() will generate a binary function, while "!".toFunction() will generate a unary function. You can also do "-".toFunctionUnary() or "-".toFunctionBinary() to control which version is generated. Here is the implementation: String.prototype.toFunctionUnary = function() { eval("function __unary__(op) { return " + this + " op; }"); __unary__.op = this.valueOf(); return __unary__; }; String.prototype.toFunctionBinary = function() { eval("function __binary__(op1, op2) { return op1 " + this + " op2; }"); __binary__.op = this.valueOf(); return __binary__; }; String.prototype.toFunction = function() { return ",!,~,++,--,new,delete,typeof,void,".indexOf("," + this + ",") > -1 ? this.toFunctionUnary() : this.toFunctionBinary(); };
  • Yeah, there's no interoperation between ragged arrays and two-d arrays. But there is no interoperation between ragged CLR arrays and two-d CLR arrays either! Ragged arrays and rectangular arrays are pretty much separate concepts. In fact, the whole notion of rank of a ragged array is ill-defined -- you can have a ragged array that is 3-d in some axes, 2-d in others, 1-d in still others, etc. There is no sensible notion of "rank", so making them interoperate is more trouble than it's worth. Your implementation is pretty slick. (A less functional but perhaps more performant approach would be to generate all the unary and binary operator functions once and put them in a lookup table, rather than searching that string every single time and reconstructing the function object every single time.)
  • You are quite correct about both points. With regard to ragged arrays: I always found it amusing that C++ employs the exact same syntax for accessing ragged and contiguous arrays. So a[1][2] would generate wildly different code based on the definition of a. OTOH, it did buy you that polymorphic behavior I mentioned before. With regard to BeyondJS, our motivation was always functionality, with performance a consideration but not more. Anyway, fold generates the function once, and then applies it iteratively to all the members. So the performance hit of generating a new function every time is relatively minor when compared to the cost of the loop.
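The lookup-table approach suggested in the comments can be sketched in plain JScript (nothing .NET-specific, so it also runs in a JavaScript engine). This is not BeyondJS's actual code: binaryOps, unaryOps and fold are hypothetical names. Each operator function is built once, when the tables are filled, so repeated folds reuse the same function objects instead of rebuilding them on every call.

```javascript
// Hypothetical operator tables: every function is built once, up front.
var binaryOps = {};
var unaryOps = {};

(function () {
  var binary = ["+", "-", "*", "/", "%", "<", ">", "<=", ">=", "==", "!=",
                "===", "!==", "&", "|", "^", "<<", ">>", ">>>", "&&", "||"];
  var unary = ["!", "~", "-", "+", "typeof", "void"];
  var i;
  for (i = 0; i < binary.length; i++)
    binaryOps[binary[i]] = new Function("a", "b", "return a " + binary[i] + " b;");
  for (i = 0; i < unary.length; i++)
    unaryOps[unary[i]] = new Function("a", "return " + unary[i] + " a;");
})();

// Hypothetical left fold: combines the elements of a non-empty array
// from left to right with the named binary operator.
function fold(arr, op) {
  if (!binaryOps.hasOwnProperty(op))
    throw new Error("unknown binary operator: " + op);
  var f = binaryOps[op];
  var result = arr[0];
  for (var i = 1; i < arr.length; i++)
    result = f(result, arr[i]);
  return result;
}

var total = fold([1, 2, 3, 4], "+");       // 10
var joined = fold(["a", "b", "c"], "+");   // "abc"
```

The trade-off is the one described in the comments: the table costs a little memory and setup time for operators that may never be used, in exchange for no string searching or function construction at call time.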