Visual Insights

A blog on data visualization, info-graphics and business intelligence.

Introducing the Microsoft Visualization Language

The Vedea Project
Create.  See.  Understand.

When I introduced the Microsoft Computational Science Studio a couple of weeks ago, I also hinted at the ‘Vedea’ project – a new visualization language that we have been working on in the Computational Science Laboratory here at Microsoft Research in Cambridge, UK.  I’ll use this post to introduce the Microsoft Visualization Language, Vedea, in more detail.

Vedea is a prototype of a new experimental language for creating interactive infographics, data visualizations and computational art.  It is designed to be accessible to people who are either new to programming or whose primary domain of expertise is something other than programming.  We wanted to give those users a tool that they can use to realize their own vision and visualizations without having to engage skilled programmers, but have it be an environment that skilled programmers would not find limiting.

The motivation for building Vedea comes from the Microsoft Computational Science Studio.  It’s no use facilitating modelling and computation if you don’t give the user a way to visualize the results.  Simple charts are generally fine for simple data sets, but not for deep interactive exploration of data with many dimensions, or for the kind of speculative visual exploration that leads to visually inspired ‘aha’ moments.

The inspiration for Vedea is the same set of goals underlying the language “Processing” (http://processing.org/).  Both Processing and Vedea have similar goals and similar audiences.  We’re all (the scientists and technologists in MSRC’s Computational Science Laboratory) frequent users and admirers of Processing, but we felt there was an opportunity here for us to take those concepts further by capitalizing on some new Microsoft technologies.

The Language

The Microsoft Visualization Language is built on .NET 4.0’s new Dynamic Language Runtime (DLR).  This gives us some important advantages over more traditional language implementations.  Syntactically, the Vedea language looks a lot like C#.  In its simplest form though, there are no class declarations – just a collection of functions.  You can introduce classes if you want to do object-oriented programming, but they are not required and your topmost functions aren’t wrapped in any of the syntactic trappings of a class.

The second thing to notice about the language is that it is dynamically typed.  Variables take on the type of whatever is assigned to them and, if you like, you don’t have to declare variables at all – they are defined the first time they appear on the left of an assignment.

The DLR also allows us to play some neat late-binding tricks with data sources, COM objects and the like by defining properties and methods based on what we find out about those objects at runtime.  More about that below.

Another unique aspect of the language is its implementation of bindings.  You might be familiar with bindings from WPF and Silverlight, and the concept here is similar, though expanded.  The simplest example of a binding is as follows:

textbox.Text := slider.Value;

This is a ‘binding’ that you declare once in your program, but which ‘forever’ binds the value of the slider to the text in the textbox.  If the slider is moved, the text will change to match the current slider value.  Since bindings can be cancelled, ‘forever’ really means ‘until you do something to cancel the binding’.

You can also do the following:

textbox.Text :=: slider.Value;

In this case, if you move the slider, the textbox’s text will change, but if you type a value in the textbox, the slider will change to match that value.  Text and slider values are of different types, so some value conversion is happening under the covers here.
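For readers who have not met bindings before, the idea can be sketched outside Vedea.  What follows is a minimal Python analogue, not Vedea’s implementation: the names Observable, bind and bind_two_way are invented for illustration, and the str/int converters stand in for the value conversion mentioned above.

```python
class Observable:
    """A value that notifies subscribers whenever it changes."""

    def __init__(self, value):
        self._value = value
        self._subscribers = []

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        if new_value == self._value:
            return  # unchanged: no notification (this also stops two-way loops)
        self._value = new_value
        for callback in list(self._subscribers):
            callback(new_value)

    def subscribe(self, callback):
        self._subscribers.append(callback)
        # Hand back a function that cancels this subscription.
        return lambda: self._subscribers.remove(callback)


def bind(target, source, convert=lambda v: v):
    """One-way binding, in the spirit of `target := source`."""
    target.value = convert(source.value)
    return source.subscribe(lambda v: setattr(target, "value", convert(v)))


def bind_two_way(a, b, a_from_b, b_from_a):
    """Two-way binding, in the spirit of `a :=: b`."""
    cancel_ab = bind(a, b, a_from_b)
    cancel_ba = bind(b, a, b_from_a)

    def cancel():
        cancel_ab()
        cancel_ba()

    return cancel


slider_value = Observable(10)
textbox_text = Observable("")

cancel = bind_two_way(textbox_text, slider_value, a_from_b=str, b_from_a=int)
slider_value.value = 42    # textbox_text.value is now "42"
textbox_text.value = "7"   # slider_value.value is now 7
cancel()                   # 'forever' ends here
slider_value.value = 99    # textbox_text.value stays "7"
```

The equality check in the setter is what keeps a two-way binding from bouncing updates back and forth indefinitely.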

Things get much more interesting when you combine bindings with collections defined with LINQ syntax:

myData = DataSet("mydata.csv");
currentYear := slider.Value + 1900;
bubbles := from row in myData
  where row.Year :== currentYear
  select new Circle()
    {
      X = row.Latitude,
      Y = row.Longitude,
      Radius = row.Population * scalingFactor,
      Fill = BlackBodyPalette(1., 1., row.DeltaCarbon)
    };
Scene["USMap"].Add(bubbles);

The first line connects the variable ‘myData’ to the dataset in mydata.csv. The CSV file contains columns for Year, Latitude, Longitude, Population and DeltaCarbon (possibly in addition to other columns) which is why we can access those columns as if they were properties of ‘row’.

The second line binds the variable currentYear to the slider’s value plus 1900. As the slider is moved by the user, currentYear will get updated.

The third line creates a collection of circles (‘bubbles’), one for each row in the csv file whose Year value matches currentYear. The Boolean comparison :== is the same as == except that the Boolean value is bound to both variables in the conditional expression. A change to either variable in the comparison will invalidate the comparison, which will cause the collection to be re-evaluated. As a result, as the user moves the slider, the visual collection of bubbles will get updated to reflect the current year’s data. (nb: an identifier column can be specified in order to obtain least-flow visualizations or transitional animations). Using ‘==’ instead of ‘:==’ would have resulted in a one-time comparison and the collection would not be updated as the slider is moved (which might be desirable in some cases). In the ‘select’ portion of the LINQ expression, we create the circles and set their properties based on values found in that row of the data. So as you move the slider, not only will bubbles move and change size and color, but the number of bubbles that are visible will change based on the outcome of the ‘where’ clause.

Finally, we add the entire collection of bubbles to some member of the scenegraph called ‘USMap’.  Presumably that is one of our map objects and adding the collection as a child of that map will use the map’s lat/lon coordinate system and position the bubbles according to the lat/lon values found in X and Y.

The end result is that we have produced an interactive timeline visualization using fewer than a dozen lines of code (or four statements). It is true that the code is ‘denser’ – that is, more functionality in less typed space – and consequently a user might spend more time working on fewer lines of code, but we are betting that the number of failure modes is smaller (fewer ways to mess up) and that there’s less code ‘real estate’ to observe and comprehend.  Of course, we still support immediate-mode graphics for anyone who wants to ignore this declarative syntax and create old-school visuals and animations in the render loop.
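To make the ‘bound query’ idea concrete, here is a rough Python sketch of a collection that is re-evaluated whenever the slider moves.  It is an illustration only: the Slider and BoundQuery classes, the sample rows and the scaling factor are all invented, and the X/Y assignment simply mirrors the snippet above.

```python
from dataclasses import dataclass


@dataclass
class Circle:
    x: float
    y: float
    radius: float


# Hypothetical rows standing in for mydata.csv.
ROWS = [
    {"Year": 1950, "Latitude": 40.7, "Longitude": -74.0, "Population": 8},
    {"Year": 1950, "Latitude": 34.1, "Longitude": -118.2, "Population": 2},
    {"Year": 1960, "Latitude": 40.7, "Longitude": -74.0, "Population": 7},
]


class Slider:
    """Stand-in for a UI slider that notifies listeners when moved."""

    def __init__(self, value):
        self._value = value
        self.listeners = []

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        self._value = new_value
        for listener in self.listeners:
            listener(new_value)


class BoundQuery:
    """Re-runs a query whenever the slider it depends on changes."""

    def __init__(self, slider, query):
        self._query = query
        self.items = query(slider.value)
        slider.listeners.append(self._refresh)

    def _refresh(self, value):
        self.items = self._query(value)


def bubbles_for(slider_value, scaling_factor=2):
    """The 'from ... where ... select' step as a list comprehension."""
    current_year = slider_value + 1900
    return [
        Circle(x=row["Latitude"], y=row["Longitude"],
               radius=row["Population"] * scaling_factor)
        for row in ROWS
        if row["Year"] == current_year
    ]


slider = Slider(50)
bubbles = BoundQuery(slider, bubbles_for)
print(len(bubbles.items))   # 2 bubbles for 1950
slider.value = 60
print(len(bubbles.items))   # 1 bubble for 1960
```

This sketch re-runs the whole query on every change; the post does not say how Vedea does it, and a real implementation could update incrementally.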

Data

With a few exceptions, when it comes to visualizations, data is king and time is precious so you want to spend a minimum amount of time thinking about the ‘plumbing’ required to get your data into your visualization (things like file formats, storage location, memory allocation, virtualization, pagination, etc.) and spend the majority of your time working with the data itself.

In the Microsoft Visualization Language, we leverage another project of the Computational Science Laboratory called ‘Scientific Data Set’ to make working with data as simple and efficient as possible.  Scientific Data Set is a hugely important piece of technology for both Vedea and Computational Science Studio, so it will get a lot of ‘airtime’ here.

Scientific Data Set uses a collection of ‘providers’ to support access to a wide variety of back-end storage, including comma-separated values (CSV) and variations on CSV; netCDF and HDF; SQL; Excel; and a host of others.  The data from all of those back-end storage formats is presented through a consistent API that supports named variables, metadata on variables (units of measure, dimensionality and other annotations) and the use of variables to form coordinate systems onto other variables.  For instance, you could have separate vectors of latitude, longitude and altitude.  You could use latitude, longitude and altitude to index a three-dimensional structure of variables containing temperature, pressure and particulate count, and then re-use the latitude and longitude vectors to index into a two-dimensional structure containing variables with calibration data for the recording station at that lat/lon.

In Scientific Data Set (ScDS), you can then access your variables not only by their integer indices, but also by the lat/lon variables.  In the future, ScDS will also support interpolation so that if you turn on interpolation and ask for data at a lat/lon/alt that does not exist in the data, you will get an interpolated value which you could use for re-sampling the data.
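The notion of one variable acting as a coordinate system for another may be easier to see in a small example.  This Python sketch is not the ScDS API; the Variable class, its method names and all the numbers are made up to show two variables sharing the same latitude and longitude axes.

```python
class Variable:
    """A 2-D variable whose axes are named coordinate vectors."""

    def __init__(self, values, lat_axis, lon_axis):
        self.values = values
        self.lat_axis = lat_axis
        self.lon_axis = lon_axis

    def at_index(self, i, j):
        """Access by plain integer indices."""
        return self.values[i][j]

    def at(self, lat, lon):
        """Access by coordinate values, looked up in the shared axes."""
        i = self.lat_axis.index(lat)
        j = self.lon_axis.index(lon)
        return self.values[i][j]


# Coordinate vectors, defined once and reused by both variables below.
latitude = [50.0, 51.0]
longitude = [-1.0, 0.0, 1.0]

temperature = Variable([[11.5, 12.0, 12.4],
                        [10.9, 11.3, 11.8]], latitude, longitude)
calibration = Variable([[0.1, 0.0, -0.2],
                        [0.3, 0.1, 0.0]], latitude, longitude)

print(temperature.at_index(1, 2))   # 11.8
print(temperature.at(51.0, 1.0))    # 11.8, same cell addressed by lat/lon
print(calibration.at(50.0, -1.0))   # 0.1
```

Here a lat/lon that is not in the axes raises an error; the interpolation described above would be the alternative to failing at that point.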

In Vedea, ScDS data is a top-level syntactic construct.  When you open a .CSV file, the variable you use for that CSV file will have member variables for each column in the CSV. For instance, given this CSV file (mydata.csv):

[Image: contents of mydata.csv]

If you use the following Vedea code:

myData = DataSet("mydata.csv");

Then myData will have the following indexed member variables available:

     myData.Sample
     myData.Population
     myData.Reading

and by ‘indexed’ member variables, we mean that you can get at the values like this:

print(myData.Reading[1]);

which would output 0.0201.  The purpose behind this is to make less typing yield more results. That means that beginning programmers have less syntactic overhead to deal with and experienced programmers can get where they want to go more quickly.
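As a point of comparison, the same ‘columns become members’ behaviour can be approximated in Python using only the standard library.  This is a sketch under assumptions: the DataSet class below is a toy, and the file contents are invented (only the column names and the 0.0201 reading come from the example above).

```python
import csv
import io

# Hypothetical stand-in for mydata.csv. Apart from the column names and
# the 0.0201 reading, these values are invented.
CSV_TEXT = """Sample,Population,Reading
1,1200,0.0187
2,1350,0.0201
3,1500,0.0215
"""


def _parse(text):
    """Turn a CSV cell into a number when possible, else keep the string."""
    try:
        return float(text)
    except ValueError:
        return text


class DataSet:
    """Expose each CSV column as an indexable attribute."""

    def __init__(self, text):
        reader = csv.DictReader(io.StringIO(text))
        rows = list(reader)
        for name in reader.fieldnames or []:
            setattr(self, name, [_parse(row[name]) for row in rows])


my_data = DataSet(CSV_TEXT)
print(my_data.Reading[1])   # 0.0201 (Python lists are 0-based)
```

Python can only attach the columns at runtime like this because it is dynamic, which is the same property the DLR gives Vedea.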

Graphics

The graphical capabilities, especially when combined with simple data handling, are the most exciting aspect of Vedea.  The current generation of infographics doesn’t look anything like the traditional pie and bar charts.  Take a look at information aesthetics or Bestiario and you’ll get an immediate sense of the types of graphics that we are targeting with Vedea.  They combine color, hierarchy, shape and line into clear, concise and informative (or entertaining) visuals.  Vedea seeks to bring these visual capabilities to people who are more in tune with their data, their vision for representing that data, and their audience than they are with programming.

The graphics features of Vedea are designed to build upon the native capabilities of XNA and GDI to make common tasks much simpler and to require less code to achieve those tasks.  Those features include the following (which will all be discussed in more depth in upcoming posts):

  • Hierarchical scenegraph
  • A full set of 2d primitives (polygons, lines, ellipses, circles, curves, arcs, images, text, etc)
  • A full set of 3d primitives (cubes, spheres, capsules, cylinders, pipes, images, text, meshes, textures, materials, height-maps, cameras and lighting)
  • Solid and alpha-blended (semi-transparent) colors
  • Map object (using data from Bing Maps/Virtual Earth) which can be combined with 2d and 3d primitives
  • A rich library of rendering utilities (e.g., Perlin noise, warping functions, color management)
  • Animation features (linear and exponential interpolation for smooth variation of any Vedea variable or visual property)
  • Direct binding of data to visuals (e.g., create one circle for every row in a database; bind properties of the circles to columns in the database)
  • Planned support for Network viewers, volumetric renderers, physics effects and other high-level graphics constructs

Retained-mode Graphics

In Processing, a programmer works without a scene graph. We call this ‘immediate mode’. In order to change a visual object you have to erase some or all of the scene and then redraw it. In general, this is a very easy system for beginning programmers to grasp. But for complex scenes with large numbers of objects, some portion of which change from frame to frame, the overhead quickly adds up, either in the cost of redrawing or in the amount of code you need to dedicate to optimizing your drawing. Very quickly, either performance suffers or the programmer is required to do more work than they might be prepared for. Experienced programmers can also find the situation frustrating due to the amount of code overhead they face. Similarly, the Z-order (the back-to-front order of objects) is determined by the order in which you draw them. Implementing move-back or bring-forward functionality means that you have to maintain an internal list of graphical objects, which amounts to implementing a scene graph in your program’s code.

Vedea supports the immediate drawing mode, but also allows the user to add objects to a ‘scene graph’. Use of a scene graph is called ‘retained mode’ graphics. In Vedea, you can type:

Ellipse(100,100,20,20);

to create a 20x20 circle at 100, 100. But you can also do this:

circle = new Ellipse(100,100,20,20);
Scene.Add(circle);

This creates a circle and displays it just like in the immediate-mode example. However in the immediate mode example, to change the size of the circle, you must erase your drawing and re-draw the circle. In the retained-mode example, you can do the following:

circle.Width = 25;
circle.Height = 25;

Now your circle has changed sizes. Likewise, you can change the z-order like this:

circle.ZOrder = 500;

This means that your circle will appear behind any objects with a ZOrder less than 500 and in front of any with a ZOrder greater than 500. In the immediate-mode example, if you change your circle, you would have to redraw everything that appeared in front of that circle (which means you had to keep track of that or redraw the whole scene). In Vedea, the runtime does that bookkeeping for you.
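The bookkeeping a retained-mode runtime takes over can be sketched in a few lines.  This is a Python illustration rather than Vedea’s internals: the Scene and Ellipse classes are invented, and the only thing taken from the text is the convention that a larger ZOrder sits further back.

```python
class Ellipse:
    """A retained shape: the scene keeps it, so it can be edited later."""

    def __init__(self, x, y, width, height, z_order=0):
        self.x = x
        self.y = y
        self.width = width
        self.height = height
        self.z_order = z_order


class Scene:
    """Holds shapes and decides the drawing order on every frame."""

    def __init__(self):
        self.children = []

    def add(self, shape):
        self.children.append(shape)

    def draw_order(self):
        # Back to front: shapes with a larger z_order are drawn first,
        # so they end up behind shapes with a smaller z_order.
        return sorted(self.children, key=lambda s: s.z_order, reverse=True)


scene = Scene()
circle = Ellipse(100, 100, 20, 20)
other = Ellipse(105, 105, 20, 20, z_order=100)
scene.add(circle)
scene.add(other)

# Edit the retained object in place; no manual erase-and-redraw needed.
circle.width = 25
circle.height = 25
circle.z_order = 500

order = scene.draw_order()
print(order[0] is circle)   # True: the circle is now drawn first (behind)
```

In immediate mode the sorted list and the redraw loop would live in your own program; here they belong to the scene.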

Hierarchical Scene-Graph

Above, we saw that we could add a circle to the scene graph. In Vedea, it is also possible to add objects to each other. For instance:

r1 = new Rectangle(100,100,10,10);
r1.Fill = "Red";
Scene.Add(r1);
r2 = new Rectangle(30,30,5,5);
r2.Fill = "Green";
r1.Add(r2);

We created a rectangle r1, set its fill color to red and then added it to the scene. Then we created a green rectangle called ‘r2’ and added it to r1. r2 will actually be drawn at 130,130 because it is a child of r1 and we said that r2 should be at 30,30 relative to its parent, which in this case is r1. (Note that r1’s parent is the scene itself). It is also useful here to note that although I used “Red” and “Green” as colors, there are six different valid ways to specify colors.

Now if I rotate r1 with r1.Rotate(45) then not only will rectangle r1 rotate, but rectangle r2 will orbit 45 degrees around the center of r1. When you rotate a parent, the frame of reference for all the children rotates too.
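The parent-relative arithmetic can be checked with a short Python sketch.  It is a simplified model, not Vedea’s scene graph: the Node class is invented, rotation is taken about the parent’s position rather than its center, and 90 degrees is used instead of 45 to keep the numbers easy to verify by hand.

```python
import math


class Node:
    """A scene-graph node positioned relative to its parent."""

    def __init__(self, x, y, rotation_degrees=0.0):
        self.x = x
        self.y = y
        self.rotation_degrees = rotation_degrees
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)

    def world_rotation(self):
        """Total rotation inherited from all ancestors plus this node."""
        inherited = self.parent.world_rotation() if self.parent else 0.0
        return inherited + self.rotation_degrees

    def world_position(self):
        """Local position rotated by the parent's frame, then offset."""
        if self.parent is None:
            return (float(self.x), float(self.y))
        px, py = self.parent.world_position()
        angle = math.radians(self.parent.world_rotation())
        wx = px + self.x * math.cos(angle) - self.y * math.sin(angle)
        wy = py + self.x * math.sin(angle) + self.y * math.cos(angle)
        return (wx, wy)


r1 = Node(100, 100)
r2 = Node(30, 30)
r1.add(r2)
print(r2.world_position())   # (130.0, 130.0)

# Rotating the parent moves the child, because the child's frame of
# reference rotates with it. The child's own x and y are untouched.
r1.rotation_degrees = 90
print(tuple(round(c, 6) for c in r2.world_position()))   # (70.0, 130.0)
```

The child’s stored coordinates stay at 30,30 throughout; only the computed world position changes, which is why no extra code is needed when a parent moves or rotates.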

This becomes especially interesting with maps (or any other container object that implements its own coordinate system). If you add a map to a scene, then you can add shapes, images, etc as children of the map image and the coordinates, width and height of these child objects will be interpreted in the coordinate system of the map instead of as pixel locations. If the map is zoomed, rotated or panned, the child objects will behave correctly without requiring any further code to adjust their position.

Runtime and Authoring Environment

Vedea programs can be edited in Notepad or an HTML text input and are compiled by the Vedea runtime.  We are working on a standalone compiler that will allow you to create intermediate code that can be used at runtime instead of source code.  The Vedea runtime (the program that compiles and runs Vedea code and hosts the graphical display window) can run either on the desktop or as a client-side web control.  One of our key concerns for the runtime is increasing the performance and cross-platform compatibility of the web-based version, though Vedea is dependent on .NET 4.0 and the DLR, so even with OpenGL compatibility we may have to wait for ongoing Mono development before we can reach beyond the Windows family.

Getting Vedea

The Microsoft Visualization Language and its runtime will first be available via http://research.microsoft.com/Vedea.  You won’t find it there now (you just looked, didn’t you?) but we will be posting it very early in the coming year.  We’re eager to get it out for people to play with, but we have a bit of work to finish and a fair bit of packaging to do before we can post it on the web site.  Forums will be available for questions and discussion and I’ll post samples and discuss features here.  Vedea is an ongoing experiment here at the Microsoft Research Computational Science Laboratory and, like all experiments (and all pre-release software), its final shape and outcome are uncertain, but there has been a lot of interest in it and we’re eager to get some community experience and feedback.  I hope you found this interesting and hope you will give the Microsoft Visualization Language a spin when we post the first downloadable version.

Leave a Comment
  • I'm extremely interested in giving this a spin. I want something that I can use to visualise aspects of our source code.

  • Will Vedea be a .Net language? Why not use an existing language syntax?

    Thanks,

  • @Tom: Sounds good, I've heard from a number of folks who want to apply it to code visualization.

    @gvider: Yes, it is based on the .NET 4.0 DLR technology, so it is a new .NET language.  The reason for not using an existing dynamic language syntax (Python for instance) is that we wanted to augment the syntax to 'bake' certain data and graphics idioms right into the language and to lessen the slope on the learning curve for getting to those constructs.  The syntax is probably closest to C# or Java, but with some obvious differences relating to the typing of variables and parameters.  The basic goal is less syntax yielding more results (and more readily) while still providing mechanisms for more skilled programmers to get fine-grained control over their visuals.  The existing .NET languages all either had syntactic constraints and details that didn't work for us, or compile-time complexities that we wanted to avoid, or both.

  • Would you be interested in doing a presentation on Vedea for the Business Intelligence part of the UK SQL Server User Group? I can see there would be a lot of uses for this technology in the world of corporate BI.

  • Please provide a sign up mechanism so we can be notified when the CTP becomes available.

    Thanks, Robin

  • Interesting stuff.

    Do you use an M grammar to define Vedea? Oslo would give you a parser etc, so was curious how that fits in.

  • @ChrisWebb: Sure - please send me an email through this site and I will respond directly to see if we can work out scheduling.

    @rcottiss: Please grab the RSS feed from this blog.  I will announce the availability here.  That's the best I can offer at this time.

    @david.ing: We didn't use M but used ANTLR to generate the lexer and parser.  The ANTLR-generated parser emits a DLR expression tree that we can either persist or execute.  The main motivation for the choice was that M was quite new at the time we started and I was already familiar with ANTLR.

  • Ummm... this is classic Microsoft.  You guys are talking about XNA and GDI, and anybody who might actually use your stuff is thinking and learning about WPF.  What gives?

  • You're right - classic Microsoft - find the right tool for the right job.  WPF and Silverlight are great technologies for rapid development of sumptuous user experiences, but not necessarily the best choice for a researcher/artist-friendly authoring environment for scalable infographics and data visualizations.  That's why we broadened our exploration beyond the set of 'latest things'.  In the end, there likely will be WPF and Silverlight renderers, but for cross-platform compatibility with Mono, you might want GDI and for extreme scale and advanced rendering (like volumetrics) you'll need DX technologies.

  • Is Vedea dead?   This doesn't appear to be any closer to a CTP than when it was first announced.

  • Any word on this? The Research site says "early 2010" but we're approaching late 2010.

  • Vedea and Vedea-like concepts are still very much alive. Obviously, I completely underestimated both the internal interest and impact of Vedea and the toll that would take on getting it out to external people, not to mention all the overhead of designing, building and supporting it for internal people along with my other responsibilities in the Computational Science Laboratory.  It has had a very positive personal impact as well, as I will shortly be joining a product team to work on related technologies.  Before making that move though, it is my intention to push through the latest internally reported bugs and get a CTP version out to research.microsoft.com as soon as possible.  I'll also get a bit more chatty here about info-graphics/info-visualization topics in general.

  • I was considering making a code generator for Processing using C#, but then I learned about Vedea in a StackOverflow post.  After watching your talk introducing Vedea I got a bit giddy about the binding semantics and interpolation feature.  Those two things alone will lead to much more concise, expressive code than even Processing offers.

    I'm really looking forward to the first CTP release.  Do you have an updated estimate for when that will happen?

  • I'm really waiting for Vedea so that I can examine it.

  • This looks to be a great project.  I too eagerly await the CTP.   If there are any additional details you can share, we are hungry for it.  Thank you!
