I hope everyone had a great holiday and that Santa was good to you.
My kids made out like troopers this year with the grandparents buying them a Wii. Yes I know I work for Microsoft -- but we already have an XBOX 360 – actually we have 2 of them but that’s another story. I probably should feel more guilty about having a Wii but I didn’t buy it myself… and… ok and its pretty fun to play… I mean show the kids how to use it. I just hope my friend Richard in the XBOX team is not reading my blog.
As for me, got some sweaters, some books and some gift cards which I used to buy the complete Rocky DVD collection and the complete Indiana Jones DVD collection.
With the year quickly coming to a close, I realized I had limited time to blog a couple more times before 2010 started and that I owed everyone a follow up to my previous blog article on building Excel Add-Ins to turn Excel into a Spatial Spreadsheet.
In this somewhat lengthy follow on blog article I will revise and extend the COM automation add-in that was introduced in the previous article.
In the 3rd article I will introduce a second Excel add-in which will provide functionality for doing visualization of spatial data within the Excel spreadsheet.
In the last article I mentioned that there were some annoying side effects to using the automated COM automation support from .NET. The main side effect was the exposure to COM automation of some of the inherited public methods from the Object base class (Equals, GetType, ToString). While this is not a catastrophic problem, it does introduce confusion for end users, since these methods have nothing to do with spatial analysis.
In order to hide from Excel the inherited methods from the Object base class we want to have finer control over what .NET will expose as the COM/Automation interface for our automation add-in.
The first step to achieving finer grained control over the exposure of our spatial functions is to refactor the current MySpatialFunctions class into a a class that is derived from an Interface, let us call it IExcelSpatialFunctions.
1: namespace ExcelSpatial
3: public interface IExcelSpatialFunctions
5: bool INTERSECTS(string Shape1, string Shape2);
7: // All other methods to be exposed to Excel
Now we can decorate the IExcelSpatialFunction interface with some attributes from the System.Runtime.InteropServices and System.ComponentModel namespaces. These attributes will control how the interface will be exposed as a COM Automation interface for Excel.
The [ComVisible(True)] attribute is used to expose the IExcelSpatialFunctions interface as a COM interface and the [Guid()] attribute is used to specify a unique ID for this interface.
Note: The [Description()] attribute is used to provide additional help about the exposed functions and appears in the resulting type library for the COM component. Unfortunately that information will not be displayed by Excel (in the formula picker dialog for example) but I include it for completeness and the description should be visible to developers integrating with the COM automation add-in in VB or other COM enabled hosting development tools.
1: using System;
2: using System.Collections.Generic;
3: using System.Linq;
4: using System.Text;
5: using System.Runtime.InteropServices;
6: using System.ComponentModel;
8: namespace ExcelSpatial
12: public interface IExcelSpatialFunctions
14: /// <summary>
15: /// Returns True if a geography instance intersects another.
16: /// </summary>
17: /// <param name="wktShape1"></param>
18: /// <param name="wktShape2"></param>
19: /// <returns></returns>
20: [Description("Returns True if a geography instance …")]
21: bool INTERSECTS(string Shape1, string Shape2);
23: // All other methods we wish to expose to Excel.
We then update our original MySpatialFunctions class to derive from the interface. I also took the liberty to rename the class to ExcelSpatialFunctions.
The primary change from the previous implementation is to change the [ClassInterface()] attribute to ClassInterfaceType.None from the original ClassInterfaceType.AutoDual. We are being “explicit” about our exposed client interface with IExcelSpatialInterface as the interface to our class, so we don’t want an auto-generated COM interface for the class.
We also specify the Guid for the class using the [Guid()] attribute and the ProgId for the class using the [ProgId()] attribute. The ProgId is the human-readable string that will be used in Excel when selecting a formula category as well as the the name that will appear in the add automation add-in dialog box in Excel.
1: // ...
2: using System.ComponentModel;
3: using System.Runtime.InteropServices;
4: using Microsoft.SqlServer.Types;
7: namespace ExcelSpatial
13: [Description("Spatial add-in for Excel based on ….")]
14: public class SpatialFunctions :
18: #region IExcelSpatialFunctions Members
20: /// <summary>
21: /// Returns True if a geography instance intersects another.
22: /// </summary>
23: /// <param name="wktShape1"></param>
24: /// <param name="wktShape2"></param>
25: /// <returns></returns>
26: [Description("Returns True if a geography instance...")]
27: public bool INTERSECTS(string Shape1, string Shape2)
29: bool bReturnValue = true;
31: SqlGeography geoShape1 = Shape1.ToGeography();
32: SqlGeography geoShape2 = Shape2.ToGeography();
34: bReturnValue = geoShape1.STIntersects(geoShape2).IsTrue;
36: return bReturnValue;
39: // All the other methods to expose to Excel
43: // Other supporting code for the add-in
Note: For those familiar with the String class and looking at the above code, you may be asking yourself “I don’t remember there being a ToGeography() method on the string class.” Don’t worry, you’re not forgetting things, read on to next section…
My previous blog post had those methods chopped off, so I’ve updated them and reposted them below. The following methods are called by VS (or by regasm.exe) to register your class within the registry to make it visible to COM Automation clients like Excel.
public class SpatialFunctions :
#region COM / Automation Registration Methods
public static void RegisterFunction(Type type)
RegistryKey key = Registry.ClassesRoot.OpenSubKey(
System.Environment.SystemDirectory + @"\mscoree.dll",
public static void UnregisterFunction(Type type)
private static string GetSubKeyName(Type type, string subKeyName)
System.Text.StringBuilder s = new System.Text.StringBuilder();
A large part of the Excel add-in is about handling textual representations of spatial data, converting them to SQL spatial types, performing calculations and then returning back an appropriate textual representation of the result.
It turns out there are several well adopted textual representations for spatial data. SQL Server’s type library supports two of the most common: WKT (well known text) and GML (geographical markup language). A third popular standard is KML which was created by Google (through an acquisition I believe). There are also some newcomers including GeoJSON.
It would be great if we build some of this “text handling” directly into the spatial types and the string class in an extensible manner. Ideally I would like to be able to do things like:
1: String strSomePoint = "POINT(-8.6666 27.666)";
3: SqlGeography geo = strSomePoint.ToGeography();
5: strWKT = geo.ToWKT();
6: strGML = geo.ToGML();
7: strKML = geo.ToKML();
9: SqlGeography geo2 = strGML.ToGeography();
The SQL Spatial Types are sealed classes and we don’t easily have access to the source code for string to extend it, but that doesn’t mean you can’t extend them. You can use .NET Extension Methods to add new methods to existing classes.
For those not familiar with creating a .NET extension method, here is the generic pattern for creating one:
1: public static class SomeClass
3: public static ReturnType methodname(this Type anInstance, ....)
5: // anInstance is the instance of Type that we can act on
In this case we are interested in extending the string type and the SqlGeography type. I chose to create a separate project as part of my solution to generate an assembly that hosts a single SpatialTextHandling class which uses .NET Extension methods to extend the String and the SqlGeography classes. This was completely optional, the extension methods could have easily been part of the main project.
This packaging allows these extension methods to be reused in other projects that leverage the SQL spatial types. In order to “consume” the extensions, you simply include the SpatialExtensions assembly in your project and include its namespace into your code.
public static class SpatialTextHandling
private static string FINGERPRINT_GML =
"<?xml version=\"1.0\" encoding=\"utf-8\"?>";
private static string FINGERPRINT_KML =
"<?xml version=\"1.0\" encoding=\"utf-8\"?><kml ";
private static string FINGERPRINT_GEOJSON = "[";
/// Returns a SqlGeography object based on the text representation
/// contained in the string, attempts to
/// autodetect the text representation if hint not provided.
/// <param name="p_strGeographyAsText"></param>
public static SqlGeography ToGeography(this string p_strGeographyAsText)
SqlGeography geoShape = null;
// Assume WKT as the default and exhaustively try
// everything else if it fails.
geoShape = SqlGeography.Parse(p_strGeographyAsText);
// Must not have been WKT, fall through and try
// to check for other text formats
if (null == geoShape)
// TODO: Implement JSON Parsing
throw new ArgumentException("JSON not yet implemented.");
else if (p_strGeographyAsText.StartsWith(FINGERPRINT_KML))
// TODO: Implement KML Parsing
throw new ArgumentException("KML not yet implemented.");
else if (p_strGeographyAsText.StartsWith(FINGERPRINT_GML))
// XML Format, assume it is GML for now, at some point we'll need to support
// mutliple XML representations.
System.Xml.XmlTextReader xmlTextReader =
System.Data.SqlTypes.SqlXml xml =
geoShape = SqlGeography.GeomFromGml(xml, 4326);
catch (Exception xmlException)
// Problem parsing the GML encoding
throw new ArgumentException("Unsupported text for a shape.");
/// Returns the WKT representation of a geography object.
/// <param name="p_geography"></param>
public static string ToWKT(this SqlGeography p_geography)
/// Returns the GML representation of a Geography object.
public static string ToGML(this SqlGeography p_geography)
// Additional methods
You’ll notice the main extension method is defined on the string class and is called ToGeography(). The ToGeography() method attempts to auto-detect the text encoding or representation of the string and turns it into a new instance of a SqlGeography class.
public bool INTERSECTS(string Shape1, string Shape2)
bool bReturnValue = true;
SqlGeography geoShape1 = Shape1.ToGeography();
SqlGeography geoShape2 = Shape2.ToGeography();
bReturnValue = geoShape1.STIntersects(geoShape2).IsTrue;
So in the INTERSECTS function above we can now pass in a string that is encoded in any of the supported textual representations, in fact Shape1 could be in WKT format while Shape2 is in GML format.
This becomes pretty powerful when you think about how users work with Excel, often combining data from different sources and copying and pasting data from different web sites and other spreadsheets. The ToGeography() extension method attempts to make it transparent to the end user.
Please note that this SpatialExtensions class is a work in progress with the current implementation supporting only WKT and GML. My goal is to eventually detect and automatically support KML, GeoJSON and GeoRSS.
The other methods in this SpatialExtensions class extend SqlGeography with convenience methods to return different text representations and as you’ll see below methods to handle specifying some constraints for the text representation.
A textual representation is a really convenient representation to achieve interoperability between applications and to provide human-readable formats for end users, but in the case of complex spatial objects (detailed country borders, etc.) it can get quite lengthy.
Excel 2007 and earlier have a limitation on the number of characters that can be stored in an Excel cell and that limit is 32K characters. While this only affects complex shapes with lots of vertices, it would be nice if our add-in could somehow automatically make sure that results it returns are always represent-able as a string less than 32,000 characters long.
In order to reduce the complexity (# of a vertices) for a spatial object we can turn to the Reduce method on the SqlGeography type. The “trick” is to ensure that all our functions that return a string representation of a SqlGeography call Reduce if they are going to be “longer” than 32,000 characters.
I added some special versions of the ToWKT() and ToGML() extension methods to the SqlGeography class in the SpatialTextHandling class that allow the specification of a maximum string size. These functions are the same as the original version, but they make a call to a helper method called ReduceForTextRepresentation() to ensure the resulting string will be less than the maximum specified.
Caveat: I make no guarantees about the validity of the approach I use to “guess” at an appropriate value for the epsilon parameter to Reduce(). It seems to work relatively well for those cases I tested, but I’m sure the algorithm could be made more correct and any thought you have on this are appreciated.
public static string ToWKT(this SqlGeography p_geography,
// Assume all will be well
string strGeography = p_geography.ToWKT();
// Check to make sure the string is not too big.
if (strGeography.Length > p_nMaxLength)
// Reduce the shape as best we can to make it
// fit in the string length specified.
strGeography = ReduceForTextRepresentation(p_geography,
// Return the string
private static SqlGeography
// Take a best guess as to the right value for reduce by
// looking at the "string length" per point in the shape,
// estimating what the maximum # of points that is acceptable
// should be, and then using the average distance/length per
// point * the ratio of current points to needed points
// as a basis for epsilon in the reduce formula.
int nNumPoints = p_geography.STNumPoints().Value;
double fShapeLengthPerPoint =
p_geography.STLength().Value / nNumPoints;
int nStringLengthPerPoint =
p_nCurrentStringLength / nNumPoints;
int nMaxPoints =
p_nStringTargetLength / nStringLengthPerPoint;
// how many new points to each original points must we have?
double fOverByFactor =
(float)nNumPoints / (float)nMaxPoints;
// Not sure if this is "scientific" but tell the shape to reduce
// so that we have a tolerance of n * the average length of line
// segments in the shape. It probably over compensates.
double fReduceTolerance =
((fShapeLengthPerPoint / 2) * fOverByFactor);
// Let's try it
Up until now the functions I have shown take relative simple parameters, for example the INTERSECTS function takes two strings and returns a string. Sometimes it is useful for a method to act on a list of values or in the case of Excel, on a range of cells. One class of functions that could benefit from knowing how to handle a range of cells as a parameter would be aggregate functions.
To this point our add-in and the spatial methods we have created have been pretty generic and have not had to have any knowledge of Excel or the Excel Object Model. In order to accept parameters that represent ranges of cells and possibly return results that represent a list of values it becomes necessary to include some Excel namespaces and assemblies into our project.
You will need to add a reference to the Microsoft.Office.Interop.Excel assembly to your project and include its namespace in the ExcelSpatialFunctions class definition files.
using Excel = Microsoft.Office.Interop.Excel;
We can now start defining some functions that accept Excel cell ranges as parameters.
The example below will accept a range of cells containing various spatial objects (LINES, POINTS, etc.) and will return a GEOMETRYCOLLECTION that represents the UNION of all those shapes.
First we add a UNIONMANY function declaration in the IExcelSpatialFunctions interface which will take a generic object for the cell range (we will cast that object to an Excel.Range in the implementation).
public interface IExcelSpatialFunctions
[Description("Returns an object representing the union of…..")]
string UNIONMANY(object Range);
The implementation the UNIONMANY function consists of iterating through all the cells specified in the Range and repeatedly calling the union method exposed by the resulting SqlGeography of the previous iteration with the contents of the current cell.
As you can imagine, creating the UNION of many complex shapes can result in a very complex shape (lots of vertices), one whose textual representation may far exceed Excel’s cell content limit mentioned earlier in the article. We leverage the SpatialExtensions work from above to ensure that that UNIONMANY result will result in a shape that will fit in the 32,767 character limit.
[Description("Returns an object representing the union of two or more geography instances.")]
public string UNIONMANY(object Range)
string wktUnionManyShape = "";
SqlGeography geoUnionManyShape = null;
// Get reference to the excel range object
Excel.Range cellRange = (Excel.Range) Range;
// How many cells are we dealing with
int nRows = cellRange.Rows.Count;
int nColumns = cellRange.Columns.Count;
// Now go through all the remaining cells and keep "unioning" them.
for (int nColumn = 0; nColumn < nColumns; nColumn++)
for (int nRow = 0; nRow < nRows; nRow++)
// Look up the cell in question in Excel
Excel.Range currentCell =
(Excel.Range) cellRange.Cells[nRow+1, nColumn+1];
String wktCurrentShape =
SqlGeography geoCurrentShape =
if (null == geoUnionManyShape)
// This is the first shape in the union,
// so we just use it as is
geoUnionManyShape = geoCurrentShape;
// We need to perform the union of the current shape
// with the ongoing union shape
SqlGeography geoUnion =
// Update the ongoing union for the next iteration
geoUnionManyShape = geoUnion;
catch (Exception cellEx)
// Ignore bad stuff in conversion (union those that
// are valid, ignore those that are not)
throw new ArgumentException("Expected geographies",ex);
if (geoUnionManyShape != null)
There are some additional useful aggregate functions that could be implemented using a same approach (for example a LINESTRING constructor which uses the builder API to build a line-string from a set of points).
One set of functions that I am particularly interested in investigating in the future would be aggregate functions to perform clustering analysis on spatial data.
For example doing cluster analysis from customer location data to find best potential locations for new business branch offices or doing analysis on web traffic to find potential locations for data center load balancing or caching strategies.
Up to this point Excel has been in the driver’s seat sort of speak. That is Excel “calls into” our add-in and for the most part we read what Excel gives us to perform some calculations.
It is also possible for our add-in to also take the driver’s seat from time to time and call into Excel itself. There are limited reasons to do this, so my recommendation is to use this carefully.
While I much prefer to make components know very little about each other, there are some cool things you can do if you know you are being called from Excel. For example populating the spreadsheet, formatting cells or even calling Excel provided functions.
When Excel loads an add-in as part of its startup, it will investigate if that add-in implements a special interface called Extensibility.IDTExtensibility2 (from the Extensibility assembly). If our class implements the IDTExtensibility2 interface, then Excel will call its methods upon loading and unloading our add-in. That provides us a great place in time to gain a reference to the Excel application object.
So we update the ExcelSpatialFunctions class to also derive from the IDTExtensibility2 interface and minimally implement the 5 methods it declares. The only method we are truly interested in is the OnConnection method which will provide us a reference to the Excel application object.
public class SpatialFunctions :
private Excel.Application m_excelApplication = null;
. . .
#region IDTExtensibility2 Members
void Extensibility.IDTExtensibility2.OnConnection(object Application,
ref Array custom)
// Keep a reference to the Application object from Excel
m_excelApplication = (Excel.Application)Application;
. . .
One potential idea for leveraging our knowledge of Excel would be to somehow indicate through formatting that a particular cell is hosting a spatial function result. The trick is knowing which cell caused our function to be called since Excel does not pass in a reference to the calling cell (there is no “this” reference to the current cell or cell range).
It turns out you can in fact get that reference through the Excel application object and its method get_Caller. Once you have a reference to the cell or cell range you are free to do what you please to it. In addition to formatting, in the case of array formulas you can use the reference to infer the size of the return array. I also believe but have not yet tried it that this could be a method for achieving asynchronous calculations.
The following code is for a helper method to let you get a reference to the cell or cell range that caused the call to a function.
private Excel.Range GetCallingCellRange()
Excel.Range rangeCallingCells =
Once you have access to the Excel application object you can navigate to other add-ins and make calls into them, you call built-in Excel functions and also do higher level order functions like printing and other Excel operations. A lot of power… use it wisely since your add-in could affect the user experience the end user expects from Excel.
Most business users don’t start out with some ready made WKT encoded geospatial data sets. They typically start from a list of customers, accounts, sites, web logs or more specifically postal addresses and IP Addresses.
In order to make these add-ins truly approachable to all information workers we should add some ability to turn standard postal and IP address information into spatial objects, specifically POINTs.
Luckily for us there are a variety of geocoding web services out on the “cloud” and Microsoft offers such a service for postal addresses through Bing Maps (the MapPoint Web Services and SDK).
Note: Using MapPoint web service requires a Bing Maps developer account, you can sign up for a developer account here. Signing up will provide you with the credentials (username, password) you will need to make calls against the web service.
I added a PARSEANDLOCATE function to the IExcelSpatialFunctions interface and to the ExcelSpatialFunctions class that accepts a string and expects that string to contain a postal address or partial postal address and returns a POINT spatial object with the latitude and longitude of the specified address, zip code or region.
Thankfully the MapPoint Web Service does the heavy lifting here as it can handle different combinations or permutations of addresses relatively well.
In order to access the web service you will need to sign up for a developer account (see above) and you will need to add a Web Reference to your project to point to the MapPoint web service URL (for example: http://staging.mappoint.net/standard-30/mappoint.wsdl) and get a proxy class to the web service.
The MapPoint Web Service SDK provides a Find service which is the service that provides geocoding, it expects an Address object as its parameter. The same service provides a helper method to turn a string address into an Address object.
using ExcelSpatial.MapPoint; // reference to mappoint web service proxy
public string PARSEANDLOCATE(string FullAddress)
Address parsedAddress = MapPointParseAddress(FullAddress);
SqlGeography geoLocation = MapPointFindLatLon(parsedAddress);
#region MapPoint Web Service Helper Methods
/// This helper method will return a POINT SqlGeography that
/// is the LatLon of the address.
/// <param name="p_mappointAddress"></param>
private SqlGeography MapPointFindLatLon(Address p_mappointAddress)
SqlGeography geoPoint = null;
// Create instance of the mappoint web service proxy
FindServiceSoap findService = MapPointGetFindService();
// Create the FindAddressSpecification
FindAddressSpecification findAddressSpec =
findAddressSpec.InputAddress = p_mappointAddress;
findAddressSpec.DataSourceName = "MapPoint.NA";
// For any addresses we find -- we want the LatLong returned
FindOptions findOptions = new FindOptions();
findOptions.ThresholdScore = 0;
findOptions.ResultMask = FindResultMask.LatLongFlag;
// Just get the 1st / best match
FindRange findRange = new FindRange();
findRange.StartIndex = 0;
findRange.Count = 1;
findOptions.Range = findRange;
findAddressSpec.Options = findOptions;
// Make the call to get any matching addresses (with
FindResults findResults =
// Were there any matches?
if (findResults.NumberFound > 0)
if (findResults.Results.Length > 0)
double lat =
double lon =
SqlGeography.Point(lat, lon, DEFAULT_SRID);
catch (Exception ex)
throw new Exception("There was a problem.", ex);
/// Helper method that wraps retrieving instance of web service
private FindServiceSoap MapPointGetFindService()
// Create instance of the mappoint web service proxy
FindServiceSoap findService = new FindServiceSoap();
findService.Credentials = new NetworkCredential(
/// Helper method that wraps call to the MapPoint web service
/// <param name="p_strFullAddress"></param>
private Address MapPointParseAddress(string p_strFullAddress)
FindServiceSoap findService = MapPointGetFindService();
Address address = findService.ParseAddress(p_strFullAddress, null);
Text is great and as developers we tend to spend most of our day dealing with text in the form of code, but spatial data really lights up when you can visualize and interact with it using a map.
What would be really awesome (ok I’m a little biased here) would be the ability to embed a mapping control directly into the spreadsheet and to link the map control’s rendering to ranges of cells and formulas in the spreadsheet.
That would indeed be cool! (again I’m a little biased here), maybe it could look something like this:
In the next set of blog articles we will develop a second Excel add-in that will show one approach for achieving that result.
I’ll then post a final article in this series about creating an installer to install both add-ins as this can be a particular challenge when combining different types of add-ins into a single installer.
I am hoping to post the complete code for both Excel add-ins on CodePlex early next year so folks can download and extend the project, but I first need to clean up the code a little bit.
I look forward to hearing about your own experiments with the SQL Server spatial types and the Excel Spatial Spreadsheet example.