Spatial Data Type Structure

Spatial Data Type Structure

  • Comments 14

Hi Folks,

As part of a larger effort to document our protocols, SQL Server has just released documentation on the structure of our spatial types.  The document is also available in PDF.  Note that all of this is preliminary, and while I doubt it will change much, it could.  I had very little to do with this effort; you have Ed to thank.

It turns out that this structure is pretty simple, and there are folks out there who have fairly easily reverse-engineered it.  Now that we’re publishing the structure, let me make one very specific warning: if you are going to play around with the binary format, do not set the valid bit (v) unless you’re absolutely sure the object is valid.

Setting this should not cause SQL Server itself any problems: you won’t crash anything.  We just can’t guarantee that you’ll get the most predictable results.  And things get very difficult to fix.

Let’s work an example.  Consider the following linestring: LINESTRING (0 0, 10 0, 5 0, 10 0).  This is clearly invalid according to the OGC, since it overlaps itself.  We can put this into a geometry element and play with it:

declare @g geometry = 'LINESTRING (0 0, 10 0, 5 0, 10 0)'
select @g.STIsValid()            -- 0
select @g.MakeValid().STLength() -- 10

We can’t run the STLength() without first making the object valid or we’ll get an error.  We can now pull out the binary version of this to play with:

select @g

And we get:

0x000000000100040000000000000000000000000000000000000000000000000024400000000000000000000000000000144000000000000000000000000000002440000000000000000001000000010000000001000000FFFFFFFF0000000002

We’re particularly interested in the highlighted byte, which contains a number of flags, all currently false.  Most of these aren’t particularly dangerous to play with.  For example, the lowest-order bit tells SQL Server that the item contains Z values.  If you flip it, you’ll just end up with an error, since the data that follows doesn’t actually contain any Z values:

declare @g geometry = 0x000000000101040000000000000000000000000000000000000000000000000024400000000000000000000000000000144000000000000000000000000000002440000000000000000001000000010000000001000000FFFFFFFF0000000002
 
Msg 6522, Level 16, State 1, Line 1
A .NET Framework error occurred during execution of user-defined routine or aggregate "geometry": 
System.IO.EndOfStreamException: Unable to read beyond the end of the stream.

The valid bit, the third-least significant, is a little touchy, though.  Let’s see what happens if we flip that on our instance and run a few operations:

declare @g geometry = 0x000000000104040000000000000000000000000000000000000000000000000024400000000000000000000000000000144000000000000000000000000000002440000000000000000001000000010000000001000000FFFFFFFF0000000002
select @g.STIsValid()  -- 1
select @g.STLength()   -- 20

What’s happening?  First, the system is trusting the bit and telling us that the instance is valid.  This shouldn’t be too surprising: the bit was added so that we wouldn’t have to perform an expensive check every time we check the validity, and we’re using the optimization.

Second, the result of the STLength() operation is a bit odd.  It’s calculating the length used to draw the invalid object, not the length of the object itself.

Okay, you say: let’s make this instance valid and see if we can fix things:

declare @h geometry = @g.MakeValid()
select @h.STLength()  -- 20
select @h.ToString()  -- LINESTRING (0 0, 10 0, 5 0, 10 0)

Ack!  We’re trapped.  MakeValid() tries to preserve the input geometry whenever possible, and therefore refuses to touch a valid instance.  Our instance isn’t actually valid, but we said it was.

So if you’re creating a geometry from scratch, how are you to know how to set this bit?  That is, how on Earth do you figure out if an instance is valid?  Unless you’re absolutely sure, let the system do it for you: tell SQL that it isn’t valid and then run MakeValid().  Unlike setting the value true, there’s absolutely no harm in doing this—other than perf.  In fact, when MakeValid() sees an object that isn’t marked as valid, it will first check to see if the instance is valid.  If it is, MakeValid() won’t muck with the data, but will just flip the bit to true.

And keep in mind that although you can play with these structures, most of the time you shouldn’t need to.  Use the built-in methods, and use the builder/sink API if you need to do something fancy.  As always, there are plenty of examples on the SQL Server Spatial Tools CodePlex project.

Cheers,
-Isaac

  • Isaac,

    I'm a bit confused.  I always thought single line strings are always valid, but what you have there is a non-simple linestring since it crosses itself.  Are those 2 things the same in SQL Server? I couldn't find any OGC document that explains it well so not sure what OGC ST_IsValid really means now.

    I tried your above in PostGIS and SQL Server

    SQL Server returns 0 for STIsValid()

    and

    for STIsSimple()

    A .NET Framework error occurred during execution of user-defined routine or aggregate "geometry":

    System.ArgumentException: 24144: This operation cannot be completed because the instance is not valid. Use MakeValid to convert the instance to a valid instance. Note that MakeValid may cause the points of a geometry instance to shift slightly.

    System.ArgumentException:

      at Microsoft.SqlServer.Types.SqlGeometry.ThrowIfInvalid()

      at Microsoft.SqlServer.Types.SqlGeometry.STIsSimple()

    PostGIS re

    turns true for ST_IsValid and false for ST_IsSimple.

  • Hi Regina,

    The OGC specs aren't always the most lucid, and on this point they are silent, so we had to read between the lines.  A number of the methods don't seem to add up if they allow overlapping segments.  E.g., what should the boundary points of such a thing be?

    To make this explicit, what should the boundary points of the following objects be:

    A: LINESTRING (0 0, 10 10, 5 5)

    B: LINESTRING (0 0, 10 10)

    C: MULTILINESTRING ((0 0, 10 10), (5 5, 10 10))

    If they're all permitted, shouldn't these all be the same?  If they aren't all the same, what should the result be if you took the union of each one with the point (0 0)?  Does it make sense for the answer to change after this ostensible no-op?

    If A and C are invalid (but equivalent to B) then the answers become very simple.  Otherwise we seem to have a bit of a mess.

    Note that they eliminated similar problems with Polygons and MultiPolygons by disallowing such overlaps.

    Slightly off topic, but one thing that always bugged me about the spec is the following contradiction:

    - They explicitly allow curves that cross themselves.

    - They state that curves are homeomorphic to a real, closed interval.

    Don't know what to make of that one, except that it's clear there's at least one bug in the curve definition.  :)

    Cheers,

    -Isaac

  • Hi Issac,

    I agree that the specs aren't the best/clear and they may have several bugs, but my interpretation of the specs also follow what Regina has outlined.

    Validity is not mentioned for point or linear features but it is discussed for polygons.  Conversely, simplicity is not mentioned for areal features, but it is discussed in some detail for both point and linear features.  It's because of this lack of clarity that JTS, GEOS, and PostGIS have understood that validity is a term reserved for areal features, and simplicity, a term for point/linear features.

    When you first introduced your test LINESTRING, you state "This is clearly invalid according to the OGC".  If I may counter, I think the LINESTRING is clearly non-simple because it self-intersects, but it is inherently *valid* by [lack of] definition.

    In response to your boundary question, my interpretation of the specs would say that

    A: is non-simple.  Ok, the specs don't explicity state that the endpoints of a linestring are the last points of a simplified linestring, but the aformentioned projects (and I believe the GIS community at large) consider MULTIPOINT((0 0), (5 5)) to be the boundary.

    B: is simple with a boundary of MULTIPOINT((0 0), (10 10)).

    C: is non-simple because the criteria is false - "A MultiCurve is simple if and only if all of its elements are simple, the only intersections between any two elements occur at points that are on the boundaries of both elements."  Additionally, due to their funny "mod-2" rule, "A point is in the boundary of a MultiCurve if it is in the boundaries of an odd number of elements of the MultiCurve", the boundary would be MULTIPOINT((0 0), (5 5)).

    You're right.  Now we do indeed have somewhat of a mess, since all three cases are are permissible (that is valid), but A and C are non-simple.  But, I think there are many real-world use cases where users will want all three cases to be permissible so that operations can be performed on them (like length). (Off the top, I'm thinking of someone importing GPS data into a spatial database, ordering the points based on time, converting the collection into a LINESTRING and getting total distance traveled. )

    Cheers,

    Kevin

  • Hi Kevin,

    First, let me concede that my "clearly invalid" statement is, well, clearly wrong.  :)  I generally think it was the right call, but you should feel free to file a Connect issue if you'd like to see the behavior change.  (http://connect.microsoft.com)  I'm sure the spatial team will be willing to look into this more.

    Also, I think your analysis of the boundary points of C is spot on.  I forgot another case, which we should include for completeness:

    D: MULTILINESTRING ((0 0, 10 10), (8 8, 15 15)).  Here, all of the points are boundary points according to the OGC.

    Finally, while I understand your use case, if it's one the OGC wanted to handle, they seem to fail to do so.  They don't define how such paths are to be carried through various operations (e.g., what happens in union case I mention?) and they lack the operations needed to truly deal with such paths.

    Cheers,

    -Isaac

  • Isaac,

    I'm not so much bothered about the answer to the STIsValid question here as I am about the fact that SQL Server throws an error when asked if this linestring is simple.  I think most people more commonly ask if their linestring is simple or not rather than if its valid or not.

    Regardless of how we interpret the OGC specs on validity,  can we not agree that this particular line string is not simple.

    I did some other tests and confirmed that not all crossing linestrings are invalid in sql server, but they are not simple.  So I guess its mostly in cases you and Kevin describe where the line crosses its boundary and is not closed.  

    It must be really hard to try to be standards compliant when the standards are so open to interpretation, so thank you so much for your efforts and sorry to be such a whiny customer.

  • Hi Regina,

    Yeah, we have no problem with linestrings that cross themselves, only those that overlap themselves.

    I'm a little confused by the statement that, "not all crossing linestrings are invalid in sql server, but they are not simple."  Do you have examples of crossing linestrings that we don't call simple?

    Here are a few other random thoughts:

    You should note that for cases like these that we call invalid---including cases that are unambiguously invalid in the OGC spec---you should be able to call MakeValid and get a new object that is valid.  So if @g.Method() gives you an error complaining that @g is invalid, @g.MakeValid().Method() should work.

    Yeah, the OGC specs are a little tough.  Worse is the case with round-Earth, where there simply are no specs.  We tried to remain relatively OGC-compatible, but the OGC specs presume a flat Earth.

    Finally, you oughtn't worry about being a "whiny customer".  The truth is we made the decision to handle invalid instances in the way we did in order to improve the user experience, and we want to hear if we've done the opposite.  Please do file a Connect issue if you'd like to see things changed: it goes right into our tracking system, and will be routed to the spatial team.

    Cheers,

    -Isaac

  • I'm not sure there is anything to file a ticket about.  Just a little confused at this point.  I always thing of linestrings as trajectories so its perfectly valid to me that you would walk forward and then back along the same line. But then since OGC validaty is based on an intersection matrix (so I think and should return false if an area is not well defined) the concept of directionality doesn't really exist with intersection matrix anyway to my knowledge so it can't really talk about it as far as non-areal is concerned.  Hmm not sure I'm even making sense here.

    Sorry wrong choice of words about crosses. I always forget what those definitions mean.  I don't think its actually possible for a line to cross itself.   Then again I'm having a hard time visualizing something that overlaps itself that is not an invalid polygon (then its a don't know) -- if I take the definition of something overlapping is something where the space is not completely contained by the other and of the same dimension.

    What I meant to say is that not all linestrings with self intersections in SQL Server are invalid, but if they have a self intersection and are valid, then they are not simple.  

    Its hard to verify overlap in SQL Server since Overlap throws an error if a linestring is considered invalid and if I make it valid then it no longer overlaps. So I guess a catch 22 there.

    If I asked PostGIS if this linestring overlaps it gives me an answer of no.

    SELECT ST_Overlaps(CAST('LINESTRING (0 0, 10 0, 5 0, 10 0)' As geometry), CAST('LINESTRING (0 0, 10 0, 5 0, 10 0)' As geometry));  --> returns false

    Now this self-intersecting linestring behaves the same in both SQL Server and PostGIS.  Both come back with -- false (0)

    SELECT (CAST('LINESTRING (10 0, 10 1, 5 0, 10 2, 0 0, 5 0)' As geometry)).STIsSimple()

    SELECT ST_IsSimple('LINESTRING (10 0, 10 1, 5 0, 10 2, 0 0, 5 0)')

    and they don't overlap themselves.

    I asked one of my friends to test this in Oracle and his answers were even more bizarre. So I'm having him verify we are talking about the same line string.

    Oracle -- gave -- IsSimple (Yes), IsValid (yes)

    So needless to say I'm not sure this is as big of an issue as I was making it out to be or worth even complaining about since I have no idea what the preferred behavior should be.

  • Hi Regina,

    I'm getting a little confused about what is your expected behavior.

    If you don't reject L = LINESTRING (0 0, 10 0, 5 0, 10 0) as invalid, then by my reading of the OGC definition of overlaps, L should not overlap itself.  One of the requirements for A to overlap B is that the intersection of the interior of A with the exterior of B cannot be empty.  Clearly, though, the interior and exterior of L are disjoint, and so L overlaps L is false.  PostGIS appears self-consistent here.

    But this reasoning applies to *any* linestring L, so L overlaps L false for any such L.

    As to the linestring M = LINESTRING (10 0, 10 1, 5 0, 10 2, 0 0, 5 0), it is not simple according to the OGC definition, which roughly states that the only two points of a simple linestring that may touch are its two endpoints.  Here, the point (5 0) is both an endpoint and an internal point, and so the linestring is not simple.  You should double check the Oracle result; it's a bug if it is what you describe.

    The issue that you might consider filing is a relaxation of our relatively stringent enforcement of what we call valid.  The example L above is a good one.  As Kevin corrected me, it's not explicitly disallowed by the OGC, but you can get funny behavior with them.  E.g., L union L  is not necessarily L, and if L is not simple, L union L could be.  We chose to be conservative and not confuse people with these; you could argue that we should support them and let users be confused if they hit something like this.

    Cheers,

    -Isaac

  • Isaac,

    My point is as you sait that L = LINESTRING(0 0, 10 0, 5 0, 10 0) does not overlap itself.  So its not really the overlapping issue that disqualifies this linestring.  I can say its self-overlapping (though I guess that term doesn't exist really).  If I take the intersection matrix of these 2 in PostGIS for example

    SELECT ST_Relate('LINESTRING (0 0, 10 0, 5 0, 10 0)','LINESTRING (0 0, 10 0, 5 0, 10 0)')

    SELECT ST_Relate('LINESTRING (10 0, 10 1, 5 0, 10 2, 0 0, 5 0)','LINESTRING (10 0, 10 1, 5 0, 10 2, 0 0, 5 0)')

    I get 1FFF0FFF2.  

    So clearly (or at least as far as PostGIS is concerned), I can't really distinguish these 2 from a intersection matrix analysis.

    I'm mostly trying to understand what cases SQL Server 2008 considers linestrings invalid.  Stating it in different words.  Your definition of validity seems useful if you are building a road for example.

    Lets say we have a linestring following a direction and if at any point in its movement it meets its past path at only finite points, then it is valid but not simple.  If it meets itself at infinite points (a line) (and is not closed) then it is invalid?

  • Aha---I see!  I've been using the word "overlaps" in two very different ways.  I should be more careful, but the OGC spec unfortunately encourages this.  (More on that in a second.)

    The OGC spec has a very particular---and perhaps peculiar---definition of "overlaps".  Under this definition, a linestring can never overlap itself.

    The way I meant it is that the set of points covered by the figure more than once is infinite.  LINESTRING (0 0, 10 0, 5 0, 10 0) overlaps itself over the segment from (5 0) to (10 0).  These are the cases we disallow.

    Let me get on my soapbox for a second and complain that the OGC doesn't exactly encourage good use of the language.  The crosses and touches predicates are particularly strange.  Consider these two linestrings:

    G = LINESTRING (0 0, 10 0)

    H = LINESTRING (0 5, 5 0, 10 5)

    Do these two touch?  Do they cross?  Does this comport with your intuitive understanding of these terms?  :)

    Anyway, apologies for the confusion.

    Cheers.

    -Isaac

  • No problem.  Thinking more about this.  I like your idea of invalid check, but then as Kevin says its not always the desired response like if you are doing track and field but useful if you are building roads.  Wondering if it makes sense to have 2 definitions of it or at least as far as linestrings are concerned allow those other operations even if SQL Server considers them invalid.

    For example in this case - I would expect to be able to ask if this string is simple, overlaps etc.

    Getting back to the oracle issue.  I confirmed that yes for

    LINESTRING (0 0, 10 0, 5 0, 10 0)

    It returns true for IsValid

    and true for IsSimple

    But Oracle doesn't really have an IsValid bit for linestring as it does for Polygons I guess.  Not quite sure what that means. I guess that just means as the case with PostGIS it will always say a linestring is valid.  The simple answer is a bit odd though.  I think we agree its a bug or their interpretation of the specs is very different.

  • Isaac,

    I've read the OGC SFS 1.1 spec (Ref: OGC 05-126, Date: 2005-11-22, Version: 1.1.0) a number of times and I am confused.

    The spec defines validity only for Polygons (6.1.11.1 - "The assertions for Polygons (the rules that define valid Polygons) are as follows") and, in the version I have, does not even mention an IsValid function at all for any geometry type. It clearly defines IsSimple() and isClosed().

    (Though clearly there are some basic things one doesn't want a linestring to have such as: 1) a single point; 2) duplicate points, that could form the basis of an IsValid() function. I have an old SQL3/MM document that has ST_IsValid() in it but your implementation is only OGC SFS 1.1 compliant, right?)

    I cannot find in the OGC SFS 1.1 document I have where is says that it...  "has a very particular---and perhaps peculiar---definition of "overlaps".  Under this definition, a linestring can never overlap itself."

    Where does it say this?

    Yes, it talks about determining Overlaps on page 26 but this is about determining the relationships between two objects and this is not about validity but about relationship.

    It seems to me from reading the specifications that linestrings are only subject to basic validity (see above) and isSimple, isClosed but that is all. Polygons are specifically subject to greater validity as one would expect. But the lack of extra documentation for linestrings as against polygons doesn't mean one can "fill in the gap" as if it wasn't properly filled in the first place.

    Since there are plenty of other people on the SFS 1.1 committee (Microsoft was not on those committees AFAIK) did you discuss these issues of interpretation with them?

    Finally, Regina, Oracle doesn't have an IsValid bit for any of its geometry types. You can store incorrect geometries if you want: Oracle will only report them as being incorrect if you run sdo_geom.validate_geometry. So, here, ESRI can store involuted polygons (See http://www.spatialdbadvisor.com/oracle_spatial_tips_tricks/110/esri-arcsde-exverted-and-inverted-polygons-and-oracle-spatial/) without Oracle complaining and even thought there are non-standard - having been created before the standards were written.

    regards

    Simon

  • Kevin is right when he says:

    'When you first introduced your test LINESTRING, you state "This is clearly invalid according to the OGC". If I may counter, I think the LINESTRING is clearly non-simple because it self-intersects, but it is inherently *valid* by [lack of] definition.'

    Having managed the spatial aspects of a large agency whose staff were very spatially aware I would not like to tell them that their GPS tracklog of a piece of rare and endagered fauna (say a Tassie Devil) was invalid and could not be stored because the devil happened, in the sample period, double back on itself!

    Real life data has a habit of making standards look silly.

    regards

    Simon

  • Hi Folks,

    This has been a good thread, but I think it's starting to get a little muddled.  Let me try to summarize this core points in this thread:

    - I stated that the OGC spec "clearly" disallows linestrings that "overlap".  This statement had two problems.

    - First, the term "overlaps" has a very specific meaning in the OGC context, and a linestring can never overlap itself by their definition.  (In pure OGC terms, a.Overlaps(a) is always false.)  I meant it colloquially to mean that it didn't fold back over itself.

    - Second, the spec is silent on the question of whether these are valid or not, and so their validity is not so clear.  One can interpret this silence to mean that all linestrings are valid.  But given the number of holes in the spec, we chose to interpret them as invalid for several reasons: as a parallel construction with polygons; the strange behavior of methods with such objects; and as a generous reading of the "homeomorphic" statement mentioned above.

    (Note also that SQL/MM defines validity for all objects.  They also call a curve the "homomorphic image of a real, closed interval".  Unlike the (wrong) definition in OGC, I'm not even sure what this means here.)

    - Regardless, there are cases where representing such cases could be useful, such as representing track logs that may fold back on themselves.  And there's no reason in principle that we couldn't operate on them.  Change here should be possible.

    (Note that we currently allow you to store such invalid linestrings (in geometry) but require that you make them valid before we'll operate on them.)

    - If you would like to see a change you should really post an item to Connect (http://connect.microsoft.com).  (If I file the issue, you'll get no visibility as to what the spatial team does with it.  And while I have some influence, I'm no longer on the spatial team.)

    Cheers,

    -Isaac

Page 1 of 1 (14 items)