#### Introduction:

As part of the blog series TSQL - Solve it YOUR Way, today's topic is a variation of the popular T-SQL question titled "Gaps and Islands" followed by three different solutions presented by three of the more helpful and creative contributors in the TSQL MSDN forums, Steven Wang, Alejandro Mesa, and Tom Cooper.

Gaps and Islands questions revolve around sequential data with missing values. The term "gaps" refers to the ranges of missing values, while the team "islands" refers to the ranges of existing/sequential values. This topic takes on many forms and shows up occasionally in SQL Server interviews, with variations such as the following:

- Find your favorite sports team's longest winning streak given the win/loss results
- Given a table of natural numbers, return a two column table showing the missing numbers (ex: missing from range 6-8, 11-21, etc)
- Find the missing Order IDs given an Orders table with sequential Order IDs
- Write a function that returns the hours worked in the past month by a specific employee

**Topic: Given a series of integer ranges representing painted segments of a bridge, find the areas of the bridge where there is no paint, as well as the areas where the paint has been overlapped.**

Recently, a question was asked in the TSQL MSDN forums that was a variation of the gaps and islands question, with a small twist. The question revolves around a bridge of length 100 that has been painted in various segments as shown with the following data:

For this exercise, we need to determine how to return the gaps (the areas where the bridge is NOT painted) as well as where the paint has been overlapped.

#### Solution #1: Provided by Steven Wang

- /****************************************************************
- Solution Preparation:
- 1. Create a temp table #BridgeP
- 2. Create a temp table #BridgeMaxLength to hold the max length of all bridges
- 3. Insert some test data
- 4. Create nonclustered index to boost the performance for large volume of data.
- Note: the solution I presented here is inspired from the book:
- Inside Microsoft SQL Server 2008: T-SQL Programming
- ****************************************************************/
- Create Table #BridgeP( ID Int Identity(1,1) Not Null
- , BridgePID int Not Null
- , FromLen Decimal(14,4) Not Null
- , ToLen Decimal(14,4) Not Null
- , Constraint pk_BridgeMeaurement Primary Key (ID)
- , Constraint check_From_Less_than_to Check(FromLen <= ToLen)
- );
- Create Table #BridgeMaxLength ( BridgePID int Not Null
- , BridgeMaxLen Decimal(14,4)
- );
- Insert into #BridgeP
- Values (1, 10.0002, 20.0004)
- , (1, 18, 21.0004)
- , (1, 11.1234, 20.0002)
- , (1, 40, 50.0003)
- , (1, 19.11, 35)
- , (2, 46.2966, 50.6472)
- , (2, 31.7084, 46.1286)
- , (2, 178.2604, 190.6864)
- , (2, 38.7394, 43.7234)
- , (2, 27.266, 43.3031)
- , (2, 122.8653, 142.6958)
- , (3, 14.7856, 21.6676)
- , (3, 67.6054, 68.1884)
- , (3, 64.8543, 66.4028)
- , (3, 123.331, 134.4094)
- , (3, 199.3904, 201.1482)
- , (3, 88.1347, 103.9153);
- Insert into #BridgeMaxLength
- Values (1, 100.0000)
- , (2, 200.0000)
- , (3, 250.0000);
- Create Nonclustered Index idx_Bridge_From_To On#BridgeP (BridgePID, FromLen, ToLen);
- Create Nonclustered Index idx_Bridge_To_From On#BridgeP (BridgePID, ToLen, FromLen);
- /**********************************************************************************
- Solution starts here:
- 1. Since the bridge paint measurement records have overlaps, we first need to identify
- all those paint starting points for a bridge. These starting points must not be adjacent
- to any end painting points recorded and must be less than any other starting points within
- the same packed painting group.
- **********************************************************************************/
- ;With RangeStart
- As
- (
- Select Distinct A.BridgePID, A.FromLen As RangeSTart
- From #BridgeP A
- Where Not Exists
- ( Select *
- From #BridgeP B
- Where B.BridgePID = A.BridgePID
- And B.FromLen < A.FromLen
- And B.ToLen >= A.FromLen
- )
- )
- /**********************************************************************************
- 2. With the same logic as starting point but in a reversed way, the ending points
- can be identified as below:
- **********************************************************************************/
- , RangeEnd
- As
- (
- Select Distinct A.BridgePID, A.ToLen As RangeEnd
- From #BridgeP A
- Where Not Exists
- ( Select *
- From #BridgeP B
- Where B.BridgePID = A.BridgePID
- And B.ToLen > A.ToLen
- And B.FromLen <= A.ToLen
- )
- )
- /**********************************************************************************
- 3. Since a bridge should always start from 0 and end with the maximum length, we add
- these 2 points together with all starting points and end points.
- The union all query results must come back with an even number (pairs of start and
- end point) for each bridge.
- 4. In order to put starting point and an end point pair into the same group. I used
- row_Number function:
- (Row_Number() Over(Partition By BridgePID Order by GapStart) - 1) / 2
- If the row number is 1, it returns back 0; if the row number is 2, it also return back
- 0.
- **********************************************************************************/
- , GapValue
- As
- (
- Select BridgePID, GapStart
- , (Row_Number() Over(Partition By BridgePID Order by GapStart) - 1) / 2 As MyGroup
- From
- (
- Select BridgePID, 0.0000 As GapStart
- From #BridgeMaxLength
- Union All
- Select BridgePID, RangeStart
- From RangeStart S
- Union All
- Select BridgePID, RangeEnd
- From RangeEnd E
- Union All
- Select BridgePID, BridgeMaxLen
- From #BridgeMaxLength
- ) X
- )
- Select BridgePID, Min(GapStart) As GapStart, Max(GapStart) As GapEnd
- From GapValue
- Group By BridgePID, MyGroup
- Having Min(GapStart) <> Max(GapStart)
- Order by BridgePID, GapStart;
- Drop Table #BridgeP;
- Drop Table #BridgeMaxLength;

**Explanation of Steven's solution:**

As documented in the comments throughout the solution, Steven takes the following approach.

- Since the bridge paint records have overlaps, we first need to identify all of those paint starting points for the entire bridge. These starting points must not be adjacent to any end painting points recorded and must be less than any other starting points within the same packed painting group.
- Using similar logic, but in a reversed order, we will identify the ending points.
- Since a bridge should always start from 0 and end with the maximum length, we add these 2 points together with all starting points and end points. The union all query results must come back with an even number (pairs of start and end point) for each bridge.
- In order to put starting point and ending point pairs into the same group, I use the Row_Number function:

(Row_Number() Over(Partition By BridgePID Order By GapStart) - 1) / 2

If the row number is 1 or 2, then return 0.

#### Solution #2: Provided by Alejandro Mesa

- SET NOCOUNT ON;
- USE tempdb;
- GO
- Declare @BridgeP Table(
- BridgePID int,
- FromLen Decimal(14,4),
- ToLen Decimal(14,4)
- );
- Declare @MaxBridgeLenght Decimal(14,4) = 100.0000;
- Insert into @BridgeP
- Values
- (1,10.0002,20.0004)
- , (2,18.0000,21.0004)
- , (3,11.12345,20.0002)
- , (4,40.0000,50.0003)
- , (5,19.1100,35.0000)
- ;
- WITH C0 AS (
- SELECT
- FromLen, ToLen
- FROM
- @BridgeP
- UNION ALL
- SELECT
- FromLen, ToLen
- FROM
- (VALUES (0, 0), (100, 100) ) AS T(FromLen, ToLen)
- )
- , C1 AS (
- SELECT
- FromLen AS ln,
- +1 AS [type],
- 1 AS sub
- FROM
- C0
- UNION ALL
- SELECT
- ToLen AS ln,
- -1 AS [type],
- 0 AS sub
- FROM
- C0
- )
- , C2 AS (
- SELECT
- ln,
- SUM([type]) OVER(
- ORDER BY ln, [type] DESC
- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- ) - sub AS cnt
- FROM
- C1
- )
- , C3 AS (
- SELECT
- ln, ((ROW_NUMBER() OVER(ORDER BY ln) - 1) / 2) + 1 AS grp
- FROM
- C2
- WHERE
- cnt = 0
- )
- , C4 AS (
- SELECT
- MIN(ln) AS FromLen,
- MAX(ln) AS ToLen
- FROM
- C3
- GROUP BY
- grp
- )
- , C5 AS (
- SELECT
- *,
- LEAD(FromLen, 1, ToLen) OVER(ORDER BY FromLen) AS nxt_FromLen
- FROM
- C4
- )
- SELECT
- ToLen AS GapStart,
- nxt_FromLen AS GapEnd
- FROM
- C5
- WHERE
- nxt_FromLen - ToLen > 0.0001;
- GO
- /*
- GapStart GapEnd
- 0.0000 10.0002
- 35.0000 40.0000
- 50.0003 100.0000
- */

The beauty of this approach is that it could easily be adapted to report on multiple bridges or roads, whatever it is that you are painting, as in the following modified example.

- SET NOCOUNT ON;
- USE tempdb;
- GO
- DECLARE @BridgeP TABLE (
- BridgeID int NOT NULL,
- BridgePID int,
- FromLen Decimal(14,4),
- ToLen Decimal(14,4)
- );
- Insert into @BridgeP
- Values
- (1, 1,10.0002,20.0004)
- , (1, 2,18.0000,21.0004)
- , (1, 3,11.12345,20.0002)
- , (1, 4,40.0000,50.0003)
- , (1, 5,19.1100,35.0000)
- , (2, 1,10.0002,20.0004)
- , (2, 2,18.0000,21.0004)
- , (2, 3,11.12345,20.0002)
- , (2, 4,40.0000,50.0003)
- , (2, 5,25.1100,35.0000)
- ;
- -- Itzik's approach
- WITH C0 AS (
- SELECT
- BridgeID, FromLen, ToLen
- FROM
- @BridgeP
- UNION ALL
- SELECT
- B.BridgeID, T.FromLen, T.ToLen
- FROM
- (SELECT DISTINCT BridgeID FROM @BridgeP) AS B
- CROSS JOIN
- (VALUES (0, 0), (100, 100) ) AS T(FromLen, ToLen)
- )
- , C1 AS (
- SELECT
- BridgeID,
- FromLen AS ln,
- +1 AS [type],
- 1 AS sub
- FROM
- C0
- UNION ALL
- SELECT
- BridgeID,
- ToLen AS ln,
- -1 AS [type],
- 0 AS sub
- FROM
- C0
- )
- , C2 AS (
- SELECT
- BridgeID,
- ln,
- SUM([type]) OVER(
- PARTITION BY BridgeID
- ORDER BY ln, [type] DESC
- ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
- ) - sub AS cnt
- FROM
- C1
- )
- , C3 AS (
- SELECT
- BridgeID,
- ln,
- ((ROW_NUMBER() OVER(PARTITION BY BridgeID ORDER BY ln) - 1) / 2) + 1 AS grp
- FROM
- C2
- WHERE
- cnt = 0
- )
- , C4 AS (
- SELECT
- BridgeID,
- MIN(ln) AS FromLen,
- MAX(ln) AS ToLen
- FROM
- C3
- GROUP BY
- BridgeID,
- grp
- )
- , C5 AS (
- SELECT
- BridgeID,
- FromLen,
- ToLen,
- LEAD(FromLen, 1, ToLen) OVER(PARTITION BY BridgeID ORDER BY FromLen) AS nxt_FromLen
- FROM
- C4
- )
- SELECT
- BridgeID,
- ToLen AS GapStart,
- nxt_FromLen AS GapEnd
- FROM
- C5
- WHERE
- nxt_FromLen - ToLen > 0.0001;
- GO
- /*
- BridgeID GapStart GapEnd
- 1 0.0000 10.0002
- 1 35.0000 40.0000
- 1 50.0003 100.0000
- 2 0.0000 10.0002
- 2 21.0004 25.1100
- 2 35.0000 40.0000
- 2 50.0003 100.0000
- */

**Alejandro's Explanation:**

I see here two problems in one. The first one would be finding overlapping ranges, and then identifying gaps.

Itzik Ben-Gan included these two problems in his last book about window functions. He explains the solutions for SQL Server 2012 and also for previous versions in the following book:

Microsoft® SQL Server® 2012 High-Performance T-SQL Using Window Functions

http://shop.oreilly.com/product/0790145323088.do

I am going to use his approach to tackle your question since I didn't see any solution using the new support to the OVER clause (window functions).

Finding gaps using LEAD function is simple and I think that reading the code is enough. For the solution about packing overlapping ranges, the key is in splitting the range (each row) in two sets, one for the initial of the range (FromLen) and another for the end (ToLen), and adding a type (+1 or -1) depending if it is the start (FromLen) or the end of the range (ToLen). We are going to calculate the running total of [type] minus the [sub], and we will be interested only on those where the formula ([cnt]) is zero. This will give us pairs of start and end, and we can identify the pairs by enumerating those rows and using ((ROW_NUMBER() OVER(...) - 1 / 2) + 1), the rest is grouping and calculating the MIN and MAX per group.

Note: I didn't include an ORDER BY clause for presentation purpose, neither setup proper indexes to support the OVER clause.

#### Solution #3: Provided by Tom Cooper

- -- Sample data
- Declare @Bridge Table(BridgePID int, fromlen int, tolen int);
- Insert @Bridge(BridgePID, fromlen, tolen) Values
- (1 , 10 ,20),
- (2 ,18 ,2),
- (3 ,11 ,20),
- (4 ,40 ,50),
- (5 ,19 ,35);
- -- Create Numbers cte
- ;With N2 As (Select 1 As Number Union All Select 1),
- N4 As (Select na.Number From N2 na Cross Join N2 nb),
- N16 As (Select na.Number From N4 na Cross Join N4 nb),
- N256 As (Select na.Number From N16 na Cross Join N16 nb),
- Numbers As (Select Row_Number() Over (Order By n.Number) As Number From N256 n),
- -- Now get only the missing numbers and a row_number value over those numbers
- MissingNumbers As
- (Select n.Number, Row_Number() Over (Order By n.Number) As SeqNbr
- From Numbers n
- Where n.Number <= 100 And Not Exists(Select * From @Bridge b Where n.Number Between b.fromlen And b.tolen))
- -- Now product the islands report
- Select Min(m.Number) As GapStart, Max(m.Number) As GapEnd
- From MissingNumbers m
- Group By m.Number - m.SeqNbr;

**Tom's Explanation:**

As noted in the original forum post, this is essentially an “Islands and Gaps” problem. The desired result is all the islands of unpainted points. But the input data is somewhat atypical. In a typical islands and gaps problem, the data is a series of points (for example, you have a table where each row had two columns – employeeid and sickday (datatype date)). And you want a report show the ranges of days an employee was out sick. So that you could produce a report like Employee 5 was out Jan 5 thru Jan 7. There is a good deal of information on how to solve this type of problem available both online and in books. One good discussion is by Itzik Ben-Gan at http://www.manning.com/nielsen/SampleChapter5.pdf.

So if the input data was a set of rows showing all the unpainted points, using that technique would solve our problem. But instead, the data is a series of ranges of painted points (possibly overlapping). So we want to first find all the points which are not in any of those ranges. To do this we need to first get all numbers between the minimum and maximum points (1 and 100 in this case). We do that either by using a Numbers table that was previously created or by building a Numbers cte as part of the query. I choose to build one with a cte since that way I did not have to assume the OP had a permanent numbers table. The method I used is a common one, the earliest reference I know to it is also by Itzik Ben-Gan at http://www.sqlmag.com/article/sql-server/virtual-auxiliary-table-of-numbers.

Once I had this table of numbers I then kept only the numbers which were in the range 1 to100 and were not in any of the painted ranges in the original data. What’s left is a standard islands and gaps problem.

#### Conclusion:

As you can see, all three of the above solutions provide the result we were looking for, but do so in a very different style. The original thread provides variations of the solutions presented here as well as further discussion. Each of these solutions leverages different SQL Server language constructs and includes different considerations in the final solutions. I hope that you are able to learn a lot by trying out the problem yourself and then reading through the additional solutions.

Special thanks to Steven, Alejandro, and Tom for their valuable forums contribution and for contributing to this series!

Hope that helps,

Sam Lester (MSFT)

#### Contributor Bios:

**Steven Wang **has worked with SQL server for more than 10 years. Steven is an active SQL server community participant and speaks at events such as TechEd, CodeCamp, SQL User Group etc.

Blog: www.MSBICOE.com | LinkedIn: Steven Wang | MSDN Profile: Steven Wang - Shangzhou

**Alejandro Mesa **has been working with SQL Server for more than 10 years, being an MVP since 2007. His passion is T-SQL and he is active in both English and Spanish forums under the alias of Hunchback.

**Tom Cooper **began his programming career in 1968, began working with database software in 1977, and first worked with Microsoft SQL Server in 1994 (version 4.21). He is now very happily retired.