Welcome to MSDN Blogs Sign in | Join | Help

Microsoft Excel

The team blog for Microsoft Excel and Excel Services.
Creating a Master-Detail View in Excel

Today's author, Dan Battagin, a Program Manager on the Excel team, talks about joining two tables in Excel (a.k.a. returning multiple rows for VLOOKUP).

Today, we'll take a look at the VLOOKUP function, and work out a way to get around its major drawback - it returns only a single value that matches the lookup.

If you're like me, there are times when you have a big table of data which is pretty well normalized, and you want to pull all of the rows out of that table that match a certain criteria. Well, it's easy to get the first match in that table by using VLOOKUP (in fact, it's easier than ever with the new structured references in Excel 2007):

Figure 1: VLOOKUP is an easy way to return a value out of the first row that contains a customer name.
Figure 1: VLOOKUP is an easy way to return a value out of the first row that contains a customer name.

As you can see above, with structured references in Excel 2007, not only is VLOOKUP easy to use, but it's actually pretty readable - we're matching the value of A9 in the first column of Table1 and returning the value from the 3rd column ($25.00). That's really nice, and very useful (especially if you use VLOOKUP as part of a calculated column in a table) but it does have the drawback that it doesn't ever allow us to return any of the values for the second "Dan" in the list. And, if you're like me, this can be pretty frustrating. I'm constantly working with lists of Excel bugs (yep, we've got some bugs), materials lists for woodworking projects that I'm currently involved in, price lists for whatever current home remodel I'm working on, etc. where I really need to see more than just that first row.

So today, let's look at how we can actually make this work, and retrieve all of the values from a table that match a certain criteria.

Before we get started on the solution, let's lay out the data that we're going to use. Since it's always a pain to come up with data sets for work examples like this, I'm going to use an old standby - the Northwind Sample Database that ships with pretty much every version of Access. Of course, this being Excel, I'm going to copy the portion of the data that I need into worksheets. (Yeah, yeah, I'm a database guy too, and I know we could do the joins in SQL, but that's not always the case, so bear with me.) OK, so here's the data I'm going to copy onto sheets in Excel:

Figure 2: The "Customer" table, inserted on a sheet called Customers, and named "tblCustomers" for easy referencing in the future. Figure 3: The "Orders" table, inserted on a sheet called Orders, and named "tblOrders" for easy referencing in the future. Figure 4: The "OrderDetails" table, inserted on a sheet called Details, and named "tblDetails" for easy referencing in the future.
Figure 2: The "Customer" table, inserted on a sheet called Customers, and named "tblCustomers" for easy referencing in the future. Figure 3: The "Orders" table, inserted on a sheet called Orders, and named "tblOrders" for easy referencing in the future. Figure 4: The "OrderDetails" table, inserted on a sheet called Details, and named "tblDetails" for easy referencing in the future.

So you don't have to do this yourself, I've also made a copy of the spreadsheet available so you can just follow along (see the attachments at the bottom of this post). Of course, if you want to walk through this yourself, go right ahead, that's a pretty cool compliment.

So now that we've got all of our source data, I want to put together the actual Master-Detail form. Nothing too fancy, but basically I want to select the Order Number that I want to look at, and see some details about that order (who ordered it, order status, etc.) as well as all of the items that are part of that order (item name, price, quantity, etc.). Pretty standard stuff. Here's our target end result:

Figure 5: The master-details form where I can pick an order number and see all of the details.

Figure 5: The master-details form where I can pick an order number and see all of the details.

In the available workbook, you'll find this form put together already, but if you're doing it yourself, here are the steps you'll want to take (note that these steps skip the formatting of this form, since not everyone likes green the way I do):

  1. Enter the following static strings in the sheet:
    1. B1: Order Number
    2. C3: Order Information
    3. C4: Customer
    4. F4: Order Date
    5. F5: Status
    6. F6: Salesperson
    7. F7: Ship Date
    8. C10: Order Details
    9. C11: Product
    10. D11: Quantity
    11. E11: Unit Price
    12. F11: Discount
    13. G11: Total Price
    14. H11: Status ID
  2. Create a Named Range that includes all of your Order IDs, which we'll use to create the Order ID drop down using Data Validation. To do this, click on the Formulas tab | Define Name and enter:
    1. Name: OrderIds
    2. Refers to: =tblOrders[ID]

      Tip: Data Validation cannot refer to ranges on a different sheet than the one that contains the validation, but by using a named range, you can get values from another sheet (and in this case, get a dynamic list of values using the structured reference to get an entire table column, even if new values are added to it in the future)
  3. Select D1 and name it rngOrderId (we'll use this later).
  4. With D1 still selected, click on the Data tab | Data Validation and create a new List type validation with Source: "=OrderIds". Boom, now you have your drop down.

    OK, now we're going to create some "normal" VLOOKUPS as we discussed above already, with a bit of a twist to make them a bit more robust: we're going to use the MATCH function instead of a hard-coded column number for the value we want to retrieve from the source table.
  5. Select C5 and enter the following function:
    =VLOOKUP(rngOrderId, tblOrders, MATCH(C4, tblOrders[#Headers], 0), FALSE)

    Here we've said find the Order ID that I've selected in the Orders table, and return the value from the column that MATCHes Customer.
  6. Just like C5, we're going to setup the rest of the normal VLOOKUP functions to return metadata about the order - this is all the "Master" data in this Master-Detail form:
    1. C6: =VLOOKUP($C$5,tblCustomers,MATCH("Address",tblCustomers[#Headers],0), FALSE)
    2. C7: =VLOOKUP($C$5, tblCustomers, MATCH("City",tblCustomers[#Headers],0), FALSE) & ", " & VLOOKUP($C$5, tblCustomers, MATCH("State",tblCustomers[#Headers],0), FALSE) & " " & VLOOKUP($C$5, tblCustomers, MATCH("Zip",tblCustomers[#Headers],0), FALSE)
    3. H4: =VLOOKUP(rngOrderId, tblOrders, MATCH(F4, tblOrders[#Headers], 0), FALSE)
    4. H5: =VLOOKUP(rngOrderId, tblOrders, MATCH(F5, tblOrders[#Headers], 0), FALSE)
    5. H6: =VLOOKUP(rngOrderId, tblOrders, MATCH(F6, tblOrders[#Headers], 0), FALSE)
    6. H7: =VLOOKUP(rngOrderId, tblOrders, MATCH(F7, tblOrders[#Headers], 0), FALSE)
Figure 6: The completed Master section of the Master-Details form.

Figure 6: The completed Master section of the Master-Details form.

Right, so that's not too bad, and get's us something that's pretty robust, even if we add additional data to our source data tables - just as long as we keep the headings in our form matching the column headings in the tables. OK, so now let's move on to the "Details" part of the form - and this is where we'll get to find a solution to the VLOOKUP limitation of only returning a single value.

Before we build up the actual solution, let's talk about some of the elements of the formula we're going to create:

  1. We're not actually going to use VLOOKUP! Since we need to return multiple items, what we really need is a way to return an array (list) of values, and VLOOKUP just doesn't allow for that. Instead, we'll use the INDEX function to return the value at a specific row and column intersection.
  2. We need a way to designate which item in the array we want to show in the cell, and we'll use a combination of the SMALL and ROW functions to accomplish that.
  3. We want this to be robust in the same way that we made our VLOOKUP function robust - that is, we want to be sure that if we add additional columns to our source data, that these functions don't break. Just like with the VLOOKUP solution, we'll use MATCH to ensure this.

    Tip: A nice side effect of what we're going to do here is that this formula is totally fillable within the "Details" portion of the form, which makes it pretty easy to work with/edit.
  4. Lastly, we want to have a bit of error handling - specifically, we'll use IFERROR (a new function in Excel 2007) to ensure that we just show a "blank" cell if there is an error in the calculation.

OK, so without further ado, let's see that function, as it exists in cell C12 (note that when you enter this function, you enter it without the curly braces, and you press Ctrl+Shift+Enter to commit the cell, which adds the curly braces, making it an array function):

Figure 7: Array function used to return the Nth item that matches a specific value.

Figure 7: Array function used to return the Nth item that matches a specific value.

That looks pretty complex, so let's take a look at what's actually going on here.  I've pretty printed sections of the formula for easier reading, from the inside out, since that's how Excel will ultimately calculate it:

{
=IFERROR(
          INDEX(tblDetails,
                    SMALL(
                              IF(tblDetails[Order ID]=rngOrderId,
                                   ROW(tblDetails[Order ID])-ROW(tblDetails[#Headers])
                              ),
                              ROW(1:1)
                    ),
                    MATCH(C$11, tblDetails[#Headers], 0)
          ),
          "")
}

Let's look at the SMALL function first.  Small works by taking an array (list) of values, and returns the Nth smallest value from that list. 

                    SMALL(
                              IF(tblDetails[Order ID]=rngOrderId,
                                   ROW(tblDetails[Order ID])-ROW(tblDetails[#Headers])
                              ),
                              ROW(1:1)
                    ),

In this case, the array of values is determined by the IF function.  Specifically, IF the Order ID for a row in the Details table equals the Order ID I've selected in the drop down, then add the row number of that row (minus the row number of the heading row, in case the table doesn't start in row 1) to the array of values.  And, once the full Details table is analyzed in this way, return the 1st item in the array - ROW(1:1) returns 1.

Now, you'll see a couple tricks that we've used here:

  1. Instead of ROW(1:1) to return the first item in the array, we could have just used the value 1 (or the value 4 if we wanted the 4th item in the array).  However, by using the ROW function, Excel will adjust the formula for us as we fill it down a range of cells, so the next row will contain ROW(2:2) for example.
  2. We used structured referencing to make the formula more robust.  We could have had cell references for the tblDetails[Order ID] and for the [#Headers], etc. but those would not have adjusted as the source table was modified.  By using structured referencing, we have a pretty solid solution here. (and more readable)

OK, let's move on to the INDEX function next.  INDEX works by taking a 2d array (table) and returns the value that is at the specified row and column position in that array.

          INDEX(tblDetails,
                    SMALL(
                              IF(tblDetails[Order ID]=rngOrderId,
                                   ROW(tblDetails[Order ID])-ROW(tblDetails[#Headers])
                              ),
                              ROW(1:1)
                    ),

                    MATCH(C$11, tblDetails[#Headers], 0)
          ),

In this case, we know that the entire SMALL function returns the ROW(Nth) value in the list of Order Details that match the Order ID that I entered - so in this case the 1st value, which is the row number that we want to retrieve a column out of in the Details table.  The only thing left then is to specify the column number that we want to retrieve, which as we learned earlier is what MATCH is used for - in this case, taking the value in C11 and finding the column in the Details table that has the same name (Product in this case).

Right, so we see a couple more tricks here:

  1. To specify the 2d array (table of data), we are again using a structured reference - tblDetails - as opposed to defining the array using a range reference (A1:G50 or whatever). This gives us robustness.
  2. Ditto with the MATCH function. We could have used the column number explicitly, but by using MATCH, we can much more easily make changes to the source table without having to worry about whether it will break the rest of our model.
  3. Lastly, you'll see that with the MATCH function, we've made it so that C$11 will adjust across, but will not adjust down - that will come in handy as we want to fill this formula to create our details list.

And with that, all we're left with is the IFERROR function and the array function designators. IFERROR is pretty simple - it basically says: if an error occurs while calculating the stuff I contain, replace the error value with an empty string (""). The array function designators (curly braces) are what allow INDEX and SMALL to work over the entire Details table range.

OK, so now that we've discussed the formula in C12, let's quickly finish up the form (and watch the magic of our robust formulas at work). With C12 selected, grab the little fill handle in the bottom right corner of the cell and drag it to H12.

Ooooohhhhhh. Aaaahhhhhh. Notice how that C$11 auto-adjusted to C$12, C$13, etc. as you moved across.

Next, with C12 - H12 selected, grab the fill handle and fill it down 10 rows or so. When you let go, you'll see, as if by magic, that all of the Order Details appear (as appropriate) and now as you change the Order ID at the top of the sheet, the entire Order Details updates.

Figure 8: The completed details table, with the array formula filled across then down in order to retrieve all details.

Figure 8: The completed details table, with the array formula filled across then down in order to retrieve all details.

Pretty cool huh? We had to create a formula that was a bit more complex than just using VLOOKUP, but we also got a Master-Detail view working in a very robust way, using only built-in Excel formulas (no code, etc.).

Of course, I'm sure there are other, equally slick ways of doing this - if you've got one, let's see it!

Posted: Thursday, October 23, 2008 9:35 PM by Joseph Chirilov
Attachment(s): Master-Detail.zip

Comments

dermotb said:

Actually, I think there is a much simpler way (although there's lots to learn from your example).

First, we only need to look up each Details row once, not 6 times over. Looking it up can be done with this formula, copied down to find all the order items:

L13=L12+MATCH(rngOrderId,OFFSET(Details!B$1,L12,0,10000,1),0)

The first time you use it, L12=0 and it will find the first instance. The next time, it will start looking from the next row down from the value in L13, so it won't duplicate any finds (hence the Offset). When you get an error, you know you've run out of records.

Once you know the row, bringing in each of the 6 items is a simple OFFSET function from the top of each column,eg =OFFSET(Details!C$1,L12,0)

# October 24, 2008 3:40 AM

Neal O said:

I agree lots to learn here. Never come across SMALL function.

Is it me who just does not like using formulae like the one in figure 7 in a production environment with the inherent complexity and issues regarding maintainability etc this brings? I would instictivly resort to a bit of VBA to extract this data using the Advanced Filter.

Dermot, I like your solution better - thanks for the post.

More importantly does this not point the finger at VLOOKUP (which I rarely use there are better alternatives - try Decision Models for example). With all the experience in the Excel team isn't it time for some extensions to it in the next version of Excel? I would like to see the ability to search on any column or columns (ie compound key), use of value of first cell in a column to reference it not an index number, specify the retutn value on not found and not have to put up with #n/a.

# October 24, 2008 4:38 AM

Joakim Westin said:

Neal,

You can supply return values if you use the IFERROR() function. So you can have a VLOOKUP that either finds a value or else returns a value that you specify:

=IFERROR(VLOOKUP();yourvalue)

or wasn't that what you wanted?

/j

# October 24, 2008 7:41 AM

Simon Murphy said:

Nice one Dan, and Dermot.

I think I'd go for pivot tables as a first choice, then as Neal suggested a simple advanced filter, before I went for a formula based thing.

Neal those sound like good improvements for the lookup family. I think it makes loads of sense to have the return on error value an integral parameter instead of having to wrap it in an IFERROR.

# October 24, 2008 2:55 PM

Greg Hingsbergen said:

Since it uses the If function, Dan's method can be expanded to allow multiple lookup values (such as a Customer and a Order Date).  I like Dermot's simpler solution, but I don't see a way to extend it beyond the one variable.

I'd use Pivot Tables for my own personal use, but for setting up something to distribute to others - this seems like a powerful and easy way to let them look up multiple records that match the criteria they select.

# October 24, 2008 3:22 PM

Dan Battagin said:

Hi folks, thanks for the feedback –

Dermotb - that's a nice option you've got there - I knew there'd be some other options here that I hadn't thought through.

Neal O - I've done this with VBA myself many times (sometimes with advanced filter, sometimes w/SQL statements, etc.).  My goal this time was to not use any VBA, primarily because I wanted to have the file exist as an XLSX file, which can't contain VBA.  Plus, I didn't want users to have to remember to click on the Trust Bar in order to enable my code and then email me telling me that the solution was "broken" (how often do people forget to do that?!?)

And RE improvements to VLOOKUP - we're always thinking about how to improve the function experience in Excel.  And we've talked about VLOOKUP more than once.  It just turns out that we've got tons of people asking for tons of different features, and we have to prioritize.  Yeah, we hate that too :).

Simon and Greg - Pivots are a nice alternative (I've used them too), but are sometimes a pain to build to get them to "look" like tables, and of course there are the flexibility constraints with how to set the initial input parameter - and whether that can be done in one place or not (since you can't just go typing in the page filter area of the Pivot).  

Of course, this does go to show that XL is a pretty flexible tool in that there are several ways to accomplish the same basic task.

Cheers, and keep the feedback coming.

# October 24, 2008 5:10 PM

dermotb said:

There is a simple way to search on multiple criteria. You set up an extra column in the Details sheet, and in each row, concatenate the search fields. Then you look up the records using the same approach above, using the (concatenated) criteria provided by the user. We have a couple of very successful systems which do this.

Of course, you need to ensure that the concatenations are copied down far enough, especially if you are going to be adding more data. We have a cell which checks for this and shows a warning if necessary.

# October 24, 2008 5:41 PM

sam said:

Dont user Vlookup / Match - Use Index/Match its faster

Use Counta($C$11:C11) instead or Row(1:1) its faster

Improve the Match function

=Match(WhatToMatch, WhereToMatch, MatchType, InstanceNum)

This will ensure we dont have to do this circus...of building an array formula

But ofcourse there was a huge demand for new cube functions which prompeted its inclusion

Better still just build a query with OrderNo as the parameter to pull the line details

# October 25, 2008 5:41 AM

Colin Banfield said:

This example demonstrates some interesting lookup techniques.  However, for this particular type of model, I usually avoid the approach taken to create the detailed portion of the report for the following reasons:

1) Lack of scalability.  If there are a few hundred records and the dataset isn't expected to grow a lot, then it's ok, otherwise importing thousands of records into Excel for the sake of selecting a few can be very inefficient.

2) Number of formulas.  The order detail range in this report has 23 rows and a total of 138 array formulas.  Why not 15 rows or 50?  The problem is, you have to set up the range with a number of order line items that you don’t expect to exceed and heaven forbid if some idiot customer decides to give you a larger than expected order.  The range doesn’t expand automatically.  

As a practical matter, I assume that the report creator must know something about the data source – its location and its table structure, in order to import the appropriate tables.  That’s half the job done. The creator should at least know (or learn) how to use MSQuery to filter data they want to use. As a minimum, they should be able to select specific columns in a table and create a parameter filter.  All this can be done visually in MSQuery so that the user doesn’t have to know or learn SQL code.  To keep things simple, they don’t even have to handle joins.  The reward for this minimal knowledge is a simpler and much more scalable model.

The first step then would be to launch Query and import the Customer and Order Tables – but only the columns used in the report to minimize the amount of data in Excel.  The detail table (with the appropriate fields) would occupy the area currently used by the order detail header and the 138 formulas in the report.  The order ID in D1 would be used as a parameter for the detail table query.  Now instead of a table with potentially thousands of records (or more), you import only the line items for the specific order and the items appear directly in the report.   The range grows or shrinks automatically when another order ID is selected and the banded formatting is maintained.  No VBA required. As a bonus, you could turn on the totals row for the details table, so that the total shows just below the last order item row.

The model could be further refined by, say, creating a year drop-down and using the selected year as a parameter for the order table (year of Order Date).  This would minimize number of order records imported into Excel (unless there’s only one year of data).

# October 25, 2008 12:06 PM

sebastien said:

I frequently use MS Query as Colin's proposes above.

Other interesting features:

- can auto-run when a parameter changes i.e. in this case when D1 changes.

- can be set to auto fill the formula columns adjacent to the query in the sheet.

- can maintain formatting.

- ...

Drawbacks:

- the data location, whether in the current book or another book or dB must be fix. Moving it would break the connection. While this makes perfect sense, I wish there were a connection option for the database location = ThisWorkbook which would automatically point to the current book wherever it is.

- There used to be a memory leak in the xl driver when querying data in an open book. I believe this has been fixed; am I correct?

# October 28, 2008 4:26 PM

sam said:

Sebastian..

"There used to be a memory leak in the xl driver when querying data in an open book. I believe this has been fixed; am I correct?"

Of course not....this was not a problem faced by the "real" users hence MS has not fixed it....

The "Real" users had a big issue with the size of the the Paste button....it was not very discoverable in the earlier version.....hence the size had to be increased in 2007 on a priority ...

# October 31, 2008 10:52 PM

Jonah Feld said:

The workbooks I create often require this kind of feature, but I go about it a very different way. I try to use PivotTables with page fields controlled by VBA when possible, but VBA is not always an option.

In large books calculation speed can become a big concern, so I avoid Array Formulas and volatile functions like OFFSET (and try to avoid VLOOKUP) whenever possible.

The example provided in this post makes extensive use of inefficient Array Formulas, repeating them unnecessarily. It's pretty cool and I definitely learned a new technique, but I'd do it differently.

Because the Orders Table and Detail Table are sorted by ID, you can solve this problem without VBA, OFFSET, VLOOKUP, or Array Formulas. Two simple helper cells is all it takes:

Row_Start: =MATCH(rngOrderId,tblDetails[Order ID],0)

Detail_Count: =COUNTIF(tblDetails[Order ID],rngOrderId)

If you add a 'Detail Slot' column in K (values of 1..n) then all you need in Order Details C12 (dragged down and right) is:

=IF(Detail_Count>=$K12,INDEX(tblDetails,Row_Start-1+$K12,MATCH(C$11,tblDetails[#Headers],0)),"")

This formula WON'T perform any lookup functions unless there is a need, further speeding calculation. More gains could be made by calculating the column number once and referring to that, instead of repeating it in each cell.

Still, this doesn't get around drawback #2 that Colin pointed out. If you have very unpredictable data and need a dynamic result without PTs or VBA, I think MS Query might be the only option.

# November 5, 2008 10:56 AM

Jonah Feld said:

Quick follow up - if you want to audit your data to make sure your fixed detail row allocation is large enough for an edge case, here's an easy way:

=COUNTIF(tblDetails[Order ID],MODE(tblDetails[Order ID]))

# November 5, 2008 11:53 AM

Eniola said:

PLease help me reference a header for a minimum value in an array

# November 12, 2008 11:46 AM
New Comments to this post are disabled
Page view tracker