Welcome to MSDN Blogs Sign in | Join | Help

Microsoft Excel

The team blog for Microsoft Excel and Excel Services.
Analyzing Data: Functions or PivotTables

Today’s author, Monica Poinescu, a Software Developer in Test on the Excel team, discusses two different approaches to analyzing data in Excel.

Edit: I've attached a file at the bottom of this blog that contains spreadsheets of the examples discussed in this post.

My earlier blog on the new Excel 2007 function SUMIFS spawned a very interesting discussion (thanks to everyone who posted comments there): when trying to analyze/aggregate data in a table, how do we decide whether to use functions versus PivotTables?

This blog outlines reasons to use one option or another. To better illustrate the two alternatives I’ll consider a real estate inspired example: let’s say I have list of homes for sale and their corresponding characteristics:

clip_image002

and I’m trying to find the average price for those homes which have at least 3 bedrooms, a garage and are between 5 and 10 years old. Just by looking at the table, we see that only house3 and house4 satisfy all conditions and the average of their prices is 312,500.

Here is a functions based solution:

=AVERAGEIFS(G2:G6,C2:C6,">2",E2:E6,"yes",F2:F6,">1999",F2:F6,"<2004")

which returns $ 312,500.

To build a corresponding PivotTable, one can use several filters:

clip_image004

Advantages of PivotTables:

  1. While AVERAGEIFS is limited to using at most 127 conditions, PivotTables can handle more than 127 conditions.
  2. The elements of the set that fulfill all conditions are listed in the resulting PivotTable.
  3. PivotTables have a lot of flexibility: the ease of use offered by the new UI allows for a very quick detailed analysis of different available options. Nested layers in a PivotTable offer added results visualization.
  4. Several different approaches are possible: one can construct different PivotTables that answer the same question above.
  5. Particularly useful for large data sources: when relying on an external data source, you don’t need to bring all the data in Excel and one could, for example, use OLAP databases.

Advantages of using functions:

  1. It’s easier to see in one glance all the conditions being used either by looking in the formula bar, or by listing all criteria in separate ranges.
  2. The result updates immediately when adding rows to the source table, while PivotTables need to be refreshed.
  3. The criteria, when referenced in a cell, could be as well the result of another formula, while value filters in PivotTables can only use constants.
    For example, in the formula above, one could replace
    =AVERAGEIFS(G2:G6,C2:C6,">2",…
    with
    =AVERAGEIFS(G2:G6,C2:C6,I4,…
    where I4 contains another calculation ( e.g. =">"&FIXED(SUM(1,1),0)).
    The image below shows a corresponding PivotTable filter:
    clip_image006
    The last field will not accept formulas, only numbers.
  4. Formulas take little space and are easy to move around in a sheet.

Note that in both cases you can use the wildcard characters to define criteria. Also both solutions deal in a similar manner with missing data or errors in the range.

I don't know if I can be fully objective on this question because I'm more of a formula person myself. If I've missed a reason you should pick one approach over another, feel free to let me know the reason you use formulas or PivotTables for summarizing data by leaving a comment.

Posted: Monday, March 23, 2009 12:27 PM by Joseph Chirilov
Attachment(s): examples.zip

Comments

Stuart said:

Another big advantage of the pivot table is being able to drill down on the information to quickly see the items that meet your criteria.

# March 23, 2009 8:33 PM

Jeff L said:

To me this question is especially valid, but from a different perspective: Which is more efficient/performant when analyzing large data sets?

Your example is ironically perfect for me: I am in the real estate industry building analytical models based on transactional data of thousands of rows.  Before learning about SumIfs I built the models to use pivot tables to aggregate, then used indexes to find my value in the PT.  Can Sumifs, AverageIfs, etc offer the same performance as the Pivot Table, or will a large reporting toolset bog down?

I'd love to hear if anyone has insight to that.

# March 24, 2009 7:07 AM

Jeff L said:

To me this question is especially valid, but from a different perspective: Which is more efficient/performant when analyzing large data sets?

Your example is ironically perfect for me: I am in the real estate industry building analytical models based on transactional data of thousands of rows.  Before learning about SumIfs I built the models to use pivot tables to aggregate, then used indexes to find my value in the PT.  Can Sumifs, AverageIfs, etc offer the same performance as the Pivot Table, or will a large reporting toolset bog down?

I'd love to hear if anyone has insight to that.

# March 24, 2009 7:07 AM

Charles Williams said:

Both SUMIFS/COUNTIFS and Pivot tables are very efficient compared to SUMPRODUCT/array formulae.

For large datasets one major factor in the choice is the useage scenario:

if you are looking for what-ifs calculations by changing the input data/calculations SUMIFS/COUNTIFS would be a good choice.

If your input data is relatively static but you want to do whatif analysis with drilldown then Pivot Tables are a good choice.

If you just want to aggregate relatively static data and then do a relatively standardised report either will do, but a third alternative of an SQL aggregation query feeding either a summary sheet or a pivot table might be best.

# March 24, 2009 1:05 PM

sam said:

Unfortunately the limited aggregate functions available in a pivot table and the limited functions supported in a calculated fields and items make use of formulas unavoidable (either inside the SQL Statements - for external Data or in adjacent columns)

# March 25, 2009 2:07 AM

Jeff L said:

@ Charles, Interesting that you mention using SQL...in some of the things I've done that seemed like it would be a more efficient way, but the methods I tried either failed miserably (programmatically) or were ridiculously slow.

But that said I was probably not being very efficient.  How would you apply the SQL to an in-sheet data set--create an array and use the array as the "from"?

# March 25, 2009 1:22 PM

sebastien mistouflet said:

I try to use Pivots as much as possible, but:

- Pivots don't handle complex formulas (eg: rule-based accounting...)

- and do not deal with different nested groups, eg in column: Sales by Product Line, followed by another metric by another grouping, say employee salaries by {manager, Non-manager}.

It is sometimes possible to precompute in the data table directly and use the new precomputed fields in the pivot, but not always. In that case the Formula-type Report works better.

Now I try to separate data entry and reporting so I use SQL most of the time to pull to the reporting book.

Jeff L, you can use the Excel odbc driver to query a *closed* XL file like a database. You can try on an *open* XL file too, but there is still (i believe) a memory leak in the driver in that case.

# March 25, 2009 10:59 PM

DaveP said:

Consider a simple Data List (2003) and Subtotal for average, number, max, min, and/or sum.

Use AutoFilter and secure required numbers with =SUBTOTAL(F1,$G$9:$G$1000)

# March 26, 2009 1:27 PM

Han said:

Would you mind to offer a download link for each table you have in your blog post? So that readers can practice with them.

Thanks.

# March 28, 2009 10:27 PM

findsarfaraz said:

If I understood correctly.

Averageif = Sumif/countif

Also, can we have downloadable tables so the beginner can practice.

# April 2, 2009 6:19 AM

Spence Lloyd said:

Question:  Can a SUMIFS criteria include a cell reference?  I can't seem to figure out how to make that work.

# April 2, 2009 8:10 AM

Monica Poinescu said:

You can now find the files used for the examples I used in my blog above- see 'Attachment(s): examples.zip'.

I also included some examples of criteria passed as a cell reference and an example using criteria with wildcards.

I hope that helps - and thanks for your comments.

# April 3, 2009 1:59 PM

karrroxtraining@gmail.com said:

Thanks for the information. I had been looking for such a simple example which help describe things in such a simple but informative fashion. Thanks.

# April 9, 2009 1:49 AM

EVaino said:

I was just having this discussion with a colleague last week. I've always been a formula person but he convinced me to give Pivot tables a try. I'm still skeptical though.  I do acknowledge that functions can bog things down, especially array functions on large workbooks.  My biggest reason for not liking pivot tables is that often I want to take many different "slices" of a table. If I'm not mistaken, using pivot tables would require me to create a separate one for each analysis?  They're super easy to create, but to me that results in a cluttered workbook if there are dozens of little pivot tables everywhere. I'd rather use formulas so my results can be nice and tidy - and if anyone wants to see where the data came from they can easily check the formula. Just seems cleaner.  Especially if they all need to be refreshed too...(although presumably there is a refresh all option?).  

good discussion.

# April 14, 2009 4:32 PM

david said:

How to add CalculatedField to pivottable using c# ?

Thks.

# April 22, 2009 4:30 AM
New Comments to this post are disabled
Page view tracker