Welcome to MSDN Blogs Sign in | Join | Help

Microsoft Excel

The team blog for Microsoft Excel and Excel Services.
New conditional formatting "rules" ...

A few posts back when I summarized our goals for conditional formatting, I said that one of our goals was to make a greater number of scenarios possible without needing to write formulas.  Let me briefly explain what I mean.  Excel 2003 gives users two choices when creating conditional formats.  Users can either set up numerical comparisons (this cell is greater than a value, less than a value, equal to a value, etc.), or they can use a formula to specify conditions.  While formulas are powerful and flexible, they take more time and knowledge of the product, so it can be challenging to set up conditional formats like “highlight the top 20% of this range”.  As part of planning Excel 12, our team did a bunch of research in this area, and we found that there were a set of very common things people wanted to do with conditional formatting that while possible, required writing formulas, not all of which were obvious.  Based on this research, we have added a range of new rules to conditional formatting to address the most common customer requests, and we think this will mean a much larger number of people can benefit from the value of the feature.  We have broken these rules into a couple of categories in our new UI to make it easier to find the correct rule type (more on that in the next post).  Specifically, in Excel 12, without writing formulas, you can conditionally format cells:

  1. With data bars, color scales, or icons based on the numeric value in the cell, percentages, percentiles, or a formula. See the posts on data bars, color scales, and icon sets for more information on each of these.
  2. Containing, not containing, beginning with, or ending with specific text.  For example, highlighting parts containing certain characters in a parts catalog.
  3. Containing dates that match dynamic conditions like yesterday, today, tomorrow, in the last 7 days, last week, this week, next week, last month, this month, next month.  For example, highlight all items dated yesterday.  The great part about these conditions is that Excel handles calculating the date based on the system clock, so the user doesn’t need to worry about updating the condition.
  4. That are blank or that are non-blank.
  5. That have errors or that do not have errors.
  6. That are in the top n of a selected range (where n is whatever number you want) OR that are in the top n percent of a selected range (again, where n is adjustable). For example, highlighting the top 10 investment returns in a table of 1,000 investments.
  7. Cells that have the bottom n values OR cells that are the bottom n percent of a selected range.
  8. Cells that are above average, below average, equal to or above average, equal to or below average, 1 standard deviation above, 1 standard deviation below, 2 standard deviations above, 2 standard deviations below, 3 standard deviations above, 3 standard deviations below a selected range.
  9. Cells that are duplicate values or, conversely, cells that are unique values.
  10. Based on comparisons between two columns in tables.  For example, highlight values where values in the “Actual Sales” column are less than in the “Sales Target” column.  When working with tables, we have also made it easy to format the entire row based on the results of a condition.  More on this when I talk about tables in a few weeks.

Note, in Excel 12, you can still create the conditional formatting rules types from Excel 2003.  Next up, I will cover the UI in the Ribbon for adding and removing conditional formats.

Before I sign off, though, I wanted to correct something I said in a comment last week.  Rob van Gelder asked if we had added support for applying number formats to cells using conditional formatting (for example, if a cell is equal to 0, apply a format that makes the cell blank).  I answered that we had not.  The good news is that we have, which opens up a range of possibilities.  I will talk a bit more about these and other limits we changed later this week.

Posted: Tuesday, October 11, 2005 11:11 AM by David Gainer

Comments

Rob van Gelder said:

David,

Thank you for including Conditional Number Formats.

The reason I think this would be a great feature is for dealing with a list of results which are required to be reported at varying levels of decimal places.


While I'm writing, another useful condition would be:
- That contain a formula.

With current versions of Excel, this can only be achieved with UDF or =CELL.

Cheers,
Rob
# October 11, 2005 2:36 PM

Andy said:

This is a great blog.

I applaud the new “dashboard” features and, especially, the increase in power “under the hood” of Excel 12. Kudos to the MS programmers for increasing Excel’s data capacity, which involved overcoming some significant technical challenges.

The new conditional formatting capabilities are terrific, particularly the ones which enable users to highlight cells with errors, duplicates, and blanks. I hope that in a future blog entry, you will elaborate on Excel 12’s features relating to data integrity.

Valid analytical models must rest on foundations of solid, clean data. Identifying and removing “data pollution” becomes exponentially more important with the increased capacity of Excel 12.

Some data-integrity features that would be useful include:

• An error-summary table for imported data (MS Access has this feature).

• A “high altitude” view of spreadsheet data with anomalous cells highlighted/shaded. For example, it would be great if Excel 12 automatically applied conditional formatting to highlight imported numeric data cells that contained errors, non-numeric characters such as asterisks or footnote symbols, or values formatted as text. (Sidebar: marking cells with red triangles is too subtle, IMO; the “Select Special” options also can be unwieldy for large spreadsheets.)

• An enhanced data-import wizard that would remove characters that pad fixed-width cells. (For example, the first column of this file from the U.S. government: http://www.bls.gov/news.release/metro.t03.htm).

• A more-powerful CLEAN function with an optional argument(s) to purge not only non-printing characters, but also user-specified, printing “junk” characters (for instance, *# ~ .). While users can currently accomplish this task using Excel formulas [e.g., =TRIM(SUBSTITUTE(“Field***”,“*”,” “)) ] or VBA functions, an enhanced CLEAN function could be more efficient in some situations. Similarly, a more-powerful TRIM function with optional arguments that would facilitate the removal of characters other than spaces could also be useful.

• The ability to easily purge embedded objects and controls from imported html files. For instance, save the government data file mentioned above using IE, then open the file in Excel and try to select and delete the drop-down dialog box. (Fortunately, the data table imports more cleanly using the Data-->Import External Data-->New Web Query method.)

• A “data cleansing” wizard to assist users in identifying and rectifying data anomalies or problems.

Other desirable features include:

• Improved integration with MS MapPoint. One feature that would be particularly useful would be the ability to easily import distances between addresses (between customers and businesses, for example).

• Easier, more flexible integration with SQL Server Analysis Services (the synched-up KPIs that you mentioned in your last blog entry could be very useful). In particular, it would be great if users could easily develop data-mining models with the data for the dependent and independent variables drawn from columns on the spreadsheet. (Unfortunately, many users have no idea what a cube is, much less how to drill into one.)

Icon sets are a great idea. It would be terrific if you could add icons relating to stocks/investments (e.g., bull, bear, hog, dog, turkey, piles of cash or coins), real estate (e.g., house, building), and transportation (car, plane, train). A few humorous icons might also be good (Bozo the Clown, a court jester/joker, and a steaming pile of B.S., etc.) ;)

I applaud the extensive work that has been done in the area of charting. It would be great if 3-D scatter plots were added to the available chart types, as Harlan Grove mentioned previously. I once suggested this feature to Nathan Myrhvold following a university lecture; he said that it would be a great feature to add to a future version of Excel. That was back in 1998.

My final suggestion is that you reassure users that Excel 12’s calculation engine is accurate, fast, and robust. In particular, Microsoft supposedly improved the statistical functions in Excel 2003, but questions persist in the user community. For instance, the Web page for Palisade’s StatTools states: “StatTools replaces Excel's built-in statistics with its own calculations. The accuracy of Excel's built-in statistics calculations has often been questioned, so StatTools doesn’t use them!” (http://www.palisade.com/stattools/default.asp). Frontline Systems also has rewritten Excel’s calculation engine for its Premium Solver Platform (http://www.solver.com/xlsv6highlights.htm).

I hope that you find these suggestions to be helpful. Many thanks to the entire development team for all of the great work on Excel 12!
# October 11, 2005 3:51 PM

Jean-Marc said:

Hello David,

first, Great blog! :-)

Then, instead of defining the formatting to use, will it be possible to indicate to apply (the formatting defined in) a style (that has been defined before)? This would bring the big advantage that if at some point you modify the style, all cells with conditional formatting would update dynamically.

Cheers,

Jean-Marc
# October 11, 2005 4:22 PM

headtoadie said:

This is very nice indeed, it will be quite useful.

I have a request for a couple of new worksheet functions.

1) We need an easier way to handler formula errors. Currently we the only option we have is nesting IF(ISERROR()).

It's easy to write a UDF that accomplishes this in VBA, but its extremely slow. It needs to be part of the core program.


2) We need a better way to lookup data in data tables. There should be a single function that will allow the user to specifiy a row criteria and header criteria to search on.

Currently we have to nest vlookup with hlookup to accomplish this. And God help you if you have to also apply an IF(ISERROR()) to your nested vlookup-hlookup formula. You end up with a monster that Godzilla would fear.
# October 11, 2005 5:17 PM

mschaef said:

"Currently we have to nest vlookup with hlookup to accomplish this. And God help you if you have to also apply an IF(ISERROR()) to your nested vlookup-hlookup formula. You end up with a monster that Godzilla would fear. "

One way to solve this would be to provide a way to define local names in an expression:

http://www.mschaef.com/cgi-bin/blosxom.cgi/tech/excel/let_in_excel.txt


This:

=IF(ISERROR(MATCH(item, range, 0)), 0, MATCH(item, range, 0)))

Could become something like this:

=LET(value=MATCH(item,range,0), IF(ISERROR(value), 0, value))

What this really does is make an expression tree into an expression DAG. I've actually thought about implementing something like this in gnumeric to see how it'd work. Looking through the code, it doesn't look _too_ hard to do.
# October 11, 2005 5:27 PM

mschaef said:

Oh... and as long as we're dreaming about feature requests (and I'm guessing that this is _way_ late in the release cycle for such things), how about paramaterized range names.

IE:, Define a range name like this:

BAR(size)=REPT("▐",ROUND(size-FLOOR(size,1),0))&REPT("█",size)

Maybe this request can better be phrased in question: "Why do I have to go to VBA to define reusable abstractions in Excel?"

Alternatively, how about something like this:

http://research.microsoft.com/~simonpj/Papers/excel/

That's even work done at Microsoft. :-)


PS: if those strings end up looking odd, they are supposed to be composed of line drawing characters.
# October 11, 2005 6:18 PM

Murray Shactman said:

This new conditional formatting sounds great. Will it be able to highlight a date that is the "last Thursday of the Month?" or the "third Tuesday in each month?"

I can't wait for the beta.

Thanks.
# October 11, 2005 6:48 PM

Biff said:

The new rules are a great improvement!

Ok, you answered a previous question about the physical size of the formula input box. How about the length of that formula. Is the limit still 255 chars?

Off of specific topic: Will I be able to run Excel 12 with 256mb ram?
# October 12, 2005 1:49 AM

Nigel Harper said:

David,

Yet more excellent features. You have not mentioned anything about deleting/ordering of conditional formats. It would be very useful if conditional formats could be named and moved up or down an order list. This would make it much easier when coding and/or dealing with a number of them through the GUI. Sometimes the law of unintended consequences kicks in and you realise that condition number 1 should actually be number 3 and, at the moment, you have to go and copy the formula and formats for each condition into the next condition one at a time.

Echoing the thoughts of some of the commentators above. The error handling of formula results is a nightmare. Nesting things in an ISERROR is clumsy but very necessary (especially when dealing with array formulas and lookups). Each workbook and/or worksheet should have an option area where one can pick an error type and asign it a value from a list (say: 1,0,"",NULL,[custom]) that it should return instead of itself. Call it 'Error Conditioning' or something like that.

Thank you,
Nigel
# October 12, 2005 7:16 AM

Tianwei said:

Wow! This is turning into a late stage dream session. Andy's comments are great. If I could throw in my long-waited list is this:

Have a "subscriber" feature in Excel that allow users (computer names or network IDs) to subscribe to a this particular file name so macro warning can be turned off (on always for subscribers). I can't tell you how many times I have to walk through with individuals who had the security settings default to high and couldn't see anything because the macros dont run (we are still on Excel 2000).
# October 12, 2005 9:12 AM

Harlan Grove said:

Andy...
...
|. . . it would be great if Excel 12
|automatically applied conditional formatting
|to highlight imported numeric data cells that
|contained errors, . . .

PLEASE, no more 'helpful' functionality without
some means of disabling it!

...
|A more-powerful CLEAN function with an optional
|argument(s) to purge not only non-printing
|characters, but also user-specified, printing
|“junk” characters (for instance, *# ~ .). While
|users can currently accomplish this task using
|[...] =TRIM(SUBSTITUTE(“Field***”,“*”,” “)) or
|VBA functions, . . .

SUBSTITUTE already does this. All it needs is regular expression support. That would leave it perhaps a bit more complicated than most users would readily understand, but it'd be a MUCH CLEANER solution than screwing up CLEAN or TRIM.

That said, nothing other than inertia or indifference prevents Microsoft from adding examples to online help (NOT KnowledgeBase articles, but info available offline, e.g., when using XL on a laptop in an airplane) to show how to perform common tasks.

...
|Improved integration with MS MapPoint. One
|feature that would be particularly useful would
|be the ability to easily import distances
|between addresses (between customers and
|businesses, for example).

How big a database would that be? Every address in the US? Only the US, not Canada or Europe? If all OECD countries, would it also need to give distances between addresses in Wyoming and Slovenia? And why would the rest of us want to pay extra for this? IOW, this may be a useful product, but it should be sold separately.

...
|My final suggestion is that you reassure users
|that Excel 12’s calculation engine is accurate,
|fast, and robust. . . .

We should be able to figure out fast for ourselves. As for accuracy and robustness, there are two aspects: functions and efficient iteration through collection data types (arrays, ranges and lists of arguments). I'd hope the latter were no less efficient than current Excel [with XL12 being able to dereference *entire* worksheets, e.g., =SUMPRODUCT(--(X!1:1048576="foo"))]. As for the former, many Excel functions have needed overhauling for years.

GAMMALN [which should special case 1 and 2 so they always return 0]

RANK [to work with arrays as 2nd arguments, which online help implies it can do]

GEOMEAN [which shouldn't succumb to overflow or underflow nearly as often as it does]

MOD [why this hasn't been fixed yet for large quotients is a mystery]

DATEDIF [actually doesn't need fixing, but will it be documented again?]

SQL.REQUEST [will it be included again, or will users have to download the XL10 add-in?]

All the regression functions - XL11 handles blank cells in the X and Y ranges differently that previous versions. Was that intentional? Will XL12's functions keep XL11's semantics, or will MSFT admit a screw-up and revert to the XL10 & prior semantics?

Many of the discrete probability distribution functions are still fubar for mildly extreme values.

|Frontline Systems also has rewritten Excel’s
|calculation engine for its Premium Solver
|Platform . . .

The demands of serious optimization software aren't necessarily ideal for spreadsheets. Read the linked article. You'll see mention of Solver needing to evaluate both functions and their derivatives. There are ways to do both at the same time that would make sense in a product like Solver but not in a spreadsheet.
# October 12, 2005 2:05 PM

David Gainer said:

Greetings,

Everyone – Thanks for the kind words on the value of the blog and the ideas and feedback. It is great to hear what people are interested in.

Rob, Andy – Thanks for the feedback and the ideas. Andy, we have done a lot with SQL Services Analysis Services, and I will spend some time on that later this fall. With regards to our calculation engine, we have taken steps to make it faster, and I promise to get to that too.

Jean-Marc – That is a great idea, although not something we have done in Excel 12.
All the folks that asked about formulas and referencing – I promise next week will be formula and functions week, and yes we have a solution for IF(ISERROR()). After that, I will talk about tables and some referencing improvements we have made there too.

Murray – We don’t have built in rules for the conditions you list below.

Biff – The limit for the number of characters in a formula has been increased to 8k. To answer your second question, I don’t have information on system requirements right now.

Nigel – You will be able to reorder formulas. See post later this week.

Tianwei – We have done some work around a trust model. I will make sure that is on my list of topics.
# October 13, 2005 1:52 AM

Andy said:

Harlan:

I agree that there should be a way to disable the proposed conditional formatting feature that would flag errors or anomalies in imported data. I envision it as an option in the data-import wizard that would be selected by default, but could be de-selected as desired. It could also be an option in the conditional formatting dialog/menu/ribbon, or a separate command in the Data menu.

I respectfully disagree with you about the CLEAN and TRIM functions. I know Excel users that would like to see these functions enhanced to give users the option to scrub more than just non-printing characters or spaces from data fields. Perhaps this data-cleansing capability could be incorporated into a new function(s) so as not to offend users who like the current implementations of the CLEAN and TRIM functions.

SUBSTITUTE is a nifty function, and it works well if the user wants to purge a particular “crud” character or sequence of characters from a data cell. AFAIK, the function cannot accept an array of various characters to scrub from a cell’s contents. While this task can be accomplished using megaformulas or VBA functions, the solutions can be unwieldy.

The nub of my suggestion is that many Excel users want a better way to scrub crud/pollution from their data, which could be accomplished via a function that, for lack of a better word, I’ll call CLEANSE. This function would allow users to not only purge non-printing characters, but also user-specified “crud” characters (e.g., #@~|^*&°∙□). This is definitely not sexy stuff, but it is fundamentally important because, as I stated in my first post, valid analytical models must rest on foundations of solid, clean data.

-------

Calculating distances between addresses has a variety of applications in areas ranging from logistics to real estate (for instance, in regression models of house prices or rents as functions of distances from schools or transportation nodes, among other variables).

MapPoint already has the ability to identify customer/business addresses within a user-specified radius of a given location. My suggestion is to expand on this capability and make it easy for users to calculate distances between addresses——with the address database residing in MapPoint——and analyze the information in Excel (or SQL Server).

My geographic interest lies in the U.S.; I have no qualms about paying for address databases.

-------

I read the Solver PSI article before I posted the link. :) I posted the link simply for informational purposes to show that Excel’s formula calculator has been replaced by a third-party vendor for a specific set of applications. I apologize if my doing so suggested that I believe that MS should rewrite its formula engine to calculate both functions and derivatives.

The Palisade link is more important, because it addresses the issue of Excel’s accuracy in statistical calculations. I hope that MS will rectify the problems that you pointed out in this area.

In a perfect world, people would use “serious optimization software” for complex optimization problems, SAS, SPSS, or Minitab for statistical analysis, Maple or Mathematica for matrix operations and other types of mathematical analyses, and databases for medium-to-large collections of data.

The reality is that most people don’t use those programs.

Excel is the multi-purpose Swiss army knife for analysis and data management. For many people, it is the nexus of their analytical activities.

I think that Excel is an extremely powerful, flexible, and useful program, made even more so by third-party add-ins for optimization, simulation, forecasting, and a variety of other applications. The intent of my first post was to provide suggestions to for making Excel even more functional. To the extent that the post engendered feedback, so much the better. People have shared great ideas in this blog; hopefully their ideas will be incorporated into Excel 13 if not Excel 12.
# October 13, 2005 12:08 PM

Harlan Grove said:

Andy...
...
|SUBSTITUTE is a nifty function, and it works
|well if the user wants to purge a particular
|“crud” character or sequence of characters
|from a data cell. AFAIK, the function cannot
|accept an array of various characters to
|scrub from a cell’s contents. While this task
|can be accomplished using megaformulas or VBA
|functions, the solutions can be unwieldy.

You didn't understand my point about making SUBSTITUTE (and MATCH, SEARCH, {V|H}LOOKUP) accept regular expressions. If it did, formulas like

=SUBSTITUTE(X,"[ "&CHAR(160)&"]+"," ")

could replace all sequences of ASCII breaking and HTML nonbreaking spaces with single ASCII spaces. CLEAN could be replaced by

=SUBSTITUTE(X,"["&CHAR(1)&"-"&CHAR(31)&"]+","")

TRIM would require two modified SUBSTITUTE calls,

=SUBSTITUTE(SUBSTITUTE(X,"^ *| *$",""),
" +"," ")

It'd be simple to augment any of these formulas to handle other undesired characters.

|The nub of my suggestion is that many Excel
|users want a better way to scrub
|crud/pollution from their data, which could
|be accomplished via a function that, for lack
|of a better word, I’ll call CLEANSE. . . .

A better Edit > Replace would be a better idea. Also using regular expressions. Just like in OpenOffice/StarOffice. If Microsoft can copy PDF functionality from OOo/SO, why not regular expressions too?
# October 13, 2005 1:20 PM

Harlan Grove said:

Andy...
......
|In a perfect world, people would use “serious
|optimization software” for complex
|optimization problems, SAS, SPSS, or Minitab
|for statistical analysis, Maple or
|Mathematica for matrix operations and other
|types of mathematical analyses, and databases
|for medium-to-large collections of data.
|
|The reality is that most people don’t use
|those programs.

The reality is that most people don't know what they're doing when they try to use anything beyond arithmetic operations and the simpler math functions (SUM, COUNT, AVERAGE, MIN and MAX).

I'd be willing to bet that comfortably more than half of the people who use LINEST don't know anything about hypothesis testing. Would one aspect of improving the linear regression functions be returning "estimated values do not differ significantly from zero" when they'd fail typical hypothesis testing?

I agree that Excel is a general use tool. The problem is that specialization in one area or another could tend to make it slower for people who'd never need that functionality. Better to make the add-in interface and the object model as efficient as possible and try to keep Excel itself as simple as possible.

IMO, the ideal Excel shoudl follow is the one already provided by programmable editors. The base product includes only a few built-in functions, but the key one is extensibility. The efficiency emphasis is then on making add-on functionality as quick and seamless as possible.

In terms of Excel, this would mean making the bulk of current functions part of one add-in or another. There could be a set of 'standard' add-in, installed by default, providing current function implementations in addition to the AnalysisToolPak, XLODBC (if it's included again), Solver, etc. Maybe one standard add-in for financial functions, one for matrix arithmetic functions, one for list processing functions (SO THEY COULD BE REPLACED!), one for descriptive statistics, one for discrete statistical distributions, one for continuous statistical distributions, one for the nearly worthless Bessel functions, etc. The only functions built into Excel would be IF, AND, OR, NOT, SUM, COUNT, AVERAGE, MIN, MAX, MOD, EXP, LN, LOG, ABS, the IS* functions and the functions for dynamic referencing, INDEX, OFFSET and INDIRECT.

Pipe dreams.
# October 13, 2005 1:55 PM
New Comments to this post are disabled
Page view tracker