Welcome to MSDN Blogs Sign in | Join | Help

Microsoft Excel

The team blog for Microsoft Excel and Excel Services.
What is the fastest way to scan a large range in Excel?

Today's author, Dany Hoter, a Product Planner on the Excel team, talks about some performance characteristics he recently discovered using various methods to manipulate large ranges using VBA.

Problem description

You have a large range in Excel with data. Let's say it contains 100,000 rows and 50 columns for each row (Yes you are using Excel 2007 of course). So altogether you have 5,000,000 cells. Columns A to F have some alphanumeric data that you need analyze and based on the combination of values for each row you need to use the numeric values in G to H to do some calculations and store the results in columns I and J. You could place 200,000 formulas in I and J but you see that a spreadsheet with such a volume of formulas gets very sow and consumes huge amounts of memory.

You decide to try and solve it in a piece of VBA code. The question is how to implement such a task in the most efficient way?

What are your options

How can you scan a range in Excel, read the values in some cells, and change some others?

Use a range object

Let's assume that the range you want to read starts at A1

The code looks something like this:

Dim DataRange as Range ' Could also be Dim DataRange as Object
Dim Irow as Long
Dim MaxRows as Long
Dim Icol as Integer
Dim MaxCols as Long
Dim MyVar as Double
Set DataRange=Range("A1").CurrentRegion
MaxRows= Range("A1").CurrentRegion.Rows.Count
MaxCols= Range("A1").CurrentRegion.Columns.Count
For Irow=1 to MaxRows 
  For icol=1 to MaxCols 
    MyVar=DataRange(Irow,Icol) 
    If MyVar > 0 then 
      MyVar=MyVar*Myvar ' Change the value 
      DataRange(Irow,Icol)=MyVar 
    End If 
  Next Icol
Next Irow

Use the selection and move it using offset

Many VBA developers learned VBA techniques from macro recording.

When using relative reference the generated VBA code creates statements like:

ActiveCell.Offset(0, -1).Range("A1").Select

As a consequence many developers adopt this technique and use the ActiveCell or selection ranges to move from cell to cell in code and read or write the cell values. The code will look like this:

Dim Irow As Long
Dim MaxRows As Long
Dim Icol As Integer
Dim MaxCols As Long
Dim MyVar As Double
Range("A1").Select
MaxRows = Range("A1").CurrentRegion.Rows.Count
MaxCols = Range("A1").CurrentRegion.Columns.Count
For Irow = 1 To MaxRows 
  For Icol = 1 To MaxCols 
    MyVar = ActiveCell.Value 
    If MyVar > 0 Then 
      MyVar=MyVar*Myvar ' Change the value 
      ActiveCell.Value = MyVar 
    End If 
    ActiveCell.Offset(0, 1).Select ' Move one column to the right 
  Next Icol 
  ActiveCell.Offset(1, -MaxCols).Select ' Move one rows down and back to first column
Next Irow

Use a variant type variable

This technique copies the values from all cells in the range into a variable in memory, manipulates the values inside this variable and if needed moves the values back to the range after manipulation.

Here is the code this time:

Dim DataRange As Variant
Dim Irow As Long
Dim MaxRows As Long
Dim Icol As Integer
Dim MaxCols As Long
Dim MyVar As Double
DataRange = Range("A1").CurrentRegion.Value ' Not using set
MaxRows = Range("A1").CurrentRegion.Rows.Count
MaxCols = Range("A1").CurrentRegion.Columns.Count
For Irow = 1 To MaxRows 
  For Icol = 1 To MaxCols 
  MyVar = DataRange(Irow, Icol) 
  If MyVar > 0 Then 
    MyVar=MyVar*Myvar ' Change the value 
    DataRange(Irow, Icol) = MyVar 
  End If
Next Icol
Next Irow
Range("A1").CurrentRegion = DataRange ' writes back the result to the range

Another difference is that this method is blazing fast compared to the two others.

Performance Summary

I compared the three methods on relatively large ranges and here are the results:
 

Method

Operation

Cells/Sec

Variant

Read

1,225,490

 

Write

714,286

 

Read/Write

263,158

Range

Read

250,000

 

Write

1818

 

Read/Write

1,852

Offset

Read

206

 

Write

200

 

Read/Write

203

As you can see using a variant variable is much faster especially when changing cells. Even if the calculation can be done with Excel formulas, in some cases this method is the only one acceptable because using a very large number of formulas can become very slow.

Obviously the one method to avoid is moving the ActiveCell using Offset.

Posted: Friday, October 03, 2008 2:43 PM by Joseph Chirilov

Comments

Neal O said:

Great article Joseph, more like this here please.

Not sure I understand the timings though. How can read/write end up faster than just write in the last two? Overall though timing differences look largely as I would expect.

Neal

# October 7, 2008 3:58 AM

Rembo said:

Good article indeed. One minor issue though: is there any point in declaring Icol as an integer?. Should be of type Long I think. In fact, its being argued anyway what the point is of using an Integer these days. Memory space is not limiting and there's no calculation speed difference in using Integers or Longs.

Cheers,

Rembo

# October 7, 2008 6:32 AM

Harrie Bouwens said:

Nice to see that other people jump into the same problem.

In a database Search, I do pretty complicated calculations on the values found in that database, line by line, and return the result to the database.

And according to your study I use exactly the wrong method: the offset method.

Depending on the PC it takes inbetween 7 and 30 minutes to do the 58000 lines, and that's too long for a What-If kind of study

I build a small testfile to discuss this. Is there a way to share this file, and discuss?

# October 7, 2008 10:22 AM

Simon Murphy said:

Hi Dany

Thanks for posting.

I think most devs would manage state if reading /writing a big block.

If you set

app.calc = false

app.screenupdating = false

app.enableevents = false

Then, whilst you still get the same overall result (variant is the fastest) the margin is much reduced

on read/write range is 5x variant, select is 10x slower.

If you read using an xll call you get down to 2x the variant, but you maintain closer integration with the grid.

I think you demonstrate a good use of the technique, sadly I have seen people recreating Excel functions in VBA to do all the calculations and avoid the calling overhead. At which point its hard to understand why they used Excel at all instead of .net + grid control.

cheers

Simon

# October 7, 2008 10:02 PM

Charles Williams said:

Hi Dany,

Its faster for numeric values to use Range.Value2 rather than Range.Value when assigning to a Variant.

Also using .Value2 does not force an implicit conversion of cells formatted as currency to VBA currency data type (and dates to VBA date type). And the implicit conversion to currency can cause loss of accuracy.

# October 13, 2008 6:00 AM

Nate Oliver said:

Just a thought, perhaps this entry could go one step further in explaining why the Variant approach is so much faster? I do the exact same thing here:

http://mrexcel.com/forum/showthread.php?p=1583022

And it's not by accident. What you're doing is coercing the Range of Values to an intrinsic data type, a Variant Array. Intrinsic data types are far lighter weight in terms of memory than sophisticated Objects, like Range Objects. They don't support the number of properties, etc... That a Range has to commit to memory.

Arrays are extremely fast to iterate through, even compared to even simpler (than a Range Collection) Objects like a Recordset.

It's an important additional concept in addition to the findings posted here.

Best,

Nate Oliver

Microsoft Excel MVP

# October 18, 2008 9:58 PM

Microsoft Excel said:

Today’s author, Chad Rothschiller, a Program Manager on the Excel team, is back with a follow up from

# March 12, 2009 7:38 PM
New Comments to this post are disabled
Page view tracker