System.Collections.ArrayList performance analysis

System.Collections.ArrayList performance analysis

  • Comments 11

ArrayList is one of the most commonly used data structure. However there is little information about ArrayList performance characteristics on MSDN. The document on MSDN does mention asymptotic complexity of some methods on ArrayList class, but it doesn’t talk about how ArrayList is implemented internally. In this document, I will describe the cost of using ArrayList based on current implementation. The implementation might change over time, so the information in this document might not be accurate in future releases.

 

Rotor source will be a great reference if you want to find out more information:

http://dotnet.di.unipi.it/Content/sscli/docs/doxygen/fx/bcl/arraylist_8cs-source.html

It is quite close to the source we shipped.

 

To make the discussion easier, I will assume the platform is x86.

 

1. Overview

ArrayList is basically a dynamic array, which can grow automatically when it doesn’t have enough space to store all items. It uses an array as internal storage. The internal array might have unused spaces. There are two important concepts about ArrayList. Capacity is the number of elements that the ArrayList can store, which is the length of the internal array. Count is the number of elements that are actually in the ArrayList. ArrayList only shrinks when users change the capacity to a number smaller than current Capacity.

 

ArrayList provide fast random access by index. It is slow to insert or remove in the middle.

 

2. Memory Usage of ArrayList

 

Following are the private fields in ArrayList class:

        private object[] _items;

        private int _size;

        private int _version;

        private object _syncRoot;

_items is the internal array and _size is the number of slots used in the Array, i.e., the Count of ArrayList. _size is the number of items in the ArrayList. _items[0] to _items[_size – 1] will store the items. _version is used to support safe enumeration. I will not talk about _syncRoot in this document.

 

The overhead for each managed object, which is not an array, is 8 bytes (4 bytes for its synchronization block and 4 bytes for its method table.) One dimensional array with zero-based indexing needs 8 more bytes to store the length of the array and the method table of array element type. So the size of an ArrayList object will be 24 ((8 + 4*4) bytes. An array of object with n items in it will need 16+n*4 bytes in memory.

 

When an Arraylist is constructed, a Capacity can be specified. The capacity will be the length of the internal array if it is larger than the default capacity, 16. If no capacity is specified while constructing an ArrayList, the default capacity will be used. Clearly it is quite costly to create an ArrayList (the total cost is 104 bytes based on above information.) After shipping .Net Framework V1 and V1.1, we have found in various applications that most ArrayList objects have less than 4 items and a large percentage of them are empty in their lifetime. This means user could waste lot of memory using ArrayList. In Whidbey, we will do two things to address this problem: lazily allocating the internal Array and changing the default capacity to 4. Bear in mind, even after these changes, creating an ArrayList object will not be free. A user should still avoid creating empty ArrayList.       

 

When the internal array is not large enough to hold all items (happens when new items are added to an ArrayList object,) a new array with double size of the original array will be created. All the items in the ArrayList will be copied from the old Array to the new Array. The new Array will be used from here on and the old array will be reclaimed by garbage collector sooner or later.

 

3. Time Efficiency of Methods on ArrayList class  

Now we know the internal structure of ArrayList. I will list the important methods and properties on ArrayList and explain what they do and their complexity.

 

Capacity properties

Getter: The length of the internal array will be returned. As we mentioned before, the length of the internal array is stored inside the array object. So it is trivial to retrieve it.

 

Setter: If the specified capacity is exactly the same as current capacity, nothing happens. If the capacity is less than the size of the ArrayList, an ArgumentOutOfRangeException will be thrown. If the capacity is not zero, we will create a new array with the specified capacity and copy all the items to the new Array.

 

It is worth mentioning that copy an array of object to another is not just a memory copy. We also let GC know the new Array now contain references to the items by updating some internal GC data structures. For details, check Rico Mariani’s article about GC at:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndotnet/html/dotnetgcbasics.asp

 

Count Property (read-only)

Value of the size field will be returned.

 

Items Property (indexer method)

Getter and Setter: the index will be validated (a valid index is from 0 to size-1) and we will return the index-th item or change the new value of the index-th item in the internal array. JIT also emits code to do range check when an array is being accessed. So we will waste sometime here.

 

int Add(object Value)

When the internal array has free space for one more item, we simply store the value into last unused slot in the array and increment version number and count of ArrayList. When the internal array is full, we need to grow the array (see section 2 for details.) The amortized cost of Add will be O(1). The return value is the count of ArrayList after adding the new item.

 

void AddRange(ICollection c);

This method is an efficient way to add a bunch of items to the ArrayList. We only to make sure we have enough capacity once and we can use Array.Copy to copy items.

 

int BinarySearch(Object value)

int BinarySearch(Object value, IComparer comparer)

int BinarySearch(int index, int count, Object value, IComparer comparer)

BinarySearch just calls Array.BinarySearch on the internal array. Note if the array is not sorted, the behavior is not defined. When the array is sorted, the complexity will O(lgn).

 

void Clear()

This method clears the content of internal array and set _size to 0. Note it is important to clear the content of the internal array in this method. Otherwise we could keep some objects which could potentially be reclaimed by garbage collector alive for a long time.

 

object Clone()

This method will create a new ArrayList. Capacity of the new ArrayList is the count of current ArrayList. The used items in current internal array will be copy to the internal array in the new ArrayList. Note we do a shallow copy here. So the objects will be shared between two ArrayList.

 

bool Contains(object item)

We will compare the specified item with each items (calling object.Equals) in current ArrayList and returns true if we find a matching one.

 

void CopyTo(Array array)

We will Array.Copy to do a shallow copy.

 

IEnumerator GetEnumerator()

IEnumerator GetEnumerator(int index, int count)

These methods create a new Enumerator object. The correct pattern to use an Enumerator is

                    IEnumerator en = arraylist.GetEnumerator();

                    while(en.MoveNext()) {

                        // use en.Current;

                                          // do not modify the ArrayList here

                    }

IEnumerator.MoveNext will check if the version field of the ArrayList has been changed since it was created and will throw if that happened.

 

ArrayList GetRange(int index, int count)

This method returns an ArrayList which represents a subset of the elements in the source ArrayList. The new ArrayList is just an adapter for the existing ArrayList and doesn’t have an internal array.

 

int IndexOf(Object value)

int IndexOf(Object value, int startIndex)

int IndexOf(Object value, int startIndex, int count)

These methods call Array.IndexOf to find index for the specified value in the internal array. We will need to go through every items in the internal array until we find a matching item, so this method is a O(n) operation.

 

Insert(int index, Object value)

In general, this method is quite expensive (O(n)) since we need to move all the items starting at index one slot forward. When index is the count is equal to the count of the ArrayList, this methods is equivalent to Add(object value). Note insertion at the beginning is the most expensive insertion you could do. It means we need to copy all the existing items in the ArrayList.

 

InsertRange(int index, ICollection c)

This method is an efficient way to insert a bunch of items into the ArrayList.

 

int LastIndexOf(Object value)

int LastIndexOf(Object value, int startIndex)

int LastIndexOf(Object value, int startIndex, int count)

These methods call Array.LastIndexOf to find index for the specified value in the internal array. We will need to go through every items in the internal array until we find a matching item, so this method is a O(n) operation.

 

void Remove(object obj)

This method tries to find the object in current ArrayList. If it is not found, we will do nothing. Otherwise we will remove the first matching item from the ArrayList.

   

Void RemoveAt(int index)

This method moves all the items after the specified index one slot backward. So it could be expensive if index is not close to Count. Same as insertion, Removal of the first element in an ArrayList is the most expensive removal you could do.

 

RemoveRange(int index, int count)

This method will be more efficient if you want to remove a few items.

 

Void Reverse()

void Reverse(int index, int count)

These method calls Array.Reverse to change the internal array in place.

 

SetRange(int index, ICollection c)

This method is an efficient way to change a bunch of items in the ArrayList. It calls ICollection.CopyTo to override the specified items in current ArrayList.   

 

void Sort()

void Sort(IComparer comparer)

void Sort(int index, int count, IComparer comparer)

These methods call Array.Sort to sort all or part of the used items in the internal array. Array.Sort is an implementation of quick sort. So the complexity is O(nlgn).

 

object ToArray()

This method creates a new array. The length of the new array is the count of current ArrayList. We then do a shallow copy from the internal array to the new Array.

 

void TrimToSize()

This method sets the capacity of ArrayList to the count of it.

 

4. Usage Guidelines

l           Try to avoid insertion and removal operations expect to or from the end of ArrayList. If you do need to insert or remove items, consider using InsertRange/RemoveRange.

l           Specify the capacity when constructing an ArrayList if possible. This can avoid allocation of temporary arrays and unnecessary copies.

l           Calls TrimToSize only if you are very unlikely to add new items to the ArrayList in future. It reduces the memory consumption of ArrayList, but it is costly to create a new Array and copy the items.

l           Use for loop to loop through items in an ArrayList if possible. “foreach” has the overhead to create a temporary enumerator objects and MoveNext/Current on IEnumerator is slower than indexer method on ArrayList.

  • Thank you for this information!
  • Are all the collection classes implemented the same way in Rotor and the .Net framework? Since this is the BCL team's blog, I'm assuming you would know for sure.

    BTW I've always disliked the fact the .Net f/w documentation doesn't talk about the time complexities of the collection classes as clearly as the C++ standard's STL part does. Take a look at Bjarne's book and see how clearly he lists out the time complexities of various STL classes and the tradeoff's involved. So here's a difference between a documentation by a programmer versus just a documentation guy who has to pick on the brains of some busy developer for information.

    Another question I have is why do the Synchronised wrapper classes do a blind lock for every function? Why doesn't it do something smart like a reader write lock etc? For example how does it matter how many threads are simultaneously trying to read the count of a collection? Why go to the cost of an expensive lock for something that is only read by various threads? I was shocked to see the source for some of the synchronised wrappers and see what a trivial implementation it is.

    I'm hoping someone on the BCL team really reads these blogs and answers questions here.
  • Incidentally - here's an apparent ArrayList bug I noticed.


    Here's some c# code:

    using System;
    using System.Collections;
    namespace AdapterProblem
    {
    class Class1
    {
    [STAThread]
    static void Main(string[] args)
    {
    ArrayList source = ArrayList.Adapter(new object [] {"1" , "2"});
    ArrayList range = source.GetRange(1,1);
    ArrayList target = new ArrayList();
    target.AddRange(range);
    Console.WriteLine(target[0] == null? "It's null" : target[0]);
    Console.ReadLine();
    }
    }
    }

    I would expect this to print out the second element of source -
    i.e. the string "2". Actually it turns out that target[0] is null so
    the program prints "It's null".
  • It seems you can't iterate over an ArrayList that contains itself:

    ArrayList list = new ArrayList();
    list.Add(list);
    foreach (Object obj in list);
    // ArrayListEnumeratorSimple.get_Current throws InvalidOperationException
  • ArrayList Performance
  • Blog link of the week 49
  • Paul, the bug is fixed in Whidbey.

    Michael, the iteration problem is a bug we also fixed in Whidbey. The problem is ArrayListEnumeratorSimple class uses the list to check for boundary conditions. You can take a look at the rotor source for details.

    Thanks for the feedback.
  • I'm looking for a technique for determing IndexOF of a class element in an ArrayList of CLASSES (such as Employee as shown below - not any regular objets).
    Could you please help me out?

    Example:
    public class Employee
    {
    String _name;
    int _id;
    ...........
    ...........
    public Employee( int id, String name )
    {
    this._name = name;
    this._id = id;
    }
    }
    Thank you.
  • I am not sure I fully understand your question but I will try. IndexOf(Object value) just call value.Equals (assuming value is not null) on every item in ArrayList starting at the begining until Equlas returns true and it returns that index for the item.

    For your Employee object you will get a reference equals by default from Object to override this behavior simply override Equals method possibly like the following:

    public override bool Equals(Object other)
    {
    Employee otherEmployee = other as Employee;

    if(null == otherEmployee) {
    return false;
    }

    return _name == otherEmployee._name && _id == otherEmplyee._id;
    }
  • As Ryan mentioned, basically IndexOf just calls Object.Equals on the items.
    So for the type objects, Type.Equals will be used. So it will work just find for ArrayList of type objects.
  • Does everything you have said about ArrayList also apply to List<T>?
Page 1 of 1 (11 items)