Getting Loopy (Matt Gertz)

Getting Loopy (Matt Gertz)

  • Comments 10

In my last post, I talked about the hidden costs that can occur whenever you call out to methods, particularly in loops.  In looking at my examples, reader KG2V commented that another thing that folks need to be aware of is avoiding the assumption that the world (or, in this case, a list) is a static thing.  It’s a good point and it deserves some attention, particularly given that different languages will react in different ways to change.

To illustrate this point, let’s assume that we have a form with a ListBox on it, and that we also have a button on the form that, when clicked, runs the following VB code:

    ' Bad -- don't do this

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

        For i As Integer = 0 To ListBox1.Items.Count - 1

            ListBox1.Items.RemoveAt(i)

        Next

    End Sub

 

Ostensibly, we want to remove all of the items from the ListBox.  (Yes, normally we’d use Clear() to do that, but stay with me for a moment;  I’ll switch to more “real” cases in a few paragraphs.)  But if we run this code, we’ll get an exception.  Why?  Because whenever we remove an object, all of the indices of the objects after it will decrease by 1.  Let’s assume that there are three objects in the list -  A, B, and C – so that Count=3:

·         When i = 0, we’ll remove the object at index 0, which is A.  B now moves to index 0, and C now moves to index 1.

·         When i = 1, we’ll remove the object at index 1, which is C.  There is nothing after C, so nothing else changes.

·         When i =2, we’ll remove the object at index 2, which is… nothing.  There’s only one object left, which is B, and it’s at index 0.  So, we get an out-of-bounds exception.

In VB (as was pointed out to me in the comments section of my last point), the value of Count is only assessed once, right when the loop is encountered, so we will always loop through (in this case) three times.

But is the lack of reevaluation of Count the only issue?  Maybe not.  Let’s try the same method in C#, in which Count is evaluated on every iteration:

        // Bad -- don't do this.

        private void button1_Click(object sender, EventArgs e)

        {

            int i;

            for (i = 0; i < listBox1.Items.Count; i++)

            {

                listBox1.Items.RemoveAt(i);

            }

        }

 

·         When i = 0, we’ll remove the object at index 0, which is A.  B now moves to index 0, and C now moves to index 1.

o   Now, we’ll loop back to the top.  The value of i becomes 1, and the value of Count becomes 2.  The stop condition is not met, so we keep going.

·         When i = 1, we’ll remove the object at index 1, which is C.  There is nothing after C, so nothing else changes.

o   Again, we loop back to the top.  The value of i becomes 2, and the value of Count becomes 1.  The stop condition is met, and we exit the loop.

So, although no exception is thrown, the method also fails in C# -- B is left behind.  So, the problem isn’t whether or not Count is evaluated; it’s that the elements in the list are changing indices on each deletion.

Sometimes doing things backwards is good

This is why, in the case where multiple removals need to be made on an indexed list, a smart programmer (all of whom are yawning right now as they read this) will work backwards.  That is, they start with the largest index and work down towards zero.  For example, imagine that we want to remove the 4th, 7th, and 9th element from a list.  We don’t want to do it this way:

        ' Bad -- don't do this

ListBox1.Items.RemoveAt(4)

ListBox1.Items.RemoveAt(7)

ListBox1.Items.RemoveAt(9)

 

Once we’ve removed the 4th item, then the element at 7 will have moved to 6 and the element at 9 will have moved to 8.  After we’ve removed the new 7th item, the element at 8 will move to 7, and so on.  We’d either be removing the wrong elements or we’d crash.  It’s far better to do them in reverse order:

ListBox1.Items.RemoveAt(9)

ListBox1.Items.RemoveAt(7)

ListBox1.Items.RemoveAt(4)

 

None of the indices up front will be affected by changes that are occurring behind them in the list – their indices won’t change, and this will work fine.

This issue comes up when you are (for example) deleting items from a listbox that supports multiselection.  You basically have two ways to get the selected items in a listbox: you can use the SelectedIndices member function, which returns a collection of indices, or you can use SelectedItems, which returns a collection of selected items.  Given either collection, there’s no way to say “remove all of these.” There is a “Remove” command on the resulting collections, but if you use it, you’ll just be removing the item from *that* collection and not from the ListBox, which has its own collection of items – essentially, you’ll just deselect the items. 

So, what do you do?  At first, it looks like you have two options; you could try removing them from the master list based on item, or based on object.  Let’s look at removal based on the item:

        ' Bad -- don't do this

        For Each index As Integer In ListBox1.SelectedIndices

            ListBox1.Items.RemoveAt(index)

        Next

 

This won’t work.  The selected indices are in numerical order (no matter in which order you selected them), so you’ll be sliding indices on each deletion and will subsequently delete the wrong things.  You won’t crash if you overreach on an index, because the For…Each loop will check for Nothing when it does a MoveNext internally and will exit gracefully, but this is actually worse since it may take longer before you notice a problem!

This won’t work either, and will potentially crash:

        ' Bad -- don't do this

        For index As Integer = 0 To ListBox1.SelectedIndices.Count - 1

            ListBox1.Items.RemoveAt(ListBox1.SelectedIndices(index))

        Next

 

Again, the indices for the downstream objects will decrease, you’ll delete the wrong things, and you’ll likely crash. 

You can, however, iterate down instead, as long as you are confident that the selected indices are in numerical order (as they will be in this case):

        ' This works:

        For index As Integer = ListBox1.SelectedIndices.Count - 1 To 0 Step -1

            ListBox1.Items.RemoveAt(ListBox1.SelectedIndices(index))

        Next

 

Having found something that works via iteration, let’s now consider the enumeration angle instead.  In general, I prefer to enumerate lists whenever possible instead of iterating through them.  This is not one of those times, though.  Consider this code, which will fail:

        ' Bad -- don't do this

        For Each item In ListBox1.SelectedItems

            ListBox1.Items.Remove(item)

        Next

 

“Well, c’mon, why doesn’t this work?” I hear you cry.  “I don’t care about indices in this case, I’m not even using them; I’m just removing specific items.”  True, but if you run this code, you will indeed get an exception essentially stating that the collection has been modified and the list of selected items is no longer valid for enumeration.  We can’t work around this by assigning the collection of selected items to another variable and enumerating on that, since that just creates a reference to the same collection.  In order for it to work, you’ve have to get the references out of the collection itself so that the ListBox is no longer part of the equation on the removal loop’s assessment.

        ' Really, this is just silly -- don't ever do this

        Dim selItems As New Collection

        For Each item In ListBox1.SelectedItems

            selItems.Add(item)

        Next

        For Each item In selItems

            ListBox1.Items.Remove(item)

        Next

 

Functionally, this will work; however, you shouldn’t do it for two reasons:

·         First of all, it’s stupid – you have to loop through twice.  Occasionally, I do need to clone lists, like when creating an editable list from a read-only list – I’ve done that in a previous blogpost on this site, in fact.   But my motives are not so noble here; I’m using this for no other reason than allowing enumeration to work.

·         Second, each call to RemoveItem has a hidden cost – namely, the list has got to go find that item.  In this case, you can bet that there’s good hashing going on that doesn’t make that as painful as it might be, but do you really want to count on that, given the better numeric alternative above?

No, just there’s really no good way to do removal by enumeration that I’ve ever found.  And since each object is totally clueless about whether it itself is selected or not, you can’t even enumerate through the collection yourself and delete it if it’s marked as selected.

Of course, I’ve just talked about deletions, since they are more interesting than additions due to the crash potential.  However, additions can also mess you up in a loop due to similar index changes, and I’ve actually run into cases where a list may have both insertions and deletions done, so that I wasn’t even able to leverage the change in list size as a clue to figure out what had happened to the data.  The bottom line is, whenever doing an operation within a loop, you need to understand whether or not that operation will collide with your loop conditions and assumptions.  Generally, you control this in your code, but in multithreaded environments, where changes to an existing store of data might be made from a different thread “simultaneously” with your thread, you really need to understand how thread-safe your code is with respect to data changes.  (But that is a story for another time.)

‘Til next time,

  --Matt--*

Leave a Comment
  • Please add 6 and 8 and type the answer here:
  • Post
  • Hi Matt,

    Many thanks for your blog postings. It seems to be a rare treat to get serious discussion on programming in VB.Net on MSDN. Perhaps, I am looking in the wrong places, but I wish there were more postings like yours. In contrast, there are tons of interesting postings on C#.

    I was very interested in the following example of bad behavior that you offered:

           ' Bad -- don't do this

           For Each item In ListBox1.SelectedItems

               ListBox1.Items.Remove(item)

           Next

    My interest was piqued because you *can* do something like this with a Listview SelectedItem Collection. The following will work:

           For Each itm As ListViewItem In Listview1.SelectedItems

              itm.Remove()

           Next

    I stumbled upon this by accident--in exasperation at the time it takes to remove the items by iteration. From my VB6 days, I knew that it was not possible to remove collection items by enumeration. When I discovered that the preceding example worked, I thought that VB.Net had removed the obstacle. I now see the difference is that the ListViewItem has its own Remove method, whereas the ListBox item is an object without a remove method. By the way, in the case of the ListView, removal by enumeration is manifestly faster than removal by iteration. No stopwatch required.

  • Hi! i know it's is not the right post that i have to comment this. but i'm really having trouble and can't find a solution in google (i hope i've missed it). i developed a small database using vb6 and packed and installed it in my pc. but how would i backup my data if i want to reformat my hard drive or upgrade to a new one. Hope you could help. thanks a lot.

  • I've always preferred to avoid iteration and iteration when they aren't necessary: why not just do this?

    Do While ListBox1.SelectedIndices.Count>0

       ListBox1.Items.RemoveAt(ListBox1.SelectedIndices(0))

    End Do

  • @Mike:  Yes, that's correct.  For a ListView, it's totally okay to remove items by enumeration, and you won't get the exception.  (You can even use ListView1.Items.Remove.)  For ListBox, though, you can't.  Listviews handle their selected items somewhat differently than listboxes.

    @Arwin:  I'm going to have to refer you to the VB6 forums, I'm afraid -- this is a .NET forum.  You'll need to provide them with the details of your DB.

    @MarkJ:  Yep, that's another great way to do this iteratively.  You're taking advantage of the fact that the first selected item will always be in the zero index, the next will always slide down into it, and Count is reevaluated in the Do While loop.  Of course, you're reevaluating Count on each loop, which in this case is not a big deal but could be for some other contrived code.  In the For Next loop case, it's only evaluated once, and in both cases you have to do a comparison on each loop.

    --Matt--*

  • (And, for other readers trying this out, it would actually be "Loop" instead of "End Do," but that's a minor point.)

  • One more trivial point -- whether or not enumeration can proceed after a change to the collection depends on how IEnumerable is supported.  Per the MSDN topic on For...Each, typically you can't continue enumerating on a collection unless the code supporting IEnumerable is written specifically to allow it.  I suspect that's the difference between ListView and ListBox.

    --Matt--*

  • It is also helpful to remember that, in general, when dealing with deleting/adding items to listboxes or other controls, doing as little as possible in the control itself yields the greatest speed.  Copying/sorting/manipulating lists or arrays is quick (the system only needs to sling memory), but any time you use a control, you must deal with window handles, messages, etc., and it is much slower.  So although I never benchmarked it, I would recommend the following method as the fastest way to remove all selected items: 1) create an array as large as SelectedIndices.Count, 2) use SelectedIndices.CopyTo to copy it into the array, 3) use Array.Sort to sort the array (it is not actually documented that SelectedIndices is sorted, so it is maybe not a good idea to rely on that behavior for all future versions of .NET), and finally, iterate the array backwards, deleting the items using the RemoveAt.  For ultimate speed, use BeginUpdate and EndUpdate to bracket the removals.  In short, you are using the list as little as possible, and depending on nothing, making your code safest and fastest.

  • That's an good point you bring up, Claudio.  You're right in that any time you modify a control, you will be sending out various "changed" messages whenever items are added, deleted, or selected (and other reasons).  On the other hand, you are now (as you note) using more memory by using alternative structures (as well as complicating the readability of the code, which may or may not be an issue).  I'd love to see some benchmarks on that for very large arrays of data.  Of course, if you already have the structure as an array, and if CopyTo is supported by the control, then there's no question that your solution would be ideal, though I forget whether CopyTo will delay message firing until all elements have been added.

  • This remembers me of an application I wrote a long time ago, when i had this kind of problem and came to a really bad idea:

    BeforeLoop:

     For  Each item in lst.Items

        If ShouldBeDeleted(item) Then

           item.Remove

           GoTo(BeforeLoop)

        End If

     Next

    Never ever use this GoTo-solution, its the silliest solution possible

  • Wow, I just read that, and I think I hurt something. :-)  Thanks for the story!

Page 1 of 1 (10 items)