A group blog from members of the VB team
(Note: there was a grievous error in this post based on a bad assumption on my art regarding VB. Not feeling the need to hide my ignorance :-), I have instead made a number of edits in this post to bring it back to some semblance of reality.)
One thing that gets me annoyed with myself is realizing that the product or service I’ve just bought has some hidden costs that I didn’t anticipate. It might be as complicated as realizing that my plane ticket has all sorts of byzantine surcharges and luggage costs, or as simple as making budgeting for a new killer gaming PC and forgetting about the sales tax – it seems that there are always extra costs out there that will come back to bite you. Being (as I have said in past postings to this blog) an inherently lazy person, and being fully caught up in our “I must have the sparkly thing right now” culture, I’m apt to overlook the fine print in any purchase agreement unless I really force myself to slow down and pay attention to the details.
One of the times I really have to pay attention is when I’m reviewing code here at Microsoft. As a member of the release team that makes the final decisions as to what last-minute fixes are safe enough to be applied to an imminent release, it’s my job to take a look at the proposed fix and make sure that it doesn’t introduce more problems than it purports to solve – i.e., that it doesn’t add hidden costs.
On the rare occasion that I find a problem in someone’s code, it’s usually something that’s obviously dysfunctional – a case where a variable isn’t initialized properly, or where a different team has simultaneously made a change that will render this change dangerous. But some of the problems are more subtle, and might not be obvious if only minimal testing and review has been done. Consider the following code (very similar to a case I reviewed here at work the other day):
Public MyList As New List(Of String)
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
Dim s As New System.Text.StringBuilder
For i As Integer = 0 To MyList.Count - 1
Edit: This is an error in my blog post – I hadn’t realized that VB, unlike many other languages, only gets the value of the stop condition once (thanks to Phil Wells for pointing this out). To make my point, I would have to be using code like this:
Dim i As Integer = 0
Do While i < MyList.Count() - 1
i += 1
The method Count() might still have a hidden cost, but in the For Loop case, I’m at least only hitting it once. The fact that VB only analyzes the stop condition once leads to a different issue to be aware of that I’ll cover in my next blog.
Seems pretty simple, yes? When a button is clicked, the code iterates through the strings in a list and appends them to a StringBuilder, creating (for whatever reason) one long string containing all of the contents. And, to cursory appearances, it will function correctly.
However, the code has hidden costs. You see, written this way, the value MyList.Count is going to be reevaluated on every iteration of the “For” loop, even though the value itself isn’t changing. Furthermore, my indexing into MyList, which is always with reference to the beginning of the list, will also have a non-zero impact. This impacts the performance of the code and will make my customers unhappy over the sloth-like operation of my application.
In this case, the cost isn’t really high, as Count() is a property wrapping something that’s stored as a private integer member, and index referencing also has tricks to speed it up, but there’s still a cost and it can add up. It’s even worse if the Count() call has to be calculated! Consider this really awful code where I’ve created my own “List” type:
Public Class MyLameObject
' For some unknown reason, I'm creating my own linked list instead of using a List(Of) generic.
' Please don't do this for real!
Public data As String
Public NextMLO As MyLameObject
Public Sub Insert(ByVal mlo As MyLameObject)
If NextMLO IsNot Nothing Then
NextMLO = mlo
mlo.NextMLO = NextMLO.NextMLO
NextMLO.NextMLO = mlo
Public Function Count() As Integer
Dim ct As Integer = 0
Dim current As MyLameObject = Me
Do While current IsNot Nothing
ct += 1
current = current.NextMLO
Public Function Item(ByVal index As Integer) As MyLameObject
If index >= Count() OrElse index < 0 Then Return Nothing
For i As Integer = 0 To index - 1
Public MyLameObjectHeader As New MyLameObject
Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) _
For i As Integer = 0 To MyLameObjectHeader.Count - 1
Perhaps as a school assignment, I’ve created a singly-linked list which contains objects of type “MyLameObject” (instead of using optimized collections that we provide). I’ve also created a member function called “Count” which returns the number of objects in my list (starting from the current node) and another member function called “Item” which returns me a reference to a specific object in the list, as indexed from the current node. The code will compile and it will “work.” But, oh! It has so many hidden costs, and if the definition of MyLameObject exists in an assembly for which I don’t have the source code, I might not even realize it!
First, let’s look at button click handler. It’s calling Count() on every single iteration, and there will be a number of iterations equal to the value returned from Count(). Count(), in turn, walks the entire list every time it’s called. So, that means that if my list contains 1000 items, that means that I’ll be referencing a MyLameObject one million times! (This is what they refer to as an “O(n2) problem” in computer science classes.)
But wait, it gets worse: the code for Item() calls Count() also on each iteration to verify that the index is not out of bounds – that brings us up to a count of 1001000 MyLameObject references.
Edit: As noted above, that isn’t true for VB, though it would be for many other languages
And, of course, Item() itself needs to walk the list to actual get to the desired item, on every iteration – that adds 500500 more object hits (using the n(n+1)/2 formula) , for a grand total of 1501500 1500500 object hits. And all that just to concatenate 1000 strings! That, I fear, is going to slow down my code’s performance somewhat.
Now, of course I could add code to get track of values for Count(), and that would reduce my cost to a third of what it was. I could use some clever hashing algorithm to more easily get to the items while still being able to use method calls. I could enumerate the list myself instead of relying on “Count” or on “Item,” and that would really cut my costs down dramatically. Or, best of all, I could avoid reinventing the wheel and using the classes that VB.NET already supplies for me, since we do our best to optimize them.
But that’s not the point that I’m trying to make here. The point is, whenever you call a function, you are at its mercy performance-wise – you are stuck with whatever hidden costs it contains. You might not even be able to determine what these hidden costs are if you don’t own the code you’re calling, except by experimentation. And if you call that function iteratively, you’re just compounding the problem.
Now, you can’t avoid making calls into code you don’t own – at some level, we all use libraries or assemblies that are handed to us. But you can be smart about how you use them. Consider *this* code:
Dim count As Integer = MyList.Count
For i As Integer = 0 To count - 1
Now, I’m only calling Count() once, so I minimize the impact of its cost! Or,
Edit: As above, this doesn’t actually make a difference in VB. Of course, you still have the cost of “Count(),” whatever it may be. In this case, it’s just a property read; if it’s some home-grown collection or data repository, Count() (or it’s analog) could be a lot more costly.
Consider this even better example that you can use on collections that support IEnumerable (as all System.Collections types do):
For Each data As String In MyList
Yeah! Now I’m firing on all cylinders! Now I don’t have to worry about indexing costs either – I’m just walking the list, collecting my strings in the fastest way possible. Really, the only cost I have to deal with is the cost of actually appending the data (which is in fact why I’m using the highly-optimized StringBuilder class instead of just concatenating strings using a much more costly s &= data line).
So, rule of thumb: whenever calling any method, expect hidden costs (magnified if you’re in a loop) and work to minimize them. Call the methods outside of the loop if they are providing data that will remain the same throughout the loop, such as a count, and avoid them altogether if there’s an alternative way to iterate through the data that doesn’t implicitly involve resetting your position. Your customers will appreciate the time savings!
‘Til next time,
this type of quality information Thank you for offering us the debt we know