Hidden Costs (Matt Gertz)

Hidden Costs (Matt Gertz)

  • Comments 16

(Note: there was a grievous error in this post based on a bad assumption on my art regarding VB.  Not feeling the need to hide my ignorance :-), I have instead made a number of edits in this post to bring it back to some semblance of reality.)

One thing that gets me annoyed with myself is realizing that the product or service I’ve just bought has some hidden costs that I didn’t anticipate.  It might be as complicated as realizing that my plane ticket has all sorts of byzantine surcharges and luggage costs, or as simple as making budgeting for a new killer gaming PC and forgetting about the sales tax – it seems that there are always extra costs out there that will come back to bite you.  Being (as I have said in past postings to this blog) an inherently lazy person, and being fully caught up in our “I must have the sparkly thing right now” culture, I’m apt to overlook the fine print in any purchase agreement unless I really force myself to slow down and pay attention to the details. 

One of the times I really have to pay attention is when I’m reviewing code here at Microsoft.  As a member of the release team that makes the final decisions as to what last-minute fixes are safe enough to be applied to an imminent release, it’s my job to take a look at the proposed fix and make sure that it doesn’t introduce more problems than it purports to solve – i.e., that it doesn’t add hidden costs.

On the rare occasion that I find a problem in someone’s code, it’s usually something that’s obviously dysfunctional – a case where a variable isn’t initialized properly, or where a different team has simultaneously made a change that will render this change dangerous.  But some of the problems are more subtle, and might not be obvious if only minimal testing and review has been done.  Consider the following code (very similar to a case I reviewed here at work the other day):

    Public MyList As New List(Of String)

 

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

        Dim s As New System.Text.StringBuilder

        For i As Integer = 0 To MyList.Count - 1

            s.Append(MyList(i))

        Next

    End Sub

 

Edit:  This is an error in my blog post  – I hadn’t realized that VB, unlike many other languages, only gets the value of the stop condition once (thanks to Phil Wells for pointing this out).  To make my point, I would have to be using code like this:

        Dim s As New System.Text.StringBuilder

        Dim i As Integer = 0

        Do While i < MyList.Count() - 1

            s.Append(MyList(i))

            i += 1

        Loop

 

The method Count() might still have a hidden cost, but in the For Loop case, I’m at least only hitting it once.  The fact that VB only analyzes the stop condition once leads to a different issue to be aware of that I’ll cover in my next blog.

 

Seems pretty simple, yes?  When a button is clicked, the code iterates through the strings in a list and appends them to a StringBuilder, creating (for whatever reason) one long string containing all of the contents.  And, to cursory appearances, it will function correctly.

However, the code has hidden costs.  You see, written this way, the value MyList.Count is going to be reevaluated on every iteration of the “For” loop, even though the value itself isn’t changing.  Furthermore, my indexing into MyList, which is always with reference to the beginning of the list, will also have a non-zero impact.  This impacts the performance of the code and will make my customers unhappy over the sloth-like operation of my application. 

In this case, the cost isn’t really high, as Count() is a property wrapping something that’s stored as a private integer member, and index referencing also has tricks to speed it up, but there’s still a cost and it can add up.  It’s even worse if the Count() call has to be calculated!  Consider this really awful code where I’ve created my own “List” type:

    Public Class MyLameObject

        ' For some unknown reason, I'm creating my own linked list instead of using a List(Of) generic.

        ' Please don't do this for real!

        Public data As String

        Public NextMLO As MyLameObject

        Public Sub Insert(ByVal mlo As MyLameObject)

            If NextMLO IsNot Nothing Then

                NextMLO = mlo

            Else

                mlo.NextMLO = NextMLO.NextMLO

                NextMLO.NextMLO = mlo

            End If

        End Sub

 

        Public Function Count() As Integer

            Dim ct As Integer = 0

            Dim current As MyLameObject = Me

            Do While current IsNot Nothing

                ct += 1

                current = current.NextMLO

            Loop

            Return ct

        End Function

 

        Public Function Item(ByVal index As Integer) As MyLameObject

            If index >= Count() OrElse index < 0 Then Return Nothing

            Dim current As MyLameObject = Me

            For i As Integer = 0 To index - 1

                current = current.NextMLO

            Next

            Return current

        End Function

    End Class

 

 

    Public MyLameObjectHeader As New MyLameObject

 

    Public MyLameObjectHeader As New MyLameObject

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) _

Handles Button1.Click

        Dim s As New System.Text.StringBuilder

        For i As Integer = 0 To MyLameObjectHeader.Count - 1

            s.Append(MyLameObjectHeader.Item(i))

        Next

    End Sub

 

Perhaps as a school assignment, I’ve created a singly-linked list which contains objects of type “MyLameObject” (instead of using optimized collections that we provide).  I’ve also created a member function called “Count” which returns the number of objects in my list (starting from the current node) and another member function called “Item” which returns me a reference to a specific object in the list, as indexed from the current node.  The code will compile and it will “work.”  But, oh!   It has so many hidden costs, and if the definition of MyLameObject exists in an assembly for which I don’t have the source code, I might not even realize it!

First, let’s look at button click handler.  It’s calling Count() on every single iteration, and there will be a number of iterations equal to the value returned from Count().  Count(), in turn, walks the entire list every time it’s called.  So, that means that if my list contains 1000 items, that means that I’ll be referencing a MyLameObject one million times!  (This is what they refer to as an “O(n2) problem” in computer science classes.)

But wait, it gets worse:  the code for Item() calls Count() also on each iteration to verify that the index is not out of bounds – that brings us up to a count of 1001000 MyLameObject references. 

Edit:  As noted above, that isn’t true for VB, though it would be for many other languages

And, of course, Item() itself needs to walk the list to actual get to the desired item, on every iteration – that adds 500500 more object hits (using the n(n+1)/2 formula) , for a grand total of 1501500 1500500 object hits. And all that just to concatenate 1000 strings!  That, I fear, is going to slow down my code’s performance somewhat.

Now, of course I could add code to get track of values for Count(), and that would reduce my cost to a third of what it was.  I could use some clever hashing algorithm to more easily get to the items while still being able to use method calls.  I could enumerate the list myself instead of relying on “Count” or on “Item,” and that would really cut my costs down dramatically.  Or, best of all, I could avoid reinventing the wheel and using the classes that VB.NET already supplies for me, since we do our best to optimize them. 

But that’s not the point that I’m trying to make here.  The point is, whenever you call a function, you are at its mercy performance-wise – you are stuck with whatever hidden costs it contains.  You might not even be able to determine what these hidden costs are if you don’t own the code you’re calling, except by experimentation.  And if you call that function iteratively, you’re just compounding the problem.

Now, you can’t avoid making calls into code you don’t own – at some level, we all use libraries or assemblies that are handed to us.  But you can be smart about how you use them.  Consider *this* code:

        Dim s As New System.Text.StringBuilder

        Dim count As Integer = MyList.Count

        For i As Integer = 0 To count - 1

            s.Append(MyList(i))

        Next

 

Now, I’m only calling Count() once, so I minimize the impact of its cost!   Or,

Edit:  As above, this doesn’t actually make a difference in VB.  Of course, you still have the cost of “Count(),” whatever it may be.  In this case, it’s just a property read; if it’s some home-grown collection or data repository, Count() (or it’s analog) could be a lot more costly.

Consider this even better example that you can use on collections that support IEnumerable (as all System.Collections types do):

        Dim s As New System.Text.StringBuilder

        For Each data As String In MyList

            s.Append(data)

        Next

 

Yeah!  Now I’m firing on all cylinders!  Now I don’t have to worry about indexing costs either – I’m just walking the list, collecting my strings in the fastest way possible.  Really, the only cost I have to deal with is the cost of actually appending the data (which is in fact why I’m using the highly-optimized StringBuilder class instead of just concatenating strings using a much more costly s &= data line).

So, rule of thumb: whenever calling any method, expect hidden costs (magnified if you’re in a loop) and work to minimize them.  Call the methods outside of the loop if they are providing data that will remain the same throughout the loop, such as a count, and avoid them altogether if there’s an alternative way to iterate through the data that doesn’t implicitly involve resetting your position.  Your customers will appreciate the time savings!

‘Til next time,

  --Matt--*

 

Leave a Comment
  • Please add 2 and 7 and type the answer here:
  • Post
  • this type of quality information Thank you for offering us the debt we know

Page 2 of 2 (16 items) 12