LINQ Cookbook, Recipe 12: Calculate the Standard Deviation (Doug Rothaus)

Published 18 December 07 03:09 PM

Ingredients:

·         Visual Studio 2008 (Beta2 or Higher)

 

Categories: LINQ to Objects

 

Introduction:

LINQ Cookbook, Recipe 11 showed how you can use LINQ queries to perform calculations on sets of data using standard aggregate functions such as Average and Sum. In this recipe, you will learn how to add an extension method so that you can include your own custom aggregate function in a LINQ query.

This recipe adds two extension methods: StDev (the sample standard deviation) and StDevP (the standard deviation for the entire population). Because the extension methods are added to the IEnumerable(Of T) type, you can use the custom aggregate functions in the Into clause of an Aggregate, Group By, or Group Join query clause.

Notice that there are two overloads of each extension method: one that takes input values of type IEnumerable(Of Double), and another that takes input values of type IEnumerable(Of T). This enables you to call the custom aggregate functions whether your LINQ query returns a collection of type Double or of any other numeric type. The overload that takes IEnumerable(Of T) also takes a Func(Of T, Double) selector, which projects each value as type Double before the standard deviation is calculated. When the values are already of type Double, you can simply call StDev() or StDevP(). When the values are of another numeric type, pass the value to StDev(value) or StDevP(value) so that it is projected as type Double.
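
For reference, the two quantities can be written as follows, using the standard textbook definitions. The symbols n, x_i, and x-bar are introduced here only for illustration and do not appear in the recipe's code.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}, \qquad
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2}
```

StDev corresponds to s (divide by n - 1), and StDevP corresponds to sigma (divide by n).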

Instructions:

·         Create a Console Application.

·         After the End Module statement of the default Module1 module, add the following class, which contains both the StDev and StDevP functions.

Class StatisticalFunctions

    ' Sample standard deviation (divides by n - 1).
    Public Shared Function StDev(ByVal values As Double()) As Double
        Return CalculateStDev(values, False)
    End Function

    ' Population standard deviation (divides by n).
    Public Shared Function StDevP(ByVal values As Double()) As Double
        Return CalculateStDev(values, True)
    End Function

    Private Shared Function CalculateStDev(ByVal values As Double(), _
                                           ByVal entirePopulation As Boolean) As Double
        Dim count As Integer = 0
        Dim var As Double = 0
        Dim prec As Double = 0
        Dim dSum As Double = 0
        Dim sqrSum As Double = 0

        ' Subtract 1 from the divisor for a sample, 0 for the entire population.
        Dim adjustment As Integer = 1
        If entirePopulation Then adjustment = 0

        ' Accumulate the sum and the sum of squares in a single pass.
        For Each val As Double In values
            dSum += val
            sqrSum += val * val
            count += 1
        Next

        If count > 1 Then
            ' This equals count times the sum of squared deviations from the mean.
            var = count * sqrSum - (dSum * dSum)
            prec = var / (dSum * dSum)

            ' Double is only guaranteed for 15 digits. A difference
            ' with a result less than 0.000000000000001 will be considered zero.
            If prec < 0.000000000000001 OrElse var < 0 Then
                var = 0
            Else
                ' CDbl avoids an Integer overflow in the product for large counts.
                var = var / (CDbl(count) * (count - adjustment))
            End If

            Return Math.Sqrt(var)
        End If

        ' Fewer than two values: report zero.
        Return 0
    End Function

End Class
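
The divisor count * (count - adjustment) can look surprising at first. A short derivation suggests why it matches the usual definition. Here n stands for count, S_1 for dSum, S_2 for sqrSum, and a for adjustment; these symbols are introduced only for illustration.

```latex
\begin{aligned}
\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
   = S_2 - \frac{S_1^2}{n}
   = \frac{nS_2 - S_1^2}{n},
   \qquad \bar{x} = \frac{S_1}{n}, \\[4pt]
\frac{1}{n-a}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2
  &= \frac{nS_2 - S_1^2}{n\,(n-a)}
\end{aligned}
```

The numerator nS_2 - S_1^2 is the value the code holds in var before the division, so the extra factor of count in the divisor cancels the factor of n carried by that numerator.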

 

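·         At the top of the file, add the statement Imports System.Runtime.CompilerServices so that the <Extension()> attribute resolves. Then, after the StatisticalFunctions class, add the following module, which adds extension methods to calculate the standard deviation for both IEnumerable(Of Double) and IEnumerable(Of T).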

Module StatisticalAggregates

    ' Calculate the stdev value for a collection of type Double.
    <Extension()> _
    Function StDev(ByVal stDevAggregate As IEnumerable(Of Double)) As Double
        Return StatisticalFunctions.StDev(stDevAggregate.ToArray())
    End Function

    ' Project the collection of generic items as type Double and calculate the stdev value.
    <Extension()> _
    Function StDev(Of T)(ByVal stDevAggregate As IEnumerable(Of T), _
                         ByVal selector As Func(Of T, Double)) As Double
        Dim values = (From element In stDevAggregate Select selector(element)).ToArray()
        Return StatisticalFunctions.StDev(values)
    End Function

    ' Calculate the stdevp value for a collection of type Double.
    <Extension()> _
    Function StDevP(ByVal stDevAggregate As IEnumerable(Of Double)) As Double
        Return StatisticalFunctions.StDevP(stDevAggregate.ToArray())
    End Function

    ' Project the collection of generic items as type Double and calculate the stdevp value.
    <Extension()> _
    Function StDevP(Of T)(ByVal stDevAggregate As IEnumerable(Of T), _
                          ByVal selector As Func(Of T, Double)) As Double
        Dim values = (From element In stDevAggregate Select selector(element)).ToArray()
        Return StatisticalFunctions.StDevP(values)
    End Function

End Module
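
Because the extension methods target IEnumerable(Of T), the same kind of aggregate should also be usable in a Group By clause. The following is a minimal, self-contained sketch rather than part of the recipe. The module name GroupByStDevSketch, the IsEven key, and the Spread alias are hypothetical, and the compact StDev shown here is only a stand-in for the recipe's selector overload so that the example compiles on its own in a separate console project.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Runtime.CompilerServices

Module GroupByStDevSketch

    ' Stand-in for the recipe's StDev(Of T) overload (assumption: a simple
    ' two-pass sample standard deviation, not the recipe's implementation).
    <Extension()> _
    Function StDev(Of T)(ByVal source As IEnumerable(Of T), _
                         ByVal selector As Func(Of T, Double)) As Double
        Dim values As Double() = source.Select(selector).ToArray()
        If values.Length < 2 Then Return 0

        Dim mean As Double = values.Average()
        Dim sumSq As Double = 0
        For Each v As Double In values
            sumSq += (v - mean) * (v - mean)
        Next
        Return Math.Sqrt(sumSq / (values.Length - 1))
    End Function

    Sub Main()
        Dim numbers = New Integer() {12, 0, 3, 6, 8, 9}

        ' Group by parity, then compute the spread and size of each group.
        Dim query = From num In numbers _
                    Group By IsEven = (num Mod 2 = 0) _
                    Into Spread = StDev(num), Count()

        For Each g In query
            Console.WriteLine("IsEven={0}, Count={1}, StDev={2}", g.IsEven, g.Count, g.Spread)
        Next
    End Sub

End Module
```

With these sample values, the even group (12, 0, 6, 8) has a sample standard deviation of 5, and the odd group (3, 9) has one of about 4.24.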

 

·         In Sub Main of the default Module1 module, add the following code to calculate and display some sample standard deviation values:

    Sub Main()
        Dim numbers1 = New Double() {5.4, 2.3, 8.9, 9.456}
        Dim numbers2 = New Integer() {12, 0, 3, 6, 8, 9}

        Dim q1 = Aggregate num In numbers1 Into StDev()
        Dim q2 = Aggregate num In numbers2 Into StDev(num)
        Dim q3 = Aggregate num In numbers1 Into StDevP()
        Dim q4 = Aggregate num In numbers2 Into StDevP(num)

        Console.WriteLine(q1)
        Console.WriteLine(q2)
        Console.WriteLine(q3)
        Console.WriteLine(q4)
    End Sub

 

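·         Press F5 to see the code run. To keep the console window open after the results are printed, press CTRL+F5 (Start Without Debugging) instead.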
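
As a rough hand check of the second and fourth results, the arithmetic for numbers2 can be worked through as follows. Values are rounded, and n, S_1, and S_2 are illustrative symbols for the count, the sum, and the sum of squares.

```latex
\begin{aligned}
n &= 6, \quad S_1 = 12+0+3+6+8+9 = 38, \quad S_2 = 144+0+9+36+64+81 = 334, \\
nS_2 - S_1^2 &= 2004 - 1444 = 560, \\
\text{StDev} &= \sqrt{\tfrac{560}{6\cdot 5}} \approx 4.32, \qquad
\text{StDevP} = \sqrt{\tfrac{560}{6\cdot 6}} \approx 3.94
\end{aligned}
```

The sample value is larger than the population value because it divides by n - 1 instead of n.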


Comments

# chakrit said on December 19, 2007 2:24 PM:

i've just found this tutorial and this blog.. thanks for taking the time to write all these tutorials.. make for a real nice reference for a hardcore vb-ers like me ^^"

p.s.

would you please *not* wrap the code lines? I'd love that it'd be in a box with a scrollbar instead makes for easier reading..  on this blog that is

# Csaba said on December 21, 2007 7:47 AM:

Hi,

Is the calculation

var = var / (count * (count - adjustment))

correct? Should it not be two identical "count".

As I was looking at the code I rewrote the function. I don't have VB2008, I can't check if this is correct, but for me, it seems to be a lot shorter and clearer this way

Regards

Csaba

Private Shared Function CalculateStDev(ByVal values As Double(), _
                                       ByVal entirePopulation As Boolean) As Double
    If values.Length < 1 Then Return 0
    Dim dSum As Double = 0
    Dim sqrSum As Double = 0
    For Each value As Double In values
        dSum += value
        sqrSum += value * value
    Next
    Dim var As Double = values.count * sqrSum - (dSum * dSum)
    ' Double is only guaranteed for > Double.minvalue. A difference
    ' with a result less than Double.minvalue will be considered zero.
    If var < 0 OrElse var / (dSum * dSum) < Double.minvalue Then Return 0
    Dim count As Int32 = values.Length - If(entirePopulation, 0, 1)
    Return Math.Sqrt( var / (count * count)
End Function

# Csaba said on December 22, 2007 5:35 AM:

  Hi,

My bad! My previous code example has many errors. At least the example below works in VB2005.

Sorry for the trouble!

Regards

Csaba

Function CalculateStDev(ByVal values As Double(), ByVal entirePopulation As Boolean) As Double
    If values.Length < 1 Then Return 0
    Dim dSum As Double = 0
    Dim sqrSum As Double = 0
    For Each value As Double In values
        dSum += value
        sqrSum += value * value
    Next
    Dim preVariance As Double = sqrSum - (dSum * dSum) / values.Length
    ' Double is only guaranteed for > Double.Epsilon. A difference
    ' with a preVariance less than Double.Epsilon will be considered zero.
    ' (The Sqrt of a small number is much larger.)
    If preVariance < 0 OrElse preVariance < 10 * Double.Epsilon Then Return 0
    Return Math.Sqrt(preVariance / (values.Length - CInt(IIf(entirePopulation, 0, 1))))
End Function
