LINQ Cookbook, Recipe 12: Calculate the Standard Deviation (Doug Rothaus)


Ingredients:

·         Visual Studio 2008 (Beta2 or Higher)

 

Categories: LINQ to Objects

 

Introduction:

LINQ Cookbook, Recipe 11 showed how you can use LINQ queries to perform calculations on sets of data using standard aggregate functions such as Average and Sum. In this recipe, you will learn how to add an extension method so that you can include your own custom aggregate function in a LINQ query.

This recipe adds two extension methods: StDev (sample standard deviation) and StDevP (standard deviation for the entire population). Because the extension methods are added to the IEnumerable(Of T) type, you can use the custom aggregate functions in the Into clause of an Aggregate, Group By, or Group Join query clause.

Notice that there are two overloads of each extension method: one that takes input values of type IEnumerable(Of Double), and another that takes input values of type IEnumerable(Of T). This enables you to call the custom aggregate functions whether your LINQ query returns a collection of type Double or of any other numeric type. The overload that takes input values of type IEnumerable(Of T) uses a Func(Of T, Double) lambda expression to project the numeric values as the corresponding values of type Double before calculating the standard deviation. When calculating the standard deviation for values of type Double, you can simply call the StDev() or StDevP() overloads. When calculating the standard deviation for values of numeric types other than Double, you need to pass the value to the StDev(value) or StDevP(value) overloads to ensure that the value is projected as type Double.

Instructions:

·         Create a Console Application.

·         After the End Module statement of the default Module1 module, add the following class, which contains both the StDev and StDevP functions.

Class StatisticalFunctions

    ' Sample standard deviation (the divisor uses count - 1).
    Public Shared Function StDev(ByVal values As Double()) As Double
        Return CalculateStDev(values, False)
    End Function

    ' Standard deviation for the entire population (the divisor uses count).
    Public Shared Function StDevP(ByVal values As Double()) As Double
        Return CalculateStDev(values, True)
    End Function

    Private Shared Function CalculateStDev(ByVal values As Double(), _
                                           ByVal entirePopulation As Boolean) As Double
        Dim count As Integer = 0
        Dim var As Double = 0
        Dim prec As Double = 0
        Dim dSum As Double = 0
        Dim sqrSum As Double = 0

        ' Subtract 1 from the divisor for a sample, 0 for the entire population.
        Dim adjustment As Integer = 1
        If entirePopulation Then adjustment = 0

        ' Accumulate the sum, the sum of squares, and the count in one pass.
        For Each val As Double In values
            dSum += val
            sqrSum += val * val
            count += 1
        Next

        If count > 1 Then
            var = count * sqrSum - (dSum * dSum)
            prec = var / (dSum * dSum)

            ' Double is only guaranteed for 15 digits. A difference
            ' with a result less than 0.000000000000001 will be considered zero.
            If prec < 0.000000000000001 OrElse var < 0 Then
                var = 0
            Else
                var = var / (count * (count - adjustment))
            End If

            Return Math.Sqrt(var)
        End If

        ' Fewer than two values: return zero (same result as Return Nothing for a Double).
        Return 0
    End Function

End Class
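
The arithmetic in CalculateStDev can be checked by expanding the usual definition of variance. The derivation below is a sketch added for illustration; the symbols n (count), a (adjustment), and \bar{x} (the mean) are not names from the code. Here a = 1 for StDev and a = 0 for StDevP.

```latex
\begin{aligned}
s^2 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - a}
     = \frac{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2}{n - a} \\
    &= \frac{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n\,(n - a)}
\end{aligned}
```

The numerator of the last form corresponds to count * sqrSum - (dSum * dSum), and the denominator to count * (count - adjustment). The two factors in the denominator are therefore expected to differ when calculating the sample standard deviation.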

 

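·         At the top of the file, add the statement Imports System.Runtime.CompilerServices, which the <Extension()> attribute requires. Then, after the StatisticalFunctions class, add the following module to add the extension methods to IEnumerable to calculate the standard deviation for both IEnumerable(Of Double) and IEnumerable(Of T).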

Module StatisticalAggregates

    ' Calculate the stdev value for a collection of type Double.
    <Extension()> _
    Function StDev(ByVal stDevAggregate As IEnumerable(Of Double)) As Double
        Return StatisticalFunctions.StDev(stDevAggregate.ToArray())
    End Function

    ' Project the collection of generic items as type Double and calculate the stdev value.
    <Extension()> _
    Function StDev(Of T)(ByVal stDevAggregate As IEnumerable(Of T), _
                         ByVal selector As Func(Of T, Double)) As Double
        Dim values = (From element In stDevAggregate Select selector(element)).ToArray()
        Return StatisticalFunctions.StDev(values)
    End Function

    ' Calculate the stdevp value for a collection of type Double.
    <Extension()> _
    Function StDevP(ByVal stDevAggregate As IEnumerable(Of Double)) As Double
        Return StatisticalFunctions.StDevP(stDevAggregate.ToArray())
    End Function

    ' Project the collection of generic items as type Double and calculate the stdevp value.
    <Extension()> _
    Function StDevP(Of T)(ByVal stDevAggregate As IEnumerable(Of T), _
                          ByVal selector As Func(Of T, Double)) As Double
        Dim values = (From element In stDevAggregate Select selector(element)).ToArray()
        Return StatisticalFunctions.StDevP(values)
    End Function

End Module

 

·         In Sub Main of the default Module1 module, add the following code to calculate and display some sample standard deviation values:

    Sub Main()
        Dim numbers1 = New Double() {5.4, 2.3, 8.9, 9.456}
        Dim numbers2 = New Integer() {12, 0, 3, 6, 8, 9}

        ' Double values can use the parameterless overloads.
        Dim q1 = Aggregate num In numbers1 Into StDev()
        ' Integer values are passed in so that they are projected as Double.
        Dim q2 = Aggregate num In numbers2 Into StDev(num)
        Dim q3 = Aggregate num In numbers1 Into StDevP()
        Dim q4 = Aggregate num In numbers2 Into StDevP(num)

        Console.WriteLine(q1)
        Console.WriteLine(q2)
        Console.WriteLine(q3)
        Console.WriteLine(q4)
    End Sub

 

Press F5 to see the code run.
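
One way to sanity-check the output is to compare it with the textbook two-pass formula (compute the mean first, then the squared deviations). The following is a minimal sketch and not part of the original recipe: the names StDevCheck and TwoPassStDev are hypothetical, and the code is meant to be run as its own console project. Like the recipe, it returns zero when there are fewer than two values.

```vb
Imports System
Imports System.Linq

Module StDevCheck

    ' Hypothetical helper for cross-checking: the two-pass definition of
    ' standard deviation, using the built-in Average and Sum operators.
    Function TwoPassStDev(ByVal values As Double(), _
                          ByVal entirePopulation As Boolean) As Double
        If values.Length < 2 Then Return 0

        Dim mean As Double = values.Average()
        Dim sumSquaredDeviations As Double = _
            values.Sum(Function(v) (v - mean) * (v - mean))

        ' Divide by n for the entire population, by n - 1 for a sample.
        Dim divisor As Integer = If(entirePopulation, values.Length, values.Length - 1)

        Return Math.Sqrt(sumSquaredDeviations / divisor)
    End Function

    Sub Main()
        ' Mean is 5 and the squared deviations sum to 32.
        Dim data As Double() = {2, 4, 4, 4, 5, 5, 7, 9}

        Console.WriteLine(TwoPassStDev(data, True))   ' population: Sqrt(32 / 8), prints 2
        Console.WriteLine(TwoPassStDev(data, False))  ' sample: Sqrt(32 / 7)
    End Sub

End Module
```

Feeding the same array to the recipe's StDevP() and StDev() aggregates should give matching values, up to floating-point rounding.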

Comments:
  • i've just found this tutorial and this blog.. thanks for taking the time to write all these tutorials.. make for a real nice reference for a hardcore vb-ers like me ^^"

    p.s.

    would you please *not* wrap the code lines? I'd love that it'd be in a box with a scrollbar instead makes for easier reading..  on this blog that is

  • Hi,

    Is the calculation

    var = var / (count * (count - adjustment))

    correct? Should it not be two identical "count".

    As I was looking at the code I rewrote the function. I don't have VB2008, I can't check if this is correct, but for me, it seems to be a lot shorter and clearer this way

    Regards

    Csaba

    Private Shared Function CalculateStDev(ByVal values As Double(), _
    ByVal entirePopulation As Boolean) As Double
          if values.Length <1 then Return 0
           Dim dSum As Double = 0
           Dim sqrSum As Double = 0
           For Each value As Double In values
               dSum += value
               sqrSum += value * value
           Next
          Dim var As Double = values.count * sqrSum - (dSum * dSum)
          ' Double is only guaranteed for > Double.minvalue. A difference
         ' with a result less than Double.minvalue will be considered zero.
         If var < 0 OrElse var / (dSum * dSum)  < Double.minvalue   Then  return 0
         Dim count as int32 = values.Length  -  if(entirePopulation, 0, 1)
        Return Math.Sqrt( var / (count * count)
    End Function

  •   Hi,

    My bad! My previous code example has many errors. At least this example works in VB2005.

    Sorry for the trouble!

    Regards

    Csaba

    Function CalculateStDev(ByVal values As Double(), ByVal entirePopulation As Boolean) As Double
        If values.Length < 1 Then Return 0
        Dim dSum As Double = 0
        Dim sqrSum As Double = 0
        For Each value As Double In values
            dSum += value
            sqrSum += value * value
        Next
        Dim preVariance As Double = sqrSum - (dSum * dSum) / values.Length
        ' Double is only guaranteed for > Double.Epsilon. A difference
        ' with a preVariance less than Double.Epsilon will be considered zero.
        ' (The Sqrt of a small number is much larger.)
        If preVariance < 0 OrElse preVariance < 10 * Double.Epsilon Then Return 0
        Return Math.Sqrt(preVariance / (values.Length - CInt(IIf(entirePopulation, 0, 1))))
    End Function

  • Why can't I use your code in the following code:

           Dim CompanySet = From Company As HKNBIndex43 In SelectMarket._HKNBIndex43Set _
                            Where Company.Date >= StartDate And Company.Date <= EndDate _
                            Select Company _
                            Group By Name = Company.Name _
                            Into Count(), Average(Company.Close), StDev(Company.Close)

  • I changed my code from

           Dim CompanySet = From Company As HKNBIndex43 In SelectMarket._HKNBIndex43Set _

    to

           Dim CompanySet = From Company As HKNBIndex43 In SelectMarket._HKNBIndex43Set.AsEnumerable _

    and I changed my code from

           Me.DataGridView1.DataSource = CompanySet

    to

           Me.DataGridView1.DataSource = CompanySet.ToList

    Then it runs.

  • Great!!!!!!!!!!!!!

    Thank u so much :)

    ladyblue555@hotmail.com
