# LINQ Cookbook, Recipe 12: Calculate the Standard Deviation (Doug Rothaus)

Ingredients:

·         Visual Studio 2008 (Beta2 or Higher)

Categories: LINQ to Objects

Introduction:

LINQ Cookbook, Recipe 11 showed how you can use LINQ queries to perform calculations on sets of data using standard aggregate functions such as Average and Sum. In this recipe, you will learn how to add an extension method so that you can include your own custom aggregate function in a LINQ query.

This recipe adds two extension methods: StDev (sample standard deviation) and StDevP (standard deviation for the entire population). Because the extension methods are added to the IEnumerable(Of T) type, you can use the custom aggregate functions in the Into clause of an Aggregate, Group By, or Group Join query clause.

Each extension method has two overloads: one that takes input values of type IEnumerable(Of Double), and another that takes input values of type IEnumerable(Of T) together with a Func(Of T, Double) selector. The selector projects each element as a Double before the standard deviation is calculated, so you can call the custom aggregate functions whether your LINQ query returns a collection of type Double or of any other numeric type. For values of type Double, you can simply call the parameterless StDev() or StDevP() overloads. For values of other numeric types, pass the value to the StDev(value) or StDevP(value) overloads so that it is projected as type Double.
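As a preview of the Group By case mentioned above, here is a minimal sketch. It assumes the StDev extension methods built in the Instructions section are already in the same project. The Reading class, its fields, the GroupByPreview module, and the sample values are hypothetical and exist only for illustration.

```vb
' Sketch only: assumes the recipe's StDev extension methods are in the project.
' Reading, GroupByPreview, and the sample values are made up for illustration.
Class Reading
    Public Sensor As String
    Public Value As Double
End Class

Module GroupByPreview
    Sub ShowGroupedStDev()
        Dim readings As New List(Of Reading)
        readings.Add(New Reading With {.Sensor = "A", .Value = 2.0})
        readings.Add(New Reading With {.Sensor = "A", .Value = 4.0})
        readings.Add(New Reading With {.Sensor = "B", .Value = 1.0})
        readings.Add(New Reading With {.Sensor = "B", .Value = 7.0})

        ' One result per sensor; r.Value is handed to the selector overload of StDev.
        Dim stats = From r In readings _
                    Group By Sensor = r.Sensor _
                    Into Count(), StDev(r.Value)

        For Each s In stats
            Console.WriteLine("{0}: {1} readings, StDev = {2}", s.Sensor, s.Count, s.StDev)
        Next
    End Sub
End Module
```

The aggregate names in the Into clause become members of each result, so the loop can read s.Count and s.StDev directly.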

Instructions:

·         Create a Console Application.

·         After the End Module statement of the default Module1 module, add the following class, which contains both the StDev and StDevP functions.

    Class StatisticalFunctions

        Public Shared Function StDev(ByVal values As Double()) As Double
            Return CalculateStDev(values, False)
        End Function

        Public Shared Function StDevP(ByVal values As Double()) As Double
            Return CalculateStDev(values, True)
        End Function

        Private Shared Function CalculateStDev(ByVal values As Double(), _
                                               ByVal entirePopulation As Boolean) As Double
            Dim count As Integer = 0
            Dim var As Double = 0
            Dim prec As Double = 0
            Dim dSum As Double = 0
            Dim sqrSum As Double = 0

            ' Divide by (count - 1) for a sample, or by count for the entire population.
            Dim adjustment As Integer = 1
            If entirePopulation Then adjustment = 0

            ' Accumulate the sum, the sum of squares, and the number of values in one pass.
            For Each val As Double In values
                dSum += val
                sqrSum += val * val
                count += 1
            Next

            If count > 1 Then
                var = count * sqrSum - (dSum * dSum)
                prec = var / (dSum * dSum)

                ' Double is only precise to about 15 significant digits, so a relative
                ' difference of less than 0.000000000000001 is treated as zero.
                If prec < 0.000000000000001 OrElse var < 0 Then
                    var = 0
                Else
                    var = var / (count * (count - adjustment))
                End If

                Return Math.Sqrt(var)
            End If

            ' Fewer than two values: return 0.
            Return 0
        End Function

    End Class
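The single-pass arithmetic in CalculateStDev can be checked against the textbook definition. Writing n for count, S_1 for dSum, and S_2 for sqrSum (shorthand introduced here, not names from the code), and using the mean x̄ = S_1 / n, the two variances expand as follows:

```latex
\begin{align*}
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2
     = \frac{1}{n-1}\left(S_2 - \frac{S_1^2}{n}\right)
     = \frac{n S_2 - S_1^2}{n\,(n-1)} \\
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2
     = \frac{1}{n}\left(S_2 - \frac{S_1^2}{n}\right)
     = \frac{n S_2 - S_1^2}{n \cdot n}
\end{align*}
```

The shared numerator is the value first stored in var. The denominators correspond to count * (count - adjustment), with adjustment set to 1 for StDev and 0 for StDevP, so the two count factors are deliberately not identical in the sample case.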

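·         After the StatisticalFunctions class, add the following module, which adds extension methods to IEnumerable that calculate the standard deviation for both IEnumerable(Of Double) and IEnumerable(Of T). The Extension attribute is defined in the System.Runtime.CompilerServices namespace, so if the attribute is not recognized, add Imports System.Runtime.CompilerServices at the top of the file.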

    Module StatisticalAggregates

        ' Calculate the stdev value for a collection of type Double.
        <Extension()> _
        Function StDev(ByVal stDevAggregate As IEnumerable(Of Double)) As Double
            Return StatisticalFunctions.StDev(stDevAggregate.ToArray())
        End Function

        ' Project the collection of generic items as type Double and calculate the stdev value.
        <Extension()> _
        Function StDev(Of T)(ByVal stDevAggregate As IEnumerable(Of T), _
                             ByVal selector As Func(Of T, Double)) As Double
            Dim values = (From element In stDevAggregate Select selector(element)).ToArray()
            Return StatisticalFunctions.StDev(values)
        End Function

        ' Calculate the stdevp value for a collection of type Double.
        <Extension()> _
        Function StDevP(ByVal stDevAggregate As IEnumerable(Of Double)) As Double
            Return StatisticalFunctions.StDevP(stDevAggregate.ToArray())
        End Function

        ' Project the collection of generic items as type Double and calculate the stdevp value.
        <Extension()> _
        Function StDevP(Of T)(ByVal stDevAggregate As IEnumerable(Of T), _
                              ByVal selector As Func(Of T, Double)) As Double
            Dim values = (From element In stDevAggregate Select selector(element)).ToArray()
            Return StatisticalFunctions.StDevP(values)
        End Function

    End Module

·         In Sub Main of the default Module1 module, add the following code to calculate and display some sample standard deviation values:

    Sub Main()
        Dim numbers1 = New Double() {5.4, 2.3, 8.9, 9.456}
        Dim numbers2 = New Integer() {12, 0, 3, 6, 8, 9}

        Dim q1 = Aggregate num In numbers1 Into StDev()
        Dim q2 = Aggregate num In numbers2 Into StDev(num)
        Dim q3 = Aggregate num In numbers1 Into StDevP()
        Dim q4 = Aggregate num In numbers2 Into StDevP(num)

        Console.WriteLine(q1)
        Console.WriteLine(q2)
        Console.WriteLine(q3)
        Console.WriteLine(q4)
    End Sub

Press F5 to see the code run.
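As a hand check for the Integer array, the arithmetic below works through numbers2 using n for the number of values, S_1 for their sum, and S_2 for the sum of their squares (shorthand introduced here, not names from the code). The second and fourth lines of output, q2 and q4, should come out close to these rounded values:

```latex
\begin{align*}
n &= 6, \qquad S_1 = 12+0+3+6+8+9 = 38, \qquad S_2 = 144+0+9+36+64+81 = 334 \\
n S_2 - S_1^2 &= 2004 - 1444 = 560 \\
\text{StDev:}\quad s &= \sqrt{560 / (6 \cdot 5)} \approx 4.3205 \\
\text{StDevP:}\quad \sigma &= \sqrt{560 / (6 \cdot 6)} \approx 3.9441
\end{align*}
```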

Comments:
• i've just found this tutorial and this blog.. thanks for taking the time to write all these tutorials.. make for a real nice reference for a hardcore vb-ers like me ^^"

p.s.

would you please *not* wrap the code lines? I'd love that it'd be in a box with a scrollbar instead makes for easier reading..  on this blog that is

• Hi,

Is the calculation

var = var / (count * (count - adjustment))

correct? Should it not be two identical "count".

As I was looking at the code I rewrote the function. I don't have VB2008, I can't check if this is correct, but for me, it seems to be a lot shorter and clearer this way

Regards

Csaba

Private Shared Function CalculateStDev(ByVal values As Double(), _

ByVal entirePopulation As Boolean) As Double

if values.Length <1 then Return 0

Dim dSum As Double = 0

Dim sqrSum As Double = 0

For Each value As Double In values

dSum += value

sqrSum += value * value

Next

Dim var As Double = values.count * sqrSum - (dSum * dSum)

' Double is only guaranteed for > Double.minvalue. A difference

' with a result less than Double.minvalue will be considered zero.

If var < 0 OrElse var / (dSum * dSum)  < Double.minvalue   Then  return 0

Dim count as int32 = values.Length  -  if(entirePopulation, 0, 1)

Return Math.Sqrt( var / (count * count)

End Function

•   Hi,

Me bad!!! My previous code example has many errors! At least the example below works in VB2005.

Sorry for the trouble!

Regards

Csaba

    Function CalculateStDev(ByVal values As Double(), ByVal entirePopulation As Boolean) As Double
        If values.Length < 1 Then Return 0

        Dim dSum As Double = 0
        Dim sqrSum As Double = 0

        For Each value As Double In values
            dSum += value
            sqrSum += value * value
        Next

        Dim preVariance As Double = sqrSum - (dSum * dSum) / values.Length

        ' Double is only guaranteed for > Double.Epsilon. A difference
        ' with a preVariance less than Double.Epsilon will be considered zero.
        ' (The Sqrt of a small number is much larger.)
        If preVariance < 0 OrElse preVariance < 10 * Double.Epsilon Then Return 0

        Return Math.Sqrt(preVariance / (values.Length - CInt(IIf(entirePopulation, 0, 1))))
    End Function

• Why can't I use your code in the following code:

Dim CompanySet = From Company As HKNBIndex43 In SelectMarket._HKNBIndex43Set _

Where Company.Date >= StartDate And Company.Date <= EndDate _

Select Company _

Group By Name = Company.Name _

Into Count(), Average(Company.Close), StDev(Company.Close)

• I change my code from

Dim CompanySet = From Company As HKNBIndex43 In SelectMarket._HKNBIndex43Set _

to

Dim CompanySet = From Company As HKNBIndex43 In SelectMarket._HKNBIndex43Set.AsEnumerable _

And I change my code from

Me.DataGridView1.DataSource = CompanySet

to

Me.DataGridView1.DataSource = CompanySet.ToList

Then it can run.

• Great!!!!!!!!!!!!!

Thank u so much :)

ladyblue555@hotmail.com
