Nulls - LINQ to DataSets Part 3

Published 13 February 07 08:43 AM | dpblogs 

 

 

In previous posts I have spent time talking about LINQ to DataSet, and how it can

help you write better code, and how you can write some very interesting queries with DataSet. Today I am going to talk more about nulls, which is one of the areas in which there is a lot of confusion. The key thing to remember is that nulls work very differently in LINQ than they do in SQL!

 

Let us take another look at one of the example queries from a previous post. This query looks for orders with no OrderDate, perhaps because the order has not been finalized yet.

var query = orderDataTable.AsEnumerable()

    .Where(dr => dr.Field<datetime>("OrderDate") == null)

    .Select(dr => dr.Field<int>("OrderID"));

In this particular query, the Field<T> method returns a nullable type, which is great, as you no longer have to deal with DBNull. You can now use null equality like you do everywhere else in your code. This is one of the ways that LINQ really helps bridge the impedance mismatch between data and code.

 

What if you wanted to find all the orders placed in 2006? The following query (as a query expression) might be a good start.

var query = from dataRow in orderDataTable.AsEnumerable()

            where dataRow.Field<DateTime>("OrderDate").Year == 2006

            select dataRow.Field<int>("OrderID");

If you run this code, and any of your orders do not have an OrderDate, guess what happens? You get an ever so pleasant NullReferenceException. This is the problem with nulls. When the Field<T> method returns a null, the CLR attempts to call a method, and we all know you cannot call a method on a null object!

 

IsNull

What we have to do is make sure that we do not ever call a method on a null object. How should we do this? This is yet another place where the expressiveness of LINQ is really showcased. Because the where predicate is simply an expression, we can modify the expression so that we do not call the Year getter method if the value is null.

 

There are many solutions to this problem. We could wrap the Method<T> and null check inside of an if-then-else block, but that certainly is not very readable! Fortunately for us, we have the IsNull method that will check for a null value. In fact, the first query can use this method.

var query = orderDataTable.AsEnumerable()

    .Where(dr => dr.IsNull("OrderDate") == true)

    .Select(dr => dr.Field<int>("OrderID"));

Is that not a lot more readable? However, that is only half of the problem. We still need to have the logic that will execute the Year getter method only when OrderDate is not null.

var query = from dataRow in orderDataTable.AsEnumerable()

            where

                dataRow.IsNull("OrderDate") == false

                &&

                dataRow.Field<DateTime>("OrderDate").Year == 2006

            select dataRow.Field<int>("OrderID");

Now we are cooking! This query will run without exceptions and produce the correct results. We are looking golden.

 

Visual Basic

In Visual Basic, the AND operator does not short circuit, so in the above example the Year property getter method will still be called. There is another operator however, the AndAlso operator, which behaves like the C# && operator, and this could be used to avoid the NullReferenceException. The query using this approach would look like the following.

Dim query = From dataRow In orderDataTable _

            Where _

                dataRow.IsNull("OrderDate") = false _

                AndAlso

                dataRow.Field(Of DateTime)("OrderDate").Year = 2006 _

            Select dataRow.Field(Of int)("OrderID")

All that typing certainly is a drag. If you are using VB.NET, you have a powerful new version of an existing operator that you can use to make it much easier! This function is the ternary operator IIF.

 

IIF

IIF takes three arguments, if the first argument is true, then the second argument is returned, otherwise the third argument is returned. The cool part is that all three arguments can be methods! With this, the above code is converted into the much nicer version below.

Dim query = From dataRow In orderDataTable _

            Where IIF(dataRow.IsNull("OrderDate") = false, dataRow.Field(Of DateTime)("OrderDate").Year = 2006, false) _

            Select dataRow.Field<of int>("OrderID")

The possibilities for this new operator are endless both in LINQ and in your own code!

 

Coming soon…

 

Whatever you want! Leave a comment, send an email! Let me know what questions you have, concerns you’d like to share, or anything else.

 

 

Erick Thompson

Program Manager, ADO.NET

 

LINQ to DataSet Part 1

 

LINQ to DataSet Part 2

 

Filed under:

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Comments

# onovotny said on February 13, 2007 12:43 PM:

What kind of extensibility will there be in the LINQ to SQL implementation?

So far, nothing I've seen shows that you can use CTE's or Full-Text functions directly from LINQ code.  It would be very useful if there was a mechanism where additional functions could be added to expression tree that could add additional SQL calls.  

Is there any way to do this?

# Frans Bouma said on February 13, 2007 2:17 PM:

I asked a question yesterday about your part 2 of the entity data model. You didn't reply. Could you please answer my question? thanks :)

# Beth Massi said on February 13, 2007 2:23 PM:

I think your VB syntax is incorrect. Shouldn't it be:

Dim query = From dataRow In orderDataTable _

           Where IIF(dataRow.IsNull("OrderDate"), False, dataRow.Field(Of DateTime)("OrderDate").Year = 2006) _

           Select dataRow.Field(Of Integer)("OrderID")

# Erick Thompson said on February 13, 2007 6:25 PM:

Beth,

Thanks for the catch! The code should be updated soon.

Frans,

The author of that post will respond to your comment on that post, but if you have any LINQ to DataSet related questions, please post!

# Frans Bouma said on February 14, 2007 10:36 AM:

I don't have any dataset related questions, or perhaps it's related, but what I really want the ADO.NET vNext team to comment on are 2 things:

1) the 'where is what executed' problem. See:

http://www.thedatafarm.com/blog/2007/02/13/TheShadowySideOfLINQToSQL.aspx

It contains a Linq query which actually doesn't result in a single query but in a query which results are evaluated in-memory, which isn't expected at all, and thus will result in a lot of queries.  (point 2 in that blog). When is what executed in the db and when is what executed in-memory, which could trigger lazy loading or other slow constructs. To me, if Linq doesn't have a solution for this, it's dead in the water as too many people will be bitten by it sooner or later. It should be deterministic where what is executed (e.g. in-memory with a function or in the db with a query predicate/fragment)

2) the explicit specification of a subquery instead of a join. Take northwind, and the employee - order - customer relations. Now, there are 8 or 9 employees in the db, a few hundred orders and 89 or so customers. If I want all employees who have accepted an order from a customer in Germany, I can do that with JOINs or with subqueries. The fact is, with JOINs, I get a lot of duplicates for the employee data (due to the 1:n relation), and I can't use DISTINCT for the select, as Employee contains a blob field. It therefore will be very slow as the data has to be filtered on the client (o/r mapper core) and with subqueries this isn't the case, the code is very fast. The thing is though: how to specify in Linq that a subquery has to be used instead? For northwind, it might be still acceptable, but if I have 500,000 orders and 150,000 customers, I won't be very happy with the join approach.

Again, if Linq can't express this, it won't be that useful for a lot of applications. It would be great to see how to address these two points in Linq code.

Thanks.

# RKJ said on February 14, 2007 11:04 AM:

Could anyone please provide an example of using ExceptRows function in a sql query ?

ExceptRows

Public Function ExceptRows( _

       first as IEnumerable(Of DataRow), _

       second as IEnumerable(Of DataRow)) _

       as IEnumerable(of DataRow)

For example please use the following query: Replace operator "<>" with Except Rows and "=" with EqualRows.

Dim RetTable As DataTable = (From o In (From A In Elite _

Select A), P In Person _

Where LTrim(RTrim(o.Field(Of String)("TimeKeeperNumber"))) = LTrim(RTrim(P.Field(Of String)("TimeKeeperNumber"))) _

And UCase(LTrim(RTrim(o.Field(Of String)("Email_Address")))) <> UCase(LTrim(RTrim(P.Field(Of String)("Email_Address")))) _

Select TimeKeeperNumber := o.Field(Of String)("TimeKeeperNumber"), _

Last_Name := P.Field(Of String)("Person_Last_Name")).ToDataTable()

# Beth Massi said on February 14, 2007 3:37 PM:

Frans,

Re: 2) What I want to know is if that same issue comes up when querying the Entity Data Model. I see using LINQ to the EDM more than directly like LINQ to SQL.

And I'd like to hear what kind of guidance is Microsoft going to give regarding using LINQ to SQL directly vs. the Entity Framework? With LINQ to EDM I can't yet see where LINQ to SQL should be used except for really small, trivial applications.

# ADO.NET team blog said on February 15, 2007 3:51 PM:

Greetings fellow data junkies! My name is Erick Thompson, a PM at Microsoft who is working on driving

# ADO.NET team blog said on February 15, 2007 3:53 PM:

Back in the first post of this LINQ to DataSet series, I spent some time talking about what LINQ to DataSet

# DiegoV said on February 17, 2007 12:10 PM:

Hello Erick, It is good to have something to read while the February CTP (or is it a March CTP?) gets out!

I happen to use Visual Basic a lot, and just wanted to comment:

1. The short-circuited behavior of IIF is new to Orcas, but the operator has been part of the language for many years (only it was just like a normal function call that evaluated all parameters). For what I know this is still subject to change in the final version (http://www.panopticoncentral.net/archive/2006/12/29.aspx)

2. I think the use of ANDALSO and IIF is a matter of taste, but I personally like more ANDALSO in this case :)

3. I prefer not to compare a Boolean function like IsNull() to true or false as you do in your code, but again, it is matter of taste. Perhaps my problem is that I have seen really ugly code do things like that :)

Now, for a couple of questions:

1. With Field<T>() returning nullable types you mean that it is also safe to use something like Field<T>(name).HasValue if T is a value type?

2. Is it going to be possible to change the settings of the Typed DataSet generator to get fields with nullable types in Orcas?

Thanks for your post.

# Sam Gentile said on February 18, 2007 11:16 AM:

Data/ADO.NET Orcas Two from the ADO.NET team: Entity Client and Nulls - LINQ to DataSets Part 3 Software

# Erick Thompson said on February 28, 2007 5:38 PM:

Diego,

Thanks for the great comments! FTYI, The Feb/March CTP is now out, so take a look!

ANDALSO is definitely an option, and I can see why people like that more. However, I've found that a lot of people aren't familiar with ANDALSO, and they get stuck on the short circuiting part of it.

For your questions, if you use Field<T> you can definitely use HasValue to check for a null value, which is yet another way you can write these queries.

As for the Typed DataSet returning nullable types, that is something we're looking at doing for RTM.

Thanks!

Erick

# Bill McCarthy said on May 20, 2007 10:07 PM:

The syntax around IIf is completely wrong.  IIf is a function with the syntax of :

Function IIf(expression As Booleam, _

                  returnTrue As Object, _

                  returnFalse As Object)

                  As Object

So when you pass to IIf a statment such as :

Dim query = From dataRow In orderDataTable _

          Where IIF(dataRow.IsNull("OrderDate"), False, dataRow.Field(Of DateTime)("OrderDate").Year = 2006) _

          Select dataRow.Field(Of Integer)("OrderID")

it means that the parameters otthe function all get evaluated, including :

dataRow.Field(Of DateTime)("OrderDate").Year = 2006

even when dataRow.IsNull("OrderDate") is true.

This is a language limitation that will be addressed in Orcas with the new If tenrary operator.   The correct code will be:

Dim query = From dataRow In orderDataTable _

          Where If(dataRow.IsNull("OrderDate"), False, dataRow.Field(Of DateTime)("OrderDate").Year = 2006) _

          Select dataRow.Field(Of Integer)("OrderID")

# Patrick said on November 13, 2007 4:44 PM:

Hi,

I have a typed Dataset with two related data tables that I am joining together on a single relation. I want to then put the result [of the select] back into a third data table. This seems to be quite difficult! Can you suggest a method for achieving this?

           var joined = from p in dset.Donations.AsEnumerable()

                        join e in dset.Name_and_Address on p.Member_ID equals e.Member_ID

                        select new { Donation_ID = p.Field<int>("Donation ID"), Member_ID = p.Field<int>("Member ID"),

                                       Date = p.Field<DateTime>("Date"), Amount = p.Field<double>("Amount"),

                                       Title = e.Field<string>("Title"), Initials = e.Field<string>("Initials"),

                                       Surname = e.Field<string>("Surname") };

The source data may contain DBNull.

# Travis Riffle said on March 4, 2008 5:37 PM:

I just wanted to say this is exactly what I was looking for...  not testing IsNull was what was causing my exception problems in my datatable.

Good info!

# pramod.gupta said on July 22, 2008 4:21 AM:

How do i  write a dynamic Linq query with Group Join over Two DataTables.

Constraint columns can be more than one.

# Gabriel Bogea Perez said on August 8, 2008 3:15 PM:

Hey, what if I want to use a column from a Typed Dataset in an OrderBy and this column may have nulls? How to deal with this?

var query = from row in dataTable.AsEnumerable()

                  where row.IdCompany == 5

                  orderby row.Age

I get an error saying that the column Age is DBNull, which is expected since the Property in the DataTable throws an exception when you try to cast the null value.

In this case Age may be null. I have also tried this:

var query = from row in dataTable.AsEnumerable()

                  where row.IdCompany == 5

                  orderby row["Age"]

but it fails in the CompareTo method.

Any ideas?

# shankar said on September 25, 2008 2:39 AM:

Hi ,

        I just want to build the where clause of the query dynamicaly. Kindly help me out

Leave a Comment

(required) 
(optional)
(required) 

  
Enter Code Here: Required

Search

This Blog

Syndication

Page view tracker