Welcome to MSDN Blogs Sign in | Join | Help

Simon Ince's Blog

Ramblings of an Application Development Consultant in the UK
Loving LINQ

I have to say, I love how easy it is to do SQL-like expressions over collections of objects with LINQ...

If we have two simple classes that define our data;

class Person

{

    public int Id { get; set; }

    public string Name { get; set; }

    public int AddressId { get; set; }

}

 

class Address

{

    public int Id { get; set; }

    public string Line1 { get; set; }

    public string PostCode { get; set; }

}

Then we can “join” them in memory with a quick LINQ query;

static void JoiningCollections()

{

    // create some test data

    List<Person> people = new List<Person>()

    {

        new Person() { Id=1, Name="Simon Ince", AddressId=1},

        new Person() { Id=2, Name="Someone Else", AddressId=1},

        new Person() { Id=3, Name="My Friend", AddressId=2},

        new Person() { Id=4, Name="My Other Friend"}

    };

 

    // create some more test data

    List<Address> addresses = new List<Address>()

    {

        new Address() { Id=1,

                        Line1="1 Aroad",

                        PostCode="A1 B23" },

        new Address() { Id=2,

                        Line1="3 Another Road",

                        PostCode="Z1 ZZZ" }

    };

 

    // define a really simple query

    var results =

        from p in people

        join a in addresses

        on p.AddressId equals a.Id

        select new { p.Name, a.Line1, a.PostCode };

 

    // and enjoy the simplicity of the results

    foreach (var result in results)

    {

        Console.WriteLine(String.Format("{0} lives at {1}, {2}.",

                         result.Name,

                         result.Line1,

                         result.PostCode));

    }

}

 

Imagine how many more lines of code you would have to write to achieve that before LINQ existed.

The one thing I don’t like about this is that you may need to avoid the “var” keyword when returning result sets to a calling function. No problem... just add a class that can pair up your names and addresses;

class PersonsAddress

{

    public Person Person { get; set; }

    public Address Address { get; set; }

}

... and change your query and output routine to use it;

IEnumerable<PersonsAddress> results =

    from p in people

    join a in addresses

    on p.AddressId equals a.Id

    select new PersonsAddress { Person=p, Address=a };

 

foreach (PersonsAddress result in results)

{

    Console.WriteLine(String.Format("{0} lives at {1}, {2}.",

                     result.Person.Name,

                     result.Address.Line1,

                     result.Address.PostCode));

}

Hey presto – you have just joined together two sets of objects in memory, and could now return them as a strongly typed collection of data. What is also interesting is that by using the IEnumerable<PersonsAddress> type this query won’t be executed until you start to iterate over the results.

The only gotcha with this is that I dread the day when I see two massive collections of data retrieved from a database and joined in memory... so be careful! Bear in mind that in memory joins like this will almost certainly be slower than when done in an engine designed to do it all the time – SQL Server. I say almost as you could point out there is overhead to hit a database, and if you have two collections containing only a few items each I might be persuaded it is quicker to do it in memory... but you see my point, non?!

Still, you’ve got to love LINQ.

Posted: Tuesday, October 30, 2007 1:41 PM by simonince
Filed under: ,

Comments

Bart Czernicki said:

Nice intro article.  However, ur paragraph on the bottom, couldn't be further from the truth about large collections :)  

My tests show 100x improvements when working with LINQ in memory with larger collection sets (> 200,000 rows), without any optimizations!!!

- add in custom indexes (i.e. Hash indexes)

http://blog.magenic.com/blogs/aarone/archive/2007/04/09/Introducing-i4o.aspx

- PLINQ (coming soon to a developer near u, some people aready have the bits)

http://spellcoder.com/blogs/bashmohandes/archive/2007/10/14/8530.aspx

- make ur own custom Expression trees optimized

You get even faster results!  PLINQ and hash indexes are just sick on performance.  Check out the SQL 2005 Optimizations book on how SQL handles Hash joins and how they are preferred for large data sets and u can do the same for LINQ!!!

I have queries that take 6 seconds to run in SQL Server coming back in 27ms...(no I am not missing a zero)

# October 31, 2007 5:40 PM

simonince said:

Yes, I should probably have qualified that comment a bit further! I take your points - and I have to say the indexing ideas are very interesting, and potentially fantastically useful in some situations - but I'd still advise caution.

My main concern is in multi-user scenarios - e.g. ASP.NET. The idea of many users pulling back 200,000 objects each, joining them in memory concurrently, etc, etc... watch that scalability plunge! I'm keen that people understand what LINQ is doing - when it is optimising a SQL query, and when it is just operating in memory, as misunderstanding this could leave you with a headache.

On the other hand, database servers (although incredibly advanced) are not magical - the fact is they have to do a lot of processing that can be simulated in .NET code... so if you write something approaching the same algorithms, but don't have the overhead of network or cross process communication, serialization, and so on, I can completely believe that in some scenarios you can beat a SQL server.

We also shouldn't forget that there is no like-for-like comparison here. SQL can balance reads with writes, can scale up massively, has availability and recovery options, etc, etc... so as usual I suppose it is a case of picking your approach according to the best fit at the time!

# November 2, 2007 6:35 AM
Leave a Comment

(required) 

(required) 

(optional)

(required) 

  
Enter Code Here: Required

Comment Notification

If you would like to receive an email when updates are made to this post, please register here

Subscribe to this post's comments using RSS

Page view tracker