As promised in the previous blog, I am going to talk about the common mistake and trciks on LEFT join in LINQ to SQL.
Common issues I saw?
1. Using DefaultIfEmpty in value type variable.
If you copy the LEFT join example from internet and then use it directly against value type such as Int[], then, you query is not truly LEFT join.
from x in TXs
join y in TYs on x.YId equals y.YId into yG
from y1 in yG.DefaultIfEmpty()
select new { X = x, Y =y1 }
As the name of the method suggested, DefaultIfEmpty will check whether int[] is empty, if it is empty, it will become int[]{0}, since 0 is the default of int. You will end up with 0 instead of null in the right integer array when there is no match in the left integer array.
2. Didn't define proper relationship and make your query unnecessary complicated.
Assume there is a one to many relationship between table X and Y, if you don't default proper relationship in your mapping, you will require to do this for inner join
from y in TYs
where x.YId == y.YId
select new { x, y }
If you declare the relationship in your mapping, you can write your query like this
select new {x, x.Y}
3. Not being flexible on query writing.
If you familiar with SQL, sometime a query can be rewrite into different format, for example
SELECT TABLEX.*
FROM TableX LEFT Join TableY ON TableX.YID = TableY.YID
WHERE TableY.YID IS NULL
can be rewrite into
FROM TableX
WHERE TableX.YID NOT IN (SELECT Disctinct TableY.YID FROM TableY}
Be flexible! Write your Linq query in another format if it make more sense.
4. Try to write everything in one query
Be flexible! If it make sense, just break your gaint query into multiple small queries.
5. Think in object way only, and don't understand the underneath data store
This is a problem which I saw mainly on LINQ to SQL/Entity. People don't know that they are querying database when they are programming in .NEt language such as C#. Sometime, it cause huge performance issue. For example, I saw application do this...
foreach(X x in db.TableXs){
if (x.Y.YID == 1) continue;
}
So, db.TableXs will send 1 SQL query to SQL database, and x.Y.YID will send 1 query to SQL database. If there is 100 row of data in Table X, then, the code above will trigger 101 queries.
Sometime it is much cheaper if you write one query outside the loop.