Zip Me Up

Zip Me Up

Rate This
  • Comments 31

Suppose you’ve got a sequence of Foos and you want to project from that a sequences of Bars. That’s straightforward using LINQ:

IEnumerable<Bars> bars = from foo in foos select MakeBar(foo);

or, without the query sugar:

IEnumerable<Bars> bars = foos.Select(foo=>MakeBar(foo));

But what if you have two sequences that you want to project from? Say you’ve got two sequences of doubles that are the same length, and you want to project a sequences of points.

The operation of “project from two sequences of the same length” is called a “zip join”, because its like doing up a zipper – you need each member of the left sequence to correspond to a member of the right sequence. In the version of the base class library that will ship with C# 4.0, we’ll be adding a Zip sequence operator to the standard extension methods.

The code to do so is pretty trivial; if you happen to need this in C# 3.0, I’ve put the source code below.

We considered adding a query syntax for zip joins. Something like

from foo in foos, bar in bars select MakeBlah(foo, bar)

or

from (foo, bar) in (foos, bars) select MakeBlah(foo, bar)

or some such thing, which would be syntactically transformed into

foos.Zip(bars, (foo, bar)=>MakeBlah(foo, bar))

However, we decided that adding a query syntax for a relatively obscure operation that is not actually supported by most relational databases was not worth the cost of complicating the grammar and the syntactic translation rules.

Here’s the source code:

public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>
    (this IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, TResult> resultSelector)
{
    if (first == null) throw new ArgumentNullException("first");
    if (second == null) throw new ArgumentNullException("second");
    if (resultSelector == null) throw new ArgumentNullException("resultSelector");
    return ZipIterator(first, second, resultSelector);
}

private static IEnumerable<TResult> ZipIterator<TFirst, TSecond, TResult>
    (IEnumerable<TFirst> first,
    IEnumerable<TSecond> second,
    Func<TFirst, TSecond, TResult> resultSelector)
{
    using (IEnumerator<TFirst> e1 = first.GetEnumerator())
        using (IEnumerator<TSecond> e2 = second.GetEnumerator())
            while (e1.MoveNext() && e2.MoveNext())
                yield return resultSelector(e1.Current, e2.Current);
}

  • Clearly the query syntax is pared down to some notion of what is most commonly used, but I'm not sure whether or not an operator is supported by the majority of relational databases is a good criteria to base that decision on.  The query syntax is part of Linq, not part of a specific ORM like LinqToSql or the EF.

  • Interesting Finds: May 8, 2009

  • I would go with an implementation that matches Ruby for objects:

    irb> [1,2,3].zip([4,5])

    => [[1, 4], [2, 5], [3, nil]]

    irb> [1,2,3].zip([4,5,6,7])

    => [[1, 4], [2, 5], [3, 6]]

    Either exception out or have nil as the second member of the tuple when the zipee is shorter.

    Keep the existing behavior for zippor is shorter than the zipee

    Also this seems a bit more natural and as @OlivierLeclant said in an earlier comment, it is unzippable,  

    public static IEnumerable<Tuple<TFirst, TSecond>> Zip<TFirst, TSecond> (this TFirst first, TSecond second);

  • Anthony, I agree, and the point is that the code I posted works perfectly well on unbounded streams, as should have been obvious from both the code and the parenthetical comment in my explanation.

    Nothing in my code requires that the enumerables are bounded.  The only additional requirement that my code imposes is that IN THE CASE THAT the code encounters the end of one stream, it will check that the other stream terrminates there also.

  • @Kosak  : I just don't think it is an Exception to have different lengths. Are you telling me you code doesn't throw on different lengths?

    """ throw new Exception("Enumerables have different lengths?"); """

  • Can you include how that would be done without using LINQ and possibly a real world scenario where this could be used?

  • Reviewing what I learned from your excellent post on naming things, I have to admit that Andrey Shchekin's comment is spot on.

  • Why did you choose a 2-method implementation? What is the need for the private ZipIterator()? Can't you yield return from Zip()?

    Good question. I covered that already a while back: http://blogs.msdn.com/ericlippert/archive/tags/Psychic+Debugging/default.aspx

    -- Eric

     

  • Thanks Eric!! I use iterators all the time and was not aware that my null checks (just like in your psychic debugging example) where being deferred... doh!

  • "However, we decided that adding a query syntax for a relatively obscure operation that is not actually supported by most relational databases was not worth the cost of complicating the grammar and the syntactic translation rules."

    That's exactly the reason why I think Scala made the right decision to put Linq-similar things in libraries and instead create an expandable language. While Linq surely feels nice it can't be extended by anyone except the C# developers...

  • Scala still has syntactic sugar for query comprehensions, no? So it exposes "filter" and "map" already, and in a non-extensible way, too.

    Sure, C# goes further than that, with joins and ordering, but fundamentally there's no difference. You still want the common operations to be sugared, the only question is where to draw the dividing line.

  • Eric, will there be an Unfold in version 4?

    Not to my knowledge. But it would be trivial to write such a beast yourself. How about this?

    static IE<T> Unfold<S, T>(this S state, Func<S, bool> done, Func<S, S> next, Func<S, T> transform)
    {
      while (true) 
      {
        yield return transform(state);
        if (done(state)) yield break;
        state = next(state);
      }
    }

    -- Eric

  • Daniel Lehmann wrote:

    > That's exactly the reason why I think Scala made the right decision to put Linq-similar things in libraries and instead create an expandable language. While Linq surely feels nice it can't be extended by anyone except the C# developers...

    Linq is a pattern, anyone can adhere to the Linq pattern, which means that the implementation part of Linq is extensible at the core, the only part of Linq that is not extensible is the comprehension syntax (you can't add new comprehension keywords, but you can make the comprehension syntax translate to entirely different "member" calls, usually methods). Other than adding new language keywords, you can extend Linq in any possible way imaginable (you can completely replace Linq to SQL, or Linq to XML, or Linq to Objects with your implementation, or simply mix other implementations with your own selectively)...

    The comprehension syntax in linq is simply a loose pattern matching (that works by translating to appropriate "member" calls (not just methods), using regular member resolution rules), the rest of Linq is straight forward libraries with standard CLR types that adhere to the linq pattern, and which can be implemented by anyone.

  • Re: Unfold, thanks! I already wrote my own in a simpler form where the transform part is left out (just stick a Select after it if you need that). I guess everyone using Linq is writing their own slightly different version of it.

  • The funny thing is that the code to use the zip join is overly expensive to maintain and develop for in production quality commercial software.  The cost is not really in lines of code but in the mental effort needed to understand and verify what is actually happening (easy to verify in a single 10 line chunk but not easy to do when you have 500+ methods using similar coding constructs).

    It fails the Monday morning, did not sleep the night before, code takes too long to understand test (even for code you wrote a week ago).

    list ret = new list()

    int x = max( list1.length, list2.length )

    for (int ctr = 0; ctr < x; ctr++)

    {

     element o1 = o2 = null

     if (x < list1.length)     o1 = list1[x];

     if (x < list2.length)     o2 = list2[x];

     e = ret.addnew(o1, o2)

    }

    Concentrating effort on better high level algorithms, data base organization and project organization will yield many times more productivity.

    The straight line sequential code is easier to take over and maintain by a developer unfamiliar with the rest of the application's code.

Page 2 of 3 (31 items) 123