Fabulous Adventures In Coding
Eric Lippert is a principal developer on the C# compiler team. Learn more about Eric.
Ed "Scripting Guy" Wilson was kind enough to ask me to be a guest commentator at this years Summer Scripting Games, which have just completed.
I've been working on a series for this blog about some unusual cases in the design of the "iterator block" feature in C# 2.0; this bit from my commentary is going to be germane to that series. I thought I'd post it here as an appetizer before we get into the main courses of the series. The series on iterators will probably run throughout July.
The problem for event ten of the scripting games was to write a script which changes the priority of every new process that has a particular name.
There’s an odd thing that you learn when working on developer tools: The people who design and build the tools are often not the experts on the actual real-world use of those tools. I could tell you anything you want to know about the VBScript parser or the code generator or the runtime library, but I’m no expert on writing actual scripts that solve real problems. This is why I was both intrigued and a bit worried when the Scripting Guys approached me and asked if I’d like to be a guest commentator for the 2009 Summer Scripting Games.
I wrote this script the same way most scripters approach a technical problem that they don’t immediately know how to solve; I searched the Internet for keywords from the problem domain to see what I could come up with. Of course, I already knew about our MSDN documentation, I had a (very) basic understanding of WMI, and I knew that the Scripting Guys had a massive repository of handy scripts.
My initial naïve thought was that I would have to go with a polling solution; sit there in a loop, querying the process table every couple of seconds, waiting for new processes to pop up. Fortunately, my Internet searches quickly led me to discover that process startup events can be treated as an endless collection of objects returned by a WMI query.
That got me thinking about the powerful isomorphism between events and collections.
A collection typically uses a “pull” model — the consumer asks for each item in the collection one at a time as needed, and the call returns when the item is available. Events typically work on a “push” model — the consumer registers a method that gets called every time the event fires. But not necessarily; the WMI provider implements events on a “pull” model. The event is realized as a collection of “event objects.” It can be queried like any other collection. Asking for the next event object that matches the query simply blocks until it is available.
Similarly, collection iterators could be implemented on a “push” model. They could call a method whenever the next item in the collection becomes available. The next version of the CLR framework is likely to have standard interfaces that represent “observable collections”, that is, collections that “push” data to you, like events do. The ability to treat events as collections and collections as events can lead to some interesting and powerful coding patterns.
The rest of the solution and analysis is here.
Have a good weekend!
I generally prefer pulling to pushing. The reason I prefer it is because when you are being pushed data, it's sometimes hard to know if you have enough data yet without doing half the work then aborting.
For example, I bet the compression stream Write methods are more complicated than the Read methods, because they have to wait for enough data before they can emit the next byte. The read method just asks for the data it wants, and that's it.
I don't like it when things are needlessly complicated, so I decided to try to turn allow pull methods to accept 'push' data. The major challenge is the lack of continuations or coroutines in .Net, so I used a separate thread instead. When data arrives the push thread passes control to the processing thread, which runs until it requests more data than is available, at which point it passes control back to the push thread. Unfortunately the requirement of that extra thread makes this only useful in limited cases.
I thought a commentator was supposed to make mindless comments! Where were the corny sports metaphors?
"To win this competition, that script will really need to go out and execute. Really, you just need a little bit of I/O to keep the drive alive. That may not be a full line of output...it depends on where they spot the cursor. A new Dictionary object would give him some more room to operate. That runtime error was definitely due to a blown coverage test. Now that was an ill-advised pass through the loop. He'll try and tack on the extra point with an autoincrement."
"Similarly, collection iterators could be implemented on a “push” model." - you hay have already seen it, but if not, take a peek at PushLINQ in MiscUtil; Jon and I had fun making this work as a full LINQ implementation where data is pushed, not pulled.
http://msmvps.com/blogs/jon_skeet/archive/2008/01/04/quot-push-quot-linq-revisited-next-attempt-at-an-explanation.aspx & http://www.yoda.arachsys.com/csharp/miscutil/
If I'm not mistaken, one example of collection iterators that push is SAX. http://en.wikipedia.org/wiki/Simple_API_for_XML
> The next version of the CLR framework is likely to have standard interfaces that represent “observable collections”
Do you mean 4.0, or the next version after that?
Oh, and I also can't help but remember F#'s IEvent, which can work in either pull or push. Given the ongoing type unification between languages (e.g. F# and IronPython separate tuples being replaced by a single common System.Tuple), I wonder if that upcoming feature is also going to be used in a similar way - for IEvent by F#, and possibly for something similar in C#?
Keeping state and using 'push' is the price you pay for non-blocking operation. But who said C# doesn't have coroutines? Since Eric is talking about iterators, it's the right place to point out that iterator blocks (the "yield return" construct) are nothing other than coroutines.
There is already INotifyCollectionChanged and INotifyPropertyChanged. Or do you mean something like the abstraction of parsing & registering callbacks and filters to have more control over when your code is called?
> There is already INotifyCollectionChanged and INotifyPropertyChanged.
Don't forget IBindingList.ListChanged, which (according to MSDN) goes all the way back to .NET 1.1
"Why exactly is this exciting? Or are we really just that impressed at someone reinventing pthread_cond_wait?"
imagine a fold on an infinite sequence where the accumulator is always available to query at any point.
All your usual tools/functions for dealing with sequences would 'just work', I like the expressibiity implied.
The compiler can transform something like this already
public static IEnumerable<TResult> FoldWithPeek(this IEnumerable<T> s, TResult acc, Func<TResult,T,TResult> op)
yield return acc;
foreach (var t in s)
acc = op(acc,t);
yield return acc;
or you could be more imperative in style
public static TResult FoldWithPeek(this IEnumerable<T> s, ref TResult acc, Func<TResult,T,TResult> op)
and the compiler inserts the magic to let acc be read (but not written to) elsewhere if you need to refer to it. I prefer the co-routine style (you are letting the compiler/platform deal with the underlying thread management (the ref value wouldn't play nice if the stack moved for example), heck it could use fibers or somesuch if it wanted to.
If you haven't seen this talk by Erik Meijer at Lang.NET 2009 you should give it a viewing..
He pretty much nails "the powerful isomorphism between events and collections" by using combinators with IObserverable reactive LINQ queries instead of IEnumerables..
The easier way to do this is to use powershell to build a batch file for your to change the process's priority. You would then difference (using FC.exe) this batch file with the last batch file you had to find new lines (i.e., new processes) and output that to a batch file. Run that batch file. Rename your output file to old.txt and use it for the next run's file compare.
I've used this trick of building a batch file via command line tools many times to simplify what would involve lots of vbscript or win32 api coding.
Great news! Sound like the ReactiveLinq http://tomasp.net/blog/reactive-ii-csevents.aspx
Though a nice article, take no offence: sounds like you've just found a wheel.
This isomorphism is well-known (at least from my point of view) and is also used in VERY many places.
Take for example plain old send/receive pair for berkeley sockets. RECV is blocking data reading mechanisms. Almost wherever you look at OOP send/recv-like communication, you find that recv is loop'ed to wait until whole object is downloaded and then returns the whole object. More bright C++'ers envelop that method in an .. iterator, to use such sequence easily with whole STL/Boost/etc libraries. Mind the whole "stream/collection" approach directly leads to this. Yep, writing your own iterator is not a task for a complete beginner, but after writing two or three templates, its asy enough to consider it a tool rather than esoteric feature.
Or look at Ruby or Lua, where lambdas thrive as the blocks/procs/coroutines, implementing such dynamic sequence iterators is so plain dumb trivial, that there's hard NOT TO write any streaming iterators at all. if you want to read lines from file and parse them, you open the file, call "enumeratormethod" and supply it with your callback procedure, that will be called back on every single line found in the file. Similar for sockets, webrequests, whatever. It's literally everywhere.
Totalling up: whenever you perform blocking read from whatever source, you actually use this isomorphism, and it is well-known that every blocking read can be transformed into- and backfrom- an 'on completion' callback. Existence of AsyncIO proves that well enough.