Some meeting notes [Kit George]

Some meeting notes [Kit George]

  • Comments 10

All, thought I would post some notes from a recent meeting we had with the WebData team. This is in relation to a Tree class they have created internally in their namespaces, and what we should consider doing with it. The reference to 'our redblack' tree reflects a type you'll see in beta2.

Tree: created by WebData

-          The webdata team has made a tree, which is currently non-public. They wanted to consider whether this should be public, and whether they could use a generic version. Gang explained our current position on SortedDictionary, our generic redblack tree

-          Webdata asserted that our redblack tree does not satisfy their requirements

-          They also stated that some of our collections don’t meet the requirements for particular scenarios for our users. For example, adding a million rows to a SortedList takes around 30 minutes, but a redblack tree takes around 30 seconds. This is huge

-          End issue was: they believe their tree is good for general use, should we not consider having it in the BCL? One of the key things it does is that it has a key, value, and index

o        We have not heard the mass of request for this, but as with all collections, we’ll keep our heads up looking for requests. If the number of requests becomes significant, we’ll consider adding this in a future version

 NOTE: Redblack tree implementation underneath SortedList is all goodnesss for scalability and performance while operating with 100s of 1000s of rows but while operating in the lower end of the spectrum say less than 5000 rows (I don't have the exact perf data here), it would actually add to the overhead than any perf again compared to using simple list. What I'm getting at is there would be a break-even point before which using redblack tree is probbaly worse for your perf. Memory usage would be another point to consider as well.

-          Things we can consider for the future:

o        Do we have to handle duplicates in the class? Do we want to?

o        GC friendliness

o        Do we want to provide positioning (index)

-          Goal for VNext: consider how we can provide a tree, or tree like structure(s) which could meet the needs of the webdata team

Other issues:

-          Fix Dictionary, so that it is single writer safe. This means consumers have to stick locks everywhere to ensure the right thing happens (or, use Hashtable)

-     WebData suggestion: Consider making a HybridDictionary<T>, a generic version of HybridDictionary

  • "We have not heard the mass of request for this, but as with all collections, we’ll keep our heads up looking for requests."

    I appreciate the position that sometimes must be taken to make a statement like this. But as a developer who is accutely aware of how much one good algorithm can make a process hum, this just really burns me when I hear this over, and over, and over.

    First you are never going to get mass requests for stuff like this because the percentage of developers that know that certain algorithms are needed is really very small. Sorry but it's the truth.

    Second, everyone that's ever tried to use SortedList for large lists knows that it sucks. If they don't know that the collection sucks they are probably thinking to themselves, man .NET sure is slow. Which is worse. To me it is someone thinking .NET sucks rather than thinking the algorithm sucks. The implementation of SortedList is really something that a first year CS student would write, not something that a software company the size of MS would produce. Sorry but that's an even bigger truth.

    Third, the number of collections that have been made publicly available should give weight to the case that better implementations need to be written. I know a lot of developers will not use free 3rd party collections because of GPL issues or they just don't trust someone else's code. But if the same collection was made a part of the BCL then they would more than likely use it.

    When you try and hide the implementation details of a collection you really owe it to the developers to either provide the best possible implementation of that collection or provide other similar alternatives that allow for more and better choices.

    OK off my soap box. Now you guys get back to writing great collections.
  • Just a week ago we had a need for a collection with the key AND the index. So, please, count it as a request. After all, if there is an implementation that works faster than current one and supplies bigger functionality, i don't see any good reasons not to use it.
  • +1
  • First, a response for Kelly. A 'mass of requests' does not mean 500 requests before we say 'yup, we'll do that'. But no the same side, 1-2 doesn't normally fit the bill. We like at least a handful of requests before we can really consider an item critical.

    Furthermore, whenever we consider things like this, you've gotta know that for every person clammering for feature X, there's just as many clammering for feature Y. In this case, adding one collection, simply makes others ask 'why not this one?'. What we're really assessing is how important a feature is, to both the team, and customers.

    But it's good to hear the issues come out, thanks for the straight-forwardness. We're really trying to ensure we nail the generic collections, just a head at a time.

    For Andrew/Scott: thanks for the vote. We do have the new Collection<T> class, but I'm assuming you're asking for that one to be sped up?
  • One point to add to Kit's post above:

    Redblack tree implementation underneath SortedList is all goodnesss for scalability and performance while operating with 100s of 1000s of rows but while operating in the lower end of the spectrum say less than 5000 rows (I don't have the exact perf data here), it would add to the overhead and would actually fair poorly compared to using simple list. What I'm getting at is there would be a break-even point before which using redblack tree is probbaly worse for your perf. Memory usage would be another point to consider as well.
  • Ravi, perhaps SortedList should change its underlying implementation at the break even point you speak of. Like HybridDictionary. That lowers the friction of having multiple classes for novice developers to choose from and provides the optimized performance for a larger percentage of scenarios.
  • I'll vote for the red-black tree. There have been several cases where I could have used one of those.
  • For those of you who want an implementation of R-B Trees now, you can do a lot worse than checking out Scott Mitchell's article on MSDN (part of his excellent Data Structures series):

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/datastructures_guide.asp

    (It's part 4 if the URL doesn't take you straight there.)

    I'd love to see a replacement or enhancement of the current Sort-related implementations as perf can be a problem with very large sets. I'd also love to see a decent reference string radix sort and suffix tree implementation, even if they were in a special namespace, but I realise that's reaching a bit ;)

    Keep up the great work!
  • PingBack from http://portablegreenhousesite.info/story.php?id=15784

  • PingBack from http://portablegreenhousesite.info/story.php?id=25591

Page 1 of 1 (10 items)