My Proposal For Optimizing LINQ

Code Red > .NET

(1/5) > >>

TheMaster:
http://connect.microsoft.com/VisualStudio/feedback/details/510253/linq-optimization-with-non-generic-icollections

This was originally brought about as a result of having to deal with Autodesk's failure to implement IEnumerable<T>, ICollection<T>, and IDictionary<TKey, TValue> on many AutoCAD types that expose collections as non-generic IEnumerable, ICollection, or IDictionary.

Kerry:

Thanks for the link Tony.

A bit over 2 years since the original posting ... Is the issue actually closed/resolved ... can't make the time to read fully at the moment.

fixo:
You're really bad, Tony :)

TheMaster:

--- Quote from: Kerry on February 08, 2012, 09:32:40 PM ---
Thanks for the link Tony.

A bit over 2 years since the original posting ... Is the issue actually closed/resolved ... can't make the time to read fully at the moment.

--- End quote ---

It's closed, but as a result they did make some changes in .NET 4.0 (they now support both the generic and non-generic ICollection at every point where they previously supported only ICollection<T>), and that was a direct result of that feedback, but I have been lobbying for more, along the lines of my prototype.

The summary of the proposal is that, with the changes I outlined implemented, the number of items that will be returned by a complex LINQ expression like what follows can be determined without having to execute the expression and iterate over the returned items (as is required today):

--- Code: ---
ICollection items = // assigned to non-generic ICollection
var expr = items.Cast<TSomething>().Select( a => a.b ).OrderBy( b => b.c );

// here, the count is obtained without executing the LINQ expression:
int count = expr.Count();

--- End code ---

The scheme involves adding an interface to the objects returned by
Select(), OrderBy(), Cast() (and others), that would allow each of
them to query their adjacent operator/source argument for a count.

Because each of those iterator objects returned by the LINQ operators
shown in the example are often the source of another iterator object,
the query can propagate down the calling chain until one of the objects
in it returns a count, that would then be returned back up the chain of
callers until it reaches the Count() method call and becomes its result,
and we've deduced the number of items in the result of the composite
LINQ expression without executing it. :-o

Today, as the design exists in all current versions of the framework, the
call to Count() in the above example triggers execution of the LINQ
expression, to produce a resulting count, and the expressions result is
unreachable, requiring the expression be executed again, in order to
make use of it.

The optimizations are not only for the sake of the Count() method.
They're also beneficial to ToArray(), ToList(), ToDictionary(), ToLookup(),
and so forth, because they must incrementally grow an array in order
to build their result when the count of items is unknown. If the count
of items is known, they can allocate the needed array capacity in one
swell foop (similar to how the List<T>(int capacity) constructor works).

So in the above example, the Count() method would ask the object
returned by OrderBy() for a count. The object returned by OrderBy()
would in-turn ask the object returned by Select() for a count, which
in-turn asks the object returned by Cast() for a count. The object
returned by Cast() would examine its source argument; see that it is
an ICollection (that knows how many elements it contains), and simply
return the value if it's ICollection.Count property.

The caveat (of course there's always one or two), is that this scheme
only works with LINQ operators that return a determinant number of
items. LINQ operators whose result count is indeterminate and depend
on conditional branching (an example of which is the Where() operator)
cannot implement the interface or participate in the scheme, since they
can't determine how many items they will yield without actually executing
and producing the result.

retsameht:

--- Quote from: Kerry on February 08, 2012, 09:32:40 PM ---
Thanks for the link Tony.

A bit over 2 years since the original posting ... Is the issue actually closed/resolved ... can't make the time to read fully at the moment.

--- End quote ---

Hi Kerry.

Well, 12 years later and 14 years since I posted it, they finally got around to implementing my proposal (in .NET 6.0), in the form of TryGetNonEnumeratedCount(), and IIListProvider<T>. While their implementation is somewhat more intricate than my humble proof-of-concept, it is conceptually based on it, precisely.

Attached is the original code from that proposal, in which my TryGetCount() method materialized as TryGetNonEnumeratedCount(), and my IFixedSizedEnumerable<T> interface materialized as IIListProvider<T>.

Back when I first came up with this, it was significantly faster than the framework 4.7 Linq runtime. However, they did quite a bit of optimization since, and now the .NET 8.0 runtime's implementation of the aforementioned optimizations are roughly 2x faster than my proof-of-concept code. So, don't bother with it. :laugh:

They also implemented my proposals for CountBy() and DistinctBy() which I may have posted about here around the same time frame, but can't remember.

Navigation

[0] Message Index

[#] Next page

Go to full version