Author Topic: My Proposal For Optimizing LINQ (Read 3300 times)

TheMaster · « **on:** February 08, 2012, 05:58:53 PM »

http://connect.microsoft.com/VisualStudio/feedback/details/510253/linq-optimization-with-non-generic-icollections

This was originally brought about as a result of having to deal with Autodesk's failure to implement IEnumerable<T>, ICollection<T>, and IDictionary<TKey, TValue> on many AutoCAD types that expose collections as non-generic IEnumerable, ICollection, or IDictionary.

Kerry · « **Reply #1 on:** February 08, 2012, 09:32:40 PM »

Thanks for the link Tony.

A bit over 2 years since the original posting ... Is the issue actually closed/resolved ... can't make the time to read fully at the moment.

fixo · « **Reply #2 on:** February 09, 2012, 01:04:42 AM »

You're really bad, Tony

TheMaster · « **Reply #3 on:** February 09, 2012, 05:29:23 AM »

Quote from: Kerry on February 08, 2012, 09:32:40 PM

Thanks for the link Tony.

A bit over 2 years since the original posting ... Is the issue actually closed/resolved ... can't make the time to read fully at the moment.

It's closed, but as a direct result of that feedback they did make some changes in .NET 4.0 (they now support both the generic and non-generic ICollection at every point where they previously supported only ICollection<T>). I have been lobbying for more, though, along the lines of my prototype.

The summary of the proposal is that, with the changes I outlined implemented, the number of items returned by a complex LINQ expression like the following can be determined without executing the expression and iterating over the returned items (as is required today):

Code - C#:


  ICollection items = GetItems();  // GetItems() is a placeholder for any non-generic ICollection source
  var expr = items.Cast<TSomething>().Select( a => a.b ).OrderBy( b => b.c );

  //  under the proposal, the count is obtained without executing the LINQ expression:
  int count = expr.Count();

The scheme involves adding an interface to the objects returned by
Select(), OrderBy(), Cast() (and others), that would allow each of
them to query their adjacent operator/source argument for a count.

Because each of the iterator objects returned by the LINQ operators
shown in the example is often the source of another iterator object,
the query can propagate down the calling chain until one of the objects
in it returns a count. That count is then passed back up the chain of
callers until it reaches the Count() method call and becomes its result,
and we've deduced the number of items in the result of the composite
LINQ expression without executing it.
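A minimal sketch of the propagation scheme described above. The interface `ICountProvider`, the helper `CountQuery.TryGetCount()`, and the stand-in operators `CastOperator<T>` and `SelectOperator<TSource, TResult>` are hypothetical names used only for illustration; this is not the code attached to the original proposal, nor the .NET runtime's implementation. Each operator answers a count query by forwarding it to its source, until something that knows its size (here, a non-generic ICollection) answers:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical interface: a sequence that may be able to report how many
// items it will yield without being enumerated.
public interface ICountProvider
{
    bool TryGetCount(out int count);
}

public static class CountQuery
{
    // Asks a source for its count; succeeds only if no enumeration is needed.
    public static bool TryGetCount(IEnumerable source, out int count)
    {
        switch (source)
        {
            case ICountProvider provider:   // another operator: forward the query
                return provider.TryGetCount(out count);
            case ICollection collection:    // a non-generic collection knows its size
                count = collection.Count;
                return true;
            default:                        // unknown source: give up
                count = -1;
                return false;
        }
    }
}

// Stand-in for Cast<T>(): one output item per input item.
public sealed class CastOperator<T> : IEnumerable<T>, ICountProvider
{
    private readonly IEnumerable source;
    public CastOperator(IEnumerable source) => this.source = source;

    public bool TryGetCount(out int count) => CountQuery.TryGetCount(source, out count);

    public IEnumerator<T> GetEnumerator()
    {
        foreach (object item in source) yield return (T)item;
    }
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Stand-in for Select(): also one-to-one, so the count passes straight through.
public sealed class SelectOperator<TSource, TResult> : IEnumerable<TResult>, ICountProvider
{
    private readonly IEnumerable<TSource> source;
    private readonly Func<TSource, TResult> selector;

    public SelectOperator(IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        this.source = source;
        this.selector = selector;
    }

    public bool TryGetCount(out int count) => CountQuery.TryGetCount(source, out count);

    public IEnumerator<TResult> GetEnumerator()
    {
        foreach (TSource item in source) yield return selector(item);
    }
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class CountPropagationDemo
{
    // Builds Cast -> Select over an ArrayList and asks the outermost operator
    // for its count. Also reports how many times the selector actually ran.
    public static (bool found, int count, int selectorCalls) Run()
    {
        var items = new ArrayList { 3, 1, 2 };
        int calls = 0;
        var expr = new SelectOperator<int, int>(
            new CastOperator<int>(items),
            x => { calls++; return x * 10; });
        bool found = CountQuery.TryGetCount(expr, out int count);
        return (found, count, calls);
    }
}
```

Calling `CountPropagationDemo.Run()` reports a count of 3 while the selector has run zero times. An operator like Where() would simply not implement the interface, so the query would stop there and fail.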

Today, as the design exists in all current versions of the framework, the
call to Count() in the above example triggers execution of the LINQ
expression to produce a count, and the expression's result is then
unreachable, so the expression must be executed again in order to
make use of it.
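To illustrate the double execution with standard LINQ, here is a small sketch that counts how many items a yield-based source produces. The `DoubleExecutionDemo` class and its `Produced` counter are illustrative instrumentation, not part of LINQ:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DoubleExecutionDemo
{
    // How many items the source iterator has produced so far.
    public static int Produced;

    // A yield-based iterator: nothing about it reveals its length up front.
    static IEnumerable<int> Source(int n)
    {
        for (int i = 0; i < n; i++)
        {
            Produced++;
            yield return i;
        }
    }

    public static (int count, int afterCount, int afterToList) Run(int n)
    {
        Produced = 0;
        var expr = Source(n).Select(x => x * 2);
        int count = expr.Count();   // enumerates the whole chain once
        int afterCount = Produced;
        _ = expr.ToList();          // and enumerates it all over again
        return (count, afterCount, Produced);
    }
}
```

`DoubleExecutionDemo.Run(5)` returns (5, 5, 10): the source is walked once by Count() and a second time by ToList().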

The optimizations are not only for the sake of the Count() method.
They're also beneficial to ToArray(), ToList(), ToDictionary(), ToLookup(),
and so forth, because they must incrementally grow an array in order
to build their result when the count of items is unknown. If the count
of items is known, they can allocate the needed array capacity in one
swell foop (similar to how the List<T>(int capacity) constructor works).
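The cost of not knowing the count up front can be observed directly: when a List<T> runs out of room it allocates a larger array (typically doubling its capacity) and copies the existing items across. A small sketch, where `CapacityDemo` is a hypothetical helper that counts how often Capacity changes while items are added one at a time:

```csharp
using System.Collections.Generic;

public static class CapacityDemo
{
    // Adds n items and counts how often the backing array was replaced,
    // observed as a change in List<T>.Capacity.
    public static int CountReallocations(List<int> list, int n)
    {
        int reallocations = 0;
        int capacity = list.Capacity;
        for (int i = 0; i < n; i++)
        {
            list.Add(i);
            if (list.Capacity != capacity)
            {
                reallocations++;
                capacity = list.Capacity;
            }
        }
        return reallocations;
    }
}
```

Starting from `new List<int>()`, adding 100,000 items changes the capacity many times; starting from `new List<int>(100_000)`, which is what a known count makes possible, it never changes.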

So in the above example, the Count() method would ask the object
returned by OrderBy() for a count. The object returned by OrderBy()
would in-turn ask the object returned by Select() for a count, which
in-turn asks the object returned by Cast() for a count. The object
returned by Cast() would examine its source argument, see that it is
an ICollection (which knows how many elements it contains), and simply
return the value of its ICollection.Count property.
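In .NET 6 and later, the same walk down the chain can be observed with `Enumerable.TryGetNonEnumeratedCount()`. This sketch uses a generic List<T> source rather than Cast() over a non-generic ICollection, and the counters inside the lambdas are illustrative instrumentation. I'd expect a count of 4 with both counters still at 0, because OrderBy() forwards the question to Select(), which answers from the underlying List<T>; treat that as an expectation to verify on your runtime rather than a guarantee:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RuntimeChainDemo
{
    public static (bool found, int count, int selectorCalls, int keyCalls) Run()
    {
        var source = new List<int> { 5, 3, 8, 1 };
        int selectorCalls = 0, keyCalls = 0;

        var expr = source
            .Select(x => { selectorCalls++; return x * 2; })
            .OrderBy(x => { keyCalls++; return x; });

        // Succeeds only when the count can be obtained without enumerating.
        bool found = expr.TryGetNonEnumeratedCount(out int count);
        return (found, count, selectorCalls, keyCalls);
    }
}
```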

The caveat (of course there's always one or two) is that this scheme
only works with LINQ operators that return a determinate number of
items. LINQ operators whose result count is indeterminate and depends
on conditional branching (an example of which is the Where() operator)
cannot implement the interface or participate in the scheme, since they
can't determine how many items they will yield without actually executing
and producing the result.
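The same distinction shows up in the .NET 6+ API: Select() over a List<T> can report its count without enumerating, while Where() cannot. A minimal sketch (`DeterminacyDemo` is just an illustrative wrapper):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeterminacyDemo
{
    public static (bool selectKnown, bool whereKnown) Run()
    {
        var source = new List<int> { 1, 2, 3, 4, 5 };

        // One output per input: the count is knowable without enumerating.
        bool selectKnown = source.Select(x => x * x).TryGetNonEnumeratedCount(out _);

        // The count depends on the predicate, so it can't be known up front.
        bool whereKnown = source.Where(x => x % 2 == 0).TryGetNonEnumeratedCount(out _);

        return (selectKnown, whereKnown);
    }
}
```

`DeterminacyDemo.Run()` should return (true, false).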

retsameht · « **Reply #4 on:** February 21, 2024, 03:20:41 AM »

Quote from: Kerry on February 08, 2012, 09:32:40 PM

Thanks for the link Tony.

A bit over 2 years since the original posting ... Is the issue actually closed/resolved ... can't make the time to read fully at the moment.

Hi Kerry.

Well, 12 years later (and 14 years after I posted it), they finally got around to implementing my proposal in .NET 6.0, in the form of TryGetNonEnumeratedCount() and IIListProvider<T>. While their implementation is somewhat more intricate than my humble proof of concept, it is conceptually based on precisely the same idea.

Attached is the original code from that proposal, in which my TryGetCount() method materialized as TryGetNonEnumeratedCount(), and my IFixedSizedEnumerable<T> interface materialized as IIListProvider<T>.

Back when I first came up with this, it was significantly faster than the framework 4.7 LINQ runtime. However, they've done quite a bit of optimization since, and the .NET 8.0 runtime's implementation of the aforementioned optimizations is now roughly 2x faster than my proof-of-concept code. So, don't bother with it.

They also implemented my proposals for CountBy() and DistinctBy() which I may have posted about here around the same time frame, but can't remember.

kdub_nz · « **Reply #5 on:** February 21, 2024, 03:21:48 PM »

Thank you Tony, great to see you back here . . and with most excellent news.
I'll do some study in the upcoming week.

Very clever name choice !!

Quote from: DrWhoWiki

An incarnation of the Master - seemingly one of the last of his original regeneration cycle

Stay well

57gmc · « **Reply #6 on:** February 21, 2024, 06:11:36 PM »

Quote from: kdub_nz on February 21, 2024, 03:21:48 PM

Very clever name choice !!

I thought it was maybe Arabic.

kdub_nz · « **Reply #7 on:** February 21, 2024, 07:40:11 PM »

Quote from: 57gmc on February 21, 2024, 06:11:36 PM

Quote from: kdub_nz on February 21, 2024, 03:21:48 PM

Very clever name choice !!
I thought it was maybe Arabic.

Perhaps it is that too,

. . . but read it backwards

57gmc · « **Reply #8 on:** February 21, 2024, 07:47:01 PM »

Quote from: kdub_nz on February 21, 2024, 07:40:11 PM

Quote from: 57gmc on February 21, 2024, 06:11:36 PM
Quote from: kdub_nz on February 21, 2024, 03:21:48 PM

Very clever name choice !!
I thought it was maybe Arabic.

Perhaps it is that too,

. . . but read it backwards

I got it. It was just my attempt at humor. :0-)

JohnK · « **Reply #9 on:** February 21, 2024, 09:59:58 PM »

Welcome back, Tony. ...As a new member (again) you are entitled to (one time) publicly call my code/idea/concept moronic; I will see if I can work up a good enough topic to start a heated discussion but if you are short on time, I have in the queue:

[0] My version is faster because error checking is not really necessary.
[1] Just integrate the whole list/array and don't worry about bounds.
[2] ALWAYS sort an array.
[3] There really isn't a difference between constant strings and string literals so just go ahead and try and modify it.
[4] "array[100]" is just SOP, don't bother counting actuals.
[5] You're a stinky-butt! <-- That one was added by my 6-year-old.

retsameht · « **Reply #10 on:** February 22, 2024, 08:59:28 PM »

Hi John, and Thanks.

Perhaps we can have a heated discussion about why it's so hard to log in here?

Chrome autofills the password and username fields for me, and as soon as I click the login button, the password field is cleared. It took me about 10 tries to log in just now.

Anyways....

For anyone interested, I'm picking a fight on GitHub over the List<T> class, and why it's wise to avoid its constructor that takes an IEnumerable<T> and its AddRange() method, and to use LINQ's ToList() instead, at least when initializing a list from a large collection.

The post is here

Code - C#:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class TestListConstruction
{

   static IEnumerable<int> CreateInput(int start, int count, Func<int, int> func)
   {
      int end = start + count;
      for(int i = start; i < end; i++) yield return func(i);
   }

   static int count = 100000;

   public static void Run()
   {
      var seq = Enumerable.Range(0, count).Select(i => i / 2);
      var seq2 = CreateInput(0, count, i => i / 2);

      // Warm-up: run each code path once so JIT compilation isn't timed.
      List<int> list = new List<int>();
      List<int> list2 = new List<int>();
      List<int> list3 = new List<int>();
      list.AddRange(seq);
      new List<int>(seq);
      seq.ToList();
      list = new List<int>();
      list2 = new List<int>();
      list2.AddRange(seq2);
      list3.AddRange2(seq);
      new List<int>(seq2);
      seq2.ToList();

      for(int i = 0; i < 4; i++)
      {
         list = new List<int>();
         list2 = new List<int>();
         list3 = new List<int>();
         Write("  list.AddRange(seq): ", () => list.AddRange(seq));
         Write("list2.AddRange(seq2): ", () => list2.AddRange(seq2));
         Write("list3.AddRange2(seq): ", () => list3.AddRange2(seq));
         Write("  new List<int>(seq): ", () => new List<int>(seq));
         Write(" new List<int>(seq2): ", () => new List<int>(seq2));
         Write("        seq.ToList(): ", () => seq.ToList());
         Write("       seq2.ToList(): ", () => seq2.ToList());
         Console.WriteLine("");
      }
   }

   public static void Write(string label, Action action)
   {
      long result = Time(action);
      Console.WriteLine("{0} {1,8}", label, result);
   }

   public static long Time(Action action)
   {
      GC.Collect();
      GC.WaitForPendingFinalizers();   // let finalizers run, then collect what they released
      GC.Collect();
      Stopwatch stopwatch = Stopwatch.StartNew();
      action();
      stopwatch.Stop();
      return stopwatch.ElapsedTicks;
   }

}

public static class ListExtension
{
   /// <summary>
   /// Surrogate for <see cref="List{T}.AddRange(IEnumerable{T})"/>.
   /// 
   /// Attempts to deduce the number of elements needed
   /// to hold the result, and if successful, increases 
   /// the list's capacity to same. 
   /// 
   /// Unfortunately, this doesn't seem to gain much in
   /// contrast to the more extensive optimizations that 
   /// Enumerable.ToList() has undergone.
   /// </summary>


   public static void AddRange2<T>(this List<T> list, IEnumerable<T> items)
   {
      if(items == null)
         throw new ArgumentNullException(nameof(items));
      if(list == null) 
         throw new ArgumentNullException(nameof(list));
      int count = 0;
      if(items.TryGetNonEnumeratedCount(out count))
         list.EnsureCapacity(list.Count + count);
      list.AddRange(items);
   }
}

JohnK · « **Reply #11 on:** February 23, 2024, 12:45:15 PM »

log in: I'll try to reproduce and dig in a bit but try the login boxes in the upper left corner instead of the other one.

Warning: POSIX C based information follows; grain of salt (I'm not good with C# so chances are these statements could be just gibberish)!

On time:
Typical `start clock`/`stop clock` time functions we tend to build measure actual clock-time (like a stopwatch). But multiple processes share the CPU so real-clock-time isn't an accurate measurement. In POSIX I have to create a virtual timer to find out how long a process spends in a running state to get an "accurate" time (and this is something I have yet to do myself so...).
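For the .NET side of this, a rough analogue of measuring time spent running (rather than wall-clock time) is to compare `Stopwatch` with `Process.TotalProcessorTime`. A minimal sketch, with the caveat that TotalProcessorTime covers the whole process (GC and JIT threads included) and its resolution can be coarse on some platforms; `CpuTimeDemo` is a hypothetical helper name:

```csharp
using System;
using System.Diagnostics;

public static class CpuTimeDemo
{
    // Measures an action two ways: elapsed wall-clock time (Stopwatch) and
    // CPU time charged to the whole process across all of its threads.
    public static (TimeSpan wall, TimeSpan cpu) Measure(Action action)
    {
        using Process process = Process.GetCurrentProcess();
        TimeSpan cpuBefore = process.TotalProcessorTime;
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        process.Refresh();   // discard any cached process information
        TimeSpan cpuAfter = process.TotalProcessorTime;
        return (stopwatch.Elapsed, cpuAfter - cpuBefore);
    }
}
```

Timing `Thread.Sleep(200)` this way shows roughly 200 ms of wall-clock time but very little CPU time, which is exactly the gap between the two kinds of measurement.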

Can you disable JIT to hopefully get "more optimized code"?

That `Fill()` reduction stephentoub posted is interesting. I would have thought it would have been more optimized than that. For example, I was playing around the other day--in C--with a theory and I pinned myself up against a std lib function. My approach was over-simplistic and doomed to fail (obviously) but the point of my testing was really about how much can be packed into just a line of code like: do { if (*p1++ != *p2++) vs a for(unsigned i = 0...){ if *p1 != *p2... construct and to see if I could even come close.

But if I pull on that thread a bit to see if I can offer any sort of help: I was going up against the `memcmp()` with a simple FOR loop.

The reason the below code will ALWAYS win is because of the pointer array postfix operator which returns the value BEFORE the operation. So, the statement *p1++ != *p2++ is comparing the value THEN incrementing the array position, and that little optimization puts it one step ahead of my FOR(; ; p1++, p2++) loop.

Code - C:

int memcmp(const void *s1, const void *s2, size_t n)
{
        if (n != 0) {
                const unsigned char *p1 = s1, *p2 = s2;
 
                do {
                        if (*p1++ != *p2++)
                                return (*--p1 - *--p2);
                } while (--n != 0);
        }
        return (0);
}
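For comparison, .NET offers a memcmp-style comparison over spans, `MemoryExtensions.SequenceCompareTo()`, which as far as I know is vectorized for primitive element types such as byte in current runtimes. A minimal wrapper sketch (`SpanCompareDemo` and its methods are hypothetical names). Unlike memcmp(), SequenceCompareTo() also takes the lengths into account, so the `Memcmp()` variant slices both spans to n bytes first:

```csharp
using System;

public static class SpanCompareDemo
{
    // Negative, zero or positive depending on the first differing byte;
    // if one span is a prefix of the other, the shorter one compares as less.
    public static int Compare(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
        => a.SequenceCompareTo(b);

    // Closer to memcmp(s1, s2, n): compares exactly the first n bytes.
    // Slice() throws if either span is shorter than n.
    public static int Memcmp(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b, int n)
        => a.Slice(0, n).SequenceCompareTo(b.Slice(0, n));
}
```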

I'd like to read a bit more on arrays, loops and operators in C# but it might be a fun little distraction for myself trying to create a `Fill()` function.

It's Alive! · « **Reply #12 on:** February 23, 2024, 08:08:39 PM »

Quote from: JohnK on February 23, 2024, 12:45:15 PM

Warning:

how it's done these days: https://xoranth.net/memcmp-avx2/

JohnK · « **Reply #13 on:** February 23, 2024, 10:05:18 PM »

Quote from: It's Alive! on February 23, 2024, 08:08:39 PM

Quote from: JohnK on February 23, 2024, 12:45:15 PM
Warning:
how it's done these days: https://xoranth.net/memcmp-avx2/

stephentoub said the code reduced to Fill().

I suppose I should have framed my ramblings this way (but I was so rudely interrupted by actual work all day):

Questions about Fill():
1. Why is there an iterator (i) when you already have the start?
2. Why get the length every iteration?
3. If you accepted a size_t you could have iterated backwards to zero and answered #1 and #2.

Code - C#:

private static void Fill(Span<TResult> results, int start, Func<int, TResult> func) {
  for (int i = 0; i < results.Length; i++, start++) {
    results[i] = func(start);
  }
}

I would have expected something like this (I'm not sure how the compiler would react to the size variable being modified and read in such a compact sequence, and I'm sure most compilers would toss a wobbly, but you get my concept, and the decrement could be moved to its own line):

Code - C#:

int size = results.Length;
while (size-- > 0) {
  results[size] = func(start + size);   // fills from the last index back down to 0
}

---
EDIT: checked theory.
All of these compile (I was shocked the first one did, to be honest), and they should be faster than the for loop in the `Fill()` stephentoub provided.

Code - C++:

// Compiles but tosses a warning.
do {
  results[size--] = func[size];                         // Here be dragons?
} while (size > 0);
 
do {
  results[size] = func[size];
  size = size - 1;
} while (size > 0);
 
for (; size > 0;) {
  results[size] = func[size];
  size = size - 1;
}

retsameht · « **Reply #14 on:** February 24, 2024, 07:30:59 AM »

Quote from: JohnK on February 23, 2024, 12:45:15 PM

Can you disable JIT to hopefully get "more optimized code"?

You can now deploy a .NET app as a native executable without any JIT whatsoever. It's called Native AOT.

Quote

That `Fill()` reduction stephentoub posted is interesting. I would have thought it would have been more optimized than that.

The latest .NET has some nifty optimizations for ToArray() and ToList(). But they didn't propagate them to the List<T> constructor or AddRange().

Here's the same (or a similar) optimization applied to the latter. CollectionsMarshal.AsSpan() returns a Span<T> over the list's internal storage, allowing direct writes to it, which is where the biggest improvement comes from:

Code - C#:

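using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;                      // TryGetNonEnumeratedCount() (.NET 6+)
using System.Runtime.InteropServices;   // CollectionsMarshal (SetCount() needs .NET 8+)

public static class ListExtensions
{
   public static void FastAddRange<T>(this List<T> list, IEnumerable<T> items)
   {
      if(items == null)
         throw new ArgumentNullException(nameof(items));
      if(list == null)
         throw new ArgumentNullException(nameof(list));
      if(items.TryGetNonEnumeratedCount(out int count) && count > 0)
      {
         int current = list.Count;
         // Grow Count in one step, then write straight into the backing array.
         CollectionsMarshal.SetCount(list, current + count);
         Span<T> span = CollectionsMarshal.AsSpan(list);
         int i = current;
         int end = current + count;
         using(IEnumerator<T> enumerator = items.GetEnumerator())
         {
            while(enumerator.MoveNext())
            {
               Debug.Assert(i < end, "Count mismatch");
               span[i++] = enumerator.Current;
            }
            Debug.Assert(i == end, "Count mismatch");
         }
      }
      else
      {
         // Count unknown (or zero): fall back to the stock implementation.
         list.AddRange(items);
      }
   }
}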

kdub_nz · « **Reply #15 on:** February 24, 2024, 04:30:23 PM »

Quote from: TheMaster on February 08, 2012, 05:58:53 PM

http://connect.microsoft.com/VisualStudio/feedback/details/510253/linq-optimization-with-non-generic-icollections

This was originally brought about as a result of having to deal with Autodesk's failure to implement IEnumerable<T>, ICollection<T>, and IDictionary<TKey, TValue> on many AutoCAD types that expose collections as non-generic IEnumerable, ICollection, or IDictionary.

Hi Tony, do you have an alternate address for this?
The link posted throws me into
https://www.bing.com/?ref=aka&shorturl=connect-redirect
for some reason.

added :
ahhhh, a re-read of the new post shows the linked issue is closed.

I'll go away . . .

retsameht · « **Reply #16 on:** February 24, 2024, 04:30:54 PM »

Still investigating how to get FastAddRange() to be comparable with Enumerable.ToList(), which is showing very peculiar results.

Code - C#:

using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Text;

namespace System.Linq
{
   // project config:
   //
   // <PropertyGroup>
   //    <ConcurrentGarbageCollection>false</ConcurrentGarbageCollection>
   // </PropertyGroup>

   public static class TestListConstruction
   {

      static IEnumerable<int> CreateInput(int start, int count, Func<int, int> func)
      {
         int end = start + count;
         for(int i = start; i < end; i++) yield return func(i);
      }

      static bool manageGC = true;
      static int startcount = 1000;        // starting data size, doubles on each step
      static int steps = 7;                // # of steps
      static int count = startcount;       // # list elements
      static int iterations = 100;         // iterations per-test
      static int testCount = 7;            // # of tests being measured
      static int rowCount = testCount + 1; // # rows in output
      static Table table;

      static List<int> result;

      static Func<int, int> func = i => i + 1;

      static IEnumerable<int> s1 = null;
      static IEnumerable<int> s2 = null;
      static IEnumerable<int> s3 = null;

      static List<int> list = null;

      static string[] labels = new string[]
      {
         "List.Count",
         "new List<int>().AddRange(s1))",
         "new List<int>().AddRange(s2))",
         "new List<int>().FastAddRange(s3))",
         "new List<int>(s1))",
         "new List<int>(s2))",
         "s1.ToList())",
         "s2.ToList())",
      };

      public static void Run()
      {

         /// Steps is the number of test steps, with the 
         /// data volume doubling on each step
         /// int steps = 4; // 7;       // includes a warm-up step that is discarded
         
         /// Table row 0 is the count/size of the test data list
         /// Each subsequent row represents a test.
         /// Each column represents the timing for the data size
         /// stored in row 0.
         
         table = new Table(steps, testCount + 1);

         for(int i = 0; i < steps + 1; i++)
         {
            s1 = Enumerable.Range(0, count).Select(func);
            s2 = CreateInput(0, count, func);
            s3 = Enumerable.Range(0, count).Select(func);

            long[] times = new long[testCount + 1];
            times[0] = count;
            times[1] = Measure(() => new List<int>().AddRange(s1));
            times[2] = Measure(() => new List<int>().AddRange(s2));
            times[3] = Measure(() => new List<int>().FastAddRange(s3));
            times[4] = Measure(() => result = new List<int>(s1));
            times[5] = Measure(() => result = new List<int>(s2));
            times[6] = Measure(() => result = s1.ToList());
            times[7] = Measure(() => result = s2.ToList());

            if(i > 0) // Discard first step (a warmup/pre-jit).
            {
               for(int k = 0; k < testCount + 1; k++)
                  table[i-1, k] = times[k];
            }        

            /// Data volume is doubled on each step
            count *= 2;
         }

         Dump();
      }

      static void Dump()
      {
         StringBuilder s = new StringBuilder();
         int maxlabel = labels.Select(label => label.Length).Max();
         string fmt = $"{{0,{maxlabel}}}:  ";
         for(int i = 0; i < testCount + 1; i++)
         {
            s.AppendFormat(fmt, labels[i]);
            for(int j = 0; j < steps; j++)
            {
               s.AppendFormat("{0,8}", table[j, i]);
            }
            if(i == 0)
               s.Append($"\n{new string('-', s.Length)}\n");
            s.Append("\n");
         }
         Console.WriteLine($"\n{s}");
      }

      public static long Measure(Action action)
      {
         long ticks = 0;
         for(int i = 0; i < iterations; i++)
         {
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, true, true);
            GC.WaitForPendingFinalizers();
            Stopwatch stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            ticks += stopwatch.ElapsedTicks;
         }
         return ticks / iterations;
      }

   }

   class Table
   {
      int rows;
      int columns;
      long[,] table;

      public Table(int rows, int cols)
      {
         table = new long[rows, cols];
         this.rows = rows;
         this.columns = cols;
      }

      public long this[int row, int col]
      {
         get { return table[row,col]; }
         set { table[row,col] = value; }
      }
   }

   public static class ListExtension
   {
      /// <summary>
      /// Surrogate for <see cref="List{T}.AddRange(IEnumerable{T})"/>.
      /// 
      /// Attempts to deduce the number of elements needed
      /// to hold the result, and if successful, increases 
      /// the list's capacity to same.
      /// 
      /// </summary>

      public static void FastAddRange<T>(this List<T> list, IEnumerable<T> items)
      {
         if(items == null)
            throw new ArgumentNullException(nameof(items));
         if(list == null)
            throw new ArgumentNullException(nameof(list));
         if(items.TryGetNonEnumeratedCount(out int count) && count > 0)
         {
            int start = list.Count;
            int end = start + count;
            CollectionsMarshal.SetCount(list, end);
            Debug.Assert(list.Count == end);
            Span<T> span = CollectionsMarshal.AsSpan(list);
            Debug.Assert(span.Length == list.Count);
            int i = start;
            using(IEnumerator<T> enumerator = items.GetEnumerator())
            {
               while(enumerator.MoveNext())
               {
                  Debug.Assert(i < end, "Count mismatch");
                  span[i++] = enumerator.Current;
               }
               Debug.Assert(i == end, "Count mismatch");
            }
         }
         else
         {
            list.AddRange(items);
         }
      }
   }

}

I tried running this on a 7-year-old laptop with 8 GB of RAM, most of it in use, and I still see something I can't figure out. The second-to-last result (Enumerable.ToList() on an iterator that implements IIListProvider<T>) just blows away all the other methods.

Code:

                       List.Count:      2000    4000    8000   16000   32000   64000  128000
--------------------------------------------------------------------------------------------

    new List<int>().AddRange(s1)):       248     401     843    1844     820    1908    5038
    new List<int>().AddRange(s2)):       606    1137    2073    1086    2183    5375    9717
new List<int>().FastAddRange(s3)):       357     454     636     333     653    1451    3365
               new List<int>(s1)):       219     412    1095     441     897    1863    4076
               new List<int>(s2)):       594    1208     470     884    1750    3518    8451
                     s1.ToList()):       247     253     289      96     181     363    1226    <----- !?!?!?
                     s2.ToList()):       609    1178     470     888    1820    3667    7138

retsameht · « **Reply #17 on:** February 24, 2024, 04:33:02 PM »

Quote from: kdub_nz on February 24, 2024, 04:30:23 PM

Quote from: TheMaster on February 08, 2012, 05:58:53 PM
http://connect.microsoft.com/VisualStudio/feedback/details/510253/linq-optimization-with-non-generic-icollections

This was originally brought about as a result of having to deal with Autodesk's failure to implement IEnumerable<T>, ICollection<T>, and IDictionary<TKey, TValue> on many AutoCAD types that expose collections as non-generic IEnumerable, ICollection, or IDictionary.
Hi Tony, do you have an alternate address for this.
The link posted throws me into
https://www.bing.com/?ref=aka&shorturl=connect-redirect
for some reason.

No, I don't. That discussion group on connect.microsoft.com was retired (archived, I suppose) long ago, and I got the same thing when I tried the link a few days back.

kdub_nz · « **Reply #18 on:** February 24, 2024, 04:58:05 PM »

The thing that caught my eye in the report was the drop in time on the first line between the counts of 16000 and 32000.

. . . and the lack of a consistent pattern in times between tests.

I'll need to put this aside for study later.

retsameht · « **Reply #19 on:** February 24, 2024, 06:06:07 PM »

Quote from: kdub_nz on February 24, 2024, 04:58:05 PM

The thing that caught my eye in the report was the first line reduction in time between the counts of 16000 and 32000.

. . . and the lack of a consistent pattern in times between tests.

I'll need to put this aside for study later.

The amount of free memory will definitely affect the results. The extreme deltas are almost certainly the result of memory pressure and the resulting need to do a GC. The test generates a lot of garbage, which is problematic, since the GC needs to reclaim memory often.

So, the timings are not stable in absolute terms, but seem to be somewhat stable in relative terms. IOW, the differences between timings are fairly consistent. I was considering using BenchmarkDotNet, but that would require a lot more effort, given that the code measures the execution time of delegates rather than methods.
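One way to tell GC-disturbed samples apart from clean ones is to record how many collections happened during each timed run. A minimal sketch using `GC.CollectionCount()`; `GcAwareTimer` is a hypothetical helper, and the forced collection beforehand follows the same idea as the Measure() method in the test code:

```csharp
using System;
using System.Diagnostics;

public static class GcAwareTimer
{
    // Times one run of an action and reports how many gen-0 collections
    // (which also counts gen-0 work done as part of gen-1/gen-2 collections)
    // happened during it, so samples disturbed by a GC can be spotted.
    public static (long ticks, int gen0Collections) Measure(Action action)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        int before = GC.CollectionCount(0);
        Stopwatch stopwatch = Stopwatch.StartNew();
        action();
        stopwatch.Stop();
        return (stopwatch.ElapsedTicks, GC.CollectionCount(0) - before);
    }
}
```

Samples with a non-zero `gen0Collections` value could then be discarded or reported separately, instead of being averaged in with the clean ones.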

pkohut · « **Reply #20 on:** February 25, 2024, 03:21:43 AM »

TT, was just thinking of you around the new year. Best wishes.

retsameht · « **Reply #21 on:** February 25, 2024, 09:33:58 PM »

Quote from: pkohut on February 25, 2024, 03:21:43 AM

TT, was just thinking of you around the new year. Best wishes.

Thanks Paul, Likewise.

kdub_nz · « **Reply #22 on:** February 26, 2024, 05:31:50 PM »

Without too much future envy :

Looks like 'they' are adding some new LINQ Methods in .NET 9 alpha
