Author Topic: My Proposal For Optimizing LINQ  (Read 2954 times)

0 Members and 1 Guest are viewing this topic.

kdub_nz

  • Mesozoic keyThumper
  • SuperMod
  • Water Moccasin
  • Posts: 2140
  • class keyThumper<T>:ILazy<T>
Re: My Proposal For Optimizing LINQ
« Reply #15 on: February 24, 2024, 04:30:23 PM »
http://connect.microsoft.com/VisualStudio/feedback/details/510253/linq-optimization-with-non-generic-icollections

This was originally brought about as a result of having to deal with Autodesk's failure to implement IEnumerable<T>, ICollection<T>, and IDictionary<TKey, TValue> on many AutoCAD types that expose collections as non-generic IEnumerable, ICollection, or IDictionary.

Hi Tony, do you have an alternate address for this.
The link posted throws me into
https://www.bing.com/?ref=aka&shorturl=connect-redirect
for some reason.    :cry:

added :
ahhhh, a re-read of the new post shows the linked issue is closed.

I'll go away . . .
« Last Edit: February 24, 2024, 04:34:55 PM by kdub_nz »
Called Kerry in my other life
Retired; but they dragged me back in !

I live at UTC + 13.00

---
some people complain about loading the dishwasher.
Sometimes the question is more important than the answer.

retsameht

  • Mosquito
  • Posts: 16
Re: My Proposal For Optimizing LINQ
« Reply #16 on: February 24, 2024, 04:30:54 PM »
Still investigating how to get FastAddRange() to be comparable with Enumerable.ToList(), which is showing very peculiar results.

Code: [Select]
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Text;

namespace System.Linq
{
   // project config:
   //
   // <PropertyGroup>
   //    <ConcurrentGarbageCollection>false</ConcurrentGarbageCollection>
   // </PropertyGroup>

   public static class TestListConstruction
   {

      static IEnumerable<int> CreateInput(int start, int count, Func<int, int> func)
      {
         int end = start + count;
         for(int i = start; i < end; i++) yield return func(i);
      }

      static bool manageGC = true;
      static int startcount = 1000;        // starting data size, doubles on each step
      static int steps = 7;                // # of steps
      static int count = startcount;       // # list elements
      static int iterations = 100;         // iterations per-test
      static int testCount = 7;            // # of tests being measured
      static int rowCount = testCount + 1; // # rows in output
      static Table table;

      static List<int> result;

      static Func<int, int> func = i => i + 1;

      static IEnumerable<int> s1 = null;
      static IEnumerable<int> s2 = null;
      static IEnumerable<int> s3 = null;

      static List<int> list = null;

      static string[] labels = new string[]
      {
         "List.Count",
         "new List<int>().AddRange(s1))",
         "new List<int>().AddRange(s2))",
         "new List<int>().FastAddRange(s3))",
         "new List<int>(s1))",
         "new List<int>(s2))",
         "s1.ToList())",
         "s2.ToList())",
      };

      public static void Run()
      {

         /// Steps is the number of test steps, with the
         /// data volume doubling on each step
         /// int steps = 4; // 7;       // includes a warm-up step that is discarded
         
         /// Table row 0 is the count/size of the test data list
         /// Each subsequent row represents a test.
         /// Each column represents the timing for the data size
         /// stored in row 0.
         
         table = new Table(steps, testCount + 1);

         for(int i = 0; i < steps + 1; i++)
         {
            s1 = Enumerable.Range(0, count).Select(func);
            s2 = CreateInput(0, count, func);
            s3 = Enumerable.Range(0, count).Select(func);

            long[] times = new long[testCount + 1];
            times[0] = count;
            times[1] = Measure(() => new List<int>().AddRange(s1));
            times[2] = Measure(() => new List<int>().AddRange(s2));
            times[3] = Measure(() => new List<int>().FastAddRange(s3));
            times[4] = Measure(() => result = new List<int>(s1));
            times[5] = Measure(() => result = new List<int>(s2));
            times[6] = Measure(() => result = s1.ToList());
            times[7] = Measure(() => result = s2.ToList());

            if(i > 0) // Discard first step (a warmup/pre-jit).
            {
               for(int k = 0; k < testCount + 1; k++)
                  table[i-1, k] = times[k];
            }       

            /// Data volume is doubled on each step
            count *= 2;
         }

         Dump();
      }

      static void Dump()
      {
         StringBuilder s = new StringBuilder();
         int maxlabel = labels.Select(s => s.Length).Max();
         string fmt = $"{{0,{maxlabel}}}:  ";
         for(int i = 0; i < testCount + 1; i++)
         {
            s.AppendFormat(fmt, labels[i]);
            for(int j = 0; j < steps; j++)
            {
               s.AppendFormat("{0,8}", table[j, i]);
            }
            if(i == 0)
               s.Append($"\n{new string('-', s.Length)}\n");
            s.Append("\n");
         }
         Console.WriteLine($"\n{s}");
      }

      public static long Measure(Action action)
      {
         long ticks = 0;
         for(int i = 0; i < iterations; i++)
         {
            GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, true, true);
            GC.WaitForPendingFinalizers();
            Stopwatch stopwatch = Stopwatch.StartNew();
            action();
            stopwatch.Stop();
            ticks += stopwatch.ElapsedTicks;
         }
         return ticks / iterations;
      }

   }

   class Table
   {
      int rows;
      int columns;
      long[,] table;

      public Table(int rows, int cols)
      {
         table = new long[rows, cols];
         this.rows = rows;
         this.columns = cols;
      }

      public long this[int row, int col]
      {
         get { return table[row,col]; }
         set { table[row,col] = value; }
      }
   }

   public static class ListExtension
   {
      /// <summary>
      /// Surrogate for List<T>.AddRange()
      ///
      /// Attempts to deduce the number of elements needed
      /// to hold the result, and if successful, increases
      /// the list's capacity to same.
      ///
      /// </summary>

      public static void FastAddRange<T>(this List<T> list, IEnumerable<T> items)
      {
         if(items == null)
            throw new ArgumentNullException(nameof(items));
         if(list == null)
            throw new ArgumentNullException(nameof(list));
         if(items.TryGetNonEnumeratedCount(out int count) && count > 0)
         {
            int start = list.Count;
            int end = start + count;
            CollectionsMarshal.SetCount(list, end);
            Debug.Assert(list.Count == end);
            Span<T> span = CollectionsMarshal.AsSpan(list);
            Debug.Assert(span.Length == list.Count);
            int i = start;
            using(IEnumerator<T> enumerator = items.GetEnumerator())
            {
               while(enumerator.MoveNext())
               {
                  Debug.Assert(i < end, "Count mismatch");
                  span[i++] = enumerator.Current;
               }
               Debug.Assert(i == end, "Count mismatch");
            }
         }
         else
         {
            list.AddRange(items);
         }
      }
   }

}



I tried running this on a 7 year-old laptop with 8GB that was mostly in-use and I still see something I can't figure out. The second to last result (Enumerable.ToList() using an iterator that implements IIListProvider), just blows away all the other methods.

Code: [Select]
                       List.Count:      2000    4000    8000   16000   32000   64000  128000
--------------------------------------------------------------------------------------------

    new List<int>().AddRange(s1)):       248     401     843    1844     820    1908    5038
    new List<int>().AddRange(s2)):       606    1137    2073    1086    2183    5375    9717
new List<int>().FastAddRange(s3)):       357     454     636     333     653    1451    3365
               new List<int>(s1)):       219     412    1095     441     897    1863    4076
               new List<int>(s2)):       594    1208     470     884    1750    3518    8451
                     s1.ToList()):       247     253     289      96     181     363    1226    <----- !?!?!?
                     s2.ToList()):       609    1178     470     888    1820    3667    7138

« Last Edit: February 24, 2024, 04:44:54 PM by retsameht »

retsameht

  • Mosquito
  • Posts: 16
Re: My Proposal For Optimizing LINQ
« Reply #17 on: February 24, 2024, 04:33:02 PM »
http://connect.microsoft.com/VisualStudio/feedback/details/510253/linq-optimization-with-non-generic-icollections

This was originally brought about as a result of having to deal with Autodesk's failure to implement IEnumerable<T>, ICollection<T>, and IDictionary<TKey, TValue> on many AutoCAD types that expose collections as non-generic IEnumerable, ICollection, or IDictionary.
Hi Tony, do you have an alternate address for this.
The link posted throws me into
https://www.bing.com/?ref=aka&shorturl=connect-redirect
for some reason.    :cry:


No, I don't. that discussion group on connect.microsoft.com has long since been retired (archived I suppose), and I get the same thing when I tried the link a few days back.

kdub_nz

  • Mesozoic keyThumper
  • SuperMod
  • Water Moccasin
  • Posts: 2140
  • class keyThumper<T>:ILazy<T>
Re: My Proposal For Optimizing LINQ
« Reply #18 on: February 24, 2024, 04:58:05 PM »
The thing that caught my eye in the report was the first line reduction in time between the counts of 16000  and  32000.

. . . and the lack of a consistant pattern in times between tests.

I'll need to put this aside for study later.
« Last Edit: February 24, 2024, 05:01:15 PM by kdub_nz »
Called Kerry in my other life
Retired; but they dragged me back in !

I live at UTC + 13.00

---
some people complain about loading the dishwasher.
Sometimes the question is more important than the answer.

retsameht

  • Mosquito
  • Posts: 16
Re: My Proposal For Optimizing LINQ
« Reply #19 on: February 24, 2024, 06:06:07 PM »
The thing that caught my eye in the report was the first line reduction in time between the counts of 16000  and  32000.

. . . and the lack of a consistant pattern in times between tests.

I'll need to put this aside for study later.

The amount of free memory will definitely affect the results. The extreme deltas are almost certainly a result of memory pressure and the resulting need to do a GC. The test generates a lot of garbage, which is problematic, since the GC needs to reclaim memory often.

So, the timings are not stable in absolute terms, but seem to be somewhat stable in relative terms. IOW, the differences between timings is fairly-consistent. I was considering using BenchmarkDotNet, but that would require a lot more effort given that the code is measuring execution time of delegates rather than methods..
« Last Edit: February 24, 2024, 06:11:35 PM by retsameht »

pkohut

  • Bull Frog
  • Posts: 483
Re: My Proposal For Optimizing LINQ
« Reply #20 on: February 25, 2024, 03:21:43 AM »
TT, was just thinking of you around the new year.  Best wishes.
New tread (not retired) - public repo at https://github.com/pkohut

retsameht

  • Mosquito
  • Posts: 16
Re: My Proposal For Optimizing LINQ
« Reply #21 on: February 25, 2024, 09:33:58 PM »
TT, was just thinking of you around the new year.  Best wishes.
Thanks Paul,  Likewise.

kdub_nz

  • Mesozoic keyThumper
  • SuperMod
  • Water Moccasin
  • Posts: 2140
  • class keyThumper<T>:ILazy<T>
Re: My Proposal For Optimizing LINQ
« Reply #22 on: February 26, 2024, 05:31:50 PM »
Without too much future envy :

Looks like 'they' are adding some new LINQ Methods in .NET 9 alpha    :-)
Called Kerry in my other life
Retired; but they dragged me back in !

I live at UTC + 13.00

---
some people complain about loading the dishwasher.
Sometimes the question is more important than the answer.