Author Topic: about speed of searching objects in list  (Read 6018 times)

0 Members and 1 Guest are viewing this topic.

nekitip

  • Guest
about speed of searching objects in list
« on: September 09, 2014, 03:26:08 AM »
This may be interesting as a general idea for someone who has been wandering why comparing of two list is slow and has not yet have this "aha" moment for some reason. :)
It's half .NET question/half autocad, so I think it is still a good place to ask.
Lets say you have your list of custom objects, holding length, area and objectid.
Now, user selection has changed, you want to update your list based on user selection and you use impliedselectionchanged event to find all selected items.
You need to remove the ones not currently in selection and add the new ones. Now, I still have not found simple routine to do this, but the observations is - when comparing, numbers are the fastest to compare, but you just forget this. And here are the times it take to compare (on remove!), using "contains" on 20 000 objects on i3 machine.
comparing object itself to other object= 12 secs
comparing objectid to objectid = 8 secs
if you take just string, and let strings be compared, it all goes faster!:
objectid.tostring = 4 sec
speed things up with paralell processing
objectid.tostring (asparalell)=2 sec
and if you use number, it is the fastest you can do
objectid.handle.value (asparalell)=1 sec
test routine that I have used:
Code - vb.net: [Select]
  1. Public Shared Sub selectionchanged()        
  2.         Dim candidates As New List(Of ObjectId)
  3.  
  4.         For Each o As ObjectId In get_selected_oids()
  5.             If o.ObjectClass.IsDerivedFrom(RXClass.GetClass(GetType(Curve))) Then
  6.                 candidates.Add(o)
  7.             End If
  8.         Next
  9.  
  10.         Dim oidOLD = (From o As LengthAreaObject In lengthAreaList Select o.oid).ToList
  11.         Dim oidOLD_NODEL = (From o As LengthAreaObject In lengthAreaList Where o.but_not_this_one= False Select o.oid).ToList
  12.  
  13.         Dim oidstoadd = candidates.Except(oidOLD).ToList
  14.         Dim oidstoREM = oidOLD_NODEL.Except(candidates).ToList
  15.         Dim oidstoremStr As New List(Of String)
  16.         For Each o As ObjectId In oidstoREM
  17.             oidstoremStr.Add(o.ToString)
  18.         Next
  19.  
  20.         Dim oidstoremNum As New List(Of Long)
  21.         For Each o As ObjectId In oidstoREM
  22.             oidstoremNum.Add(o.Handle.Value)
  23.         Next
  24.  
  25.         Dim rlist As List(Of LengthAreaObject) = (From i As LengthAreaObject In lengthAreaList.AsParallel Where oidstoremNum.Contains(i.oid.Handle.Value)).ToList
  26.  
  27.         lengthAreaList.RemoveRange(rlist)
  28.         lengthAreaList.AddRange((From o As ObjectId In oidstoadd Select New LengthAreaObject() With {.oid = o}).ToList)
  29.  
  30.     End Sub
  31.  

Now, when we know this, then all there is left is to find some clean routine (possibly linq) to actually do this job.
...so if someone has idea?
* I was also wondering if it is possible to fast convert objectid to number, and not to use handle. Just a thought...

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: about speed of searching objects in list
« Reply #1 on: September 09, 2014, 05:10:05 AM »
Hi,

Just a thought, did you try using HashSet<T> instead of List<T> ?

You can also see this thread.

<EDIT>
Reading more attentively your post:
Quote
You need to remove the ones not currently in selection and add the new ones.
It seems to me it's the same as building a new collection with the current selection.

Quote
I was also wondering if it is possible to fast convert objectid to number
Have a look at the ObjectId.OldIdPtr property which returns an IntPtr which has a ToInt64() method.
« Last Edit: September 09, 2014, 12:06:14 PM by gile »
Speaking English as a French Frog

nekitip

  • Guest
Re: about speed of searching objects in list
« Reply #2 on: September 09, 2014, 03:59:55 PM »
@gile
My english is bad, so I'm hard to understand. I need something like "objects added to selection" and "objects moved from selection". There is somewhere editor's event like that, but I react to impliedselectionchanged event (don't remember why though) However, not to miss the point, this is not something impossible and should be easy, and I'm intrigued. The easiest way to do this would be something like mycollection=nothing, mycollection=implied selection, but I can't do that, i need "added" and "removed" objects.

I have naver used HashSet, and will take a look, thank you.
But I have tried different approach, the one it seems "modern and clean" to me but strangely (for me) the results are not what I have expected and I believe it's because of my notunderstaning things somewhere.
So the idea here (as I have shown before) is that the fastest way to do this is to compare numbers and I have tried LINQ and custom comparer way.
Code - vb.net: [Select]
  1. Public Class comparerObjIdNum
  2.     Implements IEqualityComparer(Of ObjectId)
  3.  
  4.     Public Overloads Function Equals(x As ObjectId, y As ObjectId) As Boolean Implements IEqualityComparer(Of ObjectId).Equals
  5.         If x.Handle.Value = y.Handle.Value Then
  6.             Return True
  7.         Else
  8.             Return False
  9.         End If
  10.     End Function
  11.  
  12.     Public Overloads Function GetHashCode(obj As ObjectId) As Integer Implements IEqualityComparer(Of ObjectId).GetHashCode
  13.         If obj.IsNull Then
  14.             Return 0
  15.         Else
  16.             Return obj.Handle.Value.GetHashCode
  17.         End If
  18.     End Function
  19. End Class
Code - vb.net: [Select]
  1. Public Shared Sub selectionchangedCOMPARER()
  2.         Dim oids As List(Of ObjectId) = get_selected_oids()
  3.         Dim candidates As New List(Of ObjectId)
  4.  
  5.         For Each o As ObjectId In oids
  6.             If o.ObjectClass.IsDerivedFrom(RXClass.GetClass(GetType(Curve))) Then
  7.                 candidates.Add(o)
  8.             End If
  9.             ''and some other class
  10.         Next
  11.  
  12.         Dim compareObjectIDvsNumber As New comparerObjIdNum
  13.  
  14.         Dim oidOLD = From o As LengthAreaObject In lengthAreaList Select o.oid
  15.         Dim addoids = candidates.Except(oidOLD, compareObjectIDvsNumber).ToList
  16.         Dim addList As List(Of LengthAreaObject) = (From o As ObjectId In addoids.AsParallel Select New LengthAreaObject With {.oid = o}).ToList
  17.  
  18.         Dim oidOLD_free_to_del = From o As LengthAreaObject In mydataholder.lengthAreaList Where o.dont_delete_me= False Select o.oid
  19.         Dim remoids = oidOLD_free_to_del.Except(candidates, compareObjectIDvsNumber).ToList
  20.         Dim rlist = (From i As LengthAreaObject In lengthAreaList.AsParallel Where remoids.Contains(i.oid, compareObjectIDvsNumber)).ToList
  21.  
  22.        .lengthAreaList.RemoveRange(rlist)
  23.         lengthAreaList.AddRange(addList)
  24.  
  25.     End Sub

this code will run about 10 secs on 10 000 objects on removing objects, since "contains" is the slowest one and we talk about that (all time lost on "rlist")
however, the code below will run 1 sec
Code - vb.net: [Select]
  1.  Public Shared Sub selectionchanged()
  2.         Dim oids As List(Of ObjectId) = get_selected_oids()
  3.         Dim candidates As New List(Of ObjectId)
  4.  
  5.         For Each o As ObjectId In oids
  6.             If o.ObjectClass.IsDerivedFrom(RXClass.GetClass(GetType(Curve))) Then
  7.                 candidates.Add(o)
  8.             End If
  9.             ''and some other class
  10.         Next
  11.  
  12.         Dim oidOLD = From o As LengthAreaObject In lengthAreaList Select o.oid
  13.         Dim oidstoadd = candidates.Except(oidOLD).ToList
  14.         Dim addList = (From o As ObjectId In oidstoadd.AsParallel Select New LengthAreaObject With {.oid = o}).ToList
  15.  
  16.         Dim candiNUM = From c As ObjectId In candidates Select c.Handle.Value
  17.         Dim oidOldStaysNUM = From o As LengthAreaObject In lengthAreaList Where o.dont_delete_me= False Select o.oid.Handle.Value
  18.         Dim odistoRemNUM = oidOldStaysNUM.Except(candiNUM).ToList
  19.         Dim rlist As List(Of LengthAreaObject) = (From i As LengthAreaObject In lengthAreaList.AsParallel Where odistoRemNUM.Contains(i.oid.Handle.Value)).ToList
  20.  
  21.         lengthAreaList.RemoveRange(rlist)
  22.         lengthAreaList.AddRange(addList)
  23.     End Sub

So, I'm looking code, and still cant figure out why solution with comparer is so slower.
EDIT: (when typing, listing #2 was first time same as #3)
« Last Edit: September 09, 2014, 04:23:07 PM by nekitip »

nekitip

  • Guest
Re: about speed of searching objects in list
« Reply #3 on: September 09, 2014, 04:15:29 PM »
update!!!
unexpected things.
If comparer is changed to look for, as gile has pointed - pointer, like this:
Code - vb.net: [Select]
  1. Public Class comparerObjIdNum
  2.     Implements IEqualityComparer(Of ObjectId)
  3.  
  4.     Public Overloads Function Equals(x As ObjectId, y As ObjectId) As Boolean Implements IEqualityComparer(Of ObjectId).Equals
  5.         If x.OldIdPtr.ToInt64 = y.OldIdPtr.ToInt64 Then
  6.             Return True
  7.         Else
  8.             Return False
  9.         End If
  10.     End Function
  11.  
  12.     Public Overloads Function GetHashCode(obj As ObjectId) As Integer Implements IEqualityComparer(Of ObjectId).GetHashCode
  13.         If obj.IsNull Then
  14.             Return 0
  15.         Else
  16.             Return obj.OldIdPtr.ToInt64.GetHashCode
  17.         End If
  18.     End Function
  19. End Class
Now the time is 2 sec for 10 000 objects (removing on contains), and 4 on 20 000
Changing handle.value to objectptr should not be of significance, but it is! Why? Not expected at least.
Also, if used toint32, it doesnt work (maybe because i have 64 machine), and this potentially complicates things if it is to be used in practice.
But! Routine without comparer is finished in 1-2 secs on removing 20 000 objects (on contains) and still leads.

Jeff H

  • Needs a day job
  • Posts: 6150
Re: about speed of searching objects in list
« Reply #4 on: September 09, 2014, 04:32:12 PM »
I not following everything here but if look at MSDN docs they tell what is used for methods

List<T>.IList.Remove Method
Quote
This method determines equality using the default equality comparer EqualityComparer<T>.Default for T, the type of values in the list.


List<T>.Contains Method
Quote
This method determines equality by using the default equality comparer, as defined by the object's implementation of the IEquatable<T>.Equals method for T (the type of values in the list).

You can read through them should answer all your questions.

nekitip

  • Guest
Re: about speed of searching objects in list
« Reply #5 on: September 09, 2014, 04:42:09 PM »
I not following everything here but if look at MSDN docs they tell what is used for methods

List<T>.IList.Remove Method
Quote
This method determines equality using the default equality comparer EqualityComparer<T>.Default for T, the type of values in the list.


List<T>.Contains Method
Quote
This method determines equality by using the default equality comparer, as defined by the object's implementation of the IEquatable<T>.Equals method for T (the type of values in the list).

You can read through them should answer all your questions.
well, the basic question is this: comparers compare by something, but what property is the fastest to compare. You can make custom comparer no matter what the list contains. I have shown some numbers that ilustrate speed by comparing different things. This is the limit of my current knowlege :)

Jeff H

  • Needs a day job
  • Posts: 6150
Re: about speed of searching objects in list
« Reply #6 on: September 09, 2014, 05:15:40 PM »
It depends,

on how the class was built

Quote
x.OldIdPtr.ToInt64 = y.OldIdPtr.ToInt64

It depends on if  and how they implemented the "=" operator.


When dealing with AutoCAD object properties other than testing I am not sure how to tell because how the underlying property is retrieved can cause the timing to vary.
Just about every property you access is either creating another managed wrapper to get it or calling a unmanaged method to get it.

Code - C#: [Select]
  1. [Category("Misc")]
  2. public ObjectId ObjectId
  3. {
  4.     get
  5.     {
  6.         ObjectId id;
  7.         AcDbObjectId id2;
  8.         AcDbObjectId modopt(IsConst)* idPtr = AcDbObject.objectId((AcDbObject modopt(IsConst)* modopt(IsConst) modopt(IsConst)) this.GetImpObj(), &id2);
  9.         id.m_id = *((long*) idPtr);
  10.         return id;
  11.     }
  12. }
  13.  
  14.  
  15.  
  16.  

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: about speed of searching objects in list
« Reply #7 on: September 09, 2014, 05:52:48 PM »
Hi,

IMO, the performance differences between comparing ObjectId or ObjectId.Handle.Value or ObjectId.OldIdPtr.ToInt64() is not so important.

Instead, you should optimize your code by avoiding duplicative collection mappings (ObjectId -> LengthAreaObject and LengthAreaObject -> ObjectId): either use ObjectId collections and only map the result to a LengthAreaObject collection; or first map the ObjectId collection (get_selected_oids ()) into a LengthAreaObject collection and compare these objects (i.e. override Equals () and GetHashCode () in the LengthAreaObject class to use ObjectId comparisons).
The type of collection can also provide serious performance boost: HashSet<T>.Contains() and HashSet<T>.ExceptWith() are much more efficient than List<T>.Contains() and IEnumerable<T>.Except().
« Last Edit: September 09, 2014, 06:06:43 PM by gile »
Speaking English as a French Frog

nekitip

  • Guest
Re: about speed of searching objects in list
« Reply #8 on: September 10, 2014, 04:38:03 PM »
I was hoping to get out easy, by only changing algorithm.
I'm not sure if performance difference from  ObjectId or ObjectId.Handle.Value or ObjectId.OldIdPtr.ToInt64 is so small. It looks to be some sort of exponential thing.
@Jeff, so I assume, OldIdPtr.ToInt64 is unique (in same way like objectid)?
...Although, most of users will never notice this speed difference since - it shows only on large (badly written) operations.

O.T.
If someone has a only one minute, can something be confirmed to me, fully related to this.
I'm not sure if there a bug in autocad in the selection process that is messing with me.
As an end user, use AutoCAD to:
1) Select 20 000-30 000 (same) objects with mouse at once
2) Deselect part of previous selection using mouse and holding SHIFT key.
3) Notice that one CPU core is now fully busy and stays busy forever.
4) Press ESC key and now CPU goes to 0% usage.

I'm seeing this on 2015.

Jeff H

  • Needs a day job
  • Posts: 6150
Re: about speed of searching objects in list
« Reply #9 on: September 10, 2014, 07:08:23 PM »
It depends on computer, system variables, what type of object, etc......
 
I would not worry about the little difference of time comparing and probably differences have to do more with hardware.
I was just guessing on differences.
 
As gile mentioned earlier a different collection and as gile mentioned a HashSet would be better.
A HashSet will use GetHashCode but uses that  in a algorithm to decide what bucket to put it in.
So a List must search search Linear but a hashSet will use GetHashCode to decide what bucket it is in which might contain more buckets but cuts down the items searched.
 
This is over simplified version but for example if the algorithm  was using integer division and diving by 10 so
Code - C#: [Select]
  1. int bucket = obj.GetHashCode() / 10;
  2.  

And number is 78 see pic or attached drawing

 
 
 
 

nekitip

  • Guest
Re: about speed of searching objects in list
« Reply #10 on: September 11, 2014, 04:39:32 PM »
Hashset turns out to be a nice thing. Removing is now taking around 1 sec or less, or in this case - stops being narrow throat.
Now the routine looks like this:
Code - vb.net: [Select]
  1.     Public Shared Sub selectionchanged()
  2.         Dim oids As List(Of ObjectId) = get_selected_oids()            
  3.         Dim candidates = (From o In oids.AsParallel Where o.ObjectClass.IsDerivedFrom(RXClass.GetClass(GetType(Curve)))).ToList
  4.  
  5.         Dim oidOLD = From o As LengthAreaObject In lengthAreaList Select o.oid
  6.         Dim oidstoadd = candidates.Except(oidOLD).ToList
  7.         Dim addList = (From o As ObjectId In oidstoadd.AsParallel Select New LengthAreaObject With {.oid = o}).ToList
  8.  
  9.         Dim oidOldStays = From o As LengthAreaObject In lengthAreaList Where o.keep_this_one= False Select o.oid
  10.         Dim oidstoRem = oidOldStays.Except(candidates).ToList
  11.         Dim setToRemove = New HashSet(Of ObjectId)(oidstoRem)
  12.         Dim rlist As List(Of LengthAreaObject) = lengthAreaList.AsParallel.Where(Function(x) setToRemove.Contains(x.oid)).ToList
  13.  
  14.         lengthAreaList.RemoveRange(rlist)
  15.         lengthAreaList.AddRange(addList)
  16.     End Sub
Now, rembember - i needed "add list" and "remove list". As gile has noticed, this is not an efficient way to do things, but other parts of code relies on "IList", so, for now...

And next question is: is dealing in a way I did with ObjectClass.IsDerivedFrom legal?


gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: about speed of searching objects in list
« Reply #11 on: September 12, 2014, 05:11:57 AM »
Doesn't the following code returns the expected result ?

C#
Code - C#: [Select]
  1. public static void SelectionChanged()
  2. {
  3.     var curveType = RXClass.GetClass(typeof(Curve));
  4.  
  5.     var lengthAreaSet = new HashSet<LengthAreaObject>(
  6.         get_selected_oids()
  7.         .Where(id => id.ObjectClass.IsDerivedFrom(curveType))
  8.         .Select(id => new LengthAreaObject { oid = id }));
  9.  
  10.     lengthAreaSet.UnionWith(lengthAreaList.Where(o => o.keep_this_one));
  11.  
  12.     lengthAreaList = lengthAreaSet.ToList();
  13. }

VB (not sure about translation)
Code - vb.net: [Select]
  1. Public Shared Sub SelectionChanged()
  2.     Dim curveType = RXClass.GetClass(GetType(Curve))
  3.    
  4.     Dim lengthAreaSet = New HashSet(Of LengthAreaObject)( _
  5.         From id In get_selected_oids() _
  6.         Where id.ObjectClass.IsDerivedFrom(curveType) _
  7.         Select New LengthAreaObject() With { .oid = id })
  8.        
  9.     lengthAreaSet.UnionWith( _
  10.         From o In lengthAreaList Where o.keep_this_one Select o)
  11.        
  12.     lengthAreaList = lengthAreaSet.ToList()
  13. End Sub

this requires to override the Equals() and GetHashCode() methods in the LengthAreaObject class definition.

C#
Code - C#: [Select]
  1. public override bool Equals(object other)
  2. {
  3.     return other is LengthAreaObject && this.oid == ((LengthAreaObject)other).oid;
  4. }
  5.    
  6. public override int GetHashCode()
  7. {
  8.     return this.oid.GetHashCode();
  9. }

VB (not sure about translation)
Code - vb.net: [Select]
  1. Public Overrides Function Equals(other As Object) As Boolean
  2.     Return TypeOf other Is LengthAreaObject AndAlso _
  3.         Me.oid = DirectCast(other, LengthAreaObject).oid
  4. End Function
  5.  
  6. Public Overrides Function GetHashCode() As Integer
  7.     Return Me.oid.GetHashCode()
  8. End Function
« Last Edit: September 12, 2014, 05:30:30 AM by gile »
Speaking English as a French Frog

nekitip

  • Guest
Re: about speed of searching objects in list
« Reply #12 on: September 12, 2014, 09:46:04 AM »
No.
Quote
The easiest way to do this would be something like mycollection=nothing, mycollection=implied selection, but I can't do that, i need "added" and "removed" objects.
Other than that, doesn't following line, in general, brakes binding?
Code - vb.net: [Select]
  1. lengthAreaList = lengthAreaSet.ToList()

However, overriding object's equals is an idea to explore. Nice effort though.


gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: about speed of searching objects in list
« Reply #13 on: September 12, 2014, 03:54:25 PM »
No.
Quote
The easiest way to do this would be something like mycollection=nothing, mycollection=implied selection, but I can't do that, i need "added" and "removed" objects.
Did you try the code or only read it ?
The code I posted does add and remove objects. All the magic is in the HashSet<T>.UnionWith(): it computes a mathematical set union, i.e. adds the items of the parameter collection to the HashSet excepted duplicated ones.

If I don't misunderstand, the algorithm you implemented is something like this:
- create a list of items to add with the content of 'candidates' except the ones already contained in 'lengthAreaList'
- create a list of items to remove with the items of 'lengthAreaList' which the 'keep_this_one' property is false and which aren't contained in 'candidates'
- remove the items of this second list from 'lengthAreaList'
- add the items of the first list to 'lengthAreaList'

The algorithm I tried to implement should return the same result:
- create a set of 'LengthAreaObject' with the content of 'candidates'
- union this set with all items of 'lengthAreaList' which the 'keep_this_one' property is true (i.e. add to the set all items in the collection which are not already contained in the set)
- convert the set into a list (as you requiered)

Other than that, doesn't following line, in general, brakes binding?
Code - vb.net: [Select]
  1. lengthAreaList = lengthAreaSet.ToList()
What binding are you talking about ?
If you prefer, you can replace:
Code - vb.net: [Select]
  1. lengthAreaList = lengthAreaSet.ToList()
with:
Code - vb.net: [Select]
  1. lengthAreaList.Clear()
  2. lengthAreaList.AddRange(lengthAreaSet)
If you need the LengthAreaObjects which 'keep_this_one' is true remain the same references in lengthAreaList, you can reverse the process: build a Hashset with these objects and union it with candidates.
Code - vb.net: [Select]
  1. Public Shared Sub SelectionChanged()
  2.    Dim curveType = RXClass.GetClass(GetType(Curve))
  3.  
  4.    Dim lengthAreaSet = New HashSet(Of LengthAreaObject)( _
  5.            From o In lengthAreaList Where o.keep_this_one Select o)
  6.  
  7.    lengthAreaSet.UnionWith( _
  8.            From id In get_selected_oids() _
  9.            Where id.ObjectClass.IsDerivedFrom(curveType) _
  10.            Select New LengthAreaObject() With { .oid = id })
  11.  
  12.    lengthAreaList.Clear()
  13.    lengthAreaList.AddRange(lengthAreaSet)
  14. End Sub
« Last Edit: September 13, 2014, 06:35:08 AM by gile »
Speaking English as a French Frog

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: about speed of searching objects in list
« Reply #14 on: September 13, 2014, 03:13:27 AM »
Maybe a picture is more explainatory.

One other thing, I'm curious about the List<T>.RemoveRange() method you use which takes a List<T> as argument.

Speaking English as a French Frog