TheSwamp
Code Red => .NET => Topic started by: dann.boy.001 on January 25, 2011, 03:47:24 AM
-
Hello,
Can anybody help me with a suggestion for the quickest
way to remove duplicate points from a list or Point3dCollection?
Best regards,
Danijel
-
Danijel
My first thought :
Declare a new collection.
Iterate the original collection.
Test whether each Point3d already exists in the new collection using Autodesk.AutoCAD.Geometry.Point3dCollection.Contains(Point3d value).
If it is not there, add the Point3d to the new collection.
Depending on the tolerance you require for matching, you may need to write your own equality-testing method.
There will be a way to do it with LINQ too, I'm sure. :)
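Kerry's steps can be sketched in a few lines. Python here for brevity, with tuples standing in for Point3d and math.dist standing in for an equals-with-tolerance test; the function name is illustrative, not part of any API:

```python
import math

def remove_duplicates(points, tol=1e-9):
    """Keep only points not already present (within tol) in the result.

    O(n^2) worst case: each point is checked against every kept point,
    mirroring Point3dCollection.Contains on a growing collection.
    """
    kept = []
    for p in points:
        if not any(math.dist(p, q) <= tol for q in kept):
            kept.append(p)
    return kept
```

This is the simplest correct approach; the rest of the thread is mostly about making it faster.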
-
Thank you Kerry!
Your first suggestion is very good; for now it is enough.
It is a very fast method, but unfortunately there is no possibility to specify tolerances.
When I find more time, I will try to add tolerances...
Best regards,
Danijel
-
There will be a way to do it with LINQ too, I'm sure. :)
It is a very fast method, but unfortunately there is no possibility to specify tolerances.
When I find more time, I will try to add tolerances...
Hi Danijel,
As Kerry said, of course there is, and with a tolerance too. Take a look at this page: http://msdn.microsoft.com/en-us/library/bb338049.aspx
Proof of concept in F# follows; alas, how to get a hashcode for this I couldn't grok yet.
type MyEqualityComparer(eps) =
    interface System.Collections.Generic.IEqualityComparer<Point3d> with
        member me.Equals(a: Point3d, b: Point3d) =
            a.DistanceTo b < eps
        member me.GetHashCode(pt: Point3d) =
            0

let dupPts eps sset =
    let db = acApp.DocumentManager.MdiActiveDocument.Database
    use tr = db.TransactionManager.StartTransaction()
    let pts =
        new Point3dCollection(
            [| for so in Seq.cast<SelectedObject> sset do
                   let dbp = tr.GetObject(so.ObjectId, OpenMode.ForRead) :?> DBPoint
                   yield dbp.Position |])
    let ptsDistinct =
        pts.OfType<Point3d>().Distinct<Point3d>(new MyEqualityComparer(eps))
    let res = pts.Count - Seq.length ptsDistinct
    tr.Commit()
    res
The function signature is epsilon: float -> sset: SelectionSet -> int; that is, it returns the difference in length between the Point3dCollection and the IEnumerable which holds the distinct values.
Cheers
-
Sort the list by at least one coordinate first (by all coordinates for extremely large sets). Then, as you compare against each point, if the test point's coordinate is less than (or greater than, depending on the sort order) the main point's coordinate, you can stop searching, since none of the remaining points can match.
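That early-exit idea is what makes the sorted approaches in this thread fast. A minimal Python sketch, with tuples standing in for points and an illustrative function name:

```python
import math

def dedupe_sorted(points, tol=1e-9):
    """Sort by X (lexicographic sort; X is the primary key), then for each
    point scan previously kept points backwards, stopping as soon as the
    X gap alone exceeds tol -- no earlier point can then be a duplicate."""
    pts = sorted(points)
    out = []
    for p in pts:
        dup = False
        for q in reversed(out):       # out is kept in sorted-by-X order
            if p[0] - q[0] > tol:
                break                 # all remaining q are even farther away
            if math.dist(p, q) <= tol:
                dup = True
                break
        if not dup:
            out.append(p)
    return out
```

The sort costs O(n log n) and each scan usually ends after a few neighbors, versus the O(n^2) of comparing every pair.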
-
Hi,
You can try these extension methods
EDIT: corrected code, see below.
EDIT 2: little optimisation
public static void RemoveDuplicate(this Point3dCollection pts)
{
    pts.RemoveDuplicate(Tolerance.Global);
}

public static void RemoveDuplicate(this Point3dCollection pts, Tolerance tol)
{
    List<Point3d> ptlst = new List<Point3d>();
    for (int i = 0; i < pts.Count; i++)
    {
        ptlst.Add(pts[i]);
    }
    ptlst.Sort((p1, p2) => p1.X.CompareTo(p2.X));
    for (int i = 0; i < ptlst.Count - 1; i++)
    {
        for (int j = i + 1; j < ptlst.Count; )
        {
            if ((ptlst[j].X - ptlst[i].X) > tol.EqualPoint)
                break;
            if (ptlst[i].IsEqualTo(ptlst[j], tol))
            {
                pts.Remove(ptlst[j]);
                ptlst.RemoveAt(j);
            }
            else
                j++;
        }
    }
}
-
ptlst.Sort((p1, p2) => p1.X.CompareTo(p2.X));
for (int i = 0; i < ptlst.Count - 1; i++)
{
    if (ptlst[i].IsEqualTo(ptlst[i + 1], tol))
        pts.Remove(ptlst[i]);
}
pt1.IsEqualTo(pt2, new Tolerance(1.0, 1.0)) is equivalent to (pt1 - pt2).Length <= 1.0. That's good; the tolerance region is spherical as well.
Is it just me, or can this fail when encountering more than 2 points that have the same X but different Y or Z?
E.g. ((0,0,0), (0,3,0), (0,1,0)) with Tolerance.EqualPoint = 1 may not flag (0,0,0) and (0,1,0) as equal.
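That counterexample can be checked directly. This Python sketch (tuples standing in for Point3d) shows the adjacent-only comparison missing the duplicate pair that a full scan finds:

```python
import math

pts = [(0, 0, 0), (0, 3, 0), (0, 1, 0)]
tol = 1.0

# sort on X only, as the original code did; the sort is stable, so the
# three points (all with X == 0) keep their input order
srt = sorted(pts, key=lambda p: p[0])

# adjacent entries only: (0,0,0)/(0,3,0) and (0,3,0)/(0,1,0) are both
# farther apart than tol, so no duplicate is flagged
adjacent_dups = sum(math.dist(srt[i], srt[i + 1]) <= tol
                    for i in range(len(srt) - 1))

# every pair: finds (0,0,0)/(0,1,0), which are within tol
all_dups = sum(math.dist(a, b) <= tol
               for i, a in enumerate(srt) for b in srt[i + 1:])
```

Here adjacent_dups is 0 while all_dups is 1, which is exactly the failure described above; the corrected code therefore keeps scanning forward until the X gap alone exceeds the tolerance.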
-
Is it just me or can this fail when encountering more than 2 points that have the same X but different Y or Z?
E.g. ((0,0,0)(0,3,0)(0,1,0)) with Tolerance.EqualPoint = 1 may not flag (0,0,0) and (0,1,0) as equal.
You're right, thorsen. I was just thinking about this and have corrected the code in my previous post.
-
Since Point3d is a structure, could you not just use the Equals method?
I have not tested, but Equals is overridden and sealed for Point3d.
-
Since you are comparing doubles you *must* convert the value before comparing.
Even if you leave the data type as double, you need to round, truncate, or similar.
I recycled this from StackOverflow:
internal static double DecimalDouble(double d, int NumberOfPlaces)
{
    double scale = Math.Pow(10, Math.Floor(Math.Log10(Math.Abs(d))) + 1);
    return scale * Math.Round(d / scale, NumberOfPlaces);
}
-
Since Point3d is a structure, Equals seems to work and should check for equality of fields rather than memory location.
This prints "Duplicate" when pnt1's Equals member is called with pnt2.
[CommandMethod("TestDuplicate")]
public void TestDuplicate()
{
    Document doc = Application.DocumentManager.MdiActiveDocument;
    Database db = doc.Database;
    Editor ed = doc.Editor;
    Point3dCollection pntcoll = new Point3dCollection();
    Random rdn = new Random();
    for (int i = 0; i <= 10; i++)
    {
        Point3d pnt = new Point3d(rdn.Next(100) + .000001, rdn.Next(100) + .000000001, rdn.Next(100) + .000001);
        pntcoll.Add(pnt);
    }
    Point3d pnt1 = new Point3d(4.000001, 7.000000001, 9.000000000000001);
    pntcoll.Add(pnt1);
    Point3d pnt2 = new Point3d(4.000001, 7.000000001, 9.000000000000001);
    foreach (Point3d pnt in pntcoll)
    {
        if (pnt.Equals(pnt2))
        {
            ed.WriteMessage("\nDuplicate");
        }
        else
        {
            ed.WriteMessage("\nNon-Duplicate");
        }
    }
}
-
That was not a good example.
I have not narrowed down the exact tolerance, but
pnt1 and pnt2 will print "Duplicate" and pnt3 will print "Non-Duplicate".
[CommandMethod("TestDuplicate")]
public void TestDuplicate()
{
    Document doc = Application.DocumentManager.MdiActiveDocument;
    Database db = doc.Database;
    Editor ed = doc.Editor;
    Point3dCollection pntcoll = new Point3dCollection();
    Point3d pnt1 = new Point3d(4.000001, 7.000000001, 9.000000000000001);
    pntcoll.Add(pnt1);
    Point3d pnt2 = new Point3d(4.000001, 7.000000001, 9.00000000000001);
    pntcoll.Add(pnt2);
    Point3d pnt3 = new Point3d(4.000001, 7.000000001, 9.000000001);
    pntcoll.Add(pnt3);
    Point3d pnt4 = new Point3d(4.000001, 7.000000001, 9.0000000001);
    foreach (Point3d pnt in pntcoll)
    {
        if (pnt.Equals(pnt4))
        {
            ed.WriteMessage("\nDuplicate");
        }
        else
        {
            ed.WriteMessage("\nNon-Duplicate");
        }
    }
}
-
Hi,
I tried to implement Kerry's algorithm, defining an overloaded extension method for Point3dCollection.Contains() to be used with a Tolerance.
From the tests I made with 100 random point.dwg (http://www.theswamp.org/index.php?action=dlattach;topic=32874.0;attach=15396) (from here (http://www.theswamp.org/index.php?topic=32874.0)), it seems to be slower than the solution I posted in Reply #5.
public static void RemoveDuplicate2(this Point3dCollection pts, Tolerance tol)
{
    Point3dCollection tmp = new Point3dCollection();
    for (int i = 0; i < pts.Count; )
    {
        if (tmp.Contains(pts[i], tol))
            pts.RemoveAt(i);
        else
        {
            tmp.Add(pts[i]);
            i++;
        }
    }
}

public static bool Contains(this Point3dCollection pts, Point3d pt, Tolerance tol)
{
    for (int i = 0; i < pts.Count; i++)
    {
        if (pt.IsEqualTo(pts[i], tol))
            return true;
    }
    return false;
}
-
Sorry dann.boy.001,
gile uses Point3d.IsEqualTo(), which takes a tolerance as a parameter and does not create the uncertainty and extra work that my previous post does.
-
internal static double DecimalDouble(double d, int NumberOfPlaces)
{
    double scale = Math.Pow(10, Math.Floor(Math.Log10(Math.Abs(d))) + 1);
    return scale * Math.Round(d / scale, NumberOfPlaces);
}
I thought I'd give it a try, but then I realized that Log10(0.0) gives an unholy value (negative infinity in .NET).
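A guarded version of the helper is straightforward. Sketched here in Python; the zero check is my addition, since log10 is undefined at 0:

```python
import math

def decimal_double(d, places):
    """Port of the C# DecimalDouble above: round d to `places`
    significant digits via a power-of-ten scale factor."""
    if d == 0.0:
        return 0.0  # log10(0) is undefined (-Infinity in .NET)
    scale = 10.0 ** (math.floor(math.log10(abs(d))) + 1)
    return scale * round(d / scale, places)
```

So decimal_double(1234.5, 2) yields roughly 1200.0 (two significant digits), and zero coordinates no longer blow up.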
-
Thank you very much guys :wink:!!!
Tonight I will test it... Seems problem solved :-)
Best regards
Danijel
-
Hi,
Here's an F# solution which seems to be much faster on big collections than the C# solutions I gave before.
type Point3dComparer(pt : Point3d, tol) =
    member this.Pt = pt
    member this.Equals(tPt : Point3dComparer) = this.Pt.IsEqualTo(tPt.Pt, tol)

let removeDuplicate pts tol =
    new Point3dCollection(
        pts
        |> Seq.cast
        |> Seq.distinctBy (fun p -> new Point3dComparer(p, tol))
        |> Seq.toArray)
-
I ran some little benchmarks using 100 random points.dwg (http://www.theswamp.org/index.php?action=dlattach;topic=32874.0;attach=15396)
RemoveDuplicate (sorted List)
Removed 2 duplicates in a 10000 items collection
Elapsed milliseconds : 9.0005
RemoveDuplicate2 (overload of Contains)
Removed 2 duplicates in a 10000 items collection
Elapsed milliseconds : 9766.5586
removeDuplicate (F#)
Removed 2 duplicates in a 10000 items collection
Elapsed milliseconds : 7.0004
RemoveDuplicate (sorted List)
Removed 383 duplicates in a 10381 items collection
Elapsed milliseconds : 231.0133
RemoveDuplicate2 (overload of Contains)
Removed 383 duplicates in a 10381 items collection
Elapsed milliseconds : 9677.5535
removeDuplicate (F#)
Removed 383 duplicates in a 10381 items collection
Elapsed milliseconds : 6.0004
RemoveDuplicate (sorted List)
Removed 4378 duplicates in a 14376 items collection
Elapsed milliseconds : 2150.123
RemoveDuplicate2 (overload of Contains)
Removed 4378 duplicates in a 14376 items collection
Elapsed milliseconds : 9665.5528
removeDuplicate (F#)
Removed 4378 duplicates in a 14376 items collection
Elapsed milliseconds : 7.0004
-
how does this compare?
using System.Collections.Generic;
using Autodesk.AutoCAD.ApplicationServices;
using Autodesk.AutoCAD.DatabaseServices;
using Autodesk.AutoCAD.EditorInput;
using Autodesk.AutoCAD.Geometry;
using Autodesk.AutoCAD.Runtime;
using AcAp = Autodesk.AutoCAD.ApplicationServices;
using AcDb = Autodesk.AutoCAD.DatabaseServices;
[assembly: CommandClass(typeof(ExecMethod.Commands))]

namespace ExecMethod
{
    public class Point3dComparer : IEqualityComparer<Point3d>
    {
        Tolerance m_tol;

        public Point3dComparer(Tolerance tol)
        {
            m_tol = tol;
        }

        public bool Equals(Point3d a, Point3d b)
        {
            return a.IsEqualTo(b, m_tol);
        }

        public int GetHashCode(Point3d pnt)
        {
            return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());
        }
    }

    public class Commands
    {
        [CommandMethod("doit")]
        public void doit()
        {
            List<Point3d> pointsList = new List<Point3d>();
            Document doc = AcAp.Application.DocumentManager.MdiActiveDocument;
            Database db = AcDb.HostApplicationServices.WorkingDatabase;
            Editor ed = doc.Editor;
            AcDb.TransactionManager tm = doc.TransactionManager;
            using (Transaction tr = tm.StartTransaction())
            {
                BlockTableRecord btr = tr.GetObject(db.CurrentSpaceId, OpenMode.ForRead, false) as BlockTableRecord;
                foreach (ObjectId id in btr)
                {
                    DBPoint dbpoint = tr.GetObject(id, OpenMode.ForRead, false) as DBPoint;
                    if (dbpoint != null)
                        pointsList.Add(dbpoint.Position);
                }
                tr.Commit();

                Tolerance tol = new Tolerance();
                var hash = new HashSet<Point3d>(pointsList, new Point3dComparer(tol));
                ed.WriteMessage("{0}-{1}", pointsList.Count, hash.Count);
            }
        }
    }
}
-
Nice trick Dan 8-)
It seems to be a little faster than my F# code (but very close to it).
I didn't know the HashSet class; once again, thank you for teaching me a new thing.
Here are the method and the class I used (each other method requires and returns a Point3dCollection).
private Point3dCollection DanRemDup(Point3dCollection pts, Tolerance tol)
{
    Point3dCollection result = new Point3dCollection();
    Point3d[] ptArr = new Point3d[pts.Count];
    pts.CopyTo(ptArr, 0);
    HashSet<Point3d> hash = new HashSet<Point3d>(ptArr, new Point3dComparer(tol));
    foreach (Point3d p in hash)
    {
        result.Add(p);
    }
    return result;
}

public class Point3dComparer : IEqualityComparer<Point3d>
{
    private Tolerance _tol;

    public Point3dComparer()
    {
        _tol = Tolerance.Global;
    }

    public Point3dComparer(Tolerance tol)
    {
        _tol = tol;
    }

    public bool Equals(Point3d a, Point3d b)
    {
        return a.IsEqualTo(b, _tol);
    }

    public int GetHashCode(Point3d pnt)
    {
        return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());
    }
}
-
Nice trick Dan
Sorry, boys, it's still slower than Linq's Distinct<T>(IEqualityComparer) extension, at least around here...
public int GetHashCode(Point3d pnt)
{
    return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());
}
... and when there's a real Tolerance involved, other than 0.0 or Double.Epsilon. I have the nagging suspicion that this kind of HashCode calculation doesn't quite work in our scenario; it will gladly let false negatives pass, when the delta is actually below the threshold but the coordinates are sufficiently different.
BTW, in F# there's a helper to create an IEqualityComparer from lambdas, very fast it is not.
let myEqualityComparer =
    HashIdentity.FromFunctions
        (fun _ -> 0)
        (fun (a: Point3d) b -> a.IsEqualTo(b, new Tolerance(eps, eps)))
Cheers, Thorsten
-
Here's my data for "100 random point.dwg" with Tolerance(1, 1) :
DistinctHashSet
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 5150.0162
DistinctEnumerable
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 3502.6673
Sure, setting the HashCode to 0 is cheating, as is the creation of a sequence where the Current property is never touched.
public class MyEqualityComparer : IEqualityComparer<Point3d>
{
    Tolerance m_tol;

    public MyEqualityComparer(Tolerance tol)
    {
        m_tol = tol;
    }

    public bool Equals(Point3d a, Point3d b)
    {
        return a.IsEqualTo(b, m_tol);
    }

    public int GetHashCode(Point3d pnt)
    {
        return 0;
    }
}

public class Commands
{
    [CommandMethod("DistinctEnumerable")]
    public void DistinctEnumerable() { cmd("DistinctEnumerable"); }

    [CommandMethod("DistinctHashSet")]
    public void DistinctHashSet() { cmd("DistinctHashSet"); }

    private void cmd(string what)
    {
        Editor ed = acApp.DocumentManager.MdiActiveDocument.Editor;
        PromptDoubleResult pdr = ed.GetDouble("Tolerance: ");
        if (pdr.Status != PromptStatus.OK) return;
        SelectionFilter sf = new SelectionFilter(new TypedValue[] { new TypedValue((int)DxfCode.Start, "POINT") });
        PromptSelectionResult psr = ed.GetSelection(sf);
        if (psr.Status != PromptStatus.OK) return;
        int numin, numout;
        long ticks;
        dupPts(what, pdr.Value, psr.Value, out numin, out numout, out ticks);
        ed.WriteMessage(
            "\n{0} " +
            "\nFound {1} duplicate(s) in a {2} items collection. " +
            "\nElapsed milliseconds : {3} ",
            what,
            numin - numout,
            numin,
            (double)ticks / (double)TimeSpan.TicksPerMillisecond);
    }

    private void dupPts(string what, double eps, SelectionSet sset, out int numin, out int numout, out long ticks)
    {
        Database db = acApp.DocumentManager.MdiActiveDocument.Database;
        using (Transaction tr = db.TransactionManager.StartTransaction())
        {
            Point3dCollection pts = new Point3dCollection();
            foreach (SelectedObject so in sset)
            {
                DBPoint dbp = (DBPoint)tr.GetObject(so.ObjectId, OpenMode.ForRead);
                pts.Add(dbp.Position);
            }
            Tolerance tol = new Tolerance(eps, eps);
            numin = pts.Count;
            Stopwatch sw = Stopwatch.StartNew();
            if (what == "DistinctEnumerable")
            {
                IEnumerable<Point3d> enumerable = pts.Cast<Point3d>().Distinct<Point3d>(new MyEqualityComparer(tol));
                numout = enumerable.Count<Point3d>();
            }
            else if (what == "DistinctHashSet")
            {
                HashSet<Point3d> hash = new HashSet<Point3d>(pts.Cast<Point3d>(), new MyEqualityComparer(tol));
                numout = hash.Count;
            }
            else numout = 0;
            sw.Stop();
            ticks = sw.Elapsed.Ticks;
            tr.Commit();
        }
    }
}
-
Nice, although the times are about the same on my box
DistinctEnumerable
Found 20002 duplicate(s) in a 30000 items collection.
Elapsed milliseconds : 7844.7255
Command: DistinctHashSet
Tolerance: 1
Select objects: all
30000 found
Select objects:
DistinctHashSet
Found 20002 duplicate(s) in a 30000 items collection.
Elapsed milliseconds : 7775.5968
-
My results are very close to yours, Daniel
Command: DISTINCTENUMERABLE
Tolerance: 1
Select objects: all
10000 found
Select objects:
DistinctEnumerable
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2680.2491
Command: DISTINCTHASHSET
Tolerance: 1
Select objects: all
10000 found
Select objects:
DistinctHashSet
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2653.0955
-
Hello,
Gile's code (which he posted in one of the previous posts) seems very fast to me.
Here is that code, slightly modified.
Imports System.Collections
Imports System.Collections.Generic
Imports System.Diagnostics
Imports Autodesk.AutoCAD.Runtime
Imports Autodesk.AutoCAD.ApplicationServices
Imports Autodesk.AutoCAD.DatabaseServices
Imports Autodesk.AutoCAD.EditorInput
Imports Autodesk.AutoCAD.Geometry
<System.Runtime.CompilerServices.Extension()> _
Public Sub RemoveDuplicate(ByVal pts As Point3dCollection)
pts.RemoveDuplicate(Tolerance.[Global])
End Sub
<System.Runtime.CompilerServices.Extension()> _
Public Sub RemoveDuplicate(ByRef pts As Point3dCollection, ByVal tol As Tolerance)
Dim ptlst As New List(Of Point3d)()
Dim n As Integer = pts.Count
For i As Integer = 0 To n - 1
ptlst.Add(pts(i))
Next
ptlst.Sort(Function(p1, p2) p1.X.CompareTo(p2.X))
Dim m As Integer = ptlst.Count
For i As Integer = 0 To m - 2
Dim j As Integer = i + 1
While j < ptlst.Count
If (ptlst(j).X - ptlst(i).X) > tol.EqualPoint Then
Exit While
End If
If ptlst(i).IsEqualTo(ptlst(j), tol) Then
pts.Remove(ptlst(j))
ptlst.RemoveAt(j)
Else
j += 1
End If
End While
Next
End Sub
<System.Runtime.CompilerServices.Extension()> _
Public Sub RemoveDuplicate(ByRef dbpts As List(Of DBPoint), ByRef ListDuplicatePoints As List(Of ObjectId), ByVal tol As Tolerance)
ListDuplicatePoints = New List(Of ObjectId)
Dim ptlst As New List(Of DBPoint)()
Dim n As Integer = dbpts.Count
For i As Integer = 0 To n - 1
ptlst.Add(dbpts(i))
Next
ptlst.Sort(Function(p1, p2) p1.Position.X.CompareTo(p2.Position.X))
Dim m As Integer = ptlst.Count
For i As Integer = 0 To m - 2
Dim j As Integer = i + 1
While j < ptlst.Count
If (ptlst(j).Position.X - ptlst(i).Position.X) > tol.EqualPoint Then
Exit While
End If
If ptlst(i).Position.IsEqualTo(ptlst(j).Position, tol) Then
ListDuplicatePoints.Add(ptlst(j).ObjectId)
ptlst.RemoveAt(j)
Else
j += 1
End If
End While
Next
End Sub
End Module
Public Class Test

    <CommandMethod("DPOINTS")> _
    Public Sub DPOINTS()
        Dim doc As Document = Application.DocumentManager.MdiActiveDocument
        Dim db As Database = doc.Database
        Dim ed As Editor = doc.Editor
        'Get points
        Dim SelectedPoints As IEnumerable = SelectPoints()
        If SelectedPoints Is Nothing Then Exit Sub
        'Get tolerance
        Dim pdr As PromptDoubleResult = ed.GetDouble("Tolerance: ")
        If pdr.Status <> PromptStatus.OK Then Exit Sub
        'Open transaction for reading points (or to later erase double points)
        Dim tr As Transaction = db.TransactionManager.StartTransaction()
        Using tr
            DeleteDuplicatePoints3(SelectedPoints, tr, pdr.Value) 'use the prompted tolerance
            tr.Commit()
        End Using
    End Sub

    Public Function SelectPoints() As IEnumerable
        Dim doc As Document = Application.DocumentManager.MdiActiveDocument
        Dim db As Database = doc.Database
        Dim ed As Editor = doc.Editor
        Dim acTypValAr(0) As TypedValue
        acTypValAr.SetValue(New TypedValue(DxfCode.Start, "POINT"), 0)
        Dim sf As New SelectionFilter(acTypValAr)
        Dim pso As New PromptSelectionOptions()
        pso.MessageForAdding = "Select points: "
        pso.AllowDuplicates = False
        Dim psr As PromptSelectionResult = ed.GetSelection(pso, sf)
        If psr.Status <> PromptStatus.OK Then
            Return Nothing
        Else
            Return psr.Value.GetObjectIds
        End If
    End Function

    Public Sub DeleteDuplicatePoints3(ByVal o As IEnumerable, ByVal t As Transaction, ByVal Tol As Double)
        'Get list of points
        Dim ListPoints As New List(Of DBPoint)
        For Each id As ObjectId In o
            Dim dbpt As DBPoint = DirectCast(t.GetObject(id, OpenMode.ForRead), DBPoint)
            'Add point to list
            ListPoints.Add(dbpt)
        Next
        'Here add duplicate points (if exist)
        Dim ListDuplicatePoints As New List(Of ObjectId)
        Dim sw As Stopwatch = Stopwatch.StartNew()
        'Get duplicate points
        DuplicatePoints.RemoveDuplicate(ListPoints, ListDuplicatePoints, New Tolerance(Tol, Tol))
        sw.Stop()
        MsgBox("Found " + ListDuplicatePoints.Count.ToString + " duplicate(s) in a " + ListPoints.Count.ToString + " items collection. Elapsed milliseconds : " + (CDbl(sw.Elapsed.Ticks) / CDbl(TimeSpan.TicksPerMillisecond)).ToString)
    End Sub

End Class
Regards
Danijel
-
Hi,
First, thanks again to all; I learned a lot from this topic.
Finally, as dann.boy.001 said, the sorted-list route seems to be really faster with large collections than the IEnumerable and HashSet ones (which are very close to each other).
I used Thorsten's test commands (mine returned false results...), to which I added a 'SortList' one:
else if (what == "SortList")
{
    List<Point3d> lst = SortList(pts, tol);
    numout = lst.Count;
}
This command calls a method which returns a List<Point3d> object.
private List<Point3d> SortList(Point3dCollection pts, Tolerance tol)
{
    List<Point3d> ptlst = new List<Point3d>();
    for (int i = 0; i < pts.Count; i++)
    {
        ptlst.Add(pts[i]);
    }
    ptlst.Sort((p1, p2) => p1.X.CompareTo(p2.X));
    for (int i = 0; i < ptlst.Count - 1; i++)
    {
        for (int j = i + 1; j < ptlst.Count; )
        {
            if ((ptlst[j].X - ptlst[i].X) > tol.EqualPoint)
                break;
            if (ptlst[i].IsEqualTo(ptlst[j], tol))
            {
                ptlst.RemoveAt(j);
            }
            else
                j++;
        }
    }
    return ptlst;
}
Here are the results with 10000 points
DistinctEnumerable
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 3072.7459
DistinctHashSet
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2999.9208
SortedList
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 18.9593
With 20000 points
DistinctEnumerable
Found 10002 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 6106.0164
DistinctHashSet
Found 10002 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 5996.8205
SortedList
Found 10002 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 317.208
With 40000 points
DistinctEnumerable
Found 20004 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 18422.6259
DistinctHashSet
Found 20004 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 18053.2102
SortedList
Found 20004 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 1260.4816
-
I ran the previous tests on AutoCAD 2007 32-bit.
The same tests with AutoCAD 2011 64-bit give very different results:
10000 points
DistinctEnumerable
Found 0 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2.4629
DistinctHashSet
Found 0 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2.5144
SortList
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 7.4756
20000 points
DistinctEnumerable
Found 10000 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 6.8504
DistinctHashSet
Found 10000 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 6.0844
SortList
Found 10002 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 222.6415
40000 points
DistinctEnumerable
Found 20000 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 14.7062
DistinctHashSet
Found 20000 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 14.3403
SortList
Found 20004 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 895.1789
-
The same tests with AutoCAD 2011 64 bits gives very different results:
I did this on 2011 64-bit, and my duplicate counts match the 2011 32-bit test, unlike your results.
-
And for reference a way in C++/ARX
bool operator < (const AcGePoint3d& pt1, const AcGePoint3d& pt2)
{
    if (pt1.x == pt2.x)
    {
        if (pt1.y == pt2.y)
            return (pt1.z < pt2.z);
        else
            return (pt1.y < pt2.y);
    }
    else
        return (pt1.x < pt2.x);
}

static void ArxCmdsTester_DuplicatePointsFinder(void)
{
    std::vector<AcGePoint3d> pts;
    AcDbObjectId objId;
    ads_name ename;
    ads_name ss;
    resbuf* rbFilter;
    rbFilter = acutBuildList(RTDXF0, _T("POINT"), RTNONE);
    TCHAR* intersPrompts[] = {_T("Select points: "), _T("Remove points: ")};
    if (acedSSGet(_T(":$"), intersPrompts, NULL, rbFilter, ss) != RTNORM)
    {
        acutRelRb(rbFilter);
        return;
    }
    acutRelRb(rbFilter);
    long len = 0;
    if ((acedSSLength(ss, &len) != RTNORM) || (len == 0))
    {
        acedSSFree(ss);
        return;
    }
    for (int i = 0; i < len; i++)
    {
        if (acedSSName(ss, i, ename) == RTNORM)
        {
            if (acdbGetObjectId(objId, ename) == Acad::eOk)
            {
                AcDbObjectPointer<AcDbPoint> pPoint(objId, AcDb::kForRead);
                if (pPoint.openStatus() == Acad::eOk)
                {
                    pts.push_back(pPoint->position());
                }
            }
        }
    }
    acedSSFree(ss);
    acutPrintf(_T("\nPoints before unique: %ld "), pts.size());
    std::sort(pts.begin(), pts.end());
    pts.erase(std::unique(pts.begin(), pts.end()), pts.end());
    acutPrintf(_T("\nPoints after unique: %ld "), pts.size());
    pts.clear();
}
-
Nice trick Dan
public int GetHashCode(Point3d pnt)
{
    return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());
}
I have the nagging suspicion that this kind of HashCode calculation doesn't quite work in our scenario; it will gladly let false negatives pass, when the delta is actually below the threshold but the coordinates are sufficiently different.
Also, returning a 32-bit int hash from a 64-bit double isn't going to guarantee unique values, let alone 192 bits' worth of doubles. Worse yet, XOR'ing the hash values reduces the uniqueness even further.
For example: 1 ^ 2 ^ 3 = 0
ps, also from the .Net docs
The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.
-
And for reference a way in C++/ARX
bool operator < (const AcGePoint3d& pt1, const AcGePoint3d& pt2)
{
    if (pt1.x == pt2.x)
    {
        if (pt1.y == pt2.y)
            return (pt1.z < pt2.z);
        else
            return (pt1.y < pt2.y);
    }
    else
        return (pt1.x < pt2.x);
}
Shorten that up...
bool operator < (const AcGePoint3d & pt1, const AcGePoint3d & pt2)
{
    AcGeVector3d pt = pt1 - pt2;
    return pt.x < 0.0 || pt.y < 0.0 || pt.z < 0.0;
}
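For what it's worth, this shortened comparator is not a strict weak ordering, which std::sort and std::unique require: two points can each compare "less than" the other. A literal Python transcription of the predicate shows the violation (which may explain the later report in the thread that only the original comparator works):

```python
def lt(p, q):
    """Literal transcription of the shortened C++ operator< above."""
    dx, dy, dz = p[0] - q[0], p[1] - q[1], p[2] - q[2]
    return dx < 0.0 or dy < 0.0 or dz < 0.0

a, b = (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)

# both directions return True: antisymmetry is violated, so std::sort's
# precondition is broken (undefined behavior in C++)
a_lt_b = lt(a, b)   # True because a.x - b.x < 0
b_lt_a = lt(b, a)   # True because b.y - a.y < 0
```

The lexicographic version above it, by contrast, is a valid strict weak ordering.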
-
Nice trick Dan
public int GetHashCode(Point3d pnt)
{
    return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());
}
I have the nagging suspicion that this kind of HashCode calculation doesn't quite work in our scenario; it will gladly let false negatives pass, when the delta is actually below the threshold but the coordinates are sufficiently different.
Also, returning a 32 bit int hash from a 64 bit double isn't going to guarantee unique values, let alone 192 bits worth of doubles. Worse yet XOR'ing the hash values reduces the uniqueness even further.
For example: 1 ^ 2 ^ 3 = 0
ps, also from the .Net docs
The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.
If you notice, that call is not using the double's value to generate a hash, but calling the object's GetHashCode() method.
I.e.
double x = 9.990;
Console.WriteLine(x.GetHashCode().ToString());
prints
126742170
and
double x = 1.0;
double y = 2.0;
double z = 3.0;
int hash = x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode();
Console.WriteLine(hash.ToString());
1073217536
-
Oh, and classes implementing IEqualityComparer must supply a GetHashCode() implementation.
-
If you notice, that call is not using the double's value to generate a hash, but calling the object's GetHashCode() method.
double x = 1.0;
double y = 2.0;
double z = 3.0;
int hash = x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode();
Console.WriteLine(hash.ToString());
Wouldn't matter. The object represents a 64-bit double value and can't be packed into a 32-bit hash without data loss.
For the XOR part of the problem (not on a Windows box right now): what is the result of x = y = z = 10.0? How about x = 10, y = z = 0.0? I think they might be the same.
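Those two cases do collide. XOR is self-canceling (a ^ a == 0), so any coordinate appearing twice drops out of the combined hash; a quick check with integer coordinates:

```python
# 1 ^ 2 ^ 3 == 0: this triple collides with anything else XOR-ing to 0
h1 = 1 ^ 2 ^ 3

# a ^ a ^ a == a, and a ^ 0 ^ 0 == a: the triples (10, 10, 10) and
# (10, 0, 0) produce the same combined hash
h2 = 10 ^ 10 ^ 10
h3 = 10 ^ 0 ^ 0
```

As noted elsewhere in the thread, such collisions are only false positives; Equals still arbitrates them, so they cost performance rather than correctness.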
-
if you notice, that call is not using the double's value to get generate a hash, but calling the objects GetHashCode() method.
double x = 1.0;
double y = 2.0;
double z = 3.0;
int hash = x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode();
Console.WriteLine(hash.ToString());
Wouldn't matter. The object represents a double value of 64 bits and can't be packed into a 32 bit hash without data loss.
For the XOR part of the problem (not on a Windows box right now) what is the result of x = y = z = 10.0? How about x = 10, y = z = 0.0? I think they might be the same.
True that! What would you recommend?
-
True that! What would you recommend?
I don't have one. I once looked at an interesting x, y, z encoding technique but never figured out how to get it to work with doubles (it works best with whole numbers of limited size). I'm drawing a blank right now on the name of the mathematician the technique is named after. Will post a link in a bit.
-
For the XOR part of the problem (not on a Windows box right now) what is the result of x = y = z = 10.0? How about x = 10, y = z = 0.0? I think they might be the same.
Wouldn't matter. That's a false positive; the Equals() override will take care of it.
My concern is the cases where GetHashCode() returns differing values; then Equals() won't be called, even though it would return true if it were.
-
This might be a point of interest; take a look:
http://www.codewrecks.com/blog/index.php/2011/01/18/linq-distinct-with-lambda
Oleg
-
True that! What would you recommend?
Not a recommendation, just some prior research into the problem before settling on using KD-trees.
Calculating a unique hash for a given 3D point is really about finding a 1D value that represents the point. The mathematician Giuseppe Peano, http://en.wikipedia.org/wiki/Giuseppe_Peano, figured out such a method and it is called Peano Coordinate encoding which has the properties of a nearest neighbor problem.
To encode a 2D point of say, x=1234 and y = 5678 the Peano encoded 1D value is 15263748, which can also be decoded back to the original value. When multiple 2D values are encoded and sorted then they will fall naturally into a nearest neighbor when traversed.
The internal storage of a 64 bit double complicates the implementation of Peano encoding for these types. It may not be practical to use?
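The 1234/5678 example above is plain digit interleaving; a small sketch (hypothetical helper names; decimal digits, equal-width non-negative integers only):

```python
def peano_encode(x, y):
    """Interleave the decimal digits of two equal-width non-negative
    integers: 1234, 5678 -> 15263748."""
    xs, ys = str(x), str(y)
    assert len(xs) == len(ys), "sketch assumes equal digit counts"
    return int("".join(a + b for a, b in zip(xs, ys)))

def peano_decode(z, width):
    """Undo the interleave: 15263748 -> (1234, 5678) for width 4."""
    zs = str(z).zfill(2 * width)
    return int(zs[0::2]), int(zs[1::2])
```

Sorting such encoded values groups nearby points, as described above, but as noted the bit layout of IEEE doubles makes a direct port to floating-point coordinates awkward.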
-
Thanks Paul,
I have tried something similar in the past, also using isEqualTo() with AcGeTol, and none of them works except the one I posted earlier.
If I try with a small selection of points it works, btw.
I did my testing with 100,000 points, with the same number of duplicates.
Shorten that up...
bool operator < (const AcGePoint3d & pt1, const AcGePoint3d & pt2)
{
AcGeVector3d pt = pt1 - pt2;
return pt.x < 0.0 || pt.y < 0.0 || pt.z < 0.0;
}
-
I saw this and it reminded me of this thread; I thought this might be a good collection to use for the task:
SortedSet<T>
- Does not add duplicates
- Takes an IComparer in its constructor
http://msdn.microsoft.com/en-us/library/dd395024.aspx