### Author Topic: Find duplicate points  (Read 14118 times)

0 Members and 1 Guest are viewing this topic.

#### dann.boy.001

• Guest
##### Re: Find duplicate points
« Reply #15 on: January 26, 2011, 07:05:15 AM »

Thank you very much guys  !!!

Tonight I will test it... Seems problem solved

Best regards
Danijel

#### gile

• Gator
• Posts: 2507
• Marseille, France
##### Re: Find duplicate points
« Reply #16 on: January 29, 2011, 10:44:25 AM »
Hi,

Here's a F# solution which seems to be very much faster on big collections than the C# solutions i gave before.

Code: [Select]
`type Point3dComparer(pt : Point3d, tol) =    member this.Pt = pt    member this.Equals(tPt : Point3dComparer) = this.Pt.IsEqualTo(tPt.Pt, tol)        let removeDuplicate pts tol =    new Point3dCollection(        pts        |> Seq.cast        |> Seq.distinctBy (fun p -> new Point3dComparer(p, tol))        |> Seq.toArray)`
Speaking English as a French Frog

#### gile

• Gator
• Posts: 2507
• Marseille, France
##### Re: Find duplicate points
« Reply #17 on: January 29, 2011, 11:11:14 AM »
I ran some little benchs using 100 random points.dwg

Quote
RemoveDuplicate(sorted List)
Removed 2 duplicate in a 10000 items collection
Ellapsed milliseconds : 9.0005

RemoveDuplicate2 (override Contains)
Removed 2 duplicate in a 10000 items collection
Ellapsed milliseconds : 9766.5586

removeDuplicate (F#)
Removed 2 duplicate in a 10000 items collection
Ellapsed milliseconds : 7.0004

Quote
RemoveDuplicate(sorted List)
Removed 383 duplicate in a 10381 items collection
Ellapsed milliseconds : 231.0133

RemoveDuplicate2 (override Contains)
Removed 383 duplicate in a 10381 items collection
Ellapsed milliseconds : 9677.5535

removeDuplicate (F#)
Removed 383 duplicate in a 10381 items collection
Ellapsed milliseconds : 6.0004

Quote
RemoveDuplicate(sorted List)
Removed 4378 duplicate in a 14376 items collection
Ellapsed milliseconds : 2150.123

RemoveDuplicate2 (override Contains)
Removed 4378 duplicate in a 14376 items collection
Ellapsed milliseconds : 9665.5528

removeDuplicate (F#)
Removed 4378 duplicate in a 14376 items collection
Ellapsed milliseconds : 7.0004
Speaking English as a French Frog

#### It's Alive!

• Retired
• Needs a day job
• Posts: 8690
• AKA Daniel
##### Re: Find duplicate points
« Reply #18 on: January 29, 2011, 12:29:41 PM »
how does this compare?

Code: [Select]
`using System.Collections.Generic;using Autodesk.AutoCAD.ApplicationServices;using Autodesk.AutoCAD.DatabaseServices;using Autodesk.AutoCAD.EditorInput;using Autodesk.AutoCAD.Geometry;using Autodesk.AutoCAD.Runtime;using AcAp = Autodesk.AutoCAD.ApplicationServices;using AcDb = Autodesk.AutoCAD.DatabaseServices;#endregion[assembly: CommandClass(typeof(ExecMethod.Commands))]namespace ExecMethod{  public class Point3dComparer : IEqualityComparer<Point3d>   {    Tolerance m_tol;    public Point3dComparer(Tolerance tol)    {      m_tol = tol;    }    public bool Equals(Point3d a, Point3d b)        {      return a.IsEqualTo(b, m_tol);    }    public int GetHashCode(Point3d pnt)        {      return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());         }   }   public class Commands  {    [CommandMethod("doit")]    public void doit()    {      List<Point3d> pointsList = new List<Point3d>();      Document doc = AcAp.Application.DocumentManager.MdiActiveDocument;      Database db = AcDb.HostApplicationServices.WorkingDatabase;      Editor ed = doc.Editor;      AcDb.TransactionManager tm = doc.TransactionManager;      using (Transaction tr = tm.StartTransaction())      {        BlockTableRecord btr = tr.GetObject          (db.CurrentSpaceId, OpenMode.ForRead, false) as BlockTableRecord;        foreach (ObjectId id in btr)        {          DBPoint dbpoint = tr.GetObject(id, OpenMode.ForRead, false) as DBPoint;          if (dbpoint != null)            pointsList.Add(dbpoint.Position);        }        tr.Commit();        //        Tolerance tol = new Tolerance();        var hash = new HashSet<Point3d>(pointsList, new Point3dComparer(tol));        ed.WriteMessage("{0}-{1}", pointsList.Count, hash.Count);      }    }  }} `
« Last Edit: January 29, 2011, 12:49:07 PM by __assume »

#### gile

• Gator
• Posts: 2507
• Marseille, France
##### Re: Find duplicate points
« Reply #19 on: January 29, 2011, 01:23:18 PM »
Nice trick Dan

It seems to be a little faster than my F#code (but very closed to).
I didn't know the HashSet class, one more time, thank you to teatch me a new thing.

Here're the method and the class I used (each other method requires an returns a Point3dCollection).

Code: [Select]
`        private Point3dCollection DanRemDup(Point3dCollection pts, Tolerance tol)        {            Point3dCollection result = new Point3dCollection();            Point3d[] ptArr = new Point3d[pts.Count];            pts.CopyTo(ptArr, 0);            HashSet<Point3d> hash = new HashSet<Point3d>(ptArr, new Point3dComparer(tol));            foreach (Point3d p in hash)            {                result.Add(p);            }            return result;        }        public class Point3dComparer : IEqualityComparer<Point3d>        {            private Tolerance _tol;            public Point3dComparer()            {                _tol = Tolerance.Global;            }            public Point3dComparer(Tolerance tol)            {                _tol = tol;            }            public bool Equals(Point3d a, Point3d b)            {                return a.IsEqualTo(b, _tol);            }            public int GetHashCode(Point3d pnt)            {                return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());            }        }`
« Last Edit: January 29, 2011, 01:36:37 PM by gile »
Speaking English as a French Frog

#### kaefer

• Guest
##### Re: Find duplicate points
« Reply #20 on: January 29, 2011, 01:53:47 PM »
Nice trick Dan

Sorry, boys, it's still slower than Linq's Distinct<T>(IEqualityComparer) extension, at least around here...

Quote
Code: [Select]
`            public int GetHashCode(Point3d pnt)            {                return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());            }`

... and when there's a real Tolerance involved, other than 0.0 or Double.Epsilon. I have the nagging suspicion that this kind of HashCode calculation doesn't quite work in our scenario; it will gladly let false negatives pass, when the delta is actually below the threshold but the coordinates are sufficiently different.

BTW, in F# there's a helper to create an IEqualityComparer from lambdas, very fast it is not.

Code: [Select]
`   let myEqualityComparer =        HashIdentity.FromFunctions            (fun _ -> 0)            (fun (a: Point3d) b -> a.IsEqualTo(b, new Tolerance(eps, eps)))`
Cheers, Thorsten

#### kaefer

• Guest
##### Re: Find duplicate points
« Reply #21 on: January 29, 2011, 06:26:50 PM »
Here's my data for "100 random point.dwg" with Tolerance(1, 1) :

DistinctHashSet
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 5150.0162

DistinctEnumerable
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 3502.6673

Sure, setting the HashCode to 0 is cheating, as is the creation of a sequence where the Current property is never touched.

Code: [Select]
`   public class MyEqualityComparer : IEqualityComparer<Point3d>    {        Tolerance m_tol;        public MyEqualityComparer(Tolerance tol)        {            m_tol = tol;        }        public bool Equals(Point3d a, Point3d b)        {            return a.IsEqualTo(b, m_tol);        }        public int GetHashCode(Point3d pnt)        {            return 0;        }    }    public class Commands    {        [CommandMethod("DistinctEnumerable")]        public void DistinctEnumerable() { cmd("DistinctEnumerable"); }        [CommandMethod("DistinctHashSet")]        public void DistinctHashSet() { cmd("DistinctHashSet"); }        private void cmd(string what)        {            Editor ed = acApp.DocumentManager.MdiActiveDocument.Editor;            PromptDoubleResult pdr = ed.GetDouble("Tolerance: ");            if (pdr.Status != PromptStatus.OK) return;            SelectionFilter sf = new SelectionFilter(new TypedValue[] { new TypedValue((int)DxfCode.Start, "POINT") });            PromptSelectionResult psr = ed.GetSelection(sf);            if (psr.Status != PromptStatus.OK) return;            int numin, numout;            long ticks;            dupPts(what, pdr.Value, psr.Value, out numin, out numout, out ticks);            ed.WriteMessage(                "\n{0} " +                "\nFound {1} duplicate(s) in a {2} items collection. " +                "\nElapsed milliseconds : {3} ",                what,                numin - numout,                numin,                (double)ticks / (double)TimeSpan.TicksPerMillisecond);        }        private void dupPts(string what, double eps, SelectionSet sset, out int numin, out int numout, out long ticks)        {            Database db = acApp.DocumentManager.MdiActiveDocument.Database;            using (Transaction tr = db.TransactionManager.StartTransaction())            {                Point3dCollection pts = new Point3dCollection();                foreach (SelectedObject so in sset)                {                    DBPoint dbp = (DBPoint)tr.GetObject(so.ObjectId, OpenMode.ForRead);                    pts.Add(dbp.Position);                }                Tolerance tol = new Tolerance(eps, eps);                numin = pts.Count;                Stopwatch sw = Stopwatch.StartNew();                if (what == "DistinctEnumerable")                {                    IEnumerable<Point3d> enumerable = pts.Cast<Point3d>().Distinct<Point3d>(new MyEqualityComparer(tol));                    numout = enumerable.Count<Point3d>();                }                else if (what == "DistinctHashSet")                {                    HashSet<Point3d> hash = new HashSet<Point3d>(pts.Cast<Point3d>(), new MyEqualityComparer(tol));                    numout = hash.Count;                }                else numout = 0;                sw.Stop();                ticks = sw.Elapsed.Ticks;                tr.Commit();            }        }    }`

#### It's Alive!

• Retired
• Needs a day job
• Posts: 8690
• AKA Daniel
##### Re: Find duplicate points
« Reply #22 on: January 29, 2011, 10:40:06 PM »
Nice, although the times are about the same on my box

Code: [Select]
`DistinctEnumerableFound 20002 duplicate(s) in a 30000 items collection.Elapsed milliseconds : 7844.7255Command: DistinctHashSetTolerance: 1Select objects: all30000 foundSelect objects:DistinctHashSetFound 20002 duplicate(s) in a 30000 items collection.Elapsed milliseconds : 7775.5968`

#### Jeff H

• Needs a day job
• Posts: 6150
##### Re: Find duplicate points
« Reply #23 on: January 29, 2011, 11:00:38 PM »
My results are very close like yours Daniel

Quote
Command: DISTINCTENUMERABLE
Tolerance: 1
Select objects: all
10000 found
Select objects:
DistinctEnumerable
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2680.2491
Command: DISTINCTHASHSET
Tolerance: 1
Select objects: all
10000 found
Select objects:
DistinctHashSet
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2653.0955

#### dann.boy.001

• Guest
##### Re: Find duplicate points
« Reply #24 on: January 30, 2011, 08:03:53 AM »
Hello,

Gile's code (wich he posted in one of previous posts) seems very fast for me.
I am sending that code, litle modified.

Code: [Select]
`Imports Autodesk.AutoCAD.RuntimeImports Autodesk.AutoCAD.ApplicationServicesImports Autodesk.AutoCAD.DatabaseServicesImports Autodesk.AutoCAD.EditorInputImports Autodesk.AutoCAD.GeometryModule DuplicatePoints    <System.Runtime.CompilerServices.Extension()> _    Public Sub RemoveDuplicate(ByVal pts As Point3dCollection)        pts.RemoveDuplicate(Tolerance.[Global])    End Sub    <System.Runtime.CompilerServices.Extension()> _    Public Sub RemoveDuplicate(ByRef pts As Point3dCollection, ByVal tol As Tolerance)        Dim ptlst As New List(Of Point3d)()        Dim n As Integer = pts.Count        For i As Integer = 0 To n - 1            ptlst.Add(pts(i))        Next        ptlst.Sort(Function(p1, p2) p1.X.CompareTo(p2.X))        Dim m As Integer = ptlst.Count        For i As Integer = 0 To m - 2            Dim j As Integer = i + 1            While j < ptlst.Count                If (ptlst(j).X - ptlst(i).X) > tol.EqualPoint Then                    Exit While                End If                If ptlst(i).IsEqualTo(ptlst(j), tol) Then                    pts.Remove(ptlst(j))                    ptlst.RemoveAt(j)                Else                    j += 1                End If            End While        Next    End Sub    <System.Runtime.CompilerServices.Extension()> _    Public Sub RemoveDuplicate(ByRef dbpts As List(Of DBPoint), ByRef ListDuplicatePoints As List(Of ObjectId), ByVal tol As Tolerance)        ListDuplicatePoints = New List(Of ObjectId)        Dim ptlst As New List(Of DBPoint)()        Dim n As Integer = dbpts.Count        For i As Integer = 0 To n - 1            ptlst.Add(dbpts(i))        Next        ptlst.Sort(Function(p1, p2) p1.Position.X.CompareTo(p2.Position.X))        Dim m As Integer = ptlst.Count        For i As Integer = 0 To m - 2            Dim j As Integer = i + 1            While j < ptlst.Count                If (ptlst(j).Position.X - ptlst(i).Position.X) > tol.EqualPoint Then                    Exit While                End If                If ptlst(i).Position.IsEqualTo(ptlst(j).Position, tol) Then                    ListDuplicatePoints.Add(ptlst(j).ObjectId)                    ptlst.RemoveAt(j)                Else                    j += 1                End If            End While        Next    End SubEnd ModulePublic Class Test    <CommandMethod("DPOINTS")> _    Public Sub DPOINTS()        Dim doc As Document = Application.DocumentManager.MdiActiveDocument        Dim db As Database = doc.Database        Dim ed As Editor = doc.Editor        'Get points        Dim SelectedPoints As IEnumerable = SelectPoints()        If SelectedPoints Is Nothing Then Exit Sub        'Get tolerance        Dim pdr As PromptDoubleResult = ed.GetDouble("Tolerance: ")        If pdr.Status <> PromptStatus.OK Then Exit Sub        'Open transaction for reading points (or to later erase double points)        Dim tr As Transaction = db.TransactionManager.StartTransaction()        Using tr            DeleteDuplicatePoints3(SelectedPoints, tr, 0)            tr.Commit()        End Using    End Sub    Public Function SelectPoints() As IEnumerable        Dim doc As Document = Application.DocumentManager.MdiActiveDocument        Dim db As Database = doc.Database        Dim ed As Editor = doc.Editor        Dim acTypValAr(0) As TypedValue        acTypValAr.SetValue(New TypedValue(DxfCode.Start, "POINT"), 0)        Dim sf As New SelectionFilter(acTypValAr)        Dim pso As New PromptSelectionOptions()        pso.MessageForAdding = "Select points: "        pso.AllowDuplicates = False        Dim psr As PromptSelectionResult = ed.GetSelection(pso, sf)        If psr.Status <> PromptStatus.OK Then            Return Nothing        Else            Return psr.Value.GetObjectIds        End If    End Function    Public Sub DeleteDuplicatePoints3(ByVal o As IEnumerable, ByVal t As Transaction, ByVal Tol As Double)        'Get list of points        Dim ListPoints As New List(Of DBPoint)        For Each id As ObjectId In o            Dim dbpt As DBPoint = DirectCast(t.GetObject(id, OpenMode.ForRead), DBPoint)            'Add point to list            ListPoints.Add(dbpt)        Next        'Here add duplicate points (if exist)        Dim ListDuplicatePoints As New List(Of ObjectId)        Dim sw As Stopwatch = Stopwatch.StartNew()        'Get duplicate points        DuplicatePoints.RemoveDuplicate(ListPoints, ListDuplicatePoints, New Tolerance(Tol, Tol))        sw.Stop()        MsgBox("Found " + ListDuplicatePoints.Count.ToString + " duplicate(s) in a " + ListPoints.Count.ToString + " items collection. Elapsed milliseconds : " + (CDbl(sw.Elapsed.Ticks) / CDbl(TimeSpan.TicksPerMillisecond)).ToString)    End SubEnd Class`

Regards
Danijel
« Last Edit: January 30, 2011, 08:08:47 AM by dann.boy.001 »

#### gile

• Gator
• Posts: 2507
• Marseille, France
##### Re: Find duplicate points
« Reply #25 on: January 30, 2011, 09:06:25 AM »
Hi,

First, thanks again to all, I learned a lot with this topic.

Finally, as dann.boy.001 said the sorted list route seems to be really faster with large collections than the IEnumerable and HashSet ones (which are very closed).

Using Thorsen's test commands (mine returned false results...) in which I add 'SortList' one:
Code: [Select]
`                else if (what == "SortList")                {                    List<Point3d> lst = SortList(pts, tol);                    numout = lst.Count;                }`This command calls a method which returns a List<Point3d> object.
Code: [Select]
`        private List<Point3d> SortList(Point3dCollection pts, Tolerance tol)        {            List<Point3d> ptlst = new List<Point3d>();            for (int i = 0; i < pts.Count; i++)            {                ptlst.Add(pts[i]);            }            ptlst.Sort((p1, p2) => p1.X.CompareTo(p2.X));            for (int i = 0; i < ptlst.Count - 1; i++)            {                for (int j = i + 1; j < ptlst.Count; )                {                    if ((ptlst[j].X - ptlst[i].X) > tol.EqualPoint)                        break;                    if (ptlst[i].IsEqualTo(ptlst[j], tol))                    {                        ptlst.RemoveAt(j);                    }                    else                        j++;                }            }            return ptlst;        }`
Here're the results with 10000 points
Quote
DistinctEnumerable
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 3072.7459

DistinctHashSet
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2999.9208

SortedList
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 18.9593

With 20000 points
Quote
DistinctEnumerable
Found 10002 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 6106.0164

DistinctHashSet
Found 10002 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 5996.8205:

SortedList
Found 10002 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 317.208

With 40000 points
Quote
DistinctEnumerable
Found 20004 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 18422.6259

DistinctHashSet
Found 20004 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 18053.2102

SortedList
Found 20004 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 1260.4816
Speaking English as a French Frog

#### gile

• Gator
• Posts: 2507
• Marseille, France
##### Re: Find duplicate points
« Reply #26 on: January 30, 2011, 01:48:04 PM »
I ran the provious tests on AutoCAD 2007 32 bits.

The same tests with AutoCAD 2011 64 bits gives very different results:

10000 points
Quote
DistinctEnumerable
Found 0 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2.4629

DistinctHashSet
Found 0 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2.5144

SortList
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 7.4756

20000 points
Quote
DistinctEnumerable
Found 10000 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 6.8504

DistinctHashSet
Found 10000 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 6.0844

SortList
Found 10002 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 222.6415

40000 points
Quote
DistinctEnumerable
Found 20000 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 14.7062

DistinctHashSet
Found 20000 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 14.3403

SortList
Found 20004 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 895.1789
Speaking English as a French Frog

#### Jeff H

• Needs a day job
• Posts: 6150
##### Re: Find duplicate points
« Reply #27 on: January 30, 2011, 03:57:11 PM »
The same tests with AutoCAD 2011 64 bits gives very different results:

Did on 2011 64 bit and I am showing duplicates matching test on 2011 32 bit unlike your tests

#### LE3

• Guest
##### Re: Find duplicate points
« Reply #28 on: January 30, 2011, 04:44:23 PM »
And for reference a way in C++/ARX
Code: [Select]
`bool operator < (const AcGePoint3d& pt1, const AcGePoint3d& pt2) { if (pt1.x == pt2.x) { if (pt1.y == pt2.y) return (pt1.z < pt2.z); else return (pt1.y < pt2.y); } else return (pt1.x < pt2.x);}static void ArxCmdsTester_DuplicatePointsFinder(void){ std::vector<AcGePoint3d> pts; AcDbObjectId objId; ads_name ename; ads_name ss; resbuf* rbFilter; rbFilter = acutBuildList(RTDXF0, _T("POINT"), RTNONE); TCHAR* intersPrompts[] = {_T("Select points: "), _T("Remove points: ")}; if (acedSSGet(_T(":\$"), intersPrompts, NULL, rbFilter, ss) != RTNORM) { acutRelRb(rbFilter); return; } acutRelRb(rbFilter); long len = 0; if ((acedSSLength(ss, &len) != RTNORM) || (len == 0)) { acedSSFree(ss); return; } for (int i = 0; i < len; i++) { if (acedSSName(ss, i, ename) == RTNORM) { if (acdbGetObjectId(objId, ename) == Acad::eOk) { AcDbObjectPointer<AcDbPoint> pPoint(objId, AcDb::kForRead); if (pPoint.openStatus() == Acad::eOk) { pts.push_back(pPoint->position()); } } } } acedSSFree(ss); acutPrintf(_T("\nPoints before unique: %ld "), pts.size()); std::sort(pts.begin(), pts.end()); pts.erase(std::unique(pts.begin(), pts.end()), pts.end()); acutPrintf(_T("\nPoints after unique: %ld "), pts.size()); pts.clear();}`

#### pkohut

• Bull Frog
• Posts: 483
##### Re: Find duplicate points
« Reply #29 on: January 31, 2011, 03:34:23 AM »
Nice trick Dan

Quote
Code: [Select]
`            public int GetHashCode(Point3d pnt)            {                return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());            }`

I have the nagging suspicion that this kind of HashCode calculation doesn't quite work in our scenario; it will gladly let false negatives pass, when the delta is actually below the threshold but the coordinates are sufficiently different.

Also, returning a 32 bit int hash from a 64 bit double isn't going to guarantee unique values, let alone 192 bits worth of doubles. Worse yet XOR'ing the hash values reduces the uniqueness even further.
For example: 1 ^ 2 ^ 3 = 0

ps, also from the .Net docs
Quote
The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.
New tread (not retired) - public repo at https://github.com/pkohut