Author Topic: Find duplicate points  (Read 14138 times)

0 Members and 1 Guest are viewing this topic.

dann.boy.001

  • Guest
Find duplicate points
« on: January 25, 2011, 03:47:24 AM »
Hello,

Does anybody can help me, with some sugestion, how on quickest
way to remove duplicate points from list or point3dcollection...

Best regards,
Danijel
 

Kerry

  • Mesozoic relic
  • Seagull
  • Posts: 11654
  • class keyThumper<T>:ILazy<T>
Re: Find duplicate points
« Reply #1 on: January 25, 2011, 04:58:01 AM »


Danijel

My first thought :
declare a new Collection
  Iterate the original collection
    Test if each Point3D exists in the new collection using Autodesk.AutoCAD.Geometry.Point3dCollection.Contains(Point3d value)
    If not there , add the Point3D to the new Collection.


Depending on the tolerance you require for matching you may need to write your own equalsTesting method.

There will be a way to do it with LINQ too, I'm sure. :)
kdub, kdub_nz in other timelines.
Perfection is not optional.
Everything will work just as you expect it to, unless your expectations are incorrect.
Discipline: None at all.

dann.boy.001

  • Guest
Re: Find duplicate points
« Reply #2 on: January 25, 2011, 06:07:27 AM »


Danijel

My first thought :
declare a new Collection
  Iterate the original collection
    Test if each Point3D exists in the new collection using Autodesk.AutoCAD.Geometry.Point3dCollection.Contains(Point3d value)
    If not there , add the Point3D to the new Collection.


Depending on the tolerance you require for matching you may need to write your own equalsTesting method.

There will be a way to do it with LINQ too, I'm sure. :)

Thank you Kerry!

Your first sugestion is very good, for now it is enough.
It is very fast method, but unfortunately there is not posibility to specify tolerances.

When find more time, I will try to add tolerances...

Best regards,
Danijel


kaefer

  • Guest
Re: Find duplicate points
« Reply #3 on: January 25, 2011, 09:21:47 AM »
There will be a way to do it with LINQ too, I'm sure. :)
It is very fast method, but unfortunately there is not posibility to specify tolerances.

When find more time, I will try to add tolerances...

Hi Danijel,

as Kerry said, of course there is, and with a tolerance too. Take a look at this page: http://msdn.microsoft.com/en-us/library/bb338049.aspx.

Proof of concept in F# follows; alas, how to get a hashcode for this I couldn't grok yet.
Code: [Select]
type MyEqualityComparer(eps) =
    interface System.Collections.Generic.IEqualityComparer<Point3d> with
        member me.Equals(a: Point3d, b: Point3d) =
            a.DistanceTo b < eps
        member me.GetHashCode(pt: Point3d) =
            0
    
let dupPts eps sset =
    let db = acApp.DocumentManager.MdiActiveDocument.Database
    use tr = db.TransactionManager.StartTransaction()

    let pts =
        new Point3dCollection[|
            for so in Seq.cast<SelectedObject> sset do
                let dbp = tr.GetObject(so.ObjectId, OpenMode.ForRead) :?> DBPoint
                yield dbp.Position |]

    let ptsDistinct =
        pts.OfType<Point3d>().Distinct<Point3d>(new MyEqualityComparer(eps))
    
    let res = pts.Count - Seq.length ptsDistinct
    tr.Commit()
    res

Function signature is epsilon: float -> sset: SelectionSet -> int; that means it returns the difference in length between the Point3dCollection and the IEnumerable which hold the distinct values.

Cheers

dgorsman

  • Water Moccasin
  • Posts: 2437
Re: Find duplicate points
« Reply #4 on: January 25, 2011, 10:36:07 AM »
Sort the list by at least one coordinate first (all coordinates for extremely large sets), so as you compare against each point if the test point coordinate is less than (or greater, depending on the sort order) the main point coordinate you can stop searching since all the other points will never match.
If you are going to fly by the seat of your pants, expect friction burns.

try {GreatPower;}
   catch (notResponsible)
      {NextTime(PlanAhead);}
   finally
      {MasterBasics;}

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Find duplicate points
« Reply #5 on: January 25, 2011, 04:05:10 PM »
Hi,

You can try these extension methods


EDIT: corrected code, see below.

EDIT 2: little optimisation

Code: [Select]
        public static void RemoveDuplicate(this Point3dCollection pts)
        {
            pts.RemoveDuplicate(Tolerance.Global);
        }

        public static void RemoveDuplicate(this Point3dCollection pts, Tolerance tol)
        {
            List<Point3d> ptlst = new List<Point3d>();
            for (int i = 0; i < pts.Count; i++)
            {
                ptlst.Add(pts[i]);
            }
            ptlst.Sort((p1, p2) => p1.X.CompareTo(p2.X));
            for (int i = 0; i < ptlst.Count - 1; i++)
            {
                for (int j = i + 1; j < ptlst.Count;)
                {
                    if ((ptlst[j].X - ptlst[i].X) > tol.EqualPoint)
                        break;
                    if (ptlst[i].IsEqualTo(ptlst[j], tol))
                    {
                        pts.Remove(ptlst[j]);
                        ptlst.RemoveAt(j);
                    }
                    else
                        j++;
                }
            }
        }
« Last Edit: January 26, 2011, 03:04:49 AM by gile »
Speaking English as a French Frog

kaefer

  • Guest
Re: Find duplicate points
« Reply #6 on: January 25, 2011, 05:43:06 PM »
Code: [Select]
            ptlst.Sort((p1, p2) => p1.X.CompareTo(p2.X));
            for (int i = 0; i < ptlst.Count - 1; i++)
            {
                if (ptlst[i].IsEqualTo(ptlst[i + 1], tol))
                    pts.Remove(ptlst[i]);
            }
        }

pt1.IsEqualTo(pt2, new Tolerance(1.0, 1.0)) is equal to (pt1 - pt2).Length <= 1.0. That's good, it's circular as well.

Is it just me or can this fail when encountering more than 2 points that have the same X but different Y or Z?
E.g. ((0,0,0)(0,3,0)(0,1,0)) with Tolerance.EqualPoint = 1 may not flag (0,0,0) and (0,1,0) as equal.


gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Find duplicate points
« Reply #7 on: January 25, 2011, 05:49:11 PM »
Quote
Is it just me or can this fail when encountering more than 2 points that have the same X but different Y or Z?
E.g. ((0,0,0)(0,3,0)(0,1,0)) with Tolerance.EqualPoint = 1 may not flag (0,0,0) and (0,1,0) as equal.

You're right thorsen, I was just thinking about this and correct the code in my previous post.
Speaking English as a French Frog

Jeff H

  • Needs a day job
  • Posts: 6150
Re: Find duplicate points
« Reply #8 on: January 25, 2011, 06:36:50 PM »
Since Point3d is a structure could you not just use the Equals method.

Have not tested but equals is overridden and sealed for point3d

mohnston

  • Bull Frog
  • Posts: 305
  • CAD Programmer
Re: Find duplicate points
« Reply #9 on: January 25, 2011, 07:02:10 PM »
Since you are comparing doubles you *must* convert the value before comparing.
Even if you leave the data type as double you need to round or truncate or ?.

I stole recycled this from StackOverflow
Code: [Select]
internal static double DecimalDouble(double d, int NumberOfPlaces)
{
     double scale = Math.Pow(10, Math.Floor(Math.Log10(Math.Abs(d))) + 1);
     return scale * Math.Round(d / scale, NumberOfPlaces);
}


It's amazing what you can do when you don't know what you can't do.
CAD Programming Solutions

Jeff H

  • Needs a day job
  • Posts: 6150
Re: Find duplicate points
« Reply #10 on: January 25, 2011, 07:41:31 PM »
Since Pont3d is a structure Equals seems to work and should check for equality of fields rather than memory location.

This prints duplicate when pnt1 member function Equals passing in pnt2 is called.
Code: [Select]


[CommandMethod("TestDupslicate")]
        public void TestDupslicate()
        {
            Document doc = Application.DocumentManager.MdiActiveDocument;
            Database db = doc.Database;
            Editor ed = doc.Editor;

            Point3dCollection pntcoll = new Point3dCollection();

            Random rdn = new Random();
            for (int i = 0; i <= 10; i++)
            {
                Point3d pnt = new Point3d(rdn.Next(100) + .000001, rdn.Next(100) + .000000001, rdn.Next(100) + .000001);
                pntcoll.Add(pnt);
            }

            Point3d pnt1 = new Point3d(4.000001, 7000000001, 9.000000000000001);
            pntcoll.Add(pnt1);


            Point3d pnt2 = new Point3d(4.000001, 7000000001, 9.000000000000001);

            foreach (Point3d pnt in pntcoll)
            {
                if (pnt.Equals(pnt2))
                {
                   ed.WriteMessage("\nDuplicate");
                }
                else
                {
                    ed.WriteMessage("\nNon-Duplicate");
                }

            }
           
        }
« Last Edit: January 25, 2011, 07:48:07 PM by Jeff H »

Jeff H

  • Needs a day job
  • Posts: 6150
Re: Find duplicate points
« Reply #11 on: January 25, 2011, 08:46:48 PM »
That was not a good example .

I have not narrowed down the exact tolerence but

pnt1, pnt2 will print duplicates and pnt3 will print non-duplicate

Code: [Select]
[CommandMethod("TestDupslicate")]
        public void TestDupslicate()
        {
            Document doc = Application.DocumentManager.MdiActiveDocument;
            Database db = doc.Database;
            Editor ed = doc.Editor;

            Point3dCollection pntcoll = new Point3dCollection();


            Point3d pnt1 = new Point3d(4.000001, 7.000000001, 9.000000000000001);
            pntcoll.Add(pnt1);


            Point3d pnt2 = new Point3d(4.000001, 7.000000001, 9.00000000000001);
            pntcoll.Add(pnt2);

            Point3d pnt3 = new Point3d(4.000001, 7.000000001, 9.000000001);
            pntcoll.Add(pnt3);

            Point3d pnt4 = new Point3d(4.000001, 7.000000001, 9.0000000001);

            foreach (Point3d pnt in pntcoll)
            {
                if (pnt.Equals(pnt4))
                {
                    ed.WriteMessage("\nDuplicate");
                }
                else
                {
                    ed.WriteMessage("\nNon-Duplicate");
                }

            }
           
     
   
           
        }

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Find duplicate points
« Reply #12 on: January 26, 2011, 03:45:14 AM »
Hi,

I tryed to implement Kerry's algo defining an override extension method for Point3dCollection.Contains() to be used with a Tolerance.

From the tests I made with 100 random point.dwg (from here) it seems to be slower than the solution I posted Reply #5.

Code: [Select]
        public static void RemoveDuplicate2(this Point3dCollection pts, Tolerance tol)
        {
            Point3dCollection tmp = new Point3dCollection();
            for (int i = 0; i < pts.Count; )
            {
                if (tmp.Contains(pts[i], tol))
                    pts.RemoveAt(i);
                else
                {
                    tmp.Add(pts[i]);
                    i++;
                }
            }
        }

        public static bool Contains(this Point3dCollection pts, Point3d pt, Tolerance tol)
        {
            for (int i = 0; i < pts.Count; i++)
            {
                if (pt.IsEqualTo(pts[i], tol))
                    return true;
            }
            return false;
        }
Speaking English as a French Frog

Jeff H

  • Needs a day job
  • Posts: 6150
Re: Find duplicate points
« Reply #13 on: January 26, 2011, 04:09:37 AM »
Sorry dann.boy.001 ,

gile uses Point3d.IsEqualTo() which takes a tolerance as a parameter and does not create the uncertainty and extra work that my previous post does.

kaefer

  • Guest
Re: Find duplicate points
« Reply #14 on: January 26, 2011, 06:50:16 AM »
Code: [Select]
internal static double DecimalDouble(double d, int NumberOfPlaces)
{
     double scale = Math.Pow(10, Math.Floor(Math.Log10(Math.Abs(d))) + 1);
     return scale * Math.Round(d / scale, NumberOfPlaces);
}

I thought I'd give it a try, but then I realized that log10(0.) gives an unholy value.

dann.boy.001

  • Guest
Re: Find duplicate points
« Reply #15 on: January 26, 2011, 07:05:15 AM »

Thank you very much guys  :wink:!!!

Tonight I will test it... Seems problem solved  :-)

Best regards
Danijel

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Find duplicate points
« Reply #16 on: January 29, 2011, 10:44:25 AM »
Hi,

Here's a F# solution which seems to be very much faster on big collections than the C# solutions i gave before.

Code: [Select]
type Point3dComparer(pt : Point3d, tol) =
    member this.Pt = pt
    member this.Equals(tPt : Point3dComparer) = this.Pt.IsEqualTo(tPt.Pt, tol)
       
let removeDuplicate pts tol =
    new Point3dCollection(
        pts
        |> Seq.cast
        |> Seq.distinctBy (fun p -> new Point3dComparer(p, tol))
        |> Seq.toArray)
Speaking English as a French Frog

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Find duplicate points
« Reply #17 on: January 29, 2011, 11:11:14 AM »
I ran some little benchs using 100 random points.dwg

Quote
RemoveDuplicate(sorted List)
Removed 2 duplicate in a 10000 items collection
Ellapsed milliseconds : 9.0005

RemoveDuplicate2 (override Contains)
Removed 2 duplicate in a 10000 items collection
Ellapsed milliseconds : 9766.5586

removeDuplicate (F#)
Removed 2 duplicate in a 10000 items collection
Ellapsed milliseconds : 7.0004

Quote
RemoveDuplicate(sorted List)
Removed 383 duplicate in a 10381 items collection
Ellapsed milliseconds : 231.0133

RemoveDuplicate2 (override Contains)
Removed 383 duplicate in a 10381 items collection
Ellapsed milliseconds : 9677.5535

removeDuplicate (F#)
Removed 383 duplicate in a 10381 items collection
Ellapsed milliseconds : 6.0004

Quote
RemoveDuplicate(sorted List)
Removed 4378 duplicate in a 14376 items collection
Ellapsed milliseconds : 2150.123

RemoveDuplicate2 (override Contains)
Removed 4378 duplicate in a 14376 items collection
Ellapsed milliseconds : 9665.5528

removeDuplicate (F#)
Removed 4378 duplicate in a 14376 items collection
Ellapsed milliseconds : 7.0004
Speaking English as a French Frog

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8691
  • AKA Daniel
Re: Find duplicate points
« Reply #18 on: January 29, 2011, 12:29:41 PM »
how does this compare?

Code: [Select]
using System.Collections.Generic;
using Autodesk.AutoCAD.ApplicationServices;
using Autodesk.AutoCAD.DatabaseServices;
using Autodesk.AutoCAD.EditorInput;
using Autodesk.AutoCAD.Geometry;
using Autodesk.AutoCAD.Runtime;
using AcAp = Autodesk.AutoCAD.ApplicationServices;
using AcDb = Autodesk.AutoCAD.DatabaseServices;


#endregion

[assembly: CommandClass(typeof(ExecMethod.Commands))]
namespace ExecMethod
{
  public class Point3dComparer : IEqualityComparer<Point3d>
  {
    Tolerance m_tol;

    public Point3dComparer(Tolerance tol)
    {
      m_tol = tol;
    }
    public bool Equals(Point3d a, Point3d b)   
    {
      return a.IsEqualTo(b, m_tol);
    }
    public int GetHashCode(Point3d pnt)   
    {
      return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());     
    }
  }

  public class Commands
  {
    [CommandMethod("doit")]
    public void doit()
    {
      List<Point3d> pointsList = new List<Point3d>();
      Document doc = AcAp.Application.DocumentManager.MdiActiveDocument;
      Database db = AcDb.HostApplicationServices.WorkingDatabase;
      Editor ed = doc.Editor;
      AcDb.TransactionManager tm = doc.TransactionManager;
      using (Transaction tr = tm.StartTransaction())
      {
        BlockTableRecord btr = tr.GetObject
          (db.CurrentSpaceId, OpenMode.ForRead, false) as BlockTableRecord;
        foreach (ObjectId id in btr)
        {
          DBPoint dbpoint = tr.GetObject(id, OpenMode.ForRead, false) as DBPoint;
          if (dbpoint != null)
            pointsList.Add(dbpoint.Position);
        }
        tr.Commit();
        //
        Tolerance tol = new Tolerance();
        var hash = new HashSet<Point3d>(pointsList, new Point3dComparer(tol));
        ed.WriteMessage("{0}-{1}", pointsList.Count, hash.Count);
      }
    }
  }
}
 
« Last Edit: January 29, 2011, 12:49:07 PM by __assume »

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Find duplicate points
« Reply #19 on: January 29, 2011, 01:23:18 PM »
Nice trick Dan  8-)

It seems to be a little faster than my F#code (but very closed to).
I didn't know the HashSet class, one more time, thank you to teatch me a new thing.

Here're the method and the class I used (each other method requires an returns a Point3dCollection).

Code: [Select]
        private Point3dCollection DanRemDup(Point3dCollection pts, Tolerance tol)
        {
            Point3dCollection result = new Point3dCollection();
            Point3d[] ptArr = new Point3d[pts.Count];
            pts.CopyTo(ptArr, 0);
            HashSet<Point3d> hash = new HashSet<Point3d>(ptArr, new Point3dComparer(tol));
            foreach (Point3d p in hash)
            {
                result.Add(p);
            }
            return result;
        }

        public class Point3dComparer : IEqualityComparer<Point3d>
        {
            private Tolerance _tol;

            public Point3dComparer()
            {
                _tol = Tolerance.Global;
            }

            public Point3dComparer(Tolerance tol)
            {
                _tol = tol;
            }

            public bool Equals(Point3d a, Point3d b)
            {
                return a.IsEqualTo(b, _tol);
            }
            public int GetHashCode(Point3d pnt)
            {
                return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());
            }
        }
« Last Edit: January 29, 2011, 01:36:37 PM by gile »
Speaking English as a French Frog

kaefer

  • Guest
Re: Find duplicate points
« Reply #20 on: January 29, 2011, 01:53:47 PM »
Nice trick Dan

Sorry, boys, it's still slower than Linq's Distinct<T>(IEqualityComparer) extension, at least around here...

Quote
Code: [Select]
            public int GetHashCode(Point3d pnt)
            {
                return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());
            }

... and when there's a real Tolerance involved, other than 0.0 or Double.Epsilon. I have the nagging suspicion that this kind of HashCode calculation doesn't quite work in our scenario; it will gladly let false negatives pass, when the delta is actually below the threshold but the coordinates are sufficiently different.

BTW, in F# there's a helper to create an IEqualityComparer from lambdas, very fast it is not.

Code: [Select]
   let myEqualityComparer =
        HashIdentity.FromFunctions
            (fun _ -> 0)
            (fun (a: Point3d) b -> a.IsEqualTo(b, new Tolerance(eps, eps)))

Cheers, Thorsten

kaefer

  • Guest
Re: Find duplicate points
« Reply #21 on: January 29, 2011, 06:26:50 PM »
Here's my data for "100 random point.dwg" with Tolerance(1, 1) :

DistinctHashSet
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 5150.0162

DistinctEnumerable
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 3502.6673


Sure, setting the HashCode to 0 is cheating, as is the creation of a sequence where the Current property is never touched.

Code: [Select]
   public class MyEqualityComparer : IEqualityComparer<Point3d>
    {
        Tolerance m_tol;
        public MyEqualityComparer(Tolerance tol)
        {
            m_tol = tol;
        }
        public bool Equals(Point3d a, Point3d b)
        {
            return a.IsEqualTo(b, m_tol);
        }
        public int GetHashCode(Point3d pnt)
        {
            return 0;
        }
    }
    public class Commands
    {
        [CommandMethod("DistinctEnumerable")]
        public void DistinctEnumerable() { cmd("DistinctEnumerable"); }

        [CommandMethod("DistinctHashSet")]
        public void DistinctHashSet() { cmd("DistinctHashSet"); }

        private void cmd(string what)
        {
            Editor ed = acApp.DocumentManager.MdiActiveDocument.Editor;
            PromptDoubleResult pdr = ed.GetDouble("Tolerance: ");
            if (pdr.Status != PromptStatus.OK) return;
            SelectionFilter sf = new SelectionFilter(new TypedValue[] { new TypedValue((int)DxfCode.Start, "POINT") });
            PromptSelectionResult psr = ed.GetSelection(sf);
            if (psr.Status != PromptStatus.OK) return;
            int numin, numout;
            long ticks;
            dupPts(what, pdr.Value, psr.Value, out numin, out numout, out ticks);
            ed.WriteMessage(
                "\n{0} " +
                "\nFound {1} duplicate(s) in a {2} items collection. " +
                "\nElapsed milliseconds : {3} ",
                what,
                numin - numout,
                numin,
                (double)ticks / (double)TimeSpan.TicksPerMillisecond);
        }
        private void dupPts(string what, double eps, SelectionSet sset, out int numin, out int numout, out long ticks)
        {
            Database db = acApp.DocumentManager.MdiActiveDocument.Database;
            using (Transaction tr = db.TransactionManager.StartTransaction())
            {
                Point3dCollection pts = new Point3dCollection();
                foreach (SelectedObject so in sset)
                {
                    DBPoint dbp = (DBPoint)tr.GetObject(so.ObjectId, OpenMode.ForRead);
                    pts.Add(dbp.Position);
                }
                Tolerance tol = new Tolerance(eps, eps);
                numin = pts.Count;
                Stopwatch sw = Stopwatch.StartNew();
                if (what == "DistinctEnumerable")
                {
                    IEnumerable<Point3d> enumerable = pts.Cast<Point3d>().Distinct<Point3d>(new MyEqualityComparer(tol));
                    numout = enumerable.Count<Point3d>();
                }
                else if (what == "DistinctHashSet")
                {
                    HashSet<Point3d> hash = new HashSet<Point3d>(pts.Cast<Point3d>(), new MyEqualityComparer(tol));
                    numout = hash.Count;
                }
                else numout = 0;
                sw.Stop();
                ticks = sw.Elapsed.Ticks;
                tr.Commit();
            }
        }
    }

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8691
  • AKA Daniel
Re: Find duplicate points
« Reply #22 on: January 29, 2011, 10:40:06 PM »
Nice, although the times are about the same on my box

Code: [Select]
DistinctEnumerable
Found 20002 duplicate(s) in a 30000 items collection.
Elapsed milliseconds : 7844.7255
Command: DistinctHashSet
Tolerance: 1
Select objects: all
30000 found
Select objects:
DistinctHashSet
Found 20002 duplicate(s) in a 30000 items collection.
Elapsed milliseconds : 7775.5968

Jeff H

  • Needs a day job
  • Posts: 6150
Re: Find duplicate points
« Reply #23 on: January 29, 2011, 11:00:38 PM »
My results are very close like yours Daniel

Quote
Command: DISTINCTENUMERABLE
Tolerance: 1
Select objects: all
10000 found
Select objects:
DistinctEnumerable
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2680.2491
Command: DISTINCTHASHSET
Tolerance: 1
Select objects: all
10000 found
Select objects:
DistinctHashSet
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2653.0955

dann.boy.001

  • Guest
Re: Find duplicate points
« Reply #24 on: January 30, 2011, 08:03:53 AM »
Hello,

Gile's code (wich he posted in one of previous posts) seems very fast for me.
I am sending that code, litle modified.

Code: [Select]

Imports Autodesk.AutoCAD.Runtime
Imports Autodesk.AutoCAD.ApplicationServices
Imports Autodesk.AutoCAD.DatabaseServices
Imports Autodesk.AutoCAD.EditorInput
Imports Autodesk.AutoCAD.Geometry

Module DuplicatePoints

    <System.Runtime.CompilerServices.Extension()> _
    Public Sub RemoveDuplicate(ByVal pts As Point3dCollection)
        pts.RemoveDuplicate(Tolerance.[Global])
    End Sub

    <System.Runtime.CompilerServices.Extension()> _
    Public Sub RemoveDuplicate(ByRef pts As Point3dCollection, ByVal tol As Tolerance)
        Dim ptlst As New List(Of Point3d)()
        Dim n As Integer = pts.Count
        For i As Integer = 0 To n - 1
            ptlst.Add(pts(i))
        Next
        ptlst.Sort(Function(p1, p2) p1.X.CompareTo(p2.X))
        Dim m As Integer = ptlst.Count
        For i As Integer = 0 To m - 2
            Dim j As Integer = i + 1
            While j < ptlst.Count
                If (ptlst(j).X - ptlst(i).X) > tol.EqualPoint Then
                    Exit While
                End If
                If ptlst(i).IsEqualTo(ptlst(j), tol) Then
                    pts.Remove(ptlst(j))
                    ptlst.RemoveAt(j)
                Else
                    j += 1
                End If
            End While
        Next
    End Sub

    <System.Runtime.CompilerServices.Extension()> _
    Public Sub RemoveDuplicate(ByRef dbpts As List(Of DBPoint), ByRef ListDuplicatePoints As List(Of ObjectId), ByVal tol As Tolerance)
        ListDuplicatePoints = New List(Of ObjectId)
        Dim ptlst As New List(Of DBPoint)()
        Dim n As Integer = dbpts.Count
        For i As Integer = 0 To n - 1
            ptlst.Add(dbpts(i))
        Next
        ptlst.Sort(Function(p1, p2) p1.Position.X.CompareTo(p2.Position.X))
        Dim m As Integer = ptlst.Count
        For i As Integer = 0 To m - 2
            Dim j As Integer = i + 1
            While j < ptlst.Count
                If (ptlst(j).Position.X - ptlst(i).Position.X) > tol.EqualPoint Then
                    Exit While
                End If
                If ptlst(i).Position.IsEqualTo(ptlst(j).Position, tol) Then
                    ListDuplicatePoints.Add(ptlst(j).ObjectId)
                    ptlst.RemoveAt(j)
                Else
                    j += 1
                End If
            End While
        Next
    End Sub

End Module

Public Class Test

    <CommandMethod("DPOINTS")> _
    Public Sub DPOINTS()

        Dim doc As Document = Application.DocumentManager.MdiActiveDocument
        Dim db As Database = doc.Database
        Dim ed As Editor = doc.Editor

        'Get points
        Dim SelectedPoints As IEnumerable = SelectPoints()
        If SelectedPoints Is Nothing Then Exit Sub

        'Get tolerance
        Dim pdr As PromptDoubleResult = ed.GetDouble("Tolerance: ")
        If pdr.Status <> PromptStatus.OK Then Exit Sub

        'Open transaction for reading points (or to later erase double points)
        Dim tr As Transaction = db.TransactionManager.StartTransaction()
        Using tr

            DeleteDuplicatePoints3(SelectedPoints, tr, 0)

            tr.Commit()
        End Using

    End Sub

    Public Function SelectPoints() As IEnumerable

        Dim doc As Document = Application.DocumentManager.MdiActiveDocument
        Dim db As Database = doc.Database
        Dim ed As Editor = doc.Editor

        Dim acTypValAr(0) As TypedValue
        acTypValAr.SetValue(New TypedValue(DxfCode.Start, "POINT"), 0)

        Dim sf As New SelectionFilter(acTypValAr)
        Dim pso As New PromptSelectionOptions()

        pso.MessageForAdding = "Select points: "
        pso.AllowDuplicates = False

        Dim psr As PromptSelectionResult = ed.GetSelection(pso, sf)

        If psr.Status <> PromptStatus.OK Then
            Return Nothing
        Else
            Return psr.Value.GetObjectIds
        End If

    End Function

    Public Sub DeleteDuplicatePoints3(ByVal o As IEnumerable, ByVal t As Transaction, ByVal Tol As Double)

        'Get list of points
        Dim ListPoints As New List(Of DBPoint)

        For Each id As ObjectId In o

            Dim dbpt As DBPoint = DirectCast(t.GetObject(id, OpenMode.ForRead), DBPoint)

            'Add point to list
            ListPoints.Add(dbpt)

        Next

        'Here add duplicate points (if exist)
        Dim ListDuplicatePoints As New List(Of ObjectId)

        Dim sw As Stopwatch = Stopwatch.StartNew()

        'Get duplicate points
        DuplicatePoints.RemoveDuplicate(ListPoints, ListDuplicatePoints, New Tolerance(Tol, Tol))

        sw.Stop()

        MsgBox("Found " + ListDuplicatePoints.Count.ToString + " duplicate(s) in a " + ListPoints.Count.ToString + " items collection. Elapsed milliseconds : " + (CDbl(sw.Elapsed.Ticks) / CDbl(TimeSpan.TicksPerMillisecond)).ToString)

    End Sub

End Class



Regards
Danijel
« Last Edit: January 30, 2011, 08:08:47 AM by dann.boy.001 »

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Find duplicate points
« Reply #25 on: January 30, 2011, 09:06:25 AM »
Hi,

First, thanks again to all, I learned a lot with this topic.

Finally, as dann.boy.001 said the sorted list route seems to be really faster with large collections than the IEnumerable and HashSet ones (which are very closed).

Using Thorsen's test commands (mine returned false results...) in which I add 'SortList' one:
Code: [Select]
                else if (what == "SortList")
                {
                    List<Point3d> lst = SortList(pts, tol);
                    numout = lst.Count;
                }
This command calls a method which returns a List<Point3d> object.
Code: [Select]
        private List<Point3d> SortList(Point3dCollection pts, Tolerance tol)
        {
            List<Point3d> ptlst = new List<Point3d>();
            for (int i = 0; i < pts.Count; i++)
            {
                ptlst.Add(pts[i]);
            }
            ptlst.Sort((p1, p2) => p1.X.CompareTo(p2.X));
            for (int i = 0; i < ptlst.Count - 1; i++)
            {
                for (int j = i + 1; j < ptlst.Count; )
                {
                    if ((ptlst[j].X - ptlst[i].X) > tol.EqualPoint)
                        break;
                    if (ptlst[i].IsEqualTo(ptlst[j], tol))
                    {
                        ptlst.RemoveAt(j);
                    }
                    else
                        j++;
                }
            }
            return ptlst;
        }

Here're the results with 10000 points
Quote
DistinctEnumerable
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 3072.7459

DistinctHashSet
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2999.9208

SortedList
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 18.9593

With 20000 points
Quote
DistinctEnumerable
Found 10002 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 6106.0164

DistinctHashSet
Found 10002 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 5996.8205:

SortedList
Found 10002 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 317.208

With 40000 points
Quote
DistinctEnumerable
Found 20004 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 18422.6259

DistinctHashSet
Found 20004 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 18053.2102

SortedList
Found 20004 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 1260.4816
Speaking English as a French Frog

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Find duplicate points
« Reply #26 on: January 30, 2011, 01:48:04 PM »
I ran the provious tests on AutoCAD 2007 32 bits.

The same tests with AutoCAD 2011 64 bits gives very different results:

10000 points
Quote
DistinctEnumerable
Found 0 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2.4629

DistinctHashSet
Found 0 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 2.5144

SortList
Found 2 duplicate(s) in a 10000 items collection.
Elapsed milliseconds : 7.4756

20000 points
Quote
DistinctEnumerable
Found 10000 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 6.8504

DistinctHashSet
Found 10000 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 6.0844

SortList
Found 10002 duplicate(s) in a 20000 items collection.
Elapsed milliseconds : 222.6415

40000 points
Quote
DistinctEnumerable
Found 20000 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 14.7062

DistinctHashSet
Found 20000 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 14.3403

SortList
Found 20004 duplicate(s) in a 40000 items collection.
Elapsed milliseconds : 895.1789
Speaking English as a French Frog

Jeff H

  • Needs a day job
  • Posts: 6150
Re: Find duplicate points
« Reply #27 on: January 30, 2011, 03:57:11 PM »
The same tests with AutoCAD 2011 64 bits gives very different results:

Did on 2011 64 bit and I am showing duplicates matching test on 2011 32 bit unlike your tests

LE3

  • Guest
Re: Find duplicate points
« Reply #28 on: January 30, 2011, 04:44:23 PM »
And for reference a way in C++/ARX
Code: [Select]
bool operator < (const AcGePoint3d& pt1, const AcGePoint3d& pt2)
{
if (pt1.x == pt2.x)
{
if (pt1.y == pt2.y)
return (pt1.z < pt2.z);
else
return (pt1.y < pt2.y);
}
else
return (pt1.x < pt2.x);
}
static void ArxCmdsTester_DuplicatePointsFinder(void)
{
std::vector<AcGePoint3d> pts;
AcDbObjectId objId;
ads_name ename;
ads_name ss;
resbuf* rbFilter;
rbFilter = acutBuildList(RTDXF0, _T("POINT"), RTNONE);
TCHAR* intersPrompts[] = {_T("Select points: "), _T("Remove points: ")};
if (acedSSGet(_T(":$"), intersPrompts, NULL, rbFilter, ss) != RTNORM)
{
acutRelRb(rbFilter);
return;
}
acutRelRb(rbFilter);
long len = 0;
if ((acedSSLength(ss, &len) != RTNORM) || (len == 0))
{
acedSSFree(ss);
return;
}
for (int i = 0; i < len; i++)
{
if (acedSSName(ss, i, ename) == RTNORM)
{
if (acdbGetObjectId(objId, ename) == Acad::eOk)
{
AcDbObjectPointer<AcDbPoint> pPoint(objId, AcDb::kForRead);
if (pPoint.openStatus() == Acad::eOk)
{
pts.push_back(pPoint->position());
}
}
}
}
acedSSFree(ss);
acutPrintf(_T("\nPoints before unique: %ld "), pts.size());
std::sort(pts.begin(), pts.end());
pts.erase(std::unique(pts.begin(), pts.end()), pts.end());
acutPrintf(_T("\nPoints after unique: %ld "), pts.size());
pts.clear();
}

pkohut

  • Bull Frog
  • Posts: 483
Re: Find duplicate points
« Reply #29 on: January 31, 2011, 03:34:23 AM »
Nice trick Dan

Quote
Code: [Select]
            public int GetHashCode(Point3d pnt)
            {
                return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());
            }

I have the nagging suspicion that this kind of HashCode calculation doesn't quite work in our scenario; it will gladly let false negatives pass, when the delta is actually below the threshold but the coordinates are sufficiently different.

Also, returning a 32 bit int hash from a 64 bit double isn't going to guarantee unique values, let alone 192 bits worth of doubles. Worse yet XOR'ing the hash values reduces the uniqueness even further.
For example: 1 ^ 2 ^ 3 = 0

ps, also from the .Net docs
Quote
The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.
New tread (not retired) - public repo at https://github.com/pkohut

pkohut

  • Bull Frog
  • Posts: 483
Re: Find duplicate points
« Reply #30 on: January 31, 2011, 04:01:04 AM »
And for reference a way in C++/ARX
Code: [Select]
bool operator < (const AcGePoint3d& pt1, const AcGePoint3d& pt2)
{
if (pt1.x == pt2.x)
{
if (pt1.y == pt2.y)
return (pt1.z < pt2.z);
else
return (pt1.y < pt2.y);
}
else
return (pt1.x < pt2.x);
}

Shorten that up...
Code: [Select]
bool operator < (const AcGePoint3d & pt1, const AcGePoint3d & pt2)
{
    AcGeVector3d pt = pt1 - pt2;
    return pt.x < 0.0 || pt.y < 0.0 || pt.z < 0.0;
}
New tread (not retired) - public repo at https://github.com/pkohut

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8691
  • AKA Daniel
Re: Find duplicate points
« Reply #31 on: January 31, 2011, 04:25:36 AM »
Nice trick Dan

Quote
Code: [Select]
            public int GetHashCode(Point3d pnt)
            {
                return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());
            }

I have the nagging suspicion that this kind of HashCode calculation doesn't quite work in our scenario; it will gladly let false negatives pass, when the delta is actually below the threshold but the coordinates are sufficiently different.

Also, returning a 32 bit int hash from a 64 bit double isn't going to guarantee unique values, let alone 192 bits worth of doubles. Worse yet XOR'ing the hash values reduces the uniqueness even further.
For example: 1 ^ 2 ^ 3 = 0

ps, also from the .Net docs
Quote
The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.

if you notice, that call is not using the double's value to get generate a hash, but calling the objects GetHashCode() method.

I.e
Code: [Select]
double x = 9.990;
Console.WriteLine(x.GetHashCode().ToString());
prints
126742170

and
Code: [Select]

double x = 1.0;
double y = 2.0;
double z = 3.0;
int hash = x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode();
Console.WriteLine(hash.ToString());

1073217536
« Last Edit: January 31, 2011, 04:29:58 AM by __assume »

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8691
  • AKA Daniel
Re: Find duplicate points
« Reply #32 on: January 31, 2011, 04:32:04 AM »
Oh and classes using IEqualityComparer must override GetHashCode()

pkohut

  • Bull Frog
  • Posts: 483
Re: Find duplicate points
« Reply #33 on: January 31, 2011, 05:22:25 AM »
if you notice, that call is not using the double's value to get generate a hash, but calling the objects GetHashCode() method.
Code: [Select]
double x = 1.0;
double y = 2.0;
double z = 3.0;
int hash = x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode();
Console.WriteLine(hash.ToString());

Wouldn't matter. The object represents a double value of 64 bits and can't be packed into a 32 bit hash without data loss.

For the XOR part of the problem (not on a Windows box right now) what is the result of x = y = z = 10.0? How about x = 10, y = z = 0.0? I think they might be the same.
New tread (not retired) - public repo at https://github.com/pkohut

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8691
  • AKA Daniel
Re: Find duplicate points
« Reply #34 on: January 31, 2011, 05:30:30 AM »
if you notice, that call is not using the double's value to get generate a hash, but calling the objects GetHashCode() method.
Code: [Select]
double x = 1.0;
double y = 2.0;
double z = 3.0;
int hash = x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode();
Console.WriteLine(hash.ToString());

Wouldn't matter. The object represents a double value of 64 bits and can't be packed into a 32 bit hash without data loss.

For the XOR part of the problem (not on a Windows box right now) what is the result of x = y = z = 10.0? How about x = 10, y = z = 0.0? I think they might be the same.

true that! what would you recomend?

pkohut

  • Bull Frog
  • Posts: 483
Re: Find duplicate points
« Reply #35 on: January 31, 2011, 05:44:51 AM »
true that! what would you recomend?

Don't have one. Once looked at an interesting x, y, z encoding technique but never figured out how to get it to work with doubles (works best with whole numbers of limited size). Drawing a blank right now on the name a mathematician the technique is named after. Will post a link in a bit.
New tread (not retired) - public repo at https://github.com/pkohut

kaefer

  • Guest
Re: Find duplicate points
« Reply #36 on: January 31, 2011, 05:49:00 AM »
For the XOR part of the problem (not on a Windows box right now) what is the result of x = y = z = 10.0? How about x = 10, y = z = 0.0? I think they might be the same.

Wouldn't matter. It's a false positive, the Equals() override will take care of that.

My concern are the cases where GetHashCode() return differing values, then Equals() won't be called, but would return true if it were.

Quote
true that! what would you recomend?


« Last Edit: January 31, 2011, 06:50:08 AM by kaefer »

fixo

  • Guest
Re: Find duplicate points
« Reply #37 on: January 31, 2011, 06:23:12 AM »

pkohut

  • Bull Frog
  • Posts: 483
Re: Find duplicate points
« Reply #38 on: January 31, 2011, 06:48:09 AM »
true that! what would you recomend?

Not a recommendation, just providing some prior research into the problem prior to settling on using KD Trees.
Calculating a unique hash for a given 3D point is really about finding a 1D value that represents the point. The mathematician Giuseppe Peano, http://en.wikipedia.org/wiki/Giuseppe_Peano, figured out such a method and it is called Peano Coordinate encoding which has the properties of a nearest neighbor problem.

To encode a 2D point of say, x=1234 and y = 5678 the Peano encoded 1D value is 15263748, which can also be decoded back to the original value. When multiple 2D values are encoded and sorted then they will fall naturally into a nearest neighbor when traversed.

The internal storage of a 64 bit double complicates the implementation of Peano encoding for these types. It may not be practical to use?
New tread (not retired) - public repo at https://github.com/pkohut

LE3

  • Guest
Re: Find duplicate points
« Reply #39 on: January 31, 2011, 01:30:26 PM »
Thanks Paul,

I have tried something similar in the past, also by using the isEqualTo() with AcGeTol and none works, excepts the one I posted earlier.

If I try with a small selection of points works, btw.

I did my testings with a 100,000 points and having the same amount of duplicates.

Shorten that up...
Code: [Select]
bool operator < (const AcGePoint3d & pt1, const AcGePoint3d & pt2)
{
    AcGeVector3d pt = pt1 - pt2;
    return pt.x < 0.0 || pt.y < 0.0 || pt.z < 0.0;
}

Jeff H

  • Needs a day job
  • Posts: 6150
Re: Find duplicate points
« Reply #40 on: July 26, 2011, 12:50:51 PM »
I saw this and reminded me of this thread and thought this might be a good collection to use for task
SortedSet<T>
- Does not add duplicates
- takes a IComparer in constructor

http://msdn.microsoft.com/en-us/library/dd395024.aspx