Author Topic: Find duplicate points  (Read 14166 times)

0 Members and 1 Guest are viewing this topic.

pkohut

  • Bull Frog
  • Posts: 483
Re: Find duplicate points
« Reply #30 on: January 31, 2011, 04:01:04 AM »
And for reference a way in C++/ARX
Code: [Select]
bool operator < (const AcGePoint3d& pt1, const AcGePoint3d& pt2)
{
if (pt1.x == pt2.x)
{
if (pt1.y == pt2.y)
return (pt1.z < pt2.z);
else
return (pt1.y < pt2.y);
}
else
return (pt1.x < pt2.x);
}

Shorten that up...
Code: [Select]
bool operator < (const AcGePoint3d & pt1, const AcGePoint3d & pt2)
{
    AcGeVector3d pt = pt1 - pt2;
    return pt.x < 0.0 || pt.y < 0.0 || pt.z < 0.0;
}
New tread (not retired) - public repo at https://github.com/pkohut

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8706
  • AKA Daniel
Re: Find duplicate points
« Reply #31 on: January 31, 2011, 04:25:36 AM »
Nice trick Dan

Quote
Code: [Select]
            public int GetHashCode(Point3d pnt)
            {
                return (pnt.X.GetHashCode() ^ pnt.Y.GetHashCode() ^ pnt.Z.GetHashCode());
            }

I have the nagging suspicion that this kind of HashCode calculation doesn't quite work in our scenario; it will gladly let false negatives pass, when the delta is actually below the threshold but the coordinates are sufficiently different.

Also, returning a 32 bit int hash from a 64 bit double isn't going to guarantee unique values, let alone 192 bits worth of doubles. Worse yet XOR'ing the hash values reduces the uniqueness even further.
For example: 1 ^ 2 ^ 3 = 0

ps, also from the .Net docs
Quote
The default implementation of the GetHashCode method does not guarantee unique return values for different objects. Furthermore, the .NET Framework does not guarantee the default implementation of the GetHashCode method, and the value it returns will be the same between different versions of the .NET Framework. Consequently, the default implementation of this method must not be used as a unique object identifier for hashing purposes.

if you notice, that call is not using the double's value to get generate a hash, but calling the objects GetHashCode() method.

I.e
Code: [Select]
double x = 9.990;
Console.WriteLine(x.GetHashCode().ToString());
prints
126742170

and
Code: [Select]

double x = 1.0;
double y = 2.0;
double z = 3.0;
int hash = x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode();
Console.WriteLine(hash.ToString());

1073217536
« Last Edit: January 31, 2011, 04:29:58 AM by __assume »

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8706
  • AKA Daniel
Re: Find duplicate points
« Reply #32 on: January 31, 2011, 04:32:04 AM »
Oh and classes using IEqualityComparer must override GetHashCode()

pkohut

  • Bull Frog
  • Posts: 483
Re: Find duplicate points
« Reply #33 on: January 31, 2011, 05:22:25 AM »
if you notice, that call is not using the double's value to get generate a hash, but calling the objects GetHashCode() method.
Code: [Select]
double x = 1.0;
double y = 2.0;
double z = 3.0;
int hash = x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode();
Console.WriteLine(hash.ToString());

Wouldn't matter. The object represents a double value of 64 bits and can't be packed into a 32 bit hash without data loss.

For the XOR part of the problem (not on a Windows box right now) what is the result of x = y = z = 10.0? How about x = 10, y = z = 0.0? I think they might be the same.
New tread (not retired) - public repo at https://github.com/pkohut

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8706
  • AKA Daniel
Re: Find duplicate points
« Reply #34 on: January 31, 2011, 05:30:30 AM »
if you notice, that call is not using the double's value to get generate a hash, but calling the objects GetHashCode() method.
Code: [Select]
double x = 1.0;
double y = 2.0;
double z = 3.0;
int hash = x.GetHashCode() ^ y.GetHashCode() ^ z.GetHashCode();
Console.WriteLine(hash.ToString());

Wouldn't matter. The object represents a double value of 64 bits and can't be packed into a 32 bit hash without data loss.

For the XOR part of the problem (not on a Windows box right now) what is the result of x = y = z = 10.0? How about x = 10, y = z = 0.0? I think they might be the same.

true that! what would you recomend?

pkohut

  • Bull Frog
  • Posts: 483
Re: Find duplicate points
« Reply #35 on: January 31, 2011, 05:44:51 AM »
true that! what would you recomend?

Don't have one. Once looked at an interesting x, y, z encoding technique but never figured out how to get it to work with doubles (works best with whole numbers of limited size). Drawing a blank right now on the name a mathematician the technique is named after. Will post a link in a bit.
New tread (not retired) - public repo at https://github.com/pkohut

kaefer

  • Guest
Re: Find duplicate points
« Reply #36 on: January 31, 2011, 05:49:00 AM »
For the XOR part of the problem (not on a Windows box right now) what is the result of x = y = z = 10.0? How about x = 10, y = z = 0.0? I think they might be the same.

Wouldn't matter. It's a false positive, the Equals() override will take care of that.

My concern are the cases where GetHashCode() return differing values, then Equals() won't be called, but would return true if it were.

Quote
true that! what would you recomend?


« Last Edit: January 31, 2011, 06:50:08 AM by kaefer »

fixo

  • Guest
Re: Find duplicate points
« Reply #37 on: January 31, 2011, 06:23:12 AM »

pkohut

  • Bull Frog
  • Posts: 483
Re: Find duplicate points
« Reply #38 on: January 31, 2011, 06:48:09 AM »
true that! what would you recomend?

Not a recommendation, just providing some prior research into the problem prior to settling on using KD Trees.
Calculating a unique hash for a given 3D point is really about finding a 1D value that represents the point. The mathematician Giuseppe Peano, http://en.wikipedia.org/wiki/Giuseppe_Peano, figured out such a method and it is called Peano Coordinate encoding which has the properties of a nearest neighbor problem.

To encode a 2D point of say, x=1234 and y = 5678 the Peano encoded 1D value is 15263748, which can also be decoded back to the original value. When multiple 2D values are encoded and sorted then they will fall naturally into a nearest neighbor when traversed.

The internal storage of a 64 bit double complicates the implementation of Peano encoding for these types. It may not be practical to use?
New tread (not retired) - public repo at https://github.com/pkohut

LE3

  • Guest
Re: Find duplicate points
« Reply #39 on: January 31, 2011, 01:30:26 PM »
Thanks Paul,

I have tried something similar in the past, also by using the isEqualTo() with AcGeTol and none works, excepts the one I posted earlier.

If I try with a small selection of points works, btw.

I did my testings with a 100,000 points and having the same amount of duplicates.

Shorten that up...
Code: [Select]
bool operator < (const AcGePoint3d & pt1, const AcGePoint3d & pt2)
{
    AcGeVector3d pt = pt1 - pt2;
    return pt.x < 0.0 || pt.y < 0.0 || pt.z < 0.0;
}

Jeff H

  • Needs a day job
  • Posts: 6150
Re: Find duplicate points
« Reply #40 on: July 26, 2011, 12:50:51 PM »
I saw this and reminded me of this thread and thought this might be a good collection to use for task
SortedSet<T>
- Does not add duplicates
- takes a IComparer in constructor

http://msdn.microsoft.com/en-us/library/dd395024.aspx