Author Topic: [XDrX-PlugIn(165)] Use a hash table to quickly delete completely overlapping ent  (Read 976 times)

0 Members and 2 Guests are viewing this topic.

xdcad

  • Swamp Rat
  • Posts: 522
Use a hash table to quickly delete completely overlapping entities

=============

What is a Hash Table?

A hash table (or hash map) is a data structure that provides a mechanism for storing key-value pairs, allowing for fast data retrieval. Each key is processed through a hash function, which computes an index (or hash code) into an array of buckets or slots. The key-value pair is then stored in the corresponding bucket. The fundamental operations of a hash table, such as insertion, deletion, and search, typically have an average time complexity of O(1), making them very efficient.

Why is Using a Hash Table to Delete Identical Entities in ARX Highly Efficient?

1. Fast Lookups:
  • The primary strength of a hash table is its ability to perform quick lookups. When deleting duplicate entities, you can compute a hash for each entity and check if that hash already exists in the table.
  • If the hash already exists, it means the entity is a duplicate and can be marked for deletion.
  • If the hash does not exist, the entity is added to the table.

2. Efficient Handling of Collisions:

  • In a hash table, collisions occur when two different entities produce the same hash code. Advanced hash table implementations efficiently handle collisions using techniques like chaining (where each bucket is a linked list of entries) or open addressing (where a collision triggers a search for the next available slot).
  • Proper handling of collisions ensures that even with duplicate entities, the overall performance remains optimal.

3. Scalability:

  • Hash tables scale well with the number of entities. As more entities are added, a well-implemented hash table can resize itself to maintain performance, ensuring that operations remain O(1) on average.
  • This scalability is crucial when dealing with large datasets or complex drawings in ARX.

ARX and Hash Tables

In the context of ARX (AutoCAD Runtime Extension), using a hash table to manage and delete duplicate entities is beneficial due to the following reasons:

1.Unique Identification:
  • Each entity can be uniquely identified using a hash value that combines various properties (such as geometry, layer, color, etc.).
  • This unique identification ensures that only truly identical entities are considered duplicates.

2.Performance:

  • AutoCAD drawings can contain thousands or even millions of entities. Using a hash table to manage duplicates ensures that the process remains efficient and does not degrade performance.
  • This efficiency is especially important in complex CAD environments where users require quick and responsive operations.

Sample Code for Removing Duplicate Entities Using a Hash Table
Here’s a corrected and complete code sample in ARX for removing duplicate entities using a hash table:

Code - C++: [Select]
  1. /**
  2.  * @brief Removes duplicate entities, keeping only unique ones.
  3.  *
  4.  * @param objectIdArray Array of entity object IDs
  5.  * @param param Bitmask parameter used for selectively calculating the hash value:
  6.  *  - 1: Use the entity's dxfName
  7.  *  - 2: Use the entity's layerName
  8.  *  - 4: Use the entity's colorIndex
  9.  *  - 8: Use the entity's linetypeName
  10.  *  - 16: Use the entity's linetypeScale
  11.  *  - 32: Use the entity's lineWidth
  12.  *  - 64: Use the entity's GRIP Point array (nStretchPts)
  13.  *  - 128: Use the entity's bounding box (box)
  14.  *  - 256: Use the entity's centroid
  15.  *  - 512: Use the entity's description (entity->desc())
  16.  */
  17. void XdDbUtils::RemoveDuplicateEntities(AcDbObjectIdArray& objectIdArray, int param)
  18. {
  19.     std::unordered_map<size_t, AcDbObjectId> entityHashmap;
  20.     AcDbObjectIdArray remainingObjects;
  21.  
  22.     // Iterate through the entity object array
  23.     for (int i = 0; i < objectIdArray.length(); ++i) {
  24.         AcDbObjectId objectId = objectIdArray[i];
  25.         AcDbEntity* pEntity = nullptr;
  26.         if (acdbOpenAcDbEntity(pEntity, objectId, AcDb::kForRead) == Acad::eOk && pEntity) {
  27.             // Calculate the hash value of the entity
  28.             size_t hash = XdDbUtils::CalculateHash(pEntity, param);
  29.  
  30.             // Check if the hash value already exists in the hash table
  31.             if (entityHashmap.find(hash) == entityHashmap.end()) {
  32.                 // The hash value does not exist, add to the hash table and remainingObjects array
  33.                 entityHashmap[hash] = objectId;
  34.                 remainingObjects.append(objectId);
  35.             }
  36.             // Close the entity
  37.             pEntity->close();
  38.         }
  39.     }
  40.  
  41.     // Replace the original objectIdArray with the remainingObjects array
  42.     objectIdArray = remainingObjects;
  43. }
  44.  

Conclusion

Using a hash table to delete duplicate entities in ARX is highly efficient due to the hash table's fast lookup, insertion, and deletion operations. This efficiency is crucial for maintaining performance in complex AutoCAD environments, ensuring that even large datasets are processed quickly and accurately.

==================


Code: [Select]
(defun c:xdtb_removeovents (/ ss len ret)
  (xdrx-begin)
  (if (setq ss (xdrx-ssget
(xdrx-string-multilanguage
   "\n选择要处理的对象<退出>:"
   "\nSelect objects to process <Exit>:"
)
       )
      )
    (progn
      (setq len (sslength ss))
      (setq ret (xdrx-entity-removeduplicates ss 129))
      (xdrx-prompt
(xdrx-string-formatex
  (xdrx-string-multilanguage
    "\n共删除了 %d 个重复的实体."
    "\nA total of %d duplicate entities have been removed."
  )
  (- len ret)
)
      )
    )
  )
  (xdrx-end)
  (princ)
)
« Last Edit: May 23, 2024, 10:21:16 AM by xdcad »
The code I wrote uses XDRX-API,which can be downloaded from github.com and is updated at any time.
===================================
https://github.com/xdcad
https://sourceforge.net/projects/xdrx-api-zip/
http://bbs.xdcad.net

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8858
  • AKA Daniel
Cool, I always thought it would be really cool to have hashmaps in Autolisp.
I had imagined using hash_combine generating hashes from lists, so lisp users could create generic hash maps

Code - Auto/Visual Lisp: [Select]
  1. (vlia_k_v '(100.0 100.0 10.0) '((-1 . <Entity name: 23873ab8490>) (0 . LWPOLYLINE)...))
  2.  

I toyed with it here, but passing data between autolisp and arx is just not reliable..
so bleh @ autodesk, there's really no good way to extend autolisp.

https://www.theswamp.org/index.php?topic=57894.0


xdcad

  • Swamp Rat
  • Posts: 522
Cool, I always thought it would be really cool to have hashmaps in Autolisp.
I had imagined using hash_combine generating hashes from lists, so lisp users could create generic hash maps

Code - Auto/Visual Lisp: [Select]
  1.  
  2. (vlia_k_v '(100.0 100.0 10.0) '((-1 . <Entity name: 23873ab8490>) (0 . LWPOLYLINE)...))
  3.  

I toyed with it here, but passing data between autolisp and arx is just not reliable..
so bleh @ autodesk, there's really no good way to extend autolisp.

https://www.theswamp.org/index.php?topic=57894.0

thanks,
In addition to the xdrx-entity-removeduplicates provided above, the XDRX API
Functions for calculating the hash value of an entity are also provided:
xdrx-entity-hashstring

Code - Auto/Visual Lisp: [Select]
  1. ;|
  2. @param param Bitmask parameter used for selectively calculating the hash value:
  3.  *  - 1: Use the entity's dxfName
  4.  *  - 2: Use the entity's layerName
  5.  *  - 4: Use the entity's colorIndex
  6.  *  - 8: Use the entity's linetypeName
  7.  *  - 16: Use the entity's linetypeScale
  8.  *  - 32: Use the entity's lineWidth
  9.  *  - 64: Use the entity's GRIP Point array (nStretchPts)
  10.  *  - 128: Use the entity's bounding box (box)
  11.  *  - 256: Use the entity's centroid
  12.  *  - 512: Use the entity's description (entity->desc())
  13. |;
  14. Command: (xdrx-entity-hashstring (entlast) 1)
  15. "1078249248256508798"
  16. Command: (xdrx-entity-hashstring (entlast) 3)
  17. "11643239585821548497"
  18. Command: (xdrx-entity-hashstring (entlast) 129)
  19. "3826409512706688743"
  20.  
« Last Edit: May 24, 2024, 12:47:07 AM by xdcad »
The code I wrote uses XDRX-API,which can be downloaded from github.com and is updated at any time.
===================================
https://github.com/xdcad
https://sourceforge.net/projects/xdrx-api-zip/
http://bbs.xdcad.net

xdcad

  • Swamp Rat
  • Posts: 522
Cool, I always thought it would be really cool to have hashmaps in Autolisp.
I had imagined using hash_combine generating hashes from lists, so lisp users could create generic hash maps

Code - Auto/Visual Lisp: [Select]
  1. (vlia_k_v '(100.0 100.0 10.0) '((-1 . <Entity name: 23873ab8490>) (0 . LWPOLYLINE)...))
  2.  

I toyed with it here, but passing data between autolisp and arx is just not reliable..
so bleh @ autodesk, there's really no good way to extend autolisp.

https://www.theswamp.org/index.php?topic=57894.0

Group code 10, ARX is treated as 3D point,
So for lwpolyline, after lisp is passed to arx for processing, the two-dimensional points will become three-dimensional points.
The code I wrote uses XDRX-API,which can be downloaded from github.com and is updated at any time.
===================================
https://github.com/xdcad
https://sourceforge.net/projects/xdrx-api-zip/
http://bbs.xdcad.net

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8858
  • AKA Daniel
Yes, since arx interprets the list, it’s not reliable to hash.
Example  '(220 220 220 220)

xdcad

  • Swamp Rat
  • Posts: 522
Yes, since arx interprets the list, it’s not reliable to hash.
Example  '(220 220 220 220)

'(220 220 220 220)
ARX's resbuf will be parsed into 3D points, but restype=220, not RT3DPOINT,

  So when you calculate the hash value, don’t you include the restype to make it unique?

Code - C++: [Select]
  1. size_t XdDbUtils::CalculateHash(const AcGePoint3d& point) {
  2.         size_t hash = 0;
  3.         hash ^= std::hash<double>{}(point.x) + 0x9e3779b9 + (hash << 6) + (hash >> 2);
  4.         hash ^= std::hash<double>{}(point.y) + 0x9e3779b9 + (hash << 6) + (hash >> 2);
  5.         hash ^= std::hash<double>{}(point.z) + 0x9e3779b9 + (hash << 6) + (hash >> 2);
  6.  
  7.         return hash;
  8. }

Code - C++: [Select]
  1. int XDGetHash()
  2. {
  3.         resbuf* rb = ads_getargs();
  4.         size_t hash = 0;
  5.         while(rb)
  6.         {
  7.                 int restype = XdRbUtils::dxfCodeToDataType(rb->restype);
  8.                 hash ^= XdDbUtils::CalculateHash(rb->restype);
  9.                 switch (restype)
  10.                 {
  11.                 case RTENAME:
  12.                 {
  13.                         AcDbObjectId id;
  14.                         if (acdbGetObjectId(id, rb->resval.rlname) == eOk)
  15.                         {
  16.                                 AcDbEntity* pEnt;
  17.                                 if (acdbOpenObject(pEnt, id, kForRead) == eOk)
  18.                                 {
  19.                                         int param = 511;
  20.                                         if (rb->rbnext && rb->rbnext->restype == RTSHORT)
  21.                                         {
  22.                                                 param = rb->rbnext->resval.rint;
  23.                                         }
  24.                                         hash ^= XdDbUtils::CalculateHash(pEnt, param);
  25.                                         pEnt->close();
  26.                                 }
  27.                         }
  28.                 }
  29.                         break;
  30.                 case RT3DPOINT:
  31.                 {
  32.                         hash ^= XdDbUtils::CalculateHash(asPnt3d(rb->resval.rpoint));
  33.                 }
  34.                         break;
  35.                 case RTSTR:
  36.                 {
  37.                         hash ^= XdDbUtils::CalculateHash(rb->resval.rstring);
  38.                 }
  39.                         break;
  40.                 case  RTREAL:
  41.                 {
  42.                         hash ^= XdDbUtils::CalculateHash(rb->resval.rreal);
  43.                 }
  44.                         break;
  45.                 case  RTSHORT:
  46.                 {
  47.                         hash ^= XdDbUtils::CalculateHash(rb->resval.rint);
  48.                 }
  49.                         break;
  50.                 case RTLONG:
  51.                 {
  52.                         hash ^= XdDbUtils::CalculateHash(rb->resval.rlong);
  53.                 }
  54.                         break;
  55.                 default:
  56.                         break;
  57.                 }              
  58.                 rb = rb->rbnext;
  59.         }
  60.         // &#23558;&#21704;&#24076;&#20540;&#36716;&#25442;&#20026;&#23383;&#31526;&#20018;
  61.         std::string hashString = std::to_string(hash);
  62.         ads_retstr(CMyString::StringToTCHAR(hashString));
  63.         return RSRSLT;
  64. }
« Last Edit: May 24, 2024, 03:10:49 AM by xdcad »
The code I wrote uses XDRX-API,which can be downloaded from github.com and is updated at any time.
===================================
https://github.com/xdcad
https://sourceforge.net/projects/xdrx-api-zip/
http://bbs.xdcad.net

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8858
  • AKA Daniel
No, I don’t think I hashed the restype, just the value, but the value you get back can also be modified

vlia_k_v (‘(10 10 10) ‘(10 10))
or
vlia_k_v ‘(10 10 10) ‘'(220 220 220 220)

so it’s not really possible to build a generic unordered_map for lisp,  because there is some probability of a ;LsAdsInvoke Internal Error, or worse, corrupt data


xdcad

  • Swamp Rat
  • Posts: 522
Yes, since arx interprets the list, it’s not reliable to hash.
Example  '(220 220 220 220)

Just tested it
'(220 220 220 220) This kind of data is illegal data. If it cannot be converted into resbuf, an error will be reported. This should be a BUG of AUTOCAD, and there are 221, 222, etc.

Command: (xdrx_entity_hashstring '(220 220 220 220))
error: LsAdsInvoke Internal Error

Command: (xdrx_entity_hashstring '(221 220 220 220))
error: LsAdsInvoke Internal Error

Command: (xdrx_entity_hashstring '(219 220 220 220))
"4355413184637992076"

Command: (xdrx_entity_hashstring '(218 220 220 220))
"6650467817808998717"

Command: (xdrx_entity_hashstring '(223 220 220 220))
error: LsAdsInvoke Internal Error

Command: (xdrx_entity_hashstring '(239 220 220 220))
error: LsAdsInvoke Internal Error

Command: (xdrx_entity_hashstring '(230 220 220 220))
error: LsAdsInvoke Internal Error

Command: (xdrx_entity_hashstring '(10 220 220 220))
"6350337662529562573"

Command: (xdrx_entity_hashstring '(215 220 220 220))
"13560894329457377856"

Code - C++: [Select]
  1. else if ((resType >= 210) && (resType <=239))
  2. {
  3.         return(RT3DPOINT);
  4. }
  5.  

Should be changed to:

Code - C++: [Select]
  1. else if ((resType >= 210) && (resType <=219))
  2. {
  3.         return(RT3DPOINT);
  4. }
  5.  
The code I wrote uses XDRX-API,which can be downloaded from github.com and is updated at any time.
===================================
https://github.com/xdcad
https://sourceforge.net/projects/xdrx-api-zip/
http://bbs.xdcad.net

xdcad

  • Swamp Rat
  • Posts: 522
No, I don’t think I hashed the restype, just the value, but the value you get back can also be modified

vlia_k_v (‘(10 10 10) ‘(10 10))
or
vlia_k_v ‘(10 10 10) ‘'(220 220 220 220)

so it’s not really possible to build a generic unordered_map for lisp,  because there is some probability of a ;LsAdsInvoke Internal Error, or worse, corrupt data

'(220.0 220.0 220.0) and  '(210 220.0 220.0 220.0)

What I said about adding restype to hash calculation is to address the above points.
Within ARX, they are all considered to be 3D points. If calculated directly, their hash values will be the same.

======

Command: (xdrx_entity_hashstring '(220.0 220.0 220.0))
"17095823065452309527"
Command: (xdrx_entity_hashstring '(210 220.0 220.0 220.0))
"6635585597989606581"
The code I wrote uses XDRX-API,which can be downloaded from github.com and is updated at any time.
===================================
https://github.com/xdcad
https://sourceforge.net/projects/xdrx-api-zip/
http://bbs.xdcad.net

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8858
  • AKA Daniel
My point is, you can’t use certain lists in the ‘value’ part of the map either, because what you put in, is not the same as what you get out

MapInsert ('(KEY) ‘(10 10)) may return ‘(10 10 0)
MapInsert ('(KEY) ‘(220 220 220 220))  LsAdsInvoke Internal Error

xdcad

  • Swamp Rat
  • Posts: 522
My point is, you can’t use certain lists in the ‘value’ part of the map either, because what you put in, is not the same as what you get out

MapInsert ('(KEY) ‘(10 10)) may return ‘(10 10 0)
MapInsert ('(KEY) ‘(220 220 220 220))  LsAdsInvoke Internal Error

‘(220 220 220 220) This kind, because it is a BUG, cannot be passed from LISP to ARX at all, let’s not discuss it first.
'(10 10), when passed to ARX, will become a 3D point of '(10.0 10.0 0.0),
Insert into MAP, so '(10 10) under LISP can also query the index position INDEX through '(10.0 10.0 0.0), and then return the index to LISP for processing'(10 10)
The code I wrote uses XDRX-API,which can be downloaded from github.com and is updated at any time.
===================================
https://github.com/xdcad
https://sourceforge.net/projects/xdrx-api-zip/
http://bbs.xdcad.net

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8858
  • AKA Daniel
Misunderstanding? Let’s say I have a key that’s a string, and a value that starts with a DXF code
(‘(“A”), ‘(11 11))
  Key       Value

Now I want to retrieve the value of ‘(“A”), I will get back (11.0 11.0 0.0). The value is wrong
I wanted two integers back, but I get back 3 doubles

Here’s another example of a valiant attempt to extend lisp
https://www.theswamp.org/index.php?topic=45841.msg618986#msg618986

It’s very unfortunate



kdub_nz

  • Mesozoic keyThumper
  • SuperMod
  • Water Moccasin
  • Posts: 2159
  • class keyThumper<T>:ILazy<T>
>>>>>
Here’s another example of a valiant attempt to extend lisp
https://www.theswamp.org/index.php?topic=45841.msg618986#msg618986

It’s very unfortunate




Yes, unfortunate is an understatement.  I'm baffled why AutoDesk can't provide a valid round-trip mechanism for values.

Really, really frustrating not being able to maintain the integrity of data.
Called Kerry in my other life
Retired; but they dragged me back in !

I live at UTC + 13.00

---
some people complain about loading the dishwasher.
Sometimes the question is more important than the answer.

xdcad

  • Swamp Rat
  • Posts: 522
Efficiency test, 510 entities, including 505 duplicate entities

Hash table efficiency far exceeds overkill

 

===============================

Command: tt
Select objects: Specify opposite corner: 510 found

Select objects:

===============================

Command: (tt1)

CPU:(1x)Intel(R) Xeon(R) E3-1575M v5 @ 3.00GHz 4Cores  / Memory:64G / OS: WIN10 professional workstation version
Benchmarking ......
Current settings: Tolerance=0.000001, Ignore=None, Optimize polylines=Yes, Combine partial overlap=Yes, Combine end-to-end=Yes
505 duplicate(s) deleted
0 overlapping object(s) or segment(s) deleted. done for 16 iterations. Sorted from fastest.
Statement                    Increment  Time(ms)  Normalize  Relative
-------------------------------------------------------------------------------
(xdrx-entity-removeduplic...)       16      1078         1078     28.07 <fastest>
(COMMAND ._-overkill SS  )          1       1891      30256       1.00 <slowest>
-------------------------------------------------------------------------------

Code - Auto/Visual Lisp: [Select]
  1. (defun c:tt ()
  2.   (if (setq ss (ssget))
  3.     (progn
  4.      (setq ss1 (xdrx-entity-copy ss))
  5.     )
  6.   )
  7.   (princ)
  8. )
  9. (defun tt1 ()
  10.   (xd::quickbench
  11.     '((command "._-overkill" ss "" "")
  12.       (xdrx-entity-removeduplicates ss1 129)
  13.      )
  14.   )
  15.   (princ)
  16. )
The code I wrote uses XDRX-API,which can be downloaded from github.com and is updated at any time.
===================================
https://github.com/xdcad
https://sourceforge.net/projects/xdrx-api-zip/
http://bbs.xdcad.net