Author Topic: Find duplicates (and do it quickly)  (Read 13138 times)

0 Members and 1 Guest are viewing this topic.

kaefer

  • Guest
Re: Find duplicates (and do it quickly)
« Reply #15 on: November 25, 2010, 05:05:00 PM »
The upper C# code is certainly not very good: it seems to run a little slower than the old Autodesk del-blk.lsp

Yes it is! Just do not provide overrides for equality. Would you believe that System.ValueType does already contain code for Equals(object obj) to the tune of...
Code: [Select]
       ...
        FieldInfo[] fields = type.GetFields(BindingFlags.NonPublic | BindingFlags.Public | BindingFlags.Instance);
        for (int i = 0; i < fields.Length; i++)
        {
            object obj3 = ((RtFieldInfo) fields[i]).InternalGetValue(a, false);
            object obj4 = ((RtFieldInfo) fields[i]).InternalGetValue(obj, false);
            ...

Quote
Commande: DELDUP_BLK

Building list of objects.
Deleting duplicate objects.
Deleted 10134 duplicated blocks have been erased in 1359 millseconds

He, power user! By my tests it's the slowest of all.
F#Seq.groupBy450 ms
F#Map<duplicate,unit>500 ms
C#LINQ .groupBy510 ms
C#Dictionary<DuplicatePattern,int>530 ms
C#List<DuplicatePattern>2,100 ms
LispC:DELDUP_BLK5,800 ms

The first two rely on comparison, Dictionary<_,_> on equality. Why should List<_> be so bad, and how would one go ahead to make C# more competitive?

Thanks to Jeff H for a simple yet attractive alternative idea, to xsfhlzh for the LINQ implementation and to Gile for his youthful zest.

All the best, Thorsten
« Last Edit: November 26, 2010, 01:46:47 AM by kaefer »

xsfhlzh

  • Guest
Re: Find duplicates (and do it quickly)
« Reply #16 on: November 26, 2010, 12:53:47 AM »

linq a bit like the F # code
11284 duplicated block(s) have been erased in 578.125 milliseconds
Code: [Select]
        struct DuplicatePattern
        {
            public string Layer;
            public double PositionX;
            public double PositionY;
            public double PositionZ;
            public double Rotation;
            public double ScaleX;
            public double ScaleY;
            public double ScaleZ;

            public DuplicatePattern(BlockReference bref)
            {
                Layer = bref.Layer;
                PositionX = Math.Round(bref.Position.X, 6);
                PositionY = Math.Round(bref.Position.Y, 6);
                PositionZ = Math.Round(bref.Position.Z, 6);
                Rotation = Math.Round(bref.Rotation, 6);
                ScaleX = Math.Round(bref.ScaleFactors.X, 6);
                ScaleY = Math.Round(bref.ScaleFactors.Y, 6);
                ScaleZ = Math.Round(bref.ScaleFactors.Z, 6);
            }
        }

        [CommandMethod("tt8")]
        public static void test28()
        {

            DateTime t0 = DateTime.Now;
            int count = 0;

            Document doc = Application.DocumentManager.MdiActiveDocument;
            Editor ed = doc.Editor;
            Database db = doc.Database;

            using (Transaction tr = db.TransactionManager.StartTransaction())
            {
                var brefids =
                    from ObjectId id in (BlockTable)tr.GetObject(db.BlockTableId, OpenMode.ForRead)
                    let btr = (BlockTableRecord)tr.GetObject(id, OpenMode.ForRead)
                    select btr.GetBlockReferenceIds(true, false);
                foreach (var idcol in brefids)
                {
                    var brefdict =
                        idcol.Cast<ObjectId>()
                        .Select(id => (BlockReference)tr.GetObject(id, OpenMode.ForRead))
                        .GroupBy(bref => new DuplicatePattern(bref));
                    foreach (var brefs in brefdict)
                    {
                        foreach (var bref in brefs.Skip(1))
                        {
                            bref.UpgradeOpen();
                            bref.Erase();
                            count++;
                        }
                    }
                }
                tr.Commit();
            }

            TimeSpan ts = DateTime.Now - t0;
            ed.WriteMessage(
                "{0} duplicated block(s) have been erased in {1} milliseconds",
                count, ts.TotalMilliseconds);
        }

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Find duplicates (and do it quickly)
« Reply #17 on: November 26, 2010, 01:51:03 AM »
Thank you kaefer for the "youthful zest" (I'm 52 but always happy to learn new things) and for the tricks you provide (even it's difficult for me to understand both English and programmation languages).

Thank you xsfhlzh for you Linq using code.

Here're new results after replacing List<DuplicatePattern> with Dictionary<DuplicatePattern, int> in the C# code I posted.

Quote
Commande: DELDUP_BLK

Building list of objects.
Deleting duplicate objects.
10134 duplicated block(s) have been erased in 1224 millseconds

Commande: TT8
10134 duplicated block(s) have been erased in 233.0134 milliseconds

Commande: CS_GILE
10134 duplicated block(s) have been erased in 275.0157 milliseconds

Commande: FS_GILE
10134 duplicated block(s) have been erased in 210.012 milliseconds
« Last Edit: November 26, 2010, 02:53:22 AM by gile »
Speaking English as a French Frog

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Find duplicates (and do it quickly)
« Reply #18 on: November 26, 2010, 03:36:43 AM »
Back to my 'origins'.
Here's a new LISP code

Code: [Select]
(defun delDups (/ n ss blk elst data orglst duplst)
  (if (setq n  -1
    ss (ssget "_X" '((0 . "INSERT")))
      )
    (while (setq blk (ssname ss (setq n (1+ n))))
      (setq elst (entget blk)
    data (list
   (cdr (assoc 330 elst))
   (cdr (assoc 2 elst))
   (mapcar '(lambda (x) (round x 0.000001)) (cdr (assoc 10 elst)))
   (cdr (assoc 8 elst))
   (round (cdr (assoc 50 elst)) 0.000001)
   (round (cdr (assoc 41 elst)) 0.000001)
   (round (cdr (assoc 42 elst)) 0.000001)
   (round (cdr (assoc 43 elst)) 0.000001)
)
      )
      (if (member data orglst)
(setq duplst (cons blk duplst))
(setq orglst (cons data orglst))
      )
    )
  )
  (length (mapcar 'entdel duplst))
)

(defun round (num prec)
  (if (zerop (setq prec (abs prec)))
    num
    (if (minusp num)
      (* prec (fix (- (/ num prec) 0.5)))
      (* prec (fix (+ (/ num prec) 0.5)))
    )
  )
)

(defun c:lsp_gile (/ t0 t1 cnt)
  (vl-load-com)
  (setq t0 (* 86400000. (getvar "tdusrtimer")))
  (vla-startUndoMark (vla-get-ActiveDocument (vlax-get-acad-object)))
  (setq cnt (delDups))
  (vla-endUndoMark (vla-get-ActiveDocument (vlax-get-acad-object)))
  (setq t1 (* 86400000. (getvar "tdusrtimer")))
  (princ (strcat
   "\n "
   (itoa cnt)
   " duplicated block(s) have been erased in "
   (rtos (- t1 t0))
   " millseconds"
)
  )
  (princ)
)

Quote
Commande: LSP_GILE

 10134 duplicated block(s) have been erased in 624 millseconds
Speaking English as a French Frog

kaefer

  • Guest
Re: Find duplicates (and do it quickly)
« Reply #19 on: November 26, 2010, 08:03:03 AM »
Update on LINQ (thx again to xsfhlzh, without whom I'd never had delved into the docs). Compared to the mixed mode like his example we could have coded instead in pure IEnumerable<T> instance method style, or alternatively with mostly pure Query Expressions instead.

Instance method example:
Code: [Select]
                var brefsToDelete =
                    ((BlockTable)tr.GetObject(db.BlockTableId, OpenMode.ForRead))
                    .Cast<ObjectId>()
                    .Select(id => (BlockTableRecord)tr.GetObject(id, OpenMode.ForRead))
                    .SelectMany(btr => btr.GetBlockReferenceIds(true, false).Cast<ObjectId>())
                    .Select(id => (BlockReference)tr.GetObject(id, OpenMode.ForRead))
                    .GroupBy(bref => new DuplicatePattern(bref))
                    .SelectMany(brefs => brefs.Skip(1));

Query Expression example
Code: [Select]
                var brefsToDelete =
                    from ObjectId id in (BlockTable)tr.GetObject(db.BlockTableId, OpenMode.ForRead)
                    let btr = (BlockTableRecord)tr.GetObject(id, OpenMode.ForRead)
                    select btr.GetBlockReferenceIds(true, false) into brefids
                    from ObjectId id in brefids
                    let bref = (BlockReference)tr.GetObject(id, OpenMode.ForRead)
                    group bref by new DuplicatePattern(bref) into brefDict
                    from brefs in brefDict.Skip(1)
                    select brefs;

The execution timings do not differ significantly, but the size of the dll does (Instance method < mixed < Query expression).

Have a nice weekend, Thorsten

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8698
  • AKA Daniel
Re: Find duplicates (and do it quickly)
« Reply #20 on: November 26, 2010, 08:58:49 AM »
Oh finally a moment to play..
changed the search pattern a bit to just

Code: [Select]
    struct DuplicatePattern
    {
      public ObjectId LayerId;
      public Point3d Posision;
      public double Rotation;
      public Scale3d Scale;
    }
...
C++ is a tad faster it seems



It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8698
  • AKA Daniel
Re: Find duplicates (and do it quickly)
« Reply #21 on: November 26, 2010, 09:02:20 AM »
code

Code: [Select]
//command helper
  static Acad::ErrorStatus getRefIds(AcDbObjectIdArray &blockRefIds)
  {
    AcDbObjectId blockTableRecordId;
    AcDbDatabase *pDb = acdbHostApplicationServices()->workingDatabase();
    AcDbBlockTablePointer pBlockTable(pDb->blockTableId(),AcDb::kForRead);
    AcDbBlockTableIterator *pIter;
    pBlockTable->newIterator(pIter);
    for(pIter->start();!pIter->done();pIter->step())
    {
      pIter->getRecordId(blockTableRecordId);
      AcDbBlockTableRecordPointer pBlockTableRecord(blockTableRecordId,AcDb::kForRead);
      pBlockTableRecord->getBlockReferenceIds(blockRefIds);
    }
    delete pIter;
    return eOk;
  }

  //command
  static void NukeDupRefs_nukeit(void)
  {
    LARGE_INTEGER freq,start,end;   
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);
    AcDbObjectIdArray blockRefIds;
    getRefIds(blockRefIds);
    std::vector<BlockData> vec;
    vec.reserve(blockRefIds.length());
    size_t cnt = 0;
    for(size_t idx = 0; idx < blockRefIds.length(); idx++)
    {
      AcDbBlockReference *pRef = NULL;
      if(acdbOpenAcDbEntity((AcDbEntity*&)pRef,blockRefIds[idx],AcDb::kForWrite) == Acad::eOk)
      {
        BlockData data(pRef);
        if(std::find(vec.begin(),vec.end(),data) == vec.end())
        {
          vec.push_back(data);
        }
        else
        {
          pRef->erase();
          cnt++;
        }
        pRef->close();
      }
    }
    QueryPerformanceCounter(&end);
    acutPrintf(_T("\nNuked %ld blocks in %f seconds: "),cnt,
      (double)(end.QuadPart-start.QuadPart)/freq.QuadPart );
  }

  class BlockData
  {
  private:
    AcDbObjectId m_layerId;
    AcGePoint3d m_position;
    double m_rotation;
    AcGeScale3d m_scale;
  public:
    BlockData(const AcDbBlockReference *pReference);
    bool operator==(const BlockData& rhs) const;
  };

  inline BlockData::BlockData(const AcDbBlockReference *pReference )
  {
    if(pReference)
    {
      m_layerId  = pReference->layerId();
      m_position = pReference->position();
      m_rotation = pReference->rotation();
      m_scale    = pReference->scaleFactors();
    }
  }

  inline bool BlockData::operator==( const BlockData& rhs )const
  {
    return m_layerId  == rhs.m_layerId  &&
      m_position.isEqualTo(rhs.m_position) &&
      m_rotation == rhs.m_rotation &&
      m_scale.isEqualTo(rhs.m_scale);
  }
« Last Edit: November 26, 2010, 09:20:43 AM by __assume »

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Find duplicates (and do it quickly)
« Reply #22 on: November 26, 2010, 09:12:45 AM »
Quote
C++ is a tad faster it seems

Certainly, but looks like chinese to my eyes... :oops:
Speaking English as a French Frog

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8698
  • AKA Daniel
Re: Find duplicates (and do it quickly)
« Reply #23 on: November 26, 2010, 09:21:25 AM »
Quote
C++ is a tad faster it seems

Certainly, but looks like chinese to my eyes... :oops:

F# looks like Martian to me  :laugh:

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8698
  • AKA Daniel
Re: Find duplicates (and do it quickly)
« Reply #24 on: November 26, 2010, 09:57:29 AM »
Gile,  have a look at this pattern

Code: [Select]
public class Commands
  {
    struct DuplicatePattern
    {
      public string Name;
      public string Layer;
      public ObjectId OwnerId;
      public double PositionX;
      public double PositionY;
      public double PositionZ;
      public double Rotation;
      public double ScaleX;
      public double ScaleY;
      public double ScaleZ;
    }

    static Predicate<DuplicatePattern> Eq(DuplicatePattern b)
    {
      return delegate(DuplicatePattern a)
      {
        return
          a.Name.Equals(b.Name) &&
          a.Layer.Equals(b.Layer) &&
          a.OwnerId.Equals(b.OwnerId) &&
          a.PositionX.Equals(b.PositionX) &&
          a.PositionY.Equals(b.PositionY) &&
          a.PositionZ.Equals(b.PositionZ) &&
          a.Rotation.Equals(b.Rotation) &&
          a.ScaleX.Equals(b.ScaleX) &&
          a.ScaleY.Equals(b.ScaleY) &&
          a.ScaleZ.Equals(b.ScaleZ);
      };
    }

    private int DeleteDuplicatedBlockRef()
    {
      Document doc = Application.DocumentManager.MdiActiveDocument;
      Editor ed = doc.Editor;
      Database db = doc.Database;
      int result = 0;
      SelectionFilter filter = new SelectionFilter(new TypedValue[1] { new TypedValue(0, "INSERT") });
      PromptSelectionResult psr = ed.SelectAll(filter);
      if (psr.Status != PromptStatus.OK)
        return 0;
      using (Transaction tr = db.TransactionManager.StartTransaction())
      {
        BlockTable bt = (BlockTable)tr.GetObject(db.BlockTableId, OpenMode.ForRead);
        List<DuplicatePattern> Patterns = new List<DuplicatePattern>();
        foreach (ObjectId id in psr.Value.GetObjectIds())
        {
          BlockReference br = (BlockReference)tr.GetObject(id, OpenMode.ForWrite, false);
          DuplicatePattern dp = new DuplicatePattern();
          dp.OwnerId = br.OwnerId;
          dp.Name = br.Name;
          dp.Layer = br.Layer;
          dp.PositionX = Math.Round(br.Position.X, 6);
          dp.PositionY = Math.Round(br.Position.Y, 6);
          dp.PositionZ = Math.Round(br.Position.Z, 6);
          dp.Rotation = Math.Round(br.Rotation, 6);
          dp.ScaleX = Math.Round(br.ScaleFactors.X, 6);
          dp.ScaleY = Math.Round(br.ScaleFactors.Y, 6);
          dp.ScaleZ = Math.Round(br.ScaleFactors.Z, 6);
          if (Patterns.FindIndex(Eq(dp)) > -1)
          {
            br.Erase();
            result++;
          }
          else
          {
            Patterns.Add(dp);
          }
        }
        tr.Commit();
      }
      return result;
    }

    [CommandMethod("Cs_gile")]
    public void DelDupBlkRef()
    {
      DateTime t0 = DateTime.Now;
      int del = DeleteDuplicatedBlockRef();
      DateTime t1 = DateTime.Now;
      TimeSpan ts = t1 - t0;
      Application.DocumentManager.MdiActiveDocument.Editor.WriteMessage(
          "{0} duplicated block(s) have been erased in {1} milliseconds",
          del, ts.TotalMilliseconds);
    }
  }

« Last Edit: November 26, 2010, 11:45:34 AM by __assume »

Glenn R

  • Guest
Re: Find duplicates (and do it quickly)
« Reply #25 on: November 26, 2010, 09:57:48 AM »

kaefer

  • Guest
Re: Find duplicates (and do it quickly)
« Reply #26 on: November 26, 2010, 05:19:37 PM »
Code: [Select]
    ...
    static Predicate<DuplicatePattern> Eq(DuplicatePattern b)
    {
        return delegate(DuplicatePattern a)
        {
            return
              a.Name.Equals(b.Name) &&
              a.Layer.Equals(b.Layer) &&
              a.OwnerId.Equals(b.OwnerId) &&
              a.PositionX.Equals(b.PositionX) &&
              a.PositionY.Equals(b.PositionY) &&
              a.PositionZ.Equals(b.PositionZ) &&
              a.Rotation.Equals(b.Rotation) &&
              a.ScaleX.Equals(b.ScaleX) &&
              a.ScaleY.Equals(b.ScaleY) &&
              a.ScaleZ.Equals(b.ScaleZ);
        };
    }
      ...
          if (Patterns.FindIndex(Eq(dp)) > -1)
          { ...

Very minor optimization potential detected, if only for readability. May shave yet another few milliseconds off.

Code: [Select]
                ...
                if (Patterns.Exists(b =>
                    dp.Name.Equals(b.Name) &&
                    dp.Layer.Equals(b.Layer) &&
                    dp.OwnerId.Equals(b.OwnerId) &&
                    dp.PositionX.Equals(b.PositionX) &&
                    dp.PositionY.Equals(b.PositionY) &&
                    dp.PositionZ.Equals(b.PositionZ) &&
                    dp.Rotation.Equals(b.Rotation) &&
                    dp.ScaleFactorsX.Equals(b.ScaleFactorsX) &&
                    dp.ScaleFactorsY.Equals(b.ScaleFactorsY) &&
                    dp.ScaleFactorsZ.Equals(b.ScaleFactorsZ)))
                { ...

Or does the lambda look like Greek for anybody?

Glenn R

  • Guest
Re: Find duplicates (and do it quickly)
« Reply #27 on: November 26, 2010, 06:38:40 PM »
What lambda?

xsfhlzh

  • Guest
Re: Find duplicates (and do it quickly)
« Reply #28 on: November 26, 2010, 09:20:56 PM »


Code: [Select]
            using (Transaction tr = db.TransactionManager.StartTransaction())
            {

                Func<BlockTableRecord, IEnumerable<BlockReference>> depfunc =
                    btr =>
                        btr.GetBlockReferenceIds(true, false).Cast<ObjectId>()
                        .Select(id => (BlockReference)tr.GetObject(id, OpenMode.ForRead))
                        .GroupBy(bref => new { bref.Layer, bref.Position, bref.Rotation, bref.ScaleFactors })
                        .SelectMany(brefgroup => brefgroup.Skip(1));

                var brefs =
                    ((BlockTable)tr.GetObject(db.BlockTableId, OpenMode.ForRead)).Cast<ObjectId>()
                    .Select(id => (BlockTableRecord)tr.GetObject(id, OpenMode.ForRead))
                    .SelectMany(depfunc);

                foreach (var bref in brefs)
                {
                    bref.UpgradeOpen();
                    bref.Erase();
                    count++;
                }

                tr.Commit();
            }

Kerry

  • Mesozoic relic
  • Seagull
  • Posts: 11654
  • class keyThumper<T>:ILazy<T>
Re: Find duplicates (and do it quickly)
« Reply #29 on: November 26, 2010, 09:27:51 PM »
F# looks like Martian to me  :laugh:

++

let MyWeirdness = TheirWeirdness + 1

though, I have been trying :)
and at least I'm not being declared Dim on every second line  :|
kdub, kdub_nz in other timelines.
Perfection is not optional.
Everything will work just as you expect it to, unless your expectations are incorrect.
Discipline: None at all.