Author Topic: optimization for a grid/coords duplicate combiner  (Read 4231 times)

0 Members and 1 Guest are viewing this topic.

CAB

  • Global Moderator
  • Seagull
  • Posts: 10401
Re: optimization for a grid/coords duplicate combiner
« Reply #15 on: December 13, 2012, 11:07:53 PM »
I believe that if the records do not match exactly then this code will work:
Code: [Select]
  (defun unique (lst / result itm doup)
    (while (setq itm (car lst))
     (setq lst (cdr lst))
     (while (setq doup (assoc (car itm) lst))
        (setq lst (vl-remove doup lst)
              itm (reverse (cons (strcat (last itm) "\n" (last doup)) (cdr (reverse itm))))
        )
      )
      (setq result (if result (cons itm result) (list itm)))
    )
    result
  )
Code: [Select]
(setq a '(((2 3) "other" "data" "2-3-0") ; doup
  ((1 0) "other" "data" "1-0")
  ((2 0) "other" "data" "2-0")
  ((1 1) "other" "data" "1-1") ; doup
  ((1 3) "other" "data" "1-3")
  ((2 2) "other" "data" "2-2")
  ((2 3) "other" "data" "2-3-1") ; <<
  ((1 2) "other" "data" "1-2")
  ((0 0) "other" "data" "0-0")
  ((0 1) "other" "data" "0-1")
  ((1 1) "other" "data" "1-1") ; <<  if there are more "1-1"s then they will be lost unless the "other" OR "data" are different
  ((0 2) "other" "data" "0-2")
  ((2 1) "other" "data" "2-1")
  ((2 3) "other" "data" "2-3-2") ; <---<<<
  ((2 3) "other" "data" "2-3-3") ; <---<<<<
  ))

(defun c:test()
  (mapcar 'print (unique a))
  (princ)
  )

Note that the results save the "other" & "data" from the first record in the series.
I've reached the age where the happy hour is a nap. (°¿°)
Windows 10 core i7 4790k 4Ghz 32GB GTX 970
Please support this web site.

Lee Mac

  • Seagull
  • Posts: 12915
  • London, England
Re: optimization for a grid/coords duplicate combiner
« Reply #16 on: December 14, 2012, 07:37:15 AM »
It's better than inerb, but only slightly (0.09 better).  I can only guess that it's the 62 hits to the IF in the lambda, using CAB's last data sample.  Using live data, the number of if hits is much greater.

It just throws me that this is still working best, even with the extra test for middle data.

:? In my tests:

Code - Auto/Visual Lisp: [Select]
  1. _$ (setq a
  2.    '(
  3.         ((2 3) "2-3") ; +
  4.         ((1 0) "1-0")
  5.         ((2 0) "2-0")
  6.         ((1 1) "1-1") ; *
  7.         ((1 3) "1-3")
  8.         ((2 2) "2-2")
  9.         ((2 3) "2-3") ; ++
  10.         ((1 2) "1-2")
  11.         ((0 0) "0-0")
  12.         ((0 1) "0-1")
  13.         ((1 1) "1-1") ; **
  14.         ((0 2) "0-2")
  15.         ((2 1) "2-1")
  16.         ((2 3) "2-3") ; +++
  17.     )
  18. )

Testing function validity:
Code - Auto/Visual Lisp: [Select]
  1. _$ (equal (LM:Group a) (reverse (CD:merge-grids a)))
  2. T
  3. _$ (equal (LM:Group a) (reverse (CAB:unique a)))
  4. T
  5. _$ (equal (LM:Group a) (IB:merge-grids a))
  6. T
(Note that CD:merge-grids & CAB:unique both return reversed data).

Benchmark with small data set:
Code - Auto/Visual Lisp: [Select]
  1. _$ (length a)
  2. 14
  3. _$ (benchmark '((LM:Group a) (CD:merge-grids a) (CAB:unique a) (IB:merge-grids a)))
  4. Benchmarking .................Elapsed milliseconds / relative speed for 16384 iteration(s):
  5.  
  6.     (IB:MERGE-GRIDS A).....1201 / 3.00 <fastest>
  7.     (CAB:UNIQUE A).........1216 / 2.96
  8.     (LM:GROUP A)...........2387 / 1.51
  9.     (CD:MERGE-GRIDS A).....3604 / 1.00 <slowest>

Benchmark with large data set with many duplicates:
Code - Auto/Visual Lisp: [Select]
  1. _$ (repeat 5 (setq a (append a a)))
  2. _$ (length a)
  3. 448
  4.  
  5. _$ (benchmark '((LM:Group a) (CD:merge-grids a) (CAB:unique a) (IB:merge-grids a)))
  6. Benchmarking ............Elapsed milliseconds / relative speed for 512 iteration(s):
  7.  
  8.     (LM:GROUP A)...........1669 / 1.76 <fastest>
  9.     (IB:MERGE-GRIDS A).....1825 / 1.61
  10.     (CAB:UNIQUE A).........1934 / 1.52
  11.     (CD:MERGE-GRIDS A).....2933 / 1.00 <slowest>


CADDOG

  • Newt
  • Posts: 82
  • wishbonesr
Re: optimization for a grid/coords duplicate combiner
« Reply #17 on: December 14, 2012, 01:01:18 PM »
Yeah...  Once again.  You're right.  Rebooted, and now my results are completely different (matches yours).  Frustrating.

So a few things needed to be changed.  CAB's last structure rebuild captures the inner goodies of a potential data set expansion.

Code: [Select]
(setq itm (reverse (cons (strcat (last itm) "\n" (last doup)) (cdr (reverse itm))))
This can be replicated on each of the others (except for mine :) ).  And Inerb's and Lee's catch the, really rare duplicate text scenario.
See each modification (final reverse calls have been removed - not needed).

Code: [Select]
(defun IN:merge-grids (input / result found)
  (foreach item  input
    (setq result (if (setq found (assoc (car item) result))
   ;(subst (list (car item) (strcat (cadr found) "\n" (cadr item))) found result)
   (subst (reverse (cons (strcat (last item) "\n" (last found)) (cdr (reverse item)))) found result)
   (cons item result)
   );if
  );setq
    );foreach
   result
  )

  (defun LM:group ( l / a r x )
    (while (setq x (car l) a (car x))
      (setq l (vl-remove-if (function (lambda ( b )
(if (equal a (car b))
  ;(setq x (list a (strcat (cadr x) "\n" (cadr b))))
  (setq x (reverse (cons (strcat (last x) "\n" (last b)) (cdr (reverse x))))))))
(cdr l))
    r (cons x r));setq
      );while
    r
    );defun LM:group

  (defun CAB:unique:v3 (lst / result itm doup)
    (while (setq itm (car lst))
     (setq lst (cdr lst))
     (while (setq doup (assoc (car itm) lst))
        (setq lst (vl-remove doup lst)
              itm (reverse (cons (strcat (last itm) "\n" (last doup)) (cdr (reverse itm))))
        )
      )
      (setq result (if result (cons itm result) (list itm)))
    )
    result
  )

Data set is CAB's last post, and data building resembles Lee's repeat.
Was going to post my results, but wanted to see Lee's results first (cuz I tire of being wrong) hahaha.
But really, this isn't the first time my results waned compared to others.  Something I've been trying to figure out.

 < edit >
Added attachment - maybe someone can spot why my results go one way then another.  Benchmark is the benchmark program found on this site.
« Last Edit: December 14, 2012, 01:07:13 PM by CADDOG »

CAB

  • Global Moderator
  • Seagull
  • Posts: 10401
Re: optimization for a grid/coords duplicate combiner
« Reply #18 on: December 14, 2012, 01:34:13 PM »
Quote
And Inerb's and Lee's catch the, really rare duplicate text scenario.
Note that if the data & other are not exactly the same then duplicate text in the last position will not be lost.
Example, these four are combined and none are lost:
((2 3) "other" "data" "2-3-0")
((2 3) "other" "data2" "2-3-0")
((2 3) "other" "data3" "2-3-0")
((2 3) "other" "data4" "2-3-0")
yield
((2 3) "other" "data" "2-3-0\n2-3-0\n2-3-0\n2-3-0")

But data2 3 & 4 are lost
I've reached the age where the happy hour is a nap. (°¿°)
Windows 10 core i7 4790k 4Ghz 32GB GTX 970
Please support this web site.

CADDOG

  • Newt
  • Posts: 82
  • wishbonesr
Re: optimization for a grid/coords duplicate combiner
« Reply #19 on: December 14, 2012, 01:50:45 PM »
Note that if the data & other are not exactly the same then duplicate text in the last position will not be lost.

Gotcha.  The potential is there, but more likely because a user messed up, than done on purpose.  Since there's no way for the program to know, it can't be lost, forcing the blame on the user and not the program eliminating data.

I'm starting to see why my tests vary.  When going for those large, probably never to be seen datasets, cd:merge-grids started climbing the ranks.

OK..  I'll show it.
Code: [Select]
Testing with a list length of 480
Elapsed milliseconds / relative speed for 16384 iteration(s):
    (CAB:UNIQUE:V3 A)........18906 / 7.76 <fastest>
    (LM:GROUP A).............75734 / 1.94
    (IN:MERGE-GRIDS A).......99844 / 1.47
    (CD:MERGE-GRIDS2 A).....146704 / 1.00 <slowest>
Testing with a list length of 1920
Elapsed milliseconds / relative speed for 2048 iteration(s):
    (CAB:UNIQUE:V3 A).......20265 / 4.29 <fastest>
    (LM:GROUP A)............62203 / 1.40
    (IN:MERGE-GRIDS A)......81797 / 1.06
    (CD:MERGE-GRIDS2 A).....87016 / 1.00 <slowest>
Testing with an unreasonable amount of data
 List length of 7680
Elapsed milliseconds / relative speed for 512 iteration(s):
    (CAB:UNIQUE:V3 A).......15203 / 5.43 <fastest>
    (CD:MERGE-GRIDS2 A).....57875 / 1.43
    (LM:GROUP A)............60313 / 1.37
    (IN:MERGE-GRIDS A)......82578 / 1.00 <slowest>

Lee Mac

  • Seagull
  • Posts: 12915
  • London, England
Re: optimization for a grid/coords duplicate combiner
« Reply #20 on: December 14, 2012, 02:07:00 PM »
But surely we are comparing apples with oranges here, since CAB's code will remove multiple duplicates, e.g. consider:

Code - Auto/Visual Lisp: [Select]
  1. _$ (CAB:unique:v3 '(((1 0) "a" "1-0") ((1 0) "a" "1-0") ((1 0) "a" "1-0") ((2 0) "b" "2-0")))
  2. (
  3.     ((2 0) "b" "2-0")
  4.     ((1 0) "a" "1-0\n1-0")
  5. )
  6.  
  7. _$ (LM:group '(((1 0) "a" "1-0") ((1 0) "a" "1-0") ((1 0) "a" "1-0") ((2 0) "b" "2-0")))
  8. (
  9.     ((2 0) "b" "2-0")
  10.     ((1 0) "a" "1-0\n1-0\n1-0")
  11. )

Without the overhead required to allow for such cases, the function will inevitably be much faster.

CADDOG

  • Newt
  • Posts: 82
  • wishbonesr
Re: optimization for a grid/coords duplicate combiner
« Reply #21 on: December 14, 2012, 04:48:30 PM »
But surely we are comparing apples with oranges here, since CAB's code will remove multiple duplicates, e.g. consider:

Yes. Didn't mean to imply CAB's as my final use solution.  CAB's is not currently a viable solution, because pure duplicates will be lost completely, and I want the text, just not the "extra" data on the duplicate coords.
Was keeping it an option in case it can be tweaked to not loose pure duplicate's text, because that 7 fold is lucrative.
And because I subconsciously probably wanted to give some credit for his piece that fit nicely into the other's to bring them up to par.   :kewl:

As far as the "extra" data loss on duplicates - is a desirable affect.  The order of the supplied live list is already determined by another function, and the first one in a set will always hold the defining "extra" data for other duplicate coords.  I intend to use that field with formatting options eventually.

In the run-off, though Lee's looks like the one I'll be using / already using.  I don't forsee ever having data exceeding 1500, but even then the breaking point for mine to become more efficient is extremely large (5000+), never will use, lists.

In the grand scheme of things, we're talking a 12 milisecond spread for live data between the worst (mine) and the best (LM).  And it was worth the ride to get here.

« Last Edit: December 14, 2012, 04:51:38 PM by CADDOG »