Author Topic: optimization for a grid/coords duplicate combiner  (Read 4230 times)

0 Members and 1 Guest are viewing this topic.

CADDOG

  • Newt
  • Posts: 82
  • wishbonesr
optimization for a grid/coords duplicate combiner
« on: December 12, 2012, 09:14:53 PM »
That's a mouthfull.

My brain is just not working today.  Normally I write, vb style, and then work backwards, because that's how lisp is to me... backwards. hehe.
But today, I just can't see the "hook".  I run it through in my mind, and each chess move fails before I can even begin typing.
Sample data will average a couple hundred entries, no more than a few thousand.
Code: [Select]
(setq a '(((2 3) "2-3")
  ((1 0) "1-0")
  ((2 0) "2-0")
  ((1 3) "1-3")
  ((2 2) "2-2")
  ((2 3) "2-3")
  ((1 2) "1-2")
  ((0 0) "0-0")
  ((0 1) "0-1")
  ((1 1) "1-1")
  ((0 2) "0-2")
  ((2 1) "2-1")
  )

(merge-grids a)
;_1_$(((2 1) "2-1") ((0 2) "0-2") ((2 0) "2-0") ((1 0) "1-0\n1-0") ((2 3) "2-3\n2-3"))
The text/string value is irrelevant, and may include much more data in the future (eg. enames, tmatrix, vla ref's).  However the text value will always be last

The code's intent is to find duplicate coordinates and merge what text it finds with a new line.   Duplicate coords will be rare (1 in 50~100).

Code: [Select]
(defun merge-grids (indexed_lst
    /
    return item duplicates
    )
  (while indexed_lst
   (setq item (car indexed_lst)
indexed_lst (cdr indexed_lst))
    (if (setq duplicates (vl-remove-if-not (function (lambda (x) (equal (car x) (car item)))) indexed_lst))
      (progn
(setq indexed_lst (vl-remove-if (function (lambda (x) (equal (car x) (car item)))) indexed_lst))
(setq item (list (car item) (vl:list->string "\n" (cons (last item) (mapcar 'last duplicates)))))
);progn
      );if dudplicates
    (setq return (cons item return))
    );while list
  );defun merge-grids



  (defun vl:list->string (delim ls / out)
    (setq out (apply 'strcat (mapcar (function (lambda (x) (strcat x delim))) ls)))
    (if out (vl-string-right-trim delim out))
    );defun list->string

CAB

  • Global Moderator
  • Seagull
  • Posts: 10401
Re: optimization for a grid/coords duplicate combiner
« Reply #1 on: December 12, 2012, 09:55:03 PM »
Maybe this? not much testing.
Code: [Select]
  (defun unique (lst / result itm doup)
    (while (setq itm (car lst))
     (setq lst (cdr lst))
     (while (setq doup (assoc (car itm) lst))
        (setq lst (vl-remove doup lst)
              itm (list (car itm) (strcat (cadr itm) "\n" (cadr doup)))
        )
      )
      (setq result (if result (cons itm result) (list itm)))
    )
    result
  )

(setq a '(((2 3) "2-3") ; doup
  ((1 0) "1-0")
  ((2 0) "2-0")
  ((1 1) "1-1") ; doup
  ((1 3) "1-3")
  ((2 2) "2-2")
  ((2 3) "2-3") ; <<
  ((1 2) "1-2")
  ((0 0) "0-0")
  ((0 1) "0-1")
  ((1 1) "1-1") ; <<
  ((0 2) "0-2")
  ((2 1) "2-1")
  ))

(mapcar 'print (unique a))
« Last Edit: December 13, 2012, 10:05:13 AM by CAB »
I've reached the age where the happy hour is a nap. (°¿°)
Windows 10 core i7 4790k 4Ghz 32GB GTX 970
Please support this web site.

CADDOG

  • Newt
  • Posts: 82
  • wishbonesr
Re: optimization for a grid/coords duplicate combiner
« Reply #2 on: December 12, 2012, 10:56:36 PM »
Super... 3x better. 
Larger data sets hit 12x better.

I get isolated into a recently learned style, and forget the basics.

Thanks CAB.

CAB

  • Global Moderator
  • Seagull
  • Posts: 10401
Re: optimization for a grid/coords duplicate combiner
« Reply #3 on: December 12, 2012, 11:35:45 PM »
Glad to help.  8-)
I've reached the age where the happy hour is a nap. (°¿°)
Windows 10 core i7 4790k 4Ghz 32GB GTX 970
Please support this web site.

irneb

  • Water Moccasin
  • Posts: 1794
  • ACad R9-2016, Revit Arch 6-2016
Re: optimization for a grid/coords duplicate combiner
« Reply #4 on: December 13, 2012, 01:24:16 AM »
CAB: Just a slight thing I'm a bit worried about:
Quote
(defun unique (lst / result itm lst ...

Anyhow, here's my take on this:
Code - Auto/Visual Lisp: [Select]
  1. (defun merge-grids  (input / result found)
  2.   (foreach item  input
  3.     (setq result (if (setq found (assoc (car item) result))
  4.                (subst (list (car item) (strcat (cadr found) "\n" (cadr item))) found result)
  5.                (cons item result))))
  6.   (reverse result))
Common sense - the curse in disguise. Because if you have it, you have to live with those that don't.

Lee Mac

  • Seagull
  • Posts: 12915
  • London, England
Re: optimization for a grid/coords duplicate combiner
« Reply #5 on: December 13, 2012, 08:35:33 AM »
Minor optimisation of Irneb's code:

Code - Auto/Visual Lisp: [Select]
  1. (defun merge-grids2 ( lst / ass rtn )
  2.     (foreach itm lst
  3.         (if (setq ass (assoc (car itm) rtn))
  4.             (if (wcmatch (cadr ass) "~*\n*")
  5.                 (setq rtn (subst (list (car itm) (strcat (cadr ass) "\n" (cadr itm))) ass rtn))
  6.             )
  7.             (setq rtn (cons itm rtn))
  8.         )
  9.     )
  10.     (reverse rtn)
  11. )

CAB

  • Global Moderator
  • Seagull
  • Posts: 10401
Re: optimization for a grid/coords duplicate combiner
« Reply #6 on: December 13, 2012, 10:07:01 AM »
Thanks inerb I fixed the error in my code. I was in too big of a hurry.
I've reached the age where the happy hour is a nap. (°¿°)
Windows 10 core i7 4790k 4Ghz 32GB GTX 970
Please support this web site.

CADDOG

  • Newt
  • Posts: 82
  • wishbonesr
Re: optimization for a grid/coords duplicate combiner
« Reply #7 on: December 13, 2012, 01:47:04 PM »
CAB's still holds the cup for my data set (500~1500 entries w/ 5% duplicate coords) by a significant amount, although more verbose.
I removed the benchmarks restriction on data set depth.  Did this two different ways, tested seperately.
    One was to simply remove the list length restriction. 
    Second(separate method) was to supply the testing function as a (mapcar '<mergefunction> <nesteddatalist>)
           Nested data was 1 list of 60 duplicate lists. Each nested list was a lot of 25 with two of the items being duplicates.


Lee's tweak improved inerb's decently, however introduces a compatability issue...
My output is to two optional destinations; Excel and tab delimited.  When output to plain text, the delimiter is not a tab, rather a space, so testing for a space would present problems with text items that contains spaces.

Side note: Compiling actually degraded the performance over all.  Didn't expect that one.
      When running a compiled test, should the benchmark program be compiled as well?

Lee Mac

  • Seagull
  • Posts: 12915
  • London, England
Re: optimization for a grid/coords duplicate combiner
« Reply #8 on: December 13, 2012, 02:15:38 PM »
CAB's method could also be optimised slightly:

Code - Auto/Visual Lisp: [Select]
  1. (defun unique ( lst / dup itm rtn )
  2.     (while (setq itm (car lst))
  3.         (setq lst (cdr lst))
  4.         (if (setq dup (assoc (car itm) lst))
  5.             (setq lst (vl-remove dup lst)
  6.                   itm (list (car itm) (strcat (cadr itm) "\n" (cadr dup)))
  7.             )
  8.         )
  9.         (setq rtn (cons itm rtn))
  10.     )
  11.     rtn
  12. )

(Note that results will be in reverse)

CAB

  • Global Moderator
  • Seagull
  • Posts: 10401
Re: optimization for a grid/coords duplicate combiner
« Reply #9 on: December 13, 2012, 03:05:17 PM »
Noticed a problem if there were more than two matches in the list the vl-remove would remove ALL matching items.
My fix:
Code: [Select]
  (defun unique (lst / result itm doup)
    (while (setq itm (car lst))
     (setq lst (cdr lst))
     (cond
       ((null result)(setq result (list itm)))
       ((setq found (assoc (car itm) result))
        (setq result (subst (list (car itm) (strcat (cadr found) "\n" (cadr itm))) found result)))
       ((setq result (cons itm result)))
     )
    )
    result
  )

Code: [Select]
(setq a '(((2 3) "2-3") ; doup
  ((1 0) "1-0")
  ((2 0) "2-0")
  ((1 1) "1-1") ; doup
  ((1 3) "1-3")
  ((2 2) "2-2")
  ((2 3) "2-3") ; <<
  ((1 2) "1-2")
  ((0 0) "0-0")
  ((0 1) "0-1")
  ((1 1) "1-1") ; <<
  ((0 2) "0-2")
  ((2 1) "2-1")
  ((2 3) "2-3") ; <---<<<
  ))

(defun c:test()
  (mapcar 'print (unique a))
  (princ)
  )
I've reached the age where the happy hour is a nap. (°¿°)
Windows 10 core i7 4790k 4Ghz 32GB GTX 970
Please support this web site.

CADDOG

  • Newt
  • Posts: 82
  • wishbonesr
Re: optimization for a grid/coords duplicate combiner
« Reply #10 on: December 13, 2012, 04:37:18 PM »
Noticed a problem if there were more than two matches in the list the vl-remove would remove ALL matching items.

Good catch -> as everybody's does as well (except for original post).
Downside is that now CAB's is as efficient as original  :|

< edit >
OOPS.  Correction.  inerb's works with 2+ duplicate coords.
« Last Edit: December 13, 2012, 06:32:40 PM by CADDOG »

Lee Mac

  • Seagull
  • Posts: 12915
  • London, England
Re: optimization for a grid/coords duplicate combiner
« Reply #11 on: December 13, 2012, 06:05:00 PM »
Noticed a problem if there were more than two matches in the list the vl-remove would remove ALL matching items.

Good catch -> as everybody's does as well (except for original post).

As far as I can see, Irneb's code should handle multiple duplicate items correctly.

Evidently I hadn't completely understood your intentions for this function, as I purposely modified Irneb's function to skip additional duplicates to improve efficiency...

CADDOG

  • Newt
  • Posts: 82
  • wishbonesr
Re: optimization for a grid/coords duplicate combiner
« Reply #12 on: December 13, 2012, 06:30:33 PM »
As far as I can see, Irneb's code should handle multiple duplicate items correctly.

Your right!  I'm now up to 15 variations, and I mis-labelled his (most of mine don't work and are plays on what you guys submitted - is why i didn't post them as well).

So those that work...
CAB:unique:v2
CD:merge-grids
IN:merge-grids

I didn't expect this...
Code: [Select]
Elapsed milliseconds / relative speed for 128 iteration(s):
    (CD:MERGE-GRIDS A).....17940 / 1.57 <fastest>
    (IN:MERGE-GRIDS A).....27347 / 1.03
    (CAB:UNIQUE:V2 A)......28158 / 1.00 <slowest>

I was really looking forward to the 12x increase by cab's original

Nesting the vl:list->string function in the cd:merge-grids function is what made the, almost insignificant, difference.
« Last Edit: December 13, 2012, 06:37:08 PM by CADDOG »

Lee Mac

  • Seagull
  • Posts: 12915
  • London, England
Re: optimization for a grid/coords duplicate combiner
« Reply #13 on: December 13, 2012, 06:57:53 PM »
Another variation:

Code - Auto/Visual Lisp: [Select]
  1. (defun LM:group ( l / a r x )
  2.     (while (setq x (car l) a (car x))
  3.         (setq l
  4.             (vl-remove-if
  5.                 (function
  6.                     (lambda ( b )
  7.                         (if (equal a (car b))
  8.                             (setq x (list a (strcat (cadr x) "\n" (cadr b))))
  9.                         )
  10.                     )
  11.                 )
  12.                 (cdr l)
  13.             )
  14.             r (cons x r)
  15.         )
  16.     )
  17.     (reverse r)
  18. )

CADDOG

  • Newt
  • Posts: 82
  • wishbonesr
Re: optimization for a grid/coords duplicate combiner
« Reply #14 on: December 13, 2012, 08:37:39 PM »
Another variation: LM:group

It looked very promising, and much simpler at first, then I was scratching my head for a moment tracking the assignments - very smart.

It's better than inerb, but only slightly (0.09 better).  I can only guess that it's the 62 hits to the IF in the lambda, using CAB's last data sample.  Using live data, the number of if hits is much greater.

And to throw a monkey wrench in the whole thing, not even I honored my original requirement that other data may be present and text will always be last.  Should have supplied a better data set.  The item used for the "primary" data will not matter, as it will most likely be format information for a grid, so the extra "other data" will be lost from all but one duplicate.

It just throws me that this is still working best, even with the extra test for middle data.

Code: [Select]
  (defun CD:merge-grids2 (indexed_lst / return item duplicates middledata vl:list->string)
    (defun vl:list->string (delim ls / out)
      (setq out (apply 'strcat (mapcar (function (lambda (x) (strcat x delim))) ls)))
      (if out (vl-string-right-trim delim out))
      );defun list->string
   
    (while indexed_lst
      (setq item (car indexed_lst)
    indexed_lst (cdr indexed_lst))
      (if (setq duplicates (vl-remove-if-not (function (lambda (x) (equal (car x) (car item)))) indexed_lst))
(progn
  (setq indexed_lst (vl-remove-if (function (lambda (x) (equal (car x) (car item)))) indexed_lst))
  (if (setq middledata (reverse (cdr (reverse (cdr item)))))
    (setq item (append (list (car item)) middledata (list (vl:list->string "\n" (cons (last item) (mapcar 'last duplicates))))))
    (setq item (list (car item) (vl:list->string "\n" (cons (last item) (mapcar 'last duplicates)))))
    );if
  );progn
);if dudplicates
      (setq return (cons item return))
      );while list
    );defun merge-grids

Future data expansion:
Code: [Select]
(setq a '(((2 3) "other" "data" "2-3-0") ; doup
  ((1 0) "other" "data" "1-0")
  ((2 0) "other" "data" "2-0")
  ((1 1) "other" "data" "1-1") ; doup
  ((1 3) "other" "data" "1-3")
  ((2 2) "other" "data" "2-2")
  ((2 3) "other" "data" "2-3-1") ; <<
  ((1 2) "other" "data" "1-2")
  ((0 0) "other" "data" "0-0")
  ((0 1) "other" "data" "0-1")
  ((1 1) "other" "data" "1-1") ; <<
  ((0 2) "other" "data" "0-2")
  ((2 1) "other" "data" "2-1")
  ((2 3) "other" "data" "2-3-2") ; <---<<<
  ((2 3) "other" "data" "2-3-3") ; <---<<<<
  ))

Thanks for the help guys.