TheSwamp

Code Red => AutoLISP (Vanilla / Visual) => Topic started by: Jeremy on November 27, 2011, 04:44:43 PM

Title: Breaking up code - how?
Post by: Jeremy on November 27, 2011, 04:44:43 PM
I suppose I should have posted this as a challenge but it is a problem that I am trying to solve but have only been partially successful at. I would like to be able to read a line of AutoLISP code and break it up into its syntactic units. I would like to feed a string of code and get back a list of two elements, the first element is a list of the actual code broken up into units and the second element is the code commenting (if any). In the list of syntactic units I would like each string in its own list so that I can identify them separate from the other pieces. Some examples, would probably be clearer:

"(+ 2 3)" -> (list (list "(" "+" "2" "3" ")") nil)

"(abs 4);absolute value" -> (list (list "(" "abs" "4" ")") ";absolute value")

"(strcase (strcat "cat" "dog"))"
   -> (list (list "(" "strcase" "(" "strcat" ("cat")("dog") "))") nil)

Note that parentheses are to be pulled away from the functions and elements that they enclose. I'm finding all this tricky because it is not immediately clear to me how one can properly recognize both quotes and semicolons in their proper context. Separating comments seems simple until you start wondering how best to go about it. Simply finding the first semicolon is not enough because you must consider whether it is between quotes or not. I'm hoping some of you gurus might have already tackled this problem and might want to share your wisdom on it.
Title: Re: Breaking up code - how?
Post by: Lee Mac on November 27, 2011, 05:31:49 PM
Following the logic of the other examples, should this:

Quote
Code: [Select]
"(strcase (strcat "cat" "dog"))"
   -> (list (list "(" "strcase" "(" "strcat" ("cat")("dog") "))") nil)

not be:

Code: [Select]
"(strcase (strcat "cat" "dog"))"
-> (list (list "(" "strcase" (list  "(" "strcat" "cat" "dog" ")") ")") nil)

 :?

[ FYI (http://www.theswamp.org/index.php?topic=4429.0) ]

Title: Re: Breaking up code - how?
Post by: MP on November 27, 2011, 06:04:25 PM
Maybe the algorithm used in the prinh (hierarchical print) function defined here (http://www.theswamp.org/index.php?topic=10398.msg132332#msg132332) can give you some ideas:

Example:
Code: [Select]
(prinh '(strcase (strcat "cat" "dog")))
Result:
Code: [Select]
(
    STRCASE
    (
        STRCAT
        "cat"
        "dog"
    )
)

*shrug*
Title: Re: Breaking up code - how?
Post by: ElpanovEvgeniy on November 28, 2011, 02:50:39 AM
My old program, it is a full analysis of code into components.
Perhaps this code can help you ...

program code syntax highlighting (http://elpanov.com/index.php?id=22)
(http://elpanov.com/assets/images/lisp-thm.gif)

Code: [Select]
;; web-04.lsp
(defun ea_lsp-html (f / a fl i l rf s str w wf)
                   ;|
*****************************************************************************************
*   
*   by ElpanovEvgeniy
*   
*   Idea of program Chen Qing Jun
*   last version 2007-06-10
*   yuanqiu@gmail.com
*   
*   My version  20.11.2008
*   
*   Examples:
*   (ea_lsp-html (getfiled "Select a Lisp File" "D:\\Work\\" "lsp" 8))
*   (ea_lsp-html f)
*   
*****************************************************************************************
|;
 (setq rf (open f "r")
       wf (open (strcat (vl-filename-directory f) "/" (vl-filename-base f) ".html")
                "w"
          ) ;_  open
 ) ;_  setq
 (write-line
  (STRCAT
;;;   <s0> - (
;;;   <s1> - subr
;;;   <s2> - "\""
;;;   <s3> - 'int
;;;   <s4> - 'real
;;;   <s5> - 'usubr
;;;   <s6> - ;  ;||;   ;;
;;;   <s7> - ;  "'"

;;;   Добвляем заголовок и стили, для локального просмотра
   "<html><head><style>
pre {
    font-family: Courier, Verdana, Arial, sans-serif;
    color: black;
font-size: 12pt;
font-weight: 600;
    overflow: auto;
border-left: 3px solid #d0d0f0;
margin-left: 5px;
padding-left: 10px;
/*height: 800px;  Высота блока */
}

.s0 {color: red;}
.s1 {
    font-family: Courier, Verdana, Arial, sans-serif;
    color: blue;
    font-size: 12pt;
font-weight: 600;
}
.s2 {color: #FF00FF;}
.s3 {color: #00CC00;}
.s4 {color: #009900;}
.s5 {color: #0000ff;}
.s6 {color: purple; background:#C0C0C0;}
.s7 {color: #800000;}
</style><title>"
   (VL-FILENAME-BASE f)
   (VL-FILENAME-EXTENSION f)
   "</title></head>\n<body><h3>"
   (VL-FILENAME-BASE f)
   (VL-FILENAME-EXTENSION f)
   "</h3><pre>"
  ) ;_ strcat
  wf
 ) ;_  write-line
 (setq l
       (list
        (strcase (strcat ";;\t" (VL-FILENAME-BASE f) (VL-FILENAME-EXTENSION f)) t)
       ) ;_  list
 ) ;_  setq
 (while (car (setq l (cons (read-line rf) l))))
 (setq l   (reverse (cdr l))
       s   (car l)
       i   0
       l   (cdr l)
       str ""
 ) ;_  setq
 (while s
  (progn
   (cond
    ((= (setq a (ascii s)) 0)
     ;; пустая строка
     (write-line (strcat str "\n") wf)
     (setq s   (car l)
           l   (cdr l)
           str ""
     ) ;_  setq
    )
    ((or (= a 32) (= a 9))
     ;; пробелы и табуляция
     (setq str (strcat str (substr s 1 1))
           s   (substr s 2)
           i   0
     ) ;_  setq
    )
    ((or (= a 39) (= a 46))
     ;; апостроф
     (setq str (strcat str "<span class=s7>" (substr s 1 1))
           s   (substr s 2)
     ) ;_  setq
     (while (= (ascii s) 39)
      (setq str (strcat str (substr s 1 1))
            s   (substr s 2)
      ) ;_  setq
     ) ;_  while
     (setq str (strcat str "</span>"))
    )
    ((< 39 a 42)
     ;; скобки
     (setq str (strcat str "<span class=s0>" (substr s 1 1))
           s   (substr s 2)
     ) ;_  setq
     (while (< 39 (ascii s) 42)
      (setq str (strcat str (substr s 1 1))
            s   (substr s 2)
      ) ;_  setq
     ) ;_  while
     (setq str (strcat str "</span>"))
    )
    ((= a 59)
     ;; многострочные коментарии
     (if (= (substr s 2 1) "|")
      ;;(ищем конец коментариев..
      (progn (setq str (strcat str "<span class=s6>"))
             (while (and s (not (setq i (vl-string-search "|;" s))))
              (write-line (strcat str s) wf)
              (setq s   (car l)
                    l   (cdr l)
                    str ""
              ) ;_  setq
             ) ;_  while
             (setq str (strcat (substr s 1 (+ 2 i)) "</span>"))
             (setq s (substr s (+ 3 i)))
      ) ;_  progn
      (progn
       (write-line (strcat str "<span class=s6>" s "</span>") wf)
       (setq s   (car l)
             l   (cdr l)
             str ""
       ) ;_  setq
      ) ;_  progn
     ) ;_  if
    )
    ((= a 34)
     ;; текстовые строки
     (progn (setq str (strcat str "<span class=s2>")
                  i   0
                  fl  t
            ) ;_  setq
            (while (and s fl)
             (while (and s
                         (setq i (vl-string-search "\"" s (setq i (1+ i))))
                         (not (or (zerop i)
                                  (not (= (substr s i 1) "\\"))
                                  (= (substr s (1- i) 2) "\\\\")
                              ) ;_  or
                         ) ;_  not
                    ) ;_  and
             ) ;_  while
             (if (null i)
              (setq str (strcat str s "\n")
                    s   (car l)
                    l   (cdr l)
                    i   -1
              ) ;_  setq
              (setq str (strcat str (substr s 1 (1+ i)) "</span>")
                    s   (substr s (+ 2 i))
                    fl  nil
                    i   0
              ) ;_  if
             ) ;_  if
            ) ;_  while
     ) ;_  progn
    )
    ((progn (setq i (apply (function min)
                           (vl-remove-if (function null)
                                         (list (vl-string-search "(" s)
                                               (vl-string-search ")" s)
                                               (vl-string-search " " s)
                                               (vl-string-search "\t" s)
                                               (vl-string-search "'" s)
                                               (vl-string-search ";" s)
                                               (vl-string-search "\"" s)
                                               (strlen s)
                                         ) ;_  list
                           ) ;_  vl-remove-if
                    ) ;_  apply
                  w (substr s 1 i)
                  s (substr s (1+ i))
            ) ;_  setq
            (or (= (setq a (type (eval (read w)))) 'subr)
                (= (strcase w) "NIL")
                (= (strcase w) "T")
            ) ;_  or
     ) ;_  progn
     (setq str (strcat str "<span class=s1>" w "</span>"))
    )
    ((= a 'usubr) (setq str (strcat str w)))
    ((= a 'int)
     (setq str (strcat str "<span class=s3>" w "</span>"))
    )
    ((= a 'real)
     (setq str (strcat str "<span class=s4>" w "</span>"))
    )
    (t (setq str (strcat str w)))
   ) ;_  cond
   s
  ) ;_  progn
  (if (and s (= (ascii s) 0))
   (progn (write-line str wf)
          (setq s   (car l)
                l   (cdr l)
                str ""
                i   0
          ) ;_  setq
   ) ;_  progn
  ) ;_  if
 ) ;_  while
 (write-line "</pre></body></html>" wf)
 (close rf)
 (close wf)
) ;_  defun
Title: Re: Breaking up code - how?
Post by: irneb on November 28, 2011, 12:02:16 PM
Here's an quick attempt from me:
Code: [Select]
(vl-load-com)

(defun string-parse+ (str chars / lst split n)
  (setq n   -1
        lst (vl-remove-if
              'null
              (mapcar '(lambda (c)
                         (setq n (1+ n))
                         (if (vl-position c chars)
                           n
                         )
                       )
                      (vl-string->list str)
              )
            )
  )
  (foreach n (reverse lst)
    (setq split (cons (substr str (1+ n) 1) (vl-remove "" (cons (substr str (+ n 2)) split)))
          str   (substr str 1 n)
    )
  )
  (if (/= (car lst) 0)
    (cons (substr str 1 (car lst)) split)
    split
  )
)

(defun split-code (code / portions string lst)
  (setq portions (string-parse+ code (vl-string->list "() \n\r\t\";")))
  (while portions
    (cond
      ((wcmatch (car portions) "\"")
       (if string
         (setq lst (cons (strcat "\"" string "\"") lst) string nil)
         (setq string "")
       )
      )
      (string
       (setq string (strcat string (car portions)))
      )
      ((eq (car portions) ";")
       (setq lst      (cons (apply 'strcat (cdr portions)) lst)
             portions nil
       )
      )
      ((not (wcmatch (car portions) " ,\t,\n,\r"))
       (setq lst (cons (car portions) lst))
      )
    )
    (setq portions (cdr portions))
  )
  (if string (setq lst (cons (strcat "\"" string "\"") lst)))
  (if (and (car lst) (not (eq (car lst) "")))
    (cons (reverse (cdr lst)) (list (car lst)))
    (cons (reverse (cdr lst)) nil)
  )
)
Seems to work for your examples. A few assumptions  :pissed: ...  :lmao: :
Title: Re: Breaking up code - how?
Post by: Jeremy on November 28, 2011, 03:53:21 PM
Wow, You guys are busy as bees! I'm always amazed at how fast people post around here. I think some tinkering with one or more of the examples will get me close enough to what I need. Thanks all. :-)
Title: Re: Breaking up code - how?
Post by: irneb on November 29, 2011, 01:21:21 PM
Glad to help. Here's a fix on my code to account for escaped double quotes inside of strings.  Replace the ((wcmatch (car portions) "\"") ... ) with:
Code: [Select]
((wcmatch (car portions) "\"")
       (if string
         (if (wcmatch string "*\\")
           (setq string (strcat string "\""))
           (setq lst (cons (strcat "\"" string "\"") lst) string nil)
         )
         (setq string "")
       )
      )
And to include the semi-colon:
Code: [Select]
((eq (car portions) ";")
       (setq lst      (cons (apply 'strcat portions) lst)
             portions nil
       )
      )
It should now work fine, here's the result of a test:
Code: [Select]
_$ (split-code "(test 123.4 \"String with ; and escaped \\\"quote\\\"\") ;| Multiline Comment\n|;")
(("(" "test" "123.4" "\"String with ; and escaped \\\"quote\\\"\"" ")") ";| Multiline Comment\n|;")
The multiline comments only work if they're at the end of the code. To get multilines working when they're at the start / midway I'll leave for your exercise (Tip: use similar to what I've done for strings).