Topic: Ignoring certain blocks on output

Marc'Antonio Alessi

  • Swamp Rat
  • Posts: 1451
  • Marco
Re: Ignoring certain blocks on output
« Reply #30 on: April 07, 2013, 04:04:21 PM »
On the topic of CSV parsing, this is what I use (it accounts for both quotes and commas in the cell data, and also allows for alternative cell delimiter characters), though I make no claims for its efficiency.
I'm sorry, but I have some trouble with English: I did not understand what you mean, or where your example is.

Is "a, b, c,\"John \"\"The Baptist\"\" Smith\",4,5,6" a correct CSV format?

Sorry Marc, I should have explained further.

The string:
Code: [Select]
"a, b, c,\"John \"\"The Baptist\"\" Smith\",4,5,6"
is indeed a valid CSV format.

My CSV parsing function will then return the result:
Code: [Select]
_$ (LM:csv->lst "a, b, c,\"John \"\"The Baptist\"\" Smith\",4,5,6" 44 0)
("a" " b" " c" "John \"The Baptist\" Smith" "4" "5" "6")
To demonstrate, try creating a CSV file with the above cell values and open the CSV file using a plain text editor.

Note that my function does not remove whitespace between cell data as I wrote the function with the intention to return the exact CSV content; if whitespace should be removed, the above result may be processed using:
Code: [Select]
_$ (mapcar '(lambda ( x ) (vl-string-trim " \t" x)) (LM:csv->lst "a, b, c,\"John \"\"The Baptist\"\" Smith\",4,5,6" 44 0))
("a" "b" "c" "John \"The Baptist\" Smith" "4" "5" "6")

Sorry for the confusion.
OK, thanks.
My mistake was in reading this part of the function header:
;;--------------------=={ CSV to List }==---------------------;;
;;                                                            ;;
;;  Parses a line from a CSV file into a list of cell values. ;;
;;------------------------------------------------------------;;
;;  Author: Lee Mac, Copyright © 2012 - www.lee-mac.com       ;;
;;------------------------------------------------------------;;
;;  Arguments:                                                ;;
;;  str - string read from CSV file                           ;;
;;  sep - CSV separator token                                 ;;

>>> I had missed that sep (the CSV separator token) must be passed as an ASCII value!

Ciao.
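
As a small illustration of the point above: the calls in this thread pass the separator to LM:csv->lst as a character code (44), not as a string. A minimal sketch of converting between the two with the built-in ascii and chr functions (nothing here is specific to Lee's function, whose definition is not shown in this excerpt):

```autolisp
(ascii ",")   ; returns 44, the comma used in the calls above
(ascii ";")   ; returns 59, the semicolon
(ascii "\t")  ; returns 9, the tab character
(chr 44)      ; returns ","
```

So a semicolon-delimited line would presumably be parsed with (LM:csv->lst str 59 0), assuming the other arguments stay as in the examples above.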

Marc'Antonio Alessi

  • Swamp Rat
  • Posts: 1451
  • Marco
Re: Ignoring certain blocks on output
« Reply #31 on: April 07, 2013, 04:11:02 PM »
Hi Marco.  In an Excel sheet, enter this, including all the double-quotes:

       "John "the baptist" Smith"

Save to CSV, and look at the output.
Tony,
 "John "the baptist" Smith" is very redundant!
Perhaps this is an acceptable result:
("a" " b" " c" "\"John \"\"The Baptist\"\" Smith\"" "4" "5" "6")

I'm going to sleep, good night.


TheMaster

  • Guest
Re: Ignoring certain blocks on output
« Reply #32 on: April 08, 2013, 07:21:26 PM »
Hi Marco.

The underlying objective of the code on which the list->string function (from the document I attached to my OP in this thread) is based is to correctly parse CSV-formatted data coming from Excel, amongst other sources.

Doing that correctly requires the character-by-character approach, because of the special handling of data surrounded by double-quotes, including literal double-quotes between an outer pair of double-quotes. Is it redundant? Perhaps, but that is the CSV format, and code must be able to handle CSV-formatted data correctly. If it doesn't handle the data correctly, then how fast it runs is meaningless.
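
To make the character-by-character idea concrete, here is a minimal sketch. It is not the code from the attached document, nor Lee's LM:csv->lst; csv-line->list is a hypothetical name, and it assumes one line of text and a single-character delimiter given as a string. It does not trim whitespace, and it treats a quote anywhere in a field as opening a quoted section.

```autolisp
;; Hypothetical helper, not from the thread: splits one CSV line into fields,
;; scanning one character at a time.
;; str - string, one line of CSV text
;; sep - string, single-character field delimiter, e.g. "," or ";"
(defun csv-line->list ( str sep / chr1 fld idx inq len out )
  (setq idx 1
        len (strlen str)
        fld ""
  )
  (while (<= idx len)
    (setq chr1 (substr str idx 1))
    (cond
      ( inq                                   ; currently inside double-quotes
        (cond
          ( (/= "\"" chr1)                    ; ordinary character, delimiters included
            (setq fld (strcat fld chr1))
          )
          ( (= "\"" (substr str (1+ idx) 1))  ; doubled quote: keep one literal quote
            (setq fld (strcat fld "\"")
                  idx (1+ idx)                ; skip the second quote of the pair
            )
          )
          ( t (setq inq nil) )                ; single quote: end of quoted section
        )
      )
      ( (= "\"" chr1) (setq inq t) )          ; opening quote, not part of the data
      ( (= sep chr1)                          ; delimiter outside quotes ends the field
        (setq out (cons fld out)
              fld ""
        )
      )
      ( t (setq fld (strcat fld chr1)) )      ; ordinary character outside quotes
    )
    (setq idx (1+ idx))
  )
  (reverse (cons fld out))
)

;; (csv-line->list "\"Smith,John\",\"\"\"The Baptist\"\"\"" ",")
;; returns ("Smith,John" "\"The Baptist\"")
```

The single pass carries one flag (inq) so that a delimiter is only treated as a field break when it appears outside double-quotes, which is the part that a plain split on the delimiter cannot do.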


Lee Mac

  • Seagull
  • Posts: 12906
  • London, England
Re: Ignoring certain blocks on output
« Reply #33 on: April 09, 2013, 08:12:52 AM »
If it doesn't handle the data correctly, then how fast it runs is meaningless.

Couldn't agree more.

Marc'Antonio Alessi

  • Swamp Rat
  • Posts: 1451
  • Marco
Re: Ignoring certain blocks on output
« Reply #34 on: April 09, 2013, 10:51:48 AM »
New attempt:
Code: [Select]
; Version 1.11 - 09-04-2013
(defun ALE_String_ToListDQ (InpStr CarDlm / SttPos EndPos TmpLst TmpStr TmpPos SttDQt EndDQt TrueFl)
  (setq
    CarDlm (ascii CarDlm)   SttPos 0
    EndPos (vl-string-position CarDlm InpStr)
    SttDQt (vl-string-position 34 InpStr)
  )
  (and SttDQt (setq EndDQt (vl-string-position 34 InpStr (1+ SttDQt))))
  (while EndPos
    (and
      EndDQt
      (= "\"" (substr InpStr (1+ EndDQt) 1)) (setq TrueFl T)
      (setq EndDQt (vl-string-position 34 InpStr (+ 2 EndDQt)))
    )
    (cond
      ( (and EndDQt (< SttDQt EndPos EndDQt))
        (setq EndPos (vl-string-position CarDlm InpStr (1+ EndPos)))
      )
      ( T
        (and SttDQt (setq SttDQt (vl-string-position 34 InpStr (1+ EndPos))))
        (setq TmpStr (substr InpStr (1+ SttPos) (- EndPos SttPos)))
        (and
          TrueFl
          (setq TmpPos 0)
          (while (setq TmpPos (vl-string-search  "\"\"" TmpStr TmpPos))
            (setq TmpStr (vl-string-subst "\"" "\"\"" TmpStr TmpPos)
             TmpPos (1+ TmpPos)
            )
          )
          (setq TmpStr (vl-string-trim  "\"" TmpStr) TrueFl nil)
        )
        (setq
          TmpLst (cons TmpStr TmpLst)
          SttPos (1+ EndPos) EndPos (vl-string-position CarDlm InpStr SttPos)
        )
        (and SttDQt (setq EndDQt (vl-string-position 34 InpStr (1+ SttDQt))))
      )
    )
  )
  (reverse (cons (substr InpStr (1+ SttPos)) TmpLst))
)
(ALE_String_ToListDQ "a, b, c,\"John \"\"The,Baptist\"\" Smith\",4,5,6" ",")
=> ("a" " b" " c" "John \"The,Baptist\" Smith" "4" "5" "6")

Lee Mac

  • Seagull
  • Posts: 12906
  • London, England
Re: Ignoring certain blocks on output
« Reply #35 on: April 09, 2013, 11:04:14 AM »
Try this Marc:
Code: [Select]
(setq str "\"Smith,John\",\"\"\"The Baptist\"\"\"")
The result should be:
Code: [Select]
("Smith,John" "\"The Baptist\"")
:-)

Marc'Antonio Alessi

  • Swamp Rat
  • Posts: 1451
  • Marco
Re: Ignoring certain blocks on output
« Reply #36 on: April 09, 2013, 11:29:01 AM »
:-(
"\"Smith,John\",\"\"\"The Baptist\"\"\""
Do you think this is a valid value? Can you post an image of the cell, like in:
http://www.theswamp.org/index.php?topic=40660.msg459469#msg459469
Thanks.



Lee Mac

  • Seagull
  • Posts: 12906
  • London, England
Re: Ignoring certain blocks on output
« Reply #37 on: April 09, 2013, 11:47:09 AM »
Sure:

[Attached image of the spreadsheet cells not preserved]

Providing the CSV delimiter is a comma, the above will yield a text value of:
Code: [Select]
"\"Smith,John\",\"\"\"The Baptist\"\"\""
The example provided in the thread to which you have linked also demonstrates a similar result, if you need proof.
« Last Edit: April 09, 2013, 11:50:46 AM by Lee Mac »

Marc'Antonio Alessi

  • Swamp Rat
  • Posts: 1451
  • Marco
Re: Ignoring certain blocks on output
« Reply #38 on: April 09, 2013, 12:11:55 PM »
>> Providing the CSV delimiter is a comma, the above will yield a text value of: ...

I have changed my Italian regional settings (decimal separator and list separator) from "," and ";" to "." and ",",
and now I can see it.

Maybe I will make a new attempt...

Ciao.

Lee Mac

  • Seagull
  • Posts: 12906
  • London, England
Re: Ignoring certain blocks on output
« Reply #39 on: April 09, 2013, 12:15:34 PM »
Ah yes, the function also needs to account for the use of different CSV delimiters! :evil:

Marc'Antonio Alessi

  • Swamp Rat
  • Posts: 1451
  • Marco
Re: Ignoring certain blocks on output
« Reply #40 on: April 09, 2013, 12:48:24 PM »
Maybe we need to look at the settings in Control Panel:
Excel:                 Smith,John   "The Baptist"
CSV Italian settings:  Smith,John;"""The Baptist"""
CSV English settings:  "Smith,John","""The Baptist"""

Have a good evening.

Lee Mac

  • Seagull
  • Posts: 12906
  • London, England
Re: Ignoring certain blocks on output
« Reply #41 on: April 09, 2013, 01:10:41 PM »
My Read CSV function does exactly that ;-)
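
For readers wondering where that setting lives: on Windows, the list separator chosen in Control Panel is, as far as I know, stored in the registry under HKEY_CURRENT_USER\Control Panel\International as the value sList. The following is only a sketch of reading it, with a comma as the fallback; get-list-separator is a hypothetical name, and this is not Lee's Read CSV code, which is not shown in this excerpt.

```autolisp
(vl-load-com)

;; Hypothetical helper: returns the Windows list separator as a string,
;; or "," if the registry value cannot be read (assumption: Windows only).
(defun get-list-separator ( / sep )
  (setq sep
    (vl-registry-read
      "HKEY_CURRENT_USER\\Control Panel\\International"
      "sList"
    )
  )
  (if (and (eq 'STR (type sep)) (< 0 (strlen sep)))
    sep
    ","
  )
)
```

(ascii (get-list-separator)) then gives a character code that could be passed as the separator argument in calls such as those shown in this thread.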

Marc'Antonio Alessi

  • Swamp Rat
  • Posts: 1451
  • Marco
Re: Ignoring certain blocks on output
« Reply #42 on: April 10, 2013, 10:36:51 AM »
Tony & Lee,

Do you think there are other special cases, or do the following examples cover all the possible complications?

Code: [Select]
Comando: (setq str "\"abc,123\",\"\"\"ABC\"\"\",\"abc\"\"ABC\"\"123\"")
"\"abc,123\",\"\"\"ABC\"\"\",\"abc\"\"ABC\"\"123\""
Comando: (LM:csv->lst str 44 0)
("abc,123" "\"ABC\"" "abc\"ABC\"123")

Comando: (setq str "a, b, c,\"John \"\"The Baptist\"\" Smith\",4,5,6")
"a, b, c,\"John \"\"The Baptist\"\" Smith\",4,5,6"
Comando: (LM:csv->lst str 44 0)
("a" " b" " c" "John \"The Baptist\" Smith" "4" "5" "6")

Comando: (setq str "552.32,\"Smith, John\",42,350,a,b,c,d")
"552.32,\"Smith, John\",42,350,a,b,c,d"
Comando: (LM:csv->lst str 44 0)
("552.32" "Smith, John" "42" "350" "a" "b" "c" "d")

Comando: (setq str "\"Smith,John\",\"\"\"The Baptist\"\"\"")
"\"Smith,John\",\"\"\"The Baptist\"\"\""
Comando: (LM:csv->lst str 44 0)
("Smith,John" "\"The Baptist\"")
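
A few inputs that the examples above do not exercise, offered as a sketch rather than a complete list: empty fields, a trailing delimiter, and more than one delimiter inside a quoted field. The expected values follow the usual CSV quoting rules; csv-extra-cases and csv-failed-cases are hypothetical names, and fun is any parser wrapped so that it takes a single string.

```autolisp
;; Hypothetical extra test cases, not from the thread: each item is (input expected).
(setq csv-extra-cases
  (list
    (list "a,,b"        '("a" "" "b"))    ; empty field between two delimiters
    (list "a,b,"        '("a" "b" ""))    ; trailing delimiter gives an empty last field
    (list "\"\",x"      '("" "x"))        ; quoted empty field
    (list "\"\"\"\",x"  '("\"" "x"))      ; quoted field holding one literal quote
    (list "\"a,b,c\",x" '("a,b,c" "x"))   ; two delimiters inside one quoted field
  )
)

;; Runs the parser fun (a function of one string argument) over the cases and
;; returns the cases whose result differs from the expected list.
(defun csv-failed-cases ( fun cases )
  (vl-remove-if
    '(lambda ( c ) (equal (cadr c) (apply fun (list (car c)))))
    cases
  )
)
```

For example, (csv-failed-cases '(lambda ( s ) (LM:csv->lst s 44 0)) csv-extra-cases) should return nil if every case is parsed as expected. Quoted fields containing line breaks are a further case, but they cannot be exercised with a single line of input.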

Marc'Antonio Alessi

  • Swamp Rat
  • Posts: 1451
  • Marco
Re: Ignoring certain blocks on output
« Reply #43 on: April 10, 2013, 10:40:05 AM »
IMHO, for the moderator: as I wrote, it may be better to move the CSV to List part of this discussion to another thread.

Marc'Antonio Alessi

  • Swamp Rat
  • Posts: 1451
  • Marco
Re: Ignoring certain blocks on output
« Reply #44 on: April 10, 2013, 12:02:10 PM »
Two new attempts (similar engine, using vl-string-position):
Code: [Select]
(defun ALE_StringSubstAll (NewStr PatStr InpStr SttPos / NewLen)
  (cond
    ( (= "" PatStr) InpStr )
    ( (setq NewLen (strlen NewStr))
      (while (setq SttPos (vl-string-search PatStr InpStr SttPos))
        (setq
          InpStr (vl-string-subst NewStr PatStr InpStr SttPos)
          SttPos (+ SttPos NewLen)
        )
      )
      InpStr
    )
  )
)
(defun ALE_String_CdfToList (InpStr CarDlm / SttPos EndPos TmpStr OutLst TmpElm)
  (setq
    CarDlm (ascii CarDlm)   SttPos 0
    EndPos (vl-string-position CarDlm InpStr)
  )
  (while EndPos
    (cond
      ( (wcmatch (setq TmpElm (substr InpStr (1+ SttPos) (- EndPos SttPos))) "\"*\"")
        (setq OutLst
          (cons
            (ALE_StringSubstAll "\"" "\"\"" (substr TmpElm 2 (- (strlen TmpElm) 2)) 0)
            OutLst
          )
        )
      )
      ( (wcmatch TmpElm "\"*") (setq TmpStr (substr TmpElm 2)) )
      ( TmpStr
        (setq OutLst
          (cons
            (strcat TmpStr (chr CarDlm) (substr TmpElm 1 (- (strlen TmpElm) 1)))
            OutLst
          )
          TmpStr nil
        )
      )
      ( T (setq OutLst (cons TmpElm OutLst)) )
    )
    (setq SttPos (1+ EndPos) EndPos (vl-string-position CarDlm InpStr SttPos))
  )
  (reverse
    (cons
      (cond
        ( (wcmatch (setq TmpElm (substr InpStr (1+ SttPos))) "\"*\"")
          (ALE_StringSubstAll "\"" "\"\"" (substr TmpElm 2 (- (strlen TmpElm) 2)) 0)
        )
        ( (wcmatch TmpElm "\"*")(setq TmpStr (substr TmpElm 2)) )
        ( TmpStr (strcat TmpStr (chr CarDlm) (substr TmpElm 1 (- (strlen TmpElm) 1))) )
        ( T TmpElm )
      )
      OutLst
    )
  )
)
(defun ALE_String_CdfToListFE (InpStr CarDlm / SttPos EndPos TmpLst TmpStr TmpPos OutLst)
  (setq
    CarDlm (ascii CarDlm)   SttPos 0
    EndPos (vl-string-position CarDlm InpStr)
  )
  (while EndPos
    (setq
      TmpLst (cons (substr InpStr (1+ SttPos) (- EndPos SttPos)) TmpLst)
      SttPos (1+ EndPos) EndPos (vl-string-position CarDlm InpStr SttPos)
    )
  )
  (foreach ForElm (cons (substr InpStr (1+ SttPos)) TmpLst)
    (cond
      (   (wcmatch ForElm "\"*\"")
          (setq TmpPos 0   ForElm (substr ForElm 2 (- (strlen ForElm) 2)))
          (while (setq TmpPos (vl-string-search  "\"\"" ForElm TmpPos))
            (setq
               ForElm (vl-string-subst "\"" "\"\"" ForElm TmpPos)
               TmpPos (1+ TmpPos)
            )
          )
          (setq OutLst (cons ForElm OutLst))
      )
      (  (wcmatch ForElm "*\"")
         (setq TmpStr (substr ForElm 1 (- (strlen ForElm) 1)))
      )
      ( TmpStr
        (setq OutLst
          (cons
            (strcat (substr ForElm 2) (chr CarDlm) TmpStr)
            OutLst
          )
          TmpStr nil
        )
      )
      ( T (setq OutLst (cons ForElm OutLst)) )
    )
  )
  OutLst
)

Test:
Code: [Select]
Comando: (setq str "\"abc,123\",\"\"\"ABC\"\"\",\"abc\"\"ABC\"\"123\"")
"\"abc,123\",\"\"\"ABC\"\"\",\"abc\"\"ABC\"\"123\""
Comando: (ALE_String_CdfToList str ",")
("abc,123" "\"ABC\"" "abc\"ABC\"123")
Comando: (ALE_String_CdfToListFE str ",")
("abc,123" "\"ABC\"" "abc\"ABC\"123")
Comando: (LM:csv->lst str 44 0)
("abc,123" "\"ABC\"" "abc\"ABC\"123")

Comando: (setq str "a, b, c,\"John \"\"The Baptist\"\" Smith\",4,5,6")
"a, b, c,\"John \"\"The Baptist\"\" Smith\",4,5,6"
Comando: (ALE_String_CdfToList str ",")
("a" " b" " c" "John \"The Baptist\" Smith" "4" "5" "6")
Comando: (ALE_String_CdfToListFE str ",")
("a" " b" " c" "John \"The Baptist\" Smith" "4" "5" "6")
Comando: (LM:csv->lst str 44 0)
("a" " b" " c" "John \"The Baptist\" Smith" "4" "5" "6")

Comando: (setq str "552.32,\"Smith, John\",42,350,a,b,c,d")
"552.32,\"Smith, John\",42,350,a,b,c,d"
Comando: (ALE_String_CdfToList str ",")
("552.32" "Smith, John" "42" "350" "a" "b" "c" "d")
Comando: (ALE_String_CdfToListFE str ",")
("552.32" "Smith, John" "42" "350" "a" "b" "c" "d")
Comando: (LM:csv->lst str 44 0)
("552.32" "Smith, John" "42" "350" "a" "b" "c" "d")

Comando: (setq str "\"Smith,John\",\"\"\"The Baptist\"\"\"")
"\"Smith,John\",\"\"\"The Baptist\"\"\""
Comando: (ALE_String_CdfToList str ",")
("Smith,John" "\"The Baptist\"")
Comando: (ALE_String_CdfToListFE str ",")
("Smith,John" "\"The Baptist\"")
Comando: (LM:csv->lst str 44 0)
("Smith,John" "\"The Baptist\"")