Author Topic: Autolisp XML parsing with Columbia's code (Read 4596 times)

jgett002 · « **on:** February 22, 2017, 10:34:36 AM »

Hello all,

I am using Columbia's xml parser code found here https://www.theswamp.org/index.php?topic=525.30
It seems to be perfect for what I'm trying to accomplish, but I have two small problems. Below is an example of what my XML format looks like, except the actual file will be much longer and have more layers.

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>

<book>
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>

<price>30.00</price>
</book>

<book>
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>

<year>2005</year>
<price>29.99</price>
</book>

</bookstore>

Problem 1: Since the two child nodes "book" have the same name, I can't find a way to differentiate them when using Columbia's get-child or get-child-value. The code always gives back the first child node because it searches by name. I was thinking about using get-child-list and using the positions to assign VLA objects to child nodes of the same name, but the number of "book" nodes will change with each XML file and I don't know if their positions will always be constant, so I don't want that to make things messy. I was also thinking about renaming each "book" node to "book1", "book2", ... for as long as a child node named "book" remains; I think the put-value function does that. Or, if there's something very simple I'm missing, I would greatly appreciate someone filling me in.

Problem 2: As far as I know, Columbia's code cannot handle notes (comments) in XML. When I use get-child-value on the first node "book" I can extract data from title, author, and year, but not price. When I use get-child-value on the second node "book" (after changing it to a unique name) I can extract data from title and author, but not year and price. It returns the error "unknown name: TAGNAME". When I remove the notes, I can get all values with no problem. Should I somehow remove all notes from the files beforehand? Can I write a couple of lines of code into Columbia's file to skip over the notes? Or again, is there something simple I am missing here?

Please inform me on what the simplest solutions are to my two problems. Also, I am very new to computer programming in general so I would appreciate it if everyone spoke to me as if I were a small child.

Thanks

steve.carson · « **Reply #1 on:** February 22, 2017, 02:42:22 PM »

Problem #1 - I've been playing around with Columbia's code and DST files (thanks to MP's recent contributions found here: https://www.theswamp.org/index.php?topic=52362.0). Using xml-get-childlist is the way I'd do it, but instead of using positions, I would iterate over the list and make a new list containing only the xml objects whose nodename is "book". Something like this (untested):

Code: [Select]

(defun get-xml-books (XmlO / return )

    ;; Walk the children in reverse so that cons rebuilds them in the original order.
    (foreach i (reverse (xml-get-childlist XmlO))
        ;; Keep only the children whose nodeName is "book".
        (if (and (vlax-property-available-p i 'nodeName)
                 (= (vlax-get-property i 'nodeName) "book")
            )
            (setq return (cons i return))
        )
    )
    return
)

I don't know enough about XML to know if all children will always have a nodename property, so I put the check in there. This should return a list of xml objects, one for each book.

Problem #2 - To get the value of specific data I've had more luck using the xml-get-child-byattribute function. In your case something like this may work:

Code: [Select]

(defun get-book-property (bookXmlO propName / )

    (xml-get-value
        (xml-get-child-byattribute bookXmlO nil "propname" propName)
    )

)

Keep in mind everything I know about xml's I've learned from messing around with sheetset stuff, so it may not apply to your situation, but I hope it at least nudges you in the right direction.

Steve

dgorsman · « **Reply #2 on:** February 22, 2017, 02:50:07 PM »

Have a look through the MSXML DOM implementation. It sounds like the code being used is manually parsing content, or perhaps stepping through child nodes. A better way of getting things would be to execute selections using XPATH, which skips comment nodes unless you ask for them. Provided the selection returns a node (via SelectSingleNode) or a node list (via SelectNodes), the results can be stepped through quickly.

Edit: after a longer peek at the posted code I only see stepping through children without any use of XPATH. That would clean up a *lot* of searching.

VovKa · « **Reply #3 on:** February 22, 2017, 05:49:20 PM »

jgett002, if you plan to read lots of data from xml i'd suggest reading the whole xml file as one list and then work with it the lisp way

jgett002 · « **Reply #4 on:** February 23, 2017, 09:03:26 AM »

Steve,

Thanks for the input. If I continue with Columbia's code I think your solution to Problem #1 would work. Though I guess I should look into other xml parsing methods based on what the other replies are telling me. Although, I don't understand how your solution to Problem #2 will work since my child nodes like "year" and "price" don't have attributes.

dgorsman,

Do you know of any references to xpath that can help with my specific application? As I said in my original post, I'm really just a novice at programming. When I try to research these things it's so hard to understand what anybody is talking about since I don't know a lot of the basics. I feel like I need to google ten different programming terms that are written in the explanation of the topic I was originally researching. I went back and forth deciding if I should really spend some time learning about the MSXML DOM and ActiveX and vlax- commands but eventually decided that I didn't want to open up that can of worms since 1) Columbia's code is really close to working and 2) after I assign variables to the xml data the rest of my program is mostly just drawing (and the user interface which I am working through fine). I figured I would save all this research for when I have a better understanding of coding in general.

Anyway, if you can either help me with an explanation or with a reference I can read, I will definitely look into it. If you think it would be too difficult for a beginner maybe I will just have a very inefficient code for my first program. FYI, please read my reply to VovKa below and let me know if you think I should absolutely use XPATH instead of stepping through.

Vovka,

My XML documents will be about 2,500 lines long and I am extracting information from around 100 nodes or so. What do you suggest?

Thanks everybody

VovKa · « **Reply #5 on:** February 23, 2017, 09:49:25 AM »

Quote from: jgett002 on February 23, 2017, 09:03:26 AM

What do you suggest?

https://www.theswamp.org/index.php?topic=33065.msg385105#msg385105

dgorsman · « **Reply #6 on:** February 23, 2017, 10:39:29 AM »

Gotta learn the basics first. Have a look at https://www.w3schools.com/ both for the XML (links at the bottom-left), and HTML and CSS.

Extracting information from hundreds of nodes is soooo much easier with XPATH. For example, here's something I do in our weld count report:

count(//Connection[SizeMain = $cur_size and Type = $cur_type and FlagExisting = 'FALSE' and BOMStatus = $cur_bomstatus]) = 0

The count(...) part returns the number of Connection nodes that have a child node SizeMain equal to the current size, a child node Type equal to the current type, and so on; comparing it with 0 tests whether there are none. And that's one of the simple ones.
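For anyone wanting to try the counting idea from AutoLISP, here is a minimal, untested sketch against the bookstore sample from the first post. It assumes the node passed in comes from an MSXML 6.0 DOMDocument, and the function name is made up for illustration. Rather than calling count(), it asks selectNodes for the matching nodes and reads the length property of the returned list.

```lisp
(vl-load-com)

;; Hypothetical helper: number of "book" children of NODE whose
;; "year" child has the text YEAR (a string such as "2005").
(defun xml-count-books-by-year (node year / matches)
  (setq matches
    (vlax-invoke-method node 'selectNodes
      (strcat "book[year = '" year "']")))
  ;; The node list reports how many nodes matched.
  (vlax-get-property matches 'length))
```

Building the expression with strcat stands in for the $cur_size style variables in the report expression above.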

jgett002 · « **Reply #7 on:** February 24, 2017, 09:22:48 AM »

Vovka,

My intuition is telling me that for an XML file of 2,500 lines XPATH would most likely be faster than parsing the entire document into a list and searching through that. Anybody have any input on the direction I should be going in?

dgorsman,
The syntax looks pretty simple for XPATH, and if it's going to speed up my program I might as well do it right. But I'm not seeing any information online for actually implementing it in AutoLISP, except for small snippets of code like the one you just posted. So I have basic questions: do I still need the MSXML DOM (I'm assuming so), how do I call out the xml file, do I need to load up the XPATH language, etc. Can you show me an example of the setup code and then a couple of lines of code actually getting information, e.g. the value of "price" from my second "book" node? I think if I have that I'll be good to go.

Also, I read that you need Express Tools to be able to run this code, and that it may cost extra to get those on AutoCAD? I want to eventually put this program in other offices when it's refined, and I don't want limitations on which computers I can put it on. What are the limitations for AutoCAD version and the requirement for Express Tools?

Thanks, I really appreciate the help

dgorsman · « **Reply #8 on:** February 24, 2017, 10:28:09 AM »

You don't need Express Tools to do "stuff" with XML. Express Tools comes with AutoCAD, usually as an optional install, and costs *nothing*.

The process of using MSXML DOM with LISP is straightforward; most of it uses the COM/ActiveX parts of VLISP to invoke methods and read/write to properties. You'll need to bone up on functions (vlax-invoke-method ...) and (vlax-get-property...).

You create a DOMDocument object using the (vlax-create-object ...) function, then load it from the file (or from text, but we'll stick with files for now) using the "load" method e.g.

Code: [Select]

(setq
   dom_doc
     (vlax-create-object "MSXML2.DOMDocument.6.0")
)
(vlax-invoke-method dom_doc 'load "Some fully qualified filename here.xml")

The Document node object is the top-most level of the content and normally where you start from, and is a property of the DOMDocument object.

Code: [Select]

(setq doc_node (vlax-get-property dom_doc 'documentElement))

The document node, like all nodes, has a number of methods to get child nodes. Two of the more important are SelectNodes (returns all matching nodes) and SelectSingleNode (returns only the first matching node). Each of them takes a string argument with an XPATH expression e.g.

Code: [Select]

(setq
   one_node
      (vlax-invoke-method cur_node 'SelectSingleNode "One/Two[three]")
)

This should return the first node of type "Two" with a parent node of "One" and a child node of "three". *BUT* the expression is evaluated only in the context of the node being searched. If that is the document node, the search starts at the top of the document; if it's a node lower down in the tree (for example, you may have multiple sub-sections and you only want to search in one), the search can be run on a node low enough to exclude those you don't want to bother searching.
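To round this out with SelectNodes, here is a minimal, untested sketch, assuming the node comes from an MSXML 6.0 document loaded as above. The function name xml-select-texts is made up for illustration, and it relies on vlax-for being able to walk the returned node list.

```lisp
(vl-load-com)

;; Hypothetical helper: text of every node under NODE that matches XPATH,
;; collected in the order the node list returns them.
(defun xml-select-texts (node xpath / result)
  (vlax-for item (vlax-invoke-method node 'selectNodes xpath)
    ;; cons builds the list backwards, so it is reversed at the end.
    (setq result (cons (vlax-get-property item 'text) result)))
  (reverse result))
```

Something like (xml-select-texts doc_node "book/title") would then be expected to return the title text of every book under the bookstore node.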

steve.carson · « **Reply #9 on:** February 24, 2017, 01:26:54 PM »

Quote from: jgett002 on February 23, 2017, 09:03:26 AM

Steve,

Thanks for the input. If I continue with Columbia's code I think your solution to Problem #1 would work. Though I guess I should look into other xml parsing methods based on what the other replies are telling me. Although, I don't understand how your solution to Problem #2 will work since my child nodes like "year" and "price" don't have attributes.

You're right. I downloaded the xml file from the link dgorsman posted (which is amazingly similar to what you are doing), and my solution for problem 2 isn't applicable. This seems to work though:

Code: [Select]

(defun get-book-property (bookXmlO propName / )

    (xml-get-child-value bookXmlO nil propName)

)

jgett002 · « **Reply #10 on:** February 28, 2017, 03:48:15 PM »

Hello again,

Just wanted to thank you all for your suggestions and help anyone down the line that reads this. I ended up using dgorsman's method with XPATH. I decided to learn more about ActiveX, Visual LISP, and XPATH, and contrary to what I was saying earlier I am so glad that I did. When I wrote this post I was searching through other people's code, trying to decipher it and change it to make it useful while not understanding anything that was going on. After I did my research I was able to write the code below on my first try without any errors. It solves my two problems just fine.

(vl-load-com)

(setq dom_doc (vlax-create-object "MSXML2.DOMDocument.6.0"))

(vlax-invoke-method dom_doc 'load "C:\\...file path...\\Books.xml")

(setq doc_node (vlax-get-property dom_doc 'documentElement))

(setq price (vlax-invoke-method doc_node 'SelectSingleNode "book[2]/price"))

(setq price_value (vlax-get-property price 'text))

It returns 29.99, which is the value of the price from the second "book" node, which is also after that note that was messing me up before. Now that I have a solid foundation of what I'm doing I can get into the real programming for my large xml file.
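Since the number of "book" nodes changes from file to file, the same lookup can be wrapped so the position and the child name are arguments. This is only a sketch under the assumption that doc_node is the bookstore element from the code above; book-field is a made-up name, not part of Columbia's code or MSXML.

```lisp
(vl-load-com)

;; Hypothetical helper: text of the child named FIELD inside the N-th
;; (1-based) "book" under ROOT, or nil when no such node exists.
(defun book-field (root n field / node)
  (if (setq node
        (vlax-invoke-method root 'selectSingleNode
          (strcat "book[" (itoa n) "]/" field)))
    (vlax-get-property node 'text)))
```

With that in place, (book-field doc_node 2 "price") builds the same "book[2]/price" expression as above, and (vlax-get-property (vlax-invoke-method doc_node 'selectNodes "book") 'length) gives the number of books to loop over.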

FYI for anyone that wants to learn how to do this themselves, my three main resources were:
https://msdn.microsoft.com/en-us/library/ms757828(v=vs.85).aspx
https://msdn.microsoft.com/en-us/library/ms763798(v=vs.85).aspx
https://www.w3schools.com/xml/xpath_intro.asp

dgorsman · « **Reply #11 on:** March 01, 2017, 10:13:20 AM »

Excellent. Comments in XML are considered a node, just like everything else, so they are included when you step through the nodes with next sibling, first child, and so on; in that case you would need to check the node type (or not use comments). XPATH only selects the nodes specified in the criteria, so you can include useful information in the XML as comments and still have it searchable.
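For the stepping-through case, checking the node type could look something like the untested sketch below. It assumes an MSXML node, where the nodeType property is 1 for elements and 8 for comments; the function name is made up for illustration.

```lisp
(vl-load-com)

;; Hypothetical helper: the element children of NODE, leaving out
;; comments, text and any other non-element nodes.
(defun xml-element-children (node / result)
  (vlax-for child (vlax-get-property node 'childNodes)
    ;; nodeType 1 = element; comments report 8 and are skipped.
    (if (= 1 (vlax-get-property child 'nodeType))
      (setq result (cons child result))))
  (reverse result))
```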

T.Willey · « **Reply #12 on:** March 01, 2017, 04:58:39 PM »

Question: Since the OP is 'creating' an object, should the object be released when done?

Code: [Select]

(setq dom_doc (vlax-create-object "MSXML2.DOMDocument.6.0"))

< do stuff >

(vlax-release-object dom_doc)

dgorsman · « **Reply #13 on:** March 02, 2017, 11:01:46 AM »

I haven't noticed much of a difference either way, as I localize dom_doc in the function. It's likely getting picked up by garbage collection once all references are removed, like other vla-objects. I could be wrong.
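For those who prefer to release explicitly, one way to arrange it is sketched below (untested). The function name xml-read-value is made up for illustration. Beyond what was posted above, it turns off the document's async property so that load finishes before anything is read, and it checks the value load returns before touching the document.

```lisp
(vl-load-com)

;; Hypothetical helper: load FILE, return the text of the first node
;; matching XPATH (relative to the root element), then release the document.
;; Returns nil if the object cannot be created, the file fails to load,
;; or nothing matches.
(defun xml-read-value (file xpath / dom node value)
  (if (setq dom (vlax-create-object "MSXML2.DOMDocument.6.0"))
    (progn
      ;; Load synchronously so the document is complete when load returns.
      (vlax-put-property dom 'async :vlax-false)
      (if (and (= :vlax-true (vlax-invoke-method dom 'load file))
               (setq node
                 (vlax-invoke-method
                   (vlax-get-property dom 'documentElement)
                   'selectSingleNode xpath)))
        (setq value (vlax-get-property node 'text)))
      ;; Done with the document, so let go of it.
      (vlax-release-object dom)))
  value)
```

A call would look like (xml-read-value "C:\\...file path...\\Books.xml" "book[2]/price"). An error raised between the create and the release would still skip the release; trapping that is left out to keep the sketch short.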
