Author Topic: Any way to call data from an Apache Parquet file from a lisp routine?  (Read 671 times)

paulgdowlo

  • Mosquito
  • Posts: 2
I am keen to read data from an Apache Parquet file via AutoLISP. Apache Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval.
Any thoughts on how this might be possible?

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8926
  • AKA Daniel
You could enlist the help of Python and Pandas, read the file and return it to lisp.
Not trivial, but not impossible
https://www.theswamp.org/index.php?topic=58548.msg616160#msg616160

BIGAL

  • Swamp Rat
  • Posts: 1471
  • 40 + years of using Autocad
Post a sample; I have no idea what it looks like. If it's fixed-length words, then you can use substr.
A man who never made a mistake never made anything

It's Alive!

Here's a sample I found, a bit too big.

Code - Python:
from pyrx_imp import Rx, Ge, Gi, Db, Ap, Ed, Sm
import traceback
import pandas as pd  # read_parquet needs a Parquet engine (fastparquet or pyarrow)


def PyRxLisp_read_parquet(args):
    try:
        # TODO: read path from args
        path = "E:\\sample.parquet"
        df = pd.read_parquet(path)

        # TODO: headers = df.columns.values.tolist()
        datas = df.values.tolist()
        # Build a nested list: one sublist of strings per row
        rb = [(Rx.LispType.kListBegin, 0)]
        for data in datas:
            rb.append((Rx.LispType.kListBegin, 0))
            for value in data:
                rb.append((Rx.LispType.kText, "{}".format(value)))
            rb.append((Rx.LispType.kListEnd, 0))
        rb.append((Rx.LispType.kListEnd, 0))
        return rb

    except Exception as err:
        # Print the traceback; the function then returns None
        traceback.print_exception(err)

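The two TODOs above leave the headers out, and the sample output shows a missing value coming through as "nan". A minimal sketch of one way to handle both is below. It is not part of the original post: `LispType` is a stand-in enum so the code runs outside AutoCAD (inside PyRx you would pass `Rx.LispType` instead), and `cell_to_text` and `rows_to_resbuf` are hypothetical helper names.

```python
import math
from enum import Enum, auto


class LispType(Enum):
    """Stand-in for Rx.LispType so this sketch runs outside AutoCAD."""
    kListBegin = auto()
    kListEnd = auto()
    kText = auto()


def cell_to_text(value):
    """Format one cell as text, mapping None and float NaN to an empty string."""
    if value is None:
        return ""
    if isinstance(value, float) and math.isnan(value):
        return ""
    return str(value)


def rows_to_resbuf(headers, rows, types=LispType):
    """Build a nested list-of-lists buffer; the headers become the first sublist."""
    rb = [(types.kListBegin, 0)]
    for row in [headers, *rows]:
        rb.append((types.kListBegin, 0))
        rb.extend((types.kText, cell_to_text(v)) for v in row)
        rb.append((types.kListEnd, 0))
    rb.append((types.kListEnd, 0))
    return rb
```

With a DataFrame in hand, the call would be something like `rows_to_resbuf(df.columns.tolist(), df.values.tolist(), Rx.LispType)`, so the Lisp side gets the column names as the first row.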

Quote
(read_parquet)
(("2016-02-03 07:55:29" "1" "Amanda" "Jordan" "ajordan0@com.com" "Female" "1.197.201.2" "6759521864920116" "Indonesia" "3/8/1971" "49756.53" "Internal Auditor" "1E+02") ("2016-02-03 17:04:03" "2" "Albert" "Freeman" "afreeman1@is.gd" "Male" "218.111.175.34" "" "Canada" "1/16/1968" "150280.17" "Accountant IV" "") ("2016-02-03 01:09:31" "3" "Evelyn" "Morgan" "emorgan2@altervista.org" "Female" "7.161.136.94" "6767119071901597" "Russia" "2/1/1960" "144972.51" "Structural Engineer" "") ("2016-02-03 00:36:21" "4" "Denise" "Riley" "driley3@gmpg.org" "Female" "140.35.109.83" "3576031598965625" "China" "4/8/1997" "90263.05" "Senior Cost Accountant" "") ("2016-02-03 05:05:31" "5" "Carlos" "Burns" "cburns4@miitbeian.gov.cn" "" "169.113.235.40" "5602256255204850" "South Africa" "" "nan" "" "") ("2016-02-03 07:22:34" "6" "Kathryn" "White" "kwhite5@google.com" "Female" "195.131.81.179" "3583136326049310" "Indonesia" "2/25/1983" "69227.11" "Account Executive" "") ("2016-02-03 08:33:08" "7" "Samuel" "Holmes" "sholmes6@foxnews.com" "Male" "232.234.81.197" "3582641366974690" "Portugal" "12/18/1987" "14247.62" "Senior Financial Analyst" "") ("2016-02-03 06:47:06" "8" "Harry" "Howell" "hhowell7@eepurl.com" "Male" "91.235.51.73" "" "Bosnia and Herzegovina" "3/1/1962" "186469.43" "Web Developer IV" "") ("2016-02-03 03:52:53" "9" "Jose" "Foster" "jfoster8@yelp.com" "Male" "132.31.53.61" "" "South Korea" "3/27/1992" "231067.84" "Software Test Engineer I" "1E+02") ("2016-02-03 18:29:47" "10" "Emily" "Stewart" "estewart9@opensource.org" "Female" "143.28.251.245" "3574254110301671" "Nigeria" "1/28/1997" "27234.28" "Health Coach IV" "") ("2016-02-03 00:10:42" "11" "Susan" "Perkins" "sperkinsa@patch.com" "Female" "180.85.0.62" "3573823609854134" "Russia" "" "210001.95" "" "") ("2016-02-03 18:04:34" "12" "Alice" "Berry" "aberryb@wikipedia.org" "Female" "246.225.12.189" "4917830851454417" "China" "8/12/1968" "22944.53" "Quality Engineer" "") ("2016-02-03 18:48:17" "13" "Justin" "Berry" 
"jberryc@usatoday.com" "Male" "157.7.146.43" "6331109912871813274" "Zambia" "8/15/1975" "44165.46" "Structural Analysis Engineer" "") ("2016-02-03 21:46:52" "14" "Kathy" "Reynolds" "kreynoldsd@redcross.org" "Female" "81.254.172.13" "5537178462965976" "Bosnia and Herzegovina" "6/27/1970" "286592.99" "Librarian" "") ("2016-02-03 08:53:23" "15" "Dorothy" "Hudson" "dhudsone@blogger.com" "Female" "8.59.7.0" "3542586858224170" "Japan" "12/20/1989" "157099.71" "Nurse Practicioner" "<script>alert('hi')</script>") ("2016-02-03 00:44:01" "16" "Bruce" "Willis" "bwillisf@bluehost.com" "Male" "239.182.219.189" "3573030625927601" "Brazil" "" "239100.65" "" "") ("2016-02-03 00:57:45" "17" "Emily" "Andrews" "eandrewsg@cornell.edu" "Female" "29.231.180.172" "30271790537626" "Russia" "4/13/1990" "116800.65" "Food Chemist" "") ("2016-02-03 16:44:24" "18" "Stephen" "Wallace" "swallaceh@netvibes.com" "Male" "152.49.213.62" "5433943468526428" "Ukraine" "1/15/1978" "248877.99" "Account Representative I" "")

plus about 1000 more lines

It's Alive!

Reading about Apache Parquet and Arrow. It's very interesting, and apparently extremely fast.
The memory layout is by column instead of by row, and the Arrow spec recommends 64-byte alignment and padding for buffers, which suits AVX-512.
https://arrow.apache.org/docs/format/Columnar.html

I'm wondering what you're doing with this with regard to CAD?
IMHO, this is something that is best read into a compute engine, i.e. Pandas, then maybe returning the results of your analysis to lisp.
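To make the "analyse in Pandas, return only the result" idea concrete, here is a rough sketch. The column names `country` and `salary` are assumptions (the sample output above has no headers), `mean_by_group` is a hypothetical helper, and the small DataFrame is built in memory from four rows of the sample instead of calling `pd.read_parquet`.

```python
import pandas as pd


def mean_by_group(df, key, value):
    """Return (group, mean) pairs sorted by group, dropping groups with no data."""
    means = df.groupby(key)[value].mean().dropna()
    return [(str(k), float(v)) for k, v in means.sort_index().items()]


# Assumed column names; in practice df would come from pd.read_parquet(path).
df = pd.DataFrame({
    "country": ["Indonesia", "Canada", "Indonesia", "South Africa"],
    "salary": [49756.53, 150280.17, 69227.11, float("nan")],
})
summary = mean_by_group(df, "country", "salary")
```

The short list of pairs in `summary` is what would be packed into the result buffer for lisp, instead of the thousand-odd raw rows.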

paulgdowlo

Firstly, sorry for not replying earlier; I did not have access to my computer for weeks. Thank you for the suggestions. I will have a play.
The reason I am looking into this is an automation I am building to develop and maintain a CAD "standard", including creating the required attributed blocks with drop-down pick lists (search Ref11 Toolkit) from our current Esri GIS feature datasets. At the moment I use INI files for configuration, which I can produce automatically. Apache Parquet files are overkill for my needs but would keep everything in one file instead of multiple INI files. Yes, Parquet files are super cool. I use FME from Safe Software to create my automations and AcCoreConsole to create the attributed blocks.
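As a rough illustration of the "one file instead of multiple INI files" idea, the sketch below flattens an INI document into rows that could sit in a single table. The INI layout, the block name `VALVE`, the `|` separator, and the helper `ini_to_rows` are all made up for the example; the real configuration files may look quite different.

```python
import configparser


def ini_to_rows(source_name, ini_text):
    """Flatten one INI document into (source, section, key, value) rows."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case exactly as written
    parser.read_string(ini_text)
    return [
        (source_name, section, key, value)
        for section in parser.sections()
        for key, value in parser[section].items()
    ]


# Hypothetical INI content: pick-list values for one attributed block.
sample = """
[VALVE]
TYPE = Gate|Ball|Butterfly
MATERIAL = PVC|Steel
"""
rows = ini_to_rows("valve.ini", sample)
# With pandas and a Parquet engine installed, the rows could then be written with:
# pd.DataFrame(rows, columns=["source", "section", "key", "value"]).to_parquet("config.parquet")
```

Rows from every INI file can be appended to the same list before writing, which gives one Parquet file that can be filtered by `source` or `section` when read back.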