Author Topic: Extracting Excel Data  (Read 7115 times)

0 Members and 1 Guest are viewing this topic.

WILL HATCH

  • Bull Frog
  • Posts: 450
Extracting Excel Data
« on: March 28, 2018, 07:28:19 PM »
For years I've used Excel interop to dump spreadsheets into object arrays, but recently was forced to get familiar with OLEDB providers connecting with Access databases and realized there's a better way. This smashes old methods with respect to read performance!

Code - C#: [Select]
  1.         public static DataSet ExtractData(string path)
  2.         {
  3.             try
  4.             {
  5.                 DataSet Data = new DataSet(Path.GetFileNameWithoutExtension(path));
  6.                 string ConnectionString = string.Format(
  7.                     "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=No;IMEX=1\";",
  8.                     path);
  9.                 using (var Connection = new OleDbConnection(ConnectionString))
  10.                 {
  11.                     Connection.Open();
  12.                     var Command = new OleDbCommand("", Connection);
  13.                     // Get all Sheets in Excel File
  14.                     DataTable Sheets = Connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
  15.  
  16.                     // Loop through all Sheets to get data
  17.                     foreach (DataRow Row in Sheets.Rows)
  18.                     {
  19.                         string SheetName = Row["TABLE_NAME"].ToString();
  20.  
  21.                         if (!SheetName.EndsWith("$"))
  22.                             continue;
  23.  
  24.                         // Get all rows from the Sheet
  25.                         Command.CommandText = "SELECT * FROM [" + SheetName + "]";
  26.  
  27.                         DataTable Sheet = new DataTable();
  28.                         Sheet.TableName = SheetName;
  29.  
  30.                         OleDbDataAdapter adapter = new OleDbDataAdapter(Command);
  31.                         adapter.Fill(Sheet);
  32.  
  33.                         Data.Tables.Add(Sheet);
  34.                     }
  35.                     Connection.Close();
  36.                     return Data;
  37.                 }
  38.             }
  39.             catch (System.Exception e)
  40.             {
  41.                 throw e.InnerException;
  42.             }
  43.         }

n.yuan

  • Bull Frog
  • Posts: 348
Re: Extracting Excel Data
« Reply #1 on: March 29, 2018, 01:40:23 PM »
Considering the reality that people so often use Excel sheet to store small amount of data, it makes sense to access data in Excel sheet without running Excel app in parallel with AutoCAD app. This was I always did in the past, until MS stopped ship Jet Engine with Windows OS (when MS Access 2007 came out, which stopped using Jet Engine, and changed to use MS Access DB Engine).

The biggest issue with using MS Access DB Engine, is that it has to be installed (with admin privilege), and also has to be 32 or 64=bit explicitly.  Even the computer has MC Access itself installed, it is likely 32-bit version. If AutoCAD is 64-bit, 64-bit Access DB engine has to be installed. If the user happened to stuck with older MS Office/Access (2007/2010), it may not be possible to have 32-bit MS Access and 64-bit Access DB Engine in the same computer. Since it is now year 2018, this probably not a big deal any more, but extra installation with admin privilege alone makes things a bit troublesome.

I'd consider using OpenXml to access data stored in *.xls/*.xlsx would be much better approach, if there is no need to actually open the sheet file in Excel app in the workflow. This way, you only have one extra DLL go with your Add-in DLLs. There are also a lot third party tool or toolkit based on OpenXml, many of them are free and very easy to use (but I always use OpenXml directly, it is easy enough).

dgorsman

  • Water Moccasin
  • Posts: 2437
Re: Extracting Excel Data
« Reply #2 on: March 29, 2018, 02:28:25 PM »
Excel does tend to fall short as data storage.  Aside from access speed, there's the very real possibility of users hiding/deleting rows, columns, and other important parts.  The latter can also be a problem with Access databases too.  Yes, they're easier for humans to work with but there needs to be consideration of how often manual changes are necessary and how much data needs to be stored.  If it's "once in a blue moon" or relatively small amounts of data, a more data-centric and generically supported format like XML is the better choice.
If you are going to fly by the seat of your pants, expect friction burns.

try {GreatPower;}
   catch (notResponsible)
      {NextTime(PlanAhead);}
   finally
      {MasterBasics;}

JohnK

  • Administrator
  • Seagull
  • Posts: 10605
Re: Extracting Excel Data
« Reply #3 on: March 29, 2018, 03:43:52 PM »
Coincidence! I was just dipping my toes back into the C# pool to read some Excel data so this post will come in handy. Thanks!
TheSwamp.org (serving the CAD community since 2003)
Member location map - Add yourself

Donate to TheSwamp.org

JohnK

  • Administrator
  • Seagull
  • Posts: 10605
Re: Extracting Excel Data
« Reply #4 on: March 30, 2018, 03:06:42 PM »
Holly bananas that OLEDB method is fast.

Note: I had to install https://www.microsoft.com/en-us/download/details.aspx?id=13255 - the x64 version did not work - for this method to work for me.
TheSwamp.org (serving the CAD community since 2003)
Member location map - Add yourself

Donate to TheSwamp.org

MickD

  • King Gator
  • Posts: 3619
  • (x-in)->[process]->(y-out) ... simples!
Re: Extracting Excel Data
« Reply #5 on: March 30, 2018, 05:49:50 PM »
I agree with going the OpenXML route if you _have_ to stick with Excel otherwise I'd use SQLite. It doesn't have a UI but there are free viewer/editors and for a simple flat db a .Net dataset GUI form would be a snap for basic editing/viewing.
It can store millions of records and can be stored as a simple file with a project or on a server.
"Short cuts make long delays,' argued Pippin.”
J.R.R. Tolkien

JohnK

  • Administrator
  • Seagull
  • Posts: 10605
Re: Extracting Excel Data
« Reply #6 on: March 31, 2018, 08:51:07 AM »
On that note, if the data is just tabular you'd be hard-pressed to beat CSV.
TheSwamp.org (serving the CAD community since 2003)
Member location map - Add yourself

Donate to TheSwamp.org

CADbloke

  • Bull Frog
  • Posts: 342
  • Crash Test Dummy
Re: Extracting Excel Data
« Reply #7 on: April 02, 2018, 06:39:38 AM »
+1 for SQLite, I use it in tvCAD. LinqPad & DB Browser are 2 ways to query & edit the data. https://github.com/msallin/SQLiteCodeFirst is an excellent Entity Framework bridge. SQLite  is much faster than querying Excel (yes, even with Entity Framework). Also, I try not to encourage the use of spreadsheets as a database, they're not. If you must, EPPlus is another good option.

The main problems I see with CSV are when it is edited in Excel and the formatting is butchered. Apart from that, yes, it is a great format for simple tabular data. In .NET, https://github.com/kentcb/KBCsv is a good option to keep it simple.

JohnK

  • Administrator
  • Seagull
  • Posts: 10605
Re: Extracting Excel Data
« Reply #8 on: April 02, 2018, 09:06:38 AM »
.csv is one area that MS Windows never caught up to the *nix world on. Huge amounts of tabular data has always been able to be parsed in ms with built in tools like awk and what not. Mark wrote a tutorial on the topic.

http://www.theswamp.org/index.php?topic=8476.msg108378#msg108378

Also, you'd typically edit .csv files in your editor not excel; the mind-set being that you can easily parse plain text data through several "filters" (programs) to get amazing results quickly. -i.e. take a bunch of plain text->parse->parse->gnuplot->beautiful output.

But, .csv is meant for tabular type data (static data; stuff that's not going to be changing much. Like, weather data and whatnot.) If you have more dynamic data, then a database is better. IMO It would be dumb to keep a database for old weather data (extra overheard for little to no benefit).
TheSwamp.org (serving the CAD community since 2003)
Member location map - Add yourself

Donate to TheSwamp.org

Keith Brown

  • Swamp Rat
  • Posts: 601
Re: Extracting Excel Data
« Reply #9 on: April 05, 2018, 11:43:52 AM »
When and if i have to work with a csv file I use http://joshclose.github.io/CsvHelper/


For working with excel files I use https://www.gemboxsoftware.com/spreadsheet.  The free version is fully functional but limited to 150 rows per sheet and 5 sheets.

Both are extremely fast in my opinion and easy to use.
Keith Brown | AutoCAD MEP Blog | RSS Feed
AutoCAD MEP 2014 / Revit MEP 2014 / EastCoast CAD/CAM addon / Visual Studio 2013

MP

  • Seagull
  • Posts: 17750
  • Have thousands of dwgs to process? Contact me.
Re: Extracting Excel Data
« Reply #10 on: April 05, 2018, 12:57:34 PM »
I rec'd a "verified fraud page / threat source" warning from that first link.
Engineering Technologist • CAD Automation Practitioner
Automation ▸ Design ▸ Drafting ▸ Document Control ▸ Client
cadanalyst@gmail.comhttp://cadanalyst.slack.comhttp://linkedin.com/in/cadanalyst

Keith Brown

  • Swamp Rat
  • Posts: 601
Re: Extracting Excel Data
« Reply #11 on: April 05, 2018, 01:00:26 PM »
I rec'd a "verified fraud page / threat source" warning from that first link.


Weird, what software is throwing that warning?  Here is the github page.  https://github.com/joshclose/csvhelper
Keith Brown | AutoCAD MEP Blog | RSS Feed
AutoCAD MEP 2014 / Revit MEP 2014 / EastCoast CAD/CAM addon / Visual Studio 2013

MP

  • Seagull
  • Posts: 17750
  • Have thousands of dwgs to process? Contact me.
Re: Extracting Excel Data
« Reply #12 on: April 05, 2018, 01:04:06 PM »
Trend micro. As I'm at work you'll forgive me if I don't hit any more urls to invite the IT Police to my desk.

(hope it's a false alarm)
Engineering Technologist • CAD Automation Practitioner
Automation ▸ Design ▸ Drafting ▸ Document Control ▸ Client
cadanalyst@gmail.comhttp://cadanalyst.slack.comhttp://linkedin.com/in/cadanalyst

Mark

  • Custom Title
  • Seagull
  • Posts: 28753
Re: Extracting Excel Data
« Reply #13 on: April 05, 2018, 02:31:56 PM »
Trend micro. As I'm at work you'll forgive me if I don't hit any more urls to invite the IT Police to my desk.

(hope it's a false alarm)
I visited both and no IT cops have showed up yet. <fingers crossed> LOL
TheSwamp.org  (serving the CAD community since 2003)

CAB

  • Global Moderator
  • Seagull
  • Posts: 10401
Re: Extracting Excel Data
« Reply #14 on: April 05, 2018, 02:32:15 PM »
It ran through this check OK
https://www.virustotal.com/#/home/url
I've reached the age where the happy hour is a nap. (°¿°)
Windows 10 core i7 4790k 4Ghz 32GB GTX 970
Please support this web site.