Author Topic: Extracting Excel Data  (Read 1221 times)

0 Members and 1 Guest are viewing this topic.

WILL HATCH

  • Bull Frog
  • Posts: 448
Extracting Excel Data
« on: March 28, 2018, 07:28:19 PM »
For years I've used Excel interop to dump spreadsheets into object arrays, but recently was forced to get familiar with OLEDB providers connecting with Access databases and realized there's a better way. This smashes old methods with respect to read performance!

Code - C#: [Select]
  1.        public static DataSet ExtractData(string path)
  2.        {
  3.            try
  4.            {
  5.                DataSet Data = new DataSet(Path.GetFileNameWithoutExtension(path));
  6.                string ConnectionString = string.Format(
  7.                    "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=No;IMEX=1\";",
  8.                    path);
  9.                using (var Connection = new OleDbConnection(ConnectionString))
  10.                {
  11.                    Connection.Open();
  12.                    var Command = new OleDbCommand("", Connection);
  13.                    // Get all Sheets in Excel File
  14.                    DataTable Sheets = Connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
  15.  
  16.                    // Loop through all Sheets to get data
  17.                    foreach (DataRow Row in Sheets.Rows)
  18.                    {
  19.                        string SheetName = Row["TABLE_NAME"].ToString();
  20.  
  21.                        if (!SheetName.EndsWith("$"))
  22.                            continue;
  23.  
  24.                        // Get all rows from the Sheet
  25.                        Command.CommandText = "SELECT * FROM [" + SheetName + "]";
  26.  
  27.                        DataTable Sheet = new DataTable();
  28.                        Sheet.TableName = SheetName;
  29.  
  30.                        OleDbDataAdapter adapter = new OleDbDataAdapter(Command);
  31.                        adapter.Fill(Sheet);
  32.  
  33.                        Data.Tables.Add(Sheet);
  34.                    }
  35.                    Connection.Close();
  36.                    return Data;
  37.                }
  38.            }
  39.            catch (System.Exception e)
  40.            {
  41.                throw e.InnerException;
  42.            }
  43.        }

n.yuan

  • Bull Frog
  • Posts: 262
Re: Extracting Excel Data
« Reply #1 on: March 29, 2018, 01:40:23 PM »
Considering the reality that people so often use Excel sheet to store small amount of data, it makes sense to access data in Excel sheet without running Excel app in parallel with AutoCAD app. This was I always did in the past, until MS stopped ship Jet Engine with Windows OS (when MS Access 2007 came out, which stopped using Jet Engine, and changed to use MS Access DB Engine).

The biggest issue with using MS Access DB Engine, is that it has to be installed (with admin privilege), and also has to be 32 or 64=bit explicitly.  Even the computer has MC Access itself installed, it is likely 32-bit version. If AutoCAD is 64-bit, 64-bit Access DB engine has to be installed. If the user happened to stuck with older MS Office/Access (2007/2010), it may not be possible to have 32-bit MS Access and 64-bit Access DB Engine in the same computer. Since it is now year 2018, this probably not a big deal any more, but extra installation with admin privilege alone makes things a bit troublesome.

I'd consider using OpenXml to access data stored in *.xls/*.xlsx would be much better approach, if there is no need to actually open the sheet file in Excel app in the workflow. This way, you only have one extra DLL go with your Add-in DLLs. There are also a lot third party tool or toolkit based on OpenXml, many of them are free and very easy to use (but I always use OpenXml directly, it is easy enough).

dgorsman

  • Water Moccasin
  • Posts: 2337
Re: Extracting Excel Data
« Reply #2 on: March 29, 2018, 02:28:25 PM »
Excel does tend to fall short as data storage.  Aside from access speed, there's the very real possibility of users hiding/deleting rows, columns, and other important parts.  The latter can also be a problem with Access databases too.  Yes, they're easier for humans to work with but there needs to be consideration of how often manual changes are necessary and how much data needs to be stored.  If it's "once in a blue moon" or relatively small amounts of data, a more data-centric and generically supported format like XML is the better choice.
If you are going to fly by the seat of your pants, expect friction burns.

try {GreatPower;}
   catch (notResponsible)
      {NextTime(PlanAhead);}
   finally
      {MasterBasics;}

John Kaul (Se7en)

  • Administrator
  • Needs a day job
  • Posts: 9117
Re: Extracting Excel Data
« Reply #3 on: March 29, 2018, 03:43:52 PM »
Coincidence! I was just dipping my toes back into the C# pool to read some Excel data so this post will come in handy. Thanks!
“Common sense is not so common.” ~Voltaire

--> Donate to TheSwamp.org <--

John Kaul (Se7en)

  • Administrator
  • Needs a day job
  • Posts: 9117
Re: Extracting Excel Data
« Reply #4 on: March 30, 2018, 03:06:42 PM »
Holly bananas that OLEDB method is fast.

Note: I had to install https://www.microsoft.com/en-us/download/details.aspx?id=13255 - the x64 version did not work - for this method to work for me.
“Common sense is not so common.” ~Voltaire

--> Donate to TheSwamp.org <--

MickD

  • Gator
  • Posts: 3145
  • I don't need a job, I need Money!!
Re: Extracting Excel Data
« Reply #5 on: March 30, 2018, 05:49:50 PM »
I agree with going the OpenXML route if you _have_ to stick with Excel otherwise I'd use SQLite. It doesn't have a UI but there are free viewer/editors and for a simple flat db a .Net dataset GUI form would be a snap for basic editing/viewing.
It can store millions of records and can be stored as a simple file with a project or on a server.
"A language that doesn’t have everything is actually easier to program in than some that do."

        — Dennis M. Ritchie

John Kaul (Se7en)

  • Administrator
  • Needs a day job
  • Posts: 9117
Re: Extracting Excel Data
« Reply #6 on: March 31, 2018, 08:51:07 AM »
On that note, if the data is just tabular you'd be hard-pressed to beat CSV.
“Common sense is not so common.” ~Voltaire

--> Donate to TheSwamp.org <--

CADbloke

  • Bull Frog
  • Posts: 298
  • Crash Test Dummy
Re: Extracting Excel Data
« Reply #7 on: April 02, 2018, 06:39:38 AM »
+1 for SQLite, I use it in tvCAD. LinqPad & DB Browser are 2 ways to query & edit the data. https://github.com/msallin/SQLiteCodeFirst is an excellent Entity Framework bridge. SQLite  is much faster than querying Excel (yes, even with Entity Framework). Also, I try not to encourage the use of spreadsheets as a database, they're not. If you must, EPPlus is another good option.

The main problems I see with CSV are when it is edited in Excel and the formatting is butchered. Apart from that, yes, it is a great format for simple tabular data. In .NET, https://github.com/kentcb/KBCsv is a good option to keep it simple.

John Kaul (Se7en)

  • Administrator
  • Needs a day job
  • Posts: 9117
Re: Extracting Excel Data
« Reply #8 on: April 02, 2018, 09:06:38 AM »
.csv is one area that MS Windows never caught up to the *nix world on. Huge amounts of tabular data has always been able to be parsed in ms with built in tools like awk and what not. Mark wrote a tutorial on the topic.

http://www.theswamp.org/index.php?topic=8476.msg108378#msg108378

Also, you'd typically edit .csv files in your editor not excel; the mind-set being that you can easily parse plain text data through several "filters" (programs) to get amazing results quickly. -i.e. take a bunch of plain text->parse->parse->gnuplot->beautiful output.

But, .csv is meant for tabular type data (static data; stuff that's not going to be changing much. Like, weather data and whatnot.) If you have more dynamic data, then a database is better. IMO It would be dumb to keep a database for old weather data (extra overheard for little to no benefit).
“Common sense is not so common.” ~Voltaire

--> Donate to TheSwamp.org <--

Keith Brown

  • Swamp Rat
  • Posts: 593
Re: Extracting Excel Data
« Reply #9 on: April 05, 2018, 11:43:52 AM »
When and if i have to work with a csv file I use http://joshclose.github.io/CsvHelper/


For working with excel files I use https://www.gemboxsoftware.com/spreadsheet.  The free version is fully functional but limited to 150 rows per sheet and 5 sheets.

Both are extremely fast in my opinion and easy to use.
Keith Brown | AutoCAD MEP Blog | RSS Feed
AutoCAD MEP 2014 / Revit MEP 2014 / EastCoast CAD/CAM addon / Visual Studio 2013

MP

  • Seagull
  • Posts: 17214
Re: Extracting Excel Data
« Reply #10 on: April 05, 2018, 12:57:34 PM »
I rec'd a "verified fraud page / threat source" warning from that first link.
\|// Set goal. Experiment tirelessly until
|Oo| practice has become expertise.  Loop.
|- | LinkedIn | Dropbox | About

Keith Brown

  • Swamp Rat
  • Posts: 593
Re: Extracting Excel Data
« Reply #11 on: April 05, 2018, 01:00:26 PM »
I rec'd a "verified fraud page / threat source" warning from that first link.


Weird, what software is throwing that warning?  Here is the github page.  https://github.com/joshclose/csvhelper
Keith Brown | AutoCAD MEP Blog | RSS Feed
AutoCAD MEP 2014 / Revit MEP 2014 / EastCoast CAD/CAM addon / Visual Studio 2013

MP

  • Seagull
  • Posts: 17214
Re: Extracting Excel Data
« Reply #12 on: April 05, 2018, 01:04:06 PM »
Trend micro. As I'm at work you'll forgive me if I don't hit any more urls to invite the IT Police to my desk.

(hope it's a false alarm)
\|// Set goal. Experiment tirelessly until
|Oo| practice has become expertise.  Loop.
|- | LinkedIn | Dropbox | About

Mark

  • Custom Title
  • Administrator
  • Seagull
  • Posts: 28561
Re: Extracting Excel Data
« Reply #13 on: April 05, 2018, 02:31:56 PM »
Trend micro. As I'm at work you'll forgive me if I don't hit any more urls to invite the IT Police to my desk.

(hope it's a false alarm)
I visited both and no IT cops have showed up yet. <fingers crossed> LOL
TheSwamp.org  (serving the CAD community since 2003)

CAB

  • Global Moderator
  • Seagull
  • Posts: 10312
Re: Extracting Excel Data
« Reply #14 on: April 05, 2018, 02:32:15 PM »
It ran through this check OK
https://www.virustotal.com/#/home/url
I've reached the age where the happy hour is a nap. (°¿°)
Windows 10 core i7 4790k 4Ghz 32GB GTX 970
Please support this web site.

MP

  • Seagull
  • Posts: 17214
Re: Extracting Excel Data
« Reply #15 on: April 05, 2018, 02:33:28 PM »
Good to hear -- thanks for checking guys -- rather err on the side of caution -- cheers.
\|// Set goal. Experiment tirelessly until
|Oo| practice has become expertise.  Loop.
|- | LinkedIn | Dropbox | About

WILL HATCH

  • Bull Frog
  • Posts: 448
Re: Extracting Excel Data
« Reply #16 on: April 05, 2018, 10:12:16 PM »
Holly bananas that OLEDB method is fast.

Note: I had to install https://www.microsoft.com/en-us/download/details.aspx?id=13255 - the x64 version did not work - for this method to work for me.

Yeah, it made me squeal a bit when I ran it the first time!

For all the other inputs, thanks and I agree that direct xml would be better. I'm working in a very MS oriented engineering shop and the engineers here love working in Excel. No need for additional installation, it just worked.

Keith Brown

  • Swamp Rat
  • Posts: 593
Re: Extracting Excel Data
« Reply #17 on: April 09, 2018, 02:31:19 PM »
For all the other inputs, thanks and I agree that direct xml would be better. I'm working in a very MS oriented engineering shop and the engineers here love working in Excel. No need for additional installation, it just worked.


The gembox solution is an API that you use that does not require excel be installed on the machine so there is no secondary installation needed for your engineers.  It is very fast and easy to learn.
Keith Brown | AutoCAD MEP Blog | RSS Feed
AutoCAD MEP 2014 / Revit MEP 2014 / EastCoast CAD/CAM addon / Visual Studio 2013

WILL HATCH

  • Bull Frog
  • Posts: 448
Re: Extracting Excel Data
« Reply #18 on: April 09, 2018, 03:06:43 PM »
Thanks Keith, it looks pretty awesome. The licensing is a bit awkward as I pay the same to read a couple large files in one app as I do to bundle it in my app and distribute freely... but I've also  spent more time on this issue in the past than I could buy back for that price  :uglystupid2:

WILL HATCH

  • Bull Frog
  • Posts: 448
Re: Extracting Excel Data
« Reply #19 on: April 12, 2018, 06:22:26 PM »
So I just received some code today from a colleague that references a library I've never heard of before. I love that it's a nuget package!
https://www.nuget.org/packages/FastExcel/