TheSwamp

Code Red => .NET => Topic started by: WILL HATCH on March 28, 2018, 07:28:19 PM

Title: Extracting Excel Data
Post by: WILL HATCH on March 28, 2018, 07:28:19 PM
For years I've used Excel interop to dump spreadsheets into object arrays, but recently was forced to get familiar with OLEDB providers connecting with Access databases and realized there's a better way. This smashes old methods with respect to read performance!

Code - C#: [Select]
  1.         public static DataSet ExtractData(string path)
  2.         {
  3.             try
  4.             {
  5.                 DataSet Data = new DataSet(Path.GetFileNameWithoutExtension(path));
  6.                 string ConnectionString = string.Format(
  7.                     "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0 Xml;HDR=No;IMEX=1\";",
  8.                     path);
  9.                 using (var Connection = new OleDbConnection(ConnectionString))
  10.                 {
  11.                     Connection.Open();
  12.                     var Command = new OleDbCommand("", Connection);
  13.                     // Get all Sheets in Excel File
  14.                     DataTable Sheets = Connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
  15.  
  16.                     // Loop through all Sheets to get data
  17.                     foreach (DataRow Row in Sheets.Rows)
  18.                     {
  19.                         string SheetName = Row["TABLE_NAME"].ToString();
  20.  
  21.                         if (!SheetName.EndsWith("$"))
  22.                             continue;
  23.  
  24.                         // Get all rows from the Sheet
  25.                         Command.CommandText = "SELECT * FROM [" + SheetName + "]";
  26.  
  27.                         DataTable Sheet = new DataTable();
  28.                         Sheet.TableName = SheetName;
  29.  
  30.                         OleDbDataAdapter adapter = new OleDbDataAdapter(Command);
  31.                         adapter.Fill(Sheet);
  32.  
  33.                         Data.Tables.Add(Sheet);
  34.                     }
  35.                     Connection.Close();
  36.                     return Data;
  37.                 }
  38.             }
  39.             catch (System.Exception e)
  40.             {
  41.                 throw e.InnerException;
  42.             }
  43.         }
Title: Re: Extracting Excel Data
Post by: n.yuan on March 29, 2018, 01:40:23 PM
Considering the reality that people so often use Excel sheet to store small amount of data, it makes sense to access data in Excel sheet without running Excel app in parallel with AutoCAD app. This was I always did in the past, until MS stopped ship Jet Engine with Windows OS (when MS Access 2007 came out, which stopped using Jet Engine, and changed to use MS Access DB Engine).

The biggest issue with using MS Access DB Engine, is that it has to be installed (with admin privilege), and also has to be 32 or 64=bit explicitly.  Even the computer has MC Access itself installed, it is likely 32-bit version. If AutoCAD is 64-bit, 64-bit Access DB engine has to be installed. If the user happened to stuck with older MS Office/Access (2007/2010), it may not be possible to have 32-bit MS Access and 64-bit Access DB Engine in the same computer. Since it is now year 2018, this probably not a big deal any more, but extra installation with admin privilege alone makes things a bit troublesome.

I'd consider using OpenXml to access data stored in *.xls/*.xlsx would be much better approach, if there is no need to actually open the sheet file in Excel app in the workflow. This way, you only have one extra DLL go with your Add-in DLLs. There are also a lot third party tool or toolkit based on OpenXml, many of them are free and very easy to use (but I always use OpenXml directly, it is easy enough).
Title: Re: Extracting Excel Data
Post by: dgorsman on March 29, 2018, 02:28:25 PM
Excel does tend to fall short as data storage.  Aside from access speed, there's the very real possibility of users hiding/deleting rows, columns, and other important parts.  The latter can also be a problem with Access databases too.  Yes, they're easier for humans to work with but there needs to be consideration of how often manual changes are necessary and how much data needs to be stored.  If it's "once in a blue moon" or relatively small amounts of data, a more data-centric and generically supported format like XML is the better choice.
Title: Re: Extracting Excel Data
Post by: JohnK on March 29, 2018, 03:43:52 PM
Coincidence! I was just dipping my toes back into the C# pool to read some Excel data so this post will come in handy. Thanks!
Title: Re: Extracting Excel Data
Post by: JohnK on March 30, 2018, 03:06:42 PM
Holly bananas that OLEDB method is fast.

Note: I had to install https://www.microsoft.com/en-us/download/details.aspx?id=13255 - the x64 version did not work - for this method to work for me.
Title: Re: Extracting Excel Data
Post by: MickD on March 30, 2018, 05:49:50 PM
I agree with going the OpenXML route if you _have_ to stick with Excel otherwise I'd use SQLite. It doesn't have a UI but there are free viewer/editors and for a simple flat db a .Net dataset GUI form would be a snap for basic editing/viewing.
It can store millions of records and can be stored as a simple file with a project or on a server.
Title: Re: Extracting Excel Data
Post by: JohnK on March 31, 2018, 08:51:07 AM
On that note, if the data is just tabular you'd be hard-pressed to beat CSV.
Title: Re: Extracting Excel Data
Post by: CADbloke on April 02, 2018, 06:39:38 AM
+1 for SQLite, I use it in tvCAD. LinqPad & DB Browser are 2 ways to query & edit the data. https://github.com/msallin/SQLiteCodeFirst is an excellent Entity Framework bridge. SQLite  is much faster than querying Excel (yes, even with Entity Framework). Also, I try not to encourage the use of spreadsheets as a database, they're not. If you must, EPPlus is another good option.

The main problems I see with CSV are when it is edited in Excel and the formatting is butchered. Apart from that, yes, it is a great format for simple tabular data. In .NET, https://github.com/kentcb/KBCsv is a good option to keep it simple.
Title: Re: Extracting Excel Data
Post by: JohnK on April 02, 2018, 09:06:38 AM
.csv is one area that MS Windows never caught up to the *nix world on. Huge amounts of tabular data has always been able to be parsed in ms with built in tools like awk and what not. Mark wrote a tutorial on the topic.

http://www.theswamp.org/index.php?topic=8476.msg108378#msg108378

Also, you'd typically edit .csv files in your editor not excel; the mind-set being that you can easily parse plain text data through several "filters" (programs) to get amazing results quickly. -i.e. take a bunch of plain text->parse->parse->gnuplot->beautiful output.

But, .csv is meant for tabular type data (static data; stuff that's not going to be changing much. Like, weather data and whatnot.) If you have more dynamic data, then a database is better. IMO It would be dumb to keep a database for old weather data (extra overheard for little to no benefit).
Title: Re: Extracting Excel Data
Post by: Keith Brown on April 05, 2018, 11:43:52 AM
When and if i have to work with a csv file I use http://joshclose.github.io/CsvHelper/ (http://joshclose.github.io/CsvHelper/)


For working with excel files I use https://www.gemboxsoftware.com/spreadsheet (https://www.gemboxsoftware.com/spreadsheet).  The free version is fully functional but limited to 150 rows per sheet and 5 sheets.

Both are extremely fast in my opinion and easy to use.
Title: Re: Extracting Excel Data
Post by: MP on April 05, 2018, 12:57:34 PM
I rec'd a "verified fraud page / threat source" warning from that first link.
Title: Re: Extracting Excel Data
Post by: Keith Brown on April 05, 2018, 01:00:26 PM
I rec'd a "verified fraud page / threat source" warning from that first link.


Weird, what software is throwing that warning?  Here is the github page.  https://github.com/joshclose/csvhelper (https://github.com/joshclose/csvhelper)
Title: Re: Extracting Excel Data
Post by: MP on April 05, 2018, 01:04:06 PM
Trend micro. As I'm at work you'll forgive me if I don't hit any more urls to invite the IT Police to my desk.

(hope it's a false alarm)
Title: Re: Extracting Excel Data
Post by: Mark on April 05, 2018, 02:31:56 PM
Trend micro. As I'm at work you'll forgive me if I don't hit any more urls to invite the IT Police to my desk.

(hope it's a false alarm)
I visited both and no IT cops have showed up yet. <fingers crossed> LOL
Title: Re: Extracting Excel Data
Post by: CAB on April 05, 2018, 02:32:15 PM
It ran through this check OK
https://www.virustotal.com/#/home/url (https://www.virustotal.com/#/home/url)
Title: Re: Extracting Excel Data
Post by: MP on April 05, 2018, 02:33:28 PM
Good to hear -- thanks for checking guys -- rather err on the side of caution -- cheers.
Title: Re: Extracting Excel Data
Post by: WILL HATCH on April 05, 2018, 10:12:16 PM
Holly bananas that OLEDB method is fast.

Note: I had to install https://www.microsoft.com/en-us/download/details.aspx?id=13255 - the x64 version did not work - for this method to work for me.

Yeah, it made me squeal a bit when I ran it the first time!

For all the other inputs, thanks and I agree that direct xml would be better. I'm working in a very MS oriented engineering shop and the engineers here love working in Excel. No need for additional installation, it just worked.
Title: Re: Extracting Excel Data
Post by: Keith Brown on April 09, 2018, 02:31:19 PM
For all the other inputs, thanks and I agree that direct xml would be better. I'm working in a very MS oriented engineering shop and the engineers here love working in Excel. No need for additional installation, it just worked.


The gembox solution is an API that you use that does not require excel be installed on the machine so there is no secondary installation needed for your engineers.  It is very fast and easy to learn.
Title: Re: Extracting Excel Data
Post by: WILL HATCH on April 09, 2018, 03:06:43 PM
Thanks Keith, it looks pretty awesome. The licensing is a bit awkward as I pay the same to read a couple large files in one app as I do to bundle it in my app and distribute freely... but I've also  spent more time on this issue in the past than I could buy back for that price  :uglystupid2:
Title: Re: Extracting Excel Data
Post by: WILL HATCH on April 12, 2018, 06:22:26 PM
So I just received some code today from a colleague that references a library I've never heard of before. I love that it's a nuget package!
https://www.nuget.org/packages/FastExcel/ (https://www.nuget.org/packages/FastExcel/)