Author Topic: Best way to communicate with Excel  (Read 14820 times)

0 Members and 1 Guest are viewing this topic.

kaefer

  • Guest
Best way to communicate with Excel
« on: February 16, 2012, 05:02:11 PM »
Moving the discussion from .NET MISCELLANEOUS/GENERAL Routines, following the release of gile's ExcelStrean.

I've build ExcelStream.dll and tried to put a number on my suspicion (big chunk more fast):

Code: [Select]
time write1000Lines2DArray // 00:00:00.7738569
time write1000LinesStream  // 00:00:02.3321666

Is the stream model particularly suitable for this interop scenario? Ease of handling and automatic disposal are on its plus side. The potentially large number of interactions impair the performance. Alternatives could be a buffered stream or a full-blown database layer. Does anybody have experience with the latter in transferring data between CAD and spreadsheet?

Here's the test code (F#):
Code: [Select]
open ExcelLateBinding

let time f =
    let sw = System.Diagnostics.Stopwatch.StartNew()
    try f() finally sw.Stop()
    printfn "%A" sw.Elapsed

let write1000LinesStream() =
    use xlw = new ExcelWriter(@"C:\TEMP\1000")
    for i in 'A' .. 'F' do
        xlw.Write(string i)
    xlw.NewLine()
    for j in 1 .. 1000 do
        xlw.WriteLine[| box j |]
    xlw.Save()
   
let write1000Lines2DArray() =
    use xlw = new ExcelWriter(@"C:\TEMP\1001")
    let a : obj[,] = Array2D.create 1001 6 null
    for (i, c) in {'A' .. 'F'} |> Seq.mapi(fun i v -> i, v) do
        a.[0, i] <- box(string c)
    for j in 1 .. 1000 do
        a.[j, 0] <- box j
    let r =
        xlw.Worksheet.Get(
            "Range",
            xlw.Worksheet.Get("Cells", 1, 1),
            xlw.Worksheet.Get("Cells", a.GetLength 0, a.GetLength 1))
    r.Set("Value2", a)
    xlw.Save()

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Best way to communicate with Excel
« Reply #1 on: February 17, 2012, 02:39:01 AM »
Hi,

Adding a WriteRange(object[,] values) to ExcelWriter:
Code - C#: [Select]
  1.         public void WriteRange(object[,] values)
  2.         {
  3.             if (StreamEnded)
  4.                 throw new EndOfStreamException();
  5.             int rows = values.GetLength(0) - 1;
  6.             int cols = values.GetLength(1) - 1;
  7.             if (_rowIndex + rows > _rowCount)
  8.                 throw new IndexOutOfRangeException();
  9.             if (_columnIndex + cols > _columnCount)
  10.                 throw new IndexOutOfRangeException();
  11.             object range = Worksheet.Get("Range",
  12.                 Worksheet.Get("Cells", _rowIndex, _columnIndex),
  13.                 Worksheet.Get("Cells", _rowIndex + rows, _columnIndex + cols));
  14.             range.Set("Value2", values);
  15.             _rowIndex += rows + 1;
  16.             _columnIndex = 1;
  17.         }

I made some tests using the PasteFromClipboard() method too as recomanded by Alexander Rivilis.

Code - C#: [Select]
  1.         [CommandMethod("test1")]
  2.         public void Test1()
  3.         {
  4.             Stopwatch sw = Stopwatch.StartNew();
  5.             using (ExcelWriter xlw = new ExcelWriter(@"C:\Temp\Test1.xls"))
  6.             {
  7.                 object[] header = { "A", "B", "C", "D", "E", "F" };
  8.                 xlw.WriteLine(header);
  9.                 for (int i = 1; i < 1001; i++)
  10.                 {
  11.                     object[] arr = { i, i, i, i, i, i };
  12.                     xlw.WriteLine(arr);
  13.                 }
  14.                 xlw.Save();
  15.             }
  16.             sw.Stop();
  17.             AcAp.DocumentManager.MdiActiveDocument.Editor.WriteMessage(
  18.                 "\nWriteLine: {0}", sw.Elapsed.TotalMilliseconds);
  19.         }
  20.  
  21.         [CommandMethod("test2")]
  22.         public void Test2()
  23.         {
  24.             Stopwatch sw = Stopwatch.StartNew();
  25.             using (ExcelWriter xlw = new ExcelWriter(@"C:\Temp\Test2.xls"))
  26.             {
  27.                 object[,] arr2 = new object[1001, 6];
  28.                 object[] header = { "A", "B", "C", "D", "E", "F" };
  29.                 for (int i = 0; i < 6; i++)
  30.                 {
  31.                     arr2[0, i] = header[i];
  32.                 }
  33.                 for (int i = 1; i < 1001; i++)
  34.                 {
  35.                     for (int j = 0; j < 6; j++)
  36.                     {
  37.                         arr2[i, j] = i;
  38.                     }
  39.                 }
  40.                 xlw.WriteRange(arr2);
  41.                 xlw.Save();
  42.             }
  43.             sw.Stop();
  44.             AcAp.DocumentManager.MdiActiveDocument.Editor.WriteMessage(
  45.                 "\nWriteRange: {0}", sw.Elapsed.TotalMilliseconds);
  46.         }
  47.  
  48.         [CommandMethod("test3")]
  49.         public void Test3()
  50.         {
  51.             Stopwatch sw = Stopwatch.StartNew();
  52.             using (ExcelWriter xlw = new ExcelWriter(@"C:\Temp\Test3.xls"))
  53.             {
  54.                 StringBuilder sb = new StringBuilder("A\tB\tC\tD\tE\tF\n");
  55.                 for (int i = 1; i < 1001; i++)
  56.                 {
  57.                     for (int j = 0; j < 5; j++)
  58.                     {
  59.                         sb.Append(i);
  60.                         sb.Append("\t");
  61.                     }
  62.                     sb.Append(i);
  63.                     sb.Append("\n");
  64.                 }
  65.                 System.Windows.Forms.Clipboard.SetText(sb.ToString());
  66.                 xlw.PasteFromClipboard();
  67.                 xlw.Save();
  68.             }
  69.             sw.Stop();
  70.             AcAp.DocumentManager.MdiActiveDocument.Editor.WriteMessage(
  71.                 "\nPasteFromClipboard: {0}", sw.Elapsed.TotalMilliseconds);
  72.         }

Here're the results I got:

Commande: test1
WriteLine: 9741.4183

Commande: test2
WriteRange: 376.182

Commande: test3
PasteFromClipboard: 379.2482
Speaking English as a French Frog

vegbruiser

  • Guest
Re: Best way to communicate with Excel
« Reply #2 on: February 17, 2012, 06:30:45 AM »
Having looked at the ExcelReader Class would a ReadRange() method be possible (is it even worth it?) and do you think it would give (seemingly) the same efficiencies as WriteRange()?

kaefer

  • Guest
Re: Best way to communicate with Excel
« Reply #3 on: February 17, 2012, 06:56:33 AM »
Having looked at the ExcelReader Class would a ReadRange() method be possible (is it even worth it?) and do you think it would give (seemingly) the same efficiencies as WriteRange()?

I'd say it would, with UsedRange as the property on the Excel side to operate on. Beware of corner cases: An empty sheet returns null, a sheet with a single cell returns that cells' value. Everything else should come over as a two-dimensional array of objects.

fixo

  • Guest
Re: Best way to communicate with Excel
« Reply #4 on: February 17, 2012, 07:17:20 AM »
Isn't it a time to digg into OpenXML API? | :) |

MexicanCustard

  • Swamp Rat
  • Posts: 705
Re: Best way to communicate with Excel
« Reply #5 on: February 17, 2012, 08:04:42 AM »
Isn't it a time to digg into OpenXML API? | :) |

Or better yet ClosedXML http://closedxml.codeplex.com/
Revit 2019, AMEP 2019 64bit Win 10

vegbruiser

  • Guest
Re: Best way to communicate with Excel
« Reply #6 on: February 17, 2012, 09:38:00 AM »
I'm no expert but shouldn't this:

Code: [Select]
object range = Worksheet.Get("Range",
                Worksheet.Get("Cells", _rowIndex, _columnIndex),
                Worksheet.Get("Cells", _rowIndex + rows, _columnIndex + cols));

be like this:

Code: [Select]
dynamic range = Worksheet.Get("Range",
                Worksheet.Get("Cells", _rowIndex, _columnIndex),
                Worksheet.Get("Cells", _rowIndex + rows, _columnIndex + cols));

If I use object instead of dynamic I get an error thus:

Quote
'object' does not contain a definition for 'Set' and no extension method 'Set' accepting a first argument of type 'object' could be found (are you missing a using directive or an assembly reference?)

But using dynamic crashes AutoCAD?

 :?

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Best way to communicate with Excel
« Reply #7 on: February 17, 2012, 03:24:39 PM »
I'm no expert but shouldn't this:

Code: [Select]
object range = Worksheet.Get("Range",
                Worksheet.Get("Cells", _rowIndex, _columnIndex),
                Worksheet.Get("Cells", _rowIndex + rows, _columnIndex + cols));

be like this:

Code: [Select]
dynamic range = Worksheet.Get("Range",
                Worksheet.Get("Cells", _rowIndex, _columnIndex),
                Worksheet.Get("Cells", _rowIndex + rows, _columnIndex + cols));

If I use object instead of dynamic I get an error thus:

Quote
'object' does not contain a definition for 'Set' and no extension method 'Set' accepting a first argument of type 'object' could be found (are you missing a using directive or an assembly reference?)

But using dynamic crashes AutoCAD?

 :?

The WriteRange() method have to be added to the ExcelWriter class (in ExcelStream_LateBinding) which uses the LateBinding class where the Set() extension method is defined.

If you use ExcelStream_Dynamic which uses the dynamic type instead of late binding, you have to write:
Code: [Select]
            dynamic range = Worksheet.Range[
                Worksheet.Cells[_rowIndex, _columnIndex],
                Worksheet.Cells[_rowIndex + rows, _columnIndex + cols]];
            range.Value2 = values;

During this week-end I'll try to update the codes.
Speaking English as a French Frog

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Best way to communicate with Excel
« Reply #8 on: February 18, 2012, 08:23:14 AM »
Hi,

I updated the downloads for the ExcelStreamLateBinding and ExcelStreamDynamic libraries.
I added some overloaded ExcelReader.ReadRange() and ExcelWriter.WriteRange() methods and modify the  ExcelReader.ReadToEnd() method which require or return a two dimensional object array using the way showned by kaefer (thanks again).

During the tests I did, I noticed that the late binding route is a little faster than the dynamic one.

Fixo, MexicanCustard,
I know the XML or ADO route allows to read/write Excel much more faster (because this way doesn't need to start an Excel process), but my goal when I wrote the ExcelStream little library was to provides methods to make excel read/write easier and avoid version or platform dependencies.
« Last Edit: February 18, 2012, 09:14:27 AM by gile »
Speaking English as a French Frog

vegbruiser

  • Guest
Re: Best way to communicate with Excel
« Reply #9 on: February 18, 2012, 05:20:07 PM »
I'm no expert but shouldn't this:

Code: [Select]
object range = Worksheet.Get("Range",
                Worksheet.Get("Cells", _rowIndex, _columnIndex),
                Worksheet.Get("Cells", _rowIndex + rows, _columnIndex + cols));

be like this:

Code: [Select]
dynamic range = Worksheet.Get("Range",
                Worksheet.Get("Cells", _rowIndex, _columnIndex),
                Worksheet.Get("Cells", _rowIndex + rows, _columnIndex + cols));

If I use object instead of dynamic I get an error thus:

Quote
'object' does not contain a definition for 'Set' and no extension method 'Set' accepting a first argument of type 'object' could be found (are you missing a using directive or an assembly reference?)

But using dynamic crashes AutoCAD?

 :?

The WriteRange() method have to be added to the ExcelWriter class (in ExcelStream_LateBinding) which uses the LateBinding class where the Set() extension method is defined.

If you use ExcelStream_Dynamic which uses the dynamic type instead of late binding, you have to write:
Code: [Select]
            dynamic range = Worksheet.Range[
                Worksheet.Cells[_rowIndex, _columnIndex],
                Worksheet.Cells[_rowIndex + rows, _columnIndex + cols]];
            range.Value2 = values;

During this week-end I'll try to update the codes.
That explains why it didn't work for me then - I only added it to the Dynamic version, not the late-binding one.

I will download the updated source files on Monday. :)

Thanks,

Alex.

kaefer

  • Guest
Re: Best way to communicate with Excel
« Reply #10 on: February 19, 2012, 06:50:03 AM »
I added some overloaded ExcelReader.ReadRange() and ExcelWriter.WriteRange() methods and modify the  ExcelReader.ReadToEnd() method

You may be on to something here. The built-in functionality for data exchange between AutoCAD and Excel is lacking in flexibility. I imagine plenty of purpose-built interop tools being around; they would benefit from an easy-to-use programming model.

Take your sample program: Transfer the attributes of the block references in a selection set. There's only a little to add, and one of my tools dating from Visual Lisp days could be sent into retirement:
  • Conversion from System.Data.DataTable, such that DataColumn.DataType determines the NumberFormat of their cells,
  • Conversion to System.Data.DataTable,
  • User interaction issues arising from Workbook.Close/Application.Quit.

To preserve the format, I'd like to suggest a WriteTable instance method for ExcelWriter (not sure if the stream position is correctly determined).
Code - C#: [Select]
  1. public void WriteTable(DataTable table, int startRow, int startCol)
  2. {
  3.     if (StreamEnded)
  4.         throw new EndOfStreamException();
  5.     int rows = table.Rows.Count;
  6.     int cols = table.Columns.Count - 1;
  7.     if (startRow + rows > _rowCount)
  8.         throw new EndOfStreamException();
  9.     if (startCol + cols > _columnCount)
  10.         throw new EndOfStreamException();
  11.     object range = Worksheet.Get("Range",
  12.         Worksheet.Get("Cells", startRow, startCol),
  13.         Worksheet.Get("Cells", startRow + rows, startCol + cols));
  14.     object rangeFormat = range.Get("Range",
  15.         range.Get("Cells", 1, 1),
  16.         range.Get("Cells", 1, cols + 1));
  17.     rangeFormat.Get("Font").Set("Bold", true);
  18.     rangeFormat.Set("NumberFormat", "@");
  19.     for(int i = 0; i < table.Columns.Count; i++)
  20.     {
  21.         Type dataType = table.Columns[i].DataType;
  22.         string numberFormat = null;
  23.         if (dataType == typeof(string))
  24.             numberFormat = "@";
  25.         else if(dataType == typeof(int) || dataType == typeof(double))
  26.             numberFormat = "0";
  27.         if (numberFormat == null) continue;
  28.         rangeFormat = range.Get("Range",
  29.             range.Get("Cells", 2, i + 1),
  30.             range.Get("Cells", rows + 1, i + 1));
  31.         rangeFormat.Set("NumberFormat", numberFormat);
  32.     }
  33.     range.Set("Value2", table.ToArray2D());
  34.     _rowIndex = startRow + rows + 1;
  35.     _columnIndex = 1;
  36. }

This needs an extension on DataTable:
Code - C#: [Select]
  1. public static class DatatableExtensions
  2. {
  3.     public static object[,] ToArray2D(this System.Data.DataTable table)
  4.     {
  5.         object[,] a = new object[table.Rows.Count + 1, table.Columns.Count];
  6.         for (int i = 0; i < table.Columns.Count; i++)
  7.         {
  8.             a[0, i] = (object)table.Columns[i].ColumnName;
  9.         }
  10.         for (int j = 0; j < table.Rows.Count; j++)
  11.         {
  12.             for (int i = 0; i < table.Columns.Count; i++)
  13.             {
  14.                 a[j + 1, i] = table.Rows[j][i];
  15.             }
  16.         }
  17.         return a;
  18.     }
  19. }

The other direction, conversion from the incoming object to a DataTable. Note that there's no need to rebase the array.
Code - C#: [Select]
  1. public static System.Data.DataTable OfArray2D(object o)
  2. {
  3.     System.Data.DataTable table = new System.Data.DataTable();
  4.     if (o != null)
  5.     {
  6.         object[,] a = o as object[,];
  7.         if (a == null)
  8.         {
  9.             a = new object[1, 1] { { o } };
  10.         }
  11.         int jmin = a.GetLowerBound(0), jmax = a.GetUpperBound(0);
  12.         int imin = a.GetLowerBound(1), imax = a.GetUpperBound(1);
  13.  
  14.         for (int i = imin; i <= imax; i++)
  15.         {
  16.             table.Columns.Add(a[jmin, i].ToString(), typeof(object));
  17.         }
  18.         for (int j = jmin + 1; j <= jmax; j++)
  19.         {
  20.             table.Rows.Add();
  21.             for (int i = imin; i <= imax; i++)
  22.             {
  23.                 table.Rows[table.Rows.Count - 1][i] = a[j, i];
  24.             }
  25.         }
  26.     }
  27.     return table;
  28. }

To use those methods, change the calls in the EATT/IATT sample to
xlw.WriteTable(table, 1, 1); and
System.Data.DataTable table = OfArray2D(xlr.ReadRange(1, 1));

Finally, with the original Dispose() implementation Excel always gets closed here. I haven't yet figured out why the process lingers on in the process table. Anyway, if it should stay open for user control then shouldn't we test for it?

Code - C#: [Select]
  1. public void Dispose()
  2. {
  3.     if (ExcelApp != null)
  4.     {
  5.         if(!Convert.ToBoolean(ExcelApp.Get("UserControl")))
  6.         {
  7.             Close();
  8.             ExcelApp.Set("DisplayAlerts", true);
  9.             ExcelApp.Invoke("Quit");
  10.         }
  11.         ExcelApp.ReleaseInstance();
  12.         ExcelApp = null;
  13.     }
  14. }

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Best way to communicate with Excel
« Reply #11 on: February 19, 2012, 07:30:05 AM »
Coincidence or logical contination ? I was exactly looking at this.
As shown in some examples other there I think using an intermediate DataTable instance is interesting when dealing from (or to) block attributes and Excel (or AutoCAD) table.
Using the DataTable to set the cells format is also very interesting, thanks again.
Speaking English as a French Frog

kaefer

  • Guest
Re: Best way to communicate with Excel
« Reply #12 on: February 19, 2012, 08:52:35 AM »
As shown in some examples other there I think using an intermediate DataTable instance is interesting when dealing from (or to) block attributes and Excel (or AutoCAD) table.

DataTable has other interesting options:
Relations via DataSet
Filtering/Viewing via DataView
Support for Linq queries, e.g. Intersect; most easily by LINQ to DataSet Extensions

Keep it coming!

gile

  • Gator
  • Posts: 2507
  • Marseille, France
Re: Best way to communicate with Excel
« Reply #13 on: February 19, 2012, 03:33:43 PM »
Quote
DataTable has other interesting options:
Relations via DataSet
Filtering/Viewing via DataView
Support for Linq queries, e.g. Intersect; most easily by LINQ to DataSet Extensions
For sure !

I updated the ExcelStream libraries including your suggestion.
Speaking English as a French Frog

vegbruiser

  • Guest
Re: Best way to communicate with Excel
« Reply #14 on: February 20, 2012, 07:42:19 AM »
Thanks Gile for the update.

Here's an example that I wrote using the ExcelReader.RangetoDataTable() method to import data from an Excel Workbook in an AutoCAD table:

Code: [Select]
               /// <summary>
        /// Opens an Excel file using the ExcelReader class
        /// reads the values (from a specific sheet if necessary)
        /// creates an AutoCAD Table with the values
        /// Originally copied from here: http://bit.ly/wtBLVZ
        /// </summary>
        [CommandMethod("test4")]
        public void test4()
        {
            Document doc = Application.DocumentManager.MdiActiveDocument;
            Database db = doc.Database;
            Editor ed = doc.Editor;
            Stopwatch sw = Stopwatch.StartNew();
           
            //object[][] values;
            System.Data.DataTable table;
            using (ExcelReader xlr = new ExcelReader(@"C:\temp\test1.xls"))
            {
                table= xlr.RangeToDataTable(1,1);
            }

            if (table == null)
                return;
            using (Transaction tr = db.TransactionManager.StartTransaction())
            {
                BlockTable bt = db.BlockTableId.GetObject(OpenMode.ForRead) as BlockTable;

                BlockTableRecord btrMs = (BlockTableRecord)bt[BlockTableRecord.ModelSpace].GetObject(OpenMode.ForRead);

                Table bomTable = new Table();
                bomTable.TableStyle = db.Tablestyle;
                bomTable.Position = Point3d.Origin;

                TableStyle ts = (TableStyle)tr.GetObject(bomTable.TableStyle, OpenMode.ForRead);
                double txtHeight = ts.TextHeight(RowType.DataRow);
                //add the correct number of rows & columns
                bomTable.InsertColumns(1,1,table.Columns.Count);
                bomTable.InsertRows(1,1,table.Rows.Count);
                //write the header row.
                DataRow row = table.Rows[1];
                for (int i = 0; i < table.Columns.Count; i++)
                {
                    bomTable.Cells[0, i].TextString = table.Rows[0][i].ToString();
                }
                // then ALL the data rows
                for (int i = 1; i < table.Rows.Count; i++)
                {
                    for (int j = 1; j < table.Columns.Count; j++)
                    {
                        bomTable.Cells[i, j].TextString = table.Rows[i][j].ToString();
                    }
                }

                bomTable.GenerateLayout();

                btrMs.UpgradeOpen();
                btrMs.AppendEntity(bomTable);
                tr.AddNewlyCreatedDBObject(bomTable, true);
                tr.Commit();
            }
            sw.Stop();
            AcadApp.DocumentManager.MdiActiveDocument.Editor.WriteMessage(
                "\nPasteFromClipboard: {0}", sw.Elapsed.TotalMilliseconds);
        }

It has an issue to do with the column heads and rows within the AutoCAD table itself, but I think that's due to my not having a style setup in the .dwg file already?