TheSwamp

Code Red => .NET => Topic started by: Keith Brown on January 10, 2017, 11:38:20 AM

Title: Serializing xml files to memory.
Post by: Keith Brown on January 10, 2017, 11:38:20 AM
Hi,


I have some functions where I am serializing some large xml files to store in the named object dictionary.  Everything works great except for speed.  The deserializing speed is not bad and is something that I can live with, but serializing the memory stream takes an incredible amount of time.  The slowdown appears to occur when breaking the memory stream into small chunks to store in a result buffer.  When the file is fairly small it is extremely fast, but as the file size increases the speed drops off in a non-linear fashion.  Each batch of 200 result buffers takes more than an order of magnitude longer than the 200 before it, and so on.  I believe that this is due to how the xrecord stores the result buffer.  I have attempted to break my memory stream down into smaller chunks before passing them to the code that places them into the result buffer, but I have not had a lot of success with rebuilding the stream.

Has anyone had success serializing xml files to drawing memory?  Do you use large files?

Here are the two methods that I use to serialize to result buffers.  I did not write the original code and I am not sure where I got it from.  It might have been a class at AU from Jerry Winters, but I am not 100% sure.  The code to convert to and from a memory stream is boilerplate and is not shown here.  However, I think the key to success is to break the memory stream into small chunks before calling StreamToResultBuffer.

Code - C#:
using System;
using System.IO;
using Autodesk.AutoCAD.DatabaseServices;

// MaxChunkSize is a class-level constant that is not shown in this post.
private ResultBuffer StreamToResultBuffer(MemoryStream memoryStream, string applicationName)
{
   // The first entry holds the application name; the data chunks follow it.
   ResultBuffer resultBuffer = new ResultBuffer(new TypedValue(Convert.ToInt32(DxfCode.ExtendedDataRegAppName), applicationName));
   var i = 0;
   memoryStream.Seek(0, SeekOrigin.Begin);
   while (i < memoryStream.Length) {
      // The last chunk may be shorter than MaxChunkSize.
      var length = Convert.ToInt32(Math.Min(memoryStream.Length - i, MaxChunkSize));
      var datachunk = new byte[length];
      memoryStream.Read(datachunk, 0, length);
      resultBuffer.Add(new TypedValue(Convert.ToInt32(DxfCode.ExtendedDataBinaryChunk), datachunk));
      i += MaxChunkSize;
   }
   return resultBuffer;
}

private MemoryStream ResultBufferToStream(ResultBuffer resultBuffer)
{
   MemoryStream memoryStream = new MemoryStream();
   TypedValue[] values = resultBuffer.AsArray();

   // Start from 1 to skip application name
   for (int i = 1; i < values.Length; i++) {
      byte[] datachunk = (byte[])values[i].Value;
      memoryStream.Write(datachunk, 0, datachunk.Length);
   }
   // Rewind so the caller can read the stream from the beginning.
   memoryStream.Position = 0;
   return memoryStream;
}
Title: Re: Serializing xml files to memory.
Post by: dgorsman on January 10, 2017, 12:42:55 PM
I don't push raw data to XRecords (although, is that XData you are using in the code??); I break it up according to what it is and does, e.g. a length gets a DXF code for a double, and it's stored as a double.  Then there's no need to convert it from XML fragments every time it's needed.  On the occasions that I do need access to truly large-scale XML content, I keep the XML object at the document or application scope and only store the filename and path of the XML source file.
Title: Re: Serializing xml files to memory.
Post by: Keith Brown on January 10, 2017, 12:51:35 PM

I am using the NOD and storing in dictionaries and XRecords.  The resultbuffer gets pushed to the XRecord.

I am not really storing specific entity information.  It is basically data and can be approximately 75,000 KB in size or even larger depending on the number of objects in the drawing.


I have tried to stay away from using an external xml file due to the headaches of keeping track of its location, duplicating it, renaming it upon saveas command, etc.


However, some tests have shown that I can write the file in about 5 seconds whereas writing it to drawing memory takes upwards of 20 minutes which is a less than ideal time frame.


The specs for the program required that the data be stored in memory, but I think I might push back against that if I cannot get the time lowered.
Title: Re: Serializing xml files to memory.
Post by: dgorsman on January 10, 2017, 03:39:23 PM
If it doesn't come from a file in the first place, where does the XML content come from?  Database query?  Constructed from scratch?
Title: Re: Serializing xml files to memory.
Post by: Keith Brown on January 10, 2017, 03:51:27 PM
It is processed from the drawing itself.  I am reading CADWorx piping entities and determining how the entities are connected to each other.  This information is then used to determine stress on the systems and then highlight the high stress areas.


I did not anticipate the customers using the software with hundreds of piping systems in a single drawing (thousands of entities, 30k+).  So I have added a flag to the settings section that allows them to use either external or internal memory, with a recommendation of using external memory for larger systems.  I would like to optimize my routine, however, as I find it useful and do use it to save information to an entity.  I find it quicker and easier to serialize a class and then store that information as xml in the object's extension dictionary rather than create result buffers with dxf codes.  Building the result buffers by hand is time consuming for me and not worth the payoff when I can achieve the same result with a much faster design time using xml.



Title: Re: Serializing xml files to memory.
Post by: dgorsman on January 10, 2017, 04:12:44 PM
 :yay!:  Sounds like I'm doing something very similar with connection and other reports, but I don't get down into XML until late in the process.  Processing time has been acceptable so far, even with including XREFs (there's been a few 30k batches, those only take a couple of minutes).
Title: Re: Serializing xml files to memory.
Post by: MickD on January 10, 2017, 04:20:14 PM
I have a similar situation where I store xml in the NOD. The difference is that I store just the xml nodes related to entities on the entities themselves; this way the drawing is self-managing when people copy/delete entities. I can also edit the entity xml if I need to.
When I want to export the XML I loop through the dwg and put the whole shebang together as a complete xml doc, and it takes a few seconds tops for hundreds of entities. Most drawings export only about 3-4 MB though, but I wouldn't imagine larger files taking much longer really.

The code I use for data chunks is pretty much the same and I got it on the Swamp (I think Daniel wrote it (nullptr)) :)
Title: Re: Serializing xml files to memory.
Post by: Keith Brown on January 10, 2017, 04:25:27 PM
My issue is the sheer size of the xml file.  The one that I am looking at right now is 1.2 million lines long. 
Title: Re: Serializing xml files to memory.
Post by: MexicanCustard on January 10, 2017, 04:25:52 PM
Quote from: Keith Brown
I find it quicker and easier to serialize a class and then store that information as xml in the object's extension dictionary rather than create result buffers with dxf codes.  This is time consuming to me and not worth the payoff when I can achieve the same result with a much faster design time using xml.

Obviously it's not always quicker and easier.  If you must store large data in the drawing, then I think I'd try loading at document creation and unloading during a document save.  Modify the in-memory classes in between.  The user won't notice a longer delay during those operations since cad takes forever during those operations anyway.
Title: Re: Serializing xml files to memory.
Post by: MexicanCustard on January 10, 2017, 04:27:20 PM
Quote from: Keith Brown
My issue is the sheer size of the xml file.  The one that I am looking at right now is 1.2 million lines long.

At 1.2 million lines you've successfully defeated any purpose of using an XML file.  I'd seriously consider alternative storage methods. 
Title: Re: Serializing xml files to memory.
Post by: Keith Brown on January 10, 2017, 04:31:41 PM
Quote from: MexicanCustard
Obviously it's not always quicker and easier.  If you must store large data in the drawing, then I think I'd try loading at document creation and unloading during a document save.  Modify the in-memory classes in between.  The user won't notice a longer delay during those operations since cad takes forever during those operations anyway.


Quicker and easier as in design time.  I can serialize and save a class to xml and store in the NOD in just a few minutes of coding with my base class.
Title: Re: Serializing xml files to memory.
Post by: Keith Brown on January 10, 2017, 04:53:07 PM
Quote from: MexicanCustard
At 1.2 million lines you've successfully defeated any purpose of using an XML file.  I'd seriously consider alternative storage methods.


That is certainly up for debate.  I have no issues reading/processing the data.  It takes only 3 seconds to read the file and now only 1 minute to save the data.  The previous slowdown was due to the AutoCAD API and its implementation of ResultBuffers.


Using a database is out of the question due to design restrictions.


The only optimization that I can see left now is to keep the in-memory representation of the file for each document and switch between them when loading/switching documents, as shown in this post.
http://through-the-interface.typepad.com/through_the_interface/2006/10/perdocument_dat_2.html

Title: Re: Serializing xml files to memory.
Post by: It's Alive! on January 10, 2017, 06:46:00 PM
What about using compression i.e. GZipStream?
http://www.theswamp.org/index.php?topic=26687.0
Title: Re: Serializing xml files to memory.
Post by: Keith Brown on January 12, 2017, 08:34:24 AM
Quote from: It's Alive!
What about using compression i.e. GZipStream?
http://www.theswamp.org/index.php?topic=26687.0


OMG! Duh!  After some quick refactoring to incorporate compressing/decompressing the xml string, I ended up with a file that previously took 27 min to save to the NOD and now takes less than 1 min.

+1 for zipping.  Thank you very much nullptr.
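
For anyone reading along, the compression step might look roughly like the sketch below. It is not the code from the refactoring mentioned above. The class and method names (XmlCompression, Compress, Decompress) are made up for illustration, and it assumes the xml is already available as a string and is encoded as UTF-8. It uses only GZipStream from System.IO.Compression.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper, not from the thread: names and layout are illustrative only.
public static class XmlCompression
{
    // Compresses an xml string into a GZip stream that is rewound and ready to read.
    public static MemoryStream Compress(string xml)
    {
        byte[] raw = Encoding.UTF8.GetBytes(xml);
        var output = new MemoryStream();
        // leaveOpen: true keeps 'output' usable after the GZipStream is disposed.
        // Disposing the GZipStream flushes the final compressed block.
        using (var gzip = new GZipStream(output, CompressionMode.Compress, true))
        {
            gzip.Write(raw, 0, raw.Length);
        }
        output.Position = 0;
        return output;
    }

    // Reads a GZip stream from its current position and returns the original xml string.
    public static string Decompress(Stream compressed)
    {
        using (var gzip = new GZipStream(compressed, CompressionMode.Decompress, true))
        using (var reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The stream returned by Compress could be handed to StreamToResultBuffer in place of the uncompressed one, and the stream coming back from ResultBufferToStream could be passed to Decompress. Fewer bytes means fewer chunks to add to the result buffer, which is where the time was going.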
Title: Re: Serializing xml files to memory.
Post by: CADbloke on January 21, 2017, 05:35:04 AM
Thought about using JSON? JSON.NET has been quite fast for me. See http://stackoverflow.com/a/37207034/492 for how I use it to store Key - Value pairs
Title: Re: Serializing xml files to memory.
Post by: Keith Brown on January 21, 2017, 08:49:13 AM
I thought about it but in this case I do not believe that it would be any different.  My issues were not with the speed of serialization but with the speed of the AutoCAD .NET API saving that information into the drawing database via ResultBuffers.  Once I zipped the resulting xml file the speed increased dramatically.


** Edit ** Now that I think about it, the information serialized to JSON would be a smaller file, as JSON does not require as much extra markup as xml.  I just might look into this.