Topic: Fully Searchable PDF output How To Fix. Text Width Factor, Oblique Angle, Mtext

THansen

  • Guest
I have been working to correct issues we did not realize we had with PDF output from AutoCAD DWG and Inventor IDW files.  I have seen several posts related to the same thing so I wanted to share what I have discovered and the fix for it to date.

If you are finding that your DWG or IDW files are not FULLY searchable when output to PDF by just about any method, you're not alone!

ROOT CAUSE
1.) Any Text Style using a SHX font will not convert to searchable text.  Instead it is output as an image.

WHY - SHX fonts are unique to Autodesk products and were developed for the pen plotter days.  In today's world there is no reason to be using them.  Since they are unique to Autodesk and "old", they are not native fonts on your modern OS.  As a result, when you try to process the file with a PDF print driver, pc3 file, etc., the program does not recognize the font and outputs it as an image.  Images are not searchable text!

FIX - Change the font used in all Text Styles to a TrueType font (TTF) or an OpenType font with TrueType Outlines.  To determine what fonts are on your system START>SEARCH>FONTS and double click on a font.  At the top left corner it will indicate what it is.  For Arial Black (OpenType) the information given is "OpenType Layout, Digitally Signed, TrueType Outlines"

Try to "standardize" on fonts with these features:
A.) A STANDARD font already installed on your OS that is popular and has been around for a long time and has a font family with a lot of choices for different "widths" and "styles" like condensed, narrow, Bold, Italic, Narrow Italic, etc.

B.) A font that is designed for different languages to cover your localization needs.  This makes translating drawings much easier at a later date.  Arial is a good choice since it is designed for Latin, Greek, Cyrillic, Hebrew, and Arabic.

When applying the new font to existing Text Styles use the existing Text Style's Text Width Factor (TWF) setting to clue in on what font to use.  If the TWF is 0.85 then maybe Arial Narrow is a good fit.

2.) Any TTF that has been altered in any way by AutoCAD will not be recognized as a standard font and will convert as an image.

A.) The TWF MUST be 1.0
Instead of using TWF to condense or expand text use the font family style instead; Arial Narrow, Arial Narrow Bold, etc.  You may have to use other fonts to accomplish this as Arial may not have enough choices in the font family.

FIX - Use lisp to change all EXISTING text to a TWF of 1.0.  This includes single line text, Mtext, text in blocks, text in dynamic blocks, text in attributes, etc.

B.) The Text Oblique Angle (TOA) must be 0.0
Instead of using the TOA use different font styles like Italic for example.

FIX - Use lisp to change all EXISTING text to a TOA of 0.0.  This includes single line text, Mtext, text in blocks, text in dynamic blocks, text in attributes, etc.

C.) Any text inside a block or object that has been scaled non-uniformly in x and y will not be recognized as "text" and will be converted to an image.  You can scale a block; the scale just has to be symmetrical.  Scaling to x=1.5, y=1.5 works.  However, x=1.5, y=2 does not.  Again, the program cannot match the shape of the font to a "standard" font.

FIX - Use lisp to re-scale all blocks to be symmetrical or explode the blocks.

In the process of doing this you can also change text height and other attributes of text elements if desired.
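
The conditions listed under ROOT CAUSE can be collected into a single check.  The following is only a sketch, written in Python rather than lisp so the logic is easy to read.  The TextInfo record, its field names, and the tolerance are assumptions made for illustration; they are not AutoCAD API names, and reading these values out of a drawing is not shown.

```python
from dataclasses import dataclass
from typing import List

TOL = 1e-6  # assumed tolerance for comparing floating-point settings


@dataclass
class TextInfo:
    """Hypothetical record of the settings discussed above for one text entity."""
    font_file: str              # e.g. "romans.shx" or "arial.ttf"
    width_factor: float         # TWF
    oblique_angle: float        # TOA
    block_scale_x: float = 1.0  # scale of the containing block, if any
    block_scale_y: float = 1.0


def searchability_problems(t: TextInfo) -> List[str]:
    """Return the reasons from the list above why this text may not stay searchable."""
    problems = []
    if t.font_file.lower().endswith(".shx"):
        problems.append("SHX font")
    if abs(t.width_factor - 1.0) > TOL:
        problems.append("TWF is not 1.0")
    if abs(t.oblique_angle) > TOL:
        problems.append("TOA is not 0.0")
    # Uniform scaling (x=1.5, y=1.5) is fine; x=1.5, y=2 is not.
    if abs(abs(t.block_scale_x) - abs(t.block_scale_y)) > TOL:
        problems.append("non-uniform block scale")
    return problems
```

An empty list means none of the conditions described above apply to that text entity.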

WHAT'S BEEN TRIED
Since we have 90,000 plus drawings with the above issues, I have tried several programs to try and get a fully searchable PDF, with limited success.  In all cases except for Acrobat Pro the result is some unsearchable text.  This includes the following: Adobe PDF, CutePDF, ClarityPDF, PrimoPDF, DWG to PDF.pc3, etc.  I even tried exporting to DWF or DWFx first and then converting to PDF, and got the same results.

OCR
Typically our drawings are a mixed bag: some have ALL SHX fonts, some have TTF that are altered, and some are all TTF that have not been altered.  90% have SHX and altered TTF, since our title blocks use TTF but with a TWF from 0.8 to 0.95.

The PDF output of these yields some searchable text and some not searchable.  Most of it being non-searchable.

We tried processing with OCR but the results are not very accurate AND most pages cannot be OCR processed anyway.  Text elements that are close to lines and other objects confuse the OCR process.  This was most notable in item balloons, revision triangles, and BOM tables.  Having a searchable BOM is one of the key needs for us.

The OCR process also converts the ENTIRE drawing to an image.  It then "scans" for "shapes" and tries to match them to existing fonts on your system.  It puts this OCR output on a hidden layer directly under the text.  This is why when you try to highlight text the area selected is not directly under the actual text.

With Acrobat Pro......if the page contains a single character of "rendered" text, the program assumes the entire page has already been OCRed and will not allow you to process the rest of the page.  Adobe.....???  So any page that has TTF that is not altered will come over as rendered text and OCR will not work!

ACROBAT PRO
Acrobat Pro works, provided you process the DWG file directly.  You also have to go into the Acrobat settings and set the paths to the SHX fonts, plotter, and plot config. files on your system.  Acrobat then substitutes a TTF font for each SHX font found in the file when producing the output.  It does the same for fonts that have been altered by comparing the "shape" to its database of fonts and substituting one that is close.  It can even create a new font on the fly if needed.  This is not a fast process, as it takes some processing horsepower.

FIX THE ROOT CAUSE
I have read that AutoCAD 2016 has some better tools for producing searchable PDF, but our workflow creates the PDF from a DWF file.  We use Vault and it creates DWF or DWFx visualization files.  It will not create a PDF without purchasing an add-on.  So for our use the only way to correct this is to fix the root cause.

LISP PROGRAMS
I found a lisp by Lee Mac called FixAllText that comes close to doing what is needed.  I modified the lisp to include code to change the existing text styles to use the Arial font and set all Text Style attributes as needed, i.e. TWF=1.0 and TOA=0.0.  I did not change text height, color, layer, etc.

What is missing is code that will look at existing Text Style "TWF" settings and automatically choose a font family style based on existing TWF setting.

I would like to see variables at the top of the code for something like the following:

If TWF is in the range 0.95 to 1.05 and the TOA is 0.0 use ARIAL REGULAR
If TWF is in the range 0.95 to 1.05 AND the TOA is not 0.0 use ARIAL REGULAR ITALIC
If TWF is in the range 0.85 to 0.94 use ARIAL NARROW BOLD
If TWF is in the range 0.75 to 0.84 use ARIAL NARROW
If TWF is in the range 1.06 to 1.15 use ARIAL BOLD
If TWF is in the range 1.16 to 1.50 use ARIAL BLACK REGULAR
etc.
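
The wish list above amounts to a lookup table.  Here is a minimal sketch of that selection logic, in Python rather than lisp just to show the idea; it is not the attached code.  The font labels are copied from the list, but the names WIDTH_RULES and pick_font, the rounding to two decimals (which closes the gaps between ranges such as 0.94 and 0.95), and the fallback for out-of-range values are all assumptions.

```python
# (low, high, font) rows taken from the list above; edit these to suit.
WIDTH_RULES = [
    (0.75, 0.84, "Arial Narrow"),
    (0.85, 0.94, "Arial Narrow Bold"),
    (0.95, 1.05, "Arial Regular"),
    (1.06, 1.15, "Arial Bold"),
    (1.16, 1.50, "Arial Black Regular"),
]
REGULAR_FONT = "Arial Regular"
ITALIC_FONT = "Arial Regular Italic"
FALLBACK_FONT = "Arial Regular"  # assumption: used when no range matches


def pick_font(twf, toa=0.0):
    """Choose a font family style from an existing TWF and TOA."""
    width = round(twf, 2)  # assumption: two decimals, so 0.945 lands in some range
    for low, high, font in WIDTH_RULES:
        if low <= width <= high:
            # The list only gives an italic rule for the regular width range.
            if font == REGULAR_FONT and abs(toa) > 1e-6:
                return ITALIC_FONT
            return font
    return FALLBACK_FONT
```

For example, pick_font(0.85) gives "Arial Narrow Bold", and pick_font(1.0, 15.0) gives "Arial Regular Italic".  After the font is chosen, the text itself would still be reset to TWF=1.0 and TOA=0.0 as described earlier.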

Or if there is a way with lisp to determine how close the start and stop points of text are to other elements, then base font selection on that??

Code I have to date is attached.....Any help is appreciated


rkmcswain

  • Swamp Rat
  • Posts: 978

....SHX fonts are unique to AutoDesk products and were developed for the pen plotter days.  In today's world there is no reason to be using them. 

Plenty of reasons to keep using them, two of them are:
1. They are fast
2. You can control the "weight" of the strokes by plot style, giving you an almost infinite range of printed text "weights". With TTF, you get Regular and Bold, and maybe "Light" if you don't mind switching to a related, but different .ttf font - The "weight" of the strokes is proportional to the height of the character.


Obviously, there are arguments for TTF also. It's not really a right or wrong issue. It's what fits your purpose better.




ChrisCarlson

  • Guest
Well...one option Autodesk introduced is the system variable EPDFSHX.  This will allow all SHX text to be searchable via "comments" within the PDF.

If you want this to be completely automagic, you could run into issues if multiple lines of text are single text entities vs an mtext entity.  If it were my company, I would change all current and future drawings while leaving the old drawings as-is.