Topic: LISP OCR for arch font

CAB

  • Global Moderator
  • Seagull
  • Posts: 10401
LISP OCR for arch font
« on: December 17, 2014, 05:01:59 PM »
I have 30+ DWG files that were obviously converted from PDF, but the arch.shx font was converted to plines.

I need to modify the text, so I want to select the plines that used to be text and convert them back to text I can edit.

Thinking out loud about the process:
Select the plines.
Sort from upper left to lower right to get the order of the letters.
Separate all touching plines into individual letters.
Scale each letter into a box with grid line segments.
Create a pattern of the grid segments each letter crosses.
Compare those patterns to the known letters of the arch font.
Create text matching the letters found.
Not sure how to detect spaces yet.
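One possible reading of the "box with grid line segments" and "pattern of crossing segments" steps, as a rough Python sketch (Python only so it can be tried outside CAD). It assumes each letter has already been separated and exploded into line segments given as ((x1, y1), (x2, y2)) tuples, and that the known arch letters are stored the same way. The function names, the 5 x 5 grid, and the sampling approach are all assumptions for illustration; the "pattern" here is simply the set of grid cells the strokes pass through, and letters are compared by how many cells differ.

```python
from typing import Dict, FrozenSet, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def grid_signature(segments: List[Segment], n: int = 5,
                   samples: int = 50) -> FrozenSet[Tuple[int, int]]:
    """Scale a letter's segments into an n x n box; return the grid cells the strokes cross."""
    xs = [x for seg in segments for x, _ in seg]
    ys = [y for seg in segments for _, y in seg]
    minx, miny = min(xs), min(ys)
    # One scale factor for both axes, so narrow letters such as "I" keep their shape.
    size = max(max(xs) - minx, max(ys) - miny) or 1.0
    cells = set()
    for (x1, y1), (x2, y2) in segments:
        # Sample points along the stroke and record which cell each one falls in.
        for i in range(samples + 1):
            t = i / samples
            u = (x1 + t * (x2 - x1) - minx) / size
            v = (y1 + t * (y2 - y1) - miny) / size
            cells.add((min(int(u * n), n - 1), min(int(v * n), n - 1)))
    return frozenset(cells)


def best_match(letter: List[Segment], templates: Dict[str, List[Segment]]) -> str:
    """Pick the template whose cell pattern differs from the letter's in the fewest cells."""
    sig = grid_signature(letter)
    return min(templates, key=lambda ch: len(sig ^ grid_signature(templates[ch])))
```

Sampling points along each stroke is cruder than clipping segments exactly against the grid, but it keeps the sketch short. Scaling to the box makes the compare independent of text height, at the cost of confusing letters that differ mainly in proportion.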

Any better ideas?

Out for the evening.
I've reached the age where the happy hour is a nap. (°¿°)
Windows 10 core i7 4790k 4Ghz 32GB GTX 970
Please support this web site.

danallen

  • Guest
Re: LISP OCR for arch font
« Reply #1 on: December 17, 2014, 06:00:47 PM »
Create a master list of the font characters, explode it, and see if the pieces match the file you received. The creation order of objects should stay relatively the same, so you might only need to match the first element of each character, based on the relative x,y distance between its first two points. Then you could assume the next N objects belong to that same character.
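A rough Python sketch of that idea (Python only so it can be tried outside CAD). It assumes both the exploded master characters and the received drawing are lists of ((x1, y1), (x2, y2)) segments in creation order, at the same text height and rotation. As a small variation, it does a cheap test on the first element and then confirms the following N - 1 objects instead of just assuming them. The function names and the tolerance are made up for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def relative(segs: List[Segment]) -> List[Segment]:
    """Express segments relative to the start point of the first one."""
    ox, oy = segs[0][0]
    return [((x1 - ox, y1 - oy), (x2 - ox, y2 - oy)) for (x1, y1), (x2, y2) in segs]


def same_shape(a: List[Segment], b: List[Segment], tol: float) -> bool:
    """True if both runs have the same segments, coordinate by coordinate, within tol."""
    if len(a) != len(b):
        return False
    return all(abs(p - q) <= tol
               for sa, sb in zip(relative(a), relative(b))
               for pa, pb in zip(sa, sb)
               for p, q in zip(pa, pb))


def read_in_creation_order(segments: List[Segment], master: Dict[str, List[Segment]],
                           tol: float = 1e-6) -> str:
    # Assumes every template is non-empty. Longest templates are tried first,
    # so an "L" is not read as an "I" plus a leftover stroke.
    ordered = sorted(master.items(), key=lambda kv: -len(kv[1]))
    text: List[str] = []
    i = 0
    while i < len(segments):
        for ch, tmpl in ordered:
            n = len(tmpl)
            run = segments[i:i + n]
            # Cheap test on the first element, then confirm the next n - 1 objects.
            if len(run) == n and same_shape(run[:1], tmpl[:1], tol) and same_shape(run, tmpl, tol):
                text.append(ch)
                i += n
                break
        else:
            text.append("?")  # unrecognised object: skip it
            i += 1
    return "".join(text)
```

This only works if the PDF conversion really preserved the creation order; if it did not, a grouping step based on touching segments is needed first.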

or

Print only the exploded text and scan it, then run OCR on it with some other program. Assuming that program can create a vector text file, import it into CAD and align it.

hmspe

  • Bull Frog
  • Posts: 362
Re: LISP OCR for arch font
« Reply #2 on: December 17, 2014, 06:09:24 PM »
I would think that a standard OCR program would be better, even if you had to print 8.5 X 11 sections and run them through a scanner.

Thinking out loud here, too: if I were doing this in lisp, I would probably start by making a vector map of each character in the font, with the vectors originating from the lower-leftmost point of the character.  This could be done manually after exploding the font, or by parsing the SHP file for the font.  I'd probably break the list down into subsets based on the number of vectors in a character.
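A rough Python sketch of the master vector map broken into subsets by vector count (Python only so it can be tried outside CAD). It assumes each character is a list of ((x1, y1), (x2, y2)) segments at a known text height and no rotation, and takes "lower-left" to mean the leftmost endpoint, ties broken by the lowest Y. The function names and the rounding precision are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
Vector = Tuple[float, float]


def vector_map(segments: List[Segment], nd: int = 6) -> Tuple[Point, List[Vector]]:
    """Vectors from the lower-left endpoint (leftmost, then lowest) to every other endpoint."""
    # Rounding merges endpoints that differ only by floating-point noise.
    pts = {(round(x, nd), round(y, nd)) for seg in segments for x, y in seg}
    base = min(pts)  # tuple order: smallest x first, ties broken by smallest y
    vecs = sorted((round(x - base[0], nd), round(y - base[1], nd))
                  for x, y in pts if (x, y) != base)
    return base, vecs


def build_master(font: Dict[str, List[Segment]]) -> Dict[int, Dict[str, List[Vector]]]:
    """Group the master vector maps into subsets keyed by vector count."""
    master: Dict[int, Dict[str, List[Vector]]] = defaultdict(dict)
    for ch, segs in font.items():
        _, vecs = vector_map(segs)
        master[len(vecs)][ch] = vecs
    return master


def lookup(segments: List[Segment], master: Dict[int, Dict[str, List[Vector]]]) -> Optional[str]:
    """Only compare against characters with the same number of vectors."""
    _, vecs = vector_map(segments)
    for ch, ref in master.get(len(vecs), {}).items():
        if ref == vecs:
            return ch
    return None
```

Keying on the vector count means each lookup only scans a handful of candidates instead of the whole font.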

Then:
1.  select all the line segments
2.  get a segment from the set
3.  use a crossing window about 2.5 X text height in both directions and centered on the dxf 10 point of the segment
4.  from that selection set get all interconnected segments
5.  remove the interconnected segments from the original selection set in step 1.
6.  find the lower left endpoint, then make a vector map
7.  match the vector map against the master list for the characters with the same number of vectors
8.  if there is a match remove the segments from the drawing and place character in a different color on a dedicated layer at the correct location.  This will make any character that could not be converted readily visible.
9.  loop from #2
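Steps 4 and 5 (collect all interconnected segments, then remove them from the set) amount to finding connected groups of segments. Here is a rough Python sketch using union-find (Python only so it can be tried outside CAD). It assumes segments are ((x1, y1), (x2, y2)) tuples, and treats two segments as connected if they cross or if an endpoint of one lies within a tolerance of the other, so T-junctions and crossings like "X" stay together. For brevity it compares every pair; the crossing window from step 3 would limit the candidates. The function names and tolerance are made up for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def _orient(p: Point, q: Point, r: Point) -> float:
    # Sign of the cross product: which side of line p-q the point r lies on.
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _point_seg_dist(p: Point, a: Point, b: Point) -> float:
    dx, dy = b[0] - a[0], b[1] - a[1]
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / len2))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def touching(s1: Segment, s2: Segment, tol: float) -> bool:
    (a, b), (c, d) = s1, s2
    # Proper crossing, as in "X" or "+".
    if _orient(c, d, a) * _orient(c, d, b) < 0 and _orient(a, b, c) * _orient(a, b, d) < 0:
        return True
    # Shared endpoints or T-junctions: an endpoint lies on (or near) the other segment.
    return min(_point_seg_dist(a, c, d), _point_seg_dist(b, c, d),
               _point_seg_dist(c, a, b), _point_seg_dist(d, a, b)) <= tol


def group_glyphs(segments: List[Segment], tol: float = 1e-6) -> List[List[Segment]]:
    """Union-find over touching segments; each group is one candidate letter."""
    parent = list(range(len(segments)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if touching(segments[i], segments[j], tol):
                parent[find(i)] = find(j)
    groups: Dict[int, List[Segment]] = {}
    for i, seg in enumerate(segments):
        groups.setdefault(find(i), []).append(seg)
    return list(groups.values())
```

Characters made of separate pieces (";", ":", "i", "!") come out as more than one group, so those would need their own handling.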

The vector maps would need to be sorted by increasing angle to make the comparison easy, and each vector map would need to include the offset from the lower-left endpoint to the insertion point so the characters end up in the correct positions.  Character widths would also be good to have.
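A rough Python sketch of an angle-ordered vector map that carries the insertion-point offset (Python only so it can be tried outside CAD). It assumes segments are ((x1, y1), (x2, y2)) tuples at the same height and rotation as the templates, stores each vector as (angle, length) from the leftmost-then-lowest endpoint, and sorts by angle. The names Template, make_template, and recognize, plus the rounding precision, are made up for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def polar_map(segments: List[Segment], nd: int = 4) -> Tuple[Point, List[Tuple[float, float]]]:
    """(angle, length) from the lower-left endpoint to every other endpoint, sorted by angle."""
    pts = {p for seg in segments for p in seg}
    base = min(pts)  # leftmost, then lowest
    vmap = sorted(
        (round(math.atan2(y - base[1], x - base[0]), nd),
         round(math.hypot(x - base[0], y - base[1]), nd))
        for x, y in pts if (x, y) != base
    )
    return base, vmap


@dataclass
class Template:
    char: str
    vmap: List[Tuple[float, float]]
    offset: Point  # vector from the lower-left endpoint to the text insertion point


def make_template(char: str, segments: List[Segment], insert: Point) -> Template:
    base, vmap = polar_map(segments)
    return Template(char, vmap, (insert[0] - base[0], insert[1] - base[1]))


def recognize(segments: List[Segment], templates: List[Template]) -> Optional[Tuple[str, Point]]:
    base, vmap = polar_map(segments)
    for t in templates:
        if t.vmap == vmap:
            # Shift from the lower-left endpoint back to where the text should be inserted.
            return t.char, (base[0] + t.offset[0], base[1] + t.offset[1])
    return None
```

Because the base point is always the leftmost endpoint, every angle falls between -90 and +90 degrees, so sorting never has to deal with the 0/360 wrap-around.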

After that:
1.  get a selection set of the new text
2.  order the set left to right then top to bottom
3.  going left to right on a line of characters, check the distances between the insertion points of pairs of characters.  If the distance is greater than the width of the left character plus 1/2 of the en width, assume there are spaces.  Should be easy to calculate the number of spaces if you know the width of a space.
4.  repeat step 3 for each line
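A rough Python sketch of the spacing steps (Python only so it can be tried outside CAD). It assumes each recognized character is a tuple of (character, insertion x, insertion y, character width), reads "left to right then top to bottom" as: group into lines from the top down, then sort each line by X. The line tolerance, en width, and space width are inputs you would have to measure from the font; the function name is made up for illustration.

```python
from typing import List, Tuple

Char = Tuple[str, float, float, float]  # (character, insert x, insert y, character width)


def rebuild_lines(chars: List[Char], en_width: float, space_width: float,
                  line_tol: float) -> List[str]:
    # Group into lines, top to bottom (Y increases upward in the drawing).
    rows: List[List[Char]] = []
    for c in sorted(chars, key=lambda c: -c[2]):
        if rows and abs(rows[-1][0][2] - c[2]) <= line_tol:
            rows[-1].append(c)
        else:
            rows.append([c])
    lines = []
    for row in rows:
        row.sort(key=lambda c: c[1])  # left to right
        out = [row[0][0]]
        for left, right in zip(row, row[1:]):
            dist = right[1] - left[1]  # distance between insertion points
            # Gap rule from step 3: left width plus half an en width.
            if dist > left[3] + en_width / 2:
                gap = dist - left[3]
                out.append(" " * max(1, round(gap / space_width)))
            out.append(right[0])
        lines.append("".join(out))
    return lines
```

Rounding the gap to a whole number of space widths is one way to get the count; the max(1, ...) keeps at least one space whenever the threshold is exceeded.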

"Science is the belief in the ignorance of experts." - Richard Feynman

Kerry

  • Mesozoic relic
  • Seagull
  • Posts: 11654
  • class keyThumper<T>:ILazy<T>
Re: LISP OCR for arch font
« Reply #3 on: December 17, 2014, 06:13:20 PM »
There are a couple of free OCR readers that could be used with PDF images created on the fly from the bounding box of the selected polylines.
 ... perhaps  :|
kdub, kdub_nz in other timelines.
Perfection is not optional.
Everything will work just as you expect it to, unless your expectations are incorrect.
Discipline: None at all.

danallen

  • Guest
Re: LISP OCR for arch font
« Reply #4 on: December 17, 2014, 06:41:04 PM »

CAB

  • Global Moderator
  • Seagull
  • Posts: 10401
Re: LISP OCR for arch font
« Reply #5 on: December 17, 2014, 08:53:34 PM »
Forgot to attach my test file.
Here it is.

hmspe

  • Bull Frog
  • Posts: 362
Re: LISP OCR for arch font
« Reply #6 on: December 17, 2014, 10:35:29 PM »
I'm thinking something like the following to get the vector angles for SHX font characters.  Rough code quickly thrown together, so be nice, please.  I'm sure this can be done more efficiently.  I use Bricscad and its built-in VLE- functions.  I've included equivalent functions for Autocad taken from the Bricscad Lisp Developer Support Package (LDSP), which is freeware.  The call to TXTEXP may need to be modified to work with Autocad's Express Tools command for exploding text; TXTEXP is a built-in command in Bricscad.

For each printable character in an SHX font, the code outputs the leftmost (then bottom-most) line segment endpoint, plus a list of the angles of the vectors from that endpoint to every other line segment endpoint.  Seems to work with arch.shx.

Code - Auto/Visual Lisp:
(defun c:map ( / adist angle_list base char_angle_list char_point_list character
               counter counter2 elements memb old_echo point_list pt remove_doubles ss str)

  ;; Remove duplicate items from a list (by gile at theswamp.org)
  (defun remove_doubles (lst)
    (if lst
      (cons (car lst) (remove_doubles (vl-remove (car lst) lst)))
    )
  )

  ;; Run this in an empty drawing with the origin in view: everything inside the
  ;; crossing window around 0,0 is exploded and then deleted for each character.
  (setq old_echo (getvar "cmdecho"))
  (setvar "cmdecho" 0)
  (setq counter         33                            ; "!" = first printable character after space
        pt              (list 0.0 0.0 0.0)
        adist           (* (getvar "textsize") 1.5)   ; half-size of the crossing window
        char_point_list nil
        char_angle_list nil
  )
  ;; 33..126 are the printable ASCII characters (127 is DEL)
  (while (< counter 127)
    (setq point_list nil
          angle_list nil
          str        (chr counter)
    )
    (entmakex (list (cons 0 "TEXT")                   ; entity type
                    (cons 8 "POWER")                  ; layer
                    (cons 7 (getvar "textstyle"))     ; style (must use the SHX font to map)
                    (cons 40 (getvar "textsize"))     ; height
                    (cons 62 256)                     ; color bylayer
                    (cons 50 0.0)                     ; rotation
                    (cons 41 1.0)                     ; width factor
                    (cons 10 pt)                      ; dummy value
                    (cons 11 pt)                      ; insert point
                    (cons 72 0)                       ; horizontal justification
                    (cons 73 0)                       ; baseline justification
                    (cons 1 str)                      ; the string
              )
    )
    (setq character (entlast))
    ;; TXTEXP is built in to Bricscad; with AutoCAD's Express Tools version
    ;; this call may need to be changed.
    (command "txtexp" character "")
    ;; Explode the polylines TXTEXP creates into individual segments
    (setq elements (if (setq ss (ssget "_C" (list (- adist) (- adist)) (list adist adist)))
                     (vle-selectionset->list ss)
                   )
    )
    (foreach memb elements
      (command "_.explode" memb)
    )
    (setq elements (if (setq ss (ssget "_C" (list (- adist) (- adist)) (list adist adist)))
                     (vle-selectionset->list ss)
                   )
    )
    ;; Collect endpoints of LINE entities only (an ARC has no group 11)
    (foreach memb elements
      (if (= (cdr (assoc 0 (entget memb))) "LINE")
        (setq point_list (cons (vle-entget 10 memb) point_list)
              point_list (cons (vle-entget 11 memb) point_list)
        )
      )
    )
    ;; Guard against characters with no line segments (the old loop never ended)
    (if point_list
      (progn
        ;; Sort by X, then take the lowest Y among the leftmost points as the base
        (setq point_list (vl-sort point_list
                           (function (lambda (e1 e2) (< (car e1) (car e2))))
                         )
              base       (car point_list)
              counter2   0
        )
        (while (and (nth counter2 point_list)
                    (= (car base) (car (nth counter2 point_list)))
               )
          (if (< (cadr (nth counter2 point_list)) (cadr base))
            (setq base (nth counter2 point_list))
          )
          (setq counter2 (1+ counter2))
        )
        ;; Angle from the base point to every other endpoint, duplicates removed, sorted
        (foreach memb (vl-remove base point_list)
          (setq angle_list (cons (angle base memb) angle_list))
        )
        (setq angle_list (vl-sort (remove_doubles angle_list) '<))
      )
      (setq base nil)                                 ; no line segments for this character
    )
    (setq char_point_list (cons base char_point_list)
          char_angle_list (cons angle_list char_angle_list)
    )
    ;; Delete the exploded geometry before the next character
    (foreach memb elements
      (entdel memb)
    )
    (setq counter (1+ counter))
  )
  (setvar "cmdecho" old_echo)
  (print char_point_list)
  (print char_angle_list)
  (princ)
)


;;  The following are taken from VLE-EXTENSION.LSP, freeware (c) Menhirs NV.
;;  VLE-EXTENSION is part of the Bricscad Lisp Developer Support Package (LDSP)
;;  and provides Autocad equivalent functions for VLE- functions in Bricscad.

;;================================================================================
;;| FUNCTION : vle-selectionset->list                                            |
;;|------------------------------------------------------------------------------|
;;| (vle-selectionset->list selectionset)                                        |
;;|                                                                              |
;;| creates a normal list of entities from given selection set                   |
;;|                                                                              |
;;| Arguments : 'selectionset' the selection set to be returned as entity list   |
;;|                                                                              |
;;| Return    :  list of plain entity names, or NIL if selectionset is empty     |
;;|                                                                              |
;;| Example   :  (vle-selectionset->list ss) -> (<#Entity name> <#Entity name>)  |
;;================================================================================
(if (not vle-selectionset->list)
  (defun vle-selectionset->list ( ss / lst sslen )
    (if (and ss (= (type ss) 'PICKSET) (setq sslen (sslength ss)))
      (while (>= (setq sslen (1- sslen)) 0)
        (setq lst (cons (ssname ss sslen) lst))
      )
    )
    lst
  )
)

;;================================================================================
;;| FUNCTION : vle-entget                                                        |
;;|------------------------------------------------------------------------------|
;;| (vle-entget dxf ename)                                                       |
;;|                                                                              |
;;| retrieves the entity's property value for specified DXF group code;          |
;;| in typical case, this is a high-performance replacement for usual code like  |
;;| (cdr (assoc dxf (entget ename)))                                             |
;;|                                                                              |
;;| Arguments : 'dxf'   the DXF group code to retrieve                           |
;;|           : 'ename' the entity's ename                                       |
;;|                                                                              |
;;| Return    :  the value for specified DXF group code as plain Lisp data       |
;;|                                                                              |
;;| Example   :  (vle-entget 62 entity) returns the color for entity             |
;;|                                                                              |
;;| Notes :  (vle-entget) does always return a non-NIL value for omitted         |
;;|          "default" values (ByLayer etc.); NIL if 'dxf' code is not supported |
;;|          by the entity                                                       |
;;================================================================================
(if (not vle-entget)
  (defun vle-entget ( dxf ename / res )
    (if (not (setq res (cdr (assoc dxf (entget ename)))))
      (cond  ;; retrieve omitted default values
        ((= dxf 6)   (setq res "ByLayer")) ;; linetype
        ((= dxf 38)  (setq res 0.0))       ;; elevation
        ((= dxf 39)  (setq res 0.0))       ;; thickness
        ((= dxf 48)  (setq res 1.0))       ;; linetype scale
        ((= dxf 60)  (setq res 0))         ;; visibility 0=visible
        ((= dxf 62)  (setq res 256))       ;; color 256=ByLayer
        ((= dxf 67)  (setq res 0))         ;; modelspace / paperspace entity 0=modelspace
        ((= dxf 370) (setq res -1))        ;; lineweight -1=ByLayer
      )
    )
    res
  )
)

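To try the matching step outside CAD, here is a rough Python sketch of what c:map computes for each character (leftmost-then-lowest base point, sorted angles to the other endpoints), plus a tolerant compare against a table of known characters. Assumptions: endpoints are plain (x, y) tuples, angles use the same 0 to 2*pi range as AutoLISP's angle function, and duplicates are merged within a tolerance rather than by exact equality as remove_doubles does. The names angle_list and match_angles and the tolerance are made up for illustration. Note that c:map builds its two output lists with cons, so they come out in reverse character order (the last character processed is first).

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def angle_list(points: Sequence[Point], tol: float = 1e-6) -> Tuple[Point, List[float]]:
    """Base point plus sorted, de-duplicated angles to the other endpoints."""
    base = min(points)  # leftmost, ties broken by the lowest y
    # % (2 * pi) gives the 0..2*pi range that AutoLISP's (angle) returns.
    angs = sorted(math.atan2(y - base[1], x - base[0]) % (2 * math.pi)
                  for x, y in points if (x, y) != base)
    unique: List[float] = []
    for a in angs:
        # Unlike remove_doubles, treat angles within tol of each other as duplicates.
        if not unique or a - unique[-1] > tol:
            unique.append(a)
    return base, unique


def match_angles(points: Sequence[Point], table: Dict[str, List[float]],
                 tol: float = 1e-6) -> Optional[str]:
    _, angs = angle_list(points, tol)
    for ch, ref in table.items():
        if len(ref) == len(angs) and all(abs(a - b) <= tol for a, b in zip(angs, ref)):
            return ch
    return None
```

Two limits worth knowing: angles alone cannot tell apart characters whose endpoints lie in the same directions at different distances, and an angle just below 2*pi will not match one just above 0 with this simple subtraction.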