Topic: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?

CodeDing

  • Newt
  • Posts: 63
Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« on: November 09, 2023, 03:10:31 PM »
A while back, BigAL posted a program that used the Windows Sapi COM object to have your computer speak the text provided (Text to Speech, TTS).

After some research I can see that Sapi can be used for Speech to Text (STT), but I cannot understand how to implement it via lisp. Can somebody help create a program so I can understand how this would work?

Speech API (SAPI) Overview:
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms720151(v=vs.85)#api-for-speech-recognition

SAPI documentation:
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms720099(v=vs.85)

A simple use of the code might look like:
Code - Auto/Visual Lisp:
(defun c:TEST ( / txt)
  (getstring "\nPress Enter at any time to begin Speech Recognition...")
  (setq txt (sapi-stt)) ; sapi-stt: the not-yet-written speech-to-text function
  (alert (strcat "Your Text:\n\n" txt))
  (princ)
)

Best,
~DD
Senior CAD Tech & AI Specialist
Need AutoLisp help?
Try my custom GPT 'AutoLISP Ace'

Vaidas

  • Newt
  • Posts: 66
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #1 on: November 09, 2023, 03:30:40 PM »
This is something I used for fun:

Code:
(setq sapi (vlax-create-object "Sapi.SpVoice"))
(vlax-invoke sapi "Speak" "I wish you the best of everything in New 2011 Year! Thanks for using KitoxToolset from www.kitox.com" 0)
(vlax-release-object sapi)
(mapcar 'chr '(107 105 116 111 120 46 99 111 109))

JohnK

  • Administrator
  • Seagull
  • Posts: 10681
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #2 on: November 09, 2023, 03:32:23 PM »
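Code - Auto/Visual Lisp:
(vl-load-com) ; make sure the ActiveX (vlax-*) functions are loaded
(setq sapi (vlax-create-object "Sapi.SpVoice"))
(vlax-invoke sapi "Speak" "I'm sorry Dave, I'm afraid I can't do that." 0)
(vlax-release-object sapi) ; release the COM object when done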


EDIT: Second place. I was beaten by Vaidas.
TheSwamp.org (serving the CAD community since 2003)

Vaidas

  • Newt
  • Posts: 66
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #3 on: November 09, 2023, 03:38:11 PM »
Sorry JohnK, for my guerilla marketing :)

kdub_nz

  • Mesozoic keyThumper
  • SuperMod
  • Water Moccasin
  • Posts: 2166
  • class keyThumper<T>:ILazy<T>
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #4 on: November 09, 2023, 03:49:04 PM »
Third place, with the usual proviso ...
added link:
https://chat.openai.com/share/8107d8c4-02c2-4f7f-bf51-5a15f5fafdce

I was tempted to respond with a "Thank You" so the bit-heads will treat me gently when the revolution comes.
« Last Edit: November 09, 2023, 03:58:17 PM by kdub_nz »
Called Kerry in my other life
Retired; but they dragged me back in !

I live at UTC + 13.00

---
some people complain about loading the dishwasher.
Sometimes the question is more important than the answer.

CodeDing

  • Newt
  • Posts: 63
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #5 on: November 09, 2023, 04:07:19 PM »
I tried to explain it clearly but I can see the confusion lol.

All of the provided examples so far are for TEXT TO SPEECH which is Not what I need.

I am looking for an example of SPEECH TO TEXT pleeasseeeee

Best,
~DD

Vaidas

  • Newt
  • Posts: 66
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #6 on: November 09, 2023, 04:23:52 PM »
Oh, I need to read twice... My apologies, I have not run any experiments on this on my side.

kdub_nz

  • Mesozoic keyThumper
  • SuperMod
  • Water Moccasin
  • Posts: 2166
  • class keyThumper<T>:ILazy<T>
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #7 on: November 09, 2023, 04:36:06 PM »
ChatGPT

You could use Python,
or :

Quote
To create a speech-to-text feature in AutoCAD using AutoLISP, you can leverage the Windows Speech Recognition API through COM automation. Here's a basic example of AutoLISP code that uses the Windows Speech Recognition API to convert speech to text in AutoCAD:

```lisp
(defun speech-to-text ()
  (vl-load-com)
  (setq recognizer (vlax-create-object "SAPI.SpSharedRecognizer"))
  (setq recognizer-ctx (vlax-get-property recognizer 'Recognizer))
  (vlax-invoke-method recognizer-ctx 'SetInput "speech input" t)
 
  (setq audio (vlax-create-object "SAPI.SpAudioFormat"))
  (vlax-put-property audio 'Type :spATDictation)
  (vlax-put-property audio 'FormatType :SAFT22kHz16BitMono)
  (vlax-put-property recognizer 'AudioInput audio)

  (setq reco-context (vlax-get-property recognizer 'CreateRecoContext))
  (vlax-invoke-method reco-context 'SetNotifyWinEventSink reco-context)
 
  (vlax-invoke-method reco-context 'SetInterest :SPEI_RECOGNITION :SPEI_RECOGNITION)

  (while
    (progn
      (setq event (vlax-invoke-method reco-context 'WaitForNotifyEvent 500))
      (if (= event :SPEI_RECOGNITION)
        (progn
          (setq result (vlax-get-property reco-context 'GetResult))
          (setq phrase (vlax-get-property result 'PhraseInfo))
          (setq text (vlax-get-property phrase 'GetText))
          (princ "\nRecognized Text: ")
          (princ text)
          t)
        t)))
  (vlax-release-object recognizer)
)

(speech-to-text)
```

Please note that this code requires Windows Speech Recognition to be installed and properly configured on your system. Also, keep in mind that real-time speech recognition can be resource-intensive, so the performance may vary based on your system's capabilities.

Make sure to test and modify the code as needed to fit your specific requirements and environment.



Quote
Creating a speech-to-text functionality in AutoCAD using AutoLISP is not straightforward because AutoLISP does not have native support for speech recognition. However, you can achieve this by leveraging external tools and libraries. One way to do this is by using a Python script to handle speech recognition and then communicate with AutoCAD through the COM interface.

Here's an example of how you can achieve speech-to-text functionality in AutoCAD using a combination of AutoLISP and Python:

1. **Python Script (speech_to_text.py)**: Write a Python script that uses a speech recognition library to convert speech to text. You can use a library like SpeechRecognition, which supports various speech recognition engines.

```python
import speech_recognition as sr

def recognize_speech():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something:")
        audio = recognizer.listen(source)
    try:
        text = recognizer.recognize_google(audio)
        return text
    except sr.UnknownValueError:
        return "Speech Recognition could not understand audio"
    except sr.RequestError as e:
        return f"Could not request results from Google Speech Recognition service; {e}"

if __name__ == "__main__":
    print(recognize_speech())
```

2. **AutoLISP Code**: Use AutoLISP to call the Python script and retrieve the speech-to-text output. AutoLISP can call external commands using the `COMMAND` function.

```lisp
(defun c:speech-to-text ()
  (setq cmd (strcat "python " (vl-filename-mktemp nil nil ".py")))
  (setq result (command cmd))
  (princ (strcat "\nSpeech to text result: " result))
  (princ)
)
```

In this AutoLISP code:

- `(vl-filename-mktemp nil nil ".py")` creates a temporary Python script file.
- `(command cmd)` executes the Python script using the `COMMAND` function, which runs an external command and returns the output.
- The result is then displayed in the AutoCAD command line.

To use this functionality, load the AutoLISP code into AutoCAD and type `SPEECH-TO-TEXT` in the command line. Make sure you have Python installed on your system and the SpeechRecognition library (`pip install SpeechRecognition`) to run the Python script.

Please note that this approach requires Python to be installed on your system and may require additional configuration based on your specific environment.
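One caveat on the AutoLISP half of that answer: `command` passes its arguments to AutoCAD's own command line and, as far as I know, does not launch external programs or hand back their output. A more plausible hand-off is for the Python side to write the recognised text to a file that AutoLISP then reads with `open` and `read-line`. Below is a minimal sketch of the Python side only. `write_transcript`, `main`, the `recognize` callable and the default file name `stt_result.txt` are hypothetical names for illustration, not part of any library.

```python
import os
import tempfile


def write_transcript(text, path):
    """Write text to path via a temp file so a reader never sees a half-written result."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(text.strip() + "\n")
    os.replace(tmp_path, path)  # rename into place once the content is complete
    return path


def main(argv, recognize):
    """argv[1] is the output file (optional); recognize is any callable returning text."""
    default_path = os.path.join(tempfile.gettempdir(), "stt_result.txt")
    out_path = argv[1] if len(argv) > 1 else default_path
    return write_transcript(recognize(), out_path)
```

In a real script, `recognize` would be something like the `recognize_speech` function quoted above, and `main(sys.argv, recognize_speech)` would be the entry point. The AutoLISP side would still need a way to wait for the file to appear before reading it.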





It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8927
  • AKA Daniel
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #8 on: November 09, 2023, 04:54:46 PM »
I tried the python version here
https://forums.augi.com/showthread.php?177316-text-recognition&p=1355226&viewfull=1#post1355226

Sphinx didn't recognize my voice very well. I think connecting it to Google would be better.


Code - Python:
import PyRx as Rx
import PyGe as Ge
import PyGi as Gi
import PyDb as Db
import PyAp as Ap
import PyEd as Ed
import traceback

import speech_recognition as sr

def PyRxCmd_doit():
    try:
        r = sr.Recognizer()
        with sr.Microphone() as source:
            print("Say something!")
            audio = r.listen(source)
        try:
            result = r.recognize_sphinx(audio)
            Ap.DocManager().sendStringToExecute(Ap.curDoc(), result + "\n")
            print("Sphinx thinks you said " + result)
        except sr.UnknownValueError:
            print("Sphinx could not understand audio")
        except sr.RequestError as e:
            print("Sphinx error; {0}".format(e))

    except Exception as err:
        traceback.print_exception(err)
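The switch to Google suggested above could be sketched as a small fallback helper that tries one engine and drops back to the next if it fails. `recognize_google` and `recognize_sphinx` are methods of SpeechRecognition's `Recognizer`; `transcribe` and its `engines` parameter are hypothetical names introduced here. Keep in mind that `recognize_google` sends the audio to a web service, so it needs a network connection.

```python
def transcribe(recognizer, audio, engines=("recognize_google", "recognize_sphinx")):
    """Try each engine method in order; return (engine_name, text) from the first success."""
    errors = []
    for name in engines:
        try:
            return name, getattr(recognizer, name)(audio)
        except Exception as err:  # e.g. sr.UnknownValueError or sr.RequestError
            errors.append(f"{name}: {err!r}")
    raise RuntimeError("no engine produced text (" + "; ".join(errors) + ")")
```

Inside `PyRxCmd_doit`, the call `r.recognize_sphinx(audio)` would then become `engine, result = transcribe(r, audio)`.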
  28.  

CodeDing

  • Newt
  • Posts: 63
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #9 on: November 09, 2023, 05:28:08 PM »
Yeah it looks like ChatGPT got about no further than I originally did. That code obviously doesn't execute (typical with the AutoLisp codes it generates lol).

I haven't tried the Python yet. I'm hoping to accomplish this via Lisp, but if it comes to it, I'll resort to Python/.NET.

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8927
  • AKA Daniel
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #10 on: November 09, 2023, 06:36:54 PM »
You’re on the right track with Sapi. 

I would try to find this and port it (Simple Dictation for Visual Basic)
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms720187(v=vs.85)



 

kdub_nz

  • Mesozoic keyThumper
  • SuperMod
  • Water Moccasin
  • Posts: 2166
  • class keyThumper<T>:ILazy<T>
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #11 on: November 09, 2023, 07:06:45 PM »
I think with ChatGPT the skills required are asking the right question, and having the experience to test the response and recognise the alternatives when the response seems unsuitable... it's definitely not always good magic.

CodeDing

  • Newt
  • Posts: 63
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #12 on: November 09, 2023, 07:28:19 PM »
I hope people can understand that I'm not just being lazy and that I've tried my very best to get this created. I just can NOT get something useful created, so I need something that truly shows me a useful example, and not just stepping stones.

But don't just take my word for it. I'll do my best to show you my working steps..

So the VERY FIRST sentence here:
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms720151(v=vs.85)#api-for-speech-recognition

...says this:
Quote
Just as ISpVoice is the main interface for speech synthesis, ISpRecoContext is the main interface for speech recognition.

...and reading this, I take 2 key words out:
Quote
ISpVoice & ISpRecoContext

...Looking at ISpVoice, I can see that it is used in this function for the Text-to-Speech tools (I'm just using one from this thread, but I already have a function that uses this api, and referenced it before starting this thread):
Code - Auto/Visual Lisp:
(setq sapi (vlax-create-object "Sapi.SpVoice"))
(vlax-invoke sapi "Speak" "I wish you the best of everything in New 2011 Year! Thanks for using KitoxToolset from www.kitox.com" 0)

...Which would lead me to believe that I should probably be using ISpRecoContext when I start my Speech-to-Text approach. So, when I select that hyperlink in the first sentence (here's the page):
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms718476(v=vs.85)

...then I read that portion, it states this:
Quote
A new ISpRecoContext object can be created by calling ISpRecognizer::CreateRecoContext.

...Well, now that tells me that I need to start with an ISpRecognizer... Well how do I make one of those? Check this page out..
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms720099(v=vs.85)

...when I search the right column, we see our ISpRecognizer is created via the SpSharedRecognizer Object class. Well, that's a good sign because this code does in fact work:
Code - Auto/Visual Lisp:
(setq recognizer (vlax-create-object "SAPI.SpSharedRecognizer"))

...and it even pops up the Listener on my machine:
[not sure how to post images, see attached image "listener_capture.png"]

So now... what's next? Well, remember we were told earlier that this object can create ISpRecoContext. So let's dump our 'recognizer' properties to check:
Code:
Command: (vlax-dump-object recognizer t)
; ISpeechRecognizer: ISpeechRecognizer Interface
; Property values:
;   AudioInput = #<VLA-OBJECT ISpeechObjectToken 00000206e310a410>
;   AudioInputStream = Exception occurred
;   IsShared (RO) = -1
;   Profile = #<VLA-OBJECT ISpeechObjectToken 00000206e310a6e0>
;   Recognizer = #<VLA-OBJECT ISpeechObjectToken 00000206e310ae30>
;   State = 2
;   Status (RO) = #<VLA-OBJECT ISpeechRecognizerStatus 00000206e3b60c60>
; Methods supported:
;   CreateRecoContext ()
;   DisplayUI (4)
;   EmulateRecognition (3)
;   GetAudioInputs (2)
;   GetFormat (1)
;   GetProfiles (2)
;   GetRecognizers (2)
;   IsUISupported (2)


...Well look at that! A supported method is "CreateRecoContext ()". So let's do that:
Code:
Command: (setq reco-context (vlax-invoke recognizer 'CreateRecoContext))
#<VLA-OBJECT ISpeechRecoContext 00000206e310d600>
Command: (vlax-dump-object reco-context t)
; ISpeechRecoContext: ISpeechRecoContext Interface
; Property values:
;   AudioInputInterferenceStatus (RO) = 0
;   CmdMaxAlternates = 0
;   EventInterests = 327679
;   Recognizer (RO) = #<VLA-OBJECT ISpeechRecognizer 00000206e3b60288>
;   RequestedUIType (RO) = ""
;   RetainedAudio = 0
;   RetainedAudioFormat = #<VLA-OBJECT ISpeechAudioFormat 00000206e310a530>
;   State = 1
;   Voice = #<VLA-OBJECT ISpeechVoice 00000206e310de00>
;   VoicePurgeEvent = 0
; Methods supported:
;   Bookmark (3)
;   CreateGrammar (1)
;   CreateResultFromMemory (1)
;   Pause ()
;   Resume ()
;   SetAdaptationData (1)


...Great, so now what? When I circle back to the ISpRecognizer documentation..
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms718548(v=vs.85)

...It doesn't even say what to do next? So this is where I get lost. I have 2 useful objects, but no guidance on what to do with them. I can see all of the methods down at the bottom, but they definitely don't match my listed options when I dump my 'recognizer' object... So that's where this dies.

I've tried looking at VBA codes to see if I can extrapolate those somehow. But no luck.


So please, please, if someone can provide some usable code to get further along, that would be super cool.. because when I summarize what I'm able to accomplish.. it looks like this LoL:
Code - Auto/Visual Lisp:
(defun c:TEST ( / recognizer reco-context)
  (vl-load-com) ; make sure the ActiveX (vlax-*) functions are loaded
  (setq recognizer (vlax-create-object "SAPI.SpSharedRecognizer"))
  (vlax-dump-object recognizer t)
  (setq reco-context (vlax-invoke recognizer 'CreateRecoContext))
  (vlax-dump-object reco-context t)
  (vlax-release-object reco-context) ; release both COM objects when done
  (vlax-release-object recognizer)
  (princ)
)

Best,
~DD

It's Alive!

  • Retired
  • Needs a day job
  • Posts: 8927
  • AKA Daniel
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #13 on: November 09, 2023, 07:58:40 PM »
Not lazy, you’re exploring areas where few have.
Hoping someone would throw out a sample was a long shot at best
Now it's plan B: get the VB dictation sample and grind through it line by line.


CodeDing

  • Newt
  • Posts: 63
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #14 on: November 09, 2023, 08:19:57 PM »
It's Alive!,

Can you post or point me to the VB Dictation Sample code?
I can't seem to find it by searching. I don't have Visual Studio installed on this machine. I have VS Code.

EDIT:
Nvm, just found it!
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms720590(v=vs.85)

Best,
~DD

BIGAL

  • Swamp Rat
  • Posts: 1474
  • 40 + years of using Autocad
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #15 on: November 09, 2023, 08:48:32 PM »
Years ago I did talk to AutoCAD; I'm sure I just used the built-in voice input in Windows. "!@#$" was undo. Stuff like "Line" worked. It was actually slow; given the advances in speech recognition it should be better now. I may have another play.

Yep, it worked:
At the "Enter Command" prompt I spoke "Line"
and got "Start of Line".

Make sure the TV is off, etc.; I got some weird stuff happening.



It's slow to use and it stops. I previously used speech to text and could teach it commands, like "!@#$" meaning something; I think I used "stop" for Enter.
« Last Edit: November 09, 2023, 09:00:19 PM by BIGAL »
A man who never made a mistake never made anything

CodeDing

  • Newt
  • Posts: 63
Re: Windows Speech to Text (STT) via AutoLisp / Visual Lisp?
« Reply #16 on: November 09, 2023, 09:10:01 PM »
BigAl,

Yes, I know the Windows Dictation can run in the foreground. The problem is that it runs continuously, and I would rather manage the text string myself. So that's why I want the API.

When I look at the VB sample here:
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms720590(v=vs.85)

...it appears to utilize an EventHandler to manage when the Recognizer is finished. We can see this is accomplished via the "RC_Recognition" function:
Code:
Private Sub RC_Recognition( ...
Does anybody know how I can manage COM events in Visual Lisp? Yes, I know about "vlr-" methods, but not sure how I could apply them in this instance to my Recognizer?
Any examples of Visual Lisp COM event handling?
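
For comparison, here is roughly how the same `Recognition` event can be sunk from Python with pywin32, which may help when weighing Lisp against the Python/.NET route. As far as I know the `vlr-` reactors only cover AutoCAD's own events, and I'm not aware of a documented way to sink events from an arbitrary COM object in Visual LISP, so treat that as an assumption. In the sketch, `DictationSink`, `wrap` and `listen_once` are hypothetical names. `OnRecognition` follows pywin32's `On<EventName>` convention, and the constants 1 and 0 are assumed to be `SGDSActive` and `SGDSInactive`. The `listen_once` wiring needs Windows plus pywin32 and is an unverified sketch.

```python
import time


class DictationSink:
    """Collects recognized phrases; pywin32 routes COM events to On<EventName> methods."""

    # Identity by default so the class can be exercised without COM;
    # listen_once() swaps in win32com.client.Dispatch to wrap the raw result.
    wrap = staticmethod(lambda obj: obj)

    def __init__(self):
        self.phrases = []

    def OnRecognition(self, StreamNumber, StreamPosition, RecognitionType, Result):
        # Counterpart of the VB sample's RC_Recognition handler.
        self.phrases.append(self.wrap(Result).PhraseInfo.GetText())


def listen_once(timeout_s=10.0):
    """Windows + pywin32 only: return the first dictated phrase, or None on timeout."""
    import pythoncom
    import win32com.client

    DictationSink.wrap = staticmethod(win32com.client.Dispatch)
    recognizer = win32com.client.Dispatch("SAPI.SpSharedRecognizer")
    context = recognizer.CreateRecoContext()
    grammar = context.CreateGrammar()
    grammar.DictationLoad("", 0)   # default dictation topic, static load
    grammar.DictationSetState(1)   # assumed SGDSActive: start listening
    sink = win32com.client.WithEvents(context, DictationSink)

    deadline = time.monotonic() + timeout_s
    while not sink.phrases and time.monotonic() < deadline:
        pythoncom.PumpWaitingMessages()  # let COM deliver pending events
        time.sleep(0.05)

    grammar.DictationSetState(0)   # assumed SGDSInactive: stop listening
    return sink.phrases[0] if sink.phrases else None
```

Because the sink stops after the first phrase, the caller gets one string back and decides what to do with it, rather than having dictation run continuously in the foreground.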

Best,
~DD