Author Topic: Windows Speech to Text (STT) via AutoLisp / Visual Lisp? (Read 2615 times)

CodeDing · « **on:** November 09, 2023, 03:10:31 PM »

A while back, BigAL posted a program that used the Windows Sapi COM object to have your computer speak the text provided (Text to Speech, TTS).

After some research I can see that Sapi can be used for Speech to Text (STT), but I cannot understand how to implement it via lisp. Can somebody help create a program so I can understand how this would work?

Speech API (SAPI) Overview:
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms720151(v=vs.85)#api-for-speech-recognition

SAPI documentation:
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms720099(v=vs.85)

Simple use of code might look like:

Code - Auto/Visual Lisp: [Select]

(defun c:TEST ( / txt)
  (getstring "\nPress Enter at any time to begin Speech Recognition...")
  (setq txt (sapi-stt))
  (alert (strcat "Your Text:\n\n" txt))
  (princ)
)
 

Best,
~DD

Vaidas · « **Reply #1 on:** November 09, 2023, 03:30:40 PM »

This is something I used for fun:

Code: [Select]

(setq sapi (vlax-create-object "Sapi.SpVoice"))
(vlax-invoke sapi "Speak" "I wish you the best of everything in New 2011 Year! Thanks for using KitoxToolset from www.kitox.com" 0)
(vlax-release-object sapi)

JohnK · « **Reply #2 on:** November 09, 2023, 03:32:23 PM »

Code - Auto/Visual Lisp: [Select]

(progn
  (setq sapi (vlax-create-object "Sapi.SpVoice"))
  (vlax-invoke sapi "Speak" "I'm sorry Dave, I'm afraid I can't do that ." 0)
  (vlax-release-object sapi)
  )
 

EDIT: Second place. I was beaten by Vaidas.

Vaidas · « **Reply #3 on:** November 09, 2023, 03:38:11 PM »

Sorry JohnK, for my guerilla marketing

kdub_nz · « **Reply #4 on:** November 09, 2023, 03:49:04 PM »

Third place, with the usual proviso ...
added link:
https://chat.openai.com/share/8107d8c4-02c2-4f7f-bf51-5a15f5fafdce

I was tempted to respond with a "Thank You" so the bit-heads will treat me gently when the revolution comes.

CodeDing · « **Reply #5 on:** November 09, 2023, 04:07:19 PM »

I tried to explain it clearly but I can see the confusion lol.

All of the provided examples so far are for TEXT TO SPEECH which is Not what I need.

I am looking for an example of SPEECH TO TEXT pleeasseeeee

Best,
~DD

Vaidas · « **Reply #6 on:** November 09, 2023, 04:23:52 PM »

Oh, I need to read twice... My apologies, I have not run any experiments on this myself.

kdub_nz · « **Reply #7 on:** November 09, 2023, 04:36:06 PM »

ChatGPT

You could use Python,
or :

Quote

To create a speech-to-text feature in AutoCAD using AutoLISP, you can leverage the Windows Speech Recognition API through COM automation. Here's a basic example of AutoLISP code that uses the Windows Speech Recognition API to convert speech to text in AutoCAD:

```lisp
(defun speech-to-text ()
  (vl-load-com)
  (setq recognizer (vlax-create-object "SAPI.SpSharedRecognizer"))
  (setq recognizer-ctx (vlax-get-property recognizer 'Recognizer))
  (vlax-invoke-method recognizer-ctx 'SetInput "speech input" t)

  (setq audio (vlax-create-object "SAPI.SpAudioFormat"))
  (vlax-put-property audio 'Type :spATDictation)
  (vlax-put-property audio 'FormatType :SAFT22kHz16BitMono)
  (vlax-put-property recognizer 'AudioInput audio)

  (setq reco-context (vlax-get-property recognizer 'CreateRecoContext))
  (vlax-invoke-method reco-context 'SetNotifyWinEventSink reco-context)

  (vlax-invoke-method reco-context 'SetInterest :SPEI_RECOGNITION :SPEI_RECOGNITION)

  (while
    (progn
      (setq event (vlax-invoke-method reco-context 'WaitForNotifyEvent 500))
      (if (= event :SPEI_RECOGNITION)
        (progn
          (setq result (vlax-get-property reco-context 'GetResult))
          (setq phrase (vlax-get-property result 'PhraseInfo))
          (setq text (vlax-get-property phrase 'GetText))
          (princ "\nRecognized Text: ")
          (princ text)
          t)
        t)))
  (vlax-release-object recognizer)
)

(speech-to-text)
```

Please note that this code requires Windows Speech Recognition to be installed and properly configured on your system. Also, keep in mind that real-time speech recognition can be resource-intensive, so the performance may vary based on your system's capabilities.

Make sure to test and modify the code as needed to fit your specific requirements and environment.

Quote

Creating a speech-to-text functionality in AutoCAD using AutoLISP is not straightforward because AutoLISP does not have native support for speech recognition. However, you can achieve this by leveraging external tools and libraries. One way to do this is by using a Python script to handle speech recognition and then communicate with AutoCAD through the COM interface.

Here's an example of how you can achieve speech-to-text functionality in AutoCAD using a combination of AutoLISP and Python:

1. **Python Script (speech_to_text.py)**: Write a Python script that uses a speech recognition library to convert speech to text. You can use a library like SpeechRecognition, which supports various speech recognition engines.

```python
import speech_recognition as sr

def recognize_speech():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Say something:")
        audio = recognizer.listen(source)
    try:
        text = recognizer.recognize_google(audio)
        return text
    except sr.UnknownValueError:
        return "Speech Recognition could not understand audio"
    except sr.RequestError as e:
        return f"Could not request results from Google Speech Recognition service; {e}"

if __name__ == "__main__":
    print(recognize_speech())
```

2. **AutoLISP Code**: Use AutoLISP to call the Python script and retrieve the speech-to-text output. AutoLISP can call external commands using the `COMMAND` function.

```lisp
(defun c:speech-to-text ()
  (setq cmd (strcat "python " (vl-filename-mktemp nil nil ".py")))
  (setq result (command cmd))
  (princ (strcat "\nSpeech to text result: " result))
  (princ)
)
```

In this AutoLISP code:

- `(vl-filename-mktemp nil nil ".py")` creates a temporary Python script file.
- `(command cmd)` executes the Python script using the `COMMAND` function, which runs an external command and returns the output.
- The result is then displayed in the AutoCAD command line.

To use this functionality, load the AutoLISP code into AutoCAD and type `SPEECH-TO-TEXT` in the command line. Make sure you have Python installed on your system and the SpeechRecognition library (`pip install SpeechRecognition`) to run the Python script.

Please note that this approach requires Python to be installed on your system and may require additional configuration based on your specific environment.
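
A minimal sketch of one way the Python side could hand its result back, under the assumption (not part of the quoted answer) that the two sides exchange text through a plain file. AutoLISP's `command` sends its arguments to the AutoCAD command line and does not return a program's output, and `vl-filename-mktemp` only returns a unique file name. The function name `transcribe_to_file`, the `recognize` callable, and the output path are all hypothetical.

```python
from pathlib import Path
from typing import Callable


def transcribe_to_file(recognize: Callable[[], str], out_path: str) -> str:
    """Run one recognition pass and save the text to a file.

    `recognize` is any zero-argument callable that returns the recognized
    text (for example the recognize_speech function from step 1).
    """
    # Strip surrounding whitespace so the file holds one clean line.
    text = recognize().strip()
    # UTF-8 is an assumption; pick whatever encoding the reader expects.
    Path(out_path).write_text(text, encoding="utf-8")
    return text
```

With the script from step 1, `transcribe_to_file(recognize_speech, r"C:\Temp\stt_result.txt")` would leave one line of text that the LISP side could read back with `open` and `read-line`; the path is only an example.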

It's Alive! · « **Reply #8 on:** November 09, 2023, 04:54:46 PM »

I tried the Python version here:
https://forums.augi.com/showthread.php?177316-text-recognition&p=1355226&viewfull=1#post1355226

Sphinx didn't recognize my voice very well. I think connecting it to Google would be better.

Code - Python: [Select]

import PyRx as Rx
import PyGe as Ge
import PyGi as Gi
import PyDb as Db
import PyAp as Ap
import PyEd as Ed
import traceback
 
import speech_recognition as sr
 
def PyRxCmd_doit():
    try:
        r = sr.Recognizer()
        with sr.Microphone() as source:
            print("Say something!")
            audio = r.listen(source)
        try:
            result =  r.recognize_sphinx(audio)
            Ap.DocManager().sendStringToExecute(Ap.curDoc(), result+"\n")
            print("Sphinx thinks you said " + result)
        except sr.UnknownValueError:
            print("Sphinx could not understand audio")
        except sr.RequestError as e:
            print("Sphinx error; {0}".format(e))
       
    except Exception as err:
        traceback.print_exception(err)
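
Trying Google first and keeping Sphinx as a fallback could be sketched as below. This is an illustration only: `first_recognized` is a hypothetical helper, and each engine is passed in as a plain callable so the fallback logic can be tried without a microphone.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# A named engine: (label, function that turns audio into text).
Engine = Tuple[str, Callable[[Any], str]]


def first_recognized(
    audio: Any, engines: Sequence[Engine]
) -> Tuple[Optional[str], Optional[str]]:
    """Try each (name, recognize) pair in order and return the first result.

    Returns (engine_name, text), or (None, None) if every engine failed.
    """
    for name, recognize in engines:
        try:
            return name, recognize(audio)
        except Exception as err:  # e.g. sr.UnknownValueError or sr.RequestError
            print(f"{name} failed: {err!r}")
    return None, None
```

Inside `PyRxCmd_doit` this might be called as `first_recognized(audio, [("google", r.recognize_google), ("sphinx", r.recognize_sphinx)])`; note that `recognize_google` needs an internet connection.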
 

CodeDing · « **Reply #9 on:** November 09, 2023, 05:28:08 PM »

Yeah it looks like ChatGPT got about no further than I originally did. That code obviously doesn't execute (typical with the AutoLisp codes it generates lol).

I haven't tried the Python yet. Hoping to accomplish this via Lisp, but if it comes to it, I'll resort to Python/.NET.

It's Alive! · « **Reply #10 on:** November 09, 2023, 06:36:54 PM »

You’re on the right track with Sapi.

I would try to find this and port it (Simple Dictation for Visual Basic)
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms720187(v=vs.85)

kdub_nz · « **Reply #11 on:** November 09, 2023, 07:06:45 PM »

I think with ChatGPT the skills required are asking the correct question and having the experience to test the response and recognise the options when/if the response seems unsuitable. . . . it's Definitely not always good magic.

CodeDing · « **Reply #12 on:** November 09, 2023, 07:28:19 PM »

I hope people can understand that I'm not just being lazy and that I've tried my very best to get this created. I just can NOT get something useful created, so I need something that truly shows me a useful example, and not just stepping stones.

But don't just take my word for it. I'll do my best to show you my working steps..

So the VERY FIRST sentence here:
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms720151(v=vs.85)#api-for-speech-recognition

...says this:

Quote

Just as ISpVoice is the main interface for speech synthesis, ISpRecoContext is the main interface for speech recognition.

...and reading this, I take 2 key words out:

Quote

ISpVoice & ISpRecoContext

...Looking at ISpVoice, I can see that it is used in this function for the Text-to-Speech tools (I'm just using one from this thread, but I already have a function that uses this api, and referenced it before starting this thread):

Code - Auto/Visual Lisp: [Select]

(setq sapi (vlax-create-object "Sapi.SpVoice"))
(vlax-invoke sapi "Speak" "I wish you the best of everything in New 2011 Year! Thanks for using KitoxToolset from www.kitox.com" 0)
(vlax-release-object sapi)

...Which would lead me to believe that I should probably be using ISpRecoContext when I start my Speech-to-Text approach. So, when I select that hyperlink in the first sentence (here's the page):
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms718476(v=vs.85)

...then I read that portion, it states this:

Quote

A new ISpRecoContext object can be created by calling ISpRecognizer::CreateRecoContext.

...Well, now that tells me that I need to start with an ISpRecognizer... Well how do I make one of those? Check this page out..
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms720099(v=vs.85)

...when I search the right column, we see our ISpRecognizer is created via the SpSharedRecognizer Object class. Well, that's a good sign because this code does in fact work:

Code - Auto/Visual Lisp: [Select]

(setq recognizer (vlax-create-object "SAPI.SpSharedRecognizer"))

...and it even pops up the Listener on my machine:
[not sure how to post images, see attached image "listener_capture.png"]

So now... what's next? Well, remember we were told earlier that this object can create ISpRecoContext. So let's dump our 'recognizer' properties to check:

Code: [Select]

Command: (vlax-dump-object recognizer t)
; ISpeechRecognizer: ISpeechRecognizer Interface
; Property values:
;   AudioInput = #<VLA-OBJECT ISpeechObjectToken 00000206e310a410>
;   AudioInputStream = Exception occurred
;   IsShared (RO) = -1
;   Profile = #<VLA-OBJECT ISpeechObjectToken 00000206e310a6e0>
;   Recognizer = #<VLA-OBJECT ISpeechObjectToken 00000206e310ae30>
;   State = 2
;   Status (RO) = #<VLA-OBJECT ISpeechRecognizerStatus 00000206e3b60c60>
; Methods supported:
;   CreateRecoContext ()
;   DisplayUI (4)
;   EmulateRecognition (3)
;   GetAudioInputs (2)
;   GetFormat (1)
;   GetProfiles (2)
;   GetRecognizers (2)
;   IsUISupported (2)

...Well look at that! A supported method is "CreateRecoContext ()". So let's do that:

Code: [Select]

Command: (setq reco-context (vlax-invoke recognizer 'CreateRecoContext))
#<VLA-OBJECT ISpeechRecoContext 00000206e310d600>
Command: (vlax-dump-object reco-context t)
; ISpeechRecoContext: ISpeechRecoContext Interface
; Property values:
;   AudioInputInterferenceStatus (RO) = 0
;   CmdMaxAlternates = 0
;   EventInterests = 327679
;   Recognizer (RO) = #<VLA-OBJECT ISpeechRecognizer 00000206e3b60288>
;   RequestedUIType (RO) = ""
;   RetainedAudio = 0
;   RetainedAudioFormat = #<VLA-OBJECT ISpeechAudioFormat 00000206e310a530>
;   State = 1
;   Voice = #<VLA-OBJECT ISpeechVoice 00000206e310de00>
;   VoicePurgeEvent = 0
; Methods supported:
;   Bookmark (3)
;   CreateGrammar (1)
;   CreateResultFromMemory (1)
;   Pause ()
;   Resume ()
;   SetAdaptationData (1)
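
The `EventInterests = 327679` value in the dump above is a bitmask, and decoding it shows which recognition events the context is subscribed to by default. The sketch below uses Python (already used elsewhere in this thread) purely for the arithmetic. The flag values are quoted from memory of the SAPI 5.x `SpeechRecoEvents` automation enumeration, so treat them as an assumption and check them against the documentation. `decode_event_interests` is a hypothetical helper name.

```python
from enum import IntFlag


class SpeechRecoEvents(IntFlag):
    """SAPI automation event flags (values assumed, verify against the docs)."""
    SREStreamEnd = 1
    SRESoundStart = 2
    SRESoundEnd = 4
    SREPhraseStart = 8
    SRERecognition = 16
    SREHypothesis = 32
    SREBookmark = 64
    SREPropertyNumChange = 128
    SREPropertyStringChange = 256
    SREFalseRecognition = 512
    SREInterference = 1024
    SRERequestUI = 2048
    SREStateChange = 4096
    SREAdaptation = 8192
    SREStreamStart = 16384
    SRERecoOtherContext = 32768
    SREAudioLevel = 65536
    SREPrivate = 262144


def decode_event_interests(mask: int) -> list[str]:
    """Return the names of every flag that is set in an EventInterests mask."""
    return [flag.name for flag in SpeechRecoEvents if mask & flag]


def missing_event_interests(mask: int) -> list[str]:
    """Return the names of every flag that is NOT set in the mask."""
    return [flag.name for flag in SpeechRecoEvents if not mask & flag]
```

With these values, 327679 is every flag except `SREAudioLevel`, so `SRERecognition` is already among the default interests.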

...Great, so now what? When I circle back to the ISpRecognizer documentation..
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms718548(v=vs.85)

...It doesn't even say what to do next? So this is where I get lost. I have 2 useful objects, but no guidance on what to do with them. I can see all of the methods down at the bottom, but they definitely don't match my listed options when I dump my 'recognizer' object... So that's where this dies.

I've tried looking at VBA codes to see if I can extrapolate those somehow. But no luck.

So please, please, if someone can provide some useable code to get further along, that would be super cool.. because when I summarize what I'm able to accomplish.. it looks like this LoL:

Code - Auto/Visual Lisp: [Select]

(defun c:TEST ( / recognizer reco-context)
  (setq recognizer (vlax-create-object "SAPI.SpSharedRecognizer"))
  (vlax-dump-object recognizer t)
  (setq reco-context (vlax-invoke recognizer 'CreateRecoContext))
  (vlax-dump-object reco-context t)
  (vlax-release-object recognizer)
  (princ)
)
 

Best,
~DD

It's Alive! · « **Reply #13 on:** November 09, 2023, 07:58:40 PM »

Not lazy, you’re exploring areas where few have.
Hoping someone would throw out a sample was a long shot at best.
Now it's plan B: get the VB dictation sample and grind through it line by line.

CodeDing · « **Reply #14 on:** November 09, 2023, 08:19:57 PM »

It's Alive!,

Can you post or point me to the VB Dictation Sample code?
I can't seem to find it by searching. I don't have Visual Studio installed on this machine. I have VS Code.

EDIT:
Nvm, just found it!
https://learn.microsoft.com/en-us/previous-versions/windows/desktop/ms720590(v=vs.85)

Best,
~DD
