How to convert speech to text using JavaScript?

Overview

To convert spoken words to text, we generally use the SpeechRecognition interface of the Web Speech API. SpeechRecognition captures the spoken words as audio, recognizes them, and converts them to text. The recognized text is returned in an array-like list of results, which can then be displayed inside an HTML element on the page.

Syntax

The basic syntax is −

let recognization = new webkitSpeechRecognition();

The standard name of the constructor is SpeechRecognition(). Chrome and Apple Safari expose it under the prefixed name webkitSpeechRecognition(), so use SpeechRecognition() where the browser provides it and webkitSpeechRecognition() otherwise.
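One way to handle both names is to look the constructor up before using it. The following is a minimal sketch, not the only approach; the helper name resolveSpeechRecognition and its scope parameter are hypothetical and were introduced only for this illustration.

```javascript
// Hypothetical helper: returns the speech recognition constructor found on
// `scope`, preferring the standard name, or null when neither name exists.
// `scope` defaults to globalThis, which is the window object in a browser.
function resolveSpeechRecognition(scope = globalThis) {
  return scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
}

const RecognitionCtor = resolveSpeechRecognition();
if (RecognitionCtor === null) {
  // Reached in browsers (or other runtimes) without speech recognition.
  console.warn("Speech recognition is not available in this environment.");
}
```

When the helper returns a constructor, it can be used in place of webkitSpeechRecognition in the syntax above, for example new RecognitionCtor().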

Algorithm

Step 1 − Create an HTML page as given below. Create a button using the <button> tag and add an onclick attribute to it that calls the function “runSpeechRecog()”. Also create a <p> tag with the id “action” and an <h3> tag with the id “output”.
Step 2 − Create a runSpeechRecog() arrow function inside a <script> tag, as we are using internal JavaScript.
Step 3 − Select the <p> and <h3> elements through the Document Object Model (DOM) using document.getElementById(). Store them in the variables action and output.
Step 4 − Create an object with the webkitSpeechRecognition() constructor and store it in a reference variable, so that all the properties and methods of the object are available through that variable.

let recognization = new webkitSpeechRecognition();

Step 5 − Assign a function to “recognization.onstart”. This event handler runs when recognition starts, and here it shows a status message.

recognization.onstart = () => {
   action.textContent = "Listening...";
};

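Step 6 − Now assign a function to “recognization.onresult” to display the spoken words on the screen.

recognization.onresult = (e) => {
   // First alternative of the first result
   const transcript = e.results[0][0].transcript;
   // Recognizer's confidence in this alternative (available but not displayed here)
   const confidence = e.results[0][0].confidence;
   output.textContent = transcript;
   output.classList.remove("hide");
   action.textContent = "";
};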

Step 7 − Call the recognization.start() method to start the speech recognition.

recognization.start();
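The steps above can also be gathered into one reusable function that additionally reports errors. The following is only a sketch under stated assumptions: createRecognizer and the callback names onStatus, onText and onError are hypothetical and not part of the Web Speech API, while lang, interimResults, maxAlternatives, onstart, onresult, onerror and onend are properties of the recognition object.

```javascript
// Hypothetical helper: creates a recognizer from the given constructor
// and forwards its events to plain callbacks supplied by the caller.
function createRecognizer(RecognitionCtor, options = {}) {
  const recognizer = new RecognitionCtor();
  recognizer.lang = options.lang || "en-US"; // language to recognize
  recognizer.interimResults = false;         // deliver final results only
  recognizer.maxAlternatives = 1;            // one transcript per result

  recognizer.onstart = () => options.onStatus?.("Listening...");
  recognizer.onresult = (e) => options.onText?.(e.results[0][0].transcript);
  // e.error is a short code such as "not-allowed" or "no-speech"
  recognizer.onerror = (e) => options.onError?.(e.error);
  // Fires when recognition stops, after a result or after an error
  recognizer.onend = () => options.onStatus?.("");
  return recognizer;
}

// Browser usage, assuming the "action" and "output" elements from Step 1:
// const recognizer = createRecognizer(webkitSpeechRecognition, {
//   onStatus: (text) => { action.textContent = text; },
//   onText: (text) => { output.textContent = text; },
//   onError: (code) => { output.textContent = "Error: " + code; },
// });
// recognizer.start();
```

Passing the constructor in as an argument keeps the helper independent of whichever name the browser exposes.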

Example

<!DOCTYPE html>
<html>
<head>
   <title>Speech to text</title>
</head>
   <body>
      <div class="speaker" style="display: flex;justify-content: space-between;width: 13rem;box-shadow: 0 0 13px #0000003d;border-radius: 5px;">
         <p id="action" style="color: grey;font-weight: 800; padding: 0; padding-left: 2rem;"></p>
         <button onclick="runSpeechRecog()" style="border: transparent;padding: 0 0.5rem;">
            Speech
         </button>
      </div>
      <h3 id="output" class="hide"></h3>
      <script>
         // Runs when the "Speech" button is clicked
         const runSpeechRecog = () => {
            const output = document.getElementById('output');
            const action = document.getElementById('action');
            output.textContent = "Loading text...";
            // Chrome and Safari expose the constructor with the webkit prefix
            const recognization = new webkitSpeechRecognition();
            // Show a status message once the browser starts listening
            recognization.onstart = () => {
               action.textContent = "Listening...";
            };
            // Display the first alternative of the first result
            recognization.onresult = (e) => {
               const transcript = e.results[0][0].transcript;
               output.textContent = transcript;
               output.classList.remove("hide");
               action.textContent = "";
            };
            recognization.start();
         };
      </script>
   </body>
</html>

Description

When the “runSpeechRecog()” function is triggered, a webkitSpeechRecognition object is created and stored in the reference variable. The browser may first ask for permission to use the microphone. Once recognition starts, the page shows “Listening...” to indicate that the browser is ready to listen to the user's spoken words.

When the user stops speaking, the result is delivered as an array-like list of results, each of which holds one or more alternative transcripts. The first transcript is then displayed on the page. For example, a user runs this speech-to-text program in a browser, presses the Speech button, and says “tutorialpoint.com”. When the user stops speaking, recognition stops and the transcript “tutorialpoint.com” is displayed on the page.
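The example reads only e.results[0][0]. When a recognizer returns several results or several alternatives, the same structure can be walked in full. The following is a minimal sketch; the function names bestAlternative and collectTranscript and the sample data are hypothetical, and plain arrays stand in for the array-like objects that the browser delivers.

```javascript
// Picks the alternative with the highest confidence from one result.
// Assumes the result holds at least one alternative.
function bestAlternative(result) {
  return Array.from(result).reduce((best, alt) =>
    alt.confidence > best.confidence ? alt : best
  );
}

// Joins the best transcript of every result into a single sentence.
function collectTranscript(results) {
  return Array.from(results)
    .map((result) => bestAlternative(result).transcript.trim())
    .join(" ");
}

// Sample data shaped like e.results (made up for illustration).
const sampleResults = [
  [
    { transcript: "hello world", confidence: 0.91 },
    { transcript: "hollow world", confidence: 0.42 },
  ],
  [{ transcript: " from the browser", confidence: 0.88 }],
];

console.log(collectTranscript(sampleResults)); // hello world from the browser
```

Inside an onresult handler, the call would be collectTranscript(e.results).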

Conclusion

The Web Speech API is used in many types of applications. It has two components: the SpeechRecognition API, which is used for speech-to-text conversion, and the SpeechSynthesis API, which is used for text-to-speech conversion. The SpeechRecognition interface used above is supported in Chrome, Apple Safari, and Opera. Because support differs between browsers and versions, check that the constructor exists before using it.

Aman Gupta

Updated on: 24-Mar-2023
