Building Voice-controlled Applications with JavaScript and Speech Recognition APIs
Voice-controlled applications have become increasingly popular in recent years, allowing users to interact with technology through speech rather than traditional input methods. JavaScript, being one of the most widely used programming languages for web development, provides a powerful platform for building such applications. In this article, we will explore how to utilise JavaScript and Speech Recognition APIs to create voice-controlled applications. We will dive into the process of setting up speech recognition, capturing and processing user speech, and implementing voice commands in your applications.
Setting Up Speech Recognition
Before we start building our voice-controlled application, we need to set up the speech recognition functionality. Fortunately, modern web browsers provide built-in support for the Web Speech API, which allows developers to leverage speech recognition capabilities.
Let's take a look at how to initialise the speech recognition API in JavaScript:
<!DOCTYPE html>
<html>
<head>
    <title>Voice Recognition Setup</title>
</head>
<body>
    <h2>Speech Recognition Demo</h2>
    <button id="startBtn">Start Listening</button>
    <div id="output"></div>
    <script>
        // Check browser support for speech recognition
        if ('SpeechRecognition' in window || 'webkitSpeechRecognition' in window) {
            // Create a new instance of the SpeechRecognition object
            const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();

            // Configure recognition settings
            recognition.continuous = true; // Enable continuous speech recognition
            recognition.interimResults = false; // Do not return interim results

            // Event handler for when speech is recognized
            recognition.onresult = (event) => {
                const result = event.results[event.results.length - 1][0].transcript;
                document.getElementById('output').innerHTML = 'Recognized speech: ' + result;
            };

            // Start speech recognition
            document.getElementById('startBtn').onclick = () => {
                recognition.start();
            };
        } else {
            document.getElementById('output').innerHTML = 'Speech recognition not supported';
        }
    </script>
</body>
</html>
Explanation
In the code snippet above, we first check if the browser supports speech recognition by checking the existence of the SpeechRecognition or webkitSpeechRecognition objects. If supported, we create a new instance of the SpeechRecognition object and configure its settings. We set continuous to true to allow continuous speech recognition and interimResults to false to only receive final results. Finally, we define an event handler onresult to process the recognized speech.
If speech recognition is supported in the browser, it will start listening for speech input when the button is clicked. Once speech is recognized, it will display the recognized speech on the webpage.
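The lookup inside the onresult handler, event.results[event.results.length - 1][0].transcript, can be factored into a small helper so its logic is easy to test outside the browser. The sketch below mimics the shape of a SpeechRecognitionResultList with plain nested arrays; the helper name latestTranscript is our own choice, not part of the Web Speech API:

```javascript
// Extract the transcript of the most recent result from a
// SpeechRecognitionResultList-shaped structure. In the browser you would
// pass event.results; here we use nested arrays with the same shape.
function latestTranscript(results) {
  if (!results || results.length === 0) {
    return null; // nothing recognized yet
  }
  // Each result is a list of alternatives; alternative 0 is the best guess.
  return results[results.length - 1][0].transcript;
}

// Mimic two recognition events having arrived in continuous mode.
const mockResults = [
  [{ transcript: 'hello there', confidence: 0.92 }],
  [{ transcript: 'volume up', confidence: 0.88 }],
];

console.log(latestTranscript(mockResults)); // → 'volume up'
console.log(latestTranscript([]));          // → null
```

In continuous mode event.results accumulates every utterance of the session, which is why the handler reads the last entry rather than the first.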
Capturing and Processing User Speech
Now that we have set up speech recognition, we need to capture and process user speech in our voice-controlled application. The onresult event handler we defined earlier provides us with the recognized speech.
Let's extend our previous code to capture user speech and process it:
<!DOCTYPE html>
<html>
<head>
    <title>Speech Processing</title>
</head>
<body>
    <h2>Voice Command Processing</h2>
    <button id="startBtn">Start Listening</button>
    <div id="output"></div>
    <div id="commands"></div>
    <script>
        if ('SpeechRecognition' in window || 'webkitSpeechRecognition' in window) {
            const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
            recognition.continuous = true;
            recognition.interimResults = false;

            // Event handler for when speech is recognized
            recognition.onresult = (event) => {
                const result = event.results[event.results.length - 1][0].transcript;
                document.getElementById('output').innerHTML = 'Recognized speech: ' + result;
                // Process the recognized speech
                processSpeech(result);
            };

            // Function to process the recognized speech
            function processSpeech(speech) {
                const commandDiv = document.getElementById('commands');
                // Perform actions based on the recognized speech
                if (speech.toLowerCase().includes('hello')) {
                    commandDiv.innerHTML = 'User greeted with "hello"';
                    // Perform greeting action
                } else if (speech.toLowerCase().includes('search')) {
                    commandDiv.innerHTML = 'User wants to search';
                    // Perform search action
                } else {
                    commandDiv.innerHTML = 'Unrecognised speech';
                }
            }

            document.getElementById('startBtn').onclick = () => {
                recognition.start();
            };
        } else {
            document.getElementById('output').innerHTML = 'Speech recognition not supported';
        }
    </script>
</body>
</html>
Explanation
In the updated code snippet, we have added a function processSpeech to handle the recognized speech. Inside this function, we can perform various actions based on the content of the recognized speech. In the example, we check if the speech includes the word "hello" or "search" and display appropriate messages. You can customise the actions based on your application's requirements.
Assuming the user speaks the word "hello" or "search," the corresponding message will appear on the webpage. If the recognized speech does not match any predefined phrases, it will display "Unrecognised speech."
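The substring checks inside processSpeech can also be pulled out into a standalone function, which makes the matching logic easy to unit-test away from the browser. The function name matchCommand and the labels it returns below are our own choices for illustration, not part of any API:

```javascript
// Map a recognized utterance to a command label using simple
// substring matching, mirroring the checks in processSpeech.
function matchCommand(speech) {
  const lower = speech.toLowerCase();
  if (lower.includes('hello')) return 'greeting';
  if (lower.includes('search')) return 'search';
  return 'unrecognised';
}

console.log(matchCommand('Hello computer'));    // → 'greeting'
console.log(matchCommand('please search now')); // → 'search'
console.log(matchCommand('mumble'));            // → 'unrecognised'
```

Keeping the matcher pure (string in, label out) also means the DOM updates stay in one place, in processSpeech, rather than being scattered through the matching logic.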
Implementing Voice Commands
With speech capture in place, we can map recognized phrases to concrete actions. The example below builds a simple media controller that responds to commands such as "play," "stop," "volume up," and "volume down":
<!DOCTYPE html>
<html>
<head>
    <title>Voice Commands</title>
</head>
<body>
    <h2>Media Control Voice Commands</h2>
    <button id="startBtn">Start Listening</button>
    <div id="output"></div>
    <div id="commands"></div>
    <div id="status">Status: Ready</div>
    <script>
        if ('SpeechRecognition' in window || 'webkitSpeechRecognition' in window) {
            const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
            recognition.continuous = true;
            recognition.interimResults = false;

            // Event handler for when speech is recognized
            recognition.onresult = (event) => {
                const result = event.results[event.results.length - 1][0].transcript;
                document.getElementById('output').innerHTML = 'Recognized speech: ' + result;
                // Process the recognized speech
                processSpeech(result);
            };

            // Function to process the recognized speech
            function processSpeech(speech) {
                const commandDiv = document.getElementById('commands');
                const statusDiv = document.getElementById('status');
                const lowerSpeech = speech.toLowerCase();
                // Perform actions based on the recognized speech
                if (lowerSpeech.includes('play')) {
                    commandDiv.innerHTML = 'Command: Play';
                    statusDiv.innerHTML = 'Status: Playing...';
                    // Perform play action
                } else if (lowerSpeech.includes('stop')) {
                    commandDiv.innerHTML = 'Command: Stop';
                    statusDiv.innerHTML = 'Status: Stopped';
                    // Perform stop action
                } else if (lowerSpeech.includes('volume up')) {
                    commandDiv.innerHTML = 'Command: Volume Up';
                    statusDiv.innerHTML = 'Status: Volume increased';
                    // Perform volume up action
                } else if (lowerSpeech.includes('volume down')) {
                    commandDiv.innerHTML = 'Command: Volume Down';
                    statusDiv.innerHTML = 'Status: Volume decreased';
                    // Perform volume down action
                } else {
                    commandDiv.innerHTML = 'Unrecognised speech';
                    statusDiv.innerHTML = 'Status: Command not found';
                }
            }

            document.getElementById('startBtn').onclick = () => {
                recognition.start();
                document.getElementById('status').innerHTML = 'Status: Listening...';
            };
        } else {
            document.getElementById('output').innerHTML = 'Speech recognition not supported';
        }
    </script>
</body>
</html>
Explanation
In the updated code snippet, we have extended the processSpeech function to include voice commands such as "play," "stop," "volume up," and "volume down." When the recognized speech matches any of these commands, the corresponding actions are performed. You can customise the voice commands and actions based on your application's requirements.
If the recognized speech matches any of the voice commands, the corresponding message will appear on the webpage. For example, if the user says "play," the page will display "Command: Play" and update the status. If the recognized speech does not match any predefined commands, it will show "Unrecognised speech."
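As the number of commands grows, a chain of if/else checks becomes hard to maintain. One common alternative is a dispatch table that maps phrases to handler functions; this is a sketch of the pattern (the names handlers and dispatch are ours), not part of the Web Speech API. Note that longer phrases are tried first so that an overlapping short phrase such as "volume" cannot shadow "volume up":

```javascript
// A small dispatch table mapping spoken phrases to handler functions.
const handlers = {
  'volume up':   () => 'Volume increased',
  'volume down': () => 'Volume decreased',
  'volume':      () => 'Volume menu opened',
  'play':        () => 'Playing...',
  'stop':        () => 'Stopped',
};

// Find the first matching phrase, longest phrase first, and run its handler.
function dispatch(speech) {
  const lower = speech.toLowerCase();
  const phrases = Object.keys(handlers).sort((a, b) => b.length - a.length);
  for (const phrase of phrases) {
    if (lower.includes(phrase)) {
      return handlers[phrase]();
    }
  }
  return 'Command not found';
}

console.log(dispatch('Volume up please')); // → 'Volume increased'
console.log(dispatch('stop'));             // → 'Stopped'
console.log(dispatch('dance'));            // → 'Command not found'
```

Adding a new command then means adding one entry to the table rather than another branch to the conditional.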
Browser Compatibility
Speech recognition via the Web Speech API is supported in Chromium-based browsers such as Chrome and Edge, and in recent versions of Safari; Firefox does not enable SpeechRecognition by default. Some browsers expose the API only under the webkit prefix, which is why we check for both SpeechRecognition and webkitSpeechRecognition in our code.
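The constructor lookup can be wrapped in a small helper that takes a window-like object, which lets the fallback logic be tested with plain mock objects outside the browser (the helper name getRecognitionCtor is our own, not a standard API):

```javascript
// Return the SpeechRecognition constructor from a window-like object,
// falling back to the webkit-prefixed name, or null if unsupported.
function getRecognitionCtor(win) {
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// Mock environments standing in for different browsers.
function Standard() {}
function Prefixed() {}

console.log(getRecognitionCtor({ SpeechRecognition: Standard }) === Standard);       // → true
console.log(getRecognitionCtor({ webkitSpeechRecognition: Prefixed }) === Prefixed); // → true
console.log(getRecognitionCtor({}));                                                 // → null
```

In a real page you would call getRecognitionCtor(window) and show a "not supported" message when it returns null, exactly as the earlier examples do inline.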
Key Points
- Always check browser support before implementing speech recognition
- Use continuous = true for ongoing speech recognition
- Convert speech to lowercase for better command matching
- Provide user feedback for recognized commands and status updates
- Handle unrecognized speech gracefully with appropriate messages
Conclusion
Voice-controlled applications offer an intuitive and convenient way for users to interact with technology. By leveraging JavaScript and the Speech Recognition APIs, developers can build powerful voice-controlled applications that enhance user experience and accessibility.
