Building Voice-controlled Applications with JavaScript and Speech Recognition APIs
Voice-controlled applications have become increasingly popular in recent years, allowing users to interact with technology through speech rather than traditional input methods. JavaScript, being one of the most widely used programming languages for web development, provides a powerful platform for building such applications. In this article, we will explore how to utilise JavaScript and Speech Recognition APIs to create voice-controlled applications. We will dive into the process of setting up speech recognition, capturing and processing user speech, and implementing voice commands in your applications.
Setting Up Speech Recognition
Before we start building our voice-controlled application, we need to set up the speech recognition functionality. Fortunately, modern web browsers provide built-in support for the Web Speech API, which allows developers to leverage speech recognition capabilities.
Let's take a look at how to initialise the speech recognition API in JavaScript:
<!DOCTYPE html>
<html>
<head>
    <title>Voice Recognition Setup</title>
</head>
<body>
    <h2>Speech Recognition Demo</h2>
    <button id="startBtn">Start Listening</button>
    <div id="output"></div>
    <script>
        // Check browser support for speech recognition
        if ('SpeechRecognition' in window || 'webkitSpeechRecognition' in window) {
            // Create a new instance of the SpeechRecognition object
            const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();

            // Configure recognition settings
            recognition.continuous = true; // Enable continuous speech recognition
            recognition.interimResults = false; // Do not return interim results

            // Event handler for when speech is recognized
            recognition.onresult = (event) => {
                const result = event.results[event.results.length - 1][0].transcript;
                document.getElementById('output').innerHTML = 'Recognized speech: ' + result;
            };

            // Start speech recognition
            document.getElementById('startBtn').onclick = () => {
                recognition.start();
            };
        } else {
            document.getElementById('output').innerHTML = 'Speech recognition not supported';
        }
    </script>
</body>
</html>
Explanation
In the code snippet above, we first check if the browser supports speech recognition by checking the existence of the SpeechRecognition or webkitSpeechRecognition objects. If supported, we create a new instance of the SpeechRecognition object and configure its settings. We set continuous to true to allow continuous speech recognition and interimResults to false to only receive final results. Finally, we define an event handler onresult to process the recognized speech.
If speech recognition is supported in the browser, it will start listening for speech input when the button is clicked. Once speech is recognized, it will display the recognized speech on the webpage.
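The lookup inside the onresult handler, event.results[event.results.length - 1][0].transcript, can be factored into a small helper so its logic is easy to test outside the browser. The sketch below mimics the shape of a SpeechRecognitionResultList with plain nested arrays; the helper name latestTranscript is our own choice, not part of the Web Speech API:

```javascript
// Extract the transcript of the most recent result from a
// SpeechRecognitionResultList-shaped structure. In the browser you would
// pass event.results; here we use nested arrays with the same shape.
function latestTranscript(results) {
  if (!results || results.length === 0) {
    return null; // nothing recognized yet
  }
  // Each result is a list of alternatives; alternative 0 is the best guess.
  return results[results.length - 1][0].transcript;
}

// Mimic two recognition events having arrived in continuous mode.
const mockResults = [
  [{ transcript: 'hello there', confidence: 0.92 }],
  [{ transcript: 'volume up', confidence: 0.88 }],
];

console.log(latestTranscript(mockResults)); // → 'volume up'
console.log(latestTranscript([]));          // → null
```

In continuous mode event.results accumulates every utterance of the session, which is why the handler reads the last entry rather than the first.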
Capturing and Processing User Speech
Now that we have set up speech recognition, we need to capture and process user speech in our voice-controlled application. The onresult event handler we defined earlier provides us with the recognized speech.
Let's extend our previous code to capture user speech and process it:
<!DOCTYPE html>
<html>
<head>
    <title>Speech Processing</title>
</head>
<body>
    <h2>Voice Command Processing</h2>
    <button id="startBtn">Start Listening</button>
    <div id="output"></div>
    <div id="commands"></div>
    <script>
        if ('SpeechRecognition' in window || 'webkitSpeechRecognition' in window) {
            const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
            recognition.continuous = true;
            recognition.interimResults = false;

            // Event handler for when speech is recognized
            recognition.onresult = (event) => {
                const result = event.results[event.results.length - 1][0].transcript;
                document.getElementById('output').innerHTML = 'Recognized speech: ' + result;
                // Process the recognized speech
                processSpeech(result);
            };

            // Function to process the recognized speech
            function processSpeech(speech) {
                const commandDiv = document.getElementById('commands');
                // Perform actions based on the recognized speech
                if (speech.toLowerCase().includes('hello')) {
                    commandDiv.innerHTML = 'User greeted with "hello"';
                    // Perform greeting action
                } else if (speech.toLowerCase().includes('search')) {
                    commandDiv.innerHTML = 'User wants to search';
                    // Perform search action
                } else {
                    commandDiv.innerHTML = 'Unrecognised speech';
                }
            }

            document.getElementById('startBtn').onclick = () => {
                recognition.start();
            };
        } else {
            document.getElementById('output').innerHTML = 'Speech recognition not supported';
        }
    </script>
</body>
</html>
Explanation
In the updated code snippet, we have added a function processSpeech to handle the recognized speech. Inside this function, we can perform various actions based on the content of the recognized speech. In the example, we check if the speech includes the word "hello" or "search" and display appropriate messages. You can customise the actions based on your application's requirements.
Assuming the user speaks the word "hello" or "search," the corresponding message will appear on the webpage. If the recognized speech does not match any predefined phrases, it will display "Unrecognised speech."
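The substring checks inside processSpeech can also be pulled out into a standalone function, which makes the matching logic easy to unit-test away from the browser. The function name matchCommand and the labels it returns below are our own choices for illustration, not part of any API:

```javascript
// Map a recognized utterance to a command label using simple
// substring matching, mirroring the checks in processSpeech.
function matchCommand(speech) {
  const lower = speech.toLowerCase();
  if (lower.includes('hello')) return 'greeting';
  if (lower.includes('search')) return 'search';
  return 'unrecognised';
}

console.log(matchCommand('Hello computer'));    // → 'greeting'
console.log(matchCommand('please search now')); // → 'search'
console.log(matchCommand('mumble'));            // → 'unrecognised'
```

Keeping the matcher pure (string in, label out) also means the DOM updates stay in one place, in processSpeech, rather than being scattered through the matching logic.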
Implementing Voice Commands
With speech capture in place, we can map recognized phrases to concrete actions. The example below builds a simple media controller that responds to commands such as "play," "stop," "volume up," and "volume down":
<!DOCTYPE html>
<html>
<head>
    <title>Voice Commands</title>
</head>
<body>
    <h2>Media Control Voice Commands</h2>
    <button id="startBtn">Start Listening</button>
    <div id="output"></div>
    <div id="commands"></div>
    <div id="status">Status: Ready</div>
    <script>
        if ('SpeechRecognition' in window || 'webkitSpeechRecognition' in window) {
            const recognition = new (window.SpeechRecognition || window.webkitSpeechRecognition)();
            recognition.continuous = true;
            recognition.interimResults = false;

            // Event handler for when speech is recognized
            recognition.onresult = (event) => {
                const result = event.results[event.results.length - 1][0].transcript;
                document.getElementById('output').innerHTML = 'Recognized speech: ' + result;
                // Process the recognized speech
                processSpeech(result);
            };

            // Function to process the recognized speech
            function processSpeech(speech) {
                const commandDiv = document.getElementById('commands');
                const statusDiv = document.getElementById('status');
                const lowerSpeech = speech.toLowerCase();
                // Perform actions based on the recognized speech
                if (lowerSpeech.includes('play')) {
                    commandDiv.innerHTML = 'Command: Play';
                    statusDiv.innerHTML = 'Status: Playing...';
                    // Perform play action
                } else if (lowerSpeech.includes('stop')) {
                    commandDiv.innerHTML = 'Command: Stop';
                    statusDiv.innerHTML = 'Status: Stopped';
                    // Perform stop action
                } else if (lowerSpeech.includes('volume up')) {
                    commandDiv.innerHTML = 'Command: Volume Up';
                    statusDiv.innerHTML = 'Status: Volume increased';
                    // Perform volume up action
                } else if (lowerSpeech.includes('volume down')) {
                    commandDiv.innerHTML = 'Command: Volume Down';
                    statusDiv.innerHTML = 'Status: Volume decreased';
                    // Perform volume down action
                } else {
                    commandDiv.innerHTML = 'Unrecognised speech';
                    statusDiv.innerHTML = 'Status: Command not found';
                }
            }

            document.getElementById('startBtn').onclick = () => {
                recognition.start();
                document.getElementById('status').innerHTML = 'Status: Listening...';
            };
        } else {
            document.getElementById('output').innerHTML = 'Speech recognition not supported';
        }
    </script>
</body>
</html>
Explanation
In the updated code snippet, we have extended the processSpeech function to include voice commands such as "play," "stop," "volume up," and "volume down." When the recognized speech matches any of these commands, the corresponding actions are performed. You can customise the voice commands and actions based on your application's requirements.
If the recognized speech matches any of the voice commands, the corresponding message will appear on the webpage. For example, if the user says "play," the page will display "Command: Play" and update the status. If the recognized speech does not match any predefined commands, it will show "Unrecognised speech."
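As the number of commands grows, a chain of if/else checks becomes hard to maintain. One common alternative is a dispatch table that maps phrases to handler functions; this is a sketch of the pattern (the names handlers and dispatch are ours), not part of the Web Speech API. Note that longer phrases are tried first so that an overlapping short phrase such as "volume" cannot shadow "volume up":

```javascript
// A small dispatch table mapping spoken phrases to handler functions.
const handlers = {
  'volume up':   () => 'Volume increased',
  'volume down': () => 'Volume decreased',
  'volume':      () => 'Volume menu opened',
  'play':        () => 'Playing...',
  'stop':        () => 'Stopped',
};

// Find the first matching phrase, longest phrase first, and run its handler.
function dispatch(speech) {
  const lower = speech.toLowerCase();
  const phrases = Object.keys(handlers).sort((a, b) => b.length - a.length);
  for (const phrase of phrases) {
    if (lower.includes(phrase)) {
      return handlers[phrase]();
    }
  }
  return 'Command not found';
}

console.log(dispatch('Volume up please')); // → 'Volume increased'
console.log(dispatch('stop'));             // → 'Stopped'
console.log(dispatch('dance'));            // → 'Command not found'
```

Adding a new command then means adding one entry to the table rather than another branch to the conditional.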
Browser Compatibility
Speech recognition via the Web Speech API is supported in Chromium-based browsers such as Chrome and Edge, and in recent versions of Safari; Firefox does not enable SpeechRecognition by default. Some browsers expose the API only under the webkit prefix, which is why we check for both SpeechRecognition and webkitSpeechRecognition in our code.
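The constructor lookup can be wrapped in a small helper that takes a window-like object, which lets the fallback logic be tested with plain mock objects outside the browser (the helper name getRecognitionCtor is our own, not a standard API):

```javascript
// Return the SpeechRecognition constructor from a window-like object,
// falling back to the webkit-prefixed name, or null if unsupported.
function getRecognitionCtor(win) {
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// Mock environments standing in for different browsers.
function Standard() {}
function Prefixed() {}

console.log(getRecognitionCtor({ SpeechRecognition: Standard }) === Standard);       // → true
console.log(getRecognitionCtor({ webkitSpeechRecognition: Prefixed }) === Prefixed); // → true
console.log(getRecognitionCtor({}));                                                 // → null
```

In a real page you would call getRecognitionCtor(window) and show a "not supported" message when it returns null, exactly as the earlier examples do inline.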
Key Points
- Always check browser support before implementing speech recognition
- Use continuous = true for ongoing speech recognition
- Convert speech to lowercase for better command matching
- Provide user feedback for recognized commands and status updates
- Handle unrecognized speech gracefully with appropriate messages
Conclusion
Voice-controlled applications offer an intuitive and convenient way for users to interact with technology. By leveraging JavaScript and the Speech Recognition APIs, developers can build powerful voice-controlled applications that enhance user experience and accessibility.
