Text to Voice conversion using Web Speech API of Google Chrome

Audiobooks have become popular because listeners can absorb content while multitasking. Many websites also offer audio versions of articles, so users can listen instead of reading.

To convert text to speech, we use the speech synthesis part of the Web Speech API, which is built into modern browsers. In this tutorial, we will learn how to use it through practical examples.

Syntax

Users can follow the syntax below to use the Web Speech API for text-to-speech conversion.

var synth = window.speechSynthesis;
var speechObj = new SpeechSynthesisUtterance(text);
synth.speak(speechObj);

In the above syntax, we create a SpeechSynthesisUtterance object, passing the text to speak as a parameter. We then pass that object to the speak() method of window.speechSynthesis, which adds it to the speech queue and reads it aloud.

Basic Text-to-Speech Example

The example below demonstrates basic usage of the Web Speech API. We use an HTML textarea for text input and a button to trigger speech conversion.

<html>
<body>
   <h3>Text to Voice Conversion using Web Speech API</h3>
   <div class="container">
      <textarea name="text" id="text" cols="30" rows="10">Add text to speak.</textarea>
      <br><br>
      <button id="speak">Speak Text</button>
   </div>
   
   <script>
      var synth = window.speechSynthesis;
      var speak = document.getElementById("speak");
      
      speak.addEventListener("click", () => {
         var text = document.getElementById("text").value;
         var speechObj = new SpeechSynthesisUtterance(text);
         synth.speak(speechObj);
      });
   </script>
</body>
</html>

When users click the "Speak Text" button, the browser will read aloud the text entered in the textarea using the default voice and settings.
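
The basic example starts speech but does not report when it ends. The sketch below shows one possible way to wrap speak() in a Promise using the utterance's onend and onerror handlers. The helper name speakAsync is hypothetical and not part of the Web Speech API; the synthesizer and utterance are passed in as arguments, so the helper makes no assumptions about the page.

```javascript
// Hypothetical helper (not part of the Web Speech API): resolves when the
// utterance finishes and rejects if the browser reports a synthesis error.
// `synth` is expected to behave like window.speechSynthesis and `utterance`
// like a SpeechSynthesisUtterance instance.
function speakAsync(synth, utterance) {
  return new Promise((resolve, reject) => {
    utterance.onend = () => resolve();
    utterance.onerror = (event) =>
      reject(new Error(`Speech synthesis error: ${event.error}`));
    synth.speak(utterance);
  });
}

// Browser usage, assuming the textarea with id "text" from the example above:
// const utterance = new SpeechSynthesisUtterance(document.getElementById("text").value);
// speakAsync(window.speechSynthesis, utterance)
//   .then(() => console.log("Finished speaking"))
//   .catch((err) => console.error(err.message));
```

Note that a cancel() call may be reported through the error event, in which case the promise rejects.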

Advanced Text-to-Speech with Controls

This example demonstrates advanced features including voice selection, rate and pitch control, and playback controls (pause, resume, cancel).

<html>
<head>
   <style>
      textarea {
         border: 2px solid green;
         width: 500px;
      }
      .controls {
         margin-top: 10px;
      }
      .controls label {
         display: inline-block;
         width: 60px;
      }
   </style>
</head>
<body>
   <h3>Advanced Text to Voice using Web Speech API</h3>
   <div class="container">
      <textarea name="text" id="text" cols="30" rows="10">Add text to speak.</textarea>
      
      <div class="controls">
         <label for="voice-select">Voice:</label>
         <select name="voice" id="voice-select"></select>
         <br><br>
         
         <label for="rate-select">Rate:</label>
         <input type="range" name="rate" id="rate-select" min="0.1" max="2" step="0.1" value="1">
         <span id="rate-value">1.0</span>
         <br><br>
         
         <label for="pitch-select">Pitch:</label>
         <input type="range" name="pitch" id="pitch-select" min="0" max="2" step="0.1" value="1">
         <span id="pitch-value">1.0</span>
         <br><br>
         
         <button id="btn">Speak</button>
         <button id="pause">Pause</button>
         <button id="resume">Resume</button>
         <button id="cancel">Cancel</button>
      </div>
   </div>
   
   <script>
      // Access DOM elements
      const textarea = document.getElementById('text');
      const voiceSelect = document.getElementById('voice-select');
      const rateSelect = document.getElementById('rate-select');
      const pitchSelect = document.getElementById('pitch-select');
      const rateValue = document.getElementById('rate-value');
      const pitchValue = document.getElementById('pitch-value');
      const speakBtn = document.getElementById('btn');
      const pauseBtn = document.getElementById('pause');
      const resumeBtn = document.getElementById('resume');
      const cancelBtn = document.getElementById('cancel');
      
      // Initialize Speech API
      const speechSynth = window.speechSynthesis;
      let voices = [];
      
      function populateVoiceList() {
         voices = speechSynth.getVoices();
         let voiceOptions = '';
         
         voices.forEach((voice, index) => {
            const option = `${voice.name} (${voice.lang})`;
            voiceOptions += `<option data-name="${voice.name}" data-lang="${voice.lang}">${option}</option>`;
         });
         
         voiceSelect.innerHTML = voiceOptions;
      }
      
      // Some browsers return voices immediately, others only after voiceschanged fires
      populateVoiceList();
      speechSynth.onvoiceschanged = populateVoiceList;
      
      function textToSpeech() {
         if (textarea.value.trim() !== '') {
            const utterance = new SpeechSynthesisUtterance(textarea.value);
            
            // Set selected voice (the default voice is used if none is selected yet)
            const selectedOption = voiceSelect.selectedOptions[0];
            const selectedVoiceName = selectedOption ? selectedOption.getAttribute('data-name') : null;
            const selectedVoice = voices.find(voice => voice.name === selectedVoiceName);
            if (selectedVoice) {
               utterance.voice = selectedVoice;
            }
            
            // Set rate and pitch
            utterance.rate = parseFloat(rateSelect.value);
            utterance.pitch = parseFloat(pitchSelect.value);
            
            // Log synthesis errors via the onerror handler
            utterance.onerror = (event) => {
               console.error('Speech synthesis error:', event.error);
            };
            
            speechSynth.speak(utterance);
         }
      }
      
      // Update rate display while the slider is being dragged
      rateSelect.addEventListener('input', () => {
         rateValue.textContent = rateSelect.value;
      });
      
      // Update pitch display while the slider is being dragged
      pitchSelect.addEventListener('input', () => {
         pitchValue.textContent = pitchSelect.value;
      });
      
      // Event listeners for controls
      speakBtn.addEventListener('click', (e) => {
         e.preventDefault();
         textToSpeech();
      });
      
      pauseBtn.addEventListener('click', (e) => {
         e.preventDefault();
         speechSynth.pause();
      });
      
      resumeBtn.addEventListener('click', (e) => {
         e.preventDefault();
         speechSynth.resume();
      });
      
      cancelBtn.addEventListener('click', (e) => {
         e.preventDefault();
         speechSynth.cancel();
      });
   </script>
</body>
</html>
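
Voice lists can load asynchronously, and a voice name stored from one system may not exist on another. The sketch below illustrates one way to handle both cases. The helper names loadVoices and pickVoice are hypothetical, and the fallback rule (match by language prefix) is an assumption chosen for illustration, not something the API prescribes.

```javascript
// Hypothetical helper: resolves with the voice list, whether it is available
// immediately or only after the voiceschanged event fires.
// Note: this overwrites any existing onvoiceschanged handler, and the promise
// stays pending if the browser never reports any voices.
function loadVoices(synth) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices);
      return;
    }
    synth.onvoiceschanged = () => resolve(synth.getVoices());
  });
}

// Hypothetical helper: prefer an exact name match, then fall back to the first
// voice whose language tag starts with the given prefix (for example "en").
// Returns null when nothing matches, so the caller can keep the default voice.
function pickVoice(voices, name, langPrefix) {
  const prefix = langPrefix.toLowerCase();
  return (
    voices.find((v) => v.name === name) ||
    voices.find((v) => v.lang.toLowerCase().startsWith(prefix)) ||
    null
  );
}

// Browser usage ("Some Voice Name" is a placeholder, not a real voice):
// loadVoices(window.speechSynthesis).then((voices) => {
//   const voice = pickVoice(voices, "Some Voice Name", "en");
//   if (voice) utterance.voice = voice;
// });
```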

Key Features

  • Voice Selection: Choose from available system voices in different languages
  • Rate Control: Adjust speaking speed from 0.1 (very slow) to 2.0 (very fast) with the slider; the API itself accepts rates up to 10
  • Pitch Control: Modify voice pitch from 0 (low) to 2.0 (high)
  • Playback Controls: Pause, resume, or cancel speech synthesis
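
Rate and pitch values read from form controls arrive as strings and may fall outside the ranges the API accepts (0.1 to 10 for rate, 0 to 2 for pitch). As a minimal sketch, the hypothetical helpers clamp and applySettings below parse and limit the values before assigning them; falling back to 1 for unparseable input is an assumption, chosen because 1 is the default for both properties.

```javascript
// Ranges accepted by the Web Speech API for utterance properties.
const RATE_RANGE = { min: 0.1, max: 10 };
const PITCH_RANGE = { min: 0, max: 2 };

// Hypothetical helper: parse a value and limit it to the given range,
// returning `fallback` when the value is not a number.
function clamp(value, { min, max }, fallback) {
  const n = Number.parseFloat(value);
  if (Number.isNaN(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}

// Hypothetical helper: apply validated rate and pitch to an utterance-like object.
function applySettings(utterance, settings) {
  utterance.rate = clamp(settings.rate, RATE_RANGE, 1);
  utterance.pitch = clamp(settings.pitch, PITCH_RANGE, 1);
  return utterance;
}

// Browser usage, assuming the sliders from the advanced example:
// applySettings(utterance, { rate: rateSelect.value, pitch: pitchSelect.value });
```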

Browser Compatibility

The Web Speech API is supported in modern browsers including Chrome, Firefox, Safari, and Edge. However, available voices may vary between operating systems and browsers.
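
Because support can still differ between environments, it may be worth checking for the API before using it. The following is a minimal sketch; the function name isSpeechSynthesisSupported is hypothetical, and the global object is passed in as a parameter so the check can also run outside a browser.

```javascript
// Hypothetical helper: returns true when the given global object exposes
// both the synthesizer and the utterance constructor.
function isSpeechSynthesisSupported(globalObject) {
  return (
    typeof globalObject === "object" &&
    globalObject !== null &&
    "speechSynthesis" in globalObject &&
    "SpeechSynthesisUtterance" in globalObject
  );
}

// Browser usage, assuming the "speak" button from the basic example:
// if (!isSpeechSynthesisSupported(window)) {
//   document.getElementById("speak").disabled = true;
// }
```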

Conclusion

The Web Speech API lets web applications convert text to speech without external libraries. A few lines of code cover basic text-to-speech, while the voice, rate, pitch, and playback controls give finer control over voice characteristics and playback.

Updated on: 2026-03-15T23:19:01+05:30
