If you don’t know about it yet, the HTML5 Web Speech API is now supported in Google Chrome and partially in Apple Safari (see the browser support status here). That means you can now develop voice-driven web applications, and we hope other browsers will add support very soon as well. In this tutorial, I will explain how to start developing an application that uses it. I will also introduce a small wrapper library I wrote recently that provides an easy-to-use abstraction over the API.
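Because support differs from browser to browser, it can be worth checking for the API before using it. The following is a minimal sketch, not part of any library: `detectSpeechSupport` is a hypothetical helper name. It assumes recognition may be exposed either as an unprefixed `SpeechRecognition` or as Chrome’s `webkitSpeechRecognition`, and that synthesis is available when `speechSynthesis` and `SpeechSynthesisUtterance` both exist.

```javascript
// Hypothetical helper: report which parts of the Web Speech API a global
// object exposes. Pass `window` in a browser; taking it as a parameter lets
// the function run outside a browser too.
function detectSpeechSupport(scope) {
  // Check the unprefixed name first, then Chrome's "webkit"-prefixed one.
  const Recognition = scope.SpeechRecognition || scope.webkitSpeechRecognition || null;
  return {
    recognition: typeof Recognition === "function",
    synthesis:
      typeof scope.speechSynthesis === "object" &&
      scope.speechSynthesis !== null &&
      typeof scope.SpeechSynthesisUtterance === "function",
    // The constructor to use with `new`, or null if none was found.
    Recognition: Recognition,
  };
}

// Usage in a browser:
// var support = detectSpeechSupport(window);
// if (support.recognition) {
//   var recognizer = new support.Recognition();
// }
```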
Voice To Text API:
It is also known as the ‘Speech Recognition API’. It captures the user’s voice through the microphone and converts it to text, which requires a speech recognition engine behind the scenes. This feature is currently supported only in the Google Chrome browser, which by default uses Google’s own voice recognition service. Here is a code example that uses it:
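// Chrome exposes the constructor with a "webkit" prefix
var recognizer = new webkitSpeechRecognition();
// Language of the speech to recognize (a BCP 47 tag such as "en" or "en-US")
recognizer.lang = "en";
// Called whenever the recognition service returns results
recognizer.onresult = function(event) {
    if (event.results.length > 0) {
        // The most recent result is the last entry in the list
        var result = event.results[event.results.length - 1];
        // Only use the result once the service marks it as final
        if (result.isFinal) {
            // [0] is the most likely alternative
            console.log(result[0].transcript);
        }
    }
};
// Asks for microphone access (if needed) and starts listening
recognizer.start();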
The snippet above asks the user for permission to access the microphone. It then captures what you say, sends the audio to an external service for recognition, and receives the result in the ‘onresult’ event handler, which prints it to the browser’s console window.
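The handler above only looks at the last entry in `event.results`. When an event carries several results, a slightly more general approach is to walk from `event.resultIndex` (the first result that changed since the previous event) to the end of the list. Here is a minimal sketch; `collectFinalTranscript` is a hypothetical helper name, and it relies only on the `resultIndex`, `results`, `isFinal` and `transcript` fields of the event.

```javascript
// Hypothetical helper: join the text of every final result that changed in
// this SpeechRecognitionEvent. Interim (non-final) results are skipped.
function collectFinalTranscript(event) {
  const parts = [];
  // resultIndex is the first result that changed since the previous event
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      // [0] is the most likely alternative
      parts.push(result[0].transcript);
    }
  }
  // Collapse repeated whitespace, since transcripts may carry leading spaces
  return parts.join(" ").replace(/\s+/g, " ").trim();
}

// In the browser, it can replace the body of the handler above:
// recognizer.onresult = function(event) {
//   var text = collectFinalTranscript(event);
//   if (text) console.log(text);
// };
```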
The recognition interface also exposes an optional ‘serviceURI’ property, which you can use to specify the URL of the recognition service you would like to use.
Text To Voice API:
Text-to-voice conversion (speech synthesis) is a simple way to have the browser read a given text aloud in a synthesized voice. Here is a simple code snippet for it:
// An utterance holds the text to speak and its settings
var su = new SpeechSynthesisUtterance();
su.lang = "en";
su.text = "Hello World";
// Queue the utterance for playback
speechSynthesis.speak(su);
As you can see, it’s pretty straightforward: set the desired language and text on the utterance, then pass it to speechSynthesis.speak() to play it.
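Setting `lang` leaves the choice of voice to the browser. If you want a specific installed voice, `speechSynthesis.getVoices()` returns the available `SpeechSynthesisVoice` objects, and an utterance’s `voice` property accepts one of them. Here is a minimal sketch of choosing one by language tag; `pickVoice` is a hypothetical helper, and its fallback rules (exact tag first, then the same primary language, preferring the voice marked as default) are assumptions of this sketch rather than anything the API prescribes.

```javascript
// Hypothetical helper: choose a voice for a BCP 47 language tag such as "en-GB".
// `voices` is expected to be the array from speechSynthesis.getVoices(); only
// each voice's `lang` and `default` properties are read.
function pickVoice(voices, lang) {
  // Compare tags case-insensitively, treating "_" like "-" in case a
  // platform reports tags such as "en_GB".
  const norm = (tag) => tag.toLowerCase().replace(/_/g, "-");
  const wanted = norm(lang);
  const primary = wanted.split("-")[0];

  // 1. Voices whose tag matches exactly.
  const exact = voices.filter((v) => norm(v.lang) === wanted);
  // 2. Otherwise, any voice for the same primary language ("en" matches "en-US").
  const candidates = exact.length > 0
    ? exact
    : voices.filter((v) => norm(v.lang).split("-")[0] === primary);

  // Prefer the platform's default voice among the candidates, if one is marked.
  return candidates.find((v) => v.default) || candidates[0] || null;
}

// Usage in a browser:
// var su = new SpeechSynthesisUtterance();
// su.lang = "en-GB";
// su.text = "Hello World";
// var voice = pickVoice(speechSynthesis.getVoices(), su.lang);
// if (voice) su.voice = voice;
// speechSynthesis.speak(su);
```

In Chrome, `getVoices()` may return an empty list until the `voiceschanged` event has fired, so it can be safer to call `pickVoice` from that event’s handler.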
The Wrapper Library And Usage Example:
In my experience, some of these steps are repetitive boilerplate that a thin wrapper can hide. So, to make your life easier, I started a small JavaScript library that simplifies using this API. Here is the GitHub link:
https://github.com/ranacseruet/webspeech
It’s also registered as a Bower package! So, if you use Bower for front-end package management, you can install it with:
$ bower install webspeech
And you should be just fine!
Here is a very simple usage example:
<input id="text">
<button onclick="talk()">Talk It!</button>
<button onclick="listen()">Voice</button>
<!-- Paths are relative to the example page; adjust them to wherever platform.js and webspeech.js live in your project -->
<script src="../bower_components/platform/platform.js"></script>
<script src="../src/webspeech.js"></script>
<script>
    var speaker = new webspeech.Speaker();
    var listener = new webspeech.Listener();
    // Read the text box content aloud in English
    function talk() {
        speaker.speak("en", document.getElementById("text").value);
    }
    // Capture speech and put the recognized text into the text box
    function listen() {
        listener.listen("en", function(text) {
            document.getElementById("text").value = text;
        });
    }
</script>
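To illustrate the kind of boilerplate a wrapper like this hides, here is a hypothetical sketch. It is not the webspeech library’s source: the class names `SimpleSpeaker` and `SimpleListener` are made up, and only the method signatures mirror the usage example above. The browser objects are passed into the constructors so the wrapper can be exercised without a browser.

```javascript
// Hypothetical wrapper sketch; NOT the webspeech library's implementation.
class SimpleSpeaker {
  // synth: e.g. window.speechSynthesis; UtteranceCtor: e.g. SpeechSynthesisUtterance
  constructor(synth, UtteranceCtor) {
    this.synth = synth;
    this.UtteranceCtor = UtteranceCtor;
  }

  // Speak `text` in language `lang` with a single call.
  speak(lang, text) {
    const utterance = new this.UtteranceCtor();
    utterance.lang = lang;
    utterance.text = text;
    this.synth.speak(utterance);
    return utterance;
  }
}

class SimpleListener {
  // RecognitionCtor: e.g. window.webkitSpeechRecognition
  constructor(RecognitionCtor) {
    this.RecognitionCtor = RecognitionCtor;
  }

  // Start one recognition session and pass the final transcript to onText.
  listen(lang, onText) {
    const recognizer = new this.RecognitionCtor();
    recognizer.lang = lang;
    recognizer.onresult = function (event) {
      if (event.results.length === 0) return;
      const result = event.results[event.results.length - 1];
      if (result.isFinal) onText(result[0].transcript);
    };
    recognizer.start();
    return recognizer;
  }
}
```

In Chrome you would construct them with `new SimpleSpeaker(window.speechSynthesis, window.SpeechSynthesisUtterance)` and `new SimpleListener(window.webkitSpeechRecognition)`.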
Final Words:
Chrome for Windows has an additional issue: it doesn’t support capturing voice continuously, so you have to grant microphone access in the browser every time you want to say something. However, there is a workaround for this annoying permission prompt: host your application on SSL (HTTPS). The browser then remembers your one-time permission for all later requests.
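Where continuous capture isn’t available, a common workaround is to start a new recognition session whenever the previous one ends. Here is a minimal sketch under that assumption; `keepListening` is a hypothetical helper name. It sets the standard `continuous` flag and also restarts on `onend`, so it still gives back-to-back sessions if the browser ignores the flag. On a page served without SSL, each restart may bring back the permission prompt described above, so this pairs best with an HTTPS-hosted app.

```javascript
// Hypothetical helper: keep a (webkit)SpeechRecognition instance listening by
// restarting it whenever a session ends. Returns a function that stops for good.
function keepListening(recognizer, onText) {
  let stopped = false;
  // Ask for continuous capture; the onend restart covers browsers that ignore it.
  recognizer.continuous = true;
  recognizer.onresult = function (event) {
    // Deliver every final result that changed in this event.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      if (event.results[i].isFinal) onText(event.results[i][0].transcript);
    }
  };
  recognizer.onend = function () {
    // Restart unless the caller asked us to stop.
    if (!stopped) recognizer.start();
  };
  recognizer.start();
  return function stop() {
    stopped = true;
    recognizer.stop();
  };
}
```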
I hope this simple tutorial helps you get started with the HTML5 Web Speech API without much difficulty. If you run into any issues, please let me know in the comments. Happy coding 🙂
tung says
Useful. Thank you!
Girish says
Hi
Where can I download the ../bower_components/platform/platform.js file?
Md Ali Ahsan Rana says
Please refer to the ‘install’ section of the documentation (https://github.com/ranacseruet/webspeech). Hope it helps.
Hi, it worked, thank you! But how can I change the language? I want to use Turkish. Is this possible? Can you help me with how to do it?
With best regards.
Amazing! Do you know if it’s OK to use this Voice-To-Text API for commercial use?
I don’t know much about that. But if it’s a web app, and browsers are starting to support it, I don’t see much of a problem.
Text-to-voice works, but voice-to-text doesn’t. Why?
Excuse me, but where is the ../bower_components/platform/platform.js file? And why doesn’t voice-to-text work?
Hey Tomas,
If you replace bower_components with node_modules, it’ll work.
How can you implement this in Visual Studio? I am trying to create an ASP.NET MVC site that allows speech-to-text and then translation of that text. I need the speech recognition to understand Arabic and translate it to English.
Text-to-voice conversion works fine for me, but voice-to-text isn’t working. Please help me. Do I need to upload the files to a web server?
Hello Sir,
I need to use the Web Speech API speech-to-text converter in my MATLAB script. How can I copy the converted voice data (I mean the text) to a separate text file?
Please advise.
Thanks in advance.
Is it not working for Telugu script? Why? Could you please explain?
Dear Mr. Md Ali Ahsan Rana,
About your article: I think it’s great. I’m a student, and I really like this area of computer science. I wonder if you can help me with an issue?
In the example from the article above, how can I store the “text” value in a variable?
I would like to use this variable later, for another purpose.
Your line of code:
speaker.speak("en", document.getElementById("text").value);
just gets the value of the element with id="text" and passes it to speaker.speak. However, once this line has run, I no longer have access to this value.
I would like to be able to store the “text” in a variable and display it on the screen to verify that it was actually stored. Can you help?
Thank you,
Tcharles
I am using four text boxes in my HTML code, and I want the HTML5 Web Speech API to read all four text box values, but it reads only the last text box’s value. Can you help me? What is going wrong?