Building speech-driven apps for Windows Phone is quite easy. There are three key concepts you need to become familiar with: voice commands, text to speech, and voice recognition.
Voice commands are, as the name implies, directives that the OS can understand and act on when spoken by the user. The OS activates your application and passes along any additional information as a parameter so your app can respond accordingly. Since voice commands exist at the OS level, which means they are handled while your app is not active, you have to register them the first time your application runs.
The next concept to understand is text to speech. Again, as the name suggests, this functionality is about converting words into the corresponding spoken sounds. You do this by leveraging the Windows Phone speech API, which in turn uses the voices installed on your phone. By default you have two voices, male and female. Besides gender, these voices have a specific pronunciation according to the culture you chose when you first configured your phone.
The final concept is voice recognition. From an engineering perspective, this is the most difficult feature to deliver. Just consider the complexity of understanding speech from a vast population in which no two people make exactly the same sound; each person literally has a unique voice.
Fortunately, the speech recognition API hides all this complexity from you. Beneath the covers, the API leverages Bing for free-text dictation scenarios. You can also narrow the scope of the recognition by using custom grammars. By doing so, you can implement a solution where your app, instead of identifying fixed words or phrases, identifies the semantic content of a phrase that can be spoken in several ways. Since custom grammars significantly reduce the scope of the patterns that need to be recognized, the API can perform this kind of recognition with great accuracy without relying on the Bing backend.
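To make the idea of "semantic content" a bit more concrete, here is a minimal sketch in Python. It is not the Windows Phone speech API and it works on text rather than audio; it only illustrates how a small grammar can map several spoken variants onto the same meaning. The names `OPERATORS` and `interpret`, and the particular phrases chosen, are assumptions made up for this illustration.

```python
import re

# Hypothetical mini "grammar": several spoken variants map to one operator.
OPERATORS = {
    "plus": "+", "added to": "+",
    "minus": "-", "take away": "-",
    "times": "*", "multiplied by": "*",
    "divided by": "/", "over": "/",
}

# Longest phrases first so multi-word variants are tried before shorter ones.
_OP_PATTERN = "|".join(
    re.escape(phrase) for phrase in sorted(OPERATORS, key=len, reverse=True)
)

# An optional lead-in, then: number, operator phrase, number.
_PHRASE = re.compile(
    rf"^(?:what is |calculate |compute )?(\d+) ({_OP_PATTERN}) (\d+)$"
)


def interpret(phrase):
    """Return (left, operator, right) for a phrase in the grammar, else None."""
    match = _PHRASE.match(phrase.strip().lower())
    if match is None:
        return None  # the phrase falls outside the grammar
    left, spoken_op, right = match.groups()
    return int(left), OPERATORS[spoken_op], int(right)
```

With this sketch, `interpret("What is 2 plus 3")` and `interpret("calculate 2 added to 3")` both return `(2, "+", 3)`: two different wordings, one meaning. Anything outside the grammar returns `None`, which is the sense in which a grammar narrows what has to be recognized.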
HAL the Calculator
To help you understand how these concepts apply in an actual Windows Phone app, I am going to describe how you can create a simple voice-driven calculator.
The app is inspired by HAL 9000, a very determined computer with an agenda, from the classic movie 2001: A Space Odyssey. I remember the first time I saw the movie, and how cool and natural the interaction between humans and the computer was, just by using voice.
What’s even cooler is that we can now easily build apps that mimic such interaction! Check out a quick demo of the application in the video below.
In my next post, I will start discussing the implementation of the app, beginning with voice commands. Stay tuned!