Friday, May 3, 2013

Speech-to-Text APIs

Hi All,
Most people prefer not to write. Instead, they look for alternatives that save time, and they write only when nothing else is feasible. This paved the way for many technological advances, such as the Xerox machine, the scanner, and the printer. Similarly, on mobile devices, people prefer not to type because of the small keyboard and other constraints.

One of my friends recently bought a big-screen smartphone. Because of its size, it is uncomfortable to carry in a pocket. I asked him why he didn't buy a normal-size phone. He replied that even on this big screen he had difficulty typing on the keypad: most of the time he touched two letters instead of one.

Similarly, when you are driving or doing some other work, it is difficult to pick up the phone and type. This is where speech-to-text conversion plays an important role. I have used a couple of speech-to-text APIs in my iOS apps. Here are some frameworks that might be useful for you too.

1. OpenEars
This is an open-source framework for iOS. We can download the framework directly and start developing apps. It supports both speech-to-text and text-to-speech conversion, and it works completely offline. The only downside is that, because it works offline, you have to define a dictionary of the words you want it to recognize.

This library is very useful when we want to pick out only certain words from speech. For example, I used it for navigating between views and filling in data in an application. When the user says “Go to Home”, the app navigates to the home page. I can also fill in form data (refer to the screenshot of the app) with commands such as “set current age 25” and “set gender male”, and the app fills in the data accordingly.
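Once the recognizer has produced a phrase, the app still has to turn it into an action. The sketch below is not OpenEars code; it assumes the recognizer simply hands over a plain string, and the `VoiceCommand` enum and `parseCommand` function are hypothetical names made up for illustration. It maps “go to …” phrases to navigation and “set … <value>” phrases to a form field plus a value.

```swift
// Hypothetical command model for a voice-driven app like the one described.
// Assumption: the speech recognizer (OpenEars in the post) returns a plain
// string such as "SET CURRENT AGE 25". Nothing here is OpenEars API.
enum VoiceCommand: Equatable {
    case navigate(to: String)                   // e.g. "GO TO HOME"
    case setField(name: String, value: String)  // e.g. "SET GENDER MALE"
}

/// Turns one recognized utterance into a command, or nil if it matches no pattern.
func parseCommand(_ utterance: String) -> VoiceCommand? {
    // Normalize case and split on spaces (empty tokens are dropped).
    let words = utterance.uppercased()
        .split(whereSeparator: { $0 == " " })
        .map(String.init)
    guard words.count >= 2 else { return nil }

    switch words[0] {
    case "GO" where words.count >= 3 && words[1] == "TO":
        // Everything after "GO TO" names the destination view.
        return .navigate(to: words[2...].joined(separator: " "))
    case "SET" where words.count >= 3:
        // The last word is the value; the words in between name the field.
        let name = words[1..<(words.count - 1)].joined(separator: " ")
        return .setField(name: name, value: words[words.count - 1])
    default:
        return nil
    }
}
```

For example, `parseCommand("set current age 25")` yields `.setField(name: "CURRENT AGE", value: "25")`. Keeping the parsing separate from the recognizer means the same function works whichever framework supplies the text.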



In this app, I created the dictionary with only the words I want the app to recognize. To convert arbitrary speech directly into text, we would have to build a dictionary containing all the words, so that it can recognize the speech and convert it into text. Word databases are already available on the internet; we can use one of those directly or use our own set of words.
http://www.politepix.com/openears/
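For a command-style app, the dictionary is just the set of distinct words that appear in the phrases you want recognized. The following is a minimal sketch under that assumption; `vocabulary(for:)` and `isCovered(_:by:)` are hypothetical helpers, and how the resulting word list is handed to OpenEars is library-specific and not shown.

```swift
// Hypothetical helper: collects the distinct words an offline, dictionary-based
// recognizer would need for a fixed set of phrases, in first-seen order.
// Passing this list to the recognizer itself is library-specific (not shown).
func vocabulary(for phrases: [String]) -> [String] {
    var seen = Set<String>()
    var words: [String] = []
    for phrase in phrases {
        for token in phrase.uppercased().split(whereSeparator: { $0 == " " }) {
            let word = String(token)
            // insert(_:) reports whether the word was new, so duplicates are skipped.
            if seen.insert(word).inserted {
                words.append(word)
            }
        }
    }
    return words
}

/// True if every word of the utterance is in the word list, i.e. a
/// dictionary-based recognizer could in principle have produced it.
func isCovered(_ utterance: String, by words: [String]) -> Bool {
    let known = Set(words)
    return utterance.uppercased()
        .split(whereSeparator: { $0 == " " })
        .allSatisfy { known.contains(String($0)) }
}
```

With the phrases “Go to Home”, “go to settings”, and “set gender male”, the word list is `GO, TO, HOME, SETTINGS, SET, GENDER, MALE`, so “go to work” would not be covered. This is why a small command vocabulary is easy, while free-form dictation needs a full word database.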

2. SpeechKit
This is another framework, which converts speech to text using a server. Implementing it is much easier than OpenEars. We don't have to define a dictionary of words or anything else; we just open the view that comes with the framework (it provides the UI as well). It recognizes the speech and sends us the text in a callback function. The only drawback is that it needs internet connectivity.

 

http://dragonmobile.nuancemobiledeveloper.com/public/Help/DragonMobileSDKReference_iOS/SpeechKit_Guide/Basics.html
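SpeechKit's own classes and delegate methods are documented at the link above and are not reproduced here. As a rough, hypothetical sketch of the server-plus-callback pattern, the code below defines a made-up `RemoteRecognizer` protocol whose `recognize(audio:completion:)` method reports either recognized text or a failure, and a `StubRecognizer` that stands in for the server, including the case where there is no connectivity. None of these names come from SpeechKit, and the `[Int16]` audio samples are an assumption.

```swift
// Hypothetical shape of a server-backed recognizer: the app submits audio and
// the text arrives later through a callback. This is NOT SpeechKit's API.
enum RecognitionResult {
    case text(String)
    case failure(String)
}

protocol RemoteRecognizer {
    // audio: assumed raw PCM samples; completion may be called later, hence @escaping.
    func recognize(audio: [Int16], completion: @escaping (RecognitionResult) -> Void)
}

// Stub standing in for the network round trip, so the callback flow can be exercised.
struct StubRecognizer: RemoteRecognizer {
    var isOnline: Bool
    var cannedText: String

    func recognize(audio: [Int16], completion: @escaping (RecognitionResult) -> Void) {
        // Without connectivity a server-based recognizer cannot return anything.
        guard isOnline else {
            completion(.failure("no network connection"))
            return
        }
        completion(.text(cannedText))
    }
}
```

In a real app the completion would typically be invoked from a network callback rather than immediately, and any UIKit updates made from it must run on the main thread.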
There are also many other frameworks available for converting speech to text. We have to choose the one that fulfills our needs.

Bye till next time,

- Jeyabalaji

4 comments:

  1. Super, dude! Can't Siri be reused here? Don't they provide any frameworks?

  2. No, dude. We can't use Siri. Apple hasn't provided any APIs. Hopefully they will provide APIs in the future.

  3. Great, buddy! Recently I came to know that Google is integrating its search with Siri to build the next-generation search engine.

  4. "Same way, when you are driving or doing some other work" - Don't use mobile phones while driving :)
