After a long time, one more occasion to use my blog. Yes, we (Sudeep, Parashu and I) won the "Best in Show" award at Yahoo's Openhack India 2010. Yes!! Once again!! Like previous years: 24 hours of coding, beer, food, tea, Red Bull, etc. One good change this time was the increase in the number of hackers. Yes, there were 430 hackers flooding the Taj Residency.
What we did this time?
Automatic, real-time closed captioning/translation for Flickr videos.
We captured the audio stream going out to the speakers and fed it back in as microphone input. We used the Microsoft Speech API and Julius to convert the speech to text. A GreaseMonkey script kept the video in sync with the transcription server (our local box) and displayed the transcribed text on the video. Before displaying the text, we translated it based on the user's choice of language (we used Google's Translate API for this).
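The sync step can be pictured as a lookup: the transcription server keeps timestamped text segments, and the browser script asks for whichever segment covers the current playback time. Below is a minimal Python sketch of that idea, assuming segments arrive as start/end times in seconds plus text. The `Segment` and `CaptionStore` names and the optional `translate` callable are hypothetical stand-ins; the actual hack used a GreaseMonkey script and Google's Translate API, neither of which is shown here.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float
    text: str


class CaptionStore:
    """Hypothetical server-side store of recognized speech segments."""

    def __init__(self) -> None:
        self._segments: List[Segment] = []
        self._starts: List[float] = []

    def add(self, seg: Segment) -> None:
        # Keep segments sorted by start time so lookups can use binary search.
        i = bisect_right(self._starts, seg.start)
        self._starts.insert(i, seg.start)
        self._segments.insert(i, seg)

    def caption_at(self, t: float,
                   translate: Optional[Callable[[str], str]] = None) -> str:
        # Find the last segment that starts at or before time t.
        i = bisect_right(self._starts, t) - 1
        if i < 0 or t >= self._segments[i].end:
            return ""  # nothing is being spoken at this moment
        text = self._segments[i].text
        # Optional translation hook, applied just before display.
        return translate(text) if translate else text
```

For example, after adding `Segment(0.0, 2.5, "hello flickr")`, a call to `caption_at(1.0)` returns `"hello flickr"`, while `caption_at(2.7)` returns an empty string because that moment falls in a gap between segments. Passing any string-to-string function as `translate` stands in for a real translation service.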
The speech recognition frameworks we tried were Sphinx 4, Windows SAPI and Julius. None of them is 100% accurate, but they are definitely better than watching videos without any captions. I have read that Nuance Dragon does really well in this space, but it's very expensive.
Extension and usefulness
There is a practically unlimited number of videos on the internet, and we can't caption all of them manually. This hack can caption them automatically. The captions might not be accurate, but we can store them as an SRT file (a widely used subtitle format) and provide a simple UI for users to edit or correct any auto-generated caption they think is wrong. The corrections then become training data, so the speech recognition system can improve itself. Over a short period of time, by using the internet crowd, we could get a good speech recognition engine.
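The edit/correct loop relies on captions surviving a round trip through an SRT file: write out what the recognizer produced, let users fix the text, then read the corrected file back. Here is a minimal Python sketch of that round trip, assuming each caption is a `(start_seconds, end_seconds, text)` tuple. The function names are hypothetical, and only the file handling is shown; feeding corrections back into a recognizer is not.

```python
import re
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def _fmt(t: float) -> str:
    # SRT timestamps look like HH:MM:SS,mmm
    ms = int(round(t * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def _parse_ts(ts: str) -> float:
    h, m, rest = ts.split(":")
    s, ms = rest.split(",")
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def to_srt(cues: List[Cue]) -> str:
    # Each block: sequence number, timing line, caption text, blank line.
    blocks = []
    for n, (start, end, text) in enumerate(cues, start=1):
        blocks.append(f"{n}\n{_fmt(start)} --> {_fmt(end)}\n{text}\n")
    return "\n".join(blocks)


def from_srt(data: str) -> List[Cue]:
    # Blocks are separated by blank lines; the sequence number is ignored.
    cues = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.splitlines()
        start, end = (p.strip() for p in lines[1].split("-->"))
        cues.append((_parse_ts(start), _parse_ts(end), "\n".join(lines[2:])))
    return cues
```

A correction UI could load a file with `from_srt`, replace the text of a misrecognized cue while keeping its timing, and save it again with `to_srt`. The original and corrected text for the same time range is the kind of pair that could later be used as training data.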
What did we get as the award?
1. Chris Heilmann's compliment!!!
2. Certificate signed by David Filo!!!
3. Xbox 360 Elite and three 8 GB iPod nanos (4th generation) :) :)