SpeechLab develops system for automatic transcription of news broadcasts

Speech recognition system for automatic transcription of Czech broadcasts, photo: http://itakura.kes.vslib.cz

In a previous edition of Czech Science we reported on a talking computer image developed by the SpeechLab at the Technical University in Liberec. As promised, we now bring you more news from the SpeechLab. The team recently developed a computer programme that can automatically transcribe news programmes on TV or radio.

As the main evening TV news programme begins, a special computer can start writing down what it hears - or, more precisely, converting the sounds into text. As we know, TV news is not one continuous flow of speech: it includes jingles, background noises and many different speakers. The transcription programme has to tackle all that. Professor Jan Nouza is the head of the research team at the SpeechLab in Liberec.

"First it must recognise individual speakers. If the speakers are known to the system, the system can recognise and identify their names. And in this way the whole news is split into shorter parts according to the individual speakers or contributors in the news. And also there is another problem - to remove the parts which are not speech, for example the introductory and final jingles. Fortunately in Czech TV news, if I am right, there are no commercials at the moment."

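The splitting step Professor Nouza describes can be pictured with a small Python sketch. It is only an illustration, not the SpeechLab's code: it assumes an earlier stage has already labelled each stretch of audio with a speaker name or a non-speech tag, and the function name `speaker_turns`, the labels and the timings are all made up for the example.

```python
from itertools import groupby

# Hypothetical labels for audio that is not speech.
NON_SPEECH = {"jingle", "noise"}

def speaker_turns(labelled_segments):
    """Merge consecutive segments with the same label and drop non-speech.

    labelled_segments: (start_sec, end_sec, label) tuples in time order.
    Returns a list of (start_sec, end_sec, speaker) turns.
    """
    turns = []
    for label, group in groupby(labelled_segments, key=lambda seg: seg[2]):
        group = list(group)
        if label in NON_SPEECH:
            continue  # skip jingles and other non-speech parts
        # A turn runs from the start of its first segment to the end of its last.
        turns.append((group[0][0], group[-1][1], label))
    return turns

# Invented example: a short bulletin framed by opening and closing jingles.
segments = [
    (0, 12, "jingle"),
    (12, 30, "anchor"),
    (30, 45, "anchor"),
    (45, 80, "reporter"),
    (80, 95, "anchor"),
    (95, 100, "jingle"),
]
turns = speaker_turns(segments)
print(turns)
```

Each resulting turn is short enough to be passed to the recogniser on its own, and it already carries the name of the speaker.
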
This Speech Recognition System for Automatic Transcription of Czech Broadcasts has a very large vocabulary. That's because Czech is an inflected language: every noun, adjective and verb has a number of forms depending on its role in a sentence. The current version of the computer programme can recognise around 200,000 words. To transcribe a ten-minute news programme, it needs around forty minutes. If, like me, you're still thinking of a big machine connected to a TV set by cables, you're wrong.
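
Those two figures amount to what speech engineers call a real-time factor: processing time divided by the length of the audio. The short Python calculation below works it out from the numbers quoted above. The 30-minute estimate assumes that processing time grows in proportion to programme length, which is our simplification rather than a figure from the SpeechLab.

```python
def real_time_factor(processing_minutes, audio_minutes):
    """Return how many minutes of processing one minute of audio needs."""
    return processing_minutes / audio_minutes

# Figures from the article: about forty minutes for a ten-minute programme.
rtf = real_time_factor(40, 10)

# Assumption: processing time scales linearly with programme length.
estimate_for_30_minutes = rtf * 30

print(rtf)                      # 4.0
print(estimate_for_30_minutes)  # 120.0
```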

"Now it's not a big problem to buy a TV card which allows your computer to input directly the TV signal and display it on your monitor. The main problem is that the TV or radio stream is quite a long signal. For example, TV news takes 20-30 minutes. So the first task is to split it into shorter parts, like articles or speaker turns."

I also asked Professor Nouza from the SpeechLab in Liberec who might be interested in having a transcribed version of TV news...

"This application is not meant for individual people but for institutions that are very interested in the content of spoken programmes, like TV news or talk shows, and so on. There are several companies in our country that monitor not only all the newspaper texts published in the Czech Republic but also the spoken programmes on TV and radio stations. And these companies then can do monitoring on what was said in different programmes, which topic, which persons were mentioned and so on."

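Once a broadcast exists as text, the kind of monitoring Professor Nouza mentions comes down to searching it. The Python sketch below shows the idea on made-up transcript lines; the function name `find_mentions` and the sample data are hypothetical. It uses plain substring matching, which would miss many inflected Czech word forms, so a real monitoring service would need something more careful.

```python
def find_mentions(turns, keywords):
    """Return (keyword, speaker, text) for every turn that mentions a keyword.

    turns: list of (speaker, text) pairs from a transcribed programme.
    keywords: names or topics to look for, matched case-insensitively
    as plain substrings.
    """
    hits = []
    for speaker, text in turns:
        lowered = text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                hits.append((keyword, speaker, text))
    return hits

# Invented transcript lines, in English for readability.
transcript = [
    ("anchor", "The government discussed the new budget today."),
    ("reporter", "The finance minister defended the budget in parliament."),
    ("anchor", "And now the weather."),
]
mentions = find_mentions(transcript, ["budget", "minister"])
for keyword, speaker, text in mentions:
    print(keyword, "-", speaker, "-", text)
```
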
For more information go to http://itakura.kes.vslib.cz/kes/systele.html