
Developments in Speech Systems



Forty years ago, it was only in science fiction that humans could talk to machines and the machines could talk back. Since then, electronic processing speeds have increased dramatically and better algorithms have been developed for handling speech output and input.


Speech Output

With the exception of a few languages such as Finnish, written text cannot be converted perfectly to speech by applying a few rules of pronunciation. To begin with, any system has to identify the syllable boundaries, which is not a trivial computational task. The problem is compounded in English because some words are pronounced differently depending on their meaning (e.g. "lead" the metal and "lead" meaning to guide).

Speech synthesis can be done entirely by applying a long and complex set of rules. This can produce recognisable speech, but it often sounds far from natural. One way of alleviating this problem is to store common words or phrases as digitised speech, which sounds more natural for many applications. However, most people would not choose to listen to poetry even from the better synthetic speech systems.
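
The combination of stored phrases and rule-based synthesis can be sketched roughly as follows. This is only an illustration: the phrase inventory, the function name plan_speech and the labels "recorded" and "rules" are assumptions made for the example, and no actual audio is produced.

```python
# Hypothetical inventory of phrases held as digitised (pre-recorded) speech.
STORED_PHRASES = {
    "please insert your card",
    "enter your pin",
    "thank you",
}


def plan_speech(message_parts):
    """Label each part of a message with how it would be voiced.

    Parts found in the stored inventory would be played back as
    recordings; anything else falls back to rule-based synthesis.
    """
    plan = []
    for part in message_parts:
        key = part.strip().lower()
        source = "recorded" if key in STORED_PHRASES else "rules"
        plan.append((part, source))
    return plan


plan = plan_speech(["Please insert your card", "Your balance is 42 pounds"])
for part, source in plan:
    print(f"{source:>8}: {part}")
```

In this sketch the fixed prompt is served from the stored recordings, while the variable part of the message, which cannot be recorded in advance, is left to the rules.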

One problem with synthetic speech is that it has less redundancy than natural speech, which makes it harder to follow for a listener who has a hearing impairment.

On a public terminal, such as a cash dispenser or ticket selling machine, speech output could be provided for all users or just for those who request it. One possibility is for the user's card to include coding that indicates that they would like speech output; this speech could be delivered by loudspeaker, through a headphone, or by short-range radio link to a mobile phone handset. There is a European standard for how to code this information on a card, and the UK government has included it in the list of recommended standards for new government (both central and local) cards.
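
As a rough illustration of how a terminal might act on such a preference, the sketch below chooses an output channel from a user's stored settings. The field names, channel names and fallback order are all hypothetical; they are not taken from the European standard, which defines its own coding.

```python
from dataclasses import dataclass


@dataclass
class CardPreferences:
    # Hypothetical fields; the real card coding is defined by the standard.
    wants_speech: bool = False
    preferred_channel: str = "loudspeaker"


def select_output(prefs, terminal_channels):
    """Return the channel to use for speech output, or None for no speech."""
    if not prefs.wants_speech:
        return None
    if prefs.preferred_channel in terminal_channels:
        return prefs.preferred_channel
    # Assumed fallback order: private channels first, loudspeaker last.
    for channel in ("headphone", "radio_link", "loudspeaker"):
        if channel in terminal_channels:
            return channel
    return None


terminal = {"loudspeaker", "headphone"}
print(select_output(CardPreferences(True, "radio_link"), terminal))
```

Here the card asks for a radio link that the terminal does not offer, so the sketch falls back to the headphone socket rather than announcing the transaction aloud.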


Speech Input

Speech recognition is a less mature technology than speech synthesis. Recognising full-vocabulary speech is difficult, since a computer cannot easily differentiate "grade A" from "grey day" unless it has knowledge of the context. Even whole phrases can be confused; for instance, "recognise speech?" might be interpreted as "wreck a nice beach?".
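
The role of context can be shown with a deliberately simple sketch: given two readings that sound alike, choose the one whose associated words overlap most with the surrounding text. The cue lists and the function name pick_reading are invented for the example; real recognisers use statistical language models rather than hand-written word lists.

```python
# Hypothetical context cues: words that make each reading more plausible.
CONTEXT_CUES = {
    "grade A": {"exam", "student", "marks", "eggs", "quality"},
    "grey day": {"weather", "rain", "cloud", "sky", "forecast"},
}


def pick_reading(candidates, context_words):
    """Return the candidate sharing the most cue words with the context.

    With no overlap at all, the first candidate is returned, which shows
    why a recogniser without context cannot decide reliably.
    """
    context = {word.lower() for word in context_words}
    return max(
        candidates,
        key=lambda reading: len(CONTEXT_CUES.get(reading, set()) & context),
    )


heard = ["grade A", "grey day"]
print(pick_reading(heard, "the weather forecast said rain".split()))
print(pick_reading(heard, "the student passed the exam".split()))
```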

Training a system with an individual speaker can improve accuracy. However, a more significant improvement can be obtained by limiting the vocabulary and introducing pauses between words. Such an arrangement can be used for giving commands to a device; in this case the command words can be chosen to sound different from one another, and the system is often tolerant of different accents.
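
The benefit of a small vocabulary can be sketched as follows: whatever the recogniser hears is mapped to the nearest of a handful of allowed commands, or rejected if nothing is close. The command list is hypothetical, and spelling similarity (Python's standard difflib module) is used here only as a crude stand-in for acoustic similarity.

```python
from difflib import get_close_matches

# Hypothetical command set, chosen so that the words differ clearly.
COMMANDS = ["balance", "withdraw", "cancel", "receipt"]


def match_command(heard, cutoff=0.6):
    """Map a recognised word to the closest allowed command, or None.

    The cutoff is the minimum similarity (0 to 1) needed to accept a match.
    """
    matches = get_close_matches(heard.lower(), COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(match_command("withdrew"))
print(match_command("xyz"))
```

Because only four outcomes are possible, a slightly misheard word can still be resolved to the intended command, while input that resembles none of them is rejected rather than guessed.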

The accuracy of all speech recognition systems deteriorates significantly with any increase in background noise. The use of throat microphones can help, but frequently this is not a practical solution.

For public terminals, speech recognition can be incorporated in the terminal or network, or in a personal device (such as a PDA or mobile phone handset) which is linked to the terminal (possibly via a short-range wireless link).


Implementation

The business case for incorporating speech systems into a device or network will depend on the application. In general, the cost of incorporating extra features is considerably lower if they are included at the design stage of a new product rather than retrofitted to an existing one.

However, businesses can be influenced by factors other than a perceived short-term commercial advantage; these can include regulation, procurement policies and legislation.


Dr John Gill

 



John Gill Technology Limited Footer