Scientists Teach Computers to Read Our Lips
We've come a long way since we used punched cards to interact with computers. However, the holy grail of human-computer interaction (HCI) still lies far in the future: interacting with machines the way we naturally express ourselves, by combining speech, gestures and facial expressions.
In the HCI domain, audio-based speech recognition has progressed significantly, giving us applications such as Apple's Siri and Google Now, as well as real-time speech-to-text conversion. Automated speech recognition (ASR) could be improved further with automated lip-reading, in much the same way that humans rely on visual cues to understand what people say. But visual speech recognition systems still perform rather poorly. Now, two scientists have presented a new method for visual speech recognition.
New method for automated lip-reading
Dr Helen L. Bear and Prof Richard Harvey of the University of East Anglia in the UK hope to improve the performance of visual ASR with a new method for automated lip-reading. Their findings were published in the Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2016.
Bear, who has been studying visual speech recognition for years, pointed out a number of other use cases for the technology, such as medical applications for people with hearing impairments and communication in noisy environments. Another possible application is determining what is being said in video-only material such as CCTV footage.