Raising Robovoices
“If you just chain together automatic transcription, translation, and speech synthesis, you end up accumulating too many errors.”
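The quote refers to cascaded speech-to-speech pipelines, where errors compound across stages. A minimal sketch of the arithmetic (assuming, for illustration only, that each stage succeeds independently with a fixed probability — a simplification, not a claim about any real system):

```python
# Hedged illustration of error accumulation in a cascaded pipeline
# (ASR -> machine translation -> speech synthesis). If each stage
# independently preserves the input correctly with some probability,
# the end-to-end accuracy is the product of the per-stage accuracies.
def cascade_accuracy(stage_accuracies):
    """End-to-end accuracy of a chain of independent stages."""
    result = 1.0
    for acc in stage_accuracies:
        result *= acc
    return result

# Three stages at 90% accuracy each leave only ~73% end to end,
# which is why chaining "too many" stages accumulates errors.
print(round(cascade_accuracy([0.9, 0.9, 0.9]), 3))
```

Even with strong individual components, the multiplicative loss is why end-to-end approaches are attractive for speech-to-speech translation.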
Speech Recognition is an interdisciplinary subfield of computer science and computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers with the main benefit of searchability. It is also known as automatic speech recognition (ASR), computer speech recognition or speech to text (STT). It incorporates knowledge and research in the computer science, linguistics and computer engineering fields. The reverse process is speech synthesis.
—Wikipedia, “Speech recognition”
(See this article for more on history; models, methods and algorithms; applications; performance; and further information.)
Speech Recognition, also known as automatic speech recognition (ASR), computer speech recognition, or speech-to-text, is a capability which enables a program to process human speech into a written format. While it’s commonly confused with voice recognition, speech recognition focuses on the translation of speech from a verbal format to a text one whereas voice recognition just seeks to identify an individual user’s voice.
—IBM, “What is speech recognition?”
(See this article for more on speech recognition history, key features, algorithms, cases, and related solutions.)
Communications of the ACM, May 2018
By Björn W. Schuller
“Communication with computing machinery has become increasingly ‘chatty’ these days: Alexa, Cortana, Siri, and many more dialogue systems have hit the consumer market on a broader basis than ever, but do any of them truly notice our emotions and react to them like a human conversational partner would? In fact, the discipline of automatically recognizing human emotion and affective states from speech, usually referred to as Speech Emotion Recognition or SER for short, has by now surpassed the “age of majority,” celebrating the 22nd anniversary after the seminal work of Dellaert et al. in 1996—arguably the first research paper on the topic. However, the idea has existed even longer, as the first patent dates back to the late 1970s.”
—Schuller, “Speech Emotion Recognition: Two Decades in a Nutshell, Benchmarks, and Ongoing Trends”