Automatic Speech Recognition (ASR) Systems Explained
Speech recognition is now widely used in electronic devices and systems. A major application is self-service database lookup through an IVRS (Interactive Voice Response System), which lets callers search for information using either touch-tone keypad menu selections or voice commands.
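As a rough illustration of how an IVRS can accept either kind of input, the sketch below routes a keypad digit or a recognized phrase to the same action. The menu options, keywords, and the `route` function are hypothetical and chosen only for this example; they are not taken from any real IVRS product.

```python
# Hypothetical menu for illustration; the options are not from any real IVRS product.
MENU = {
    "1": "account_balance",
    "2": "branch_hours",
    "0": "speak_with_agent",
}

# Spoken keywords mapped onto the same menu keys as the keypad digits.
KEYWORDS = {
    "balance": "1",
    "hours": "2",
    "agent": "0",
}


def route(caller_input: str) -> str:
    """Return the action for a keypad digit or a recognized phrase."""
    text = caller_input.strip().lower()
    if text in MENU:  # keypad digit
        return MENU[text]
    for word in text.split():  # text produced by the recognizer
        if word in KEYWORDS:
            return MENU[KEYWORDS[word]]
    return "repeat_menu"  # nothing matched, so prompt the caller again
```

Calling `route("2")` and `route("what are your hours")` would both lead to the same action, which is the point of offering voice commands alongside the keypad.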
Other applications include:
- Automated directory assistance, which removes the need for operators and lets companies save on salaries.
- Automated company directories, which remove the need for employees to give out extension numbers.
- Encouraging the use of ‘self-service’ options while offering a “speak with agent” option as a paid service.
- Speech-assisted dialing, where the caller simply speaks the name of the person they want to call, for example while driving.
- Automated attendant features.
How It Works
Figure: Block diagram of a simple speech recognition system.
As shown in the figure, the user speaks, and the system captures the speech and decodes it into text using acoustic and language models and, in some systems, speaker characteristics such as gender.
- Speech recognition first detects and captures spoken words.
- It removes noise and converts the words into a digital representation using DSP (digital signal processing) algorithms.
- It breaks very long sound patterns into smaller chunks.
- It then assigns phonemes to the speech based on database records and probabilities.
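The first steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: it cuts a digitized signal into fixed-length chunks (frames) and uses a simple energy threshold as a stand-in for speech detection. Noise removal and phoneme assignment are left out, and the function names, frame length, threshold, and synthetic test signal are all assumptions made for the example.

```python
import math


def split_into_frames(samples, frame_len, hop):
    """Cut a sample sequence into fixed-length frames, one every `hop` samples."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def detect_speech(frames, threshold):
    """Flag frames whose energy exceeds the threshold as likely speech."""
    return [frame_energy(f) > threshold for f in frames]


# Synthetic signal sampled at 8 kHz: 0.1 s of silence, 0.1 s of a 440 Hz tone
# standing in for speech, then 0.1 s of silence.
RATE = 8000
silence = [0.0] * 800
tone = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(800)]
signal = silence + tone + silence

frames = split_into_frames(signal, frame_len=200, hop=200)  # 25 ms frames, no overlap
flags = detect_speech(frames, threshold=0.01)
```

Here `flags` marks only the four middle frames as speech. Real systems typically use overlapping frames and more robust detection than a fixed threshold.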
Automatic speech recognition is essentially the computer-driven conversion of spoken language into readable text, typically in real time.
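To show why both an acoustic model and a language model are involved in decoding, here is a toy sketch that picks between two similar-sounding transcriptions by adding their log-probabilities. All probabilities are made up for illustration, and `decode` and `lm_weight` are hypothetical names; real decoders search over far more hypotheses.

```python
import math

# Made-up acoustic model scores: how well each word sequence matches the audio.
ACOUSTIC = {
    "recognize speech": 0.40,
    "wreck a nice beach": 0.45,
}

# Made-up language model scores: how likely each word sequence is as text.
LANGUAGE = {
    "recognize speech": 0.020,
    "wreck a nice beach": 0.001,
}


def decode(acoustic, language, lm_weight=1.0):
    """Pick the hypothesis with the highest combined log-probability."""
    def score(words):
        return math.log(acoustic[words]) + lm_weight * math.log(language[words])
    return max(acoustic, key=score)


best = decode(ACOUSTIC, LANGUAGE)
```

With these numbers the acoustic score alone slightly favors "wreck a nice beach", but the language model tips the combined score toward "recognize speech". Setting `lm_weight=0.0` ignores the language model and flips the result.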
Benefits of Automatic Speech Recognition Systems
- Lower costs through automation.
- Searchable text capability.
- Accessibility for people who are deaf or hard of hearing.
Protocols
G.711, G.723.1, and G.729 are audio codecs, H.323 and SIP are call signaling protocols, and RTP is the protocol that transports the audio. All of them are used in speech recognition and Voice over IP (VoIP) services.
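As a small example of how these pieces fit together, the sketch below decodes the 12-byte fixed RTP header (RFC 3550) and uses the static payload type numbers from RFC 3551 to tell which codec the audio is in. The `parse_rtp_header` function and the hand-built packet are assumptions for illustration; header extensions, CSRC lists, and padding are ignored.

```python
import struct

# Static RTP payload types from RFC 3551 for the codecs listed above.
PAYLOAD_TYPES = {
    0: "G.711 mu-law (PCMU)",
    4: "G.723.1",
    8: "G.711 A-law (PCMA)",
    18: "G.729",
}


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header defined in RFC 3550."""
    if len(packet) < 12:
        raise ValueError("RTP packet is shorter than the fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    payload_type = second & 0x7F  # low 7 bits of the second byte
    return {
        "version": first >> 6,  # top 2 bits of the first byte
        "marker": bool(second >> 7),
        "payload_type": payload_type,
        "codec": PAYLOAD_TYPES.get(payload_type, "dynamic or other"),
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hand-built example packet: version 2, payload type 0, then 160 bytes of audio
# (20 ms of G.711 at 8 kHz).
example = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234ABCD) + bytes(160)
header = parse_rtp_header(example)
```

For the example packet, `header["codec"]` identifies the payload as G.711 mu-law, which is what a recognizer behind a VoIP gateway would need to know before decoding the audio.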