Speech Recognition

Speech recognition technology converts human speech into electrical signals and transforms these signals into coded patterns with assigned meanings. Rather than translating the spoken word into a dictionary spelling, it produces a computer-recognizable form that usually initiates some action, e.g., generating some text, sending a signal or recording an event.
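As a minimal sketch of the idea that a recognized word maps to an action rather than to a spelling, the Python below uses a hypothetical three-word vocabulary. The words, the action strings and the `make_dispatcher`/`handle` helpers are illustrative assumptions, not part of any particular product.

```python
from typing import Callable


def make_dispatcher(log: list[str]) -> dict[str, Callable[[], None]]:
    """Build a hypothetical vocabulary in which each recognized word triggers an action."""
    return {
        "ready": lambda: log.append("text: READY"),               # generate some text
        "ship": lambda: log.append("signal: conveyor start"),     # send a signal
        "damaged": lambda: log.append("event: damage recorded"),  # record an event
    }


def handle(word: str, actions: dict[str, Callable[[], None]]) -> bool:
    """Run the action bound to a recognized word.

    Returns False when the word is not in the programmed vocabulary."""
    action = actions.get(word.lower())
    if action is None:
        return False
    action()
    return True
```

A word outside the vocabulary simply triggers nothing, which mirrors the fact that the recognizer only responds to its programmed patterns.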

Speech recognition can be used for data collection in two efficient modes of operation: batch and real time. In batch mode, the user's application data may be downloaded from host systems into portable terminals, automatically updated during the shift, then uploaded back to the host at the end of the work shift. For real-time data collection, speech recognition systems are combined with radio frequency (RF) communication to provide mobility and faster interaction with the host application.
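The difference between the two modes can be sketched as follows. `Host`, `BatchTerminal` and `RealTimeTerminal` are hypothetical stand-ins, and the RF link is modeled simply as a direct method call; a real system would add a transport, error handling and conflict resolution.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host application holding item -> quantity records."""
    records: dict[str, int] = field(default_factory=dict)

    def apply(self, item: str, qty: int) -> None:
        self.records[item] = qty


@dataclass
class BatchTerminal:
    """Portable terminal: download at shift start, update locally, upload at shift end."""
    local: dict[str, int] = field(default_factory=dict)
    pending: list[tuple[str, int]] = field(default_factory=list)

    def download(self, host: Host) -> None:
        self.local = dict(host.records)  # work on a local copy during the shift

    def record(self, item: str, qty: int) -> None:
        self.local[item] = qty
        self.pending.append((item, qty))  # queued until upload

    def upload(self, host: Host) -> None:
        for item, qty in self.pending:
            host.apply(item, qty)
        self.pending.clear()


@dataclass
class RealTimeTerminal:
    """RF-linked terminal (link modeled as a direct call): entries reach the host immediately."""
    host: Host

    def record(self, item: str, qty: int) -> None:
        self.host.apply(item, qty)
```

In batch mode the host sees nothing until `upload` runs; in real-time mode each spoken entry updates the host as soon as it is recognized.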

Workers wear a microphone/speaker headset connected to a unit that recognizes words from a programmed vocabulary. The microphone converts each spoken word into an analog electrical signal, which is usually converted into digital values and then decoded by template matching or feature analysis. The unit's output goes to a personal computer or to a stand-alone voice recognition device.
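Template matching is commonly done with dynamic time warping (DTW), which aligns a spoken word with a stored template even when the word is said faster or slower than when the template was recorded. The sketch below assumes each utterance has already been converted into a sequence of numeric feature vectors (feature extraction is not shown), and `reject_above` is a hypothetical threshold for rejecting words outside the vocabulary. It illustrates one template-matching technique, not the method used by any specific device.

```python
import math
from typing import Optional

Frame = list[float]  # one feature vector per short time frame (assumed precomputed)


def dtw_distance(a: list[Frame], b: list[Frame]) -> float:
    """Dynamic time warping: cheapest alignment cost between two frame sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            # Extend the cheapest predecessor path: advance the utterance only,
            # the template only, or both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(
    utterance: list[Frame],
    templates: dict[str, list[Frame]],
    reject_above: float = math.inf,
) -> Optional[str]:
    """Return the vocabulary word with the closest template, or None if nothing is close enough."""
    best_word, best_score = None, math.inf
    for word, template in templates.items():
        score = dtw_distance(utterance, template)
        if score < best_score:
            best_word, best_score = word, score
    return best_word if best_score <= reject_above else None
```

For example, with a "yes" template of frames `[[0.0], [1.0], [2.0]]`, a slower utterance `[[0.0], [0.0], [1.0], [2.0]]` aligns with it at a DTW distance of 0.0.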

In some applications, particularly multi-step inspections, synthesized voice prompts help verify that an inspection is complete. Speech recognition, combined with voice output, prompts the operator through a series of tasks and checks each user input for correctness.
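The prompt-and-verify loop can be sketched as a sequence of steps, each with a spoken prompt and a set of acceptable responses. `Step`, `run_inspection` and the `speak` callback (standing in for the speech synthesizer) are hypothetical names, and recognized words are assumed to arrive as plain strings.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Step:
    prompt: str              # text passed to the speech synthesizer
    allowed: frozenset[str]  # recognized words accepted for this step


def run_inspection(
    steps: list[Step],
    responses: Iterable[str],
    speak: Callable[[str], None],
) -> list[str]:
    """Prompt through every step in order, re-prompting until a valid word is heard.

    Raises RuntimeError if the responses run out before the inspection is complete."""
    answers: list[str] = []
    heard = iter(responses)
    for step in steps:
        speak(step.prompt)
        for word in heard:
            if word in step.allowed:
                answers.append(word)  # input verified for this step
                break
            speak(f"Not understood. {step.prompt}")
        else:
            raise RuntimeError(f"Inspection incomplete at step: {step.prompt}")
    return answers
```

Because every step must receive an accepted answer before the next prompt is spoken, a returned list always covers the whole inspection.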

Speech recognition is well suited to tasks where speed and accuracy are required, or where an operator's hands and/or eyes are needed for functions other than written or typed data collection. Typical applications include receiving/shipping, distribution, order picking, part tracking, laboratory work, inventory control, PC-board inspection, forklift operations, sortation or materials processing, quality control and warehousing.

Speech recognition is gaining popularity because it requires minimal training, allows operators to capture and enter data while performing their normal work and is cost-efficient. While the more powerful speech recognition systems have been speaker-dependent, meaning each user has to read the vocabulary into the system, speaker-independent systems are now available that offer comparable performance and accuracy. In speaker-dependent systems, users “train” the system to recognize their individual voices, which allows speakers with accents, dialects or a need for work-specific vocabulary to use it. Speaker-independent systems recognize words from samples averaged over a pre-sampled pool of speakers and therefore require no training, but their specialized vocabularies are limited.
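The trade-off can be illustrated with a deliberately simple template scheme: average several samples of each word into one template, then pick the nearest template. A speaker-dependent system builds templates from one user's own readings, while a speaker-independent system builds them from a pool of speakers. This sketch assumes each utterance is already summarized as a fixed-length feature vector, which is a simplification; the helper names and the numbers in the example are invented for illustration.

```python
import math
from collections import defaultdict
from typing import Iterable

Vector = tuple[float, ...]  # assumed fixed-length feature summary of one utterance


def mean(vectors: list[Vector]) -> Vector:
    """Component-wise average of equal-length vectors."""
    return tuple(sum(column) / len(vectors) for column in zip(*vectors))


def build_templates(samples: Iterable[tuple[str, Vector]]) -> dict[str, Vector]:
    """Average all samples of each word into one template per word."""
    grouped: dict[str, list[Vector]] = defaultdict(list)
    for word, vector in samples:
        grouped[word].append(vector)
    return {word: mean(vectors) for word, vectors in grouped.items()}


def nearest(vector: Vector, templates: dict[str, Vector]) -> str:
    """Return the word whose template is closest in Euclidean distance."""
    return min(templates, key=lambda word: math.dist(vector, templates[word]))


# Invented example data.
pooled = build_templates([("stop", (1.0, 1.0)), ("stop", (3.0, 3.0)), ("go", (10.0, 10.0))])
user = build_templates([("stop", (6.0, 6.0)), ("stop", (6.0, 6.0)), ("go", (12.0, 12.0))])
print(nearest((7.0, 7.0), pooled))  # -> go   (pooled templates misrecognize this speaker)
print(nearest((7.0, 7.0), user))    # -> stop (the user's own enrolled templates match)
```

The pooled templates need no enrollment but fail for a speaker whose "stop" sounds unlike the pool average; enrolling that user's own readings fixes the error at the cost of training time.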

Speech recognition systems can also be divided into two other categories: continuous speech and discrete utterance. Continuous speech systems allow the user to talk at a normal speaking rate, while discrete utterance systems require a slight pause after each word or phrase.
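Discrete utterance systems can use the required pause to find word boundaries. A minimal sketch, assuming per-frame energy values have already been computed: a word starts when energy rises to a threshold and ends only after `min_pause` consecutive quiet frames, so brief dips inside a word do not split it. `threshold` and `min_pause` are hypothetical tuning parameters; continuous speech cannot be segmented this simply because words run together without pauses.

```python
def segment_utterances(
    energies: list[float], threshold: float, min_pause: int
) -> list[tuple[int, int]]:
    """Split frame energies into (start, end) word segments; end is exclusive.

    A segment closes only after `min_pause` consecutive frames below `threshold`."""
    segments: list[tuple[int, int]] = []
    start = None  # index of the first loud frame of the current word
    quiet = 0     # consecutive quiet frames since the last loud frame
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            quiet = 0
        elif start is not None:
            quiet += 1
            if quiet >= min_pause:
                # The last loud frame was i - quiet, so the segment ends just after it.
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:  # the input ended inside (or just after) a word
        segments.append((start, len(energies) - quiet))
    return segments
```

For example, with `threshold=1` and `min_pause=2`, the energies `[0, 5, 6, 0, 0, 0, 7, 7, 0, 0, 0]` yield two words at frames `(1, 3)` and `(6, 8)`.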