Speech recognition

Speech recognition is the interdisciplinary subfield of computational linguistics that develops methodologies and technologies that enable the recognition and translation of spoken language into text by computers. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or just "speech to text" (STT). It incorporates knowledge and research in the linguistics, computer science, and electrical engineering fields.

Some speech recognition systems require "training" (also called "enrollment"), where an individual speaker reads text or isolated vocabulary into the system. The system analyzes the person's specific voice and uses it to fine-tune the recognition of that person's speech, resulting in increased accuracy. Systems that do not use training are called "speaker independent"[1] systems. Systems that use training are called "speaker dependent".

Speech recognition applications include voice user interfaces such as voice dialing (e.g. "call home"), call routing (e.g. "I would like to make a collect call"), domotic appliance control, and search. The term voice recognition[2][3][4] or speaker identification[5][6] refers to identifying the speaker, rather than what they are saying. Recognizing the speaker can simplify the task of translating speech in systems that have been trained on a specific person's voice, or it can be used to authenticate or verify the identity of a speaker as part of a security process.

From the technology perspective, speech recognition has a long history with several waves of major innovations. Most recently, the field has benefited from advances in deep learning and big data.
The advances are evidenced not only by the surge of academic papers published in the field, but more importantly by the worldwide industry adoption of a variety of deep learning methods in designing and deploying speech recognition systems. These speech industry players include Google, Microsoft, IBM, Baidu, Apple, Amazon, Nuance, SoundHound, iFlyTek, and CDAC, many of which have publicized the core technology in their speech recognition systems as being based on deep learning.

History

Early work

In 1952, Bell Labs researchers built a system for single-speaker digit recognition. Their system worked by locating the formants in the power spectrum of each utterance.[7] Gunnar Fant developed the source-filter model of speech production and published it in 1960. Unfortunately, funding at Bell Labs dried up for several years when, in 1969, John Pierce wrote an open letter that was critical of speech recognition research.[8] Pierce defunded speech recognition research at Bell Labs, where no research on speech recognition was done until Pierce retired and James L. Flanagan took over.

Raj Reddy was the first person to take on continuous speech recognition, as a graduate student at Stanford University in the late 1960s. Previous systems required the users to pause after each word. Reddy's system was designed to issue spoken commands for the game of chess.

Also around this time, Soviet researchers invented the dynamic time warping (DTW) algorithm and used it to create a recognizer capable of operating on a 200-word vocabulary. The DTW algorithm processed the speech signal by dividing it into short frames, e.g. 10 ms segments, and processing each frame as a single unit. Although DTW would be superseded by later algorithms, the technique of dividing the signal into frames would carry on.
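The frame-then-align approach described above can be sketched in a few lines of Python. The 2-sample frames, one-dimensional frame features, and toy word templates below are illustrative assumptions, not the actual design of the Soviet system:

```python
def frame_signal(samples, frame_len):
    """Cut a sampled signal into consecutive fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping: minimal cumulative frame-to-frame distance
    over all monotonic alignments of the two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # skip a frame of A
                                 cost[i][j - 1],      # skip a frame of B
                                 cost[i - 1][j - 1])  # match the frames
    return cost[n][m]

# Each frame is reduced to one feature (here, its mean amplitude).
utterance = [sum(f) / len(f)
             for f in frame_signal([1, 1, 1, 1, 3, 3, 3, 3, 2, 2], 2)]
template_yes = [1.0, 3.0, 2.0]  # same word, spoken faster
template_no = [5.0, 5.0, 4.0]   # a different word
# DTW absorbs the difference in speaking rate, so the matching
# word still scores lower despite the length mismatch.
print(dtw_distance(utterance, template_yes) < dtw_distance(utterance, template_no))  # True
```

The key property is that the alignment is monotonic but elastic: a slow utterance can match a fast template by letting several utterance frames map onto one template frame.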
Achieving speaker independence remained a major unsolved goal of researchers during this time period. In 1971, DARPA funded five years of speech recognition research through its Speech Understanding Research program, with ambitious end goals including a minimum vocabulary size of 1,000 words. BBN, IBM, Carnegie Mellon and Stanford Research Institute all participated in the program. The government funding revived speech recognition research that had been largely abandoned in the United States after John Pierce's letter. Although CMU's Harpy system met the original goals of the program, many predictions turned out to be nothing more than hype, disappointing DARPA administrators. This disappointment led to DARPA not continuing the funding. Several innovations happened during this time, such as the invention of beam search for use in CMU's Harpy system. The field also benefited from the discovery of several algorithms in other fields, such as linear predictive coding and cepstral analysis.

During the late 1960s, Leonard Baum developed the mathematics of Markov chains at the Institute for Defense Analyses. At CMU, Raj Reddy's students James Baker and Janet M. Baker began using the hidden Markov model (HMM) for speech recognition. James Baker had learned about HMMs from a summer job at the Institute for Defense Analyses during his undergraduate education. The use of HMMs allowed researchers to combine different sources of knowledge, such as acoustics, language, and syntax, in a unified probabilistic model.

Under Fred Jelinek's lead, IBM created a voice-activated typewriter called Tangora, which could handle a 20,000-word vocabulary. Jelinek's statistical approach put less emphasis on emulating the way the human brain processes and understands speech in favor of using statistical modeling techniques like HMMs. Jelinek's group independently discovered the application of HMMs to speech.
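The "unified probabilistic model" idea can be illustrated with the forward algorithm on a toy HMM: per-frame acoustic evidence (emission probabilities) and sequential constraints (transition probabilities) multiply together in one model, and the algorithm sums over all hidden state paths. The two states and all probability values below are invented for illustration and do not come from any of the systems above:

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Total probability of the observation sequence under the HMM,
    summed over every possible hidden state path."""
    # Initialize with the first observation.
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    # Fold in each later observation: transition, then emit.
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * trans_p[p][s] for p in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())

states = ["vowel", "consonant"]
start_p = {"vowel": 0.5, "consonant": 0.5}
# Transitions encode a crude sequential constraint (vowels and
# consonants tend to alternate); emissions encode acoustic evidence.
trans_p = {"vowel": {"vowel": 0.3, "consonant": 0.7},
           "consonant": {"vowel": 0.7, "consonant": 0.3}}
emit_p = {"vowel": {"loud": 0.8, "quiet": 0.2},
          "consonant": {"loud": 0.1, "quiet": 0.9}}

p = forward(["loud", "quiet", "loud"], states, start_p, trans_p, emit_p)
print(p)
```

In a real recognizer the same structure holds, but the states are context-dependent phone models, the emissions come from acoustic feature distributions, and the transitions encode the pronunciation lexicon and language model, which is exactly how the HMM unifies those knowledge sources.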
This was controversial with linguists, since HMMs are too simplistic to account for many common features of human languages. However, the HMM proved to be a highly useful way of modeling speech, and it replaced dynamic time warping to become the dominant speech recognition algorithm in the 1980s. IBM had a few competitors, including Dragon Systems, founded by James and Janet M. Baker in 1982.

The 1980s also saw the introduction of the n-gram language model. Katz introduced the back-off model in 1987, which allowed language models to use n-grams of multiple lengths. During the same period, CSELT was also using HMMs (building on its earlier diphone studies) to recognize Italian.[20][21] At the same time, CSELT led a series of European projects (Esprit I, II) and later summarized the state of the art in a book.

Much of the progress in the field is owed to the rapidly increasing capabilities of computers. At the end of the DARPA program in 1976, the best computer available to researchers was the PDP-10 with 4 MB of RAM. Using these computers, it could take up to 100 minutes to decode just 30 seconds of speech. A few decades later, researchers had access to tens of thousands of times as much computing power.

As the technology advanced and computers got faster, researchers began tackling harder problems such as larger vocabularies, speaker independence, noisy environments and conversational speech. In particular, this shifting to more difficult tasks has characterized DARPA funding of speech recognition since the 1970s. For example, progress was made on speaker independence first by training on a larger variety of speakers and then later by doing explicit speaker adaptation during decoding. Further reductions in word error rate came as researchers shifted acoustic models to be discriminative instead of using maximum likelihood models.

In the mid-1980s, new speech recognition microprocessors were released: for example RIPAC, a speaker-independent recognition chip for continuous speech tailored for telephone services, was presented in the Netherlands in 1986. It was designed by CSELT/Elsag and manufactured by SGS.
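The back-off idea mentioned above can be sketched with a toy bigram model: use the bigram estimate when the bigram was observed in training data, otherwise back off to a weighted unigram estimate. The tiny corpus and the single constant back-off weight (in the style later popularized as "stupid backoff") are illustrative assumptions; Katz's actual model uses Good-Turing discounting to compute the back-off weights:

```python
from collections import Counter

# Toy training corpus; a real model would be trained on millions of words.
corpus = "the cat sat on the mat the cat ran".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
BACKOFF_WEIGHT = 0.4  # assumed constant, standing in for Katz's discounting

def score(prev, word):
    """Back-off bigram score: observed bigrams use the bigram relative
    frequency; unseen bigrams fall back to a weighted unigram estimate."""
    if (prev, word) in bigrams:
        return bigrams[(prev, word)] / unigrams[prev]
    return BACKOFF_WEIGHT * unigrams[word] / len(corpus)

print(score("the", "cat"))  # observed bigram: bigram estimate
print(score("mat", "ran"))  # unseen bigram: backed-off unigram estimate
```

The benefit is that unseen word pairs still receive a nonzero, graded probability instead of zero, which is what made n-gram models usable with the limited training data of the era.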
Practical speech recognition

The 1990s saw the first commercially successful speech recognition products. Two of the earliest were Dragon Dictate, a consumer product released in 1990, and a recognizer from Kurzweil Applied Intelligence. AT&T deployed the Voice Recognition Call Processing service in 1992. The technology was developed by Lawrence Rabiner and others at Bell Labs. By this point, the vocabulary of the typical commercial speech recognition system was larger than the average human vocabulary.

Raj Reddy's former student, Xuedong Huang, developed the Sphinx-II system at CMU. The Sphinx-II system was the first to do speaker-independent, large-vocabulary, continuous speech recognition, and it had the best performance in DARPA's 1992 evaluation. Handling continuous speech with a large vocabulary was a major milestone in the history of speech recognition. Huang went on to found the speech recognition group at Microsoft in 1993. Raj Reddy's student Kai-Fu Lee joined Apple where, in 1992, he helped develop a speech interface prototype for the Apple computer known as Casper.

Lernout & Hauspie, a Belgium-based speech recognition company, acquired several other companies, including Kurzweil Applied Intelligence in 1997 and Dragon Systems in 2000. The L&H speech technology was used in the Windows XP operating system.