WO1999003092A2 - Modular speech recognition system and method

Modular speech recognition system and method

Publication number
WO1999003092A2
Authority
WO
WIPO (PCT)
Prior art keywords
call
specialized vocabulary
speech recognition
speech
feature vectors
Prior art date
Application number
PCT/US1998/012723
Other languages
French (fr)
Other versions
WO1999003092A3 (en)
Inventor
Arthur Gerald Herkert
Oleg Andric
Lu Chang
Gil Alterovitz
Original Assignee
Motorola Inc.
Priority date
Filing date
Publication date
Application filed by Motorola Inc.
Priority to AU79780/98A
Publication of WO1999003092A2
Publication of WO1999003092A3

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/26 Devices for calling a subscriber
    • H04M1/27 Devices whereby a plurality of signals may be stored simultaneously
    • H04M1/271 Devices whereby a plurality of signals may be stored simultaneously controlled by voice recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A modular speech recognition system (201) includes a bulk memory (230) that stores a plurality of specialized vocabulary databases (231), a front end processor (305) that generates a first set of feature vectors based on an analysis of a first set of sampled speech data received during a call, a local memory (370) that stores a specialized vocabulary database (371), and a recognition processor (355) that accepts the first set of feature vectors and generates a recognition result based on the first set of feature vectors and the specialized vocabulary database. The specialized vocabulary database (371) is a copy of one of the plurality of specialized vocabulary databases (231) stored in the bulk memory (230). The specialized vocabulary database (371) is selected from the plurality of specialized vocabulary databases (231) in response to information associated with the call.

Description

MODULAR SPEECH RECOGNITION SYSTEM AND METHOD
Field of the Invention
This invention relates in general to speech recognition systems and in particular to a speech recognition module that is used in a speech recognition system that handles interactive calls.
Background of the Invention
Until recently, pagers typically received and stored only numbers. However, developments in pager technology, including the introduction of alpha pagers (which allow for text data), have brought up an issue of a lack of a universal means to accept text messages from a caller for communication to pagers. Some solutions have been provided, notably WordSender™ and electronic mail paging servers. Unfortunately, these are not ubiquitous solutions that are applicable in every situation and accessible to the ordinary user. Telephone networks, if they could be utilized for this purpose, would present an acceptable universal solution, as evinced by their wide availability in the home, office, and many public facilities. The current approach to offering alpha messaging service by voice telephone communication typically requires service providers to deploy telephone operators to handle alpha paging requests. The necessity of human operators constitutes a significant portion of the costs associated with alpha paging and greatly limits the market segment to which the product may be targeted.
A possible solution to the problem of needing human operators is the use of speech recognition technology incorporating a dialog response system. To be useful in typical paging systems, the dialog response system must be capable of handling several, and often many, telephone calls simultaneously. Informative and effective dialogs that do not confuse or frustrate the user are required. Usually, an effective dialog system is not a generic one, but rather is dependent on a number of factors such as the subject matter and its associated jargon as well as the caller's geographic location, socio-economic background, linguistic considerations, and cultural expectations. Thus, a customizable system can be effective in increasing the robustness of user interfaces and data entry, while a generic system will likely not be appropriate in many situations.
The design of speech recognition systems typically includes trading off the parameters of speed of recognition, accuracy of recognition, and the amount of local memory that is used by the program performing the speech recognition. (Local memory is memory directly accessible on a high speed processor bus, as opposed to bulk memory such as hard disk memory.) Programs performing speech recognition have both an executable portion and a database portion. Both portions of the program must typically be located in memory directly accessible by a processor, such as random access memory (RAM), read only memory (ROM), or electrically programmable read only memory (EPROM). Such memory is needed in order to provide a speed of recognition that is sufficiently rapid for interactive speech recognition systems, while also being sufficiently accurate. When the executable portion and/or the database portion are made available in media normally used for distribution, such as compact disk read only memory (CD ROM) or floppy disk, sufficient local memory such as RAM is typically provided into which to copy the portion made available on such media. Although RAM memory costs have been declining, the cost of memory in speech recognition systems is still a predominant factor in providing a robust speech recognizer because the database portion of typical speech recognizers is large, particularly for those speech recognizers that are designed to be able to recover speech that encompasses anything more than a small set of jargon. As the number and size of vocabulary models and networks grow, the processor time needed to determine that a certain vocabulary model matches the actual speech pattern increases. Constraining the number and size of models places a limitation on the accuracy of the speech recognition algorithm. Thus, there appears to be an implied compromise among factors such as speed, accuracy, and the quantity of models that can be supported.
Thus, what is needed is a technique to provide a speech recognition system that can provide customized dialog responses for several telephone lines in an economical and efficient manner.
Brief Description of the Drawings
FIG. 1 shows an electrical block diagram of portions of a fixed portion of a radio communication system, in accordance with the preferred and alternative embodiments of the present invention.
FIG. 2 shows an electrical block diagram of a modular speech recognition system and other portions of a paging controller used in the radio communication system, in accordance with the preferred and alternative embodiments of the present invention.
FIG. 3 shows a more detailed electrical block diagram of the modular speech recognition module, in accordance with the preferred embodiment of the present invention.
FIG. 4 shows a flow chart that describes a method used in the paging controller for providing interactive dialog during one or more telephone calls, in accordance with the preferred and alternative embodiments of the present invention.
Detailed Description of the Drawings
Referring to FIG. 1, an electrical block diagram of portions of a fixed portion of a radio communication system 100 is shown in accordance with the preferred and alternative embodiments of the present invention. The fixed portion of the radio communication system 100 comprises conventional telephones 111 connected through a conventional switched telephone network (STN) 112 by conventional telephone links 113 to a system controller, which in this example is a paging controller 114. The paging controller 114 oversees the operation of a paging radio fixed network 116, typically comprising a plurality of radio frequency transmitter/receivers coupled to the paging controller 114 by conventional telephone links 115. The paging controller 114 encodes and decodes inbound and outbound telephone addresses into formats that are compatible with land line message switch computers. The paging controller 114 also functions to encode and schedule outbound messages, which can include such information as analog voice messages, digital alphanumeric messages, and response commands, for transmission by the radio frequency transmitter/receivers to a plurality of selective call radios (not shown in FIG. 1). The paging controller 114 further functions to decode inbound messages, including unsolicited and response messages, received by the radio frequency transmitter/receivers from the plurality of selective call radios.
It should be noted that the paging controller 114 is capable of operating in a distributed transmission control environment that allows mixing conventional cellular, simulcast, satellite, or other coverage schemes involving a plurality of radio frequency transmitter/receivers and conventional antennas, for providing reliable radio signals within a geographic area as large as a worldwide network. Moreover, as one of ordinary skill in the art would recognize, the telephonic and selective call radio communication system functions may reside in separate system controllers 114 which operate either independently or in a networked fashion.
It will be appreciated that the selective call radios are of several types of radios, including two way pagers, conventional mobile radios, conventional or trunked mobile radios which have a data terminal attached thereto, or which optionally have data terminal capability designed in. Each of the selective call radios assigned for use with the radio communication system fixed network 100 has an address assigned thereto which is a unique selective call address. The address enables the transmission of a message from the paging controller 114 only to the addressed selective call radio, and identifies messages and responses received at the paging controller 114 from the selective call radio. Furthermore, each of one or more of the selective call radios can have a unique telephone number assigned thereto, the telephone number being unique within the STN 112. A list of the assigned selective call addresses and correlated telephone numbers for the selective call radios is stored in the paging controller 114 in the form of a subscriber database.
Referring to FIG. 2, an electrical block diagram of a modular speech recognition system 201 and other portions of the paging controller 114 is shown, in accordance with the preferred embodiment of the present invention. The paging controller 114 schedules and queues data and stored voice messages for transmission to the selective call radios, connects telephone calls and uses a processor generated interactive dialog for determining messages to be transmitted to the selective call radios, and receives acknowledgments, demand responses, unsolicited data and stored audio messages, and telephone calls from the selective call radios. The paging controller 114 in this example comprises an STN interface 210 and the modular speech recognition system 201. The modular speech recognition system 201 comprises a paging controller processor 240, a hard disk memory 230, one type I speech recognition module (SRM-I) 220 and two essentially identical type II speech recognition modules (SRM-II) 221, 222, which are all intercoupled by an external bus 225. The STN interface 210 handles the switched telephone network (STN) 112 physical connection, connecting and disconnecting telephone calls at the telephone links 113, and routing call related information between the telephone links 113, the SRM-I 220 and the paging controller processor 240, under control of the paging controller processor 240.
When a telephone call is received, and sufficient resources are available to process the call, it is connected by the STN interface 210, under control of the paging controller processor 240, to the SRM-I 220 by a conventional serial processor interface 215 for processing of call information received during the call, and is connected to the paging controller processor 240 by the external bus 225 for communication of information that is generated by the paging controller processor 240 and transmitted in the telephone call, during an interactive dialog controlled by the paging controller processor 240. (Alternatively, the call can be connected from the STN interface 210 to the SRM-I 220 by the external bus 225.) The interactive dialog which is supported by the preferred embodiment of the present invention is substantially more sophisticated than those commonly in use today, in which the information presented by the caller is typically restricted to digits entered from the telephone keypad or clearly spoken single words, such as the digits or "yes" or "no." A plurality of specialized vocabulary databases are stored in the hard disk memory 230. The hard disk memory 230 also stores a plurality of conventional digitized voice response segments which are transmitted in a conventional manner during a call to a caller.
Call information is received as digital information such as information that conveys the telephone number of the initiator of the telephone call. Call information also includes digitized analog information, such as digitized voice signals of the caller or digitized dual tone multifrequency tones generated by the caller's activation of telephone instrument keys, or computer generated digitized stored voice responses transmitted from the paging controller 114 to the caller. For simplicity, both the digital information and digitized analog information received or transmitted during a call and associated with the call are described herein as being "in the call" or "during the call." Digitized analog information is received and transmitted by the paging controller processor 240 in a plurality of simultaneous calls connected to the STN interface 210 in a conventional time multiplexed manner. One or more telephone calls are simultaneously connected by the STN interface 210, under control of the paging controller processor 240, to the SRM-I 220, which provides front end processing of information received in the calls, resulting in the generation of a series of sets of feature vectors for each telephone call, wherein each set typically represents a phrase of the received portion of an interactive dialog. The feature vectors generated from information received in each telephone call are coupled to a recognition processor 355, 356 (see FIG. 3) of one of the speech recognition modules 220, 221, 222 by the external bus 225. Since in the example shown in FIG. 2 there are three such recognition portions, three telephone calls can be front end processed simultaneously. The only significant difference between the SRM-II 221 and the SRM-II 222 is in an identification code of each.
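The coupling of each call's feature vector sets to one of several recognition processors can be illustrated with a minimal sketch. It is not taken from the specification: the class and method names (`RecognitionProcessor`, `VectorRouter`, `assign`, `deliver`, `release`) are hypothetical, and the sketch folds the bookkeeping into one object, whereas the description has the paging controller processor 240 choose the module and the front end processor 305 supply the vectors.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RecognitionProcessor:
    """Stand-in for a recognition processor (355, 356) and its local RAM."""
    module_id: str                     # illustrative label, e.g. "SRM-II 222"
    call_id: Optional[int] = None      # call currently served; None when idle
    vector_sets: list = field(default_factory=list)


class VectorRouter:
    """Hypothetical bookkeeping: one recognition processor per active call."""

    def __init__(self, processors):
        self.processors = processors

    def assign(self, call_id):
        """Bind a new call to the first idle recognition processor."""
        for proc in self.processors:
            if proc.call_id is None:
                proc.call_id = call_id
                return proc
        return None  # insufficient resources to process the call

    def deliver(self, call_id, vector_set):
        """Copy a completed feature vector set to the call's processor."""
        for proc in self.processors:
            if proc.call_id == call_id:
                proc.vector_sets.append(vector_set)
                return proc.module_id
        raise KeyError(f"call {call_id} has no recognition processor")

    def release(self, call_id):
        """Free the processor when the call is completed."""
        for proc in self.processors:
            if proc.call_id == call_id:
                proc.call_id = None
                proc.vector_sets.clear()
```

With three recognition processors, as in the example of FIG. 2, a fourth simultaneous call finds no idle processor until one of the first three calls is released.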
The hard disk drive 230 is a conventional disk drive, such as a 2.1 Gigabyte drive commonly supplied with computers sold today. Alternatively, another form of bulk memory such as a conventional compact disk read only memory (CD ROM) drive could be used. Paging controller 114 is preferably a Wireless Message Gateway™ Administrator! paging terminal manufactured by Motorola, Inc., of Schaumburg, Illinois, and modified by the addition of unique speech recognition modules 220, 221, 222, the unique control functions as described herein with reference to the paging controller processor 240, and the plurality of specialized vocabulary databases stored in the hard disk drive 230.
It will be appreciated that other conventional processing systems that include a telephone interface and support for the unique speech recognition modules 220, 221, 222 could alternatively be modified for use as the paging controller 114. It will be further appreciated that the paging controller 114 can be configured to handle more telephone calls by using more SRM-II's, up to the capacity of the front end portion of the SRM-I, and more SRM-I's, up to the capacity of the STN interface 210, or the physical and/or capacity of the paging controller 114.
Referring to FIG. 3, a more detailed electrical block diagram of the modular speech recognition module is shown, in accordance with the preferred embodiment of the present invention. The SRM-I 220 and one SRM-II 222 are shown, as well as the hard disk memory 230, the paging controller processor 240, and the external bus 225.
The SRM-I 220 provides the front end processing described above (with reference to FIG. 2) by means of a front end processor 305 that comprises an electrically programmable read only memory (EPROM) 310, a random access memory (RAM) 320, a microprocessor 330, and an external bus input output driver (EXT BUS I/O DVR) 325, which are all mounted to a printed circuit board (not shown in FIGs. 1-2), and intercoupled by an internal bus 340. The EPROM 310 comprises a unique front end processing segment 315 as well as conventional segments, which together control the operation of the microprocessor 330, and thereby the front end processor 305. The RAM 320 comprises memory storage space sufficient to store three maximum sets of feature vectors. In this example, feature vector sets named feature vector set A (VS A) 321 and feature vector set B (VS B) 322 are stored in RAM 320.
The recognition processors 355, 356 of the SRM-I 220 and SRM-II 222 each comprise an EPROM 360, a RAM 370, a microprocessor 380, and an external bus input output driver (EXT BUS I/O DVR) 375, intercoupled by an internal bus 390. These circuits form the recognition processor 355 and are all mounted to the printed circuit board to which the circuits forming the front end processor 305 are also mounted, forming the SRM-I 220. The SRM-II 222 comprises a printed circuit board (not shown in FIGs. 1-2) having the same layout as the printed circuit board of the SRM-I 220, but having only the circuits that form the recognition processor 356 mounted thereto. The EPROM 360 comprises a unique recognizer segment 365 as well as conventional segments which control the operation of the microprocessor 380, and thereby the recognition processor 355. The RAM 370 comprises memory storage space sufficient to store one maximum set of feature vectors and one maximum specialized vocabulary database. In this example, a copy of the vector set B 322 is stored in recognition processor 356 of SRM-II 222 and a vector set named vector set C (VS C) 372 is stored in recognition processor 355 of SRM-I 220. In this example, a specialized vocabulary database named specialized vocabulary database N (VDB N) 373 is stored in recognition processor 356 of SRM-II 222 and a specialized vocabulary database named specialized vocabulary database M (VDB M) 371 is stored in recognition processor 355 of SRM-I 220.
The hard disk memory 230 comprises sufficient memory storage space to store P specialized vocabulary databases identified as specialized vocabulary databases 1 through P (VDB 1-P) 231. Specialized vocabulary database N 373 and specialized vocabulary database M 371 are copies of two of the P specialized vocabulary databases stored in the hard disk memory 230, although they may alternatively be copies of one specialized vocabulary database stored in the hard disk memory 230.
The paging controller processor 240 comprises EPROM 361, RAM 371, external bus input/output driver 376, and microprocessor 381, intercoupled by internal bus 391. The EPROM 361 comprises a unique dialog control segment 366 as well as conventional segments which control the operation of the microprocessor 381, and thereby the paging controller processor 240.
The RAMs 320, 370, 371 are conventional read/write RAMs, preferably 64 Megabytes each. The EPROMs 310, 360, 361 are conventional EPROMs programmed with conventional and unique segments as described above. The microprocessors 330 and 380 are microprocessors of the 56000 family of digital signal processors made by Motorola, Inc. of Schaumburg, IL. The microprocessor 381 is a microprocessor of the 68000 family of microprocessors made by Motorola, Inc. The internal busses 340, 390, 391 are parallel microprocessor busses of conventional design uniquely laid out on the printed circuit board described above for intercoupling the devices described above with reference to the speech recognition modules 220, 221, 222. By being directly coupled to the microprocessors 330, 380, and 381 by the respective internal busses 340, 390, 391, the RAMs 320, 370, 371 are local memories; that is, the stored information is read from and written to them by a central processing portion of the respective microprocessors 330, 380, and 381 on a random addressed basis, at a bussed speed. It will be appreciated that the RAMs 320, 370, 371 could alternatively be a portion of the microprocessors 330, 380, and 381 themselves, by being integrated on the same substrate as the central processing unit and other elements of the microprocessors 330, 380, and 381.
The external bus 225 is a microprocessor bus that intercouples the speech recognition modules 220, 221, 222, the hard disk memory, and the paging controller processor 240 by flat cables and connectors. The external bus input/output drivers 325, 375, 376 are conventional devices for driving the external bus 225, which is a conventional SCSI (small computer systems interface) bus.
It will be appreciated that the RAMs 320, 370, 371 could be dynamic or static RAM devices, that the EPROM could alternatively be of another type such as masked ROM or flash ROM, and that the microprocessors 330 and 380 could alternatively be of other types of digital signal processors or possibly microprocessors such as those of the PowerPC™ or Pentium® families of processors, and that the microprocessor 381 could be of another type such as a microprocessor of the PowerPC™ or Pentium® families of processors. It will be further appreciated that in alternative embodiments of the present invention, the functions of the front end processor 305 and recognition processor 355 could be provided by using one microprocessor of sufficient speed and capability. In such a case, the RAMs 320, 370 could be combined, although the memory space would have to be essentially the same as provided by both. A similar situation exists for the EPROMs 310, 360. With one microprocessor replacing the microprocessors 330, 380, only one internal bus is needed for the SRM-I 220, and vector sets which are communicated from the front end processor 305 to the recognition processor 355 are moved, when necessary, between RAM locations using the single internal bus. It will be further appreciated that in another alternative embodiment of the present invention wherein a paging controller supports sufficiently few telephone links 113, the functions of the SRM-I 220 and the paging controller processor 240 could further be combined using one microprocessor of sufficient speed and capability. In these alternative embodiments in which a processor performs the functions of two or more of the processors 305, 355, 240 of the preferred embodiment, the functions provided by the front end processor 305, the recognition processor 355, and the paging controller processor 240 are separated as described herein with reference to the preferred embodiment, and specialized vocabulary databases are locally copied into RAM from those stored in the hard disk memory.
In yet another alternative embodiment in accordance with the present invention, suitable for use with a plurality of non-trunked telephone lines, a plurality of analog recognition modules similar in design to the SRM-I 220 are used. The STN interface 210 in this embodiment is a telephone interface for connecting a plurality of analog telephone lines 113, and the analog recognition module differs from the SRM-I 220 essentially only in that it converts an analog signal from one line to digitized speech samples. Variations are possible wherein the STN interface 210 digitizes and time multiplexes several analog telephone lines and one or more SRM-I's 220 and a plurality of SRM-II's 221 are used.
However, in all of the embodiments, a hard disk drive or other bulk memory is used to store the plurality of specialized vocabulary databases from which a specialized vocabulary database is copied to a local memory one at a time, providing the benefits of fast, accurate, and cost efficient interactive dialogs.
Referring to FIG. 4, a flow chart is shown describing a method used in a paging controller 114 for providing interactive dialog during one or more telephone calls. In this example, recognition processor 356 of speech recognition module 222 is available and a new call (hereafter for simplicity, "the call") is received at the STN interface 210. At step 405, a connection is made to the call. The paging controller processor 240 controls the STN interface 210 to connect the call to the SRM-I 220 for front end processing (generation of a set of feature vectors). At step 410, in response to identification of the connection of a new telephone call, which is one form of predetermined digital information associated with the call, and in further response to identification of an exchange code that is the calling telephone's exchange, the paging controller processor 240 at step 420 selects from hard disk memory 230 a specialized vocabulary database from the specialized vocabulary databases 231 stored in the hard disk memory 230. Each of the specialized vocabulary databases 231 is designed for a relatively narrow set of jargon. For example, the specialized vocabulary database selected for response to the telephone call of the example is a vocabulary database specialized for identifying numbers in Spanish, in response to a call having just been connected and in response to the exchange number of the call received being one known to be primarily used by Spanish speaking callers (this combination of information is a predetermined set of digital information received in the call). Because the specialized vocabulary databases are specialized, each one will fit within the RAM 370 of one of the recognition processors 355, 356. The paging controller processor 240 controls a selected one of the speech recognition modules 220, 222 to download at step 430 a copy of the selected specialized vocabulary database into its RAM 370. 
For this example, it is assumed that SRM-II 222 is selected and a copy of specialized vocabulary database N 373 is downloaded into RAM 370.
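Steps 410 through 430 (identify the call information, select a database, download a copy into local memory) can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the specification: the database names, the vocabulary contents, the exchange code "555", and the size check by entry count are all hypothetical placeholders.

```python
import copy

# Hypothetical bulk store (hard disk memory 230): name -> vocabulary.
BULK_MEMORY = {
    "digits-en": {"zero": "0", "one": "1", "two": "2"},
    "digits-es": {"cero": "0", "uno": "1", "dos": "2"},
}

# Hypothetical table of exchange codes known to serve mostly Spanish speakers.
SPANISH_EXCHANGES = {"555"}


def select_database(exchange_code: str) -> str:
    """Pick the initial database for a newly connected call (steps 410-420)."""
    language = "es" if exchange_code in SPANISH_EXCHANGES else "en"
    return f"digits-{language}"


class LocalMemory:
    """Stand-in for RAM 370: holds one specialized database at a time."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries   # assumed capacity, counted in entries
        self.name = None
        self.vocabulary = {}

    def download(self, name: str) -> None:
        """Copy one database out of bulk memory (step 430)."""
        source = BULK_MEMORY[name]
        if len(source) > self.max_entries:
            raise MemoryError(f"{name} does not fit in local memory")
        self.name = name
        self.vocabulary = copy.deepcopy(source)  # replaces the previous copy
```

For example, `LocalMemory(max_entries=10).download(select_database("555"))` leaves a private copy of the Spanish digits vocabulary in the local memory, while the original stays in the bulk store for other recognition processors to copy.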
Thereafter, the paging controller processor 240 further selects an initial dialog response phrase in Spanish (in digitized voice form), which says for example (translated to English), "Please say the paging number you are calling." The paging controller processor 240 communicates this dialog response phrase to the STN interface 210 for transmission in the telephone call to the caller.
When time multiplexed voice information is received in the phone call after the call is connected, the front end processor 305 analyzes the time multiplexed voice information obtained in the call at step 415 by generating speech samples therefrom and generates feature vectors therefrom in a conventional manner. This analysis continues until a predetermined break point occurs in the digitized voice signal, such as a 0.5 second pause, at which time a first set of conventional feature vectors 322 is completely generated at step 425 which is based on and represents a first set of conventional speech samples. The break point is determined by the front end processor 305. In this example, the set of feature vectors 322 is determined from the voice information initially received in the connected call. When completed, the set of feature vectors 322 is copied at step 435 to the RAM 370 of recognition processor 356, the same recognition processor selected by the paging controller processor 240 for receiving a copy of the specialized vocabulary database N 373. The recognition processor 356 then generates a recognition result at step 440 based on the set of feature vectors B 322 and the specialized vocabulary database N 373. For example, the caller may say "346-9876" (in Spanish), for which the recognition result is the set of numbers 3469876 in ASCII (American Standard Code for Information Interchange). The recognition result is determined quickly and accurately. It will be appreciated that while this function could be alternatively performed by the conventional method of asking the user to enter DTMF tones in the United States, the use of voice to provide the digits is a more ubiquitous solution to obtaining digits in countries where DTMF dialing is not as prevalent as in the United States. Furthermore, the use of voice digits is more natural for many users than using keypad keys.
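The break point test described above (close a phrase once a pause of about 0.5 second occurs) can be sketched as below. The specification states only the 0.5 second pause; the per-frame energy representation, the 10 ms frame length, and the silence threshold are assumptions made for illustration, and the function name `split_at_pauses` is hypothetical.

```python
def split_at_pauses(energies, frame_ms=10, pause_ms=500, threshold=0.01):
    """Return (start, end) frame ranges of phrases separated by long pauses.

    energies  -- one energy value per frame (assumed representation)
    frame_ms  -- assumed frame length in milliseconds
    pause_ms  -- pause that closes a phrase (0.5 second in the description)
    threshold -- assumed energy level below which a frame counts as silence
    """
    pause_frames = pause_ms // frame_ms
    phrases = []
    start = None         # first frame of the phrase being collected
    last_voiced = None   # most recent frame at or above the threshold
    silent_run = 0       # consecutive silent frames inside the phrase
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            last_voiced = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= pause_frames:
                # Break point reached: the phrase ends at the last voiced frame.
                phrases.append((start, last_voiced + 1))
                start = None
                silent_run = 0
    if start is not None:
        # The stream ended before a full pause; close the open phrase anyway.
        phrases.append((start, last_voiced + 1))
    return phrases
```

Pauses shorter than `pause_ms` stay inside a phrase, so a caller who hesitates briefly between digits still produces a single set of feature vectors.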
The paging controller 114 can alternatively select a specialized vocabulary database based on an identification of a predetermined set of recognition results generated from a set of sampled speech data received from the connected call, at step 445. This is illustrated by a continuation of the example being described above. The paging controller processor 240 identifies at step 445 the recognition result 3469876 as a set of seven digits identifying a pager used by a pediatrician. In response to this identification, the paging controller processor 240 selects another specialized vocabulary database L at step 420 for copying into the RAM 370 of recognition processor 356. The specialized vocabulary database L is a specialized database of jargon associated with pediatricians, in Spanish. In this manner, the next set of feature vectors generated by front end processor 305 from the digitized voice data received in the same call are copied to SRM II 222 and analyzed using specialized vocabulary database L. This again results in a fast and accurate generation of a recognition result. This process of repeatedly selecting a specialized vocabulary database from the set of specialized vocabulary databases stored in the hard disk memory 230, based on information associated with the telephone call is continued until the telephone call is completed.
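The repeated selection described in this paragraph, in which each recognition result can determine the database used for the next phrase, can be sketched as a small loop. The subscriber table, the database naming scheme, and the `recognize` callable are hypothetical stand-ins; only the pager number 3469876 and the pediatrician example come from the description.

```python
# Hypothetical subscriber records: pager number -> subscriber's profession.
SUBSCRIBERS = {"3469876": "pediatrics"}


def next_database(recognition_result: str, language: str, current: str) -> str:
    """Choose the database for the next phrase from the last result (step 445)."""
    profession = SUBSCRIBERS.get(recognition_result)
    if profession is None:
        return current  # the result does not change the dialog context
    return f"{profession}-{language}"


def run_dialog(phrases, recognize, language, initial_db):
    """Recognize each phrase with whichever database is currently selected.

    recognize is any callable taking (feature_vectors, database_name); here it
    stands in for the recognition processor and its local copy of the database.
    """
    database = initial_db
    results = []
    for feature_vectors in phrases:
        result = recognize(feature_vectors, database)
        results.append((database, result))
        database = next_database(result, language, database)
    return results
```

In this sketch, a first phrase recognized as "3469876" with a Spanish digits database causes the second phrase to be recognized with a Spanish pediatrics database, mirroring the example above.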
By now it will be appreciated that the preferred and alternative embodiments of the present invention provide unique configurations of circuit devices and databases that permit a cost effective use of interactive dialogs in a paging system handling a wide variety of jargons (such as medicine, law, electrician, and real estate) in a wide variety of languages by avoiding the use of one large vocabulary database stored in a large RAM, which would be very costly, or one large vocabulary database stored in bulk (mass) memory, such as hard disk or CD ROM, in which recognition would be impractically slow due to the inherent slow access times of such mass memory. The unique arrangement involves using a series of smaller, specialized vocabulary databases selected during a telephone call; for example, 15 specialized vocabulary databases including the pediatrician jargon of the above example as well as electrician and real estate jargon in English, Spanish, Haitian, German, and French; wherein the specialized vocabulary database is based on data received within the call and copied into a RAM 370 of a recognition processor 355, 356 from a set of smaller, specialized vocabulary databases 231 stored in the hard disk memory 230. The unique arrangement further permits the economic handling of a plurality of simultaneous phone calls by separating the recognition processing function performed by the recognition processors 355, 356 from the relatively less intensive feature vector generation function performed by the front end processor 305 and the less intensive dialog control function performed by the paging controller processor 240.
It will be further appreciated that such benefits are available from using essentially the same method and unique arrangement of apparatus as described herein, for communication systems other than paging communication systems, such as catalog order placement systems and reservation systems.

We claim:

Claims

1. A speech recognition module, comprising: a local memory that stores a specialized vocabulary database; and a recognition processor that accepts a first set of feature vectors and generates a recognition result based on the specialized vocabulary database stored in the local memory and the first set of feature vectors, wherein the first set of feature vectors represents a first set of speech samples obtained during a call, and wherein the specialized vocabulary database is a copy of one of a plurality of specialized vocabulary databases stored in a bulk memory, and wherein the specialized vocabulary database is selected from the plurality of specialized vocabulary databases in response to information associated with the call.
2. The speech recognition module according to claim 1, wherein the first set of speech vectors are also stored in the local memory.
3. The speech recognition module according to claim 1, further comprising: a front end processor that generates the first set of feature vectors based on an analysis of the first set of sampled speech data.
4. The speech recognition module according to claim 3, wherein the front end processor generates a plurality of sets of feature vectors from a plurality of sets of sampled speech data obtained in a plurality of calls and couples to one or more of other speech recognition modules one of the plurality of sets of sampled speech data.
5. The speech recognition module according to claim 3, wherein the recognition processor comprises a first microprocessor controlled by a recognition program segment, and the local memory is a random access memory directly accessible by the first microprocessor, and wherein the speech recognition module further comprises a circuit board for mounting and coupling the first microprocessor and the random access memory, and wherein the bulk memory is located external to the circuit board, and wherein the speech recognition module further comprises a means for transferring the specialized vocabulary database from the bulk memory to the random access memory, and wherein the front end processor comprises a second microprocessor controlled by a speech analysis segment, and wherein the second microprocessor is mounted to the circuit board and coupled to the first microprocessor and the local memory.
6. The speech recognition module according to claim 5, wherein the first and the second microprocessors are embodied in a single microprocessor.
7. The speech recognition module according to claim 1, wherein the information is a set of digital information received during the call.
8. The speech recognition module according to claim 1, wherein the information is a set of recognition results generated by the recognition processor from a second set of feature vectors representing a second set of sampled speech data obtained during the call.
9. The speech recognition module according to claim 1, wherein the recognition processor comprises a first microprocessor controlled by a recognition program segment, and the local memory is a random access memory directly accessible by the first microprocessor.
10. The speech recognition module according to claim 9, further comprising a circuit board for mounting and coupling the first microprocessor and the random access memory, wherein the bulk memory is located external to the circuit board, and wherein the speech recognition module further comprises a means for transferring the specialized vocabulary database from the bulk memory to the random access memory.
11. A modular speech recognition system, comprising: a bulk memory that stores a plurality of specialized vocabulary databases; a front end processor that generates a first set of feature vectors based on an analysis of a first set of sampled speech data received during a call; a local memory that stores a specialized vocabulary database; and a recognition processor that accepts the first set of feature vectors and generates a recognition result based on the first set of feature vectors and the specialized vocabulary database, wherein the specialized vocabulary database is a copy of one of the plurality of specialized vocabulary databases stored in the bulk memory, and wherein the specialized vocabulary database is selected from the plurality of specialized vocabulary databases in response to information associated with the call.
12. The modular speech recognition system according to claim 11, wherein the speech recognition system further comprises a dialog controller that detects the information and selects the specialized vocabulary database for copying into the local memory.
13. The modular speech recognition system according to claim 12, wherein the information is digital information received during the call.
14. The modular speech recognition system according to claim 12, wherein the information is a set of recognition results generated by the recognition processor from a second set of feature vectors representing a second set of sampled speech data obtained during the call.
15. A method for speech recognition during a call, comprising in a system controller the steps of: selecting a specialized vocabulary database from a plurality of specialized vocabulary databases stored in a bulk memory, in response to information associated with the call; copying the specialized vocabulary database into a local memory; and generating a recognition result based on the specialized vocabulary database stored in the local memory and a first set of feature vectors that represents a first set of speech samples obtained during the call.
16. The method according to claim 15, further comprising the step of generating the first set of feature vectors based on an analysis of the first set of sampled speech data obtained during the call.
17. The method according to claim 15, wherein in the step of selecting, the information is digital information received during the call.
18. The method according to claim 15, wherein in the step of selecting, the information is a set of recognition results generated from a second set of feature vectors representing a second set of sampled speech data obtained during the call.
PCT/US1998/012723 1997-07-07 1998-06-18 Modular speech recognition system and method WO1999003092A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU79780/98A AU7978098A (en) 1997-07-07 1998-06-18 Modular speech recognition system and method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US88907497A 1997-07-07 1997-07-07
US08/889,074 1997-07-07

Publications (2)

Publication Number Publication Date
WO1999003092A2 (en) 1999-01-21
WO1999003092A3 (en) 1999-04-01

Family

ID=25394469

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1998/012723 WO1999003092A2 (en) 1997-07-07 1998-06-18 Modular speech recognition system and method

Country Status (2)

Country Link
AU (1) AU7978098A (en)
WO (1) WO1999003092A2 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4922538A (en) * 1987-02-10 1990-05-01 British Telecommunications Public Limited Company Multi-user speech recognition system
US5054082A (en) * 1988-06-30 1991-10-01 Motorola, Inc. Method and apparatus for programming devices to recognize voice commands
US5371901A (en) * 1991-07-08 1994-12-06 Motorola, Inc. Remote voice control system
US5375063A (en) * 1991-09-20 1994-12-20 Clemson University Apparatus and method for voice controlled apparel machine
US5325421A (en) * 1992-08-24 1994-06-28 At&T Bell Laboratories Voice directed communications system platform
US5479488A (en) * 1993-03-15 1995-12-26 Bell Canada Method and apparatus for automation of directory assistance using speech recognition

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HOLMES J.N., "Speech Synthesis and Recognition", 1988, pp. 102-103, XP002915526 *
RABINER L.R., "Applications of Speech Recognition in the Area of Telecommunications", IEEE, 1997, pages 501-510, XP002915528 *
RABINER L.R., "Applications of Voice Processing to Telecommunication", PROCEEDINGS OF THE IEEE, Vol. 82, No. 2, February 1994, XP002915527 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001015140A1 (en) * 1999-07-01 2001-03-01 Telum Canada, Inc. Speech recognition system for data entry
CN112037792A (en) * 2020-08-20 2020-12-04 北京字节跳动网络技术有限公司 Voice recognition method and device, electronic equipment and storage medium
CN112037792B (en) * 2020-08-20 2022-06-17 北京字节跳动网络技术有限公司 Voice recognition method and device, electronic equipment and storage medium
CN112530398A (en) * 2020-11-14 2021-03-19 国网河南省电力公司检修公司 Portable human-computer interaction operation and maintenance device based on voice conversion function
CN112951237A (en) * 2021-03-18 2021-06-11 深圳奇实科技有限公司 Automatic voice recognition method and system based on artificial intelligence

Also Published As

Publication number Publication date
WO1999003092A3 (en) 1999-04-01
AU7978098A (en) 1999-02-08


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AL AM AT AU AZ BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IL IS JP KE KG KR KZ LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG UZ VN

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

AK Designated states

Kind code of ref document: A3

Designated state(s): AL AM AT AU AZ BB BG BR BY CA CH CN CZ DE DK EE ES FI GB GE HU IL IS JP KE KG KR KZ LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK TJ TM TR TT UA UG UZ VN

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase in:

Ref country code: KR

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: CA