US6871179B1 - Method and apparatus for executing voice commands having dictation as a parameter - Google Patents

Method and apparatus for executing voice commands having dictation as a parameter Download PDF

Info

Publication number
US6871179B1
US6871179B1
Authority
US
United States
Prior art keywords
command
dictation
component
voice command
pattern
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related, expires
Application number
US09/348,425
Inventor
Thomas A. Kist
Burn L. Lewis
Bruce D. Lucas
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nuance Communications Inc
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Family has litigation
First worldwide family litigation filed: https://patents.darts-ip.com/?family=34272247&utm_source=google_patent&utm_medium=platform_link&utm_campaign=public_patent_search&patent=US6871179(B1). "Global patent litigation dataset" by Darts-ip is licensed under a Creative Commons Attribution 4.0 International License.
US case filed in Delaware District Court: https://portal.unifiedpatents.com/litigation/Delaware%20District%20Court/case/1%3A09-cv-00585 (Source: District Court; Jurisdiction: Delaware District Court). "Unified Patents Litigation Data" by Unified Patents is licensed under a Creative Commons Attribution 4.0 International License.
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Priority to US09/348,425 priority Critical patent/US6871179B1/en
Assigned to INTERNATIONAL BUSINESS MACHINES CORPORATION reassignment INTERNATIONAL BUSINESS MACHINES CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: LUCAS, BRUCE D., LEWIS, BURN L., KIST, THOMAS A.
Application granted granted Critical
Publication of US6871179B1 publication Critical patent/US6871179B1/en
Assigned to NUANCE COMMUNICATIONS, INC. reassignment NUANCE COMMUNICATIONS, INC. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: INTERNATIONAL BUSINESS MACHINES CORPORATION
Adjusted expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics
    • G10L 2015/228 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics of application context

Abstract

In a computer speech recognition system, the present invention provides a method and system for recognizing and executing a voice command that has a dictation portion. Upon receiving a user input, the spoken utterance is processed to identify a pattern of words which matches a pre-determined command pattern. Then, a computer system command is identified that corresponds to the pre-determined command pattern and has at least one parameter. The parameter is extracted from a dictation portion of the spoken utterance which is separate from the pattern of words matching the command pattern. The computer system command is then processed to perform an event in accordance with the parameter. If the spoken utterance does not contain a pattern of words matching a pre-determined command pattern, then the spoken utterance is recognized as dictation and inserted at a specified location into an electronic document or other system or application software.

Description

CROSS REFERENCE TO RELATED APPLICATIONS
(Not Applicable).
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
(Not Applicable).
BACKGROUND OF THE INVENTION
1. Technical Field
This invention relates to the field of computer speech recognition and more particularly to a method and system for executing voice commands having ordinary dictation as a parameter.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a microphone is converted into a set of words by a computer. These recognized words may then be used in a variety of computer software applications. For example, speech recognition may be used to input data, prepare documents and control the operation of system and application software.
Speech recognition systems can recognize and insert dictated text in a variety of software applications. For example, one can use a speech system to dictate a letter into a word processing document. Simply stated, a speech recognition engine receives the user's dictated words in the form of speech signals, which it processes using known algorithms. The processed signals are then “recognized” by identifying a corresponding text phrase in a vocabulary database. The text is then conveyed to an active software application, where it is displayed. This type of spoken utterance is considered to be ordinary dictation because it is merely transcribed and does not execute a control command.
As mentioned, the speech recognition system may also be used to control the operation of voice-enabled system and application software. Typically, the software is controlled by a user issuing voice commands for performing system or application events. There are two broad categories of speech recognition systems for executing voice commands: natural language understanding (NLU) systems and finite state grammar systems. NLU systems permit total linguistic flexibility in command expression by recognizing, as commands, spoken phrases in terms naturally intuitive to the speaker. For example, an NLU system is likely to recognize the spoken utterance, “would you be a dear and open the Pensky file for me?”, as instructing the system to execute a “file open” command for the file named “Pensky”. However, NLU systems are extremely complex and, at this point, operate only on very sophisticated computers.
Consequently, the vast majority of commercial speech recognition systems are finite grammar systems. In a simple finite grammar system, the user in the above example would utter a much more structured phrase, such as, “open file Pensky”. Upon receiving the speech signals corresponding to the spoken phrase, the speech recognition engine processes the signals to determine whether they correspond to a command coded within one or more command sets or grammars. If so, the command is processed and executed by the software so as to perform the corresponding event, in this case, opening the “Pensky” file.
The simplest command grammars correlate each command or function that the system can perform to one speech command. More advanced finite state grammar systems allow for increased linguistic flexibility by including alternative commands for performing each function, so that a speaker can utter any one of a number of expressions to perform the event. Typically, these systems convert spoken phrases into one of a finite set of functional expressions using translation rules or by parsing annotation in the grammar. These systems, despite having a finite grammar system, enable a user to speak more naturally when issuing voice commands.
As stated, existing speech recognition systems are capable of receiving speech signals from a user and either recognizing the signals as ordinary dictation or as a voice command for performing an event. However, typical speech systems are unable to recognize voice commands that include ordinary dictation so as to execute a command having dictation as a parameter.
One example of such a voice command is, “send a note to Bill regarding today's meeting”, which is intended to call up an E-mail application that will send a message to a colleague named “Bill” with “today's meeting” displayed in the message subject text field. Typical speech systems are likely to interpret this statement as ordinary dictation, transcribing the entire spoken phrase as text in a document, despite the fact that it includes elements of both a command and ordinary dictation. Alternatively, the statement may be recognized only as a command to execute the E-mail application, without inserting the dictation “today's meeting” in the subject line.
A basic reason existing speech systems have difficulty with these types of mixed voice commands is that the command grammars contain only a finite number of command patterns. It is impractical, if not impossible, to code into a command grammar the tens of thousands of words or word combinations in a given language. Thus, typical systems limit the grammar sets to contain phrases indicating functions relevant to performing computer software events. These functional phrases comprise a much smaller sub-set of an entire language, yet are extremely useful in carrying out software application events. Because the vast majority of phrases used in ordinary dictation are left out of the command grammars, typical finite speech systems are unable to incorporate the dictation portion in commands.
Accordingly, there is a need to provide a finite grammar speech recognition system able to execute voice commands having ordinary dictation as a parameter.
SUMMARY OF THE INVENTION
The present invention provides a method and system to execute voice commands, having ordinary dictation as a parameter, for performing system and application software events.
Specifically, in a system adapted for speech recognition, the present invention provides a method for executing a voice command in the form of a spoken utterance having a dictation portion. The method begins by receiving a user input corresponding to the spoken utterance. This input is processed to identify a pattern of words forming the spoken utterance which matches a predetermined command pattern. A computer system command is identified that corresponds to the pre-determined command pattern and has at least one parameter. The one or more parameters are extracted from words contained in a dictation portion of the voice command which are distinct from the pattern of words matching the command pattern. The computer system command is then processed to perform an event in accordance with the one or more command parameters.
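As a concrete illustration of this sequence, the following Python sketch matches an utterance against a command pattern and, on success, pulls the dictation slot out as the command parameter. The pattern, command names, and regex encoding are illustrative assumptions, not the patent's implementation.

    import re

    # Hypothetical command pattern: a fixed command portion with a
    # free-form dictation slot; a real grammar would contain many such
    # patterns.
    COMMAND_PATTERNS = {
        r"^send a note to (?P<name>\w+) regarding (?P<dictation>.+)$": "send_note",
    }

    def process_utterance(utterance):
        # Match the utterance against the command patterns; on a match,
        # return the command with its dictation parameter, otherwise
        # treat the whole utterance as ordinary dictation.
        for pattern, command in COMMAND_PATTERNS.items():
            match = re.match(pattern, utterance)
            if match:
                return command, match.group("dictation")
        return "insert_text", utterance  # no match: plain dictation

    print(process_utterance("send a note to Bill regarding today's meeting"))
    # -> ('send_note', "today's meeting")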
Another aspect of the invention is that the words forming the dictation portion of the voice command may be embedded within the pattern of words matching the command pattern. The dictation portion of the voice command can be comprised of any set of words in a voice recognition engine vocabulary. Consequently, the event performed by the system can include inserting the dictation portion of the spoken utterance at a location in a word processing document or any other location specified by the computer system command.
Still another aspect of the invention is that the system may identify a pattern of words in the spoken utterance to match any one of a plurality of the pre-determined command patterns. Each of the plurality of command patterns can belong to at least one pre-determined command pattern set. According to a preferred embodiment, a command pattern in any of the sets can only be matched when the set is in an active state. The command pattern sets are placed in an active state according to the operating state of the computer system. If no pattern of words forming the spoken utterance matches the predetermined command pattern, the system provides a software application with recognized text.
Another preferred embodiment of the present invention includes a system for executing a voice command in the form of a spoken utterance having a dictation portion. Specifically, the system includes programming for receiving a user input corresponding to the spoken utterance. The system also includes programming for identifying a pattern of words forming the spoken utterance which matches a pre-determined command pattern as well as a computer system command that corresponds to the pre-determined command pattern and has at least one parameter. Also, the system includes programming for extracting the one or more parameters from words contained in a dictation portion of the voice command, which are distinct from the pattern of words matching the command pattern. The computer system command is then processed to perform an event in accordance with the one or more command parameters.
Another aspect of this system is that the words forming the dictation portion of the voice command may be embedded within the pattern of words matching the command pattern. The dictation portion of the voice command can be comprised of any set of words in a voice recognition engine vocabulary. Consequently, the system can include programming for inserting the dictation portion of the spoken utterance at a location in a word processing document or any other location specified by the computer system command.
Yet another aspect of this system is that it includes programming to identify a pattern of words in the spoken utterance to match any one of a plurality of the pre-determined command patterns. Each of the plurality of command patterns can belong to at least one pre-determined command pattern set. According to another preferred embodiment, a command pattern in any of the sets can only be matched when the set is in an active state. The command pattern sets are placed in an active state according to the operating state of the computer system. If no pattern of words forming the spoken utterance matches the pre-determined command pattern, programming is included to provide a software application with recognized text.
Thus, the present invention provides the object and advantage of recognizing spoken utterances that include a combination of voice commands and ordinary dictation. Once recognized, events can be performed according to the dictation portion of the spoken utterances.
These and other objects, advantages and aspects of the invention will become apparent from the following description. In the description, reference is made to the accompanying drawings which form a part hereof, and in which there is shown a preferred embodiment of the invention. Such embodiment does not necessarily represent the full scope of the invention and reference is made therefore, to the claims herein for interpreting the scope of the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 shows a computer system for speech recognition with which the method and system of the present invention may be used;
FIG. 2 is a block diagram showing a typical architecture for the computer system of FIG. 1 having a speech recognition engine;
FIG. 3 is a block diagram showing the architecture for a speech recognition engine using multiple constraints in the recognition process; and
FIG. 4 is a flow chart showing the process for executing voice commands incorporating ordinary dictation according to the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
Referring to the drawings in detail, wherein like reference characters represent corresponding elements throughout the several views, more specifically referring to FIG. 1, a computer system with which the present invention may be practiced is referred to generally by reference number 10. Referring to FIGS. 1 & 2, the computer system 10 is preferably comprised of a computer 12 having a central processor 14, at least one memory device 16 and related electronic circuitry (not shown). The computer system 10 also includes user input devices, a keyboard 18 and a pointing device 20, a microphone 22, audio loud speakers 24, and a video display 26, all of which are operatively connected to the computer 12 via suitable interface circuitry. The keyboard 18, pointing device 20 and loud speakers 24 may be a part of the computer system 10, but are not required for the operation of the invention.
Generally, the computer system 10, as described above, can be satisfied by any one of many high-speed multi-media personal computers commercially available from manufacturers such as International Business Machines Corporation, Compaq, Hewlett-Packard, or Apple Computers. The memory device 16 preferably includes an electronic random access memory module and a bulk storage device, such as a magnetic disk drive. The central processor 14 may include any suitable processing chip, such as any of the Pentium family microprocessing chips commercially available from Intel Corporation.
Referring to FIG. 2, which illustrates a typical architecture for a computer system 10 adapted for speech recognition, the system includes an operating system 28 and a speech recognition system 30. The speech recognition system 30 comprises a speech recognition engine application 32 and a voice navigation application 34. A speech text processor application 36 may also be included. However, the invention is not limited in this regard and the speech recognition engine application 32 can be used with any other application program which is to be voice enabled. Also, the speech recognition engine 32, voice navigator 34 and text processor 36 are shown in FIG. 2 as separate application programs. It should be noted, however, that these applications could be implemented as a single, more complex application.
In a preferred embodiment, the operating system 28 is one of the Windows family of operating systems, such as Windows NT, Windows '95 or Windows '98, which are available from Microsoft Corporation of Redmond, Wash. The present invention is not limited in this regard, however, as it may also be used with any other type of computer operating system.
Referring still to FIG. 2, in general, an analog audio signal containing speech commands is received by the microphone 22 and processed within the computer 12 by conventional audio circuitry, having an analog-to-digital converter, which produces a digitized form of the signal. The operating system 28 transfers the digital command signal to the speech recognition system 30, where the command is recognized by the speech recognition engine 32.
FIG. 3 illustrates an architecture for a finite grammar speech recognition system using multiple constraints during the recognition process. Generally, the speech recognition engine 32 receives the digitized speech signal from the operating system 28. The signal is subsequently transformed in representation block 38 into a useful set of data by sampling the signal at some fixed rate, typically every 10-20 msec. The representation block produces a new representation of the audio signal which can then be used in subsequent stages of the voice recognition process to determine the probability that the waveform portion just analyzed corresponds to a particular phonetic event. This process is intended to emphasize perceptually important speaker independent features of the speech signals received from the operating system. In classification block 40, the processed speech signal is used to identify a subset of probable phrases corresponding to the speech signal. This subset of probable phrases is searched at block 42 to obtain the recognized phrase.
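As a rough sketch of the fixed-rate front end described above, the Python below slices a digitized signal into overlapping analysis frames; the 16 kHz sample rate, 10 ms step, and 20 ms window are assumed values chosen only to match the 10-20 msec range mentioned in the text.

    import numpy as np

    # Minimal framing sketch: each frame would later be scored against
    # phonetic models in the classification stage.
    def frame_signal(signal, sample_rate=16000, frame_ms=10, window_ms=20):
        step = sample_rate * frame_ms // 1000      # advance every 10 ms
        width = sample_rate * window_ms // 1000    # 20 ms analysis window
        count = max(0, (len(signal) - width) // step + 1)
        return np.stack([signal[i * step : i * step + width]
                         for i in range(count)])

    frames = frame_signal(np.zeros(16000))  # one second of silence
    print(frames.shape)                     # (99, 320)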
Referring still to FIG. 3, classification block 40 is preferably performed by acoustic modeling block 44, context modeling block 46 and lexical/grammatical modeling block 48. At acoustic modeling block 44, known algorithms process the speech command signal to adapt speaker-independent acoustic models, contained in memory 16, to the acoustic signal of the current speaker and identify one or more probable matching phrases.
At block 46, additional algorithms may be used to process the speech signal according to the current state of the computer system as well as context events, including prior commands, system control activities, timed activities, and application activation, occurring prior to or contemporaneously with the spoken command. Specifically, these data structures include activities such as: user inputs by voice, mouse, stylus or keyboard; operation of drop-down menus or buttons; the activation of applications or applets within an application; prior commands; and idle events, i.e., when no activity is logged in an event queue for a prescribed time period. The system states and events can be statistically analyzed, using statistical modeling techniques, to identify one or more probable commands matching the context in which the command was given.
At block 48, algorithms conform the digitized speech signal to lexical and grammatical models. These models are used to help restrict the number of possible words corresponding to a speech signal according to a word's use in relation to neighboring words. The lexical model may be simply a vocabulary of words understood by the system. The grammatical model, sometimes referred to as a language model, may be specified simply as a finite state network, where the permissible words following each word are explicitly listed, but is preferably a more sophisticated finite grammar having a plurality of grammar sets containing multiple command patterns, as described below.
In its preferred embodiment, the present invention includes all three of the above-described modeling techniques. However, the invention is not limited in this regard and can be performed using alternative modeling techniques. For example, it may be practiced without the event-based context modeling step. Also, each modeling technique may be performed independently from, or interdependently with, the other models.
Referring now to FIG. 4, at step 50, the system receives a user input in the form of a spoken utterance. Spoken utterances can be issued as ordinary dictation, voice commands or voice commands incorporating dictation. Ordinary dictation is a spoken utterance which does not contain a pattern of words recognizable by the system for controlling the operation of system or application software. Instead, dictation is spoken merely to have the system convert the spoken words into text within an electronic document. Typically, a user issues ordinary dictation when preparing a letter or inputting data within an application text field. On the other hand, a voice command is a spoken utterance which causes the system to perform a pre-determined function within system or application software other than simply transcribing text, such as opening a file, deleting text in a document or repositioning an active “window”. A voice command incorporating dictation is a combination of these two utterances, having words comprising a dictation portion embedded within a pattern of words comprising a command.
Although the system of the present invention can be used to recognize all three types of spoken utterances, the invention is intended to address the unique difficulties in recognizing voice commands incorporating dictation. Accordingly, the following discussion will focus on the voice commands mixed with dictation.
Typical speech systems are likely to interpret these mixed spoken utterances as ordinary dictation, transcribing the entire spoken phrase as text in a document, or the dictation may be ignored. As mentioned above, the primary reason existing speech systems have difficulty with these types of mixed voice commands is that the command grammars are, by necessity, coded with a limited number of command patterns. Because the vast majority of phrases used in ordinary dictation are left out of the command grammars, typical finite speech systems are unable to incorporate the dictation portion within the command.
Referring still to FIG. 4, the process advances to step 52, wherein the system recognizes whether the spoken utterance contains a recognizable pattern of words by comparing the recognized words against a plurality of predetermined command patterns of words contained in one or more active command pattern sets or grammars. Individual command pattern sets are placed in an active state depending upon the state in which the computer system is operating when the voice command is issued.
An exemplary mixed voice command is, “schedule a meeting on Thursday regarding next quarter's sales plan”. This utterance is issued by a user to initiate a voice-enabled scheduling application and insert the text “next quarter's sales plan” in a meeting text field at a calendar location for Thursday. The utterance is comprised of the command portion “schedule a meeting on Thursday regarding” and the dictation portion “next quarter's sales plan”.
The dictation may be comprised of any set of words in a voice recognition vocabulary, which could consist of tens of thousands of words. The pattern of words forming the command, on the other hand, must conform to the limited command patterns coded into one or more of the active command grammars. Thus, the system searches the active command grammars for a command pattern corresponding to the recognized speech signals.
For the above example, a corresponding command grammar can be coded to include separate scheduling commands for each day of the week. Preferably, however, a primary command pattern is coded into the grammar having a “day” variable marker indicating that the day of the week may be any one of the days coded as a sub-command pattern in the same or a different grammar. In this way, the system can schedule a meeting for any day of the week with only one primary pattern coded into the grammar. This technique is not limited to the days of the week, and can be employed for any other category of terms such as months, colors, names, telephone numbers, etc.
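One way to realize such a variable marker is to expand the day sub-pattern into the primary pattern before matching, as in this sketch; the regex encoding is an assumption, since the patent does not specify how grammars are stored.

    import re

    # The <day> marker expands to a sub-pattern listing permissible days.
    DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]
    PRIMARY = re.compile(
        r"^schedule a meeting on (?P<day>" + "|".join(DAYS) +
        r") regarding (?P<text>.+)$")

    m = PRIMARY.match(
        "schedule a meeting on Thursday regarding next quarter's sales plan")
    print(m.group("day"), "|", m.group("text"))
    # Thursday | next quarter's sales plan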
At step 54, if the system is unable to match the words of the spoken utterance with a corresponding pre-determined command pattern, the process advances to step 56, at which point the entire phrase is deemed to be dictation. The system then identifies one or more text phrases in a vocabulary set corresponding to the entire spoken utterance and inserts the text in an active software application.
Otherwise, at step 58, if a matching command pattern is identified in the preceding step, the system identifies a corresponding computer system command expression. A computer system command is a functional expression used by the system or application software for performing an event. The speech recognition engine coordinates the grammar with a suitable scripting program to cast the computer system command in a form recognizable to the desired speech-enabled system or application software for performing the desired event. The command expression includes one or more parameters corresponding to the voice command. At step 60, at least one of these parameters is extracted from the dictation portion of the voice command. The entire dictation portion may constitute a command expression parameter, or it may be broken down into sub-portions used as separate parameters.
It will be appreciated by those skilled in the art that the precise mechanism for identifying and scripting the command expression can vary from system to system. One approach involves the use of translation or rewrite rules. These rules are coded within the command grammars to generate the command expression. For instance, a translation rule for the above example is “schedule a meeting on <day> regarding <text>” → “schedulemeeting(<day>, <text>)”. This rule includes two variable parameters, <day> and <text>. Appropriate day parameters are provided as a sub-pattern coded within the active command grammar. The day of the week spoken by the user is matched against this set of sub-patterns and used to identify the intended day of the week in the calendar. The text parameter is extracted from the entire dictation portion “next quarter's sales plan”. Thus, applying the rule to the spoken utterance of the example, the scripting program generates the computer system command: “schedulemeeting(Thursday, next quarter's sales plan)”.
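Applying the translation rule then amounts to substituting the matched slots into the functional expression, roughly as below; schedulemeeting here is only a stand-in for the speech-enabled application's actual entry point.

    # Sketch of the scripting step: substitute the matched <day> and
    # <text> slots into the command expression the application expects.
    def script_command(day, text):
        return "schedulemeeting({}, {})".format(day, text)

    print(script_command("Thursday", "next quarter's sales plan"))
    # schedulemeeting(Thursday, next quarter's sales plan)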
This is one example of a translation rule and it will be recognized by those skilled in the art that many different translation rules are possible for use with different commands and formats, all such rules being within the scope of the invention. For systems that do not support translation rules, different methods of producing the command expression would be required. For example, a parsing program can be used for this purpose to parse annotation in the grammar set. It will be appreciated by those skilled in the art that any suitable procedure can be used to accomplish the foregoing result provided that it is capable of taking a given phrase and determining its functional command expression or result.
Referring still to FIG. 4, at step 62, the command expression is sent to the active application to perform the event. In particular, for the above example, the application opens a scheduling program and inserts “next quarter's sales plan” in a suitable meeting text field for Thursday. The process then returns to step 50 to receive additional user input.
The above example illustrates that the present invention can be used to insert or “paste” the dictation portion of the spoken voice command into a system or application program. However, the present invention is not limited in this regard as the dictation portion may be incorporated into a computer system command to perform any number of functions or events. For example, the user may issue the command “load all files regarding first quarter results”. In this example, the system will recognize “load all files regarding” as a pattern of words matching a pre-determined grammar command. A translation rule such as “load all files regarding <text>” → “loadfiles(<text>)”, having a single parameter comprising the dictation portion “first quarter results”, is applied to create the computer system command “loadfiles(first quarter results)”. This command is then used, for example, by a word processing application to search the file names of all stored documents for the text “first quarter results”, or the closest match, and then to open up the corresponding files.
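A handler for the generated loadfiles(...) expression might then search stored document names for the dictated phrase; the directory scan below is an assumed, simplified stand-in for the word processing application's document search.

    import os

    # Hypothetical loadfiles(<text>) handler: return the names of stored
    # documents whose file names contain the dictated phrase.
    def loadfiles(text, folder="."):
        needle = text.lower()
        return [name for name in os.listdir(folder)
                if needle in name.lower()]

    print(loadfiles("first quarter results"))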
While the foregoing specification illustrates and describes the preferred embodiments of the invention, it is to be understood that the invention is not limited to the precise construction herein disclosed. The invention can be embodied in other specific forms without departing from the spirit or essential attributes of the invention. Accordingly, reference should be made to the following claims, rather than to the foregoing specification, as indicating the scope of the invention.

Claims (10)

1. In a speech recognition system, a method of processing a voice command comprising:
identifying a voice command having a voice command component and a dictation component within a contiguous utterance, wherein said voice command component is specified by a command grammar and said dictation component is free-form text which is not specified by said command grammar, and wherein said dictation component is embedded within said voice command; and
executing said identified voice command component using at least a part of said dictation component as an execution parameter of said voice command.
2. The method of claim 1, wherein said executing step comprises:
loading a translation rule and linking said voice command component to an application command using said translation rule; and
providing said application command to an associated computing application.
3. The method of claim 2, wherein said providing step comprises providing said at least a part of said dictation component as a parameter of said application command to said associated computing application.
4. The method of claim 3, wherein said providing step further comprises inserting said at least a part of said dictation component in a text field of said associated computing application.
5. The method of claim 1, wherein said executing step comprises providing said voice command component to an associated computing application for processing, and further providing said at least a part of said dictation component as a parameter of said voice command to said computing application.
6. A machine-readable storage, having stored thereon a computer program having a plurality of code sections executable by a machine for causing the machine to perform the steps of:
identifying a voice command having a voice command component and a dictation component within a contiguous utterance, wherein said voice command component is specified by a command grammar and said dictation component is free-form text which is not specified by said command grammar, and wherein said dictation component is embedded within said voice command; and
executing said identified voice command component using at least a part of said dictation component as an execution parameter of said voice command.
7. The machine-readable storage of claim 6, further comprising:
loading a translation rule and linking said voice command component to an application command using said translation rule; and
providing said application command to an associated computing application.
8. The machine-readable storage of claim 7, wherein said providing step comprises providing said at least a part of said dictation component as a parameter of said application command to said associated computing application.
9. The machine-readable storage of claim 8, wherein said providing step further comprises inserting said at least a part of said dictation component in a text field of said associated computing application.
10. The machine-readable storage of claim 6, wherein said executing step comprises providing said voice command component to an associated computing application for processing, and further providing said at least a part of said dictation component as a parameter of said voice command to said computing application.
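For illustration only (the claim language above governs), the dispatch recited in claims 2 through 5 — linking the voice command component to an application command and providing the dictation component as its parameter — might be sketched as follows. The handler registry, its names, and the parenthesized command syntax are assumptions for this sketch, not part of the claimed subject matter.

```python
# Hypothetical dispatcher sketching claims 2-5: the translated application
# command is provided to an associated computing application, with the
# dictation component passed through as the command's parameter. All names
# here are illustrative assumptions.

def loadfiles(dictation):
    # Stand-in for the associated computing application's command handler.
    print(f"searching stored file names for: {dictation!r}")

APPLICATION_COMMANDS = {"loadfiles": loadfiles}

def provide_to_application(system_command):
    """Split a command such as 'loadfiles(first quarter results)' into its
    name and dictation parameter, then invoke the linked application command."""
    name, _, remainder = system_command.partition("(")
    parameter = remainder.rstrip(")")
    handler = APPLICATION_COMMANDS.get(name)
    if handler is None:
        raise KeyError(f"no application command linked to {name!r}")
    handler(parameter)

provide_to_application("loadfiles(first quarter results)")
```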
US09/348,425 1999-07-07 1999-07-07 Method and apparatus for executing voice commands having dictation as a parameter Expired - Fee Related US6871179B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US09/348,425 US6871179B1 (en) 1999-07-07 1999-07-07 Method and apparatus for executing voice commands having dictation as a parameter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US09/348,425 US6871179B1 (en) 1999-07-07 1999-07-07 Method and apparatus for executing voice commands having dictation as a parameter

Publications (1)

Publication Number Publication Date
US6871179B1 (en) 2005-03-22

Family

ID=34272247

Family Applications (1)

Application Number Title Priority Date Filing Date
US09/348,425 Expired - Fee Related US6871179B1 (en) 1999-07-07 1999-07-07 Method and apparatus for executing voice commands having dictation as a parameter

Country Status (1)

Country Link
US (1) US6871179B1 (en)

Cited By (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040133874A1 (en) * 2001-03-30 2004-07-08 Siemens Ag Computer and control method therefor
US20050119896A1 (en) * 1999-11-12 2005-06-02 Bennett Ian M. Adjustable resource based speech recognition system
US7050977B1 (en) 1999-11-12 2006-05-23 Phoenix Solutions, Inc. Speech-enabled server for internet website and method
US20060198608A1 (en) * 2005-03-04 2006-09-07 Girardi Frank D Method and apparatus for coaching athletic teams
US20060235696A1 (en) * 1999-11-12 2006-10-19 Bennett Ian M Network based interactive speech recognition system
US20060279548A1 (en) * 2005-06-08 2006-12-14 Geaghan Bernard O Touch location determination involving multiple touch location processes
US20070185702A1 (en) * 2006-02-09 2007-08-09 John Harney Language independent parsing in natural language systems
US20080167871A1 (en) * 2007-01-04 2008-07-10 Samsung Electronics Co., Ltd. Method and apparatus for speech recognition using device usage pattern of user
US20080215327A1 (en) * 1999-11-12 2008-09-04 Bennett Ian M Method For Processing Speech Data For A Distributed Recognition System
US20080300886A1 (en) * 2007-05-17 2008-12-04 Kimberly Patch Systems and methods of a structured grammar for a speech recognition command system
US20080312929A1 (en) * 2007-06-12 2008-12-18 International Business Machines Corporation Using finite state grammars to vary output generated by a text-to-speech system
US20090132256A1 (en) * 2007-11-16 2009-05-21 Embarq Holdings Company, Llc Command and control of devices and applications by voice using a communication base system
US20090234655A1 (en) * 2008-03-13 2009-09-17 Jason Kwon Mobile electronic device with active speech recognition
US20100040207A1 (en) * 2005-01-14 2010-02-18 At&T Intellectual Property I, L.P. System and Method for Independently Recognizing and Selecting Actions and Objects in a Speech Recognition System
US20100169098A1 (en) * 2007-05-17 2010-07-01 Kimberly Patch System and method of a list commands utility for a speech recognition command system
US7802183B1 (en) * 2001-05-17 2010-09-21 Essin Daniel J Electronic record management system
US8165886B1 (en) * 2007-10-04 2012-04-24 Great Northern Research LLC Speech interface system and method for control and interaction with applications on a computing system
US20130166290A1 (en) * 2011-12-26 2013-06-27 Denso Corporation Voice recognition apparatus
US8694683B2 (en) 1999-12-29 2014-04-08 Implicit Networks, Inc. Method and system for data demultiplexing
US8751232B2 (en) 2004-08-12 2014-06-10 At&T Intellectual Property I, L.P. System and method for targeted tuning of a speech recognition system
US8824659B2 (en) 2005-01-10 2014-09-02 At&T Intellectual Property I, L.P. System and method for speech-enabled call routing
US20140343950A1 (en) * 2013-05-15 2014-11-20 Maluuba Inc. Interactive user interface for an intelligent assistant
WO2015025330A1 (en) * 2013-08-21 2015-02-26 Kale Aaditya Kishore A system to enable user to interact with an electronic processing device using voice of the user
US9009042B1 (en) * 2013-04-29 2015-04-14 Google Inc. Machine translation of indirect speech
US9112972B2 (en) 2004-12-06 2015-08-18 Interactions Llc System and method for processing speech
US20160078773A1 (en) * 2014-09-17 2016-03-17 Voicebox Technologies Corporation System and method of providing task-based solicitation of request related user inputs
US9406078B2 (en) 2007-02-06 2016-08-02 Voicebox Technologies Corporation System and method for delivering targeted advertisements and/or providing natural language processing based on advertisements
US9570070B2 (en) 2009-02-20 2017-02-14 Voicebox Technologies Corporation System and method for processing multi-modal device interactions in a natural language voice services environment
US9620113B2 (en) 2007-12-11 2017-04-11 Voicebox Technologies Corporation System and method for providing a natural language voice user interface
US9626703B2 (en) 2014-09-16 2017-04-18 Voicebox Technologies Corporation Voice commerce
US9711143B2 (en) 2008-05-27 2017-07-18 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US9747896B2 (en) 2014-10-15 2017-08-29 Voicebox Technologies Corporation System and method for providing follow-up responses to prior natural language inputs of a user
US9898459B2 (en) 2014-09-16 2018-02-20 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
CN108010523A (en) * 2016-11-02 2018-05-08 松下电器(美国)知识产权公司 Information processing method and recording medium
WO2018097969A1 (en) * 2016-11-22 2018-05-31 Knowles Electronics, Llc Methods and systems for locating the end of the keyword in voice sensing
US20190027150A1 (en) * 2016-03-29 2019-01-24 Alibaba Group Holding Limited Audio message processing method and apparatus
US10297249B2 (en) 2006-10-16 2019-05-21 Vb Assets, Llc System and method for a cooperative conversational voice user interface
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
CN110136702A (en) * 2018-02-09 2019-08-16 宏碁股份有限公司 Speech recognition system and its method
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US11599332B1 (en) 2007-10-04 2023-03-07 Great Northern Research, LLC Multiple shell multi faceted graphical user interface

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5659665A (en) * 1994-12-08 1997-08-19 Lucent Technologies Inc. Method and apparatus for including speech recognition capabilities in a computer system
US5664061A (en) * 1993-04-21 1997-09-02 International Business Machines Corporation Interactive computer system recognizing spoken commands
US5799279A (en) * 1995-11-13 1998-08-25 Dragon Systems, Inc. Continuous speech recognition of text and commands
US6081782A (en) * 1993-12-29 2000-06-27 Lucent Technologies Inc. Voice command control and verification system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5664061A (en) * 1993-04-21 1997-09-02 International Business Machines Corporation Interactive computer system recognizing spoken commands
US6081782A (en) * 1993-12-29 2000-06-27 Lucent Technologies Inc. Voice command control and verification system
US5659665A (en) * 1994-12-08 1997-08-19 Lucent Technologies Inc. Method and apparatus for including speech recognition capabilities in a computer system
US5799279A (en) * 1995-11-13 1998-08-25 Dragon Systems, Inc. Continuous speech recognition of text and commands
US6088671A (en) * 1995-11-13 2000-07-11 Dragon Systems Continuous speech recognition of text and commands

Cited By (106)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7647225B2 (en) 1999-11-12 2010-01-12 Phoenix Solutions, Inc. Adjustable resource based speech recognition system
US20080300878A1 (en) * 1999-11-12 2008-12-04 Bennett Ian M Method For Transporting Speech Data For A Distributed Recognition System
US7050977B1 (en) 1999-11-12 2006-05-23 Phoenix Solutions, Inc. Speech-enabled server for internet website and method
US8229734B2 (en) 1999-11-12 2012-07-24 Phoenix Solutions, Inc. Semantic decoding of user queries
US20060235696A1 (en) * 1999-11-12 2006-10-19 Bennett Ian M Network based interactive speech recognition system
US7657424B2 (en) 1999-11-12 2010-02-02 Phoenix Solutions, Inc. System and method for processing sentence based queries
US20070179789A1 (en) * 1999-11-12 2007-08-02 Bennett Ian M Speech Recognition System With Support For Variable Portable Devices
US8762152B2 (en) 1999-11-12 2014-06-24 Nuance Communications, Inc. Speech recognition system interactive agent
US8352277B2 (en) 1999-11-12 2013-01-08 Phoenix Solutions, Inc. Method of interacting through speech with a web-connected server
US20080052078A1 (en) * 1999-11-12 2008-02-28 Bennett Ian M Statistical Language Model Trained With Semantic Variants
US9076448B2 (en) 1999-11-12 2015-07-07 Nuance Communications, Inc. Distributed real time speech recognition system
US20080215327A1 (en) * 1999-11-12 2008-09-04 Bennett Ian M Method For Processing Speech Data For A Distributed Recognition System
US20080255845A1 (en) * 1999-11-12 2008-10-16 Bennett Ian M Speech Based Query System Using Semantic Decoding
US7725307B2 (en) 1999-11-12 2010-05-25 Phoenix Solutions, Inc. Query engine for processing voice based queries including semantic decoding
US7912702B2 (en) 1999-11-12 2011-03-22 Phoenix Solutions, Inc. Statistical language model trained with semantic variants
US7873519B2 (en) 1999-11-12 2011-01-18 Phoenix Solutions, Inc. Natural language speech lattice containing semantic variants
US7831426B2 (en) 1999-11-12 2010-11-09 Phoenix Solutions, Inc. Network based interactive speech recognition system
US9190063B2 (en) 1999-11-12 2015-11-17 Nuance Communications, Inc. Multi-language speech recognition system
US20070185717A1 (en) * 1999-11-12 2007-08-09 Bennett Ian M Method of interacting through speech with a web-connected server
US20050119896A1 (en) * 1999-11-12 2005-06-02 Bennett Ian M. Adjustable resource based speech recognition system
US7725321B2 (en) 1999-11-12 2010-05-25 Phoenix Solutions, Inc. Speech based query system using semantic decoding
US7672841B2 (en) 1999-11-12 2010-03-02 Phoenix Solutions, Inc. Method for processing speech data for a distributed recognition system
US7698131B2 (en) 1999-11-12 2010-04-13 Phoenix Solutions, Inc. Speech recognition system for client devices having differing computing capabilities
US7702508B2 (en) 1999-11-12 2010-04-20 Phoenix Solutions, Inc. System and method for natural language processing of query answers
US7725320B2 (en) 1999-11-12 2010-05-25 Phoenix Solutions, Inc. Internet based speech recognition system with dynamic grammars
US7729904B2 (en) 1999-11-12 2010-06-01 Phoenix Solutions, Inc. Partial speech processing device and method for use in distributed systems
US8694683B2 (en) 1999-12-29 2014-04-08 Implicit Networks, Inc. Method and system for data demultiplexing
US9270790B2 (en) 1999-12-29 2016-02-23 Implicit, Llc Method and system for data demultiplexing
US9591104B2 (en) 1999-12-29 2017-03-07 Implicit, Llc Method and system for data demultiplexing
US10027780B2 (en) 1999-12-29 2018-07-17 Implicit, Llc Method and system for data demultiplexing
US10033839B2 (en) 1999-12-29 2018-07-24 Implicit, Llc Method and system for data demultiplexing
US10225378B2 (en) 1999-12-29 2019-03-05 Implicit, Llc Method and system for data demultiplexing
US20040133874A1 (en) * 2001-03-30 2004-07-08 Siemens Ag Computer and control method therefor
US7802183B1 (en) * 2001-05-17 2010-09-21 Essin Daniel J Electronic record management system
US9368111B2 (en) 2004-08-12 2016-06-14 Interactions Llc System and method for targeted tuning of a speech recognition system
US8751232B2 (en) 2004-08-12 2014-06-10 At&T Intellectual Property I, L.P. System and method for targeted tuning of a speech recognition system
US9112972B2 (en) 2004-12-06 2015-08-18 Interactions Llc System and method for processing speech
US9350862B2 (en) 2004-12-06 2016-05-24 Interactions Llc System and method for processing speech
US9088652B2 (en) 2005-01-10 2015-07-21 At&T Intellectual Property I, L.P. System and method for speech-enabled call routing
US8824659B2 (en) 2005-01-10 2014-09-02 At&T Intellectual Property I, L.P. System and method for speech-enabled call routing
US20100040207A1 (en) * 2005-01-14 2010-02-18 At&T Intellectual Property I, L.P. System and Method for Independently Recognizing and Selecting Actions and Objects in a Speech Recognition System
US7966176B2 (en) * 2005-01-14 2011-06-21 At&T Intellectual Property I, L.P. System and method for independently recognizing and selecting actions and objects in a speech recognition system
US20060198608A1 (en) * 2005-03-04 2006-09-07 Girardi Frank D Method and apparatus for coaching athletic teams
US20060279548A1 (en) * 2005-06-08 2006-12-14 Geaghan Bernard O Touch location determination involving multiple touch location processes
US20070185702A1 (en) * 2006-02-09 2007-08-09 John Harney Language independent parsing in natural language systems
US8229733B2 (en) 2006-02-09 2012-07-24 John Harney Method and apparatus for linguistic independent parsing in a natural language systems
US10297249B2 (en) 2006-10-16 2019-05-21 Vb Assets, Llc System and method for a cooperative conversational voice user interface
US11222626B2 (en) 2006-10-16 2022-01-11 Vb Assets, Llc System and method for a cooperative conversational voice user interface
US10510341B1 (en) 2006-10-16 2019-12-17 Vb Assets, Llc System and method for a cooperative conversational voice user interface
US10515628B2 (en) 2006-10-16 2019-12-24 Vb Assets, Llc System and method for a cooperative conversational voice user interface
US10755699B2 (en) 2006-10-16 2020-08-25 Vb Assets, Llc System and method for a cooperative conversational voice user interface
US20080167871A1 (en) * 2007-01-04 2008-07-10 Samsung Electronics Co., Ltd. Method and apparatus for speech recognition using device usage pattern of user
US10529329B2 (en) 2007-01-04 2020-01-07 Samsung Electronics Co., Ltd. Method and apparatus for speech recognition using device usage pattern of user
US9824686B2 (en) * 2007-01-04 2017-11-21 Samsung Electronics Co., Ltd. Method and apparatus for speech recognition using device usage pattern of user
US11080758B2 (en) 2007-02-06 2021-08-03 Vb Assets, Llc System and method for delivering targeted advertisements and/or providing natural language processing based on advertisements
US9406078B2 (en) 2007-02-06 2016-08-02 Voicebox Technologies Corporation System and method for delivering targeted advertisements and/or providing natural language processing based on advertisements
US10134060B2 (en) 2007-02-06 2018-11-20 Vb Assets, Llc System and method for delivering targeted advertisements and/or providing natural language processing based on advertisements
US20100169098A1 (en) * 2007-05-17 2010-07-01 Kimberly Patch System and method of a list commands utility for a speech recognition command system
US8538757B2 (en) 2007-05-17 2013-09-17 Redstart Systems, Inc. System and method of a list commands utility for a speech recognition command system
US20080300886A1 (en) * 2007-05-17 2008-12-04 Kimberly Patch Systems and methods of a structured grammar for a speech recognition command system
US8150699B2 (en) 2007-05-17 2012-04-03 Redstart Systems, Inc. Systems and methods of a structured grammar for a speech recognition command system
US20080312929A1 (en) * 2007-06-12 2008-12-18 International Business Machines Corporation Using finite state grammars to vary output generated by a text-to-speech system
US11599332B1 (en) 2007-10-04 2023-03-07 Great Northern Research, LLC Multiple shell multi faceted graphical user interface
US9466293B1 (en) * 2007-10-04 2016-10-11 Samsung Electronics Co., Ltd. Speech interface system and method for control and interaction with applications on a computing system
US8165886B1 (en) * 2007-10-04 2012-04-24 Great Northern Research LLC Speech interface system and method for control and interaction with applications on a computing system
US8996375B1 (en) * 2007-10-04 2015-03-31 Great Northern Research, LLC Speech interface system and method for control and interaction with applications on a computing system
US9881606B2 (en) 2007-11-16 2018-01-30 Centurylink Intellectual Property Llc Command and control of devices and applications by voice using a communication base system
US10255918B2 (en) 2007-11-16 2019-04-09 Centurylink Intellectual Property Llc Command and control of devices and applications by voice using a communication base system
US20090132256A1 (en) * 2007-11-16 2009-05-21 Embarq Holdings Company, Llc Command and control of devices and applications by voice using a communication base system
US9881607B2 (en) 2007-11-16 2018-01-30 Centurylink Intellectual Property Llc Command and control of devices and applications by voice using a communication base system
US10482880B2 (en) 2007-11-16 2019-11-19 Centurylink Intellectual Property Llc Command and control of devices and applications by voice using a communication base system
US9514754B2 (en) 2007-11-16 2016-12-06 Centurylink Intellectual Property Llc Command and control of devices and applications by voice using a communication base system
US9026447B2 (en) * 2007-11-16 2015-05-05 Centurylink Intellectual Property Llc Command and control of devices and applications by voice using a communication base system
US10347248B2 (en) 2007-12-11 2019-07-09 Voicebox Technologies Corporation System and method for providing in-vehicle services via a natural language voice user interface
US9620113B2 (en) 2007-12-11 2017-04-11 Voicebox Technologies Corporation System and method for providing a natural language voice user interface
US20090234655A1 (en) * 2008-03-13 2009-09-17 Jason Kwon Mobile electronic device with active speech recognition
US10089984B2 (en) 2008-05-27 2018-10-02 Vb Assets, Llc System and method for an integrated, multi-modal, multi-device natural language voice services environment
US10553216B2 (en) 2008-05-27 2020-02-04 Oracle International Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US9711143B2 (en) 2008-05-27 2017-07-18 Voicebox Technologies Corporation System and method for an integrated, multi-modal, multi-device natural language voice services environment
US9953649B2 (en) 2009-02-20 2018-04-24 Voicebox Technologies Corporation System and method for processing multi-modal device interactions in a natural language voice services environment
US10553213B2 (en) 2009-02-20 2020-02-04 Oracle International Corporation System and method for processing multi-modal device interactions in a natural language voice services environment
US9570070B2 (en) 2009-02-20 2017-02-14 Voicebox Technologies Corporation System and method for processing multi-modal device interactions in a natural language voice services environment
US20130166290A1 (en) * 2011-12-26 2013-06-27 Denso Corporation Voice recognition apparatus
US9123327B2 (en) * 2011-12-26 2015-09-01 Denso Corporation Voice recognition apparatus for recognizing a command portion and a data portion of a voice input
US9009042B1 (en) * 2013-04-29 2015-04-14 Google Inc. Machine translation of indirect speech
US20140343950A1 (en) * 2013-05-15 2014-11-20 Maluuba Inc. Interactive user interface for an intelligent assistant
US9292254B2 (en) * 2013-05-15 2016-03-22 Maluuba Inc. Interactive user interface for an intelligent assistant
WO2015025330A1 (en) * 2013-08-21 2015-02-26 Kale Aaditya Kishore A system to enable user to interact with an electronic processing device using voice of the user
US10216725B2 (en) 2014-09-16 2019-02-26 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
US10430863B2 (en) 2014-09-16 2019-10-01 Vb Assets, Llc Voice commerce
US9626703B2 (en) 2014-09-16 2017-04-18 Voicebox Technologies Corporation Voice commerce
US11087385B2 (en) 2014-09-16 2021-08-10 Vb Assets, Llc Voice commerce
US9898459B2 (en) 2014-09-16 2018-02-20 Voicebox Technologies Corporation Integration of domain information into state transitions of a finite state transducer for natural language processing
US20160078773A1 (en) * 2014-09-17 2016-03-17 Voicebox Technologies Corporation System and method of providing task-based solicitation of request related user inputs
US9747896B2 (en) 2014-10-15 2017-08-29 Voicebox Technologies Corporation System and method for providing follow-up responses to prior natural language inputs of a user
US10229673B2 (en) 2014-10-15 2019-03-12 Voicebox Technologies Corporation System and method for providing follow-up responses to prior natural language inputs of a user
US10431214B2 (en) 2014-11-26 2019-10-01 Voicebox Technologies Corporation System and method of determining a domain and/or an action related to a natural language input
US10614799B2 (en) 2014-11-26 2020-04-07 Voicebox Technologies Corporation System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance
US11037568B2 (en) * 2016-03-29 2021-06-15 Alibaba Group Holding Limited Audio message processing method and apparatus
US20190027150A1 (en) * 2016-03-29 2019-01-24 Alibaba Group Holding Limited Audio message processing method and apparatus
US10331784B2 (en) 2016-07-29 2019-06-25 Voicebox Technologies Corporation System and method of disambiguating natural language processing requests
CN108010523A (en) * 2016-11-02 2018-05-08 松下电器(美国)知识产权公司 Information processing method and recording medium
CN108010523B (en) * 2016-11-02 2023-05-09 松下电器(美国)知识产权公司 Information processing method and recording medium
WO2018097969A1 (en) * 2016-11-22 2018-05-31 Knowles Electronics, Llc Methods and systems for locating the end of the keyword in voice sensing
CN110136702B (en) * 2018-02-09 2021-05-04 宏碁股份有限公司 Speech recognition system and method thereof
CN110136702A (en) * 2018-02-09 2019-08-16 宏碁股份有限公司 Speech recognition system and its method

Similar Documents

Publication Publication Date Title
US6871179B1 (en) Method and apparatus for executing voice commands having dictation as a parameter
US11739641B1 (en) Method for processing the output of a speech recognizer
US6327566B1 (en) Method and apparatus for correcting misinterpreted voice commands in a speech recognition system
US6374214B1 (en) Method and apparatus for excluding text phrases during re-dictation in a speech recognition system
US6910012B2 (en) Method and system for speech recognition using phonetically similar word alternatives
EP0785540B1 (en) Continuous speech recognition of text and commands
US6308151B1 (en) Method and system using a speech recognition system to dictate a body of text in response to an available body of text
US6356869B1 (en) Method and apparatus for discourse management
US6839669B1 (en) Performing actions identified in recognized speech
US6801897B2 (en) Method of providing concise forms of natural commands
US6173266B1 (en) System and method for developing interactive speech applications
US20020173955A1 (en) Method of speech recognition by presenting N-best word candidates
US6314397B1 (en) Method and apparatus for propagating corrections in speech recognition software
US6334102B1 (en) Method of adding vocabulary to a speech recognition system
US20020123892A1 (en) Detecting speech recognition errors in an embedded speech recognition system
US6745165B2 (en) Method and apparatus for recognizing from here to here voice command structures in a finite grammar speech recognition system
JP2000035795A (en) Enrollment of noninteractive system in voice recognition
US6591236B2 (en) Method and system for determining available and alternative speech commands
US6253177B1 (en) Method and system for automatically determining whether to update a language model based upon user amendments to dictated text
GB2409087A (en) Computer generated prompting
US20020123893A1 (en) Processing speech recognition errors in an embedded speech recognition system
US6345254B1 (en) Method and apparatus for improving speech command recognition accuracy using event-based constraints
US20140081642A1 (en) System and Method for Configuring Voice Synthesis
US6963834B2 (en) Method of speech recognition using empirically determined word candidates
US20060089834A1 (en) Verb error recovery in speech recognition

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHINES CORPORATION, NEW YORK

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KIST, THOMAS A.;LEWIS BURN L.;LUCAS, BRUCE D.;REEL/FRAME:010102/0420;SIGNING DATES FROM 19990629 TO 19990701

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: NUANCE COMMUNICATIONS, INC., MASSACHUSETTS

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:INTERNATIONAL BUSINESS MACHINES CORPORATION;REEL/FRAME:021924/0158

Effective date: 20080930

RR Request for reexamination filed

Effective date: 20101008

B1 Reexamination certificate first reexamination

Free format text: THE PATENTABILITY OF CLAIMS 1-10 IS CONFIRMED. NEW CLAIMS 11-18 ARE ADDED AND DETERMINED TO BE PATENTABLE.

FPAY Fee payment

Year of fee payment: 8

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20170322