US20050288934A1 - Multimodal input method - Google Patents

Multimodal input method

Info

Publication number
US20050288934A1
Authority
US
United States
Prior art keywords
input
inputs
recognition
multimodal
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US11/152,626
Other versions
US7630901B2 (en)
Inventor
Hiromi Omi
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Assigned to CANON KABUSHIKI KAISHA. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: OMI, HIROMI
Publication of US20050288934A1
Application granted
Publication of US7630901B2
Expired - Fee Related
Adjusted expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/03 - Arrangements for converting the position or the displacement of a member into a coded form
    • G06F 3/033 - Pointing devices displaced or positioned by the user, e.g. mice, trackballs, pens or joysticks; Accessories therefor
    • G06F 3/038 - Control and interface arrangements therefor, e.g. drivers or device-embedded control circuitry
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/24 - Speech recognition using non-acoustical features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2203/00 - Indexing scheme relating to G06F 3/00 - G06F 3/048
    • G06F 2203/038 - Indexing scheme relating to G06F 3/038
    • G06F 2203/0381 - Multimodal input, i.e. interface arrangements enabling the user to issue commands by simultaneous use of input devices of different nature, e.g. voice plus gesture on digitizer
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G06F 3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback

Abstract

In a multimodal input method, input information input from at least two input sources is received, control of the recognition of input from a second input source is performed based on the number of inputs from a first input source, and a recognition result is output.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates to a technique for a multimodal user interface.
  • 2. Description of the Related Art
  • A multimodal user interface is convenient in that it provides the user with a plurality of input sources, such as graphical user interface (GUI) input and speech input, to enable the user to input information by a desired input source. The multimodal user interface is particularly convenient when a plurality of input sources are simultaneously used. For example, by saying “Move this to here” while clicking on items corresponding to “this” and “here” on a GUI, the user can freely control the items even if he/she is unfamiliar with technical terms, such as commands. To allow such an operation to take place, inputs from a plurality of input sources need to be integrated.
  • Examples of methods for integrating inputs from a plurality of input sources include a method in which information related to the type and speed of a mouse event is used (see Japanese Patent Laid-Open No. 8-286887 and Japanese Patent Laid-Open No. 9-81364 (corresponding U.S. Pat. No. 5,781,179)), a method in which a linguistic analysis is performed on the result of speech recognition (see Japanese Patent No. 2993872), a method using context information (see Japanese Patent No. 3375449), a method in which some recognition results whose input times are close to each other are collected and output as a unit of semantic analysis (see Japanese Patent No. 3363283 (corresponding U.S. Pat. No. 5,884,249)), a method in which a delay in the result of the recognition of input data is taken into consideration (see Japanese Patent Laid-Open No. 10-198544), a method in which the intention of the user is detected by statistical learning (see Japanese Patent Laid-Open No. 11-288342 and Japanese Patent Laid-Open No. 2001-100878), a method using a method of grammatical analysis (see Japanese Patent Laid-Open No. 6-282569), a method in which a linguistic analysis is performed to use a semantic structure (see Japanese Patent Laid-Open No. 2000-231427), and a method in which pointing inputs from a pointing device, such as a mouse, are registered in a list, the number of referring expressions in speech input data is compared with the number of pointing inputs in the list, and if the number of the pointing inputs is smaller than that of the referring expressions in speech input data, the number of pointing inputs is adjusted by obtaining the subsequent pointing input, thus integrating a speech input and a pointing input (see Japanese Patent Laid-Open No. 7-110734).
  • In the known examples described above where the input time of each input or the order of inputs are taken into consideration, complex processing must be performed to analyze a plurality of candidates for an input result. Moreover, although the above-described examples are based on the premise that speech inputs can be accurately recognized, it is difficult, under the current speech recognition technologies, to achieve recognition with perfect accuracy. Therefore, solutions to the problem of false recognition are important. However, in the known examples described above, there is no description about a solution in case of false recognition or a method for the reduction of false recognition rate.
  • Japanese Patent Laid-Open No. 7-110734 discloses a technique in which the integration is made pending the subsequent pointing input if the number of pointing inputs is smaller than that of referring expressions in speech input data. Similarly to the examples described above, this technique is based on the premise that the number of referring expressions in speech input data are accurately recognized. Moreover, there is no description of false recognition or a method for the reduction of false recognition rate. In the technique disclosed in Japanese Patent Laid-Open No. 7-110734, if the number of pointing inputs is larger than that of referring expressions in speech input data, error processing is executed and reentry of information is required. Since the reentry of information puts a burden on the user, a technique for preventing such problems needs to be developed.
  • The present invention has been made in view of the circumstances described above, and is directed to the improvement of accuracy in recognizing instructions indicated by input from at least two input sources.
  • SUMMARY OF THE INVENTION
  • To solve the problems described above, the present invention provides a multimodal input method including a receiving step of receiving input information input from at least two input sources; a control step of controlling recognition of input information from a second input source based on the number of inputs from a first input source; and an output step of outputting a recognition result obtained under the control of the control step.
  • The present invention also provides a multimodal input apparatus including a receiving unit configured to receive information input from at least two input sources; a control unit configured to control recognition of input information from a second input source based on a number of inputs from a first input source; and an output unit configured to output a result of the recognition performed under the control of the control unit.
  • Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the attached drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 shows the basic structure of an information processing system according to a first embodiment of the present invention.
  • FIG. 2 is a flowchart showing the processing of a multimodal input integrator in the information processing system according to the first embodiment of the present invention.
  • FIG. 3 shows inputs according to the first embodiment of the present invention.
  • FIG. 4 shows an example of multimodal input integration in the information processing system according to the first embodiment of the present invention.
  • FIG. 5 shows another example of multimodal input integration in the information processing system according to the first embodiment of the present invention.
  • FIG. 6 shows another example of multimodal input integration in the information processing system according to the first embodiment of the present invention.
  • FIG. 7 shows another example of multimodal input integration in the information processing system according to the first embodiment of the present invention.
  • FIG. 8 shows an example of multimodal input integration in the information processing system according to a second embodiment of the present invention.
  • FIG. 9 shows another example of multimodal input integration in the information processing system according to the second embodiment of the present invention.
  • FIG. 10 shows another example of multimodal input integration in the information processing system according to the second embodiment of the present invention.
  • FIG. 11 shows an example of multimodal input integration in the information processing system according to a fourth embodiment of the present invention.
  • DESCRIPTION OF THE EMBODIMENTS
  • Embodiments of a multimodal input method according to the present invention will now be described with reference to the drawings.
  • First Embodiment
  • FIG. 1 shows the basic structure of a system according to the first embodiment of the present invention. A system accepting speech input and GUI input will be described as an example. The system includes a GUI input unit 101, a GUI-input interpretation unit 102, a speech input unit 103, a speech recognition/interpretation unit 104, a multimodal input integrator 105, a storage unit 106, a markup language parser 107, a controller 108, a speech synthesizer 109, a display unit 110, and a communication unit 111.
  • Examples of the GUI input unit 101 include one or more of a group of buttons, a keyboard, a mouse, a touch panel, a pen, and a tablet to be used for giving instructions on the GUI. The GUI input unit 101 serves as an input interface for the entry of various instructions into the system. The system receives input information from these input devices. The GUI-input interpretation unit 102 interprets the information entered in the GUI input unit 101. A known technique, such as a handwriting recognition technique, is used in the interpretation.
  • The speech input unit 103 includes a microphone and an analog-to-digital (A/D) converter, and is used to input the user's speech. The speech recognition/interpretation unit 104 performs speech recognition, through the use of a known technique, on the speech information input in the speech input unit 103. The multimodal input integrator 105 integrates the information interpreted in the GUI-input interpretation unit 102 and the speech recognition/interpretation unit 104.
  • The storage unit 106 is, for example, a hard disk drive unit for storing various kinds of information, and a storage medium, such as a compact disk-read-only memory (CD-ROM) and a digital versatile disk-ROM (DVD-ROM), for providing the system with various kinds of information. Various application programs, user-interface control programs, and data necessary for executing each program are stored in the hard disk drive unit and the storage medium, and are read, under the control of the controller 108 described below, into the system. The markup language parser 107 parses documents written in a markup language. The controller 108 is, for example, one or a combination of a work memory, a central processing unit (CPU), and a micro-processing unit (MPU), and reads programs and data stored in the storage unit 106 to execute various processing. In addition, the controller 108 controls the GUI-input interpretation unit 102, the speech recognition/interpretation unit 104, and the multimodal input integrator 105.
  • The speech synthesizer 109 includes a speaker, a headphone, and a digital-to-analog (D/A) converter. The speech synthesizer 109, under the control of the controller 108, generates audio data based on text to be read, converts the audio data from digital to analog, and outputs the converted data as sound. A known speech synthesis technique is used here. The display unit 110 is a display device, such as a liquid crystal display, and displays various information including images and text. The display unit 110 may be a touch-screen display. In this case, the display unit 110 also serves as the GUI input unit 101 that performs the function of inputting various instructions into the system. The communication unit 111 is a network interface allowing data communication with other devices via a network, such as the Internet or a LAN (local area network).
  • An integration method in the multimodal input integrator 105 will now be described with reference to the flowchart in FIG. 2. Upon receipt of information interpreted in the GUI-input interpretation unit 102 and the speech recognition/interpretation unit 104, in other words, upon receipt of recognition candidates, the multimodal input integrator 105 outputs the result of interpretation showing “integration=0” in the GUI input (step S201). The “integration” variable indicates whether or not it is necessary to integrate with the result of input from the other input source. If integration is necessary, “1” is input in the GUI-input interpretation unit 102 and the speech recognition/interpretation unit 104. On the other hand, if integration is not necessary, “0” is input in the GUI-input interpretation unit 102 and the speech recognition/interpretation unit 104. The determination as to whether or not it is necessary to integrate with the result of input from another input source is made using a known technique, such as a technique based on whether or not the location for storing the value has been set.
  • If all the results of speech recognition and interpretation give “integration”=0 (YES in step S202), the first-ranked result of the speech recognition and interpretation is output (step S203) and the process is terminated. If there are any results that give “integration”=1 (NO in step S202), the number NUM of “integration”=1 in GUI inputs within a speech input time T is counted (step S204). The speech input time T is, as shown in a graph 301 in FIG. 3 (vertical axis: power of speech input, horizontal axis: time), a time period during which power exceeding a threshold value “th” is detected. Alternatively, as shown in a graph 302 in FIG. 3, the speech input period T may be a time period set in the speech input unit 103, such as a time period obtained by adding a predetermined time period (such as several seconds) to each end of a time period during which power exceeding a threshold value “th” is detected. In the graph 301 in FIG. 3, NUM=2 is obtained if all GUI inputs give “integration”=1.
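  • Before continuing with the flow, note that the detection of the speech input time T described above is straightforward to compute from the digitized power envelope. The following is a minimal Python sketch (not part of the patent) of the two variants in FIG. 3: simple thresholding (graph 301), and thresholding with a fixed padding added to each end (graph 302). The threshold, sample period, and padding values are illustrative assumptions.

    # Minimal sketch of deriving the speech input time T from a sampled power
    # envelope; the threshold "th", sample period, and padding are assumptions.
    def speech_input_time(power, th, sample_period=0.01, padding=0.0):
        """Return (start, end) in seconds of the span where power exceeds th,
        optionally widened by `padding` seconds at each end (graph 302)."""
        above = [i for i, p in enumerate(power) if p > th]
        if not above:
            return None                              # no speech detected
        start = above[0] * sample_period - padding
        end = (above[-1] + 1) * sample_period + padding
        return max(start, 0.0), end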
  • If NUM=0 (YES in step S205), it is determined whether there are any results of speech recognition and interpretation that give “integration”=0 (step S206). If such results exist (YES in step S206), the result with the highest degree of certainty among the results of speech recognition and interpretation that give “integration”=0 is output (step S207) and the process is terminated. If there is no such result (NO in step S206), an error message indicating the failure of integration is output (step S208) and the process is terminated.
  • If NUM=0 is not given (NO in step S205), one is assigned to N (step S209) and the process proceeds to step S210. If there is no Nth-ranked (initially, the first-ranked) result of speech recognition and interpretation (NO in step S210), an error message indicating the failure of integration is output (step S208) and the process is terminated. If the Nth-ranked result exists (YES in step S210), the process proceeds to step S211. In step S211, it is determined whether or not the Nth-ranked result of speech recognition and interpretation gives “integration”=1 (that is, whether or not integration is necessary). If integration is not necessary (NO in step S211), one is added to N (step S212) and the process proceeds to step S210. If integration is necessary (YES in step S211), it is determined whether or not the number of “?” is equal to NUM (in an example shown in FIG. 4, a value of “unknown” in each of tables 401 to 403) (step S213). If the number of “?” is not equal to NUM (NO in step S213), one is added to N (step S212) and the process proceeds to step S210. If the number of “?” is equal to NUM (YES in step S213), the Nth-ranked result of speech recognition and interpretation is integrated with the result of GUI interpretation and is output (step S214). For example, a speech input “From here” is integrated with a GUI input “Ebisu” to produce a result “From Ebisu”. Processing then ends.
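  • The flow of FIG. 2 maps onto a small selection routine. The following Python sketch is one possible reading of steps S201 to S214; it assumes each recognition candidate arrives as a dict carrying the table fields described below for FIGS. 4 to 7 (rank, unknown, value, input times, score, integration), and the fill_unknowns helper is an illustrative assumption rather than anything defined in the patent.

    def integrate(speech_results, gui_results, t_start, t_end):
        # S201: GUI interpretations with "integration"=0 would simply be output
        # as they are; they are ignored here for brevity.
        # S202/S203: if no speech candidate needs integration, output the top one.
        if all(r["integration"] == 0 for r in speech_results):
            return speech_results[0]["value"]
        # S204: count GUI inputs with "integration"=1 inside the speech time T.
        in_window = [g for g in gui_results
                     if g["integration"] == 1 and t_start <= g["start"] <= t_end]
        num = len(in_window)
        # S205-S208: nothing to integrate with.
        if num == 0:
            standalone = [r for r in speech_results if r["integration"] == 0]
            if standalone:                                            # S206/S207
                return max(standalone, key=lambda r: r["score"])["value"]
            raise RuntimeError("integration failed")                  # S208
        # S209-S213: scan speech candidates in rank order for one whose number
        # of unknown slots ("?") equals NUM, and integrate it (S214).
        for r in sorted(speech_results, key=lambda r: r["rank"]):
            if r["integration"] == 1 and r["unknown"] == num:
                return fill_unknowns(r["value"], [g["value"] for g in in_window])
        raise RuntimeError("integration failed")                      # S208

    def fill_unknowns(template, values):
        """Illustrative helper: fill each "?" slot of the speech interpretation
        with the corresponding GUI value, e.g. "From ? to ?" -> "From Ebisu to
        Yokohama"."""
        for v in values:
            template = template.replace("?", v, 1)
        return template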
  • The following show examples of the above-described multimodal input integration with reference to the drawings. FIGS. 4 to 7 show the flow of interpretation of speech and button input.
  • A first example will now be described with reference to FIG. 4. A speech recognition/interpretation result 401 and a speech recognition/interpretation result 402 are results having the highest certainty and the second highest certainty, respectively. A GUI-input interpretation result 403 shows the interpretation of button input, and only the result with the highest certainty is obtained in this example.
  • Items in each of the results 401 to 403 are: “rank” indicating the ranking of certainty (“1”, “2”, . . . in descending order of certainty, i.e., “1” is the highest level of certainty), “unknown” indicating the number of uncertain values (“value=?” described below), “value” indicating an interpreted value, “time (start)” indicating the input start time, “time (end)” indicating the input end time, “score” indicating the degree of certainty, and “integration” indicating whether or not integration is necessary (“1”=necessary, “0”=not necessary). The results of interpretation in the GUI-input interpretation unit 102 and the speech recognition/interpretation unit 104 are input to the tables (401 to 403) described above and transmitted to the multimodal input integrator 105. If the results in the tables are described in XML, the markup language parser 107 parses the results.
  • The multimodal input integrator 105 performs processing according to the flow in FIG. 2. Since the GUI-input interpretation result 403 is obtained from the input during the speech input time T (02:10:00 to 02:12:00) and gives “integration”=1, NUM=1 is obtained in step S204 and N=1 is set in step S209. Since the first-ranked speech recognition/interpretation result 401 exists (YES in step S210), the process proceeds to step S211. Then, since it is determined that “integration”=1 (YES in step S211) and “unknown”=1=NUM (YES in step S213), the process proceeds to step S214. In step S214, the interpretation of a speech input “From Tokyo to here” and the interpretation of a button input “Ebisu” are integrated and outputted as “From Tokyo to Ebisu”.
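  • In terms of the sketch given earlier, the FIG. 4 walkthrough corresponds to a call like the following. Only the values quoted in the text are taken from the patent; the scores, times, the "?"-slot representation of "here", and the second-ranked candidate are assumptions for illustration.

    speech_results = [
        {"rank": 1, "unknown": 1, "value": "From Tokyo to ?", "score": 80, "integration": 1},
        {"rank": 2, "unknown": 2, "value": "From ? to ?",     "score": 60, "integration": 1},
    ]
    gui_results = [  # fixed-format "HH:MM:SS" strings compare correctly as text
        {"value": "Ebisu", "start": "02:11:00", "score": 100, "integration": 1},
    ]
    print(integrate(speech_results, gui_results, "02:10:00", "02:12:00"))
    # -> "From Tokyo to Ebisu"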
  • Similarly, in an example shown in FIG. 5, GUI-input interpretation results 503 and 504 give NUM=2 (step S204). Since the first-ranked speech recognition/interpretation result 501 gives “unknown”=1, which is not equal to NUM=2 (NO in step S213), the second-ranked speech recognition/interpretation result 502 is subsequently examined. Then “unknown”=2=NUM (YES in step S213) is determined from the speech recognition/interpretation result 502. Thus, in step S214, the interpretation of speech input “From here to here” is integrated with the interpretation of button input “Ebisu” and “Yokohama” and outputted as “From Ebisu to Yokohama”.
  • In an example shown in FIG. 6, since a GUI-input interpretation result 604 gives “integration”=0, “1” is outputted as the interpretation of button input. Moreover, NUM=1 is determined from a GUI-input interpretation result 603 (step S204). Then “unknown”=1=NUM (YES in step S213) is determined from a speech recognition/interpretation result 601. Thus, in step S214, the interpretation of speech input “From Tokyo to here” and the interpretation of button input “Ebisu” are integrated and outputted as “From Tokyo to Ebisu”.
  • In an example shown in FIG. 7, since a GUI-input interpretation result 703 gives “integration”=0, “1” is output as the interpretation of button input, and NUM=0 is determined (step S204). Since speech recognition/interpretation results 701 and 702 do not give “integration”=0 (NO in step S206), an error message indicating the failure of integration is output (step S208) and the process is terminated.
  • As described above, in the first embodiment, a speech recognition/interpretation result is selected based on the number of button inputs within a speech input time. Priorities can thus be assigned to candidates generated from the speech recognition, and the accuracy of the recognition can be improved. This increases the possibility that a correct recognition result can be output from a plurality of candidates, and saves the user from having to reenter information.
  • Second Embodiment
  • An information processing system according to the second embodiment of the present invention will now be described. The examples of the first embodiment show the cases where the recognition rate is 100% since buttons are used for GUI input. However, in actual multimodal user interfaces, there are cases where the certainty of interpretation of each input to be integrated is not 100%. In such cases, similar processing to that in the first embodiment can be performed in order of certainty, beginning with the first-ranked interpretation result.
  • In the present embodiment, examples where speech input and pen input are performed will be described with reference to FIGS. 8 to 10. In FIG. 8, items in each of tables (801 to 804) are the same as those in FIGS. 4 to 7 described above. FIG. 8 shows the first-ranked speech recognition/interpretation result 801, the second-ranked speech recognition/interpretation result 802, the first-ranked GUI-input interpretation result 803, and the second-ranked GUI-input interpretation result 804.
  • Similar processing to that in the first embodiment will be performed in order of certainty, beginning with the first-ranked GUI-input interpretation result 803. The first-ranked GUI-input interpretation result 803 is obtained from the input during the speech input time T (02:10:00 to 02:12:00) and gives “integration”=1. Since the number of “value” in the GUI-input interpretation result 803 is one, NUM=1 is obtained in step S204 and N=1 is set in step S209. Since the first-ranked speech recognition/interpretation result 801 exists (YES in step S210), the process proceeds to step S211. Then, since it is determined that “integration”=1 (YES in step S211) and “unknown”=1=NUM (YES in step S213), the process proceeds to step S214. In step S214, the interpretation of a speech input “Here” and the interpretation of a pen input “Ebisu” are integrated and outputted as “Ebisu”.
  • In an example shown in FIG. 9, a GUI-input interpretation result 903 gives NUM=1 (step S204). The first-ranked speech recognition/interpretation result 901 gives “unknown”=1 or above, which matches NUM (YES in step S213). Thus, in step S214, the interpretation of a speech input “Around here” and the interpretation of a pen input “Ebisu” are integrated and outputted as “Ebisu”.
  • FIG. 10 shows an example where both pen input and button input are performed as GUI input. Since a GUI-input interpretation result 1005 gives “integration”=0, “1” is outputted as the interpretation of the button input. The first-ranked GUI-input interpretation result 1003 gives NUM=1 (step S204). Since the first-ranked speech recognition/interpretation result 1001 gives “unknown”=2 or above, which does not match NUM (NO in step S213), the second-ranked speech recognition/interpretation result 1002 is examined next. Since the result 1002 gives “unknown”=3, which does not match NUM either (NO in step S213), the integration cannot be performed. Next, NUM=2 is obtained from the second-ranked GUI-input interpretation result 1004 (step S204). The first-ranked speech recognition/interpretation result 1001 gives “unknown”=2 or above, which matches NUM (YES in step S213). Thus, in step S214, the interpretation of the speech input “These” and the interpretation of the pen input “A, B” are integrated and outputted as “A and B”.
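  • One way to implement this is as an outer loop over the ranked GUI interpretations, reusing the single-ranking routine sketched in the first embodiment. The structure below is an assumption, and it does not model the open-ended ("1 or above") unknown counts that appear in FIGS. 9 and 10.

    def integrate_ranked_gui(speech_results, gui_candidate_sets, t_start, t_end):
        """gui_candidate_sets: GUI interpretation results grouped by rank,
        ordered from most to least certain (e.g. [[result 1003], [result 1004]])."""
        for gui_results in gui_candidate_sets:
            try:
                return integrate(speech_results, gui_results, t_start, t_end)
            except RuntimeError:
                continue          # no speech candidate matched this NUM; try next
        raise RuntimeError("integration failed")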
  • As described above, in the second embodiment, even if the certainty of the interpretation of each input to be integrated is not 100%, the accuracy of the interpretation of the result of speech recognition can be improved by selecting a speech recognition/interpretation result based on the number of GUI inputs within a speech input time.
  • Third Embodiment
  • Although the examples of the above-described embodiments show the cases where GUI input is accepted, the present invention is not limited to such cases and may be configured such that physical key input from, for example, a keyboard or a numeric keypad can be accepted. An example described here is a copier that is operable with a numeric keypad and speech input. The description is based on the premise that instruction commands are assigned to respective keys on a numeric keypad on the copier as follows: Key 1: paper selection, Key 2: number of copies, Key 3: scaling, Key 4: density, Key 5: duplex printing, Key 6: sort, Key 7: staple sort, and Key 8: advanced mode.
  • If the user wants to make 5 copies of 10 pages of an A5-sized document and staple each copy in the upper left corner, the user can make the settings by saying “A5” while holding down Key 1 (or by saying “A5” after clicking Key 1), saying “5 copies” while holding down Key 2 (or by saying “5 copies” after clicking Key 2), and saying “upper left” while holding down Key 7 (or by saying “upper left” after clicking Key 7). When this operation is repeated frequently, it is more efficient to say the settings in succession, as in “A5, 5 copies, upper left”, rather than inputting them one by one. However, since the accuracy of speech recognition is not 100% under current technology, recognition errors such as “5 sheets, upper left” and “A5, 5 copies, upper left, darken” may frequently occur and generate incorrect candidates.
  • In the present embodiment, in such a case, speech input and key input are integrated based on the number of key inputs. The user says “A5, 5 copies, upper left” while holding down Keys 1, 2, and 7 (or after clicking Keys 1, 2 and 7). The three keys may be pressed either simultaneously or sequentially. Here, the number of key inputs, which is three, is compared with the number of speech inputs. For example, the ranking of speech recognition candidates is as follows: (1) “5 copies, upper left”, (2) “A5, 5 copies, upper left”, (3) “A5, 5 copies, upper left, darken” and (4) “A2, 50 copies, upper left”. The recognition candidates are narrowed down to the first-ranked “A5, 5 copies, upper left” and the second-ranked “A2, 50 copies, upper left” by eliminating candidates, such as “5 copies, upper left” and “A5, 5 copies, upper left, darken” that do not match the number of key inputs, or by selecting candidates, such as “A5, 5 copies, upper left” and “A2, 50 copies, upper left” that match the number of key inputs. Thus, a recognition candidate with the highest likelihood is selected as a recognition result, and “A5, 5 copies, upper left” that the user has said is correctly recognized.
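  • In other words, the number of key inputs acts as a filter over the ranked speech candidates. A minimal sketch follows; the candidate strings are those quoted above, and the comma-based segmentation is an illustrative assumption.

    def filter_by_key_count(candidates, key_count):
        """Keep speech candidates whose number of command segments matches the
        number of keys pressed."""
        return [c for c in candidates if len(c.split(",")) == key_count]

    candidates = [
        "5 copies, upper left",               # rank 1 (2 segments)
        "A5, 5 copies, upper left",           # rank 2 (3 segments)
        "A5, 5 copies, upper left, darken",   # rank 3 (4 segments)
        "A2, 50 copies, upper left",          # rank 4 (3 segments)
    ]
    print(filter_by_key_count(candidates, key_count=3))
    # -> ['A5, 5 copies, upper left', 'A2, 50 copies, upper left']
    # The highest-ranked survivor, "A5, 5 copies, upper left", is selected.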
  • Moreover, the present invention is also applicable to the case where, if numbered menu items on a cell-phone screen can be selected with buttons, the user says, for example, “Show me the help information for this and this” while holding down buttons 3 and 5.
  • Fourth Embodiment
  • The examples of the above-described embodiments show cases where an appropriate candidate is selected from the first to N-th ranked speech recognition/interpretation results based on the number of GUI inputs or physical key inputs. However, which input source supplies the number of inputs used to control recognition of the other source is not limited to these examples. For example, an appropriate candidate may be selected from the first to N-th ranked handwriting-input recognition results by using the number of inputs (the value of “unknown” in the examples described above) obtained from the speech recognition/interpretation results. Since button input generally has a higher certainty than speech input (recognition), the number of button inputs may be applied to the other modality. However, if every input source has some degree of ambiguity, it is not appropriate to fix in advance which source's number of inputs is applied to which other source. In such a case, a modality may be selected each time depending on the certainty. For example, the modality with the larger difference in certainty between its first-ranked and second-ranked candidates may be selected.
  • Such an example will now be described with reference to FIG. 11. FIG. 11 shows an example where the user has circled “Ebisu” with a pen while saying “Here”. Although the user intended to circle only “Ebisu”, a part of the circle is placed over “Shibuya”. According to the interpretation of the speech input, the first-ranked “Here” and the second-ranked “Here and here” have “score”=90 and “score”=55, respectively. As for the interpretation of the pen input, the first-ranked “Shibuya, Ebisu” and the second-ranked “Ebisu” have “score”=95 and “score”=90, respectively. Since the difference in certainty between the first-ranked and second-ranked candidates is larger for the speech input than for the pen input, it is highly probable that the first-ranked speech candidate is correct, and therefore that the number of speech inputs is correct. That is, because its number of inputs has the higher likelihood, the number of speech inputs is trusted here. The likelihood value for the number of inputs may be determined, as described above, from the difference in certainty between the first-ranked and second-ranked candidates. It is also possible to determine that a modality in which more of the top-ranked recognition candidates agree on the number of inputs has the higher likelihood value for the number of inputs. For example, even if there is a large difference in certainty between the first-ranked and second-ranked candidates, the likelihood value decreases if their numbers of inputs differ. Conversely, even if there are only small differences in certainty among the first- to fourth-ranked candidates, the likelihood value increases if all the candidates have the same number of inputs. The likelihood value may also be determined using information such as “button input generally has a higher certainty than speech input (recognition)”. In the example in FIG. 11, since the number of inputs in the first-ranked speech candidate is one, the second-ranked GUI candidate, whose number of inputs is one, is selected. Thus, “Here” and “Ebisu” are integrated to output the result “Ebisu”.
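  • A compact sketch of the FIG. 11 example follows. Only the scores and values quoted in the text are taken from the patent; the "count" field and the use of the score gap as the sole likelihood criterion (the text also mentions agreement among top candidates and modality priors) are simplifying assumptions.

    def count_likelihood(results):
        """Larger gap between the top two scores -> the number of inputs reported
        by this modality is considered more trustworthy."""
        scores = sorted((r["score"] for r in results), reverse=True)
        return scores[0] - scores[1] if len(scores) > 1 else scores[0]

    def integrate_by_trusted_count(speech_results, gui_results):
        if count_likelihood(speech_results) >= count_likelihood(gui_results):
            trusted, other = speech_results, gui_results
        else:
            trusted, other = gui_results, speech_results
        n = trusted[0]["count"]                       # trusted number of inputs
        match = next((r for r in other if r["count"] == n), None)
        return trusted[0], match

    speech = [{"value": "Here",          "count": 1, "score": 90},
              {"value": "Here and here", "count": 2, "score": 55}]
    gui    = [{"value": ["Shibuya", "Ebisu"], "count": 2, "score": 95},
              {"value": ["Ebisu"],            "count": 1, "score": 90}]
    print(integrate_by_trusted_count(speech, gui))
    # The speech gap (35) exceeds the pen gap (5), so the speech count of one is
    # trusted and the second-ranked pen candidate ["Ebisu"] is chosen -> "Ebisu".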
  • Fifth Embodiment
  • Although the above-described embodiments show cases in which the number of inputs is obtained from only one input source, the present invention is not limited to such cases. For example, in a copier that allows both GUI input and key input, the number of inputs from both of these sources can be taken into consideration. For example, the user selects a paper size and “duplex printing” on the GUI and enters “10” by key input while saying “B5, single to double side copying, this number of copies”. In this case, the number of speech inputs is three, and the total number of GUI and key inputs is three. By eliminating speech recognition candidates whose number of inputs is not equal to three, the recognition accuracy can be improved.
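  • The same count matching applies when the count is summed over several non-speech sources; a short sketch follows (the incorrect second speech candidate is invented for illustration, and the "count" field is an assumption).

    gui_inputs = ["B5 paper", "duplex printing"]  # selected on the GUI
    key_inputs = ["10"]                           # entered on the numeric keypad
    target = len(gui_inputs) + len(key_inputs)    # three referenced slots in total

    speech_candidates = [
        {"value": "B5, single to double side copying, this number of copies", "count": 3},
        {"value": "B5, this number of copies",                                "count": 2},
    ]
    kept = [c for c in speech_candidates if c["count"] == target]
    # Only the first candidate, which references three inputs, survives.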
  • Sixth Embodiment
  • Although the examples of the above-described embodiments show the integration of speech input with other input sources, the present invention is not limited to this. For example, the present invention is applicable to multimodal input that allows gesture input and gaze input. For example, it is assumed that the user selects an object by gaze input and gives instructions to the object by gesture input. The object is considered to be selected if, for example, it is stared at for a long period of time. However, a plurality of candidates for the object may be obtained if the recognition cannot be performed properly. In such a case, if two instructions are given by gesture input, it is highly likely that the number of objects to be handled is two. Therefore, by eliminating gaze recognition candidates that do not have two objects to be handled, the recognition accuracy can be improved.
  • In the present invention, a storage medium (or recording medium) on which a software program code for performing the functions of the above-described embodiments is recorded may be supplied to a system or apparatus such that a computer (or central processing unit (CPU) or micro-processing unit (MPU)) of the system or apparatus reads out and executes the program code stored in the storage medium.
  • In this case, the program code read out from the storage medium performs the functions of the above-described embodiments.
  • The storage medium for supplying the program code may be, for example, a flexible disk, a hard disk, an optical disk, a magneto-optical disk, a compact disk-read-only memory (CD-ROM), a CD-recordable (CD-R), a magnetic tape, a non-volatile memory card, and a ROM.
  • The functions of the above-described embodiments can be performed not only by a computer reading and executing the program code, but also by an operating system (OS) running on the computer and executing a whole or part of the actual processing in accordance with instructions of the program code.
  • The functions of the above-described embodiments can also be performed, after the program code read out from the storage medium is written in a memory of a function expansion board in a computer or is written in a memory of an expansion unit connected to a computer, by a CPU or the like, which is included in the function expansion board or expansion unit, executing a whole or part of the actual processing in accordance with instructions of the program code.
  • While the present invention has been described with reference to exemplary embodiments, it is to be understood that the invention is not limited to the disclosed embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all modifications, equivalent structures and functions.
  • This application claims priority from Japanese Patent Application No. 2004-191632 filed Jun. 29, 2004, and Japanese Patent Application No. 2004-379948 filed Dec. 28, 2004, which are hereby incorporated by reference herein in their entirety.

Claims (11)

1. A multimodal input method comprising:
a receiving step of receiving input information input from at least two input sources;
a control step of controlling recognition of input information from a second input source based on the number of inputs from a first input source; and
an output step of outputting a recognition result obtained under control of the control step.
2. The multimodal input method according to claim 1, wherein the control step produces the recognition result by eliminating a recognition candidate for input from the second input source if the number of inputs included in the recognition candidate is not equal to the number of inputs from the first input source.
3. The multimodal input method according to claim 1, wherein the control step produces the recognition result from a recognition candidate for input from the second input source if the number of inputs included in the recognition candidate is equal to the number of inputs from the first input source.
4. The multimodal input method according to claim 1, wherein the number of inputs is a quantity of inputs or a frequency of input from each input source.
5. The multimodal input method according to claim 1, further comprising:
a determining step of determining whether input from the first input source can be counted in the number of inputs,
wherein the control step eliminates an invalid input from the number of inputs based on a determination in the determining step.
6. The multimodal input method according to claim 1, further comprising:
an obtaining step of obtaining a likelihood value for the number of inputs included in each input information input from each input source, the input information being received in the receiving step,
wherein the control step selects one of the input sources having a higher likelihood value as the first input source, the value being obtained in the obtaining step.
7. The multimodal input method according to claim 6, wherein the obtaining step determines at least a top two recognition candidates for the input information from each input source, and determines the likelihood value for the number of inputs such that the input source with a larger difference in certainty between the top two recognition candidates has a higher likelihood value.
8. The multimodal input method according to claim 1, wherein the first input source includes at least one of speech input, graphical user interface input, touch input, key input, gesture input, and gaze input.
9. The multimodal input method according to claim 1, wherein the second input source includes at least one of speech input, GUI input, touch input, key input, gesture input, and gaze input.
10. A control program for causing a computer to execute a multimodal input method comprising:
a receiving step of receiving input information input from at least two input sources;
a control step of controlling recognition of input information from a second input source based on a number of inputs from a first input source; and
an output step of outputting a recognition result obtained under the control of the control step.
11. A multimodal input apparatus comprising:
a receiving unit configured to receive information input from at least two input sources;
a control unit configured to control recognition of input information from a second input source based on the number of inputs from a first input source; and
an output unit configured to output a result of the recognition performed under the control of the control unit.
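Claims 6 and 7 above describe selecting the first (controlling) input source by comparing likelihood values for the number of inputs, derived from the certainty gap between the top two recognition candidates of each source. The following is a minimal, purely illustrative sketch of that selection logic; all names and the certainty-difference heuristic are assumptions, not part of the original disclosure.

```python
# Minimal sketch (hypothetical names): each source reports its top recognition
# candidates as (interpretation, certainty) pairs, ordered from most to least certain.
def likelihood_of_input_count(candidates):
    # A larger gap between the top two candidates is taken to mean that the
    # number of inputs read from this source is more reliable.
    if len(candidates) < 2:
        return candidates[0][1] if candidates else 0.0
    return candidates[0][1] - candidates[1][1]

def choose_first_source(sources):
    # Pick the source with the highest likelihood value; its input count then
    # constrains recognition of the remaining (second) source.
    return max(sources, key=lambda item: likelihood_of_input_count(item[1]))[0]

sources = [
    ("speech", [("copy three times", 0.62), ("copy two times", 0.58)]),    # small gap
    ("gui",    [("3 items selected", 0.97), ("2 items selected", 0.10)]),  # large gap
]
print(choose_first_source(sources))   # -> "gui"
```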
US11/152,626 2004-06-29 2005-06-14 Multimodal input method Expired - Fee Related US7630901B2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2004191632 2004-06-29
JP2004-191632 2004-06-29
JP2004-379948 2004-12-28
JP2004379948A JP4416643B2 (en) 2004-06-29 2004-12-28 Multimodal input method

Publications (2)

Publication Number Publication Date
US20050288934A1 true US20050288934A1 (en) 2005-12-29
US7630901B2 US7630901B2 (en) 2009-12-08

Family

ID=35507168

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/152,626 Expired - Fee Related US7630901B2 (en) 2004-06-29 2005-06-14 Multimodal input method

Country Status (2)

Country Link
US (1) US7630901B2 (en)
JP (1) JP4416643B2 (en)

Cited By (138)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080313607A1 (en) * 2007-06-15 2008-12-18 Microsoft Corporation Unified input stack
US20090089251A1 (en) * 2007-10-02 2009-04-02 Michael James Johnston Multimodal interface for searching multimedia content
US20100083186A1 (en) * 2008-09-26 2010-04-01 Microsoft Corporation Magnifier panning interface for natural input devices
US20100079498A1 (en) * 2008-09-26 2010-04-01 Microsoft Corporation Multi-modal interaction for a screen magnifier
US20100241431A1 (en) * 2009-03-18 2010-09-23 Robert Bosch Gmbh System and Method for Multi-Modal Input Synchronization and Disambiguation
US7877500B2 (en) 2002-09-30 2011-01-25 Avaya Inc. Packet prioritization and associated bandwidth and buffer management techniques for audio over IP
US7978827B1 (en) 2004-06-30 2011-07-12 Avaya Inc. Automatic configuration of call handling based on end-user needs and characteristics
US20120078627A1 (en) * 2010-09-27 2012-03-29 Wagner Oliver P Electronic device with text error correction based on voice recognition data
US8218751B2 (en) 2008-09-29 2012-07-10 Avaya Inc. Method and apparatus for identifying and eliminating the source of background noise in multi-party teleconferences
US20130144629A1 (en) * 2011-12-01 2013-06-06 At&T Intellectual Property I, L.P. System and method for continuous multimodal speech and gesture interaction
US8593959B2 (en) 2002-09-30 2013-11-26 Avaya Inc. VoIP endpoint call admission
WO2014015201A1 (en) * 2012-07-20 2014-01-23 Microsoft Corporation Speech and gesture recognition enhancement
WO2014070872A3 (en) * 2012-10-30 2014-06-26 Robert Bosch Gmbh System and method for multimodal interaction with reduced distraction in operating vehicles
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
US20150187354A1 (en) * 2012-08-20 2015-07-02 Lg Innotek Co., Ltd. Voice recognition apparatus and method of recognizing voice
EP2942698A1 (en) * 2013-01-31 2015-11-11 Huawei Technologies Co., Ltd. Non-contact gesture control method, and electronic terminal device
US9190062B2 (en) 2010-02-25 2015-11-17 Apple Inc. User profiling for voice input processing
US20150339098A1 (en) * 2014-05-21 2015-11-26 Samsung Electronics Co., Ltd. Display apparatus, remote control apparatus, system and controlling method thereof
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US20160283453A1 (en) * 2015-03-26 2016-09-29 Lenovo (Singapore) Pte. Ltd. Text correction using a second input
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
EP3011751A4 (en) * 2013-07-02 2017-03-01 Samsung Electronics Co., Ltd. Server, control method thereof, image processing apparatus, and control method thereof
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9766715B2 (en) 2014-05-01 2017-09-19 Seiko Epson Corporation Head-mount type display device, control system, method of controlling head-mount type display device, and computer program
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US20190026266A1 (en) * 2016-07-28 2019-01-24 Panasonic Intellectual Property Management Co., Ltd. Translation device and translation system
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10241753B2 (en) 2014-06-20 2019-03-26 Interdigital Ce Patent Holdings Apparatus and method for controlling the apparatus by a user
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US11189270B2 (en) 2018-06-26 2021-11-30 Hitachi, Ltd. Method of controlling dialogue system, dialogue system, and data storage medium
US11209970B2 (en) 2018-10-30 2021-12-28 Banma Zhixing Network (Hongkong) Co., Limited Method, device, and system for providing an interface based on an interaction with a terminal
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US11487347B1 (en) * 2008-11-10 2022-11-01 Verint Americas Inc. Enhanced multi-modal communication
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification

Families Citing this family (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008065789A (en) * 2006-09-11 2008-03-21 Canon Inc Input operation support device and control method
US10002189B2 (en) 2007-12-20 2018-06-19 Apple Inc. Method and apparatus for searching using an active ontology
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US8510117B2 (en) * 2009-07-09 2013-08-13 Nuance Communications, Inc. Speech enabled media sharing in a multimodal application
WO2011089450A2 (en) 2010-01-25 2011-07-28 Andrew Peter Nelson Jerram Apparatuses, methods and systems for a digital conversation management platform
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US9696547B2 (en) 2012-06-25 2017-07-04 Microsoft Technology Licensing, Llc Mixed reality system learned input and functions
US10296160B2 (en) 2013-12-06 2019-05-21 Apple Inc. Method for extracting salient dialog usage from live data
US10276158B2 (en) 2014-10-31 2019-04-30 At&T Intellectual Property I, L.P. System and method for initiating multi-modal speech recognition using a long-touch gesture
US10152299B2 (en) 2015-03-06 2018-12-11 Apple Inc. Reducing response latency of intelligent automated assistants
US10460227B2 (en) 2015-05-15 2019-10-29 Apple Inc. Virtual assistant in a communication session
US20160378747A1 (en) 2015-06-29 2016-12-29 Apple Inc. Virtual assistant for media playback
US11227589B2 (en) 2016-06-06 2022-01-18 Apple Inc. Intelligent list reading
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US11281993B2 (en) 2016-12-05 2022-03-22 Apple Inc. Model and ensemble compression for metric learning
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
DK201770383A1 (en) 2017-05-09 2018-12-14 Apple Inc. User interface for correcting recognition errors
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
DK201770427A1 (en) 2017-05-12 2018-12-20 Apple Inc. Low-latency intelligent automated assistant
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
US20180336275A1 (en) 2017-05-16 2018-11-22 Apple Inc. Intelligent automated assistant for media exploration
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
US11386266B2 (en) 2018-06-01 2022-07-12 Apple Inc. Text correction
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT
DK201870355A1 (en) 2018-06-01 2019-12-16 Apple Inc. Virtual assistant operation in multi-device environments
DK179822B1 (en) 2018-06-01 2019-07-12 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11076039B2 (en) 2018-06-03 2021-07-27 Apple Inc. Accelerated task performance
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
DK180129B1 (en) 2019-05-31 2020-06-02 Apple Inc. User activity shortcut suggestions
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
WO2021056255A1 (en) 2019-09-25 2021-04-01 Apple Inc. Text detection using global geometry estimators

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5781179A (en) * 1995-09-08 1998-07-14 Nippon Telegraph And Telephone Corp. Multimodal information inputting method and apparatus for embodying the same
US5884249A (en) * 1995-03-23 1999-03-16 Hitachi, Ltd. Input device, inputting method, information processing system, and input information managing method
US20020135614A1 (en) * 2001-03-22 2002-09-26 Intel Corporation Updating user interfaces based upon user inputs
US6523061B1 (en) * 1999-01-05 2003-02-18 Sri International, Inc. System, method, and article of manufacture for agent-based navigation in a speech-based data navigation system
US20030115060A1 (en) * 2001-12-13 2003-06-19 Junqua Jean-Claude System and interactive form filling with fusion of data from multiple unreliable information sources
US20040031058A1 (en) * 2002-05-10 2004-02-12 Richard Reisman Method and apparatus for browsing using alternative linkbases
US20040117513A1 (en) * 2002-08-16 2004-06-17 Scott Neil G. Intelligent total access system
US20050010892A1 (en) * 2003-07-11 2005-01-13 Vocollect, Inc. Method and system for integrating multi-modal data capture device inputs with multi-modal output capabilities
US20050165601A1 (en) * 2004-01-28 2005-07-28 Gupta Anurag K. Method and apparatus for determining when a user has ceased inputting data
US7272556B1 (en) * 1998-09-23 2007-09-18 Lucent Technologies Inc. Scalable and embedded codec for speech and audio signals

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07122879B2 (en) 1993-03-30 1995-12-25 日本電気株式会社 Multimodal input analyzer
JPH07110734A (en) 1993-10-14 1995-04-25 Personal Joho Kankyo Kyokai Multimodal input analysis system
JP3136906B2 (en) 1994-07-21 2001-02-19 トヨタ自動車株式会社 Slip control device for vehicle lock-up clutch
JP3375449B2 (en) 1995-02-27 2003-02-10 シャープ株式会社 Integrated recognition dialogue device
JP2993872B2 (en) 1995-10-16 1999-12-27 株式会社エイ・ティ・アール音声翻訳通信研究所 Multimodal information integration analyzer
JPH10198544A (en) 1997-01-09 1998-07-31 Toshiba Corp Device for interpreting multi-modal input and method for integrating and interpreting the input
JP3822357B2 (en) 1998-02-09 2006-09-20 株式会社東芝 Interface device and method for multimodal input / output device
JP2000231427A (en) 1999-02-08 2000-08-22 Nec Corp Multi-modal information analyzing device
JP2001100878A (en) 1999-09-29 2001-04-13 Toshiba Corp Multi-modal input/output device

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5884249A (en) * 1995-03-23 1999-03-16 Hitachi, Ltd. Input device, inputting method, information processing system, and input information managing method
US5781179A (en) * 1995-09-08 1998-07-14 Nippon Telegraph And Telephone Corp. Multimodal information inputting method and apparatus for embodying the same
US7272556B1 (en) * 1998-09-23 2007-09-18 Lucent Technologies Inc. Scalable and embedded codec for speech and audio signals
US6523061B1 (en) * 1999-01-05 2003-02-18 Sri International, Inc. System, method, and article of manufacture for agent-based navigation in a speech-based data navigation system
US20020135614A1 (en) * 2001-03-22 2002-09-26 Intel Corporation Updating user interfaces based upon user inputs
US20030115060A1 (en) * 2001-12-13 2003-06-19 Junqua Jean-Claude System and interactive form filling with fusion of data from multiple unreliable information sources
US20040031058A1 (en) * 2002-05-10 2004-02-12 Richard Reisman Method and apparatus for browsing using alternative linkbases
US20040117513A1 (en) * 2002-08-16 2004-06-17 Scott Neil G. Intelligent total access system
US20050010892A1 (en) * 2003-07-11 2005-01-13 Vocollect, Inc. Method and system for integrating multi-modal data capture device inputs with multi-modal output capabilities
US20050165601A1 (en) * 2004-01-28 2005-07-28 Gupta Anurag K. Method and apparatus for determining when a user has ceased inputting data

Cited By (202)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US8593959B2 (en) 2002-09-30 2013-11-26 Avaya Inc. VoIP endpoint call admission
US8370515B2 (en) 2002-09-30 2013-02-05 Avaya Inc. Packet prioritization and associated bandwidth and buffer management techniques for audio over IP
US8015309B2 (en) 2002-09-30 2011-09-06 Avaya Inc. Packet prioritization and associated bandwidth and buffer management techniques for audio over IP
US7877500B2 (en) 2002-09-30 2011-01-25 Avaya Inc. Packet prioritization and associated bandwidth and buffer management techniques for audio over IP
US7877501B2 (en) 2002-09-30 2011-01-25 Avaya Inc. Packet prioritization and associated bandwidth and buffer management techniques for audio over IP
US7978827B1 (en) 2004-06-30 2011-07-12 Avaya Inc. Automatic configuration of call handling based on end-user needs and characteristics
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US8930191B2 (en) 2006-09-08 2015-01-06 Apple Inc. Paraphrasing of user requests and results by automated digital assistant
US9117447B2 (en) 2006-09-08 2015-08-25 Apple Inc. Using event alert text as input to an automated assistant
US8942986B2 (en) 2006-09-08 2015-01-27 Apple Inc. Determining user intent based on ontologies of domains
US10568032B2 (en) 2007-04-03 2020-02-18 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US20080313607A1 (en) * 2007-06-15 2008-12-18 Microsoft Corporation Unified input stack
US20090089251A1 (en) * 2007-10-02 2009-04-02 Michael James Johnston Multimodal interface for searching multimedia content
US9330720B2 (en) 2008-01-03 2016-05-03 Apple Inc. Methods and apparatus for altering audio output signals
US10381016B2 (en) 2008-01-03 2019-08-13 Apple Inc. Methods and apparatus for altering audio output signals
US9865248B2 (en) 2008-04-05 2018-01-09 Apple Inc. Intelligent text-to-speech conversion
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9535906B2 (en) 2008-07-31 2017-01-03 Apple Inc. Mobile device having human language translation capability with positional feedback
US10108612B2 (en) 2008-07-31 2018-10-23 Apple Inc. Mobile device having human language translation capability with positional feedback
US20100083186A1 (en) * 2008-09-26 2010-04-01 Microsoft Corporation Magnifier panning interface for natural input devices
US20100079498A1 (en) * 2008-09-26 2010-04-01 Microsoft Corporation Multi-modal interaction for a screen magnifier
US9372590B2 (en) 2008-09-26 2016-06-21 Microsoft Technology Licensing, Llc Magnifier panning interface for natural input devices
US8176438B2 (en) 2008-09-26 2012-05-08 Microsoft Corporation Multi-modal interaction for a screen magnifier
US8218751B2 (en) 2008-09-29 2012-07-10 Avaya Inc. Method and apparatus for identifying and eliminating the source of background noise in multi-party teleconferences
US11487347B1 (en) * 2008-11-10 2022-11-01 Verint Americas Inc. Enhanced multi-modal communication
US9959870B2 (en) 2008-12-11 2018-05-01 Apple Inc. Speech recognition involving a mobile device
US20100241431A1 (en) * 2009-03-18 2010-09-23 Robert Bosch Gmbh System and Method for Multi-Modal Input Synchronization and Disambiguation
US9123341B2 (en) 2009-03-18 2015-09-01 Robert Bosch Gmbh System and method for multi-modal input synchronization and disambiguation
WO2010107526A1 (en) * 2009-03-18 2010-09-23 Robert Bosch Gmbh System and method for multi-modal input synchronization and disambiguation
US10795541B2 (en) 2009-06-05 2020-10-06 Apple Inc. Intelligent organization of tasks items
US10475446B2 (en) 2009-06-05 2019-11-12 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US11080012B2 (en) 2009-06-05 2021-08-03 Apple Inc. Interface for a virtual digital assistant
US9858925B2 (en) 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US10553209B2 (en) 2010-01-18 2020-02-04 Apple Inc. Systems and methods for hands-free notification summaries
US10706841B2 (en) 2010-01-18 2020-07-07 Apple Inc. Task flow identification based on user intent
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US10276170B2 (en) 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
US8892446B2 (en) 2010-01-18 2014-11-18 Apple Inc. Service orchestration for intelligent automated assistant
US8903716B2 (en) 2010-01-18 2014-12-02 Apple Inc. Personalized vocabulary for digital assistant
US10679605B2 (en) 2010-01-18 2020-06-09 Apple Inc. Hands-free list-reading by intelligent automated assistant
US10496753B2 (en) 2010-01-18 2019-12-03 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US10705794B2 (en) 2010-01-18 2020-07-07 Apple Inc. Automatically adapting user interfaces for hands-free interaction
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US9548050B2 (en) 2010-01-18 2017-01-17 Apple Inc. Intelligent automated assistant
US10049675B2 (en) 2010-02-25 2018-08-14 Apple Inc. User profiling for voice input processing
US9190062B2 (en) 2010-02-25 2015-11-17 Apple Inc. User profiling for voice input processing
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US20120078627A1 (en) * 2010-09-27 2012-03-29 Wagner Oliver P Electronic device with text error correction based on voice recognition data
US8719014B2 (en) * 2010-09-27 2014-05-06 Apple Inc. Electronic device with text error correction based on voice recognition data
US9075783B2 (en) * 2010-09-27 2015-07-07 Apple Inc. Electronic device with text error correction based on voice recognition data
US10762293B2 (en) 2010-12-22 2020-09-01 Apple Inc. Using parts-of-speech tagging and named entity recognition for spelling correction
US10102359B2 (en) 2011-03-21 2018-10-16 Apple Inc. Device access using voice authentication
US9262612B2 (en) 2011-03-21 2016-02-16 Apple Inc. Device access using voice authentication
US10057736B2 (en) 2011-06-03 2018-08-21 Apple Inc. Active transport based notifications
US10241644B2 (en) 2011-06-03 2019-03-26 Apple Inc. Actionable reminder entries
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US10241752B2 (en) 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US20160026434A1 (en) * 2011-12-01 2016-01-28 At&T Intellectual Property I, L.P. System and method for continuous multimodal speech and gesture interaction
US10540140B2 (en) * 2011-12-01 2020-01-21 Nuance Communications, Inc. System and method for continuous multimodal speech and gesture interaction
US9710223B2 (en) * 2011-12-01 2017-07-18 Nuance Communications, Inc. System and method for continuous multimodal speech and gesture interaction
US11189288B2 (en) * 2011-12-01 2021-11-30 Nuance Communications, Inc. System and method for continuous multimodal speech and gesture interaction
US20130144629A1 (en) * 2011-12-01 2013-06-06 At&T Intellectual Property I, L.P. System and method for continuous multimodal speech and gesture interaction
US20180004482A1 (en) * 2011-12-01 2018-01-04 Nuance Communications, Inc. System and method for continuous multimodal speech and gesture interaction
US9152376B2 (en) * 2011-12-01 2015-10-06 At&T Intellectual Property I, L.P. System and method for continuous multimodal speech and gesture interaction
US10134385B2 (en) 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
US9483461B2 (en) 2012-03-06 2016-11-01 Apple Inc. Handling speech synthesis of content for multiple languages
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US9495129B2 (en) 2012-06-29 2016-11-15 Apple Inc. Device, method, and user interface for voice-activated navigation and browsing of a document
US9093072B2 (en) 2012-07-20 2015-07-28 Microsoft Technology Licensing, Llc Speech and gesture recognition enhancement
WO2014015201A1 (en) * 2012-07-20 2014-01-23 Microsoft Corporation Speech and gesture recognition enhancement
US10037757B2 (en) * 2012-08-20 2018-07-31 Lg Innotek Co., Ltd. Voice recognition apparatus and method of recognizing voice
US20150187354A1 (en) * 2012-08-20 2015-07-02 Lg Innotek Co., Ltd. Voice recognition apparatus and method of recognizing voice
US9576574B2 (en) 2012-09-10 2017-02-21 Apple Inc. Context-sensitive handling of interruptions by intelligent digital assistant
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
WO2014070872A3 (en) * 2012-10-30 2014-06-26 Robert Bosch Gmbh System and method for multimodal interaction with reduced distraction in operating vehicles
US10671342B2 (en) 2013-01-31 2020-06-02 Huawei Technologies Co., Ltd. Non-contact gesture control method, and electronic terminal device
EP2942698A4 (en) * 2013-01-31 2016-09-07 Huawei Tech Co Ltd Non-contact gesture control method, and electronic terminal device
EP2942698A1 (en) * 2013-01-31 2015-11-11 Huawei Technologies Co., Ltd. Non-contact gesture control method, and electronic terminal device
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US10199051B2 (en) 2013-02-07 2019-02-05 Apple Inc. Voice trigger for a digital assistant
US9368114B2 (en) 2013-03-14 2016-06-14 Apple Inc. Context-sensitive handling of interruptions
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US9922642B2 (en) 2013-03-15 2018-03-20 Apple Inc. Training an at least partial voice command system
US9697822B1 (en) 2013-03-15 2017-07-04 Apple Inc. System and method for updating an adaptive speech recognition model
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9966060B2 (en) 2013-06-07 2018-05-08 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9582608B2 (en) 2013-06-07 2017-02-28 Apple Inc. Unified ranking with entropy-weighted information for phrase-based semantic auto-completion
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10657961B2 (en) 2013-06-08 2020-05-19 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US9300784B2 (en) 2013-06-13 2016-03-29 Apple Inc. System and method for emergency calls initiated by voice command
EP3011751A4 (en) * 2013-07-02 2017-03-01 Samsung Electronics Co., Ltd. Server, control method thereof, image processing apparatus, and control method thereof
US10140985B2 (en) 2013-07-02 2018-11-27 Samsung Electronics Co., Ltd. Server for processing speech, control method thereof, image processing apparatus, and control method thereof
US10791216B2 (en) 2013-08-06 2020-09-29 Apple Inc. Auto-activating smart responses based on activities from remote devices
US9766715B2 (en) 2014-05-01 2017-09-19 Seiko Epson Corporation Head-mount type display device, control system, method of controlling head-mount type display device, and computer program
US9965048B2 (en) 2014-05-01 2018-05-08 Seiko Epson Corporation Head-mount type display device, control system, method of controlling head-mount type display device, and computer program
US9620105B2 (en) 2014-05-15 2017-04-11 Apple Inc. Analyzing audio input for efficient speech and music recognition
US20150339098A1 (en) * 2014-05-21 2015-11-26 Samsung Electronics Co., Ltd. Display apparatus, remote control apparatus, system and controlling method thereof
US10592095B2 (en) 2014-05-23 2020-03-17 Apple Inc. Instantaneous speaking of content on touch devices
US9502031B2 (en) 2014-05-27 2016-11-22 Apple Inc. Method for supporting dynamic grammars in WFST-based ASR
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US10078631B2 (en) 2014-05-30 2018-09-18 Apple Inc. Entropy-guided text prediction using combined word and character n-gram language models
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US10289433B2 (en) 2014-05-30 2019-05-14 Apple Inc. Domain specific language for encoding assistant dialog
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US10169329B2 (en) 2014-05-30 2019-01-01 Apple Inc. Exemplar-based natural language processing
US9633004B2 (en) 2014-05-30 2017-04-25 Apple Inc. Better resolution when referencing to concepts
US10083690B2 (en) 2014-05-30 2018-09-25 Apple Inc. Better resolution when referencing to concepts
US10497365B2 (en) 2014-05-30 2019-12-03 Apple Inc. Multi-command single utterance input method
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9734193B2 (en) 2014-05-30 2017-08-15 Apple Inc. Determining domain salience ranking from ambiguous words in natural speech
US9760559B2 (en) 2014-05-30 2017-09-12 Apple Inc. Predictive text input
US9430463B2 (en) 2014-05-30 2016-08-30 Apple Inc. Exemplar-based natural language processing
US9785630B2 (en) 2014-05-30 2017-10-10 Apple Inc. Text prediction using combined word N-gram and unigram language models
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US10241753B2 (en) 2014-06-20 2019-03-26 Interdigital Ce Patent Holdings Apparatus and method for controlling the apparatus by a user
US10659851B2 (en) 2014-06-30 2020-05-19 Apple Inc. Real-time digital assistant knowledge updates
US9668024B2 (en) 2014-06-30 2017-05-30 Apple Inc. Intelligent automated assistant for TV user interactions
US10904611B2 (en) 2014-06-30 2021-01-26 Apple Inc. Intelligent automated assistant for TV user interactions
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US10446141B2 (en) 2014-08-28 2019-10-15 Apple Inc. Automatic speech recognition based on user feedback
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10431204B2 (en) 2014-09-11 2019-10-01 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9986419B2 (en) 2014-09-30 2018-05-29 Apple Inc. Social reminders
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US10552013B2 (en) 2014-12-02 2020-02-04 Apple Inc. Data detection
US11556230B2 (en) 2014-12-02 2023-01-17 Apple Inc. Data detection
US9711141B2 (en) 2014-12-09 2017-07-18 Apple Inc. Disambiguating heteronyms in speech synthesis
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US10311871B2 (en) 2015-03-08 2019-06-04 Apple Inc. Competing devices responding to voice triggers
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US10726197B2 (en) * 2015-03-26 2020-07-28 Lenovo (Singapore) Pte. Ltd. Text correction using a second input
US20160283453A1 (en) * 2015-03-26 2016-09-29 Lenovo (Singapore) Pte. Ltd. Text correction using a second input
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US11069347B2 (en) 2016-06-08 2021-07-20 Apple Inc. Intelligent automated assistant for media exploration
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
EP3493048A4 (en) * 2016-07-28 2019-07-03 Panasonic Intellectual Property Management Co., Ltd. Translation device and translation system
US20190026266A1 (en) * 2016-07-28 2019-01-24 Panasonic Intellectual Property Management Co., Ltd. Translation device and translation system
US10553215B2 (en) 2016-09-23 2020-02-04 Apple Inc. Intelligent automated assistant
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US11189270B2 (en) 2018-06-26 2021-11-30 Hitachi, Ltd. Method of controlling dialogue system, dialogue system, and data storage medium
US11209970B2 (en) 2018-10-30 2021-12-28 Banma Zhixing Network (Hongkong) Co., Limited Method, device, and system for providing an interface based on an interaction with a terminal

Also Published As

Publication number Publication date
JP4416643B2 (en) 2010-02-17
JP2006048628A (en) 2006-02-16
US7630901B2 (en) 2009-12-08

Similar Documents

Publication Publication Date Title
US7630901B2 (en) Multimodal input method
US11379663B2 (en) Multi-gesture text input prediction
US20210073467A1 (en) Method, System and Apparatus for Entering Text on a Computing Device
JP5819924B2 (en) Recognition architecture for generating Asian characters
US9489432B2 (en) System and method for using speech for data searching during presentations
TWI266280B (en) Multimodal disambiguation of speech recognition
US7844458B2 (en) Speech recognition for detecting setting instructions
US10445060B2 (en) System and method for controlling presentations using a multimodal interface
US8150699B2 (en) Systems and methods of a structured grammar for a speech recognition command system
US20050131686A1 (en) Information processing apparatus and data input method
US20090326938A1 (en) Multiword text correction
WO2017160341A1 (en) Dictation that allows editing
US20110301943A1 (en) System and method of dictation for a speech recognition command system
EP3491641B1 (en) Acoustic model training using corrected terms
JP3476007B2 (en) Recognition word registration method, speech recognition method, speech recognition device, storage medium storing software product for registration of recognition word, storage medium storing software product for speech recognition
JP6983118B2 (en) Dialogue system control methods, dialogue systems and programs
Zhao et al. Voice and touch based error-tolerant multimodal text editing and correction for smartphones
JP2011186994A (en) Character input device and character input method
Elliott et al. Computer based mathematics using continuous speech recognition
JP2002156996A (en) Voice recognition device, recognition result correcting method, and recording medium
CN107450742A (en) A kind of information processing method, device and terminal
JPH1091309A (en) Device and method for inputting and outputting information
Ricco et al. Visual Error Resolution Strategy for highly-structured text entry using Speech Recognition in FP6-ALLADIN project
AU2015221542A1 (en) Method system and apparatus for entering text on a computing device
JPH0319070A (en) Kana/kanji conversion system

Legal Events

Date Code Title Description
AS Assignment

Owner name: CANON KABUSHIKI KAISHA, JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OMI, HIROMI;REEL/FRAME:016701/0769

Effective date: 20050606

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.)

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20171208