USRE44418E1 - Techniques for disambiguating speech input using multimodal interfaces - Google Patents
- Publication number
- USRE44418E1
- Authority
- US
- United States
- Prior art keywords
- user
- speech
- input
- parameters
- application
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Definitions
- the present invention relates generally to the field of multimodal computing (and other electronic) devices, and, more particularly, to techniques for disambiguating speech input using multimodal interfaces.
- ASR: Automatic Speech Recognition
- ASR technology is powerful, but not fool-proof. That is, ASR systems do not always correctly recognize the user's utterance. This can occur due to a variety of factors including noisy environments, the speaker's accent and mispronunciation, microphone quality, etc.
- ASR systems function by matching the user's utterance against a grammar that defines the allowable words and phrases.
- The result of the ASR processing is one or more matching tokens, each with a corresponding measure of confidence that the user's utterance matches the text token.
- the presence of more than one matching token indicates that there is no clear best match to the user's speech.
- TTS: Text-to-Speech
- Multimodal interfaces represent a new technology that facilitates the use of multiple modalities (or modes) to interact with an application. Multimodal interfaces are potentially useful in improving the disambiguation of speech and substantially improving the end user experience.
- the present invention provides a Multimodal Disambiguation Mechanism (MDM), and particular multimodal techniques to improve the speech recognition process.
- MDM: Multimodal Disambiguation Mechanism
- This mechanism can be applied to many types of applications, software and hardware architectures, device types, and network technologies.
- A system preferably includes one or more of the following components: user input and/or output devices with various modes; a speech recognition engine; an application that uses the results of the speech recognition engine; and a multi-modal disambiguation engine.
- the different modes of input/output devices include visual and voice modes.
- Visual mode may use devices such as a visual display, stylus, pen, buttons, keyboard, touch pad, touch screen, mouse, etc.
- Voice mode may use devices such as a microphone (with an optional push-to-talk button), speakers, headphones, speakerphone, etc.
- the speech recognition engine may use a grammar or rules to interpret speech input, and may generate tokens based on the speech input (although speech recognition systems based on other mechanisms may be used—the use of any speech recognition mechanism is within the spirit and scope of the invention).
- the multi-modal disambiguation engine receives the results from the speech recognition engine and performs disambiguation tasks. A token representing the disambiguated speech input is then provided to the application.
- FIG. 1 is a functional block diagram of an example multi-modal disambiguation mechanism in accordance with aspects of the invention, and further shows a method of disambiguating speech.
- FIG. 2 is a flow diagram of an example process for disambiguating speech.
- FIG. 1 shows an overview of an example multi-modal disambiguation mechanism (MDM) 102 in accordance with the invention, and demonstrates a context in which an MDM 102 may be used.
- The MDM 102 shown in FIG. 1 is used to disambiguate an end user's spoken utterances 104 so that the data represented by those utterances 104 may be used as input to application programs 106.
- The end user 108 uses a speech interface to issue commands to the application 106.
- The user's utterances 104 (e.g., spoken words)
- SRE: speech recognition engine 110
- When the SRE 110 does not recognize the user's utterance 104 with high enough confidence, the multimodal disambiguation mechanism 102 is triggered to disambiguate the user's utterances 104 and pass the result on to the application 106.
- MDM 102 may disambiguate speech based on a set of parameters 114 that have been configured by a user or administrator. If no user and application options and parameters 114 are set, the MDM may use a set of default parameters.
- The selection algorithm (SA) 116 receives as input the set of alternatives that SRE 110 believes are the best match to the user's utterance.
- The SA 116 filters this set according to the options and parameters 114 (or a set of default parameters) and passes the result on to output generator 118.
- Output generator 118 preferably presents to the user a multimodal disambiguation panel, which renders a set of recognition alternatives 120, and the user 108 may use the panel to select the correct alternative.
- The user's selection 122 is received by input handler 124, which then passes the selected alternative to the output interface 126.
- The user's selection constitutes disambiguated input 128, which is then sent to application 106.
- the above-described process generally takes place transparently, in the sense that application 106 is generally unaware that the disambiguation process has taken place.
- End user 108 accesses MDM 102 and application 106 via an end user device which has multimodal input and output capabilities.
- MDM 102 and application 106 may reside on the end user device and/or may be available as a distributed service on other computer servers or workstations.
- MDM software on the end user device has the capability to enter, edit, and store the end user parameters 114, which govern the operations of MDM 102.
- The end user device has various multimodal input and output capabilities that may vary by the type of device. These capabilities are used by the MDM 102 software to present to the end user the recognition alternatives 120, and to accept and interpret the user selection input.
- Various types of input can be accepted, including speech, keypad, stylus, and touch input, depending on the end user device capabilities.
- the application can be any speech assisted application, or an application that accepts traditional text or event input.
- the application (or subcomponents of it) can be resident on the end user device and/or distributed across the end user device and other remote servers.
- The disambiguation mechanism can be entirely transparent to the user, or portions of the MDM 102 can be implemented within the application 106.
- Applications 106 can be written in various languages to use the MDM 102.
- A function of MDM 102 is to disambiguate the user's utterances 104 in the event that these utterances 104 are not recognized with sufficient confidence.
- The SRE 110 can be configured to return a set of alternatives 120 that the user's utterance 104 matches.
- The MDM 102 uses these alternatives 120 and the corresponding confidence levels to disambiguate the user's utterance 104.
- The output of the disambiguation process (i.e., the disambiguated user input)
- The MDM 102 can be guided and controlled by user and application parameters 114.
- The MDM comprises multiple components (e.g., components 110, 114, 116, 118, 124, and 126) that can be resident on the end user device or can be distributed on other computers on a network. Portions of the MDM 102 can be resident in the application 106. The components of the MDM 102 are described below.
- The end user 108 and the application 106 can both set parameters 114 to control the various sub-components of the MDM.
- the MDM combines the end user and application parameters to drive the MDM process.
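The text does not say how the end user and application parameters are combined. One possible reading is a layered override on top of the default parameters mentioned earlier. The sketch below assumes that reading; the parameter names, the default values, and the rule that end-user settings take precedence are all hypothetical and are not taken from the patent.

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical defaults: the source names no concrete parameters or values.
DEFAULT_PARAMETERS: Dict[str, Any] = {
    "unambiguous_threshold": 0.85,
    "close_match_threshold": 0.40,
    "max_alternatives": 5,
}


def combine_parameters(
    application: Optional[Mapping[str, Any]] = None,
    end_user: Optional[Mapping[str, Any]] = None,
) -> Dict[str, Any]:
    """Layer defaults, then application settings, then end-user settings."""
    combined = dict(DEFAULT_PARAMETERS)   # used as-is when nothing is set
    combined.update(application or {})
    combined.update(end_user or {})       # assumption: the end user wins ties
    return combined


# Usage: the application caps the list at 3, the end user raises it to 4.
params = combine_parameters(
    application={"max_alternatives": 3},
    end_user={"max_alternatives": 4},
)
print(params["max_alternatives"])  # 4
```

A different precedence (application over end user, or per-parameter rules) would fit the text equally well.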
- The SRE 110 takes as input the user utterance 104, a grammar to be matched against the utterance 104, and a set of parameters 114, such as the confidence thresholds governing unambiguous recognition and inclusion of close matches. If the utterance matches a token in the grammar with a confidence higher than the threshold for unambiguous recognition, the recognized utterance 104 is passed to the application. Otherwise, a set of alternatives with their confidence values is passed to the selection algorithm 116 to begin the disambiguation process. Preferably, any SRE 110 supporting automatic speech recognition that returns a list of alternatives with confidence values can be used.
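The two thresholds described above can be illustrated with a small routing function. This is a minimal sketch, not the patented implementation: the `Alternative` type, the `route` function, the 0 to 1 confidence scale, and the threshold values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Alternative:
    token: str         # text token matched in the grammar
    confidence: float  # assumed here to lie in [0, 1]


def route(alternatives: List[Alternative],
          unambiguous_threshold: float = 0.85,
          close_match_threshold: float = 0.40):
    """Return ("application", token) or ("selection_algorithm", close_matches)."""
    ranked = sorted(alternatives, key=lambda a: a.confidence, reverse=True)
    # A best match above the unambiguous threshold goes straight to the application.
    if ranked and ranked[0].confidence > unambiguous_threshold:
        return ("application", ranked[0].token)
    # Otherwise only the close matches are handed on for disambiguation.
    close = [a for a in ranked if a.confidence >= close_match_threshold]
    return ("selection_algorithm", close)


print(route([Alternative("Boston", 0.93), Alternative("Austin", 0.41)]))
# ('application', 'Boston')
```

With scores of 0.55 for "Boston" and 0.52 for "Austin", neither clears the first threshold, so both would be passed on as close matches.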
- SA: selection algorithm 116
- The selection algorithm 116 is invoked when the user's utterance is recognized with confidence below the threshold for unambiguous recognition.
- The SA 116 calculates the alternatives to be passed to the user based on the individual confidence values and the application and user parameters, though other factors may also be involved in determining the alternatives.
- Output generator (OG) 118 takes as input the alternatives calculated by the SA 116, and presents these to the end user, who will select one alternative to be returned to the application.
- User and application parameters control the presentation to the user and the user disambiguation method (UDM) to be used.
- UDMs are of three overall classes: visual only, voice only, and multimodal. Within these classes, there are multiple types of UDMs that can be used.
- IH: input handler 124
- The input action can be multimodal, i.e., the user can take voice or visual action, or perhaps a combination of the two.
- The IH 124 will handle this multimodal user selection and determine which alternative has been selected by the user. Allowable user actions are based on the types of UDMs used. A combination of multimodal UDMs can be utilized. It should be noted that it may be particularly useful to allow the user to interact with the alternatives in plural modes (e.g., visual and voice modes).
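As an illustration of how an input handler might resolve a selection made in either mode, the sketch below accepts a tap on the panel or a spoken reply. The function name `resolve_selection`, its arguments, and the two spoken forms it understands (a list position or the alternative itself) are hypothetical; the source does not specify which user actions a given UDM allows.

```python
from typing import List, Optional


def resolve_selection(alternatives: List[str],
                      tapped_index: Optional[int] = None,
                      spoken: Optional[str] = None) -> Optional[str]:
    """Return the alternative the user picked, or None if it cannot be determined."""
    # Visual action: the user tapped or clicked an entry in the panel.
    if tapped_index is not None and 0 <= tapped_index < len(alternatives):
        return alternatives[tapped_index]
    if spoken:
        words = spoken.strip().lower()
        # Voice action, form 1: the user says the item's position, e.g. "2".
        if words.isdecimal() and 1 <= int(words) <= len(alternatives):
            return alternatives[int(words) - 1]
        # Voice action, form 2: the user repeats the alternative itself.
        for alt in alternatives:
            if alt.lower() == words:
                return alt
    return None


panel = ["Boston", "Austin", "Houston"]
print(resolve_selection(panel, tapped_index=1))  # Austin
print(resolve_selection(panel, spoken="3"))      # Houston
```

Here a valid tap takes priority over speech, which is one arbitrary way to handle a combined action; a real handler could weigh the two inputs differently.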
- FIG. 2 shows an example process of disambiguating speech in the form of a flow diagram.
- Speech input is received (202), e.g., by a user speaking into a microphone.
- A speech recognition engine attempts to recognize the speech. If the speech is recognized unambiguously (204), then the unambiguous speech is provided as input to an application (206). If, however, the speech is not recognized unambiguously, then a list of possible alternatives is determined (208).
- The list of alternatives may, for example, be the set of possible tokens identified by the speech recognition engine whose confidence value exceeds some defined threshold.
- The list of alternatives may also be filtered according to a set of parameters.
- The list of alternatives is presented to a user in a multi-modal interaction (210). The user then selects one of the alternatives, and the selected alternative is provided to the application as input (212).
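The FIG. 2 flow can be sketched end to end as a single function, with the reference numerals from the flow diagram noted in comments. This is a simplified illustration under stated assumptions: recognizer output is modelled as (token, confidence) pairs, the `choose` callback stands in for the multimodal interaction, and the threshold values and `max_alternatives` cap are hypothetical parameters.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (token, confidence), as returned for step 202


def disambiguate(candidates: List[Candidate],
                 choose: Callable[[List[str]], str],
                 unambiguous_threshold: float = 0.85,
                 inclusion_threshold: float = 0.40,
                 max_alternatives: int = 5) -> str:
    """Return the token to hand to the application."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    # 204/206: a sufficiently confident best match bypasses the user.
    if ranked and ranked[0][1] > unambiguous_threshold:
        return ranked[0][0]
    # 208: keep tokens whose confidence exceeds the inclusion threshold,
    # then filter by parameters (here, a cap on the list length).
    alternatives = [t for t, c in ranked if c > inclusion_threshold][:max_alternatives]
    if not alternatives:
        raise ValueError("no alternatives to present")
    # 210: present the list; `choose` stands in for the multimodal panel.
    selected = choose(alternatives)
    if selected not in alternatives:
        raise ValueError("selection is not one of the presented alternatives")
    # 212: the selected alternative becomes the application's input.
    return selected


# Usage: the "user" always picks the second entry shown.
result = disambiguate([("Austin", 0.52), ("Boston", 0.55), ("Houston", 0.10)],
                      choose=lambda shown: shown[1])
print(result)  # Austin
```

Because the function returns a plain token in both branches, the caller cannot tell whether the user was consulted, which mirrors the transparency to the application described for FIG. 1.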
Claims (25)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/429,187 USRE44418E1 (en) | 2002-12-10 | 2012-03-23 | Techniques for disambiguating speech input using multimodal interfaces |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US43222702P | 2002-12-10 | 2002-12-10 | |
US10/733,793 US7684985B2 (en) | 2002-12-10 | 2003-12-10 | Techniques for disambiguating speech input using multimodal interfaces |
US13/429,187 USRE44418E1 (en) | 2002-12-10 | 2012-03-23 | Techniques for disambiguating speech input using multimodal interfaces |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/733,793 Reissue US7684985B2 (en) | 2002-12-10 | 2003-12-10 | Techniques for disambiguating speech input using multimodal interfaces |
Publications (1)
Publication Number | Publication Date |
---|---|
USRE44418E1 (en) | 2013-08-06 |
Family
ID=32507875
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/733,793 Ceased US7684985B2 (en) | 2002-12-10 | 2003-12-10 | Techniques for disambiguating speech input using multimodal interfaces |
US13/429,187 Active 2026-06-07 USRE44418E1 (en) | 2002-12-10 | 2012-03-23 | Techniques for disambiguating speech input using multimodal interfaces |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/733,793 Ceased US7684985B2 (en) | 2002-12-10 | 2003-12-10 | Techniques for disambiguating speech input using multimodal interfaces |
Country Status (4)
Country | Link |
---|---|
US (2) | US7684985B2 (en) |
EP (2) | EP1614102A4 (en) |
AU (1) | AU2003296981A1 (en) |
WO (1) | WO2004053836A1 (en) |
Families Citing this family (310)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001013255A2 (en) * | 1999-08-13 | 2001-02-22 | Pixo, Inc. | Displaying and traversing links in character array |
US8645137B2 (en) | 2000-03-16 | 2014-02-04 | Apple Inc. | Fast, language-independent method for user authentication by voice |
ITFI20010199A1 (en) * | 2001-10-22 | 2003-04-22 | Riccardo Vieri | SYSTEM AND METHOD TO TRANSFORM TEXTUAL COMMUNICATIONS INTO VOICE AND SEND THEM WITH AN INTERNET CONNECTION TO ANY TELEPHONE SYSTEM |
US7398209B2 (en) | 2002-06-03 | 2008-07-08 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance |
US7693720B2 (en) | 2002-07-15 | 2010-04-06 | Voicebox Technologies, Inc. | Mobile systems and methods for responding to natural language speech utterance |
US7884804B2 (en) | 2003-04-30 | 2011-02-08 | Microsoft Corporation | Keyboard with input-sensitive display device |
US7669134B1 (en) | 2003-05-02 | 2010-02-23 | Apple Inc. | Method and apparatus for displaying information during an instant messaging session |
JP4012143B2 (en) * | 2003-12-16 | 2007-11-21 | キヤノン株式会社 | Information processing apparatus and data input method |
TWI253298B (en) * | 2004-02-09 | 2006-04-11 | Delta Electronics Inc | Video device with voice-assisted system |
US8942985B2 (en) | 2004-11-16 | 2015-01-27 | Microsoft Corporation | Centralized method and system for clarifying voice commands |
US8775459B2 (en) * | 2005-01-07 | 2014-07-08 | International Business Machines Corporation | Method and apparatus for robust input interpretation by conversation systems |
GB0504568D0 (en) * | 2005-03-04 | 2005-04-13 | Vida Software S L | User interfaces for electronic devices |
WO2006128248A1 (en) * | 2005-06-02 | 2006-12-07 | National Ict Australia Limited | Multimodal computer navigation |
US20060293889A1 (en) * | 2005-06-27 | 2006-12-28 | Nokia Corporation | Error correction for speech recognition systems |
US20070016420A1 (en) * | 2005-07-07 | 2007-01-18 | International Business Machines Corporation | Dictionary lookup for mobile devices using spelling recognition |
US20070016421A1 (en) * | 2005-07-12 | 2007-01-18 | Nokia Corporation | Correcting a pronunciation of a synthetically generated speech object |
US7640160B2 (en) | 2005-08-05 | 2009-12-29 | Voicebox Technologies, Inc. | Systems and methods for responding to natural language speech utterance |
US7949529B2 (en) * | 2005-08-29 | 2011-05-24 | Voicebox Technologies, Inc. | Mobile systems and methods of supporting natural language human-machine interactions |
US8677377B2 (en) | 2005-09-08 | 2014-03-18 | Apple Inc. | Method and apparatus for building an intelligent automated assistant |
US7633076B2 (en) | 2005-09-30 | 2009-12-15 | Apple Inc. | Automated response to and sensing of user activity in portable devices |
JP4878471B2 (en) * | 2005-11-02 | 2012-02-15 | キヤノン株式会社 | Information processing apparatus and control method thereof |
US7792971B2 (en) * | 2005-12-08 | 2010-09-07 | International Business Machines Corporation | Visual channel refresh rate control for composite services delivery |
US20070136421A1 (en) * | 2005-12-08 | 2007-06-14 | International Business Machines Corporation | Synchronized view state for composite services delivery |
US11093898B2 (en) | 2005-12-08 | 2021-08-17 | International Business Machines Corporation | Solution for adding context to a text exchange modality during interactions with a composite services application |
US20070133769A1 (en) * | 2005-12-08 | 2007-06-14 | International Business Machines Corporation | Voice navigation of a visual view for a session in a composite services enablement environment |
US20070133773A1 (en) * | 2005-12-08 | 2007-06-14 | International Business Machines Corporation | Composite services delivery |
US7827288B2 (en) * | 2005-12-08 | 2010-11-02 | International Business Machines Corporation | Model autocompletion for composite services synchronization |
US20070133512A1 (en) * | 2005-12-08 | 2007-06-14 | International Business Machines Corporation | Composite services enablement of visual navigation into a call center |
US8189563B2 (en) | 2005-12-08 | 2012-05-29 | International Business Machines Corporation | View coordination for callers in a composite services enablement environment |
US20070133509A1 (en) * | 2005-12-08 | 2007-06-14 | International Business Machines Corporation | Initiating voice access to a session from a visual access channel to the session in a composite services delivery system |
US20070132834A1 (en) * | 2005-12-08 | 2007-06-14 | International Business Machines Corporation | Speech disambiguation in a composite services enablement environment |
US8259923B2 (en) | 2007-02-28 | 2012-09-04 | International Business Machines Corporation | Implementing a contact center using open standards and non-proprietary components |
US20070147355A1 (en) * | 2005-12-08 | 2007-06-28 | International Business Machines Corporation | Composite services generation tool |
US7890635B2 (en) * | 2005-12-08 | 2011-02-15 | International Business Machines Corporation | Selective view synchronization for composite services delivery |
US7818432B2 (en) * | 2005-12-08 | 2010-10-19 | International Business Machines Corporation | Seamless reflection of model updates in a visual page for a visual channel in a composite services delivery system |
US7877486B2 (en) * | 2005-12-08 | 2011-01-25 | International Business Machines Corporation | Auto-establishment of a voice channel of access to a session for a composite service from a visual channel of access to the session for the composite service |
US10332071B2 (en) * | 2005-12-08 | 2019-06-25 | International Business Machines Corporation | Solution for adding context to a text exchange modality during interactions with a composite services application |
US8005934B2 (en) * | 2005-12-08 | 2011-08-23 | International Business Machines Corporation | Channel presence in a composite services enablement environment |
US7809838B2 (en) * | 2005-12-08 | 2010-10-05 | International Business Machines Corporation | Managing concurrent data updates in a composite services delivery system |
US20070136449A1 (en) * | 2005-12-08 | 2007-06-14 | International Business Machines Corporation | Update notification for peer views in a composite services delivery environment |
US7925975B2 (en) | 2006-03-10 | 2011-04-12 | Microsoft Corporation | Searching for commands to execute in applications |
US9318108B2 (en) | 2010-01-18 | 2016-04-19 | Apple Inc. | Intelligent automated assistant |
US8073681B2 (en) | 2006-10-16 | 2011-12-06 | Voicebox Technologies, Inc. | System and method for a cooperative conversational voice user interface |
WO2008064137A2 (en) * | 2006-11-17 | 2008-05-29 | Rao Ashwin P | Predictive speech-to-text input |
US7904298B2 (en) * | 2006-11-17 | 2011-03-08 | Rao Ashwin P | Predictive speech-to-text input |
US9830912B2 (en) | 2006-11-30 | 2017-11-28 | Ashwin P Rao | Speak and touch auto correction interface |
US8594305B2 (en) * | 2006-12-22 | 2013-11-26 | International Business Machines Corporation | Enhancing contact centers with dialog contracts |
US8195448B2 (en) * | 2006-12-28 | 2012-06-05 | John Paisley Dargan | Method and apparatus for predicting text |
US7818176B2 (en) | 2007-02-06 | 2010-10-19 | Voicebox Technologies, Inc. | System and method for selecting and presenting advertisements based on natural language processing of voice-based input |
US7912828B2 (en) * | 2007-02-23 | 2011-03-22 | Apple Inc. | Pattern searching methods and apparatuses |
US9247056B2 (en) * | 2007-02-28 | 2016-01-26 | International Business Machines Corporation | Identifying contact center agents based upon biometric characteristics of an agent's speech |
US9055150B2 (en) | 2007-02-28 | 2015-06-09 | International Business Machines Corporation | Skills based routing in a standards based contact center using a presence server and expertise specific watchers |
US8219406B2 (en) * | 2007-03-15 | 2012-07-10 | Microsoft Corporation | Speech-centric multimodal user interface design in mobile technology |
US8977255B2 (en) | 2007-04-03 | 2015-03-10 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US9794348B2 (en) | 2007-06-04 | 2017-10-17 | Todd R. Smith | Using voice commands from a mobile device to remotely access and control a computer |
ITFI20070177A1 (en) | 2007-07-26 | 2009-01-27 | Riccardo Vieri | SYSTEM FOR THE CREATION AND SETTING OF AN ADVERTISING CAMPAIGN DERIVING FROM THE INSERTION OF ADVERTISING MESSAGES WITHIN AN EXCHANGE OF MESSAGES AND METHOD FOR ITS FUNCTIONING. |
US9053089B2 (en) | 2007-10-02 | 2015-06-09 | Apple Inc. | Part-of-speech tagging using latent analogy |
US8165886B1 (en) | 2007-10-04 | 2012-04-24 | Great Northern Research LLC | Speech interface system and method for control and interaction with applications on a computing system |
US8595642B1 (en) | 2007-10-04 | 2013-11-26 | Great Northern Research, LLC | Multiple shell multi faceted graphical user interface |
US8364694B2 (en) | 2007-10-26 | 2013-01-29 | Apple Inc. | Search assistant for digital media assets |
US8620662B2 (en) | 2007-11-20 | 2013-12-31 | Apple Inc. | Context-aware unit selection |
US8140335B2 (en) | 2007-12-11 | 2012-03-20 | Voicebox Technologies, Inc. | System and method for providing a natural language voice user interface in an integrated voice navigation services environment |
US10002189B2 (en) | 2007-12-20 | 2018-06-19 | Apple Inc. | Method and apparatus for searching using an active ontology |
US10133372B2 (en) * | 2007-12-20 | 2018-11-20 | Nokia Technologies Oy | User device having sequential multimodal output user interface |
US9330720B2 (en) | 2008-01-03 | 2016-05-03 | Apple Inc. | Methods and apparatus for altering audio output signals |
US8327272B2 (en) | 2008-01-06 | 2012-12-04 | Apple Inc. | Portable multifunction device, method, and graphical user interface for viewing and managing electronic calendars |
US8065143B2 (en) | 2008-02-22 | 2011-11-22 | Apple Inc. | Providing text input using speech data and non-speech data |
US8289283B2 (en) * | 2008-03-04 | 2012-10-16 | Apple Inc. | Language input interface on a device |
US8224656B2 (en) * | 2008-03-14 | 2012-07-17 | Microsoft Corporation | Speech recognition disambiguation on mobile devices |
US8996376B2 (en) | 2008-04-05 | 2015-03-31 | Apple Inc. | Intelligent text-to-speech conversion |
US10496753B2 (en) | 2010-01-18 | 2019-12-03 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US9305548B2 (en) | 2008-05-27 | 2016-04-05 | Voicebox Technologies Corporation | System and method for an integrated, multi-modal, multi-device natural language voice services environment |
US8464150B2 (en) | 2008-06-07 | 2013-06-11 | Apple Inc. | Automatic language identification for dynamic text processing |
US20100030549A1 (en) | 2008-07-31 | 2010-02-04 | Lee Michael M | Mobile device having human language translation capability with positional feedback |
US8768702B2 (en) | 2008-09-05 | 2014-07-01 | Apple Inc. | Multi-tiered voice feedback in an electronic device |
US8898568B2 (en) | 2008-09-09 | 2014-11-25 | Apple Inc. | Audio user interface |
US8352260B2 (en) * | 2008-09-10 | 2013-01-08 | Jun Hyung Sung | Multimodal unification of articulation for device interfacing |
US8583418B2 (en) | 2008-09-29 | 2013-11-12 | Apple Inc. | Systems and methods of detecting language and natural language strings for text to speech synthesis |
US8352272B2 (en) * | 2008-09-29 | 2013-01-08 | Apple Inc. | Systems and methods for text to speech synthesis |
US8396714B2 (en) * | 2008-09-29 | 2013-03-12 | Apple Inc. | Systems and methods for concatenation of words in text to speech synthesis |
US8352268B2 (en) * | 2008-09-29 | 2013-01-08 | Apple Inc. | Systems and methods for selective rate of speech and speech preferences for text to speech synthesis |
US8355919B2 (en) * | 2008-09-29 | 2013-01-15 | Apple Inc. | Systems and methods for text normalization for text to speech synthesis |
US20100082328A1 (en) * | 2008-09-29 | 2010-04-01 | Apple Inc. | Systems and methods for speech preprocessing in text to speech synthesis |
US8712776B2 (en) | 2008-09-29 | 2014-04-29 | Apple Inc. | Systems and methods for selective text to speech synthesis |
US8676904B2 (en) | 2008-10-02 | 2014-03-18 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US9922640B2 (en) | 2008-10-17 | 2018-03-20 | Ashwin P Rao | System and method for multimodal utterance detection |
US8145484B2 (en) * | 2008-11-11 | 2012-03-27 | Microsoft Corporation | Speech processing with predictive language modeling |
WO2010067118A1 (en) | 2008-12-11 | 2010-06-17 | Novauris Technologies Limited | Speech recognition involving a mobile device |
US8862252B2 (en) | 2009-01-30 | 2014-10-14 | Apple Inc. | Audio user interface for displayless electronic device |
US8326637B2 (en) | 2009-02-20 | 2012-12-04 | Voicebox Technologies, Inc. | System and method for processing multi-modal device interactions in a natural language voice services environment |
US8380507B2 (en) * | 2009-03-09 | 2013-02-19 | Apple Inc. | Systems and methods for determining the language to use for speech generated by a text to speech engine |
US20120311585A1 (en) | 2011-06-03 | 2012-12-06 | Apple Inc. | Organizing task items that represent tasks to perform |
US10241752B2 (en) | 2011-09-30 | 2019-03-26 | Apple Inc. | Interface for a virtual digital assistant |
US10241644B2 (en) | 2011-06-03 | 2019-03-26 | Apple Inc. | Actionable reminder entries |
US10540976B2 (en) | 2009-06-05 | 2020-01-21 | Apple Inc. | Contextual voice commands |
US9858925B2 (en) | 2009-06-05 | 2018-01-02 | Apple Inc. | Using context information to facilitate processing of commands in a virtual assistant |
US9431006B2 (en) | 2009-07-02 | 2016-08-30 | Apple Inc. | Methods and apparatuses for automatic speech recognition |
US20110010179A1 (en) * | 2009-07-13 | 2011-01-13 | Naik Devang K | Voice synthesis and processing |
US9171541B2 (en) | 2009-11-10 | 2015-10-27 | Voicebox Technologies Corporation | System and method for hybrid processing in a natural language voice services environment |
US8682649B2 (en) | 2009-11-12 | 2014-03-25 | Apple Inc. | Sentiment prediction from textual data |
DE102009059792A1 (en) * | 2009-12-21 | 2011-06-22 | Continental Automotive GmbH, 30165 | Method and device for operating technical equipment, in particular a motor vehicle |
US11416214B2 (en) | 2009-12-23 | 2022-08-16 | Google Llc | Multi-modal input on an electronic device |
EP3091535B1 (en) | 2009-12-23 | 2023-10-11 | Google LLC | Multi-modal input on an electronic device |
US8600743B2 (en) | 2010-01-06 | 2013-12-03 | Apple Inc. | Noise profile determination for voice-related feature |
US8311838B2 (en) | 2010-01-13 | 2012-11-13 | Apple Inc. | Devices and methods for identifying a prompt corresponding to a voice input in a sequence of prompts |
US8381107B2 (en) | 2010-01-13 | 2013-02-19 | Apple Inc. | Adaptive audio feedback system and method |
US10553209B2 (en) | 2010-01-18 | 2020-02-04 | Apple Inc. | Systems and methods for hands-free notification summaries |
US10276170B2 (en) | 2010-01-18 | 2019-04-30 | Apple Inc. | Intelligent automated assistant |
US10705794B2 (en) | 2010-01-18 | 2020-07-07 | Apple Inc. | Automatically adapting user interfaces for hands-free interaction |
US10679605B2 (en) | 2010-01-18 | 2020-06-09 | Apple Inc. | Hands-free list-reading by intelligent automated assistant |
WO2011089450A2 (en) | 2010-01-25 | 2011-07-28 | Andrew Peter Nelson Jerram | Apparatuses, methods and systems for a digital conversation management platform |
US8682667B2 (en) | 2010-02-25 | 2014-03-25 | Apple Inc. | User profiling for selecting user specific voice input processing information |
US8639516B2 (en) | 2010-06-04 | 2014-01-28 | Apple Inc. | User-specific noise suppression for voice quality improvements |
US8713021B2 (en) | 2010-07-07 | 2014-04-29 | Apple Inc. | Unsupervised document clustering using latent semantic density analysis |
US9104670B2 (en) | 2010-07-21 | 2015-08-11 | Apple Inc. | Customized search or acquisition of digital media assets |
US8473289B2 (en) | 2010-08-06 | 2013-06-25 | Google Inc. | Disambiguating input based on context |
US8719006B2 (en) | 2010-08-27 | 2014-05-06 | Apple Inc. | Combined statistical and rule-based part-of-speech tagging for text-to-speech synthesis |
US8719014B2 (en) | 2010-09-27 | 2014-05-06 | Apple Inc. | Electronic device with text error correction based on voice recognition data |
US10762293B2 (en) | 2010-12-22 | 2020-09-01 | Apple Inc. | Using parts-of-speech tagging and named entity recognition for spelling correction |
US10515147B2 (en) | 2010-12-22 | 2019-12-24 | Apple Inc. | Using statistical language models for contextual lookup |
US8352245B1 (en) | 2010-12-30 | 2013-01-08 | Google Inc. | Adjusting language models |
US8296142B2 (en) | 2011-01-21 | 2012-10-23 | Google Inc. | Speech recognition using dock context |
US8781836B2 (en) | 2011-02-22 | 2014-07-15 | Apple Inc. | Hearing assistance system for providing consistent human speech |
US9262612B2 (en) | 2011-03-21 | 2016-02-16 | Apple Inc. | Device access using voice authentication |
US20120310642A1 (en) | 2011-06-03 | 2012-12-06 | Apple Inc. | Automatically creating a mapping between text data and audio data |
US10057736B2 (en) | 2011-06-03 | 2018-08-21 | Apple Inc. | Active transport based notifications |
US8812294B2 (en) | 2011-06-21 | 2014-08-19 | Apple Inc. | Translating phrases from one language into another using an order-based set of declarative rules |
US8706472B2 (en) | 2011-08-11 | 2014-04-22 | Apple Inc. | Method for disambiguating multiple readings in language conversion |
US8994660B2 (en) | 2011-08-29 | 2015-03-31 | Apple Inc. | Text correction processing |
US8762156B2 (en) | 2011-09-28 | 2014-06-24 | Apple Inc. | Speech recognition repair using contextual information |
US10134385B2 (en) | 2012-03-02 | 2018-11-20 | Apple Inc. | Systems and methods for name pronunciation |
US9483461B2 (en) | 2012-03-06 | 2016-11-01 | Apple Inc. | Handling speech synthesis of content for multiple languages |
US9280610B2 (en) | 2012-05-14 | 2016-03-08 | Apple Inc. | Crowd sourcing information to fulfill user requests |
US10417037B2 (en) | 2012-05-15 | 2019-09-17 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US8775442B2 (en) | 2012-05-15 | 2014-07-08 | Apple Inc. | Semantic search using a single-source semantic model |
WO2013170383A1 (en) | 2012-05-16 | 2013-11-21 | Xtreme Interactions Inc. | System, device and method for processing interlaced multimodal user input |
WO2013185109A2 (en) | 2012-06-08 | 2013-12-12 | Apple Inc. | Systems and methods for recognizing textual identifiers within a plurality of words |
US9721563B2 (en) | 2012-06-08 | 2017-08-01 | Apple Inc. | Name recognition system |
US9495129B2 (en) | 2012-06-29 | 2016-11-15 | Apple Inc. | Device, method, and user interface for voice-activated navigation and browsing of a document |
US9576574B2 (en) | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9547647B2 (en) | 2012-09-19 | 2017-01-17 | Apple Inc. | Voice-based media searching |
US8935167B2 (en) | 2012-09-25 | 2015-01-13 | Apple Inc. | Exemplar-based latent perceptual modeling for automatic speech recognition |
KR102516577B1 (en) | 2013-02-07 | 2023-04-03 | 애플 인크. | Voice trigger for a digital assistant |
US10572476B2 (en) | 2013-03-14 | 2020-02-25 | Apple Inc. | Refining a search based on schedule items |
US9733821B2 (en) | 2013-03-14 | 2017-08-15 | Apple Inc. | Voice control to diagnose inadvertent activation of accessibility features |
US10642574B2 (en) | 2013-03-14 | 2020-05-05 | Apple Inc. | Device, method, and graphical user interface for outputting captions |
US9368114B2 (en) | 2013-03-14 | 2016-06-14 | Apple Inc. | Context-sensitive handling of interruptions |
US10652394B2 (en) | 2013-03-14 | 2020-05-12 | Apple Inc. | System and method for processing voicemail |
US9977779B2 (en) | 2013-03-14 | 2018-05-22 | Apple Inc. | Automatic supplementation of word correction dictionaries |
WO2014144949A2 (en) | 2013-03-15 | 2014-09-18 | Apple Inc. | Training an at least partial voice command system |
CN112230878A (en) | 2013-03-15 | 2021-01-15 | 苹果公司 | Context-sensitive handling of interrupts |
WO2014144579A1 (en) | 2013-03-15 | 2014-09-18 | Apple Inc. | System and method for updating an adaptive speech recognition model |
US10748529B1 (en) | 2013-03-15 | 2020-08-18 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11151899B2 (en) | 2013-03-15 | 2021-10-19 | Apple Inc. | User training by intelligent digital assistant |
WO2014197336A1 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
WO2014197334A2 (en) | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US9582608B2 (en) | 2013-06-07 | 2017-02-28 | Apple Inc. | Unified ranking with entropy-weighted information for phrase-based semantic auto-completion |
WO2014197335A1 (en) | 2013-06-08 | 2014-12-11 | Apple Inc. | Interpreting and acting upon commands that involve sharing information with remote devices |
EP3008641A1 (en) | 2013-06-09 | 2016-04-20 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US10176167B2 (en) | 2013-06-09 | 2019-01-08 | Apple Inc. | System and method for inferring user intent from speech inputs |
CN105265005B (en) | 2013-06-13 | 2019-09-17 | 苹果公司 | System and method for the urgent call initiated by voice command |
WO2015020942A1 (en) | 2013-08-06 | 2015-02-12 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
GB2518002B (en) * | 2013-09-10 | 2017-03-29 | Jaguar Land Rover Ltd | Vehicle interface system |
US10296160B2 (en) | 2013-12-06 | 2019-05-21 | Apple Inc. | Method for extracting salient dialog usage from live data |
US9842592B2 (en) | 2014-02-12 | 2017-12-12 | Google Inc. | Language models using non-linguistic context |
US9412365B2 (en) | 2014-03-24 | 2016-08-09 | Google Inc. | Enhanced maximum entropy models |
US9620105B2 (en) | 2014-05-15 | 2017-04-11 | Apple Inc. | Analyzing audio input for efficient speech and music recognition |
US10592095B2 (en) | 2014-05-23 | 2020-03-17 | Apple Inc. | Instantaneous speaking of content on touch devices |
US9502031B2 (en) | 2014-05-27 | 2016-11-22 | Apple Inc. | Method for supporting dynamic grammars in WFST-based ASR |
US10289433B2 (en) | 2014-05-30 | 2019-05-14 | Apple Inc. | Domain specific language for encoding assistant dialog |
US9760559B2 (en) | 2014-05-30 | 2017-09-12 | Apple Inc. | Predictive text input |
US9430463B2 (en) | 2014-05-30 | 2016-08-30 | Apple Inc. | Exemplar-based natural language processing |
US9734193B2 (en) | 2014-05-30 | 2017-08-15 | Apple Inc. | Determining domain salience ranking from ambiguous words in natural speech |
US9785630B2 (en) | 2014-05-30 | 2017-10-10 | Apple Inc. | Text prediction using combined word N-gram and unigram language models |
US9715875B2 (en) | 2014-05-30 | 2017-07-25 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
EP3149728B1 (en) | 2014-05-30 | 2019-01-16 | Apple Inc. | Multi-command single utterance input method |
US10170123B2 (en) | 2014-05-30 | 2019-01-01 | Apple Inc. | Intelligent assistant for home automation |
US9633004B2 (en) | 2014-05-30 | 2017-04-25 | Apple Inc. | Better resolution when referencing to concepts |
US9842101B2 (en) | 2014-05-30 | 2017-12-12 | Apple Inc. | Predictive conversion of language input |
US10078631B2 (en) | 2014-05-30 | 2018-09-18 | Apple Inc. | Entropy-guided text prediction using combined word and character n-gram language models |
US10659851B2 (en) | 2014-06-30 | 2020-05-19 | Apple Inc. | Real-time digital assistant knowledge updates |
US9338493B2 (en) | 2014-06-30 | 2016-05-10 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US10446141B2 (en) | 2014-08-28 | 2019-10-15 | Apple Inc. | Automatic speech recognition based on user feedback |
US9818400B2 (en) | 2014-09-11 | 2017-11-14 | Apple Inc. | Method and apparatus for discovering trending terms in speech requests |
US10789041B2 (en) | 2014-09-12 | 2020-09-29 | Apple Inc. | Dynamic thresholds for always listening speech trigger |
EP3195145A4 (en) | 2014-09-16 | 2018-01-24 | VoiceBox Technologies Corporation | Voice commerce |
US9898459B2 (en) | 2014-09-16 | 2018-02-20 | Voicebox Technologies Corporation | Integration of domain information into state transitions of a finite state transducer for natural language processing |
US9668121B2 (en) | 2014-09-30 | 2017-05-30 | Apple Inc. | Social reminders |
US9886432B2 (en) | 2014-09-30 | 2018-02-06 | Apple Inc. | Parsimonious handling of word inflection via categorical stem + suffix N-gram language models |
US10127911B2 (en) | 2014-09-30 | 2018-11-13 | Apple Inc. | Speaker identification and unsupervised speaker adaptation techniques |
US9646609B2 (en) | 2014-09-30 | 2017-05-09 | Apple Inc. | Caching apparatus for serving phonetic pronunciations |
US10074360B2 (en) | 2014-09-30 | 2018-09-11 | Apple Inc. | Providing an indication of the suitability of speech recognition |
CN107003999B (en) | 2014-10-15 | 2020-08-21 | 声钰科技 | System and method for subsequent response to a user's prior natural language input |
US10431214B2 (en) | 2014-11-26 | 2019-10-01 | Voicebox Technologies Corporation | System and method of determining a domain and/or an action related to a natural language input |
US10614799B2 (en) | 2014-11-26 | 2020-04-07 | Voicebox Technologies Corporation | System and method of providing intent predictions for an utterance prior to a system detection of an end of the utterance |
US10552013B2 (en) | 2014-12-02 | 2020-02-04 | Apple Inc. | Data detection |
US9711141B2 (en) | 2014-12-09 | 2017-07-18 | Apple Inc. | Disambiguating heteronyms in speech synthesis |
US9633661B1 (en) | 2015-02-02 | 2017-04-25 | Amazon Technologies, Inc. | Speech-responsive portable speaker |
US9865280B2 (en) | 2015-03-06 | 2018-01-09 | Apple Inc. | Structured dictation using intelligent automated assistants |
US10152299B2 (en) | 2015-03-06 | 2018-12-11 | Apple Inc. | Reducing response latency of intelligent automated assistants |
US10567477B2 (en) | 2015-03-08 | 2020-02-18 | Apple Inc. | Virtual assistant continuity |
US9886953B2 (en) | 2015-03-08 | 2018-02-06 | Apple Inc. | Virtual assistant activation |
US9721566B2 (en) | 2015-03-08 | 2017-08-01 | Apple Inc. | Competing devices responding to voice triggers |
US9899019B2 (en) | 2015-03-18 | 2018-02-20 | Apple Inc. | Systems and methods for structured stem and suffix language models |
US10134394B2 (en) | 2015-03-20 | 2018-11-20 | Google Llc | Speech recognition using log-linear model |
US9842105B2 (en) | 2015-04-16 | 2017-12-12 | Apple Inc. | Parsimonious continuous-space phrase representations for natural language processing |
US10460227B2 (en) | 2015-05-15 | 2019-10-29 | Apple Inc. | Virtual assistant in a communication session |
US10083688B2 (en) | 2015-05-27 | 2018-09-25 | Apple Inc. | Device voice control for selecting a displayed affordance |
US10127220B2 (en) | 2015-06-04 | 2018-11-13 | Apple Inc. | Language identification from short strings |
US9578173B2 (en) | 2015-06-05 | 2017-02-21 | Apple Inc. | Virtual assistant aided communication with 3rd party service in a communication session |
US10101822B2 (en) | 2015-06-05 | 2018-10-16 | Apple Inc. | Language input correction |
US10255907B2 (en) | 2015-06-07 | 2019-04-09 | Apple Inc. | Automatic accent detection using acoustic models |
US10186254B2 (en) | 2015-06-07 | 2019-01-22 | Apple Inc. | Context-based endpoint detection |
US11025565B2 (en) | 2015-06-07 | 2021-06-01 | Apple Inc. | Personalized prediction of responses for instant messaging |
US20160378747A1 (en) | 2015-06-29 | 2016-12-29 | Apple Inc. | Virtual assistant for media playback |
US10747498B2 (en) | 2015-09-08 | 2020-08-18 | Apple Inc. | Zero latency digital assistant |
US10671428B2 (en) | 2015-09-08 | 2020-06-02 | Apple Inc. | Distributed personal assistant |
US9697820B2 (en) | 2015-09-24 | 2017-07-04 | Apple Inc. | Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks |
US10366158B2 (en) | 2015-09-29 | 2019-07-30 | Apple Inc. | Efficient word encoding for recurrent neural network language models |
US11010550B2 (en) | 2015-09-29 | 2021-05-18 | Apple Inc. | Unified language modeling framework for word prediction, auto-completion and auto-correction |
US11587559B2 (en) | 2015-09-30 | 2023-02-21 | Apple Inc. | Intelligent device identification |
US10691473B2 (en) | 2015-11-06 | 2020-06-23 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US10049668B2 (en) | 2015-12-02 | 2018-08-14 | Apple Inc. | Applying neural network language models to weighted finite state transducers for automatic speech recognition |
US10223066B2 (en) | 2015-12-23 | 2019-03-05 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US10446143B2 (en) | 2016-03-14 | 2019-10-15 | Apple Inc. | Identification of voice inputs providing credentials |
US9978367B2 (en) | 2016-03-16 | 2018-05-22 | Google Llc | Determining dialog states for language models |
US9934775B2 (en) | 2016-05-26 | 2018-04-03 | Apple Inc. | Unit-selection text-to-speech synthesis based on predicted concatenation parameters |
US9972304B2 (en) | 2016-06-03 | 2018-05-15 | Apple Inc. | Privacy preserving distributed evaluation framework for embedded personalized systems |
US10249300B2 (en) | 2016-06-06 | 2019-04-02 | Apple Inc. | Intelligent list reading |
US11227589B2 (en) | 2016-06-06 | 2022-01-18 | Apple Inc. | Intelligent list reading |
US10049663B2 (en) | 2016-06-08 | 2018-08-14 | Apple, Inc. | Intelligent automated assistant for media exploration |
DK179309B1 (en) | 2016-06-09 | 2018-04-23 | Apple Inc | Intelligent automated assistant in a home environment |
US10586535B2 (en) | 2016-06-10 | 2020-03-10 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US10192552B2 (en) | 2016-06-10 | 2019-01-29 | Apple Inc. | Digital assistant providing whispered speech |
US10067938B2 (en) | 2016-06-10 | 2018-09-04 | Apple Inc. | Multilingual word prediction |
US10509862B2 (en) | 2016-06-10 | 2019-12-17 | Apple Inc. | Dynamic phrase expansion of language input |
US10490187B2 (en) | 2016-06-10 | 2019-11-26 | Apple Inc. | Digital assistant providing automated status report |
DK179415B1 (en) | 2016-06-11 | 2018-06-14 | Apple Inc | Intelligent device arbitration and control |
DK179343B1 (en) | 2016-06-11 | 2018-05-14 | Apple Inc | Intelligent task discovery |
DK201670540A1 (en) | 2016-06-11 | 2018-01-08 | Apple Inc | Application integration with a digital assistant |
DK179049B1 (en) | 2016-06-11 | 2017-09-18 | Apple Inc | Data driven natural language event detection and classification |
US10331784B2 (en) | 2016-07-29 | 2019-06-25 | Voicebox Technologies Corporation | System and method of disambiguating natural language processing requests |
US10832664B2 (en) | 2016-08-19 | 2020-11-10 | Google Llc | Automated speech recognition using language models that selectively use domain-specific model components |
US10474753B2 (en) | 2016-09-07 | 2019-11-12 | Apple Inc. | Language identification using recurrent neural networks |
US10043516B2 (en) | 2016-09-23 | 2018-08-07 | Apple Inc. | Intelligent automated assistant |
US11281993B2 (en) | 2016-12-05 | 2022-03-22 | Apple Inc. | Model and ensemble compression for metric learning |
US10593346B2 (en) | 2016-12-22 | 2020-03-17 | Apple Inc. | Rank-reduced token representation for automatic speech recognition |
US11204787B2 (en) | 2017-01-09 | 2021-12-21 | Apple Inc. | Application integration with a digital assistant |
US10311860B2 (en) | 2017-02-14 | 2019-06-04 | Google Llc | Language model biasing system |
US10417266B2 (en) | 2017-05-09 | 2019-09-17 | Apple Inc. | Context-aware ranking of intelligent response suggestions |
DK201770383A1 (en) | 2017-05-09 | 2018-12-14 | Apple Inc. | User interface for correcting recognition errors |
DK201770439A1 (en) | 2017-05-11 | 2018-12-13 | Apple Inc. | Offline personal assistant |
US10726832B2 (en) | 2017-05-11 | 2020-07-28 | Apple Inc. | Maintaining privacy of personal information |
US10395654B2 (en) | 2017-05-11 | 2019-08-27 | Apple Inc. | Text normalization based on a data-driven learning network |
DK201770429A1 (en) | 2017-05-12 | 2018-12-14 | Apple Inc. | Low-latency intelligent automated assistant |
DK179745B1 (en) | 2017-05-12 | 2019-05-01 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11301477B2 (en) | 2017-05-12 | 2022-04-12 | Apple Inc. | Feedback analysis of a digital assistant |
DK179496B1 (en) | 2017-05-12 | 2019-01-15 | Apple Inc. | User-specific acoustic models |
DK201770431A1 (en) | 2017-05-15 | 2018-12-20 | Apple Inc. | Optimizing dialogue policy decisions for digital assistants using implicit feedback |
DK201770432A1 (en) | 2017-05-15 | 2018-12-21 | Apple Inc. | Hierarchical belief states for digital assistants |
US20180336892A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Detecting a trigger of a digital assistant |
US10311144B2 (en) | 2017-05-16 | 2019-06-04 | Apple Inc. | Emoji word sense disambiguation |
DK179560B1 (en) | 2017-05-16 | 2019-02-18 | Apple Inc. | Far-field extension for digital assistant services |
US20180336275A1 (en) | 2017-05-16 | 2018-11-22 | Apple Inc. | Intelligent automated assistant for media exploration |
US10403278B2 (en) | 2017-05-16 | 2019-09-03 | Apple Inc. | Methods and systems for phonetic matching in digital assistant services |
US10657328B2 (en) | 2017-06-02 | 2020-05-19 | Apple Inc. | Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling |
US10305765B2 (en) | 2017-07-21 | 2019-05-28 | International Business Machines Corporation | Adaptive selection of message data properties for improving communication throughput and reliability |
US10902847B2 (en) * | 2017-09-12 | 2021-01-26 | Spotify Ab | System and method for assessing and correcting potential underserved content in natural language understanding applications |
US10445429B2 (en) | 2017-09-21 | 2019-10-15 | Apple Inc. | Natural language understanding using vocabularies with compressed serialized tries |
US10755051B2 (en) | 2017-09-29 | 2020-08-25 | Apple Inc. | Rule-based natural language processing |
US10636424B2 (en) | 2017-11-30 | 2020-04-28 | Apple Inc. | Multi-turn canned dialog |
US10733982B2 (en) | 2018-01-08 | 2020-08-04 | Apple Inc. | Multi-directional dialog |
US10733375B2 (en) | 2018-01-31 | 2020-08-04 | Apple Inc. | Knowledge-based framework for improving natural language understanding |
US10789959B2 (en) | 2018-03-02 | 2020-09-29 | Apple Inc. | Training speaker recognition models for digital assistants |
US10592604B2 (en) | 2018-03-12 | 2020-03-17 | Apple Inc. | Inverse text normalization for automatic speech recognition |
US10818288B2 (en) | 2018-03-26 | 2020-10-27 | Apple Inc. | Natural assistant interaction |
US10909331B2 (en) | 2018-03-30 | 2021-02-02 | Apple Inc. | Implicit identification of translation payload with neural machine translation |
US10928918B2 (en) | 2018-05-07 | 2021-02-23 | Apple Inc. | Raise to speak |
US11200893B2 (en) | 2018-05-07 | 2021-12-14 | Google Llc | Multi-modal interaction between users, automated assistants, and other computing services |
US11145294B2 (en) | 2018-05-07 | 2021-10-12 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
JP7203865B2 (en) | 2018-05-07 | 2023-01-13 | グーグル エルエルシー | Multimodal interaction between users, automated assistants, and other computing services |
US10984780B2 (en) | 2018-05-21 | 2021-04-20 | Apple Inc. | Global semantic word embeddings using bi-directional recurrent neural networks |
DK179822B1 (en) | 2018-06-01 | 2019-07-12 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
DK201870355A1 (en) | 2018-06-01 | 2019-12-16 | Apple Inc. | Virtual assistant operation in multi-device environments |
US10892996B2 (en) | 2018-06-01 | 2021-01-12 | Apple Inc. | Variable latency device coordination |
US11386266B2 (en) | 2018-06-01 | 2022-07-12 | Apple Inc. | Text correction |
DK180639B1 (en) | 2018-06-01 | 2021-11-04 | Apple Inc | Disability of attention-attentive virtual assistant |
US10496705B1 (en) | 2018-06-03 | 2019-12-03 | Apple Inc. | Accelerated task performance |
JP6966979B2 (en) * | 2018-06-26 | 2021-11-17 | 株式会社日立製作所 | Dialogue system control methods, dialogue systems and programs |
US11010561B2 (en) | 2018-09-27 | 2021-05-18 | Apple Inc. | Sentiment prediction from textual data |
US11170166B2 (en) | 2018-09-28 | 2021-11-09 | Apple Inc. | Neural typographical error modeling via generative adversarial networks |
US11462215B2 (en) | 2018-09-28 | 2022-10-04 | Apple Inc. | Multi-modal inputs for voice commands |
US10839159B2 (en) | 2018-09-28 | 2020-11-17 | Apple Inc. | Named entity normalization in a spoken dialog system |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11638059B2 (en) | 2019-01-04 | 2023-04-25 | Apple Inc. | Content playback on multiple devices |
US11348573B2 (en) | 2019-03-18 | 2022-05-31 | Apple Inc. | Multimodality in digital assistant systems |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
DK201970509A1 (en) | 2019-05-06 | 2021-01-15 | Apple Inc | Spoken notifications |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
DK180129B1 (en) | 2019-05-31 | 2020-06-02 | Apple Inc. | User activity shortcut suggestions |
DK201970510A1 (en) | 2019-05-31 | 2021-02-11 | Apple Inc | Voice identification in digital assistant systems |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11823659B2 (en) | 2019-12-11 | 2023-11-21 | Amazon Technologies, Inc. | Speech recognition through disambiguation feedback |
US11694682B1 (en) * | 2019-12-11 | 2023-07-04 | Amazon Technologies, Inc. | Triggering voice control disambiguation |
US11810578B2 (en) | 2020-05-11 | 2023-11-07 | Apple Inc. | Device arbitration for digital assistant-based intercom systems |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4783803A (en) | 1985-11-12 | 1988-11-08 | Dragon Systems, Inc. | Speech recognition apparatus and method |
US5265014A (en) | 1990-04-10 | 1993-11-23 | Hewlett-Packard Company | Multi-modal user interface |
US5321608A (en) | 1990-11-30 | 1994-06-14 | Hitachi, Ltd. | Method and system for processing natural language |
US5424947A (en) | 1990-06-15 | 1995-06-13 | International Business Machines Corporation | Natural language analyzing apparatus and method, and construction of a knowledge base for natural language analysis |
US5477451A (en) | 1991-07-25 | 1995-12-19 | International Business Machines Corp. | Method and system for natural language translation |
US5712957A (en) | 1995-09-08 | 1998-01-27 | Carnegie Mellon University | Locating and correcting erroneously recognized portions of utterances by rescoring based on two n-best lists |
US5864808A (en) | 1994-04-25 | 1999-01-26 | Hitachi, Ltd. | Erroneous input processing method and apparatus in information processing system using composite input |
US5917890A (en) | 1995-12-29 | 1999-06-29 | At&T Corp | Disambiguation of alphabetic characters in an automated call processing environment |
US5960447A (en) | 1995-11-13 | 1999-09-28 | Holt; Douglas | Word tagging and editing system for speech recognition |
US5974413A (en) | 1997-07-03 | 1999-10-26 | Activeword Systems, Inc. | Semantic user interface |
US6006183A (en) | 1997-12-16 | 1999-12-21 | International Business Machines Corp. | Speech recognition confidence level display |
US6223150B1 (en) | 1999-01-29 | 2001-04-24 | Sony Corporation | Method and apparatus for parsing in a spoken language translation system |
US6260015B1 (en) | 1998-09-03 | 2001-07-10 | International Business Machines Corp. | Method and interface for correcting speech recognition errors for character languages |
US20020173955A1 (en) | 2001-05-16 | 2002-11-21 | International Business Machines Corporation | Method of speech recognition by presenting N-best word candidates |
US6539348B1 (en) | 1998-08-24 | 2003-03-25 | Virtual Research Associates, Inc. | Systems and methods for parsing a natural language sentence |
US6633846B1 (en) | 1999-11-12 | 2003-10-14 | Phoenix Solutions, Inc. | Distributed realtime speech recognition system |
2003
- 2003-12-10: US application US10/733,793, published as US7684985B2 (not active, ceased)
- 2003-12-10: WO application PCT/US2003/039602, published as WO2004053836A1 (not active, application discontinuation)
- 2003-12-10: EP application EP03812983A, published as EP1614102A4 (not active, ceased)
- 2003-12-10: AU application AU2003296981A, published as AU2003296981A1 (not active, abandoned)
- 2003-12-10: EP application EP08168464A, published as EP2017828A1 (not active, withdrawn)
2012
- 2012-03-23: US application US13/429,187, published as USRE44418E1 (active)
Non-Patent Citations (5)
Title |
---|
Coutaz et al., "Four easy pieces for assessing the usability of multimodal interaction: the CARE properties," Proceedings of the International Conference on Human-Computer Interaction, Jan. 1995, 115-120. |
European Patent Application No. EP 03 81 2983: Supplementary European Search Report dated Nov. 21, 2006, 3 pages. |
European Patent Application No. EP 08 16 8464: Extended European Search Report dated Dec. 16, 2008, 9 pages. |
Sturm et al., "Adding Extra Input/Output Modalities to a Spoken Dialogue System," Proceedings of the 2nd SIGdial Workshop on Discourse and Dialogue, 2001, vol. 16, 1-4. |
Suhm et al., "Multimodal error correction for speech user interfaces," ACM Transactions on Computer-Human Interaction (TOCHI), Mar. 2001, 8(1), 60-98. |
Also Published As
Publication number | Publication date |
---|---|
US7684985B2 (en) | 2010-03-23 |
EP1614102A1 (en) | 2006-01-11 |
AU2003296981A1 (en) | 2004-06-30 |
WO2004053836A1 (en) | 2004-06-24 |
EP1614102A4 (en) | 2006-12-20 |
US20040172258A1 (en) | 2004-09-02 |
EP2017828A1 (en) | 2009-01-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
USRE44418E1 (en) | | Techniques for disambiguating speech input using multimodal interfaces |
US10403290B2 (en) | | System and method for machine-mediated human-human conversation |
KR102151681B1 (en) | | Determining conversation states for the language model |
US7624018B2 (en) | | Speech recognition using categories and speech prefixing |
US10446141B2 (en) | | Automatic speech recognition based on user feedback |
US8352260B2 (en) | | Multimodal unification of articulation for device interfacing |
RU2352979C2 (en) | | Synchronous comprehension of semantic objects for highly active interface |
US10811005B2 (en) | | Adapting voice input processing based on voice input characteristics |
US9293134B1 (en) | | Source-specific speech interactions |
MXPA04005122A (en) | | Semantic object synchronous understanding implemented with speech application language tags. |
JP2002116796A (en) | | Voice processor and method for voice processing and storage medium |
JP4667085B2 (en) | | Spoken dialogue system, computer program, dialogue control apparatus, and spoken dialogue method |
EP3779971A1 (en) | | Method for recording and outputting conversation between multiple parties using voice recognition technology, and device therefor |
Mirzaei et al. | | Combining augmented reality and speech technologies to help deaf and hard of hearing people |
US7461000B2 (en) | | System and methods for conducting an interactive dialog via a speech-based user interface |
JP3837061B2 (en) | | Sound signal recognition system, sound signal recognition method, dialogue control system and dialogue control method using the sound signal recognition system |
JP2002116797A (en) | | Voice processor and method for voice recognition and storage medium |
Rahul et al. | | Development of voice activated ground control station |
US20080256071A1 (en) | | Method And System For Selection Of Text For Editing |
AU2019100034A4 (en) | | Improving automatic speech recognition based on user feedback |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: WALOOMBA TECH LTD., L.L.C., DELAWARE. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: KIRUSA INC.; REEL/FRAME: 028646/0276. Effective date: 20110201 |
| | FPAY | Fee payment | Year of fee payment: 4 |
| | CC | Certificate of correction | |
| | AS | Assignment | Owner name: GULA CONSULTING LIMITED LIABILITY COMPANY, DELAWARE. Free format text: MERGER; ASSIGNOR: WALOOMBA TECH LTD., L.L.C.; REEL/FRAME: 037028/0475. Effective date: 20150828 |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552). Year of fee payment: 8 |
| | MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY. Year of fee payment: 12 |