WO2014031505A1 - Word detection and domain dictionary recommendation - Google Patents

Info

Publication number
WO2014031505A1
Authority
WO
WIPO (PCT)
Prior art keywords
words
word
domain
text
text selection
Prior art date
Application number
PCT/US2013/055500
Other languages
French (fr)
Inventor
Hao Sun
Chi-Ho Li
Jing Li
Original Assignee
Microsoft Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corporation
Priority to CN201380044316.4A (patent CN104584003B)
Publication of WO2014031505A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Definitions

  • When new words are received from one or more sources, for example, an Internet web page, electronic mail message, text message, electronic document, or the like, such words may not be recognized as belonging to a given domain dictionary, for example, a domain dictionary associated with a word processing application, and thus, such functionalities as text input methods, spellchecking, grammar checking, auto entry completion, and the like, may not be available for those new words. This may be particularly problematic with complex languages, such as Chinese, that consist of strings of characters not broken into words by spaces or other demarcation or separation indicia.
  • a user may be inputting information (e.g., text) via a given software functionality, for example, a word processing application, that is associated with a given domain dictionary, for example, a standard English language, Chinese language, or other standard language domain dictionary, but the user may be inputting text associated with a more particular domain, for example, a medical terminology domain.
  • a given software functionality for example, a word processing application
  • a domain dictionary for example, a standard English language, Chinese language, or other standard language domain dictionary
  • a more particular domain for example, a medical terminology domain.
  • Embodiments of the present invention solve the above and other problems by providing new word detection and domain dictionary recommendation.
  • words are extracted from the content by analyzing the content according to a variety of rules, including a stop word rule, a lexicon sub-string and number sequence rule, a prefix/suffix rule and a language pattern rule.
  • rules including a stop word rule, a lexicon sub-string and number sequence rule, a prefix/suffix rule and a language pattern rule.
  • remaining words are ranked for inclusion into one or more word lexicons and/or particular domain dictionaries for future use for such functionalities as text input methods, spellchecking, grammar checking, auto entry completion, definition, and the like.
  • Fig. 1 illustrates textual content according to a particular language, for example, Chinese language, displayed on a display screen of a tablet-type computing device from which one or more new words may be detected for inclusion in a given domain dictionary.
  • a particular language for example, Chinese language
  • Fig. 2 illustrates a system architecture for receiving textual content from one or more sources and for detecting one or more new words from the textual content via a new word detection engine.
  • Fig. 3 is a flow chart of a method for detecting new words contained in a received or input text content selection.
  • Fig. 4 illustrates a system architecture for domain dictionary recommendation for received or input textual content.
  • Fig. 5 is a flow chart of a method for recommending one or more domain dictionaries in association with received or input textual content.
  • Fig. 6 illustrates an example pop-up dialog for recommending a domain dictionary to a user in association with received or entered textual content.
  • FIG. 7 is a simplified block diagram illustrating example physical components of a computing device with which embodiments of the invention may be practiced.
  • FIGs. 8A and 8B are simplified block diagrams of a mobile computing device with which embodiments of the present invention may be practiced.
  • FIG. 9 is a simplified block diagram of a distributed computing system in which embodiments of the present invention may be practiced.
  • embodiments of the present invention are directed to providing new word detection and domain dictionary recommendation.
  • words are extracted from the content by analyzing the content according to a variety of rules. After words of low value for addition to a given domain dictionary as new words are eliminated, remaining words are ranked for inclusion into one or more word lexicons and/or particular domain dictionaries for future use for such functionalities as text input methods, spellchecking, grammar checking, auto entry completion, definition, and the like.
  • a determination may be made as to whether more helpful domain dictionaries may be available. If a determination is made that words entered by the user have a high degree of association with a domain dictionary not in use by the user, that domain dictionary may be recommended to the user to increase the accuracy of the user's input of additional text and editing of existing text.
  • a textual content selection 115 is illustrated on a display screen of a computing device 110 that may be read, edited, or otherwise utilized by a user according to a variety of software functionalities, for example, word processing applications, Internet-based applications, slide presentation applications, spreadsheet applications, desktop publishing applications, and the like.
  • the computing device 110 illustrated in Fig. 1 is a tablet-type computing device, but as should be appreciated, the computing device 110 may take any suitable form, for example, a laptop computer, a desktop computer, a handheld computing device, for example, a smart telephone, and the like that is capable of allowing the textual content 115 to be displayed and utilized according to one or more software functionalities.
  • As illustrated in Fig. 1, the textual content 115 is Chinese language content, but as should be understood, the textual content 115 may be provided according to any other language type desired by the user of the device 110.
  • a source 120 is illustrated from which the textual content 115 may be obtained, for example, an Internet-based web page, a remotely stored document, an electronic mail message, a text message, a locally stored document, and the like.
  • when textual content, such as the textual content illustrated in Fig. 1, is received from a source 120, the textual content may have one or more new words that may or may not be understood by the user or other receiving party and/or may not be included in a domain dictionary available to the user, for example, one or more domain dictionaries associated with the user's word processing application, or other software application with which the received content is to be utilized.
  • a domain dictionary available to the user, for example, one or more domain dictionaries associated with the user's word processing application, or other software application with which the received content is to be utilized.
  • various functionalities for example, text input methods, spellchecking, grammar checking, auto entry completion, dictionary services, and the like may not be available for such new words.
  • the received textual content 115 may include a new word that is new to a given industry, for example, the software industry, Internet industry, or the like that may be understood by the receiving user, but that may not be included in a given domain dictionary for assisting the user in utilizing the new word according to available software functionalities.
  • a given industry for example, the software industry, Internet industry, or the like that may be understood by the receiving user, but that may not be included in a given domain dictionary for assisting the user in utilizing the new word according to available software functionalities.
  • textual content received from a variety of sources may be passed to a new word detection engine 230 for isolation of new words contained in the received textual content item and for inclusion in one or more word lexicons (lists of words) and/or given domain dictionaries (lists of words associated with a particular domain, e.g., medical terminology domain) for subsequent use in association with one or more software functionalities.
  • word lexicons lists of words
  • domain dictionaries lists of words associated with a particular domain, e.g., medical terminology domain
  • if a new word such as "texting" is received in a textual content item from one or more sources, the new word may be understood by a receiving user, but the new word may not be included in any domain dictionaries associated with software functionalities in use by the user, for example, text input method applications, word processing applications, electronic mail applications, and the like.
  • software functionalities associated with text content entry and editing may be utilized in association with the domain dictionary and the newly isolated and stored word.
  • a domain dictionary associated with the user's text input method application or word processing application to which the new word has been added may be utilized by the user's word processing application for assisting the user in properly entering the word, spelling the word, grammar checking use of the word in association with other words, providing dictionary services associated with the word, and the like.
  • the new word detection engine 230 utilizes a variety of word detection rules/methods 235 for determining whether portions of the textual content include new words and for ranking determined new words for possible output to one or more domain dictionaries 265 for subsequent use. As described below, some of the rules/methods 235 may be used for eliminating candidate new words that are not considered meaningful for adding to a given domain dictionary as a new word.
  • the stop word rule 240 may be used for eliminating text strings having a head or tail associated with one or more prescribed stop words where such stop words may be considered noise level textual content items and not meaningful for inclusion in a given domain dictionary.
  • portions of the text content may be extracted and compared against lists of known stop words. For example, stop words such as commonly used transitional words, articles, verbs, and the like (e.g., "a," "and," "the") may be eliminated so that they are not needlessly analyzed further or added to a domain dictionary as new words.
  • these stop words are merely three English language examples and are not exhaustive of the vast number of stop words that are utilized according to a variety of languages, for example, Chinese language, English language, French language, Arabic language, and the like and that are well known and well understood by those skilled in the art of linguistics.
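  • As a minimal sketch of the stop word rule described above, the following Python snippet drops candidate strings whose head or tail is a stop word. It assumes space-separated tokens for readability (for Chinese, the head and tail would be characters or segmented words), and the names `STOP_WORDS`, `has_stop_word_edge`, and `apply_stop_word_rule` are hypothetical, not taken from the source.

```python
# Hypothetical stop-word list; the source names only "a", "and", "the" as examples.
STOP_WORDS = {"a", "and", "the"}


def has_stop_word_edge(tokens, stop_words=STOP_WORDS):
    """Return True if the first (head) or last (tail) token is a stop word."""
    if not tokens:
        return True  # an empty candidate carries no meaning, so treat it as noise
    return tokens[0].lower() in stop_words or tokens[-1].lower() in stop_words


def apply_stop_word_rule(candidates, stop_words=STOP_WORDS):
    """Keep only candidate strings whose head and tail are not stop words."""
    return [c for c in candidates if not has_stop_word_edge(c.split(), stop_words)]


candidates = ["the domain dictionary", "domain dictionary", "texting and", "a"]
print(apply_stop_word_rule(candidates))
```

  • Only the candidate with neither a stop-word head nor a stop-word tail survives; a production list of stop words would be far larger and language-specific.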
  • a lexicon sub-string and number sequences rule 245 may be utilized for eliminating strings that are sub-strings of other words or number sequences contained in one or more domain dictionaries where the inclusion of such sub-strings does not provide for meaningful inclusion in one or more domain dictionaries. That is, character strings contained in a given text content that are merely sub-strings of words contained in a lexicon of words or sub-strings of a number sequence contained in a lexicon of words may be eliminated because they are of little value in adding to a domain dictionary or lexicon of words or terms as a new word or term.
  • the sub-string of "diction" may be eliminated as a candidate new word.
  • this rule may be advantageous because when a word is in a lexicon, and if one of its sub-strings is not in the lexicon, then the sub-string is not a meaningful word.
  • number sequences for example, a number sequence of "2012" used for indicating a year may not be meaningful for adding to a lexicon or domain dictionary as a new word and thus may be eliminated.
  • portions of received or input text may be compared by the new word detection engine 230 against lists of known strings, and number sequences may be detected by determining that one or more characters form a sequence of numbers that is not part of a given word or term.
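  • The lexicon sub-string and number sequence rule can be illustrated with a short Python sketch. The lexicon entry "dictionary" is an assumed example chosen so that "diction" is one of its sub-strings, and the function names are hypothetical. Digit-only detection via `str.isdigit` is a simplification of "number sequence".

```python
def is_number_sequence(candidate):
    """True for strings made only of digits, such as the year "2012"."""
    return candidate.isdigit()


def is_lexicon_substring(candidate, lexicon):
    """True if the candidate is a sub-string of a lexicon entry
    without being a lexicon entry itself."""
    if candidate in lexicon:
        return False
    return any(candidate in entry for entry in lexicon)


def apply_substring_and_number_rule(candidates, lexicon):
    """Drop number sequences and sub-strings of known lexicon entries."""
    return [
        c for c in candidates
        if not is_number_sequence(c) and not is_lexicon_substring(c, lexicon)
    ]


lexicon = {"dictionary", "domain"}  # assumed example lexicon
candidates = ["diction", "2012", "texting"]
print(apply_substring_and_number_rule(candidates, lexicon))
```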
  • statistical methods 250 may be utilized for scoring remaining candidates for possible inclusion in a lexicon of words or domain dictionary as described herein.
  • a variety of statistical methods may be employed for scoring a given word so that highly scored or ranked words may be included in the lexicon or domain dictionary and so that lower scored or ranked words may be discarded.
  • a term frequency for a given word may be determined, and words appearing very frequently in a given text selection may be scored highly.
  • Such determinations may be refined by combining such determinations with other statistical information. For example, if a word has a high term frequency, but only appears in association with another word that is not considered meaningful, then the high term frequency for the word may be less important.
  • the contextual independency of a word may be considered where a higher or lower score for the word may be determined based on the dependency or association of the word on or with other words in an analyzed text selection.
  • the statistical methods 250 allow for calculation of six (6) statistical scores for any candidate word w, composed of characters c_1 ... c_n.
  • the statistical methods 250 may use the lexicon sub-string and number sequences rule described above and the prefix/suffix rule described below for refining the statistical information determined for a given word.
  • a first statistical score for a given word may include a term frequency (TF) which may be determined for each word extracted from a received or input text content 115 as set out below.
  • TF is the term frequency of the word and length is the textual length of the word.
  • the FSCP for the word may be determined as follows.
  • a third statistical score may include an adapted mutual information (AMI) score.
  • the AMI score allows a determination as to whether a character pattern for a given word c_1 ... c_n is more complete in semantics than any substrings that compose the word, especially on longest composed substrings.
  • the AMI score may be determined as follows.
  • a fourth statistical score may include a context entropy score. For a context entropy score neighboring words (x) of an analyzed word (w) are collected and the frequencies of the neighboring words (x) are determined. The context entropy of the analyzed word (w) may be determined as follows.
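  • The exact context entropy formula is not reproduced in this text, so the following Python sketch assumes the standard Shannon entropy over the relative frequencies of neighboring words. The choice of immediate left and right neighbors, the natural logarithm, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter


def collect_neighbors(tokens, word):
    """Collect the tokens immediately left and right of each occurrence of word."""
    neighbors = []
    for i, tok in enumerate(tokens):
        if tok == word:
            if i > 0:
                neighbors.append(tokens[i - 1])
            if i + 1 < len(tokens):
                neighbors.append(tokens[i + 1])
    return neighbors


def context_entropy(neighbors):
    """Shannon entropy (natural log) of the neighbor-word frequency distribution."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


tokens = ["send", "texting", "now", "stop", "texting", "me"]
print(context_entropy(collect_neighbors(tokens, "texting")))
```

  • Under this assumption, a word that appears next to many different neighbors gets a higher entropy, which is consistent with the contextual independence idea described earlier.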
  • a fifth statistical score may include a prefix/suffix ratio of a given word relative to other words with which the given word is associated as a prefix/suffix.
  • a prefix/suffix ratio for a given word may be determined as follows.
  • a sixth statistical score for an analyzed word may include a biased mutual dependency (BMD) score for determining dependencies between analyzed words and a plurality of other words in a text selection.
  • BMD score for a given word may be determined as follows.
  • a language pattern rule may be used for adjusting the scores.
  • a Chinese pattern rule may be used to adjust the scores, using a linear model to adjust FSCP and AMI as follows.
  • $\mathrm{Score}_{fscp}(w) = \mathrm{FSCP}(w) + \delta_{fscp} \cdot \mathrm{Pattern}(w)$
  • $\mathrm{Score}_{mi}(w) = \mathrm{AMI}(w) + \delta_{mi} \cdot \mathrm{Pattern}(w)$
  • a Chinese pattern analysis may not be used for term frequency (TF) score adjustment because TF(w) is typically a very large number, and Pattern(w) is between 0 and 1.
  • $\delta_{fscp}$ may be set to 0.01; Pattern(w) is typically very large (e.g., 0.6-1), so $\delta_{fscp}$ should not be set so large that Pattern(w) becomes dominant.
  • Such example parameters may be obtained by experimentation.
  • $\delta_{mi}$ may be set to 0.1, 0.5, or 1 for testing because AMI(w) is typically as large (e.g., 0.6-1) as Pattern(w). According to an embodiment, these parameters may be obtained by experimentation and testing.
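  • The linear adjustment of FSCP and AMI by the pattern score can be sketched in a few lines of Python. The function name `adjust_scores` and the sample input values are hypothetical; the default deltas are the example values mentioned above.

```python
def adjust_scores(fscp, ami, pattern, delta_fscp=0.01, delta_mi=0.5):
    """Linearly shift FSCP and AMI by the language-pattern score Pattern(w)."""
    score_fscp = fscp + delta_fscp * pattern
    score_mi = ami + delta_mi * pattern
    return score_fscp, score_mi


# Try the three example delta_mi settings on made-up scores for one word.
for delta_mi in (0.1, 0.5, 1):
    print(delta_mi, adjust_scores(fscp=0.2, ami=0.7, pattern=0.8, delta_mi=delta_mi))
```

  • With a small delta for FSCP, the pattern term only nudges the score, while the larger deltas for AMI let the pattern term contribute on a comparable scale.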
  • the multiple scores may be combined for obtaining a single score that may be used for determining whether the word should be added to a lexicon or domain dictionary.
  • the values of the weights applied to the individual scores may be obtained by numerical optimization over a number of training instances.
  • Positive training instances may be provided by automated and human selection.
  • the negative training instances which may not be reliably provided by human selection, may be selected from lists of candidate words ranked by each of the six statistical scores/measures described above. If a candidate word is ranked low by at least three statistical measures, then it may be selected as a negative training instance.
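  • The selection of negative training instances can be sketched as a voting scheme in Python. The source does not define "ranked low", so this sketch assumes it means falling in the bottom fraction of a measure's ranking; `low_fraction`, the measure names, and the function name are hypothetical.

```python
def select_negative_instances(scores_by_measure, low_fraction=0.25, min_low_votes=3):
    """Pick words ranked low by at least `min_low_votes` measures.

    scores_by_measure maps a measure name to a dict of word -> score,
    where a higher score means a better candidate.
    """
    low_votes = {}
    for scores in scores_by_measure.values():
        ranked = sorted(scores, key=scores.get)  # ascending: lowest score first
        cutoff = max(1, int(len(ranked) * low_fraction))
        for word in ranked[:cutoff]:
            low_votes[word] = low_votes.get(word, 0) + 1
    return sorted(w for w, votes in low_votes.items() if votes >= min_low_votes)


scores = {
    "tf": {"w1": 1, "w2": 5, "w3": 6, "w4": 7},
    "fscp": {"w1": 0.1, "w2": 0.5, "w3": 0.6, "w4": 0.7},
    "ami": {"w1": 0.1, "w2": 0.5, "w3": 0.6, "w4": 0.7},
    "entropy": {"w1": 0.9, "w2": 0.1, "w3": 0.5, "w4": 0.6},
    "prefix_suffix": {"w1": 0.9, "w2": 0.1, "w3": 0.5, "w4": 0.6},
    "bmd": {"w1": 0.9, "w2": 0.5, "w3": 0.1, "w4": 0.6},
}
print(select_negative_instances(scores))
```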
  • the prefix/suffix rule 255 provides for eliminating words or phrases that are prefixes or suffixes of other words or phrases. After a score is calculated for a given word as described above, some candidates may be eliminated via the prefix/suffix rule 255 where the score for a prefix or suffix word is no greater than the score of the words containing it. That is, the sub-string comprising the prefix or suffix is less meaningful (based on scoring) than the words to which the sub-string belongs. Thus, a string (word) including such a prefix or suffix should not be split into the sub-string (prefix or suffix), and therefore, the sub-string may be removed as a candidate word for inclusion in a lexicon or domain dictionary.
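  • A minimal Python sketch of the prefix/suffix rule follows. It assumes each candidate already has a single combined score, and the function name and sample scores are hypothetical.

```python
def apply_prefix_suffix_rule(scores):
    """Drop a candidate when it is a prefix or suffix of another candidate
    and its score is no greater than that other candidate's score.

    scores maps candidate word -> score (higher means more meaningful).
    """
    kept = {}
    for cand, score in scores.items():
        dominated = any(
            other != cand
            and (other.startswith(cand) or other.endswith(cand))
            and score <= other_score
            for other, other_score in scores.items()
        )
        if not dominated:
            kept[cand] = score
    return kept


scores = {"text": 0.4, "texting": 0.7, "ing": 0.2, "domain": 0.5}
print(apply_prefix_suffix_rule(scores))
```

  • Here "text" and "ing" are removed because the longer word "texting" scores at least as high, so it should not be split into those pieces.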
  • the language pattern rule 260 allows for analyzing the patterns of characters for adjusting scores determined for candidate words. For example, if a word contains characters "abc," the language pattern rule may be used for determining a probability that a character may be in the first position or in the middle or in the tail of a candidate word for adjusting the score for the candidate word. For example, according to an example embodiment using a Chinese pattern rule, a text character's position may be used for determining the probability the character is a Chinese character. According to this example Chinese language embodiment, a unigram statistic is first calculated from original lexicon and trigram statistic to get the list of <word, tf> pairs.
  • a character statistic is calculated from a unigram statistic to get a list of <char, <head_tf, mid_tf, tail_tf>> pairs. That is, for a character, its frequency is calculated in the head, middle and tail position in the unigram statistic, respectively.
  • These steps comprise preprocessing for the Chinese pattern rule.
  • the probability of each position in which the character may occur may be calculated as follows.
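  • The preprocessing for the pattern rule can be sketched in Python as follows. The probability formula is not reproduced in this text, so the sketch assumes each positional probability is the positional frequency divided by the character's total frequency. Counting a single-character word as a head occurrence only is also an assumption, and the function names are hypothetical.

```python
from collections import defaultdict


def char_position_stats(word_tf_pairs):
    """Build char -> [head_tf, mid_tf, tail_tf] from <word, tf> pairs."""
    stats = defaultdict(lambda: [0, 0, 0])
    for word, tf in word_tf_pairs:
        if not word:
            continue
        stats[word[0]][0] += tf          # head position
        for ch in word[1:-1]:
            stats[ch][1] += tf           # middle positions
        if len(word) > 1:
            stats[word[-1]][2] += tf     # tail position
    return dict(stats)


def position_probabilities(stats):
    """Normalize each character's positional frequencies into probabilities."""
    probs = {}
    for ch, (head, mid, tail) in stats.items():
        total = head + mid + tail
        if total:
            probs[ch] = (head / total, mid / total, tail / total)
        else:
            probs[ch] = (0.0, 0.0, 0.0)
    return probs


pairs = [("abc", 2), ("ab", 1), ("ca", 1)]
print(position_probabilities(char_position_stats(pairs)))
```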
  • Fig. 3 is a flow chart of a method for detecting new words contained in a received or input text content selection.
  • the method 300 begins at start operation 305 and proceeds to operation 310 where textual content in the form of a number of words or character strings is received from one or more sources 205, 210, 215, 220, 225, as illustrated and described above with reference to Fig. 2.
  • Word segmentation may next be performed for separating input or received textual content into individual words for subsequent analysis of segmented words as described below.
  • textual content may be broken into words according to a variety of methods. Textual content may be broken into words via one or more word breaker methods, for example, by breaking words at spaces between groupings of characters or before or after known head and tail characters. However, for some languages such as Chinese, traditional word breaking methods are less effective because spaces and other demarcation indicators are not provided between words. In such cases, other methods may be utilized for quickly grouping characters into words.
  • a positive maximum match method may be employed for segmenting such language types (e.g., Chinese) into words.
  • the positive maximum match method is not sensitive to the size of a given lexicon of words.
  • characters are grouped together one by one up to a maximum number (e.g., 9 characters), and each grouping may be treated as a word for comparing against a lexicon for isolating the grouping as a word.
  • a maximum number e.g. 9 characters
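  • One common way to realize a maximum match segmenter is the longest-first forward scan sketched below in Python. This is an assumed formulation rather than the source's exact procedure: at each position it tries the longest grouping (up to `max_len` characters) and shrinks until a lexicon entry is found, falling back to a single character. The lexicon contents and function name are hypothetical.

```python
def forward_maximum_match(text, lexicon, max_len=9):
    """Segment text into words by greedy longest-first lexicon lookup."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest grouping first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)  # unknown single characters pass through
                i += size
                break
    return words


lexicon = {"研究", "研究生", "生命", "起源"}  # assumed example lexicon
print(forward_maximum_match("研究生命起源", lexicon))
```

  • Because the scan is greedy, it can pick a long match that a human reader would split differently, which is one reason the later statistical scoring and rule-based filtering are useful.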
  • the stop word rule 240 may be run against the received textual content for eliminating one or more stop words contained in the received textual content.
  • stop words isolated and determined for the received textual content are eliminated as being of low value or meaningless for new word detection and determination.
  • the lexicon sub-string and number sequence rule may be run against the remaining textual content, and at operation 330, unnecessary sub-strings may be eliminated from the remaining textual content as lacking importance or meaning in the determination of new words contained in the received textual content.
  • the statistical methods 250 are run against remaining textual content for scoring words contained in the remaining textual content for determination as new words for including in one or more lexicons and/or domain dictionaries.
  • the prefix/suffix rule 255 may be run against scored words extracted from the received textual content.
  • unnecessary prefixes and suffixes may be eliminated for further reducing the number of textual content items that may be determined as new words contained in the received textual content.
  • language pattern analysis for example, Chinese language pattern analysis
  • the remaining words are ranked for inclusion in one or more word lexicons and/or domain dictionaries as new words, and at operation 360, highly ranked words may be selected and stored as new words for inclusion in one or more word lexicons and/or domain dictionaries.
  • the scores and associated ranking that are required for including a word in a given lexicon or domain dictionary may be different for different languages and domain types. That is, scores and associated ranking may be determined acceptable for word detection and selection at varying levels for making the word detection methods described above more or less selective as desired for different text content.
  • the word lexicon or domain dictionary may be recommended to a user for association with a given software functionality, for example text input methods or word processing.
  • the method 300 ends at operation 375.
  • a given software application in use by a user for example, a word processing application, slide presentation application, Internet web page functionality application, and the like may be associated with a given domain dictionary, for example, a standard grammar lexicon associated with a given language, for example, Chinese, English, French, Arabic, or the like.
  • if the textual content being entered and/or edited by the user is more closely associated with a particular domain dictionary, for example, a medical terminology domain dictionary, an engineering terminology domain dictionary, a biological sciences domain dictionary, or the like, the user may be missing out on the valuable resources of one of these particular or specialized domain dictionaries that may be available to the user for use in association with the entered and/or edited textual content.
  • a particular domain dictionary for example, a medical terminology domain dictionary, an engineering terminology domain dictionary, a biological sciences domain dictionary, or the like.
  • textual content entered and/or edited by a user may be analyzed for association with one or more domain dictionaries not in use by the user in association with the textual content, and one or more domain dictionaries that may be helpful in association with the entered and/or edited textual content may be recommended to the user.
  • example textual content 415 is illustrated on the display screen of a computing device 410 being entered and/or edited, and/or received by a user for use in association with one or more software functionalities.
  • a number of domain dictionaries 420, 425, 430, 435 are illustrated that may be associated with the textual content 415 for assisting a user in association with an input method 440 with which the user may input additional textual content, edit input or edit received textual content.
  • an input method editor IME
  • an input device e.g., a keyboard
  • an English language keyboard may be associated with a Chinese language IME.
  • a domain dictionary associated with the IME may assist in input and editing of text entered via the Chinese language IME in association with the English language keyboard.
  • the textual content 415 is provided according to the Chinese language.
  • the Chinese language is but one example of a variety of different textual content languages that may be utilized in accordance with embodiments of the present invention for recommending one or more available domain dictionaries for use in association with a given textual content.
  • the domain dictionary 420 may be a domain dictionary containing standard language lexicon, grammar and dictionary services associated with a given language, for example, the Chinese language, the English language, the French language, and the like.
  • the domain dictionaries 425, 430, 435 may be associated with particular domain types, for example, medical terminology domain, engineering terminology domain, biological sciences terminology domain, and the like. As should be appreciated, a great number of domain dictionaries may be provided for use in association with textual content that are associated with a variety of different topics and/or ideas.
  • the domain dictionary recommendation engine 445 is illustrative of a software module containing sufficient computer executable instructions for analyzing a textual content and for comparing the textual content to one or more domain dictionaries for recommending one or more domain dictionaries for use in association with the textual content.
  • IME input method editor
  • text being input or edited by a user may be analyzed for recommending one or more additional domain dictionaries that may be associated with the IME in use for allowing the user greater input and/or editing accuracy via the one or more additional domain dictionaries.
  • Fig. 5 is a flow chart of a method for recommending one or more domain dictionaries in association with received or input textual content.
  • the method 500 begins at start operation 505 and proceeds to operation 510 where textual content input and/or received by a user is received by the domain recommendation engine 445.
  • domain words are extracted from user input history (including text presently being entered, previously entered text, or text received from one or more sources) for comparison with words contained in one or more domain dictionaries that may be recommended to the user for use with the user's input method.
  • word segmentation is performed for separating input or received textual content into individual words for subsequent comparison of segmented words against words contained in one or more domain dictionaries 420, 425, 430, 435.
  • user input history may be broken into words for comparison against words contained in various domain dictionaries according to a variety of methods. For example, words may be isolated from user input according to the methods described above with reference to Figs. 1 - 3.
  • user input may be broken into words via one or more word breaker methods, for example, by breaking words at spaces between groupings of characters or before or after known head and tail characters.
  • a positive maximum match method may be employed for segmenting such language types (e.g., Chinese) into words.
  • the positive maximum match method is not sensitive to the size of a given lexicon.
  • characters are grouped together one by one up to a maximum number (e.g., 9 characters), and each grouping may be treated as a word for comparing against a lexicon for isolating the grouping as a word.
  • the segmented words may be compared against words contained in any number of domain dictionaries, as described below, for determining whether a given domain dictionary should be recommended to the user for associating with the user's current input method.
  • words having low value and/or low meaning with respect to a comparison against words contained in the one or more domain dictionaries may be eliminated.
  • elimination of low value or meaningless words at operation 520 may be performed according to a variety of methods, including the word detection rules and methods 235 described above with reference to Fig. 2.
  • the domain dictionaries and associated lexicons 420, 425, 430, 435 available for association with the input and/or received textual content 415 are obtained by the domain recommendation engine 445.
  • the domain recommendation engine 445 may obtain an almost limitless number of domain dictionaries having associated lexicons related to many different topics and ideas.
  • the words segmented from the input and/or received textual content 415 are analyzed for term frequency by determining the frequency with which particular words are used in the input and/or received textual content 415. For example, if the word "texting" is included only once in the textual content 415, then that word will have a term frequency of one. On the other hand, if the word "texting" is used ten times in the textual content 415, then a term frequency of ten will be applied to that word. According to embodiments, if a given word has a low term frequency, that word may be discarded from further analysis for association with a particular domain dictionary.
  • the term frequency utilized for determining the value of a given word for comparison against words contained in one or more domain dictionaries may be varied based on a variety of factors. For example, in some instances a particular word may have a low term frequency, but nonetheless may be kept for further analysis. For example, a word such as "penicillin” may have a low term frequency in a given textual content, but the word may be kept due to its uniqueness, for comparison against words in a medical terminology domain dictionary.
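  • The term frequency step can be sketched in Python with a simple counter. The threshold `min_tf` and the allow-list `keep_always` (used here to model words such as "penicillin" that are kept despite a low count) are hypothetical parameters, since the source does not say how uniqueness is decided.

```python
from collections import Counter


def term_frequencies(words):
    """Count how often each segmented word occurs in the text."""
    return Counter(words)


def filter_by_term_frequency(tf, min_tf=2, keep_always=frozenset()):
    """Discard low-frequency words unless they are explicitly kept."""
    return {w: n for w, n in tf.items() if n >= min_tf or w in keep_always}


words = ["texting"] * 10 + ["penicillin", "hello"]
tf = term_frequencies(words)
print(filter_by_term_frequency(tf, min_tf=2, keep_always={"penicillin"}))
```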
  • Word pairs are created by pairing words extracted from the input and/or received textual content with matching words contained in the one or more domain dictionaries considered by the domain recommendation engine 445. For example, if the word "penicillin" is extracted from the textual content 415, and is found to match the same word contained in a medical terminology domain dictionary 430, a word pair associating the textual content 415 entered and/or received by the user with the example medical terminology domain dictionary 430 is created.
  • all the compared domains are sorted and ranked according to the number of matched word pairs in the analyzed text content, and a top number of domain dictionaries is determined for words extracted from the input and/or received textual content 415.
  • the top number (e.g., two)
  • the threshold count of matched word pairs may be determined via experimentation and testing.
  • An example and suitable algorithm for determining a top number of domain dictionaries is as follows.
  • all domain dictionaries containing a prescribed number of word pairs associated with the input and/or received textual content may be determined for recommendation to the user. For example, if the textual content input and/or received by the user contains a number of medical and scientific terms, then a number of word pairs may be determined for words extracted from the textual content 415 in comparison to both a medical terminology domain dictionary and a scientific terminology domain dictionary. Thus, both the example medical terminology domain dictionary and the scientific terminology domain dictionary may be selected as top domain dictionaries for recommendation to the user.
  • the example engineering domain dictionary may not be ranked highly for presentation to the user as a recommended domain dictionary.
  • the ranking of domain dictionaries for a possible recommendation to a user may be performed according to a variety of prescribed ranking levels. For example, it may be determined that any domain dictionary having five or more word pairs associated with an analyzed textual content 415 may be recommended to a user. On the other hand, it may be determined that there must be more than 25 word pairings between a given domain dictionary and an analyzed textual content for recommendation of the associated domain dictionary.
  • one or more domain dictionaries may be recommended to the user for association with the user's software functionalities, for example, an input method in use by the user, or the one or more domain dictionaries may be recommended for association with one or more software applications, such as word processing applications, slide presentation applications, Internet browsing applications, and the like. That is, the one or more domain dictionaries may be recommended to the user to allow the user to perform his/her text input and/or editing more efficiently through the use of the recommended domain dictionaries that may help the user with the words he/she enters or edits.
  • An example recommendation user interface component is described below with reference to Fig. 6.
  • the method 500 ends at operation 595.
  • Fig. 6 illustrates an example pop-up dialog for recommending a domain dictionary to a user in association with received or entered textual content.
  • the one or more domain dictionaries may be recommended to the user by the domain dictionary recommendation engine 445.
  • a pop-up dialog 610 is illustrated for including a recommendation 615 of a given domain dictionary to the user. For example, the recommendation of "It appears you are working in a medical domain. To improve accuracy, we suggest you turn on the medical domain." may be provided.
  • buttons are provided for allowing a user to selectively turn on or reject the turning on of the recommended domain dictionary.
  • the pop-up dialog 610 and the associated recommendation language are for purposes of example only and are not limiting of the vast number of user interface components that may be utilized for recommending a given domain dictionary in association with a given software functionality or textual content.
  • the recommendation engine 445 determines that a given domain dictionary may be recommended for use in association with a given software functionality and/or textual content
  • the recommended domain dictionary may be automatically associated with the given software functionality and/or textual content without user input. That is, some software functionalities, for example, input method applications and word processing applications, may be set up for automatically associating recommended domain dictionaries with textual content items for assisting users with those textual content items.
  • once a given domain dictionary is associated with a given software functionality and/or textual content item, the resources of that domain dictionary may be made available for use in association with textual content, including text input, spellchecking, grammar checking, auto entry completion, dictionary services, and the like.
  • the embodiments and functionalities described herein may operate via a multitude of computing systems including, without limitation, desktop computer systems, wired and wireless computing systems, mobile computing systems (e.g., mobile telephones, netbooks, tablet or slate type computers, notebook computers, and laptop computers), hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, and mainframe computers.
  • the embodiments and functionalities described herein may operate over distributed systems (e.g., cloud-based computing systems), where application functionality, memory, data storage and retrieval and various processing functions may be operated remotely from each other over a distributed computing network, such as the Internet or an intranet.
  • User interfaces and information of various types may be displayed via on-board computing device displays or via remote display units associated with one or more computing devices. For example, user interfaces and information of various types may be displayed and interacted with on a wall surface onto which user interfaces and information of various types are projected. Interaction with the multitude of computing systems with which embodiments of the invention may be practiced includes keystroke entry, touch screen entry, voice or other audio entry, gesture entry where an associated computing device is equipped with detection (e.g., camera) functionality for capturing and interpreting user gestures for controlling the functionality of the computing device, and the like.
  • Figs. 7 through 9 and the associated descriptions provide a discussion of a variety of operating environments in which embodiments of the invention may be practiced. However, the devices and systems illustrated and discussed with respect to Figs. 7 through 9 are for purposes of example and illustration and are not limiting of a vast number of computing device configurations that may be utilized for practicing embodiments of the invention, described herein.
  • Fig. 7 is a block diagram illustrating example physical components (i.e., hardware) of a computing device 700 with which embodiments of the invention may be practiced.
  • the computing device components described below may be suitable for the computing devices described above.
  • the computing device 700 may include at least one processing unit 702 and a system memory 704.
  • the system memory 704 may comprise, but is not limited to, volatile storage (e.g., random access memory), non-volatile storage (e.g., read-only memory), flash memory, or any combination of such memories.
  • the system memory 704 may include an operating system 705 and one or more program modules 706 suitable for running software applications 720 such as the new word detection engine 230 and the domain recommendation engine 445.
  • the operating system 705, for example, may be suitable for controlling the operation of the computing device 700.
  • embodiments of the invention may be practiced in conjunction with a graphics library, other operating systems, or any other application program and are not limited to any particular application or system.
  • This basic configuration is illustrated in Fig. 7 by those components within a dashed line 708.
  • the computing device 700 may have additional features or functionality.
  • the computing device 700 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in Fig. 7 by a removable storage device 709 and a non-removable storage device 710.
  • program modules 706 such as the new word detection engine 230 and domain recommendation engine 445 may perform processes including, for example, one or more of the stages of the methods 300 and 500, respectively.
  • the aforementioned process is an example, and the processing unit 702 may perform other processes.
  • Other program modules that may be used in accordance with embodiments of the present invention may include electronic mail and contacts applications, word processing applications, spreadsheet applications, database applications, slide presentation applications, drawing or computer-aided application programs, etc.
  • embodiments of the invention may be practiced in an electrical circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors.
  • embodiments of the invention may be practiced via a system-on-a-chip (SOC) where each or many of the components illustrated in Fig. 7 may be integrated onto a single integrated circuit.
  • SOC system-on-a-chip
  • Such an SOC device may include one or more processing units, graphics units, communications units, system virtualization units and various application functionality all of which are integrated (or "burned") onto the chip substrate as a single integrated circuit.
  • the functionality, described herein, with respect to the new word detection engine 230 and the domain recommendation engine 445 may be operated via application-specific logic integrated with other components of the computing device 700 on the single integrated circuit (chip).
  • Embodiments of the invention may also be practiced using other technologies capable of performing logical operations such as, for example, AND, OR, and NOT, including but not limited to mechanical, optical, fluidic, and quantum technologies.
  • embodiments of the invention may be practiced within a general purpose computer or in any other circuits or systems.
  • the computing device 700 may also have one or more input device(s) 712 such as a keyboard, a mouse, a pen, a sound input device, a touch input device, etc.
  • the output device(s) 714 such as a display, speakers, a printer, etc. may also be included.
  • the aforementioned devices are examples and others may be used.
  • the computing device 700 may include one or more communication connections 716 allowing communications with other computing devices 718. Examples of suitable communication connections 716 include, but are not limited to, RF transmitter, receiver, and/or transceiver circuitry; universal serial bus (USB), parallel, or serial ports, and other connections appropriate for use with the applicable computer readable media.
  • USB universal serial bus
  • Embodiments of the invention may be implemented as a computer process (method), a computing system, or as an article of manufacture, such as a computer program product or computer readable media.
  • the computer program product may be a computer storage media readable by a computer system and encoding a computer program of instructions for executing a computer process.
  • Computer readable media may include computer storage media and communication media.
  • Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
  • the system memory 704, the removable storage device 709, and the non-removable storage device 710 are all computer storage media examples (i.e., memory storage).
  • Computer storage media may include, but is not limited to, RAM, ROM, electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store information and which can be accessed by the computing device 700. Any such computer storage media may be part of the computing device 700.
  • Communication media may be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media.
  • modulated data signal may describe a signal that has one or more characteristics set or changed in such a manner as to encode information in the signal.
  • communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
  • RF radio frequency
  • Figs. 8A and 8B illustrate a mobile computing device 800, for example, a mobile telephone, a smart phone, a tablet personal computer, a laptop computer, and the like, with which embodiments of the invention may be practiced.
  • with reference to Fig. 8A, an exemplary mobile computing device 800 for implementing the embodiments is illustrated.
  • the mobile computing device 800 is a handheld computer having both input elements and output elements.
  • the mobile computing device 800 typically includes a display 805 and one or more input buttons 810 that allow the user to enter information into the mobile computing device 800.
  • the display 805 of the mobile computing device 800 may also function as an input device (e.g., a touch screen display). If included, an optional side input element 815 allows further user input.
  • the side input element 815 may be a rotary switch, a button, or any other type of manual input element.
  • mobile computing device 800 may incorporate more or fewer input elements.
  • the display 805 may not be a touch screen in some embodiments.
  • the mobile computing device 800 is a portable phone system, such as a cellular phone.
  • the mobile computing device 800 may also include an optional keypad 835.
  • Optional keypad 835 may be a physical keypad or a "soft" keypad generated on the touch screen display.
  • the output elements include the display 805 for showing a graphical user interface (GUI), a visual indicator 820 (e.g., a light emitting diode), and/or an audio transducer 825 (e.g., a speaker).
  • GUI graphical user interface
  • the mobile computing device 800 incorporates a vibration transducer for providing the user with tactile feedback.
  • the mobile computing device 800 incorporates input and/or output ports, such as an audio input (e.g., a microphone jack), an audio output (e.g., a headphone jack), and a video output (e.g., an HDMI port) for sending signals to or receiving signals from an external device.
  • Fig. 8B is a block diagram illustrating the architecture of one embodiment of a mobile computing device. That is, the mobile computing device 800 can incorporate a system (i.e., an architecture) 802 to implement some embodiments.
  • the system 802 is implemented as a "smart phone" capable of running one or more applications (e.g., browser, e-mail, calendaring, contact managers, messaging clients, games, and media clients/players).
  • the system 802 is integrated as a computing device, such as an integrated personal digital assistant (PDA) and wireless phone.
  • PDA personal digital assistant
  • One or more application programs 866 may be loaded into the memory 862 and run on or in association with the operating system 864. Examples of the application programs include phone dialer programs, e-mail programs, personal information management (PIM) programs, word processing programs, spreadsheet programs, Internet browser programs, messaging programs, and so forth.
  • the system 802 also includes a nonvolatile storage area 868 within the memory 862. The non-volatile storage area 868 may be used to store persistent information that should not be lost if the system 802 is powered down.
  • the application programs 866 may use and store information in the non-volatile storage area 868, such as electronic mail or other messages used by an electronic mail application, and the like.
  • a synchronization application (not shown) also resides on the system 802 and is programmed to interact with a corresponding synchronization application resident on a host computer to keep the information stored in the non-volatile storage area 868 synchronized with corresponding information stored at the host computer.
  • other applications may be loaded into the memory 862 and run on the mobile computing device 800, including the new word detection engine 230 and domain recommendation engine 445, described herein.
  • the system 802 has a power supply 870, which may be implemented as one or more batteries.
  • the power supply 870 might further include an external power source, such as an AC adapter or a powered docking cradle that supplements or recharges the batteries.
  • the system 802 may also include a radio 872 that performs the function of transmitting and receiving radio frequency communications.
  • the radio 872 facilitates wireless connectivity between the system 802 and the "outside world", via a communications carrier or service provider. Transmissions to and from the radio 872 are conducted under control of the operating system 864. In other words, communications received by the radio 872 may be disseminated to the application programs 866 via the operating system 864, and vice versa.
  • the radio 872 allows the system 802 to communicate with other computing devices, such as over a network.
  • the radio 872 is one example of communication media.
  • Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media.
  • modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media.
  • the term computer readable media as used herein includes both storage media and communication media.
  • This embodiment of the system 802 provides notifications using the visual indicator 820 that can be used to provide visual notifications and/or an audio interface 874 producing audible notifications via the audio transducer 825.
  • the visual indicator 820 is a light emitting diode (LED) and the audio transducer 825 is a speaker.
  • LED light emitting diode
  • the LED may be programmed to remain on indefinitely until the user takes action to indicate the powered-on status of the device.
  • the audio interface 874 is used to provide audible signals to and receive audible signals from the user.
  • the audio interface 874 may also be coupled to a microphone to receive audible input, such as to facilitate a telephone conversation.
  • the microphone may also serve as an audio sensor to facilitate control of notifications, as will be described below.
  • the system 802 may further include a video interface 876 that enables an operation of an on-board camera 830 to record still images, video stream, and the like.
  • a mobile computing device 800 implementing the system 802 may have additional features or functionality.
  • the mobile computing device 800 may also include additional data storage devices (removable and/or non-removable) such as, magnetic disks, optical disks, or tape.
  • additional storage is illustrated in Fig. 8B by the non-volatile storage area 868.
  • Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
  • Data/information generated or captured by the mobile computing device 800 and stored via the system 802 may be stored locally on the mobile computing device 800, as described above, or the data may be stored on any number of storage media that may be accessed by the device via the radio 872 or via a wired connection between the mobile computing device 800 and a separate computing device associated with the mobile computing device 800, for example, a server computer in a distributed computing network, such as the Internet.
  • data/information may be accessed via the mobile computing device 800 via the radio 872 or via a distributed computing network.
  • data/information may be readily transferred between computing devices for storage and use according to well-known data/information transfer and storage means, including electronic mail and collaborative data/information sharing systems.
  • Fig. 9 illustrates one embodiment of the architecture of a system for providing the functionality of the new word detection engine 230 and domain recommendation engine 445 to one or more client devices, as described above.
  • Content developed, interacted with or edited in association with the new word detection engine 230 and domain recommendation engine 445 may be stored in different communication channels or other storage types.
  • various content and documents may be stored using a directory service 922, a web portal 924, a mailbox service 926, an instant messaging store 928, or a social networking site 930.
  • the new word detection engine 230 and domain recommendation engine 445 may use any of these types of systems or the like for enabling co-authoring conflict resolution via comments, as described herein.
  • the server 920 may be a web server providing the functionality of the new word detection engine 230 and domain recommendation engine 445 over the web.
  • the server 920 may provide the functionality of the new word detection engine 230 and domain recommendation engine 445 over the web to clients through a network 915.
  • the client computing device 918 may be implemented as the computing device 900 and embodied in a personal computer 918a, a tablet computing device 918b and/or a mobile computing device 918c (e.g., a smart phone). Any of these embodiments of the client computing device 918 may obtain content from the store 916.
  • the types of networks used for communication between the computing devices that make up the present invention include, but are not limited to, an internet, an intranet, wide area networks (WAN), local area networks (LAN), and virtual private networks (VPN).
  • the networks include the enterprise network and the network through which the client computing device accesses the enterprise network (i.e., the client network).
  • the client network is part of the enterprise network.
  • the client network is a separate network accessing the enterprise network through externally available entry points, such as a gateway, a remote access protocol, or a public or private internet address.
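
The term-frequency, word-pair, and ranking steps listed above for the domain recommendation engine 445 can be illustrated with a minimal Python sketch. It is not the algorithm of the disclosure, which is not reproduced in this text. All function names, the `keep_always` set (standing in for distinctive words such as "penicillin" that are kept despite a low term frequency), and the `min_tf`, `min_pairs`, and `top_n` thresholds are assumptions chosen for illustration. The sketch also assumes the text has already been segmented into words, and it counts each distinct matched word as one word pair.

```python
from collections import Counter


def term_frequencies(words):
    """Count how often each segmented word occurs in the textual content."""
    return Counter(words)


def filter_words(tf, min_tf=2, keep_always=frozenset()):
    """Discard low-frequency words, except words explicitly marked as worth keeping."""
    return {w for w, n in tf.items() if n >= min_tf or w in keep_always}


def match_word_pairs(kept_words, domain_dictionaries):
    """For each domain dictionary, collect the kept words that also appear in it."""
    return {name: kept_words & lexicon for name, lexicon in domain_dictionaries.items()}


def recommend_domains(words, domain_dictionaries, min_tf=2,
                      keep_always=frozenset(), min_pairs=2, top_n=2):
    """Rank domains by number of matched word pairs and return the top candidates."""
    tf = term_frequencies(words)
    kept = filter_words(tf, min_tf, keep_always)
    pairs = match_word_pairs(kept, domain_dictionaries)
    # Sort by descending match count; break ties by name for a stable order.
    ranked = sorted(pairs.items(), key=lambda item: (-len(item[1]), item[0]))
    return [name for name, matched in ranked if len(matched) >= min_pairs][:top_n]


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    content = ["penicillin", "dosage", "dosage", "patient", "patient", "bridge"]
    dictionaries = {
        "medical": {"penicillin", "dosage", "patient"},
        "engineering": {"bridge", "beam"},
    }
    print(recommend_domains(content, dictionaries, keep_always={"penicillin"}))
```

In this sketch the threshold on matched word pairs and the size of the top list are plain parameters, reflecting the statement above that such values may be determined via experimentation and testing.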

Abstract

New word detection and domain dictionary recommendation are provided. When text content is received according to a given language, for example, Chinese language, words are extracted from the content by analyzing the content according to a variety of rules. The words then are ranked for inclusion into one or more lexicons or domain dictionaries for future use for such functionalities as text input methods, spellchecking, grammar checking, auto entry completion, definition, and the like. In addition, when a user is entering or editing text according to one or more prescribed domain dictionaries, a determination may be made as to whether more helpful domain dictionaries may be available. When entered words have a high degree of association with a given domain dictionary, that domain dictionary may be recommended to the user to increase the accuracy of the user's input of additional text and editing of existing text.

Description

WORD DETECTION AND DOMAIN DICTIONARY RECOMMENDATION
BACKGROUND
[0001] With the great increase in Internet functionality, information transfer and electronic document production and use, more and more new words are being created and spread among users, and more and more words are being used in electronic document creation and use that are associated with a variety of different domain dictionaries.
[0002] When new words are received from one or more sources, for example, an Internet web page, electronic mail message, text message, electronic document, or the like, such words may not be recognized as belonging to a given domain dictionary, for example, a domain dictionary associated with a word processing application, and thus, such functionalities as text input methods, spellchecking, grammar checking, auto entry completion, and the like, may not be available for those new words. This may be particularly problematic with complex languages such as the Chinese language that are comprised of strings of characters not broken into words by spaces or other demarcation or separation indicia.
[0003] In addition, oftentimes a user may be inputting information (e.g., text) via a given software functionality, for example, a word processing application, that is associated with a given domain dictionary, for example, a standard English language, Chinese language, or other standard language domain dictionary, but the user may be inputting text associated with a more particular domain, for example, a medical terminology domain. If the user is not aware of the availability of the domain dictionary (e.g., a medical terminology domain dictionary) associated with his/her text input, the user may be losing the valuable resources of the available domain dictionary.
[0004] It is with respect to these and other considerations that the present invention has been made.
SUMMARY
[0005] Embodiments of the present invention solve the above and other problems by providing new word detection and domain dictionary recommendation. According to one embodiment, when text content is received according to a given language, for example, Chinese language, words are extracted from the content by analyzing the content according to a variety of rules, including a stop word rule, a lexicon sub-string and number sequence rule, a prefix/suffix rule and a language pattern rule. After words of low value for addition to a word lexicon as new words are eliminated, remaining words are ranked for inclusion into one or more word lexicons and/or particular domain dictionaries for future use for such functionalities as text input methods, spellchecking, grammar checking, auto entry completion, definition, and the like.
[0006] According to another embodiment, when a user is entering or editing text according to one or more prescribed domain dictionaries, a determination may be made as to whether more helpful domain dictionaries may be available. Words entered by the user are extracted and are compared with words contained in a variety of available domain dictionaries. If a determination is made that words entered by the user have a high degree of association with a domain dictionary not in use by the user, that domain dictionary may be recommended to the user to increase the accuracy of the user's input of additional text and editing of existing text.
[0007] The details of one or more embodiments are set forth in the accompanying drawings and description below. Other features and advantages will be apparent from a reading of the following detailed description and a review of the associated drawings. It is to be understood that the following detailed description is explanatory only and is not restrictive of the invention as claimed.
[0008] This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended as an aid in determining the scope of the claimed subject matter.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate various embodiments of the present invention.
[0010] Fig. 1 illustrates textual content according to a particular language, for example, Chinese language, displayed on a display screen of a tablet-type computing device from which one or more new words may be detected for inclusion in a given domain dictionary.
[0011] Fig. 2 illustrates a system architecture for receiving textual content from one or more sources and for detecting one or more new words from the textual content via a new word detection engine.
[0012] Fig. 3 is a flow chart of a method for detecting new words contained in a received or input text content selection.
[0013] Fig. 4 illustrates a system architecture for domain dictionary recommendation for received or input textual content.
[0014] Fig. 5 is a flow chart of a method for recommending one or more domain dictionaries in association with received or input textual content.
[0015] Fig. 6 illustrates an example pop-up dialog for recommending a domain dictionary to a user in association with received or entered textual content.
[0016] Fig. 7 is a simplified block diagram illustrating example physical components of a computing device with which embodiments of the invention may be practiced.
[0017] Figs. 8A and 8B are simplified block diagrams of a mobile computing device with which embodiments of the present invention may be practiced.
[0018] Fig. 9 is a simplified block diagram of a distributed computing system in which embodiments of the present invention may be practiced.
DETAILED DESCRIPTION
[0019] As briefly described above, embodiments of the present invention are directed to providing new word detection and domain dictionary recommendation. When text content is received according to a given language, for example, Chinese language, words are extracted from the content by analyzing the content according to a variety of rules. After words of low value for addition to a given domain dictionary as new words are eliminated, remaining words are ranked for inclusion into one or more word lexicons and/or particular domain dictionaries for future use for such functionalities as text input methods, spellchecking, grammar checking, auto entry completion, definition, and the like. In addition, when a user is entering or editing text according to one or more prescribed domain dictionaries, a determination may be made as to whether more helpful domain dictionaries may be available. If a determination is made that words entered by the user have a high degree of association with a domain dictionary not in use by the user, that domain dictionary may be recommended to the user to increase the accuracy of the user's input of additional text and editing of existing text.
[0020] The following detailed description refers to the accompanying drawings. Wherever possible, the same reference numbers are used in the drawing and the following description to refer to the same or similar elements. While embodiments of the invention may be described, modifications, adaptations, and other implementations are possible. For example, substitutions, additions, or modifications may be made to the elements illustrated in the drawings, and the methods described herein may be modified by substituting, reordering, or adding stages to the disclosed methods. Accordingly, the following detailed description does not limit the invention, but instead, the proper scope of the invention is defined by the appended claims.
[0021] Referring now to Fig. 1, a textual content selection 115 is illustrated on a display screen of a computing device 110 that may be read, edited, or otherwise utilized by a user according to a variety of software functionalities, for example, word processing applications, Internet-based applications, slide presentation applications, spreadsheet applications, desktop publishing applications, and the like. The computing device 110 illustrated in Fig. 1 is a tablet-type computing device, but as should be appreciated, the computing device 110 may take any suitable form, for example, a laptop computer, a desktop computer, a handheld computing device, for example, a smart telephone, and the like that is capable of allowing the textual content 115 to be displayed and utilized according to one or more software functionalities. As illustrated in Fig. 1, the textual content 115 is Chinese language content, but as should be understood, the textual content 115 may be provided according to any other language type desired by the user of the device 110. A source 120 is illustrated from which the textual content 115 may be obtained, for example, an Internet-based web page, a remotely stored document, an electronic mail message, a text message, a locally stored document, and the like.
[0022] As briefly described above, when textual content, such as the textual content illustrated in Fig. 1, is received from a source 120, the textual content may have one or more new words that may or may not be understood by the user or other receiving party and/or may not be included in a domain dictionary available to the user, for example, one or more domain dictionaries associated with the user's word processing application, or other software application with which the received content is to be utilized. Thus, various functionalities, for example, text input methods, spellchecking, grammar checking, auto entry completion, dictionary services, and the like may not be available for such new words. For example, the received textual content 115 may include a new word that is new to a given industry, for example, the software industry, Internet industry, or the like that may be understood by the receiving user, but that may not be included in a given domain dictionary for assisting the user in utilizing the new word according to available software functionalities.
[0023] Referring now to Fig. 2, according to embodiments of the present invention, textual content received from a variety of sources, for example, the web page 205, the electronic document 210, the electronic mail message 215, the text message 220, or other content sources 225 may be passed to a new word detection engine 230 for isolation of new words contained in the received textual content item and for inclusion in one or more word lexicons (lists of words) and/or given domain dictionaries (lists of words associated with a particular domain, e.g., medical terminology domain) for subsequent use in association with one or more software functionalities. For example, if a new word of "texting" is received in a textual content item from one or more sources, the new word may be understood by a receiving user, but the new word may not be included in any domain dictionaries associated with software functionalities in use by the user, for example, text input method applications, word processing applications, electronic mail applications, and the like. By isolation of and inclusion of the new word in a given domain dictionary, software functionalities associated with text content entry and editing may be utilized in association with the domain dictionary and the newly isolated and stored word. For example, if a user subsequently enters or edits the example word "texting," a domain dictionary associated with the user's text input method application or word processing application to which the new word has been added may be utilized by the user's word processing application for assisting the user in properly entering the word, spelling the word, grammar checking use of the word in association with other words, providing dictionary services associated with the word, and the like.
[0024] According to embodiments, when textual content 115 is received or entered, as described herein, the new word detection engine 230 utilizes a variety of word detection rules/methods 235 for determining whether portions of the textual content include new words and for ranking determined new words for possible output to one or more domain dictionaries 265 for subsequent use. As described below, some of the rules/methods 235 may be used for eliminating candidate new words that are not considered meaningful for adding to a given domain dictionary as a new word.
[0025] Referring still to Fig. 2, the stop word rule 240 may be used for eliminating text strings having a head or tail associated with one or more prescribed stop words where such stop words may be considered noise level textual content items and not meaningful for inclusion in a given domain dictionary. For determining whether a portion of text may be a stop word, portions of the text content may be extracted and compared against lists of known stop words. For example, such stop words as commonly used transitional phrase words, articles, verbs, and the like, for example, "a," "and," "the," and the like, may be eliminated so that they are not needlessly analyzed further and added to a domain dictionary as a new word. As should be appreciated, these example stop words are merely three example English language stop words and are not exhaustive of the vast number of stop words that are utilized according to a variety of languages, for example, Chinese language, English language, French language, Arabic language, and the like and that are well known and well understood by those skilled in the art of linguistics.
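The stop word rule 240 can be illustrated with a minimal Python sketch. It assumes candidates are already tokenized into tuples and uses a tiny hypothetical stop-word list; the names `STOP_WORDS`, `has_stop_word_edge`, and `apply_stop_word_rule` are illustrative and do not come from the disclosure.

```python
# Hypothetical stop-word list; a real system would load a language-specific list.
STOP_WORDS = {"a", "and", "the"}

def has_stop_word_edge(tokens, stop_words=STOP_WORDS):
    """Return True if the candidate's head or tail token is a stop word."""
    return bool(tokens) and (tokens[0] in stop_words or tokens[-1] in stop_words)

def apply_stop_word_rule(candidates, stop_words=STOP_WORDS):
    """Keep only candidates whose head and tail are not stop words."""
    return [c for c in candidates if not has_stop_word_edge(c, stop_words)]

# Candidate strings represented as tuples of tokens (an assumption of this sketch).
candidates = [("the", "server"), ("cloud", "server"), ("server", "and")]
kept = apply_stop_word_rule(candidates)
```

Only the candidate with no stop word at either edge survives; for Chinese text the same check could be applied to head and tail characters instead of tokens.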
[0026] A lexicon sub-string and number sequences rule 245 may be utilized for eliminating strings that are sub-strings of other words or number sequences contained in one or more domain dictionaries where the inclusion of such sub-strings does not provide for meaningful inclusion in one or more domain dictionaries. That is, character strings contained in a given text content that are merely sub-strings of words contained in a lexicon of words or sub-strings of a number sequence contained in a lexicon of words may be eliminated because they are of little value in adding to a domain dictionary or lexicon of words or terms as a new word or term. For example, if the word "diction" is found to be a sub-string of the word "dictionary" already included in one or more lexicons or domain dictionaries, then the sub-string of "diction" may be eliminated as a candidate new word. According to an embodiment, this rule may be advantageous because when a word is in a lexicon, and if one of its sub-strings is not in the lexicon, then the sub-string is not a meaningful word. Likewise, number sequences, for example, a number sequence of "2012" used for indicating a year may not be meaningful for adding to a lexicon or domain dictionary as a new word and thus may be eliminated. For determining whether a portion of text contains a sub-string or number sequence, portions of received or input text may be compared by the new word detection engine 230 against lists of known strings, and number sequences may be detected by determining that one or more characters are sequences of numbers that are not part of a given word or term.
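As a rough illustration of the lexicon sub-string and number sequences rule 245, the following Python sketch drops candidates that are pure digit strings or proper sub-strings of a lexicon word. The helper names are hypothetical, and treating "number sequence" as "all characters are digits" is a simplifying assumption.

```python
def is_number_sequence(candidate):
    """Assumption: a number sequence is a string made only of digits."""
    return candidate.isdigit()

def is_lexicon_substring(candidate, lexicon):
    """True if the candidate is a proper sub-string of some lexicon word."""
    return any(candidate != word and candidate in word for word in lexicon)

def apply_substring_and_number_rule(candidates, lexicon):
    """Drop number sequences and sub-strings of words already in the lexicon."""
    return [
        c for c in candidates
        if not is_number_sequence(c) and not is_lexicon_substring(c, lexicon)
    ]

lexicon = {"dictionary"}
remaining = apply_substring_and_number_rule(["diction", "2012", "texting"], lexicon)
```

Here "diction" and "2012" are eliminated, matching the two examples in the text, while "texting" remains as a candidate.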
[0027] After some words, phrases or number sequences are eliminated as described above, statistical methods 250 may be utilized for scoring remaining candidates for possible inclusion in a lexicon of words or domain dictionary as described herein. As should be appreciated a variety of statistical methods may be employed for scoring a given word so that highly scored or ranked words may be included in the lexicon or domain dictionary and so that lower scored or ranked words may be discarded. For example, a term frequency for a given word may be determined, and words appearing very frequently in a given text selection may be scored highly. Such determinations may be refined by combining such determinations with other statistical information. For example, if a word has a high term frequency, but only appears in association with another word that is not considered meaningful, then the high term frequency for the word may be less important. For another example, the contextual independency of a word may be considered where a higher or lower score for the word may be determined based on the dependency or association of the word on or with other words in an analyzed text selection.
[0028] According to one embodiment, the statistical methods 250 allow for calculation of six (6) statistical scores for any candidate word w, composed of characters $c_1 \ldots c_n$. The statistical methods 250 may use the lexicon sub-string and number sequences rule described above and the prefix/suffix rule described below for refining the statistical information determined for a given word.
[0029] A first statistical score for a given word may include a term frequency (TF) score, which may be determined for each word extracted from a received or input text content 115 as set out below, where $tf_w$ is the term frequency of the word and $length_w$ is the textual length of the word.
[0030] $TF(w) = tf_w \times length_w$
[0031] A second statistical score may include a fair symmetric conditional probability (FSCP), which may be used to measure the contextual independency of the word and the cohesiveness of a generic n-gram (n >= 2) for the word relative to other words. The FSCP for the word may be determined as follows.
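[0032] [Equation image imgf000008_0001: the FSCP formula is not recoverable from the extracted text.]
[0033] $Avp = \frac{1}{n-1} \sum_{i=1}^{n-1} P(c_1 \ldots c_i) \cdot P(c_{i+1} \ldots c_n)$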
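The first two scores can be sketched in Python as follows. The TF score and the Avp average follow the text. The FSCP expression itself appears only in an equation image, so the form used here, $P(w)^2 / Avp$, is the commonly published fair symmetric conditional probability and is an assumption of this sketch rather than the disclosed formula. The probability lookup `prob` and the sample values are hypothetical.

```python
def tf_score(term_frequency, word):
    """TF(w) = term frequency times textual length of the word."""
    return term_frequency * len(word)

def avp(word, prob):
    """Average of P(c1..ci) * P(ci+1..cn) over the n-1 split points of the word."""
    n = len(word)
    if n < 2:
        raise ValueError("Avp is defined for n-grams with n >= 2")
    return sum(prob(word[:i]) * prob(word[i:]) for i in range(1, n)) / (n - 1)

def fscp(word, prob):
    # ASSUMPTION: standard fair SCP form, P(w)^2 / Avp; the source formula is not available.
    return prob(word) ** 2 / avp(word, prob)

# Hypothetical probabilities for a two-character word and its parts.
probs = {"ab": 0.2, "a": 0.5, "b": 0.4}
score = fscp("ab", probs.get)
```

With these sample values the single split gives Avp = 0.5 * 0.4, so the assumed FSCP is 0.04 / 0.2; a word whose parts rarely occur apart from it would score higher.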
[0034] A third statistical score may include an adapted mutual information (AMI) score. The AMI score allows a determination as to whether a character pattern for a given word $c_1 \ldots c_n$ is more complete in semantics than any substrings that compose the word, especially the longest composing substrings. The AMI score may be determined as follows.
[0035] [Equation image imgf000008_0002: the AMI formula is not recoverable from the extracted text.]
[0036] A fourth statistical score may include a context entropy score. For a context entropy score neighboring words (x) of an analyzed word (w) are collected and the frequencies of the neighboring words (x) are determined. The context entropy of the analyzed word (w) may be determined as follows.
[0037] $H_C(w) = -\sum_{x} p(x) \log p(x)$
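A minimal Python sketch of the context entropy score is shown below. It assumes the neighboring words of the analyzed word have already been collected into a list, and it uses the natural logarithm because the text does not state a base.

```python
import math
from collections import Counter

def context_entropy(neighbors):
    """H_C(w) = -sum over neighbors x of p(x) * log p(x)."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical neighbor lists for two candidate words.
varied = context_entropy(["x", "y", "z", "x"])   # appears in varied contexts
fixed = context_entropy(["x", "x", "x", "x"])    # always next to the same word
```

A candidate that always appears next to the same neighbor has zero entropy, which suggests it may be a fragment of a longer word; a candidate seen in varied contexts scores higher.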
[0038] A fifth statistical score may include a prefix/suffix ratio of a given word relative to other words with which the given word is associated as a prefix/suffix. As set out above, an analyzed word may be discarded if it is determined merely to be a prefix or suffix of one or more other words in a given text selection. A prefix/suffix ratio for a given word may be determined as follows.
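[0039] $PSR(c_1 \ldots c_n) = \max(\ldots)$ [The arguments of the maximum appear only in equation image imgf000009_0001 and are not recoverable from the extracted text.]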
[0040] A sixth statistical score for an analyzed word may include a biased mutual dependency (BMD) score for determining dependencies between analyzed words and a plurality of other words in a text selection. A BMD score for a given word may be determined as follows.
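[0041] $BMD(c_1 \ldots c_n) = \ldots$ [The right-hand side appears only in equation image imgf000009_0002 and is not recoverable from the extracted text.]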
[0042] According to this embodiment, after the six (6) statistical scores are determined for a given word, a language pattern rule may be used for adjusting the scores. For example, according to a Chinese language word analysis, a Chinese pattern rule may be used to adjust the FSCP and AMI scores using a linear model as follows.
[0043] $Score_{fscp}(w) = FSCP(w) + delta_{fscp} \times Pattern(w)$
[0044] $Score_{mi}(w) = AMI(w) + delta_{mi} \times Pattern(w)$
[0045] According to a Chinese pattern analysis example, the Pattern(w) adjustment may not be used for the term frequency (TF) score because TF(w) is typically a very large number, whereas Pattern(w) is between 0 and 1. The $delta_{fscp}$ may be set to 0.01, 0.05, or 0.1 for testing because $FSCP(c_1 \ldots c_n)$ is typically not large (e.g., 0-0.4) while Pattern(w) is typically large (e.g., 0.6-1), so $delta_{fscp}$ should not be set so large that Pattern(w) becomes dominant. Such example parameters may be obtained by experimentation. Continuing with this example, the $delta_{mi}$ may be set to 0.1, 0.5, or 1 for testing because AMI(w) is typically as large (e.g., 0.6-1) as Pattern(w). According to an embodiment, these parameters may be obtained by experimentation and testing.
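The linear adjustment can be sketched in a few lines of Python. The score, pattern, and delta values below are hypothetical and were chosen only to fall inside the ranges quoted in the text.

```python
def adjust_score(base_score, pattern_value, delta):
    """Linear adjustment: base score plus delta times Pattern(w)."""
    return base_score + delta * pattern_value

# Hypothetical values inside the ranges quoted in the text.
fscp_adjusted = adjust_score(0.30, 0.80, delta=0.05)  # FSCP with a small delta
ami_adjusted = adjust_score(0.70, 0.80, delta=0.5)    # AMI with a larger delta
```

With a small delta the pattern term shifts the FSCP score only slightly, which is the stated reason for keeping that delta small; AMI and Pattern(w) have similar magnitudes, so a larger delta is tolerable there.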
[0046] As should be appreciated, when multiple statistical scores are determined for a given analyzed word, the multiple scores may be combined for obtaining a single score that may be used for determining whether the word should be added to a lexicon or domain dictionary. For example, continuing with the above example embodiment, all six (6) scores described above may be combined into a single total score by a log-linear formula as follows.
[0047] $TOTAL(w) = \lambda_1 TF(w) + \lambda_2 FSCP(w) + \lambda_3 AMI(w) + \lambda_4 H_C(w) + \lambda_5 PSR(w) + \lambda_6 BMD(w)$
[0048] According to this example embodiment, the values of the six $\lambda$'s may be obtained by numerical optimization over a number of training instances. There are positive training instances (sequences that are determined to be words for adding to a lexicon) and negative training instances (sequences that are discarded). Positive training instances may be provided by automated and human selection. The negative training instances, which may not be reliably provided by human selection, may be selected from lists of candidate words ranked by each of the six statistical scores/measures described above. If a candidate word is ranked low by at least three statistical measures, then it may be selected as a negative training instance.
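The weighted combination and the negative-instance heuristic might look like the following Python sketch. The weights, scores, and the rank cutoff that defines "ranked low" are hypothetical; the text says only that the weights come from numerical optimization and that three low rankings qualify a candidate as a negative instance.

```python
MEASURES = ("TF", "FSCP", "AMI", "HC", "PSR", "BMD")

def total_score(scores, weights):
    """Weighted sum of the six statistical scores for one candidate word."""
    return sum(weights[m] * scores[m] for m in MEASURES)

def is_negative_instance(ranks, low_rank_cutoff, min_low_measures=3):
    """ranks maps measure -> rank position (1 = best).
    ASSUMPTION: a rank position beyond low_rank_cutoff counts as 'ranked low'."""
    low = sum(1 for m in MEASURES if ranks[m] > low_rank_cutoff)
    return low >= min_low_measures

# Hypothetical weights and scores.
weights = {"TF": 0.5, "FSCP": 1.0, "AMI": 1.0, "HC": 1.0, "PSR": 1.0, "BMD": 1.0}
scores = {"TF": 4.0, "FSCP": 1.0, "AMI": 1.0, "HC": 1.0, "PSR": 1.0, "BMD": 1.0}
total = total_score(scores, weights)

ranks = {"TF": 900, "FSCP": 950, "AMI": 10, "HC": 980, "PSR": 5, "BMD": 20}
negative = is_negative_instance(ranks, low_rank_cutoff=800)
```

In practice the weights would be fitted on the positive and negative instances rather than set by hand.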
[0049] Referring still to Fig. 2, the prefix/suffix rule 255 provides for eliminating words or phrases that are prefixes or suffixes of other words or phrases. After a score is calculated for a given word as described above, some candidates may be eliminated via the prefix/suffix rule 255 where a score for a prefix or suffix word is no greater than the score of the words containing it. That is, the sub-string comprising the prefix or suffix is less meaningful (based on scoring) than the words to which the sub-string belongs. Thus, a string (word) including such a prefix or suffix should not be split into the sub-string (prefix or suffix), and therefore, the sub-string may be removed as a candidate word for inclusion in a lexicon or domain dictionary.
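One way to read the prefix/suffix rule 255 is sketched below in Python: a candidate is dropped when it is a proper prefix or suffix of another candidate whose score is at least as high. The function name and the sample scores are hypothetical.

```python
def apply_prefix_suffix_rule(scores):
    """scores maps candidate -> score. Drop any candidate that is a proper prefix
    or suffix of another candidate scoring at least as high."""
    kept = {}
    for cand, score in scores.items():
        dominated = any(
            other != cand
            and (other.startswith(cand) or other.endswith(cand))
            and score <= other_score
            for other, other_score in scores.items()
        )
        if not dominated:
            kept[cand] = score
    return kept

# Hypothetical scored candidates.
scored = {"text": 0.4, "texting": 0.6, "ing": 0.9}
survivors = apply_prefix_suffix_rule(scored)
```

In this example "text" is removed because the containing word "texting" scores higher, while "ing" is kept because its score exceeds that of the word containing it.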
[0050] As described above, the language pattern rule 260 allows for analyzing the patterns of characters for adjusting scores determined for candidate words. For example, if a word contains characters "abc," the language pattern rule may be used for determining a probability that a character may be in the first position or in the middle or in the tail of a candidate word for adjusting the score for the candidate word. For example, according to an example embodiment using a Chinese pattern rule, a text character's position may be used for determining the probability the character is a Chinese character. According to this example Chinese language embodiment, a unigram statistic is first calculated from the original lexicon and trigram statistic to get the list of <word, tf> pairs. Next, a character statistic is calculated from the unigram statistic to get a list of <char, <head_tf, mid_tf, tail_tf>> pairs. That is, for a character, its frequency is calculated in the head, middle and tail position in the unigram statistic, respectively. These steps comprise preprocessing for the Chinese pattern rule. Then, for each character, the probability of each position in which the character may occur may be calculated as follows.
[0051] $P(pos) = \frac{pos_{tf}}{head_{tf} + mid_{tf} + tail_{tf}}$, where $pos \in \{head, mid, tail\}$
[0052] The list of <char, <head_prob, mid_prob, tail_prob>> pairs is thus obtained. Two conditions may then be considered for a word, for example, $w = c_1 c_2 c_3 \ldots c_n$. One condition may include only head and tail probabilities as follows.
[0053] $Pattern(w) = P(c_1, head_{prob}) \times P(c_n, tail_{prob})$
[0054] Another condition may include all positions as follows.
[0055] $Pattern(w) = P(c_1, head_{prob}) \times P(c_2, mid_{prob}) \times \ldots \times P(c_{n-1}, mid_{prob}) \times P(c_n, tail_{prob})$
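The preprocessing and the two Pattern(w) variants can be sketched in Python as follows. The sketch assumes the <word, tf> unigram list is already available, skips single-character words (where head and tail coincide), and treats unseen characters as having zero probability; those choices and all names are assumptions for illustration.

```python
from collections import defaultdict

HEAD, MID, TAIL = 0, 1, 2

def position_counts(unigrams):
    """unigrams: iterable of (word, tf). Returns char -> [head_tf, mid_tf, tail_tf]."""
    counts = defaultdict(lambda: [0, 0, 0])
    for word, tf in unigrams:
        if len(word) < 2:
            continue  # ASSUMPTION: single-character words are skipped
        for i, ch in enumerate(word):
            pos = HEAD if i == 0 else TAIL if i == len(word) - 1 else MID
            counts[ch][pos] += tf
    return counts

def position_probs(counts):
    """Normalize each character's counts by head_tf + mid_tf + tail_tf."""
    return {ch: [tf / sum(tfs) for tf in tfs] for ch, tfs in counts.items()}

def pattern(word, probs, all_positions=False):
    """Head/tail product, optionally multiplied by the middle-position probabilities."""
    unseen = [0.0, 0.0, 0.0]
    result = probs.get(word[0], unseen)[HEAD] * probs.get(word[-1], unseen)[TAIL]
    if all_positions:
        for ch in word[1:-1]:
            result *= probs.get(ch, unseen)[MID]
    return result

# Hypothetical unigram statistic.
probs = position_probs(position_counts([("ab", 3), ("ba", 1)]))
value = pattern("ab", probs)
```

In the sample data "a" is a head character three times out of four and "b" is a tail character three times out of four, so the head/tail pattern value for "ab" is 0.75 * 0.75.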
[0056] Fig. 3 is a flow chart of a method for detecting new words contained in a received or input text content selection. The method 300 begins at start operation 305 and proceeds to operation 310 where textual content in the form of a number of words or character strings is received from one or more sources 205, 210, 215, 220, 225, as illustrated and described above with reference to Fig. 2. Word segmentation may next be performed for separating input or received textual content into individual words for subsequent analysis of segmented words as described below. As should be appreciated, textual content may be broken into words according to a variety of methods. Textual content may be broken into words via one or more word breaker methods, for example, by breaking words at spaces between groupings of characters or before or after known head and tail characters. However, for some languages such as Chinese, traditional word breaking methods are less effective because spaces and other demarcation indicators are not provided between words. In such cases, other methods may be utilized for quickly grouping characters into words.
[0057] According to one method, a positive maximum match method may be employed for segmenting such language types (e.g., Chinese) into words. The positive maximum match method is not sensitive to the size of a given lexicon of words. According to this method, characters are grouped together one by one up to a maximum number (e.g., 9 characters), and each grouping may be treated as a word for comparing against a lexicon for isolating the grouping as a word. Regardless of the method of segmenting textual content into words, once textual content is segmented into words, the segmented words are analyzed for determination as new words for inclusion into a word lexicon or domain dictionary as described below.
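The positive maximum match method resembles what is commonly called forward maximum matching. The Python sketch below is one conventional form of that technique, not necessarily the exact procedure of the disclosure: at each position it takes the longest grouping (up to `max_len` characters) found in the lexicon and falls back to a single character. The lexicon contents are hypothetical.

```python
def forward_max_match(text, lexicon, max_len=9):
    """Greedy left-to-right segmentation preferring the longest lexicon match."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest grouping first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words

# Hypothetical lexicon and input.
lexicon = {"研究", "研究生", "生命", "起源"}
segments = forward_max_match("研究生命起源", lexicon)
```

The greedy choice of "研究生" at the start leaves "命" as a single-character leftover; such leftover groupings are the kind of strings the statistical scoring described above would need to evaluate.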
[0058] At operation 315, the stop word rule 240 may be run against the received textual content for eliminating one or more stop words contained in the received textual content. At operation 320, stop words isolated and determined for the received textual content are eliminated as being of low value or meaningless for new word detection and determination.
[0059] At operation 325, the lexicon sub-string and number sequence rule may be run against the remaining textual content, and at operation 330, unnecessary sub-strings may be eliminated from the remaining textual content as lacking importance or meaning in the determination of new words contained in the received textual content.
[0060] At operation 335, the statistical methods 250, described above, are run against remaining textual content for scoring words contained in the remaining textual content for determination as new words for including in one or more lexicons and/or domain dictionaries.
[0061] At operation 340, the prefix/suffix rule 255 may be run against scored words extracted from the received textual content. At operation 345, unnecessary prefixes and suffixes may be eliminated for further reducing the number of textual content items that may be determined as new words contained in the received textual content.
[0062] At operation 350, language pattern analysis, for example, Chinese language pattern analysis, may be run on remaining words for adjusting scores applied to the remaining words extracted from the received textual content. At operation 355, the remaining words are ranked for inclusion in one or more word lexicons and/or domain dictionaries as new words, and at operation 360, highly ranked words may be selected and stored as new words for inclusion in one or more word lexicons and/or domain dictionaries. As should be appreciated, the scores and associated ranking that are required for including a word in a given lexicon or domain dictionary may be different for different languages and domain types. That is, scores and associated ranking may be determined acceptable for word detection and selection at varying levels for making the word detection methods described above more or less selective as desired for different text content. According to one embodiment, after one or more words are added to a given word lexicon or domain dictionary, the word lexicon or domain dictionary may be recommended to a user for association with a given software functionality, for example text input methods or word processing. The method 300 ends at operation 375.
[0063] As briefly described above, according to embodiments, users enter and edit textual content selections entered via various input methods and received from various sources. A given software application in use by a user, for example, a word processing application, slide presentation application, Internet web page functionality application, and the like may be associated with a given domain dictionary, for example, a standard grammar lexicon associated with a given language, for example, Chinese, English, French, Arabic, or the like. However, if the textual content being entered and/or edited by the user is more closely associated with a particular domain dictionary, for example, a medical terminology domain dictionary, an engineering terminology domain dictionary, a biological sciences domain dictionary, or the like, the user may be losing valuable resources of one of these particular or specialized domain dictionaries that may be available to the user for use in association with the entered and/or edited textual content.
[0064] For example, if the user is entering and/or editing textual content that contains a number of medical terms, if the user has not associated the software application in use, for example, a word processing application, with an available medical terminology domain dictionary, then valuable resources, for example, input method assistance, spellchecking, grammar checking, auto entry completion, dictionary services, and the like may not be available to the user in association with the entered and/or received textual content. According to embodiments, textual content entered and/or edited by a user may be analyzed for association with one or more domain dictionaries not in use by the user in association with the textual content, and one or more domain dictionaries that may be helpful in association with the entered and/or edited textual content may be recommended to the user.
[0065] Referring now to Fig. 4, example textual content 415 is illustrated on the display screen of a computing device 410 being entered and/or edited, and/or received by a user for use in association with one or more software functionalities. A number of domain dictionaries 420, 425, 430, 435 are illustrated that may be associated with the textual content 415 for assisting a user in association with an input method 440 with which the user may input additional textual content, edit input or edit received textual content. For example, an input method editor (IME) may be associated with an input device (e.g., a keyboard) for assisting a user to input text for a language not otherwise enabled by the input device. For example, an English language keyboard may be associated with a Chinese language IME. A domain dictionary associated with the IME may assist in input and editing of text entered via the Chinese language IME in association with the English language keyboard. As illustrated in Fig. 4, the textual content 415 is provided according to the Chinese language. As should be appreciated, the Chinese language is but one example of a variety of different textual content languages that may be utilized in accordance with embodiments of the present invention for recommending one or more available domain dictionaries for use in association with a given textual content.
[0066] Referring still to Fig. 4, the domain dictionary 420 may be a domain dictionary containing standard language lexicon, grammar and dictionary services associated with a given language, for example, the Chinese language, the English language, the French language, and the like. On the other hand, the domain dictionaries 425, 430, 435 may be associated with particular domain types, for example, medical terminology domain, engineering terminology domain, biological sciences terminology domain, and the like. As should be appreciated, a great number of domain dictionaries may be provided for use in association with textual content that are associated with a variety of different topics and/or ideas.
[0067] Referring still to Fig. 4, the domain dictionary recommendation engine 445 is illustrative of a software module containing sufficient computer executable instructions for analyzing a textual content and for comparing the textual content to one or more domain dictionaries for recommending one or more domain dictionaries for use in association with the textual content. According to one embodiment, when a user is using a given input method editor (IME), for example, a Chinese IME with an English language keyboard, text being input or edited by a user may be analyzed for recommending one or more additional domain dictionaries that may be associated with the IME in use for allowing the user greater input and/or editing accuracy via the one or more additional domain dictionaries.
[0068] Fig. 5 is a flow chart of a method for recommending one or more domain dictionaries in association with received or input textual content. The method 500 begins at start operation 505 and proceeds to operation 510 where textual content input and/or received by a user is received by the domain recommendation engine 445. According to an embodiment, domain words are extracted from user input history (including text presently being entered, previously entered text, or text received from one or more sources) for comparison with words contained in one or more domain dictionaries that may be recommended to the user for use with the user's input method.
[0069] At operation 515, word segmentation is performed for separating input or received textual content into individual words for subsequent comparison of segmented words against words contained in one or more domain dictionaries 420, 425, 430, 435. As should be appreciated, user input history may be broken into words for comparison against words contained in various domain dictionaries according to a variety of methods. For example, words may be isolated from user input according to the methods described above with reference to Figs. 1 - 3. Alternatively, user input may be broken into words via one or more word breaker methods, for example, by breaking words at spaces between groupings of characters or before or after known head and tail characters.
[0070] According to some languages, for example, Chinese, traditional word breaking methods are less effective because spaces and other demarcation indicators are not provided between words. In such cases, other methods may be utilized for quickly grouping characters into words. According to one method, a positive maximum match method may be employed for segmenting such language types (e.g., Chinese) into words. The positive maximum match method is not sensitive to the size of a given lexicon. According to this method, characters are grouped together one by one up to a maximum number (e.g., 9 characters), and each grouping may be treated as a word for comparing against a lexicon for isolating the grouping as a word. Regardless of the method of segmenting textual content into words, once textual content is segmented into words, the segmented words may be compared against words contained in any number of domain dictionaries, as described below, for determining whether a given domain dictionary should be recommended to the user for associating with the user's current input method.
[0071] At operation 520, words having low value and/or low meaning with respect to a comparison against words contained in the one or more domain dictionaries may be eliminated. As should be appreciated, elimination of low value or meaningless words at operation 520 may be performed according to a variety of methods, including the word detection rules and methods 235 described above with reference to Fig. 2.
[0072] At operation 525, the domain dictionaries and associated lexicons 420, 425, 430, 435 available for association with the input and/or received textual content 415 are obtained by the domain recommendation engine 445. As should be appreciated, an almost limitless number of domain dictionaries may be obtained having associated lexicons related to many different topics and ideas.
[0073] At operation 530, the words segmented from the input and/or received textual content 415 are analyzed for term frequency by determining the frequency with which particular words are used in the input and/or received textual content 415. For example, if the word "texting" is included only once in the textual content 415, then that word will have a term frequency of one. On the other hand, if the word "texting" is used ten times in the textual content 415, then a term frequency of ten will be applied to that word. According to embodiments, if a given word has a low term frequency, that word may be discarded from further analysis for association with a particular domain dictionary. As should be appreciated, the term frequency utilized for determining the value of a given word for comparison against words contained in one or more domain dictionaries may be varied based on a variety of factors. For example, in some instances a particular word may have a low term frequency, but nonetheless may be kept for further analysis. For example, a word such as "penicillin" may have a low term frequency in a given textual content, but the word may be kept due to its uniqueness, for comparison against words in a medical terminology domain dictionary.
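The term-frequency filter at operation 530 can be sketched in Python as below. The minimum frequency and the set of words kept regardless of frequency are hypothetical parameters; the text says only that the cutoff may vary and that distinctive words such as "penicillin" may be retained.

```python
from collections import Counter

def filter_by_term_frequency(words, min_tf=2, keep_always=frozenset()):
    """Count each word and keep those at or above min_tf, plus any always-kept words."""
    tf = Counter(words)
    return {w: n for w, n in tf.items() if n >= min_tf or w in keep_always}

# Hypothetical segmented input: "texting" ten times, two words once each.
words = ["texting"] * 10 + ["penicillin", "hello"]
kept_words = filter_by_term_frequency(words, min_tf=2, keep_always={"penicillin"})
```

Here "hello" is discarded for its low frequency while "penicillin" survives because it is on the always-keep list.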
[0074] At operation 535, words extracted from the input and/or received textual content having a sufficiently high term frequency are compared against words contained in one or more different domain dictionaries. Word pairs are created by pairing words extracted from the input and/or received textual content with matching words contained in the one or more domain dictionaries considered by the domain recommendation engine 445. For example, if the word "penicillin" is extracted from the textual content 415, and is found to match the same word contained in a medical terminology domain dictionary 430, a word pair associating the textual content 415 entered and/or received by the user with the example medical terminology domain dictionary 430 is created.
[0075] At operation 540, all the compared domains are sorted and ranked according to the number of matched word pairs in the analyzed text content, and a top number of domain dictionaries is determined for words extracted from the input and/or received textual content 415. According to one embodiment, the top number (e.g., two) domains are selected as domain candidates to recommend based on a threshold count of matched word pairs between the received or input text content and the analyzed domain dictionaries. As should be appreciated, the threshold count of matched word pairs may be determined via experimentation and testing. An example and suitable algorithm for determining a top number of domain dictionaries is as follows.
[0076] Score(text, ...) [Equation image imgf000017_0001: the definition of Score is not recoverable from the extracted text.]
[0077] $Domains(text) = \begin{cases} \{d_i, d_j\}, & \text{if } Score(text, d_i) > Score(text, d_j) > \{Score(text, d_k) \mid k \in \{1 \sim 47\}, k \neq i, k \neq j\} \text{ and } Score(text, d_i) + Score(text, d_j) > threshold \\ \varnothing, & \text{otherwise} \end{cases}$
[0078] For example, all domain dictionaries containing a prescribed number of word pairs associated with the input and/or received textual content may be determined for recommendation to the user. For example, if the textual content input and/or received by the user contains a number of medical and scientific terms, then a number of word pairs may be determined for words extracted from the textual content 415 in comparison to both a medical terminology domain dictionary and a scientific terminology domain dictionary. Thus, both the example medical terminology domain dictionary and the scientific terminology domain dictionary may be selected as top domain dictionaries for recommendation to the user. On the other hand, if the analyzed textual content 415 has very few engineering terms, resulting in very few word pairs between the analyzed textual content 415 and an example engineering terminology domain dictionary, then the example engineering domain dictionary may not be ranked highly for presentation to the user as a recommended domain dictionary.
[0079] As should be appreciated, the ranking of domain dictionaries for a possible recommendation to a user may be performed according to a variety of prescribed ranking levels. For example, it may be determined that any domain dictionary having five or more word pairs associated with an analyzed textual content 415 may be recommended to a user. On the other hand, it may be determined that there must be more than 25 word pairings between a given domain dictionary and an analyzed textual content for recommendation of the associated domain dictionary.
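The ranking and threshold steps can be illustrated with the Python sketch below. Because the Score definition appears only in an equation image, the sketch assumes a domain's score is the number of distinct extracted words that match its dictionary, which is consistent with ranking by matched word pairs but is not confirmed by the text. Ties are broken by dictionary order for simplicity, and all names and data are hypothetical.

```python
def domain_scores(words, domains):
    """domains maps name -> set of words.
    ASSUMPTION: score = number of distinct matched words (word pairs)."""
    vocab = set(words)
    return {name: len(vocab & lexicon) for name, lexicon in domains.items()}

def recommend_domains(words, domains, threshold, top_n=2):
    """Return the top_n domains if their combined score exceeds the threshold."""
    scores = domain_scores(words, domains)
    top = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return top if sum(scores[d] for d in top) > threshold else []

# Hypothetical domain dictionaries and extracted words.
domains = {
    "medical": {"penicillin", "dose", "biopsy"},
    "science": {"enzyme", "dose"},
    "engineering": {"torque"},
}
words = ["penicillin", "dose", "biopsy", "enzyme"]
recommended = recommend_domains(words, domains, threshold=4)
```

With these inputs the medical and science dictionaries match three and two words respectively, so their combined score of five clears a threshold of four; raising the threshold to five would yield no recommendation.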
[0080] At operation 545, one or more domain dictionaries may be recommended to the user for association with the user's software functionalities, for example, an input method in use by the user, or the one or more domain dictionaries may be recommended for association with one or more software applications, such as word processing applications, slide presentation applications, Internet browsing applications, and the like. That is, the one or more domain dictionaries may be recommended to the user to allow the user to perform his/her text input and/or editing more efficiently through the use of the recommended domain dictionaries that may help the user with the words he/she enters or edits. An example recommendation user interface component is described below with reference to Fig. 6. The method 500 ends at operation 595.
[0081] Fig. 6 illustrates an example pop-up dialog for recommending a domain dictionary to a user in association with received or entered textual content. As illustrated in Fig. 6, once one or more domain dictionaries are determined for recommendation to a user in association with a given software functionality and/or textual content, the one or more domain dictionaries may be recommended to the user by the domain dictionary recommendation engine 445. A pop-up dialog 610 includes a recommendation 615 of a given domain dictionary to the user, for example, "It appears you are working in a medical domain. To improve accuracy, we suggest you turn on the medical domain. Do you want to turn on the medical domain?" As illustrated, "Yes" and "No" buttons are provided for allowing a user to selectively turn on or reject the turning on of the recommended domain dictionary. As should be appreciated, the pop-up dialog 610 and the associated recommendation language are for purposes of example only and are not limiting of the vast number of user interface components that may be utilized for recommending a given domain dictionary in association with a given software functionality or textual content.
[0082] According to an alternate embodiment, once the recommendation engine 445 determines that a given domain dictionary may be recommended for use in association with a given software functionality and/or textual content, the recommended domain dictionary may be automatically associated with the given software functionality and/or textual content without user input. That is, some software functionalities, for example, input method applications and word processing applications, may be set up for automatically associating recommended domain dictionaries with textual content items for assisting users with those textual content items.
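The two behaviors described in paragraphs [0081] and [0082], prompting the user versus associating the dictionary automatically, can be sketched in Python as follows. All names here are hypothetical: `confirm` stands in for the "Yes"/"No" choice of the pop-up dialog 610, and `active_domains` stands in for whatever state a given software functionality keeps about its enabled domain dictionaries.

```python
def associate_dictionary(domain, active_domains, auto_associate=False, confirm=None):
    """Turn on `domain` automatically or after user confirmation.

    Returns True if the dictionary was newly turned on, False otherwise.
    """
    if domain in active_domains:
        return False  # already associated, nothing to do
    # Automatic association needs no user input; otherwise ask via the callback.
    if auto_associate or (confirm is not None and confirm(domain)):
        active_domains.add(domain)
        return True
    return False


# Hypothetical usage: the user declines, then accepts, then an automatic association.
active = set()
declined = associate_dictionary("medical", active, confirm=lambda d: False)
accepted = associate_dictionary("medical", active, confirm=lambda d: True)
automatic = associate_dictionary("scientific", active, auto_associate=True)
```

Keeping the confirmation step as a callback lets the same routine serve both the dialog-based embodiment and the automatic one.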
[0083] Once a given domain dictionary is associated with a given software functionality and/or textual content item, then the resources of that domain dictionary may be made available for use in association with textual content, including text input, spellchecking, grammar checking, auto entry completion, dictionary services, and the like.
[0084] The embodiments and functionalities described herein may operate via a multitude of computing systems including, without limitation, desktop computer systems, wired and wireless computing systems, mobile computing systems (e.g., mobile telephones, netbooks, tablet or slate type computers, notebook computers, and laptop computers), hand-held devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, and mainframe computers. In addition, the embodiments and functionalities described herein may operate over distributed systems (e.g., cloud-based computing systems), where application functionality, memory, data storage and retrieval and various processing functions may be operated remotely from each other over a distributed computing network, such as the Internet or an intranet. User interfaces and information of various types may be displayed via on-board computing device displays or via remote display units associated with one or more computing devices. For example, user interfaces and information of various types may be displayed and interacted with on a wall surface onto which user interfaces and information of various types are projected. Interaction with the multitude of computing systems with which embodiments of the invention may be practiced includes keystroke entry, touch screen entry, voice or other audio entry, gesture entry where an associated computing device is equipped with detection (e.g., camera) functionality for capturing and interpreting user gestures for controlling the functionality of the computing device, and the like. Figs. 7 through 9 and the associated descriptions provide a discussion of a variety of operating environments in which embodiments of the invention may be practiced. However, the devices and systems illustrated and discussed with respect to Figs. 7 through 9 are for purposes of example and illustration and are not limiting of a vast number of computing device configurations that may be utilized for practicing embodiments of the invention described herein.
[0085] Fig. 7 is a block diagram illustrating example physical components (i.e., hardware) of a computing device 700 with which embodiments of the invention may be practiced. The computing device components described below may be suitable for the computing devices described above. In a basic configuration, the computing device 700 may include at least one processing unit 702 and a system memory 704. Depending on the configuration and type of computing device, the system memory 704 may comprise, but is not limited to, volatile storage (e.g., random access memory), non-volatile storage (e.g., read-only memory), flash memory, or any combination of such memories. The system memory 704 may include an operating system 705 and one or more program modules 706 suitable for running software applications 720 such as the new word detection engine 230 and the domain recommendation engine 445. The operating system 705, for example, may be suitable for controlling the operation of the computing device 700. Furthermore, embodiments of the invention may be practiced in conjunction with a graphics library, other operating systems, or any other application program and is not limited to any particular application or system. This basic configuration is illustrated in Fig. 7 by those components within a dashed line 708. The computing device 700 may have additional features or functionality. For example, the computing device 700 may also include additional data storage devices (removable and/or non-removable) such as, for example, magnetic disks, optical disks, or tape. Such additional storage is illustrated in Fig. 7 by a removable storage device 709 and a non-removable storage device 710.
[0086] As stated above, a number of program modules and data files may be stored in the system memory 704. While executing on the processing unit 702, the program modules 706, such as the new word detection engine 230 and domain recommendation engine 445 may perform processes including, for example, one or more of the stages of the methods 300 and 500, respectively. The aforementioned process is an example, and the processing unit 702 may perform other processes. Other program modules that may be used in accordance with embodiments of the present invention may include electronic mail and contacts applications, word processing applications, spreadsheet applications, database applications, slide presentation applications, drawing or computer-aided application programs, etc.
[0087] Furthermore, embodiments of the invention may be practiced in an electrical circuit comprising discrete electronic elements, packaged or integrated electronic chips containing logic gates, a circuit utilizing a microprocessor, or on a single chip containing electronic elements or microprocessors. For example, embodiments of the invention may be practiced via a system-on-a-chip (SOC) where each or many of the components illustrated in Fig. 7 may be integrated onto a single integrated circuit. Such an SOC device may include one or more processing units, graphics units, communications units, system virtualization units and various application functionality all of which are integrated (or "burned") onto the chip substrate as a single integrated circuit. When operating via an SOC, the functionality, described herein, with respect to the new word detection engine 230 and the domain recommendation engine 445 may be operated via application-specific logic integrated with other components of the computing device 700 on the single integrated circuit (chip). Embodiments of the invention may also be practiced using other technologies capable of performing logical operations such as, for example, AND, OR, and NOT, including but not limited to mechanical, optical, fluidic, and quantum technologies. In addition, embodiments of the invention may be practiced within a general purpose computer or in any other circuits or systems.
[0088] The computing device 700 may also have one or more input device(s) 712 such as a keyboard, a mouse, a pen, a sound input device, a touch input device, etc. The output device(s) 714 such as a display, speakers, a printer, etc. may also be included. The aforementioned devices are examples and others may be used. The computing device 700 may include one or more communication connections 716 allowing communications with other computing devices 718. Examples of suitable communication connections 716 include, but are not limited to, RF transmitter, receiver, and/or transceiver circuitry; universal serial bus (USB), parallel, or serial ports, and other connections appropriate for use with the applicable computer readable media.
[0089] Embodiments of the invention, for example, may be implemented as a computer process (method), a computing system, or as an article of manufacture, such as a computer program product or computer readable media. The computer program product may be a computer storage media readable by a computer system and encoding a computer program of instructions for executing a computer process.
[0090] The term computer readable media as used herein may include computer storage media and communication media. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data. The system memory 704, the removable storage device 709, and the non-removable storage device 710 are all computer storage media examples (i.e., memory storage.) Computer storage media may include, but is not limited to, RAM, ROM, electrically erasable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store information and which can be accessed by the computing device 700. Any such computer storage media may be part of the computing device 700.
[0091] Communication media may be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term "modulated data signal" may describe a signal that has one or more characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media may include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared, and other wireless media.
[0092] Figs. 8A and 8B illustrate a mobile computing device 800, for example, a mobile telephone, a smart phone, a tablet personal computer, a laptop computer, and the like, with which embodiments of the invention may be practiced. With reference to Fig. 8A, an exemplary mobile computing device 800 for implementing the embodiments is illustrated. In a basic configuration, the mobile computing device 800 is a handheld computer having both input elements and output elements. The mobile computing device 800 typically includes a display 805 and one or more input buttons 810 that allow the user to enter information into the mobile computing device 800. The display 805 of the mobile computing device 800 may also function as an input device (e.g., a touch screen display). If included, an optional side input element 815 allows further user input. The side input element 815 may be a rotary switch, a button, or any other type of manual input element. In alternative embodiments, mobile computing device 800 may incorporate more or fewer input elements. For example, the display 805 may not be a touch screen in some embodiments. In yet another alternative embodiment, the mobile computing device 800 is a portable phone system, such as a cellular phone. The mobile computing device 800 may also include an optional keypad 835. Optional keypad 835 may be a physical keypad or a "soft" keypad generated on the touch screen display. In various embodiments, the output elements include the display 805 for showing a graphical user interface (GUI), a visual indicator 820 (e.g., a light emitting diode), and/or an audio transducer 825 (e.g., a speaker). In some embodiments, the mobile computing device 800 incorporates a vibration transducer for providing the user with tactile feedback. In yet another embodiment, the mobile computing device 800 incorporates input and/or output ports, such as an audio input (e.g., a microphone jack), an audio output (e.g., a headphone jack), and a video output (e.g., an HDMI port) for sending signals to or receiving signals from an external device.
[0093] Fig. 8B is a block diagram illustrating the architecture of one embodiment of a mobile computing device. That is, the mobile computing device 800 can incorporate a system (i.e., an architecture) 802 to implement some embodiments. In one embodiment, the system 802 is implemented as a "smart phone" capable of running one or more applications (e.g., browser, e-mail, calendaring, contact managers, messaging clients, games, and media clients/players). In some embodiments, the system 802 is integrated as a computing device, such as an integrated personal digital assistant (PDA) and wireless phone.
[0094] One or more application programs 866 may be loaded into the memory 862 and run on or in association with the operating system 864. Examples of the application programs include phone dialer programs, e-mail programs, personal information management (PIM) programs, word processing programs, spreadsheet programs, Internet browser programs, messaging programs, and so forth. The system 802 also includes a nonvolatile storage area 868 within the memory 862. The non-volatile storage area 868 may be used to store persistent information that should not be lost if the system 802 is powered down. The application programs 866 may use and store information in the non-volatile storage area 868, such as electronic mail or other messages used by an electronic mail application, and the like. A synchronization application (not shown) also resides on the system 802 and is programmed to interact with a corresponding synchronization application resident on a host computer to keep the information stored in the non-volatile storage area 868 synchronized with corresponding information stored at the host computer. As should be appreciated, other applications may be loaded into the memory 862 and run on the mobile computing device 800, including the new word detection engine 230 and domain recommendation engine 445, described herein.
[0095] The system 802 has a power supply 870, which may be implemented as one or more batteries. The power supply 870 might further include an external power source, such as an AC adapter or a powered docking cradle that supplements or recharges the batteries. The system 802 may also include a radio 872 that performs the function of transmitting and receiving radio frequency communications. The radio 872 facilitates wireless connectivity between the system 802 and the "outside world", via a communications carrier or service provider. Transmissions to and from the radio 872 are conducted under control of the operating system 864. In other words, communications received by the radio 872 may be disseminated to the application programs 866 via the operating system 864, and vice versa.
[0096] The radio 872 allows the system 802 to communicate with other computing devices, such as over a network. The radio 872 is one example of communication media. Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. The term computer readable media as used herein includes both storage media and communication media.
[0097] This embodiment of the system 802 provides notifications using the visual indicator 820 that can be used to provide visual notifications and/or an audio interface 874 producing audible notifications via the audio transducer 825. In the illustrated embodiment, the visual indicator 820 is a light emitting diode (LED) and the audio transducer 825 is a speaker. These devices may be directly coupled to the power supply 870 so that when activated, they remain on for a duration dictated by the notification mechanism even though the processor 860 and other components might shut down for conserving battery power. The LED may be programmed to remain on indefinitely until the user takes action to indicate the powered-on status of the device. The audio interface 874 is used to provide audible signals to and receive audible signals from the user. For example, in addition to being coupled to the audio transducer 825, the audio interface 874 may also be coupled to a microphone to receive audible input, such as to facilitate a telephone conversation. In accordance with embodiments of the present invention, the microphone may also serve as an audio sensor to facilitate control of notifications, as will be described below. The system 802 may further include a video interface 876 that enables an operation of an on-board camera 830 to record still images, video stream, and the like.
[0098] A mobile computing device 800 implementing the system 802 may have additional features or functionality. For example, the mobile computing device 800 may also include additional data storage devices (removable and/or non-removable) such as magnetic disks, optical disks, or tape. Such additional storage is illustrated in Fig. 8B by the non-volatile storage area 868. Computer storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
[0099] Data/information generated or captured by the mobile computing device 800 and stored via the system 802 may be stored locally on the mobile computing device 800, as described above, or the data may be stored on any number of storage media that may be accessed by the device via the radio 872 or via a wired connection between the mobile computing device 800 and a separate computing device associated with the mobile computing device 800, for example, a server computer in a distributed computing network, such as the Internet. As should be appreciated such data/information may be accessed via the mobile computing device 800 via the radio 872 or via a distributed computing network. Similarly, such data/information may be readily transferred between computing devices for storage and use according to well-known data/information transfer and storage means, including electronic mail and collaborative data/information sharing systems.
[00100] Fig. 9 illustrates one embodiment of the architecture of a system for providing the functionality of the new word detection engine 230 and domain recommendation engine 445 to one or more client devices, as described above. Content developed, interacted with or edited in association with the new word detection engine 230 and domain recommendation engine 445 may be stored in different communication channels or other storage types. For example, various content and documents may be stored using a directory service 922, a web portal 924, a mailbox service 926, an instant messaging store 928, or a social networking site 930. The new word detection engine 230 and domain recommendation engine 445 may use any of these types of systems or the like for enabling co-authoring conflict resolution via comments, as described herein. As one example, the server 920 may be a web server providing the functionality of the new word detection engine 230 and domain recommendation engine 445 over the web. The server 920 may provide the functionality of the new word detection engine 230 and domain recommendation engine 445 over the web to clients through a network 915. By way of example, the client computing device 918 may be implemented as the computing device 900 and embodied in a personal computer 918a, a tablet computing device 918b and/or a mobile computing device 918c (e.g., a smart phone). Any of these embodiments of the client computing device 918 may obtain content from the store 916. In various embodiments, the types of networks used for communication between the computing devices that make up the present invention include, but are not limited to, an internet, an intranet, wide area networks (WAN), local area networks (LAN), and virtual private networks (VPN). In the present application, the networks include the enterprise network and the network through which the client computing device accesses the enterprise network (i.e., the client network). 
In one embodiment, the client network is part of the enterprise network. In another embodiment, the client network is a separate network accessing the enterprise network through externally available entry points, such as a gateway, a remote access protocol, or a public or private internet address.
[00101] The description and illustration of one or more embodiments provided in this application are not intended to limit or restrict the scope of the invention as claimed in any way. The embodiments, examples, and details provided in this application are considered sufficient to convey possession and enable others to make and use the best mode of claimed invention. The claimed invention should not be construed as being limited to any embodiment, example, or detail provided in this application. Regardless of whether shown and described in combination or separately, the various features (both structural and methodological) are intended to be selectively included or omitted to produce an embodiment with a particular set of features. Having been provided with the description and illustration of the present application, one skilled in the art may envision variations, modifications, and alternate embodiments falling within the spirit of the broader aspects of the claimed invention and the general inventive concept embodied in this application that do not depart from the broader scope.

Claims

CLAIM
1. A method of detecting a word for inclusion in a given word lexicon, comprising:
receiving a text selection;
extracting one or more words from the text selection;
eliminating one or more of the extracted words where the eliminated one or more words are not considered valuable for inclusion in the given word lexicon;
ranking a remaining one or more of the extracted words for inclusion in the word lexicon; and
selecting one or more of the remaining one or more of the extracted words for inclusion in the word lexicon based on a ranking applied to the selected one or more of the remaining one or more of the extracted words.
2. The method of claim 1, prior to extracting one or more words from the text selection, segmenting the received text selection into one or more words.
3. The method of claim 2, wherein segmenting the text selection into one or more words includes creating a plurality of character groupings from the text selection and comparing the plurality of character groupings with one or more word lexicons for determining whether any of the plurality of character groupings is a known word.
4. The method of claim 1, wherein ranking a remaining one or more of the extracted words for inclusion in the word lexicon includes scoring each of the remaining one or more of the extracted words according to one or more scoring attributes.
5. The method of claim 4, wherein scoring each of the remaining one or more of the extracted words according to one or more scoring attributes includes determining a term frequency for each of the remaining one or more of the extracted words in the received text content selection.
6. A method of recommending a domain dictionary, comprising:
receiving a text selection;
segmenting the text selection into one or more words;
receiving one or more domain lexicons;
comparing each of the one or more words segmented from the text selection with each of one or more words contained in the one or more domain lexicons; and
ranking each of the one or more domain lexicons based on an association between each of the one or more words segmented from the text selection and one or more words contained in each of the one or more domain lexicons;
selecting one or more of the one or more domain lexicons based on a ranking applied to the selected one or more of the one or more domain lexicons; and
recommending the selected one or more of the one or more domain lexicons for use with inputting or editing the text selection.
7. The method of claim 6, wherein comparing each of the one or more words segmented from the text selection with each of one or more words contained in the one or more domain lexicons includes determining a term frequency for each of the one or more words segmented from the text selection.
8. The method of claim 7, wherein comparing each of the one or more words segmented from the text selection with each of one or more words contained in the one or more domain lexicons further includes developing a word pair for each of the one or more words segmented from the text selection with a matching word found in any of the one or more domain lexicons.
9. The method of claim 8, wherein ranking each of the one or more domain lexicons based on an association between each of the one or more words segmented from the text selection and one or more words contained in each of the one or more domain lexicons includes ranking each of the one or more domain lexicons based on a number of word pairs developed for each of the one or more domain lexicons from words segmented from the text selection and words contained in each of the one or more domain lexicons.
10. A computer readable medium containing computer executable instructions which when executed by a computer perform a method for detecting a word for inclusion in a given word lexicon, comprising:
receiving a text content selection;
segmenting and extracting one or more words from the text content selection;
eliminating one or more of the extracted words where the eliminated one or more words are not considered valuable for inclusion in the given word lexicon;
ranking a remaining one or more of the extracted words for inclusion in the word lexicon including scoring each of the remaining one or more of the extracted words according to one or more scoring attributes;
selecting one or more of the remaining one or more of the extracted words for inclusion in the word lexicon based on a ranking applied to the selected one or more of the remaining one or more of the extracted words; and
recommending the word lexicon for association with one or more software functionalities for improving software functionality performance.
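The word detection steps recited in claims 1 through 5 (segment, extract, eliminate, rank, select) can be sketched in Python as follows. This is only an illustrative reading, not the claimed method: the character groupings are assumed to be n-grams of 2 to `max_len` characters, the elimination criteria (already-known words, groupings containing stop characters, and a minimum count) are assumptions, and term frequency is used as the only scoring attribute. All function and parameter names are hypothetical.

```python
from collections import Counter


def candidate_groupings(text, max_len=4):
    """Yield every character grouping of length 2..max_len (assumed n-gram scheme)."""
    for size in range(2, max_len + 1):
        for start in range(len(text) - size + 1):
            yield text[start:start + size]


def detect_new_words(text, known_words, stop_chars, top_k=3, min_count=2):
    """Return up to top_k candidate new words ranked by term frequency."""
    counts = Counter(candidate_groupings(text))
    # Eliminate groupings assumed not valuable: known words, rare groupings,
    # and groupings that contain a stop character.
    remaining = {
        grouping: n
        for grouping, n in counts.items()
        if grouping not in known_words
        and n >= min_count
        and not any(ch in stop_chars for ch in grouping)
    }
    # Rank by frequency, then prefer longer groupings, then alphabetical order.
    ranked = sorted(
        remaining.items(),
        key=lambda item: (-item[1], -len(item[0]), item[0]),
    )
    return [grouping for grouping, _ in ranked[:top_k]]


# Hypothetical input: "x" and "y" act as stop characters, "ab" is already known.
new_words = detect_new_words("abcxabcyabc", known_words={"ab"}, stop_chars={"x", "y"})
```

For this sample input the groupings "abc" and "bc" each occur three times and survive elimination, so they are returned in that order; "ab" is dropped because it is already in the lexicon.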
PCT/US2013/055500 2012-08-24 2013-08-19 Word detection and domain dictionary recommendation WO2014031505A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201380044316.4A CN104584003B (en) 2012-08-24 2013-08-19 Word is detected and domain dictionary is recommended

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/594,473 2012-08-24
US13/594,473 US9229924B2 (en) 2012-08-24 2012-08-24 Word detection and domain dictionary recommendation

Publications (1)

Publication Number Publication Date
WO2014031505A1 (en) 2014-02-27

Family

ID=49083782

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2013/055500 WO2014031505A1 (en) 2012-08-24 2013-08-19 Word detection and domain dictionary recommendation

Country Status (3)

Country Link
US (2) US9229924B2 (en)
CN (1) CN104584003B (en)
WO (1) WO2014031505A1 (en)

US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10169329B2 (en) 2014-05-30 2019-01-01 Apple Inc. Exemplar-based natural language processing
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US10332518B2 (en) 2017-05-09 2019-06-25 Apple Inc. User interface for correcting recognition errors
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US10381016B2 (en) 2008-01-03 2019-08-13 Apple Inc. Methods and apparatus for altering audio output signals
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
US10403283B1 (en) 2018-06-01 2019-09-03 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
CN110362803A (en) * 2019-07-19 2019-10-22 北京邮电大学 A kind of text template generation method based on the combination of domain features morphology
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10497365B2 (en) 2014-05-30 2019-12-03 Apple Inc. Multi-command single utterance input method
US10496705B1 (en) 2018-06-03 2019-12-03 Apple Inc. Accelerated task performance
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10684703B2 (en) 2018-06-01 2020-06-16 Apple Inc. Attention aware virtual assistant dismissal
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10699717B2 (en) 2014-05-30 2020-06-30 Apple Inc. Intelligent assistant for home automation
US10714117B2 (en) 2013-02-07 2020-07-14 Apple Inc. Voice trigger for a digital assistant
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
US10741185B2 (en) 2010-01-18 2020-08-11 Apple Inc. Intelligent automated assistant
US10748546B2 (en) 2017-05-16 2020-08-18 Apple Inc. Digital assistant services based on device capabilities
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10789945B2 (en) 2017-05-12 2020-09-29 Apple Inc. Low-latency intelligent automated assistant
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US10795541B2 (en) 2009-06-05 2020-10-06 Apple Inc. Intelligent organization of tasks items
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11010127B2 (en) 2015-06-29 2021-05-18 Apple Inc. Virtual assistant for media playback
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US11023513B2 (en) 2007-12-20 2021-06-01 Apple Inc. Method and apparatus for searching using an active ontology
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US11069336B2 (en) 2012-03-02 2021-07-20 Apple Inc. Systems and methods for name pronunciation
US11070949B2 (en) 2015-05-27 2021-07-20 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display
US11080012B2 (en) 2009-06-05 2021-08-03 Apple Inc. Interface for a virtual digital assistant
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
US11217251B2 (en) 2019-05-06 2022-01-04 Apple Inc. Spoken notifications
US11227589B2 (en) 2016-06-06 2022-01-18 Apple Inc. Intelligent list reading
US11231904B2 (en) 2015-03-06 2022-01-25 Apple Inc. Reducing response latency of intelligent automated assistants
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11269678B2 (en) 2012-05-15 2022-03-08 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US11281993B2 (en) 2016-12-05 2022-03-22 Apple Inc. Model and ensemble compression for metric learning
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11314370B2 (en) 2013-12-06 2022-04-26 Apple Inc. Method for extracting salient dialog usage from live data
US11350253B2 (en) 2011-06-03 2022-05-31 Apple Inc. Active transport based notifications
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US11386266B2 (en) 2018-06-01 2022-07-12 Apple Inc. Text correction
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US11467802B2 (en) 2017-05-11 2022-10-11 Apple Inc. Maintaining privacy of personal information
US11468282B2 (en) 2015-05-15 2022-10-11 Apple Inc. Virtual assistant in a communication session
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
US11495218B2 (en) 2018-06-01 2022-11-08 Apple Inc. Virtual assistant operation in multi-device environments
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
US11532306B2 (en) 2017-05-16 2022-12-20 Apple Inc. Detecting a trigger of a digital assistant
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
US11657813B2 (en) 2019-05-31 2023-05-23 Apple Inc. Voice identification in digital assistant systems
US11671920B2 (en) 2007-04-03 2023-06-06 Apple Inc. Method and system for operating a multifunction portable electronic device using voice-activation
US11696060B2 (en) 2020-07-21 2023-07-04 Apple Inc. User identification using headphones
US11755276B2 (en) 2020-05-12 2023-09-12 Apple Inc. Reducing description length based on confidence
US11765209B2 (en) 2020-05-11 2023-09-19 Apple Inc. Digital assistant hardware abstraction
US11790914B2 (en) 2019-06-01 2023-10-17 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11798547B2 (en) 2013-03-15 2023-10-24 Apple Inc. Voice activated device for use with a voice-based digital assistant
US11809483B2 (en) 2015-09-08 2023-11-07 Apple Inc. Intelligent automated assistant for media search and playback
US11838734B2 (en) 2020-07-20 2023-12-05 Apple Inc. Multi-device audio adjustment coordination
US11853536B2 (en) 2015-09-08 2023-12-26 Apple Inc. Intelligent automated assistant in a media environment
US11886805B2 (en) 2015-11-09 2024-01-30 Apple Inc. Unconventional virtual assistant interactions
US11914848B2 (en) 2020-05-11 2024-02-27 Apple Inc. Providing relevant data items based on context

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8190419B1 (en) 2006-09-11 2012-05-29 WordRake Holdings, LLC Computer processes for analyzing and improving document readability
US8285719B1 (en) 2008-08-08 2012-10-09 The Research Foundation Of State University Of New York System and method for probabilistic relational clustering
US9600566B2 (en) 2010-05-14 2017-03-21 Microsoft Technology Licensing, Llc Identifying entity synonyms
US10032131B2 (en) 2012-06-20 2018-07-24 Microsoft Technology Licensing, Llc Data services for enterprises leveraging search system data assets
US9594831B2 (en) 2012-06-22 2017-03-14 Microsoft Technology Licensing, Llc Targeted disambiguation of named entities
US8713433B1 (en) * 2012-10-16 2014-04-29 Google Inc. Feature-based autocorrection
US20150067491A1 (en) * 2013-09-03 2015-03-05 International Business Machines Corporation Intelligent auto complete
US20150088493A1 (en) * 2013-09-20 2015-03-26 Amazon Technologies, Inc. Providing descriptive information associated with objects
KR101536520B1 (en) * 2014-04-28 2015-07-14 숭실대학교산학협력단 Method and server for extracting topic and evaluating compatibility of the extracted topic
US20150317303A1 (en) * 2014-04-30 2015-11-05 Linkedin Corporation Topic mining using natural language processing techniques
US9733825B2 (en) * 2014-11-05 2017-08-15 Lenovo (Singapore) Pte. Ltd. East Asian character assist
US9898773B2 (en) 2014-11-18 2018-02-20 Microsoft Technology Licensing, Llc Multilingual content based recommendation system
US10019515B2 (en) 2015-04-24 2018-07-10 Microsoft Technology Licensing, Llc Attribute-based contexts for sentiment-topic pairs
US10410136B2 (en) 2015-09-16 2019-09-10 Microsoft Technology Licensing, Llc Model-based classification of content items
CN107092588B (en) * 2016-02-18 2022-09-09 腾讯科技(深圳)有限公司 Text information processing method, device and system
US10733221B2 (en) * 2016-03-30 2020-08-04 Microsoft Technology Licensing, Llc Scalable mining of trending insights from text
CN107515853B (en) * 2016-06-17 2021-11-05 北京搜狗科技发展有限公司 Cell word bank pushing method and device
CN106126606B (en) * 2016-06-21 2019-08-20 国家计算机网络与信息安全管理中心 A kind of short text new word discovery method
US10387568B1 (en) 2016-09-19 2019-08-20 Amazon Technologies, Inc. Extracting keywords from a document
KR102617717B1 (en) * 2016-10-18 2023-12-27 삼성전자주식회사 Electronic Apparatus and Controlling Method thereof
US11023679B2 (en) * 2017-02-27 2021-06-01 Medidata Solutions, Inc. Apparatus and method for automatically mapping verbatim narratives to terms in a terminology dictionary
CN107315734B (en) * 2017-05-04 2019-11-26 中国科学院信息工程研究所 A kind of method and system to be standardized based on time window and semantic variant word
US10740365B2 (en) * 2017-06-14 2020-08-11 International Business Machines Corporation Gap identification in corpora
US10997225B2 (en) 2018-03-20 2021-05-04 The Boeing Company Predictive query processing for complex system lifecycle management
CN108959259B (en) * 2018-07-05 2019-11-08 第四范式(北京)技术有限公司 New word discovery method and system
US11023681B2 (en) * 2018-09-19 2021-06-01 International Business Machines Corporation Co-reference resolution and entity linking
EP3640834A1 (en) * 2018-10-17 2020-04-22 Verint Americas Inc. Automatic discovery of business-specific terminology
JP7172571B2 (en) * 2018-12-21 2022-11-16 富士フイルムビジネスイノベーション株式会社 Search device and search program
US11308274B2 (en) 2019-05-17 2022-04-19 International Business Machines Corporation Word grouping using a plurality of models
US11966686B2 (en) * 2019-06-17 2024-04-23 The Boeing Company Synthetic intelligent extraction of relevant solutions for lifecycle management of complex systems
US20220334808A1 (en) * 2019-12-18 2022-10-20 The University Of North Carolina At Chapel Hill Methods, systems, and computer readable media for creating and using minimum dictionary language (mdl) to access data in closed-domain data sets
US11501067B1 (en) * 2020-04-23 2022-11-15 Wells Fargo Bank, N.A. Systems and methods for screening data instances based on a target text of a target corpus
CN112100987A (en) * 2020-09-27 2020-12-18 中国建设银行股份有限公司 Transcoding method and device for multi-source data dictionary

Family Cites Families (137)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2943447B2 (en) 1991-01-30 1999-08-30 三菱電機株式会社 Text information extraction device, text similarity matching device, text search system, text information extraction method, text similarity matching method, and question analysis device
US5265065A (en) 1991-10-08 1993-11-23 West Publishing Company Method and apparatus for information retrieval from a database by replacing domain specific stemmed phases in a natural language to create a search query
WO1997040452A1 (en) * 1996-04-23 1997-10-30 Language Engineering Corporation Automated natural language translation
US6278967B1 (en) * 1992-08-31 2001-08-21 Logovista Corporation Automated system for generating natural language translations that are domain-specific, grammar rule-based, and/or based on part-of-speech analysis
JP3025724B2 (en) 1992-11-24 2000-03-27 富士通株式会社 Synonym generation processing method
US5717913A (en) 1995-01-03 1998-02-10 University Of Central Florida Method for detecting and extracting text data using database schemas
US5805911A (en) 1995-02-01 1998-09-08 Microsoft Corporation Word prediction system
US6098034A (en) 1996-03-18 2000-08-01 Expert Ease Development, Ltd. Method for standardizing phrasing in a document
US6012055A (en) 1996-04-09 2000-01-04 Silicon Graphics, Inc. Mechanism for integrated information search and retrieval from diverse sources using multiple navigation methods
US6137911A (en) 1997-06-16 2000-10-24 The Dialog Corporation Plc Test classification system and method
US6128613A (en) * 1997-06-26 2000-10-03 The Chinese University Of Hong Kong Method and apparatus for establishing topic word classes based on an entropy cost function to retrieve documents represented by the topic words
US5926808A (en) 1997-07-25 1999-07-20 Claritech Corporation Displaying portions of text from multiple documents over multiple databases related to a search query in a computer network
US6269368B1 (en) 1997-10-17 2001-07-31 Textwise Llc Information retrieval using dynamic evidence combination
US6006225A (en) 1998-06-15 1999-12-21 Amazon.Com Refining search queries by the suggestion of correlated terms from prior searches
NO983175L (en) 1998-07-10 2000-01-11 Fast Search & Transfer Asa Search system for data retrieval
US6363377B1 (en) 1998-07-30 2002-03-26 Sarnoff Corporation Search data processor
IL126373A (en) 1998-09-27 2003-06-24 Haim Zvi Melman Apparatus and method for search and retrieval of documents
US6370527B1 (en) 1998-12-29 2002-04-09 At&T Corp. Method and apparatus for searching distributed networks using a plurality of search devices
US6510406B1 (en) 1999-03-23 2003-01-21 Mathsoft, Inc. Inverse inference engine for high performance web search
US6963867B2 (en) 1999-12-08 2005-11-08 A9.Com, Inc. Search query processing to provide category-ranked presentation of search results
US7447988B2 (en) 2000-05-10 2008-11-04 Ross Gary E Augmentation system for documentation
DE10031351A1 (en) 2000-06-28 2002-01-17 Guru Netservices Gmbh Automatic research procedure
US7490092B2 (en) 2000-07-06 2009-02-10 Streamsage, Inc. Method and system for indexing and searching timed media information based upon relevance intervals
US6675159B1 (en) 2000-07-27 2004-01-06 Science Applic Int Corp Concept-based search and retrieval system
US20020103793A1 (en) 2000-08-02 2002-08-01 Daphne Koller Method and apparatus for learning probabilistic relational models having attribute and link uncertainty and for performing selectivity estimation using probabilistic relational models
EP1393200A2 (en) 2000-09-29 2004-03-03 Gavagai Technology Incorporated A method and system for describing and identifying concepts in natural language text for information retrieval and processing
US7778817B1 (en) 2000-09-30 2010-08-17 Intel Corporation Method and apparatus for determining text passage similarity
GB2368929B (en) 2000-10-06 2004-12-01 Andrew Mather An improved system for storing and retrieving data
US6711577B1 (en) * 2000-10-09 2004-03-23 Battelle Memorial Institute Data mining and visualization techniques
US7440904B2 (en) 2000-10-11 2008-10-21 Malik M. Hanson Method and system for generating personal/individual health records
US6804677B2 (en) 2001-02-26 2004-10-12 Ori Software Development Ltd. Encoding semi-structured data for efficient search and browsing
WO2002086864A1 (en) * 2001-04-18 2002-10-31 Rutgers, The State University Of New Jersey System and method for adaptive language understanding by computers
US20020169755A1 (en) 2001-05-09 2002-11-14 Framroze Bomi Patel System and method for the storage, searching, and retrieval of chemical names in a relational database
US6697818B2 (en) 2001-06-14 2004-02-24 International Business Machines Corporation Methods and apparatus for constructing and implementing a universal extension module for processing objects in a database
US7295965B2 (en) 2001-06-29 2007-11-13 Honeywell International Inc. Method and apparatus for determining a measure of similarity between natural language sentences
KR20040018404A (en) 2001-07-26 2004-03-03 인터내셔널 비지네스 머신즈 코포레이션 Data processing method, data processing system, and program
US6820075B2 (en) 2001-08-13 2004-11-16 Xerox Corporation Document-centric system with auto-completion
US7526425B2 (en) 2001-08-14 2009-04-28 Evri Inc. Method and system for extending keyword searching to syntactically and semantically annotated data
US6826568B2 (en) 2001-12-20 2004-11-30 Microsoft Corporation Methods and system for model matching
US7024624B2 (en) * 2002-01-07 2006-04-04 Kenneth James Hintz Lexicon-based new idea detector
ATE466345T1 (en) 2002-01-16 2010-05-15 Elucidon Group Ltd RETRIEVAL OF INFORMATION DATA WHERE DATA IS ORGANIZED INTO TERMS, DOCUMENTS AND DOCUMENT CORPORAS
US7340466B2 (en) * 2002-02-26 2008-03-04 Kang Jo Mgmt. Limited Liability Company Topic identification and use thereof in information retrieval systems
US7293003B2 (en) 2002-03-21 2007-11-06 Sun Microsystems, Inc. System and method for ranking objects by likelihood of possessing a property
US8214391B2 (en) 2002-05-08 2012-07-03 International Business Machines Corporation Knowledge-based data mining system
US7149746B2 (en) 2002-05-10 2006-12-12 International Business Machines Corporation Method for schema mapping and data transformation
US7440941B1 (en) 2002-09-17 2008-10-21 Yahoo! Inc. Suggesting an alternative to the spelling of a search query
EP1586058A1 (en) 2003-01-24 2005-10-19 BRITISH TELECOMMUNICATIONS public limited company Searching apparatus and methods
US8055669B1 (en) 2003-03-03 2011-11-08 Google Inc. Search queries improved based on query semantic information
FI120755B (en) 2003-06-06 2010-02-15 Tieto Oyj Processing of data record to find correspondingly a reference data set
US7617202B2 (en) 2003-06-16 2009-11-10 Microsoft Corporation Systems and methods that employ a distributional analysis on a query log to improve search results
US7296011B2 (en) 2003-06-20 2007-11-13 Microsoft Corporation Efficient fuzzy match for evaluating data records
US7577654B2 (en) * 2003-07-25 2009-08-18 Palo Alto Research Center Incorporated Systems and methods for new event detection
US7509313B2 (en) 2003-08-21 2009-03-24 Idilia Inc. System and method for processing a query
US20050060643A1 (en) 2003-08-25 2005-03-17 Miavia, Inc. Document similarity detection and classification system
US7533115B2 (en) 2003-09-16 2009-05-12 International Business Machines Corporation Method for managing persistent federated folders within a federated content management system
US7577655B2 (en) 2003-09-16 2009-08-18 Google Inc. Systems and methods for improving the ranking of news articles
US7610190B2 (en) 2003-10-15 2009-10-27 Fuji Xerox Co., Ltd. Systems and methods for hybrid text summarization
US7890526B1 (en) 2003-12-30 2011-02-15 Microsoft Corporation Incremental query refinement
US7254774B2 (en) 2004-03-16 2007-08-07 Microsoft Corporation Systems and methods for improved spell checking
US20050216444A1 (en) 2004-03-25 2005-09-29 Ritter Gerd M Relationship-based searching
US7523102B2 (en) 2004-06-12 2009-04-21 Getty Images, Inc. Content search in complex language, such as Japanese
US7302426B2 (en) 2004-06-29 2007-11-27 Xerox Corporation Expanding a partially-correct list of category elements using an indexed document collection
WO2006039566A2 (en) 2004-09-30 2006-04-13 Intelliseek, Inc. Topical sentiments in electronically stored communications
US20080077570A1 (en) 2004-10-25 2008-03-27 Infovell, Inc. Full Text Query and Search Systems and Method of Use
US7647294B2 (en) 2004-10-27 2010-01-12 Fairbanks Jr William E Indexing and querying engines and methods of indexing and querying
US7461056B2 (en) 2005-02-09 2008-12-02 Microsoft Corporation Text mining apparatus and associated methods
DE102005008803A1 (en) 2005-02-25 2006-09-07 Siemens Ag Method and computer unit for determining computer service names
US7505985B2 (en) 2005-02-25 2009-03-17 International Business Machines Corporation System and method of generating string-based search expressions using templates
US7617193B2 (en) 2005-03-28 2009-11-10 Elan Bitan Interactive user-controlled relevance ranking retrieved information in an information search system
US7636714B1 (en) 2005-03-31 2009-12-22 Google Inc. Determining query term synonyms within query context
EP1875336A2 (en) 2005-04-11 2008-01-09 Textdigger, Inc. System and method for searching for a query
US20080195601A1 (en) 2005-04-14 2008-08-14 The Regents Of The University Of California Method For Information Retrieval
US8438142B2 (en) 2005-05-04 2013-05-07 Google Inc. Suggesting and refining user input based on original user input
JP5027803B2 (en) 2005-05-20 2012-09-19 エヌエイチエヌ ビジネス プラットフォーム コーポレーション Query matching system and method, and computer-readable recording medium on which a program for executing the method is recorded
US7739104B2 (en) * 2005-05-27 2010-06-15 Hakia, Inc. System and method for natural language processing and using ontological searches
US20070011183A1 (en) 2005-07-05 2007-01-11 Justin Langseth Analysis and transformation tools for structured and unstructured data
US7634462B2 (en) 2005-08-10 2009-12-15 Yahoo! Inc. System and method for determining alternate search queries
US7546290B2 (en) 2005-08-11 2009-06-09 Marc Colando Systems and methods for extracting and adapting data
US20070073745A1 (en) 2005-09-23 2007-03-29 Applied Linguistics, Llc Similarity metric for semantic profiling
US7873624B2 (en) 2005-10-21 2011-01-18 Microsoft Corporation Question answering over structured content on the web
US20070100823A1 (en) 2005-10-21 2007-05-03 Inmon Data Systems, Inc. Techniques for manipulating unstructured data using synonyms and alternate spellings prior to recasting as structured data
US7627548B2 (en) 2005-11-22 2009-12-01 Google Inc. Inferring search category synonyms from user logs
US7797303B2 (en) 2006-02-15 2010-09-14 Xerox Corporation Natural language processing for developing queries
US8195683B2 (en) 2006-02-28 2012-06-05 Ebay Inc. Expansion of database search queries
US9135238B2 (en) 2006-03-31 2015-09-15 Google Inc. Disambiguation of named entities
US20070239742A1 (en) 2006-04-06 2007-10-11 Oracle International Corporation Determining data elements in heterogeneous schema definitions for possible mapping
US8255383B2 (en) 2006-07-14 2012-08-28 Chacha Search, Inc Method and system for qualifying keywords in query strings
US9361364B2 (en) 2006-07-20 2016-06-07 Accenture Global Services Limited Universal data relationship inference engine
US7552112B2 (en) 2006-09-18 2009-06-23 Yahoo! Inc. Discovering associative intent queries from search web logs
US20080087725A1 (en) 2006-10-11 2008-04-17 Qing Liu Fixture based Item Locator System
KR100835172B1 (en) 2006-10-16 2008-06-05 한국전자통신연구원 System and method for searching information using synonyms
US8332333B2 (en) 2006-10-19 2012-12-11 Massachusetts Institute Of Technology Learning algorithm for ranking on graph data
US20080109416A1 (en) 2006-11-06 2008-05-08 Williams Frank J Method of searching and retrieving synonyms, similarities and other relevant information
US8423565B2 (en) 2006-12-21 2013-04-16 Digital Doors, Inc. Information life cycle search engine and method
US7890521B1 (en) 2007-02-07 2011-02-15 Google Inc. Document-based synonym generation
US7860853B2 (en) 2007-02-14 2010-12-28 Provilla, Inc. Document matching engine using asymmetric signature generation
US7877343B2 (en) 2007-04-02 2011-01-25 University Of Washington Through Its Center For Commercialization Open information extraction from the Web
US7958489B2 (en) 2007-04-12 2011-06-07 Microsoft Corporation Out of band data augmentation
EP2140376A1 (en) 2007-05-01 2010-01-06 International Business Machines Corporation Method and system for approximate string matching
US7899666B2 (en) * 2007-05-04 2011-03-01 Expert System S.P.A. Method and system for automatically extracting relations between concepts included in text
US8239751B1 (en) 2007-05-16 2012-08-07 Google Inc. Data from web documents in a spreadsheet
CN101785000B (en) * 2007-06-25 2013-04-24 谷歌股份有限公司 Word probability determination method and system
US8601361B2 (en) 2007-08-06 2013-12-03 Apple Inc. Automatically populating and/or generating tables using data extracted from files
JP5283208B2 (en) 2007-08-21 2013-09-04 国立大学法人 東京大学 Information search system and method, program, and information search service providing method
US8594996B2 (en) 2007-10-17 2013-11-26 Evri Inc. NLP-based entity recognition and disambiguation
US8417713B1 (en) 2007-12-05 2013-04-09 Google Inc. Sentiment detection as a ranking signal for reviewable entities
US8108380B2 (en) 2008-03-03 2012-01-31 Oracle International Corporation Inclusion of metadata in indexed composite document
US7970808B2 (en) 2008-05-05 2011-06-28 Microsoft Corporation Leveraging cross-document context to label entity
US8156053B2 (en) * 2008-05-09 2012-04-10 Yahoo! Inc. Automated tagging of documents
US8782061B2 (en) 2008-06-24 2014-07-15 Microsoft Corporation Scalable lookup-driven entity extraction from indexed document collections
US20090327223A1 (en) 2008-06-26 2009-12-31 Microsoft Corporation Query-driven web portals
US8275608B2 (en) 2008-07-03 2012-09-25 Xerox Corporation Clique based clustering for named entity recognition system
US9092517B2 (en) 2008-09-23 2015-07-28 Microsoft Technology Licensing, Llc Generating synonyms based on query log data
US20100121702A1 (en) 2008-11-06 2010-05-13 Ryan Steelberg Search and storage engine having variable indexing for information associations and predictive modeling
US8229883B2 (en) 2009-03-30 2012-07-24 Sap Ag Graph based re-composition of document fragments for name entity recognition under exploitation of enterprise databases
US9836448B2 (en) * 2009-04-30 2017-12-05 Conversant Wireless Licensing S.A R.L. Text editing
US20100293179A1 (en) 2009-05-14 2010-11-18 Microsoft Corporation Identifying synonyms of entities using web search
US8533203B2 (en) 2009-06-04 2013-09-10 Microsoft Corporation Identifying synonyms of entities using a document collection
GB2472250A (en) 2009-07-31 2011-02-02 Stephen Timothy Morris Method for determining document relevance
US8332334B2 (en) 2009-09-24 2012-12-11 Yahoo! Inc. System and method for cross domain learning for data augmentation
US20110106807A1 (en) 2009-10-30 2011-05-05 Janya, Inc Systems and methods for information integration through context-based entity disambiguation
US8156140B2 (en) 2009-11-24 2012-04-10 International Business Machines Corporation Service oriented architecture enterprise service bus with advanced virtualization
US8751218B2 (en) * 2010-02-09 2014-06-10 Siemens Aktiengesellschaft Indexing content at semantic level
US9600566B2 (en) 2010-05-14 2017-03-21 Microsoft Technology Licensing, Llc Identifying entity synonyms
US9251248B2 (en) 2010-06-07 2016-02-02 Microsoft Technology Licensing, Llc Using context to extract entities from a document collection
US8463786B2 (en) 2010-06-10 2013-06-11 Microsoft Corporation Extracting topically related keywords from related documents
US20120011115A1 (en) 2010-07-09 2012-01-12 Jayant Madhavan Table search using recovered semantic information
US8429099B1 (en) 2010-10-14 2013-04-23 Aro, Inc. Dynamic gazetteers for entity recognition and fact association
US9460207B2 (en) 2010-12-08 2016-10-04 Microsoft Technology Licensing, Llc Automated database generation for answering fact lookup queries
US9355145B2 (en) 2011-01-25 2016-05-31 Hewlett Packard Enterprise Development Lp User defined function classification in analytical data processing systems
CN102306144B (en) * 2011-07-18 2013-05-08 南京邮电大学 Terms disambiguation method based on semantic dictionary
US9092478B2 (en) 2011-12-27 2015-07-28 Sap Se Managing business objects data sources
CN102609407B (en) * 2012-02-16 2014-10-29 复旦大学 Fine-grained semantic detection method of harmful text contents in network
US8745019B2 (en) 2012-03-05 2014-06-03 Microsoft Corporation Robust discovery of entity synonyms using query logs
US9171081B2 (en) 2012-03-06 2015-10-27 Microsoft Technology Licensing, Llc Entity augmentation service from latent relational data
US10032131B2 (en) 2012-06-20 2018-07-24 Microsoft Technology Licensing, Llc Data services for enterprises leveraging search system data assets
US9594831B2 (en) 2012-06-22 2017-03-14 Microsoft Technology Licensing, Llc Targeted disambiguation of named entities

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIAN-YUN NIE ET AL: "Unknown word detection and segmentation of Chinese using statistical and heuristic knowledge", COMMUNICATIONS OF COLIPS, SINGAPORE, vol. 5, no. 1-2, 1 December 1995 (1995-12-01), pages 47 - 57, XP002116762, ISSN: 0218-7019 *

Cited By (261)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9646614B2 (en) 2000-03-16 2017-05-09 Apple Inc. Fast, language-independent method for user authentication by voice
US10318871B2 (en) 2005-09-08 2019-06-11 Apple Inc. Method and apparatus for building an intelligent automated assistant
US11928604B2 (en) 2005-09-08 2024-03-12 Apple Inc. Method and apparatus for building an intelligent automated assistant
US11671920B2 (en) 2007-04-03 2023-06-06 Apple Inc. Method and system for operating a multifunction portable electronic device using voice-activation
US11023513B2 (en) 2007-12-20 2021-06-01 Apple Inc. Method and apparatus for searching using an active ontology
US10381016B2 (en) 2008-01-03 2019-08-13 Apple Inc. Methods and apparatus for altering audio output signals
US9626955B2 (en) 2008-04-05 2017-04-18 Apple Inc. Intelligent text-to-speech conversion
US9865248B2 (en) 2008-04-05 2018-01-09 Apple Inc. Intelligent text-to-speech conversion
US10108612B2 (en) 2008-07-31 2018-10-23 Apple Inc. Mobile device having human language translation capability with positional feedback
US11900936B2 (en) 2008-10-02 2024-02-13 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US9412392B2 (en) 2008-10-02 2016-08-09 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US10643611B2 (en) 2008-10-02 2020-05-05 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11080012B2 (en) 2009-06-05 2021-08-03 Apple Inc. Interface for a virtual digital assistant
US10795541B2 (en) 2009-06-05 2020-10-06 Apple Inc. Intelligent organization of tasks items
US10283110B2 (en) 2009-07-02 2019-05-07 Apple Inc. Methods and apparatuses for automatic speech recognition
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US10706841B2 (en) 2010-01-18 2020-07-07 Apple Inc. Task flow identification based on user intent
US10741185B2 (en) 2010-01-18 2020-08-11 Apple Inc. Intelligent automated assistant
US9548050B2 (en) 2010-01-18 2017-01-17 Apple Inc. Intelligent automated assistant
US10692504B2 (en) 2010-02-25 2020-06-23 Apple Inc. User profiling for voice input processing
US9633660B2 (en) 2010-02-25 2017-04-25 Apple Inc. User profiling for voice input processing
US10049675B2 (en) 2010-02-25 2018-08-14 Apple Inc. User profiling for voice input processing
US10417405B2 (en) 2011-03-21 2019-09-17 Apple Inc. Device access using voice authentication
US10102359B2 (en) 2011-03-21 2018-10-16 Apple Inc. Device access using voice authentication
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US11350253B2 (en) 2011-06-03 2022-05-31 Apple Inc. Active transport based notifications
US9798393B2 (en) 2011-08-29 2017-10-24 Apple Inc. Text correction processing
US11069336B2 (en) 2012-03-02 2021-07-20 Apple Inc. Systems and methods for name pronunciation
US9953088B2 (en) 2012-05-14 2018-04-24 Apple Inc. Crowd sourcing information to fulfill user requests
US11321116B2 (en) 2012-05-15 2022-05-03 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US11269678B2 (en) 2012-05-15 2022-03-08 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US10079014B2 (en) 2012-06-08 2018-09-18 Apple Inc. Name recognition system
US9971774B2 (en) 2012-09-19 2018-05-15 Apple Inc. Voice-based media searching
US11557310B2 (en) 2013-02-07 2023-01-17 Apple Inc. Voice trigger for a digital assistant
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US11862186B2 (en) 2013-02-07 2024-01-02 Apple Inc. Voice trigger for a digital assistant
US10714117B2 (en) 2013-02-07 2020-07-14 Apple Inc. Voice trigger for a digital assistant
US11636869B2 (en) 2013-02-07 2023-04-25 Apple Inc. Voice trigger for a digital assistant
US9977779B2 (en) 2013-03-14 2018-05-22 Apple Inc. Automatic supplementation of word correction dictionaries
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
WO2014159473A3 (en) * 2013-03-14 2014-11-20 Apple Inc. Automatic supplementation of word correction dictionaries
US11798547B2 (en) 2013-03-15 2023-10-24 Apple Inc. Voice activated device for use with a voice-based digital assistant
US9966060B2 (en) 2013-06-07 2018-05-08 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9633674B2 (en) 2013-06-07 2017-04-25 Apple Inc. System and method for detecting errors in interactions with a voice-based digital assistant
US9620104B2 (en) 2013-06-07 2017-04-11 Apple Inc. System and method for user-specified pronunciation of words for speech synthesis and recognition
US9966068B2 (en) 2013-06-08 2018-05-08 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US10657961B2 (en) 2013-06-08 2020-05-19 Apple Inc. Interpreting and acting upon commands that involve sharing information with remote devices
US11727219B2 (en) 2013-06-09 2023-08-15 Apple Inc. System and method for inferring user intent from speech inputs
US11048473B2 (en) 2013-06-09 2021-06-29 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US10185542B2 (en) 2013-06-09 2019-01-22 Apple Inc. Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant
US10769385B2 (en) 2013-06-09 2020-09-08 Apple Inc. System and method for inferring user intent from speech inputs
US11314370B2 (en) 2013-12-06 2022-04-26 Apple Inc. Method for extracting salient dialog usage from live data
CN103825952B (en) * 2014-03-04 2017-07-04 百度在线网络技术(北京)有限公司 Cell dictionary method for pushing and server
US9916288B2 (en) 2014-03-04 2018-03-13 Baidu Online Network Technology (Beijing) Co., Ltd. Method and server for pushing cellular lexicon
EP2919126A3 (en) * 2014-03-04 2015-10-07 Baidu Online Network Technology (Beijing) Co., Ltd Method and server for pushing personalized cellular lexicon
JP2015170357A (en) * 2014-03-04 2015-09-28 バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド Method and server for pushing cellular lexicon
CN103825952A (en) * 2014-03-04 2014-05-28 百度在线网络技术(北京)有限公司 Cell lexicon pushing method and server
US10083690B2 (en) 2014-05-30 2018-09-25 Apple Inc. Better resolution when referencing to concepts
US10657966B2 (en) 2014-05-30 2020-05-19 Apple Inc. Better resolution when referencing to concepts
US11699448B2 (en) 2014-05-30 2023-07-11 Apple Inc. Intelligent assistant for home automation
US10497365B2 (en) 2014-05-30 2019-12-03 Apple Inc. Multi-command single utterance input method
US10714095B2 (en) 2014-05-30 2020-07-14 Apple Inc. Intelligent assistant for home automation
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US10699717B2 (en) 2014-05-30 2020-06-30 Apple Inc. Intelligent assistant for home automation
US10169329B2 (en) 2014-05-30 2019-01-01 Apple Inc. Exemplar-based natural language processing
US11670289B2 (en) 2014-05-30 2023-06-06 Apple Inc. Multi-command single utterance input method
US11810562B2 (en) 2014-05-30 2023-11-07 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9842101B2 (en) 2014-05-30 2017-12-12 Apple Inc. Predictive conversion of language input
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US10417344B2 (en) 2014-05-30 2019-09-17 Apple Inc. Exemplar-based natural language processing
US10878809B2 (en) 2014-05-30 2020-12-29 Apple Inc. Multi-command single utterance input method
US11838579B2 (en) 2014-06-30 2023-12-05 Apple Inc. Intelligent automated assistant for TV user interactions
US11516537B2 (en) 2014-06-30 2022-11-29 Apple Inc. Intelligent automated assistant for TV user interactions
US10904611B2 (en) 2014-06-30 2021-01-26 Apple Inc. Intelligent automated assistant for TV user interactions
US9668024B2 (en) 2014-06-30 2017-05-30 Apple Inc. Intelligent automated assistant for TV user interactions
US10431204B2 (en) 2014-09-11 2019-10-01 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US9818400B2 (en) 2014-09-11 2017-11-14 Apple Inc. Method and apparatus for discovering trending terms in speech requests
US10789041B2 (en) 2014-09-12 2020-09-29 Apple Inc. Dynamic thresholds for always listening speech trigger
US10438595B2 (en) 2014-09-30 2019-10-08 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US9986419B2 (en) 2014-09-30 2018-05-29 Apple Inc. Social reminders
US9668121B2 (en) 2014-09-30 2017-05-30 Apple Inc. Social reminders
US10127911B2 (en) 2014-09-30 2018-11-13 Apple Inc. Speaker identification and unsupervised speaker adaptation techniques
US10453443B2 (en) 2014-09-30 2019-10-22 Apple Inc. Providing an indication of the suitability of speech recognition
US10074360B2 (en) 2014-09-30 2018-09-11 Apple Inc. Providing an indication of the suitability of speech recognition
US10390213B2 (en) 2014-09-30 2019-08-20 Apple Inc. Social reminders
US9886432B2 (en) 2014-09-30 2018-02-06 Apple Inc. Parsimonious handling of word inflection via categorical stem + suffix N-gram language models
US9646609B2 (en) 2014-09-30 2017-05-09 Apple Inc. Caching apparatus for serving phonetic pronunciations
US9697195B2 (en) 2014-10-15 2017-07-04 Microsoft Technology Licensing, Llc Construction of a lexicon for a selected context
US9865280B2 (en) 2015-03-06 2018-01-09 Apple Inc. Structured dictation using intelligent automated assistants
US11231904B2 (en) 2015-03-06 2022-01-25 Apple Inc. Reducing response latency of intelligent automated assistants
US10930282B2 (en) 2015-03-08 2021-02-23 Apple Inc. Competing devices responding to voice triggers
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10529332B2 (en) 2015-03-08 2020-01-07 Apple Inc. Virtual assistant activation
US9721566B2 (en) 2015-03-08 2017-08-01 Apple Inc. Competing devices responding to voice triggers
US10567477B2 (en) 2015-03-08 2020-02-18 Apple Inc. Virtual assistant continuity
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US11842734B2 (en) 2015-03-08 2023-12-12 Apple Inc. Virtual assistant activation
US10311871B2 (en) 2015-03-08 2019-06-04 Apple Inc. Competing devices responding to voice triggers
US9899019B2 (en) 2015-03-18 2018-02-20 Apple Inc. Systems and methods for structured stem and suffix language models
US9842105B2 (en) 2015-04-16 2017-12-12 Apple Inc. Parsimonious continuous-space phrase representations for natural language processing
US11468282B2 (en) 2015-05-15 2022-10-11 Apple Inc. Virtual assistant in a communication session
US11070949B2 (en) 2015-05-27 2021-07-20 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display
US10083688B2 (en) 2015-05-27 2018-09-25 Apple Inc. Device voice control for selecting a displayed affordance
US11127397B2 (en) 2015-05-27 2021-09-21 Apple Inc. Device voice control
US10127220B2 (en) 2015-06-04 2018-11-13 Apple Inc. Language identification from short strings
US10101822B2 (en) 2015-06-05 2018-10-16 Apple Inc. Language input correction
US10681212B2 (en) 2015-06-05 2020-06-09 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10356243B2 (en) 2015-06-05 2019-07-16 Apple Inc. Virtual assistant aided communication with 3rd party service in a communication session
US10255907B2 (en) 2015-06-07 2019-04-09 Apple Inc. Automatic accent detection using acoustic models
US10186254B2 (en) 2015-06-07 2019-01-22 Apple Inc. Context-based endpoint detection
US11025565B2 (en) 2015-06-07 2021-06-01 Apple Inc. Personalized prediction of responses for instant messaging
US11947873B2 (en) 2015-06-29 2024-04-02 Apple Inc. Virtual assistant for media playback
US11010127B2 (en) 2015-06-29 2021-05-18 Apple Inc. Virtual assistant for media playback
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US11809483B2 (en) 2015-09-08 2023-11-07 Apple Inc. Intelligent automated assistant for media search and playback
US11550542B2 (en) 2015-09-08 2023-01-10 Apple Inc. Zero latency digital assistant
US11954405B2 (en) 2015-09-08 2024-04-09 Apple Inc. Zero latency digital assistant
US11853536B2 (en) 2015-09-08 2023-12-26 Apple Inc. Intelligent automated assistant in a media environment
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US11126400B2 (en) 2015-09-08 2021-09-21 Apple Inc. Zero latency digital assistant
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US9697820B2 (en) 2015-09-24 2017-07-04 Apple Inc. Unit-selection text-to-speech synthesis using concatenation-sensitive neural networks
US10366158B2 (en) 2015-09-29 2019-07-30 Apple Inc. Efficient word encoding for recurrent neural network language models
US11010550B2 (en) 2015-09-29 2021-05-18 Apple Inc. Unified language modeling framework for word prediction, auto-completion and auto-correction
US11587559B2 (en) 2015-09-30 2023-02-21 Apple Inc. Intelligent device identification
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US11809886B2 (en) 2015-11-06 2023-11-07 Apple Inc. Intelligent automated assistant in a messaging environment
US11886805B2 (en) 2015-11-09 2024-01-30 Apple Inc. Unconventional virtual assistant interactions
US10049668B2 (en) 2015-12-02 2018-08-14 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10354652B2 (en) 2015-12-02 2019-07-16 Apple Inc. Applying neural network language models to weighted finite state transducers for automatic speech recognition
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10942703B2 (en) 2015-12-23 2021-03-09 Apple Inc. Proactive assistance based on dialog communication between devices
US10446143B2 (en) 2016-03-14 2019-10-15 Apple Inc. Identification of voice inputs providing credentials
US9934775B2 (en) 2016-05-26 2018-04-03 Apple Inc. Unit-selection text-to-speech synthesis based on predicted concatenation parameters
US9972304B2 (en) 2016-06-03 2018-05-15 Apple Inc. Privacy preserving distributed evaluation framework for embedded personalized systems
US11227589B2 (en) 2016-06-06 2022-01-18 Apple Inc. Intelligent list reading
US10249300B2 (en) 2016-06-06 2019-04-02 Apple Inc. Intelligent list reading
US11069347B2 (en) 2016-06-08 2021-07-20 Apple Inc. Intelligent automated assistant for media exploration
US10049663B2 (en) 2016-06-08 2018-08-14 Apple, Inc. Intelligent automated assistant for media exploration
US10354011B2 (en) 2016-06-09 2019-07-16 Apple Inc. Intelligent automated assistant in a home environment
US10509862B2 (en) 2016-06-10 2019-12-17 Apple Inc. Dynamic phrase expansion of language input
US10490187B2 (en) 2016-06-10 2019-11-26 Apple Inc. Digital assistant providing automated status report
US10733993B2 (en) 2016-06-10 2020-08-04 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10067938B2 (en) 2016-06-10 2018-09-04 Apple Inc. Multilingual word prediction
US11657820B2 (en) 2016-06-10 2023-05-23 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US10192552B2 (en) 2016-06-10 2019-01-29 Apple Inc. Digital assistant providing whispered speech
US11749275B2 (en) 2016-06-11 2023-09-05 Apple Inc. Application integration with a digital assistant
US10269345B2 (en) 2016-06-11 2019-04-23 Apple Inc. Intelligent task discovery
US10089072B2 (en) 2016-06-11 2018-10-02 Apple Inc. Intelligent device arbitration and control
US10521466B2 (en) 2016-06-11 2019-12-31 Apple Inc. Data driven natural language event detection and classification
US11809783B2 (en) 2016-06-11 2023-11-07 Apple Inc. Intelligent device arbitration and control
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US10580409B2 (en) 2016-06-11 2020-03-03 Apple Inc. Application integration with a digital assistant
US10297253B2 (en) 2016-06-11 2019-05-21 Apple Inc. Application integration with a digital assistant
US10942702B2 (en) 2016-06-11 2021-03-09 Apple Inc. Intelligent device arbitration and control
US10474753B2 (en) 2016-09-07 2019-11-12 Apple Inc. Language identification using recurrent neural networks
US10043516B2 (en) 2016-09-23 2018-08-07 Apple Inc. Intelligent automated assistant
US10553215B2 (en) 2016-09-23 2020-02-04 Apple Inc. Intelligent automated assistant
US11281993B2 (en) 2016-12-05 2022-03-22 Apple Inc. Model and ensemble compression for metric learning
US10593346B2 (en) 2016-12-22 2020-03-17 Apple Inc. Rank-reduced token representation for automatic speech recognition
US11656884B2 (en) 2017-01-09 2023-05-23 Apple Inc. Application integration with a digital assistant
US11204787B2 (en) 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
US10741181B2 (en) 2017-05-09 2020-08-11 Apple Inc. User interface for correcting recognition errors
US10332518B2 (en) 2017-05-09 2019-06-25 Apple Inc. User interface for correcting recognition errors
US10417266B2 (en) 2017-05-09 2019-09-17 Apple Inc. Context-aware ranking of intelligent response suggestions
US10847142B2 (en) 2017-05-11 2020-11-24 Apple Inc. Maintaining privacy of personal information
US10755703B2 (en) 2017-05-11 2020-08-25 Apple Inc. Offline personal assistant
US11599331B2 (en) 2017-05-11 2023-03-07 Apple Inc. Maintaining privacy of personal information
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
US10395654B2 (en) 2017-05-11 2019-08-27 Apple Inc. Text normalization based on a data-driven learning network
US11467802B2 (en) 2017-05-11 2022-10-11 Apple Inc. Maintaining privacy of personal information
US10791176B2 (en) 2017-05-12 2020-09-29 Apple Inc. Synchronization and task delegation of a digital assistant
US11380310B2 (en) 2017-05-12 2022-07-05 Apple Inc. Low-latency intelligent automated assistant
US11538469B2 (en) 2017-05-12 2022-12-27 Apple Inc. Low-latency intelligent automated assistant
US11580990B2 (en) 2017-05-12 2023-02-14 Apple Inc. User-specific acoustic models
US11301477B2 (en) 2017-05-12 2022-04-12 Apple Inc. Feedback analysis of a digital assistant
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US10410637B2 (en) 2017-05-12 2019-09-10 Apple Inc. User-specific acoustic models
US10789945B2 (en) 2017-05-12 2020-09-29 Apple Inc. Low-latency intelligent automated assistant
US11862151B2 (en) 2017-05-12 2024-01-02 Apple Inc. Low-latency intelligent automated assistant
US11837237B2 (en) 2017-05-12 2023-12-05 Apple Inc. User-specific acoustic models
US10810274B2 (en) 2017-05-15 2020-10-20 Apple Inc. Optimizing dialogue policy decisions for digital assistants using implicit feedback
US10482874B2 (en) 2017-05-15 2019-11-19 Apple Inc. Hierarchical belief states for digital assistants
US10403278B2 (en) 2017-05-16 2019-09-03 Apple Inc. Methods and systems for phonetic matching in digital assistant services
US11675829B2 (en) 2017-05-16 2023-06-13 Apple Inc. Intelligent automated assistant for media exploration
US10748546B2 (en) 2017-05-16 2020-08-18 Apple Inc. Digital assistant services based on device capabilities
US10909171B2 (en) 2017-05-16 2021-02-02 Apple Inc. Intelligent automated assistant for media exploration
US10311144B2 (en) 2017-05-16 2019-06-04 Apple Inc. Emoji word sense disambiguation
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US11532306B2 (en) 2017-05-16 2022-12-20 Apple Inc. Detecting a trigger of a digital assistant
US10657328B2 (en) 2017-06-02 2020-05-19 Apple Inc. Multi-task recurrent neural network architecture for efficient morphology handling in neural language modeling
US10445429B2 (en) 2017-09-21 2019-10-15 Apple Inc. Natural language understanding using vocabularies with compressed serialized tries
US10755051B2 (en) 2017-09-29 2020-08-25 Apple Inc. Rule-based natural language processing
US10636424B2 (en) 2017-11-30 2020-04-28 Apple Inc. Multi-turn canned dialog
US10733982B2 (en) 2018-01-08 2020-08-04 Apple Inc. Multi-directional dialog
US10733375B2 (en) 2018-01-31 2020-08-04 Apple Inc. Knowledge-based framework for improving natural language understanding
US10789959B2 (en) 2018-03-02 2020-09-29 Apple Inc. Training speaker recognition models for digital assistants
US10592604B2 (en) 2018-03-12 2020-03-17 Apple Inc. Inverse text normalization for automatic speech recognition
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US11710482B2 (en) 2018-03-26 2023-07-25 Apple Inc. Natural assistant interaction
US10909331B2 (en) 2018-03-30 2021-02-02 Apple Inc. Implicit identification of translation payload with neural machine translation
US11169616B2 (en) 2018-05-07 2021-11-09 Apple Inc. Raise to speak
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US11854539B2 (en) 2018-05-07 2023-12-26 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11487364B2 (en) 2018-05-07 2022-11-01 Apple Inc. Raise to speak
US11900923B2 (en) 2018-05-07 2024-02-13 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11907436B2 (en) 2018-05-07 2024-02-20 Apple Inc. Raise to speak
US10984780B2 (en) 2018-05-21 2021-04-20 Apple Inc. Global semantic word embeddings using bi-directional recurrent neural networks
US11009970B2 (en) 2018-06-01 2021-05-18 Apple Inc. Attention aware virtual assistant dismissal
US11495218B2 (en) 2018-06-01 2022-11-08 Apple Inc. Virtual assistant operation in multi-device environments
US11630525B2 (en) 2018-06-01 2023-04-18 Apple Inc. Attention aware virtual assistant dismissal
US10403283B1 (en) 2018-06-01 2019-09-03 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10684703B2 (en) 2018-06-01 2020-06-16 Apple Inc. Attention aware virtual assistant dismissal
US11386266B2 (en) 2018-06-01 2022-07-12 Apple Inc. Text correction
US10720160B2 (en) 2018-06-01 2020-07-21 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
US10984798B2 (en) 2018-06-01 2021-04-20 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11360577B2 (en) 2018-06-01 2022-06-14 Apple Inc. Attention aware virtual assistant dismissal
US11431642B2 (en) 2018-06-01 2022-08-30 Apple Inc. Variable latency device coordination
US10944859B2 (en) 2018-06-03 2021-03-09 Apple Inc. Accelerated task performance
US10496705B1 (en) 2018-06-03 2019-12-03 Apple Inc. Accelerated task performance
US10504518B1 (en) 2018-06-03 2019-12-10 Apple Inc. Accelerated task performance
US11010561B2 (en) 2018-09-27 2021-05-18 Apple Inc. Sentiment prediction from textual data
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
US11893992B2 (en) 2018-09-28 2024-02-06 Apple Inc. Multi-modal inputs for voice commands
US11170166B2 (en) 2018-09-28 2021-11-09 Apple Inc. Neural typographical error modeling via generative adversarial networks
US10839159B2 (en) 2018-09-28 2020-11-17 Apple Inc. Named entity normalization in a spoken dialog system
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
US11638059B2 (en) 2019-01-04 2023-04-25 Apple Inc. Content playback on multiple devices
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
US11783815B2 (en) 2019-03-18 2023-10-10 Apple Inc. Multimodality in digital assistant systems
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11705130B2 (en) 2019-05-06 2023-07-18 Apple Inc. Spoken notifications
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
US11675491B2 (en) 2019-05-06 2023-06-13 Apple Inc. User configurable task triggers
US11217251B2 (en) 2019-05-06 2022-01-04 Apple Inc. Spoken notifications
US11888791B2 (en) 2019-05-21 2024-01-30 Apple Inc. Providing message response suggestions
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
US11360739B2 (en) 2019-05-31 2022-06-14 Apple Inc. User activity shortcut suggestions
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11657813B2 (en) 2019-05-31 2023-05-23 Apple Inc. Voice identification in digital assistant systems
US11790914B2 (en) 2019-06-01 2023-10-17 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
CN110362803A (en) * 2019-07-19 2019-10-22 北京邮电大学 A kind of text template generation method based on the combination of domain features morphology
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
US11765209B2 (en) 2020-05-11 2023-09-19 Apple Inc. Digital assistant hardware abstraction
US11914848B2 (en) 2020-05-11 2024-02-27 Apple Inc. Providing relevant data items based on context
US11924254B2 (en) 2020-05-11 2024-03-05 Apple Inc. Digital assistant hardware abstraction
US11755276B2 (en) 2020-05-12 2023-09-12 Apple Inc. Reducing description length based on confidence
US11838734B2 (en) 2020-07-20 2023-12-05 Apple Inc. Multi-device audio adjustment coordination
US11750962B2 (en) 2020-07-21 2023-09-05 Apple Inc. User identification using headphones
US11696060B2 (en) 2020-07-21 2023-07-04 Apple Inc. User identification using headphones

Also Published As

Publication number Publication date
US20160012036A1 (en) 2016-01-14
CN104584003B (en) 2017-08-11
CN104584003A (en) 2015-04-29
US9229924B2 (en) 2016-01-05
US20140058722A1 (en) 2014-02-27

Similar Documents

Publication Publication Date Title
US9229924B2 (en) Word detection and domain dictionary recommendation
US11157490B2 (en) Conversational virtual assistant
US10878009B2 (en) Translating natural language utterances to keyword search queries
US11200269B2 (en) Method and system for highlighting answer phrases
EP3183728B1 (en) Orphaned utterance detection system and method
US10629193B2 (en) Advancing word-based speech recognition processing
US11093711B2 (en) Entity-specific conversational artificial intelligence
JP6033326B2 (en) Automatic content-based input protocol selection
US10073840B2 (en) Unsupervised relation detection model training
US10558701B2 (en) Method and system to recommend images in a social application
US10049098B2 (en) Extracting actionable information from emails
WO2016008128A1 (en) Speech recognition using foreign word grammar
US9916301B2 (en) Named entity variations for multimodal understanding systems
US11947909B2 (en) Training a language detection model for language autodetection from non-character sub-token signals
WO2014190220A2 (en) Language model trained using predicted queries from statistical machine translation
US10963641B2 (en) Multi-lingual tokenization of documents and associated queries
CN111046168A (en) Method, apparatus, electronic device, and medium for generating patent summary information
CN112445907A (en) Text emotion classification method, device and equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 13753968; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 13753968; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1