CN101878476B - Machine translation for query expansion - Google Patents

Publication number
CN101878476B
Authority
CN
China
Prior art keywords
search
search inquiry
translation
document
inquiry
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200880102717XA
Other languages
Chinese (zh)
Other versions
CN101878476A (en
Inventor
Stefan Riezler
Alexander L. Vasserman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Google LLC
Original Assignee
Google LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google LLC
Publication of CN101878476A
Application granted
Publication of CN101878476B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3338 Query expansion

Abstract

Methods, systems and apparatus, including computer program products, for expanding search queries. One method includes receiving a search query, selecting a synonym of a term in the search query based on a context of occurrence of the term in the received search query, the synonym having been derived from statistical machine translation of the term, and expanding the received search query with the synonym and using the expanded search query to search a collection of documents. Alternatively, another method includes receiving a request to search a corpus of documents, the request specifying a search query, using statistical machine translation to translate the specified search query into an expanded search query, the specified search query and the expanded search query being in the same natural language, and in response to the request, using the expanded search query to search a collection of documents.

Description

Machine translation for query expansion
Technical field
This specification relates to search query expansion.
Background
Query expansion refers to modifying a search query received from a user before performing a search. Ideally, the modified search query produces improved search results compared with the original query. Typical methods of query expansion include stemming of words, correction of spelling errors, and augmentation of the search query, for example with synonyms of words occurring in the original query.
There are many methods of query expansion that use synonyms. For example, synonyms of a word can be identified from an expert-specified thesaurus or lexical ontology. In some systems, synonyms are identified from other search queries that are syntactically similar to the original query. Synonym selection is especially challenging when a word has multiple potential synonyms, each with a widely varying meaning. For example, in the query "How to ship a box", the word "ship" can have synonyms such as "boat" and "send". Expanding the query with a synonym that is inconsistent with the meaning the user intended can cause irrelevant search results to be identified. For example, search results relating to trawlers are likely to be irrelevant to shipping a box.
Summary of the invention
This specification describes technologies relating to search query expansion. In general, one aspect of the subject matter described in this specification can be embodied in methods that include the actions of: receiving a search query; selecting a synonym of a term in the search query based on a context of occurrence of the term in the received search query, the synonym having been derived from statistical machine translation of the term; expanding the received search query with the synonym; and using the expanded search query to search a collection of documents. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
These and other embodiments can optionally include one or more of the following features. A plurality of logged search queries can be identified, where the term occurs in each of the logged search queries. The logged search queries can be translated into corresponding translated search queries using statistical machine translation. A plurality of potential synonyms can be identified from the translated search queries. A potential synonym can be one or more distinct translations of the term in the translated search queries, where each potential synonym has an associated context of occurrence. The synonym can be selected from the plurality of potential synonyms by matching the context of occurrence of the term in the received query with the context of occurrence of each potential synonym in the translated search queries. The statistical machine translation can use bidirectional phrase alignment.
Question phrases and corresponding answer phrases can be identified from a plurality of documents. A translation model for the statistical machine translation can be built using the question phrases as the source language and the corresponding answer phrases as the target language. A first phrase in a first natural language can be identified. A second phrase in a second natural language can be generated by translating the first phrase into the second natural language. A paraphrase of the first phrase can be identified by translating the second phrase back into the first natural language. A translation model for the statistical machine translation can be built using the first phrase as the source language and the paraphrase as the corresponding target language.
A search result access log can be identified. Each record in the search result access log can identify a corresponding logged search query and a corresponding snippet. The snippet of a corresponding logged search query can be a portion of the content of a document accessed by a user, where the document was presented to the user as a search result in response to receiving the corresponding logged search query. A translation model for the statistical machine translation can be built using the search queries from the search result access log as the source language and the corresponding snippets as the target language. Records from the query log can be filtered based on respective information associated with each record. The respective information can be one or more of the following: the position of the document relative to other documents presented to the user as search results, the amount of time elapsed between providing the search results to the user and the user accessing the document, and the amount of time elapsed between the user accessing the document and the user performing a subsequent operation. The portion of the content of the document can be one or more of the following: the title of the document, anchor text associated with the document, and a summary of the document, where the summary can include words from the corresponding logged search query.
In general, another aspect of the subject matter described in this specification can be embodied in methods that include the actions of: receiving a request to search a corpus of documents, the request specifying a search query; using statistical machine translation to translate the specified search query into an expanded search query, the specified search query and the expanded search query being in the same natural language; and, in response to the request, using the expanded search query to search a collection of documents. Other embodiments of this aspect include corresponding systems, apparatus, and computer program products.
Particular embodiments can be implemented to realize one or more of the following advantages. A search query can be expanded with words that are identified as synonyms for the search query, increasing the likelihood that relevant results are provided in response to the search query. In some embodiments, only synonyms that are relevant given the context of the search query are used in the expansion, which avoids expanding the query with inappropriate words. Synonyms for query expansion can be selected from a corpus of documents that is based on the search results other users selected using similar search queries. Such expansion generates an expanded query that can be used to identify more relevant (e.g., satisfying the query according to some criterion) and more accurate search results.
The details of one or more embodiments of the subject matter described in this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
Brief description of the drawings
Fig. 1 is a diagram of an example statistical machine translation system.
Fig. 2 shows an example of deriving question-answer pairs from documents.
Fig. 3 shows an example of deriving query-snippet pairs from a query log.
Fig. 4 shows an example of deriving phrase-paraphrase pairs from a set of phrases.
Fig. 5 shows an example of using a statistical machine translation model to derive a context map.
Fig. 6 is an example process for expanding a search query using statistical machine translation.
Fig. 7 is a block diagram of an example system.
Like reference numerals in the various figures indicate like elements.
Detailed description
Fig. 1 is a diagram of an example statistical machine translation system 100. The statistical machine translation system 100 is used to translate a sequence of input words in a source language into a sequence of translated words in a target language. Statistical machine translation depends on statistical models that are based on prior probabilities and statistical correlations between occurrences of words in a training corpus. Conventional applications of statistical machine translation assume that the source language and the target language are different natural languages (e.g., French, English, German, or Arabic). In principle, however, the natural language used as input need not differ from the natural language provided as output.
The statistical machine translation system 100 includes two distinct models: a language model 117 and a translation model 113. In machine translation, the language model 117 is used to determine how likely a text passage is to be in the target language (e.g., using probabilities associated with the target language). Given input text in the source language, the translation model 113 is used to derive potential translations in the target language (e.g., using the probability that a given source-language text corresponds to a target-language text). When a text passage is received, both models are used to carry out the statistical machine translation of the passage. The language model 117 is used to determine which of the potential translations suggested by the translation model is most plausible, based on the likelihood that each potential translation would occur in the target language. The translation of the text passage is thus one that is both predicted by the translation model 113 and likely to be in the target language according to the language model 117. Together, the two models can be said to form a statistical machine translation model 110.
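As an illustration of how a translation model and a language model can jointly pick a translation, the following minimal Python sketch scores two candidate rewrites of "how to ship a box". The probability values and the names translation_prob, language_prob, and score are hypothetical, and multiplying the two probabilities (adding their logarithms) is an assumed, commonly used way of combining the models; it is not stated in this specification.

```python
import math

# Hypothetical toy probabilities for the query "how to ship a box".
# translation_prob[w]: assumed probability that w is a translation of "ship".
translation_prob = {"send": 0.4, "boat": 0.5}

# language_prob[s]: assumed probability of sentence s under the language model.
language_prob = {
    "how to send a box": 0.02,
    "how to boat a box": 0.0001,
}

def score(word, sentence):
    """Log-probability that combines the translation and language models."""
    return math.log(translation_prob[word]) + math.log(language_prob[sentence])

candidates = [("send", "how to send a box"), ("boat", "how to boat a box")]

# The candidate with the highest combined score is taken as the translation.
best_word, best_sentence = max(candidates, key=lambda c: score(*c))
```

With these toy numbers the translation model alone slightly prefers "boat", but the language model makes "how to send a box" the more plausible output overall.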
Before the statistical machine translation model 110 can be used to translate a text passage, both the language model 117 and the translation model 113 are trained from sample data (e.g., sample text). For example, the language model 117 can be trained with a language corpus 130 of sample text in the target language. Similarly, the translation model 113 can be trained with a corpus 120 of parallel text, which includes sample text in both the source language and the target language. In the corpus 120 of parallel text, each given passage of text in the source language is accompanied by a corresponding text passage in the target language, which is assumed to be a translation of the given source-language passage.
The statistical correlation between occurrences of words in the source language and words in the target language is represented as alignments between particular words or phrases. When the target language and the source language are the same natural language, the significance of an aligned pair is the same: the aligned words or phrases are assumed to have similar meanings, i.e., they are assumed to be synonyms. For example, the word "ship" can, in some cases (e.g., in a particular context), be aligned with the word "transport". In those cases, "ship" and "transport" are synonyms.
The statistical machine translation model 110 is used to translate a received search query 140 into a translated search query. Each received search query 140 includes text describing content that an information seeker wishes to retrieve from a search corpus 180. Ideally, translating the received search query 140 identifies synonyms that are not present in the received search query but that improve the search results responsive to the query, for example, when the received search query 140 and the expanded search query 150 derived using the synonyms have substantially the same meaning.
In some embodiments, the translated query is used as the expanded search query 150. In other embodiments, the translated search query is used to expand the received search query 140 into the expanded search query 150. Expanding the received search query 140 can include adding words that occur in the translated search query but not in the received query 140. The expanded search query 150 is used to search the search corpus 180. Searching the search corpus 180 is facilitated by a search engine 160. Searching the search corpus 180 produces search results 170 that can be provided to the information seeker in response to the received search query 140.
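The expansion step, in which words that occur in the translated query but not in the received query are added, can be sketched as follows. This is a minimal illustration; the function name expand_query and the whitespace tokenization are assumptions, not part of the described system.

```python
def expand_query(received, translated):
    """Return the received query terms followed by any new translated terms."""
    received_terms = received.lower().split()
    seen = set(received_terms)
    added = []
    for term in translated.lower().split():
        # Add only words that the received query does not already contain.
        if term not in seen:
            seen.add(term)
            added.append(term)
    return received_terms + added

expanded = expand_query("how to ship a box", "how to send a box")
```

Here the translated query contributes the single new term "send", which is appended to the original terms.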
The search engine 160 can be part of a search system implemented as, for example, computer programs running on one or more computers in one or more locations that are connected to one another by a network. The search engine 160 responds to a query by generating search results, for example, results that identify locations in a repository that correspond to the query.
When the search engine 160 receives a query, the search engine 160 uses information retrieval techniques to identify relevant resources (e.g., documents in a source collection). The search engine 160 typically includes a ranking engine (or other software) to rank the resources related to the query. Ranking of the resources can be performed using conventional techniques for determining an information retrieval score for indexed resources given a query. The relevance of a particular resource with respect to a particular query term, or to other provided information, can be determined by any appropriate technique.
For purposes of the following discussion, any convenient statistical machine translation implementation can be used to translate text. In some embodiments, one or more of the following features of the statistical machine translation implementation are configured to improve the effectiveness of synonym selection. For example, the training corpus can be preprocessed to remove irrelevant information such as punctuation or formatting markup (e.g., Hypertext Markup Language (HTML) tags from a corpus derived from web pages). In some embodiments, sentence and chunk alignment, word alignment, and phrase extraction are configured according to conventional practice.
In some embodiments, a statistical machine translation model used to derive query expansions favors identifying strong associations between synonymous words over generating fluent translated phrases. In general, fewer alignments between highly probable synonyms are preferable to many alignments between marginally probable synonyms. The statistical machine translation implementation can therefore be configured to identify only highly probable alignments between target phrases and source phrases. For example, the implementation can be configured to align phrases only when the alignment applies in both translation directions; such alignments are likely to be accurate. Thus, if an alignment indicates that a first phrase translates into a second phrase, but the second phrase translates into a third phrase (rather than back into the first phrase), the alignment can be omitted from the translation model.
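The bidirectional constraint can be sketched in a few lines of Python. The dictionaries forward and backward are hypothetical stand-ins for the best alignment in each translation direction; a real implementation would work with scored phrase tables rather than single best mappings.

```python
def bidirectional_alignments(forward, backward):
    """Keep (a, b) only if a aligns to b and b aligns back to a."""
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}

# Hypothetical best alignments in each direction.
forward = {"ship": "transport", "box": "vessel"}
backward = {"transport": "ship", "vessel": "boat"}

kept = bidirectional_alignments(forward, backward)
```

The pair ("box", "vessel") is dropped because "vessel" translates to a third phrase, "boat", rather than back to "box".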
One parameter used to configure the statistical machine translation implementation is the null-word probability. For a given pair, the null-word probability is used by the statistical machine translation model to determine what proportion of the words in the source text are allowed not to map to any word in the target text. In a corpus of parallel text, the source-language phrases can have significantly fewer words than the corresponding target-language phrases. As described below, this is especially true of question-answer pairs. In such cases, the null-word probability of the implementation can be set relatively high. For example, when the translation model is built from a parallel corpus of question-answer pairs in which the answers are typically much longer than their corresponding questions, the null-word probability can be set to a value of 90%.
Expectation-maximization ("EM") techniques can be used to estimate parameter values and alignment probabilities, using an iterative process until a local optimum is reached. An EM technique computes maximum-likelihood estimates of variables in a probabilistic model. An EM technique is a two-step process. The expectation step computes an expectation of the likelihood by including the variable values as if they had been observed. The maximization step computes maximum-likelihood estimates by maximizing the expected likelihood computed in the expectation step. The process iterates between the expectation and maximization steps, with the variable values computed in the maximization step used in the next expectation step. The term "EM technique" refers to a class of related techniques: the expectation and maximization steps provide a roadmap for developing a specific EM technique. In some embodiments, techniques other than EM, for example gradient descent or conjugate gradient techniques, are used to find the maximum-likelihood estimates.
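To make the alternation between the two steps concrete, the sketch below estimates word-alignment probabilities with a simple EM loop in the style of IBM Model 1. The choice of that model, the function name train_alignment_probs, and the toy data are assumptions for illustration; the specification does not name a particular alignment model.

```python
from collections import defaultdict

def train_alignment_probs(pairs, iterations=10):
    """Estimate t[(target_word, source_word)] from (source, target) token lists."""
    source_vocab = {s for src, _ in pairs for s in src}
    target_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform distribution over target words.
    t = {(tw, sw): 1.0 / len(target_vocab)
         for tw in target_vocab for sw in source_vocab}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # Expectation step: collect fractional alignment counts.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # Maximization step: re-estimate probabilities from the counts.
        for tw, sw in t:
            t[(tw, sw)] = count[(tw, sw)] / total[sw] if total[sw] else 0.0
    return t

# Hypothetical same-language parallel data (source tokens, target tokens).
pairs = [
    (["ship", "box"], ["send", "box"]),
    (["ship"], ["send"]),
    (["box"], ["box"]),
]
probs = train_alignment_probs(pairs)
```

On this toy data the probability that "ship" aligns with "send" increases on every iteration, while the competing alignment of "ship" with "box" shrinks.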
Using a technique such as an EM technique, the translation model 113 is trained to determine the most likely parameter values and alignments.
The following discussion describes three different methods for training the statistical machine translation model. In the first two methods, translated search queries are derived from text that represents the results that would be provided in response to a search query. In other words, the translation model is trained on a parallel corpus of text that includes queries (the source language) and corresponding search results (the target language). Ideally, the sample queries are representative of the search queries likely to be received and translated by the statistical machine translation model 110. Likewise, the corresponding search results represent the results that would be responsive to each sample query.
In the first method, the translation model is trained on question-answer pairs. For each question-answer pair, the question represents a query and its corresponding answer represents a relevant result. Question-answer pairs can be identified, for example, from the content of frequently asked questions (FAQ) documents. Such documents typically include a series of questions, with a respective answer for each question. In general, any document can be analyzed to determine whether its content includes questions and answers that can be incorporated into a question-answer parallel corpus. Further details about how questions and answers are identified from documents are described below with reference to Fig. 2.
In the second method, the translation model is trained on query-snippet pairs. Each query-snippet pair represents a search query and content of a corresponding search result responsive to the search query. For example, search queries received from users of an information retrieval system such as a search engine can be logged and stored. For each of these stored search queries, the search results presented to the user in response to receiving the search query can also be stored. In some embodiments, the system can further identify which of the presented search results the user accessed (e.g., which documents the user retrieved). The search results that the user accessed are likely to be particularly relevant to the corresponding search query. A portion of the content (e.g., a snippet) of the document identified by each such search result is paired with the search query to form a query-snippet pair that is added to a query-snippet parallel corpus. Further details about how a parallel corpus is derived from a record of search queries are described below with reference to Fig. 3.
In the third method, the translation model is trained on synonymous phrase-paraphrase pairs. Each phrase-paraphrase pair includes a phrase and a corresponding paraphrase that has approximately the same meaning as the phrase. In some embodiments, the phrase-paraphrase pairs can be specified manually (e.g., by a language expert). In other embodiments, phrases are first identified automatically from a corpus of text. Phrases from the corpus, which are in a first natural language, are selected and translated into phrases in a different, second natural language. This translation can be carried out using any convenient fully automatic or semi-automatic machine translation technique. The phrases in the second natural language are then translated back into the first natural language. Each doubly translated phrase produced by this process is assumed to be a synonymous paraphrase of the original input phrase. Further details about how a parallel corpus is derived from such translations are described below with reference to Fig. 4.
The target-language corpus 130 used to train the language model 117 can vary. In some embodiments, this corpus is simply a sample of the content of the search corpus 180. For an Internet search engine, for example, the language model can be trained using content from the corpus of web pages retrieved and cataloged by the search engine. Alternatively, in some other embodiments, the language model is trained using logged search queries.
Fig. 2 shows an example of deriving a question-answer pair 235 from a document 210. Documents likely to contain questions and answers can be identified based on words likely to occur in such documents. For example, in documents found on the Internet, the keywords "FAQ" or "Frequently Asked Questions" often occur on web pages that feature a series of questions and corresponding answers. Such keywords can be used initially to identify a set of documents that potentially contain questions and answers. In some embodiments, a classifier is trained to identify pages in a corpus of documents. Such a classifier can be trained by, for example, an expert user who specifies features common to question-answer documents (e.g., the occurrence in a document of keywords or of the five "wh-words" (who, what, why, when, and where), and question marks and other punctuation). The classifier can be used to identify which documents in a corpus are likely to contain questions and answers.
After a set of potential question-answer documents has been identified, individual question-answer pairs are extracted from the documents. Questions and their corresponding answers can be extracted based on punctuation (e.g., a question mark delimiting the end of a question), formatting identifiers (e.g., a paragraph separator between a question and its answer), list markers (e.g., question sequence identifiers such as "Q:" or "1:"), and lexical cues (e.g., a capitalized wh-word delimiting the beginning of a question).
For each question specified in the content of the document 210, the question text 230 and the corresponding answer text 220 are extracted from the document. The question text 230 and the answer text 220 represent a question-answer pair 235 that is added to the parallel corpus. All documents in the set can be processed similarly to derive the question-answer pairs in the parallel corpus 240.
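A minimal sketch of extracting question-answer pairs from FAQ-style text is shown below. It relies on only two of the cues mentioned in the description, "Q:" list markers and question marks, plus an assumed "A:" marker for answers; the function name extract_qa_pairs and the sample text are hypothetical.

```python
import re

# A question starts at "Q:" and ends at a question mark; the answer follows
# "A:" and runs until the next "Q:" marker or the end of the text.
QA_PATTERN = re.compile(r"Q:\s*(.+?\?)\s*A:\s*(.+?)(?=\s*Q:|\Z)", re.DOTALL)

def extract_qa_pairs(text):
    """Return a list of (question, answer) tuples found in the text."""
    return [(q.strip(), a.strip()) for q, a in QA_PATTERN.findall(text)]

faq = """Q: How do I ship a box?
A: Take it to the post office.
Q: What does shipping cost?
A: It depends on weight."""

qa_pairs = extract_qa_pairs(faq)
```

Each extracted tuple corresponds to one entry of the question-answer parallel corpus, with the question as source text and the answer as target text.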
Fig. 3 shows an example of deriving query-snippet pairs from a query log 310. The query log 310 includes logged search queries 350. For each logged search query 350, corresponding search results 353 are also identified in the query log 310. Each of the search results 353 identifies a document that contains text. Each document can correspond to a record in a database, a file, a web page, or some other container of content. The search results 353 recorded in the log 310 are those results most likely to be relevant to the search query 350.
In some embodiments, additional information is used to determine which of the search results presented in response to a search query are most relevant. For example, search results that the user viewed (e.g., based on clicks or document access records) can be considered search results determined to be relevant to the search query 350.
Each search result 353 is associated with content 357 from the document identified by the search result 353. In some embodiments, the content 357 is the text of the document identified by the search result 353. In some embodiments, the content 357 includes a location identifier (e.g., a Uniform Resource Locator (URL) or a file/path name from which the content 357 can be found). In other embodiments, the content 357 includes text used by other documents to refer to the document (e.g., anchor text in a web page that points to the document identified by the search result 353).
In some embodiments, each search result 353 is associated with attributes 355 that describe characteristics of the search result 353 at the time it was presented in response to the corresponding search query 350. For example, an order attribute describes the position in which a particular search result was presented relative to the other search results. The order of a search result can be, for example, five, indicating that the search result was ranked fifth among the search results presented in response to the search query 350. In some embodiments, an access-length attribute describes the length of time for which the user accessed the given document identified by a particular search result. The attributes can also include information about when the user accessed a search result compared with when the search results were provided to the user (or, alternatively, compared with when the search query was received). For example, an attribute can specify that the user accessed a given search result 25 seconds after the user submitted the search query 350 or after the search results were presented in response to the search query 350.
The attributes 355 of the search results can be used to filter queries 350 and corresponding search results 353 from the log 310. Filtering can be used to remove any search result whose relevance does not exceed a specified threshold. The relevance of a search result can be measured by specifying a condition on any given attribute. Search results that do not satisfy the specified condition can be omitted.
In some embodiments, the condition is a specified threshold. For example, only search results that were presented below the fifth search result are used. In some embodiments, this rule is effective when the topmost results (e.g., the top five) are likely to be accessed by the user. Alternatively, all search results that the user accessed for less than ten seconds can be omitted. Other conditions for selecting or omitting search results can be specified based on the available attributes.
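The attribute-based filtering can be sketched as a predicate over logged records. The record type LoggedResult, its field names, and the example URLs are hypothetical. The description presents the rank rule and the visit-time rule as alternatives; this sketch applies both together purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class LoggedResult:
    query: str
    url: str
    rank: int             # position among the presented results (1 = top)
    visit_seconds: float  # how long the user accessed the document

def is_kept(result, rank_cutoff=5, min_visit_seconds=10.0):
    """Keep results presented below the cutoff rank and visited long enough."""
    return result.rank > rank_cutoff and result.visit_seconds >= min_visit_seconds

log = [
    LoggedResult("ship a box", "http://example.com/a", rank=2, visit_seconds=40.0),
    LoggedResult("ship a box", "http://example.com/b", rank=7, visit_seconds=4.0),
    LoggedResult("ship a box", "http://example.com/c", rank=8, visit_seconds=55.0),
]

kept = [r for r in log if is_kept(r)]
```

Only the third record survives: the first was presented within the top five, and the second was visited for less than ten seconds.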
For each pairing of a logged search query 350 with one of its corresponding search results 353, a query-snippet pair 320 is derived. The query-snippet pair 320 includes the logged search query 350 and a corresponding snippet 340. The snippet 340 is derived from the search result 353, in particular from the content 357 identified by the search result. In some embodiments, the snippet 340 is a text string extracted from the content 357.
The text string can include words related to the given search query 350. For example, the text string can include sentences, or parts of sentences, that occur in the content 357 and that include any word occurring in the search query 350. The snippet 340 can also include other content, for example the title associated with the content, the location identifier of the content, or anchor text used to refer to the content in other documents. Thus, for any given search query, multiple query-snippet pairs 320 can be derived from each of the search results recorded in the log 310. Each query-snippet pair 320 derived from the log 310 is added to a parallel corpus 380 of query-snippet pairs for use in training a translation model (e.g., the translation model 113 of Fig. 1).
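Snippet extraction of this kind can be sketched as selecting the sentences of a document that share at least one word with the query. The function names, the naive sentence splitter, and the sample content are assumptions for illustration only.

```python
import re

def extract_snippets(query, content):
    """Return sentences of the content that contain any word of the query."""
    query_terms = set(query.lower().split())
    # Naive sentence split on whitespace that follows ., ! or ?.
    sentences = re.split(r"(?<=[.!?])\s+", content.strip())
    return [s for s in sentences
            if query_terms & set(re.findall(r"\w+", s.lower()))]

def query_snippet_pairs(query, content):
    """Pair the query with each matching sentence to form corpus entries."""
    return [(query, snippet) for snippet in extract_snippets(query, content)]

content = ("You can ship a package overnight. "
           "Our office opens at nine. "
           "Each box is insured.")
pairs = query_snippet_pairs("ship box", content)
```

A single logged result can therefore yield several query-snippet pairs, one for each matching sentence.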
Fig. 4 shows an example of deriving phrase-paraphrase pairs 455 from a phrase set 410. The phrase set 410 is a set of individual phrases in a first natural language, which is the same natural language as that of the search corpus. In some embodiments, these phrases are identified automatically from the content of a corpus of documents. In other embodiments, phrases, or even phrase-paraphrase pairs, can be specified by language experts.
An input phrase 415 in the set 410 is translated into a different, second natural language. The translation can be performed by a first translation component 420 that translates text in the first natural language into text in the second natural language (for example, translating English into Chinese). The first translation component 420 can be any convenient means of translation, including translation by language experts or machine translation. The result of the translation is a translated phrase 430 in the second natural language. The translated phrase 430, being a translation of the input phrase 415, is assumed to have a meaning similar to that of the input phrase 415.
A second translation component 440, which translates text in the second natural language into text in the first natural language (for example, translating Chinese back into English), translates the translated phrase again. The resulting doubly translated phrase is in the same natural language as the input phrase 415 and is assumed to have a meaning similar to that of the translated phrase 430. By association, the doubly translated phrase is assumed to be a synonymous paraphrase 450 of the input phrase 415.
When a phrase in one language is derived from another string in the same language by translating through a given foreign language, the derived string can be said to have been obtained by pivoting on that foreign language. The derived paraphrase 450 and the input phrase 415 are added to a parallel corpus 480 as a phrase-paraphrase pair 455.
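The round trip through a pivot language can be illustrated with a small sketch. The two dictionaries below are toy stand-ins for the translation components 420 and 440, the German pivot and the example phrases are assumptions chosen for illustration, and skipping round trips that return the input unchanged is likewise an assumption.

```python
# Toy phrase tables (hypothetical) standing in for components 420 and 440.
EN_TO_DE = {"under control": "unter kontrolle"}
DE_TO_EN = {"unter kontrolle": "in check"}


def pivot_paraphrases(phrases, forward, backward):
    """Translate each phrase into the pivot language and back again.

    Returns (input phrase, paraphrase) pairs for the parallel corpus.
    """
    pairs = []
    for phrase in phrases:
        translated = forward.get(phrase)         # translated phrase (430)
        if translated is None:
            continue
        paraphrase = backward.get(translated)    # paraphrase (450)
        if paraphrase is None or paraphrase == phrase:
            continue  # no round trip available, or nothing new learned
        pairs.append((phrase, paraphrase))
    return pairs
```

In practice the two components would be full translation systems or human translators rather than lookup tables.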
Given a particular phrase-paraphrase pair, such as the phrase-paraphrase pair 455, the likelihood that the input phrase 415 translates into the paraphrase 450 is defined as the joint likelihood that the input phrase 415 translates into the translated phrase 430 and that the translated phrase 430 translates into the paraphrase 450. In some embodiments, the two events are assumed to be independent, so that the likelihoods can be expressed as:
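$$p(\mathrm{para} \mid \mathrm{in}) = \max_{\mathrm{trans}} \; p(\mathrm{trans} \mid \mathrm{in}) \, p(\mathrm{para} \mid \mathrm{trans})$$
$$p(\mathrm{in} \mid \mathrm{para}) = \max_{\mathrm{trans}} \; p(\mathrm{trans} \mid \mathrm{para}) \, p(\mathrm{in} \mid \mathrm{trans}).$$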
Here, the input phrase is denoted in, the paraphrase para, and the translation trans. In general, a given phrase-paraphrase pair can be obtained by pivoting on more than one foreign language. A translation likelihood for the pair can be obtained from the translation through each pivot language. In some embodiments, the translation likelihood of a phrase-paraphrase pair can be assigned the sum of the pair's translation likelihoods over all foreign languages. However, this may assign too much probability to phrase-paraphrase pairs that occur in many languages. In other embodiments, the translation likelihood of the pair is the maximum translation likelihood over any foreign language.
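As a rough numerical illustration of the likelihood defined above, the sketch below takes the maximum over shared pivot translations of p(trans | in) · p(para | trans) for one pivot language, and then combines the per-language values by either sum or maximum. The function names and the probability values in the usage example are hypothetical.

```python
def pivot_likelihood(p_trans_given_in, p_para_given_trans):
    """max over trans of p(trans | in) * p(para | trans) for one pivot language.

    Both arguments map a pivot-language translation to a probability.
    """
    shared = p_trans_given_in.keys() & p_para_given_trans.keys()
    return max(
        (p_trans_given_in[t] * p_para_given_trans[t] for t in shared),
        default=0.0,
    )


def combine_languages(per_language, how="max"):
    """Combine per-pivot-language likelihoods by sum or by maximum."""
    if how == "sum":
        return sum(per_language)
    return max(per_language, default=0.0)


# Usage with made-up probabilities for two pivot languages.
german = pivot_likelihood({"a": 0.5, "b": 0.25}, {"a": 0.5, "b": 0.5})
chinese = pivot_likelihood({"x": 0.5}, {"x": 0.25})
best = combine_languages([german, chinese], how="max")
total = combine_languages([german, chinese], how="sum")
```

The summed value grows with the number of pivot languages in which the pair occurs, which is the overweighting concern noted above; the maximum does not.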
Fig. 5 shows an example of deriving a context map 580 using a statistical machine translation model 520. In some embodiments, when a search query is received, the statistical machine translation model 520 is used to translate the search query into an expanded search query. Such embodiments can be described as synchronous, online translation, because the model 520 is used to translate each search query as it is received.
In other embodiments, the statistical machine translation model 520 is used to translate pre-existing search queries into corresponding pre-existing translations. These pre-existing translations can be recorded in the context map 580. Later, a new search query can be expanded based on the pre-existing translations in the context map 580. Such embodiments can be described as asynchronous, offline translation, because the pre-existing search queries are translated first and only later are the results of the translation process used to expand search queries. When statistical machine translation is relatively resource-intensive, this offline approach can be more efficient than the online approach. Because the expansion is based on pre-existing translations determined by the statistical machine translation model 520, the expansion of a query is still ultimately based on synonyms identified by the statistical machine translation model 520.
A query log 510 that includes search queries 515 is identified. The search queries 515 are representative of search queries that may later be received and expanded. In some embodiments, the query log 310 is a record of search queries received from a search engine (for example, the search engine 160 shown in Fig. 1). The search queries 515 from the query log 510 are used as input to the statistical machine translation model 520, which can be derived by the training methods described above.
Each input search query translated by the statistical machine translation model 520 yields a corresponding translation of that input search query. Each translated search query is potentially expanded by the translation performed by the statistical machine translation model 520. For example, translating the search query "how to become a mason" can yield the translated search query "how to be a bricklayer".
A comparison module 540 compares the translated search query with the input search query to determine which synonyms, if any, were used in the translation. In some embodiments, the comparison module 540 compares the input query with the translated query word by word to determine which words were replaced in the translation. Any word in the translated search query that differs is identified as a synonym 560 of the corresponding word in the input search query.
A synonym (a word or a synonymous phrase) can replace any word in the original query. For example, a comparison performed on the example search query above can determine that the word "mason" was replaced by the word "bricklayer" in the translation, and that the word "become" was replaced by the phrase "be".
Any number of words can appear in the original query to the left or right of the replaced word. These words are considered the context 550 of the synonym replacement. Thus, for a word replaced by a particular synonym, the context is given by the words surrounding the replaced word in the input search query. These left and right words are stored in the context map 580 together with the synonym, as the left and right context. For example, from the comparison in the example above, the word "mason", its synonym "bricklayer", and the left context "how to become a" are added to the context map. The word "become", its synonym "be", the left context "how to", and the right context "a mason" are also added to the context map.
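The comparison and context extraction described above can be sketched as follows. Using Python's `difflib.SequenceMatcher` to align the two queries is an implementation choice made here, not something the text specifies, and the dictionary layout of the context map is a hypothetical one.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def extract_substitutions(query, translation):
    """Return (word, synonym, left context, right context) tuples.

    The query and its translation are aligned token by token; every
    replaced span in the query yields one tuple.
    """
    src, tgt = query.split(), translation.split()
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    subs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "replace":
            continue
        word = " ".join(src[i1:i2])
        synonym = " ".join(tgt[j1:j2])
        left = " ".join(src[:i1])    # words to the left in the original query
        right = " ".join(src[i2:])   # words to the right in the original query
        subs.append((word, synonym, left, right))
    return subs


def build_context_map(query_translation_pairs):
    """Collect substitutions from many (query, translation) pairs."""
    context_map = defaultdict(list)
    for query, translation in query_translation_pairs:
        for word, synonym, left, right in extract_substitutions(query, translation):
            context_map[word].append(
                {"synonym": synonym, "left": left, "right": right}
            )
    return dict(context_map)
```

Applied to the example queries, this yields an entry for "become" with synonym "be" and an entry for "mason" with synonym "bricklayer", each carrying the left and right context from the original query.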
After the search queries in the query log 510 have been processed, the context map contains multiple target words. Each target word is a word in a logged search query that was replaced with at least one synonym by the machine translation model. Each target word is associated with at least one synonym, and each synonym is associated with a corresponding left and right context. In some embodiments, any one synonym can be associated with multiple left and right contexts, each of which is unique with respect to the left and right contexts of the other synonyms of the same target word.
In some embodiments, for any given word in the context map 580, each potential synonym with an associated context is associated with a score. The score of a potential synonym represents the likelihood that the given synonym is an appropriate expansion of the word in the given context. The score is derived from the translation likelihood provided by the machine translation model when the logged search query was translated. The translation likelihood is a measure of how likely the output text is to be a translation of the input text. Typically, the translation likelihood comprises a language model probability combined with a translation probability, as predicted by the statistical machine translation model.
The scores of synonyms can be used when selecting which of several synonyms to use to expand a query. For example, a particular context map can include the word "tie" associated with the synonyms "knot" and "windsor", where the contexts of the two synonyms are identical (for example, "how to tie a"). When the context map is used to expand the string "how to tie a tie", the synonym "knot" is used rather than the synonym "windsor", because "knot" is associated with a higher score than "windsor".
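Score-based selection among synonyms that share a context might look like the sketch below. The entry layout and the score values in the usage example are made up for illustration; the text only states that "knot" scores higher than "windsor".

```python
def pick_by_score(candidates, left_context):
    """Among synonyms recorded with the given left context, return the
    one with the highest score, or None if no synonym has that context."""
    same_context = [c for c in candidates if c["left"] == left_context]
    if not same_context:
        return None
    return max(same_context, key=lambda c: c["score"])["synonym"]


# Hypothetical context-map entries for the word "tie".
tie_entries = [
    {"synonym": "knot", "left": "how to tie a", "score": 0.5},
    {"synonym": "windsor", "left": "how to tie a", "score": 0.25},
]
chosen = pick_by_score(tie_entries, "how to tie a")
```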
Fig. 6 shows an example process 600 for expanding a search query using statistical machine translation. For convenience, the process 600 is described with reference to a system that performs the process 600. The system receives a search query (step 610). The search query can be provided by a user seeking information through a search engine (for example, the search engine 160 of Fig. 1). In other embodiments, the search query is received programmatically from another process or application.
The system expands the received search query (step 620). In particular, the system can use a context map (for example, the context map 580 of Fig. 5) to expand the search query with synonyms identified for words appearing in the search query.
In some embodiments, the system selects a word that appears in the received search query (step 630). Based on the selected word, the system identifies potential synonyms from the context map (step 640). In the context map, the selected word is associated with several synonyms, each having its own context. Each synonym in the context map is derived, for example, using statistical machine translation. The system selects one of the several synonyms (step 650) based on the contexts associated with the synonyms and the context of the selected word in the received search query. A synonym whose context matches the context of the selected word is used to expand the search query.
In particular, the system identifies a particular synonym based on whether the left or right context of the synonym matches the left or right context of the selected word. For example, for the query "how to tie a bow", the left and right contexts of the word "tie" in the query are "how to" and "a bow", respectively. In the context map, the word "tie" may be associated with two synonyms, "equal" and "knot". If "how to" or "a bow" is a left or right context associated with "knot", then "knot" is selected as the synonym for "tie". In some embodiments, two contexts are considered to match if some portion of their words is identical. For example, one left context matches another left context if the last two words of the two left contexts are identical. Similarly, one right context matches another right context if the first two words of the two right contexts are identical. In some embodiments, when the contexts of multiple synonyms match or partially match the context of the word being expanded in the query, the synonym with the longest context is selected.
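The matching rule described above can be sketched as follows: left contexts match on their last two words, right contexts on their first two words, and the candidate with the longest context wins when several match. The entry layout, the contexts recorded for "equal", and the treatment of empty contexts as non-matching are assumptions made for this illustration.

```python
def left_matches(stored, observed, n=2):
    # Left contexts match when their last n words are identical.
    ws, wo = stored.split(), observed.split()
    return bool(ws) and ws[-n:] == wo[-n:]


def right_matches(stored, observed, n=2):
    # Right contexts match when their first n words are identical.
    ws, wo = stored.split(), observed.split()
    return bool(ws) and ws[:n] == wo[:n]


def select_synonym(query_left, query_right, candidates):
    """Return the synonym whose stored context matches the query context.

    Among several matching candidates, the one with the longest stored
    context (left plus right, counted in words) is preferred.
    """
    matching = [
        c for c in candidates
        if left_matches(c["left"], query_left)
        or right_matches(c["right"], query_right)
    ]
    if not matching:
        return None
    best = max(
        matching,
        key=lambda c: len(c["left"].split()) + len(c["right"].split()),
    )
    return best["synonym"]


# Hypothetical context-map entries for the word "tie".
tie_candidates = [
    {"synonym": "equal", "left": "the teams", "right": "the score"},
    {"synonym": "knot", "left": "how to", "right": "a rope"},
]
# Query "how to tie a bow": left context "how to", right context "a bow".
selected = select_synonym("how to", "a bow", tie_candidates)
```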
The system expands the search query by adding the identified synonym to the query (step 660). The search query is expanded by augmenting the received search query with the synonym. In some embodiments, the synonym is simply appended to the query. In other embodiments, the search query is rewritten so that the expanded word and its synonym are combined by logical disjunction (for example, "or"). For example, the query "how to be a mason" is expanded to "how to (be or become) a (mason or bricklayer)". The corpus is searched using the expanded search query (step 670). Search results that are responsive to the expanded search query and that identify particular resources (for example, web pages, images, text documents, processes, multimedia content) can then be returned (for example, to the user).
Alternatively, as described above, the online statistical machine translation approach can be used (for example, in step 620). In this approach, the search query is translated directly into a corresponding translated search query. The search query and the translated search query can be compared to identify the synonyms used in the translation. The system expands the search query with these synonyms. The corpus is searched using the expanded search query (step 670).
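The disjunctive rewriting in step 660 can be illustrated with a short sketch. The function name and the mapping from words to selected synonyms are hypothetical; the output format follows the example in the text.

```python
def expand_query(query, synonyms):
    """Rewrite a query so each expanded word is OR-ed with its synonyms.

    `synonyms` maps a query word to the list of synonyms selected for it.
    """
    parts = []
    for word in query.split():
        alternatives = synonyms.get(word)
        if alternatives:
            parts.append("(" + " or ".join([word] + alternatives) + ")")
        else:
            parts.append(word)
    return " ".join(parts)


expanded = expand_query(
    "how to be a mason",
    {"be": ["become"], "mason": ["bricklayer"]},
)
```

Here `expanded` reproduces the rewritten query from the example above; words without a selected synonym pass through unchanged.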
Fig. 7 is a block diagram of an example system 700 suitable for implementing apparatus or performing methods of various aspects of the subject matter described in this specification. The system 700 can include a processor 710, a memory 720, a storage device 730, and an input/output device 740. Each of the components 710, 720, 730, and 740 is interconnected using a system bus 750. The processor 710 can process instructions for execution within the system 700. In one embodiment, the processor 710 is a single-threaded processor. In another embodiment, the processor 710 is a multi-threaded processor. The processor 710 can process instructions stored in the memory 720 or on the storage device 730 to display graphical information for a user interface on the input/output device 740.
The memory 720 is a computer-readable medium, either volatile or non-volatile, that stores information within the system 700. The storage device 730 can provide persistent storage for the system 700. The storage device 730 can be a floppy disk device, a hard disk device, an optical disk device, a tape device, or another suitable persistent storage means. The input/output device 740 provides input/output operations for the system 700. In one embodiment, the input/output device 740 includes a keyboard and/or a pointing device. In another embodiment, the input/output device 740 includes a display unit for displaying graphical user interfaces.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer program products, that is, one or more modules of computer program instructions encoded on a tangible program carrier for execution by, or to control the operation of, data processing apparatus. The tangible program carrier can be a propagated signal or a computer-readable medium. The propagated signal is an artificially generated signal, for example a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a computer. The computer-readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them.
The term "data processing apparatus" encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. In addition to hardware, the apparatus can include code that creates an execution environment for the computer program in question, for example code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (for example, one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (for example, files that store one or more modules, subprograms, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special-purpose logic circuitry, for example an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general-purpose and special-purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, for example magnetic disks, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, for example a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, or a Global Positioning System (GPS) receiver, to name just a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including by way of example semiconductor memory devices, for example EPROM, EEPROM, and flash memory devices; magnetic disks, for example internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special-purpose logic circuitry.
To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, for example a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user, and a keyboard and a pointing device, for example a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, for example visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Particular embodiments of the subject matter described in this specification have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain embodiments, multitasking and parallel processing may be advantageous.

Claims (16)

1. A computer-implemented method for expanding a search query, comprising:
Receiving a search query that includes a word;
Identifying a plurality of logged search queries, wherein the word occurs in each of the logged search queries;
Translating the plurality of logged search queries into corresponding translated search queries using statistical machine translation;
Identifying a plurality of potential synonyms from the translated search queries, a potential synonym being one or more distinct translations of the word in the translated search queries, each potential synonym having an associated context of occurrence;
Selecting a synonym for the word in the search query from the plurality of potential synonyms by matching the context of occurrence of the word in the received search query with the contexts of the synonyms occurring in the translated search queries;
Expanding the received search query with the selected synonym; and
Searching a collection of documents using the expanded search query.
2. the method for claim 1, wherein said statistical machine translation use two-way phrase to aim at.
3. the method for claim 1 further comprises:
From a plurality of document recognition problem phrases and corresponding answer phrase; And
Described problem phrase is used as source language, and the answer phrase of described correspondence is used as target language, set up the translation model that is used for described statistical machine translation.
4. the method for claim 1 further comprises:
Identify the first phrase of the first natural language form;
By the second phrase that the second nature language generates described the second nature linguistic form translated in described the first phrase;
By the lexical or textual analysis that described the first natural language is identified described the first phrase translated back in described the second phrase; And
Described the first phrase is used as source language, and described lexical or textual analysis is used as corresponding target language, set up the translation model that is used for described statistical machine translation.
5. the method for claim 1 further comprises:
Identification Search Results access log, search inquiry and corresponding extracts that each record identification in described Search Results access log has been recorded accordingly, the described extracts of the corresponding search inquiry that has recorded is the part of the content of the document of accessing from the user, and the described document that described user accesses is presented to described user in response to receiving the corresponding search inquiry that has recorded as Search Results; And
To be used as source language from the described search inquiry of described Search Results access log, and the extracts of described correspondence will be used as target language, set up the translation model that is used for described statistical machine translation.
6. method as claimed in claim 5 further comprises:
Filter record from inquiry log based on the corresponding information that is associated with each record, corresponding information is following one or more:
The described document that described user accesses is with respect to the position of presenting to other document of described user as Search Results;
Providing described Search Results and described user to access the time quantum of passing between the described document to described user; And
Access the time quantum of passing between described document and the described user execution operation subsequently described user.
7. method as claimed in claim 5, the part of the content of wherein said document of accessing from the user are following one or more:
The title of the described document that described user accesses;
The anchor word that is associated with described document that described user accesses; And
The summary of the described document that described user accesses, described summary comprises the word from the corresponding search inquiry that has recorded.
8. A computer-implemented method for expanding a search query, comprising:
Receiving a request to search a corpus of documents, the request specifying a search query;
Translating the specified search query into a translated search query using statistical machine translation, the specified search query and the translated search query being in the same natural language;
Identifying potential synonyms from the translated search query, the potential synonyms being one or more distinct translations of the word, each potential synonym having an associated context of occurrence;
Selecting a synonym from the potential synonyms by matching the context of occurrence of the word in the received search query with the contexts of the potential synonyms occurring in the translated search query;
Expanding the received search query with the selected synonym; and
In response to the request, searching a collection of documents using the expanded search query.
9. A system for expanding a search query, comprising:
Means for receiving a search query that includes a word;
Means for identifying a plurality of logged search queries, wherein the word occurs in each of the logged search queries;
Means for translating the plurality of logged search queries into corresponding translated search queries using statistical machine translation;
Means for identifying a plurality of potential synonyms from the translated search queries, a potential synonym being one or more distinct translations of the word in the translated search queries, each potential synonym having an associated context of occurrence;
Means for selecting a synonym for the word in the search query from the plurality of potential synonyms by matching the context of occurrence of the word in the received search query with the contexts of the synonyms occurring in the translated search queries;
Means for expanding the received search query with the selected synonym; and
Means for searching a collection of documents using the expanded search query.
10. The system of claim 9, wherein the statistical machine translation uses bidirectional phrase alignment.
11. The system of claim 9, further comprising:
Means for identifying question phrases and corresponding answer phrases from a plurality of documents; and
Means for building a translation model for the statistical machine translation using the question phrases as the source language and the corresponding answer phrases as the target language.
12. The system of claim 9, further comprising:
Means for identifying a first phrase in a first natural language;
Means for generating a second phrase in a second natural language by translating the first phrase into the second natural language;
Means for identifying a paraphrase of the first phrase by translating the second phrase back into the first natural language; and
Means for building a translation model for the statistical machine translation using the first phrase as the source language and the paraphrase as the corresponding target language.
13. The system of claim 9, further comprising:
means for identifying a search result access log, each record in the search result access log identifying a respective recorded search query and a corresponding snippet, the snippet for the respective recorded search query being a portion of the content of a document accessed by a user, the document accessed by the user having been presented to the user as a search result in response to receiving the respective recorded search query; and
means for using the search queries from the search result access log as the source language and the corresponding snippets as the target language to build a translation model for the statistical machine translation.
14. The system of claim 13, further comprising:
means for filtering records from the query log based on respective information associated with each record, the respective information being one or more of:
the position of the document accessed by the user relative to other documents presented to the user as search results;
the amount of time elapsed between the search results being provided to the user and the user accessing the document; and
the amount of time elapsed between the user accessing the document and the user performing a subsequent operation.
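The log filtering of claims 13 and 14 could look roughly like the sketch below. The record layout, the field names, the threshold values, and the direction of each comparison are all assumptions made for illustration; the claims name the three signals but not how they are applied.

```python
from dataclasses import dataclass


@dataclass
class ClickRecord:
    query: str                  # recorded search query
    snippet: str                # title, anchor words, or summary of the document
    rank: int                   # position of the accessed document in the results
    seconds_to_click: float     # results shown -> document accessed
    seconds_on_document: float  # document accessed -> next user operation


def keep_record(rec, max_rank=5, max_seconds_to_click=60.0,
                min_seconds_on_document=10.0):
    """Return True if the record looks like a reliable query/snippet pair."""
    return (rec.rank <= max_rank
            and rec.seconds_to_click <= max_seconds_to_click
            and rec.seconds_on_document >= min_seconds_on_document)


def training_pairs(records):
    """Produce (source, target) pairs for training the translation model."""
    return [(r.query, r.snippet) for r in records if keep_record(r)]


# Hypothetical log entries.
log = [
    ClickRecord("cheap flights", "Low-cost airfare deals", 1, 4.0, 45.0),
    ClickRecord("cheap flights", "Flight simulator games", 9, 30.0, 2.0),
]
pairs = training_pairs(log)
```

The second record is dropped because the document was ranked low and the user left it almost immediately, which under these assumed thresholds suggests the snippet is a poor "translation" of the query.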
15. The system of claim 13, wherein the portion of the content of the document is one or more of:
a title of the document accessed by the user;
anchor words associated with the document accessed by the user; and
a summary of the document accessed by the user, the summary including words from the respective recorded search query.
16. A system for expanding a search query, comprising:
means for receiving a request to search a corpus of documents, the request specifying a search query;
means for translating the specified search query into an expanded search query using statistical machine translation, the specified search query and the expanded search query being in the same natural language;
means for identifying potential synonyms from the translated search query, the potential synonyms being one or more distinct translations of the word, each potential synonym having an associated context of occurrence;
means for selecting a synonym from the potential synonyms by matching the context of occurrence of the word in the received search query with the context of occurrence of the potential synonym in the translated search query;
means for expanding the received search query with the selected synonym; and
means for searching the set of documents in response to the request using the expanded search query.
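The synonym selection and expansion steps of claim 16 can be sketched as follows. The sketch assumes that the translation step has already produced same-language rewrites of the query, that each rewrite aligns with the query position by position, that "context" means the immediate left and right neighbors, and that the expanded query uses an `OR` syntax. None of these details come from the claim; they are simplifications for illustration.

```python
def context(tokens, i):
    """Left and right neighbors of position i (None at the edges)."""
    left = tokens[i - 1] if i > 0 else None
    right = tokens[i + 1] if i + 1 < len(tokens) else None
    return left, right


def expand_query(query, translations):
    """Expand query words with synonyms whose context matches the query."""
    q = query.split()
    synonyms = {}  # position in query -> list of selected synonyms
    for translated in translations:
        t = translated.split()
        if len(t) != len(q):
            continue  # this sketch only handles position-aligned rewrites
        for i, (qw, tw) in enumerate(zip(q, t)):
            # A differing word is a potential synonym; select it only if
            # its neighbors in the rewrite match those in the query.
            if qw != tw and context(q, i) == context(t, i):
                alts = synonyms.setdefault(i, [])
                if tw not in alts:
                    alts.append(tw)
    parts = []
    for i, w in enumerate(q):
        alts = synonyms.get(i, [])
        parts.append("(" + " OR ".join([w] + alts) + ")" if alts else w)
    return " ".join(parts)


# Hypothetical rewrites produced by a same-language translation model.
expanded = expand_query(
    "how to tie a tie",
    ["how to knot a tie", "how to bind the tie"],
)
```

Here "knot" is selected because it appears between "to" and "a", exactly as the first "tie" does in the original query, while "bind" is rejected because its right neighbor differs. Keying synonyms by position rather than by word keeps the second "tie" (the noun) unexpanded.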
CN200880102717XA 2007-06-22 2008-06-20 Machine translation for query expansion Expired - Fee Related CN101878476B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US94590307P 2007-06-22 2007-06-22
US60/945,903 2007-06-22
US12/050,022 US9002869B2 (en) 2007-06-22 2008-03-17 Machine translation for query expansion
US12/050,022 2008-03-17
PCT/US2008/067721 WO2009002864A2 (en) 2007-06-22 2008-06-20 Machine translation for query expansion

Publications (2)

Publication Number Publication Date
CN101878476A CN101878476A (en) 2010-11-03
CN101878476B CN101878476B (en) 2013-03-06

Family

ID=40137557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200880102717XA Expired - Fee Related CN101878476B (en) 2007-06-22 2008-06-20 Machine translation for query expansion

Country Status (4)

Country Link
US (2) US9002869B2 (en)
EP (1) EP2165272A4 (en)
CN (1) CN101878476B (en)
WO (1) WO2009002864A2 (en)

Families Citing this family (113)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7490092B2 (en) 2000-07-06 2009-02-10 Streamsage, Inc. Method and system for indexing and searching timed media information based upon relevance intervals
US20080167876A1 (en) * 2007-01-04 2008-07-10 International Business Machines Corporation Methods and computer program products for providing paraphrasing in a text-to-speech system
US9002869B2 (en) 2007-06-22 2015-04-07 Google Inc. Machine translation for query expansion
US8725756B1 (en) 2007-11-12 2014-05-13 Google Inc. Session-based query suggestions
US8615388B2 (en) * 2008-03-28 2013-12-24 Microsoft Corporation Intra-language statistical machine translation
US20140142920A1 (en) * 2008-08-13 2014-05-22 International Business Machines Corporation Method and apparatus for Utilizing Structural Information in Semi-Structured Documents to Generate Candidates for Question Answering Systems
US8713016B2 (en) 2008-12-24 2014-04-29 Comcast Interactive Media, Llc Method and apparatus for organizing segments of media assets and determining relevance of segments to a query
US9442933B2 (en) 2008-12-24 2016-09-13 Comcast Interactive Media, Llc Identification of segments within audio, video, and multimedia items
US11531668B2 (en) 2008-12-29 2022-12-20 Comcast Interactive Media, Llc Merging of multiple data sets
US8176043B2 (en) 2009-03-12 2012-05-08 Comcast Interactive Media, Llc Ranking search results
US8533223B2 (en) 2009-05-12 2013-09-10 Comcast Interactive Media, LLC. Disambiguation and tagging of entities
US20100299132A1 (en) * 2009-05-22 2010-11-25 Microsoft Corporation Mining phrase pairs from an unstructured resource
US20100318425A1 (en) * 2009-06-12 2010-12-16 Meherzad Ratan Karanjia System and method for providing a personalized shopping assistant for online computer users
US9892730B2 (en) * 2009-07-01 2018-02-13 Comcast Interactive Media, Llc Generating topic-specific language models
EP2341450A1 (en) * 2009-08-21 2011-07-06 Mikko Kalervo Väänänen Method and means for data searching and language translation
US8700652B2 (en) * 2009-12-15 2014-04-15 Ebay, Inc. Systems and methods to generate and utilize a synonym dictionary
US8442964B2 (en) * 2009-12-30 2013-05-14 Rami B. Safadi Information retrieval based on partial machine recognition of the same
US8543381B2 (en) * 2010-01-25 2013-09-24 Holovisions LLC Morphing text by splicing end-compatible segments
US8161073B2 (en) 2010-05-05 2012-04-17 Holovisions, LLC Context-driven search
US20110295897A1 (en) * 2010-06-01 2011-12-01 Microsoft Corporation Query correction probability based on query-correction pairs
US9147039B2 (en) * 2010-09-15 2015-09-29 Epic Systems Corporation Hybrid query system for electronic medical records
CN104484322A (en) * 2010-09-24 2015-04-01 新加坡国立大学 Methods and systems for automated text correction
US20120078941A1 (en) * 2010-09-27 2012-03-29 Teradata Us, Inc. Query enhancement apparatus, methods, and systems
US9251185B2 (en) 2010-12-15 2016-02-02 Girish Kumar Classifying results of search queries
CN102650986A (en) * 2011-02-27 2012-08-29 孙星明 Synonym expansion method and device both used for text duplication detection
CN102722498B (en) * 2011-03-31 2015-06-03 北京百度网讯科技有限公司 Search engine and implementation method thereof
US10642934B2 (en) 2011-03-31 2020-05-05 Microsoft Technology Licensing, Llc Augmented conversational understanding architecture
CN102722499B (en) * 2011-03-31 2015-07-01 北京百度网讯科技有限公司 Search engine and implementation method thereof
CN102737021B (en) * 2011-03-31 2014-10-22 北京百度网讯科技有限公司 Search engine and realization method thereof
US9244984B2 (en) 2011-03-31 2016-01-26 Microsoft Technology Licensing, Llc Location based conversational understanding
US9842168B2 (en) 2011-03-31 2017-12-12 Microsoft Technology Licensing, Llc Task driven user intents
US9298287B2 (en) 2011-03-31 2016-03-29 Microsoft Technology Licensing, Llc Combined activation for natural user interface systems
US9760566B2 (en) 2011-03-31 2017-09-12 Microsoft Technology Licensing, Llc Augmented conversational understanding agent to identify conversation context between two humans and taking an agent action thereof
CN102722501B (en) * 2011-03-31 2015-07-01 北京百度网讯科技有限公司 Search engine and realization method thereof
US9858343B2 (en) 2011-03-31 2018-01-02 Microsoft Technology Licensing Llc Personalization of queries, conversations, and searches
KR101781673B1 (en) * 2011-04-28 2017-09-25 마이크로소프트 테크놀로지 라이센싱, 엘엘씨 Alternative market search result toggle
US9454962B2 (en) * 2011-05-12 2016-09-27 Microsoft Technology Licensing, Llc Sentence simplification for spoken language understanding
US9064006B2 (en) 2012-08-23 2015-06-23 Microsoft Technology Licensing, Llc Translating natural language utterances to keyword search queries
CN102193916A (en) * 2011-06-17 2011-09-21 汉王科技股份有限公司 Method and device for realizing switching of translation word stocks and electronic equipment
US20130103668A1 (en) * 2011-10-21 2013-04-25 Telcordia Technologies, Inc. Question conversion for information searching
CN103106189B (en) * 2011-11-11 2016-04-27 北京百度网讯科技有限公司 Method and apparatus for mining synonymous attribute words
US20130218876A1 (en) * 2012-02-22 2013-08-22 Nokia Corporation Method and apparatus for enhancing context intelligence in random index based system
CN104820686B (en) * 2012-06-28 2019-06-21 北京奇虎科技有限公司 Network search method and network search system
US8892596B1 (en) * 2012-08-08 2014-11-18 Google Inc. Identifying related documents based on links in documents
US9965472B2 (en) * 2012-08-09 2018-05-08 International Business Machines Corporation Content revision using question and answer generation
US9336193B2 (en) 2012-08-30 2016-05-10 Arria Data2Text Limited Method and apparatus for updating a previously generated text
US8762133B2 (en) 2012-08-30 2014-06-24 Arria Data2Text Limited Method and apparatus for alert validation
US9135244B2 (en) 2012-08-30 2015-09-15 Arria Data2Text Limited Method and apparatus for configurable microplanning
US8762134B2 (en) 2012-08-30 2014-06-24 Arria Data2Text Limited Method and apparatus for situational analysis text generation
US9405448B2 (en) 2012-08-30 2016-08-02 Arria Data2Text Limited Method and apparatus for annotating a graphical output
US9600471B2 (en) 2012-11-02 2017-03-21 Arria Data2Text Limited Method and apparatus for aggregating with information generalization
WO2014076524A1 (en) 2012-11-16 2014-05-22 Data2Text Limited Method and apparatus for spatial descriptions in an output text
WO2014076525A1 (en) 2012-11-16 2014-05-22 Data2Text Limited Method and apparatus for expressing time in an output text
US9189531B2 (en) * 2012-11-30 2015-11-17 Orbis Technologies, Inc. Ontology harmonization and mediation systems and methods
KR101364774B1 (en) * 2012-12-07 2014-02-20 포항공과대학교 산학협력단 Method for correction error of speech recognition and apparatus
KR20140079598A (en) * 2012-12-17 2014-06-27 한국전자통신연구원 Apparatus and method for verifying context
WO2014102568A1 (en) 2012-12-27 2014-07-03 Arria Data2Text Limited Method and apparatus for motion detection
WO2014102569A1 (en) 2012-12-27 2014-07-03 Arria Data2Text Limited Method and apparatus for motion description
US10678870B2 (en) * 2013-01-15 2020-06-09 Open Text Sa Ulc System and method for search discovery
GB2524934A (en) 2013-01-15 2015-10-07 Arria Data2Text Ltd Method and apparatus for document planning
US10394863B2 (en) 2013-03-12 2019-08-27 International Business Machines Corporation Identifying a stale data source to improve NLP accuracy
US9245008B2 (en) * 2013-03-12 2016-01-26 International Business Machines Corporation Detecting and executing data re-ingestion to improve accuracy in a NLP system
US9405803B2 (en) * 2013-04-23 2016-08-02 Google Inc. Ranking signals in mixed corpora environments
CN104123322A (en) * 2013-04-28 2014-10-29 百度在线网络技术(北京)有限公司 Method and device for obtaining related question corresponding to input question based on synonymy processing
US20140350931A1 (en) * 2013-05-24 2014-11-27 Microsoft Corporation Language model trained using predicted queries from statistical machine translation
CN104239286A (en) * 2013-06-24 2014-12-24 阿里巴巴集团控股有限公司 Method and device for mining synonymous phrases and method and device for searching related contents
KR101508429B1 (en) 2013-08-22 2015-04-07 주식회사 엘지씨엔에스 System and method for providing agent service to user terminal
WO2015028844A1 (en) 2013-08-29 2015-03-05 Arria Data2Text Limited Text generation from correlated alerts
US9244894B1 (en) 2013-09-16 2016-01-26 Arria Data2Text Limited Method and apparatus for interactive reports
US9396181B1 (en) 2013-09-16 2016-07-19 Arria Data2Text Limited Method, apparatus, and computer program product for user-directed reporting
US9460085B2 (en) * 2013-12-09 2016-10-04 International Business Machines Corporation Testing and training a question-answering system
US8819006B1 (en) 2013-12-31 2014-08-26 Google Inc. Rich content for query answers
US8751466B1 (en) * 2014-01-12 2014-06-10 Machine Intelligence Services, Inc. Customizable answer engine implemented by user-defined plug-ins
WO2015159133A1 (en) 2014-04-18 2015-10-22 Arria Data2Text Limited Method and apparatus for document planning
US9378204B2 (en) 2014-05-22 2016-06-28 International Business Machines Corporation Context based synonym filtering for natural language processing systems
US20160078364A1 (en) * 2014-09-17 2016-03-17 Microsoft Corporation Computer-Implemented Identification of Related Items
US10545958B2 (en) 2015-05-18 2020-01-28 Microsoft Technology Licensing, Llc Language scaling platform for natural language processing systems
US9959354B2 (en) 2015-06-23 2018-05-01 Google Llc Utilizing user co-search behavior to identify search queries seeking inappropriate content
WO2017011465A1 (en) 2015-07-13 2017-01-19 Google Inc. Images for query answers
US20170102832A1 (en) * 2015-10-08 2017-04-13 Mastercard International Incorporated Systems and Methods for Displaying Content, Based on Selections of Unlinked Objects
US11227113B2 (en) * 2016-01-20 2022-01-18 International Business Machines Corporation Precision batch interaction with a question answering system
US9898460B2 (en) * 2016-01-26 2018-02-20 International Business Machines Corporation Generation of a natural language resource using a parallel corpus
JP6671027B2 (en) * 2016-02-01 2020-03-25 パナソニックIpマネジメント株式会社 Paraphrase generation method, apparatus and program
US11086866B2 (en) * 2016-04-15 2021-08-10 Verizon Media Inc. Method and system for rewriting a query
CN105975558B (en) * 2016-04-29 2018-08-10 百度在线网络技术(北京)有限公司 Method for establishing a sentence editing model, automatic sentence editing method, and corresponding apparatus
US10445432B1 (en) 2016-08-31 2019-10-15 Arria Data2Text Limited Method and apparatus for lightweight multilingual natural language realizer
US10503767B2 (en) * 2016-09-13 2019-12-10 Microsoft Technology Licensing, Llc Computerized natural language query intent dispatching
US10467347B1 (en) 2016-10-31 2019-11-05 Arria Data2Text Limited Method and apparatus for natural language document orchestrator
CN108319615A (en) * 2017-01-18 2018-07-24 百度在线网络技术(北京)有限公司 Recommended word acquisition method and device
US10102199B2 (en) * 2017-02-24 2018-10-16 Microsoft Technology Licensing, Llc Corpus specific natural language query completion assistant
CN107526727B (en) * 2017-07-31 2021-01-19 苏州大学 Language generation method based on statistical machine translation
CN107632982B (en) * 2017-09-12 2021-11-16 郑州科技学院 Method and device for voice-controlled foreign language translation equipment
US11036938B2 (en) * 2017-10-20 2021-06-15 ConceptDrop Inc. Machine learning system for optimizing projects
US10922696B2 (en) * 2017-11-14 2021-02-16 Sap Se Smart agent services using machine learning technology
CN108197121A (en) * 2017-12-29 2018-06-22 北京中关村科金技术有限公司 Method, system, device, and readable storage medium for acquiring a machine learning corpus
JP7110644B2 (en) * 2018-03-22 2022-08-02 カシオ計算機株式会社 Information display device, information display method and information display program
JP7149560B2 (en) * 2018-04-13 2022-10-07 国立研究開発法人情報通信研究機構 Request translation system, training method for request translation model and request judgment model, and dialogue system
CN108932218B (en) * 2018-06-29 2022-09-30 北京百度网讯科技有限公司 Instance extension method, device, equipment and medium
CN110737750B (en) * 2018-07-03 2023-01-31 百度在线网络技术(北京)有限公司 Data processing method and device for analyzing text audience and electronic equipment
US20200042643A1 (en) * 2018-08-06 2020-02-06 International Business Machines Corporation Heuristic q&a system
WO2020067870A1 (en) * 2018-09-28 2020-04-02 Mimos Berhad Method and system for providing a content list based on a search query
US10936635B2 (en) * 2018-10-08 2021-03-02 International Business Machines Corporation Context-based generation of semantically-similar phrases
CN111597800B (en) * 2019-02-19 2023-12-12 百度在线网络技术(北京)有限公司 Method, device, equipment and storage medium for obtaining synonyms
US11392853B2 (en) * 2019-02-27 2022-07-19 Capital One Services, Llc Methods and arrangements to adjust communications
US10445745B1 (en) * 2019-03-26 2019-10-15 Fmr Llc Computer systems and methods for efficient query resolution by customer representatives
US11232154B2 (en) * 2019-03-28 2022-01-25 Microsoft Technology Licensing, Llc Neural related search query generation
CN110263149A (en) * 2019-05-29 2019-09-20 科大讯飞股份有限公司 Text presentation method and device
JP2021051567A (en) * 2019-09-25 2021-04-01 株式会社日立製作所 Information processing method and information processing device
CN111506705B (en) * 2020-04-13 2023-07-21 北京奇艺世纪科技有限公司 Information query method and device and electronic equipment
US11775764B2 (en) * 2020-04-20 2023-10-03 International Business Machines Corporation Estimating output confidence for black-box API
CN113505194B (en) * 2021-06-15 2022-09-13 北京三快在线科技有限公司 Training method and device for rewrite word generation model
CN115221872B (en) * 2021-07-30 2023-06-02 苏州七星天专利运营管理有限责任公司 Vocabulary expansion method and system based on near-sense expansion
CN114328798B (en) * 2021-11-09 2024-02-23 腾讯科技(深圳)有限公司 Processing method, device, equipment, storage medium and program product for searching text

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5926811A (en) * 1996-03-15 1999-07-20 Lexis-Nexis Statistical thesaurus, method of forming same, and use thereof in query expansion in automated text searching
CN1839386A (en) * 2003-08-21 2006-09-27 伊迪利亚公司 Internet searching using semantic disambiguation and expansion
CN1898670A (en) * 2003-12-30 2007-01-17 Google公司 Systems and methods for improving search quality

Family Cites Families (39)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030074353A1 (en) * 1999-12-20 2003-04-17 Berkan Riza C. Answer retrieval technique
US7711547B2 (en) * 2001-03-16 2010-05-04 Meaningful Machines, L.L.C. Word association method and apparatus
US7860706B2 (en) * 2001-03-16 2010-12-28 Eli Abir Knowledge system method and appparatus
US6999916B2 (en) 2001-04-20 2006-02-14 Wordsniffer, Inc. Method and apparatus for integrated, user-directed web site text translation
WO2003079225A1 (en) 2002-03-11 2003-09-25 University Of Southern California Named entity translation
US20050222901A1 (en) 2004-03-31 2005-10-06 Sumit Agarwal Determining ad targeting information and/or ad creative information using past search queries
US7293015B2 (en) * 2002-09-19 2007-11-06 Microsoft Corporation Method and system for detecting user intentions in retrieval of hint sentences
CN1290036C (en) * 2002-12-30 2006-12-13 国际商业机器公司 Computer system and method for establishing concept knowledge according to machine readable dictionary
US7346615B2 (en) 2003-10-09 2008-03-18 Google, Inc. Using match confidence to adjust a performance threshold
US7689412B2 (en) * 2003-12-05 2010-03-30 Microsoft Corporation Synonymous collocation extraction using translation information
US7996419B2 (en) 2004-03-31 2011-08-09 Google Inc. Query rewriting with entity detection
US7546235B2 (en) * 2004-11-15 2009-06-09 Microsoft Corporation Unsupervised learning of paraphrase/translation alternations and selective application thereof
US20060149625A1 (en) 2004-12-30 2006-07-06 Ross Koningstein Suggesting and/or providing targeting information for advertisements
WO2006110684A2 (en) * 2005-04-11 2006-10-19 Textdigger, Inc. System and method for searching for a query
US7765098B2 (en) * 2005-04-26 2010-07-27 Content Analyst Company, Llc Machine translation using vector space representations
US20070022134A1 (en) 2005-07-22 2007-01-25 Microsoft Corporation Cross-language related keyword suggestion
US7930236B2 (en) 2005-10-28 2011-04-19 Adobe Systems Incorporated Direct tracking of keywords to ads/text
US7631008B2 (en) 2005-11-16 2009-12-08 Yahoo! Inc. System and method for generating functions to predict the clickability of advertisements
US20070124200A1 (en) 2005-11-26 2007-05-31 Chintano, Inc. Systems and methods for providing online contextual advertising in multilingual environments
US20070282594A1 (en) * 2006-06-02 2007-12-06 Microsoft Corporation Machine translation in natural language application development
US8321448B2 (en) 2007-02-22 2012-11-27 Microsoft Corporation Click-through log mining
US8332207B2 (en) 2007-03-26 2012-12-11 Google Inc. Large language models in machine translation
US20080256035A1 (en) 2007-04-10 2008-10-16 Wei Vivian Zhang Query substitution using active learning
US9002869B2 (en) 2007-06-22 2015-04-07 Google Inc. Machine translation for query expansion
US8051061B2 (en) 2007-07-20 2011-11-01 Microsoft Corporation Cross-lingual query suggestion
US7912843B2 (en) 2007-10-29 2011-03-22 Yahoo! Inc. Method for selecting electronic advertisements using machine translation techniques
US8209164B2 (en) 2007-11-21 2012-06-26 University Of Washington Use of lexical translations for facilitating searches
US8229728B2 (en) 2008-01-04 2012-07-24 Fluential, Llc Methods for using manual phrase alignment data to generate translation models for statistical machine translation
US20090192782A1 (en) 2008-01-28 2009-07-30 William Drewes Method for increasing the accuracy of statistical machine translation (SMT)
US20090216710A1 (en) 2008-02-27 2009-08-27 Yahoo! Inc. Optimizing query rewrites for keyword-based advertising
US7877404B2 (en) 2008-03-05 2011-01-25 Microsoft Corporation Query classification based on query click logs
US20090248655A1 (en) 2008-03-26 2009-10-01 Evgeniy Makeev Method and Apparatus for Providing Sponsored Search Ads for an Esoteric Web Search Query
US20090248627A1 (en) 2008-03-27 2009-10-01 Yahoo! Inc. System and method for query substitution for sponsored search
US8615388B2 (en) 2008-03-28 2013-12-24 Microsoft Corporation Intra-language statistical machine translation
US20090254512A1 (en) 2008-04-03 2009-10-08 Yahoo! Inc. Ad matching by augmenting a search query with knowledge obtained through search engine results
US20090265290A1 (en) 2008-04-18 2009-10-22 Yahoo! Inc. Optimizing ranking functions using click data
US8918328B2 (en) 2008-04-18 2014-12-23 Yahoo! Inc. Ranking using word overlap and correlation features
US20100010895A1 (en) 2008-07-08 2010-01-14 Yahoo! Inc. Prediction of a degree of relevance between query rewrites and a search query
US20100017262A1 (en) 2008-07-18 2010-01-21 Yahoo! Inc. Predicting selection rates of a document using click-based translation dictionaries

Also Published As

Publication number Publication date
EP2165272A2 (en) 2010-03-24
EP2165272A4 (en) 2011-08-31
CN101878476A (en) 2010-11-03
US20130031122A1 (en) 2013-01-31
WO2009002864A2 (en) 2008-12-31
US20080319962A1 (en) 2008-12-25
WO2009002864A3 (en) 2009-02-19
US9569527B2 (en) 2017-02-14
US9002869B2 (en) 2015-04-07

Similar Documents

Publication Publication Date Title
CN101878476B (en) Machine translation for query expansion
KR101579551B1 (en) Automatic expanded language search
US8250053B2 (en) Intelligent enhancement of a search result snippet
US8204874B2 (en) Abbreviation handling in web search
CN103136352B (en) Text retrieval system based on double-deck semantic analysis
US20240028837A1 (en) Device and method for machine reading comprehension question and answer
US8463593B2 (en) Natural language hypernym weighting for word sense disambiguation
CA2536265C (en) System and method for processing a query
US7742922B2 (en) Speech interface for search engines
JP7232831B2 (en) Retrieval of corroborative evidence for complex answers
US8280721B2 (en) Efficiently representing word sense probabilities
US20060059132A1 (en) Searching hypertext based multilingual web information
US11875585B2 (en) Semantic cluster formation in deep learning intelligent assistants
KR20100075454A (en) Identification of semantic relationships within reported speech
KR100847376B1 (en) Method and apparatus for searching information using automatic query creation
US20180365318A1 (en) Semantic analysis of search results to generate snippets responsive to receipt of a query
Li et al. National University of Singapore at the TREC-13 question answering main task
US9305103B2 (en) Method or system for semantic categorization
Roche et al. AcroDef: A quality measure for discriminating expansions of ambiguous acronyms
KR102600703B1 (en) Apparatus and method for answering questions related to legal field
JP2012243130A (en) Information retrieval device, method and program
Reddy et al. Cross lingual information retrieval using search engine and data mining
WO2001046838A1 (en) Answer retrieval technique
KR20240018197A (en) SYSTEM AND Method for ANSWERING OF QUERY FOR DOCUMENT IN WORD PROCESSOR
Zhang Query enhancement with topic detection and disambiguation for robust retrieval

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130306

Termination date: 20160620