US20080027888A1 - Optimization of fact extraction using a multi-stage approach - Google Patents

Optimization of fact extraction using a multi-stage approach

Info

Publication number
US20080027888A1
Authority
US
United States
Prior art keywords
factual
descriptions
fact
rules
factual descriptions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US11/496,650
Other versions
US7668791B2 (en)
Inventor
Saliha Azzam
Kevin William Humphreys
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Corp
Priority to US11/496,650 (US7668791B2)
Assigned to MICROSOFT CORPORATION. Assignment of assignors interest (see document for details). Assignors: AZZAM, SALIHA; HUMPHREYS, KEVIN WILLIAM
Priority to TW096126248A (TWI431493B)
Priority to AU2007281638A (AU2007281638B2)
Priority to EP07796948A (EP2050019A4)
Priority to JP2009522777A (JP5202524B2)
Priority to RU2009103145/08A (RU2451999C2)
Priority to BRPI0714311-7A (BRPI0714311A2)
Priority to MX2009000588A (MX2009000588A)
Priority to PCT/US2007/016435 (WO2008016491A1)
Publication of US20080027888A1
Priority to NO20085387A (NO20085387L)
Publication of US7668791B2
Application granted
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC. Assignment of assignors interest (see document for details). Assignors: MICROSOFT CORPORATION
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • Electronic documents may contain a mixture of facts and opinions. At times, a reader may only be interested in facts, or may wish to have the facts be identified. For example, a user performing an on-line search for information may wish to obtain facts about a particular subject as quickly and efficiently as possible. However, presenting a list of web pages or other electronic documents that are related to the search terms used requires the user to individually examine each web page or other electronic document and distinguish the facts from the opinions or subjective information.
  • Embodiments provide optimization of fact extraction by using a multi-stage approach.
  • the electronic documents are scanned to find factual descriptions that are likely to contain facts by using a fact-word table to match terms within sentences of the electronic documents to obtain a set of factual descriptions. Further analysis may then be performed, including determining linguistic constituents, e.g., syntactic constituents and/or semantics, in the neighborhood of that set of factual descriptions rather than on the entire document. Accordingly, time is saved by avoiding a complex lexical and syntactic analysis of the entire document for every electronic document of interest.
  • FIG. 1 shows an example of a computer system for implementing embodiments.
  • FIG. 2 shows an example of an operational flow of a search involving the presentation of facts that have been extracted prior to the search.
  • FIG. 3 shows an example of an operational flow of a search involving the presentation of facts that have been extracted during the search.
  • FIG. 4 shows an example of an operational flow of the multiple steps of fact extraction.
  • FIG. 5 shows an example of a more detailed operational flow of the multiple steps of fact extraction.
  • FIG. 6 shows an example of a screen display providing search results that include the presentation of facts obtained from electronic documents discovered by the search.
  • Embodiments provide for fact extraction using multiple stages to avoid performing complex analyses of the entire documents of interest.
  • Factual descriptions of the documents are recognized in relation to a fact-word table in an initial stage. These factual descriptions may be tagged with their parts of speech, either noun or verb. Then more detailed analyses may be done in a subsequent stage over those factual descriptions to thereby avoid such detailed analyses over the entire documents of interest.
  • the linguistic constituents for each factual description may be determined and then exclusions and scores may be used to eliminate factual descriptions that are less likely to be facts. The factual descriptions remaining after the exclusions and scoring may then be presented as fact.
  • FIG. 1 shows an example of a computer system 100 that provides an operating environment for the embodiments.
  • the computer system 100 as shown may be a standard, general-purpose programmable computer system 100 including a processor 102 as well as various components including mass storage 112 , memory 104 , a display adapter 108 , and one or more input devices 110 such as a keyboard, keypad, mouse, and the like.
  • the processor 102 communicates with each of the components through a data signaling bus 106 .
  • the computer system 100 may also include a network interface 122 , such as a wired or wireless connection, that allows the computer system 100 to communicate with other computer systems via data networks.
  • the computer system 100 may alternatively be a hard-wired, application specific device that implements one or more of the embodiments.
  • the processor 102 implements instructions stored in the mass storage 112 in the form of an operating system 114 .
  • the operating system 114 of this example provides a foundation upon which various applications may be implemented to utilize the components of the computer system 100 .
  • the computer system 100 may implement a search engine 118 or similar application for finding electronic documents relevant to a particular situation.
  • the search engine 118 may receive search terms entered directly through input device 110 by a user of the computer system 100 or may receive search terms submitted by a user of a remote computer that are received via the network interface 122 .
  • the search and/or fact extraction may occur in relation to one or more sets of electronic documents that contain textual information such as web pages, standard word processing documents, spreadsheets, and so forth. These electronic documents may be stored locally as electronic document set 116 . These electronic documents may also be stored at a non-local location such as network-based storage 124 containing an electronic document set 126 .
  • Network-based storage 124 is representative of local network storage, on-line storage locations of the Internet, and so forth. The network-based storage 124 is accessible via the network interface 122 .
  • a fact extraction tool 120 may be present on the local storage device 112 , either as a component of the operating system 114 , a component of the search engine 118 or other application, or as a stand-alone application capable of producing its own independent results. The logical operations performed by embodiments of the fact extraction tool 120 are discussed below in relation to FIGS. 2-5 .
  • the computer system 100 of FIG. 1 may include a variety of computer readable media.
  • Such computer readable media contains the instructions for operation of the computer system and for implementation of the embodiments discussed herein.
  • Computer readable media can be any available media that can be accessed by computer 100 and includes both volatile and nonvolatile media, removable and non-removable media.
  • Computer readable media may comprise computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer system 100 .
  • Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
  • modulated data signal means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
  • FIG. 2 shows an example of logical operations performed by a search engine 118 in conjunction with the fact extraction tool 120 .
  • the fact extraction tool 120 is utilized prior to a search occurring in order to generate a library of facts present in the electronic documents to be searched. In this manner, there is no processing time required to extract the facts but instead those facts have already been extracted and are retrieved from a fact library on the basis of the search terms entered.
  • the logical operations begin at collection operation 202 where the collection of electronic documents is obtained or access is otherwise achieved.
  • the electronic documents to eventually be searched may be saved to local storage or may be acquired via on-line access.
  • the fact extraction tool 120 then operates upon each one of those electronic documents to attempt to extract all of the facts that are present in the electronic documents.
  • the fact extraction tool 120 may generate a library of facts that are stored in association with the corresponding electronic documents and are available for access during future searches. For example, Table 1 shows such a library of associations.
  • a user wishing to do a search to find relevant electronic documents, and particularly to find relevant facts from those electronic documents, enters a search term into the search engine 118 at term operation 206 .
  • the search engine 118 searches through the electronic documents for the search terms and finds matching documents at document operation 208 .
  • the search engine also finds the previously extracted facts which match the search terms from those matching electronic documents and then displays the relevant documents or a link thereto along with the relevant facts at display operation 210 .
  • a search term may be found in www.sample1.com and the search term may also be found to match Fact A and Fact B such that a link to www.sample1.com is displayed along with Fact A and Fact B.
  • the user is quickly provided with facts related to the search terms that were entered.
  • An example of such a screen display is discussed below in relation to FIG. 6 .
  • the search may be for previously extracted facts only, rather than for the electronic documents themselves.
  • the previously extracted facts may match the search terms regardless of whether the electronic documents containing the facts match the search terms.
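  • The pre-search flow of FIG. 2 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name FactLibrary, its methods, and the plain substring matching are assumptions made for illustration, and the sample facts are invented. It shows a library of document-to-fact associations that is filled before any search, then queried either together with the documents or on its own.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FactLibrary:
    """Hypothetical library associating each document with its pre-extracted facts."""

    facts_by_doc: dict[str, list[str]] = field(default_factory=dict)

    def add(self, url: str, fact: str) -> None:
        # Store the fact in association with the document it came from.
        self.facts_by_doc.setdefault(url, []).append(fact)

    def search(self, term: str, documents: dict[str, str]) -> list[tuple[str, list[str]]]:
        """Return (document, matching facts) for each document whose text matches the term."""
        needle = term.lower()
        results = []
        for url, text in documents.items():
            if needle in text.lower():
                facts = [f for f in self.facts_by_doc.get(url, []) if needle in f.lower()]
                results.append((url, facts))
        return results

    def search_facts_only(self, term: str) -> list[str]:
        """Search the stored facts alone, regardless of whether the documents match."""
        needle = term.lower()
        return [f for facts in self.facts_by_doc.values() for f in facts if needle in f.lower()]
```

  • Because the facts are stored ahead of time, a query only performs lookups; no extraction work is done while the user waits.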
  • FIG. 3 shows another example of logical operations performed by a search engine 118 in conjunction with the fact extraction tool 120 .
  • the fact extraction tool 120 is utilized during a search in order to discover facts present in the electronic documents as they are being found by the search. In this manner, there is no need for pre-search fact extraction and no need for storage of a library of facts. In such a scenario, the fact extraction tool may only scan snippets or summaries of the document to provide very fast results, or the entire document may also be scanned to extract all potential facts.
  • the logical operations begin at term operation 302 where a user enters a search term into the search engine 118 .
  • the search engine 118 searches through the electronic documents for the search terms and finds matching documents at document operation 304 .
  • the extraction tool 120 is then employed at extraction operation 306 in order to analyze the electronic documents that have been found by the search in order to extract facts from those documents that are relevant to the search terms.
  • the result of extraction operation 306 may produce a temporary set of associations between electronic documents and facts as shown in Table 1, which may then be placed in longer term storage in anticipation of searches for those search terms occurring in the future.
  • at display operation 308 , the search engine displays the relevant documents or a link thereto along with the relevant facts returned by the fact extraction tool 120 at extraction operation 306 .
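  • The search-time flow of FIG. 3 might be sketched as follows. The function name search_with_extraction, the cache dictionary, and the snippet_chars parameter are hypothetical; the extractor is passed in as a callable so that the sketch makes no claim about how extraction itself is done. It shows extraction running only on documents that the search has found, optional scanning of a leading snippet for speed, and optional storage of the results for future searches.

```python
from typing import Callable, Dict, List, Optional, Tuple


def search_with_extraction(
    term: str,
    documents: Dict[str, str],
    extract_facts: Callable[[str], List[str]],
    cache: Optional[Dict[Tuple[str, str], List[str]]] = None,
    snippet_chars: Optional[int] = None,
) -> List[Tuple[str, List[str]]]:
    """Find matching documents, then extract facts from them during the search."""
    needle = term.lower()
    results = []
    for url, text in documents.items():
        if needle not in text.lower():
            continue  # the document was not found by the search, so it is never analyzed
        key = (needle, url)
        if cache is not None and key in cache:
            facts = cache[key]  # reuse associations stored by an earlier search
        else:
            # Scan only a leading snippet when requested, otherwise the whole document.
            scanned = text if snippet_chars is None else text[:snippet_chars]
            facts = [f for f in extract_facts(scanned) if needle in f.lower()]
            if cache is not None:
                cache[key] = facts
        results.append((url, facts))
    return results
```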
  • FIG. 4 shows the multi-stage approach utilized by embodiments of the fact extraction tool 120 .
  • the fact extraction tool 120 attempts to recognize a set of factual descriptions from the electronic documents of interest at recognition operation 402 .
  • the goal is to find those descriptions in the text that are likely to be facts based on finding matches to a fact-word table discussed in more detail below with reference to FIG. 5 .
  • By performing a quick matching process, much of the electronic document that should be ignored when finding facts can be eliminated from further fact extraction processing, thereby increasing the efficiency of the subsequent stage(s) that are employed to increase accuracy.
  • fact extraction is then performed on that set of factual descriptions at extraction operation 404 .
  • more detailed analyses are performed only on the set of factual descriptions, as opposed to the whole document, so that satisfactory efficiency is maintained while adequate accuracy is achieved.
  • the analyses of extraction operation 404 involve decision making based on a determination of linguistic constituents of the factual descriptions.
  • Such linguistic constituents may include the syntactic constituents, the semantics, and so forth.
  • FIG. 5 shows an example of details of the recognition and extraction operations of FIG. 4 .
  • the logical operations begin at scanning operation 502 where the fact extraction tool 120 scans the electronic document to find words or phrases matching those of a fact-word table.
  • a fact-word table is a list of words or phrases that are known to be likely to be used when expressing a fact as opposed to, for example, an opinion. Table 2 shows a brief example. Note that to provide optimal processing performance, the words of the table may be associated with the most appropriate part of speech (POS) tag, which is discussed below in relation to tag operation 504 .
  • the fact-word list as shown in Table 2 may be constructed to include those verbs or other words that are suggestive of a fact expression as opposed to a non-fact.
  • the terms “invented” or “hired” are suggestive of a fact expression whereas the terms “can be” or “complains” are not.
  • a particular example of a fact-word list can be found in Appendix A located at the end of this specification. This particular example is a non-exhaustive list of verbs that are fact-words that may be used to discover factual descriptions in electronic documents.
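  • The first-stage scan of operation 502 can be sketched in Python as below. The table contents are an assumption: only “invented” and “hired” come from the text above, and the POS tags attached to them, the regular-expression sentence splitter, and the function names are illustrative. Multi-word table entries are not handled in this sketch. It shows that only sentences containing a fact-word survive as candidate factual descriptions.

```python
import re

# Hypothetical fact-word table: word -> pre-assigned part-of-speech tag.
FACT_WORDS = {"invented": "VERB", "hired": "VERB"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def recognize_factual_descriptions(text, fact_words=FACT_WORDS):
    """Return (sentence, matched fact-words) for each sentence containing a fact-word."""
    candidates = []
    for sentence in split_sentences(text):
        tokens = re.findall(r"[A-Za-z']+", sentence.lower())
        matches = [t for t in tokens if t in fact_words]
        if matches:
            candidates.append((sentence, matches))
    return candidates
```

  • Sentences without a table match are dropped here, so the later, more expensive stages never see them.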
  • the parts of speech (POS) of each of the words of each factual description are tagged at tag operation 504 .
  • This tagging operation 504 may involve making disambiguating choices for words which have more than one POS tag, such as by favoring a noun tag over a verb tag since it is understood that syntactic phrases like noun phrases are known to be the entities involved in a factual event. Any unknown and non-pre-tagged words may default to nouns for this reason as well.
  • adjectives may be favored over verbs (e.g., “planned” as an adjective over “planned” as a verb) as well, such that words having both an adjective and a verb tag will default to adjective, because adjectives are part of noun phrases, which are known to be the entities involved in a factual event.
  • these disambiguating choices may already be applied so that, for instance, “planned” is associated with an adjective POS Tag in the table and not a verb POS Tag.
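  • The disambiguating choices of tag operation 504 might look like the following Python sketch. The tag names, the toy lexicon, and the function name are assumptions. The ordering of noun before adjective is also an assumption, since the text only states that nouns and adjectives are each favored over verbs. It shows table pre-tags taking precedence, the preference order applied to ambiguous words, and unknown words defaulting to noun.

```python
# Preference order for ambiguous words; NOUN-before-ADJ is an assumption of this sketch.
PREFERENCE = ["NOUN", "ADJ", "VERB"]


def disambiguate(word, lexicon, fact_words):
    """Pick a single POS tag for a word using the preferences described above."""
    w = word.lower()
    if w in fact_words:
        return fact_words[w]  # the fact-word table already carries the chosen tag
    tags = lexicon.get(w)
    if not tags:
        return "NOUN"  # unknown, non-pre-tagged words default to noun
    for tag in PREFERENCE:
        if tag in tags:
            return tag
    return sorted(tags)[0]  # no preferred tag applies; pick deterministically
```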
  • syntactic phrases like noun phrases and verb phrases are identified.
  • the syntactic phrases are identified by utilizing conventional grammar rules and light linguistic analysis. Those syntactic phrases that are in the neighborhood of, i.e., very local to, the set of factual descriptions in a document are identified, and if a factual description has no syntactic phrases associated with it, then the corresponding sentence may be eliminated from further consideration.
  • the process avoids looking at all the linguistic constituents of a whole sentence.
  • the linguistic constituents of the factual descriptions having the neighboring syntactic phrases are further determined by assessing the role a syntactic phrase plays within the corresponding sentence based on the pattern identified in the factual description. Thus, it is determined from the word pattern of the factual description whether the syntactic phrase plays the role of subject or object within the sentence containing the current factual description being analyzed.
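  • As a rough illustration of working only in the neighborhood of a fact-word, the Python sketch below substitutes a fixed token window for the grammar-based phrase identification described above. The window size, the function name, and the assumption of a simple “subject, fact-word, object” word pattern are all hypothetical simplifications. It shows roles being assigned from position relative to the fact-word without analyzing the rest of the sentence.

```python
def neighborhood_roles(tokens, fact_index, window=4):
    """Assign subject/object roles from the tokens immediately around the fact-word."""
    left = tokens[max(0, fact_index - window):fact_index]
    right = tokens[fact_index + 1:fact_index + 1 + window]
    return {
        "subject": " ".join(left) or None,  # phrase before the fact-word
        "object": " ".join(right) or None,  # phrase after the fact-word
    }
```

  • A candidate for which both roles come back as None has no neighboring phrases and could be eliminated from further consideration.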
  • exclusion rules may then be applied to those noun phrases of the factual descriptions to further eliminate those that are less likely to be an expression of fact at exclusion operation 508 .
  • the exclusion rules may be applicable on the basis of a syntactic phrase as an object, a syntactic phrase as a subject, or a syntactic phrase without regard to its role.
  • applying an exclusion rule to individual words, to the syntactic phrases, or to the whole sentence leads to the same result, which is to exclude the whole sentence from being a factual description.
  • An example of exclusion rules that may be applied is shown in Table 3.
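  • Exclusion operation 508 can be sketched in Python as follows. Table 3 is not reproduced here, so the two rules, the word lists, and the class names are hypothetical examples only. The sketch shows rules keyed to a role (subject, object, or any), with any single firing rule excluding the whole candidate sentence.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Candidate:
    sentence: str
    subject: Optional[str]
    obj: Optional[str]


@dataclass
class ExclusionRule:
    role: str  # "subject", "object", or "any"
    predicate: Callable[[str], bool]


# Hypothetical word lists and rules, not taken from the patent's Table 3.
PRONOUNS = {"i", "we", "you", "it", "this", "that"}
SUBJECTIVE = {"best", "worst"}
RULES = [
    ExclusionRule("subject", lambda np: np.lower() in PRONOUNS),
    ExclusionRule("any", lambda np: any(w.lower() in SUBJECTIVE for w in np.split())),
]


def is_excluded(candidate, rules=RULES):
    """Return True if any rule fires on a phrase in the role the rule targets."""
    phrases = {"subject": candidate.subject, "object": candidate.obj}
    for rule in rules:
        roles = ("subject", "object") if rule.role == "any" else (rule.role,)
        for role in roles:
            phrase = phrases[role]
            if phrase is not None and rule.predicate(phrase):
                return True  # one firing rule excludes the whole sentence
    return False
```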
  • scoring rules are applied at scoring operation 510 .
  • the scoring rules give a weight to both the subject and object noun phrases for each of various features, and a total score for the candidate factual description is the sum of the individual feature weights plus the certainty score of the matching fact-word.
  • the individual feature weights may be positive, when indicative of a fact, and may be negative, when indicative of a non-fact. Examples of features and associated scoring rules are provided below in Table 4.
  • the feature scores may be manually assigned using human judgment or may be automatically learned.
  • the total score for the factual description is then compared to a pre-defined threshold to determine whether the total score exceeds the threshold at query operation 512 . If the threshold is not exceeded, then the corresponding factual description may be discarded. If the threshold is exceeded, then the factual description, the complete sentence, and/or the complete paragraph or other document portion may be presented as a fact at presentation operation 514 . This presentation may include displaying the fact, saving the fact to a library, and so forth.
  • the weights assigned to the features and/or the threshold value may be manipulated without manipulating the whole approach to fact extraction. In this manner, the degree of accuracy of fact extraction and presentation can be controlled while the processing steps remain the same.
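  • The scoring and threshold steps (operations 510 and 512) can be sketched in Python as below. Table 4 is not reproduced here, so the features, weights, certainty scores, and threshold value are invented for illustration. The sketch computes the total as the sum of feature weights over the subject and object noun phrases plus the certainty score of the matching fact-word, then keeps the candidate only if the total exceeds the threshold.

```python
HEDGES = {"maybe", "probably"}

# Hypothetical features: (name, test on a noun phrase, weight).
FEATURES = [
    ("has_digit", lambda np: any(ch.isdigit() for ch in np), 1.0),
    ("has_capitalized_word", lambda np: any(w[:1].isupper() for w in np.split()), 1.0),
    ("has_hedge_word", lambda np: any(w.lower() in HEDGES for w in np.split()), -2.0),
]

# Hypothetical certainty scores attached to fact-words.
FACT_WORD_CERTAINTY = {"invented": 2.0, "hired": 1.5}


def total_score(subject, obj, fact_word):
    """Sum the feature weights over both noun phrases plus the fact-word certainty."""
    score = FACT_WORD_CERTAINTY.get(fact_word, 0.0)
    for noun_phrase in (subject, obj):
        for _name, test, weight in FEATURES:
            if test(noun_phrase):
                score += weight
    return score


def is_fact(subject, obj, fact_word, threshold=2.5):
    """Present the candidate as a fact only if its total score exceeds the threshold."""
    return total_score(subject, obj, fact_word) > threshold
```

  • Changing the weights or the threshold changes how strict the filter is while the processing steps stay the same.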
  • FIG. 6 shows an example screenshot 600 resulting from performing a search.
  • Search terms have been entered in search field 602 to conduct the search.
  • the search term has been matched to various web site links 604 available from the Internet.
  • the user may visit the electronic documents in the normal fashion.
  • facts 610 , 612 , and 614 about the search term are displayed in section 608 . Accordingly, a user can quickly spot facts about the subject of the search without having to visit any of the electronic documents that have been found and without having to manually read and discern fact from opinion.
  • the facts 610 , 612 , and 614 include hyperlinks that the user may select to give more information about the source of the fact and/or to show the context within which the facts were discovered (e.g., date of the fact associated, other facts, etc.).
  • screenshot 600 is merely one example of how the facts may be presented to the user. Rather than presenting them in a separate column as shown, they may be listed as sub-elements of the electronic document that they have been extracted from. Furthermore, as an alternative to or in addition to listing the facts on the search results page, the facts extracted from a particular electronic document may also be listed in a column or other location upon the user viewing the electronic document itself. Additionally, as an alternative to or in addition to separating the facts from the document for display, the facts may be highlighted within the electronic documents both in the list of documents 604 within the search results and within the complete electronic document when it is chosen for display. As yet another alternative, the facts may be displayed independently from search results, such as to display facts only with a selectable link to obtain the source documents, where only the extracted facts have been searched to thereby avoid the document search completely.
  • the presentation of the extracted facts may be provided as a display to a local computer implementing the search and fact extraction for a local user.
  • the presentation of the extracted facts may be provided as a display to a remote computer that has requested that the local computer perform the search and fact extraction on its behalf, such as in the case of an Internet based search engine.
  • facts may be efficiently and accurately extracted from documents for presentation to users.
  • the efficiency is increased by avoiding detailed analysis of the entire documents as well as avoiding detailed analysis of the entire sentence where a factual description has been found.
  • the accuracy is maintained by employing further analysis upon the factual descriptions that have been discovered in the document by the initial stage of processing.

Abstract

Facts are extracted from electronic documents by recognizing factual descriptions using a fact-word table to match to words of the electronic documents. The words of those factual descriptions may be tagged with the appropriate part of speech. More detailed analysis is then performed on those factual descriptions, rather than on the entire electronic document, and particularly to the text in the neighborhood of the fact-word matches. The analysis may involve identifying the linguistic constituents of each phrase and determining the role as either subject or object. Exclusion rules may be applied to eliminate those phrases unlikely to be part of facts, the exclusion rules being based in part on the linguistic constituents. Scoring rules may be applied to remaining phrases, and for those phrases having a score in excess of a threshold, the corresponding sentence part, whole sentence, paragraph, or other document portion may be presented as representing one or more facts.

Description

    BACKGROUND
  • Electronic documents may contain a mixture of facts and opinions. At times, a reader may only be interested in facts, or may wish to have the facts be identified. For example, a user performing an on-line search for information may wish to obtain facts about a particular subject as quickly and efficiently as possible. However, presenting a list of web pages or other electronic documents that are related to the search terms used requires the user to individually examine each web page or other electronic document and distinguish the facts from the opinions or subjective information.
  • Attempts have been made to perform fact extraction. However, accurate fact extraction can be a slow and inefficient process even for high-speed server computers. Such fact extraction attempts generally apply a linguistic analysis to the entire contents of the electronic document to extract those facts that it may contain. When applying fact extraction to hundreds or thousands of electronic documents, the amount of time needed to achieve a result may be unacceptable.
    SUMMARY
  • Embodiments provide optimization of fact extraction by using a multi-stage approach. The electronic documents are scanned to find factual descriptions that are likely to contain facts by using a fact-word table to match terms within sentences of the electronic documents to obtain a set of factual descriptions. Further analysis may then be performed, including determining linguistic constituents, e.g., syntactic constituents and/or semantics, in the neighborhood of that set of factual descriptions rather than on the entire document. Accordingly, time is saved by avoiding a complex lexical and syntactic analysis of the entire document for every electronic document of interest.
  • This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
    DESCRIPTION OF THE DRAWINGS
  • FIG. 1 shows an example of a computer system for implementing embodiments.
  • FIG. 2 shows an example of an operational flow of a search involving the presentation of facts that have been extracted prior to the search.
  • FIG. 3 shows an example of an operational flow of a search involving the presentation of facts that have been extracted during the search.
  • FIG. 4 shows an example of an operational flow of the multiple steps of fact extraction.
  • FIG. 5 shows an example of a more detailed operational flow of the multiple steps of fact extraction.
  • FIG. 6 shows an example of a screen display providing search results that include the presentation of facts obtained from electronic documents discovered by the search.
    DETAILED DESCRIPTION
  • Embodiments provide for fact extraction using multiple stages to avoid performing complex analyses of the entire documents of interest. Factual descriptions of the documents are recognized in relation to a fact-word table in an initial stage. These factual descriptions may be tagged with their parts of speech, either noun or verb. Then more detailed analyses may be done in a subsequent stage over those factual descriptions to thereby avoid such detailed analyses over the entire documents of interest. The linguistic constituents for each factual description may be determined and then exclusions and scores may be used to eliminate factual descriptions that are less likely to be facts. The factual descriptions remaining after the exclusions and scoring may then be presented as fact.
  • FIG. 1 shows an example of a computer system 100 that provides an operating environment for the embodiments. The computer system 100 as shown may be a standard, general-purpose programmable computer system 100 including a processor 102 as well as various components including mass storage 112, memory 104, a display adapter 108, and one or more input devices 110 such as a keyboard, keypad, mouse, and the like. The processor 102 communicates with each of the components through a data signaling bus 106. The computer system 100 may also include a network interface 122, such as a wired or wireless connection, that allows the computer system 100 to communicate with other computer systems via data networks. The computer system 100 may alternatively be a hard-wired, application specific device that implements one or more of the embodiments.
  • In the example of FIG. 1, the processor 102 implements instructions stored in the mass storage 112 in the form of an operating system 114. The operating system 114 of this example provides a foundation upon which various applications may be implemented to utilize the components of the computer system 100. The computer system 100 may implement a search engine 118 or similar application for finding electronic documents relevant to a particular situation. For example, the search engine 118 may receive search terms entered directly through input device 110 by a user of the computer system 100 or may receive search terms submitted by a user of a remote computer that are received via the network interface 122.
  • The search and/or fact extraction may occur in relation to one or more sets of electronic documents that contain textual information such as web pages, standard word processing documents, spreadsheets, and so forth. These electronic documents may be stored locally as electronic document set 116. These electronic documents may also be stored at a non-local location such as network-based storage 124 containing an electronic document set 126. Network-based storage 124 is representative of local network storage, on-line storage locations of the Internet, and so forth. The network-based storage 124 is accessible via the network interface 122.
  • Additionally, these embodiments provide logic for implementation by the processor 102 in order to extract the facts from the electronic documents 116, 126. A fact extraction tool 120 may be present on the local storage device 112, either as a component of the operating system 114, a component of the search engine 118 or other application, or as a stand-alone application capable of producing its own independent results. The logical operations performed by embodiments of the fact extraction tool 120 are discussed below in relation to FIGS. 2-5.
  • The computer system 100 of FIG. 1 may include a variety of computer readable media. Such computer readable media contains the instructions for operation of the computer system and for implementation of the embodiments discussed herein. Computer readable media can be any available media that can be accessed by computer 100 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer system 100.
  • Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above should also be included within the scope of computer readable media.
  • FIG. 2 shows an example of logical operations performed by a search engine 118 in conjunction with the fact extraction tool 120. In this example, the fact extraction tool 120 is utilized prior to a search occurring in order to generate a library of facts present in the electronic documents to be searched. In this manner, there is no processing time required to extract the facts but instead those facts have already been extracted and are retrieved from a fact library on the basis of the search terms entered.
  • The logical operations begin at collection operation 202 where the collection of electronic documents is obtained or access is otherwise achieved. For example, the electronic documents to eventually be searched may be saved to local storage or may be acquired via on-line access. The fact extraction tool 120 then operates upon each one of those electronic documents to attempt to extract all of the facts that are present in the electronic documents. The fact extraction tool 120 may generate a library of facts that are stored in association with the corresponding electronic documents and are available for access during future searches. For example, Table 1 shows such a library of associations.
  • TABLE 1
    Electronic Document | Facts
    www.sample1.com | Fact A, Fact B, Fact C
    www.sample2.com | Fact AA, Fact BB, Fact CC
    www.sample3.com | Fact AAA
  • Continuing with the operational flow of FIG. 2, a user wishing to do a search to find relevant electronic documents, and particularly to find relevant facts from those electronic documents, enters a search term into the search engine 118 at term operation 206. In this example, the search engine 118 then searches through the electronic documents for the search terms and finds matching documents at document operation 208. The search engine also finds the previously extracted facts which match the search terms from those matching electronic documents and then displays the relevant documents or a link thereto along with the relevant facts at display operation 210. For example, a search term may be found in www.sample1.com and the search term may also be found to match Fact A and Fact B such that a link to www.sample1.com is displayed along with Fact A and Fact B. Thus, the user is quickly provided with facts related to the search terms that were entered. An example of such a screen display is discussed below in relation to FIG. 6.
  • Of course, as an alternative the search may be for previously extracted facts only, rather than for the electronic documents themselves. Furthermore, in certain circumstances the previously extracted facts may match the search terms regardless of whether the electronic documents containing the facts match the search terms.
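  • The fact-only search described above can be illustrated with a minimal sketch, assuming the fact library is an in-memory mapping from a document identifier to a list of fact strings and that matching is a case-insensitive substring test. The names `FACT_LIBRARY` and `search_facts` and the sample fact texts are hypothetical and do not come from the specification.

```python
# Hypothetical fact library in the shape of Table 1: document -> extracted facts.
FACT_LIBRARY = {
    "www.sample1.com": [
        "Acme hired a new chief executive",
        "Acme invented the widget",
    ],
    "www.sample2.com": ["Globex acquired a rival firm"],
}


def search_facts(library, term):
    """Return {document: [matching facts]} for facts that contain the term."""
    needle = term.lower()
    results = {}
    for document, facts in library.items():
        matches = [fact for fact in facts if needle in fact.lower()]
        if matches:
            results[document] = matches
    return results
```

  • Because only the stored facts are consulted, a fact can be returned even when the search term does not appear anywhere else in the source document.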
  • FIG. 3 shows another example of logical operations performed by a search engine 118 in conjunction with the fact extraction tool 120. In this example, the fact extraction tool 120 is utilized during a search in order to discover facts present in the electronic documents as they are being found by the search. In this manner, there is no need for pre-search fact extraction and no need for storage of a library of facts. In such a scenario, the fact extraction tool may only scan snippets or summaries of the document to provide very fast results, or the entire document may also be scanned to extract all potential facts.
  • The logical operations begin at term operation 302 where a user enters a search term into the search engine 118. In this example, the search engine 118 then searches through the electronic documents for the search terms and finds matching documents at document operation 304. The extraction tool 120 is then employed at extraction operation 306 in order to analyze the electronic documents that have been found by the search in order to extract facts from those documents that are relevant to the search terms. The result of extraction operation 306 may be a temporary set of associations between electronic documents and facts as shown in Table 1, which may then be placed in longer term storage in anticipation of searches for those search terms occurring in the future. The search engine then displays, at display operation 308, the relevant documents or a link thereto along with the relevant facts returned by the fact extraction tool 120 at extraction operation 306.
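  • The flow of FIG. 3 might be sketched as follows, assuming documents are held as a mapping from identifier to text and that the longer term storage is a plain dictionary keyed by the lower-cased search term. The function names and the placeholder extractor `naive_extract` are hypothetical; the extractor merely stands in for the fact extraction tool 120 and is not the method of the specification.

```python
def find_documents(documents, term):
    """Return ids of documents whose text contains the term (case-insensitive)."""
    needle = term.lower()
    return [doc_id for doc_id, text in documents.items() if needle in text.lower()]


def search_with_extraction(documents, term, extract_facts, cache):
    """Extract facts only from matching documents, reusing stored results per term."""
    key = term.lower()
    if key not in cache:
        cache[key] = {
            doc_id: extract_facts(documents[doc_id], term)
            for doc_id in find_documents(documents, term)
        }
    return cache[key]


def naive_extract(text, term):
    """Placeholder extractor: keep the sentences that mention the term."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [s for s in sentences if term.lower() in s.lower()]
```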
  • FIG. 4 shows the multi-stage approach utilized by embodiments of the fact extraction tool 120. Initially, the fact extraction tool 120 attempts to recognize a set of factual descriptions from the electronic documents of interest at recognition operation 402. Here, the goal is to find those descriptions in the text that are likely to be facts based on finding matches to a fact-word table discussed in more detail below with reference to FIG. 5. By performing a quick matching process, much of the electronic document that should be ignored when finding facts can be eliminated from further fact extraction processing thereby increasing the efficiency of the subsequent stage(s) that are employed to increase accuracy.
  • After having identified a set of factual descriptions for a document being analyzed, fact extraction is then performed on that set of factual descriptions at extraction operation 404. Here, more detailed analyses are performed only on the set of factual descriptions, as opposed to the whole document, so that satisfactory efficiency is maintained while adequate accuracy is achieved. The analyses of extraction operation 404 involve decision making based on a determination of linguistic constituents of the factual descriptions. Such linguistic constituents may include the syntactic constituents, the semantics, and so forth.
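  • The two stages of FIG. 4 can be sketched as a cheap filter followed by a costlier analysis that only sees the surviving candidates. This is a minimal illustration under stated assumptions: the two-entry `FACT_WORDS` set, the regular-expression sentence splitter, and the caller-supplied `analyze` callback are all hypothetical simplifications.

```python
import re

FACT_WORDS = {"invented", "hired"}  # hypothetical, tiny fact-word table


def recognize(text):
    """Stage 1: cheap scan that keeps only sentences containing a fact-word."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        s for s in sentences
        if FACT_WORDS & set(re.findall(r"[a-z]+", s.lower()))
    ]


def extract(candidates, analyze):
    """Stage 2: run the costlier analysis only on the stage-1 candidates."""
    return [s for s in candidates if analyze(s)]


def two_stage(text, analyze):
    """Chain the recognition stage and the extraction stage."""
    return extract(recognize(text), analyze)
```

  • Sentences without a fact-word never reach `analyze`, which is where the efficiency gain described above comes from.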
  • FIG. 5 shows an example of details of the recognition and extraction operations of FIG. 4. The logical operations begin at scanning operation 502 where the fact extraction tool 120 scans the electronic document to find words or phrases matching those of a fact-word table. A fact-word table is a list of words or phrases that are known to likely be used when expressing a fact as opposed to an opinion for example. Table 2 shows a brief example. Note that to provide optimal processing performance, the words of the table may be associated with the most appropriate part of speech (POS) tag which is discussed below in relation to tag operation 504.
  • TABLE 2
    Fact-Word List | POS Tags
    Word/Phrase 1 | POS Tag
    Word/Phrase 2 | POS Tag
    Word/Phrase N | POS Tag
  • Research has been done to determine words that are suggestive of facts rather than opinions. For example, the class of words that introduce facts can be derived using research and work on the classification of verbs and their lexical functions. Two relevant papers that may be used as source material for doing so are:
      • (1) Mel'cuk (1996) Lexical Functions: A Tool for the Description of Lexical Relations in the Lexicon. In L. Wanner (ed.): Lexical Functions in Lexicography and Natural Language Processing, Amsterdam/Philadelphia: Benjamins, 37-102.
      • (2) Fontenelle, T. (1997): “Discovering Significant Lexical Functions in Dictionary Entries”, in Cowie, A P. (ed.) Phraseology: Theory, Analysis, and Applications, Oxford University Press, Oxford.
  • Thus, on the basis of such research, the fact-word list as shown in Table 2 may be constructed to include those verbs or other words that are suggestive of a fact expression as opposed to a non-fact. For example, the terms “invented” or “hired” are suggestive of a fact expression whereas the terms “can be” or “complains” are not. A particular example of a fact-word list can be found in Appendix A located at the end of this specification. This particular example is a non-exhaustive list of verbs that are fact-words that may be used to discover factual descriptions in electronic documents.
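  • A fact-word table of the kind shown in Table 2 could be applied as in the following sketch, which assumes exact, case-insensitive matching of whole words or phrases. The table contents are illustrative only: “invented” and “hired” come from the example above, while the phrase entry, the tag names, and the function name `scan_for_fact_words` are hypothetical.

```python
import re

# Hypothetical fact-word table in the shape of Table 2: word or phrase -> POS tag.
FACT_WORD_TABLE = {
    "invented": "VERB",
    "hired": "VERB",
    "was founded by": "VERB",
}


def scan_for_fact_words(sentence, table=FACT_WORD_TABLE):
    """Return sorted (start_offset, matched_text, pos_tag) tuples for table entries found."""
    hits = []
    for entry, tag in table.items():
        pattern = r"\b" + re.escape(entry) + r"\b"
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            hits.append((match.start(), match.group(0), tag))
    return sorted(hits)
```

  • A sentence that yields no hits, such as one built around “can be”, would simply be skipped by the later stages.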
  • Either upon application of the fact-word table to an electronic document, or in parallel with the application of the fact-word table such as where the POS Tag is already associated with the words in the fact-word table, the parts of speech (POS) of each of the words of each factual description are tagged at tag operation 504. This tagging operation 504, which may occur in parallel with or subsequent to scan operation 502, may involve making disambiguating choices for words which have more than one POS tag, such as by favoring a noun tag over a verb tag since it is understood that syntactic phrases like noun phrases are known to be the entities involved in a factual event. Any unknown and non-pre-tagged words may default to nouns for this reason as well. As with nouns, adjectives may be favored over verbs (e.g., “planned” as an adjective over “planned” as a verb) as well such that words having both an adjective and verb tag will default to adjective because adjectives are part of noun phrases which are known to be the entities involved in a factual event. When creating the associations of the POS Tags to the words of the fact-word table, such as when creating the table, these disambiguating choices may already be applied so that, for instance, “planned” is associated with an adjective POS Tag in the table and not a verb POS Tag.
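  • The disambiguating choices of tag operation 504 can be sketched as a preference order over candidate tags. The lexicon, the tag names, and the function `choose_tag` are hypothetical; ranking noun ahead of adjective is an added assumption, since the text above only states that each is favored over a verb.

```python
# Hypothetical lexicon: word -> set of possible POS tags.
LEXICON = {
    "plan": {"NOUN", "VERB"},
    "planned": {"ADJ", "VERB"},
    "hired": {"VERB"},
    "the": {"DET"},
}

# Preference when a word is ambiguous: noun, then adjective, then verb.
PREFERENCE = ["NOUN", "ADJ", "VERB"]


def choose_tag(word, lexicon=LEXICON):
    """Pick one POS tag, favoring noun over verb and adjective over verb."""
    candidates = lexicon.get(word.lower())
    if not candidates:
        return "NOUN"  # unknown, non-pre-tagged words default to noun
    if len(candidates) == 1:
        return next(iter(candidates))
    for tag in PREFERENCE:
        if tag in candidates:
            return tag
    return sorted(candidates)[0]  # deterministic fallback for other ambiguities
```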
  • Once the factual descriptions have been found and the words of the factual descriptions have been tagged with the POS, then the more complete analysis may be performed to improve the accuracy of the fact extraction without requiring that the entire document be subjected to this more complete processing. At identification operation 506, syntactic phrases like noun phrases and verb phrases are identified. The syntactic phrases are identified by utilizing conventional grammar rules and light linguistic analysis. Those syntactic phrases that are in the neighborhood, i.e., very local to the set of factual descriptions in a document, are identified, and if a factual description has no syntactic phrases associated with it, then the corresponding sentence may be eliminated from further consideration. Thus, by focusing on only those syntactic phrases that are in the neighborhood of the factual description, the process avoids looking at all the linguistic constituents of a whole sentence.
  • Furthermore, at identification operation 506, the linguistic constituents of the factual descriptions having the neighboring syntactic phrases are further determined by assessing the role a syntactic phrase plays within the corresponding sentence based on the pattern identified in the factual description. Thus, it is determined from the word pattern of the factual description whether the syntactic phrase plays the role of subject or object within the sentence containing the current factual description being analyzed.
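  • Identification operation 506 might be approximated as in the sketch below, which looks only at the tokens immediately adjacent to the fact-word. It assumes a simple active-voice pattern in which the noun phrase to the left of the fact-word is the subject and the one to the right is the object; that positional rule, the tag names, and the function `neighboring_phrases` are hypothetical simplifications rather than the grammar rules of the specification.

```python
NP_TAGS = {"DET", "ADJ", "NOUN"}  # tags allowed inside a simple noun phrase


def neighboring_phrases(tagged, verb_index):
    """Return noun phrases directly adjacent to the fact-word, keyed by role.

    `tagged` is a list of (word, tag) pairs; `verb_index` is the fact-word position.
    """
    left = verb_index - 1
    while left >= 0 and tagged[left][1] in NP_TAGS:
        left -= 1
    right = verb_index + 1
    while right < len(tagged) and tagged[right][1] in NP_TAGS:
        right += 1
    subject = [word for word, _ in tagged[left + 1:verb_index]]
    obj = [word for word, _ in tagged[verb_index + 1:right]]
    roles = {}
    if subject:
        roles["subject"] = " ".join(subject)
    if obj:
        roles["object"] = " ".join(obj)
    return roles  # an empty dict means the sentence can be dropped
```

  • The scan stops at the first token that cannot belong to a noun phrase, so the rest of the sentence is never examined.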
  • Once the linguistic constituents of the factual descriptions are determined, i.e., the syntactic phrases and their roles have been identified, exclusion rules may then be applied to those noun phrases of the factual descriptions to further eliminate those that are less likely to be an expression of fact at exclusion operation 508. The exclusion rules may be applicable on the basis of a syntactic phrase as an object, a syntactic phrase as a subject, or a syntactic phrase without regard to its role. Furthermore, in this particular embodiment, an exclusion rule being applied to individual words, to the syntactic phrases, or to the whole sentence leads to the same result, which is to exclude the whole sentence from being a factual description. An example of exclusion rules that may be applied is shown in Table 3.
  • TABLE 3
    Exclusion Rules | Conclusion
    “Object” has “opinion/biased” modifier | Rule out the sentence candidate
    Sentence Filters: Initial word of sentence (e.g., pronouns); Punctuation: e.g. ‘?’ | Rule out the sentence candidate
    “Subject” is a definite - unless Proper name | Rule out the sentence candidate
    Surrounding “Context” of the “Object” | Rule out the sentence candidate if the surrounding context has a particular POS that is not indicative of a fact (e.g., some class of pronouns)
    Stop words occur in the sentence | Rule out the sentence candidate
    “Subject” of “Object” contain pronouns | Rule out Noun Phrase
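  • Several of the exclusion rules of Table 3 can be sketched as simple predicates over a sentence and its subject and object phrases. The word lists below, the capitalization test used to approximate a proper name, and the function `is_excluded` are hypothetical; the surrounding-context rule and the pronoun rule for noun phrases are omitted from this sketch.

```python
PRONOUNS = {"he", "she", "it", "they", "this", "that"}      # hypothetical
OPINION_MODIFIERS = {"great", "terrible", "best", "worst"}  # hypothetical
STOP_WORDS = {"maybe", "perhaps", "probably"}               # hypothetical


def is_excluded(sentence, subject, obj):
    """Return True if any of the sketched rules rules out the sentence candidate."""
    text = sentence.strip()
    words = text.rstrip(".?!").lower().split()
    if text.endswith("?"):
        return True  # punctuation filter
    if words and words[0] in PRONOUNS:
        return True  # initial-word filter
    if STOP_WORDS & set(words):
        return True  # stop words occur in the sentence
    if OPINION_MODIFIERS & set(obj.lower().split()):
        return True  # object has an opinion/biased modifier
    subject_words = subject.split()
    if subject_words and subject_words[0].lower() == "the":
        # A definite subject is excluded unless it holds a capitalized (proper) name.
        if not any(word[0].isupper() for word in subject_words[1:]):
            return True
    return False
```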
  • Either upon application of the exclusion rules, or in parallel with the application of the exclusion rules, scoring rules are applied at scoring operation 510. The scoring rules give a weight to both the subject and object noun phrases for each of various features, and a total score for the candidate factual description is the sum of the individual feature weights plus the certainty score of the matching fact-word. The individual feature weights may be positive, when indicative of a fact, and may be negative, when indicative of a non-fact. Examples of features and associated scoring rules are provided below in Table 4. The feature scores may be manually assigned using human judgment or may be automatically learned.
  • TABLE 4
    Features | Scoring rules
    Certainty score of the matching pattern (fact-word, e.g., main verb) |
    Class of the Roles (i.e., subject or verb), e.g.: person, country, organization, etc. | Score per class
    Main “subject” contains a Proper Name | Normal weight
    “Object” length | Length score
    “Subject” length | Length score
    Sentence length | Length score
    “Subject” appears at beginning of sentence - i.e., subject offset | Position score
    “Object” has a modifier (adjective, adverbs) | Negative - Basic weight
    “Object” is a definite (“the”) | Negative - Basic; Exclusive when ends copula sentence
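  • The scoring of operation 510 can be sketched as the certainty score of the matched fact-word plus a sum of signed feature weights. All weight values, the `MODIFIERS` word list, and the function `score_candidate` are hypothetical, since the specification leaves the actual values to manual assignment or learning; the length and class features of Table 4 are omitted here.

```python
# Hypothetical weights: positive suggests a fact, negative suggests a non-fact.
WEIGHTS = {
    "subject_proper_name": 2.0,
    "subject_at_start": 1.0,
    "object_has_modifier": -1.0,
    "object_is_definite": -0.5,
}
MODIFIERS = {"great", "new", "quickly"}  # hypothetical stand-in for adjective/adverb detection


def score_candidate(sentence, subject, obj, certainty):
    """Total score = certainty of the matched fact-word + sum of feature weights."""
    total = certainty
    if any(word[0].isupper() for word in subject.split()):
        total += WEIGHTS["subject_proper_name"]
    if subject and sentence.startswith(subject):
        total += WEIGHTS["subject_at_start"]
    obj_words = obj.lower().split()
    if MODIFIERS & set(obj_words):
        total += WEIGHTS["object_has_modifier"]
    if obj_words and obj_words[0] == "the":
        total += WEIGHTS["object_is_definite"]
    return total
```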
  • The total score for the factual description is then compared to a pre-defined threshold to determine whether the total score exceeds the threshold at query operation 512. If the threshold is not exceeded, then the corresponding factual description may be discarded. If the threshold is exceeded, then the factual description, the complete sentence, and/or the complete paragraph or other document portion may be presented as a fact at presentation operation 514. This presentation may include displaying the fact, saving the fact to a library, and so forth.
  • In utilizing the scoring rules and threshold comparison, the weights assigned to the features and/or the threshold value may be manipulated without manipulating the whole approach to fact extraction. In this manner, the degree of accuracy of fact extraction and presentation can be controlled while the processing steps remain the same.
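  • The threshold comparison of query operation 512 reduces to a single filter, which is why accuracy can be tuned without touching the earlier steps. In this minimal sketch the function name `select_facts` and the sample scores are hypothetical.

```python
def select_facts(scored_candidates, threshold):
    """Keep the candidates whose total score strictly exceeds the threshold."""
    return [text for text, score in scored_candidates if score > threshold]


# Hypothetical scored candidates as (factual description, total score) pairs.
SCORED = [("A", 3.5), ("B", 1.0), ("C", 2.0)]

strict = select_facts(SCORED, 2.0)   # higher threshold, fewer facts presented
lenient = select_facts(SCORED, 0.5)  # lower threshold, more facts presented
```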
  • FIG. 6 shows an example screenshot 600 resulting from performing a search. Search terms have been entered in search field 602 to conduct the search. The search term has been matched to various web site links 604 available from the Internet. The user may visit the electronic documents in the normal fashion.
  • Additionally, facts 610, 612, and 614 about the search term are displayed in section 608. Accordingly, a user can quickly spot facts about the subject of the search without having to visit any of the electronic documents that have been found and without having to manually read and discern fact from opinion. In this particular example, the facts 610, 612, and 614 include hyperlinks that the user may select to give more information about the source of the fact and/or to show the context within which the facts were discovered (e.g., the date associated with the fact, other facts, etc.).
  • It will be appreciated that screenshot 600 is merely one example of how the facts may be presented to the user. Rather than presenting them in a separate column as shown, they may be listed as sub-elements of the electronic document that they have been extracted from. Furthermore, as an alternative to or in addition to listing the facts on the search results page, the facts extracted from a particular electronic document may also be listed in a column or other location upon the user viewing the electronic document itself. Additionally, as an alternative to or in addition to separating the facts from the document for display, the facts may be highlighted within the electronic documents both in the list of documents 604 within the search results and within the complete electronic document when it is chosen for display. As yet another alternative, the facts may be displayed independently from search results, such as to display facts only with a selectable link to obtain the source documents, where only the extracted facts have been searched to thereby avoid the document search completely.
  • Additionally, it will be appreciated that the presentation of the extracted facts, such as that shown in screenshot 600, may be provided as a display to a local computer implementing the search and fact extraction for a local user. Alternatively, the presentation of the extracted facts, such as that shown in screenshot 600, may be provided as a display to a remote computer that has requested that the local computer perform the search and fact extraction on its behalf, such as in the case of an Internet based search engine.
  • Accordingly, facts may be efficiently and accurately extracted from documents for presentation to users. Through the multi-stage approach, the efficiency is increased by avoiding detailed analysis of the entire documents as well as avoiding detailed analysis of the entire sentence where a factual description has been found. The accuracy is maintained by employing further analysis upon the factual descriptions that have been discovered in the document by the initial stage of processing.
  • While the invention has been particularly shown and described with reference to various embodiments thereof, it will be understood by those skilled in the art that various other changes in the form and details may be made therein without departing from the spirit and scope of the invention. For example, certain exclusion rules that are not specific to the linguistic constituents of a factual description, such as those based on punctuation of a sentence, may be applied when parsing for the factual description rather than later during the application of other exclusion rules.
  • APPENDIX A
    Fact Words
    abase
    abate
    abort
    abrade
    abridge
    absorb
    abstract
    accelerate
    accent
    accept
    accredit
    achieve
    act
    add
    address
    adduce
    adjust
    administer
    admit
    advance
    advertise
    aerate
    afford
    aggravate
    agree
    aid
    aim
    air
    allay
    alleviate
    alter
    amend
    amplify
    amuse
    animate
    announce
    answer
    antedate
    appear
    appease
    apply
    argue
    arouse
    arrange
    arrest
    arrive
    ask
    assemble
    assert
    asseverate
    assign
    assuage
    assure
    attach
    attack
    attenuate
    avert
    avoid
    awake
    award
    back
    bail
    bank
    bar
    barbarize
    bare
    base
    batter
    beach
    beam
    bear
    become
    befog
    befuddle
    beget
    begin
    begrime
    belch
    belie
    bend
    benumb
    bequeath
    bestow
    betray
    better
    bind
    blackleg
    blanket
    bleach
    blemish
    blend
    blight
    blister
    block
    blockade
    blow
    blunder
    blunt
    blur
    blurt
    bob
    bog
    boil
    bolster
    boost
    bowdlerize
    bowl
    brace
    brand
    brave
    break
    brief
    brighten
    bring
    broadcast
    bruise
    buckle
    build
    bull
    bunch
    bundle
    bung
    burlesque
    burn
    burst
    bury
    buy
    bypass
    canvass
    cap
    capitalize
    carry
    cast
    castigate
    castrate
    catch
    chafe
    change
    channel
    charge
    check
    chill
    chime
    chip
    chock
    choke
    choose
    churn
    cipher
    circulate
    circumvent
    claim
    clash
    clean
    cleanse
    clear
    climb
    clinch
    clip
    clog
    close
    clot
    cloud
    cockle
    coin
    collapse
    collect
    colour
    comfort
    commission
    commit
    communicate
    compare
    complete
    compound
    compress
    compromise
    conceal
    concede
    conceive
    conciliate
    conclude
    conduct
    confess
    confide
    confirm
    confound
    confuse
    congeal
    connect
    conserve
    consolidate
    constitute
    constrain
    constrict
    continue
    contort
    contract
    control
    convert
    convey
    cook
    cool
    cordon
    correct
    corrode
    corrupt
    counter
    countersink
    cover
    crack
    crank
    crash
    craze
    create
    cripple
    crop
    cross
    crumble
    crush
    cry
    curb
    curdle
    curtail
    cushion
    cut
    damage
    damp
    dance
    dangle
    darken
    darn
    dash
    deaden
    deal
    debase
    debauch
    debunk
    decay
    decide
    declare
    deepen
    deface
    defeat
    defend
    deflate
    deflect
    deform
    defrost
    delay
    delegate
    deliver
    demise
    demonstrate
    dent
    deny
    deplete
    depreciate
    depress
    deprive
    depute
    derange
    describe
    desecrate
    design
    designate
    desolate
    despoil
    destroy
    detail
    detect
    deteriorate
    determine
    develop
    die
    differentiate
    diffuse
    dilute
    dim
    diminish
    direct
    dirty
    disable
    disappear
    discharge
    discipline
    disclose
    discolour
    disconnect
    discontinue
    discover
    discuss
    disfigure
    disguise
    dislocate
    dislodge
    dismantle
    dismount
    disorder
    dispatch
    dispense
    disperse
    display
    dispute
    disrupt
    distil
    distinguish
    distort
    disturb
    divert
    divide
    dock
    doctor
    dodge
    double
    douse
    draft
    dramatize
    draw
    dredge
    dress
    drive
    drop
    drown
    duff
    dull
    earth
    ease
    eat
    educate
    effect
    elevate
    elicit
    elude
    emancipate
    embellish
    embitter
    embody
    emit
    emphasize
    enable
    encourage
    end
    endorse
    endow
    enforce
    engage
    enhance
    enjoin
    enlarge
    enliven
    ennoble
    enrich
    enrol
    enshrine
    entail
    entangle
    enthrone
    entrust
    enunciate
    epitomize
    equalize
    erect
    escalate
    establish
    evade
    evaporate
    evince
    evoke
    exacerbate
    exact
    exaggerate
    examine
    exasperate
    exceed
    excite
    exhale
    exhibit
    exist
    expand
    expedite
    explain
    expose
    expound
    express
    extend
    extinguish
    extort
    extract
    fabricate
    face
    fade
    fail
    fake
    fall
    falsify
    familiarize
    fasten
    father
    fatten
    feature
    feed
    ferry
    fertilize
    festoon
    fiddle
    fight
    fill
    filter
    finalize
    find
    finish
    fire
    fit
    fix
    flag
    flash
    flaunt
    flay
    float
    flood
    floodlight
    flourish
    flush
    fly
    fog
    foil
    fold
    follow
    force
    forge
    forgive
    form
    foster
    foul
    found
    frame
    fray
    free
    freeze
    frustrate
    furl
    furnish
    furrow
    fuse
    gain
    gallop
    garble
    gash
    generate
    gerrymander
    get
    give
    gladden
    glorify
    gloss
    glut
    go
    govern
    grade
    graduate
    grant
    grate
    graze
    ground
    group
    grow
    guide
    halt
    halve
    hamper
    handle
    happen
    harass
    harbour
    harden
    harm
    harmonize
    harry
    hasten
    hatch
    head
    heal
    hear
    heat
    heighten
    help
    hide
    hit
    hoard
    hoist
    hold
    hope
    hound
    hurt
    identify
    illuminate
    imagine
    impair
    impart
    impeach
    impede
    imperil
    implant
    improve
    inaugurate
    increase
    indent
    indenture
    indicate
    induce
    induct
    infect
    infiltrate
    infix
    inflame
    inflate
    inflict
    influence
    inform
    infuse
    initial
    initiate
    injure
    insert
    inspire
    instigate
    instil
    institute
    integrate
    intend
    intensify
    interpolate
    interrupt
    intimate
    introduce
    invert
    invigorate
    invite
    invoke
    involve
    issue
    jab
    jam
    jettison
    jingle
    join
    jumble
    jump
    justify
    keep
    kick
    kill
    kindle
    knock
    lacerate
    ladder
    lance
    land
    laugh
    launch
    lay
    layer
    lead
    leave
    lend
    lengthen
    lessen
    let
    level
    liberate
    lie
    light
    lighten
    limit
    line
    link
    listen
    litter
    live
    liven
    load
    lock
    loose
    loosen
    lose
    lower
    lump
    magnify
    maintain
    make
    manage
    mangle
    manipulate
    manufacture
    mark
    marshal
    mask
    match
    matter
    maul
    measure
    meet
    mellow
    melt
    mend
    mention
    mildew
    mind
    misrepresent
    miss
    mist
    mitigate
    modify
    mollify
    moot
    mould
    move
    muddle
    muddy
    muffle
    muss
    muster
    mute
    mutilate
    narrow
    navigate
    neaten
    nick
    nip
    notch
    notice
    nourish
    nurse
    obfuscate
    obscure
    obstruct
    obtain
    occupy
    occur
    offend
    offer
    open
    operate
    oppose
    order
    originate
    outline
    overcharge
    overdo
    overflow
    overturn
    overwork
    pacify
    pack
    pad
    panic
    paralyze
    pare
    parlay
    parole
    parry
    part
    partition
    pass
    patch
    pay
    peal
    peddle
    peg
    penalize
    perform
    perish
    persecute
    pervert
    phrase
    pick
    pillow
    pique
    pit
    placard
    place
    plan
    plant
    play
    pluck
    plug
    plunge
    point
    poison
    pole
    polish
    poll
    pool
    pop
    pose
    position
    post
    pound
    preach
    precipitate
    predate
    prefer
    prejudice
    preoccupy
    prepare
    present
    preserve
    prettify
    prevent
    prick
    prime
    proclaim
    procure
    produce
    profess
    programme
    promote
    promulgate
    prop
    propagandize
    propel
    propound
    prosecute
    protect
    protest
    prove
    provide
    provoke
    prune
    publicize
    publish
    pull
    pulp
    punch
    puncture
    punish
    punt
    purge
    push
    put
    qualify
    quarter
    quench
    question
    quicken
    quieten
    quilt
    race
    raise
    ransack
    rap
    rationalize
    rattle
    re-engage
    re-establish
    re-form
    read
    rear
    reawaken
    recall
    receive
    reclaim
    recline
    recognize
    recommend
    reconcile
    reconsider
    record
    recruit
    reduce
    refer
    refine
    reflect
    refloat
    reform
    refuse
    regard
    register
    regulate
    rehabilitate
    rehearse
    reinforce
    reissue
    reject
    rekindle
    relate
    relax
    release
    relieve
    reline
    remould
    remove
    rend
    renew
    renovate
    reopen
    repair
    replace
    report
    republish
    require
    rerun
    reseat
    resist
    rest
    restart
    restore
    restrain
    result
    resurrect
    retail
    retain
    retire
    retract
    retrench
    retrieve
    return
    reveal
    reverse
    revive
    rewind
    right
    ring
    rise
    roast
    rock
    roll
    rotate
    rouse
    row
    ruffle
    ruin
    rumple
    run
    rush
    rustle
    sail
    salvage
    sap
    save
    scald
    scorch
    score
    scotch
    scratch
    scream
    scuff
    scupper
    scuttle
    seal
    sear
    seat
    secure
    see
    sell
    send
    serve
    set
    settle
    sever
    shake
    shame
    sharpen
    shatter
    sheathe
    shed
    shelter
    shield
    shift
    shine
    shingle
    shirk
    shoot
    shorten
    shout
    show
    shrink
    shut
    sift
    sign
    signal
    signalize
    signify
    simmer
    sing
    singe
    sink
    sit
    site
    situate
    skirt
    slacken
    slake
    slash
    sleep
    slice
    slip
    slow
    smear
    smile
    smudge
    snag
    snap
    snarl
    snuff
    sober
    soften
    soil
    solace
    solidify
    soothe
    sort
    sound
    sour
    sow
    spare
    spark
    speak
    speck
    speed
    spill
    spin
    splinter
    split
    splodge
    spoil
    sponsor
    sport
    spot
    spout
    sprain
    spray
    spread
    spring
    square
    squash
    squeeze
    stack
    staff
    stain
    stalemate
    stall
    stamp
    stand
    star
    starch
    start
    staunch
    stay
    steady
    steer
    stem
    step
    stick
    stiffen
    still
    stir
    stoke
    stop
    store
    straighten
    strain
    strand
    strengthen
    stress
    stretch
    strike
    strip
    strum
    study
    stuff
    stultify
    stunt
    subdue
    subscribe
    subvert
    succeed
    suffer
    suggest
    suit
    summarize
    supplement
    supply
    support
    suppose
    suppress
    surface
    surrender
    survive
    suspend
    sustain
    sweep
    sweeten
    swell
    swing
    swish
    taint
    tarnish
    task
    teach
    tear
    telephone
    temper
    tend
    thank
    thaw
    thin
    thrill
    throw
    thrust
    thump
    thwart
    tidy
    tighten
    toll
    tootle
    topple
    torment
    torture
    total
    touch
    toughen
    tousle
    tow
    train
    trample
    transfer
    transplant
    trap
    travel
    treat
    trigger
    trim
    truss
    try
    tumble
    turn
    twang
    twiddle
    twirl
    twist
    unblock
    unburden
    unclog
    undo
    unfasten
    unfix
    unfold
    unhinge
    unhitch
    unite
    unloose
    unravel
    unsaddle
    unseat
    unsex
    unstop
    untangle
    untwist
    uphold
    upset
    urge
    use
    validate
    vandalize
    veer
    veil
    ventilate
    vocalize
    voice
    vote
    vulgarize
    waft
    waggle
    wake
    walk
    wangle
    warm
    warn
    warp
    warrant
    wash
    watch
    weaken
    wean
    wear
    weave
    weep
    weld
    whet
    whirl
    whitewash
    widen
    wield
    wiggle
    wilt
    win
    wind
    wing
    wipe
    wire
    wish
    withdraw
    wither
    withhold
    work
    worry
    wreak
    wreck
    wrest
    wring
    wrinkle
    write
    yield

Claims (20)

1. A method of finding facts within electronic resources, comprising:
scanning an electronic resource to discover factual descriptions of sentences that comprise words matching words of a fact-word table;
examining the discovered factual descriptions to identify the linguistic constituents of the factual descriptions; and
determining whether to present a factual description as a fact based on the identified linguistic constituents.
2. The method of claim 1, wherein determining whether to present a factual description as a fact based on the identified linguistic constituents comprises:
applying excluding rules in relation to the linguistic constituents of the factual descriptions to eliminate certain factual descriptions from consideration;
scoring the factual descriptions;
comparing the score of each factual description remaining for consideration to a threshold; and
for each factual description having a score that exceeds the threshold, presenting at least a portion of the sentence containing the factual description as a fact.
3. The method of claim 2, further comprising tagging words of the factual descriptions with their parts of speech.
4. The method of claim 3, wherein tagging words of the factual descriptions with their parts of speech comprises applying a noun tag when a word may be either a verb or a noun.
5. The method of claim 4, wherein applying the excluding rules comprises applying a first set of rules for syntactic phrases that have a role of subjects and applying a second set of rules for syntactic phrases that have a role of objects.
6. The method of claim 5, wherein applying the first set of rules comprises excluding noun phrases having an opinion or biased modifier of subjects or objects.
7. The method of claim 5, wherein applying the second set of rules comprises excluding subject noun phrases which are non-proper name definite descriptions, excluding noun phrases which contain pronouns, and excluding subject noun phrases which do not appear at the beginning of text.
8. The method of claim 5, further comprising applying a third set of rules without regard to the role of the noun phrase.
9. The method of claim 8, wherein applying the third set of rules comprises excluding factual descriptions where the punctuation of the sentence is a question mark, and excluding sentences with phrases that include a stop word.
10. The method of claim 2, wherein scoring the factual descriptions comprises scoring only those factual descriptions remaining for consideration either after or during application of the excluding rules.
11. A computer readable medium containing instructions that perform acts comprising:
receiving a search term;
parsing a plurality of electronic documents to discover factual descriptions of sentences that comprise words matching words of a fact-word table;
examining the discovered factual descriptions to identify the linguistic constituents of the factual descriptions; and
determining whether to present a factual description as a fact relevant to the search term based on the identified linguistic constituent.
12. The computer readable medium of claim 11, wherein the acts further comprise obtaining the plurality of documents by searching a collection of electronic documents to find those documents containing the search term, wherein the collection is searched to find those documents containing the search term prior to parsing the plurality of electronic documents.
13. The computer readable medium of claim 11, wherein the acts further comprise obtaining the electronic documents and presenting factual descriptions prior to receiving the search term and searching the electronic documents and factual descriptions to find those electronic documents and corresponding factual descriptions that are relevant to the search term.
14. The computer readable medium of claim 11, wherein determining whether to present a factual description as a fact relevant to the search term based on the identified linguistic constituent comprises:
applying excluding rules in relation to the linguistic constituents of the factual descriptions to eliminate a portion of the factual descriptions from consideration;
scoring the factual descriptions;
comparing the score of each factual description remaining for consideration to a threshold; and
for each factual description that is taken from an electronic document that contains the search term and that has a score that exceeds the threshold, presenting at least a portion of the sentence containing the factual description as a fact relevant to the search term.
15. The computer readable medium of claim 14, wherein scoring the factual descriptions comprises scoring only those factual descriptions remaining for consideration after applying the excluding rules.
16. A computer system, comprising:
storage containing a plurality of electronic resources that comprise textual information;
a processor that receives a request to present facts that are related to the search term from a set of electronic documents, wherein the processor parses the plurality of electronic documents to discover factual descriptions of sentences that comprise words matching words of a fact-word table, examines the discovered factual descriptions to identify the linguistic constituents of the factual descriptions, determines whether to present a factual description as a fact based on the identified linguistic constituent, and presents at least a portion of sentences that contain the factual descriptions that are determined to be presented as a fact and that are related to the search term.
17. The computer system of claim 16, further comprising a display device and wherein the processor presents at least the portion of the sentences by displaying at least the portions of the sentences on the display device.
18. The computer system of claim 16, further comprising a network interface and wherein the processor presents at least the portion of the sentences by outputting those portions to another computer via the network interface.
19. The computer system of claim 16, further comprising a network interface and wherein the storage is accessible by the processor via the network interface.
20. The computer system of claim 16, wherein the processor determines whether to present a factual description as fact by:
applying excluding rules in relation to the linguistic constituents of the factual descriptions to eliminate a portion of the factual descriptions from consideration;
scoring the factual descriptions;
comparing the score of each factual description remaining for consideration to a threshold; and
for each factual description that contains the search term and that has a score that exceeds the threshold, presenting at least the portion of the sentence containing the factual description as a fact relevant to the search term.
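
The stages recited in claims 1, 2, 9 and 10 can be illustrated with a minimal Python sketch. It is not the patented implementation. The six-entry fact-word table is a small excerpt of the verb list above; the opinion-modifier list, the pronoun list, the scoring weights, the threshold value and all function names are assumptions made for illustration, and splitting a sentence around the matched verb is only a crude stand-in for real linguistic constituent analysis.

```python
import re
from dataclasses import dataclass

# Small excerpt of the fact-word table (the full table is far larger).
FACT_WORDS = {"strengthen", "supply", "support", "win", "write", "yield"}
# Hypothetical word lists and threshold, chosen only for this sketch.
OPINION_MODIFIERS = {"terrible", "wonderful", "awful", "best"}
PRONOUNS = {"he", "she", "it", "they", "we", "i", "you"}
THRESHOLD = 1.0


@dataclass
class Candidate:
    sentence: str
    subject: list  # words before the matched verb (toy "subject" phrase)
    verb: str
    obj: list      # words after the matched verb (toy "object" phrase)


def scan(text):
    """Stage 1: find sentences containing a fact word and split them
    into rough subject / verb / object parts."""
    candidates = []
    for sentence in re.split(r"(?<=[.?!])\s+", text.strip()):
        words = re.findall(r"[A-Za-z']+", sentence)
        for i, word in enumerate(words):
            lower = word.lower()
            # Crude normalisation: "supports" -> "support".
            base = lower[:-1] if lower.endswith("s") and lower[:-1] in FACT_WORDS else lower
            if base in FACT_WORDS and 0 < i < len(words) - 1:
                candidates.append(Candidate(sentence, words[:i], base, words[i + 1:]))
                break
    return candidates


def is_excluded(c):
    """Stage 2: excluding rules (questions, opinion modifiers, pronouns)."""
    if c.sentence.rstrip().endswith("?"):
        return True
    phrase_words = {w.lower() for w in c.subject + c.obj}
    return bool(phrase_words & (OPINION_MODIFIERS | PRONOUNS))


def score(c):
    """Stage 3: hypothetical score; a capitalised object word counts as a proper name."""
    has_proper_name = any(w[0].isupper() for w in c.obj)
    return 0.5 * bool(c.subject) + 0.5 * bool(c.obj) + 0.5 * has_proper_name


def find_facts(text):
    """Score only candidates that survive exclusion, then apply the threshold."""
    remaining = [c for c in scan(text) if not is_excluded(c)]
    return [c.sentence for c in remaining if score(c) > THRESHOLD]
```

Under these assumptions, calling find_facts on "Acme supports the Linux kernel. Does Acme support Linux? It supports a terrible policy." keeps only the first sentence: the second is dropped by the question-mark rule and the third by the pronoun and opinion-modifier rules before any scoring takes place.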
US11/496,650 2006-07-31 2006-07-31 Distinguishing facts from opinions using a multi-stage approach Active US7668791B2 (en)

Priority Applications (10)

Application Number Priority Date Filing Date Title
US11/496,650 US7668791B2 (en) 2006-07-31 2006-07-31 Distinguishing facts from opinions using a multi-stage approach
TW096126248A TWI431493B (en) 2006-07-31 2007-07-18 Method, computer readable storage medium, and computer system for optimization of fact extraction using a multi-stage approach
RU2009103145/08A RU2451999C2 (en) 2006-07-31 2007-07-20 Optimisation of fact extraction using multi-stage approach
PCT/US2007/016435 WO2008016491A1 (en) 2006-07-31 2007-07-20 Optimization of fact extraction using a multi-stage approach
EP07796948A EP2050019A4 (en) 2006-07-31 2007-07-20 Optimization of fact extraction using a multi-stage approach
JP2009522777A JP5202524B2 (en) 2006-07-31 2007-07-20 Optimizing fact extraction using a multi-stage approach
AU2007281638A AU2007281638B2 (en) 2006-07-31 2007-07-20 Optimization of fact extraction using a multi-stage approach
BRPI0714311-7A BRPI0714311A2 (en) 2006-07-31 2007-07-20 Fact extraction optimization using a multi-stage approach
MX2009000588A MX2009000588A (en) 2006-07-31 2007-07-20 Optimization of fact extraction using a multi-stage approach.
NO20085387A NO20085387L (en) 2006-07-31 2008-12-29 Optimization of fact retrieval in a multistage approach

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US11/496,650 US7668791B2 (en) 2006-07-31 2006-07-31 Distinguishing facts from opinions using a multi-stage approach

Publications (2)

Publication Number Publication Date
US20080027888A1 (en) 2008-01-31
US7668791B2 (en) 2010-02-23

Family

ID=38987573

Family Applications (1)

Application Number Title Priority Date Filing Date
US11/496,650 Active US7668791B2 (en) 2006-07-31 2006-07-31 Distinguishing facts from opinions using a multi-stage approach

Country Status (10)

Country Link
US (1) US7668791B2 (en)
EP (1) EP2050019A4 (en)
JP (1) JP5202524B2 (en)
AU (1) AU2007281638B2 (en)
BR (1) BRPI0714311A2 (en)
MX (1) MX2009000588A (en)
NO (1) NO20085387L (en)
RU (1) RU2451999C2 (en)
TW (1) TWI431493B (en)
WO (1) WO2008016491A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9495358B2 (en) 2006-10-10 2016-11-15 Abbyy Infopoisk Llc Cross-language text clustering
US8671341B1 (en) * 2007-01-05 2014-03-11 Linguastat, Inc. Systems and methods for identifying claims associated with electronic text
US9152738B2 (en) 2008-06-13 2015-10-06 Neil Young Sortable and updateable compilation and archiving platform and uses thereof
USD805535S1 (en) 2013-06-04 2017-12-19 Abbyy Production Llc Display screen or portion thereof with a transitional graphical user interface
USD802609S1 (en) 2013-06-04 2017-11-14 Abbyy Production Llc Display screen with graphical user interface
RU2665239C2 (en) 2014-01-15 2018-08-28 Общество с ограниченной ответственностью "Аби Продакшн" Named entities from the text automatic extraction
RU2586577C2 (en) 2014-01-15 2016-06-10 Общество с ограниченной ответственностью "Аби ИнфоПоиск" Filtering arcs parser graph
US9626358B2 (en) 2014-11-26 2017-04-18 Abbyy Infopoisk Llc Creating ontologies by analyzing natural language texts
RU2592396C1 (en) 2015-02-03 2016-07-20 Общество с ограниченной ответственностью "Аби ИнфоПоиск" Method and system for machine extraction and interpretation of text information
RU2610241C2 (en) 2015-03-19 2017-02-08 Общество с ограниченной ответственностью "Аби ИнфоПоиск" Method and system for text synthesis based on information extracted as rdf-graph using templates
US10776587B2 (en) * 2016-07-11 2020-09-15 International Business Machines Corporation Claim generation
RU2637992C1 (en) * 2016-08-25 2017-12-08 Общество с ограниченной ответственностью "Аби Продакшн" Method of extracting facts from texts on natural language
CN108257380B (en) * 2017-12-05 2020-11-10 北京掌行通信息技术有限公司 Method and system for detecting congestion event based on road condition information
CN110007589B (en) * 2019-02-26 2021-05-18 湖南盛世威得科技有限公司 Intelligent watch with automatic fire distress function
CN111526397A (en) * 2020-03-30 2020-08-11 深圳市懿美莱科技有限公司 Intelligent home network player
US11687539B2 (en) 2021-03-17 2023-06-27 International Business Machines Corporation Automatic neutral point of view content generation

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000029902A (en) * 1998-07-15 2000-01-28 Nec Corp Structure document classifying device and recording medium where program actualizing same structured document classifying device by computer is recorded, and structured document retrieval system and recording medium where program actualizing same structured document retrieval system by computer is recorded
JP4630480B2 (en) 2001-03-19 2011-02-09 株式会社東芝 Summary extraction program, document analysis support program, summary extraction method, document analysis support method, document analysis support system
JP2001357064A (en) * 2001-04-09 2001-12-26 Toshiba Corp Information sharing support system
RU2236699C1 (en) * 2003-02-25 2004-09-20 Открытое акционерное общество "Телепортал. Ру" Method for searching and selecting information with increased relevance
KR100515641B1 (en) * 2003-04-24 2005-09-22 우순조 Method for sentence structure analysis based on mobile configuration concept and method for natural language search using of it

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5519608A (en) * 1993-06-24 1996-05-21 Xerox Corporation Method for extracting from a text corpus answers to questions stated in natural language by using linguistic analysis and hypothesis generation
US5696962A (en) * 1993-06-24 1997-12-09 Xerox Corporation Method for computerized information retrieval using shallow linguistic analysis
US5331556A (en) * 1993-06-28 1994-07-19 General Electric Company Method for natural language data processing using morphological and part-of-speech information
US6609091B1 (en) * 1994-09-30 2003-08-19 Robert L. Budzinski Memory system for storing and retrieving experience and knowledge with natural language utilizing state representation data, word sense numbers, function codes and/or directed graphs
US6167370A (en) * 1998-09-09 2000-12-26 Invention Machine Corporation Document semantic analysis/selection with knowledge creativity capability utilizing subject-action-object (SAO) structures
US6665661B1 (en) * 2000-09-29 2003-12-16 Battelle Memorial Institute System and method for use in text analysis of documents and records
US6741986B2 (en) * 2000-12-08 2004-05-25 Ingenuity Systems, Inc. Method and system for performing information extraction and quality control for a knowledgebase
US20060041424A1 (en) * 2001-07-31 2006-02-23 James Todhunter Semantic processor for recognition of cause-effect relations in natural language documents
US20050267871A1 (en) * 2001-08-14 2005-12-01 Insightful Corporation Method and system for extending keyword searching to syntactically and semantically annotated data
US7254530B2 (en) * 2001-09-26 2007-08-07 The Trustees Of Columbia University In The City Of New York System and method of generating dictionary entries
US20040172378A1 (en) * 2002-11-15 2004-09-02 Shanahan James G. Method and apparatus for document filtering using ensemble filters
US20040163035A1 (en) * 2003-02-05 2004-08-19 Verint Systems, Inc. Method for automatic and semi-automatic classification and clustering of non-deterministic texts
US20040158469A1 (en) * 2003-02-05 2004-08-12 Verint Systems, Inc. Augmentation and calibration of output from non-deterministic text generators by modeling its characteristics in specific environments
US20050108630A1 (en) * 2003-11-19 2005-05-19 Wasson Mark D. Extraction of facts from text
US20050192992A1 (en) * 2004-03-01 2005-09-01 Microsoft Corporation Systems and methods that determine intent of data and respond to the data based on the intent
US20060095250A1 (en) * 2004-11-03 2006-05-04 Microsoft Corporation Parser for natural language processing
US20070027860A1 (en) * 2005-07-28 2007-02-01 International Business Machines Corporation Method and apparatus for eliminating partitions of a database table from a join query using implicit limitations on a partition key value
US7376551B2 (en) * 2005-08-01 2008-05-20 Microsoft Corporation Definition extraction

Cited By (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8240063B2 (en) * 2003-11-19 2012-08-14 David Brian Grimes Cleaning wringing and drying apparatus
US20080010767A1 (en) * 2003-11-19 2008-01-17 Grimes David B Cleaning apparatus
US8190628B1 (en) * 2007-11-30 2012-05-29 Google Inc. Phrase generation
US8538979B1 (en) * 2007-11-30 2013-09-17 Google Inc. Generating phrase candidates from text string entries
US20110231387A1 (en) * 2010-03-22 2011-09-22 Yahoo! Inc. Engaging content provision
US8719692B2 (en) * 2011-03-11 2014-05-06 Microsoft Corporation Validation, rejection, and modification of automatically generated document annotations
US20120233534A1 (en) * 2011-03-11 2012-09-13 Microsoft Corporation Validation, rejection, and modification of automatically generated document annotations
US9880988B2 (en) 2011-03-11 2018-01-30 Microsoft Technology Licensing, Llc Validation, rejection, and modification of automatically generated document annotations
US20130080152A1 (en) * 2011-09-26 2013-03-28 Xerox Corporation Linguistically-adapted structural query annotation
US8812301B2 (en) * 2011-09-26 2014-08-19 Xerox Corporation Linguistically-adapted structural query annotation
CN102929934A (en) * 2012-09-25 2013-02-13 东莞宇龙通信科技有限公司 Photograph information display method and mobile terminal
US10922326B2 (en) * 2012-11-27 2021-02-16 Google Llc Triggering knowledge panels
US10303741B2 (en) 2013-03-15 2019-05-28 International Business Machines Corporation Adapting tabular data for narration
US10289653B2 (en) 2013-03-15 2019-05-14 International Business Machines Corporation Adapting tabular data for narration
US9569417B2 (en) 2013-06-24 2017-02-14 International Business Machines Corporation Error correction in tables using discovered functional dependencies
US9164977B2 (en) 2013-06-24 2015-10-20 International Business Machines Corporation Error correction in tables using discovered functional dependencies
US9606978B2 (en) 2013-07-01 2017-03-28 International Business Machines Corporation Discovering relationships in tabular data
US9600461B2 (en) 2013-07-01 2017-03-21 International Business Machines Corporation Discovering relationships in tabular data
US9830314B2 (en) 2013-11-18 2017-11-28 International Business Machines Corporation Error correction in tables using a question and answer system
US10331782B2 (en) 2014-11-19 2019-06-25 Lexisnexis, A Division Of Reed Elsevier Inc. Systems and methods for automatic identification of potential material facts in documents
US20170060945A1 (en) * 2015-08-25 2017-03-02 International Business Machines Corporation Selective Fact Generation from Table Data in a Cognitive System
US10095740B2 (en) * 2015-08-25 2018-10-09 International Business Machines Corporation Selective fact generation from table data in a cognitive system
CN105260091A (en) * 2015-09-07 2016-01-20 努比亚技术有限公司 Photo processing method and device
CN106648390A (en) * 2016-12-05 2017-05-10 网易(杭州)网络有限公司 Control instruction generation method and device, and mobile terminal
US20180181673A1 (en) * 2016-12-28 2018-06-28 Beijing Baidu Netcom Science And Technology Co., Ltd. Answer searching method and device based on deep question and answer
US10606915B2 (en) * 2016-12-28 2020-03-31 Beijing Baidu Netcom Science And Technology Co., Ltd. Answer searching method and device based on deep question and answer
CN106924963A (en) * 2017-04-26 2017-07-07 温州大学 A kind of eyesight hearing rehabilitation training amusement target-shooting machine
CN108038263A (en) * 2017-11-15 2018-05-15 南京邮电大学 Consider the uncertain chip multiple parameters yield prediction method of performance dependency structure
US11861477B2 (en) * 2018-02-14 2024-01-02 Capital One Services, Llc Utilizing machine learning models to identify insights in a document
US20220129641A1 (en) * 2018-02-14 2022-04-28 Capital One Services, Llc Utilizing machine learning models to identify insights in a document
CN109344993A (en) * 2018-08-23 2019-02-15 江西省水利科学研究院 A kind of river flood-peak stage forecasting procedure based on conditional probability distribution
CN111026597A (en) * 2019-01-31 2020-04-17 哈尔滨安天科技集团股份有限公司 Method and device for detecting chip hidden storage space and storage medium
CN110057634A (en) * 2019-04-11 2019-07-26 东北石油大学 A kind of device and method manufacturing rock core crack
CN111858225A (en) * 2019-04-28 2020-10-30 中国移动通信集团上海有限公司 Delay prediction method, device, equipment and computer storage medium
CN111090785A (en) * 2019-06-10 2020-05-01 工盒(嘉兴)网络技术有限公司 Fastening cloud system
CN110597108A (en) * 2019-08-23 2019-12-20 广州电力设计院有限公司 Cable tunnel area control system, control method and device and computer equipment
CN110737010A (en) * 2019-09-19 2020-01-31 西安空间无线电技术研究所 safe positioning time service signal generation system based on low-orbit communication satellite
CN111078849A (en) * 2019-12-02 2020-04-28 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information
CN111126057A (en) * 2019-12-09 2020-05-08 航天科工网络信息发展有限公司 Case plot accurate criminal measuring system of hierarchical neural network
CN115135133A (en) * 2020-02-14 2022-09-30 格立莫农业机械制造有限两合公司 Method for operating a machine for harvesting and/or separating root crops, associated machine and associated computer program product
CN111836065A (en) * 2020-07-14 2020-10-27 韶关市启之信息技术有限公司 Intelligent method for automatically hiding live broadcast trademark
CN111882828B (en) * 2020-07-22 2021-08-20 淮北智淮科技有限公司 Landslide prevention early warning device and using method thereof
CN111882828A (en) * 2020-07-22 2020-11-03 郝磊 Landslide prevention early warning device and using method thereof
CN112182895A (en) * 2020-10-10 2021-01-05 中际联合(天津)科技有限公司 Automatic analysis method for arrangement scheme diagram of wind turbine tower ladder and anti-falling
CN112890771A (en) * 2021-01-14 2021-06-04 四川写正智能科技有限公司 Child watch capable of monitoring sleep state based on millimeter wave radar sensor
US20220366141A1 (en) * 2021-05-13 2022-11-17 Motorola Solutions, Inc. System and method for predicting a penal code and modifying an annotation based on the prediction
CN115191786A (en) * 2022-08-04 2022-10-18 慕思健康睡眠股份有限公司 Control method, device, equipment and storage medium
CN115432851A (en) * 2022-08-23 2022-12-06 长兴瑷晟环保装备有限公司 Efficient coagulation and hydrodynamic cavitation integrated machine

Also Published As

Publication number Publication date
EP2050019A1 (en) 2009-04-22
TWI431493B (en) 2014-03-21
JP5202524B2 (en) 2013-06-05
MX2009000588A (en) 2009-01-27
NO20085387L (en) 2009-01-19
JP2009545808A (en) 2009-12-24
RU2009103145A (en) 2010-08-10
BRPI0714311A2 (en) 2013-04-24
RU2451999C2 (en) 2012-05-27
AU2007281638A1 (en) 2008-02-07
TW200817947A (en) 2008-04-16
AU2007281638B2 (en) 2011-10-06
WO2008016491A1 (en) 2008-02-07
US7668791B2 (en) 2010-02-23
EP2050019A4 (en) 2012-03-21

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:AZZAM, SALIHA;HUMPHREYS, KEVIN WILLIAM;REEL/FRAME:018162/0492

Effective date: 20060727

STCF Information on status: patent grant

Free format text: PATENTED CASE

FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:034542/0001

Effective date: 20141014

FPAY Fee payment

Year of fee payment: 8

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 12