WO2002021324A1 - Method and apparatus for summarizing multiple documents using a subsumption model - Google Patents

Method and apparatus for summarizing multiple documents using a subsumption model

Info

Publication number
WO2002021324A1
Authority
WO
WIPO (PCT)
Prior art keywords
documents
paragraphs
phrases
paragraph
entity names
Prior art date
Application number
PCT/CN2000/000265
Other languages
French (fr)
Inventor
Weiquan Liu
Joe F. Zhou
Original Assignee
Intel Corporation
Priority date
2000-09-07
Filing date
2000-09-07
Publication date
2002-03-14
Application filed by Intel Corporation
Priority to AU2000269782A (AU2000269782A1)
Priority to PCT/CN2000/000265 (WO2002021324A1)
Priority to US10/018,517 (US7398196B1)
Publication of WO2002021324A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/258 Heading extraction; Automatic titling; Numbering

Definitions

  • the present invention relates to the field of natural language processing, information retrieval, information extraction, and automatic summary and abstraction generation.
  • Figure 1 is a flow diagram of one embodiment of a method for summarizing multiple documents using a subsumption model.
  • Figure 2 is a flow diagram of one embodiment of parsing a plurality of documents.
  • Figure 3 is a flow diagram of one embodiment of selecting paragraphs from the documents through subsuming relation calculation.
  • Figure 4 is a flow diagram of one embodiment of rewriting the selected paragraphs into a summary.
  • Figure 5 is an example of one embodiment of linking entity names in paragraphs of documents.
  • Figure 6 is an example of one embodiment of a computer system.
  • the content of the documents are co-related to one central topic.
  • a plurality of documents are parsed, step 101.
  • paragraphs are selected from the documents through subsuming relation calculation, step 102.
  • the selected paragraphs are rewritten into a summary, step 103.
  • Figure 2 is a flow diagram of one embodiment of parsing a plurality of documents, corresponding to step 101 of Figure 1.
  • parsing is accomplished by applying shallow natural language processing to text.
  • noun phrases and verb phrases are extracted from the documents, step 201.
  • the words in the documents are tagged according to their respective parts-of-speech.
  • a set of rules is applied to bracket out the noun phrases and verb phrases in the documents by matching the part-of-speech tags according to predefined patterns.
  • the noun phrases are further analyzed to identify entity names.
  • a word with the first letter in uppercase denotes that it is part of an entity name.
  • entity name, noun phrase, and verb phrase recognition captures the features of documents while limiting the overhead involved in parsing to a minimum.
  • the noun phrases that are entity names are categorized, step 202. Exemplary categories include people's names, company and organization names, addresses, currency amounts, dates, geographical locations, measurements, etc. In an embodiment where the documents all relate to one central topic, the detected noun phrases, verb phrases, and entity names have much in common.
  • the entity names are converted into canonical form, step 203. For example, "06/26/00" would be converted to "June 26, 2000".
  • the identified entity names are input into a subsuming relation calculation.
  • Figure 3 is a flow diagram of one embodiment of selecting, or in other words, extracting, paragraphs from the documents through subsuming relation calculation, corresponding to step 102 of Figure 1.
  • the subsuming relation calculation is designed to calculate the inherent subsumption between paragraphs from each document. This process determines the significance of each paragraph.
  • the noun phrases, verb phrases and/or entity names in the documents represent the content of those documents. Different paragraphs may share common noun/verb phrases and entity names. For example, if all the noun/verb phrases and entity names in a paragraph A are also in a paragraph B, then B subsumes A.
  • noun/verb phrases and entity names in each paragraph of every document are linked with identical noun/verb phrases and entity names in other paragraphs of each document, step 301.
  • Reference links are built between the common phrases and entity names shared by paragraphs.
  • Figure 5 discussed below illustrates an example of one embodiment of linking entity names in paragraphs of documents having a common topic independent of domain and being composed in a language other than English.
  • the links for each paragraph are counted, step 302.
  • the link count may be called a significance score. If a paragraph has more reference links, it is more significant than other paragraphs in representing the meaning of the documents. The more other paragraphs a given paragraph subsumes, the richer it is in content in comparison to the other paragraphs subsumed.
  • the paragraphs from the plurality of documents are ranked by their significance scores, step 303.
  • the paragraphs with the most subsumption are relatively more dominative and informative. Therefore, these paragraphs are extracted, or in other words selected, prior to other paragraphs.
  • the top N paragraphs are bulleted, where N can be a predefined length factor decided jointly by an empirical function and a user's preference, step 304.
  • the extracted paragraphs selected by the subsumption model are typically informative enough to represent the content of the central topic.
  • the subsuming relation calculation is domain independent. It can process documents of a variety of topics. It does not assume any domain knowledge adaptation. Thus, it is relatively easy to implement for different applications.
  • Figure 4 is a flow diagram of one embodiment of rewriting the selected paragraphs into a summary, corresponding to step 103 of Figure 1.
  • the paragraphs are ranked, step 401, by their significance score.
  • the top N paragraphs are bulleted, where N can be a predefined length factor decided jointly by an empirical function and a user's preference. Cohesiveness is less likely if these bulleted paragraphs are output as a summary without further processing. So a co-reference resolution algorithm is applied to the paragraphs, step 402, to resolve anaphoric ambiguity.
  • the subsuming relation calculation can be applied to languages other than English. To apply the calculation to another language, only the shallow natural language processing and co-reference resolution components need to be modified.
  • the core subsumption model is language independent.
  • Figure 5 is an example of one embodiment of linking identical entity names in paragraphs of documents having a common topic independent of domain and being composed in a language other than English.
  • One paragraph 501 contains entity names which are also contained in another paragraph 502.
  • the identical entity names in each paragraph are linked according to the flow diagram in Figure 3. Because all of the entity names in paragraph 501 are also contained in paragraph 502, paragraph 502 can be said to subsume paragraph 501.
  • Figure 6 is an example of one embodiment of a computer system.
  • the system shown has a processor 601 coupled to a bus 602.
  • a memory 603 which may contain instructions 604.
  • Additional components shown coupled to the bus are a storage device 605 (such as a hard drive, floppy drive, CD-ROM, DVD-ROM, etc.), an input device 606 (such as a keyboard, mouse, light pen, bar code reader, scanner, microphone, joystick, etc.), and an output device (such as a printer, monitor, speakers, etc.).
  • an exemplary computer system could have more components than these or a subset of the components listed.
  • the method described above can be stored in the memory of a computer system (e.g., set top box, video recorders, etc.) as a set of instructions to be executed, as shown by way of example in Figure 6.
  • the instructions to perform the method described above could alternatively be stored on other forms of machine-readable media, including magnetic and optical disks.
  • the method of the present invention could be stored on machine-readable media, such as magnetic disks or optical disks, which are accessible via a disk drive (or computer-readable medium drive).
  • the instructions can be downloaded into a computing device over a data network in a form of compiled and linked version.
  • the logic to perform the methods as discussed above could be implemented in additional computer and/or machine readable media, such as discrete hardware components as large-scale integrated circuits (LSI's), application-specific integrated circuits (ASIC's), firmware such as electrically erasable programmable read-only memory (EEPROM's); and electrical, optical, acoustical and other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.

Abstract

A method and apparatus for parsing a plurality of documents, selecting paragraphs from the documents through subsuming relation calculation, and rewriting the selected paragraphs into a summary is disclosed.

Description

METHOD AND APPARATUS FOR SUMMARIZING MULTIPLE DOCUMENTS
USING A SUBSUMPTION MODEL
FIELD OF INVENTION
The present invention relates to the field of natural language processing, information retrieval, information extraction, and automatic summary and abstraction generation.
BACKGROUND OF THE INVENTION
The advent of the Information Age has brought with it an increase in the accessibility of data, accompanied by schemes for searching that data. One searching for specific data through the Internet or in other information systems using any of many search engines available is often presented with a lengthy list of documents which may or may not contain the data for which he was searching. Reading through such a lengthy list is undesirably time consuming.
To reduce the time needlessly wasted in such reading, a variety of technologies have been presented for summarizing multiple documents to express a theme central to these documents. However, all of these technologies are inherently limited in some aspect. Some are able to search only a specific domain of knowledge and are therefore difficult to implement for different applications. Some, without radical modification, can only search documents composed in certain languages. Some use deep language parsing, statistical, or term-vector based techniques, resulting in longer waits for search results and greater demands on computing resources. Almost all generate summaries by merely concatenating together text segments containing some keyword, often producing results which are incohesive due to anaphoric ambiguity. None use real natural language analyzing techniques. A method for summarizing multiple documents while avoiding these limitations is desirable.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which:
Figure 1 is a flow diagram of one embodiment of a method for summarizing multiple documents using a subsumption model.
Figure 2 is a flow diagram of one embodiment of parsing a plurality of documents.
Figure 3 is a flow diagram of one embodiment of selecting paragraphs from the documents through subsuming relation calculation.
Figure 4 is a flow diagram of one embodiment of rewriting the selected paragraphs into a summary.
Figure 5 is an example of one embodiment of linking entity names in paragraphs of documents.
Figure 6 is an example of one embodiment of a computer system.
DETAILED DESCRIPTION
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the present invention. It will be evident, however, to one skilled in the art that the present invention may be practiced without these specific details.
Figure 1 is a flow diagram of one embodiment of a method for summarizing multiple documents using a subsumption model. In one embodiment, the content of the documents is co-related to one central topic. First, a plurality of documents are parsed, step 101. Then paragraphs are selected from the documents through subsuming relation calculation, step 102. Finally, the selected paragraphs are rewritten into a summary, step 103. Each of these steps is described in greater detail below.
Figure 2 is a flow diagram of one embodiment of parsing a plurality of documents, corresponding to step 101 of Figure 1. In one embodiment, parsing is accomplished by applying shallow natural language processing to text.
First, noun phrases and verb phrases are extracted from the documents, step 201. To accomplish this, the words in the documents are tagged according to their respective parts-of-speech. A set of rules is applied to bracket out the noun phrases and verb phrases in the documents by matching the part-of-speech tags according to predefined patterns. The noun phrases are further analyzed to identify entity names. A word with the first letter in uppercase denotes that it is part of an entity name. The use of entity name, noun phrase, and verb phrase recognition captures the features of documents while limiting the overhead involved in parsing to a minimum. Next, the noun phrases that are entity names are categorized, step 202. Exemplary categories include people's names, company and organization names, addresses, currency amounts, dates, geographical locations, measurements, etc. In an embodiment where the documents all relate to one central topic, the detected noun phrases, verb phrases, and entity names have much in common.
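As a rough illustration of the tag-pattern matching and the uppercase test described above, the following Python sketch brackets noun phrases out of already-tagged text and picks out capitalized nouns as entity-name candidates. The tag set (Penn Treebank style), the single pattern "(determiner | adjective)* noun+", and the names bracket_noun_phrases and entity_name are assumptions made for this example; the specification does not state which tags or rules are used, and it assumes a separate part-of-speech tagger has already run.

```python
from typing import List, Tuple

Tagged = Tuple[str, str]  # (word, part-of-speech tag)

# Assumed tag set (Penn Treebank style); the specification does not name one.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
MODIFIER_TAGS = {"DT", "JJ"}


def bracket_noun_phrases(tagged: List[Tagged]) -> List[List[Tagged]]:
    """Bracket runs matching the pattern (determiner | adjective)* noun+."""
    phrases: List[List[Tagged]] = []
    current: List[Tagged] = []
    has_noun = False
    for word, tag in tagged:
        if tag in NOUN_TAGS:
            current.append((word, tag))
            has_noun = True
        elif tag in MODIFIER_TAGS and not has_noun:
            current.append((word, tag))
        else:
            if has_noun:
                phrases.append(current)
            # A modifier that follows a noun opens a new candidate phrase.
            current = [(word, tag)] if tag in MODIFIER_TAGS else []
            has_noun = False
    if has_noun:
        phrases.append(current)
    return phrases


def entity_name(phrase: List[Tagged]) -> str:
    """Join the capitalized nouns of a phrase; empty string if there are none."""
    return " ".join(w for w, t in phrase if t in NOUN_TAGS and w[:1].isupper())


tagged = [("the", "DT"), ("red", "JJ"), ("car", "NN"), ("belongs", "VBZ"),
          ("to", "TO"), ("John", "NNP"), ("Smith", "NNP")]
phrases = bracket_noun_phrases(tagged)
names = [entity_name(p) for p in phrases if entity_name(p)]
```

Here phrases holds "the red car" and "John Smith", and names holds only "John Smith". A sentence-initial common noun would also pass the uppercase test, so a practical rule set would likely need additional checks that this sketch omits.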
Finally, the entity names are converted into canonical form, step 203. For example, "06/26/00" would be converted to "June 26, 2000". The identified entity names are input into a subsuming relation calculation.
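The date conversion given as an example above might be sketched in Python as follows. The function name canonical_date and the assumption that input dates use a two-digit month/day/year layout are illustrative only; the specification does not describe how other entity categories (currency amounts, measurements, and so on) are canonicalized.

```python
from datetime import datetime

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def canonical_date(text: str) -> str:
    """Convert an MM/DD/YY date string into the form 'Month D, YYYY'."""
    # %y maps two-digit years 00-68 to 2000-2068 and 69-99 to 1969-1999.
    parsed = datetime.strptime(text, "%m/%d/%y")
    return f"{MONTHS[parsed.month - 1]} {parsed.day}, {parsed.year}"


print(canonical_date("06/26/00"))  # prints "June 26, 2000"
```

Converting every surface form of an entity to one canonical string is what lets the later linking step treat them as identical.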
Figure 3 is a flow diagram of one embodiment of selecting, or in other words, extracting, paragraphs from the documents through subsuming relation calculation, corresponding to step 102 of Figure 1. In one embodiment, the subsuming relation calculation is designed to calculate the inherent subsumption between paragraphs from each document. This process determines the significance of each paragraph. The noun phrases, verb phrases and/or entity names in the documents represent the content of those documents. Different paragraphs may share common noun/verb phrases and entity names. For example, if all the noun/verb phrases and entity names in a paragraph A are also in a paragraph B, then B subsumes A.
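The subsumption relation in the example above amounts to a subset test over the terms extracted from each paragraph. The sketch below assumes each paragraph has already been reduced to a set of canonical phrase and entity-name strings; the function name subsumes and the choice to treat a paragraph with no extracted terms as not subsumed are assumptions of this example, not statements from the specification.

```python
from typing import Set


def subsumes(b_terms: Set[str], a_terms: Set[str]) -> bool:
    """Return True if paragraph B contains every term found in paragraph A."""
    # Assumption: a paragraph with no extracted terms is not considered subsumed.
    return bool(a_terms) and a_terms <= b_terms


paragraph_a = {"Intel", "chip"}
paragraph_b = {"Intel", "chip", "June 26, 2000"}

print(subsumes(paragraph_b, paragraph_a))  # True: B subsumes A
print(subsumes(paragraph_a, paragraph_b))  # False: A lacks "June 26, 2000"
```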
First, noun/verb phrases and entity names in each paragraph of every document are linked with identical noun/verb phrases and entity names in other paragraphs of each document, step 301. Reference links are built between the common phrases and entity names shared by paragraphs. Figure 5 discussed below illustrates an example of one embodiment of linking entity names in paragraphs of documents having a common topic independent of domain and being composed in a language other than English. Next, the links for each paragraph are counted, step 302. The link count may be called a significance score. If a paragraph has more reference links, it is more significant than other paragraphs in representing the meaning of the documents. The more other paragraphs a given paragraph subsumes, the richer it is in content in comparison to the other paragraphs subsumed. Then, the paragraphs from the plurality of documents are ranked by their significance scores, step 303. The paragraphs with the most subsumption are relatively more dominative and informative. Therefore, these paragraphs are extracted, or in other words selected, prior to other paragraphs. In one embodiment, the top N paragraphs are bulleted, where N can be a predefined length factor decided jointly by an empirical function and a user's preference, step 304. The extracted paragraphs selected by the subsumption model are typically informative enough to represent the content of the central topic.
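Steps 301 through 304 can be sketched in Python as below. The sketch assumes one reading of the link count: one link for each term a paragraph shares with each other paragraph. The specification does not spell out the counting rule, the tie-breaking order, or how N is derived, so those choices, along with the names significance_scores and top_paragraphs and the sample data, are hypothetical.

```python
from typing import Dict, List, Set


def significance_scores(paragraphs: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, for each paragraph, the terms it shares with every other paragraph."""
    return {
        pid: sum(len(terms & other)
                 for oid, other in paragraphs.items() if oid != pid)
        for pid, terms in paragraphs.items()
    }


def top_paragraphs(paragraphs: Dict[str, Set[str]], n: int) -> List[str]:
    """Rank paragraph ids by descending score (ties broken by id) and keep n."""
    scores = significance_scores(paragraphs)
    ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return ranked[:n]


paragraphs = {
    "p1": {"Intel", "chip"},
    "p2": {"Intel", "chip", "June 26, 2000"},
    "p3": {"chip", "price", "June 26, 2000"},
}
print(significance_scores(paragraphs))  # {'p1': 3, 'p2': 4, 'p3': 3}
print(top_paragraphs(paragraphs, 2))    # ['p2', 'p1']
```

In this sample, p2 subsumes p1 and also shares two terms with p3, so it receives the highest score and is selected first.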
In one embodiment, the subsuming relation calculation is domain independent. It can process documents of a variety of topics. It does not assume any domain knowledge adaptation. Thus, it is relatively easy to implement for different applications.
Unlike other summarization systems, no statistic technique is used in the subsuming relation calculation. Therefore, no background corpus is needed to build a base frequency. The domain and length of the documents are not limited. The subsuming relation calculation is also not term-vector based, avoiding high dimension vector manipulation.
Figure 4 is a flow diagram of one embodiment of rewriting the selected paragraphs into a summary, corresponding to step 103 of Figure 1. First, the paragraphs are ranked, step 401, by their significance score. In one embodiment, the top N paragraphs are bulleted, where N can be a predefined length factor decided jointly by an empirical function and a user's preference. Cohesiveness is less likely if these bulleted paragraphs are output as a summary without further processing. So a co-reference resolution algorithm is applied to the paragraphs, step 402, to resolve anaphoric ambiguity. There are a number of such algorithms in the public domain. By introducing the co-reference resolution, most anaphoric ambiguity is removed, thus making the result summary more cohesive.
For example, a document might read, "I met John and Mary this morning. He was driving a red car. It's a nice sports car. She was very happy." A reader may not notice any co-reference ambiguity in it, since it is obvious that "he" stands for John, "she" stands for Mary and "it" stands for the car. But the method and apparatus disclosed herein extracts the significant paragraphs (or sentences) from multiple documents and concatenates them into one text passage as a summary, and because these paragraphs may come from different documents, or different parts of the same document, they may contain pronouns that refer to entity names in paragraphs that were not extracted and do not appear in the resulting summary. To reduce reader confusion, a one-to-one reference relation is built between each pronoun and its equivalent entity name. Finally, the pronouns (for example, he, she, it, they, etc.) in the paragraphs are replaced with their full entity name antecedents, step 403. Thus, the readability of the output summary is improved.
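The replacement in step 403 might look like the following Python sketch, which takes the output of co-reference resolution as given. It assumes a simplified one-to-one mapping from each pronoun to a single entity name per paragraph; a real resolver would decide the antecedent for each pronoun occurrence. The function name replace_pronouns and the mapping format are hypothetical.

```python
import re
from typing import Dict


def replace_pronouns(text: str, antecedents: Dict[str, str]) -> str:
    """Replace each listed pronoun with its resolved entity name."""
    if not antecedents:
        return text
    table = {pronoun.lower(): name for pronoun, name in antecedents.items()}
    # Whole-word, case-insensitive match so "he" does not match inside "the".
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in table) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: table[match.group(0).lower()], text)


extracted = "He was driving a red car. She was very happy."
resolved = {"he": "John", "she": "Mary"}  # assumed output of step 402
print(replace_pronouns(extracted, resolved))
# prints "John was driving a red car. Mary was very happy."
```

Possessive and contracted forms (such as "his" or "it's") would need their own handling, which this sketch leaves out.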
The subsuming relation calculation can be applied to languages other than English. To apply the calculation to another language, only the shallow natural language processing and co-reference resolution components need to be modified. The core subsumption model is language independent.
Figure 5 is an example of one embodiment of linking identical entity names in paragraphs of documents having a common topic independent of domain and being composed in a language other than English. One paragraph 501 contains entity names which are also contained in another paragraph 502. The identical entity names in each paragraph are linked according to the flow diagram in Figure 3. Because all of the entity names in paragraph 501 are also contained in paragraph 502, paragraph 502 can be said to subsume paragraph 501.
The method and apparatus disclosed herein may be integrated into advanced Internet- or network-based knowledge systems as related to information retrieval, information extraction, and question and answer systems. Figure 6 is an example of one embodiment of a computer system. The system shown has a processor 601 coupled to a bus 602. Also shown coupled to the bus is a memory 603 which may contain instructions 604. Additional components shown coupled to the bus are a storage device 605 (such as a hard drive, floppy drive, CD-ROM, DVD-ROM, etc.), an input device 606 (such as a keyboard, mouse, light pen, bar code reader, scanner, microphone, joystick, etc.), and an output device (such as a printer, monitor, speakers, etc.). Of course, an exemplary computer system could have more components than these or a subset of the components listed.
The method described above can be stored in the memory of a computer system (e.g., set top box, video recorders, etc.) as a set of instructions to be executed, as shown by way of example in Figure 6. In addition, the instructions to perform the method described above could alternatively be stored on other forms of machine-readable media, including magnetic and optical disks. For example, the method of the present invention could be stored on machine-readable media, such as magnetic disks or optical disks, which are accessible via a disk drive (or computer-readable medium drive). Further, the instructions can be downloaded into a computing device over a data network in a form of compiled and linked version.
Alternatively, the logic to perform the methods as discussed above could be implemented in additional computer and/or machine readable media, such as discrete hardware components as large-scale integrated circuits (LSI's), application-specific integrated circuits (ASIC's), firmware such as electrically erasable programmable read-only memory (EEPROM's); and electrical, optical, acoustical and other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.); etc.
Although the present invention has been described with reference to specific exemplary embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the invention. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims

CLAIMS
What is claimed is:
1. A computer-implemented method comprising: parsing a plurality of documents;
selecting paragraphs from the documents through subsuming relation calculation; and
rewriting the selected paragraphs into a summary.
2. The method of claim 1 wherein parsing further comprises: extracting noun phrases and verb phrases from the documents; categorizing the noun phrases that are entity names; and converting the entity names into canonical form.
3. The method of claim 1 wherein subsuming relation calculation further comprises: linking noun phrases, verb phrases or entity names in each paragraph of every document with identical noun phrases, verb phrases or entity names in every other paragraph of every document; and counting the links for each paragraph.
4. The method of claim 1 wherein rewriting further comprises: ranking the paragraphs; applying a co-reference resolution algorithm to the paragraphs; and
replacing pronouns in the paragraphs with their full entity name antecedents.
5. The method of claim 1 wherein the documents have a common topic independent of domain.
6. The method of claim 1 wherein the documents are composed in English or in a language other than English.
7. A machine readable medium having stored thereon sequences of instructions which are executable by a processor, and which, when executed by the processor, cause the system to perform a method comprising: parsing a plurality of documents;
selecting paragraphs from the documents through subsuming relation calculation; and rewriting the selected paragraphs into a summary.
8. The medium of claim 7 wherein parsing further comprises: extracting noun phrases and verb phrases from the documents; categorizing the noun phrases that are entity names; and converting the entity names into canonical form.
9. The medium of claim 7 wherein subsuming relation calculation further comprises: linking noun phrases, verb phrases or entity names in each paragraph of every document with identical noun phrases, verb phrases or entity names in every other paragraph of every document; and counting the links for each paragraph.
10. The medium of claim 7 wherein rewriting further comprises: ranking the paragraphs; applying a co-reference resolution algorithm to the paragraphs; and replacing pronouns in the paragraphs with their full entity name antecedents.
11. The medium of claim 7 wherein the documents have a common topic independent of domain.
12. The medium of claim 7 wherein the documents are composed in English or in a language other than English.
13. A system comprising: a processor; a bus coupled to the processor; and a unit coupled to the bus to parse a plurality of documents, select paragraphs from the documents through subsuming relation calculation, and rewrite the selected paragraphs into a summary.
14. The system of claim 13 wherein the unit further extracts noun phrases and verb phrases from the documents, categorizes the noun phrases that are entity names, and converts the entity names into canonical form.
15. The system of claim 13 wherein the unit further links noun phrases, verb phrases or entity names in each paragraph of every document with identical noun phrases, verb phrases or entity names in every other paragraph of every document, and counts the links for each paragraph.
16. The system of claim 13 wherein the unit further ranks the paragraphs, applies a co-reference resolution algorithm to the paragraphs, and replaces pronouns in the paragraphs with their full entity name antecedents.
17. The system of claim 13 wherein the documents have a common topic independent of domain.
18. The system of claim 13 wherein the documents are composed in English or in a language other than English.
PCT/CN2000/000265 2000-09-07 2000-09-07 Method and apparatus for summarizing multiple documents using a subsumption model WO2002021324A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
AU2000269782A AU2000269782A1 (en) 2000-09-07 2000-09-07 Method and apparatus for summarizing multiple documents using a subsumption model
PCT/CN2000/000265 WO2002021324A1 (en) 2000-09-07 2000-09-07 Method and apparatus for summarizing multiple documents using a subsumption model
US10/018,517 US7398196B1 (en) 2000-09-07 2000-09-07 Method and apparatus for summarizing multiple documents using a subsumption model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2000/000265 WO2002021324A1 (en) 2000-09-07 2000-09-07 Method and apparatus for summarizing multiple documents using a subsumption model

Publications (1)

Publication Number Publication Date
WO2002021324A1 (en) 2002-03-14

Family

ID=4574694

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2000/000265 WO2002021324A1 (en) 2000-09-07 2000-09-07 Method and apparatus for summarizing multiple documents using a subsumption model

Country Status (3)

Country Link
US (1) US7398196B1 (en)
AU (1) AU2000269782A1 (en)
WO (1) WO2002021324A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6850950B1 (en) * 1999-02-11 2005-02-01 Pitney Bowes Inc. Method facilitating data stream parsing for use with electronic commerce
US7372991B2 (en) 2003-09-26 2008-05-13 Seiko Epson Corporation Method and apparatus for summarizing and indexing the contents of an audio-visual presentation

Families Citing this family (20)

Publication number Priority date Publication date Assignee Title
WO2002027542A1 (en) 2000-09-28 2002-04-04 Intel Corporation (A Corporation Of Delaware) A method and apparatus for extracting entity names and their relations
US20060173916A1 (en) * 2004-12-22 2006-08-03 Verbeck Sibley Timothy J R Method and system for automatically generating a personalized sequence of rich media
US20110208732A1 (en) 2010-02-24 2011-08-25 Apple Inc. Systems and methods for organizing data items
US8977953B1 (en) * 2006-01-27 2015-03-10 Linguastat, Inc. Customizing information by combining pair of annotations from at least two different documents
US8712758B2 (en) * 2007-08-31 2014-04-29 Microsoft Corporation Coreference resolution in an ambiguity-sensitive natural language processing system
US20100095203A1 (en) * 2008-10-15 2010-04-15 Cisco Technology, Inc. Method and apparatus for incorporating visual deltas for new documents based on previous consumption
US8554542B2 (en) * 2010-05-05 2013-10-08 Xerox Corporation Textual entailment method for linking text of an abstract to text in the main body of a document
US8788260B2 (en) * 2010-05-11 2014-07-22 Microsoft Corporation Generating snippets based on content features
US8434001B2 (en) 2010-06-03 2013-04-30 Rhonda Enterprises, Llc Systems and methods for presenting a content summary of a media item to a user based on a position within the media item
US9326116B2 (en) 2010-08-24 2016-04-26 Rhonda Enterprises, Llc Systems and methods for suggesting a pause position within electronic text
US9069754B2 (en) 2010-09-29 2015-06-30 Rhonda Enterprises, Llc Method, system, and computer readable medium for detecting related subgroups of text in an electronic document
US9286291B2 (en) * 2013-02-15 2016-03-15 International Business Machines Corporation Disambiguation of dependent referring expression in natural language processing
US9411905B1 (en) * 2013-09-26 2016-08-09 Groupon, Inc. Multi-term query subsumption for document classification
US9514098B1 (en) * 2013-12-09 2016-12-06 Google Inc. Iteratively learning coreference embeddings of noun phrases using feature representations that include distributed word representations of the noun phrases
US10339122B2 (en) * 2015-09-10 2019-07-02 Conduent Business Services, Llc Enriching how-to guides by linking actionable phrases
US10445070B2 (en) * 2016-05-05 2019-10-15 International Business Machines Corporation ASCII based instant prototype generation
CN106021226A (en) * 2016-05-16 2016-10-12 中国建设银行股份有限公司 Text abstract generation method and apparatus
US10387538B2 (en) 2016-06-24 2019-08-20 International Business Machines Corporation System, method, and recording medium for dynamically changing search result delivery format
CN108090049B (en) * 2018-01-17 2021-02-05 山东工商学院 Multi-document abstract automatic extraction method and system based on sentence vectors
JP7135730B2 (en) * 2018-10-31 2022-09-13 富士通株式会社 Summary generation method and summary generation program

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0361464A2 (en) * 1988-09-30 1990-04-04 Kabushiki Kaisha Toshiba Method and apparatus for producing an abstract of a document
EP0737927A2 (en) * 1995-04-14 1996-10-16 Xerox Corporation Automatic method of generating thematic summaries
CN1133460A (en) * 1994-11-18 1996-10-16 松下电器产业株式会社 Information taking method, equipment, weighted method and receiving equipment for graphic and character television transmission
EP0751470A1 (en) * 1995-06-28 1997-01-02 Xerox Corporation Automatic method of generating feature probabilities for automatic extracting summarization

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US5963940A (en) * 1995-08-16 1999-10-05 Syracuse University Natural language information retrieval system and method
US6076088A (en) * 1996-02-09 2000-06-13 Paik; Woojin Information extraction system and method using concept relation concept (CRC) triples
US5924108A (en) * 1996-03-29 1999-07-13 Microsoft Corporation Document summarizer for word processors
JP3614648B2 (en) * 1998-03-13 2005-01-26 富士通株式会社 Document understanding support apparatus, summary sentence generation method, and computer-readable recording medium recording document understanding support program
JP3429184B2 (en) * 1998-03-19 2003-07-22 シャープ株式会社 Text structure analyzer, abstracter, and program recording medium
JP3879321B2 (en) * 1998-12-17 2007-02-14 富士ゼロックス株式会社 Document summarization apparatus, document summarization method, and recording medium recording document summarization program
US6473730B1 (en) * 1999-04-12 2002-10-29 The Trustees Of Columbia University In The City Of New York Method and system for topical segmentation, segment significance and segment function
US7162413B1 (en) * 1999-07-09 2007-01-09 International Business Machines Corporation Rule induction for summarizing documents in a classified document collection
US6766287B1 (en) * 1999-12-15 2004-07-20 Xerox Corporation System for genre-specific summarization of documents

Cited By (2)

Publication number Priority date Publication date Assignee Title
US6850950B1 (en) * 1999-02-11 2005-02-01 Pitney Bowes Inc. Method facilitating data stream parsing for use with electronic commerce
US7372991B2 (en) 2003-09-26 2008-05-13 Seiko Epson Corporation Method and apparatus for summarizing and indexing the contents of an audio-visual presentation

Also Published As

Publication number Publication date
US7398196B1 (en) 2008-07-08
AU2000269782A1 (en) 2002-03-22

Similar Documents

Publication Publication Date Title
US7398196B1 (en) Method and apparatus for summarizing multiple documents using a subsumption model
Weiss et al. Fundamentals of predictive text mining
Weiss et al. Text mining: predictive methods for analyzing unstructured information
US8799776B2 (en) Semantic processor for recognition of whole-part relations in natural language documents
Witten Text Mining.
Giannakopoulos et al. Summarization system evaluation revisited: N-gram graphs
US8447588B2 (en) Region-matching transducers for natural language processing
Al‐Sughaiyer et al. Arabic morphological analysis techniques: A comprehensive survey
Pecina Lexical association measures and collocation extraction
Nasukawa et al. Text analysis and knowledge mining system
US8266169B2 (en) Complex queries for corpus indexing and search
US9495358B2 (en) Cross-language text clustering
US9430742B2 (en) Method and apparatus for extracting entity names and their relations
US9009590B2 (en) Semantic processor for recognition of cause-effect relations in natural language documents
JP4467184B2 (en) Semantic analysis and selection of documents with knowledge creation potential
US20020046018A1 (en) Discourse parsing and summarization
US8510097B2 (en) Region-matching transducers for text-characterization
US20070027854A1 (en) Processor for fast contextual searching
WO2006014343A2 (en) Automated evaluation systems and methods
JPH06110948A (en) Method for identifying, retrieving and classifying document
Weiss Descriptive clustering as a method for exploring text collections
Jabbar et al. A survey on Urdu and Urdu like language stemmers and stemming techniques
Yeniterzi et al. Turkish named-entity recognition
Al-Lahham Index term selection heuristics for Arabic text retrieval
Shuldberg et al. Distilling information from text: the EDS TemplateFiller system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase

Ref document number: 10018517

Country of ref document: US

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

122 EP: PCT application non-entry in European phase
NENP Non-entry into the national phase

Ref country code: JP