CN102207948A - Method for generating incident statement sentence material base - Google Patents

Publication number
CN102207948A
Authority
CN
China
Prior art keywords
incident
sentence
declarative
declarative sentence
material database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010102250380A
Other languages
Chinese (zh)
Other versions
CN102207948B (en)
Inventor
宋传宝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin mass information technology Limited by Share Ltd
Original Assignee
TIANJIN HYLANDA INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by TIANJIN HYLANDA INFORMATION TECHNOLOGY CO LTD
Priority to CN 201010225038
Publication of CN102207948A
Application granted
Publication of CN102207948B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a method for generating an event statement sentence material database. The method comprises the following steps: converting an article into a set of long sentences; identifying and extracting time points and event description verbs from the long sentences; identifying and extracting named entities (person names, place names, organization names and product names) from the long sentences so obtained; extracting and indexing element information, including the occurrence time, occurrence place and type of the event, to obtain a structured result; and storing the original text of each event statement sentence together with its structured result in a database, thereby generating the event statement sentence material database. The material database generated by the method can support services such as updating, searching and querying on the Internet, and can serve applications such as writing, editing and special-topic production in the media and news field.

Description

A method for generating an event statement sentence material database
Technical field
The present invention relates to a method for generating a language material database, in particular a method for generating a sentence-level material database of event statement sentences, and belongs to the technical field of computational linguistics.
Background art
A material database, also called a corpus, is a collection of language material that is stored in a computer and can be retrieved, queried and analyzed by computer. Because a corpus is both large in scale and authentic, it is an ideal linguistic knowledge resource.
Text is the most basic and most commonly used information carrier, so text processing techniques are particularly important in computer language processing. Text information usually exists in the form of whole documents, and many current Internet information processing applications, such as online news and search engines, also take the document as the processing unit. The sentence is the smallest linguistic unit that can express a complete meaning. It has many uses and considerable value in information processing and applications, especially in retrieving, writing and organizing media news. Among existing language processing techniques, however, those that take the sentence as the processing granularity are still rare.
At the 7th National Joint Conference on Computational Linguistics, held in 2003, Miao Chuanjiang and Liu Zhiying jointly published the paper "Sentence-level semantic annotation of a Modern Chinese corpus". The paper discusses a scheme for annotating Modern Chinese corpora that has two characteristics. First, annotation proceeds top-down, that is, larger linguistic units are annotated first and smaller ones afterwards. Second, sentences are annotated semantically: the semantic types of sentences and of the clauses within them are annotated, together with their next-level semantic constituents. A corpus built under this scheme is a valuable resource for research on, and processing of, Modern Chinese sentence semantics.
In addition, Chinese invention patent application No. 200810065527.7 discloses a method for quickly classifying and retrieving the sentences of an article with an electronic device. In that technical solution, the electronic device generates a table of classification directories for the sentences of an article by a specific classification method. During retrieval, for the e-book content opened by the user, the processor extracts each sentence one by one, finds the classification directory to which the sentence belongs, and attaches the directory name to the sentence as a classification mark. After the user selects a sentence carrying a classification mark, the sentence read pointer is positioned at that classification directory and the sentences in it are output. The electronic device can also store the sentences of an article by class, in the following steps: 1) the article content, consisting of a number of sentences, is shown on the display screen; 2) any sentence is given a specific classification mark through an editor; 3) a classification directory corresponding to each classification mark is created in the storage, unless the directory already exists; 4) the processor detects and identifies the sentences that carry classification marks and automatically saves each of them into the corresponding classification directory. In that patent application, however, the mining and organization of sentences depend largely on manual work, so efficiency is low and the processing needs of massive Chinese text data cannot be met.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for generating a sentence-level material database of event statement sentences. The method extracts event statement sentences from text at sentence granularity and indexes them by fields such as the event occurrence time, thereby converting an original article database into the required sentence-level material database.
To achieve the above object, the present invention adopts the following technical solution:
A method for generating an event statement sentence material database, characterized by comprising the following steps:
(1) an article is first scanned from left to right; whenever the scanned character is a punctuation mark that ends a long sentence, the content before it is recorded as one long sentence, so that the article is converted into a set of long sentences;
(2) time points are identified and extracted from the set of long sentences, taking the publication time of the article into account;
(3) after time point identification and extraction, long sentences that contain no time point expression are dropped from further processing, and long sentences that contain a time point expression are processed further;
(4) event description verbs are extracted from the long sentences that contain a time point expression; a long sentence without an event description verb is dropped from further processing;
(5) named entities (person names, place names, organization names and product names) are identified and extracted from the long sentences retained by the above steps; a long sentence containing none of these named entities is dropped from further processing;
(6) shallow syntactic analysis is performed on the word sequence obtained by word segmentation and named entity recognition of the long sentence, analyzing the subject, predicate and object to determine the subject and object of the event;
(7) for each event statement sentence confirmed by the above steps, element information including the event occurrence time, occurrence place and event type is extracted and indexed, using the extraction results of step (2) and step (5), to obtain a structured result;
(8) the original text of the event statement sentence and the structured result are stored in a database, thereby generating the event statement sentence material database.
In step (1), the punctuation mark that ends a long sentence is any one of the full-width period, full-width question mark, full-width exclamation mark, full-width ellipsis, half-width question mark and half-width exclamation mark.
In step (2), time point identification and extraction uses manually collected basic characters and words of time point expressions as the triggers for identification. The text is first segmented into words; the candidate word sequences for time point expressions are then checked against time expression patterns obtained through statistics, and the validity of each time expression is verified.
For an expression confirmed as a time point, the expression is normalized to a Gregorian calendar (Common Era) date according to the numerals and measure words it contains, taking the input publication time of the text as the reference time.
In step (4), event description verbs are extracted as follows: the long sentence is segmented into words and the part of speech of each resulting word is examined; each verb is looked up in a manually screened list of event description verbs, and if it is found there it is marked and extracted.
In step (5), manually collected suffix words and common constituent words of each type of named entity serve as the triggers for identification; person names, place names, organization names and product names are then identified by a hidden Markov model or a maximum entropy model combined with rules.
After step (6), event statement sentences in the following cases are filtered out: 1. the sentence begins with the dateline of a news report; 2. the sentence contains a direct quotation or is part of one; 3. the sentence contains anaphoric words whose referents can only be determined from context.
In step (7), the structured result comprises at least three items: the event occurrence time, the subject and object of the event, and the event type word.
In step (8), the original text of the event statement sentence and the structured result are stored in a database with seven fields: the original text of the event statement sentence, occurrence time, occurrence place, event type, persons involved, organizations involved and products involved. Alternatively, the original text and the structured result are stored as plain text.
The method for generating an event statement sentence material database provided by the present invention has the following advantages:
1. Accurate identification of event statement sentences: sentences of this type, which state the complete information of an event, can be distinguished accurately;
2. Accurate identification of time point expressions: the various forms of time point expression are identified accurately and normalized uniformly to Gregorian calendar dates;
3. Structured extraction of the elements within an event statement sentence: through natural language processing, element information such as the occurrence time, occurrence place, event type, and the persons, organizations and products involved is analyzed and extracted accurately.
The event statement sentence material database generated by this method can support services such as updating, searching and querying on the Internet, and can also serve applications such as writing, editing and special-topic production in the media and news field.
Description of drawings
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a schematic flow chart of the basic operation of converting an article database into a sentence-level material database;
Fig. 2 is a schematic diagram of the process of generating the event statement sentence material database.
Embodiment
Fig. 1 shows the basic flow, in the present method, of converting an article database into a sentence-level material database. As Fig. 1 shows, a sentence-level material extraction operation applied to each Chinese article in the article database yields sentence material of various types, for example "event statement" sentences and "direct quotation" sentences. These can be stored in the corresponding event statement sentence material database or direct quotation material database respectively. Note that not every sentence in a text forms valuable, meaningful material. Only sentences whose type has been determined and which have undergone structured processing form sentence-level material. For practical web editing work, one subset of the sentence-level material database, the event statement sentence material database, is particularly useful. Its generation process is described in detail below.
An event statement sentence is a sentence in a text that completely states the content of an event. Unlike an ordinary declarative sentence, an event statement sentence contains at least three elements: the event occurrence time, the subject or object of the event, and an event type word (usually a verb). An event statement sentence lets the user grasp the basic information of an event fairly clearly.
Event statement sentences occur in large numbers in text, especially in narrative articles. As a basic statement of fact, an event statement sentence generally makes the event occurrence time, the subject and object of the event, and the event type word explicit. Using these three elements, the original text of an event statement sentence and its structured result can be extracted from the text independently to form a relatively complete data record. Many such records, stored in a database under their respective fields, form the sentence-level material database of event statement sentences, which supports subsequent applications such as querying, retrieval and statistics.
The problem the present invention addresses is how to use computer technology to identify event statement sentences automatically and extract them in structured form. The structured result of an extracted event statement sentence comprises at least three items: the event occurrence time, the subject and object of the event, and the event type word. The extraction method provided by the present invention identifies time points and named entities such as person names, place names and organization names, identifies the verb or noun that expresses the event type, and then judges whether these elements satisfy the event statement pattern, thereby extracting and structuring the event statement sentence information.
Event statement sentences are extracted from text mainly through the steps shown in Fig. 2:
1. An article is first scanned from left to right. Whenever the scanned character is a punctuation mark that ends a long sentence (full-width period, full-width question mark, full-width exclamation mark, full-width ellipsis, half-width question mark or half-width exclamation mark), the content before it is recorded as one long sentence. This process converts the article into a set of long sentences.
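The scan in step 1 can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: the function name `split_long_sentences` is hypothetical, and two behaviours are assumptions beyond the text, namely that a run of consecutive terminators (such as the two-character Chinese ellipsis) closes a single long sentence, and that text after the last terminator is kept as a final sentence.

```python
# Terminators named in the text: full-width period, question mark,
# exclamation mark and ellipsis, plus half-width question and exclamation marks.
TERMINATORS = set("。？！…?!")


def split_long_sentences(article: str) -> list[str]:
    """Scan left to right and cut a long sentence at each terminator."""
    sentences = []
    start = 0
    i = 0
    while i < len(article):
        if article[i] in TERMINATORS:
            # Assumption: absorb a run of terminators, e.g. the ellipsis "……".
            while i + 1 < len(article) and article[i + 1] in TERMINATORS:
                i += 1
            sentence = article[start:i + 1].strip()
            if sentence:
                sentences.append(sentence)
            start = i + 1
        i += 1
    tail = article[start:].strip()
    if tail:
        # Assumption: keep trailing text that has no terminator.
        sentences.append(tail)
    return sentences
```

The half-width period is deliberately absent from the terminator set, as in the text's list.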
2. Time points are identified and extracted from the set of long sentences, taking the publication time of the article into account.
Here a time point analysis technique determines whether the current sentence contains an explicit time point expression. Its main idea is to use manually collected basic characters and words of time point expressions (for example the various written forms of numerals, measure words such as "year", "month" and "day", and relative time expressions such as "yesterday" and "today") as the triggers for identification. The text is first segmented into words; candidate word sequences for time point expressions are then checked against time expression patterns obtained through statistics, and the validity of each time expression is verified (for example, February 30 cannot occur, and 13 cannot be recognized as a month). For an expression confirmed as a time point, the expression is normalized to a Gregorian calendar (Common Era) date according to the numerals and measure words it contains, taking the input publication time of the text as the reference time.
The above time point analysis technique can be implemented with open-source named entity recognition programs. Most of these programs support the tagging of time points and dates and can directly identify and extract time point expressions. The specific procedure for implementing time point analysis with a named entity recognition program is a routine technique within the grasp of a person of ordinary skill in computer software and is not described in detail here.
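The validity check and normalization of step 2 can be illustrated with a small Python sketch. It is a simplification under stated assumptions: a regular expression and a four-word table stand in for the word segmentation, trigger words and statistically derived patterns described above; the names `extract_time_points`, `ABSOLUTE` and `RELATIVE` are hypothetical; and an expression without a year is assumed to fall in the publication year.

```python
import re
from datetime import date, timedelta

# Hypothetical, deliberately small table of relative expressions:
# day before yesterday, yesterday, today, tomorrow.
RELATIVE = {"前天": -2, "昨天": -1, "今天": 0, "明天": 1}
# Absolute expressions such as "2010年7月1日" or "7月1日" (year optional).
ABSOLUTE = re.compile(r"(?:(\d{4})年)?(\d{1,2})月(\d{1,2})日")


def extract_time_points(sentence: str, published: date) -> list[tuple[str, date]]:
    """Return (expression, normalized date) pairs in order of appearance."""
    found = []
    for m in ABSOLUTE.finditer(sentence):
        # Assumption: a missing year means the publication year.
        year = int(m.group(1)) if m.group(1) else published.year
        try:
            value = date(year, int(m.group(2)), int(m.group(3)))
        except ValueError:
            continue  # invalid expression, e.g. February 30 or month 13
        found.append((m.start(), m.group(0), value))
    for word, offset in RELATIVE.items():
        for m in re.finditer(word, sentence):
            # Relative expressions are resolved against the publication date.
            found.append((m.start(), word, published + timedelta(days=offset)))
    found.sort(key=lambda item: item[0])
    return [(text, value) for _, text, value in found]
```

A sentence for which this function returns an empty list would be dropped in step 3.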
3. After time point identification and extraction, long sentences that contain no time point expression are dropped from further processing, while long sentences that contain one are processed further. This is because an event statement sentence must contain a time point expression (the event occurrence time).
4. Event description verbs are extracted from the long sentences that contain a time point expression; a long sentence with no event description verb is dropped from further processing.
An event description verb is a verb that denotes an event with a certain impact, for example: campaign, recall, withdraw troops, explode. These verbs are selected manually from commonly used verbs. The main screening criterion is whether the event described by the verb could be reported as an independent news item; the related reference factors are the degree of impact of the event the verb describes and the degree of public attention it draws. Event description verbs are identified as follows: the long sentence is segmented into words (common word segmentation software generally includes part-of-speech tagging) and the part of speech of each resulting word is examined; each verb is looked up in the manually screened list of event description verbs, and if it is found there it is marked and extracted.
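The lookup in step 4 amounts to filtering tagged tokens against a verb list, which can be sketched as below. The sketch assumes the segmenter's output is already available as (word, part-of-speech) pairs and that verb tags start with "v"; that tag convention, the function name and the four-entry lexicon (Chinese renderings of the examples in the text) are all illustrative assumptions.

```python
# Hypothetical lexicon built from the examples in the text:
# campaign, recall, withdraw troops, explode.
EVENT_VERBS = {"竞选", "召回", "撤军", "爆炸"}


def extract_event_verbs(tagged_tokens: list[tuple[str, str]]) -> list[str]:
    """Keep the verbs that appear in the manually screened lexicon.

    tagged_tokens holds (word, pos) pairs; verb tags are assumed to start with "v".
    """
    return [
        word
        for word, pos in tagged_tokens
        if pos.startswith("v") and word in EVENT_VERBS
    ]
```

An empty result means the long sentence has no event description verb and is dropped.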
5. Named entities (person names, place names, organization names and product names) are identified and extracted from the long sentences retained by the above processing. At least one of these kinds of named entity must be present in a long sentence; otherwise the long sentence is dropped from further processing.
Person names, place names, organization names and product names are identified mainly as follows: manually collected suffix words of each type of named entity (for example "city", "village", "company" and "public security bureau") and common constituent words (for example surnames, "world" and "logistics") serve as the triggers for identification, and identification is then computed by a hidden Markov model or a maximum entropy model combined with rules. The hidden Markov model, maximum entropy model and the like used here are all commonly used statistical models in natural language processing. For more information on these models, see the book "Statistical Natural Language Processing" by Zong Chengqing (Tsinghua University Press, May 2008 edition, ISBN: 978-7-302-16598-9); they are not described in detail here.
6. Shallow syntactic analysis is performed on the word sequence obtained by word segmentation and named entity recognition of the long sentence, analyzing its subject, predicate, object and so on to determine the subject and object of the event.
The shallow syntactic analysis used here works as follows. The long sentence is first parsed syntactically. This parsing can be done with existing mature tools that provide sentence parsing, for example the parser developed at Stanford University (http://nlp.stanford.edu/software/), the parser developed at the University of California, Berkeley (http://nlp.cs.berkeley.edu/), and the Chinese dependency parsing software developed by a Beijing company (http://www.wintim.com/). Parsing marks the subject, predicate, object and modifiers of the sentence, together with the corresponding words of the original sentence, so that the subject and object of the event can be determined.
7. Before a candidate long sentence whose event occurrence time, event subject and object, and event description word have been determined can be used in practice, certain special cases must be filtered out.
The main cases to filter are the following: 1. the event statement sentence begins with the dateline of a news report (for example, "According to the Associated Press, April 27"); 2. the event statement sentence contains a direct quotation or is part of one; 3. the event statement sentence contains anaphoric words whose referents can only be determined from context (for example, "he" and "they").
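The three filtering cases of step 7 can be approximated with simple surface checks, as in the sketch below. Everything specific here is an assumption made for illustration: the dateline pattern, the set of quotation marks, the pronoun list and the function name `should_filter` are not given in the text, and a production system would need far richer rules. Pronouns are tested on segmented tokens rather than raw characters so that a pronoun character inside a longer word is not flagged.

```python
import re

# Hypothetical dateline pattern: sentence opens with "据 ... 报道/消息/电"
# ("according to ... report/news/dispatch").
DATELINE = re.compile(r"^据.{1,20}?(报道|消息|电)")
# Quotation marks taken as evidence of a direct quotation (assumption).
QUOTE_MARKS = set('“”"「」')
# Third-person pronouns whose referent depends on context (assumption).
PRONOUNS = {"他", "她", "它", "他们", "她们", "它们"}


def should_filter(sentence: str, tokens: list[str]) -> bool:
    """Return True if the candidate sentence falls into one of the three cases."""
    if DATELINE.match(sentence):
        return True  # case 1: begins with a news dateline
    if any(ch in QUOTE_MARKS for ch in sentence):
        return True  # case 2: contains, or is part of, a direct quotation
    return any(tok in PRONOUNS for tok in tokens)  # case 3: unresolved pronoun
```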
8. For each event statement sentence finally confirmed by the above steps, element information such as the occurrence time, occurrence place, event type, persons involved and organizations involved is extracted and indexed, using the extraction results of step 2 and step 5.
The indexing steps are as follows:
8.1 Indexing the event occurrence time
The event occurrence time is taken from the time point expressions identified in the event statement sentence. If there is only one time point expression, it is taken as the event occurrence time; if there are several, one is selected. In one embodiment of the present invention, the selection works as follows: for each time point, count the characters that follow it up to the next time point, and select the time point followed by the largest number of characters.
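The selection rule of 8.1 can be written out as a short Python sketch. It assumes each time point is supplied as a (start, end, value) span sorted by position, that the last time point is followed by the rest of the sentence, and that ties go to the earlier time point; the function name and these conventions are illustrative, not taken from the text.

```python
def pick_event_time(sentence: str, spans: list[tuple[int, int, str]]):
    """Choose the time point followed by the most characters.

    spans: (start, end, value) for each time expression, sorted by start,
    where sentence[start:end] is the expression itself.
    """
    best_value = None
    best_trailing = -1
    for k, (_, end, value) in enumerate(spans):
        # Text "modified" by this time point runs up to the next time point,
        # or to the end of the sentence for the last one.
        next_start = spans[k + 1][0] if k + 1 < len(spans) else len(sentence)
        trailing = next_start - end
        if trailing > best_trailing:  # assumption: ties keep the earlier one
            best_value, best_trailing = value, trailing
    return best_value
```

With a single span the function simply returns that time point, matching the single-expression case in the text.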
8.2 Indexing the event occurrence place
The occurrence place is taken from the place name entities identified in the current event statement sentence and in the sentences before it. If the current sentence contains a place name, that place name is marked as the occurrence place of the event; if it contains several, all of them are marked. If the current sentence contains no place name, the preceding sentences are searched backward for a place name, going back at most three sentences.
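The backward search of 8.2 can be sketched as follows, assuming the place names recognized in each sentence of the article are already available as one list per sentence. The function name and the choice to stop at the nearest preceding sentence that has a place name are assumptions for illustration.

```python
def index_locations(sentence_places: list[list[str]], current: int, max_back: int = 3) -> list[str]:
    """Return the occurrence place(s) for the sentence at index `current`.

    sentence_places[i] holds the place names recognized in sentence i.
    """
    if sentence_places[current]:
        # All place names in the current sentence are marked.
        return list(sentence_places[current])
    for back in range(1, max_back + 1):
        i = current - back
        if i < 0:
            break
        if sentence_places[i]:
            # Assumption: stop at the nearest earlier sentence with a place name.
            return list(sentence_places[i])
    return []
```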
8.3 Indexing the event type
In one embodiment of the present invention, the mapping from each event description word to its event type is compiled manually. The event description words identified in the event statement sentence are looked up in this mapping, and the resulting event types are indexed as the event types of the current event statement sentence.
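The lookup in 8.3 is a table query, sketched below. The mapping entries and type labels are invented for illustration (the text does not list them), and removing duplicate types while preserving order is an assumption.

```python
# Hypothetical, manually compiled mapping from event description word to event type.
EVENT_TYPE_BY_VERB = {
    "竞选": "election",
    "召回": "product recall",
    "撤军": "military withdrawal",
    "爆炸": "explosion",
}


def index_event_types(event_verbs: list[str]) -> list[str]:
    """Map identified event description words to event types, without duplicates."""
    seen = set()
    types = []
    for verb in event_verbs:
        event_type = EVENT_TYPE_BY_VERB.get(verb)
        if event_type is not None and event_type not in seen:
            seen.add(event_type)
            types.append(event_type)
    return types
```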
8.4 Indexing the persons, organizations, products and so on involved
The named entities identified in the event statement sentence, such as persons, organizations and products, are written directly into the corresponding structured fields of the event statement sentence. Each kind of named entity may have several values.
The structured result obtained by the above extraction can be stored in either of two ways. 1. In a database: the database has seven fields, namely the original text of the event statement sentence, occurrence time, occurrence place, event type, persons involved, organizations involved and products involved. 2. As text: the structured event statement sentence is saved directly as text, with the seven field values separated by a delimiter, which may be a space, a tab or a user-defined symbol.
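Both storage options can be sketched with the Python standard library. The sketch uses SQLite purely as an example database; the table name, column names and helper functions are hypothetical, and multi-valued fields are assumed to have been joined into a single string beforehand.

```python
import sqlite3

# The seven fields named in the text; the column names are illustrative.
FIELDS = (
    "original_text", "event_time", "location", "event_type",
    "persons", "organizations", "products",
)


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database with one TEXT column per field."""
    conn = sqlite3.connect(path)
    columns = ", ".join(f"{name} TEXT" for name in FIELDS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS event_sentences ({columns})")
    return conn


def save_record(conn: sqlite3.Connection, record: dict) -> None:
    """Option 1: store one structured event statement sentence in the database."""
    placeholders = ", ".join("?" for _ in FIELDS)
    values = [record.get(name, "") for name in FIELDS]
    conn.execute(f"INSERT INTO event_sentences VALUES ({placeholders})", values)
    conn.commit()


def to_text_line(record: dict, sep: str = "\t") -> str:
    """Option 2: serialize the seven fields as one delimiter-separated line."""
    return sep.join(record.get(name, "") for name in FIELDS)
```

A tab is used as the default delimiter here because field values such as the original text may themselves contain spaces.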
In addition, retrieval over the event statement sentence material database obtained above is of two kinds: retrieval by the original text of the event statement sentence, and retrieval by the other six structured fields.
Before retrieval, an index must be built over the event statement sentence material database. For material stored in a database, the seven fields are indexed directly. For material stored as text, an index can be built with text indexing software such as the open-source Lucene. In either case, the original text of the event statement sentence and the six structured fields are indexed separately.
Once the index is built, a query by original text is run against the original text field in the index, and a query by the six structured fields is run against the corresponding fields in the index. Both return the original text and the six structured fields of each matching event statement sentence.
To make the event statement sentence material database operable, that is, to add new content dynamically and to delete outdated or incorrect content, the present invention further provides a method for updating the material database, as follows:
Add operation: content to be added can be added to the event statement sentence material database in either of two ways. 1. The index is searched for an identical event statement sentence entry; if none exists, the content is added and the index is updated at the same time so that it includes the new content. 2. The content is added directly to the material database, duplicates are then removed, and the index is regenerated.
Delete operation: for content to be deleted, the corresponding event statement sentence entry is found in the index and deleted from it.
Modify operation: for content to be modified, the corresponding event statement sentence entry is found in the index, the entry is deleted, and the modified content is added to the index. The modification is carried out on this basis.
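The three update operations can be sketched with an in-memory index, as below. This is only an illustration of the add, delete and modify logic: a dictionary keyed by the original text stands in for a real index such as a database or Lucene index, the add operation follows the first of the two methods (check for an identical entry, then add), and the class and method names are hypothetical.

```python
class EventSentenceStore:
    """Toy material database whose index is a dict keyed by original text."""

    def __init__(self) -> None:
        self._index: dict[str, dict] = {}

    def __len__(self) -> int:
        return len(self._index)

    def add(self, record: dict) -> bool:
        """Add a record unless an identical entry is already indexed."""
        key = record["original_text"]
        if key in self._index:
            return False
        self._index[key] = record
        return True

    def delete(self, original_text: str) -> bool:
        """Find the entry in the index and delete it."""
        return self._index.pop(original_text, None) is not None

    def modify(self, original_text: str, new_record: dict) -> bool:
        """Delete the old entry, then add the modified content to the index."""
        if original_text not in self._index:
            return False
        del self._index[original_text]
        self._index[new_record["original_text"]] = new_record
        return True

    def get(self, original_text: str):
        return self._index.get(original_text)
```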
The event statement sentence material database generated by the present invention can be widely used in Internet retrieval and in the media field. The Internet holds a large amount of text, especially media news, and the amount grows every day. Extracting event statement sentences from Internet text yields a very large event statement sentence material database, which can be searched by the original text of the event statement sentences or by the six fields such as occurrence time and occurrence place. Its possible users are described below:
1) Ordinary Internet users can conveniently follow the event types they care about, or the events involving a given person, organization, place or product. Sorted by time, the results form an automatic chronicle of events about a given subject. Combined queries over several fields at once, such as occurrence time, occurrence place and event type, are also possible.
2) Writers and media practitioners, reporters in particular, can conveniently gather writing material and prepare manuscripts. Web editors producing a special topic can present the chronicle of events of the persons, organizations and products in the topic directly, or list the event statements related to the body content of the topic, and so on.
In addition, government bodies and the traditional media industry hold large amounts of industry text data, in which articles may also be dense with event statement sentences. In that case, reprocessing the industry data can revitalize it and create new retrieval, reference and production value.
The method for generating an event statement sentence material database provided by the present invention has been described in detail above. For those skilled in the art, any obvious modification made to it without departing from the essence of the present invention will constitute an infringement of the patent right of the present invention and will incur the corresponding legal liability.

Claims (10)

1. A method for generating an incident declarative sentence material database, characterized by comprising the following steps:
(1) scanning an article from left to right and, whenever the scanned character is a punctuation mark that ends a long sentence, recording the preceding content as one long sentence, thereby converting the article into a set of long sentences;
(2) performing time point recognition and extraction on the converted set of long sentences, in combination with the publication time of the article;
(3) after time point recognition and extraction, abandoning subsequent processing for long sentences that contain no time point expression, and continuing subsequent processing for long sentences that contain a time point expression;
(4) extracting incident description verbs from the long sentences that contain a time point expression, and abandoning subsequent processing if no incident description verb exists;
(5) performing named entity recognition and extraction of person names, place names, organization names, and product names on the long sentences obtained from the above steps, and abandoning subsequent processing if none of these named entities exists;
(6) performing shallow syntactic analysis on the word sequence obtained from word segmentation and named entity recognition of the long sentence, analyzing the subject, predicate, and object, and determining the subject and object of the incident;
(7) for the incident declarative sentences confirmed by the above steps, extracting and marking element information, including the occurrence time, place of occurrence, and type of the incident, in combination with the extraction results of step (2) and step (5), to obtain a structured result;
(8) extracting the original fragment of each incident declarative sentence together with the structured result and storing them in a database, thereby generating the incident declarative sentence material database.
2. The method for generating an incident declarative sentence material database as claimed in claim 1, characterized in that:
In step (1), the punctuation mark that ends a long sentence is any one of the full-width period, full-width question mark, full-width exclamation mark, full-width ellipsis, half-width question mark, and half-width exclamation mark.
3. The method for generating an incident declarative sentence material database as claimed in claim 1, characterized in that:
In step (2), the time point recognition and extraction uses manually collected basic words and characters of time point expressions as the trigger conditions for recognition. The text is first segmented into words; each candidate word sequence of a time point expression is then checked against time expression patterns obtained through induction and statistics, so as to verify the validity of the time expression.
4. The method for generating an incident declarative sentence material database as claimed in claim 3, characterized in that:
For an expression confirmed as a time point, the expression is normalized to a Common Era date according to the numerals and measure words it contains, taking the publication time of the input text as the reference time point.
5. The method for generating an incident declarative sentence material database as claimed in claim 1, characterized in that:
In step (4), the incident description verbs are extracted as follows: the long sentence is segmented into words and the part of speech of each resulting word is examined; each verb is then looked up in a manually screened list of incident description verbs, and if it is found there, it is marked and extracted.
6. The method for generating an incident declarative sentence material database as claimed in claim 1, characterized in that:
In step (5), manually collected suffix words and common constituent words of each type of named entity serve as the recognition trigger conditions, and person names, place names, organization names, and product names are then recognized by a hidden Markov model or a maximum entropy model combined with rules.
7. The method for generating an incident declarative sentence material database as claimed in claim 1, characterized in that:
After step (6), incident declarative sentences in the following situations are filtered out: 1. the beginning of the incident declarative sentence contains the dateline of a news report; 2. the incident declarative sentence contains direct speech, or is part of direct speech; 3. the incident declarative sentence contains a referring word whose referent can be determined only from context.
8. The method for generating an incident declarative sentence material database as claimed in claim 1, characterized in that:
In step (7), the structured result comprises at least three items: the occurrence time of the incident, the subject and object of the incident, and the incident type word.
9. The method for generating an incident declarative sentence material database as claimed in claim 1, characterized in that:
In step (8), the original fragment and the structured result of each incident declarative sentence are stored in database form, namely by establishing seven fields in the database: the original fragment of the incident declarative sentence, occurrence time, place of occurrence, incident type, persons involved, organizations involved, and products involved.
10. The method for generating an incident declarative sentence material database as claimed in claim 1, characterized in that:
In step (8), the original fragment and the structured result of each incident declarative sentence are stored in text form.
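
By way of illustration only, steps (1) to (3) and step (8) of claim 1 can be sketched in Python under simplifying assumptions. The sketch splits an article at the sentence-ending marks of claim 2, normalizes a few time expressions against the publication date in the spirit of claim 4, and stores the surviving sentences in a table with the seven fields of claim 9. The function names, the table name `incident_sentence`, the column names, and the small relative-date table are hypothetical and are not taken from the patent. Steps (4) to (7) require a word segmenter and a named entity recognizer and are omitted, so the corresponding fields are left empty.

```python
import re
import sqlite3
from datetime import date, timedelta

# Sentence-ending marks listed in claim 2 (full-width and half-width forms).
SENTENCE_END = "。？！…?!"

# Assumption: a tiny stand-in for the manually collected time expressions.
RELATIVE_DAYS = {"今天": 0, "昨天": -1, "前天": -2}
MONTH_DAY = re.compile(r"(\d{1,2})月(\d{1,2})日")

# Seven fields as in claim 9; table and column names are hypothetical.
SCHEMA = """CREATE TABLE IF NOT EXISTS incident_sentence (
    original_fragment TEXT, occurrence_time TEXT, place TEXT,
    incident_type TEXT, persons TEXT, organizations TEXT, products TEXT)"""


def split_long_sentences(article):
    """Step (1): scan left to right and cut at each sentence-ending mark."""
    sentences, start = [], 0
    for i, ch in enumerate(article):
        if ch in SENTENCE_END:
            # Keep runs of ending marks together, e.g. the ellipsis "……".
            if i + 1 < len(article) and article[i + 1] in SENTENCE_END:
                continue
            piece = article[start:i + 1].strip()
            if piece:
                sentences.append(piece)
            start = i + 1
    tail = article[start:].strip()  # assumption: keep an unterminated tail
    if tail:
        sentences.append(tail)
    return sentences


def normalize_time(sentence, published):
    """Steps (2) and claim 4, simplified: return a calendar date or None."""
    for word, offset in RELATIVE_DAYS.items():
        if word in sentence:
            return published + timedelta(days=offset)
    match = MONTH_DAY.search(sentence)
    if match:
        try:
            # Assumption: a month/day without a year falls in the publication year.
            return date(published.year, int(match.group(1)), int(match.group(2)))
        except ValueError:
            return None  # not a valid calendar date, so not a time point
    return None


def build_material_database(article, published, conn):
    """Steps (1)-(3) and (8); steps (4)-(7) are omitted in this sketch."""
    conn.execute(SCHEMA)
    for sentence in split_long_sentences(article):
        when = normalize_time(sentence, published)
        if when is None:
            continue  # step (3): drop sentences without a time point
        conn.execute(
            "INSERT INTO incident_sentence VALUES (?, ?, ?, ?, ?, ?, ?)",
            (sentence, when.isoformat(), None, None, None, None, None),
        )
    conn.commit()
```

Because the occurrence time is stored as an ISO date string, a query such as `SELECT original_fragment FROM incident_sentence ORDER BY occurrence_time` returns the stored sentences in chronological order, which corresponds to the chronicle of events described for the material database.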
CN 201010225038 2010-07-13 2010-07-13 Method for generating incident statement sentence material base Active CN102207948B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010225038 CN102207948B (en) 2010-07-13 2010-07-13 Method for generating incident statement sentence material base

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010225038 CN102207948B (en) 2010-07-13 2010-07-13 Method for generating incident statement sentence material base

Publications (2)

Publication Number Publication Date
CN102207948A true CN102207948A (en) 2011-10-05
CN102207948B CN102207948B (en) 2013-07-24

Family

ID=44696786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010225038 Active CN102207948B (en) 2010-07-13 2010-07-13 Method for generating incident statement sentence material base

Country Status (1)

Country Link
CN (1) CN102207948B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6205456B1 (en) * 1997-01-17 2001-03-20 Fujitsu Limited Summarization apparatus and method
CN1641634A (en) * 2004-01-15 2005-07-20 中国科学院计算技术研究所 Chinese new word and expression detecting method and its detecting system
CN101706807A (en) * 2009-11-27 2010-05-12 清华大学 Method for automatically acquiring new words from Chinese webpages

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105190605A (en) * 2013-03-14 2015-12-23 微软技术许可有限责任公司 Chronology based content processing
JP2016532942A (en) * 2014-01-09 2016-10-20 バイドゥ オンライン ネットワーク テクノロジー (ベイジン) カンパニー リミテッド Method and apparatus for constructing event knowledge database
CN103699689B (en) * 2014-01-09 2017-02-15 百度在线网络技术(北京)有限公司 Method and device for establishing event repository
US10282664B2 (en) 2014-01-09 2019-05-07 Baidu Online Network Technology (Beijing) Co., Ltd. Method and device for constructing event knowledge base
CN105718473B (en) * 2014-12-05 2019-01-25 成都复晓科技有限公司 A kind of method of data modeling
CN105718473A (en) * 2014-12-05 2016-06-29 成都复晓科技有限公司 Data modeling method
CN104994075A (en) * 2015-06-01 2015-10-21 广东电网有限责任公司信息中心 Security event handling method, system and terminal based on output logs of security system
CN106055658A (en) * 2016-06-02 2016-10-26 中国人民解放军国防科学技术大学 Extraction method aiming at Twitter text event
CN106294619A (en) * 2016-08-01 2017-01-04 上海交通大学 Public sentiment intelligent supervision method
US11494450B2 (en) 2016-11-30 2022-11-08 Microsoft Technology Licensing, Llc Providing recommended contents
WO2018098751A1 (en) * 2016-11-30 2018-06-07 Microsoft Technology Licensing, Llc Providing recommended contents
US10679439B2 (en) 2016-12-02 2020-06-09 Baidu Online Network Technology (Beijing) Co., Ltd. Method and device for controlling code lock
CN106897364B (en) * 2017-01-12 2021-02-23 上海大学 Chinese reference corpus construction method based on events
CN106897364A (en) * 2017-01-12 2017-06-27 上海大学 Chinese based on event refers to building of corpus method
CN106970898A (en) * 2017-03-31 2017-07-21 百度在线网络技术(北京)有限公司 Method and apparatus for generating article
CN107944032A (en) * 2017-12-13 2018-04-20 北京百度网讯科技有限公司 Method and apparatus for generating information
CN107944032B (en) * 2017-12-13 2021-12-31 北京百度网讯科技有限公司 Method and apparatus for generating information
CN108255811A (en) * 2018-01-11 2018-07-06 北京神州泰岳软件股份有限公司 Text time semanteme determines method, apparatus and electronic equipment
CN108549694A (en) * 2018-04-16 2018-09-18 南京云问网络技术有限公司 The processing method of temporal information in a kind of text
CN108549694B (en) * 2018-04-16 2021-11-23 南京云问网络技术有限公司 Method for processing time information in text
CN110866389A (en) * 2018-08-17 2020-03-06 北大方正集团有限公司 Information value evaluation method, device, equipment and computer readable storage medium
CN110889274A (en) * 2018-08-17 2020-03-17 北大方正集团有限公司 Information quality evaluation method, device, equipment and computer readable storage medium
CN110866389B (en) * 2018-08-17 2021-12-17 北大方正集团有限公司 Information value evaluation method, device, equipment and computer readable storage medium
CN110889274B (en) * 2018-08-17 2022-02-08 北大方正集团有限公司 Information quality evaluation method, device, equipment and computer readable storage medium
CN109446513A (en) * 2018-09-18 2019-03-08 中国电子科技集团公司第二十八研究所 The abstracting method of event in a kind of text based on natural language understanding
CN109446513B (en) * 2018-09-18 2023-06-20 中国电子科技集团公司第二十八研究所 Extraction method of events in text based on natural language understanding
CN110263312A (en) * 2019-06-19 2019-09-20 北京百度网讯科技有限公司 Article generation method, device, server and computer-readable medium
CN110263312B (en) * 2019-06-19 2023-09-12 北京百度网讯科技有限公司 Article generating method, apparatus, server and computer readable medium
CN111191010A (en) * 2019-12-31 2020-05-22 天津外国语大学 Movie scenario multivariate information extraction method
CN111191010B (en) * 2019-12-31 2023-08-08 天津外国语大学 Movie script multi-element information extraction method

Also Published As

Publication number Publication date
CN102207948B (en) 2013-07-24

Similar Documents

Publication Publication Date Title
CN102207948B (en) Method for generating incident statement sentence material base
EP2570974B1 (en) Automatic crowd sourcing for machine learning in information extraction
CN109992645B (en) Data management system and method based on text data
US8135669B2 (en) Information access with usage-driven metadata feedback
US9519636B2 (en) Deduction of analytic context based on text and semantic layer
US20090300043A1 (en) Text based schema discovery and information extraction
CN106095762A (en) A kind of news based on ontology model storehouse recommends method and device
CN103164471A (en) Recommendation method and system of video text labels
CN104021198B (en) The relational database information search method and device indexed based on Ontology
Maynard et al. Ontology-based information extraction for market monitoring and technology watch
CN101004737A (en) Individualized document processing system based on keywords
CN104685495A (en) A system and method for automatic generation of information-rich content from multiple microblogs, each microblog containing only sparse information
CN105608232A (en) Bug knowledge modeling method based on graphic database
CN103886020A (en) Quick search method of real estate information
CN112364172A (en) Method for constructing knowledge graph in government official document field
CN106354860A (en) Method for automatically labelling and pushing information resource based on label sets
CN102591897A (en) Apparatus and method for searching document
CN108595421A (en) A kind of abstracting method, the apparatus and system of Chinese entity associated relationship
CN109284362B (en) Content retrieval method and system
CN103823868A (en) Event recognition method and event relation extraction method oriented to on-line encyclopedia
CN102207947B (en) Direct speech material library generation method
CN106372123B (en) Tag-based related content recommendation method and system
KR102586580B1 (en) News editing supoort system using natural language processing artificial intelligence language model
Basile et al. Extending an Information Retrieval System through Time Event Extraction.
CN111401047A (en) Method and device for generating dispute focus of legal document and computer equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee
CP03 Change of name, title or address

Address after: 300020 Tianjin Heping District, South Road, No. 11 International Building 23 purchase of Wheat

Patentee after: Tianjin mass information technology Limited by Share Ltd

Address before: 300384 Tianjin City Huayuan Industrial Zone Rong Yuan Road No. 1 North B room 322-323

Patentee before: Tianjin Hylanda Information Technology Co.,Ltd.