CN102163189A - Method and device for extracting evaluative information from critical texts - Google Patents

Info

Publication number
CN102163189A
CN102163189A CN2010101201014A CN201010120101A CN102163189A CN 102163189 A CN102163189 A CN 102163189A CN 2010101201014 A CN2010101201014 A CN 2010101201014A CN 201010120101 A CN201010120101 A CN 201010120101A CN 102163189 A CN102163189 A CN 102163189A
Authority
CN
China
Prior art keywords
evaluation object
evaluation
sentence
vector
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2010101201014A
Other languages
Chinese (zh)
Other versions
CN102163189B (en)
Inventor
贾文杰
张姝
孟遥
于浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to CN201010120101.4A
Publication of CN102163189A
Application granted
Publication of CN102163189B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a method and a device for extracting evaluative information from critical texts. The method comprises: a preprocessing step of preprocessing the collected critical texts so as to obtain the evaluative constituents related to at least one evaluated object in the critical texts and the position of the at least one evaluated object in the critical texts; a first extraction step of performing initial matching on the preprocessed critical texts so as to extract a first evaluation vector set comprising at least one evaluation vector; and a second extraction step of performing extended matching by extending the scope of the at least one evaluated object, so as to obtain a corresponding evaluated object for at least one evaluation vector in the first evaluation vector set that lacks an evaluated object, thereby obtaining a second evaluation vector set. The method improves both the coverage of meaningful evaluative information extracted from the critical texts and the extraction accuracy.

Description

Method and device for extracting evaluative information from review text
Technical field
The present invention relates generally to the technical field of information processing and, more particularly, to techniques for extracting specific information from an information source. In particular, it relates to a method, a device, and a program product for extracting, from review text, evaluation information relating to an evaluated object.
Background
With the continuous evolution and deepening of information technology, massive amounts of information can be obtained from various information sources through channels such as the Internet. For example, many users check existing review information about a product or service before acquiring it. In the Internet environment there are currently many types of web pages and documents that express their content in natural language and contain user opinions (hereinafter referred to as review text). Information extraction technology can extract from them the users' comments on different attributes of a product or service and finally present them to the user in a more intuitive form, providing a reference for customers' choices. Extracting opinions from such review articles mainly involves two tasks: (1) extracting the attributes relating to the evaluated object, the evaluation words, and so on; (2) finding the matching evaluated object for each extracted attribute and evaluation word. The first task can be accomplished by nearest matching or by judgment based on phrase structure, because an attribute and its evaluation word usually occur in the same sentence. For the second task, however, linguistic phenomena such as subject omission and pronoun use are common, so finding the evaluated object that corresponds to an omitted subject is not easy, which makes it very difficult to extract opinion information from review articles accurately and efficiently.
Currently known information extraction or retrieval systems include systems that collect product evaluations from the Internet, systems that extract product evaluation words, and dedicated anaphora resolution systems. Many papers and patents on this subject have been published, for example:
Chinese patent application (hereinafter, Patent Document 1): application No. 200580032865.5; inventors: Thomas's Thomas Hessler; Hai Kelaohe; The Yan Sihe Grindelwald; applicant: Sa nox moral public limited company; title: "Method and system for evaluating an object or obtaining information from an operator";
Chinese patent application (hereinafter, Patent Document 2): application No. 200810243606.2; inventors: Zhu Qiaoming; Zhou Guodong; Kong Fang; Li Peifeng; Millipede China; Li Junhui; Qian Peide; applicant: University Of Suzhou; title: "An anaphora resolution method based on semantic role information in Chinese language processing".
However, the method proposed in Patent Document 1 only extracts evaluation information within a sentence: although it can retrieve evaluation information relating to a specific product, it cannot handle review text in which the subject is omitted. Patent Document 2 provides a general anaphora resolution method, but it targets the resolution of pronouns, proper nouns, unnamed words, named words, and demonstrative words. In review text, however, the subject that denotes the evaluated object is usually omitted entirely, with no pronoun, demonstrative, or other substitute, so the method of Patent Document 2 is not suitable for solving the second task described above.
It can thus be seen that how to extract evaluation information relating to an evaluated object from an information source (for example, review text) more accurately and efficiently remains a problem to be solved urgently.
Summary of the invention
A brief summary of the present invention is given below in order to provide a basic understanding of certain aspects of the invention. It should be understood that this summary is not an exhaustive overview of the invention. It is not intended to identify key or critical parts of the invention, nor to limit its scope. Its sole purpose is to present some concepts in simplified form as a prelude to the more detailed description that follows.
In view of the above problems in the prior art, an embodiment of the present invention provides a method for extracting evaluation information relating to an evaluated object, the method comprising:
a preprocessing step of preprocessing collected review text so as to obtain the evaluation components relating to at least one evaluated object contained in the review text and the position of the at least one evaluated object in the review text;
a first extraction step of performing initial matching on the preprocessed review text so as to extract a first evaluation vector set comprising at least one evaluation vector; and
a second extraction step of performing extended matching by extending the scope of the at least one evaluated object, so as to obtain a corresponding evaluated object for at least one evaluation vector in the first evaluation vector set that lacks an evaluated object, thereby obtaining a second evaluation vector set as the evaluation information relating to the evaluated object.
Another embodiment of the present invention provides a device for extracting evaluation information relating to an evaluated object, the device comprising:
a preprocessing unit configured to preprocess collected review text so as to obtain the evaluation components relating to at least one evaluated object contained in the review text and the position of the at least one evaluated object in the review text;
a first extraction unit configured to perform initial matching on the review text preprocessed by the preprocessing unit so as to extract a first evaluation vector set comprising at least one evaluation vector; and
a second extraction unit configured to perform extended matching by extending the scope of the at least one evaluated object, so as to obtain a corresponding evaluated object for at least one evaluation vector in the first evaluation vector set that lacks an evaluated object, thereby obtaining a second evaluation vector set as the evaluation information relating to the evaluated object.
Still other embodiments of the present invention relate to a program product storing machine-readable instruction code which, when read and executed by a machine, can carry out the method for extracting evaluation information relating to an evaluated object according to the embodiments of the invention described above.
The method and device according to the embodiments of the invention can, at the discourse level (for example, within a review text), delimit a range of influence (also called a "scope") for each candidate evaluated object and, according to that scope, obtain the matching evaluated object for the extracted attribute words and evaluation words. One benefit is that evaluated objects can be matched more accurately with their associated attribute words, evaluation words, and so on. Because an evaluation vector that lacks an evaluated object is meaningless to the user, another benefit is that the coverage of the meaningful evaluation information extracted is improved, and extraction efficiency improves markedly.
Description of drawings
The above and other objects, features, and advantages of the present invention will be more readily understood from the following description of embodiments taken in conjunction with the accompanying drawings. The components in the drawings are not drawn to scale and merely illustrate the principles of the invention. To facilitate illustration and description of some parts of the invention, corresponding parts in the drawings may be enlarged, that is, made larger relative to other components than in an exemplary device actually manufactured according to the invention. In the drawings, identical or similar technical features or components are denoted by identical or similar reference numerals.
Fig. 1 shows a general flowchart of a method for extracting evaluation information relating to an evaluated object according to an embodiment of the invention;
Fig. 2 shows a general flowchart of a concrete example of the method of the embodiment shown in Fig. 1;
Fig. 3 shows a simplified block diagram of a device for extracting evaluation information relating to an evaluated object according to an embodiment of the invention;
Fig. 4 shows a simplified block diagram of one implementation of the second extraction unit in the device of Fig. 3; and
Fig. 5 shows a schematic block diagram of a computer system that can be used to implement the method and device according to the embodiments of the invention.
Detailed description of embodiments
Embodiments of the present invention are described below with reference to the accompanying drawings. Elements and features described in one drawing or embodiment of the invention may be combined with elements and features shown in one or more other drawings or embodiments. Note that, for the sake of clarity, representations and descriptions of components and processing that are unrelated to the invention or known to those of ordinary skill in the art are omitted from the drawings and the description.
Fig. 1 shows a general flowchart of a method 100 for extracting evaluation information relating to an evaluated object according to an embodiment of the invention. As shown in the figure, the method 100 starts at step S110. In the preprocessing step S120, collected review text is preprocessed so as to obtain the evaluation components relating to at least one evaluated object contained in the review text and the position of the at least one evaluated object in the review text. In the first extraction step S130, initial matching is performed on the preprocessed review text so as to extract a first evaluation vector set comprising at least one evaluation vector. In the second extraction step S140, extended matching is performed by extending the scope of the at least one evaluated object, so as to obtain a corresponding evaluated object for at least one evaluation vector in the first evaluation vector set that lacks an evaluated object, thereby obtaining a second evaluation vector set. This second evaluation vector set serves as the evaluation information relating to the evaluated object.
For a better understanding of how the method according to an embodiment of the invention can be implemented, a concrete example of the method of Fig. 1 for extracting evaluation information relating to an evaluated object from review text is given below with reference to Fig. 2. The dashed boxes on the right side of the figure show the results of the individual processing steps.
For clarity, suppose that evaluation information relating to evaluated objects is to be extracted from the following product review, written in natural language:
"The Civic's front end and rear end are short, which does not suit Chinese taste at all. The rear end in particular is both short and thick. Because it is a sample car, the battery is at least imported, much better than the Corolla's water-refill battery. Once inside the car, the biggest highlight is the front dashboard. The engine noise is louder than the Corolla's. But the Civic's blue double-deck dashboard is very beautiful."
The above review text can be obtained from various information sources, for example online via the Internet or from a data storage device that stores relevant review texts. If the user needs evaluation information about evaluated objects in a specific domain, a vocabulary of domain-related keywords can be set up in advance, and whether a given review text is the required one can then be determined, for example, by checking whether, and how many, such keywords occur in it. In the context of this example the evaluated objects are cars of various brands, but this is not limiting: other products or services can also serve as evaluated objects without affecting the implementation of the evaluation information extraction method according to the embodiment of the invention. In addition, although the review text to be processed in this example is written in Chinese, the method according to the embodiment of the invention is also applicable to extracting evaluative information from review text written in any other language that exhibits similar linguistic phenomena (for example, subject omission).
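The keyword-based selection of review texts mentioned above could be sketched as follows. This is only an illustration under assumptions: the function name, the example vocabulary, and the `min_hits` threshold are hypothetical and do not come from the patent, which only says that the presence and number of keywords can be checked.

```python
def is_relevant_review(text, keywords, min_hits=2):
    """Return True if at least `min_hits` distinct domain keywords occur in `text`.

    `keywords` is a pre-set domain vocabulary; `min_hits` is an assumed,
    empirically chosen threshold (not specified in the source).
    """
    hits = sum(1 for kw in keywords if kw in text)
    return hits >= min_hits


# Hypothetical vocabulary for the car domain.
CAR_KEYWORDS = {"engine", "battery", "noise", "dashboard"}

relevant = is_relevant_review("The engine noise is rather high", CAR_KEYWORDS)
unrelated = is_relevant_review("Nice weather today", CAR_KEYWORDS)
```

A substring test is used here for brevity; on segmented text one would more likely compare against the token list.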
As shown in Fig. 2, the method of this example starts at step S210. At step S220, the above review text is preprocessed, mainly by operations such as word segmentation and part-of-speech tagging. Word segmentation and part-of-speech tagging are well-known techniques in the field of information processing, and their details are not repeated here. By way of example and not limitation, for word segmentation see the method disclosed in "The First International Chinese Word Segmentation Bakeoff" by Richard Sproat, published at the 2nd SIGHAN Workshop (2003); for part-of-speech tagging see, for example, the method described in "Chinese Lexical Analysis Using HHMM-ACL2003 HHMM-based Chinese Lexical Analyzer ICTCLAS" by Hua-Ping Zhang et al., published at the 2nd SIGHAN Workshop affiliated with the 41st ACL, Sapporo, Japan, July 2003, pp. 184-187.
Through this preprocessing, the evaluation components relating to the at least one evaluated object contained in the review text (that is, the car models "Civic" and "Corolla") and the positions of these evaluated objects in the review text can be obtained. Table 1 below gives the result obtained after preprocessing this review text:
Table 1:
<, front end, short, 1>
<, rear end, short, 1>
<, rear end, short, 1>
<, rear end, thick, 1>
<, battery, good, 1>
<, front dashboard, highlight, 1>
<, noise, loud, 1>
<, dashboard, beautiful, 1>
As can be seen, preprocessing extracts the evaluation components in the review text; in this example they comprise attribute words and evaluation words. The attribute words include "front end", "rear end", "battery", "front dashboard", and "noise"; the evaluation words include "short", "thick", "good", "highlight", and "beautiful". An attribute word and its related evaluation word form an attribute word-evaluation word pair. Table 1 contains eight such pairs, each enclosed in angle brackets. In an alternative embodiment, the evaluation component can also include a polarity flag, "1" or "-1" (described below), which indicates whether the evaluation is positive or negative. Usually the polarity flag "1" denotes a positive evaluation and "-1" a negative one. Although not shown in Table 1, the evaluation component can further include an evaluation value expressing the degree of the evaluation word. This value can be obtained by various known processing; as one example, the technique disclosed by Yang Pin, Li Tao, and Zhao Kui in "A quantitative analysis method for network public opinion" (Computer Application Research, 2009, No. 3) can be used, and the details are not repeated here. In addition, because of omission, an evaluation component may contain only an evaluation word and no attribute word (a case not shown in Table 1). The review text of this example contains two evaluated objects, the car models "Civic" and "Corolla", but this does not limit the method of this embodiment: the review text from which evaluation information is to be extracted can contain only one evaluated object, or more than two, and the method of the embodiments can be applied equally.
In the above preprocessing, machine learning or template matching techniques, for example, can be used to identify the evaluated objects, attribute words, and evaluation words in the review text, and attribute word-evaluation word pairs such as those in Table 1 can then be produced by techniques such as nearest matching. This preprocessing can be implemented by various known methods. By way of example and not limitation, for the techniques mentioned above see "Obtaining Product Information through Web Information Extraction Technique Based on Pattern Match" by Ma Jing and Ni Huifeng, published in Information Theory and Practice (2007, Vol. 30, No. 02), and "Chinese Automatic Entity Relation Extraction" (on feature selection in Chinese entity relation extraction) by Dong Jing, Sun Le, Feng Yuanyong, Huang Ruihong, et al., published in the Journal of Chinese Information Processing (2007, Vol. 21, No. 4, classification number TP391).
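As a rough illustration of how nearest matching might produce attribute word-evaluation word pairs within one sentence, consider the following sketch. It assumes the sentence is already segmented into tokens and that small lexicons of attribute words and evaluation words (with polarities) are available; the function name, the lexicons, and the English tokens are hypothetical.

```python
def pair_attributes(tokens, attribute_words, evaluation_lexicon):
    """Pair each evaluation word with the nearest attribute word in one sentence.

    tokens: segmented words of a single sentence.
    attribute_words: set of known attribute words (assumed lexicon).
    evaluation_lexicon: dict mapping evaluation word -> polarity (1 or -1).
    Returns tuples (evaluated_object, attribute, evaluation, polarity) with the
    evaluated-object slot left empty, mirroring the layout of Table 1.
    """
    attr_positions = [i for i, t in enumerate(tokens) if t in attribute_words]
    pairs = []
    for i, tok in enumerate(tokens):
        if tok not in evaluation_lexicon:
            continue
        if attr_positions:
            # Nearest attribute word by token distance (first one wins on ties).
            nearest = min(attr_positions, key=lambda p: abs(p - i))
            attribute = tokens[nearest]
        else:
            attribute = ""  # attribute word omitted in this sentence
        pairs.append(("", attribute, tok, evaluation_lexicon[tok]))
    return pairs


tokens = ["battery", "good", "and", "dashboard", "is", "beautiful"]
pairs = pair_attributes(tokens, {"battery", "dashboard"}, {"good": 1, "beautiful": 1})
```

Token distance is the simplest possible notion of "nearest"; the phrase-structure judgment mentioned in the background would replace the `min(...)` call.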
Returning to Fig. 2, at step S230 the first extraction processing is performed, for example sentence by sentence, on the review text preprocessed at step S220. For the attribute word-evaluation word pairs to be useful, an evaluated object must be added to each of them. To this end, the first extraction processing determines, within each sentence and according to a principle such as nearest matching, the corresponding evaluated object for each attribute word-evaluation word pair. The extraction result shown in Table 2 below is thus obtained:
Table 2:
<Civic, front end, short, 1>
<Civic, rear end, short, 1>
<, rear end, short, 1>
<, rear end, thick, 1>
<Corolla, battery, good, -1>
<, front dashboard, highlight, 1>
<Corolla, noise, loud, -1>
<Civic, dashboard, beautiful, 1>
This first extraction processing can be carried out using pre-stored resources required for information extraction, such as vocabularies and syntactic rules, and can be accomplished, for example, by various known methods in the field of information processing. By way of example and not limitation, the technique disclosed by Jia Meiying, Yang Bingru, Zheng Dequan, et al. in "Military exercise information extraction based on pattern matching" (Modern Library and Information Technology, 2009(9), pp. 70-75) can be used; the implementation details are not repeated here. As shown in Table 2, an attribute word-evaluation word pair to which an evaluated object has been added is defined as an evaluation vector, for example <Civic, front end, short, 1>, <Civic, rear end, short, 1>, and <Corolla, battery, good, -1>. As mentioned above, the polarity flag "1" or "-1" in these evaluation vectors indicates the nature of the evaluation component. The pairs in Table 2 to which no evaluated object has been added, <, rear end, short, 1>, <, rear end, thick, 1>, and <, front dashboard, highlight, 1>, can also be regarded as evaluation vectors whose evaluated object is "empty". Each evaluation vector thus consists of parts such as an evaluated object, an evaluated attribute (for example, an attribute word), an evaluation word, and an evaluation value (not shown in this example), and expresses a definite opinion about the evaluated object. The evaluated attribute, the evaluation word, the evaluation value, and similar parts can be regarded as the evaluation components relating to the evaluated object.
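The evaluation vector and the sentence-level first extraction can be pictured with a small data structure. The sketch below is an assumption-laden illustration, not the patent's implementation: the class name, field names, and the use of token positions for nearest matching are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EvaluationVector:
    obj: Optional[str]   # evaluated object; None when it is missing ("empty")
    attribute: str       # attribute word
    evaluation: str      # evaluation word
    polarity: int        # 1 = positive, -1 = negative


def first_extraction(
    sentence_objects: List[Tuple[str, int]],
    sentence_pairs: List[Tuple[str, str, int, int]],
) -> List[EvaluationVector]:
    """Assign to each pair the nearest evaluated object found in the same sentence.

    sentence_objects: (name, token_position) of evaluated objects in the sentence.
    sentence_pairs: (attribute, evaluation, polarity, token_position) per pair.
    Pairs in a sentence without any evaluated object keep obj=None.
    """
    vectors = []
    for attribute, evaluation, polarity, pos in sentence_pairs:
        if sentence_objects:
            name, _ = min(sentence_objects, key=lambda o: abs(o[1] - pos))
        else:
            name = None
        vectors.append(EvaluationVector(name, attribute, evaluation, polarity))
    return vectors


with_object = first_extraction([("Corolla", 0)], [("battery", "good", 1, 2)])
without_object = first_extraction([], [("battery", "good", 1, 2)])
```

Vectors with `obj=None` are exactly the ones that a later, scope-based matching step would need to complete.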
For the evaluation vector <Corolla, battery, good, -1>, note that the polarity flag of its evaluation component is "-1". The sentence in the review text corresponding to this evaluation vector is "Because it is a sample car, the battery is at least imported, much better than the Corolla's water-refill battery." Note that this is a comparison sentence: the comparative marker "than" appears, and the evaluated object "Corolla" occupies the compared-against position; in other words, "Corolla" is in the passive comparison position. In this case, if nearest matching were applied directly, as for an ordinary non-comparison sentence, the evaluation vector <Corolla, battery, good, 1> would be obtained, which would ultimately yield the opposite, wrong conclusion that the Corolla's battery is good. The polarity flag of the evaluation component of the vector obtained in this way is therefore negated, that is, "1" becomes "-1", which gives the correct conclusion that the Corolla's battery is comparatively bad. Likewise, the sentence "The engine noise is louder than the Corolla's." in this review text is also a comparison sentence and can be processed similarly, giving the evaluation vector <Corolla, noise, loud, -1>.
In general, a comparison sentence can be expressed, for example, as "the ** of A is better than B", where "**" denotes an attribute word relating to the evaluated objects A and B, and the evaluation word "better" is merely an example. In this comparison sentence, evaluated object A is in the active comparison position and evaluated object B is in the passive comparison position, so the first extraction processing yields the evaluation vector <B, **, good, -1> rather than <B, **, good, 1>. It is easy to see that many other markers express comparison grammatically, for example "relative to" and "not as good as", which need not be enumerated here. Note that if the marker of the comparison sentence includes a negative adverb such as "not", additional special processing can be applied. For example, if the comparison sentence is "the ** of A is not better than B", the meaning of the negative adverb "not" can be reflected in the preprocessing of step S220; that is, preprocessing yields the attribute word-evaluation word pair <, **, good, -1> for this comparison sentence. Then, in the first extraction processing of step S230, the evaluation polarity of the evaluation component of evaluated object B, which is in the passive comparison position, is negated, giving the evaluation vector <B, **, good, 1>, which correctly reflects the opinion about evaluated object B. Thus, by reducing the various comparison sentences to a standard form such as "the ** of A is better than B", evaluation vectors can be extracted accurately by correctly identifying the comparison sentence and the evaluated object in the passive comparison position. The negative adverb "not" is given as an example and is not intended to be limiting; other negative adverbs, degree adverbs, and the like can be handled analogously. For example, templates of comparison sentences, or the negative adverbs and degree adverbs commonly found in comparison sentences that affect the polarity or nature of the evaluation component, can be prepared in advance so that comparison sentences in the review text can be correctly identified and handled accordingly.
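The polarity handling for comparison sentences described above can be sketched as follows for the standard form "the ** of A is better than B". The marker list, the function names, and the assumption that the passive object is the first known evaluated object after the marker are illustrative simplifications, not part of the patent; other languages or markers may order the objects differently.

```python
def find_passive_object(tokens, known_objects, markers=("than",)):
    """Return the first known evaluated object that follows a comparison marker.

    Returns None when the sentence contains no marker, i.e. when it is not
    treated as a comparison sentence by this simplified rule.
    """
    for i, tok in enumerate(tokens):
        if tok in markers:
            for later in tokens[i + 1:]:
                if later in known_objects:
                    return later
    return None


def negate_if_passive(vector, passive):
    """Flip the polarity of a vector whose evaluated object is the passive one."""
    obj, attribute, evaluation, polarity = vector
    if passive is not None and obj == passive:
        return (obj, attribute, evaluation, -polarity)
    return vector


tokens = ["A", "battery", "better", "than", "B"]
passive = find_passive_object(tokens, {"A", "B"})
adjusted = negate_if_passive(("B", "battery", "good", 1), passive)
# Negated comparison: preprocessing has already produced polarity -1,
# so the flip restores a positive opinion about B.
adjusted_negated = negate_if_passive(("B", "battery", "good", -1), passive)
```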
It is easy to see that, because comparison sentences in the review text receive dedicated processing instead of being matched simply by nearest matching or the like, a noticeably higher matching accuracy can be obtained, particularly for review text containing many comparison sentences.
In addition, although in this example the first extraction processing is performed on the review text sentence by sentence, it can also be performed on text fragments of any suitable size. Review text obtained from various information sources is sometimes not written in strict accordance with grammar. For example, some review texts contain no punctuation at all, or only a single full stop at the very end. In such cases, the unit of the first extraction processing is not a sentence in the strict grammatical sense, terminated by a full stop; instead, the review text can, for example, be divided into several extraction units according to a predetermined text length set in advance from empirical values, or the sentences in the review text can be determined by means of templates obtained through machine learning. In the context of the present disclosure, therefore, the term "sentence" does not refer only to a linguistic unit in the strict grammatical sense that ends with a full stop; any text fragment of suitable size that can serve as the processing unit of the first extraction processing described above, or of the second extraction processing described in detail below, is called a sentence. The method of extracting evaluative information according to the embodiment of the invention thus gains flexibility.
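One way to obtain such "sentences" from loosely punctuated text is to split on sentence-ending punctuation where it exists and to fall back to fixed-length units otherwise. The sketch below is a minimal illustration under assumptions: the function name and the `max_len` value are hypothetical stand-ins for the empirically set text length, and cutting by character count may split words.

```python
import re

# A run of non-terminator characters followed by an optional terminator.
_UNIT = re.compile(r"[^.!?。！？]+[.!?。！？]?")


def split_into_units(text, max_len=40):
    """Split review text into extraction units ("sentences" in the broad sense).

    Text is first cut at sentence-ending punctuation; any piece longer than
    `max_len` characters (an assumed, empirically set length) is further cut
    into fixed-length chunks, which covers text without punctuation.
    """
    units = []
    for piece in _UNIT.findall(text):
        piece = piece.strip()
        for start in range(0, len(piece), max_len):
            units.append(piece[start:start + max_len])
    return units


punctuated = split_into_units("Short. Fine!")
unpunctuated = split_into_units("a" * 100, max_len=40)
```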
According to existing information extracting method, if occur relatively sentence in the comment property text, then generally taking nearby principle when extracting evaluation vector is that the attribute speech distributes by evaluation object with the property estimated compositions such as estimating speech.As mentioned above, because the comparative sentence minor structure is special and form is varied, such simple process method often obtains the evaluation vector of the wrong evaluation information of expression easily.According to the method for this embodiment of the invention, on the basis of the comparison other of the comparison other of having determined the sentence relatively and the status that wherein has the initiative and passive position, mate, can improve the above-mentioned first information effectively and extract the accuracy of handling.
Moreover, as can be seen from the description above, because subjects are often omitted in the review text to be processed, an evaluated object and its related evaluative components, such as attribute words or evaluation words, do not necessarily appear in the same sentence. Existing approaches based on proximity matching or sentence-level processing therefore cannot accurately obtain the evaluation vectors related to an omitted evaluated object.
For this reason, in the method shown in Fig. 2, backward expansion extraction processing is carried out in the backward expansion extraction step S240 by expanding the scope of an evaluated object backward. As noted earlier, because subjects are omitted in the review text, the evaluation vector set obtained by the first extraction processing of step S230 contains evaluation vectors that lack an evaluated object, namely <, rear end, short, 1>, <, rear end, thick, 1> and <, front dashboard, highlight, 1>, as shown in Table 2 above. For convenience, the evaluation vector set in Table 2 obtained by the first extraction step S230 is referred to below as the first evaluation vector set. In addition, the processing of step S230 can be regarded as a kind of initial matching.
In this backward expansion extraction step, for a specific evaluated object among the at least one evaluated object contained in the review text, the scope of that object is expanded from the sentence in which it currently appears to at least one subsequent sentence in which no evaluated object is present. In this way, a corresponding evaluated object can be obtained for the evaluation vectors in the first evaluation vector set that lack one. The grammatical principle behind this processing is that a sentence usually omits its subject because the subject is the same as that of a preceding sentence. When a single evaluated object appears in a sentence, that sentence can simply be taken as the range over which the object applies, referred to here as its "scope". When several evaluated objects appear in a single sentence and the sentence is a comparative sentence, the sentence can preferentially be taken as the scope of the evaluated object in the active comparison position. Furthermore, because a comparative sentence often changes the evaluated object of the sentences that follow it, the backward expansion extraction processing is interrupted whenever a comparative sentence is encountered. In addition, an evaluated object whose scope is expanded backward must be one that is missing, or in other words omitted, in the sentences that follow it. As can be seen, in this example a specific evaluated object must satisfy two conditions to undergo backward expansion extraction processing: first, it is not an evaluated object in the passive comparison position of a comparative sentence; second, the evaluated object, or in other words the subject, is missing or omitted in the sentences following the sentence in which the specific evaluated object appears.
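The backward expansion just described can be sketched roughly as below. The `EvalVector` and `Sentence` types and the function name are hypothetical, introduced only for illustration. Clearing the scope when a non-comparative sentence mentions several evaluated objects is an assumption, since the text does not address that case.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EvalVector:
    obj: Optional[str]   # evaluated object; None when it was omitted
    attribute: str
    word: str
    polarity: int

@dataclass
class Sentence:
    objects: List[str]              # evaluated objects mentioned in the sentence
    is_comparative: bool = False
    vectors: List[EvalVector] = field(default_factory=list)

def backward_expand(sentences: List[Sentence]) -> None:
    """Fill omitted evaluated objects from the nearest preceding scope, in place."""
    scope: Optional[str] = None
    for s in sentences:
        if s.is_comparative or len(s.objects) > 1:
            # A comparative sentence interrupts the expansion.
            # (Multi-object, non-comparative sentences: assumed to do the same.)
            scope = None
            continue
        if len(s.objects) == 1:
            scope = s.objects[0]    # this sentence opens a new scope
            continue
        if scope is None:
            continue                # nothing to inherit; left for later processing
        for v in s.vectors:
            if v.obj is None:
                v.obj = scope       # inherit the omitted subject
```

Vectors that remain without an object after this pass, such as those following a comparative sentence, are the ones a later global pass would need to resolve.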
Through the backward expansion extraction processing described above, evaluation vectors in the first evaluation vector set of Table 2 that lack an evaluated object, namely <, rear end, short, 1> and <, rear end, thick, 1>, are assigned the corresponding evaluated object "Civic" (the vector <, front dashboard, highlight, 1> remains unresolved at this stage, as Table 3 shows). This widens the coverage of evaluative information extraction, helps obtain more meaningful evaluative information, and improves the efficiency of information extraction. Table 3 below gives the evaluation vector set obtained by this backward expansion extraction processing:
Table 3:
<Civic, front end, short, 1>
<Civic, rear end, short, 1>
<Civic, rear end, short, 1>
<Civic, rear end, thick, 1>
<Corolla, battery, good, -1>
<, front dashboard, highlight, 1>
<Corolla, noise, big, -1>
<Civic, dashboard, beautiful, 1>
Although in this example backward expansion extraction processing is carried out only for the single evaluated object "Civic" that satisfies the two conditions above, this is merely an example and does not limit the method according to this embodiment of the invention. Those skilled in the art will readily understand that, when several evaluated objects satisfying the two conditions appear in the review text, the backward expansion extraction processing may be carried out for all of them or for at least one of them. In addition, although in this example the scope of "Civic" is expanded backward over the whole of the review text, in an alternative embodiment the backward expansion may be confined to a limited part of the review text, that is, the scope of the qualifying evaluated object "Civic" is expanded from the sentence in which it currently appears to at least one subsequent sentence. These alternative embodiments likewise achieve, to varying degrees, the benefits of widening the coverage of evaluative information extraction and improving its efficiency.
As Table 3 shows, the comparative sentence "..., much better than the Corolla's water-filled battery" in the review text interrupts the backward expansion extraction processing of step S240, so the evaluated object missing from the evaluation vector <, front dashboard, highlight, 1> still cannot be determined. For this reason, global expansion extraction processing can be carried out in step S250.
In this global expansion extraction processing, a corresponding evaluated object is obtained for the evaluation vectors in the first evaluation vector set that still lack one. This is done as follows. First, for the sentence in the review text that corresponds to the evaluation vector <, front dashboard, highlight, 1> lacking an evaluated object, the nearest evaluated object is chosen from a preceding sentence and from a following sentence, giving a first candidate evaluated object and a second candidate evaluated object respectively. Note that a candidate must not be an evaluated object in the passive comparison position of a comparative sentence. In this example, the first candidate is the nearest evaluated object, "Civic", associated with the preceding non-comparative sentence "In particular, the rear end is both short and thick.", and the second candidate is the nearest evaluated object, "Civic", in the following non-comparative sentence "But the Civic's blue double-layer dashboard is very beautiful." Then a first weight $W_1$, representing the statistical probability with which the first candidate appears in the review text, and a second weight $W_2$, representing the statistical probability with which the second candidate appears in the review text, are calculated. Although in this example, for ease of calculation and explanation, the first and second candidates happen to be identical, the global expansion extraction processing described here applies equally when they differ. Because the two candidates are identical, the weights $W_1$ and $W_2$ representing their probability of occurrence in the review text are also identical. In that case the weights need not be calculated, and "Civic" can be selected directly as the evaluated object missing from <, front dashboard, highlight, 1>. Alternatively, a first threshold may be predetermined, and "Civic" is determined to be the missing evaluated object if the common weight $W_1 = W_2$ is greater than or equal to that threshold. When the first and second candidates differ but have the same weight, either may be chosen as the required evaluated object, or either may be chosen provided the weight exceeds the predetermined first threshold.
When the candidates have different weights, the candidate with the larger weight can be selected as the evaluated object missing from the evaluation vector <, front dashboard, highlight, 1>.
In the above process, the two nearest evaluated objects are chosen as candidates for the evaluation vector <, front dashboard, highlight, 1> lacking an evaluated object. Here "nearest" generally means that a candidate is chosen from the non-comparative sentence closest to that evaluation vector; if that sentence contains several evaluated objects, the one closest to the evaluation vector is preferred.
Although the above example describes choosing candidates from non-comparative sentences, in an alternative embodiment the evaluated object in the active comparison position of the comparative sentence nearest to the evaluation vector lacking an evaluated object may also be chosen as a candidate. In other words, any evaluated object that is not in the passive comparison position of a comparative sentence may be selected as a candidate in the global expansion processing described above.
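The candidate selection described above can be sketched as follows, assuming the weights are already available. All names here are hypothetical. `eligible[i]` is assumed to list, in order of appearance, the evaluated objects of sentence `i` that are not in a passive comparison position, so the caller decides whether active objects of comparative sentences count. Ties are resolved in favour of the preceding candidate, which is one of the choices the text permits.

```python
from typing import Dict, List, Optional

def nearest_candidate(eligible: List[List[str]], idx: int, step: int) -> Optional[str]:
    """Walk from sentence idx in direction step (-1 or +1); return the closest eligible object."""
    i = idx + step
    while 0 <= i < len(eligible):
        if eligible[i]:
            # Closest to the target: last object when looking backward,
            # first object when looking forward.
            return eligible[i][-1] if step < 0 else eligible[i][0]
        i += step
    return None

def global_expand(eligible: List[List[str]], idx: int,
                  weights: Dict[str, float],
                  threshold: Optional[float] = None) -> Optional[str]:
    """Pick the missing evaluated object for the sentence at position idx."""
    first = nearest_candidate(eligible, idx, -1)    # first candidate (preceding)
    second = nearest_candidate(eligible, idx, +1)   # second candidate (following)
    candidates = [c for c in (first, second) if c is not None]
    if not candidates:
        return None
    # The larger weight wins; on a tie max() keeps the first candidate.
    best = max(candidates, key=lambda c: weights.get(c, 0.0))
    if threshold is not None and weights.get(best, 0.0) < threshold:
        return None                                 # optional first-threshold check
    return best
```

When both candidates are the same object, as in the example in the text, the function returns it whatever its weight, unless a threshold is supplied and not met.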
The weight of a candidate evaluated object can be calculated, for example, with the following formula (1):
$$W_i = \frac{CV_i - CP_i}{CP_i + 1} \tag{1}$$
where $CV_i$ is the number of times the $i$-th candidate evaluated object appears in the review text, $CP_i$ is the number of times the $i$-th candidate appears in the passive comparison position in the comparative sentences contained in the review text, and $i$ is a natural number.
In practice, various other suitable formulas may be used to calculate a weight representing the statistical probability with which an evaluated object appears in the review text; the calculation is not limited to the formula given above.
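As a small worked illustration of formula (1), the sketch below counts occurrences and computes the weight. The helper names and the three-sentence sample are made up for illustration and are not data from the specification.

```python
from typing import List, Sequence, Tuple

def candidate_weight(cv: int, cp: int) -> float:
    """Formula (1): W = (CV - CP) / (CP + 1)."""
    return (cv - cp) / (cp + 1)

def count_occurrences(candidate: str,
                      sentences: Sequence[Tuple[List[str], List[str]]]) -> Tuple[int, int]:
    """Return (CV, CP) for a candidate.

    Each sentence is a pair (objects, passive_objects): every evaluated object
    it mentions, and those in a passive comparison position.
    """
    cv = sum(objs.count(candidate) for objs, _ in sentences)
    cp = sum(passive.count(candidate) for _, passive in sentences)
    return cv, cp

# Hypothetical sample: "Civic" is mentioned twice and is never passive;
# "Corolla" is mentioned twice, both times in the passive position.
sample = [
    (["Civic"], []),
    (["Civic", "Corolla"], ["Corolla"]),
    (["Corolla"], ["Corolla"]),
]
print(candidate_weight(*count_occurrences("Civic", sample)))    # 2.0
print(candidate_weight(*count_occurrences("Corolla", sample)))  # 0.0
```

The formula penalises an object twice for each passive appearance, once in the numerator and once in the denominator, so objects that mostly serve as a point of comparison receive low weights.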
Because the global expansion extraction processing relies on statistics about how often the candidate evaluated objects appear in the review text, it may also be called "statistics-based expansion extraction processing".
Table 4 below gives the evaluation vector set obtained by the global expansion extraction processing described above:
Table 4:
<Civic, front end, short, 1>
<Civic, rear end, short, 1>
<Civic, rear end, short, 1>
<Civic, rear end, thick, 1>
<Corolla, battery, good, -1>
<Civic, battery, good, 1>
<Civic, front dashboard, highlight, 1>
<Corolla, noise, big, -1>
<Civic, noise, big, 1>
<Civic, dashboard, beautiful, 1>
As Table 4 shows, the global expansion extraction processing determines that the evaluated object missing from <, front dashboard, highlight, 1> is "Civic", yielding the evaluation vector <Civic, front dashboard, highlight, 1>. Clearly, this global expansion processing further widens the coverage of the evaluative information extraction method according to the embodiment of the invention, helps obtain more valuable information, and improves the efficiency of information extraction.
In addition, note that the evaluation vector set of Table 4 contains two more evaluation vectors than that of Table 3: <Civic, battery, good, 1> and <Civic, noise, big, 1>. These are also obtained by global expansion extraction processing similar to that described above. Specifically, the review text in this example contains two comparative sentences, "Because it is a sample car, at least the battery is imported, and it is much better than the Corolla's water-filled battery." and "The engine noise is bigger than the Corolla's.", and in both of them the evaluated object in the active comparison position is missing because the subject has been omitted. For this reason, the global expansion extraction processing described above can also be used to obtain the evaluated object in the active comparison position for comparative sentences in the review text that lack one, through the same steps: selecting candidate evaluated objects, calculating their weights (for example with formula (1) above or in a similar way), and choosing the required evaluated object according to those weights. Because each of these steps is similar to the global expansion extraction processing described above, the details can be found in the description given with reference to step S250 in Fig. 2 and Table 4, and are not repeated here. As a result, the evaluated object in the active comparison position is determined to be "Civic" for both comparative sentences. Accordingly, two evaluation vectors are constructed for the determined evaluated object: <Civic, battery, good, 1> and <Civic, noise, big, 1>. Note that the evaluative components of these two constructed vectors (attribute word, evaluation word and so on) are the same as those of the evaluated object in the passive comparison position ("Corolla") in the corresponding comparative sentence, whereas their evaluation polarity is "1" rather than "-1", because the determined evaluated object occupies the active comparison position in the comparative sentence. The two constructed vectors can be included in the second evaluation vector set obtained by the backward expansion extraction processing of step S240 and/or the global expansion extraction processing of step S250.
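Constructing the active-position vector from its passive-position counterpart amounts to copying the evaluative components and flipping the polarity. A minimal sketch, using a hypothetical `EvalVector` tuple and function name:

```python
from typing import NamedTuple, Optional

class EvalVector(NamedTuple):
    obj: Optional[str]   # evaluated object
    attribute: str       # attribute word
    word: str            # evaluation word
    polarity: int        # 1 or -1

def active_vector(passive: EvalVector, active_obj: str) -> EvalVector:
    """Build the vector for the object in the active comparison position.

    The evaluative components are copied from the passive object's vector;
    only the object and the sign of the polarity change.
    """
    return passive._replace(obj=active_obj, polarity=-passive.polarity)

passive = EvalVector("Corolla", "battery", "good", -1)
print(active_vector(passive, "Civic"))
# EvalVector(obj='Civic', attribute='battery', word='good', polarity=1)
```

The `active_obj` argument would be whatever object the global expansion processing selected for the comparative sentence.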
Because the evaluation vectors corresponding to the evaluated objects in both the active and the passive comparison positions of a comparative sentence can be obtained accurately, as much valuable information as possible can be obtained from the review text while accuracy is maintained, which improves the efficiency of evaluative information extraction. Those skilled in the art will appreciate that obtaining the evaluation vector for the evaluated object in the active comparison position of a comparative sentence is not essential, but is a preferred option that further improves extraction performance.
Note that although steps S240 and S250 are shown in sequence in Fig. 2, either one of them or any combination of the two may be chosen as circumstances require. For example, when the review text contains few comparative sentences, only the backward expansion extraction processing of step S240 may be carried out, because it already provides a corresponding evaluated object for most of the evaluation vectors lacking one in the first evaluation vector set obtained by the first extraction processing of step S230 (as shown in Table 2). The global expansion extraction processing of step S250 may of course be carried out in addition, which further widens the coverage of evaluative information extraction and further improves extraction efficiency. When the review text contains many comparative sentences, the backward expansion extraction processing of step S240 may be carried out first, followed by the global expansion extraction processing of step S250. Alternatively, only step S250 may be carried out: as mentioned above, a comparative sentence interrupts the backward expansion extraction processing, so its benefit is sometimes not pronounced when comparative sentences are numerous. Carrying out only step S250 in that case improves the speed of information extraction while still ensuring improved coverage.
It is readily understood that the corresponding benefits described above are obtained as long as either the backward expansion extraction processing of step S240 or the global expansion extraction processing of step S250 is carried out. Steps S240 and S250 may be referred to collectively as expansion extraction steps; both fall within the second extraction step, shown as step S140 in Fig. 1, and are distinct from the first (initial) extraction step, shown as step S230 above or step S130 in Fig. 1. Here "first" and "second" are not intended to indicate any particular order or importance, but merely serve to distinguish the elements or components concerned.
In addition, although not shown in Fig. 2, an alternative embodiment may further include a step of storing the review text to be processed and the various information shown in Tables 1 to 4 above. Another alternative embodiment may further include a step of outputting the information obtained in each step, for example the first evaluation vector set and the second evaluation vector set. Such output may, for example, present the obtained information to a user so that the user can identify review opinions about an evaluated object of interest. The manner of output is not particularly limited; it may, for example, be text, images or sound.
With the method of extracting evaluative information related to an evaluated object according to the embodiments of the invention described above, the things described in review text (for example a specific product or service) and the related comments and opinions about them can be extracted in vector form, and cases in which an evaluated object is omitted across sentences can be handled effectively. This improves the accuracy of evaluation vector extraction on the one hand and widens the coverage of extraction on the other.
In addition, further embodiments of the invention provide a device for extracting evaluative information related to an evaluated object. Fig. 3 shows a simplified block diagram of this device 300, which comprises the following units. A preprocessing unit 310 is configured to preprocess the collected review text so as to obtain the evaluative components related to at least one evaluated object contained in the review text and the position of the at least one evaluated object in the review text. A first extraction unit 320 is configured to perform initial matching on the review text processed by the preprocessing unit 310 so as to extract a first evaluation vector set comprising at least one evaluation vector. A second extraction unit 330 is configured to perform extended matching by expanding the scope of the at least one evaluated object, so as to obtain a corresponding evaluated object for at least one evaluation vector in the first evaluation vector set that lacks one, thereby obtaining a second evaluation vector set as the evaluative information related to the evaluated object.
Fig. 4 shows a simplified block diagram of one implementation of the second extraction unit 330 in Fig. 3. As shown, the second extraction unit 330 may comprise a backward expansion extraction subunit 332 and a global expansion extraction subunit 334, configured respectively to perform backward expansion and global expansion of the scope of an evaluated object for specific evaluation vectors, so as to widen the coverage of evaluative information extraction and improve the accuracy and efficiency of information extraction.
The device for extracting evaluative information related to an evaluated object according to this embodiment of the invention may further comprise a resource storage unit (not shown) for storing the resources required for information extraction, such as vocabularies and grammar rules. Note that this storage unit covers both the physical medium used for storage and the logical structures, read/write methods and so on defined for storing different content.
Similarly, the above storage unit, or a separately provided storage unit, may also store the information obtained by the extraction units described above, for example evaluation vector sets and attribute word-evaluation word pairs.
According to an alternative embodiment, the device may further comprise an output unit (not shown) for outputting various information related to evaluative information extraction, for example the first evaluation vector set and the second evaluation vector set. The output unit may, for example, present the obtained information to a user so that the user can identify review opinions about an evaluated object of interest. The output unit may be, for example, a text or image display or a loudspeaker.
The device 300 shown in Figs. 3 and 4, together with its preprocessing unit 310, first extraction unit 320 and second extraction unit 330, the backward expansion extraction subunit 332 and global expansion extraction subunit 334 contained in the second extraction unit 330, and the storage unit and output unit mentioned above, may be configured to perform the various operations described above with reference to Figs. 1 and 2. For further details of these operations, reference may be made to the embodiments and examples described above; they are not repeated here.
The foregoing detailed description has set out various embodiments of the device and/or method according to the embodiments of the invention by means of block diagrams, flowcharts and/or examples. Where such block diagrams, flowcharts and/or examples contain one or more functions and/or operations, it will be apparent to those skilled in the art that each function and/or operation can be implemented, individually and/or collectively, by a wide range of hardware, software, firmware or virtually any combination thereof. In one embodiment, several portions of the subject matter described in this specification may be implemented by application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs) or other integrated forms. However, those skilled in the art will recognise that some aspects of the embodiments described in this specification can equivalently be implemented, in whole or in part, in integrated circuits, as one or more computer programs running on one or more computers (for example, on one or more computer systems), as one or more programs running on one or more processors (for example, on one or more microprocessors), as firmware, or as virtually any combination thereof, and that designing the circuitry and/or writing the code for the software and/or firmware of the present disclosure is well within the ability of those skilled in the art in light of this disclosure.
For example, each module, unit and subunit of the device 300 described above may be configured by software, firmware, hardware or any combination thereof. When implemented by software or firmware, a program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure (for example the general-purpose computer 500 shown in Fig. 5), which can perform various functions once the various programs are installed.
Fig. 5 shows a schematic block diagram of a computer system 500 suitable for implementing the method and device according to the embodiments of the invention. The computer system 500 is only an example and does not imply any limitation on the scope of use or functionality of the method and device of the invention. Nor should the computer system 500 be interpreted as having any dependency on or requirement for any one or combination of the components shown in the exemplary computer system 500.
In Fig. 5, a central processing unit (CPU) 501 performs various processing according to programs stored in a read-only memory (ROM) 502 or programs loaded from a storage section 508 into a random-access memory (RAM) 503. Data required when the CPU 501 performs the various processing is also stored in the RAM 503 as needed. The CPU 501, ROM 502 and RAM 503 are connected to one another via a bus 504. An input/output interface 505 is also connected to the bus 504.
The following components are also connected to the input/output interface 505: an input section 506 (including a keyboard, a mouse and the like), an output section 507 (including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a loudspeaker and the like), a storage section 508 (including a hard disk and the like) and a communication section 509 (including a network interface card such as a LAN card, a modem and the like). The communication section 509 performs communication processing via a network such as the Internet. A drive 510 may also be connected to the input/output interface 505 as needed. A removable medium 511 such as a magnetic disk, optical disc, magneto-optical disk or semiconductor memory is mounted on the drive 510 as needed, so that a computer program read from it is installed into the storage section 508 as needed.
When the above series of processing is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as the removable medium 511.
Those skilled in the art will understand that such a storage medium is not limited to the removable medium 511 shown in Fig. 5, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of the removable medium 511 include magnetic disks (including floppy disks), optical discs (including compact disc read-only memory (CD-ROM) and digital versatile discs (DVD)), magneto-optical disks (including MiniDisc (MD) (registered trademark)) and semiconductor memory. Alternatively, the storage medium may be the ROM 502, a hard disk contained in the storage section 508 or the like, in which the program is stored and which is distributed to the user together with the device containing it.
The invention therefore also proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the method of extracting evaluative information related to an evaluated object according to the embodiments of the invention can be carried out. Accordingly, the various storage media listed above for carrying such a program product are also included in the disclosure of the invention.
For brevity, each reference mentioned in the above description is incorporated herein by reference, as if it had been described in detail in this specification.
In the above description of specific embodiments of the invention, features described and/or illustrated for one embodiment may be used in the same or a similar way in one or more other embodiments, combined with features of other embodiments, or substituted for features of other embodiments.
It should be emphasised that the term "comprises/comprising", when used herein, specifies the presence of features, elements, steps or components but does not exclude the presence or addition of one or more other features, elements, steps or components. Ordinal terms such as "first" and "second" do not indicate the order of implementation or the importance of the features, elements, steps or components they qualify, but are used only to distinguish them for clarity of description.
Furthermore, the method of the invention is not limited to being performed in the chronological order described in the specification or shown in the drawings; it may also be performed in a different order, in parallel or independently. The order of execution of the method described in this specification therefore does not limit the technical scope of the invention.
From the above description of the embodiments of the invention it follows that the technical solutions covered by the invention include, but are not limited to, those set out in the following remarks:
Remark 1. A method of extracting evaluative information related to an evaluated object, the method comprising:
a preprocessing step of preprocessing collected review text so as to obtain the evaluative components related to at least one evaluated object contained in the review text and the position of the at least one evaluated object in the review text;
a first extraction step of performing initial matching on the preprocessed review text so as to extract a first evaluation vector set comprising at least one evaluation vector; and
a second extraction step of performing extended matching by expanding the scope of the at least one evaluated object, so as to obtain a corresponding evaluated object for at least one evaluation vector in the first evaluation vector set that lacks an evaluated object, thereby obtaining a second evaluation vector set as the evaluative information related to the evaluated object.
Remark 2. The method of remark 1, wherein each evaluation vector comprises either an evaluated object and the evaluative component matched with that evaluated object, or only an evaluative component, and wherein the first extraction step comprises assigning to each evaluative component, within the scope of each sentence of the comment text, the evaluated object that matches it according to the nearest-match principle.
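The nearest-match principle described above can be illustrated with a minimal sketch. It assumes that preprocessing yields (token position, text) pairs for the evaluated objects and evaluative components of one sentence; the names `EvaluationVector` and `nearest_match`, and the use of token distance as the measure of nearness, are hypothetical and not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EvaluationVector:
    evaluated_object: Optional[str]  # None when the sentence names no object
    component: str                   # evaluative component text


def nearest_match(objects: List[Tuple[int, str]],
                  components: List[Tuple[int, str]]) -> List[EvaluationVector]:
    """Pair each evaluative component with the closest evaluated object
    in the same sentence (assumed metric: absolute token distance)."""
    vectors = []
    for pos, comp in components:
        if objects:
            # min() keeps the first object on a tie in distance
            _, obj = min(objects, key=lambda o: abs(o[0] - pos))
            vectors.append(EvaluationVector(obj, comp))
        else:
            # No object in this sentence: the vector lacks an evaluated object
            vectors.append(EvaluationVector(None, comp))
    return vectors
```

Vectors left with `evaluated_object=None` are the ones the second extraction step would later try to complete.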
Remark 3. The method of remark 1 or 2, wherein, for a comparison sentence containing an evaluated object in an active comparison position and an evaluated object in a passive comparison position, the initial matching in the first extraction step negates the evaluation polarity of the evaluative component matched with the evaluated object in the passive comparison position.
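As a rough illustration of the polarity negation described above, the sketch below flips the polarity of any match whose object is in the passive comparison position. The `Match` record, the +1/-1 polarity encoding, and the function name are assumptions made for the example only.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    evaluated_object: str
    component: str
    polarity: int   # hypothetical encoding: +1 positive, -1 negative
    passive: bool   # True if the object is in the passive comparison position


def apply_comparison_polarity(matches: List[Match]) -> List[Match]:
    """Negate the polarity of components matched to passive-position objects."""
    return [m._replace(polarity=-m.polarity) if m.passive else m
            for m in matches]
```

For a sentence of the form "A is better than B", a positive component would stay positive for A and become negative for B.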
Remark 4. The method of any one of remarks 1-3, wherein the evaluative component matched with an evaluated object comprises an attribute word and an evaluation word related to that evaluated object, or only an evaluation word related to that evaluated object.
Remark 5. The method of remark 4, wherein the evaluative component matched with an evaluated object further comprises a polarity mark related to the evaluation polarity of the evaluative component, or comprises both that polarity mark and an evaluation value related to the degree of evaluation expressed by the evaluation word.
Remark 6. The method of any one of remarks 1-5, wherein the second extraction step comprises a backward-expansion extraction sub-step of, for a specific evaluated object among the at least one evaluated object, extending the scope of that specific evaluated object from the sentence in which it currently occurs to at least one immediately following sentence that contains no evaluated object, so as to obtain a corresponding evaluated object for an evaluation vector in the first evaluation vector set that lacks one, wherein the specific evaluated object whose scope is extended backward is not an evaluated object in a passive comparison position in a comparison sentence.
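The backward expansion described above can be sketched as a single pass over the sentences. This is a simplified illustration that assumes at most one evaluated object per sentence, represented as an (object, is_passive) pair; the function name and data layout are hypothetical.

```python
from typing import List, Optional, Tuple


def backward_expand(sentences: List[Tuple[Optional[str], bool]]) -> List[Optional[str]]:
    """Carry an evaluated object forward into the immediately following
    sentences that contain no evaluated object.

    Each input item is (object or None, is_passive). Objects in a passive
    comparison position are never carried forward.
    """
    resolved: List[Optional[str]] = []
    carried: Optional[str] = None
    for obj, is_passive in sentences:
        if obj is not None:
            resolved.append(obj)
            # A new object ends the previous scope; passive objects start none
            carried = None if is_passive else obj
        else:
            resolved.append(carried)
    return resolved
```

Sentences that still resolve to `None` after this pass are candidates for the global expansion.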
Remark 7. The method of any one of remarks 1-6, wherein the second extraction step comprises a global-expansion extraction sub-step of obtaining a corresponding evaluated object for an evaluation vector in the first evaluation vector set that lacks one, in the following manner:
Selecting the nearest evaluated object from each of a preceding sentence and a following sentence of the sentence, contained in the comment text, that corresponds to the evaluation vector lacking an evaluated object, as a first candidate evaluated object and a second candidate evaluated object respectively, wherein neither candidate is an evaluated object in a passive comparison position in a comparison sentence;
Calculating a first weight value $W_1$ representing the statistical probability that the first candidate evaluated object occurs in the comment text and a second weight value $W_2$ representing the statistical probability that the second candidate evaluated object occurs in the comment text; and
Taking the evaluated object corresponding to the larger of $W_1$ and $W_2$ as the evaluated object of the evaluation vector lacking one, or, when $W_1$ and $W_2$ are equal, arbitrarily selecting one evaluated object whose weight value is greater than a predetermined first threshold as the evaluated object of that evaluation vector.
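The candidate selection described above can be sketched as follows. The function name and signature are hypothetical; returning `None` when the weights tie but do not exceed the threshold is an assumption, since the text does not say what happens in that case.

```python
from typing import Optional


def pick_candidate(prev_obj: str, w_prev: float,
                   next_obj: str, w_next: float,
                   threshold: float) -> Optional[str]:
    """Choose between the nearest object before and after the sentence.

    The larger weight wins. On a tie either candidate may be chosen,
    provided its weight exceeds the predetermined threshold.
    """
    if w_prev > w_next:
        return prev_obj
    if w_next > w_prev:
        return next_obj
    # Tie: both weights are equal, so one threshold check covers both.
    # Picking the preceding candidate here is an arbitrary choice.
    return prev_obj if w_prev > threshold else None
```

With real-valued weights, a tolerance-based comparison could replace the exact equality implied here.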
Remark 8. The method of any one of remarks 1-7, wherein the second extraction step comprises a global-expansion extraction sub-step of obtaining, for a comparison sentence in the comment text that lacks an evaluated object in an active comparison position, a corresponding evaluated object in the active comparison position and an evaluation vector related to that evaluated object, in the following manner:
Selecting the nearest evaluated object from each of a preceding sentence and a following sentence of the comparison sentence in the comment text, as a third candidate evaluated object and a fourth candidate evaluated object respectively, wherein neither candidate is an evaluated object in a passive comparison position in a comparison sentence;
Calculating a third weight value $W_3$ representing the statistical probability that the third candidate evaluated object occurs in the comment text and a fourth weight value $W_4$ representing the statistical probability that the fourth candidate evaluated object occurs in the comment text; and
Taking the evaluated object corresponding to the larger of $W_3$ and $W_4$ as the evaluated object in the active comparison position, or, when $W_3$ and $W_4$ are equal, arbitrarily selecting one evaluated object whose weight value is greater than a predetermined second threshold as the evaluated object in the active comparison position,
Wherein an evaluation vector is constructed for the obtained evaluated object in the active comparison position and is taken as an evaluation vector in the second evaluation vector set, the evaluative component of the constructed evaluation vector having the same content as, but the opposite polarity to, the evaluative component of the evaluated object in the passive comparison position in the comparison sentence.
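The construction of the evaluation vector for the recovered active-position object can be sketched in a few lines. The `EvalVector` record and the +1/-1 polarity encoding are assumptions for illustration.

```python
from typing import NamedTuple


class EvalVector(NamedTuple):
    evaluated_object: str
    component: str
    polarity: int  # hypothetical encoding: +1 positive, -1 negative


def build_active_vector(active_object: str, passive_vector: EvalVector) -> EvalVector:
    """Copy the passive object's evaluative component and flip its polarity."""
    return EvalVector(active_object, passive_vector.component,
                      -passive_vector.polarity)
```

For example, if the passive-position object carries a negative "battery life" component, the recovered active-position object receives the same component with positive polarity.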
Remark 9. The method of remark 7 or 8, wherein the weight value $W_i$ of the i-th candidate evaluated object is calculated according to the following formula:
$$W_i = \frac{CV_i - CP_i}{CP_i + 1}$$
where $CV_i$ is the number of times the i-th candidate evaluated object occurs in the comment text, $CP_i$ is the number of times it occurs as an evaluated object in a passive comparison position in the comparison sentences contained in the comment text, and i is a natural number.
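The weight formula above translates directly into code. The function and parameter names are hypothetical; `cv` and `cp` stand for the counts CV_i and CP_i.

```python
def candidate_weight(cv: int, cp: int) -> float:
    """Weight of a candidate evaluated object: (CV - CP) / (CP + 1).

    cv: total occurrences of the candidate in the comment text.
    cp: occurrences as the passive-position object in comparison sentences.
    """
    return (cv - cp) / (cp + 1)
```

A candidate that appears often but mostly as the passive side of comparisons gets a low weight, and the `+ 1` in the denominator keeps the value defined when the candidate never appears in a passive position.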
Remark 10. The method of any one of remarks 1-9, wherein a comparison sentence contains at least one of the following connectives: "than", "relative to", "not as good as".
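Detecting comparison sentences by connective can be sketched with a simple pattern match. The marker list below uses English stand-ins for the connectives, which is an assumption made for readability; the names `MARKERS` and `is_comparison_sentence` are hypothetical.

```python
import re

# English stand-ins for the comparison connectives (assumed for illustration)
MARKERS = ("than", "relative to", "not as good as")

# Word boundaries prevent matches inside longer words such as "thanks"
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(m) for m in MARKERS) + r")\b",
    re.IGNORECASE,
)


def is_comparison_sentence(sentence: str) -> bool:
    """Return True if the sentence contains at least one comparison marker."""
    return _PATTERN.search(sentence) is not None
```

For unsegmented text, substring matching against a connective list would be the analogous approach.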
Remark 11. A device for extracting evaluation information related to an evaluated object, the device comprising:
A preprocessing unit configured to preprocess collected comment text to obtain the evaluative components related to at least one evaluated object contained in the comment text, and the position of the at least one evaluated object in the comment text;
A first extraction unit configured to perform initial matching on the comment text preprocessed by the preprocessing unit, so as to extract a first evaluation vector set comprising at least one evaluation vector; and
A second extraction unit configured to perform extended matching by extending the scope of the at least one evaluated object, so as to obtain a corresponding evaluated object for at least one evaluation vector in the first evaluation vector set that lacks an evaluated object, thereby obtaining a second evaluation vector set as the evaluation information related to the evaluated object.
Remark 12. The device of remark 11, wherein each evaluation vector comprises either an evaluated object and the evaluative component matched with that evaluated object, or only an evaluative component, and wherein the first extraction unit is configured to assign to each evaluative component, within the scope of each sentence of the comment text, the evaluated object that matches it according to the nearest-match principle.
Remark 13. The device of remark 11 or 12, wherein the first extraction unit is configured, for a comparison sentence containing an evaluated object in an active comparison position and an evaluated object in a passive comparison position, to negate, during initial matching, the evaluation polarity of the evaluative component matched with the evaluated object in the passive comparison position.
Remark 14. The device of any one of remarks 11-13, wherein the evaluative component matched with an evaluated object comprises an attribute word and an evaluation word related to that evaluated object, or only an evaluation word related to that evaluated object.
Remark 15. The device of remark 14, wherein the evaluative component matched with an evaluated object further comprises a polarity mark related to the evaluation polarity of the evaluative component, or comprises both that polarity mark and an evaluation value related to the degree of evaluation expressed by the evaluation word.
Remark 16. The device of any one of remarks 11-15, wherein the second extraction unit comprises a backward-expansion extraction subunit configured, for a specific evaluated object among the at least one evaluated object, to extend the scope of that specific evaluated object from the sentence in which it currently occurs to at least one subsequent sentence that contains no evaluated object, so as to obtain a corresponding evaluated object for an evaluation vector in the first evaluation vector set that lacks one, wherein the specific evaluated object whose scope is extended backward is not an evaluated object in a passive comparison position in a comparison sentence.
Remark 17. The device of any one of remarks 11-16, wherein the second extraction unit comprises a global-expansion extraction subunit configured to obtain a corresponding evaluated object for an evaluation vector in the first evaluation vector set that lacks one, in the following manner:
Selecting the nearest evaluated object from each of a preceding sentence and a following sentence of the sentence, contained in the comment text, that corresponds to the evaluation vector lacking an evaluated object, as a first candidate evaluated object and a second candidate evaluated object respectively, wherein neither candidate is an evaluated object in a passive comparison position in a comparison sentence;
Calculating a first weight value $W_1$ representing the statistical probability that the first candidate evaluated object occurs in the comment text and a second weight value $W_2$ representing the statistical probability that the second candidate evaluated object occurs in the comment text; and
Taking the evaluated object corresponding to the larger of $W_1$ and $W_2$ as the evaluated object of the evaluation vector lacking one, or, when $W_1$ and $W_2$ are equal, arbitrarily selecting one evaluated object whose weight value is greater than a predetermined first threshold as the evaluated object of that evaluation vector.
Remark 18. The device of any one of remarks 11-17, wherein the second extraction unit comprises a global-expansion extraction subunit configured to obtain, for a comparison sentence in the comment text that lacks an evaluated object in an active comparison position, a corresponding evaluated object in the active comparison position and an evaluation vector related to that evaluated object, in the following manner:
Selecting the nearest evaluated object from each of a preceding sentence and a following sentence of the comparison sentence in the comment text, as a third candidate evaluated object and a fourth candidate evaluated object respectively, wherein neither candidate is an evaluated object in a passive comparison position in a comparison sentence;
Calculating a third weight value $W_3$ representing the statistical probability that the third candidate evaluated object occurs in the comment text and a fourth weight value $W_4$ representing the statistical probability that the fourth candidate evaluated object occurs in the comment text; and
Taking the evaluated object corresponding to the larger of $W_3$ and $W_4$ as the evaluated object in the active comparison position, or, when $W_3$ and $W_4$ are equal, arbitrarily selecting one evaluated object whose weight value is greater than a predetermined second threshold as the evaluated object in the active comparison position,
Wherein an evaluation vector is constructed for the obtained evaluated object in the active comparison position and is taken as an evaluation vector in the second evaluation vector set, the evaluative component of the constructed evaluation vector having the same content as, but the opposite polarity to, the evaluative component of the evaluated object in the passive comparison position in the comparison sentence.
Remark 19. A program product storing machine-readable instruction code which, when read and executed by a machine, can carry out the method of any one of remarks 1-10 for extracting, from comment text, evaluation information related to an evaluated object.
Remark 20. A storage medium carrying the program product of remark 19.
Although the invention has been disclosed above through the description of specific embodiments, it should be understood that those skilled in the art can devise various modifications, improvements, or equivalents of the invention within the spirit and scope of the appended claims. Such modifications, improvements, or equivalents should also be considered to fall within the protection scope of the invention.

Claims (10)

  1. A method for extracting evaluation information related to an evaluated object, the method comprising:
    a preprocessing step of preprocessing collected comment text to obtain the evaluative components related to at least one evaluated object contained in the comment text, and the position of the at least one evaluated object in the comment text;
    a first extraction step of performing initial matching on the preprocessed comment text to extract a first evaluation vector set comprising at least one evaluation vector; and
    a second extraction step of performing extended matching by extending the scope of the at least one evaluated object, so as to obtain a corresponding evaluated object for at least one evaluation vector in the first evaluation vector set that lacks an evaluated object, thereby obtaining a second evaluation vector set as the evaluation information related to the evaluated object.
  2. 2. the method for claim 1, wherein, for include the comparison status that has the initiative by evaluation object and be in passive comparison status by the comparison sentence of evaluation object, in described first extraction step, carry out initial matching when handling, will with the evaluation polarity negate of the evaluation composition that is complementary by evaluation object that is in passive comparison status.
  3. 3. method as claimed in claim 1 or 2, wherein said second extraction step comprises that the back is to expansion extraction substep, be used at described at least one by evaluation object specific by evaluation object, this specific is expanded to not existing by at least one sentence of evaluation object thereafter by the action scope of evaluation object from the sentence at its current place, lack by the evaluation vector of evaluation object in the set of first evaluation vector to obtain so that be accordingly by evaluation object, wherein, action scope be not in passive comparison status by the back to described specific being compared in the sentence by evaluation object of expansion.
  4. The method of any one of claims 1-3, wherein the second extraction step comprises a global-expansion extraction sub-step of obtaining a corresponding evaluated object for an evaluation vector in the first evaluation vector set that lacks one, in the following manner:
    selecting the nearest evaluated object from each of a preceding sentence and a following sentence of the sentence, contained in the comment text, that corresponds to the evaluation vector lacking an evaluated object, as a first candidate evaluated object and a second candidate evaluated object respectively, wherein neither candidate is an evaluated object in a passive comparison position in a comparison sentence;
    calculating a first weight value $W_1$ representing the statistical probability that the first candidate evaluated object occurs in the comment text and a second weight value $W_2$ representing the statistical probability that the second candidate evaluated object occurs in the comment text; and
    taking the evaluated object corresponding to the larger of $W_1$ and $W_2$ as the evaluated object of the evaluation vector lacking one, or, when $W_1$ and $W_2$ are equal, arbitrarily selecting one evaluated object whose weight value is greater than a predetermined first threshold as the evaluated object of that evaluation vector.
  5. The method of any one of claims 1-4, wherein the second extraction step comprises a global-expansion extraction sub-step of obtaining, for a comparison sentence in the comment text that lacks an evaluated object in an active comparison position, a corresponding evaluated object in the active comparison position and an evaluation vector related to that evaluated object, in the following manner:
    selecting the nearest evaluated object from each of a preceding sentence and a following sentence of the comparison sentence in the comment text, as a third candidate evaluated object and a fourth candidate evaluated object respectively, wherein neither candidate is an evaluated object in a passive comparison position in a comparison sentence;
    calculating a third weight value $W_3$ representing the statistical probability that the third candidate evaluated object occurs in the comment text and a fourth weight value $W_4$ representing the statistical probability that the fourth candidate evaluated object occurs in the comment text; and
    taking the evaluated object corresponding to the larger of $W_3$ and $W_4$ as the evaluated object in the active comparison position, or, when $W_3$ and $W_4$ are equal, arbitrarily selecting one evaluated object whose weight value is greater than a predetermined second threshold as the evaluated object in the active comparison position,
    wherein an evaluation vector is constructed for the obtained evaluated object in the active comparison position and is taken as an evaluation vector in the second evaluation vector set, the evaluative component of the constructed evaluation vector having the same content as, but the opposite polarity to, the evaluative component of the evaluated object in the passive comparison position in the comparison sentence.
  6. The method of claim 4 or 5, wherein the weight value $W_i$ of the i-th candidate evaluated object is calculated according to the following formula:
    $$W_i = \frac{CV_i - CP_i}{CP_i + 1}$$
    where $CV_i$ is the number of times the i-th candidate evaluated object occurs in the comment text, $CP_i$ is the number of times it occurs as an evaluated object in a passive comparison position in the comparison sentences contained in the comment text, and i is a natural number.
  7. A device for extracting evaluation information related to an evaluated object, the device comprising:
    a preprocessing unit configured to preprocess collected comment text to obtain the evaluative components related to at least one evaluated object contained in the comment text, and the position of the at least one evaluated object in the comment text;
    a first extraction unit configured to perform initial matching on the comment text preprocessed by the preprocessing unit, so as to extract a first evaluation vector set comprising at least one evaluation vector; and
    a second extraction unit configured to perform extended matching by extending the scope of the at least one evaluated object, so as to obtain a corresponding evaluated object for at least one evaluation vector in the first evaluation vector set that lacks an evaluated object, thereby obtaining a second evaluation vector set as the evaluation information related to the evaluated object.
  8. The device of claim 7, wherein the second extraction unit comprises a backward-expansion extraction subunit configured, for a specific evaluated object among the at least one evaluated object, to extend the scope of that specific evaluated object from the sentence in which it currently occurs to at least one subsequent sentence that contains no evaluated object, so as to obtain a corresponding evaluated object for an evaluation vector in the first evaluation vector set that lacks one, wherein the specific evaluated object whose scope is extended backward is not an evaluated object in a passive comparison position in a comparison sentence.
  9. The device of claim 7 or 8, wherein the second extraction unit comprises a global-expansion extraction subunit configured to obtain a corresponding evaluated object for an evaluation vector in the first evaluation vector set that lacks one, in the following manner:
    selecting the nearest evaluated object from each of a preceding sentence and a following sentence of the sentence, contained in the comment text, that corresponds to the evaluation vector lacking an evaluated object, as a first candidate evaluated object and a second candidate evaluated object respectively, wherein neither candidate is an evaluated object in a passive comparison position in a comparison sentence;
    calculating a first weight value $W_1$ representing the statistical probability that the first candidate evaluated object occurs in the comment text and a second weight value $W_2$ representing the statistical probability that the second candidate evaluated object occurs in the comment text; and
    taking the evaluated object corresponding to the larger of $W_1$ and $W_2$ as the evaluated object of the evaluation vector lacking one, or, when $W_1$ and $W_2$ are equal, arbitrarily selecting one evaluated object whose weight value is greater than a predetermined first threshold as the evaluated object of that evaluation vector.
  10. The device of any one of claims 7-9, wherein the second extraction unit comprises a global-expansion extraction subunit configured to obtain, for a comparison sentence in the comment text that lacks an evaluated object in an active comparison position, a corresponding evaluated object in the active comparison position and an evaluation vector related to that evaluated object, in the following manner:
    selecting the nearest evaluated object from each of a preceding sentence and a following sentence of the comparison sentence in the comment text, as a third candidate evaluated object and a fourth candidate evaluated object respectively, wherein neither candidate is an evaluated object in a passive comparison position in a comparison sentence;
    calculating a third weight value $W_3$ representing the statistical probability that the third candidate evaluated object occurs in the comment text and a fourth weight value $W_4$ representing the statistical probability that the fourth candidate evaluated object occurs in the comment text; and
    taking the evaluated object corresponding to the larger of $W_3$ and $W_4$ as the evaluated object in the active comparison position, or, when $W_3$ and $W_4$ are equal, arbitrarily selecting one evaluated object whose weight value is greater than a predetermined second threshold as the evaluated object in the active comparison position,
    wherein an evaluation vector is constructed for the obtained evaluated object in the active comparison position and is taken as an evaluation vector in the second evaluation vector set, the evaluative component of the constructed evaluation vector having the same content as, but the opposite polarity to, the evaluative component of the evaluated object in the passive comparison position in the comparison sentence.
CN201010120101.4A 2010-02-24 2010-02-24 Method and device for extracting evaluative information from critical texts Expired - Fee Related CN102163189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201010120101.4A CN102163189B (en) 2010-02-24 2010-02-24 Method and device for extracting evaluative information from critical texts

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201010120101.4A CN102163189B (en) 2010-02-24 2010-02-24 Method and device for extracting evaluative information from critical texts

Publications (2)

Publication Number Publication Date
CN102163189A true CN102163189A (en) 2011-08-24
CN102163189B CN102163189B (en) 2014-07-23

Family

ID=44464422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201010120101.4A Expired - Fee Related CN102163189B (en) 2010-02-24 2010-02-24 Method and device for extracting evaluative information from critical texts

Country Status (1)

Country Link
CN (1) CN102163189B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104272301A (en) * 2012-04-25 2015-01-07 国际商业机器公司 Evaluation polarity-based text classification method, computer program, and computer
CN104778184A (en) * 2014-01-15 2015-07-15 腾讯科技(深圳)有限公司 Feedback keyword determining method and device
CN106528519A (en) * 2015-09-09 2017-03-22 佳能信息技术(北京)有限公司 Text mining method and device
CN109063034A (en) * 2018-07-16 2018-12-21 浙江大学 Interior space semanteme value calculation method based on space and social multi-medium data
CN110738056A (en) * 2018-07-03 2020-01-31 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN110929175A (en) * 2018-08-30 2020-03-27 北京京东尚科信息技术有限公司 Method, device, system and medium for evaluating user evaluation
CN113420122A (en) * 2021-06-24 2021-09-21 平安科技(深圳)有限公司 Method, device and equipment for analyzing text and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6205456B1 (en) * 1997-01-17 2001-03-20 Fujitsu Limited Summarization apparatus and method
CN1530857A (en) * 2003-03-05 2004-09-22 Method and device for document and pattern distribution
CN101436186A (en) * 2007-11-12 2009-05-20 北京搜狗科技发展有限公司 Method and system for providing related searches

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SOO-MIN KIM ET AL.: "Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text", 《PROCEEDINGS OF THE WORKSHOP ON SENTIMENT AND SUBJECTIVITY IN TEXT》 *
张奇 (Zhang Qi): "Research on Several Key Issues in Fine-Grained Sentiment Orientation Analysis", 《China Doctoral Dissertations Full-text Database》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104272301A (en) * 2012-04-25 2015-01-07 国际商业机器公司 Evaluation polarity-based text classification method, computer program, and computer
CN104272301B (en) * 2012-04-25 2018-01-23 国际商业机器公司 For extracting method, computer-readable medium and the computer of a part of text
CN104778184A (en) * 2014-01-15 2015-07-15 腾讯科技(深圳)有限公司 Feedback keyword determining method and device
CN106528519A (en) * 2015-09-09 2017-03-22 佳能信息技术(北京)有限公司 Text mining method and device
CN106528519B (en) * 2015-09-09 2019-04-30 佳能信息技术(北京)有限公司 The method and apparatus of text mining
CN110738056A (en) * 2018-07-03 2020-01-31 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN110738056B (en) * 2018-07-03 2023-12-19 百度在线网络技术(北京)有限公司 Method and device for generating information
CN109063034A (en) * 2018-07-16 2018-12-21 浙江大学 Interior space semanteme value calculation method based on space and social multi-medium data
CN109063034B (en) * 2018-07-16 2022-01-04 浙江大学 Indoor space semantic value calculation method based on space and social multimedia data
CN110929175A (en) * 2018-08-30 2020-03-27 北京京东尚科信息技术有限公司 Method, device, system and medium for evaluating user evaluation
CN113420122A (en) * 2021-06-24 2021-09-21 平安科技(深圳)有限公司 Method, device and equipment for analyzing text and storage medium

Also Published As

Publication number Publication date
CN102163189B (en) 2014-07-23

Similar Documents

Publication Publication Date Title
CN104679728B (en) A kind of text similarity detection method
CN111079406B (en) Natural language processing model training method, task execution method, equipment and system
CN102163189B (en) Method and device for extracting evaluative information from critical texts
CN101510221B (en) Enquiry statement analytical method and system for information retrieval
CN102227724B (en) Machine learning for transliteration
CN103336766B (en) Short text garbage identification and modeling method and device
CN110888968A (en) Customer service dialogue intention classification method and device, electronic equipment and medium
CN107832229A (en) A kind of system testing case automatic generating method based on NLP
CN106777275A (en) Entity attribute and property value extracting method based on many granularity semantic chunks
CN101894102A (en) Method and device for analyzing emotion tendentiousness of subjective text
CN105975478A (en) Word vector analysis-based online article belonging event detection method and device
Kaur et al. A survey of named entity recognition in English and other Indian languages
CN102096680A (en) Method and device for analyzing information validity
CN104408078A (en) Construction method for key word-based Chinese-English bilingual parallel corpora
CN102214189B (en) Data mining-based word usage knowledge acquisition system and method
CN101211339A (en) Intelligent web page classifier based on user behaviors
CN103593431A (en) Internet public opinion analyzing method and device
CN104035918A (en) Chinese organization name abbreviation recognition system adopting context feature matching
CN101833579A (en) Method and system for automatically detecting academic misconduct literature
CN101833555A (en) Information extraction method and device
CN103678565A (en) Domain self-adaption sentence alignment system based on self-guidance mode
CN103885933A (en) Method and equipment for evaluating text sentiment
CN109213998A (en) Chinese wrongly written character detection method and system
CN107480197B (en) Entity word recognition method and device
Ghoneim et al. Multiword expressions in the context of statistical machine translation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140723

Termination date: 20180224