CN104573046A - Comment analyzing method and system based on term vector - Google Patents
Comment analyzing method and system based on term vector
- Publication number: CN104573046A
- Authority: CN (China)
- Prior art keywords: comment, term vector, vector, basic, sentence
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
The invention discloses a comment analysis method and system based on term vectors and relates to technical fields such as sentiment analysis and natural language processing. User comments are analyzed automatically by machine, which improves working efficiency. The method comprises: collecting user comments to form a comment corpus; converting each comment in the comment corpus into a sentence vector of identical dimension; setting several comment types and labeling each comment with its type according to manually input labels; training a classifier with the sentence vectors as input and the comment type corresponding to each sentence vector as output; acquiring a new comment and converting it into a sentence vector; and inputting that sentence vector into the classifier to obtain the comment type of the new comment.
Description
Technical field
The present invention relates to technical fields such as sentiment analysis and natural language processing.
Background
With the development of e-commerce, users post more and more comments about products online. Analyzing these comments reveals users' views and suggestions about a product, which helps to improve the product and raise service quality. However, as the number of users keeps growing, the volume of comments grows with it. If consumers' opinions are still gathered by reading comments manually, work efficiency drops sharply and users' opinions and suggestions about a product or service cannot be understood in time.
Summary of the invention
In view of the above, the present invention proposes a method and system in which a machine analyzes user comments automatically, improving work efficiency.
The comment analysis method based on term vectors in the present invention comprises:
Step 1: collect user comments to form a comment corpus;
Step 2: convert every comment in the comment corpus into a sentence vector of identical dimension;
Step 3: set several comment types, and label each comment with the comment type it belongs to according to manually input labels;
Step 4: train a classifier with the sentence vectors as input and the comment type corresponding to each sentence vector as output;
Step 5: obtain a new comment and convert it into a sentence vector;
Step 6: input the sentence vector of the new comment into the classifier to obtain the comment type of the new comment.
Step 2 further comprises:
Step 21: segment each comment into several basic words (tokens), and deduplicate the basic words to obtain a comment dictionary;
Step 22: convert each basic word into a term vector; the term vectors of all basic words have the same dimension;
Step 23: superpose (sum) the term vectors of the basic words in each comment to obtain the sentence vector of that comment.
Step 5 further comprises:
Step 51: segment the new comment into several basic words;
Step 52: look up, in the comment dictionary, the term vector of each basic word obtained in step 51;
Step 53: superpose the term vectors of the basic words of the new comment to obtain the sentence vector of the new comment.
Step 22 further comprises: using the basic words as input to a neural network model, so that the model learns the term vector of each basic word by unsupervised learning.
Preferably, the term vector dimension is 200.
Step 3 further comprises performing the following processing on the comments of each comment type:
Step 31: calculate the key weight of each basic word in every comment of the comment type;
Step 32: sort the basic words of all comments of this comment type in descending order of key weight;
Step 33: select the first n distinct basic words as the keywords of the comment type, where n is a natural number greater than 0 and less than or equal to 5.
The present invention also provides a comment analysis system based on term vectors, comprising:
Comment collection module, for collecting user comments to form a comment corpus;
Sample sentence vector conversion module, for converting every comment in the comment corpus into a sentence vector of identical dimension;
Comment type labeling module, for setting several comment types and labeling each comment with the comment type it belongs to according to manually input labels;
Classifier training module, for training a classifier with the sentence vectors as input and the comment type corresponding to each sentence vector as output;
Comment sentence vector conversion module, for obtaining a new comment and converting it into a sentence vector;
Classifier, for computing the comment type of the new comment from the sentence vector of the new comment.
The sample sentence vector conversion module further comprises:
Sample word segmentation module, for segmenting each comment in the comment corpus into several basic words and deduplicating the basic words to obtain a comment dictionary;
Sample term vector conversion module, for converting each basic word into a term vector; the term vectors of all basic words have the same dimension;
Sample term vector superposition module, for superposing the term vectors of the basic words in each comment to obtain the sentence vector of each comment in the comment corpus.
The comment sentence vector conversion module further comprises:
Comment word segmentation module, for segmenting the new comment into several basic words;
Comment term vector conversion module, for looking up in the comment dictionary the term vector of each basic word in the new comment;
Comment term vector superposition module, for superposing the term vectors of the basic words of the new comment to obtain the sentence vector of the new comment.
The sample term vector conversion module is further used to feed the basic words into a neural network model, so that the model learns the term vector of each basic word by unsupervised learning.
Preferably, the term vector dimension is 200.
The comment type labeling module further comprises:
Key weight calculation module, for calculating the key weight of each basic word in every comment of a comment type;
Sorting module, for sorting the basic words of all comments of this comment type in descending order of key weight;
Keyword selection module, for selecting the first n distinct basic words as the keywords of the comment type, where n is a natural number greater than 0 and less than or equal to 5.
In summary, by adopting the above technical scheme, the invention has the following beneficial effects:
The invention automates comment analysis by machine, greatly improving work efficiency.
The invention uses a neural network model to compute the vectors of the basic words. Term vectors obtained in this way not only represent their basic words accurately but also capture the relations between words, giving a higher degree of intelligence.
The invention obtains sentence vectors by superposing term vectors, which keeps the sentence vector dimension from growing. Because training in effect maps words into a new topic space, the superposed term vectors also represent well where a sentence falls in that feature space. This avoids sentence representations that are too sparse or have too many dimensions, represents sentence features well in a low-dimensional space, and does not harm classification performance.
Embodiment
All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any way, except for mutually exclusive features and/or steps.
Any feature disclosed in this specification may, unless specifically stated otherwise, be replaced by an equivalent or alternative feature serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.
A specific embodiment of the present invention comprises the following steps:
Step 1: collect user comments to form a comment corpus. A web crawler can be used to collect users' comment statements from major websites to form the comment corpus. A web crawler is a program that automatically fetches web page content and is an important component of search engines. The more comment statements are collected, the more complete the resulting comment corpus.
Step 2: convert every comment in the comment corpus into a sentence vector of identical dimension. This further comprises using word segmentation software to divide each comment statement into basic words; after every comment in the corpus has been segmented, all the basic words obtained are deduplicated to form the comment dictionary. Each basic word in the comment dictionary is then converted into a term vector.
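The segmentation and deduplication part of step 2 can be sketched as follows. This is only an illustration: `whitespace_segment` is a hypothetical stand-in for real word segmentation software (the text does not name one), and the function and variable names are assumptions.

```python
def build_comment_dictionary(comments, segment):
    """Segment every comment and collect the distinct words in first-seen order."""
    segmented = [segment(c) for c in comments]
    dictionary = []
    seen = set()
    for words in segmented:
        for w in words:
            if w not in seen:  # deduplicate across the whole corpus
                seen.add(w)
                dictionary.append(w)
    return segmented, dictionary


def whitespace_segment(comment):
    # Hypothetical stand-in for word segmentation software.
    return comment.lower().split()


comments = ["app crashes on start", "great app", "crashes again and again"]
segmented, dictionary = build_comment_dictionary(comments, whitespace_segment)
print(dictionary)
```

Keeping the segmented comments alongside the dictionary avoids segmenting the corpus a second time when the sentence vectors are built.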
The present embodiment uses deep learning to train the term vector model:
To highlight the advantage of term vectors in the present invention, the limitations of the traditional bag-of-words model are described first.
The traditional bag-of-words model represents each word as one feature in a feature vector. If a dictionary contains 10 words, each word must be represented with a 10-dimensional vector. For example, "good" in the dictionary can be represented as v('good') = [0,1,0,0,0,0,0,0,0,0], "bad" as v('bad') = [0,0,1,0,0,0,0,0,0,0], and so on.
Representing words this way has two limitations. First, when the dictionary is very large, for example on the order of ten million words, ten-million-dimensional vectors are needed; this is the curse of dimensionality, so feature selection or feature extraction becomes necessary. Second, such a representation makes it hard to find relations between words: 'fantastic' and 'good' are similar, for example, but the bag-of-words model gives no good way to measure their similarity.
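The second limitation can be seen in a few lines of code. In the sketch below, the one-hot vectors follow the bag-of-words representation described above, while the small dense vectors are made-up numbers chosen only to illustrate what a learned representation could look like; they are not trained values.

```python
import math


def one_hot(word, dictionary):
    """Bag-of-words style vector: one dimension per dictionary word."""
    vec = [0] * len(dictionary)
    vec[dictionary.index(word)] = 1
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


dictionary = ["bad", "good", "fantastic", "slow"]
# Any two different words are orthogonal, so their similarity is always 0.
sim_one_hot = cosine(one_hot("good", dictionary), one_hot("fantastic", dictionary))

# Made-up dense vectors (assumption, for illustration only).
dense = {
    "good": [0.9, 0.1, 0.3],
    "fantastic": [0.8, 0.2, 0.4],
    "bad": [-0.9, 0.1, 0.2],
}
sim_dense = cosine(dense["good"], dense["fantastic"])
print(sim_one_hot, sim_dense)
```

With one-hot vectors the similarity carries no information, whereas dense vectors can place related words close together and unrelated words far apart.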
These two reasons prompted us to improve the term vector representation. We use a neural network model: all basic words of the comment dictionary are fed into the model as training samples, and the model learns 200-dimensional term vector features by unsupervised learning. In other embodiments, the term vector dimension may also be 50, 100, 150, etc.
The term vectors of all basic words in a comment of the comment corpus are then superposed to obtain the sentence vector of that comment.
Suppose a comment statement $S$, where $w_i$ denotes the i-th basic word of the comment after segmentation. Then:
$S = w_1, w_2, \ldots, w_i, \ldots, w_n$, where $n$ is the number of words in the sentence.
In the present embodiment, each basic word $w_i$ is represented as a vector of length 200:
$V_{w_i} = \{v_1, v_2, v_3, \ldots, v_k, \ldots, v_{200}\}$, where each dimension is the value of the word along one abstract dimension.
According to the superposition principle of the present embodiment, the sentence vector of the comment is expressed as:
$V_S = \sum_{i=1}^{n} V_{w_i}$
In this way every comment statement in the comment corpus is represented as a 200-dimensional feature vector, which avoids the curse of dimensionality and lets the relations between words be reflected in the features.
The benefit is that the dimension of the sentence vector stays constant no matter how many basic words a comment statement contains. If instead each basic word in the statement were simply replaced by its term vector, a statement with 10 basic words would have a sentence vector of dimension 2000 and would again risk the curse of dimensionality.
Because training in effect maps words into a new topic space, adding up the term vectors of the basic words in a sentence represents well where the sentence falls in that feature space. Our results confirm this: the approach avoids sentence representations that are too sparse or have too many dimensions, represents sentence features well in a low-dimensional space, and does not harm classification performance.
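The superposition described above amounts to an element-wise sum. A minimal sketch follows, using 4-dimensional toy vectors in place of the 200-dimensional ones of the embodiment; the vector values and the name `sentence_vector` are assumptions made for illustration.

```python
def sentence_vector(words, term_vectors, dim):
    """Element-wise sum of the term vectors of the words in one comment."""
    total = [0.0] * dim
    for w in words:
        vec = term_vectors[w]
        for k in range(dim):
            total[k] += vec[k]
    return total


DIM = 4  # 200 in the embodiment; 4 keeps the example readable
term_vectors = {
    "screen": [0.5, 0.0, 1.0, 0.0],
    "keeps": [0.0, 0.5, 0.0, 0.0],
    "shaking": [1.0, 0.5, 0.0, 2.0],
}

short_comment = ["screen", "shaking"]
long_comment = ["screen", "keeps", "shaking", "shaking"]

short_vec = sentence_vector(short_comment, term_vectors, DIM)
long_vec = sentence_vector(long_comment, term_vectors, DIM)

# Summing keeps the dimension fixed; concatenating would grow with length.
concat_dim = len(long_comment) * DIM
print(short_vec, long_vec, concat_dim)
```

Both comments end up with vectors of length `DIM`, whereas concatenation would give the second comment a vector four times as long, which is the growth the text warns about.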
Step 3: set several comment types, and label each comment with the comment type it belongs to according to manually input labels:
We manually label the comment statements with 5 comment types, numbered 1 to 5 (1 is very poor, 2 is poor, 3 is average, 4 is fairly good, 5 is very good).
Step 4: train a classifier with the sentence vectors as input and the comment type corresponding to each sentence vector as output:
The present embodiment uses the well-performing GBDT (Gradient Boosting Decision Tree) classification algorithm and performs supervised learning on the labeled sentence vector training set to obtain a sentiment classifier.
GBDT is an iterative decision tree algorithm whose training is based on boosting. Its main idea is that each new model is built along the gradient descent direction of the loss function of the models built so far. During training we tuned two GBDT parameters: the number of decision trees nTree and the maximum depth depth of each tree. Our practical experience shows that results are good when nTree is set to 2 times the number of input features and depth is within 10.
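To make the boosting idea concrete, here is a deliberately simplified sketch: gradient boosting for regression with squared loss, one input feature, and depth-1 trees (stumps). It is not the multi-class GBDT classifier of the embodiment; it only illustrates how each new tree is fitted to the negative gradient (here, the residual) of the model built so far. All names and the toy data are assumptions. If "input features" is read as the sentence vector dimension, the rule of thumb above would give nTree = 400 for 200-dimensional vectors.

```python
def fit_stump(xs, residuals):
    """Depth-1 regression tree on one feature, chosen by least squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = sum((r - lmean) ** 2 for r in left) + sum((r - rmean) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean


def fit_boosted(xs, ys, n_trees, learning_rate=0.5):
    base = sum(ys) / len(ys)  # initial constant model
    trees = []
    preds = [base] * len(xs)
    for _ in range(n_trees):
        # For the loss (y - p)^2 / 2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    return predict


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 2, 4, 5, 5]
for n in (0, 1, 20):
    print(n, mse(fit_boosted(xs, ys, n), xs, ys))
```

The training error shrinks as trees are added, which is why the number of trees and their depth are the two parameters worth tuning.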
Step 5: obtain a new comment and convert it into a sentence vector:
Specifically, word segmentation software is used to segment the new comment into basic words. The term vector of each basic word of the new comment is looked up in the comment dictionary, and the term vectors are superposed to obtain the sentence vector.
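The text does not say what happens when a word of the new comment is missing from the comment dictionary. The sketch below assumes such words are skipped and reported; that policy, like the names used, is an assumption rather than part of the described method.

```python
def vector_for_new_comment(words, term_vectors, dim):
    """Sum the term vectors of known words; collect the words that are unknown."""
    total = [0.0] * dim
    unknown = []
    for w in words:
        vec = term_vectors.get(w)
        if vec is None:
            unknown.append(w)  # assumption: skip words not in the dictionary
            continue
        total = [a + b for a, b in zip(total, vec)]
    return total, unknown


term_vectors = {"battery": [1.0, 0.0], "drains": [0.5, 0.5]}
vec, unknown = vector_for_new_comment(
    ["battery", "drains", "overnight"], term_vectors, dim=2
)
print(vec, unknown)
```

Returning the unknown words makes it easy to monitor how much of a new comment the dictionary actually covers.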
Step 6: input the sentence vector of the new comment into the classifier to obtain the comment type of the new comment.
To make the comment types more informative, keywords can be extracted for each comment type.
Therefore, in another embodiment of the present invention, step 3 further comprises:
Step 31: calculate the key weight of each basic word in every comment of the comment type;
Step 32: sort the basic words of all comments of this comment type in descending order of key weight;
Step 33: select the first several distinct basic words as the keywords of the comment type. In the present embodiment the first 5 distinct basic words are chosen as keywords, for example class 1: crash, freeze, noise, jitter, sluggish.
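Steps 32 and 33 can be sketched as a sort followed by a distinct-word filter. The weights below are invented for illustration, and `top_keywords` is a hypothetical name.

```python
def top_keywords(weighted_words, n=5):
    """Return the first n distinct words in descending order of key weight.

    weighted_words: (word, key_weight) pairs from all comments of one type.
    """
    if not 0 < n <= 5:
        raise ValueError("n must be a natural number between 1 and 5")
    ranked = sorted(weighted_words, key=lambda pair: pair[1], reverse=True)
    keywords = []
    for word, _ in ranked:
        if word not in keywords:  # the same word may occur in several comments
            keywords.append(word)
        if len(keywords) == n:
            break
    return keywords


weighted = [
    ("crash", 0.9), ("freeze", 0.7), ("crash", 0.8), ("noise", 0.5),
    ("slow", 0.4), ("shake", 0.6), ("app", 0.1),
]
print(top_keywords(weighted, n=5))
```

A word that occurs in several comments appears several times in the ranking, so the distinctness check is what keeps the keyword list free of repeats.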
The present embodiment calculates the key weight of each basic word using TF-IDF combined with part of speech.
That is, the key weight of a word consists of two parts: a TF-IDF weight and a part-of-speech weight, both defined for $t_{i,j}$, the i-th basic word in the j-th comment.
The two parts are computed as follows.
The TF-IDF weight is:
$\mathrm{tfidf}(t_{i,j}) = \frac{n_{i,j}}{\sum_k n_{k,j}} \times \log \frac{|D|}{|\{k : t_i \in d_k\}|}$
where $n_{i,j}$ is the number of occurrences of basic word i in the j-th comment, $\sum_k n_{k,j}$ is the total number of basic words in that comment, $|D|$ is the number of comment statements in the comment corpus, $d_k$ is the k-th comment, and $|\{k : t_i \in d_k\}|$ is the number of comment statements in the corpus that contain a basic word identical to $t_i$.
The part-of-speech weight $\mathrm{pos}(t_{i,j})$ is a piecewise factor whose value depends on the part of speech of the basic word. In general we consider it largest for adjectives, followed by verbs, nouns, adverbs and other parts of speech. For example, it is 1 if the basic word is an adjective, 0.8 if a verb, 0.6 if a noun, 0.2 if an adverb, and 0 for any other part of speech.
The part of speech of each basic word is obtained from the word segmentation software at the same time as the statement is segmented.
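A possible implementation of the key weight is sketched below. The part-of-speech values are the ones given above and the TF-IDF part follows the standard definition. How the two parts are combined is not stated in the text as it stands, so the product used here is an assumption, as are the function name and the toy corpus.

```python
import math

# Part-of-speech factors from the embodiment; anything else counts as 0.
POS_WEIGHT = {"adjective": 1.0, "verb": 0.8, "noun": 0.6, "adverb": 0.2}


def key_weights(corpus):
    """corpus: list of comments, each a list of (word, part_of_speech) pairs.

    Returns one dict per comment mapping word -> key weight.
    """
    n_docs = len(corpus)
    doc_freq = {}
    for comment in corpus:
        for word in {w for w, _ in comment}:
            doc_freq[word] = doc_freq.get(word, 0) + 1

    weights = []
    for comment in corpus:
        counts = {}
        for word, _ in comment:
            counts[word] = counts.get(word, 0) + 1
        total = len(comment)
        per_comment = {}
        for word, pos in comment:
            tf = counts[word] / total
            idf = math.log(n_docs / doc_freq[word])
            # Assumption: the two parts are multiplied.
            per_comment[word] = tf * idf * POS_WEIGHT.get(pos, 0.0)
        weights.append(per_comment)
    return weights


corpus = [
    [("phone", "noun"), ("slow", "adjective"), ("very", "adverb")],
    [("phone", "noun"), ("crashes", "verb")],
    [("the", "other"), ("screen", "noun"), ("shakes", "verb")],
]
weights = key_weights(corpus)
print(weights[0])
```

Under the product assumption, words with an unlisted part of speech get weight 0 and can never be selected as keywords, while adjectives that are rare across the corpus rank highest.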
The present invention is not limited to the foregoing embodiments. It extends to any new feature or any new combination of features disclosed in this specification, and to any new method or process step or any new combination of steps disclosed.
Claims (10)
1. A comment analysis method based on term vectors, characterized by comprising:
Step 1: collect user comments to form a comment corpus;
Step 2: convert every comment in the comment corpus into a sentence vector of identical dimension;
Step 3: set several comment types, and label each comment with the comment type it belongs to according to manually input labels;
Step 4: train a classifier with the sentence vectors as input and the comment type corresponding to each sentence vector as output;
Step 5: obtain a new comment and convert it into a sentence vector;
Step 6: input the sentence vector of the new comment into the classifier to obtain the comment type of the new comment.
2. The comment analysis method based on term vectors according to claim 1, characterized in that step 2 further comprises:
Step 21: segment each comment into several basic words, and deduplicate the basic words to obtain a comment dictionary;
Step 22: convert each basic word into a term vector; the term vectors of all basic words have the same dimension;
Step 23: superpose the term vectors of the basic words in each comment to obtain the sentence vector of that comment;
Step 5 further comprises:
Step 51: segment the new comment into several basic words;
Step 52: look up, in the comment dictionary, the term vector of each basic word obtained in step 51;
Step 53: superpose the term vectors of the basic words of the new comment to obtain the sentence vector of the new comment.
3. The comment analysis method based on term vectors according to claim 2, characterized in that step 22 further comprises: using the basic words as input to a neural network model, so that the model learns the term vector of each basic word by unsupervised learning.
4. The comment analysis method based on term vectors according to claim 2 or 3, characterized in that the term vector dimension is 200.
5. The comment analysis method based on term vectors according to claim 2, characterized in that step 3 further comprises performing the following processing on the comments of each comment type:
Step 31: calculate the key weight of each basic word in every comment of the comment type;
Step 32: sort the basic words of all comments of this comment type in descending order of key weight;
Step 33: select the first n distinct basic words as the keywords of the comment type, where n is a natural number greater than 0 and less than or equal to 5.
6. A comment analysis system based on term vectors, characterized by comprising:
Comment collection module, for collecting user comments to form a comment corpus;
Sample sentence vector conversion module, for converting every comment in the comment corpus into a sentence vector of identical dimension;
Comment type labeling module, for setting several comment types and labeling each comment with the comment type it belongs to according to manually input labels;
Classifier training module, for training a classifier with the sentence vectors as input and the comment type corresponding to each sentence vector as output;
Comment sentence vector conversion module, for obtaining a new comment and converting it into a sentence vector;
Classifier, for computing the comment type of the new comment from the sentence vector of the new comment.
7. The comment analysis system based on term vectors according to claim 6, characterized in that the sample sentence vector conversion module further comprises:
Sample word segmentation module, for segmenting each comment in the comment corpus into several basic words and deduplicating the basic words to obtain a comment dictionary;
Sample term vector conversion module, for converting each basic word into a term vector; the term vectors of all basic words have the same dimension;
Sample term vector superposition module, for superposing the term vectors of the basic words in each comment to obtain the sentence vector of each comment in the comment corpus;
The comment sentence vector conversion module further comprises:
Comment word segmentation module, for segmenting the new comment into several basic words;
Comment term vector conversion module, for looking up in the comment dictionary the term vector of each basic word in the new comment;
Comment term vector superposition module, for superposing the term vectors of the basic words of the new comment to obtain the sentence vector of the new comment.
8. The comment analysis system based on term vectors according to claim 7, characterized in that the sample term vector conversion module is further used to feed the basic words into a neural network model, so that the model learns the term vector of each basic word by unsupervised learning.
9. The comment analysis system based on term vectors according to claim 7 or 8, characterized in that the term vector dimension is 200.
10. The comment analysis system based on term vectors according to claim 7, characterized in that the comment type labeling module further comprises:
Key weight calculation module, for calculating the key weight of each basic word in every comment of a comment type;
Sorting module, for sorting the basic words of all comments of this comment type in descending order of key weight;
Keyword selection module, for selecting the first n distinct basic words as the keywords of the comment type, where n is a natural number greater than 0 and less than or equal to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510027614.3A CN104573046B (en) | 2015-01-20 | 2015-01-20 | A kind of comment and analysis method and system based on term vector |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510027614.3A CN104573046B (en) | 2015-01-20 | 2015-01-20 | A kind of comment and analysis method and system based on term vector |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104573046A | 2015-04-29 |
CN104573046B | 2018-07-31 |
Family
ID=53089108
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510027614.3A Active CN104573046B (en) | 2015-01-20 | 2015-01-20 | A kind of comment and analysis method and system based on term vector |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104573046B (en) |
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105447206A (en) * | 2016-01-05 | 2016-03-30 | 深圳市中易科技有限责任公司 | New comment object identifying method and system based on word2vec algorithm |
CN105512687A (en) * | 2015-12-15 | 2016-04-20 | 北京锐安科技有限公司 | Emotion classification model training and textual emotion polarity analysis method and system |
CN105740382A (en) * | 2016-01-27 | 2016-07-06 | 中山大学 | Aspect classification method for short comment texts |
CN105809186A (en) * | 2016-02-25 | 2016-07-27 | 中国科学院声学研究所 | Emotion classification method and system |
CN105955965A (en) * | 2016-06-21 | 2016-09-21 | 上海智臻智能网络科技股份有限公司 | Question information processing method and device |
CN106095746A (en) * | 2016-06-01 | 2016-11-09 | 竹间智能科技(上海)有限公司 | Word emotion identification system and method |
CN106156004A (en) * | 2016-07-04 | 2016-11-23 | 中国传媒大学 | The sentiment analysis system and method for film comment information based on term vector |
CN106372086A (en) * | 2015-07-23 | 2017-02-01 | 华中师范大学 | Word vector acquisition method and apparatus |
CN106502994A (en) * | 2016-11-29 | 2017-03-15 | 上海智臻智能网络科技股份有限公司 | A kind of method and apparatus of the keyword extraction of text |
CN106776562A (en) * | 2016-12-20 | 2017-05-31 | 上海智臻智能网络科技股份有限公司 | A kind of keyword extracting method and extraction system |
CN106815192A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | Model training method and device and sentence emotion identification method and device |
CN106815193A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | Model training method and device and wrong word recognition methods and device |
CN106815198A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | The recognition methods of model training method and device and sentence type of service and device |
CN106815194A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | Model training method and device and keyword recognition method and device |
CN106844339A (en) * | 2017-01-09 | 2017-06-13 | 南京大学 | A kind of multi-platform control corresponding method based on term vector |
CN107291780A (en) * | 2016-04-12 | 2017-10-24 | 腾讯科技(深圳)有限公司 | A kind of user comment information methods of exhibiting and device |
WO2018010147A1 (en) * | 2016-07-14 | 2018-01-18 | Linkedin Corporation | User feed with professional and nonprofessional content |
CN107861936A (en) * | 2016-09-28 | 2018-03-30 | 平安科技(深圳)有限公司 | The polarity probability analysis method and device of sentence |
CN108182175A (en) * | 2017-12-29 | 2018-06-19 | 中国银联股份有限公司 | A kind of text quality's index selection method and device |
CN108205542A (en) * | 2016-12-16 | 2018-06-26 | 北京酷我科技有限公司 | A kind of analysis method and system of song comment |
CN108597519A (en) * | 2018-04-04 | 2018-09-28 | 百度在线网络技术(北京)有限公司 | A kind of bill classification method, apparatus, server and storage medium |
CN108664474A (en) * | 2018-05-21 | 2018-10-16 | 众安信息技术服务有限公司 | A kind of resume analytic method based on deep learning |
CN108829672A (en) * | 2018-06-05 | 2018-11-16 | 平安科技(深圳)有限公司 | Sentiment analysis method, apparatus, computer equipment and the storage medium of text |
CN108984775A (en) * | 2018-07-24 | 2018-12-11 | 南京新贝金服科技有限公司 | A kind of public sentiment monitoring method and system based on comment on commodity |
CN109255027A (en) * | 2018-08-27 | 2019-01-22 | 上海宝尊电子商务有限公司 | A kind of method and apparatus of electric business comment sentiment analysis noise reduction |
CN109359190A (en) * | 2018-08-17 | 2019-02-19 | 中国电子科技集团公司第三十研究所 | A kind of position analysis model construction method based on evaluation object camp |
CN109583208A (en) * | 2018-12-03 | 2019-04-05 | 华东计算技术研究所(中国电子科技集团公司第三十二研究所) | Malicious software identification method and system based on mobile application comment data |
CN109960442A (en) * | 2017-12-14 | 2019-07-02 | 腾讯科技(深圳)有限公司 | Transmission method, device, storage medium and the electronic device of prompt information |
CN110175851A (en) * | 2019-02-28 | 2019-08-27 | 腾讯科技(深圳)有限公司 | A kind of cheating detection method and device |
US10521482B2 (en) | 2017-04-24 | 2019-12-31 | Microsoft Technology Licensing, Llc | Finding members with similar data attributes of a user for recommending new social connections |
CN110866800A (en) * | 2019-09-23 | 2020-03-06 | 车智互联(北京)科技有限公司 | Comment generation method and computing device |
CN111274776A (en) * | 2020-01-21 | 2020-06-12 | 中国搜索信息科技股份有限公司 | Article generation method based on keywords |
CN111460224A (en) * | 2020-03-27 | 2020-07-28 | 广州虎牙科技有限公司 | Comment data quality labeling method, device, equipment and storage medium |
CN111753082A (en) * | 2020-03-23 | 2020-10-09 | 北京沃东天骏信息技术有限公司 | Text classification method and device based on comment data, equipment and medium |
WO2021121252A1 (en) * | 2019-12-17 | 2021-06-24 | Beijing Didi Infinity Technology And Development Co., Ltd. | Comment-based behavior prediction |
CN113393276A (en) * | 2021-06-25 | 2021-09-14 | 食亨(上海)科技服务有限公司 | Comment data classification method and device and computer readable medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070078845A1 (en) * | 2005-09-30 | 2007-04-05 | Scott James K | Identifying clusters of similar reviews and displaying representative reviews from multiple clusters |
CN102682000A (en) * | 2011-03-09 | 2012-09-19 | 北京百度网讯科技有限公司 | Text clustering method, question-answering system applying same and search engine applying same |
CN103064971A (en) * | 2013-01-05 | 2013-04-24 | 南京邮电大学 | Scoring and Chinese sentiment analysis based review spam detection method |
CN103116637A (en) * | 2013-02-08 | 2013-05-22 | 无锡南理工科技发展有限公司 | Text sentiment classification method facing Chinese Web comments |
CN104036010A (en) * | 2014-06-25 | 2014-09-10 | 华东师范大学 | Semi-supervised CBOW based user search term subject classification method |
- 2015-01-20: CN application CN201510027614.3A, patent CN104573046B, status: Active
Cited By (52)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106372086B (en) * | 2015-07-23 | 2019-12-03 | 华中师范大学 | A kind of method and apparatus obtaining term vector |
CN106372086A (en) * | 2015-07-23 | 2017-02-01 | 华中师范大学 | Word vector acquisition method and apparatus |
CN106815193A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | Model training method and device and wrong word recognition methods and device |
CN106815192A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | Model training method and device and sentence emotion identification method and device |
CN106815192B (en) * | 2015-11-27 | 2020-04-21 | 北京国双科技有限公司 | Model training method and device and sentence emotion recognition method and device |
CN106815198A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | The recognition methods of model training method and device and sentence type of service and device |
CN106815194A (en) * | 2015-11-27 | 2017-06-09 | 北京国双科技有限公司 | Model training method and device and keyword recognition method and device |
CN105512687A (en) * | 2015-12-15 | 2016-04-20 | 北京锐安科技有限公司 | Emotion classification model training and textual emotion polarity analysis method and system |
CN105447206B (en) * | 2016-01-05 | 2017-04-05 | 深圳市中易科技有限责任公司 | New comment object identifying method and system based on word2vec algorithms |
CN105447206A (en) * | 2016-01-05 | 2016-03-30 | 深圳市中易科技有限责任公司 | New comment object identifying method and system based on word2vec algorithm |
CN105740382A (en) * | 2016-01-27 | 2016-07-06 | 中山大学 | Aspect classification method for short comment texts |
CN105809186A (en) * | 2016-02-25 | 2016-07-27 | 中国科学院声学研究所 | Emotion classification method and system |
CN107291780A (en) * | 2016-04-12 | 2017-10-24 | 腾讯科技(深圳)有限公司 | A kind of user comment information methods of exhibiting and device |
CN106095746A (en) * | 2016-06-01 | 2016-11-09 | 竹间智能科技(上海)有限公司 | Word emotion identification system and method |
CN106095746B (en) * | 2016-06-01 | 2019-05-10 | 竹间智能科技(上海)有限公司 | Text emotion identification system and method |
CN105955965A (en) * | 2016-06-21 | 2016-09-21 | 上海智臻智能网络科技股份有限公司 | Question information processing method and device |
CN106156004A (en) * | 2016-07-04 | 2016-11-23 | 中国传媒大学 | The sentiment analysis system and method for film comment information based on term vector |
CN106156004B (en) * | 2016-07-04 | 2019-03-26 | 中国传媒大学 | The sentiment analysis system and method for film comment information based on term vector |
WO2018010147A1 (en) * | 2016-07-14 | 2018-01-18 | Linkedin Corporation | User feed with professional and nonprofessional content |
CN107861936A (en) * | 2016-09-28 | 2018-03-30 | 平安科技(深圳)有限公司 | The polarity probability analysis method and device of sentence |
CN106502994A (en) * | 2016-11-29 | 2017-03-15 | 上海智臻智能网络科技股份有限公司 | A kind of method and apparatus of the keyword extraction of text |
CN106502994B (en) * | 2016-11-29 | 2019-12-13 | 上海智臻智能网络科技股份有限公司 | method and device for extracting keywords of text |
CN108205542A (en) * | 2016-12-16 | 2018-06-26 | 北京酷我科技有限公司 | A kind of analysis method and system of song comment |
CN106776562A (en) * | 2016-12-20 | 2017-05-31 | 上海智臻智能网络科技股份有限公司 | A kind of keyword extracting method and extraction system |
CN106776562B (en) * | 2016-12-20 | 2020-07-28 | 上海智臻智能网络科技股份有限公司 | Keyword extraction method and extraction system |
CN106844339B (en) * | 2017-01-09 | 2020-04-28 | 南京大学 | Word vector-based multi-platform control corresponding method |
CN106844339A (en) * | 2017-01-09 | 2017-06-13 | 南京大学 | A kind of multi-platform control corresponding method based on term vector |
US10521482B2 (en) | 2017-04-24 | 2019-12-31 | Microsoft Technology Licensing, Llc | Finding members with similar data attributes of a user for recommending new social connections |
CN109960442A (en) * | 2017-12-14 | 2019-07-02 | 腾讯科技(深圳)有限公司 | Transmission method, device, storage medium and the electronic device of prompt information |
CN109960442B (en) * | 2017-12-14 | 2022-12-13 | 腾讯科技(深圳)有限公司 | Prompt information transmission method and device, storage medium and electronic device |
CN108182175A (en) * | 2017-12-29 | 2018-06-19 | 中国银联股份有限公司 | A kind of text quality's index selection method and device |
CN108597519A (en) * | 2018-04-04 | 2018-09-28 | 百度在线网络技术(北京)有限公司 | A kind of bill classification method, apparatus, server and storage medium |
CN108664474A (en) * | 2018-05-21 | 2018-10-16 | 众安信息技术服务有限公司 | A kind of resume analytic method based on deep learning |
CN108829672A (en) * | 2018-06-05 | 2018-11-16 | 平安科技(深圳)有限公司 | Sentiment analysis method, apparatus, computer equipment and the storage medium of text |
WO2019232893A1 (en) * | 2018-06-05 | 2019-12-12 | 平安科技(深圳)有限公司 | Method and device for text emotion analysis, computer apparatus and storage medium |
CN108984775B (en) * | 2018-07-24 | 2020-05-22 | 南京新贝金服科技有限公司 | Public opinion monitoring method and system based on commodity comments |
CN108984775A (en) * | 2018-07-24 | 2018-12-11 | 南京新贝金服科技有限公司 | A kind of public sentiment monitoring method and system based on comment on commodity |
CN109359190A (en) * | 2018-08-17 | 2019-02-19 | 中国电子科技集团公司第三十研究所 | A kind of position analysis model construction method based on evaluation object camp |
CN109255027A (en) * | 2018-08-27 | 2019-01-22 | 上海宝尊电子商务有限公司 | A kind of method and apparatus of electric business comment sentiment analysis noise reduction |
CN109255027B (en) * | 2018-08-27 | 2022-06-24 | 上海宝尊电子商务有限公司 | E-commerce comment sentiment analysis noise reduction method and device |
CN109583208A (en) * | 2018-12-03 | 2019-04-05 | 华东计算技术研究所(中国电子科技集团公司第三十二研究所) | Malicious software identification method and system based on mobile application comment data |
CN110175851A (en) * | 2019-02-28 | 2019-08-27 | 腾讯科技(深圳)有限公司 | A kind of cheating detection method and device |
CN110175851B (en) * | 2019-02-28 | 2023-09-12 | 腾讯科技(深圳)有限公司 | Cheating behavior detection method and device |
CN110866800A (en) * | 2019-09-23 | 2020-03-06 | 车智互联(北京)科技有限公司 | Comment generation method and computing device |
WO2021121252A1 (en) * | 2019-12-17 | 2021-06-24 | Beijing Didi Infinity Technology And Development Co., Ltd. | Comment-based behavior prediction |
CN111274776A (en) * | 2020-01-21 | 2020-06-12 | 中国搜索信息科技股份有限公司 | Article generation method based on keywords |
CN111274776B (en) * | 2020-01-21 | 2020-12-15 | 中国搜索信息科技股份有限公司 | Article generation method based on keywords |
CN111753082A (en) * | 2020-03-23 | 2020-10-09 | 北京沃东天骏信息技术有限公司 | Text classification method and device based on comment data, equipment and medium |
CN111460224A (en) * | 2020-03-27 | 2020-07-28 | 广州虎牙科技有限公司 | Comment data quality labeling method, device, equipment and storage medium |
CN111460224B (en) * | 2020-03-27 | 2024-03-08 | 广州虎牙科技有限公司 | Comment data quality labeling method, comment data quality labeling device, comment data quality labeling equipment and storage medium |
CN113393276A (en) * | 2021-06-25 | 2021-09-14 | 食亨(上海)科技服务有限公司 | Comment data classification method and device and computer readable medium |
CN113393276B (en) * | 2021-06-25 | 2023-06-16 | 食亨(上海)科技服务有限公司 | Comment data classification method, comment data classification device and computer-readable medium |
Also Published As
Publication number | Publication date |
---|---|
CN104573046B (en) | 2018-07-31 |
Similar Documents
Publication | Title |
---|---|
CN104573046A (en) | Comment analyzing method and system based on term vector |
CN107122340B (en) | A kind of similarity detection method of the science and technology item return based on synonym analysis |
CN104778209B (en) | A kind of opining mining method for millions scale news analysis |
CN105808526B (en) | Commodity short text core word extracting method and device |
CN104199857B (en) | A kind of tax document hierarchy classification method based on multi-tag classification |
CN104951548B (en) | A kind of computational methods and system of negative public sentiment index |
CN104391942B (en) | Short essay eigen extended method based on semantic collection of illustrative plates |
CN102289522B (en) | Method of intelligently classifying texts |
CN103473280B (en) | Method for mining comparable network language materials |
CN107861939A (en) | A kind of domain entities disambiguation method for merging term vector and topic model |
CN109299480A (en) | Terminology Translation method and device based on context of co-text |
CN105550269A (en) | Product comment analyzing method and system with learning supervising function |
CN107451278A (en) | Chinese Text Categorization based on more hidden layer extreme learning machines |
CN106156272A (en) | A kind of information retrieval method based on multi-source semantic analysis |
CN106021223A (en) | Sentence similarity calculation method and system |
CN104778161A (en) | Keyword extracting method based on Word2Vec and Query log |
CN105045812A (en) | Text topic classification method and system |
CN104731768B (en) | A kind of location of incident abstracting method towards Chinese newsletter archive |
CN105389379A (en) | Rubbish article classification method based on distributed feature representation of text |
CN101127042A (en) | Sensibility classification method based on language model |
CN101944099A (en) | Method for automatically classifying text documents by utilizing body |
CN102411563A (en) | Method, device and system for identifying target words |
CN103324628A (en) | Industry classification method and system for text publishing |
CN106844632A (en) | Based on the product review sensibility classification method and device that improve SVMs |
CN107291895B (en) | Quick hierarchical document query method |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
GR01 | Patent grant | |
CP02 | Change in the address of a patent holder | |
CP02 | Change in the address of a patent holder | Address after: 610015 floor 13, building 1, No.1268, middle section of Tianfu Avenue, Chengdu high tech Zone, China (Sichuan) pilot Free Trade Zone, Chengdu; Patentee after: Chengdu PinGuo Digital Entertainment Ltd.; Address before: 610041 C12-16 building, Tianfu Software Park, hi tech Zone, Sichuan, Chengdu; Patentee before: Chengdu PinGuo Digital Entertainment Ltd. |