CN104573046A - Comment analyzing method and system based on term vector - Google Patents


Info

Publication number
CN104573046A
Authority
CN
China
Prior art keywords
comment
term vector
vector
basic
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510027614.3A
Other languages
Chinese (zh)
Other versions
CN104573046B (en)
Inventor
廖博森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Pinguo Technology Co Ltd
Original Assignee
Chengdu Pinguo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Pinguo Technology Co Ltd
Priority to CN201510027614.3A
Publication of CN104573046A
Application granted
Publication of CN104573046B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Abstract

The invention discloses a comment analysis method and system based on term vectors, and relates to technical fields such as sentiment analysis and natural language processing. User comments are analyzed automatically by machine, which improves working efficiency. The method is as follows: user comments are collected to form a comment corpus; each comment in the comment corpus is converted into a sentence vector, all sentence vectors having the same dimension; a plurality of comment types are set, and each comment is labeled with the comment type it belongs to according to manually input labels; a classifier is trained with the sentence vectors as input and the comment type corresponding to each sentence vector as output; a new comment is acquired and converted into a sentence vector; and the sentence vector of the new comment is input into the classifier to obtain the comment type of the new comment.

Description

A comment analysis method and system based on term vectors
Technical field
The present invention relates to technical fields such as sentiment analysis and natural language processing.
Background
With the development of e-commerce, users post more and more comments about products online. Analyzing these comments reveals users' views on and suggestions for a product, which helps improve the product and raise service quality. However, as the number of users keeps growing, the volume of comments grows with it. If comments still have to be read manually to understand user opinions, work efficiency drops sharply, and users' opinions or suggestions about a product or service cannot be understood in time.
Summary of the invention
In view of the above, the present invention proposes a method and system that use a machine to analyze comments. User comments are analyzed automatically by machine, which improves work efficiency.
The comment analysis method based on term vectors in the present invention comprises:
Step 1: collect user comments to form a comment corpus;
Step 2: convert every comment in the comment corpus into a sentence vector, all sentence vectors having the same dimension;
Step 3: set several comment types, and label every comment with the comment type it belongs to according to manually input labels;
Step 4: train a classifier with the sentence vectors as input and the comment type corresponding to each sentence vector as output;
Step 5: obtain a new comment and convert it into a sentence vector;
Step 6: input the sentence vector of the new comment into the classifier to obtain the comment type of the new comment.
Step 2 further comprises:
Step 21: divide each comment into several basic participles, and de-duplicate the basic participles to obtain a comment dictionary;
Step 22: convert each basic participle into a term vector, the term vectors of all basic participles having the same dimension;
Step 23: superpose (sum) the term vectors of the basic participles in each comment to obtain the sentence vector of that comment.
Step 5 further comprises:
Step 51: divide the new comment into several basic participles;
Step 52: look up, in the comment dictionary, the term vector corresponding to each basic participle obtained in step 51;
Step 53: superpose the term vectors of the basic participles of the new comment to obtain the sentence vector of the new comment.
Step 22 further comprises: using the basic participles as the input of a neural network model, so that the neural network model learns the term vector of each basic participle by unsupervised learning.
Preferably, the term vector dimension is 200.
Step 3 further comprises processing the comments in each comment type as follows:
Step 31: calculate the key weight of the basic participles in each comment of the comment type;
Step 32: sort the basic participles of all comments in the comment type in descending order of key weight;
Step 33: select the first n distinct basic participles as the keywords of the comment type, where n is a natural number greater than 0 and less than or equal to 5.
The present invention also provides a comment analysis system based on term vectors, comprising:
Comment collection module, for collecting user comments to form a comment corpus;
Sample sentence vector conversion module, for converting every comment in the comment corpus into a sentence vector, all sentence vectors having the same dimension;
Comment type labeling module, for setting several comment types and labeling every comment with the comment type it belongs to according to manually input labels;
Classifier training module, for training a classifier with the sentence vectors as input and the comment type corresponding to each sentence vector as output;
Comment sentence vector conversion module, for obtaining a new comment and converting it into a sentence vector;
Classifier, for computing the comment type of the new comment from the sentence vector of the new comment.
The sample sentence vector conversion module further comprises:
Sample segmentation module, for dividing each comment in the comment corpus into several basic participles and de-duplicating the basic participles to obtain a comment dictionary;
Sample term vector conversion module, for converting each basic participle into a term vector, the term vectors of all basic participles having the same dimension;
Sample term vector superposition module, for superposing the term vectors of the basic participles in each comment to obtain the sentence vector of each comment in the comment corpus.
The comment sentence vector conversion module further comprises:
Comment segmentation module, for dividing the new comment into several basic participles;
Comment term vector conversion module, for looking up, in the comment dictionary, the term vector corresponding to each basic participle of the new comment;
Comment term vector superposition module, for superposing the term vectors of the basic participles of the new comment to obtain the sentence vector of the new comment.
The sample term vector conversion module is further used for feeding the basic participles into a neural network model as input, so that the neural network model learns the term vector of each basic participle by unsupervised learning.
Preferably, the term vector dimension is 200.
The comment type labeling module further comprises:
Key weight computation module, for calculating the key weight of the basic participles in each comment of a comment type;
Sorting module, for sorting the basic participles of all comments in the comment type in descending order of key weight;
Keyword selection module, for selecting the first n distinct basic participles as the keywords of the comment type, where n is a natural number greater than 0 and less than or equal to 5.
In summary, by adopting the above technical scheme, the invention has the following beneficial effects:
The present invention automates comment analysis by machine, which greatly improves work efficiency.
The present invention uses a neural network model to compute the vectors of basic participles. Term vectors obtained in this way not only represent their corresponding basic participles accurately but also reflect the relationships between words, so the degree of intelligence is higher.
The present invention obtains sentence vectors by superposing term vectors, which prevents the sentence vector dimension from growing. Because trained term vectors in effect map words into a new topic space, superposing the term vectors still represents well how the sentence maps into that feature space. This not only avoids sentence feature vectors that are too sparse or have too many dimensions, but also represents sentence features well in a low-dimensional space without affecting classification performance.
Embodiments
All features disclosed in this specification, and all steps of any method or process disclosed, may be combined in any way, except for mutually exclusive features and/or steps.
Any feature disclosed in this specification may, unless specifically stated otherwise, be replaced by other equivalent features or alternative features serving a similar purpose. That is, unless specifically stated otherwise, each feature is only one example of a series of equivalent or similar features.
A specific embodiment of the present invention comprises the following steps:
Step 1: collect and organize user comments to form a comment corpus. Specifically, a web crawler can be used to collect users' comment statements from major websites to form the comment corpus. A web crawler is a program that automatically fetches web page content and is an important component of search engines. The more comment statements are collected, the more complete the resulting comment corpus.
Step 2: convert every comment in the comment corpus into a sentence vector, all sentence vectors having the same dimension. This further comprises using word-segmentation software to divide each comment statement into basic participles (here "basic participle" is a noun meaning an individual word produced by segmentation, while segmentation itself is the act of splitting a sentence). After every comment in the comment corpus has been segmented, all the basic participles obtained are de-duplicated to give the comment dictionary. Each basic participle in the comment dictionary is then converted into a term vector.
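The dictionary-building part of step 2 (segment every comment, then de-duplicate) can be sketched as follows. This is only an illustration under stated assumptions: the specification does not name the word-segmentation software, so the hypothetical `segment` function below simply splits on whitespace as a stand-in, and the example comments are invented.

```python
def segment(comment):
    """Hypothetical stand-in for the word-segmentation software.

    Assumption: splitting on whitespace is enough for this English demo;
    a real system would call a proper segmenter here.
    """
    return comment.lower().split()


def build_comment_dictionary(corpus):
    """Segment every comment and de-duplicate the basic participles.

    The first occurrence of each basic participle decides its position,
    so the resulting dictionary order is deterministic.
    """
    dictionary = []
    seen = set()
    for comment in corpus:
        for participle in segment(comment):
            if participle not in seen:
                seen.add(participle)
                dictionary.append(participle)
    return dictionary


# Invented two-comment corpus for illustration only.
corpus = ["camera app crashes often", "camera filters look great"]
dictionary = build_comment_dictionary(corpus)
print(dictionary)
```

Keeping the dictionary as an ordered list of unique entries makes it straightforward to attach one term vector to each entry later.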
The present embodiment uses deep learning to train the term vector model:
To highlight the advantages of the term vectors used in the present invention, the limitations of the traditional bag-of-words model are set out first.
The traditional bag-of-words model represents each word as one feature in a feature vector. Suppose there is a dictionary containing 10 words; each word then needs a 10-dimensional vector. For example, "good" in the dictionary can be represented in the bag-of-words model as v('good') = [0,1,0,0,0,0,0,0,0,0], "bad" as v('bad') = [0,0,1,0,0,0,0,0,0,0], and so on.
Representing words with this bag-of-words model has two limitations. First, when the dictionary is very large, for example on the order of ten million words, ten-million-dimensional vectors are needed, which leads to the curse of dimensionality (the "dimension disaster"), so feature selection or feature extraction is required. Second, such a representation makes it hard to find relationships between words: 'fantastic' and 'good' are similar, for example, but the bag-of-words model gives no good way to measure the similarity between them.
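The second limitation can be made concrete with a small sketch. The 10-word dictionary below is invented for illustration; only the positions of "good" and "bad" are chosen to match the vectors given above. Any two different words get orthogonal vectors, so their dot product is always 0 regardless of meaning.

```python
def one_hot(word, dictionary):
    """Return the one-feature-per-word vector for `word`."""
    vector = [0] * len(dictionary)
    vector[dictionary.index(word)] = 1
    return vector


def dot(a, b):
    """Plain dot product, used here as a simple similarity measure."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 10-word dictionary; the word list is an assumption.
dictionary = ["the", "good", "bad", "fantastic", "camera",
              "app", "is", "very", "slow", "fast"]

v_good = one_hot("good", dictionary)
v_bad = one_hot("bad", dictionary)
v_fantastic = one_hot("fantastic", dictionary)

# 'good' is exactly as dissimilar to 'fantastic' as it is to 'bad'.
print(dot(v_good, v_fantastic), dot(v_good, v_bad))
```

The vector length also equals the dictionary size, which is the first limitation: a ten-million-word dictionary would need ten-million-dimensional vectors.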
These two limitations prompted us to improve the term vector representation. We use a neural network model: all basic participles in the comment dictionary are fed into the neural network model as training samples, and the model learns 200-dimensional term vector features by unsupervised learning. In other embodiments, the term vector dimension may also be 50, 100, 150, and so on.
The term vectors of all basic participles in a comment of the comment corpus are superposed to obtain the sentence vector of that comment.
Suppose a comment statement is $S$, and $w_i$ denotes the i-th basic participle of this comment after segmentation. Then:
$$S = w_1, w_2, \ldots, w_i, \ldots, w_n$$
where $n$ is the number of words in the sentence.
In the present embodiment, each basic participle $w_i$ is represented as a vector of length 200:
$$V_{w_i} = \{v_1, v_2, v_3, \ldots, v_i, \ldots, v_{200}\}$$
where each dimension represents the value of this word in an abstract dimension.
According to the superposition principle of the present embodiment, the sentence vector of this comment is expressed as:
$$V_S = \sum_{w_i \in S} V_{w_i}$$
In this way every comment statement in the comment corpus is represented as a 200-dimensional feature vector, which avoids the curse of dimensionality and also lets the relationships between words be reflected in the features.
The benefit of this approach is that the dimension of the sentence vector stays constant no matter how many basic participles a comment statement contains. With the traditional approach, in which each basic participle in a statement is replaced by its term vector, a statement with 10 basic participles would have a sentence vector of dimension 2000, which again risks the curse of dimensionality.
Because trained term vectors in effect map words into a new topic space, adding up the term vectors of the basic participles in a sentence represents well how the sentence maps into that feature space. The results bear this out: the approach avoids sentence feature vectors that are too sparse or have too many dimensions, represents sentence features well in a low-dimensional space, and does not affect classification performance.
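A minimal sketch of the superposition, contrasted with the concatenation approach described above, is given below. The 4-dimensional vectors are invented stand-ins for the 200-dimensional term vectors of the embodiment, and skipping basic participles that are missing from the dictionary is an assumption, since the specification does not say how such words are handled.

```python
def sentence_vector(participles, term_vectors, dim):
    """Sum the term vectors of the basic participles of one comment."""
    total = [0.0] * dim
    for participle in participles:
        vector = term_vectors.get(participle)
        if vector is None:
            continue  # assumption: ignore words absent from the dictionary
        total = [t + v for t, v in zip(total, vector)]
    return total


def concatenated_vector(participles, term_vectors):
    """The 'traditional' alternative: place the term vectors end to end."""
    joined = []
    for participle in participles:
        joined.extend(term_vectors[participle])
    return joined


DIM = 4  # stand-in for the 200 dimensions used in the embodiment
term_vectors = {  # invented values for illustration
    "camera": [0.5, 0.0, 1.0, 0.0],
    "crashes": [0.0, 2.0, 0.0, 1.0],
    "often": [0.5, 1.0, 0.0, 0.0],
}

short_comment = ["camera", "crashes"]
longer_comment = ["camera", "crashes", "often", "often"]

short_vec = sentence_vector(short_comment, term_vectors, DIM)
longer_vec = sentence_vector(longer_comment, term_vectors, DIM)

# The summed vectors keep the same length; concatenation grows with the comment.
print(len(short_vec), len(longer_vec))
print(len(concatenated_vector(longer_comment, term_vectors)))
```

Because every comment ends up with the same vector length, the sentence vectors can be fed to a single classifier without padding or truncation.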
Step 3: set several comment types, and label every comment with the comment type it belongs to according to manually input labels:
We manually classify and label the comment statements according to 5 comment types, numbered 1 to 5 (1 is very poor, 2 is poor, 3 is average, 4 is fairly good, 5 is very good).
Step 4: train a classifier with the sentence vectors as input and the comment type corresponding to each sentence vector as output:
The present embodiment uses the well-performing GBDT (Gradient Boosting Decision Tree) classification algorithm, which learns from the labeled training set of sentence vectors (a supervised learning task, since every training sample carries a comment type label) to obtain a sentiment classifier.
GBDT is an iterative decision tree algorithm whose training is based on Boosting. Its main idea is that each new model is built along the gradient descent direction of the loss function of the models built before it. In our training process we tuned two GBDT parameters: the number of decision trees, nTree, and the maximum depth of each decision tree, depth. Our practical experience shows that results are good when nTree is set to 2 times the number of input features and depth is within 10.
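The training step might look like the sketch below. The specification does not name a GBDT implementation; scikit-learn's `GradientBoostingClassifier` is swapped in here as an assumption, with `n_estimators` playing the role of nTree and `max_depth` the role of depth. The 4-dimensional sentence vectors and their labels are invented toy data, not results from the patent.

```python
from sklearn.ensemble import GradientBoostingClassifier

# Invented 4-dimensional sentence vectors; the embodiment uses 200 dimensions.
X_train = [
    [0.9, 0.1, 0.0, 0.2],
    [0.8, 0.2, 0.1, 0.1],
    [1.0, 0.0, 0.1, 0.3],
    [0.1, 0.9, 0.8, 0.0],
    [0.2, 1.0, 0.9, 0.1],
    [0.0, 0.8, 1.0, 0.2],
]
# Manually assigned comment types (only types 1 and 5 appear in this toy set).
y_train = [1, 1, 1, 5, 5, 5]

n_features = len(X_train[0])
classifier = GradientBoostingClassifier(
    n_estimators=2 * n_features,  # nTree: 2 times the number of input features
    max_depth=3,                  # depth: kept within 10
    random_state=0,               # fixed seed so the demo is repeatable
)
classifier.fit(X_train, y_train)

# Classify the sentence vectors of two new comments.
new_vectors = [[0.95, 0.05, 0.05, 0.2], [0.1, 0.95, 0.9, 0.1]]
predicted = classifier.predict(new_vectors)
print(list(predicted))
```

With 200-dimensional sentence vectors the same rule of thumb would give 400 trees; whether that setting transfers to another corpus is something to check by validation rather than assume.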
Step 5: obtain a new comment and convert it into a sentence vector:
Specifically, word-segmentation software is used to segment the new comment into basic participles. The term vector of each basic participle of the new comment is looked up in the comment dictionary, and the term vectors of the basic participles are superposed to obtain the sentence vector.
Step 6: input the sentence vector of the new comment into the classifier to obtain the comment type of the new comment.
To make the comment types more indicative, we can extract keywords for each comment type.
Therefore, in another embodiment of the present invention, step 3 further comprises:
Step 31: calculate the key weight of the basic participles in each comment of the comment type;
Step 32: sort the basic participles of all comments in the comment type in descending order of key weight;
Step 33: select the first several distinct basic participles as the keywords of the comment type. In the present embodiment the first 5 distinct basic participles are chosen as keywords, for example, class 1: crash, freeze, noise, shake, sluggish.
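Steps 32 and 33 can be sketched as a sort followed by a de-duplicating scan. The function name `top_keywords` and the weighted pairs below are hypothetical; the key weights are assumed to have been computed already in step 31.

```python
def top_keywords(weighted_participles, n=5):
    """Return the first n distinct basic participles by descending key weight.

    `weighted_participles` holds (participle, key_weight) pairs gathered
    from all comments of one comment type, so a participle may repeat.
    """
    if not 0 < n <= 5:
        raise ValueError("n must be a natural number with 0 < n <= 5")
    # Step 32: descending sort by key weight.
    ranked = sorted(weighted_participles, key=lambda pair: pair[1], reverse=True)
    # Step 33: keep the first n distinct participles.
    keywords = []
    for participle, _weight in ranked:
        if participle not in keywords:
            keywords.append(participle)
        if len(keywords) == n:
            break
    return keywords


# Invented weights for one comment type.
weighted = [("crash", 0.9), ("slow", 0.4), ("crash", 0.7),
            ("noise", 0.6), ("the", 0.0)]
print(top_keywords(weighted, n=3))
```

If a comment type has fewer than n distinct basic participles, this sketch simply returns all of them.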
The present embodiment uses TFIDF combined with part of speech to calculate the key weight of a basic participle.
That is, the key weight of a word is made up of two parts:
$$W_{w_{i,j}} = P_{w_{i,j}} \times T_{w_{i,j}}$$
where $T_{w_{i,j}}$ is the TFIDF weight, $P_{w_{i,j}}$ is the part-of-speech weight, and $w_{i,j}$ denotes the i-th basic participle in the j-th comment.
The two parts are calculated as follows:
$$T_{w_{i,j}} = \frac{n_{i,j}}{\sum_{i} n_{i,j}} \times \log\left(\frac{|D|}{|\{k : w_{i,j} \in d_k\}|}\right)$$
where $n_{i,j}$ is the number of occurrences of basic participle i in the j-th comment, $\sum_{i} n_{i,j}$ is the total number of basic participles in that comment, $|D|$ is the number of comment statements in the comment corpus, $d_k$ denotes the k-th comment, and $|\{k : w_{i,j} \in d_k\}|$ is the number of comment statements in the comment corpus that contain the basic participle $w_{i,j}$.
$P_{w_{i,j}}$ is a piecewise factor whose value depends on the part of speech of the basic participle. In general, we consider the weight largest for adjectives, followed by verbs, nouns, adverbs, and other parts of speech. For example, if the basic participle is an adjective, $P_{w_{i,j}}$ is 1; if a verb, 0.8; if a noun, 0.6; if an adverb, 0.2; for any other part of speech, 0.
The part of speech of each basic participle is obtained from the word-segmentation software at the same time as the statement is segmented.
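The key weight computation can be sketched as below. The part-of-speech weights are the ones listed above; the function names, the part-of-speech tag strings, the tiny corpus, and the use of the natural logarithm are assumptions made for the demo (the specification writes only "log" without giving a base).

```python
import math

# Part-of-speech weights from the embodiment; any other tag gets 0.
POS_WEIGHT = {"adjective": 1.0, "verb": 0.8, "noun": 0.6, "adverb": 0.2}


def tfidf_weight(participle, comment, corpus):
    """T: term frequency in this comment times inverse document frequency."""
    term_frequency = comment.count(participle) / len(comment)
    containing = sum(1 for other in corpus if participle in other)
    # Assumption: natural logarithm.
    return term_frequency * math.log(len(corpus) / containing)


def key_weight(participle, part_of_speech, comment, corpus):
    """W = P * T for one basic participle of one comment."""
    pos_factor = POS_WEIGHT.get(part_of_speech, 0.0)
    return pos_factor * tfidf_weight(participle, comment, corpus)


# Invented, already segmented corpus; `comment` must be one of its entries.
corpus = [
    ["app", "slow", "slow"],
    ["app", "great"],
    ["app", "crashes"],
    ["camera", "great"],
]
comment = corpus[0]

w_slow = key_weight("slow", "adjective", comment, corpus)
w_app = key_weight("app", "noun", comment, corpus)
print(w_slow > w_app)
```

In this toy corpus "slow" is both frequent within its comment and rare across the corpus, so it outweighs "app", which appears in three of the four comments.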
The present invention is not limited to the foregoing embodiments. It extends to any new feature or any new combination of features disclosed in this specification, and to the steps of any new method or process disclosed, or any new combination thereof.

Claims (10)

1. A comment analysis method based on term vectors, characterized by comprising:
Step 1: collect user comments to form a comment corpus;
Step 2: convert every comment in the comment corpus into a sentence vector, all sentence vectors having the same dimension;
Step 3: set several comment types, and label every comment with the comment type it belongs to according to manually input labels;
Step 4: train a classifier with the sentence vectors as input and the comment type corresponding to each sentence vector as output;
Step 5: obtain a new comment and convert it into a sentence vector;
Step 6: input the sentence vector of the new comment into the classifier to obtain the comment type of the new comment.
2. The comment analysis method based on term vectors according to claim 1, characterized in that step 2 further comprises:
Step 21: divide each comment into several basic participles, and de-duplicate the basic participles to obtain a comment dictionary;
Step 22: convert each basic participle into a term vector, the term vectors of all basic participles having the same dimension;
Step 23: superpose the term vectors of the basic participles in each comment to obtain the sentence vector of that comment;
and step 5 further comprises:
Step 51: divide the new comment into several basic participles;
Step 52: look up, in the comment dictionary, the term vector corresponding to each basic participle obtained in step 51;
Step 53: superpose the term vectors of the basic participles of the new comment to obtain the sentence vector of the new comment.
3. The comment analysis method based on term vectors according to claim 2, characterized in that step 22 further comprises: using the basic participles as the input of a neural network model, so that the neural network model learns the term vector of each basic participle by unsupervised learning.
4. The comment analysis method based on term vectors according to claim 2 or 3, characterized in that the term vector dimension is 200.
5. The comment analysis method based on term vectors according to claim 2, characterized in that step 3 further comprises processing the comments in each comment type as follows:
Step 31: calculate the key weight of the basic participles in each comment of the comment type;
Step 32: sort the basic participles of all comments in the comment type in descending order of key weight;
Step 33: select the first n distinct basic participles as the keywords of the comment type, where n is a natural number greater than 0 and less than or equal to 5.
6. A comment analysis system based on term vectors, characterized by comprising:
Comment collection module, for collecting user comments to form a comment corpus;
Sample sentence vector conversion module, for converting every comment in the comment corpus into a sentence vector, all sentence vectors having the same dimension;
Comment type labeling module, for setting several comment types and labeling every comment with the comment type it belongs to according to manually input labels;
Classifier training module, for training a classifier with the sentence vectors as input and the comment type corresponding to each sentence vector as output;
Comment sentence vector conversion module, for obtaining a new comment and converting it into a sentence vector;
Classifier, for computing the comment type of the new comment from the sentence vector of the new comment.
7. The comment analysis system based on term vectors according to claim 6, characterized in that the sample sentence vector conversion module further comprises:
Sample segmentation module, for dividing each comment in the comment corpus into several basic participles and de-duplicating the basic participles to obtain a comment dictionary;
Sample term vector conversion module, for converting each basic participle into a term vector, the term vectors of all basic participles having the same dimension;
Sample term vector superposition module, for superposing the term vectors of the basic participles in each comment to obtain the sentence vector of each comment in the comment corpus;
and the comment sentence vector conversion module further comprises:
Comment segmentation module, for dividing the new comment into several basic participles;
Comment term vector conversion module, for looking up, in the comment dictionary, the term vector corresponding to each basic participle of the new comment;
Comment term vector superposition module, for superposing the term vectors of the basic participles of the new comment to obtain the sentence vector of the new comment.
8. The comment analysis system based on term vectors according to claim 7, characterized in that the sample term vector conversion module is further used for feeding the basic participles into a neural network model as input, so that the neural network model learns the term vector of each basic participle by unsupervised learning.
9. The comment analysis system based on term vectors according to claim 7 or 8, characterized in that the term vector dimension is 200.
10. The comment analysis system based on term vectors according to claim 7, characterized in that the comment type labeling module further comprises:
Key weight computation module, for calculating the key weight of the basic participles in each comment of a comment type;
Sorting module, for sorting the basic participles of all comments in the comment type in descending order of key weight;
Keyword selection module, for selecting the first n distinct basic participles as the keywords of the comment type, where n is a natural number greater than 0 and less than or equal to 5.
CN201510027614.3A 2015-01-20 2015-01-20 A kind of comment and analysis method and system based on term vector Active CN104573046B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510027614.3A CN104573046B (en) 2015-01-20 2015-01-20 A kind of comment and analysis method and system based on term vector

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510027614.3A CN104573046B (en) 2015-01-20 2015-01-20 A kind of comment and analysis method and system based on term vector

Publications (2)

Publication Number Publication Date
CN104573046A 2015-04-29
CN104573046B 2018-07-31

Family

ID=53089108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510027614.3A Active CN104573046B (en) 2015-01-20 2015-01-20 A kind of comment and analysis method and system based on term vector

Country Status (1)

Country Link
CN (1) CN104573046B (en)

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447206A (en) * 2016-01-05 2016-03-30 深圳市中易科技有限责任公司 New comment object identifying method and system based on word2vec algorithm
CN105512687A (en) * 2015-12-15 2016-04-20 北京锐安科技有限公司 Emotion classification model training and textual emotion polarity analysis method and system
CN105740382A (en) * 2016-01-27 2016-07-06 中山大学 Aspect classification method for short comment texts
CN105809186A (en) * 2016-02-25 2016-07-27 中国科学院声学研究所 Emotion classification method and system
CN105955965A (en) * 2016-06-21 2016-09-21 上海智臻智能网络科技股份有限公司 Question information processing method and device
CN106095746A (en) * 2016-06-01 2016-11-09 竹间智能科技(上海)有限公司 Word emotion identification system and method
CN106156004A (en) * 2016-07-04 2016-11-23 中国传媒大学 The sentiment analysis system and method for film comment information based on term vector
CN106372086A (en) * 2015-07-23 2017-02-01 华中师范大学 Word vector acquisition method and apparatus
CN106502994A (en) * 2016-11-29 2017-03-15 上海智臻智能网络科技股份有限公司 A kind of method and apparatus of the keyword extraction of text
CN106776562A (en) * 2016-12-20 2017-05-31 上海智臻智能网络科技股份有限公司 A kind of keyword extracting method and extraction system
CN106815192A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and sentence emotion identification method and device
CN106815193A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and wrong word recognition methods and device
CN106815198A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 The recognition methods of model training method and device and sentence type of service and device
CN106815194A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and keyword recognition method and device
CN106844339A (en) * 2017-01-09 2017-06-13 南京大学 A kind of multi-platform control corresponding method based on term vector
CN107291780A (en) * 2016-04-12 2017-10-24 腾讯科技(深圳)有限公司 A kind of user comment information methods of exhibiting and device
WO2018010147A1 (en) * 2016-07-14 2018-01-18 Linkedin Corporation User feed with professional and nonprofessional content
CN107861936A (en) * 2016-09-28 2018-03-30 平安科技(深圳)有限公司 The polarity probability analysis method and device of sentence
CN108182175A (en) * 2017-12-29 2018-06-19 中国银联股份有限公司 A kind of text quality's index selection method and device
CN108205542A (en) * 2016-12-16 2018-06-26 北京酷我科技有限公司 A kind of analysis method and system of song comment
CN108597519A (en) * 2018-04-04 2018-09-28 百度在线网络技术(北京)有限公司 A kind of bill classification method, apparatus, server and storage medium
CN108664474A (en) * 2018-05-21 2018-10-16 众安信息技术服务有限公司 A kind of resume analytic method based on deep learning
CN108829672A (en) * 2018-06-05 2018-11-16 平安科技(深圳)有限公司 Sentiment analysis method, apparatus, computer equipment and the storage medium of text
CN108984775A (en) * 2018-07-24 2018-12-11 南京新贝金服科技有限公司 A kind of public sentiment monitoring method and system based on comment on commodity
CN109255027A (en) * 2018-08-27 2019-01-22 上海宝尊电子商务有限公司 A kind of method and apparatus of electric business comment sentiment analysis noise reduction
CN109359190A (en) * 2018-08-17 2019-02-19 中国电子科技集团公司第三十研究所 A kind of position analysis model construction method based on evaluation object camp
CN109583208A (en) * 2018-12-03 2019-04-05 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Malicious software identification method and system based on mobile application comment data
CN109960442A (en) * 2017-12-14 2019-07-02 腾讯科技(深圳)有限公司 Transmission method, device, storage medium and the electronic device of prompt information
CN110175851A (en) * 2019-02-28 2019-08-27 腾讯科技(深圳)有限公司 A kind of cheating detection method and device
US10521482B2 (en) 2017-04-24 2019-12-31 Microsoft Technology Licensing, Llc Finding members with similar data attributes of a user for recommending new social connections
CN110866800A (en) * 2019-09-23 2020-03-06 车智互联(北京)科技有限公司 Comment generation method and computing device
CN111274776A (en) * 2020-01-21 2020-06-12 中国搜索信息科技股份有限公司 Article generation method based on keywords
CN111460224A (en) * 2020-03-27 2020-07-28 广州虎牙科技有限公司 Comment data quality labeling method, device, equipment and storage medium
CN111753082A (en) * 2020-03-23 2020-10-09 北京沃东天骏信息技术有限公司 Text classification method and device based on comment data, equipment and medium
WO2021121252A1 (en) * 2019-12-17 2021-06-24 Beijing Didi Infinity Technology And Development Co., Ltd. Comment-based behavior prediction
CN113393276A (en) * 2021-06-25 2021-09-14 食亨(上海)科技服务有限公司 Comment data classification method and device and computer readable medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070078845A1 (en) * 2005-09-30 2007-04-05 Scott James K Identifying clusters of similar reviews and displaying representative reviews from multiple clusters
CN102682000A (en) * 2011-03-09 2012-09-19 北京百度网讯科技有限公司 Text clustering method, question-answering system applying same and search engine applying same
CN103064971A (en) * 2013-01-05 2013-04-24 南京邮电大学 Scoring and Chinese sentiment analysis based review spam detection method
CN103116637A (en) * 2013-02-08 2013-05-22 无锡南理工科技发展有限公司 Text sentiment classification method facing Chinese Web comments
CN104036010A (en) * 2014-06-25 2014-09-10 华东师范大学 Semi-supervised CBOW based user search term subject classification method

Cited By (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372086B (en) * 2015-07-23 2019-12-03 华中师范大学 Word vector acquisition method and apparatus
CN106372086A (en) * 2015-07-23 2017-02-01 华中师范大学 Word vector acquisition method and apparatus
CN106815193A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and wrong word recognition method and device
CN106815192A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and sentence emotion identification method and device
CN106815192B (en) * 2015-11-27 2020-04-21 北京国双科技有限公司 Model training method and device and sentence emotion recognition method and device
CN106815198A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and sentence service type recognition method and device
CN106815194A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and keyword recognition method and device
CN105512687A (en) * 2015-12-15 2016-04-20 北京锐安科技有限公司 Emotion classification model training and textual emotion polarity analysis method and system
CN105447206B (en) * 2016-01-05 2017-04-05 深圳市中易科技有限责任公司 New comment object identifying method and system based on word2vec algorithms
CN105447206A (en) * 2016-01-05 2016-03-30 深圳市中易科技有限责任公司 New comment object identifying method and system based on word2vec algorithm
CN105740382A (en) * 2016-01-27 2016-07-06 中山大学 Aspect classification method for short comment texts
CN105809186A (en) * 2016-02-25 2016-07-27 中国科学院声学研究所 Emotion classification method and system
CN107291780A (en) * 2016-04-12 2017-10-24 腾讯科技(深圳)有限公司 User comment information display method and device
CN106095746A (en) * 2016-06-01 2016-11-09 竹间智能科技(上海)有限公司 Word emotion identification system and method
CN106095746B (en) * 2016-06-01 2019-05-10 竹间智能科技(上海)有限公司 Text emotion identification system and method
CN105955965A (en) * 2016-06-21 2016-09-21 上海智臻智能网络科技股份有限公司 Question information processing method and device
CN106156004A (en) * 2016-07-04 2016-11-23 中国传媒大学 Sentiment analysis system and method for film comment information based on term vector
CN106156004B (en) * 2016-07-04 2019-03-26 中国传媒大学 Sentiment analysis system and method for film comment information based on term vector
WO2018010147A1 (en) * 2016-07-14 2018-01-18 Linkedin Corporation User feed with professional and nonprofessional content
CN107861936A (en) * 2016-09-28 2018-03-30 平安科技(深圳)有限公司 Sentence polarity probability analysis method and device
CN106502994A (en) * 2016-11-29 2017-03-15 上海智臻智能网络科技股份有限公司 Method and device for extracting keywords of text
CN106502994B (en) * 2016-11-29 2019-12-13 上海智臻智能网络科技股份有限公司 Method and device for extracting keywords of text
CN108205542A (en) * 2016-12-16 2018-06-26 北京酷我科技有限公司 Song comment analysis method and system
CN106776562A (en) * 2016-12-20 2017-05-31 上海智臻智能网络科技股份有限公司 Keyword extraction method and extraction system
CN106776562B (en) * 2016-12-20 2020-07-28 上海智臻智能网络科技股份有限公司 Keyword extraction method and extraction system
CN106844339B (en) * 2017-01-09 2020-04-28 南京大学 Word vector-based multi-platform control corresponding method
CN106844339A (en) * 2017-01-09 2017-06-13 南京大学 Multi-platform control corresponding method based on term vector
US10521482B2 (en) 2017-04-24 2019-12-31 Microsoft Technology Licensing, Llc Finding members with similar data attributes of a user for recommending new social connections
CN109960442A (en) * 2017-12-14 2019-07-02 腾讯科技(深圳)有限公司 Prompt information transmission method and device, storage medium and electronic device
CN109960442B (en) * 2017-12-14 2022-12-13 腾讯科技(深圳)有限公司 Prompt information transmission method and device, storage medium and electronic device
CN108182175A (en) * 2017-12-29 2018-06-19 中国银联股份有限公司 Text quality index selection method and device
CN108597519A (en) * 2018-04-04 2018-09-28 百度在线网络技术(北京)有限公司 Bill classification method, apparatus, server and storage medium
CN108664474A (en) * 2018-05-21 2018-10-16 众安信息技术服务有限公司 Resume parsing method based on deep learning
CN108829672A (en) * 2018-06-05 2018-11-16 平安科技(深圳)有限公司 Text sentiment analysis method, apparatus, computer equipment and storage medium
WO2019232893A1 (en) * 2018-06-05 2019-12-12 平安科技(深圳)有限公司 Method and device for text emotion analysis, computer apparatus and storage medium
CN108984775B (en) * 2018-07-24 2020-05-22 南京新贝金服科技有限公司 Public opinion monitoring method and system based on commodity comments
CN108984775A (en) * 2018-07-24 2018-12-11 南京新贝金服科技有限公司 Public opinion monitoring method and system based on commodity comments
CN109359190A (en) * 2018-08-17 2019-02-19 中国电子科技集团公司第三十研究所 Position analysis model construction method based on evaluation object camp
CN109255027A (en) * 2018-08-27 2019-01-22 上海宝尊电子商务有限公司 E-commerce comment sentiment analysis noise reduction method and device
CN109255027B (en) * 2018-08-27 2022-06-24 上海宝尊电子商务有限公司 E-commerce comment sentiment analysis noise reduction method and device
CN109583208A (en) * 2018-12-03 2019-04-05 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Malicious software identification method and system based on mobile application comment data
CN110175851A (en) * 2019-02-28 2019-08-27 腾讯科技(深圳)有限公司 Cheating detection method and device
CN110175851B (en) * 2019-02-28 2023-09-12 腾讯科技(深圳)有限公司 Cheating behavior detection method and device
CN110866800A (en) * 2019-09-23 2020-03-06 车智互联(北京)科技有限公司 Comment generation method and computing device
WO2021121252A1 (en) * 2019-12-17 2021-06-24 Beijing Didi Infinity Technology And Development Co., Ltd. Comment-based behavior prediction
CN111274776A (en) * 2020-01-21 2020-06-12 中国搜索信息科技股份有限公司 Article generation method based on keywords
CN111274776B (en) * 2020-01-21 2020-12-15 中国搜索信息科技股份有限公司 Article generation method based on keywords
CN111753082A (en) * 2020-03-23 2020-10-09 北京沃东天骏信息技术有限公司 Text classification method and device based on comment data, equipment and medium
CN111460224A (en) * 2020-03-27 2020-07-28 广州虎牙科技有限公司 Comment data quality labeling method, device, equipment and storage medium
CN111460224B (en) * 2020-03-27 2024-03-08 广州虎牙科技有限公司 Comment data quality labeling method, comment data quality labeling device, comment data quality labeling equipment and storage medium
CN113393276A (en) * 2021-06-25 2021-09-14 食亨(上海)科技服务有限公司 Comment data classification method and device and computer readable medium
CN113393276B (en) * 2021-06-25 2023-06-16 食亨(上海)科技服务有限公司 Comment data classification method, comment data classification device and computer-readable medium

Also Published As

Publication number Publication date
CN104573046B (en) 2018-07-31

Similar Documents

Publication Publication Date Title
CN104573046A (en) Comment analyzing method and system based on term vector
CN107122340B (en) Similarity detection method for science and technology project application documents based on synonym analysis
CN104778209B (en) Opinion mining method for million-scale news analysis
CN105808526B (en) Commodity short text core word extraction method and device
CN104199857B (en) Tax document hierarchical classification method based on multi-label classification
CN104951548B (en) Calculation method and system for negative public opinion index
CN104391942B (en) Short text feature expansion method based on semantic graph
CN102289522B (en) Method of intelligently classifying texts
CN103473280B (en) Method for mining comparable network language materials
CN107861939A (en) Domain entity disambiguation method fusing term vector and topic model
CN109299480A (en) Terminology translation method and device based on context
CN105550269A (en) Product comment analyzing method and system with learning supervising function
CN107451278A (en) Chinese text categorization based on multi-hidden-layer extreme learning machines
CN106156272A (en) Information retrieval method based on multi-source semantic analysis
CN106021223A (en) Sentence similarity calculation method and system
CN104778161A (en) Keyword extracting method based on Word2Vec and Query log
CN105045812A (en) Text topic classification method and system
CN104731768B (en) Event location extraction method for Chinese news texts
CN105389379A (en) Rubbish article classification method based on distributed feature representation of text
CN101127042A (en) Sentiment classification method based on language model
CN101944099A (en) Method for automatically classifying text documents by utilizing ontology
CN102411563A (en) Method, device and system for identifying target words
CN103324628A (en) Industry classification method and system for text publishing
CN106844632A (en) Product review sentiment classification method and device based on improved SVM
CN107291895B (en) Quick hierarchical document query method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: Floor 13, Building 1, No. 1268, Middle Section of Tianfu Avenue, Chengdu High-tech Zone, China (Sichuan) Pilot Free Trade Zone, Chengdu 610015

Patentee after: Chengdu PinGuo Digital Entertainment Ltd.

Address before: Building C12-16, Tianfu Software Park, High-tech Zone, Chengdu, Sichuan 610041

Patentee before: Chengdu PinGuo Digital Entertainment Ltd.