CN105335352A - Entity identification method based on Weibo emotion - Google Patents

Entity identification method based on Weibo emotion


Publication number
CN105335352A
Authority
CN
China
Prior art keywords
emotion
machine learning
word
entity
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510864383.1A
Other languages
Chinese (zh)
Inventor
崔晓辉
朱卫平
张威风
杨威
王志波
李伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201510864383.1A
Publication of CN105335352A
Legal status: Pending


Abstract

The invention provides an entity recognition method based on Weibo (microblog) emotion. The circumplex model of emotion is used as the emotion analysis model, and four emotion keyword dictionaries are generated. Weibo data are acquired through an API collection technique, preprocessed and vectorized; four machine learning algorithms are trained and evaluated with 5-fold cross-validation, and the best-performing machine learning classifier is selected and used to classify a new data set. Finally, entity extraction is performed on the classified data.

Description

Entity recognition method based on microblog emotion
Technical field
The present invention relates to the field of collecting and analyzing big data on the network, and in particular to an entity recognition method based on microblog emotion.
Background art
In China, microblogging is a new social media platform that has grown up only in recent years, so domestic research on sentiment analysis of short microblog texts started relatively late. Among the earlier work on feature extraction, Ye Qiang, Zhang Ziqiong and Luo Zhenxiong built on the widely used N-POS language model for Chinese phrases and proposed 2-POS, a two-word model of Chinese subjective phrases, which laid the foundation for emotion recognition in Chinese text. Later, Xu Jun used machine learning methods such as naive Bayes and maximum entropy for text sentiment mining and classification; the results showed that machine learning methods perform well on sentiment-based classification of Chinese text, with accuracy above 90%. For film reviews, Hu Yi applied the N-Gram language model, naive Bayes classification and support vector machines (SVM) to sentiment classification and found that, when training samples are scarce, the N-Gram language model gives higher classification accuracy and extends well. Building on these studies, research on sentiment-based text mining has kept growing and has spread to related fields; for example, Pang Lei et al. used three classification methods, naive Bayes, SVM and maximum entropy, to classify stock comments on Sina Weibo into bullish and bearish attitudes. Fu Xianghua, Sun, Feng Shi and others studied sentiment analysis of Chinese blogs from different angles. Their contributions include a multi-aspect topic sentiment mining method for Chinese blogs based on a document topic generation model and the HowNet dictionary; the introduction of dictionary-statistics sentiment analysis into microblog sentiment analysis; and SOAD (sentiment orientation analysis based on syntactic dependency), an algorithm that uses syntactic dependency parsing to analyze the sentiment orientation of blog search results.
More broadly, with the development of the Internet, many scholars abroad have in recent years begun sentiment mining research in a wider range of fields, including travel blogs, blogs, and film and television reviews. Sentiment mining aims to extract, by means of dedicated classification techniques, the positive or negative attitudes in consumers' comments on specific products or services. From the classification results, consumers can obtain the information they need to make purchase decisions, and businesses can learn how users react and how their competitors perform. With the widespread use of computer technology, sentiment mining of review content has become a recent research trend and is widely applied in many fields.
Named entity recognition (NER), also called entity recognition, refers to identifying the entities with specific meaning in a piece of text, mainly personal names, place names, organization names and other proper nouns. In recent years, with the rapid development of information retrieval and search engine technology, Chinese named entity recognition has become a hot topic in natural language processing research. According to the state of domestic research, there are currently four main technical approaches to Chinese named entity recognition: statistics-based methods, rule-based methods, methods that combine rules and statistics, and machine-learning-based methods.
(1) Statistics-based methods
The statistical models used for Chinese named entity recognition mainly include hidden Markov models (HMM), decision tree models, support vector machine models, maximum entropy models and conditional random field models. Asahara used support vector machines to recognize Chinese personal names and organization names automatically and achieved fairly good results.
(2) Rule-based methods
Rule-based named entity recognition mainly uses two kinds of information: restrictive components and the named entity words themselves. Tan used transformation-based error-driven learning to obtain contextual association rules for place names and then applied these rules to recognize Chinese place names automatically; tests on a certain data set showed that the accuracy of this method can reach 97%.
(3) Methods combining rules and statistics
Some current mainstream Chinese named entity recognition systems combine rules with statistics: they first use statistical methods for an initial recognition of entities and then use rules to correct and filter the results. Huang Degen used statistics gathered from a large amount of real text to compute the continued word-formation confidence and the word-formation confidence of each name, and then recognized Chinese personal names automatically in combination with certain rules.
(4) Machine-learning-based methods
Named entity recognition in English is much simpler than in Chinese, because English has no word segmentation problem, whereas segmentation accuracy is a key factor in Chinese named entity recognition. English named entity recognition is relatively mature: classifying English words with a support vector machine can reach recognition accuracy above 99% for place names and personal names.
Microblogging, a major media form on social networking sites, is increasingly popular. People tend to get news, commentary, entertainment and other information from microblogs, and the influence of microblogs on the spread of online public opinion keeps growing. Microblog posts carry emotional features with different tendencies, and mining these features is significant for public opinion monitoring, marketing and rumor control. Most sentiment analysis divides text sentiment into three classes: positive, neutral and negative. Applying such coarse-grained analysis directly to a social medium like microblogging does little to help people's understanding, and is not enough to truly take the pulse of society and listen to its emotions.
Summary of the invention
To address the deficiencies of the prior art, the present invention provides an entity analysis technique based on microblog emotion. It offers high recognition accuracy and fast processing, and is suitable for accurate recognition on large-scale data.
To achieve the above object, the present invention adopts the following technical solution: an entity recognition method based on microblog emotion, comprising the following steps:
Step 1. Training stage: select the optimal machine learning algorithm;
Step 1.1: construct four emotion word dictionaries according to the circumplex model of emotion;
The four emotion word dictionaries are mapped onto a two-dimensional coordinate system, whose four quadrants are: happy and active, happy but inactive, unhappy but active, and unhappy and inactive;
Step 1.2: use web API collection to obtain microblog data from the microblog platform, with the four classes of emotion words as keywords, as training data.
Step 1.3: preprocess the collected training data to produce a standardized training data set;
Step 1.4: extract keywords from the training data and vectorize the training data set according to the vector space model;
Punctuation marks and emoticons are also vectorized as tokens, so that the emotion of the text can be analyzed more effectively and accurately. To vectorize them, emoticons and punctuation marks are replaced with corresponding English words, which are then converted to word vectors. For example, a smiley face is replaced with "happy", and the word vector of "happy" might be (1, 0, 0, 1, 1, 2).
Step 1.5: using the preset machine learning algorithms, perform emotion classification and 5-fold cross-validation on the vectorized training data set with each algorithm;
Step 1.6: compute the accuracy and recall of each machine learning algorithm over the 5 cross-validation runs, and select the algorithm with the highest mean accuracy and recall as the optimal machine learning classification algorithm.
Step 2. Experimental stage: use the optimal machine learning classification algorithm obtained in Step 1 to obtain the recognized emotion entities.
Step 2.1: obtain a vectorized experimental data set by the same method as Steps 1.1 to 1.4;
Step 2.2: classify the experimental data set with the optimal machine learning classification algorithm obtained in Step 1 to obtain four emotion data sets;
Step 2.3: perform entity extraction on each of the four emotion data sets to obtain the recognized emotion entities.
Further, the preprocessing in Step 1.3 comprises correcting misspelled words, deleting irrelevant words, deleting ambiguous microblogs, and synonym conversion. Correcting misspelled words means fixing words that are spelled wrongly; deleting irrelevant words means deleting words that contribute nothing to sentiment analysis; deleting ambiguous microblogs means deleting microblogs that have the same text but belong to different emotion classes; synonym conversion means replacing words of equivalent meaning with a single word.
Preferably, in Step 1.4, the TF-IDF algorithm is used to extract keywords; if the text contains emoticons or punctuation marks, common emoticons and punctuation marks that express tone are converted into corresponding words.
Preferably, in Step 1.4, the open-source tool word2vec is used to build word vectors, and the training data set is vectorized according to the vector space model.
Preferably, in Step 2.3, the SENNA deep learning toolkit is used to perform entity extraction on each of the four emotion data sets.
Preferably, in Step 1.5, the preset machine learning algorithms are four: naive Bayes, logistic regression, support vector machine, and K-nearest neighbors.
The present invention performs classification and entity recognition through machine learning and deep learning, and carries out finer-grained entity recognition on microblog emotion, with high recognition accuracy and good results. It provides the following benefits:
1. finer-grained sentiment analysis can be performed after data processing and analysis;
2. the fine-grained sentiment analysis obtained can reflect the emotional state of the microblog user community;
3. it helps governments, organizations and individuals understand and grasp social sentiment.
Brief description of the drawings
Fig. 1 is a flow chart of the present invention.
Detailed description
To make the technical means, inventive features, objects and effects of the present invention easy to understand, the invention is further described below with reference to an embodiment.
The volume of microblog data is very large, and classifying it manually would cost a great deal of manpower, material and money. The hashtag topic label provided by the microblog platform is therefore used as the emotion label of a microblog: if a microblog is tagged with an emotion class label, it is taken to belong to that emotion class.
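The hashtag-based labeling described above can be sketched as follows. This is only an illustration: the lexicon entries, the function name `label_by_hashtag` and the rule that a post matching several classes is left unlabeled are assumptions, not details given in the source.

```python
import re

# Hypothetical four-class emotion lexicon; the words are illustrative only.
EMOTION_LEXICON = {
    "happy_active": {"excited", "thrilled"},
    "happy_inactive": {"calm", "relaxed"},
    "unhappy_active": {"angry", "nervous"},
    "unhappy_inactive": {"sad", "bored"},
}

HASHTAG = re.compile(r"#(\w+)")


def label_by_hashtag(post, lexicon=EMOTION_LEXICON):
    """Return the emotion class named by the post's hashtags, or None.

    Assumption: a post whose hashtags point to more than one class is
    treated as ambiguous and left unlabeled.
    """
    tags = {tag.lower() for tag in HASHTAG.findall(post)}
    classes = {name for name, words in lexicon.items() if tags & words}
    return classes.pop() if len(classes) == 1 else None
```

For instance, `label_by_hashtag("Finally on holiday #relaxed")` falls in the happy but inactive class, while a post tagged with both `#angry` and `#sad` is not labeled.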
An entity recognition method based on microblog emotion comprises the following steps:
Step 1. Training stage: select the optimal machine learning algorithm;
Step 1.1: construct four emotion word dictionaries according to the circumplex model of emotion. The four emotion word dictionaries are mapped onto a two-dimensional coordinate system, whose four quadrants are: happy and active, happy but inactive, unhappy but active, and unhappy and inactive;
Step 1.2: use web API collection to obtain microblog data from the microblog platform, with the four classes of emotion words as keywords, as training data.
Step 1.3: preprocess the collected training data to produce a standardized training data set. The preprocessing comprises: correcting misspelled words, deleting irrelevant words, deleting ambiguous data, and synonym conversion.
Correcting misspelled words means fixing words that are spelled wrongly, for example changing "eta" to "eat". Deleting irrelevant words means deleting words that contribute nothing to sentiment analysis, for example words with no practical meaning such as "the" and "of". Deleting ambiguous microblogs means deleting microblogs that have the same text but belong to different emotion classes. Synonym conversion means replacing words of equivalent meaning with a single word.
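A minimal sketch of the four preprocessing operations is given below. The correction, stop-word and synonym tables are hypothetical examples built around the "eta", "the" and "of" cases mentioned in the text; a real system would need much larger resources, and the order in which the operations are applied is an assumption.

```python
from collections import defaultdict

# All three tables are hypothetical examples, not resources from the source.
SPELLING_FIXES = {"eta": "eat"}
STOPWORDS = {"the", "of", "a"}
SYNONYMS = {"glad": "happy", "joyful": "happy"}


def clean_tokens(text):
    """Correct misspellings, drop irrelevant words, unify synonyms."""
    tokens = []
    for tok in text.lower().split():
        tok = SPELLING_FIXES.get(tok, tok)
        if tok in STOPWORDS:
            continue
        tokens.append(SYNONYMS.get(tok, tok))
    return tokens


def preprocess(labeled_posts):
    """labeled_posts: iterable of (text, emotion_label) pairs.

    Posts whose cleaned text appears under more than one emotion label
    are considered ambiguous and removed.
    """
    labels_by_text = defaultdict(set)
    for text, label in labeled_posts:
        labels_by_text[" ".join(clean_tokens(text))].add(label)
    return [
        (text.split(), labels.pop())
        for text, labels in labels_by_text.items()
        if len(labels) == 1
    ]
```

Grouping by cleaned text also removes exact duplicates, which is a side effect of this sketch rather than a requirement stated in the source.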
Step 1.4: extract keywords from the training data using the TF-IDF algorithm; if the text contains emoticons or punctuation marks, common emoticons and punctuation marks that express tone are converted into corresponding words.
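The keyword extraction of Step 1.4 might look like the following sketch. The symbol-to-word table, the whitespace tokenizer and the particular TF-IDF variant (relative term frequency times the logarithm of inverse document frequency) are assumptions chosen for brevity; the source does not specify them.

```python
import math
from collections import Counter

# Hypothetical mapping of emoticons and tone-bearing punctuation to words.
SYMBOL_WORDS = {":)": "happy", ":(": "sad", "!": "exclaim", "?": "question"}


def tokenize(text):
    """Replace known symbols with words, then split on whitespace."""
    for symbol, word in SYMBOL_WORDS.items():
        text = text.replace(symbol, f" {word} ")
    return text.lower().split()


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    result = []
    for toks in tokenized:
        counts = Counter(toks)
        scores = {
            term: (count / len(toks)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_k])
    return result
```

Words that occur in every post, such as "I" in a small English sample, receive a score of zero and are ranked last, which is why they tend not to be selected as keywords.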
The open-source tool word2vec is used to build word vectors, and the training data set is vectorized according to the vector space model. The vectorization covers not only words but also punctuation marks and emoticons.
The vector space model is a classical text feature model. It was proposed by Salton et al. in the 1960s and was applied successfully in the SMART text retrieval system.
Building word vectors: a word vector represents a word as a vector; for example, "happy" might be represented by the vector (0, 1, 3, 4, 1, 1).
word2vec is an efficient tool, open-sourced by Google in 2013, for representing words as real-valued vectors. This tool is used to represent each word as a vector.
Vectorizing the data set: keywords are extracted from each data item, here with the relatively mature TF-IDF algorithm, and the resulting keywords are converted into word vectors; the data item is then represented by this group of word vectors. For example, from the data item "I want to go home", three keywords can be extracted: "I", "go" and "home". If their word vectors are (1, 0, 1, 0, 1, 3), (0, 1, 2, 3, 0, 0) and (1, 1, 3, 2, 1, 6), the data item can be represented by these three vectors.
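The "I want to go home" example can be reproduced with a small lookup table. The three vectors below are the toy values from the text, not real word2vec output. The source stops at representing a post by its group of keyword vectors; the averaging step (`mean_pool`) is an added assumption, shown only because most classifiers expect one fixed-length vector per post.

```python
# Toy 6-dimensional vectors copied from the example in the text; a real
# system would load vectors trained with word2vec instead.
WORD_VECTORS = {
    "I": (1, 0, 1, 0, 1, 3),
    "go": (0, 1, 2, 3, 0, 0),
    "home": (1, 1, 3, 2, 1, 6),
}


def post_vectors(keywords, table=WORD_VECTORS):
    """Look up the vector of every keyword that has one."""
    return [table[k] for k in keywords if k in table]


def mean_pool(vectors):
    """Average vectors component-wise (an assumption, not from the source)."""
    if not vectors:
        raise ValueError("no known keywords to pool")
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

Other pooling choices, such as concatenation or TF-IDF-weighted averaging, would fit the description equally well.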
Step 1.5: using the preset machine learning algorithms, perform emotion classification and 5-fold cross-validation on the vectorized training data set with each algorithm;
5-fold cross-validation: the data set is randomly divided into 5 equal parts, of which 4 are used as the training set and 1 as the test set. The machine learning algorithm is trained on the training set; once training is complete it yields a decision function, which is then evaluated on the held-out test set, and the accuracy and recall of the classification are computed. This process is repeated 5 times, so that each part serves as the test set once.
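The cross-validation procedure can be sketched in plain Python as below. The function names, the fixed random seed and the use of macro-averaged recall over the classes are assumptions; the source only says that accuracy and recall are computed for each run.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k roughly equal random folds."""
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def accuracy_and_recall(y_true, y_pred):
    """Accuracy plus macro-averaged recall over the classes in y_true."""
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    recalls = []
    for cls in set(y_true):
        relevant = [p for t, p in zip(y_true, y_pred) if t == cls]
        recalls.append(sum(p == cls for p in relevant) / len(relevant))
    return accuracy, sum(recalls) / len(recalls)


def cross_validate(fit, X, y, k=5, seed=0):
    """fit(X_train, y_train) must return a predict(x) function."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(X[i]) for i in test_idx]
        scores.append(accuracy_and_recall([y[i] for i in test_idx], y_pred))
    return scores
```

Any of the four preset algorithms can be plugged in as `fit`, provided it returns a function that predicts a class for a single vector.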
The method presets four machine learning algorithms, as follows:
1. Naive Bayes
The basic principle of naive Bayes is as follows: for a given data item to be classified, compute the probability of each class given that this data item has occurred. This probability is usually called the posterior probability; the data item is assigned to the class whose posterior probability is largest.
The formula is as follows:
$$p(C_k \mid x) = \frac{p(C_k)\, p(x \mid C_k)}{p(x)}$$
In the formula, $p(C_k)$ is the probability of event $C_k$, $p(x)$ is the probability of event $x$, $p(x \mid C_k)$ is the probability of event $x$ given that $C_k$ has occurred, and $p(C_k \mid x)$ is the probability of $C_k$ given that $x$ has occurred.
The program logic is as follows: Ck denotes a class and x denotes the data item to be classified. For a fixed number of classes, P(Ck) is fixed, here 0.25 (that is, 1/4), and for a given data item P(x) is also fixed, so it suffices to find the class with the largest P(x|Ck) to find the class with the largest P(Ck|x). P(x|Ck) is the probability that x occurs in class Ck, and it is estimated from the training set; for example, if class Ck contains 100 items in the training set and x accounts for 10 of them, the probability is 0.1.
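The program logic above can be illustrated with a small multinomial naive Bayes classifier over keywords. This is a sketch under stated assumptions: the word-independence factorization of P(x|Ck), the log-space computation and the add-one (Laplace) smoothing are standard choices that the source does not mention, and the function name is hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """docs: lists of keywords; labels: one class name per document."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}

    def predict(tokens):
        best_class, best_score = None, -math.inf
        for cls, n_docs in class_counts.items():
            total = sum(word_counts[cls].values())
            # log p(C_k) + sum of log p(word | C_k); p(x) is the same for
            # every class and can be dropped from the comparison.
            score = math.log(n_docs / len(labels))
            for w in tokens:
                # Add-one smoothing: an assumption, not in the source.
                score += math.log(
                    (word_counts[cls][w] + 1) / (total + len(vocab))
                )
            if score > best_score:
                best_class, best_score = cls, score
        return best_class

    return predict
```

When every class has the same prior, as in the 0.25 example, the first term is identical across classes and the decision depends only on the likelihood term, which matches the reasoning in the text.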
2. Logistic regression
Logistic regression has much in common with other regression analyses such as multiple linear regression; these regression models all belong to the family of generalized linear models. Within that family, the regression analyses differ mainly in their dependent variable. Building a logistic regression involves the following key steps:
1. Set up the prediction function, which gives the probability that a certain event occurs.
2. Construct the logistic function, that is, the sigmoid function. Because the prediction function is an approximate probability function obtained from the raw training data, its value may fall outside the valid range, for example below 0; the logistic function is therefore introduced to map any number from negative infinity to positive infinity into the interval [0, 1].
3. Use gradient descent to find the regression parameters. In the training stage of the logistic regression classifier, the likelihood function is derived from the form of the logistic function; the parameters are usually estimated by maximum likelihood, with gradient descent used to find their optimal values.
The program logic is as follows: let the feature values of the data set be X1, X2, X3, ... with corresponding weights W1, W2, W3, ..., and let Z = W1 × X1 + W2 × X2 + W3 × X3 + .... The sigmoid function maps the result onto the interval [0, 1]: p = sigmoid(Z), that is, 1 / (1 + exp(-Z)). Gradient descent on the training data then gives the maximum likelihood estimate of each weight. Once the weights are known, the expression of the function is determined, the probability of each class can be computed, and new data can be classified.
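A minimal binary version of this logic is sketched below. The bias term `b`, the learning rate, the number of epochs and the batch update are assumptions added for a working example; for the four emotion classes, one such model per class (one-vs-rest) would be one possible arrangement, which the source does not specify.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.5, epochs=500):
    """Binary logistic regression; y holds 0/1 labels.

    Batch gradient ascent on the log-likelihood, which is equivalent to
    gradient descent on the negative log-likelihood.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0  # bias term: an assumption, not part of Z in the text
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = yi - p
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad_w)]
        b += lr * grad_b / len(X)
    return w, b


def predict_proba(w, b, x):
    """Probability that x belongs to the class labeled 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

After training, a new vector is assigned to the positive class when `predict_proba` exceeds 0.5.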
3. Support vector machine
The support vector machine is a supervised learning algorithm that is widely used in statistical classification and regression. For the training data it constructs one or more hyperplanes in a high-dimensional space, so that data that cannot be separated in a low-dimensional space can be separated in the higher-dimensional one. In text classification, the support vector machine is one of the best classification algorithms.
The program logic is as follows: the main purpose of training a support vector machine is to find the equation of the hyperplane that separates two classes. Let the equation be $W^{T}X + b = 0$, where $W$ and $X$ denote a matrix and a vector respectively, and $X$ here is a word vector. Slack variables and a penalty factor are introduced, and the method of Lagrange multipliers is used to obtain the optimal separating plane. Once the plane function is known, any other vector $X$ can be classified.
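For reference, the standard soft-margin formulation that this description appears to correspond to is written out below. The notation is assumed rather than taken from the source: $X_i$ are the $n$ training vectors, $y_i \in \{-1, +1\}$ their class labels, $\xi_i$ the slack variables, $C$ the penalty factor and $\alpha_i$ the Lagrange multipliers. The formulation covers the two-class case only.

```latex
% Soft-margin primal problem (y_i \in \{-1,+1\} is the class label of X_i)
\min_{W,\,b,\,\xi}\ \frac{1}{2}\lVert W\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\,(W^{T}X_i + b)\ \ge\ 1-\xi_i,\qquad \xi_i\ \ge\ 0

% Dual problem obtained with Lagrange multipliers \alpha_i
\max_{\alpha}\ \sum_{i=1}^{n}\alpha_i
 - \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j\,y_i y_j\,X_i^{T}X_j
\quad\text{s.t.}\quad 0\le\alpha_i\le C,\qquad \sum_{i=1}^{n}\alpha_i y_i = 0

% Recovered plane and decision rule for a new vector X
W=\sum_{i=1}^{n}\alpha_i y_i X_i,\qquad \hat{y}=\operatorname{sign}\!\left(W^{T}X+b\right)
```

A larger $C$ penalizes margin violations more heavily, while a smaller $C$ tolerates more misclassified training points in exchange for a wider margin.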
4. K-nearest neighbors
The K-nearest neighbors algorithm is one of the most mature machine learning algorithms and also one of the simplest. Its basic idea is this: if most of the K data points nearest to a sample in the feature vector space belong to one class, the sample is assigned to that class.
The program logic is as follows: the training vectors are projected into an N-dimensional space; for a new data vector X, the K points nearest to X are found, and if class A is the most common class among these K points, the new data item is assigned to class A.
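The nearest-neighbor vote can be written in a few lines. Euclidean distance and the default of k = 3 are assumptions; the source does not name a distance measure or a value of K.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority label among the k training vectors closest to x."""
    by_distance = sorted(
        zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]
```

Sorting the whole training set is acceptable for a small sample; for large microblog collections an index structure or approximate search would normally replace it.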
Step 1.6: compute the accuracy and recall of each machine learning algorithm over the 5 cross-validation runs, and select the algorithm with the highest mean accuracy and recall as the optimal machine learning classification algorithm.
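One way to read this selection rule is sketched below: average the accuracy and the recall of each algorithm over the folds, combine the two means with equal weight, and keep the algorithm with the highest combined value. The equal weighting and the function name are assumptions, since the source does not say how the two measures are combined.

```python
def select_best_algorithm(cv_scores):
    """cv_scores maps an algorithm name to its per-fold (accuracy, recall)."""

    def mean_score(folds):
        accuracy = sum(a for a, _ in folds) / len(folds)
        recall = sum(r for _, r in folds) / len(folds)
        # Equal weighting of the two means is an assumption.
        return (accuracy + recall) / 2

    return max(cv_scores, key=lambda name: mean_score(cv_scores[name]))
```

The input has the same shape as the list of per-fold pairs produced by a 5-fold cross-validation run, one list per preset algorithm.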
Step 2. Experimental stage: use the optimal machine learning classification algorithm obtained in Step 1 to obtain the recognized emotion entities.
Step 2.1: obtain a vectorized experimental data set by the same method as Steps 1.1 to 1.4;
Step 2.2: classify the experimental data set with the optimal machine learning classification algorithm obtained in Step 1 to obtain four emotion data sets;
Step 2.3: use the SENNA deep learning toolkit to perform entity extraction on each of the four emotion data sets.
The above is the basic principle and main implementation of the present invention. The invention extracts microblog content, applies deep learning to big data, improves the precision of sentiment analysis, and recognizes microblog emotion entities. It helps governments, organizations and institutions study the sentiment of the general public, and is of considerable value in public opinion analysis, social events and event early warning.

Claims (6)

1. An entity recognition method based on microblog emotion, characterized by comprising the following steps:
Step 1. Training stage: select the optimal machine learning algorithm;
Step 1.1: construct four emotion word dictionaries according to the circumplex model of emotion; the four emotion word dictionaries are mapped onto a two-dimensional coordinate system, whose four quadrants are: happy and active, happy but inactive, unhappy but active, and unhappy and inactive;
Step 1.2: use web API collection to obtain microblog data from the microblog platform, with the four classes of emotion words as keywords, as training data;
Step 1.3: preprocess the collected training data to produce a standardized training data set;
Step 1.4: extract keywords from the training data and vectorize the training data set according to the vector space model;
Step 1.5: using the preset machine learning algorithms, perform emotion classification and 5-fold cross-validation on the vectorized training data set with each algorithm;
Step 1.6: compute the accuracy and recall of each machine learning algorithm over the 5 cross-validation runs, and select the algorithm with the highest mean accuracy and recall as the optimal machine learning classification algorithm;
Step 2. Experimental stage: use the optimal machine learning classification algorithm obtained in Step 1 to obtain the recognized emotion entities;
Step 2.1: obtain a vectorized experimental data set by the same method as Steps 1.1 to 1.4;
Step 2.2: classify the experimental data set with the optimal machine learning classification algorithm obtained in Step 1 to obtain four emotion data sets;
Step 2.3: perform entity extraction on each of the four emotion data sets to obtain the recognized emotion entities.
2. The entity recognition method based on microblog emotion according to claim 1, characterized in that the preprocessing in Step 1.3 comprises correcting misspelled words, deleting irrelevant words, deleting ambiguous microblogs, and synonym conversion; correcting misspelled words means fixing words that are spelled wrongly; deleting irrelevant words means deleting words that contribute nothing to sentiment analysis; deleting ambiguous microblogs means deleting microblogs that have the same text but belong to different emotion classes; synonym conversion means replacing words of equivalent meaning with a single word.
3. The entity recognition method based on microblog emotion according to claim 1, characterized in that in Step 1.4 the TF-IDF algorithm is used to extract keywords; if the text contains emoticons or punctuation marks, common emoticons and punctuation marks that express tone are converted into corresponding words.
4. The entity recognition method based on microblog emotion according to claim 1, characterized in that in Step 1.4 the open-source tool word2vec is used to build word vectors, and the training data set is vectorized according to the vector space model.
5. The entity recognition method based on microblog emotion according to claim 1, characterized in that in Step 2.3 the SENNA deep learning toolkit is used to perform entity extraction on each of the four emotion data sets.
6. The entity recognition method based on microblog emotion according to claim 1, characterized in that in Step 1.5 the preset machine learning algorithms are four: naive Bayes, logistic regression, support vector machine, and K-nearest neighbors.
CN201510864383.1A 2015-11-30 2015-11-30 Entity identification method based on Weibo emotion Pending CN105335352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510864383.1A CN105335352A (en) 2015-11-30 2015-11-30 Entity identification method based on Weibo emotion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510864383.1A CN105335352A (en) 2015-11-30 2015-11-30 Entity identification method based on Weibo emotion

Publications (1)

Publication Number Publication Date
CN105335352A true CN105335352A (en) 2016-02-17

Family

ID=55285897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510864383.1A Pending CN105335352A (en) 2015-11-30 2015-11-30 Entity identification method based on Weibo emotion

Country Status (1)

Country Link
CN (1) CN105335352A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110231347A1 (en) * 2010-03-16 2011-09-22 Microsoft Corporation Named Entity Recognition in Query
CN103995803A (en) * 2014-04-25 2014-08-20 西北工业大学 Fine granularity text sentiment analysis method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LEON DERCZYNSKI ET AL.: "Analysis of named entity recognition and linking for tweets", 《INFORMATION PROCESSING AND MANAGEMENT》 *
MARYAM HASAN ET AL.: "EMOTEX: Detecting Emotions in Twitter Messages", 《2014 ASE BIGDATA/SOCIALCOM/CYBERSECURITY CONFERENCE》 *
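LIAO XIANGWEN ET AL.: "Construction and analysis of the corpus for the Third Chinese Opinion Analysis Evaluation (COAE2011)", 《JOURNAL OF CHINESE INFORMATION PROCESSING》 *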

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105809186A (en) * 2016-02-25 2016-07-27 中国科学院声学研究所 Emotion classification method and system
CN105844176A (en) * 2016-03-23 2016-08-10 上海上讯信息技术股份有限公司 Security strategy generation method and equipment
CN105912576A (en) * 2016-03-31 2016-08-31 北京外国语大学 Emotion classification method and emotion classification system
CN105868185A (en) * 2016-05-16 2016-08-17 南京邮电大学 Part-of-speech-tagging-based dictionary construction method applied in shopping comment emotion analysis
CN106056154A (en) * 2016-05-27 2016-10-26 大连楼兰科技股份有限公司 Fault code recognition and classification method
CN106294684A (en) * 2016-08-06 2017-01-04 上海高欣计算机系统有限公司 Word-vector-based text classification method and terminal device
CN106776539A (en) * 2016-11-09 2017-05-31 武汉泰迪智慧科技有限公司 Multi-dimensional short text feature extraction method and system
CN106776566A (en) * 2016-12-22 2017-05-31 东软集团股份有限公司 Method and device for recognizing emotion vocabulary
CN106776566B (en) * 2016-12-22 2019-12-24 东软集团股份有限公司 Method and device for recognizing emotion vocabulary
US10489510B2 (en) 2017-04-20 2019-11-26 Ford Motor Company Sentiment analysis of product reviews from social media
CN107301248A (en) * 2017-07-19 2017-10-27 百度在线网络技术(北京)有限公司 Word vector construction method and device of text, computer equipment and storage medium
CN107301248B (en) * 2017-07-19 2020-07-21 百度在线网络技术(北京)有限公司 Word vector construction method and device of text, computer equipment and storage medium
CN108710620A (en) * 2018-01-18 2018-10-26 郝宁宁 Book recommendation method and system based on user-based k-nearest neighbor algorithm
CN108710620B (en) * 2018-01-18 2022-05-20 日照格朗电子商务有限公司 Book recommendation method based on k-nearest neighbor algorithm of user
CN110609936A (en) * 2018-06-11 2019-12-24 广州华资软件技术有限公司 Intelligent classification method for fuzzy address data
CN108984724A (en) * 2018-07-10 2018-12-11 凯尔博特信息科技(昆山)有限公司 Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
CN108984724B (en) * 2018-07-10 2021-09-28 凯尔博特信息科技(昆山)有限公司 Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
CN109165298B (en) * 2018-08-15 2022-11-15 上海五节数据科技有限公司 Text emotion analysis system capable of achieving automatic upgrading and resisting noise
CN109165298A (en) * 2018-08-15 2019-01-08 上海文军信息技术有限公司 Text emotion analysis system capable of achieving automatic upgrading and resisting noise
CN109739494A (en) * 2018-12-10 2019-05-10 复旦大学 Generative recommendation method for API usage code based on Tree-LSTM
CN109783800B (en) * 2018-12-13 2024-04-12 北京百度网讯科技有限公司 Emotion keyword acquisition method, device, equipment and storage medium
CN109783800A (en) * 2018-12-13 2019-05-21 北京百度网讯科技有限公司 Emotion keyword acquisition method, device, equipment and storage medium
CN109885833A (en) * 2019-02-18 2019-06-14 山东科技大学 Sentiment polarity detection method based on joint embedding of multi-domain datasets
WO2020244073A1 (en) * 2019-06-06 2020-12-10 平安科技(深圳)有限公司 Speech-based user classification method and device, computer apparatus, and storage medium
CN110321562A (en) * 2019-06-28 2019-10-11 广州探迹科技有限公司 Short text matching method and device based on BERT
CN110866087A (en) * 2019-08-12 2020-03-06 上海大学 Entity-oriented text emotion analysis method based on topic model
CN110866087B (en) * 2019-08-12 2023-11-17 上海大学 Entity-oriented text emotion analysis method based on topic model
CN112183067A (en) * 2020-09-23 2021-01-05 夏一雪 Network public opinion artificial intelligence analysis system under big data environment
CN112183067B (en) * 2020-09-23 2022-05-27 夏一雪 Network public opinion artificial intelligence analysis system under big data environment
CN113361585A (en) * 2021-06-02 2021-09-07 浪潮软件科技有限公司 Method for optimizing and screening clues based on supervised learning algorithm

Similar Documents

Publication Publication Date Title
CN105335352A (en) Entity identification method based on Weibo emotion
Jain et al. Application of machine learning techniques to sentiment analysis
Sharif et al. Sentiment analysis of Bengali texts on online restaurant reviews using multinomial Naïve Bayes
US10606946B2 (en) Learning word embedding using morphological knowledge
Baygın Classification of text documents based on Naive Bayes using N-Gram features
Gaikwad et al. Multiclass mood classification on Twitter using lexicon dictionary and machine learning algorithms
CN107357895A (en) Processing method of text representation based on bag-of-words model
Dedhia et al. Ensemble model for Twitter sentiment analysis
Campbell et al. Content+ context networks for user classification in twitter
Krishnan et al. A supervised approach for extractive text summarization using minimal robust features
Patil Fake news detection using majority voting technique
Kaysar et al. Word sense disambiguation of Bengali words using FP-growth algorithm
Shahade et al. Multi-lingual opinion mining for social media discourses: an approach using deep learning based hybrid fine-tuned smith algorithm with adam optimizer
CN113434668B (en) Deep learning text classification method and system based on model fusion
Tang et al. Text semantic understanding based on knowledge enhancement and multi-granular feature extraction
Christopoulou et al. Mixture of topic-based distributional semantic and affective models
Iyer et al. A heterogeneous graphical model to understand user-level sentiments in social media
Ariwibowo et al. Hate Speech Text Classification Using Long Short-Term Memory (LSTM)
Aich et al. Content based spam detection in short text messages with emphasis on dealing with imbalanced datasets
Abainia et al. Neural Text Categorizer for topic identification of noisy Arabic Texts
Babour et al. Tweet sentiment analytics with context sensitive tone-word lexicon
CN111159393B (en) Text generation method for abstract extraction based on LDA and D2V
Jain et al. Text analytics framework using Apache spark and combination of lexical and machine learning techniques
Shahade et al. Deep learning approach-based hybrid fine-tuned Smith algorithm with Adam optimiser for multilingual opinion mining
Barkovska et al. A Conceptual Text Classification Model Based on Two-Factor Selection of Significant Words.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160217
