CN103399940A - Field information retrieval method based on behaviors - Google Patents

Info

Publication number
CN103399940A
Authority
CN
China
Prior art keywords
user
behavior
information
department
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013103494032A
Other languages
Chinese (zh)
Other versions
CN103399940B (en)
Inventor
郝佳
阎艳
王国新
宫琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT
Priority to CN201310349403.2A
Publication of CN103399940A
Application granted
Publication of CN103399940B
Active
Anticipated expiration

Abstract

The invention provides a behavior-based field information retrieval method that optimizes retrieval by using the department membership relations of users and multiple interdependent user behaviors. The method comprises the following steps: first, the department membership relations of the system users (that is, which users belong to which departments) and the activities the users perform while using the information management system are obtained by analyzing the information management system; second, the similarity between pieces of information is calculated from the behavior data; third, a full-text search is performed on the search terms entered by the user, retrieving as much matching content as possible to obtain the full-text matching results; fourth, the full-text search results are filtered and ranked according to the interest reflected by the user's behaviors, so that the content the user is most interested in is ranked first.

Description

Behavior-based field information retrieval method
Technical field
The present invention relates to information retrieval methods, and in particular to a behavior-based field information retrieval method.
Background art
At present, knowledge management systems are widely used in enterprises as platforms for accumulating and reusing enterprise knowledge. Retrieval is a key functional module of a knowledge management system: an effective information retrieval function helps system users find the enterprise knowledge they need quickly and accurately. Existing retrieval usually relies on keyword matching. The problem with this technique is that, no matter who the user is and no matter when the user uses the system, the same keywords always return the same search results. Its obvious shortcoming is that it ignores both the different information needs of different users and the changing information needs of a single user over time. The problem to be solved by the present invention is how to optimize the information retrieval function of a knowledge management system deployed within a specific organization.
Information retrieval is an important core function of many information systems, and a number of patents have proposed technical solutions for optimizing it. Some of these patents optimize retrieval by collecting the logs or behaviors of system users: the user's interests are inferred by analyzing the logs and behaviors, and those interests are then used to filter the search results during retrieval. Such behavior-based or log-based retrieval can solve the above problem to some extent.
However, there is still room for improvement when this approach is applied to a knowledge management system. First, a knowledge management system is usually deployed within a specific organizational structure whose members are generally distributed among departments linked by membership relations, and this department membership information can be used to optimize behavior-based retrieval. Second, the kinds of user behavior that can be collected in a knowledge management system are richer than viewing alone, for example viewing, bookmarking, commenting and sharing. These behaviors are not mutually independent and often depend on one another: commenting generally depends on viewing, because a user normally comments on a piece of information only after viewing it. The present invention proposes a behavior-based field information retrieval technique that optimizes retrieval by using the department membership relations of the members of an organization and the dependencies among multiple behaviors.
Among existing methods, keyword-based retrieval is the most common. It efficiently matches the keywords entered by the user and returns the content that best matches them as the final search result. Because it considers only the match between the information content and the entered keywords, every user who enters the same keywords obtains the same results. It therefore cannot meet the retrieval requirements of enterprise knowledge management systems, which are increasingly personalized. As the demand for personalized retrieval grows, behavior-based retrieval has become an important way to achieve it. Such methods derive the user's interests by analyzing the logs or behaviors collected by the system, but they generally rely only on which content the user has accessed. This is adequate for analyzing the interests of users of ordinary websites, but it does not fully meet the needs of information systems deployed within enterprises. First, the department membership relations between users in an enterprise reveal, to some extent, how similar their interests are. Second, the behavior information that can be collected in such an information management system is richer, and existing behavior-based methods cannot use multiple kinds of behavior information to analyze user interest.
Summary of the invention
The object of the present invention is to provide a behavior-based field information retrieval method that uses the department membership relations between users and multiple interdependent user behaviors to optimize behavior-based information retrieval.
The behavior-based field information retrieval method comprises the following steps:
Step 1: By analyzing the information management system, obtain the department membership relations of the system users, that is, which user belongs to which department and how the departments are related to one another, together with the activities the users perform while using the information management system.
Step 2: Calculate the similarity between pieces of information from the behavior data, as follows:
Step (1): Eliminate the dependencies among the multiple user behaviors in the behavior data to obtain dependency-free behavior data.
In step (1), dependencies are eliminated by removing the behavior records with the lower weight level.
Step (2): Reflect the department membership relations between users in the behavior data set, that is, add behavior records to the behavior data set.
In step (2), the relations are reflected in the data set as follows: for a user who has no record of the relevant behavior, if most users in the same department or in a subordinate department have performed the relevant operation, the current user is considered to be interested in the content.
Step (3): Split the behavior data set into multiple independent data sets according to user behavior: there are as many independent behavior data sets as there are kinds of behavior, and each independent data set represents the correspondence between users, information and one behavior.
Step (4): Calculate the similarity between pieces of information on each independent data set separately, then weight these similarities by the weights of the respective behaviors to obtain the final similarity between pieces of information.
Step 3: Perform a full-text search on the search terms entered by the user, retrieving as much matching content as possible to obtain the full-text matching results.
Step 4: Filter the full-text search results and rank them according to the interest reflected by the user's behaviors, so that the content the user is most interested in is ranked first.
Beneficial effects of the present invention:
The method of the present invention optimizes retrieval within an organizational structure and makes the fullest use of existing information to improve the search results. Compared with other behavior-based methods, it considers not only behavior information but also combines the dependencies among behaviors with the department membership relations of users, and therefore reflects the similarity between pieces of information more accurately.
Description of the drawings
Fig. 1 is a schematic diagram of the behavior-based field information retrieval method of the present invention;
Fig. 2 is an example of the data structure in a specific embodiment of the present invention;
Fig. 3 is the basic framework of the behavior-based field information retrieval method of the present invention;
Fig. 4 shows the 4U2K1A model in a specific embodiment of the present invention;
Fig. 5 shows the 4U2K2A model in a specific embodiment of the present invention;
Fig. 6 shows the 4U(d)2K1A model in a specific embodiment of the present invention;
Fig. 7 shows the 4U(d)2K2A model in a specific embodiment of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawings.
The basic concepts used in the present invention are defined first:
Information: In the present invention, information refers to text or to content that can be described by text. An information base is represented as a co-occurrence matrix in which each row represents one piece of information and each column represents one keyword.
Behavior: A typical information management system provides many functions, such as searching, viewing and bookmarking. When users use these functions, behavior records are produced, and these records reflect the users' interests to varying degrees. Two features of behaviors need explanation: (1) the relations between behaviors differ: some behaviors are mutually independent (for example bookmarking and commenting), while others are dependent (for example bookmarking and viewing). This dependency is relative and can be defined according to the needs of the system; (2) behaviors differ in how strongly they reflect user interest: for example, bookmarking reflects a user's interest more strongly than viewing. Each behavior is therefore assigned a weight that characterizes how strongly it reflects user interest.
User: In the present invention, a user is an entity that uses the information system. A tree structure is used to describe the relations between system users according to their department membership.
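As an illustration of the tree description of users, the following minimal Python sketch derives a 0/1 user-relation matrix from a department tree. The department names, the user names, and the rule that two users are related when they share a department or when one user's department is the direct parent of the other's are all assumptions made for this example; the source only states that a tree structure is used.

```python
# Hypothetical department tree: department -> parent department.
DEPT_PARENT = {"R&D": None, "Design": "R&D", "Testing": "R&D"}
# Hypothetical assignment of users to departments.
USER_DEPT = {"u1": "R&D", "u2": "Design", "u3": "Design", "u4": "Testing"}


def related(user_a, user_b):
    """Return True if two distinct users share a department or their
    departments are in a direct parent/child relation (assumed rule)."""
    if user_a == user_b:
        return False
    dept_a, dept_b = USER_DEPT[user_a], USER_DEPT[user_b]
    return (
        dept_a == dept_b
        or DEPT_PARENT[dept_a] == dept_b
        or DEPT_PARENT[dept_b] == dept_a
    )


def relation_matrix(users):
    """Build a 0/1 matrix whose entry [i][j] is 1 when users i and j are related."""
    return [[int(related(a, b)) for b in users] for a in users]


USERS = ["u1", "u2", "u3", "u4"]
U = relation_matrix(USERS)
```

Under this assumed rule the matrix is symmetric; a system that treats superior and subordinate differently would need a directional rule instead.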
In the present invention, a three-dimensional matrix is used to represent the relations among these three concepts (information, behavior, user). Fig. 2 shows an example of the data structure, in which K denotes information, U denotes users and A denotes behaviors. The example contains 2 pieces of information, 2 users and 2 behaviors. In the figure, 0 indicates that the user has not performed the corresponding behavior on the information, and 1 indicates that the user has.
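The three-dimensional structure can be pictured with a small Python sketch. It is only an illustration: the axis order (behavior, information, user), the behavior names and the 0/1 values are assumptions, since the actual contents of Fig. 2 are not reproduced in the text.

```python
BEHAVIORS = ["view", "bookmark"]  # A: kinds of behavior (hypothetical names)
INFO = ["k1", "k2"]               # K: pieces of information
USERS = ["u1", "u2"]              # U: users

# records[a][k][u] == 1 means user u performed behavior a on information k.
# The 0/1 values below are invented for illustration.
records = [
    [[1, 0],   # view:     k1 viewed by u1
     [1, 1]],  # view:     k2 viewed by u1 and u2
    [[0, 0],   # bookmark: k1 bookmarked by nobody
     [0, 1]],  # bookmark: k2 bookmarked by u2
]


def performed(behavior, info, user):
    """Look up whether `user` performed `behavior` on `info`."""
    a = BEHAVIORS.index(behavior)
    k = INFO.index(info)
    u = USERS.index(user)
    return records[a][k][u] == 1
```

Slicing the structure along the behavior axis yields one information-by-user matrix per behavior, which is the form the similarity formulas later in the description operate on.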
The retrieval method of the present invention builds on existing full-text search technology and optimizes retrieval by re-ranking the full-text search results. As shown in Fig. 3, the user first enters query keywords; the keywords are passed to a full-text search engine for full-text matching; the full-text results are then filtered by the method described here; and the final search results are output. Within this framework, the core of the method is how to re-rank the results of the full-text search. During ranking, each piece of information output by the full-text search is assigned a weight, and the final list is sorted by this weight. The weight is obtained from formula (1):
kw_j = \sum_{i=1}^{N_r} s_{ij}, \quad 1 \le j \le N_s \qquad (1)
where kw_j is the weight of the j-th full-text search result; s_ij is the similarity (behavior similarity) between information i and information j; N_r is the total number of pieces of information the current user has operated on (that is, for which behavior information exists); and N_s is the total number of results output by the full-text search engine.
The formula expresses that the more similar a result is to the information the user has operated on, the nearer the front it appears in the final search results. To highlight the distinguishing features of the invention, only behavior similarity is considered in the ranking here; in practice, content similarity and behavior similarity can be considered together by weighting. According to the ranking formula, the core algorithm of the method must solve the problem of how to calculate the similarity between any two pieces of information.
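A minimal Python sketch of the re-ranking in formula (1) follows. The function name, the document identifiers and the similarity values are hypothetical; the similarity table is assumed to have been computed beforehand.

```python
def rerank(results, operated, sim):
    """Re-rank full-text results by formula (1).

    results  -- information ids returned by the full-text engine (N_s items)
    operated -- ids the current user has behavior records for (N_r items)
    sim      -- dict mapping (i, j) to the behavior similarity s_ij;
                missing pairs are treated as 0 (an assumption)
    """
    def weight(j):
        # kw_j = sum over operated items i of s_ij
        return sum(sim.get((i, j), 0.0) for i in operated)

    # sorted() is stable, so ties keep the original full-text order.
    return sorted(results, key=weight, reverse=True)


# Hypothetical data: three search hits and two items the user acted on before.
results = ["d1", "d2", "d3"]
operated = ["h1", "h2"]
sim = {
    ("h1", "d2"): 0.5, ("h2", "d2"): 0.25,
    ("h1", "d3"): 0.5,
    ("h2", "d1"): 0.25,
}
ranked = rerank(results, operated, sim)
```

Here the weights are 0.25 for d1, 0.75 for d2 and 0.5 for d3, so d2 moves to the front.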
The method of the present invention considers both the department membership relations between users and the dependencies among multiple behaviors. Department membership reflects user interest to some extent. For example, suppose user U has no behavior record for information K, but most users in the same department as U do have behavior records for K. It can then be inferred that U should be interested in K but, for some reason, has no behavior record for it. The dependencies among behaviors help to avoid counting user behavior twice. If user U has both viewed and commented on information K, and commenting is defined as depending on viewing (a user must view before commenting), the two behaviors should not both be counted when assessing the user's interest.
Different systems may use department membership and behavior dependencies separately. The description below therefore covers six models in two classes and then gives the general model.
Class 1: user-independent models
A user-independent model does not consider the department membership relations between users; only the dependencies among behaviors are considered. This class is described by three models.
1) Single-behavior model
The single-behavior model is the simplest model. It contains only one behavior (for example viewing), and similarity is calculated entirely from the analysis of that single behavior. Fig. 4 shows a 4U2K1A model for illustration; 4U2K1A denotes a model with 4 users, 2 pieces of information and 1 behavior, in which neither department membership nor behavior dependency is considered. A line in Fig. 4 indicates that the corresponding user has performed the behavior on the information.
The data structure corresponding to Fig. 4 is shown in formula (2):
A = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \end{bmatrix} \qquad (2)
On this basis, the similarity between two pieces of information is calculated by formula (3):
sim(k_1, k_2) = \frac{\sum (k_1 \wedge k_2)}{N_u} = 0.5, \quad sim \in [0, 1] \qquad (3)
where sim is the behavior similarity between pieces of information; k_1 and k_2 are the two row vectors of formula (2); and N_u is the number of users. The similarity between any number of pieces of information can be calculated in the same way by formula (3).
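Formula (3) can be checked with a few lines of Python. The sketch below assumes each row of the behavior matrix is a 0/1 list with one entry per user; the function name is illustrative.

```python
def sim(k1, k2):
    """Behavior similarity of formula (3): the number of users who acted on
    both pieces of information, divided by the total number of users."""
    n_users = len(k1)
    return sum(a & b for a, b in zip(k1, k2)) / n_users


# Matrix A of formula (2): rows are pieces of information, columns are users.
A = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
]
s12 = sim(A[0], A[1])
```

Users 1 and 3 acted on both pieces of information, so the sum is 2 and the similarity is 2 / 4 = 0.5, as stated in formula (3).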
2) Multi-behavior model
The multi-behavior model contains several types of behavior with no dependencies among them, and similarity is calculated from all of the behaviors. Fig. 5 shows a 4U2K2A model for illustration.
The data structure corresponding to Fig. 5 is shown in formula (4):
A_a = \begin{bmatrix} 1 & 1 & 1 & 0 \\ 1 & 0 & 1 & 1 \end{bmatrix}, \quad A_b = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} \qquad (4)
On this basis, the similarity between two pieces of information is calculated by formula (5):
sim(k_1, k_2) = \alpha \, sim(k_{a1}, k_{a2}) + \beta \, sim(k_{b1}, k_{b2}) \qquad (5)
sim \in [0, 2], \quad \alpha + \beta = N_a = 2
where k_a1 and k_a2 are the two row vectors of A_a; k_b1 and k_b2 are the two row vectors of A_b; α and β are the weights with which the two behaviors influence the similarity; and N_a is the number of kinds of behavior.
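The weighted combination in formula (5) can be sketched as follows. The weights 0.5 and 1.5 are hypothetical values chosen only so that they sum to N_a = 2; the source does not give concrete weights.

```python
def sim(k1, k2):
    """Single-behavior similarity of formula (3)."""
    return sum(a & b for a, b in zip(k1, k2)) / len(k1)


def combined_sim(matrices, weights):
    """Formula (5): weighted sum of the per-behavior similarities between
    the first two rows (k1, k2) of each behavior matrix."""
    return sum(w * sim(m[0], m[1]) for m, w in zip(matrices, weights))


# Matrices of formula (4).
A_a = [[1, 1, 1, 0], [1, 0, 1, 1]]
A_b = [[1, 0, 1, 0], [0, 0, 1, 1]]

alpha, beta = 0.5, 1.5  # hypothetical weights, alpha + beta = N_a = 2
score = combined_sim([A_a, A_b], [alpha, beta])
```

With these inputs the per-behavior similarities are 0.5 for A_a and 0.25 for A_b, giving 0.5 × 0.5 + 1.5 × 0.25 = 0.625.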
3) Multi-behavior (dependent) model
This model contains several behaviors that are not mutually independent but have dependencies among them. To eliminate these dependencies, formula (6) is applied to the two matrices shown in formula (4). Formula (5) is then applied to obtain the similarity between two pieces of information.
[Formula (6): image BDA00003653334500075, not reproduced in this text]
where ∧ denotes the conjunction of two matrices, and the symbol shown in image BDA00003653334500076 denotes the logical NOT of a matrix.
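Because formula (6) survives only as an image reference, the following Python sketch is an assumption rather than the patent's formula. It combines the two operators the text names (conjunction and logical NOT) in the way claim 2 suggests, removing the lower-weight record wherever the higher-weight dependent behavior is also recorded: A_a' = A_a ∧ ¬A_b. Treating A_a as the lower-weight behavior (for example viewing) and A_b as the higher-weight behavior that depends on it (for example commenting) is also an assumption.

```python
def remove_dependent(lower, higher):
    """Assumed form of the dependency elimination: keep a lower-weight
    record only where the higher-weight behavior is NOT recorded,
    i.e. lower AND (NOT higher), element by element."""
    return [
        [lo & (1 - hi) for lo, hi in zip(row_lo, row_hi)]
        for row_lo, row_hi in zip(lower, higher)
    ]


# Matrices of formula (4); the roles of A_a and A_b are assumed.
A_a = [[1, 1, 1, 0], [1, 0, 1, 1]]  # lower-weight behavior
A_b = [[1, 0, 1, 0], [0, 0, 1, 1]]  # higher-weight, dependent behavior

A_a_independent = remove_dependent(A_a, A_b)
```

After this step each user-information pair is counted under at most one of the two behaviors, which is the stated purpose of eliminating the dependency.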
Class 2: user-dependent models
A user-dependent model considers the department membership relations between users, which are described as a tree structure. This class is described by three models.
4) Single-behavior model
Unlike the single-behavior model in 1), this model considers the department membership relations between users, described as a tree structure. Fig. 6 shows a 4U(d)2K1A model, where (d) indicates that the model considers the department membership relations between users.
The department membership relations in the model of Fig. 6 can be expressed by formula (7):
U = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix} \qquad (7)
The matrix in formula (7) expresses two kinds of relation: superior-subordinate relations and same-department relations. u_ij = 1 indicates that user i and user j are in a superior-subordinate relation or in the same department. The next problem is how to reflect these department membership relations in the final similarity calculation. Formula (8) incorporates them into the matrix of formula (2):
a_{ij} = \begin{cases} 1, & a_{ij} = 1 \text{ or } au_{ij} \ge \delta \\ 0, & \text{otherwise} \end{cases} \qquad (8)
where a_ij is a value in the matrix of formula (2); δ is a threshold: the larger δ is, the smaller the influence of the users' department membership on the similarity calculation, and the smaller δ is, the larger that influence; and au_ij is a value in the matrix AU. On this basis, the behavior similarity between two pieces of information can again be calculated by formula (3).
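The effect of formula (8) can be sketched in Python. The sketch assumes that AU denotes the ordinary matrix product of A (formula (2)) and U (formula (7)), which the text does not state explicitly; under that reading, au_ij counts the users related to user j who performed the behavior on information i. The threshold values are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_membership(a, u, delta):
    """Formula (8): set a_ij to 1 if it is already 1 or if au_ij >= delta."""
    au = matmul(a, u)  # assumed meaning of "matrix AU"
    return [
        [1 if a[i][j] == 1 or au[i][j] >= delta else 0 for j in range(len(a[0]))]
        for i in range(len(a))
    ]


A = [[1, 1, 1, 0], [1, 0, 1, 1]]                              # formula (2)
U = [[0, 0, 0, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]  # formula (7)

AU = matmul(A, U)
strict = apply_membership(A, U, delta=2)  # hypothetical threshold
loose = apply_membership(A, U, delta=1)   # hypothetical threshold
```

With δ = 2 no records are added to A, whereas with δ = 1 every missing record is filled in, which illustrates how a smaller threshold gives department membership more influence.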
5) Multi-behavior model
This model adds several different behaviors to model 4). Fig. 7 shows a 4U(d)2K2A model for illustration. It differs from model 4) in that a second behavior has been added. Because the two behaviors are independent, the model can be split into two instances of model 4), and the final behavior similarity is obtained by formula (5) after each has been calculated.
6) Multi-behavior (dependent) model
This model adds dependencies among the behaviors to model 5). It is the most complex and most general model, containing both the constraint relations among multiple behaviors and the department membership relations between users. The final behavior similarity can be obtained with the formulas above, in the following basic steps:
Step 1: Remove the dependencies among the behaviors by formula (6);
Step 2: Reflect the department membership relations between users in the behavior matrices by formula (8);
Step 3: Split the model into the simplest multi-user (without membership relations), multi-information, single-behavior models, and calculate the similarity of each by formula (3);
Step 4: Combine the similarities of the independent models by formula (5) to obtain the final similarity.
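The four steps of the general model can be strung together in a short Python sketch. Several parts are assumptions and are marked as such: the form of formula (6) (lower-weight records removed where the dependent higher-weight behavior exists), the reading of AU as a matrix product, the roles of A_a and A_b, the threshold δ = 2 and the weights α = β = 1.

```python
def sim(k1, k2):
    """Formula (3): shared users divided by the number of users."""
    return sum(a & b for a, b in zip(k1, k2)) / len(k1)


def remove_dependent(lower, higher):
    """Step 1 (assumed form of formula (6)): lower AND (NOT higher)."""
    return [[lo & (1 - hi) for lo, hi in zip(rl, rh)]
            for rl, rh in zip(lower, higher)]


def apply_membership(a, u, delta):
    """Step 2 (formula (8)), assuming AU is the matrix product of A and U."""
    n_users = len(u)
    au = [[sum(row[k] * u[k][j] for k in range(n_users)) for j in range(n_users)]
          for row in a]
    return [[1 if a[i][j] == 1 or au[i][j] >= delta else 0
             for j in range(n_users)] for i in range(len(a))]


def final_similarity(lower, higher, u, delta, w_lower, w_higher):
    """Steps 1 to 4 for two behaviors and the first two pieces of information."""
    lower = remove_dependent(lower, higher)            # step 1
    lower = apply_membership(lower, u, delta)          # step 2
    higher = apply_membership(higher, u, delta)
    s_lower = sim(lower[0], lower[1])                  # step 3
    s_higher = sim(higher[0], higher[1])
    return w_lower * s_lower + w_higher * s_higher     # step 4, formula (5)


A_a = [[1, 1, 1, 0], [1, 0, 1, 1]]                              # formula (4)
A_b = [[1, 0, 1, 0], [0, 0, 1, 1]]
U = [[0, 0, 0, 1], [1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]    # formula (7)

# delta and the weights below are hypothetical values.
result = final_similarity(A_a, A_b, U, delta=2, w_lower=1, w_higher=1)
```

In this run the lower-weight behavior contributes nothing after dependency removal, while formula (8) adds one record to A_b, so the final similarity comes entirely from the higher-weight behavior.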
Six different models have been given above for different situations. The model to use depends on the requirements of the information system. The model yields the similarity between pieces of information, and this similarity affects the final search results through formula (1).

Claims (3)

1. A behavior-based field information retrieval method, characterized by comprising the following steps:
Step 1: By analyzing the information management system, obtain the department membership relations of the system users, that is, which user belongs to which department and how the departments are related to one another, together with the activities the users perform while using the information management system;
Step 2: Calculate the similarity between pieces of information from the behavior data, as follows:
Step (1): Eliminate the dependencies among the multiple user behaviors in the behavior data to obtain dependency-free behavior data;
Step (2): Reflect the department membership relations between users in the behavior data set, that is, add behavior records to the behavior data set;
Step (3): Split the behavior data set into multiple independent data sets according to user behavior: there are as many independent behavior data sets as there are kinds of behavior, and each independent data set represents the correspondence between users, information and one behavior;
Step (4): Calculate the similarity between pieces of information on each independent data set separately, then weight these similarities by the weights of the respective behaviors to obtain the final similarity between pieces of information;
Step 3: Perform a full-text search on the search terms entered by the user, retrieving as much matching content as possible to obtain the full-text matching results;
Step 4: Filter the full-text search results and rank them according to the interest reflected by the user's behaviors, so that the content the user is most interested in is ranked first.
2. The behavior-based field information retrieval method according to claim 1, characterized in that, in step (1), dependencies are eliminated by removing the behavior records with the lower weight level.
3. The behavior-based field information retrieval method according to claim 1 or 2, characterized in that, in step (2), the relations are reflected in the data set as follows: for a user who has no record of the relevant behavior, if most users in the same department or in a subordinate department have performed the relevant operation, the current user is considered to be interested in the content.
CN201310349403.2A 2013-08-12 2013-08-12 Behavior-based field information retrieval method Active CN103399940B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310349403.2A CN103399940B (en) 2013-08-12 2013-08-12 The realm information search method of Behavior-based control

Publications (2)

Publication Number Publication Date
CN103399940A (en) 2013-11-20
CN103399940B (en) 2016-08-10

Family

ID=49563568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310349403.2A Active CN103399940B (en) 2013-08-12 2013-08-12 Behavior-based field information retrieval method

Country Status (1)

Country Link
CN (1) CN103399940B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020078045A1 (en) * 2000-12-14 2002-06-20 Rabindranath Dutta System, method, and program for ranking search results using user category weighting
CN101520784A (en) * 2008-02-29 2009-09-02 富士通株式会社 Information issuing system and information issuing method
CN101556603A (en) * 2009-05-06 2009-10-14 北京航空航天大学 Coordinate search method used for reordering search results
US20130110807A1 (en) * 2011-10-31 2013-05-02 International Business Machines Corporation Intranet search, search engine and terminal equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAN YAN et al.: "Multi-Action based Approach for Constructing Knowledge Map", JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, vol. 24, no. 3, 28 September 2015 (2015-09-28) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582852A (en) * 2018-12-05 2019-04-05 中国银行股份有限公司 A kind of sort method and system of full-text search result
CN109582852B (en) * 2018-12-05 2021-04-09 中国银行股份有限公司 Method and system for sorting full-text retrieval results

Also Published As

Publication number Publication date
CN103399940B (en) 2016-08-10

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant