CN101751437A

CN101751437A - Web active retrieval system based on reinforcement learning

Info

Publication number: CN101751437A
Application number: CN200810240358A
Authority: CN
Inventors: 杨彦武; 张文生; 李益群; 肖宪; 刘琰琼; 梁玉旋
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2008-12-17
Filing date: 2008-12-17
Publication date: 2010-06-23

Abstract

The invention discloses a Web active retrieval system based on reinforcement learning. The system comprises a Web search Agent module, a Web filter Agent module, a Web interface Agent module and a user information learning Agent module. The Web search Agent module searches for topics based on user interests, analyzes Web content and downloads Web pages. The Web filter Agent module performs Web content analysis, page filtering and classified indexing. The Web interface Agent module recommends the Web pages that best represent the learned user interests, receives user feedback, records user browsing behavior and performs statistical analysis. The user information learning Agent module updates the interest model by reinforcement learning, continuously updating the user information until it arrives at the model that best represents the user's interests. The system is highly adaptive, accurate and convenient to use.

Description

Web active retrieval system based on reinforcement learning

Technical field

The present invention relates to the field of active retrieval for Web users, and in particular to a Web active retrieval system based on reinforcement learning, used to recommend to a Web user the Web pages that best reflect the user's interest pattern.

Background art

A Markov decision process comprises a set of environment states S, a set of actions A, a reward function R and a state transition function P. The reward function R(s, a, s') is the immediate reward obtained when action a is taken in state s and the environment moves to state s'; P(s, a, s') is the probability that taking action a in state s moves the environment to state s'. The essential property of a Markov decision process is that the probability of moving to the next state, and the reward received, depend only on the current state and the action selected in it, not on earlier states or actions. Therefore, when the environment model is fully known, that is, when both the state transition probability function P and the reward function R are given, dynamic programming can be used to find the optimal policy. In most real-world situations, however, P and R are difficult to determine. Reinforcement learning focuses on how to learn the optimal behavior policy when the reward function and the state transition function are unknown.
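As an illustration of the known-model case described above, the following is a minimal sketch of value iteration, one standard dynamic programming technique. It is not taken from the invention; the function names, the dictionary layout of P and R, the discount factor `gamma` and the tolerance `tol` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# P[(s, a)] maps each next state s' to the probability P(s, a, s').
Transitions = Dict[Tuple[State, Action], Dict[State, float]]
# R[(s, a, s')] is the immediate reward R(s, a, s'); missing entries count as 0.
Rewards = Dict[Tuple[State, Action, State], float]


def q_value(s: State, a: Action, V: Dict[State, float],
            P: Transitions, R: Rewards, gamma: float) -> float:
    """Expected immediate reward plus discounted value of the next state."""
    return sum(p * (R.get((s, a, s2), 0.0) + gamma * V[s2])
               for s2, p in P[(s, a)].items())


def value_iteration(states: List[State], actions: List[Action],
                    P: Transitions, R: Rewards,
                    gamma: float = 0.9, tol: float = 1e-8):
    """Return (V, policy) for a fully known MDP (hypothetical helper)."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            values = [q_value(s, a, V, P, R, gamma)
                      for a in actions if (s, a) in P]
            best = max(values) if values else 0.0
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:  # stop once no state value changes noticeably
            break
    policy = {}
    for s in states:
        scored = {a: q_value(s, a, V, P, R, gamma)
                  for a in actions if (s, a) in P}
        if scored:
            policy[s] = max(scored, key=scored.get)
    return V, policy


# Toy two-state model, invented purely for illustration.
toy_P = {("A", "stay"): {"A": 1.0}, ("A", "go"): {"B": 1.0},
         ("B", "stay"): {"B": 1.0}, ("B", "go"): {"A": 1.0}}
toy_R = {("A", "go", "B"): 1.0, ("B", "stay", "B"): 2.0}
toy_V, toy_policy = value_iteration(["A", "B"], ["stay", "go"],
                                    toy_P, toy_R, gamma=0.5)
```

When P and R are not available, this computation cannot be carried out directly, which is the situation reinforcement learning addresses.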

Reinforcement learning (also known as incentive learning or evaluative learning) is an important branch of machine learning, with many applications in fields such as intelligent control, robotics, and analysis and prediction. Reinforcement learning is the learning, by an intelligent system, of a mapping from environment to behavior that maximizes the cumulative reward (reinforcement signal). It differs from supervised learning in conventional machine learning mainly in the teacher signal: the reinforcement signal provided by the environment is an evaluation, in the form of a reward value, of the action taken; it does not tell the reinforcement learning system how to produce the correct action. Because the external environment provides little information, the reinforcement learning system must learn from its own experience. In this way it acquires knowledge in an action-evaluation environment and improves its action plan to adapt to the environment. Current reinforcement learning techniques fall roughly into two classes. The first searches the behavior space of the intelligent system to discover the optimal behavior, typically with search techniques such as genetic algorithms. The second uses statistical techniques and dynamic programming ideas to estimate and predict the value function in a given environment state, and determines the optimal behavior from the resulting value function.

In the problems that reinforcement learning must solve, the environment is uncertain, so the return R_t obtained in each learning episode under the guidance of policy π may differ. The value function of state s must therefore be the mathematical expectation of the return over all possible learning episodes. In practice the value function is usually estimated by approximation, and one of the main methods is Monte Carlo sampling. Combining Monte Carlo sampling with dynamic programming, repeated trials use the rewards and penalties actually obtained to approximate the true state value function. Monte Carlo sampling usually uses the return obtained in one complete learning episode to approximate the actual value function, whereas reinforcement learning methods use the value function of the next state (bootstrapping) and the immediate reward currently obtained to approximate the value function of the current state. Reinforcement learning methods need many learning episodes before they finally approach the actual value function.
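The contrast between the two estimators can be sketched as follows. This is a generic textbook illustration, not code from the invention; the step size `alpha`, the discount factor `gamma`, and the representation of an episode as a list of (state, reward) pairs are assumptions.

```python
from typing import Dict, List, Tuple


def mc_update(V: Dict[str, float], episode: List[Tuple[str, float]],
              alpha: float = 0.1, gamma: float = 0.9) -> Dict[str, float]:
    """Every-visit Monte Carlo: move V(s) toward the full return observed
    after s, which is only known once the whole episode has finished.

    `episode` lists (state, reward received on leaving that state) in order.
    """
    G = 0.0
    for state, reward in reversed(episode):
        G = reward + gamma * G  # return accumulated from this step onward
        old = V.get(state, 0.0)
        V[state] = old + alpha * (G - old)
    return V


def td0_update(V: Dict[str, float], state: str, reward: float,
               next_state: str, alpha: float = 0.1,
               gamma: float = 0.9) -> Dict[str, float]:
    """TD(0): bootstrap from the current estimate of the next state's value
    plus the immediate reward, so an update is possible after every step."""
    target = reward + gamma * V.get(next_state, 0.0)
    old = V.get(state, 0.0)
    V[state] = old + alpha * (target - old)
    return V
```

The Monte Carlo update needs a complete episode before it can adjust any estimate, while the TD(0) update uses only one observed transition, at the cost of relying on an estimate that may itself still be inaccurate.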

Information retrieval generally refers to text-based information retrieval, covering the storage, organization, presentation, querying and access of information; its core is the indexing and retrieval of text. Historically, information retrieval has passed through several stages, including manual retrieval, computer-automated retrieval and intelligent networked retrieval, and it has now reached the networked, intelligent stage. The objects of retrieval have expanded from closed, stable and consistent information content managed centrally by independent databases to open, dynamic, rapidly updated, widely distributed and loosely managed Web page content. The users of information retrieval were originally intelligence specialists; they now include the general public, such as business people, managers, teachers, students and professionals of all kinds, who place higher and more diverse demands on information retrieval, from its results to its modes of use. Adapting to the needs of networking, intelligence and Web personalization is the new trend in the development of information retrieval technology. At present, most network personalization methods in practical use are based on statistical methods. Such methods adapt poorly and have no learning ability. The characteristics of reinforcement learning can improve these statistics-based network personalization methods.

Summary of the invention

(1) Technical problem to be solved

In view of this, the main purpose of the present invention is to provide a Web active retrieval system based on reinforcement learning that helps users browse the Web more conveniently and find the target pages they need more accurately.

(2) Technical solution

To achieve the above purpose, the present invention provides a Web active retrieval system based on reinforcement learning, the system comprising:

1) A Web search Agent module, composed of three functional blocks: information search, Web page analysis and Web page download. It mainly performs searching related to the user's topics of interest, Web content analysis and page download. The user first enters an original request; initial pages are obtained through a search engine, and the relevant links on those pages are extracted and stored in a buffer. The page download module then visits the corresponding Web pages by their links (URLs) and saves them classified by topic keyword.

2) A Web filter Agent module, composed of two functional blocks: page analysis and page filtering. It mainly analyzes the content of the pages obtained by the Web search Agent module and uses Q-learning from reinforcement learning to compute a Q value for each Web page. The TF-IDF value of the keywords of a Web page serves as the immediate reward in Q-learning, from which the Q value of each Web page is computed. The pages are then sorted by Q value and filtered, and the higher-ranked pages are retained. The functions of the Web filter Agent module mainly comprise Web page analysis, Web page filtering and classified indexing of Web pages, and the results are provided to the Web interface Agent module.

3) A Web interface Agent module, composed of a page recommendation module, a page display module and a browsing behavior statistics module. It recommends for browsing the Web pages most relevant to the user model produced by the learning module, receives feedback values, and analyzes and compiles statistics on browsing behavior.

4) A user information learning Agent module, composed mainly of functional blocks for initialization, interest-degree correction calculation and interest-degree update. The user must first register, and the user's initial interest file (profile) is generated automatically from the information in the knowledge base. While the user searches and browses pages, the TD learning algorithm from reinforcement learning updates and improves the user's interest model. User feedback values are generated from the records of page-browsing behavior kept by the Web interface Agent module; the interest-degree calculation and update module continuously updates the user's interests; and the TD learning algorithm computes the user's interest pattern until the optimal weight distribution of the user information model is reached.

(3) Beneficial effects

As can be seen from the above technical solution, the present invention has the following beneficial effects:

1. The Web active retrieval system provided by the present invention is accurate and convenient to use, and can recommend personalized Web pages to Web users.

2. The Web active retrieval system provided by the present invention uses Q-learning from reinforcement learning to filter the pages returned by the Web search Agent. It considers not only the currently recommended page but also the information on the pages reached through that page's hyperlinks, making full use of Web page structure to optimize the filtering system.

3. In user information learning, the present invention uses TD learning from reinforcement learning to learn and update the model, so that the user model comes closer and closer to the model that best represents the user. Applying this model in the Web filter Agent allows Web pages that better match the user's interest model to be extracted and recommended.

Description of drawings

Fig. 1 is the overall logic block diagram of the Web active retrieval system provided by the present invention;

Fig. 2 is a structural diagram of Web page filtering in the Web active retrieval system provided by the present invention;

Fig. 3 is a structural diagram of user information learning and updating in the Web active retrieval system provided by the present invention.

Embodiment

To make the purpose, technical solution and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.

Fig. 1 is the overall logic block diagram of the Web active retrieval system provided by the present invention. As shown in Fig. 1, the system comprises an original input end and a page recommendation output end. The original input end sends the user's original request to the Web search Agent module. On receiving the request, the Web search Agent module searches using a search engine. When the search finishes, the Web filter Agent module learns on and ranks the resulting pages, extracts pages, and recommends them to the user through the Web interface Agent module. After the user browses, the Web interface Agent module records and obtains the user's browsing information, analyzes it and compiles statistics, and submits feedback values to the user information learning Agent module, which updates and optimizes the user model.

Fig. 2 is a structural diagram of the Web filter Agent module in the Web active retrieval system provided by the present invention. As shown in Fig. 2, the Web filter Agent module comprises a relevance estimation module, a Q-learning module and a Q-value sorting module. After the Web search Agent module finishes searching, the stored pages are retrieved and their content is analyzed. Using the TF-IDF technique, a Web page can be represented as a vector, and the similarity between the user and the Web page is computed using the cosine distance. For a given i (the i-th page file), each keyword in it must be learned. The TF-IDF value of each page serves as the immediate reward obtained on arriving at that page, and the reward of a page one level down, reached by a hyperlink from the current page, serves as the reward of the corresponding next state. When the rewards are summed, each is multiplied by a discount factor. Here, a graph-based depth-first search is used to compute the maximum discounted sum of rewards along a path, which is assigned to Q[i]; Q[i] is recorded as the Q value of the i-th page. After the Q values of all pages have been computed, the pages are sorted by Q value and the top K pages are recommended to the user.
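The Q-value computation described above can be sketched as follows. This is only an illustrative reading of the description: the function names, the sparse dictionary representation of TF-IDF vectors, the use of the cosine similarity between the user vector and the page vector as the per-page reward, the depth limit `max_depth` and the rule of never revisiting a page on the same path are all assumptions, since the description does not fix these details.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Tuple

Vector = Dict[str, float]  # term -> TF-IDF weight (sparse vector)


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def max_discounted_reward(page: str, rewards: Dict[str, float],
                          links: Dict[str, List[str]], gamma: float,
                          max_depth: int,
                          visited: Optional[FrozenSet[str]] = None) -> float:
    """Depth-first search for the largest discounted reward sum over
    hyperlink paths starting at `page`. Rewards are assumed non-negative,
    so a path may simply end at the current page."""
    visited = (visited or frozenset()) | {page}
    best_child = 0.0
    if max_depth > 0:
        for nxt in links.get(page, []):
            if nxt in visited or nxt not in rewards:
                continue  # skip cycles and pages that were not downloaded
            best_child = max(best_child, max_discounted_reward(
                nxt, rewards, links, gamma, max_depth - 1, visited))
    return rewards[page] + gamma * best_child


def rank_pages(rewards: Dict[str, float], links: Dict[str, List[str]],
               k: int, gamma: float = 0.8,
               max_depth: int = 3) -> Tuple[List[str], Dict[str, float]]:
    """Compute Q[i] for every page and return the top-k page ids with Q."""
    q = {page: max_discounted_reward(page, rewards, links, gamma, max_depth)
         for page in rewards}
    return sorted(q, key=q.get, reverse=True)[:k], q


# Hypothetical data: a user vector and three small pages.
user = {"learning": 1.0, "web": 0.5}
pages = {"a": {"web": 1.0}, "b": {"learning": 1.0, "web": 0.5},
         "c": {"sports": 1.0}}
page_links = {"a": ["b", "c"], "b": ["a"], "c": []}
page_rewards = {pid: cosine_similarity(user, vec)
                for pid, vec in pages.items()}
top_pages, q_values = rank_pages(page_rewards, page_links, k=2)
```

Because the search follows hyperlinks, a page with a modest reward of its own can still rank well if it links to highly relevant pages, which matches the stated aim of exploiting Web page structure.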

Fig. 3 is a structural diagram of the user information learning Agent module in the Web active retrieval system provided by the present invention. As shown in Fig. 3, this module mainly comprises a module for computing the user model update weights and a TD learning update module. After the Web interface Agent has obtained the user's Web browsing information, it provides a user feedback value to the user information learning Agent module for learning and updating. The feedback value consists mainly of explicit feedback and implicit feedback. Explicit feedback is obtained mainly from the user's ratings, while implicit feedback depends mainly on the following factors: bookmarking (bm), reading time (rt), scrolling (sc), and following the hyperlinks in the filtered documents (fl). The feedback value is used to update the user's weights:

W_{p,k} \leftarrow W_{p,k} + \beta r_{i,k}

Here β is the user learning rate, and W_{p,k} is the value of the k-th dimension of the user vector.
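A minimal sketch of this feedback-driven update is given below. The function name and the dictionary representation are assumptions, and because the description does not say how the feedback term r_{i,k} is derived from the explicit and implicit indicators, the sketch simply takes one feedback value per keyword as input.

```python
from typing import Dict


def update_profile(profile: Dict[str, float], feedback: Dict[str, float],
                   beta: float = 0.1) -> Dict[str, float]:
    """Apply W_{p,k} <- W_{p,k} + beta * r_{i,k} to each keyword k.

    `profile` maps keyword -> current weight W_{p,k}; `feedback` maps
    keyword -> feedback value r_{i,k} for the page just browsed.
    Keywords not yet in the profile start from weight 0 (an assumption).
    """
    updated = dict(profile)  # leave the caller's profile untouched
    for term, r in feedback.items():
        updated[term] = updated.get(term, 0.0) + beta * r
    return updated
```

For example, `update_profile({"ai": 0.5}, {"ai": 1.0, "web": 2.0}, beta=0.1)` raises the weight of "ai" by 0.1 and introduces "web" with weight 0.2.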

The change in the term weights of the vector (after normalization) is used as an approximate measure of future reward. When the change in the user model weights is less than a threshold obtained in advance by experiment, the selected keywords are taken to represent the user's interests nearly optimally. When this change is positive, the current user model vector will obtain better feedback values.

W_{pk,t} \leftarrow W_{pk,t-1} + [R_t + \gamma \Delta v_{p,t}]

R_t = \frac{1}{K} \sum_{i=1}^{K} r_i

\Delta v_{p,t} = \frac{1}{K} \sum_{i=1}^{K} \sum_{j=1}^{n} (w_{pj,i} - w_{pj,i-1})

If the keyword corresponding to W_{pk} occurs in the recommended pages, the weight of that keyword is increased; if the change value is negative, the weights of these keywords change by a smaller amount. After the content of these K pages has been processed, the Web active retrieval system proceeds to the next retrieval point and learning continues until an approximately optimal user representation model is reached.
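One possible reading of the three formulas above is sketched below. The function names, the representation of the weight history as a list of K+1 dictionaries (the normalized profile before the first page and after each of the K pages), and the threshold value are assumptions; the description leaves these details open.

```python
from typing import Dict, List, Set, Tuple


def td_profile_step(weights: Dict[str, float],
                    history: List[Dict[str, float]],
                    feedback: List[float],
                    page_terms: Set[str],
                    gamma: float = 0.9) -> Tuple[Dict[str, float], float, float]:
    """Return (new_weights, R_t, delta_v) for one batch of K pages.

    feedback   : r_1..r_K, one feedback value per recommended page.
    history    : w_0..w_K, normalized profile weights before and after
                 each page; all entries share the same keywords.
    page_terms : keywords that occur in the recommended pages; only
                 these weights receive the update.
    """
    K = len(feedback)
    if K == 0 or len(history) != K + 1:
        raise ValueError("need K feedback values and K+1 weight snapshots")
    R_t = sum(feedback) / K
    # Average over pages of the summed weight change between snapshots.
    delta_v = sum(sum(cur[j] - prev[j] for j in cur)
                  for prev, cur in zip(history, history[1:])) / K
    step = R_t + gamma * delta_v
    new_weights = {term: (w + step if term in page_terms else w)
                   for term, w in weights.items()}
    return new_weights, R_t, delta_v


def has_converged(delta_v: float, threshold: float = 1e-3) -> bool:
    """True when the weight change falls below the preset threshold."""
    return abs(delta_v) < threshold
```

In use, the step would be repeated for each batch of K recommended pages until `has_converged` reports that the profile has stabilized; in practice the weights would likely be renormalized after each step, which the sketch omits.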

The Web active retrieval system provided by the present invention has strong learning ability, is convenient to use and is highly reliable; the system can conveniently be used to help users perform active retrieval and to recommend Web pages.

The specific embodiments described above further explain the purpose, technical solution and beneficial effects of the present invention. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A Web active retrieval system based on reinforcement learning, characterized in that the system comprises:

a Web search Agent module, used to receive the user's initial request, analyze the request, download Web pages using related-topic analysis, and submit the results to the Web filter Agent module;

a Web filter Agent module, used to analyze the content of the pages obtained by the Web search Agent module, compute a Q value for each Web page using Q-learning from reinforcement learning, and provide the results to the Web interface Agent module;

a Web interface Agent module, used to recommend Web pages to the user, record the user's browsing behavior, and submit the results to the user information learning Agent module;

a user information learning Agent module, used to update and improve the user's interest model using the TD learning algorithm from reinforcement learning, wherein user feedback values are generated from the records of page-browsing behavior kept by the Web interface Agent module, the user's interests are continuously updated by an interest-degree calculation and update module, and the user's interest pattern is computed using the TD learning algorithm until the optimal weight distribution of the user information model is reached.

2. The Web active retrieval system based on reinforcement learning according to claim 1, characterized in that the Web search Agent module is composed of three functional modules, namely information search, Web page analysis and Web page download, and is used to perform searching related to the user's topics of interest, Web content analysis and page download; the user first enters an original request, initial pages are obtained through a search engine, the relevant links on those pages are extracted and stored in a buffer, and the page download module visits the corresponding Web pages by their link URLs and saves them classified by topic keyword.

3. The Web active retrieval system based on reinforcement learning according to claim 1, characterized in that the Web filter Agent module, after receiving the results of the Web search Agent module, uses the Q-learning algorithm together with the user feature model to analyze and rank the content of the Web pages, and filters them by taking the Web pages with the highest-ranked Q values; the functions of the Web filter Agent module mainly comprise Web page analysis, Web page filtering and classified indexing of Web pages, and the results are provided to the Web interface Agent module.

4. The Web active retrieval system based on reinforcement learning according to claim 1, characterized in that the Web interface Agent module is used to recommend for browsing the Web pages most relevant to the user model produced by the learning module, to receive feedback values, and to analyze and compile statistics on browsing behavior.

5. The Web active retrieval system based on reinforcement learning according to claim 1, characterized in that the user information learning Agent module is used to initially generate the user's basic information file from the knowledge base, and to continuously update the user's interests during subsequent searching and page browsing, so as to determine the user information model.

6. The Web active retrieval system based on reinforcement learning according to claim 5, characterized in that the user information learning Agent module further updates the user information model according to the user feedback values, iteratively computes and updates the user model based on the TD algorithm, and determines whether the user model has converged.

7. The Web active retrieval system based on reinforcement learning according to claim 5 or 6, characterized in that the user information learning Agent module submits its computation results to the Web filter Agent and proceeds to the next step of learning.

8. The Web active retrieval system based on reinforcement learning according to claim 5 or 6, characterized in that, when the iteration result of the user information learning Agent module converges, the user information model is optimal, and Web pages are obtained and recommended according to the user information model.