CN105069102A - Information push method and apparatus - Google Patents

Publication number
CN105069102A
Authority
CN
China
Prior art keywords
page
keyword
key words
accessed
association
Legal status
Granted
Application number
CN201510483126.3A
Other languages
Chinese (zh)
Other versions
CN105069102B (en)
Inventor
裘皓萍
陈炜于
Current Assignee
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510483126.3A
Publication of CN105069102A
Priority to PCT/CN2015/095754 (WO2017020451A1)
Application granted
Publication of CN105069102B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The present application discloses an information push method and apparatus. In one embodiment, the method comprises: acquiring the web addresses and page view counts of visited pages of at least one website; parsing the content of the page corresponding to each web address to generate a keyword set for each visited page; comparing the keyword sets with one another and merging keyword sets whose similarity is greater than a first preset threshold to generate at least one associated-page keyword set; generating first push information from one or more of the associated-page keyword sets, based on a ranking of the sum of the page view counts of the visited pages corresponding to each set; and, based on at least one visited page corresponding to the associated-page keyword set used to generate the first push information, generating second push information associated with the first push information and pushing it to a user. This implementation can enrich the content of push information.

Description

Information push method and apparatus
Technical field
The present application relates to the field of computer technology, specifically to the field of Internet technology, and in particular to an information push method and apparatus.
Background
Information push, also known as "web broadcasting", is a technology that reduces information overload by pushing the information users need over the Internet according to certain technical standards or protocols. By actively pushing information to users, information push technology can reduce the time users spend searching the network.
However, in existing information push technology, the information pushed to a user is often one or more independent items that lack any association with one another. If the pushed information is a fragment of the progress of some event, it is difficult for the user to understand the context or evolution of that event from the pushed content. Such technology therefore underutilizes the associated data available on the network, and the pushed content is not rich enough.
Summary of the invention
The object of the present application is to propose an improved information push method and apparatus that solve the technical problems mentioned in the Background section above.
In a first aspect, the present application provides an information push method, the method comprising: obtaining page access information of at least one website, wherein the page access information comprises the web addresses and page view counts of visited pages; parsing the content of the page corresponding to each web address to generate a keyword set for each visited page; comparing the keyword sets with one another and merging keyword sets whose similarity is greater than a first preset threshold to generate at least one associated-page keyword set, wherein the visited pages corresponding to the keyword sets used to generate an associated-page keyword set are associated pages of one another; generating first push information from one or more of the at least one associated-page keyword set, based on a ranking of the sum of the page view counts of the visited pages corresponding to each set; and, based on at least one visited page corresponding to the associated-page keyword set used to generate the first push information, generating second push information associated with the first push information and pushing it to a user.
In some embodiments, generating the second push information based on at least one visited page corresponding to the associated-page keyword set used to generate the first push information, and pushing it to the user, comprises: clustering, according to a preset time interval, the publication times of the visited pages corresponding to the associated-page keyword set used to generate the first push information, thereby dividing them into at least one time period, wherein, when there are two or more time periods, the difference between any two publication times taken from two different time periods is greater than the time interval; for one or more of the at least one time period, extracting one page from the visited pages corresponding to each time period; and generating the second push information based on the extracted pages and pushing it to the user.
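The time-period division described above can be sketched as follows. This is only an illustrative sketch in Python: the function name `split_into_periods`, the sample dates, and the choice of the earliest page as the one extracted from each period are assumptions, not details given in the application.

```python
from datetime import datetime, timedelta

def split_into_periods(pub_times, interval):
    """Group publication times so that times in different periods
    are always more than `interval` apart."""
    periods = []
    for t in sorted(pub_times):
        # Extend the current period while the gap to its latest time is small enough.
        if periods and t - periods[-1][-1] <= interval:
            periods[-1].append(t)
        else:
            periods.append([t])
    return periods

# Hypothetical publication times of the pages in one associated group.
times = [
    datetime(2015, 8, 1, 9), datetime(2015, 8, 1, 15),
    datetime(2015, 8, 5, 10), datetime(2015, 8, 9, 8),
]
periods = split_into_periods(times, timedelta(days=1))
# Assumption: take the earliest page of each period as its representative.
representatives = [p[0] for p in periods]
```

Because the times are sorted and a new period starts only when a gap exceeds the interval, any two times from different periods differ by more than the interval, as the embodiment requires.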
In some embodiments, before the publication times of the visited pages corresponding to the associated-page keyword set used to generate the first push information are clustered according to the preset time interval and divided into at least one time period, the method further comprises: among the visited pages corresponding to the associated-page keyword set, screening out visited pages whose keyword-set similarity is greater than a second preset threshold until only one of them remains, and taking the pages remaining after the screening as the visited pages corresponding to the associated-page keyword set, wherein the second preset threshold is greater than the first preset threshold.
In some embodiments, parsing the content of the page corresponding to each web address to generate a keyword set for each visited page comprises: performing statistical analysis and/or semantic analysis on the content of the visited page to extract at least one keyword; and generating the keyword set based on the at least one keyword.
In some embodiments, generating the keyword set based on the at least one keyword comprises: expanding each single keyword of the at least one keyword to generate expanded keywords, wherein the expanded keywords comprise at least one of: synonyms of the single keyword, near-synonyms of the single keyword, and associated words of the single keyword; and generating the keyword set based on the at least one keyword and the expanded keywords.
In some embodiments, keyword sets that satisfy one of the following conditions are treated as keyword sets whose similarity is greater than the first preset threshold: the number of identical keywords is greater than a number threshold; or the ratio of the number of identical keywords to the total number of keywords in the compared keyword sets is greater than a ratio threshold.
In some embodiments, each keyword in the keyword set also has an importance coefficient, and comparing the keyword sets with one another and merging keyword sets whose similarity is greater than the first preset threshold to generate at least one associated-page keyword set comprises: computing the similarity between different keyword sets based on the importance coefficients; and merging keyword sets whose similarity is greater than a similarity threshold to generate the associated-page keyword set.
In a second aspect, the present application provides an information push apparatus, the apparatus comprising: an information acquisition module configured to obtain page access information of at least one website, wherein the page access information comprises the web addresses and page view counts of visited pages; a keyword set generation module configured to parse the content of the page corresponding to each web address to generate a keyword set for each visited page; a keyword set merging module configured to compare the keyword sets with one another and merge keyword sets whose similarity is greater than a first preset threshold to generate at least one associated-page keyword set, wherein the visited pages corresponding to the keyword sets used to generate an associated-page keyword set are associated pages of one another; a first push information generation module configured to generate first push information from one or more of the at least one associated-page keyword set, based on a ranking of the sum of the page view counts of the visited pages corresponding to each set; and a second push information generation and push module configured to generate, based on at least one visited page corresponding to the associated-page keyword set used to generate the first push information, second push information associated with the first push information and push it to a user.
In some embodiments, the second push information generation and push module comprises: a clustering unit configured to cluster, according to a preset time interval, the publication times of the visited pages corresponding to the associated-page keyword set used to generate the first push information, dividing them into at least one time period, wherein, when there are two or more time periods, the difference between any two publication times taken from two different time periods is greater than the time interval; an extraction unit configured to extract, for one or more of the at least one time period, one page from the visited pages corresponding to each time period; and a generation unit configured to generate the second push information based on the extracted pages and push it to the user.
In some embodiments, the second push information generation and push module further comprises: a screening unit configured to screen out, among the visited pages corresponding to the associated-page keyword set, visited pages whose keyword-set similarity is greater than a second preset threshold until only one of them remains, and to take the pages remaining after the screening as the visited pages corresponding to the associated-page keyword set, wherein the second preset threshold is greater than the first preset threshold.
In some embodiments, the keyword set generation module comprises: a keyword extraction unit configured to perform statistical analysis and/or semantic analysis on the content of the visited page to extract at least one keyword; and a keyword set generation unit configured to generate the keyword set based on the at least one keyword.
In some embodiments, the keyword set generation unit comprises: an expansion subunit configured to expand each single keyword of the at least one keyword to generate expanded keywords, wherein the expanded keywords comprise at least one of: synonyms of the single keyword, near-synonyms of the single keyword, and associated words of the single keyword; and a keyword set generation subunit configured to generate the keyword set based on the at least one keyword and the expanded keywords.
In some embodiments, the keyword set merging module is further configured to treat keyword sets that satisfy one of the following conditions as keyword sets whose similarity is greater than the first preset threshold: the number of identical keywords is greater than a number threshold; or the ratio of the number of identical keywords to the total number of keywords in the compared keyword sets is greater than a ratio threshold.
In some embodiments, each keyword in the keyword set also has an importance coefficient, and the keyword set merging module comprises: a computation unit configured to compute the similarity between different keyword sets based on the importance coefficients; and a merging and generation unit configured to merge keyword sets whose similarity is greater than a similarity threshold to generate the associated-page keyword set.
With the information push method and apparatus provided by the present application, page access information of at least one website is obtained; the content of the page corresponding to each web address is parsed to generate a keyword set for each visited page; the keyword sets are compared with one another, and those whose similarity is greater than the first preset threshold are merged to generate at least one associated-page keyword set; first push information is generated from one or more of those sets, based on a ranking of the sum of the page view counts of the visited pages corresponding to each set; and second push information associated with the first push information is generated, based on at least one visited page corresponding to the associated-page keyword set used to generate the first push information, and pushed to the user. After pushing the first push information to the user, the method and apparatus can thus further push associated second push information, enriching the content of the pushed information.
Brief description of the drawings
Other features, objects and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a flowchart of one embodiment of the information push method according to the present application;
Fig. 2 is a schematic diagram of an application example of the information push method according to the present application;
Fig. 3 is a flowchart of another embodiment of the information push method according to the present application;
Fig. 4 is an illustration of an application scenario of the embodiment of the information push method shown in Fig. 3;
Fig. 5 is a schematic structural diagram of one embodiment of the information push apparatus according to the present application;
Fig. 6 is a schematic structural diagram of a computer system of an electronic device suitable for implementing embodiments of the present application.
Detailed description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features of those embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Referring to Fig. 1, a flow 100 of one embodiment of the information push method is shown. This embodiment is described mainly as applied to an electronic device with a certain computing capability. The electronic device may include, but is not limited to, a smartphone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop computer, a desktop computer, and the like. The information push method comprises the following steps:
Step 101: obtain page access information of at least one website, wherein the page access information comprises the web addresses and page view counts of visited pages.
In this embodiment, the electronic device (for example, a terminal on which an application with an information push function runs, or a back-end server that supports such an application) may obtain the page access information of at least one website locally or remotely. When the electronic device is a web server supporting the at least one website, it can obtain the page access information directly from local storage; when it is not, it can obtain the page access information from the web server over a wired or wireless connection. The wireless connection includes, but is not limited to, 3G/4G, WiFi, Bluetooth, WiMAX, Zigbee, UWB (ultra-wideband), and other wireless connection types now known or developed in the future.
Here, the page access information may comprise the web addresses and page view counts of visited pages. A visited page may be a page that users have visited. Typically, each page a user visits corresponds to one web address, which can be represented by a Uniform Resource Locator (URL). The electronic device may obtain the URLs of pages visited by users from one or more websites (for example, forum websites). Optionally, the electronic device may also obtain the content of the visited pages.
For each page it obtains, the electronic device may obtain the page view count together with the page's URL. The page view count may be the total number of times the page has been visited, or the number of times it was visited within a certain period (for example, 24 hours). The visited pages obtained by the electronic device may be all pages that users have visited, only pages whose view count is greater than a certain threshold (for example, 50), or a preset number (for example, 100,000) of top-ranked pages when sorted by view count from high to low.
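A minimal Python sketch of the two selection options just described (view-count threshold and top-N by view count) might look like this. The function name `select_pages`, its parameters, and the example URLs are hypothetical.

```python
def select_pages(page_views, min_views=None, top_n=None):
    """Return (url, views) pairs sorted by view count, highest first.

    page_views: dict mapping URL -> page view count.
    min_views:  keep only pages with more views than this, if given.
    top_n:      keep only the top_n most-viewed pages, if given.
    """
    pages = sorted(page_views.items(), key=lambda kv: kv[1], reverse=True)
    if min_views is not None:
        pages = [(url, views) for url, views in pages if views > min_views]
    if top_n is not None:
        pages = pages[:top_n]
    return pages

# Hypothetical page access information.
views = {
    "http://example.com/a": 120,
    "http://example.com/b": 40,
    "http://example.com/c": 75,
}
popular = select_pages(views, min_views=50)
top_one = select_pages(views, top_n=1)
```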
Step 102: parse the content of the page corresponding to each web address to generate a keyword set for each visited page.
In this embodiment, the electronic device may parse the content of the page corresponding to each of the above web addresses by various methods, extract one or more keywords from it, and generate a keyword set.
In an optional implementation of this embodiment, the electronic device may analyze the page content by statistical analysis. For example, it may extract the keywords of a page with a latent Dirichlet allocation (LDA) model. Specifically, the electronic device may treat each page as a word-frequency vector (for example, a vector of the words and their frequencies of occurrence), thereby converting text into numerical information that is easy to model, and build a three-layer Bayesian probability model with a word, topic, and document structure (the content of each page can be treated as one document). The distribution of topics within a document is multinomial, as is the distribution of words within a topic. Each page thus represents a probability distribution over a number of topics, and each topic represents a probability distribution over many words. Based on the word probability distribution, the electronic device may take words whose probability is greater than a certain threshold (for example, 1%) as the keywords of the page, or may select a certain number (for example, 20) of words from each page in descending order of probability as its keywords.
In an optional implementation of this embodiment, the electronic device may instead analyze the page content by semantic analysis. For example, it may segment the content of the visited page into words using, for example, the full segmentation method; compute the importance of the resulting words (for example, with the term frequency-inverse document frequency (TF-IDF) method); filter out, based on the importance results, common function words and other vocabulary that carries no actual meaning; and thereby obtain the keywords.
Specifically, the electronic device may first use the full segmentation method to cut out all possible words that match a language dictionary, and then use a statistical language model to determine the optimal segmentation. Taking page content with the topic "the current season income of residents" as an example, the device first matches against the language dictionary and finds all matching words: this, season, resident, income, this season, current season, degree, the income of residents, the people. These words are represented as a word lattice; a path search is then performed over the lattice, and the optimal path is found using a statistical language model (for example, an N-gram model). If the results show that a particular segmentation of "the current season income of residents" has the highest language-model score, that segmentation is the optimal segmentation of "the current season income of residents". The N-gram model mentioned here is a commonly used language model which, for Chinese, may be called a Chinese language model (CLM). It rests on the assumption that the occurrence of the N-th word depends only on the preceding N-1 words and on no other word; the probability of a whole sentence is then the product of the occurrence probabilities of its words, and these probabilities can be obtained by directly counting how often N words occur together in a corpus.
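To make the lattice search concrete, here is a small Python sketch of full segmentation followed by a bigram (N = 2) path search. Everything specific in it is an assumption for illustration: the unspaced English string stands in for Chinese text, and the dictionary, the bigram probabilities, the `BOS` start marker, and the probability floor for unseen pairs are all made up.

```python
import math

def build_lattice(text, dictionary):
    """For each start index, list every dictionary word that begins there."""
    lattice = {i: [] for i in range(len(text))}
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            if text[i:j] in dictionary:
                lattice[i].append(text[i:j])
    return lattice

def best_segmentation(text, dictionary, bigram_prob, floor=1e-6):
    """Search the word lattice for the path with the highest bigram score."""
    lattice = build_lattice(text, dictionary)
    # best[(end_position, last_word)] = (log probability, words so far)
    best = {(0, "BOS"): (0.0, [])}
    for pos in range(len(text)):
        states = [(k, v) for k, v in best.items() if k[0] == pos]
        for (_, prev), (score, path) in states:
            for word in lattice[pos]:
                p = bigram_prob.get((prev, word), floor)  # floor for unseen pairs
                cand = (score + math.log(p), path + [word])
                key = (pos + len(word), word)
                if key not in best or cand[0] > best[key][0]:
                    best[key] = cand
    finals = [v for k, v in best.items() if k[0] == len(text)]
    return max(finals, key=lambda v: v[0])[1] if finals else None

# Hypothetical dictionary and bigram probabilities.
dictionary = {"no", "now", "here", "where", "nowhere", "he", "re"}
bigram_prob = {
    ("BOS", "now"): 0.2, ("now", "here"): 0.5,
    ("BOS", "no"): 0.3, ("no", "where"): 0.1,
    ("BOS", "nowhere"): 0.05,
}
segmentation = best_segmentation("nowhere", dictionary, bigram_prob)
```

With these made-up numbers, "now" + "here" scores 0.2 × 0.5 = 0.1, ahead of "nowhere" (0.05) and "no" + "where" (0.03), so that path is selected.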
After segmenting the content into words by the full segmentation method, the electronic device may compute the importance of these words using the term frequency-inverse document frequency (TF-IDF) method. The main idea of TF-IDF is that if a word or phrase occurs frequently in one document or page but rarely in others, it is considered to have good discriminating power between categories and to be suitable for classification. Term frequency (TF) measures the importance of a word or phrase to a document or page: the more often it occurs in the document or page, the larger its TF, and vice versa. Inverse document frequency (IDF) measures how common a word or phrase is in general: the more frequently it appears across the document set or corpus, the more common it is and the smaller its IDF, and vice versa. The electronic device may measure the importance of a word or phrase within a page by the product of TF and IDF, and thereby extract one or more keywords of the page.
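The TF-IDF weighting can be sketched in a few lines of Python. This sketch assumes one common variant, TF = count / document length and IDF = log(N / document frequency); the application does not fix a particular formula, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {word: score} dict per document."""
    n = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({w: (c / total) * math.log(n / df[w]) for w, c in counts.items()})
    return scores

def top_keywords(score, k=3):
    """Pick the k highest-scoring words of one document."""
    ranked = sorted(score.items(), key=lambda kv: kv[1], reverse=True)
    return [w for w, _ in ranked[:k]]

# Hypothetical, already segmented page contents.
docs = [
    ["island", "island", "japan", "the"],
    ["the", "income", "resident"],
    ["the", "japan", "sovereignty"],
]
scores = tf_idf(docs)
keywords = top_keywords(scores[0], k=1)
```

A word such as "the" that appears in every document gets an IDF of log(1) = 0, which is how common function words drop out of the keyword list.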
It should be noted that the semantic analysis methods above are well-known techniques that are widely studied and applied, and are not described further here.
In some optional implementations of this embodiment, the electronic device may also expand each single keyword among the extracted keywords to generate expanded keywords, and generate the keyword set from the expanded keywords together with the extracted keywords. In practice, a word may have synonyms (for example, "father" may have the synonym "dad"), near-synonyms (for example, "attend" may have the near-synonym "participate"), and associated words (for example, "engineering drawing" may have the associated word "drafting"). The electronic device may collect the synonyms, near-synonyms, and associated words of each single keyword as that keyword's expanded keywords and add them to the keyword set. Words or phrases that often occur together can serve as associated words. Optionally, the associated words of a single keyword may be obtained from an associated-word model trained in advance by machine learning on a large number of previously crawled documents or pages. For example, the model may segment a large amount of previously crawled document or page content into words, for example by the full segmentation method, and then count the probability that two or more words occur together. Words whose co-occurrence probability is greater than a certain threshold can be associated words of one another.
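One simple way to realise the co-occurrence idea is sketched below in Python. It assumes that the co-occurrence probability of two words is estimated as the fraction of documents containing both; the function names, the 0.5 threshold, and the sample data are hypothetical, and a trained associated-word model could look quite different.

```python
from collections import Counter
from itertools import combinations

def associated_words(docs, min_prob=0.5):
    """Map each word to the words it co-occurs with in more than
    `min_prob` of the documents, together with that probability."""
    n = len(docs)
    pair_counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            pair_counts[(a, b)] += 1
    assoc = {}
    for (a, b), count in pair_counts.items():
        prob = count / n
        if prob > min_prob:
            assoc.setdefault(a, {})[b] = prob
            assoc.setdefault(b, {})[a] = prob
    return assoc

def expand(keywords, synonyms, assoc):
    """Add synonyms and associated words to the extracted keywords."""
    expanded = set(keywords)
    for kw in keywords:
        expanded.update(synonyms.get(kw, []))
        expanded.update(assoc.get(kw, {}))
    return expanded

# Hypothetical segmented documents and synonym table.
docs = [
    ["engineering drawing", "drafting", "pencil"],
    ["engineering drawing", "drafting"],
    ["drafting", "ruler"],
]
assoc = associated_words(docs)
keyword_set = expand(["engineering drawing", "father"], {"father": ["dad"]}, assoc)
```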
In some optional implementations of this embodiment, each keyword in the keyword set may also have an importance coefficient, a value that measures how important the keyword is to the page it comes from. For example, the importance coefficient of a keyword extracted from the page may be set to 1, that of its synonyms to 0.8, and that of its near-synonyms or associated words to 0.5. Note that importance coefficients serve to distinguish how important keywords are; the specific values above are merely illustrative and do not limit the importance coefficient. Optionally, the importance coefficient of a keyword extracted from the page may also depend on the number of times the keyword occurs in the page: the more occurrences, the larger the coefficient. The importance coefficient of an expanded keyword may also depend on its degree of association with the keyword extracted from the page; for example, a synonym of an extracted keyword may have the same importance coefficient as that keyword. In practice, the preset associated-word model may also contain the degree of association of each associated word, which may be proportional to the probability that the words occur together; the importance coefficient of an associated word of an extracted keyword may then be the product of that keyword's importance coefficient and the degree of association.
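As a rough Python sketch of one way these coefficients could be assigned: extracted keywords get 1, synonyms get 0.8 times the keyword's coefficient, and associated words get the keyword's coefficient multiplied by the degree of association. The function name, the synonym "technical drawing", and the rule of keeping the larger value when a word is reached twice are assumptions.

```python
def weighted_keyword_set(keywords, synonyms, assoc, syn_coef=0.8):
    """Build {word: importance coefficient} for one page.

    keywords: keywords extracted from the page (coefficient 1.0).
    synonyms: dict keyword -> list of synonyms.
    assoc:    dict keyword -> {associated word: degree of association}.
    """
    weights = {kw: 1.0 for kw in keywords}
    for kw in keywords:
        for syn in synonyms.get(kw, []):
            # Assumption: keep the larger value if a word is reached twice.
            weights[syn] = max(weights.get(syn, 0.0), syn_coef * weights[kw])
        for word, degree in assoc.get(kw, {}).items():
            weights[word] = max(weights.get(word, 0.0), weights[kw] * degree)
    return weights

# Hypothetical inputs.
page_weights = weighted_keyword_set(
    ["engineering drawing"],
    {"engineering drawing": ["technical drawing"]},
    {"engineering drawing": {"drafting": 0.5}},
)
```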
Step 103: compare the keyword sets with one another, and merge keyword sets whose similarity is greater than a first preset threshold to generate at least one associated-page keyword set.
In this embodiment, the electronic device may further compare the different keyword sets with one another, compute the similarity between each pair of keyword sets, and merge keyword sets whose similarity is greater than the first preset threshold to generate an associated-page keyword set. The visited pages corresponding to the keyword sets used to generate an associated-page keyword set can be associated pages of one another.
Here, the similarity between keyword sets characterizes how alike different keyword sets are. In this embodiment, the electronic device may characterize it by the number of identical keywords in the two sets. It may also compute the similarity with a well-known text similarity method such as cosine similarity or the Jaccard coefficient. Taking the Jaccard coefficient as an example, the similarity between keyword sets A and B can be computed as: similarity(A, B) = (number of words shared by A and B) / (number of distinct words contained in A and B together).
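The Jaccard coefficient is short enough to show directly. The Python sketch below uses made-up keyword sets; only the formula itself comes from the text above.

```python
def jaccard(a, b):
    """Shared words divided by the distinct words in both sets together."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical keyword sets of two pages.
set_a = {"japan", "island building", "land reclamation"}
set_b = {"japan", "island building", "sovereignty"}
similarity = jaccard(set_a, set_b)  # 2 shared words out of 4 distinct words
```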
In some implementations, the words in a keyword set may also have importance coefficients. In that case, taking cosine similarity as an example, the similarity between keyword sets A and B can be computed as the sum, over the words shared by A and B, of the products of their importance coefficients in A and in B, divided by the product of the square root of the sum of squared importance coefficients of the words in A and the square root of the sum of squared importance coefficients of the words in B. For example, suppose keyword set A is (Japan 1, island building 0.8, land reclamation 0.5), where 1, 0.8, and 0.5 are the importance coefficients of the keywords "Japan", "island building", and "land reclamation" in A, and keyword set B is (Japan 0.7, island building 1, sovereignty 0.6), where 0.7, 1, and 0.6 are the importance coefficients of the keywords "Japan", "island building", and "sovereignty" in B. The similarity between keyword sets A and B is then:
\[ \frac{1 \times 0.7 + 0.8 \times 1}{\sqrt{1^{2} + 0.8^{2} + 0.5^{2}} \times \sqrt{0.7^{2} + 1^{2} + 0.6^{2}}} \]
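As a rough check of the weighted cosine calculation, the example can be reproduced in Python. The sketch below assumes each keyword set is held as a dictionary mapping a word to its importance coefficient; the function name `weighted_cosine` and the zero-norm handling are illustrative assumptions. The keys follow the three-keyword example (Japan, island building, land reclamation versus Japan, island building, sovereignty).

```python
import math


def weighted_cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two keyword sets with importance coefficients."""
    shared = a.keys() & b.keys()
    dot = sum(a[word] * b[word] for word in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: an empty or all-zero set matches nothing
    return dot / (norm_a * norm_b)


set_a = {"Japan": 1.0, "island building": 0.8, "land reclamation": 0.5}
set_b = {"Japan": 0.7, "island building": 1.0, "sovereignty": 0.6}

similarity = weighted_cosine(set_a, set_b)
```

For these values the numerator is 1.5 and the denominator is the square root of 1.89 times the square root of 1.85, which gives a similarity of roughly 0.80.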
It is worth noting that the first predetermined threshold may be a threshold set manually based on experience (for example, 0.5). Alternatively, a classification model may be trained on page samples obtained in advance and verified with verification samples, and the first predetermined threshold is then the threshold at which this classification model reaches a given classification accuracy (for example, 99%).
When merging, the electronic device may simply place the words of the different keyword sets into a single set after removing duplicates. Alternatively, the electronic device may place the deduplicated words into a single set and, at the same time, add up the importance coefficients of identical keywords.
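The second merging option, in which the coefficients of identical keywords are added, might look like the following sketch. It assumes keyword sets are dictionaries from word to importance coefficient; `merge_keyword_sets` and the sample data are hypothetical names and values chosen for illustration.

```python
def merge_keyword_sets(keyword_sets: list[dict[str, float]]) -> dict[str, float]:
    """Union of the words, with the coefficients of identical keywords summed."""
    merged: dict[str, float] = {}
    for keyword_set in keyword_sets:
        for word, coefficient in keyword_set.items():
            merged[word] = merged.get(word, 0.0) + coefficient
    return merged


set_a = {"Japan": 1.0, "island building": 0.8, "land reclamation": 0.5}
set_b = {"Japan": 0.7, "island building": 1.0, "sovereignty": 0.6}

merged = merge_keyword_sets([set_a, set_b])
```

The first option (deduplication only) corresponds to taking just the keys of `merged` and ignoring the summed values.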
Through this step, the electronic device can divide the accessed pages obtained in step 101 into multiple categories. Each category consists of at least one accessed page, and the contents of the accessed pages in a category are similar or related, so these pages are associated pages of one another. Meanwhile, the keyword sets corresponding to these associated pages are merged to generate an associated page keyword set.
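One possible way to carry out the grouping described in this step is sketched below. The application does not say how pairwise merges are chained, so the sketch assumes that merging is transitive (if A matches B and B matches C, all three fall into one category) and tracks the categories with a union-find structure. The names `group_pages` and `jaccard`, the use of the Jaccard coefficient, the sample page identifiers, and the threshold value are all illustrative assumptions.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_pages(keyword_sets: dict[str, set[str]], threshold: float) -> list[set[str]]:
    """Group page identifiers whose keyword sets are similar above the threshold."""
    parent = {page: page for page in keyword_sets}

    def find(page: str) -> str:
        # Follow parent links to the representative, shortening the path as we go.
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    for p, q in combinations(keyword_sets, 2):
        if jaccard(keyword_sets[p], keyword_sets[q]) > threshold:
            parent[find(p)] = find(q)

    groups: dict[str, set[str]] = {}
    for page in keyword_sets:
        groups.setdefault(find(page), set()).add(page)
    return list(groups.values())


pages = {
    "p1": {"Japan", "island building", "land reclamation"},
    "p2": {"Japan", "island building", "sovereignty"},
    "p3": {"football", "transfer"},
}
categories = group_pages(pages, threshold=0.4)
```

Here `p1` and `p2` share two of four distinct words (similarity 0.5), so they form one category, while `p3` stays in a category of its own.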
In some implementations, in this step, the electronic device may also obtain the associated pages by a text clustering method (such as K-means) and generate the associated page keyword sets. Taking the K-means clustering method as an example, the electronic device may first choose the K pages with the highest page view counts as the cluster centroids, then measure the distance from each of the other pages to each centroid and assign each page to the class of its nearest centroid, and then recalculate the centroid of each resulting class. The step "measure the distance from each of the other pages to each centroid and assign each page to the class of its nearest centroid" is repeated until the new centroids are equal to the previous centroids or differ from them by less than a specified threshold. At that point the pages are divided into K categories. Within these K categories, the accessed pages corresponding to each category may be associated pages of one another. Merging the keyword sets of the accessed pages that are associated pages of one another, in the manner described above, yields the associated page keyword sets.
Step 104, based on the ranking of the sums of the page view counts of the accessed pages corresponding to each of the at least one associated page keyword set, generate first push information using one or more of the at least one associated page keyword set.
In the present embodiment, the electronic device may first obtain, for each of the above associated page keyword sets, the sum of the page view counts of its corresponding accessed pages, and sort these sums (for example, from highest to lowest). Then, based on the ranking, the electronic device generates first push information using one or more of the at least one associated page keyword set.
For example, when the sums of page view counts are sorted from highest to lowest, the electronic device may take a predetermined number (for example, 10) of the top-ranked associated page keyword sets, and then generate the first push information from these associated page keyword sets or from the accessed pages corresponding to them. Here, the electronic device may choose the most recently published page among the accessed pages corresponding to an associated page keyword set and use the topic or keywords of that page as the first push information. The electronic device may also sort the words in an associated page keyword set in descending order of the number of corresponding accessed pages or of their page view counts, and use a predetermined number of the top-ranked keywords as the first push information. The electronic device may also use the topic of the page with the highest page view count among the associated pages corresponding to an associated page keyword set as the first push information. The electronic device may also proceed in other ways, for example by using the keywords of the page with the highest page view count among the accessed pages corresponding to an associated page keyword set as the first push information. The present application does not limit this. Optionally, the first push information may also include the sum of the page view counts of the associated pages corresponding to the associated page keyword set, or the page view count of the accessed page used to generate the first push information.
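The ranking and selection described above can be illustrated with a short Python sketch. It shows only one of the listed options, namely using the topic of the most viewed page of each top-ranked category. The `Page` record, the field names, the function `first_push_items`, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    topic: str
    views: int


def first_push_items(categories: list[list[Page]], top_n: int) -> list[dict]:
    """Rank categories by total views and describe each of the top N by its most viewed page."""
    ranked = sorted(categories, key=lambda c: sum(p.views for p in c), reverse=True)
    items = []
    for category in ranked[:top_n]:
        top_page = max(category, key=lambda p: p.views)
        items.append({
            "topic": top_page.topic,
            "url": top_page.url,
            "total_views": sum(p.views for p in category),
        })
    return items


# Hypothetical categories of associated pages.
categories = [
    [Page("http://example.com/a", "Topic A", 10), Page("http://example.com/b", "Topic B", 30)],
    [Page("http://example.com/c", "Topic C", 100)],
    [Page("http://example.com/d", "Topic D", 5)],
]
items = first_push_items(categories, top_n=2)
```

In this sample the category containing Topic C ranks first (100 views in total), followed by the category whose most viewed page is Topic B (40 views in total).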
In some implementations, the electronic device may push the first push information to a user. The electronic device may present the first push information to the user directly, or push it to the user in the form of a hyperlink. The hyperlink may be text containing a keyword or a topic name, and links either to the accessed page corresponding to the first push information or to the accessed page with the highest page view count among the associated pages corresponding to the associated page keyword set from which the first push information was generated.
Through this step, the electronic device can obtain the top N categories (N being a positive integer) with the highest view counts among the categories of the above pages, and generate N pieces of first push information from these N categories.
Step 105, based on at least one accessed page corresponding to the associated page keyword set used to generate the first push information, generate second push information associated with the first push information and push it to the user.
In the present embodiment, for each piece of first push information, the electronic device may obtain the accessed pages corresponding to the associated page keyword set used to generate that first push information, choose at least one accessed page from among them, and generate, from the at least one chosen accessed page, second push information associated with that first push information.
Here, the second push information may be generated from the pages associated with the first push information. For example, if the first push information is a predetermined number of top-ranked keywords, chosen after sorting the words of an associated page keyword set in descending order of the number of corresponding accessed pages or of their page view counts, the second push information may be the topics of the M pages (M being a positive integer) that contain the largest number of those keywords. If the first push information is the topic of the accessed page with the highest page view count among the associated pages corresponding to an associated page keyword set, the second push information may be the topics of the top M pages (M being a positive integer) with the highest page view counts among those associated pages (which may or may not include the page used to generate the first push information).
The electronic device may present the second push information to the user together with the first push information. Alternatively, after presenting the first push information to the user, the electronic device may detect a predetermined operation by the user and, in response to detecting that operation, show the second push information to the user. For example, the second push information may be presented when the user clicks the first push information, when the user clicks a button corresponding to the first push information, or in response to a mouse hover, and so on. Optionally, the second push information may be pushed to the user in the form of a hyperlink, and the hyperlink may link to the page corresponding to the second push information.
Fig. 2 shows an example of a specific application of the present embodiment. In the example of Fig. 2, the electronic device first obtains the network addresses and page view counts of accessed pages from at least one website, then parses the content of each accessed page to generate a keyword set for each accessed page. Next, based on mutual comparison of the keyword sets, it merges the keyword sets whose similarity is greater than the first predetermined threshold to generate associated page keyword sets. From these it selects the 3 associated page keyword sets whose corresponding associated pages have the highest sums of page view counts, and generates first push information 201 (for example, hot news on the network) from the topic of the most viewed accessed page corresponding to each of these 3 sets. Then, from the accessed pages corresponding to the associated page keyword set used to generate the first push information 201, it obtains at least one (for example, 3) accessed page, generates second push information 202 (for example, background news for the hot news) associated with the first push information, and pushes it to the user.
In Fig. 2, the first push information 201 may comprise a topic 2011, the sum 2012 of the page view counts of the associated pages corresponding to the associated page keyword set, and a button 2013. When the user clicks the button 2013, the electronic device displays the topic 2021 comprised in the second push information 202. The topic 2011 and the topic 2021 may be text in hyperlink form, linking respectively to the accessed pages corresponding to the topic 2011 and the topic 2021. An application scenario of this example may be, for instance, that the electronic device pushes to the editorial staff of a website the news events receiving the most attention on the network, together with background information on those events, so that the editorial staff can edit the news events and update the website content.
By pushing to the user second push information associated with the first push information, the above embodiment of the present application can present richer push content to the user.
With further reference to Fig. 3, a flow 300 of another embodiment of the information push method of the present application is shown. The information push method 300 comprises the following steps:
Step 301, obtain page access information of at least one website, where the page access information comprises the network addresses and page view counts of accessed pages.
In the present embodiment, the electronic device (for example, an electronic terminal on which an application that includes information push runs, or a background server that supports such an application) may obtain the page access information of at least one website locally or remotely. Here, the page access information may comprise the network addresses (for example, URLs) and page view counts of the accessed pages.
Step 302, parse the content of the page corresponding to each network address and generate a keyword set for each accessed page.
In the present embodiment, the electronic device may parse the content of the page corresponding to each of the above network addresses by various methods (for example, statistical analysis or semantic analysis), extract one or more keywords from it, and generate a keyword set. In some implementations, the electronic device may also expand individual keywords among the one or more keywords to generate expanded keywords, and generate the keyword set from the extracted keywords together with the expanded keywords. The expanded keywords may include synonyms, near-synonyms and related words of an extracted keyword. Optionally, each keyword in a keyword set may also have an importance coefficient.
Step 303, based on mutual comparison of the keyword sets, merge the keyword sets whose similarity is greater than the first predetermined threshold to generate at least one associated page keyword set.
In the present embodiment, the electronic device may further compare the different keyword sets with one another, calculate the similarity between each pair of keyword sets, and merge the keyword sets whose similarity is greater than the first predetermined threshold to generate an associated page keyword set. The accessed pages corresponding to the keyword sets used to generate an associated page keyword set may be associated pages of one another.
Here, the similarity between keyword sets characterizes how similar the different keyword sets are. In the present embodiment, the electronic device may characterize the similarity between two keyword sets by the number of keywords the two sets have in common. The electronic device may use a known text similarity method, such as the cosine similarity algorithm or the Jaccard coefficient. In some implementations, each word in a keyword set may also have an importance coefficient, in which case the electronic device may calculate the similarity between keyword sets based on the importance coefficients.
Step 304, based on the ranking of the sums of the page view counts of the accessed pages corresponding to each of the at least one associated page keyword set, generate first push information using one or more of the at least one associated page keyword set.
In the present embodiment, the electronic device may first obtain, for each of the above associated page keyword sets, the sum of the page view counts of its corresponding accessed pages, and sort these sums (for example, from highest to lowest). Then, based on the ranking, the electronic device generates first push information using one or more of the at least one associated page keyword set.
Step 305, cluster the publication times of the accessed pages corresponding to the associated page keyword set used to generate the first push information according to a preset time interval, dividing them into at least one time period.
In the present embodiment, the electronic device may cluster the publication times of the accessed pages corresponding to the associated page keyword set used to generate the first push information according to a preset time interval, dividing them into at least one time period. Here, when the at least one time period comprises two or more time periods, the clustering result may be such that the time difference between publication times taken from any two different time periods is greater than the preset time interval.
Clustering is the process of dividing a set of physical or abstract objects into multiple classes made up of similar objects. Here, the purpose of clustering the publication times of the accessed pages according to the preset time interval is to divide the publication times into at least one time period, and thereby divide the accessed pages into multiple classes whose publication times are close to one another.
In the present embodiment, various known clustering algorithms may be used to cluster by publication time. For example, based on a hierarchical clustering algorithm, the electronic device may repeatedly merge the two publication times with the smallest time interval between them, until the time difference between the two closest publication times is greater than or equal to the preset time interval. In this way, the accessed pages corresponding to an associated page keyword set are divided, by publication time, into pages published in different time periods. For any two accessed pages published in different time periods, the difference between their publication times is at least the preset time interval.
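For one-dimensional publication times, the merging procedure above can be sketched compactly. The sketch assumes single-linkage merging (the distance between two groups is the gap between their closest members), in which case repeatedly merging the closest times until the smallest gap reaches the preset interval gives the same result as sorting the times and starting a new period wherever the gap to the previous time is at least the interval. The function name `split_into_periods` and the sample timestamps are illustrative.

```python
from datetime import datetime, timedelta


def split_into_periods(times: list[datetime], interval: timedelta) -> list[list[datetime]]:
    """Group publication times so that gaps inside a period are shorter than the interval."""
    periods: list[list[datetime]] = []
    for t in sorted(times):
        if periods and t - periods[-1][-1] < interval:
            periods[-1].append(t)  # close enough to the previous time: same period
        else:
            periods.append([t])    # gap of at least one interval: start a new period
    return periods


# Hypothetical publication times on a single day.
times = [
    datetime(2015, 8, 7, 9, 0),
    datetime(2015, 8, 7, 12, 5),
    datetime(2015, 8, 7, 9, 10),
    datetime(2015, 8, 7, 9, 25),
    datetime(2015, 8, 7, 12, 0),
]
periods = split_into_periods(times, timedelta(minutes=30))
```

With a 30-minute interval, the three morning times fall into one period and the two times around noon into another.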
In an optional implementation of the present embodiment, the electronic device may also determine the preset time interval used for clustering according to the time of day. For example, the electronic device may obtain the page publication volume over many days in advance and set the interval according to how the publication volume is distributed. For instance, if relatively few pages are published between 0:00 and 6:00 each day, the preset time interval for publication times between 0:00 and 6:00 may be set to a longer period, such as 2 hours. Likewise, if many pages are published between 9:00 and 11:00 each day, the preset time interval for publication times between 9:00 and 11:00 may be set to a shorter period, such as 20 minutes.
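A time-of-day dependent interval could be represented as a small lookup table, as in the sketch below. The two entries use the figures from the example in the text; the half-open treatment of the range boundaries, the one-hour default for all other times, and the names `INTERVALS`, `DEFAULT_INTERVAL` and `interval_for` are assumptions added for illustration.

```python
from datetime import datetime, time, timedelta

# (start of range, end of range, clustering interval); ranges are treated as [start, end).
INTERVALS = [
    (time(0, 0), time(6, 0), timedelta(hours=2)),      # few pages published: long interval
    (time(9, 0), time(11, 0), timedelta(minutes=20)),  # many pages published: short interval
]
DEFAULT_INTERVAL = timedelta(hours=1)  # assumption: not specified in the text


def interval_for(published: datetime) -> timedelta:
    """Return the clustering interval that applies to a given publication time."""
    t = published.time()
    for start, end, interval in INTERVALS:
        if start <= t < end:
            return interval
    return DEFAULT_INTERVAL


night = interval_for(datetime(2015, 8, 7, 3, 0))
morning = interval_for(datetime(2015, 8, 7, 9, 30))
afternoon = interval_for(datetime(2015, 8, 7, 15, 0))
```

In practice the table would be derived from the observed distribution of publication volume rather than written by hand.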
Through this step, the electronic device can divide the accessed pages corresponding to an associated page keyword set by time, and the accessed pages in different time periods may record the content of an event at different stages of its development.
Step 306, for one or more of the at least one time period, extract one page from the accessed pages corresponding to each time period.
In the present embodiment, for one or more of the at least one time period, the electronic device may extract one page from the accessed pages corresponding to each time period.
Here, the page extracted by the electronic device may be any page published in the corresponding time period, or a page selected according to some rule. When selecting by rule, the electronic device may take the page with the highest page view count in the corresponding time period, the earliest published page in the corresponding time period, or a page chosen according to a preset priority of the websites that published the pages, and so on. The present application does not limit this.
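Two of the selection rules mentioned above (most viewed page and earliest published page) are sketched below. The `Page` record, the `rule` parameter with its string values, and the function `pick_per_period` are hypothetical; the sketch assumes the pages have already been divided into non-empty time periods.

```python
from datetime import datetime
from typing import NamedTuple


class Page(NamedTuple):
    topic: str
    published: datetime
    views: int


def pick_per_period(periods: list[list[Page]], rule: str = "most_viewed") -> list[Page]:
    """Pick one representative page from each time period."""
    if rule == "most_viewed":
        return [max(period, key=lambda p: p.views) for period in periods]
    if rule == "earliest":
        return [min(period, key=lambda p: p.published) for period in periods]
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical pages already divided into two time periods.
periods = [
    [Page("Event begins", datetime(2015, 8, 7, 9, 0), 120),
     Page("First reactions", datetime(2015, 8, 7, 9, 10), 300)],
    [Page("Official statement", datetime(2015, 8, 7, 12, 0), 80)],
]
most_viewed = pick_per_period(periods)
earliest = pick_per_period(periods, rule="earliest")
```

A rule based on website priority could be added in the same way by supplying a different key function.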
Step 307, based on the extracted pages, generate second push information and push it to the user.
In the present embodiment, the electronic device may generate second push information according to some rule based on the pages extracted in step 306, and push the second push information to the user. There are many ways for the electronic device to generate the second push information from the extracted pages. For example, the electronic device may use the topics or keywords of the extracted pages as the second push information, or it may choose a predetermined number of pages from the extracted pages in order of publication time, from most recent to oldest, and use the topics or keywords of these pages as the second push information, and so on. The present application does not limit this.
In an optional implementation of the present embodiment, a page deduplication step may also be included between step 304 and step 305. The electronic device may process the accessed pages corresponding to an associated page keyword set as follows: among the accessed pages corresponding to the associated page keyword set, the accessed pages whose keyword sets have a similarity greater than a second predetermined threshold are screened down to a single page, and the accessed pages remaining after screening are taken as the accessed pages corresponding to the associated page keyword set.
Here, the similarity is calculated in the same way as in step 103 of the previous embodiment, which is not repeated here. The second predetermined threshold may be greater than the first predetermined threshold. The principle by which the electronic device deduplicates the accessed pages corresponding to an associated page keyword set in this step is as follows.
For example, if the second predetermined threshold is 98%, then when the similarity of two keyword sets is greater than 98%, the electronic device may consider the accessed pages corresponding to these two keyword sets to be pages with identical content, that is, duplicate pages. From a group of duplicate pages, the electronic device may retain any one page, or retain a page selected according to some rule, for example the earliest published page, and screen out the other pages in the group. The accessed pages remaining after screening are taken as the accessed pages corresponding to the associated page keyword set. Suppose there are 1000 accessed pages corresponding to an associated page keyword set, among which there are 30 groups of duplicate pages, each group containing 2 pages. The electronic device then screens out 1 page from each of these 30 groups and retains 1 page, leaving 970 pages as the accessed pages corresponding to the associated page keyword set. For a duplicate page that is not retained, the electronic device may delete the page information of that page. Optionally, for duplicate pages, the electronic device may add the page view counts of the pages that are not retained to the page view count of the retained page.
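The deduplication principle can be sketched as follows, under several assumptions: similarity is measured with the Jaccard coefficient, the earliest published page of each duplicate group is retained, and the view counts of removed duplicates are added to the retained page. The `Page` record, the integer `published` field, and the names `jaccard` and `deduplicate` are illustrative and not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    keywords: frozenset[str]
    published: int  # assumption: publication time as a sortable integer timestamp
    views: int


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def deduplicate(pages: list[Page], threshold: float = 0.98) -> list[Page]:
    """Keep the earliest page of each duplicate group and accumulate view counts."""
    kept: list[Page] = []
    for page in sorted(pages, key=lambda p: p.published):
        for survivor in kept:
            if jaccard(page.keywords, survivor.keywords) > threshold:
                survivor.views += page.views  # optional accumulation of view counts
                break
        else:
            # No duplicate found: keep a copy so the input list is not modified.
            kept.append(Page(page.url, page.keywords, page.published, page.views))
    return kept


pages = [
    Page("http://example.com/a", frozenset({"x", "y"}), 2, 10),
    Page("http://example.com/b", frozenset({"x", "y"}), 1, 5),
    Page("http://example.com/c", frozenset({"z"}), 3, 7),
]
result = deduplicate(pages)
```

In this sample the first two pages have identical keyword sets, so only the earlier one (`/b`) is retained, and its view count becomes 15.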
In the present embodiment, steps 301, 302, 303 and 304 of the above flow are substantially the same as steps 101, 102, 103 and 104 of the previous embodiment, respectively, and are not described again here.
As can be seen from Fig. 3, unlike the embodiment corresponding to Fig. 1, the flow 300 of the information push method in the present embodiment replaces step 105 with steps 305, 306 and 307. Through steps 305, 306 and 307, the present embodiment can extract, by time period, the accessed pages corresponding to the associated page keyword set of the first push information, and thereby generate second push information associated with the first push information. When these pages concern the same event, the content of the pages in each time period can reflect one stage in the development of the event, so extracting one page from each time period to generate the second push information lets the user follow the course of the whole event through the second push information. Fig. 4 shows the effect of the information push method of the present embodiment in one application scenario. The application scenario shown in Fig. 4 is the pushing of hot news, where 401 indicates the first push information and 402 indicates the second push information. The present embodiment helps to push to the user how the subject of the first push information developed in each time period. Optionally, pages may be deduplicated before the publication times of the accessed pages are clustered, which avoids obtaining pages with identical content in different time periods and thereby reducing the effectiveness of the information push.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an information push apparatus. This apparatus embodiment corresponds to the method embodiment shown in Fig. 1, and the apparatus may specifically be applied in an electronic device.
As shown in Fig. 5, the information push apparatus 500 of the present embodiment comprises: an information acquisition module 501, a keyword set generation module 502, a keyword set merging module 503, a first push information generation module 504, and a second push information generation and pushing module 505. The information acquisition module 501 is configured to obtain page access information of at least one website, where the page access information comprises the network addresses and page view counts of accessed pages. The keyword set generation module 502 is configured to parse the content of the page corresponding to each network address and generate a keyword set for each accessed page. The keyword set merging module 503 is configured to merge, based on mutual comparison of the keyword sets, the keyword sets whose similarity is greater than the first predetermined threshold, generating at least one associated page keyword set, where the accessed pages corresponding to the keyword sets used to generate an associated page keyword set are associated pages of one another. The first push information generation module 504 is configured to generate first push information using one or more of the at least one associated page keyword set, based on the ranking of the sums of the page view counts of the accessed pages corresponding to each of the at least one associated page keyword set. The second push information generation and pushing module 505 is configured to generate, based on at least one accessed page corresponding to the associated page keyword set used to generate the first push information, second push information associated with the first push information and push it to the user.
In the present embodiment, the information push apparatus 500 may first obtain the page access information of at least one website, locally or remotely, through the information acquisition module 501. Here, the page access information may comprise the network addresses (for example, URLs) and page view counts of the accessed pages.
In the present embodiment, the keyword set generation module 502 may then parse the content of the page corresponding to each of the above network addresses by various methods (for example, statistical analysis or semantic analysis), extract one or more keywords from it, and generate a keyword set. In some implementations, the keyword set generation module 502 may also expand individual keywords among the one or more keywords to generate expanded keywords, and generate the keyword set from the extracted keywords together with the expanded keywords. The expanded keywords may include synonyms, near-synonyms and related words of an extracted keyword. Optionally, each keyword in a keyword set may also have an importance coefficient.
In the present embodiment, the keyword set merging module 503 may then compare the keyword sets generated by the keyword set generation module 502 with one another and merge the keyword sets whose similarity is greater than the first predetermined threshold, generating at least one associated page keyword set. The accessed pages corresponding to the keyword sets used to generate an associated page keyword set are associated pages of one another. Here, the similarity between keyword sets may be calculated by a variety of methods.
In the present embodiment, the first push information generation module 504 may then obtain, for each of the above associated page keyword sets, the sum of the page view counts of its corresponding accessed pages, and sort these sums (for example, from highest to lowest). Then, based on the ranking, it generates first push information using one or more of the at least one associated page keyword set.
In the present embodiment, the second push information generation and pushing module 505 may then, for each piece of first push information, obtain the accessed pages corresponding to the associated page keyword set used to generate that first push information, choose at least one accessed page from among them, generate from the at least one chosen accessed page second push information associated with that first push information, and push it to the user.
In some optional implementations of the present embodiment, the second push information generation and pushing module 505 may comprise: a clustering unit (not shown), configured to cluster the publication times of the accessed pages corresponding to the associated page keyword set used to generate the first push information according to a preset time interval, dividing them into at least one time period; an extraction unit (not shown), configured to extract, for one or more of the at least one time period, one page from the accessed pages corresponding to each time period; and a generation unit (not shown), configured to generate second push information based on the extracted pages and push it to the user. Here, when the at least one time period comprises two or more time periods, the clustering result may be such that the time difference between publication times taken from any two different time periods is greater than the preset time interval.
In some optional implementations of the present embodiment, the second push information generation and pushing module 505 may further comprise a screening unit (not shown), configured to screen down to a single page, among the accessed pages corresponding to the associated page keyword set, the accessed pages whose keyword sets have a similarity greater than a second predetermined threshold, and to take the accessed pages remaining after screening as the accessed pages corresponding to the associated page keyword set. The second predetermined threshold is greater than the first predetermined threshold. The function of the screening unit is to deduplicate the accessed pages corresponding to the associated page keyword set.
It is worth noting that the modules and units described in the information push apparatus 500 correspond to the steps of the method described with reference to Fig. 1. Accordingly, the operations and features described above for the method apply equally to the information push apparatus 500 and to the modules and units it contains, and are not repeated here.
Those skilled in the art will understand that the above information push apparatus 500 also comprises some other well-known structures, such as a processor and a memory. In order not to unnecessarily obscure the embodiments of the present disclosure, these well-known structures are not shown in Fig. 5.
Reference is now made to Fig. 6, which shows a schematic structural diagram of a computer system 600 suitable for implementing the electronic device of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 comprises a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602 and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse and the like; an output section 607 including, for example, a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to the embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present application comprises a computer program product, which comprises a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
Unit involved in the embodiment of the present application can be realized by the mode of software, also can be realized by the mode of hardware.Described module also can be arranged within a processor, such as, can be described as: a kind of processor comprises.Wherein data obtaining module, keyword set generation module, keyword set merge module, the first pushed information generation module and the second pushed information and generate and pushing module, the title of these modules does not form the restriction to this module itself under certain conditions, such as, data obtaining module can also be described to " being configured for the module of the page access information obtaining at least one website ".
As another aspect, present invention also provides a kind of computer-readable recording medium, this computer-readable recording medium can be the computer-readable recording medium comprised in device described in above-described embodiment; Also can be individualism, be unkitted the computer-readable recording medium allocated in terminal.Described computer-readable recording medium stores more than one or one program, and described program is used for performance description in the method for the information pushing of the application by one or more than one processor.
More than describe and be only the preferred embodiment of the application and the explanation to institute's application technology principle.Those skilled in the art are to be understood that, invention scope involved in the application, be not limited to the technical scheme of the particular combination of above-mentioned technical characteristic, also should be encompassed in when not departing from described inventive concept, other technical scheme of being carried out combination in any by above-mentioned technical characteristic or its equivalent feature and being formed simultaneously.The technical characteristic that such as, disclosed in above-mentioned feature and the application (but being not limited to) has similar functions is replaced mutually and the technical scheme formed.

Claims (14)

1. An information push method, characterized in that the method comprises:
Obtaining page access information of at least one website, wherein the page access information comprises the URLs and the page view counts of the accessed pages;
Performing content parsing on the page corresponding to each URL to generate a keyword set for each accessed page;
Comparing the keyword sets with one another and merging the keyword sets whose similarity is greater than a first predetermined threshold to generate at least one associated-page keyword set, wherein the accessed pages corresponding to the keyword sets used to generate an associated-page keyword set are associated pages of one another;
Generating first pushed information using one or more of the at least one associated-page keyword set, based on a ranking of the sums of the page view counts of the accessed pages corresponding to each of the at least one associated-page keyword set;
Generating, based on at least one accessed page corresponding to the associated-page keyword set used to generate the first pushed information, second pushed information associated with the first pushed information, and pushing it to a user.
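By way of non-limiting illustration, the merging and ranking steps of claim 1 might be sketched as follows. The claim does not fix a similarity measure or a merging strategy; the Jaccard coefficient, the greedy single-pass merge, the default threshold, and the names `Group`, `jaccard` and `merge_and_rank` are all assumptions made for this sketch only.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """One associated-page keyword set with its pages and summed page views."""
    keywords: set
    urls: list = field(default_factory=list)
    views: int = 0


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_and_rank(pages, threshold=0.5):
    """pages: iterable of (url, page_views, keyword_set) tuples.

    Greedily merges each page into the first group whose keyword set is
    more similar than `threshold`, then ranks groups by summed page views.
    """
    groups = []
    for url, views, kws in pages:
        for g in groups:
            if jaccard(g.keywords, kws) > threshold:
                g.keywords |= kws      # merge the keyword sets
                g.urls.append(url)     # pages in one group are associated pages
                g.views += views       # accumulate page views for ranking
                break
        else:
            groups.append(Group(set(kws), [url], views))
    return sorted(groups, key=lambda g: g.views, reverse=True)
```

Under these assumptions, the highest-ranked groups would supply the first pushed information, and the pages listed in `urls` would be the candidates for the second pushed information.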
2. The method according to claim 1, characterized in that the generating, based on at least one accessed page corresponding to the associated-page keyword set used to generate the first pushed information, second pushed information associated with the first pushed information, and pushing it to a user, comprises:
Clustering, according to a preset time interval, the publication times of the accessed pages corresponding to the associated-page keyword set used to generate the first pushed information, so as to divide them into at least one time period, wherein, when the at least one time period comprises two or more time periods, the time difference between publication times taken respectively from any two time periods is greater than the time interval;
For one or more of the at least one time period, extracting one page from the accessed pages corresponding to each such time period;
Generating the second pushed information based on the extracted pages and pushing it to the user.
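By way of non-limiting illustration, the time-period division of claim 2 might be sketched as a split of the sorted publication times wherever two consecutive times are more than the preset interval apart, which guarantees that times taken from different periods differ by more than the interval. Representing publication times as plain numbers (for example Unix seconds), picking the earliest page of each period, and the names `split_into_periods` and `pick_one_per_period` are assumptions of this sketch; the claim only requires that one page be extracted per period.

```python
def split_into_periods(pages, interval):
    """pages: list of (url, publish_time) tuples with numeric publish_time.

    Returns a list of periods, each a list of pages in chronological order.
    A new period starts whenever the gap to the previous page exceeds
    `interval`.
    """
    periods = []
    for page in sorted(pages, key=lambda p: p[1]):
        if periods and page[1] - periods[-1][-1][1] <= interval:
            periods[-1].append(page)   # close enough: same time period
        else:
            periods.append([page])     # gap larger than interval: new period
    return periods


def pick_one_per_period(periods):
    """Extract one page per period (here, arbitrarily, the earliest one)."""
    return [period[0] for period in periods]
```

The pages returned by `pick_one_per_period` would then form a timeline-like basis for the second pushed information.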
3. The method according to claim 2, characterized in that, before the clustering, according to a preset time interval, of the publication times of the accessed pages corresponding to the associated-page keyword set used to generate the first pushed information so as to divide them into at least one time period, the method further comprises:
For the accessed pages corresponding to the associated-page keyword set, screening the accessed pages whose keyword set similarity is greater than a second predetermined threshold down to a single page, and taking the accessed pages remaining after the screening as the accessed pages corresponding to the associated-page keyword set, wherein the second predetermined threshold is greater than the first predetermined threshold.
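By way of non-limiting illustration, the screening step of claim 3 might be sketched as keeping the first page of every group of near-duplicate pages. The Jaccard coefficient, the default value of the second threshold, and the names `jaccard` and `screen_near_duplicates` are assumptions of this sketch rather than anything the claim specifies.

```python
def jaccard(a, b):
    """Jaccard similarity of two keyword sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def screen_near_duplicates(pages, second_threshold=0.8):
    """pages: list of (url, keyword_set) tuples.

    A page is kept only if its keyword set is not more similar than
    `second_threshold` to that of any page already kept, so each run of
    near-duplicates is reduced to a single page.
    """
    kept = []
    for url, kws in pages:
        if all(jaccard(kws, other) <= second_threshold for _, other in kept):
            kept.append((url, kws))
    return kept
```

Because the second threshold is higher than the first, pages that are merely related survive this step, while reprints of essentially the same content are collapsed.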
4. The method according to claim 1, characterized in that the performing content parsing on the page corresponding to each URL to generate a keyword set for each accessed page comprises:
Performing statistical analysis and/or semantic analysis on the content of the accessed page to extract at least one keyword;
Generating the keyword set based on the at least one keyword.
5. The method according to claim 4, characterized in that the generating the keyword set based on the at least one keyword comprises:
Expanding each single keyword of the at least one keyword to generate expanded keywords, wherein the expanded keywords comprise at least one of the following: synonyms of the single keyword, near-synonyms of the single keyword, and related words of the single keyword;
Generating the keyword set based on the at least one keyword and the expanded keywords.
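By way of non-limiting illustration, the keyword extraction of claim 4 and the keyword expansion of claim 5 might be sketched as follows. Term frequency stands in for the "statistical analysis", the text is assumed to consist of Latin-alphabet words (Chinese page content would first need word segmentation), and the lookup table `related` is a hypothetical stand-in for a synonym or related-word resource; the names `extract_keywords` and `expand_keywords` are likewise assumptions.

```python
import re
from collections import Counter


def extract_keywords(text, top_n=3, stopwords=frozenset()):
    """Return the `top_n` most frequent words of `text`, ignoring stopwords."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]
    return [w for w, _ in Counter(words).most_common(top_n)]


def expand_keywords(keywords, related):
    """Build a keyword set from the keywords plus their expanded keywords.

    `related` maps a keyword to an iterable of synonyms, near-synonyms or
    related words (a hypothetical resource supplied by the caller).
    """
    expanded = set(keywords)
    for kw in keywords:
        expanded.update(related.get(kw, ()))
    return expanded
```

Expanding the set in this way lets two pages that use different wording for the same topic still share keywords when their sets are compared.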
6. The method according to any one of claims 1 to 5, characterized in that keyword sets satisfying one of the following conditions are taken as keyword sets whose similarity is greater than the first predetermined threshold:
The number of identical keywords is greater than a number threshold;
The ratio of the number of identical keywords to the total number of keywords in the compared keyword sets is greater than a ratio threshold.
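By way of non-limiting illustration, the two alternative conditions of claim 6 might be checked as follows. The claim does not say how the "total number of keywords in the compared keyword sets" is counted; this sketch assumes it is the number of distinct keywords across both sets. The threshold defaults and the name `is_similar` are also assumptions.

```python
def is_similar(a, b, count_threshold=3, ratio_threshold=0.5):
    """Return True if keyword sets `a` and `b` meet either condition.

    Condition 1: the number of shared keywords exceeds `count_threshold`.
    Condition 2: shared keywords divided by all distinct keywords in the
    two sets exceeds `ratio_threshold`.
    """
    shared = len(a & b)
    total = len(a | b)   # assumed reading of "total number of keywords"
    if shared > count_threshold:
        return True
    return total > 0 and shared / total > ratio_threshold
```

The count condition favours long pages with many overlapping keywords, while the ratio condition lets short pages with few keywords still be merged.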
7. The method according to any one of claims 1 to 5, characterized in that each keyword in the keyword set further has an importance coefficient, and
The comparing the keyword sets with one another and merging the keyword sets whose similarity is greater than the first predetermined threshold to generate at least one associated-page keyword set comprises:
Performing similarity calculation on different keyword sets based on the importance coefficients;
Merging the keyword sets whose similarity is greater than a similarity threshold to generate the associated-page keyword set.
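By way of non-limiting illustration, a similarity calculation based on importance coefficients might be sketched with cosine similarity over keyword-to-coefficient mappings. The claim does not name a formula; cosine similarity, the dictionary representation and the name `weighted_similarity` are assumptions of this sketch.

```python
import math


def weighted_similarity(a, b):
    """Cosine similarity of two keyword sets with importance coefficients.

    `a` and `b` map each keyword to its importance coefficient. Keywords
    missing from a set are treated as having coefficient 0.
    """
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With such a measure, two sets that share only low-importance keywords score lower than two sets that share their most important keywords, and the result can be compared against the similarity threshold before merging.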
8. An information push apparatus, characterized in that the apparatus comprises:
An information obtaining module, configured to obtain page access information of at least one website, wherein the page access information comprises the URLs and the page view counts of the accessed pages;
A keyword set generation module, configured to perform content parsing on the page corresponding to each URL to generate a keyword set for each accessed page;
A keyword set merging module, configured to compare the keyword sets with one another and merge the keyword sets whose similarity is greater than a first predetermined threshold to generate at least one associated-page keyword set, wherein the accessed pages corresponding to the keyword sets used to generate an associated-page keyword set are associated pages of one another;
A first pushed information generation module, configured to generate first pushed information using one or more of the at least one associated-page keyword set, based on a ranking of the sums of the page view counts of the accessed pages corresponding to each of the at least one associated-page keyword set;
A second pushed information generation and pushing module, configured to generate, based on at least one accessed page corresponding to the associated-page keyword set used to generate the first pushed information, second pushed information associated with the first pushed information, and to push it to a user.
9. The apparatus according to claim 8, characterized in that the second pushed information generation and pushing module comprises:
A clustering unit, configured to cluster, according to a preset time interval, the publication times of the accessed pages corresponding to the associated-page keyword set used to generate the first pushed information, so as to divide them into at least one time period, wherein, when the at least one time period comprises two or more time periods, the time difference between publication times taken respectively from any two time periods is greater than the time interval;
An extraction unit, configured to extract, for one or more of the at least one time period, one page from the accessed pages corresponding to each such time period;
A generation unit, configured to generate the second pushed information based on the extracted pages and push it to the user.
10. The apparatus according to claim 9, characterized in that the second pushed information generation and pushing module further comprises:
A screening unit, configured to, for the accessed pages corresponding to the associated-page keyword set, screen the accessed pages whose keyword set similarity is greater than a second predetermined threshold down to a single page, and take the accessed pages remaining after the screening as the accessed pages corresponding to the associated-page keyword set, wherein the second predetermined threshold is greater than the first predetermined threshold.
11. The apparatus according to claim 8, characterized in that the keyword set generation module comprises:
A keyword extraction unit, configured to perform statistical analysis and/or semantic analysis on the content of the accessed page to extract at least one keyword;
A keyword set generation unit, configured to generate the keyword set based on the at least one keyword.
12. The apparatus according to claim 11, characterized in that the keyword set generation unit comprises:
An expansion subunit, configured to expand each single keyword of the at least one keyword to generate expanded keywords, wherein the expanded keywords comprise at least one of the following: synonyms of the single keyword, near-synonyms of the single keyword, and related words of the single keyword;
A keyword set generation subunit, configured to generate the keyword set based on the at least one keyword and the expanded keywords.
13. The apparatus according to any one of claims 8 to 12, characterized in that the keyword set merging module is further configured to:
Take keyword sets satisfying one of the following conditions as keyword sets whose similarity is greater than the first predetermined threshold:
The number of identical keywords is greater than a number threshold;
The ratio of the number of identical keywords to the total number of keywords in the compared keyword sets is greater than a ratio threshold.
14. The apparatus according to any one of claims 8 to 12, characterized in that each keyword in the keyword set further has an importance coefficient, and
The keyword set merging module comprises:
A calculation unit, configured to perform similarity calculation on different keyword sets based on the importance coefficients;
A merging and generation unit, configured to merge the keyword sets whose similarity is greater than a similarity threshold to generate the associated-page keyword set.
CN201510483126.3A 2015-08-03 2015-08-03 Information push method and apparatus Active CN105069102B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510483126.3A CN105069102B (en) 2015-08-03 2015-08-03 Information push method and apparatus
PCT/CN2015/095754 WO2017020451A1 (en) 2015-08-03 2015-11-27 Information push method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510483126.3A CN105069102B (en) 2015-08-03 2015-08-03 Information push method and apparatus

Publications (2)

Publication Number Publication Date
CN105069102A (en) 2015-11-18
CN105069102B (en) 2017-05-24

Family

ID=54498472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510483126.3A Active CN105069102B (en) 2015-08-03 2015-08-03 Information push method and apparatus

Country Status (2)

Country Link
CN (1) CN105069102B (en)
WO (1) WO2017020451A1 (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105491056A (en) * 2015-12-25 2016-04-13 深圳市金立通信设备有限公司 Information pushing method and terminal
CN106294815A (en) * 2016-08-16 2017-01-04 晶赞广告(上海)有限公司 The clustering method of a kind of URL and device
CN106372204A (en) * 2016-08-31 2017-02-01 北京小米移动软件有限公司 Push message processing method and device
WO2017020451A1 (en) * 2015-08-03 2017-02-09 百度在线网络技术(北京)有限公司 Information push method and device
CN106777283A (en) * 2016-12-29 2017-05-31 北京奇虎科技有限公司 The method for digging and device of a kind of synonym
CN106777403A (en) * 2017-03-28 2017-05-31 百度在线网络技术(北京)有限公司 Information-pushing method and device
CN106933912A (en) * 2015-12-31 2017-07-07 北京国双科技有限公司 The acquisition methods and device of keyword
WO2017143703A1 (en) * 2016-02-24 2017-08-31 百度在线网络技术(北京)有限公司 Offline resource mining method and device
CN107172151A (en) * 2017-05-18 2017-09-15 百度在线网络技术(北京)有限公司 Method and apparatus for pushed information
CN107196999A (en) * 2017-05-03 2017-09-22 网易传媒科技(北京)有限公司 Method and apparatus for issuing information flow propelling data
CN107451161A (en) * 2016-06-01 2017-12-08 阿里巴巴集团控股有限公司 Show method for pushing, device and the platform of object
CN107463552A (en) * 2017-07-20 2017-12-12 北京奇艺世纪科技有限公司 A kind of method and apparatus for generating video subject title
CN108241699A (en) * 2016-12-26 2018-07-03 百度在线网络技术(北京)有限公司 For the method and apparatus of pushed information
CN108304377A (en) * 2017-12-28 2018-07-20 东软集团股份有限公司 A kind of extracting method and relevant apparatus of long-tail word
CN108363707A (en) * 2017-01-26 2018-08-03 百度在线网络技术(北京)有限公司 Method and apparatus for generating webpage
CN108416019A (en) * 2018-03-06 2018-08-17 王海泉 Conjunctive word method of adjustment and adjustment system
CN108846028A (en) * 2018-05-24 2018-11-20 网易传媒科技(北京)有限公司 Article put-on method, medium, device and calculating equipment
CN109189908A (en) * 2018-08-22 2019-01-11 重庆市智权之路科技有限公司 Mass data extracts push working method
CN109345307A (en) * 2018-09-28 2019-02-15 西安Tcl软件开发有限公司 Advertisement sending method, system, terminal and computer readable storage medium
CN109582863A (en) * 2018-11-19 2019-04-05 珠海格力电器股份有限公司 A kind of recommended method and server
CN110309395A (en) * 2019-07-05 2019-10-08 云南电网有限责任公司电力科学研究院 A kind of professional dictionary construction method based on data acquisition technology
CN110888986A (en) * 2019-12-06 2020-03-17 北京明略软件系统有限公司 Information pushing method and device, electronic equipment and computer readable storage medium
CN111008340A (en) * 2019-12-19 2020-04-14 中国联合网络通信集团有限公司 Course recommendation method, device and storage medium
CN111523027A (en) * 2020-04-16 2020-08-11 武汉有牛科技有限公司 Automatic data news writing robot based on block chain technology
CN113420550A (en) * 2021-06-30 2021-09-21 中国农业银行股份有限公司 Method and device for extracting keywords
CN116340639A (en) * 2023-03-31 2023-06-27 北京百度网讯科技有限公司 News recall method, device, equipment and storage medium

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163701B (en) * 2018-02-11 2023-11-03 北京京东尚科信息技术有限公司 Method and device for pushing information
CN108921918B (en) * 2018-07-24 2023-05-30 Oppo广东移动通信有限公司 Video creation method and related device
CN109785919B (en) * 2018-11-30 2023-06-23 平安科技(深圳)有限公司 Noun matching method, noun matching device, noun matching equipment and computer readable storage medium
CN112733006B (en) * 2019-10-14 2022-12-02 中国移动通信集团上海有限公司 User portrait generation method, device and equipment and storage medium
CN111460289B (en) * 2020-03-27 2024-03-29 北京百度网讯科技有限公司 News information pushing method and device
CN114357278B (en) * 2020-09-28 2024-03-19 腾讯科技(深圳)有限公司 Topic recommendation method, device and equipment
CN113781113B (en) * 2021-09-09 2022-06-21 杭州爆米花鹰眼科技有限责任公司 Chained information pushing system and method
CN114817730B (en) * 2022-05-06 2023-06-20 成都坐联智城科技有限公司 Information activity information recommendation system and method under big data situation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070260597A1 (en) * 2006-05-02 2007-11-08 Mark Cramer Dynamic search engine results employing user behavior
CN104102723A (en) * 2014-07-21 2014-10-15 百度在线网络技术(北京)有限公司 Search content providing method and search engine

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101984423B (en) * 2010-10-21 2012-07-04 百度在线网络技术(北京)有限公司 Hot-search word generation method and system
CN103164521B (en) * 2013-03-11 2016-03-23 亿赞普(北京)科技有限公司 A kind ofly to browse and the keyword calculation method of search behavior and device based on user
CN105069102B (en) * 2015-08-03 2017-05-24 百度在线网络技术(北京)有限公司 Information push method and apparatus

Cited By (36)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017020451A1 (en) * 2015-08-03 2017-02-09 百度在线网络技术(北京)有限公司 Information push method and device
CN105491056A (en) * 2015-12-25 2016-04-13 深圳市金立通信设备有限公司 Information pushing method and terminal
CN106933912B (en) * 2015-12-31 2020-07-03 北京国双科技有限公司 Keyword acquisition method and device
CN106933912A (en) * 2015-12-31 2017-07-07 北京国双科技有限公司 The acquisition methods and device of keyword
WO2017143703A1 (en) * 2016-02-24 2017-08-31 百度在线网络技术(北京)有限公司 Offline resource mining method and device
US11416502B2 (en) 2016-02-24 2022-08-16 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for mining offline resources
CN107451161A (en) * 2016-06-01 2017-12-08 阿里巴巴集团控股有限公司 Show method for pushing, device and the platform of object
CN106294815A (en) * 2016-08-16 2017-01-04 晶赞广告(上海)有限公司 The clustering method of a kind of URL and device
CN106372204A (en) * 2016-08-31 2017-02-01 北京小米移动软件有限公司 Push message processing method and device
CN108241699A (en) * 2016-12-26 2018-07-03 百度在线网络技术(北京)有限公司 For the method and apparatus of pushed information
CN106777283A (en) * 2016-12-29 2017-05-31 北京奇虎科技有限公司 The method for digging and device of a kind of synonym
CN108363707A (en) * 2017-01-26 2018-08-03 百度在线网络技术(北京)有限公司 Method and apparatus for generating webpage
CN106777403A (en) * 2017-03-28 2017-05-31 百度在线网络技术(北京)有限公司 Information-pushing method and device
CN106777403B (en) * 2017-03-28 2020-07-28 百度在线网络技术(北京)有限公司 Information pushing method and device
CN107196999A (en) * 2017-05-03 2017-09-22 网易传媒科技(北京)有限公司 Method and apparatus for issuing information flow propelling data
CN107196999B (en) * 2017-05-03 2020-01-24 网易传媒科技(北京)有限公司 Method and equipment for transmitting information flow push data
CN107172151B (en) * 2017-05-18 2020-08-07 百度在线网络技术(北京)有限公司 Method and device for pushing information
CN107172151A (en) * 2017-05-18 2017-09-15 百度在线网络技术(北京)有限公司 Method and apparatus for pushed information
CN107463552A (en) * 2017-07-20 2017-12-12 北京奇艺世纪科技有限公司 A kind of method and apparatus for generating video subject title
CN108304377B (en) * 2017-12-28 2021-08-06 东软集团股份有限公司 Extraction method of long-tail words and related device
CN108304377A (en) * 2017-12-28 2018-07-20 东软集团股份有限公司 A kind of extracting method and relevant apparatus of long-tail word
CN108416019A (en) * 2018-03-06 2018-08-17 王海泉 Conjunctive word method of adjustment and adjustment system
CN108846028A (en) * 2018-05-24 2018-11-20 网易传媒科技(北京)有限公司 Article put-on method, medium, device and calculating equipment
CN109189908A (en) * 2018-08-22 2019-01-11 重庆市智权之路科技有限公司 Mass data extracts push working method
CN109345307A (en) * 2018-09-28 2019-02-15 西安Tcl软件开发有限公司 Advertisement sending method, system, terminal and computer readable storage medium
CN109582863B (en) * 2018-11-19 2020-08-04 珠海格力电器股份有限公司 Recommendation method and server
CN109582863A (en) * 2018-11-19 2019-04-05 珠海格力电器股份有限公司 A kind of recommended method and server
CN110309395A (en) * 2019-07-05 2019-10-08 云南电网有限责任公司电力科学研究院 A kind of professional dictionary construction method based on data acquisition technology
CN110888986A (en) * 2019-12-06 2020-03-17 北京明略软件系统有限公司 Information pushing method and device, electronic equipment and computer readable storage medium
CN111008340A (en) * 2019-12-19 2020-04-14 中国联合网络通信集团有限公司 Course recommendation method, device and storage medium
CN111523027A (en) * 2020-04-16 2020-08-11 武汉有牛科技有限公司 Automatic data news writing robot based on block chain technology
CN111523027B (en) * 2020-04-16 2023-08-01 武汉有牛科技有限公司 Automatic data news writing robot based on blockchain technology
CN113420550A (en) * 2021-06-30 2021-09-21 中国农业银行股份有限公司 Method and device for extracting keywords
CN113420550B (en) * 2021-06-30 2024-03-01 中国农业银行股份有限公司 Keyword extraction method and device
CN116340639A (en) * 2023-03-31 2023-06-27 北京百度网讯科技有限公司 News recall method, device, equipment and storage medium
CN116340639B (en) * 2023-03-31 2023-12-12 北京百度网讯科技有限公司 News recall method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2017020451A1 (en) 2017-02-09
CN105069102B (en) 2017-05-24

Similar Documents

Publication Publication Date Title
CN105069102A (en) Information push method and apparatus
CN106649818B (en) Application search intention identification method and device, application search method and server
US8990241B2 (en) System and method for recommending queries related to trending topics based on a received query
CN109885773B (en) Personalized article recommendation method, system, medium and equipment
JP5160601B2 (en) System, method and apparatus for phrase mining based on relative frequency
JP5886733B2 (en) Video group reconstruction / summarization apparatus, video group reconstruction / summarization method, and video group reconstruction / summarization program
CN103023714B (en) The liveness of topic Network Based and cluster topology analytical system and method
US20090319449A1 (en) Providing context for web articles
CN103838756A (en) Method and device for determining pushed information
CN102646132B (en) Method and device for recognizing attributes of broadband users
JP6394388B2 (en) Synonym relation determination device, synonym relation determination method, and program thereof
CN102119385A (en) Method and subsystem for searching media content within a content-search-service system
CN105426514A (en) Personalized mobile APP recommendation method
CN104077417A (en) Figure tag recommendation method and system in social network
CN108090178B (en) Text data analysis method, text data analysis device, server and storage medium
CN103678412A (en) Document retrieval method and device
CN104217038A (en) Knowledge network building method for financial news
WO2014000130A1 (en) Method or system for automated extraction of hyper-local events from one or more web pages
CN113742592A (en) Public opinion information pushing method, device, equipment and storage medium
JP6047365B2 (en) SEARCH DEVICE, SEARCH PROGRAM, AND SEARCH METHOD
Marujo et al. Hourly traffic prediction of news stories
CN110909247B (en) Text information pushing method, electronic equipment and computer storage medium
Kavila et al. An automatic legal document summarization and search using hybrid system
CN111859079B (en) Information searching method, device, computer equipment and storage medium
Shah et al. An automatic text summarization on Naive Bayes classifier using latent semantic analysis

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant