CN103491205A - Related resource address push method and device based on video retrieval - Google Patents

Publication number
CN103491205A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310462461.6A
Other languages
Chinese (zh)
Other versions
CN103491205B (en)
Inventor
崔代超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Application filed by Beijing Qihoo Technology Co Ltd and Qizhi Software Beijing Co Ltd
Priority to CN201310462461.6A
Publication of CN103491205A
Priority to PCT/CN2014/086519 (WO2015043389A1)
Application granted
Publication of CN103491205B
Legal status: Active

Abstract

The invention discloses a method and device for pushing related resource addresses based on video retrieval. The method comprises: obtaining feature text information of first video resource data when a loading or playing request for the first video resource data is received; mapping the feature text information to one or more first participles; searching for related second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold, where the co-occurrence rate is the probability that the one or more first participles and a second participle appear together in the same video resource data; obtaining the network link addresses of second video resource data that match the one or more first participles and the related second participles; and pushing the network link addresses of the second video resource data. The method and device mine high-quality resources in a video database in depth and improve the efficiency of resource mining. In addition, the index table can grow continuously as video content accumulates on the Internet, which helps to increase the recall rate.

Description

Method and device for pushing related resource addresses based on video search
Technical field
The present invention relates to the technical field of the Internet, and in particular to a method for pushing related resource addresses based on video search and a device for pushing related resource addresses based on video search.
Background
A video search engine is a vertical search technology, as distinct from general-purpose search. A video search engine crawls video results on the Internet and builds an index over them. Because it can offer searchers purely video results, it can greatly reduce the time users spend finding videos.
According to statistics on video search, videos in categories such as entertainment, games, film and television, news, and animation are the main objects users search for. This shows that users' demand in video search is general in nature. Users often have no strong specific goal: the desired search result is not one particular item but has a certain extensibility, as long as the result falls within the categories the user likes. Search engines therefore tend to offer the user related recommendations in addition to the search results.
However, existing video search engines still fall short in related recommendation. Some video search engines offer no related recommendation at all. Those that do implement it in simple ways, for example from users' search history data or from a manually compiled association system. Such a recommendation system is based on users' existing search habits and has a low recall rate, because the range of what users search for is generally much smaller than the range of resources available on the Internet. In addition, it cannot fully mine the high-quality videos on the Internet.
Another search recommendation method relies on manually compiling an association system for resources, or on obtaining such a system from other knowledge systems, and applying it in the recommendation system. For example, searching for "square dance" in a certain search engine yields recommended terms such as "social dancing", "belly dance", and "fitness exercise", and searching for "dota" yields recommended terms such as "CrossFire" and "World of Warcraft". However, the recall rate of such a system is low, and it generally cannot provide recommendations for long-tail searches.
Summary of the invention
In view of the above problems, the present invention is proposed to provide a method for pushing related resource addresses based on video search, and a corresponding device for pushing related resource addresses based on video search, that overcome the above problems or at least partly solve them.
According to one aspect of the present invention, a method for pushing related resource addresses based on video search is provided, comprising:
when a loading or playing request for first video resource data is received, obtaining the feature text information of the first video resource data;
mapping the feature text information to one or more first participles;
searching for related second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold, where the co-occurrence rate is the probability that the current one or more first participles and a second participle appear together in the same video resource data;
obtaining the network link address of second video resource data that matches the one or more first participles and the related second participles;
pushing the network link address of the second video resource data.
Optionally, the step of obtaining the feature text information of the first video resource data when a loading or playing request for the first video resource data is received comprises:
when a playing request for the first video data is received, receiving the feature text information of the first video resource data sent by the current terminal;
or,
when a loading request for the first video data is received, extracting the locally preset feature text information of the video resource data.
Optionally, the step of mapping the feature text information to one or more first participles comprises:
extracting the participle to which the feature text information maps;
or,
when the received feature text information is a compound word, splitting the feature text information into a plurality of search sub-words, and extracting the plurality of participles to which the search sub-words map.
Optionally, the step of searching for related second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold comprises:
when the feature text information maps to a single first participle, extracting the preset index table corresponding to the first participle, where the index table comprises information on the video resource data under the first participle and all participles in that video resource data, and all participles in the video resource data are generated by crawling the video resource data, extracting its feature text information, and segmenting the feature text information;
calculating the co-occurrence rate between the first participle and each second participle in the index table, the co-occurrence rate being the ratio of the number of times each second participle appears in the index table to the total number of video resource data entries in the index table, where a second participle is any participle in the video resource data other than the first participle;
extracting the second participles whose co-occurrence rate is higher than the preset threshold as the related second participles.
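The single-participle case can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the in-memory dictionary standing in for the index table, and the sample videos are all assumptions, and "number of times a second participle appears" is interpreted here as the number of indexed videos that contain it.

```python
from collections import Counter


def build_index_table(videos, first):
    """Hypothetical index table: the videos whose participles include `first`."""
    return {vid: set(words) for vid, words in videos.items() if first in words}


def cooccurrence_rates(index_table, first):
    """Rate = videos containing the second participle / total videos in the table."""
    total = len(index_table)
    if total == 0:
        return {}
    counts = Counter(
        word for words in index_table.values() for word in words if word != first
    )
    return {word: count / total for word, count in counts.items()}


def related_second_participles(index_table, first, threshold):
    """Keep only the second participles whose rate exceeds the threshold."""
    rates = cooccurrence_rates(index_table, first)
    return {word: rate for word, rate in rates.items() if rate > threshold}


# Illustrative data only: participles already extracted from each video's feature text.
videos = {
    "v1": ["mid-autumn", "moon cake", "recipe"],
    "v2": ["mid-autumn", "moon cake"],
    "v3": ["mid-autumn", "lantern"],
    "v4": ["national day", "travel"],
}
table = build_index_table(videos, "mid-autumn")
# "moon cake" appears in 2 of the 3 indexed videos, so it passes a 0.5 threshold.
print(related_second_participles(table, "mid-autumn", 0.5))
```

Because each video contributes a set of participles, a word repeated inside one video's feature text is counted once, which keeps the rate between 0 and 1.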
Optionally, the step of searching for related second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold comprises:
when the feature text information maps to a plurality of first participles, extracting the plurality of preset index tables corresponding to the respective first participles, where each index table comprises information on the video resource data under the corresponding first participle and all participles in that video resource data, and all participles in the video resource data are generated by crawling the video resource data, extracting its feature text information, and segmenting the feature text information;
extracting, as candidate participles, the second participles that appear together with all of the plurality of first participles, where a second participle is any participle in the video resource data other than the first participles;
calculating, in each index table, the co-occurrence rate between the first participle and the candidate participle, the co-occurrence rate being the ratio of the number of times the candidate participle appears in the index table to the total number of video resource data entries in the index table;
configuring a plurality of weights, one for the co-occurrence rate between each of the first participles and the candidate participle;
calculating the mean of the weighted co-occurrence rates as the co-occurrence rate between the plurality of first participles and the candidate participle;
extracting the candidate participles whose co-occurrence rate is higher than the preset threshold as the related second participles.
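A minimal sketch of the weighted variant follows. The text does not say how the weights are chosen or how the mean is normalised, so this sketch assumes caller-supplied weights and a weighted mean divided by the sum of the weights. All names and the sample tables are hypothetical.

```python
def rate_in_table(table, word):
    """Share of the videos in one index table whose participles contain `word`."""
    if not table:
        return 0.0
    return sum(1 for words in table.values() if word in words) / len(table)


def weighted_related(tables, weights, threshold):
    """tables: {first participle: {video id: set of participles}}."""
    firsts = set(tables)
    # Candidates are second participles present in every first participle's table.
    per_table_words = [set().union(*t.values()) - firsts for t in tables.values()]
    candidates = set.intersection(*per_table_words) if per_table_words else set()
    total_weight = sum(weights[first] for first in tables)
    if total_weight == 0:
        return {}
    related = {}
    for candidate in candidates:
        weighted_sum = sum(
            weights[first] * rate_in_table(table, candidate)
            for first, table in tables.items()
        )
        average = weighted_sum / total_weight  # assumed normalisation
        if average > threshold:
            related[candidate] = average
    return related


# Illustrative index tables for the two first participles "mid-autumn" and "moon cake".
tables = {
    "mid-autumn": {
        "v1": {"mid-autumn", "moon cake", "recipe"},
        "v2": {"mid-autumn", "moon cake"},
    },
    "moon cake": {
        "v1": {"mid-autumn", "moon cake", "recipe"},
        "v2": {"mid-autumn", "moon cake"},
        "v5": {"moon cake", "recipe"},
    },
}
weights = {"mid-autumn": 2.0, "moon cake": 1.0}
print(weighted_related(tables, weights, 0.5))
```

Here "recipe" has a rate of 1/2 in the first table and 2/3 in the second, so its weighted mean is (2 × 1/2 + 1 × 2/3) / 3, which is just above the 0.5 threshold.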
Optionally, the step of searching for related second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold comprises:
when the feature text information maps to a plurality of first participles, extracting the plurality of preset index tables corresponding to the respective first participles, where each index table comprises information on the video resource data under the corresponding first participle and all participles in that video resource data, and all participles in the video resource data are generated by crawling the video resource data, extracting its feature text information, and segmenting the feature text information;
determining a main participle from the plurality of index tables, the main participle being the first participle whose index table contains the largest total number of video resource data entries;
calculating the co-occurrence rate between the main participle and each second participle in its index table, the co-occurrence rate being the ratio of the number of times each second participle appears in the index table to the total number of video resource data entries in the index table, where a second participle is any participle in the video resource data other than the first participles;
extracting the second participles whose co-occurrence rate is higher than the preset threshold as the related second participles.
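The main-participle variant can be sketched in the same style. The names and data are hypothetical, and the sketch assumes that none of the first participles may itself be returned as a second participle, which the text leaves open.

```python
def pick_main_participle(tables):
    """Return the first participle whose index table lists the most videos."""
    return max(tables, key=lambda first: len(tables[first]))


def related_by_main_participle(tables, threshold):
    """Compute co-occurrence rates in the main participle's index table only."""
    main = pick_main_participle(tables)
    table = tables[main]
    firsts = set(tables)  # assumption: first participles are never second participles
    counts = {}
    for words in table.values():
        for word in set(words) - firsts:
            counts[word] = counts.get(word, 0) + 1
    total = len(table)
    related = {
        word: count / total
        for word, count in counts.items()
        if count / total > threshold
    }
    return main, related


# Illustrative index tables; "moon cake" indexes three videos, "mid-autumn" two.
tables = {
    "mid-autumn": {
        "v1": {"mid-autumn", "moon cake", "recipe"},
        "v2": {"mid-autumn", "moon cake"},
    },
    "moon cake": {
        "v1": {"mid-autumn", "moon cake", "recipe"},
        "v2": {"mid-autumn", "moon cake"},
        "v5": {"moon cake", "recipe"},
    },
}
main, related = related_by_main_participle(tables, 0.5)
print(main, related)
```

Compared with the weighted mean, this variant reads only one index table, which is cheaper but ignores the evidence in the other first participles' tables.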
Optionally, the feature text information comprises a video title, video keywords, and/or a video description.
Optionally, the step of obtaining the network link address of second video resource data that matches the one or more first participles and the related second participles comprises:
obtaining the network link address of second video resource data that matches the main participle and the related second participles.
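As a rough sketch of this lookup, the following assumes a hypothetical in-memory catalog and treats a video as a match when its participles contain the main participle and at least one related second participle. The text does not fix the matching rule, and skipping the video currently being played is also an assumption.

```python
def matching_link_addresses(catalog, main, related, exclude_id=None):
    """Return link addresses of videos matching the main and related participles."""
    addresses = []
    for video_id, entry in catalog.items():
        if video_id == exclude_id:
            continue  # assumed: do not recommend the video being played
        words = entry["participles"]
        if main in words and any(word in words for word in related):
            addresses.append(entry["url"])
    return addresses


# Illustrative catalog; the URLs are placeholders.
catalog = {
    "v1": {"url": "http://example.com/v1", "participles": {"mid-autumn", "moon cake"}},
    "v2": {"url": "http://example.com/v2", "participles": {"mid-autumn", "moon cake", "recipe"}},
    "v3": {"url": "http://example.com/v3", "participles": {"mid-autumn", "lantern"}},
}
print(matching_link_addresses(catalog, "mid-autumn", {"moon cake"}, exclude_id="v1"))
```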
According to another aspect of the present invention, a device for pushing related resource addresses based on video search is provided, comprising:
a feature text information acquisition module, adapted to obtain the feature text information of first video resource data when a loading or playing request for the first video resource data is received;
a first participle mapping module, adapted to map the feature text information to one or more first participles;
a second participle search module, adapted to search for related second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold, where the co-occurrence rate is the probability that the current one or more first participles and a second participle appear together in the same video resource data;
a network link address acquisition module, adapted to obtain the network link address of second video resource data that matches the one or more first participles and the related second participles;
a network link address push module, adapted to push the network link address of the second video resource data.
Optionally, the feature text information acquisition module is further adapted to:
when a playing request for the first video data is received, receive the feature text information of the first video resource data sent by the current terminal;
or,
when a loading request for the first video data is received, extract the locally preset feature text information of the video resource data.
Optionally, the first participle mapping module is further adapted to:
extract the participle to which the feature text information maps;
or,
when the received feature text information is a compound word, split the feature text information into a plurality of search sub-words, and extract the plurality of participles to which the search sub-words map.
Optionally, the second participle search module is further adapted to:
when the feature text information maps to a single first participle, extract the preset index table corresponding to the first participle, where the index table comprises information on the video resource data under the first participle and all participles in that video resource data, and all participles in the video resource data are generated by crawling the video resource data, extracting its feature text information, and segmenting the feature text information;
calculate the co-occurrence rate between the first participle and each second participle in the index table, the co-occurrence rate being the ratio of the number of times each second participle appears in the index table to the total number of video resource data entries in the index table, where a second participle is any participle in the video resource data other than the first participle;
extract the second participles whose co-occurrence rate is higher than the preset threshold as the related second participles.
Optionally, the second participle search module is further adapted to:
when the feature text information maps to a plurality of first participles, extract the plurality of preset index tables corresponding to the respective first participles, where each index table comprises information on the video resource data under the corresponding first participle and all participles in that video resource data, and all participles in the video resource data are generated by crawling the video resource data, extracting its feature text information, and segmenting the feature text information;
extract, as candidate participles, the second participles that appear together with all of the plurality of first participles, where a second participle is any participle in the video resource data other than the first participles;
calculate, in each index table, the co-occurrence rate between the first participle and the candidate participle, the co-occurrence rate being the ratio of the number of times the candidate participle appears in the index table to the total number of video resource data entries in the index table;
configure a plurality of weights, one for the co-occurrence rate between each of the first participles and the candidate participle;
calculate the mean of the weighted co-occurrence rates as the co-occurrence rate between the plurality of first participles and the candidate participle;
extract the candidate participles whose co-occurrence rate is higher than the preset threshold as the related second participles.
Optionally, the second participle search module is further adapted to:
when the feature text information maps to a plurality of first participles, extract the plurality of preset index tables corresponding to the respective first participles, where each index table comprises information on the video resource data under the corresponding first participle and all participles in that video resource data, and all participles in the video resource data are generated by crawling the video resource data, extracting its feature text information, and segmenting the feature text information;
determine a main participle from the plurality of index tables, the main participle being the first participle whose index table contains the largest total number of video resource data entries;
calculate the co-occurrence rate between the main participle and each second participle in its index table, the co-occurrence rate being the ratio of the number of times each second participle appears in the index table to the total number of video resource data entries in the index table, where a second participle is any participle in the video resource data other than the first participles;
extract the second participles whose co-occurrence rate is higher than the preset threshold as the related second participles.
Optionally, the feature text information comprises a video title, video keywords, and/or a video description.
Optionally, the network link address acquisition module is further adapted to:
obtain the network link address of second video resource data that matches the main participle and the related second participles.
The present invention can push based on content that has already been published, freeing the search engine from dependence on users' search habits. Video resource data that few users search for, but for which the video library has collected many related resources, can thus be pushed out, so that high-quality resources in the video library are mined in depth and the efficiency of resource mining is improved. In addition, the index table can grow continuously as Internet video content accumulates, and the quantity and breadth of the content produced by the major video sites far exceed the set of terms that users have searched for, which helps to increase the recall rate.
By obtaining the network link addresses of second video resource data that match the first participles and the second participles, the present invention lets the user obtain video data resources directly from those addresses. A single simple search thus yields more results without repeated search submissions, which lightens the load on the server, reduces the use of network resources, and improves the user experience.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the detailed description of the preferred embodiments below. The drawings serve only to illustrate the preferred embodiments and are not to be taken as limiting the invention. Throughout the drawings, identical reference symbols denote identical parts. In the drawings:
Fig. 1 shows a flow chart of the steps of an embodiment of a method for pushing related resource addresses based on video search according to an embodiment of the present invention; and
Fig. 2 shows a structural block diagram of an embodiment of a device for pushing related resource addresses based on video search according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and its scope conveyed fully to those skilled in the art.
Referring to Fig. 1, a flow chart of the steps of an embodiment of a method for pushing related resource addresses based on video search according to an embodiment of the present invention is shown. The method may specifically comprise the following steps:
Step 101: when a loading or playing request for first video resource data is received, obtain the feature text information of the first video resource data.
It should be noted that the first video resource data may be located on a terminal device or on the network, and the feature text information may be information carried by the video resource data.
In a preferred embodiment of the present invention, step 101 may specifically comprise the following sub-steps:
Sub-step S11: when a playing request for the first video data is received, receive the feature text information of the first video resource data sent by the current terminal.
When the first video resource data is located on a terminal device, the terminal device may extract the feature text information of the first video resource data and then upload it to the corresponding server side.
Or,
Sub-step S12: when a loading request for the first video data is received, extract the locally preset feature text information of the video resource data.
When the first video resource data is located on the network, the server side may extract the feature text information of the first video resource data.
In a preferred embodiment of the present invention, the feature text information may comprise a video title, video keywords, and/or a video description.
For example, for a piece of video resource data named "[Citizen video] Dongguan turns into Venice after heavy rain, over a thousand cars flooded and stalled - watch online - XX net, watch HD video online", the feature text information may be as follows:
Video title (Title): [Citizen video] Dongguan turns into Venice after heavy rain, over a thousand cars flooded and stalled - watch online - XX net, watch HD video online;
Video keywords (Keywords): YY reporter, life news, Dongguan, flooding;
Video description (Description): Yesterday morning's heavy rain made residents in parts of Dongguan feel as if they had suddenly arrived in Venice. Cars on the road were flooded and stalled in the downpour, and some residents' homes were also awash.
In practical applications, the feature text information may be a single word, that is, one semantically independent word, for example Mid-Autumn, Dragon Boat Festival, or National Day. The feature text information may also be a compound word comprising two or more semantically independent words, for example Mid-Autumn moon cake, Dragon Boat Festival zongzi, or National Day Tibet tour. Generally speaking, video resource data on a terminal device often has only a video title (Title), such as the film names "Iron Man" and "Spider-Man"; video resource data on the network often includes one or more of the video title (Title), video keywords (Keywords), and video description (Description).
Step 102: map the feature text information to one or more first participles.
It should be noted that the participles to be mapped to may be set in advance and may be used for calculating the co-occurrence rate between different participles.
One or more mapping rules may likewise be set in advance. The rules may include removing words without practical meaning from the video search text, such as vulgar words, modifiers, modal particles, and overly broad words. They may include setting stop words, that is, common words such as "I" and "you" that serve as the criterion for where to stop when splitting a phrase. They may also include correspondences for association relationships, which map multiple expressions of the same thing to a single expression, for example associating August 15, Mid-Autumn Festival, and Moon Cake Festival with "mid-autumn". Other mapping rules may also be included; the embodiment of the present invention places no limit on this.
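The mapping rules above can be sketched as two small lookup tables. The stop-word set, the canonical-form table, and the function name are illustrative assumptions. A real system would hold far larger preset tables and would apply the rules to segmented Chinese text.

```python
# Hypothetical preset rules: words to drop and expressions to normalise.
STOP_WORDS = {"I", "you", "my"}
CANONICAL = {
    "August 15": "mid-autumn",
    "Mid-Autumn Festival": "mid-autumn",
    "Moon Cake Festival": "mid-autumn",
}


def map_to_participles(words):
    """Drop stop words, map synonyms to one expression, and remove duplicates."""
    participles = []
    for word in words:
        if word in STOP_WORDS:
            continue
        mapped = CANONICAL.get(word, word)  # unknown words map to themselves
        if mapped not in participles:
            participles.append(mapped)
    return participles


print(map_to_participles(["my", "Mid-Autumn Festival", "August 15", "moon cake"]))
# -> ['mid-autumn', 'moon cake']
```

Mapping every variant to one expression before counting means that videos titled with different names for the same festival contribute to the same index table.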
English takes the word as its unit, and words are separated by spaces. Chinese takes the character as its unit, and the characters of a sentence must be read together to convey a meaning. For example, the English sentence "I am a student" is "我是一个学生" in Chinese. A computer can easily tell from the spaces that "student" is one word, but it cannot easily tell that the two characters "学" and "生" together form one word. Cutting a sequence of Chinese characters into meaningful words is Chinese word segmentation, and the words it produces are the participles referred to here. For example, segmenting "我是一个学生" gives: 我 (I), 是 (am), 一个 (a), 学生 (student).
Some commonly used segmentation methods are introduced below:
1. Segmentation based on string matching: the Chinese character string to be analyzed is matched, according to some strategy, against the entries of a preset machine dictionary; if a string is found in the dictionary, the match succeeds (a word is identified). Word segmentation systems in practical use all treat mechanical segmentation as an initial segmentation step and then use various other linguistic information to further improve segmentation accuracy.
2. Segmentation based on feature scanning or marker cutting: words with obvious features are identified and cut out of the string first and used as breakpoints, so that the original string is divided into smaller strings that then undergo mechanical segmentation, which reduces the matching error rate. Alternatively, segmentation is combined with part-of-speech tagging: the rich part-of-speech information supports segmentation decisions, and the tagging process in turn checks and adjusts the segmentation results, which improves segmentation accuracy.
3. Segmentation based on understanding: the computer simulates a human's understanding of the sentence in order to identify words. The basic idea is to perform syntactic and semantic analysis during segmentation and to use syntactic and semantic information to resolve ambiguity. Such a system usually has three parts: a segmentation subsystem, a syntactic-semantic subsystem, and a master control part. Coordinated by the master control part, the segmentation subsystem obtains syntactic and semantic information about the relevant words and sentences to resolve segmentation ambiguity, simulating the way a person understands a sentence. This method requires a large amount of linguistic knowledge and information.
4. Segmentation based on statistics: in Chinese text, the frequency or probability with which characters co-occur adjacently reflects fairly well how likely they are to form a word. The frequency of each adjacent character combination in a corpus can therefore be counted to compute their mutual information, together with the adjacent co-occurrence probability of two Chinese characters X and Y. Mutual information reflects how tightly the characters are bound together. When the tightness exceeds a certain threshold, the character group can be considered a likely word. This method only needs the frequencies of character groups in the corpus and does not need a segmentation dictionary.
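Two of the methods listed above can be sketched briefly. The first function uses forward maximum matching, one common strategy for method 1 (the text does not name a specific strategy). The second computes pointwise mutual information for adjacent character pairs as an illustration of method 4. The dictionary, corpus, and function names are assumptions for illustration only.

```python
import math
from collections import Counter


def forward_max_match(text, dictionary, max_len=4):
    """Greedy dictionary segmentation: take the longest dictionary entry at each position."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


def adjacent_mutual_information(corpus):
    """log2( P(XY) / (P(X) * P(Y)) ) for every adjacent character pair XY in the corpus."""
    chars = Counter()
    pairs = Counter()
    for sentence in corpus:
        chars.update(sentence)
        pairs.update(zip(sentence, sentence[1:]))
    n_chars = sum(chars.values())
    n_pairs = sum(pairs.values())
    scores = {}
    for (x, y), count in pairs.items():
        p_xy = count / n_pairs
        p_x = chars[x] / n_chars
        p_y = chars[y] / n_chars
        scores[x + y] = math.log2(p_xy / (p_x * p_y))
    return scores


print(forward_max_match("我是一个学生", {"我", "是", "一个", "学生"}))
# -> ['我', '是', '一个', '学生']

scores = adjacent_mutual_information(["学生上学", "学生学习", "我是学生"])
# "学生" scores higher than the reversed pair "生学", so it is the likelier word.
print(scores["学生"] > scores["生学"])
```

On a corpus this small, pairs that occur only once can score highly as well, so a practical system would combine mutual information with a minimum frequency before applying the threshold.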
In a preferred embodiment of the present invention, step 102 may specifically comprise the following sub-steps:
Sub-step S21: extract the participle to which the feature text information is mapped;
When the feature text information is a single word, its corresponding participle can be extracted directly according to a preset mapping rule. For example, if the feature text information is "Mid-autumn Festival", "my Mid-autumn Festival", etc., the mapped first participle can be "mid-autumn". Of course, the feature text information and the first participle it maps to can also be the same word; for example, if the feature text information is "mid-autumn", the mapped first participle can also be "mid-autumn".
Or,
Sub-step S22: when the received feature text information is a compound word, split the feature text information into multiple search sub-words;
Sub-step S23: extract the multiple participles to which the multiple search sub-words are mapped.
When the feature text information is a compound word, it can be segmented according to a preset mapping rule to obtain search sub-words, and the participle corresponding to each search sub-word is then extracted. For example, if the received feature text information is "moon cake in the Mid-autumn Festival", it can be split into the two search sub-words "Mid-autumn Festival" and "moon cake"; "Mid-autumn Festival" is then mapped to "mid-autumn" and "moon cake" is mapped to "moon cake", giving the two first participles "mid-autumn" and "moon cake".
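Sub-steps S21 to S23 can be sketched as a lookup against a preset mapping rule. The rule table `MAPPING_RULES`, the function name, and the longest-match splitting strategy below are assumptions made for illustration; the text does not say how the mapping rule is stored or how a compound word is split.

```python
# Hypothetical preset mapping rule: search sub-word (lower case) -> participle.
MAPPING_RULES = {
    "mid-autumn festival": "mid-autumn",
    "mid-autumn": "mid-autumn",
    "moon cake": "moon cake",
}

def map_to_first_participles(feature_text, rules=MAPPING_RULES):
    """Split the feature text into known search sub-words and map each one."""
    text = feature_text.lower()
    found = []  # (position in text, mapped participle)
    # Try longer sub-words first so "mid-autumn festival" wins over "mid-autumn".
    for sub_word in sorted(rules, key=len, reverse=True):
        pos = text.find(sub_word)
        if pos != -1:
            found.append((pos, rules[sub_word]))
            # Mask the matched span so shorter sub-words cannot match inside it.
            text = text.replace(sub_word, " " * len(sub_word))
    ordered = []
    for _, participle in sorted(found):
        if participle not in ordered:
            ordered.append(participle)
    return ordered
```

A single word such as "my Mid-autumn Festival" yields one first participle, while a compound word yields one participle per search sub-word, in the order the sub-words appear in the text.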
Step 103: search for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold;
The co-occurrence rate is the probability that the current one or more first participles and a second participle occur together in the same video resource data;
Specifically, the co-occurrence rate can be the probability that the current one or more first participles and the second participle occur together in the feature text information of the same video resource data. It can be the co-occurrence rate of a single first participle with the second participle, or the co-occurrence rate of multiple first participles with the second participle.
It should be noted that a second participle can be any participle, among all preset participles, other than the first participle. An associated second participle is a second participle whose co-occurrence rate with the first participle is higher than the preset threshold.
In practical applications, video resource data can include feature text information, which records information about the video resource data and can also be used for extracting participles.
In a preferred embodiment of the present invention, step 103 may specifically comprise the following sub-steps:
Sub-step S31: when the feature text information is mapped to one first participle, extract the preset index table corresponding to that first participle; wherein the index table comprises the information of the video resource data to which the first participle belongs, and all participles in that video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting that feature text information;
In a specific implementation, a search engine can crawl the video resource data on each website platform in advance and then build an index library: the feature text information of the video resource data is extracted and segmented, and an index table is built for each participle. The index table can store the information of the video resource data (which can be a video identifier such as an ID, an intranet address or an extranet address, or can be a record composed of the current participle and the other participles) and all participles in the video resource data (comprising the first participle and the second participles other than the first participle).
In a preferred embodiment of the present invention, the feature text information can comprise a video title, video keywords and/or a video description.
For example, the index table for "mid-autumn" can be as follows:
[Figure BDA0000391658320000121: index table for "mid-autumn"; image not reproduced]
Here the first participle is "mid-autumn", and the information of the video resource data includes a video identifier. Of course, the information of the video resource data may also omit the video identifier and contain only records composed of the first participle and the second participles (that is, the second participles in each row form one record).
Of course, the above index table is only an example. When implementing the embodiment of the present invention, those skilled in the art can set other index tables according to actual conditions or actual needs, and the embodiment of the present invention does not limit this.
It should be noted that the video resource data on each platform can be crawled periodically or at irregular intervals, after which the index library is rebuilt and each index table is updated.
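The index library described above can be sketched as one table per participle, where each record keeps a video identifier together with all participles of that video. The function name `build_index_tables`, the in-memory dictionary layout, and the `segment` callable (standing in for whichever segmenter is used) are assumptions; crawling and storage are outside this sketch.

```python
from collections import defaultdict

def build_index_tables(videos, segment):
    """Build one index table per participle.

    videos:  {video_id: feature_text}, e.g. title, keywords and/or description
    segment: hypothetical callable that turns feature text into participles
    returns: {participle: {video_id: frozenset of all participles in that video}}
    """
    tables = defaultdict(dict)
    for video_id, feature_text in videos.items():
        participles = frozenset(segment(feature_text))
        for participle in participles:
            # Each record holds the video identifier and every participle of the video.
            tables[participle][video_id] = participles
    return dict(tables)
```

Rebuilding after each periodic or irregular crawl amounts to calling the function again on the refreshed `videos` mapping.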
Sub-step S32: calculate the co-occurrence rate of the first participle with each second participle in the index table, the co-occurrence rate being the ratio of the number of times each second participle occurs in the index table to the total number of pieces of video resource data information in the index table; wherein a second participle is a participle, among all participles in the video resource data, other than the first participle;
Because the number of times each second participle occurs in the index table equals the number of pieces of video resource data to which it belongs, the co-occurrence rate can also be expressed as the ratio of the number of pieces of video resource data to which each second participle belongs in the index table to the total number of pieces of video resource data information in the index table.
For example, suppose the index table of the participle "square dance" contains the information of 100 pieces of video resource data, the index table of the participle "Soldiers Brother" contains the information of 200 pieces of video resource data, and 10 pieces of video resource data information appear in both index tables, that is, contain both "square dance" and "Soldiers Brother". Then for "square dance", the co-occurrence rate of "square dance" with "Soldiers Brother" is 10/100=10%, and for "Soldiers Brother", the co-occurrence rate of "Soldiers Brother" with "square dance" is 10/200=5%.
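The ratio in sub-step S32 and the "square dance" example can be reproduced with a short sketch. The table layout (video identifier mapped to the set of participles in that video) and the function name are assumptions, and the tables are synthetic data sized to match the example.

```python
def co_occurrence_rate(index_table, second_participle):
    """Share of records in the first participle's index table that contain the second participle."""
    hits = sum(1 for participles in index_table.values() if second_participle in participles)
    return hits / len(index_table)

# Synthetic data: 10 videos contain both participles.
shared = {f"v{i}": {"square dance", "Soldiers Brother"} for i in range(10)}
# 100 records under "square dance", 200 records under "Soldiers Brother".
square_dance_table = {**shared, **{f"a{i}": {"square dance"} for i in range(90)}}
soldiers_brother_table = {**shared, **{f"b{i}": {"Soldiers Brother"} for i in range(190)}}

rate_from_square_dance = co_occurrence_rate(square_dance_table, "Soldiers Brother")
rate_from_soldiers_brother = co_occurrence_rate(soldiers_brother_table, "square dance")
```

The rate is asymmetric because the denominator is the size of the index table of whichever participle is treated as the first participle.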
Sub-step S33: extract the second participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
In a specific implementation, the preset threshold can be set by those skilled in the art according to actual conditions, and the embodiment of the present invention does not limit this. The set of associated second participles extracted in the embodiment of the present invention can be empty, or can contain one or more participles.
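Sub-steps S31 to S33 for a single first participle can be sketched together as follows. The function name, the table layout (video identifier mapped to a set of participles), and the threshold values used in any call are assumptions for illustration.

```python
from collections import Counter

def associated_second_participles(first, index_table, threshold):
    """Return {second participle: co-occurrence rate} for rates above the threshold."""
    counts = Counter()
    for participles in index_table.values():
        # Second participles are all participles of the video except the first participle.
        counts.update(p for p in participles if p != first)
    total = len(index_table)
    rates = {p: n / total for p, n in counts.items()}
    # The result may be empty, or may contain one or more participles.
    return {p: r for p, r in rates.items() if r > threshold}
```

With a high threshold the result can legitimately be an empty dictionary, which corresponds to the case where no associated second participle is found.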
In a preferred embodiment of the present invention, step 103 may specifically comprise the following sub-steps:
Sub-step S41: when the feature text information is mapped to multiple first participles, extract the multiple preset index tables corresponding to the multiple first participles respectively; each index table comprises the information of the video resource data to which the first participle belongs, and all participles in that video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting that feature text information;
In a specific implementation, a search engine can crawl the video resource data on each platform in advance and then build an index library: the feature text information of the video resource data is extracted and segmented, and an index table is built for each participle. The index table can store the information of the video resource data (which can be a video identifier such as an ID, an intranet address or an extranet address, or can be a record composed of the current participle and the other participles) and all participles in the video resource data (comprising the first participle and the second participles other than the first participle).
In a preferred embodiment of the present invention, the feature text information can comprise a video title, video keywords and/or a video description.
Sub-step S42: extract the second participles that co-occur with the multiple first participles as candidate participles; wherein a second participle is a participle, among all participles in the video resource data, other than the first participles;
Specifically, there are currently multiple first participles and a corresponding number of index tables. A candidate participle must occur in every one of these index tables, that is, it co-occurs with each current first participle in that first participle's index table.
Sub-step S43: in each index table, calculate the co-occurrence rate of the first participle with the candidate participle, the co-occurrence rate being the ratio of the number of times the candidate participle occurs in the index table to the total number of pieces of video resource data information in the index table;
For example, the feature text information "moon cake in the Mid-autumn Festival" can be mapped to the first participles "mid-autumn" and "moon cake". If one of the extracted candidate participles is "moon", the co-occurrence rate of "mid-autumn" with "moon" (assumed to be 70%) and the co-occurrence rate of "moon cake" with "moon" (assumed to be 60%) can be calculated respectively.
Sub-step S44: configure a weight for the co-occurrence rate of each of the multiple first participles with the candidate participle;
The weights can be determined according to the ratio between the total numbers of pieces of video resource data information in the index tables of the first participles, wherein the larger the total number of pieces of video resource data information in an index table, the larger its weight. For example, if the index table of "mid-autumn" contains 900 pieces of video resource data information and the index table of "moon cake" contains 100, the weight of the co-occurrence rate of "mid-autumn" with "moon" can be 0.9, and the weight of the co-occurrence rate of "moon cake" with "moon" can be 0.1.
Of course, the above weights are only an example. When implementing the embodiment of the present invention, those skilled in the art can set other weights according to actual conditions or actual needs, for example according to current social hot topics (news rankings, microblog rankings, etc.) or according to the user's local and/or online behavior (video playback, news reading, etc.), and the embodiment of the present invention does not limit this.
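The size-proportional weighting in the example can be sketched as below. The function name `size_weights` is hypothetical, and this is only the example scheme from the text; weights derived from hot topics or user behavior would replace it.

```python
def size_weights(table_sizes):
    """Weight each first participle by its index table's share of all records.

    table_sizes: {first participle: number of video records in its index table}
    """
    total = sum(table_sizes.values())
    return {participle: size / total for participle, size in table_sizes.items()}

weights = size_weights({"mid-autumn": 900, "moon cake": 100})
```

The weights sum to 1, and a larger index table always receives a larger weight, as the text requires.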
Sub-step S45: calculate the mean of the multiple weighted co-occurrence rates as the co-occurrence rate of the multiple first participles with the candidate participle;
In the embodiment of the present invention, the mean of the multiple weighted co-occurrence rates can be used as the final co-occurrence rate.
For example, the co-occurrence rate of "mid-autumn" and "moon cake" with "moon" can be (70%*0.9+60%*0.1)/2=34.5%.
Sub-step S46: extract the candidate participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
In a specific implementation, the preset threshold can be set by those skilled in the art according to actual conditions, and the embodiment of the present invention does not limit this. The set of associated second participles extracted in the embodiment of the present invention can be empty, or can contain one or more participles.
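Sub-steps S42 and S45 can be sketched as follows. The function names and the table layout (first participle mapped to a dictionary of video identifier to participle set) are assumptions. The combined rate follows the worked example literally: the weighted rates are summed and then divided by the number of first participles.

```python
def candidate_participles(first_participles, index_tables):
    """Second participles that occur in the index table of every first participle."""
    firsts = set(first_participles)
    candidate_sets = []
    for first in first_participles:
        seen = set()
        for participles in index_tables[first].values():
            seen |= set(participles)
        candidate_sets.append(seen - firsts)
    return set.intersection(*candidate_sets)

def combined_rate(rates, weights):
    """Mean of the weighted co-occurrence rates, as in (70%*0.9+60%*0.1)/2."""
    weighted = [rates[first] * weights[first] for first in rates]
    return sum(weighted) / len(weighted)

example = combined_rate(
    {"mid-autumn": 0.70, "moon cake": 0.60},
    {"mid-autumn": 0.9, "moon cake": 0.1},
)
```

Because the weights already sum to 1, dividing by the number of first participles scales the result down; a threshold chosen for the single-participle case would therefore likely need adjusting for this variant.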
In a preferred embodiment of the present invention, step 103 may specifically comprise the following sub-steps:
Sub-step S51: when the feature text information is mapped to multiple first participles, extract the multiple preset index tables corresponding to the multiple first participles respectively; wherein each index table comprises the information of the video resource data to which the first participle belongs, and all participles in that video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting that feature text information;
In a specific implementation, a search engine can crawl the video resource data on each platform in advance and then build an index library: the feature text information of the video resource data is extracted and segmented, and an index table is built for each participle. The index table can store the information of the video resource data (which can be a video identifier such as an ID, an intranet address or an extranet address, or can be a record composed of the current participle and the other participles) and all participles in the video resource data (comprising the first participle and the second participles other than the first participle).
In a preferred embodiment of the present invention, the feature text information can comprise a video title, video keywords and/or a video description.
Sub-step S52: use the multiple index tables to determine a main participle, the main participle being the first participle whose index table contains the largest total number of pieces of video resource data information;
To improve the user experience, when the first participles differ greatly in their amounts of video resource data, the first participles whose index tables contain little video resource data information can be ignored. For example, for the first participles "mid-autumn" and "moon cake" mapped from the feature text information "moon cake in the Mid-autumn Festival", if the index table of "mid-autumn" contains 900 pieces of video resource data information and the index table of "moon cake" contains 100, "mid-autumn" can be set as the main participle.
Sub-step S53: calculate the co-occurrence rate of the main participle with each second participle in its corresponding index table, the co-occurrence rate being the ratio of the number of times each second participle occurs in the index table to the total number of pieces of video resource data information in the index table; wherein a second participle is a participle, among all participles in the video resource data, other than the first participles;
In the embodiment of the present invention, the co-occurrence rate of the main participle can be used as the final co-occurrence rate.
Sub-step S54: extract the second participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
In a specific implementation, the preset threshold can be set by those skilled in the art according to actual conditions, and the embodiment of the present invention does not limit this. The set of associated second participles extracted in the embodiment of the present invention can be empty, or can contain one or more participles.
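Sub-steps S52 to S54 can be sketched as picking the first participle with the largest index table and thresholding the rates in that table only. The function names and table layout are assumptions; excluding every first participle (not just the main one) from the second participles is also an assumption, since the text can be read either way.

```python
def pick_main_participle(first_participles, index_tables):
    """First participle whose index table holds the most video records."""
    return max(first_participles, key=lambda p: len(index_tables[p]))

def associated_for_main(first_participles, index_tables, threshold):
    """Return (main participle, {second participle: rate}) for rates above the threshold."""
    main = pick_main_participle(first_participles, index_tables)
    table = index_tables[main]
    firsts = set(first_participles)
    counts = {}
    for participles in table.values():
        for p in participles:
            if p not in firsts:  # assumption: all first participles are excluded
                counts[p] = counts.get(p, 0) + 1
    rates = {p: n / len(table) for p, n in counts.items()}
    return main, {p: r for p, r in rates.items() if r > threshold}
```

Compared with the weighted variant, this needs only one index table per request, at the cost of ignoring the smaller first participles entirely.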
Step 104: obtain the network link address of the second video resource data matching the one or more first participles and the associated second participles;
Specifically, after sub-step S33, the combinations of the current single first participle with the one or more associated second participles can be obtained. For example, if the feature text information is "dota" and the words with a high co-occurrence rate with it are "funny", "egg pain", "2009", "great waves", "first-person view" and "classics", with co-occurrence rates of 40%, 35%, 30%, 25%, 20% and 10% respectively, the combinations obtained are, in order, "dota funny", "dota egg pain", "dota 2009", "dota great waves", "dota first-person view" and "dota classics".
After sub-step S46, the combinations of the current multiple first participles with the one or more associated second participles can be obtained. For example, if the feature text information is "square dance Soldiers Brother", it is mapped to the first participles "square dance" and "Soldiers Brother", and the second participles that occur together with both first participles are extracted. A second participle such as "teaching" can serve as an associated second participle, and the combination finally obtained is "square dance Soldiers Brother teaching".
In a preferred embodiment of the present invention, step 104 may specifically comprise the following sub-step:
Sub-step S61: obtain the network link address of the second video resource data matching the main participle and the associated second participle.
After sub-step S54, the combinations of the current main participle with the one or more associated second participles can be obtained. For example, for the first participles "mid-autumn" and "moon cake" mapped from the feature text information "moon cake in the Mid-autumn Festival", "mid-autumn" can be set as the main participle and the associated second participle "moon" obtained, and the combination finally obtained is "mid-autumn moon".
In the embodiment of the present invention, matching video resource data can be searched for based on the combination of the one or more first participles and the second participle. When a match is found, its network link address is recorded, which can be an intranet address or an extranet address.
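The combinations used for the search in step 104 can be sketched as below. The function name `build_queries`, joining participles with a space, and ordering by descending co-occurrence rate are assumptions; the ordering is inferred from the "dota" example, where the combinations are listed in the same order as the rates.

```python
def build_queries(first_participles, associated_rates):
    """Combine the first participle(s) with each associated second participle.

    first_participles: list with one first participle, several, or just the main one
    associated_rates:  {associated second participle: co-occurrence rate}
    """
    prefix = " ".join(first_participles)
    ranked = sorted(associated_rates.items(), key=lambda item: item[1], reverse=True)
    return [f"{prefix} {second}" for second, _ in ranked]

dota_queries = build_queries(
    ["dota"],
    {"classics": 0.10, "funny": 0.40, "2009": 0.30, "egg pain": 0.35},
)
```

Each resulting string would then be submitted to whatever video search backend is in use, and the network link address of each matching video recorded; that lookup is not shown because the text does not specify it.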
Step 105: push the network link address of the second video resource data.
In practical applications, the network link address of the second video resource data can be placed at any position on the current page, and can also be pushed by embedding it in an icon, a button or the like. The user can trigger the network link address of the second video resource data and thereby load that video resource data.
The present invention can push based on already-published content, freeing the search engine from its dependence on users' search habits. Video resource data that few users search for, but for which the video library has already accumulated many related resources, can thus be pushed, so that high-quality resources in the video library are mined in depth and the efficiency of resource mining is improved. In addition, the index tables can keep growing as internet video content accumulates, and the quantity and breadth of the content produced by the major video sites far exceed the number of words that users have searched for, which helps to increase the recall rate.
By obtaining the network link address of the second video resource data matching the first participle and the second participle, the present invention allows the user to obtain the video resource data directly from that address. A single simple search can yield more results without repeated search submissions, which reduces the load on the server, reduces the occupation of network resources, and improves the user experience.
For simplicity of description, the method embodiments are expressed as a series of action combinations, but those skilled in the art should know that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention some steps can be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.
Referring to Fig. 2, a structural block diagram of an embodiment of a device for pushing related resource addresses based on video search according to an embodiment of the present invention is shown, which may specifically comprise the following modules:
Feature text information acquisition module 201, adapted to obtain the feature text information of the first video resource data when a loading or playing request for the first video resource data is received;
First participle mapping module 202, adapted to map the feature text information to one or more first participles;
Second participle search module 203, adapted to search for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the current one or more first participles and a second participle occur together in the same video resource data;
Network link address acquisition module 204, adapted to obtain the network link address of the second video resource data matching the one or more first participles and the associated second participles;
Network link address pushing module 205, adapted to push the network link address of the second video resource data.
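The five modules can be pictured as stages of one pipeline. In the sketch below, the class name, field names and callable signatures are all hypothetical; each field stands in for one of modules 201 to 205 and would be supplied by a real implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RelatedAddressPusher:
    get_feature_text: Callable[[str], str]                          # module 201
    map_first: Callable[[str], List[str]]                           # module 202
    find_associated: Callable[[List[str]], List[str]]               # module 203
    lookup_addresses: Callable[[List[str], List[str]], List[str]]   # module 204
    push: Callable[[List[str]], None]                               # module 205

    def on_request(self, video_id: str) -> List[str]:
        """Handle a loading or playing request for the first video resource data."""
        feature_text = self.get_feature_text(video_id)
        first_participles = self.map_first(feature_text)
        associated = self.find_associated(first_participles)
        addresses = self.lookup_addresses(first_participles, associated)
        self.push(addresses)
        return addresses
```

Keeping each stage behind a callable mirrors the statement that the modules can be combined or split; for example, `find_associated` could wrap either the weighted variant or the main-participle variant of step 103.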
In a preferred embodiment of the present invention, the feature text information acquisition module can be further adapted to:
When a playing request for the first video resource data is received, receive the feature text information of the first video resource data sent by the current terminal;
Or,
When a loading request for the first video resource data is received, extract the locally preset feature text information of the video resource data.
In a preferred embodiment of the present invention, the first participle mapping module can be further adapted to:
Extract the participle to which the feature text information is mapped;
Or,
When the received feature text information is a compound word, split the feature text information into multiple search sub-words; extract the multiple participles to which the multiple search sub-words are mapped.
In a preferred embodiment of the present invention, the second participle search module can be further adapted to:
When the feature text information is mapped to one first participle, extract the preset index table corresponding to that first participle; wherein the index table comprises the information of the video resource data to which the first participle belongs, and all participles in that video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting that feature text information;
Calculate the co-occurrence rate of the first participle with each second participle in the index table, the co-occurrence rate being the ratio of the number of times each second participle occurs in the index table to the total number of pieces of video resource data information in the index table; wherein a second participle is a participle, among all participles in the video resource data, other than the first participle;
Extract the second participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
In a preferred embodiment of the present invention, the second participle search module can be further adapted to:
When the feature text information is mapped to multiple first participles, extract the multiple preset index tables corresponding to the multiple first participles respectively; each index table comprises the information of the video resource data to which the first participle belongs, and all participles in that video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting that feature text information;
Extract the second participles that co-occur with the multiple first participles as candidate participles; wherein a second participle is a participle, among all participles in the video resource data, other than the first participles;
In each index table, calculate the co-occurrence rate of the first participle with the candidate participle, the co-occurrence rate being the ratio of the number of times the candidate participle occurs in the index table to the total number of pieces of video resource data information in the index table;
Configure a weight for the co-occurrence rate of each of the multiple first participles with the candidate participle;
Calculate the mean of the multiple weighted co-occurrence rates as the co-occurrence rate of the multiple first participles with the candidate participle;
Extract the candidate participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
In a preferred embodiment of the present invention, the second participle search module can be further adapted to:
When the feature text information is mapped to multiple first participles, extract the multiple preset index tables corresponding to the multiple first participles respectively; wherein each index table comprises the information of the video resource data to which the first participle belongs, and all participles in that video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting that feature text information;
Use the multiple index tables to determine a main participle, the main participle being the first participle whose index table contains the largest total number of pieces of video resource data information;
Calculate the co-occurrence rate of the main participle with each second participle in its corresponding index table, the co-occurrence rate being the ratio of the number of times each second participle occurs in the index table to the total number of pieces of video resource data information in the index table; wherein a second participle is a participle, among all participles in the video resource data, other than the first participles;
Extract the second participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
In a preferred embodiment of the present invention, the feature text information can comprise a video title, video keywords and/or a video description.
In a preferred embodiment of the present invention, the network link address acquisition module can be further adapted to:
Obtain the network link address of the second video resource data matching the main participle and the associated second participle.
Because the device embodiment is substantially similar to the method embodiment, its description is relatively brief; for the relevant parts, refer to the corresponding description of the method embodiment.
The algorithm provided at this is intrinsic not relevant to any certain computer, virtual system or miscellaneous equipment with demonstration.Various general-purpose systems also can with based on using together with this teaching.According to top description, it is apparent constructing the desired structure of this type systematic.In addition, the present invention is not also for any certain programmed language.It should be understood that and can utilize various programming languages to realize content of the present invention described here, and the top description that language-specific is done is in order to disclose preferred forms of the present invention.
In the specification that provided herein, a large amount of details have been described.Yet, can understand, embodiments of the invention can be in the situation that do not have these details to put into practice.In some instances, be not shown specifically known method, structure and technology, so that not fuzzy understanding of this description.
Similarly, be to be understood that, in order to simplify the disclosure and to help to understand one or more in each inventive aspect, in the description to exemplary embodiment of the present invention, each feature of the present invention is grouped together in single embodiment, figure or the description to it sometimes in the above.Yet the method for the disclosure should be construed to the following intention of reflection: the present invention for required protection requires the more feature of feature than institute clearly puts down in writing in each claim.Or rather, as following claims are reflected, inventive aspect is to be less than all features of the disclosed single embodiment in front.Therefore, claims of following embodiment are incorporated to this embodiment thus clearly, and wherein each claim itself is as independent embodiment of the present invention.
Those skilled in the art are appreciated that and can adaptively change and they are arranged in one or more equipment different from this embodiment the module in the equipment in embodiment.Can be combined into a module or unit or assembly to the module in embodiment or unit or assembly, and can put them into a plurality of submodules or subelement or sub-component in addition.At least some in such feature and/or process or unit are mutually repelling, and can adopt any combination to disclosed all features in this specification (comprising claim, summary and the accompanying drawing followed) and so all processes or the unit of disclosed any method or equipment are combined.Unless clearly statement in addition, in this specification (comprising claim, summary and the accompanying drawing followed) disclosed each feature can be by providing identical, be equal to or the alternative features of similar purpose replaces.
In addition, those skilled in the art can understand, although embodiment more described herein comprise some feature rather than further feature included in other embodiment, the combination of the feature of different embodiment means within scope of the present invention and forms different embodiment.For example, in the following claims, the one of any of embodiment required for protection can be used with compound mode arbitrarily.
All parts embodiment of the present invention can realize with hardware, or realizes with the software module of moving on one or more processor, or realizes with their combination.It will be understood by those of skill in the art that and can use in practice microprocessor or digital signal processor (DSP) to realize according to some or all some or repertoire of parts in the pushing equipment of the correlated resources address based on video search of the embodiment of the present invention.The present invention for example can also be embodied as, for carrying out part or all equipment or device program (, computer program and computer program) of method as described herein.The program of the present invention that realizes like this can be stored on computer-readable medium, or can have the form of one or more signal.Such signal can be downloaded and obtain from internet website, or provides on carrier signal, or provides with any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order. These words may be interpreted as names.
The invention discloses A1, a related resource address push method based on video retrieval, comprising:
Upon receiving a load or play request for first video resource data, obtaining feature text information of the first video resource data;
Mapping the feature text information to one or more first participles;
Searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the current one or more first participles and a second participle appear together in the same video resource data;
Obtaining network link addresses of second video resource data matching the one or more first participles and the associated second participles;
Pushing the network link addresses of the second video resource data.
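The last two steps of A1 (obtaining and pushing the link addresses of matching second videos) can be illustrated with a minimal Python sketch. The text does not spell out the matching rule, so the sketch assumes that a video matches when it contains every first participle and at least one associated second participle. The function name `matching_link_addresses`, the `catalog` structure, and the example URLs are hypothetical.

```python
def matching_link_addresses(catalog, first_participles, associated_participles):
    """Return link addresses of videos matching the query participles.

    catalog: dict mapping a link address to the set of participles of that video.
    Assumed matching rule (not stated in the source): a video must contain
    every first participle and at least one associated second participle.
    """
    firsts = set(first_participles)
    associated = set(associated_participles)
    return sorted(
        url
        for url, participles in catalog.items()
        if firsts <= participles and associated & participles
    )


# Hypothetical catalog; the URLs are placeholders.
catalog = {
    "http://example.com/v/1": {"kungfu", "panda", "movie"},
    "http://example.com/v/2": {"kungfu", "tutorial"},
    "http://example.com/v/3": {"panda", "zoo"},
    "http://example.com/v/4": {"kungfu", "panda", "trailer"},
}
links = matching_link_addresses(catalog, ["kungfu"], ["panda"])
```

Here `links` holds the addresses of videos 1 and 4, which would then be pushed to the terminal. A real system would query a search index rather than scan a dictionary.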
A2. The method according to A1, wherein the step of obtaining the feature text information of the first video resource data upon receiving the load or play request for the first video resource data comprises:
Upon receiving a play request for the first video data, receiving the feature text information of the first video resource data sent by the current terminal;
Or,
Upon receiving a load request for the first video data, extracting the locally preset feature text information of the video resource data.
A3. The method according to A1, wherein the step of mapping the feature text information to one or more first participles comprises:
Extracting the participle to which the feature text information is mapped;
Or,
When the received feature text information is a compound word, splitting the feature text information into a plurality of search sub-words, and extracting the plurality of participles to which the plurality of search sub-words are mapped.
A4. The method according to A1, wherein the step of searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than the preset threshold comprises:
When the feature text information is mapped to one first participle, extracting the preset index table corresponding to the first participle; wherein the index table comprises information on the video resource data to which the first participle belongs, and all participles in the video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting the feature text information;
Calculating the co-occurrence rate between the first participle and each second participle in the index table, the co-occurrence rate being the ratio of the number of times each second participle appears in the index table to the total number of video resource data entries in the index table; wherein a second participle is any participle, other than the first participle, among all participles in the video resource data;
Extracting the second participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
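The single-participle case in A4 can be sketched in Python as follows. The sketch assumes that segmentation has already been done, that an index table is simply the list of participle sets of the videos containing the first participle, and that a second participle is counted at most once per video. All function and variable names are hypothetical.

```python
from collections import Counter, defaultdict


def build_index_tables(records):
    """records: iterable of (video_id, participles) pairs from crawled videos.

    Returns {participle: [participle set of each video containing it]}.
    """
    tables = defaultdict(list)
    for _video_id, participles in records:
        participles = set(participles)
        for word in participles:
            tables[word].append(participles)
    return tables


def associated_second_participles(index_table, first, threshold):
    """Second participles whose co-occurrence rate with `first` exceeds threshold."""
    total = len(index_table)
    if total == 0:
        return {}
    counts = Counter()
    for participles in index_table:
        # Assumption: a second participle is counted once per video.
        counts.update(participles - {first})
    return {w: c / total for w, c in counts.items() if c / total > threshold}


# Hypothetical crawled records, already segmented into participles.
records = [
    ("v1", ["panda", "kungfu", "movie"]),
    ("v2", ["panda", "kungfu"]),
    ("v3", ["panda", "zoo"]),
    ("v4", ["panda", "kungfu", "trailer"]),
    ("v5", ["zoo", "tiger"]),
]
tables = build_index_tables(records)
result = associated_second_participles(tables["panda"], "panda", 0.5)
```

In this example the index table for "panda" has four entries, "kungfu" appears in three of them, and so its co-occurrence rate is 3/4 = 0.75, the only rate above the 0.5 threshold.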
A5. The method according to A1, wherein the step of searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than the preset threshold comprises:
When the feature text information is mapped to a plurality of first participles, extracting respectively the plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information on the video resource data to which the corresponding first participle belongs, and all participles in the video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting the feature text information;
Extracting the second participles that co-occur with the plurality of first participles as candidate participles; wherein a second participle is any participle, other than the first participles, among all participles in the video resource data;
Calculating respectively, in each index table, the co-occurrence rate between the first participle and the candidate participle, the co-occurrence rate being the ratio of the number of times the candidate participle appears in the index table to the total number of video resource data entries in the index table;
Configuring respectively a plurality of weights for the co-occurrence rates between the plurality of first participles and the candidate participle;
Calculating respectively the mean of the plurality of weighted co-occurrence rates as the co-occurrence rate between the plurality of first participles and the candidate participle;
Extracting the candidate participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
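The multi-participle case in A5 can be sketched as follows. Two readings are assumed here and are not confirmed by the text: candidates are the second participles present in every first participle's index table, and the combined rate is the plain arithmetic mean of the weighted per-table rates (dividing by the sum of the weights would be an alternative). The names `rate` and `combined_rates` and the sample data are hypothetical.

```python
def rate(index_table, word):
    """Share of videos in the table whose participle set contains `word`."""
    if not index_table:
        return 0.0
    return sum(1 for p in index_table if word in p) / len(index_table)


def combined_rates(tables, weights):
    """tables: {first participle: [participle set per video]}.
    weights: {first participle: weight applied to that table's rate}.
    """
    firsts = set(tables)
    # Assumption: a candidate must appear in every first participle's table.
    per_table = [set().union(*videos) - firsts for videos in tables.values()]
    candidates = set.intersection(*per_table) if per_table else set()
    result = {}
    for cand in candidates:
        weighted = [weights[f] * rate(videos, cand) for f, videos in tables.items()]
        # Mean of the weighted co-occurrence rates.
        result[cand] = sum(weighted) / len(weighted)
    return result


# Hypothetical index tables for two first participles.
tables = {
    "kungfu": [{"kungfu", "panda", "movie"}, {"kungfu", "panda"}],
    "movie": [
        {"movie", "panda", "kungfu"},
        {"movie", "trailer"},
        {"movie", "panda"},
        {"movie", "panda", "review"},
    ],
}
weights = {"kungfu": 1.0, "movie": 0.5}
rates = combined_rates(tables, weights)
```

With these inputs the only candidate is "panda": its rate is 2/2 in the "kungfu" table and 3/4 in the "movie" table, so the combined rate is (1.0 × 1.0 + 0.5 × 0.75) / 2 = 0.6875, which would pass a threshold of 0.5.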
A6. The method according to A1, wherein the step of searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than the preset threshold comprises:
When the feature text information is mapped to a plurality of first participles, extracting respectively the plurality of preset index tables corresponding to the plurality of first participles; wherein each index table comprises information on the video resource data to which the corresponding first participle belongs, and all participles in the video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting the feature text information;
Determining a main participle using the plurality of index tables, the main participle being the first participle corresponding to the index table with the largest total number of video resource data entries;
Calculating the co-occurrence rate between the main participle and each second participle in its corresponding index table, the co-occurrence rate being the ratio of the number of times each second participle appears in the index table to the total number of video resource data entries in the index table; wherein a second participle is any participle, other than the first participles, among all participles in the video resource data;
Extracting the second participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
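The main-participle variant in A6 can be sketched as follows. The sketch assumes a non-empty mapping of first participles to index tables, breaks ties by taking the first participle encountered, and excludes every first participle from the set of second participles. The function names and sample data are hypothetical.

```python
def main_participle(tables):
    """First participle whose index table lists the most videos (ties: first seen)."""
    return max(tables, key=lambda first: len(tables[first]))


def associated_via_main(tables, threshold):
    """Return the main participle and its associated second participles."""
    main = main_participle(tables)
    videos = tables[main]
    # Assumption: all first participles are excluded from the second participles.
    seconds = set().union(*videos) - set(tables)
    rates = {w: sum(1 for p in videos if w in p) / len(videos) for w in seconds}
    return main, {w: r for w, r in rates.items() if r > threshold}


# Hypothetical index tables for two first participles.
tables = {
    "kungfu": [{"kungfu", "panda", "movie"}, {"kungfu", "panda"}],
    "movie": [
        {"movie", "panda", "kungfu"},
        {"movie", "trailer"},
        {"movie", "panda"},
        {"movie", "panda", "review"},
    ],
}
main, associated = associated_via_main(tables, 0.5)
```

Here "movie" is chosen as the main participle because its table has four entries against two for "kungfu", and "panda" is the only second participle above the threshold (3/4 = 0.75). Compared with the weighted mean, this variant reads only one index table per request.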
A7. The method according to A1, A4, A5, or A6, wherein the feature text information comprises a video title, video keywords, and/or a video description.
A8. The method according to A6, wherein the step of obtaining the network link addresses of the second video resource data matching the one or more first participles and the associated second participles comprises:
Obtaining the network link addresses of the second video resource data matching the main participle and the associated second participles.
The invention also discloses B9, a related resource address push device based on video retrieval, comprising:
A feature text information obtaining module, adapted to obtain feature text information of first video resource data upon receiving a load or play request for the first video resource data;
A first participle mapping module, adapted to map the feature text information to one or more first participles;
A second participle search module, adapted to search for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the current one or more first participles and a second participle appear together in the same video resource data;
A network link address obtaining module, adapted to obtain network link addresses of second video resource data matching the one or more first participles and the associated second participles;
A network link address push module, adapted to push the network link addresses of the second video resource data.
B10. The device according to B9, wherein the feature text information obtaining module is further adapted to:
Upon receiving a play request for the first video data, receive the feature text information of the first video resource data sent by the current terminal;
Or,
Upon receiving a load request for the first video data, extract the locally preset feature text information of the video resource data.
B11. The device according to B9, wherein the first participle mapping module is further adapted to:
Extract the participle to which the feature text information is mapped;
Or,
When the received feature text information is a compound word, split the feature text information into a plurality of search sub-words, and extract the plurality of participles to which the plurality of search sub-words are mapped.
B12. The device according to B9, wherein the second participle search module is further adapted to:
When the feature text information is mapped to one first participle, extract the preset index table corresponding to the first participle; wherein the index table comprises information on the video resource data to which the first participle belongs, and all participles in the video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting the feature text information;
Calculate the co-occurrence rate between the first participle and each second participle in the index table, the co-occurrence rate being the ratio of the number of times each second participle appears in the index table to the total number of video resource data entries in the index table; wherein a second participle is any participle, other than the first participle, among all participles in the video resource data;
Extract the second participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
B13. The device according to B9, wherein the second participle search module is further adapted to:
When the feature text information is mapped to a plurality of first participles, extract respectively the plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information on the video resource data to which the corresponding first participle belongs, and all participles in the video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting the feature text information;
Extract the second participles that co-occur with the plurality of first participles as candidate participles; wherein a second participle is any participle, other than the first participles, among all participles in the video resource data;
Calculate respectively, in each index table, the co-occurrence rate between the first participle and the candidate participle, the co-occurrence rate being the ratio of the number of times the candidate participle appears in the index table to the total number of video resource data entries in the index table;
Configure respectively a plurality of weights for the co-occurrence rates between the plurality of first participles and the candidate participle;
Calculate respectively the mean of the plurality of weighted co-occurrence rates as the co-occurrence rate between the plurality of first participles and the candidate participle;
Extract the candidate participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
B14. The device according to B9, wherein the second participle search module is further adapted to:
When the feature text information is mapped to a plurality of first participles, extract respectively the plurality of preset index tables corresponding to the plurality of first participles; wherein each index table comprises information on the video resource data to which the corresponding first participle belongs, and all participles in the video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting the feature text information;
Determine a main participle using the plurality of index tables, the main participle being the first participle corresponding to the index table with the largest total number of video resource data entries;
Calculate the co-occurrence rate between the main participle and each second participle in its corresponding index table, the co-occurrence rate being the ratio of the number of times each second participle appears in the index table to the total number of video resource data entries in the index table; wherein a second participle is any participle, other than the first participles, among all participles in the video resource data;
Extract the second participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
B15. The device according to B9, B12, B13, or B14, wherein the feature text information comprises a video title, video keywords, and/or a video description.
B16. The device according to B14, wherein the network link address obtaining module is further adapted to:
Obtain the network link addresses of the second video resource data matching the main participle and the associated second participles.

Claims (10)

1. A related resource address push method based on video retrieval, comprising:
Upon receiving a load or play request for first video resource data, obtaining feature text information of the first video resource data;
Mapping the feature text information to one or more first participles;
Searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the current one or more first participles and a second participle appear together in the same video resource data;
Obtaining network link addresses of second video resource data matching the one or more first participles and the associated second participles;
Pushing the network link addresses of the second video resource data.
2. The method according to claim 1, characterized in that the step of obtaining the feature text information of the first video resource data upon receiving the load or play request for the first video resource data comprises:
Upon receiving a play request for the first video data, receiving the feature text information of the first video resource data sent by the current terminal;
Or,
Upon receiving a load request for the first video data, extracting the locally preset feature text information of the video resource data.
3. The method according to claim 1, characterized in that the step of mapping the feature text information to one or more first participles comprises:
Extracting the participle to which the feature text information is mapped;
Or,
When the received feature text information is a compound word, splitting the feature text information into a plurality of search sub-words, and extracting the plurality of participles to which the plurality of search sub-words are mapped.
4. The method according to claim 1, characterized in that the step of searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than the preset threshold comprises:
When the feature text information is mapped to one first participle, extracting the preset index table corresponding to the first participle; wherein the index table comprises information on the video resource data to which the first participle belongs, and all participles in the video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting the feature text information;
Calculating the co-occurrence rate between the first participle and each second participle in the index table, the co-occurrence rate being the ratio of the number of times each second participle appears in the index table to the total number of video resource data entries in the index table; wherein a second participle is any participle, other than the first participle, among all participles in the video resource data;
Extracting the second participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
5. The method according to claim 1, characterized in that the step of searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than the preset threshold comprises:
When the feature text information is mapped to a plurality of first participles, extracting respectively the plurality of preset index tables corresponding to the plurality of first participles; each index table comprises information on the video resource data to which the corresponding first participle belongs, and all participles in the video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting the feature text information;
Extracting the second participles that co-occur with the plurality of first participles as candidate participles; wherein a second participle is any participle, other than the first participles, among all participles in the video resource data;
Calculating respectively, in each index table, the co-occurrence rate between the first participle and the candidate participle, the co-occurrence rate being the ratio of the number of times the candidate participle appears in the index table to the total number of video resource data entries in the index table;
Configuring respectively a plurality of weights for the co-occurrence rates between the plurality of first participles and the candidate participle;
Calculating respectively the mean of the plurality of weighted co-occurrence rates as the co-occurrence rate between the plurality of first participles and the candidate participle;
Extracting the candidate participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
6. The method according to claim 1, characterized in that the step of searching for associated second participles whose co-occurrence rate with the one or more first participles is higher than the preset threshold comprises:
When the feature text information is mapped to a plurality of first participles, extracting respectively the plurality of preset index tables corresponding to the plurality of first participles; wherein each index table comprises information on the video resource data to which the corresponding first participle belongs, and all participles in the video resource data; all participles in the video resource data are generated by crawling the video resource data, extracting the feature text information of the video resource data, and segmenting the feature text information;
Determining a main participle using the plurality of index tables, the main participle being the first participle corresponding to the index table with the largest total number of video resource data entries;
Calculating the co-occurrence rate between the main participle and each second participle in its corresponding index table, the co-occurrence rate being the ratio of the number of times each second participle appears in the index table to the total number of video resource data entries in the index table; wherein a second participle is any participle, other than the first participles, among all participles in the video resource data;
Extracting the second participles whose co-occurrence rate is higher than the preset threshold as the associated second participles.
7. The method according to claim 1, 4, 5, or 6, characterized in that the feature text information comprises a video title, video keywords, and/or a video description.
8. The method according to claim 6, characterized in that the step of obtaining the network link addresses of the second video resource data matching the one or more first participles and the associated second participles comprises:
Obtaining the network link addresses of the second video resource data matching the main participle and the associated second participles.
9. A related resource address push device based on video retrieval, comprising:
A feature text information obtaining module, adapted to obtain feature text information of first video resource data upon receiving a load or play request for the first video resource data;
A first participle mapping module, adapted to map the feature text information to one or more first participles;
A second participle search module, adapted to search for associated second participles whose co-occurrence rate with the one or more first participles is higher than a preset threshold; the co-occurrence rate is the probability that the current one or more first participles and a second participle appear together in the same video resource data;
A network link address obtaining module, adapted to obtain network link addresses of second video resource data matching the one or more first participles and the associated second participles;
A network link address push module, adapted to push the network link addresses of the second video resource data.
10. The device according to claim 9, characterized in that the feature text information obtaining module is further adapted to:
Upon receiving a play request for the first video data, receive the feature text information of the first video resource data sent by the current terminal;
Or,
Upon receiving a load request for the first video data, extract the locally preset feature text information of the video resource data.
CN201310462461.6A 2013-09-30 2013-09-30 Related resource address push method and device based on video retrieval Active CN103491205B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310462461.6A CN103491205B (en) 2013-09-30 2013-09-30 The method for pushing of a kind of correlated resources address based on video search and device
PCT/CN2014/086519 WO2015043389A1 (en) 2013-09-30 2014-09-15 Participle information push method and device based on video search

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310462461.6A CN103491205B (en) 2013-09-30 2013-09-30 The method for pushing of a kind of correlated resources address based on video search and device

Publications (2)

Publication Number Publication Date
CN103491205A true CN103491205A (en) 2014-01-01
CN103491205B CN103491205B (en) 2016-08-17

Family

ID=49831158

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310462461.6A Active CN103491205B (en) 2013-09-30 2013-09-30 Related resource address push method and device based on video retrieval

Country Status (1)

Country Link
CN (1) CN103491205B (en)

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015043389A1 (en) * 2013-09-30 2015-04-02 北京奇虎科技有限公司 Participle information push method and device based on video search
CN105279172A (en) * 2014-06-30 2016-01-27 惠州市伟乐科技股份有限公司 Video matching method and device
CN105912600A (en) * 2016-04-05 2016-08-31 上海智臻智能网络科技股份有限公司 Question-answer knowledge base and establishing method thereof, intelligent question-answering method and system
CN110427381A (en) * 2019-08-07 2019-11-08 北京嘉和海森健康科技有限公司 Data processing method and related device
CN110674386A (en) * 2018-06-14 2020-01-10 北京百度网讯科技有限公司 Resource recommendation method, device and storage medium
US10581880B2 (en) 2016-09-19 2020-03-03 Group-Ib Tds Ltd. System and method for generating rules for attack detection feedback system
CN111400546A (en) * 2020-03-18 2020-07-10 腾讯科技(深圳)有限公司 Video recall method and video recommendation method and device
US10721271B2 (en) 2016-12-29 2020-07-21 Trust Ltd. System and method for detecting phishing web pages
US10721251B2 (en) 2016-08-03 2020-07-21 Group Ib, Ltd Method and system for detecting remote access during activity on the pages of a web resource
US10762352B2 (en) 2018-01-17 2020-09-01 Group Ib, Ltd Method and system for the automatic identification of fuzzy copies of video content
US10778719B2 (en) 2016-12-29 2020-09-15 Trust Ltd. System and method for gathering information to detect phishing activity
US10958684B2 (en) 2018-01-17 2021-03-23 Group Ib, Ltd Method and computer device for identifying malicious web resources
US11005779B2 (en) 2018-02-13 2021-05-11 Trust Ltd. Method of and server for detecting associated web resources
US11122061B2 (en) 2018-01-17 2021-09-14 Group IB TDS, Ltd Method and server for determining malicious files in network traffic
US11151581B2 (en) 2020-03-04 2021-10-19 Group-Ib Global Private Limited System and method for brand protection based on search results
US11153351B2 (en) 2018-12-17 2021-10-19 Trust Ltd. Method and computing device for identifying suspicious users in message exchange systems
US11250129B2 (en) 2019-12-05 2022-02-15 Group IB TDS, Ltd Method and system for determining affiliation of software to software families
US11356470B2 (en) 2019-12-19 2022-06-07 Group IB TDS, Ltd Method and system for determining network vulnerabilities
US11431749B2 (en) 2018-12-28 2022-08-30 Trust Ltd. Method and computing device for generating indication of malicious web resources
US11451580B2 (en) 2018-01-17 2022-09-20 Trust Ltd. Method and system of decentralized malware identification
US11503044B2 (en) 2018-01-17 2022-11-15 Group IB TDS, Ltd Method computing device for detecting malicious domain names in network traffic
US11526608B2 (en) 2019-12-05 2022-12-13 Group IB TDS, Ltd Method and system for determining affiliation of software to software families
US11755700B2 (en) 2017-11-21 2023-09-12 Group Ib, Ltd Method for classifying user action sequence
US11847223B2 (en) 2020-08-06 2023-12-19 Group IB TDS, Ltd Method and system for generating a list of indicators of compromise
US11934498B2 (en) 2019-02-27 2024-03-19 Group Ib, Ltd Method and system of user identification
US11947572B2 (en) 2021-03-29 2024-04-02 Group IB TDS, Ltd Method and system for clustering executable files

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040064447A1 (en) * 2002-09-27 2004-04-01 Simske Steven J. System and method for management of synonymic searching
CN101236567A (en) * 2008-02-04 2008-08-06 上海升岳电子科技有限公司 Method and terminal apparatus for accomplishing on-line network multimedia application
CN101599995A (en) * 2009-07-13 2009-12-09 中国传媒大学 The directory distribution method and the network architecture towards high-concurrency retrieval system
WO2010068931A1 (en) * 2008-12-12 2010-06-17 Atigeo Llc Providing recommendations using information determined for domains of interest
CN101957828A (en) * 2009-07-20 2011-01-26 阿里巴巴集团控股有限公司 Method and device for sequencing search results

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015043389A1 (en) * 2013-09-30 2015-04-02 北京奇虎科技有限公司 Participle information push method and device based on video search
CN105279172A (en) * 2014-06-30 2016-01-27 惠州市伟乐科技股份有限公司 Video matching method and device
CN105279172B (en) * 2014-06-30 2019-07-09 惠州市伟乐科技股份有限公司 Video matching method and device
CN105912600A (en) * 2016-04-05 2016-08-31 上海智臻智能网络科技股份有限公司 Question-answer knowledge base and establishing method thereof, intelligent question-answering method and system
US10721251B2 (en) 2016-08-03 2020-07-21 Group Ib, Ltd Method and system for detecting remote access during activity on the pages of a web resource
US10581880B2 (en) 2016-09-19 2020-03-03 Group-Ib Tds Ltd. System and method for generating rules for attack detection feedback system
US10721271B2 (en) 2016-12-29 2020-07-21 Trust Ltd. System and method for detecting phishing web pages
US10778719B2 (en) 2016-12-29 2020-09-15 Trust Ltd. System and method for gathering information to detect phishing activity
US11755700B2 (en) 2017-11-21 2023-09-12 Group Ib, Ltd Method for classifying user action sequence
US11503044B2 (en) 2018-01-17 2022-11-15 Group IB TDS, Ltd Method computing device for detecting malicious domain names in network traffic
US11475670B2 (en) 2018-01-17 2022-10-18 Group Ib, Ltd Method of creating a template of original video content
US11122061B2 (en) 2018-01-17 2021-09-14 Group IB TDS, Ltd Method and server for determining malicious files in network traffic
US10762352B2 (en) 2018-01-17 2020-09-01 Group Ib, Ltd Method and system for the automatic identification of fuzzy copies of video content
US11451580B2 (en) 2018-01-17 2022-09-20 Trust Ltd. Method and system of decentralized malware identification
US10958684B2 (en) 2018-01-17 2021-03-23 Group Ib, Ltd Method and computer device for identifying malicious web resources
US11005779B2 (en) 2018-02-13 2021-05-11 Trust Ltd. Method of and server for detecting associated web resources
CN110674386A (en) * 2018-06-14 2020-01-10 北京百度网讯科技有限公司 Resource recommendation method, device and storage medium
CN110674386B (en) * 2018-06-14 2022-11-01 北京百度网讯科技有限公司 Resource recommendation method, device and storage medium
US11153351B2 (en) 2018-12-17 2021-10-19 Trust Ltd. Method and computing device for identifying suspicious users in message exchange systems
US11431749B2 (en) 2018-12-28 2022-08-30 Trust Ltd. Method and computing device for generating indication of malicious web resources
US11934498B2 (en) 2019-02-27 2024-03-19 Group Ib, Ltd Method and system of user identification
CN110427381A (en) * 2019-08-07 2019-11-08 北京嘉和海森健康科技有限公司 Data processing method and related device
US11250129B2 (en) 2019-12-05 2022-02-15 Group IB TDS, Ltd Method and system for determining affiliation of software to software families
US11526608B2 (en) 2019-12-05 2022-12-13 Group IB TDS, Ltd Method and system for determining affiliation of software to software families
US11356470B2 (en) 2019-12-19 2022-06-07 Group IB TDS, Ltd Method and system for determining network vulnerabilities
US11151581B2 (en) 2020-03-04 2021-10-19 Group-Ib Global Private Limited System and method for brand protection based on search results
CN111400546A (en) * 2020-03-18 2020-07-10 腾讯科技(深圳)有限公司 Video recall method and video recommendation method and device
CN111400546B (en) * 2020-03-18 2020-12-01 腾讯科技(深圳)有限公司 Video recall method and video recommendation method and device
US11847223B2 (en) 2020-08-06 2023-12-19 Group IB TDS, Ltd Method and system for generating a list of indicators of compromise
US11947572B2 (en) 2021-03-29 2024-04-02 Group IB TDS, Ltd Method and system for clustering executable files

Also Published As

Publication number Publication date
CN103491205B (en) 2016-08-17

Similar Documents

Publication Publication Date Title
CN103491205A (en) Related resource address push method and device based on video retrieval
CN106649818B (en) Application search intention identification method and device, application search method and server
CN105786977B (en) Mobile search method and device based on artificial intelligence
CN103488787A (en) Method and device for pushing online playing entry objects based on video retrieval
CN102349072B (en) Identifying query aspects
CN1924858B (en) Method and device for fetching new words and input method system
CN112507715A (en) Method, device, equipment and storage medium for determining incidence relation between entities
CN103544267A (en) Search method and device based on search recommended words
CN110162768B (en) Method and device for acquiring entity relationship, computer readable medium and electronic equipment
CN103544266A (en) Method and device for generating search suggestion words
CN103049435A (en) Text fine granularity sentiment analysis method and text fine granularity sentiment analysis device
CN103870000A (en) Method and device for sorting candidate items generated by input method
CN103870001A (en) Input method candidate item generating method and electronic device
CN103631889B (en) Image recognizing method and device
CN109325146A (en) Video recommendation method, device, storage medium and server
CN107180087B (en) Search method and device
CN113704507B (en) Data processing method, computer device and readable storage medium
CN110795565A (en) Semantic recognition-based alias mining method, device, medium and electronic equipment
CN103942264A (en) Method and device for pushing webpages containing news information
CN110209781B (en) Text processing method and device and related equipment
CN104503988A (en) Searching method and device
CN104881428A (en) Information graph extracting and retrieving method and device for information graph webpages
CN104391969A (en) User query statement syntactic structure determining method and device
CN111813993A (en) Video content expanding method and device, terminal equipment and storage medium
CN103955480A (en) Method and equipment for determining target object information corresponding to user

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220707

Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi Software (Beijing) Co.,Ltd.