CN102446225A - Real-time search method, device and system - Google Patents

Real-time search method, device and system

Info

Publication number
CN102446225A
Authority
CN
China
Prior art keywords
data
real
search
life
targeted website
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012100068607A
Other languages
Chinese (zh)
Inventor
刘晓刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHENZHEN AIGU TECHNOLOGY CO LTD
Original Assignee
SHENZHEN AIGU TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHENZHEN AIGU TECHNOLOGY CO LTD filed Critical SHENZHEN AIGU TECHNOLOGY CO LTD
Priority to CN2012100068607A priority Critical patent/CN102446225A/en
Publication of CN102446225A publication Critical patent/CN102446225A/en
Pending legal-status Critical Current

Abstract

The invention discloses a real-time search method. The method comprises the following steps: S1, setting interest point data specified by a system; S2, capturing associated data from a target website into the system according to the interest point data; S3, traversing the target website according to a preset data collection cycle; S4, judging whether the target website has been updated, where updates include newly appearing web pages and changed web pages; if the target website has not been updated, returning to step S2; otherwise proceeding to step S5; and S5, capturing the associated data on the updated target website into the system and updating it, so as to achieve synchronous collection. The invention also discloses a real-time search device and a real-time search system. With the disclosed method, device and system, up-to-the-minute information can be searched in real time at high speed and with low resource usage.

Description

Real-time search method, device and system
Technical field
The present invention relates to the field of web search, and in particular to a real-time search method, device and system.
Background technology
From the world as a whole down to every enterprise and business, and even every family and individual, information is the factor most closely tied to people's work and life. Although search engine technology has grown more advanced in recent years, searching for information on the Internet still suffers from one major problem: it is hit or miss. Anyone who has used a search engine knows the feeling: sometimes the result you want cannot be found, and sometimes the search returns millions of unwanted results. The second outcome is in fact the more troubling and the harder to handle. Finding the information you really need among those millions of search results is like looking for a needle in a haystack.
Suppose the Internet is a huge, all-embracing library. When the library was first built, it held few books and they were shelved in no particular order; users had to leaf through them one by one to find information. This was the primary stage of the Internet. After a while, an administrator began to sort this information into categories and provided a catalogue for us to consult. This administrator is the portal website, and this was the golden age of portals represented by Yahoo. Later, a cleverer administrator appeared who organized a group of people to read through the books in the library one by one and compile their contents into a huge index offered as a public service. A reader only has to tell the administrator what kind of book is needed, and the administrator names all the books that contain the desired content and says exactly where each one is shelved, so the reader can go and fetch it. This clever administrator is the search engine represented by Google. But new books enter the library and outdated books leave it all the time, and a conventional administrator has no way of learning about these changes promptly. What we need now is a more advanced administrator who not only tells me which books meet my request, but also remembers the request and, whenever a new book that matches my needs enters the library, notifies me at once so that I can come and read it in time. This is real-time search.
The purpose of real-time search is to obtain newly appearing information on the Internet as soon as possible and to notify the user, so that the user can view the needed information in time. Real-time search has huge value for time-sensitive Internet applications. At present the most typical application is microblog search.
Microblogs have been popular for more than three years, yet search engines aimed at microblogs were slow to appear; only within the last year have the major mainstream search engines released real-time search for microblogs one after another.
Google's real-time search for Twitter is still immature and has not been heavily promoted. Youdao does real-time search relatively well in China, but it searches only NetEase's own microblog; Tencent's real-time search likewise covers only its own microblog and has not been formally promoted. The domestic service currently strongest in real-time microblog search is Pangu Search, released in February of this year (2011); it covers mainstream microblogs such as Tencent, Sina, Sohu and NetEase, and its coverage continues to expand.
Because the threshold is high, some small search engines have not yet fully overcome the technical barriers. Real-time microblog search is therefore still at an early stage of steady progress, and no company yet has enough strength to form a monopoly.
On the other hand, real-time search is also of great value in the field of life information. Domestic life-information websites are currently developing rapidly, and some life information is highly time-sensitive: rental and sale listings for scarce resources, or discount promotions, are quickly snapped up by others. Because the threshold is high, no domestic entrant offering real-time life-information search has yet been found. Traditional large general-purpose search engines mostly build an incremental index over newly collected data at regular intervals and periodically merge the incremental index into the full index library, thereby updating the full index library on a schedule. This approach has the following deficiencies:
1. Because the incremental index is built at regular intervals, data cannot be updated in real time. Newly added data can only be buffered until the next index update cycle arrives; only then is it built into the index and made searchable. Under this mechanism, an optimized incremental index can achieve only near-real-time, minute-level latency (2-5 minutes).
2. The mechanism for merging the incremental index with the full index library is complicated and hard to control. With a single incremental index and a single full index, the full index grows large over long-term operation, so the merge becomes very slow and retrieval performance also suffers. With multi-level incremental indexes and multi-level full index libraries, the updates and deletions of existing data contained in an incremental index are spread across several full index libraries; merging then needs an extra management mechanism, which significantly increases system complexity and easily leads to data inconsistency.
3. A traditional index is usually built for one specific application, and each index and its required resources (such as the word segmenter and the similarity calculator) are independent, so required resources cannot be shared among multiple indexes. For example, the dictionary of a word segmenter occupies a large amount of memory; if several indexes are deployed on the same server, each must load its own copy of the dictionary, wasting a great deal of memory.
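The first deficiency can be illustrated with a minimal sketch. It is not taken from the patent; the class name `PeriodicIncrementalIndex`, the whitespace tokenizer and the 180-second cycle are assumptions chosen only to show why a document added between two scheduled builds stays invisible to search until the next build.

```python
class PeriodicIncrementalIndex:
    """Toy inverted index: new documents are buffered and become
    searchable only when the scheduled build/merge runs (hypothetical)."""

    def __init__(self, build_interval_s: float):
        self.build_interval_s = build_interval_s
        self.full_index = {}    # term -> set of doc ids (searchable)
        self.buffer = []        # (doc_id, text) waiting for the next build
        self.last_build = 0.0

    def add(self, doc_id: str, text: str) -> None:
        # Newly collected data is only buffered, not indexed.
        self.buffer.append((doc_id, text))

    def tick(self, now: float) -> bool:
        """Build the increment and merge it if a full cycle has elapsed."""
        if now - self.last_build < self.build_interval_s:
            return False
        for doc_id, text in self.buffer:
            for term in text.lower().split():
                self.full_index.setdefault(term, set()).add(doc_id)
        self.buffer.clear()
        self.last_build = now
        return True

    def search(self, term: str) -> set:
        return set(self.full_index.get(term.lower(), set()))


idx = PeriodicIncrementalIndex(build_interval_s=180)
idx.add("d1", "flat for rent")
early = idx.search("rent")      # queried before the cycle elapses
idx.tick(now=60)                # too early, nothing is built
still_early = idx.search("rent")
idx.tick(now=180)               # cycle elapsed, buffer is merged
late = idx.search("rent")
```

In this sketch the delay between `add` and visibility is bounded by the build interval, which is the minute-level latency the text describes.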
Summary of the invention
In order to solve the above technical problems, the present invention provides a real-time search method, device and system.
The invention discloses a real-time search method, comprising:
S1. Setting the interest point data specified by the system;
S2. Capturing associated data from a target website into the system according to the interest point data;
S3. Traversing the target website according to a preset data collection cycle;
S4. Judging whether the target website has been updated, where an update comprises newly appearing web pages and changed web pages; if not, returning to step S2; if so, proceeding to step S5;
S5. Capturing the associated data on the updated target website into the system and updating it, so as to achieve synchronous collection, and displaying the search information by category.
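The control flow of steps S1 to S5 can be sketched as a simple polling loop. This is only an illustration under stated assumptions: `realtime_collect`, the `traverse` callback (which stands in for crawling the target website and returning a mapping from URL to content) and the in-memory dictionary used as "the system" are hypothetical names, not part of the patent.

```python
import time
from typing import Callable, Dict, Set

Snapshot = Dict[str, str]  # url -> page content


def realtime_collect(interest_points: Set[str],
                     traverse: Callable[[Set[str]], Snapshot],
                     cycles: int,
                     period_s: float = 0.0) -> Snapshot:
    """Poll a target site and keep a local copy in sync (sketch only)."""
    # S1 is the caller's choice of interest_points.
    system: Snapshot = dict(traverse(interest_points))   # S2: initial capture
    for _ in range(cycles):
        time.sleep(period_s)                             # S3: wait one cycle,
        current = traverse(interest_points)              #     then traverse
        updated = {url: content for url, content in current.items()
                   if system.get(url) != content}        # S4: new or changed
        if not updated:
            continue                                     # no update: next cycle
        system.update(updated)                           # S5: capture and update
    return system


# Usage with a fake site that changes on the second cycle.
snapshots = iter([{"a": "1"}, {"a": "1"}, {"a": "2", "b": "x"}])
result = realtime_collect({"rent"}, lambda points: next(snapshots), cycles=2)
```

The sketch leaves out the category display of S5 and treats any difference in content as a change.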
In the real-time search method of the present invention, the following steps are further included between step S1 and step S2:
S11. Through sample analysis of massive data, extracting a structural model library and automatically generating an extraction template;
S12. Preprocessing the interest point data, computing its similarity to the structural model library, and determining the structure of the associated data.
In the real-time search method of the present invention, the search information displayed by category comprises: Life Taobao, Life Classification, Life Merchant Street, Life Circle and Life Applications.
The second-level directory under Life Taobao comprises: housing information, life services, dating and matchmaking, vehicle sales and services, pets and pet supplies, flea market, job-seeker resumes, recruitment information, and business services;
The second-level directory of Life Classification comprises, for different cities: housing information, flea market, vehicle sales and services, tickets and coupons, and education and training;
The second-level directory of Life Merchant Street comprises, for different cities: food, shopping, beauty, leisure, hotels, fitness, and travel;
The second-level directory of Life Circle comprises: web pages, pictures, and videos;
The second-level directory of Life Applications comprises: entertainment, recreation, and tools;
Each second-level directory has a third-level directory beneath it, and the third-level directory consists of the specific programs.
In the real-time search method of the present invention, the extraction template comprises: identifier, page type, content type, title, keywords, abstract, body text, and related links.
The invention discloses a real-time search device for implementing the above method, comprising:
An interest point data setting unit, used for setting the interest point data specified by the system;
An associated data capture unit, used for capturing associated data from a target website into the system according to the interest point data;
A target website traversal unit, connected to the associated data capture unit and used for traversing the target website according to a preset data collection cycle;
A target website update judging unit, connected to the target website traversal unit and used for judging whether the target website has been updated, where an update comprises newly appearing web pages and changed web pages;
A target website update capture unit, connected to the target website update judging unit and used for capturing the associated data on the updated target website into the system and updating it, so as to achieve synchronous collection, and for displaying the search information by category.
The real-time search device of the present invention further comprises an extraction template generation unit and an associated data structure determination unit.
The extraction template generation unit is connected to the interest point data setting unit and is used for extracting a structural model library through sample analysis of massive data and automatically generating an extraction template;
The associated data structure determination unit is connected to the extraction template generation unit and to the associated data capture unit, and is used for preprocessing the interest point data, computing its similarity to the structural model library, and determining the structure of the associated data.
The invention discloses a real-time search system, comprising: a searcher that searches websites; a controller connected to the searcher and used to control it; a raw database connected to the controller; an indexer connected to the raw database; an index database connected to the indexer; and a searcher connected to the index database, which is in turn connected to a man-machine interaction unit; wherein the searcher comprises the real-time search device described in claim 5.
The real-time search system of the present invention further comprises a user behavior database and a log analyzer. The user behavior database is connected to the man-machine interaction unit, and the log analyzer is connected to the raw database, the index database and the user behavior database respectively, and is used for handling user queries whose search content is uncertain.
In the real-time search system of the present invention, the man-machine interaction unit comprises a keypad/display/touch screen.
In the real-time search system of the present invention, the system comprises at least one index server, each index server comprises at least one shard server, and the index server performs the search of associated data through its shard servers.
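The relationship between an index server and its shard servers can be sketched as a scatter-gather query. The patent does not specify how shards are queried or how results are ranked; the `Shard` and `IndexServer` classes, the term-count score and the thread pool below are assumptions used only to make the idea concrete.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


class Shard:
    """Hypothetical shard server holding a slice of the documents."""

    def __init__(self, docs):
        self.docs = docs  # doc_id -> text

    def search(self, term, k):
        # Score = number of occurrences of the term (toy ranking).
        hits = [(text.lower().split().count(term), doc_id)
                for doc_id, text in self.docs.items()]
        return heapq.nlargest(k, [(s, d) for s, d in hits if s > 0])


class IndexServer:
    """Fans a query out to every shard and merges the partial results."""

    def __init__(self, shards):
        self.shards = shards  # at least one shard is required

    def search(self, term, k=10):
        term = term.lower()
        with ThreadPoolExecutor(max_workers=len(self.shards)) as pool:
            partials = pool.map(lambda shard: shard.search(term, k), self.shards)
            merged = [hit for part in partials for hit in part]
        return [doc_id for _, doc_id in heapq.nlargest(k, merged)]


server = IndexServer([
    Shard({"a": "rent flat rent"}),
    Shard({"b": "rent car", "c": "buy car"}),
])
top = server.search("rent")
```

Each shard returns at most `k` candidates, so the merge step on the index server stays small regardless of how many documents each shard holds.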
Implementing the real-time search method, device and system of the present invention has the following beneficial technical effects:
1. For the search strategy, an original optimized path algorithm based on product classification is adopted, which collects only the interest point data specified by the system. The greatest benefit of this algorithm is that it does not need to traverse the large number of paths that hold worthless data: by comparison with our preset category-associated path tree, the paths of the target website are automatically sorted into those worth traversing, which significantly reduces the crawling of junk data and greatly increases the speed of data collection.
2. In the preprocessing stage, an original automatic structured data extraction intelligent template (DocView) technology is adopted: through sample analysis of massive data, a structural model library is extracted and extraction templates are generated automatically. When data is preprocessed, its similarity to the structural model library is computed to determine the structure of the data. Templates can also be adjusted automatically for changed web pages according to historical data.
3. Near-real-time data crawling: through distributed crawler technology, newly appearing online data is collected in near real time.
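The path filtering described in the first effect can be sketched as follows. The patent does not disclose the algorithm itself, so this is only one plausible reading: the preset category path tree is approximated by a flat table of URL path prefixes, and `CATEGORY_PATHS`, `classify_path` and `filter_frontier` are hypothetical names.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for the preset category-associated path tree:
# URL path prefix -> category of interest.
CATEGORY_PATHS = {
    "/house/": "housing information",
    "/car/": "vehicle sales and services",
    "/job/": "recruitment information",
}


def classify_path(url):
    """Return the category whose prefix matches the URL path, or None."""
    path = urlparse(url).path
    for prefix, category in CATEGORY_PATHS.items():
        if path.startswith(prefix):
            return category
    return None


def filter_frontier(urls):
    """Keep only URLs worth traversing, paired with their category."""
    kept = []
    for url in urls:
        category = classify_path(url)
        if category is not None:
            kept.append((url, category))
    return kept


frontier = filter_frontier([
    "http://example.com/house/123.html",
    "http://example.com/about",
    "http://example.com/job/list?page=2",
])
```

URLs that match no preset path are dropped before they are fetched, which is where the saving in crawl volume would come from.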
Description of drawings
Fig. 1 is a flowchart of a real-time search method according to an embodiment of the invention;
Fig. 2 is a block diagram of a real-time search device according to an embodiment of the invention;
Fig. 3 is a structural diagram of a real-time search system according to an embodiment of the invention;
Fig. 4 is a functional framework diagram of a real-time search system according to an embodiment of the invention.
Embodiment
To explain the technical content, structural features, objects and effects of the present invention in detail, the following description is given with reference to the embodiments and the accompanying drawings.
The geometric growth of the Internet and the lack of standardization of the World Wide Web make networked information retrieval clearly different from conventional information retrieval: Internet information retrieval deals with massive data, and the information it provides is all-embracing in content and highly varied in form. To present structured, intuitive data to users, the collected web pages must go through a series of processing steps such as filtering, denoising, purification, and structured extraction of subject information.
Mainstream search engines are currently weak at structured data extraction. General-purpose search engines such as Baidu and Google only apply tag processing to the collected data, and their general-purpose nature means they cannot meet the precise information needs of particular fields and particular user groups. The diversification of market demand means that search engine services will segment, offering more precise services to different industries. It can be said that the development of general-purpose search engines has opened a good market space for vertical search engines; vertical search engines taking a share of the Internet market is a likely trend and an inevitable result of segmentation in the search engine industry. Their aim is to aggregate information better and to extract structured data from massive information, so as to provide a better user experience.
Through continued research and development of the Internet, the principles of taxonomy have given rise to a new carrier for spreading information online: the classified information network. Classified information is also called classified advertising. It is the kind of information people look up in newspapers and periodicals, such as recruitment notices, rental listings, travel information and discount promotions. In the information society, classified information is gradually winning wide favor.
The rise of online classified information has solved many inconveniences. A classified information network not only carries a large amount of information that is more timely and cannot be lost; more importantly, by using a search engine it makes lookup more convenient and faster. Classified information networks help people solve problems met in every aspect of life and work, such as clothing, food, housing, travel, entertainment, relationships, education, careers and business, and they have injected fresh vitality into people's work and life.
The search system described in the technical solution of the present invention is named the Aigu search system. Its life search aggregates all local life information and experiences on the Internet and helps every Chinese person easily live a freer life of higher quality. Aigu life search's innovative "life search + social" model provides convenient and effective life search services for 300 million Internet users and 700 million mobile phone users.
Referring to Fig. 1, a real-time search method comprises:
S1. Setting the interest point data specified by the system;
S11. Through sample analysis of massive data, extracting a structural model library and automatically generating an extraction template;
The extraction template (DocView model) comprises eight elements: identifier, page type, content type, title, keywords, abstract, body text, and related links. Body text and related links belong to the content data of the web page; the other six are metadata of the page. Each element of the model is described in detail below.
The identifier uniquely identifies a web page on the Web; the DocView model uses the page's URL as the identifier.
Page type is assigned according to how the page content is presented. Pages are divided here into three classes: topic pages (topic), hub pages (hub), and picture pages (pic). A topic page describes one or more things in text and has a definite topic; a specific news page is a typical topic page. A hub page exists specifically to guide readers to other pages and is therefore an aggregation of hyperlinks; the home page of a portal website is a typical hub page. A picture page expresses its content through pictures, with only a small amount of text explaining them; a staff introduction page of an organization that consists of photos is a typical picture page.
Pages are divided into these three classes because they differ considerably in purpose and in processing method. A hub page differs from the other two in the role it plays on the Web: it usually does not describe anything concretely but provides a set of links to related information. A picture page differs in how it must be processed: its content is expressed through pictures rather than text, so the methods of conventional information processing are not effective enough for it. These differences lead many application fields to treat the three classes differently.
Content type classifies the content of a page semantically. It is a direct way for a computer to obtain the semantic information of a page and is widely used in Web research. It is obtained by classifying the page content with a specific classifier and depends on a particular classification scheme.
Title, keywords and abstract are important metadata that summarize the content of a Web document and play an important role in fields such as Web information retrieval.
Body text is the part of the original page that actually describes the topic; in some specific applications it is therefore more reasonable to use the body text in place of the original page.
Related links are the links in the page that point to pages related to the body content, as opposed to noise links such as advertisements. Recombining the body text with the related hyperlinks yields the purified page.
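The eight elements of the DocView model map naturally onto a small record type. The sketch below is an assumption about representation only: the field names are English renderings of the elements listed above, and the methods `metadata` and `purified_page` are hypothetical helpers that mirror the split into six metadata elements and two content elements.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DocView:
    identifier: str                 # the page URL
    page_type: str                  # "topic", "hub" or "pic"
    content_type: str               # semantic class from some classifier
    title: str
    keywords: List[str]
    abstract: str
    body_text: str                  # content data
    related_links: List[str] = field(default_factory=list)  # content data

    def metadata(self) -> Dict[str, object]:
        """The six elements that describe the page rather than hold its content."""
        return {
            "identifier": self.identifier,
            "page_type": self.page_type,
            "content_type": self.content_type,
            "title": self.title,
            "keywords": self.keywords,
            "abstract": self.abstract,
        }

    def purified_page(self) -> str:
        """Body text recombined with related links, without noise links."""
        return "\n".join([self.body_text, *self.related_links])


doc = DocView(
    identifier="http://example.com/house/123.html",
    page_type="topic",
    content_type="housing information",
    title="Two-bedroom flat for rent",
    keywords=["rent", "flat"],
    abstract="Two-bedroom flat near the station.",
    body_text="A two-bedroom flat is available from next month.",
    related_links=["http://example.com/house/124.html"],
)
```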
S12. Preprocessing the interest point data, computing its similarity to the structural model library, and determining the structure of the associated data.
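Step S12 does not say how similarity to the structural model library is computed. As one possible illustration, and purely as an assumption, a page's structure can be summarized as the set of its HTML tag paths and compared with each stored model using Jaccard similarity. The names `tag_paths`, `jaccard` and `best_model` are hypothetical; only the standard-library `html.parser` module is used.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class _PathCollector(HTMLParser):
    """Collects the set of tag paths such as 'html/body/ul/li'."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.add("/".join(self.stack))
        if tag in VOID_TAGS:          # void elements never get an end tag
            self.stack.pop()

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed tags).
            while self.stack and self.stack.pop() != tag:
                pass


def tag_paths(html):
    collector = _PathCollector()
    collector.feed(html)
    return collector.paths


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_model(html, library):
    """Return (model name, similarity) of the closest structural model."""
    paths = tag_paths(html)
    name = max(library, key=lambda n: jaccard(paths, library[n]))
    return name, jaccard(paths, library[name])


library = {
    "listing": tag_paths("<html><body><ul><li><a>x</a></li></ul></body></html>"),
    "article": tag_paths("<html><body><h1>t</h1><p>b</p></body></html>"),
}
match = best_model("<html><body><h1>a</h1><p>b</p><p>c</p></body></html>", library)
```

A real system would likely weight paths and set a minimum similarity below which no model is accepted; both are omitted here.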
S2. Capturing associated data from a target website into the system according to the interest point data;
S3. Traversing the target website according to a preset data collection cycle;
The Aigu search system uses a near-real-time search scheme. The system first collects periodically, with each collection replacing the content of the previous one; we call this "batch collection". Because every round starts over from scratch, a single collection for a large-scale search engine usually takes several weeks. And because the cost is high, the interval between two collections is usually not short either (for example, Google for a time collected once every 28 days). The advantage is that the system is simple to implement; the main drawbacks are low freshness and the extra bandwidth consumed by repeated collection. The system's current periodic collection cycle is 15 days.
S4. Judging whether the target website has been updated, where an update comprises newly appearing web pages and changed web pages;
To solve the problem of low timeliness, the Aigu search system uses an incremental collection scheme. It collects an initial batch at the start, and afterwards it:
(1) collects newly appearing web pages,
(2) collects web pages that have changed since the last collection,
(3) finds web pages that no longer exist since the last collection and deletes them from the library.
Except for news websites, the content of many web pages does not change very often (studies indicate that the average life cycle of 50% of web pages is about 50 days [Cho and Garcia-Molina, 2000]; [Cho, 2002]), so the volume of pages in each collection is not large, while newly appearing online data can still be collected in near real time. According to current system data, our distributed crawler can basically collect new data synchronously within 1 hour, and for needs with higher real-time requirements it can achieve minute-level collection.
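The three cases of incremental collection listed above can be sketched with content digests. The patent does not state how changes are detected; comparing SHA-256 digests of page content is an assumption, and `digest`, `diff_collection` and `apply_increment` are hypothetical names.

```python
import hashlib


def digest(content: str) -> str:
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def diff_collection(stored, current):
    """stored: url -> digest from the last collection;
    current: url -> content seen in this traversal."""
    new = {u for u in current if u not in stored}                  # case (1)
    changed = {u for u in current
               if u in stored and stored[u] != digest(current[u])}  # case (2)
    deleted = {u for u in stored if u not in current}              # case (3)
    return new, changed, deleted


def apply_increment(stored, current):
    """Update the stored digests in place and report what happened."""
    new, changed, deleted = diff_collection(stored, current)
    for url in new | changed:
        stored[url] = digest(current[url])
    for url in deleted:
        del stored[url]
    return new, changed, deleted


library = {"a": digest("old"), "b": digest("same"), "c": digest("gone")}
report = apply_increment(library, {"a": "new text", "b": "same", "d": "fresh"})
```

Only the pages in the first two sets need to be fetched into the raw database again, which is why each incremental round stays small when most pages are unchanged.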
S5. Capturing the associated data on the updated target website into the system and updating it, so as to achieve synchronous collection, and displaying the search information by category.
The search information displayed by category comprises: Life Taobao, Life Classification, Life Merchant Street, Life Circle and Life Applications.
In this method, the Aigu search engine uses an independently developed intelligent template: the extraction template (DocView) model, which denoises the collected data and extracts structured information from it.
Features of the DocView model: it can extract from any normal web page fully automatically; no template needs to be generated in advance for a specific website, and extraction rules are generated automatically for each page with no manual intervention at all. Intelligent extraction is highly accurate: it is not mechanical matching but uses intelligent analysis, and accuracy can reach more than 98%. Processing is fast: because intelligent page analysis removes junk blocks first, the analysis load is reduced and processing speed improves greatly. It is general and easy to maintain: setting parameters and configuring the corresponding features is enough to improve extraction performance, and a layperson can maintain it after simple training.
On entering this search engine, the system pops up search instructions informing the user that they can search for housing information, life services, dating and matchmaking, vehicle sales and services, pets and pet supplies, flea market, job-seeker resumes, recruitment information, popular industries, education and training, and so on.
This search engine comprises five major classes of life services so that users can share all kinds of life services: Life Taobao, Life Classification, Life Merchant Street, Life Circle and Life Applications. Its search services aim to meet users' basic needs for content information, for example searches for housing, shopping, travel, news, web pages, software, pictures, music, videos and maps. The basic search services are built into a unified, standard and open service product cluster at the basic-needs level, which can be freely extended horizontally, implemented in stages and in an orderly way, and which supports the capability to provide value-added services. Specifically:
1. Clicking the Life Taobao button enters search over all kinds of life trading information. The Life Taobao functional module comprises vertical life search, content exchange, off-site sharing, and so on. For example, it can display hotel offers such as "Dameisha sea-view room, 200 yuan/day, on sale now", and users can post information on the site, such as a request for a private tutor. This search engine supports life services in different cities, supports different time options for capturing search information, and supports microblog functions so that users can follow other users they are interested in.
2. Clicking the Life Classification button enters search over all kinds of classified life information. The Life Classification module comprises localized classified posting, information display, on-site localized search, and so on. For example, it can display menus such as "Shenzhen housing information", "Shenzhen flea market" and "Shenzhen education and training". This search engine supports classified life services in different cities and supports posting and deleting information.
3. Clicking the Life Merchant Street button enters search over all kinds of life shop information. The Life Merchant Street functional module comprises local life shop search, shop addition and posting, shop reviews, and so on. This search engine supports Life Merchant Street services in different cities and supports posting and deleting information.
4. Clicking the Life Circle button enters search over all kinds of Life Circle information. The Life Circle functional module comprises collecting, discussing and sharing life information, website collection, picture collection, and so on. This search engine supports Life Circle services in different cities and supports posting and deleting information.
5. Clicking the Life Applications button enters search over all kinds of life application information. The Life Applications functional module comprises life application sharing, life application addition, and so on. For example, users can run lottery queries, train timetable queries and similar services. Likewise, this search engine supports life services in different cities, supports different time options for capturing search information, and supports microblog functions so that users can follow other users they are interested in.
Referring to Fig. 2, a real-time search device 1 for implementing the above method comprises: an interest point data setting unit 10, an extraction template generation unit 15, an associated data structure determination unit 20, an associated data capture unit 25, a target website traversal unit 30, a target website update judgment unit 40, and a target website update capture unit 50.
Interest point data setting unit 10: sets the interest point data specified by the system;
Extraction template generation unit 15: connected to the interest point data setting unit 10, and used to extract a structural model library through sample analysis of mass data and to generate extraction templates automatically;
Associated data structure determination unit 20: connected to the extraction template generation unit 15 and the associated data capture unit 25, and used to preprocess the interest point data, calculate its similarity to the structural model library, and determine the structure of the associated data;
Associated data capture unit 25: used to capture associated data from the target websites into the system according to the interest point data;
Target website traversal unit 30: connected to the associated data capture unit 25, and used to traverse the target websites according to a preset data acquisition cycle;
Target website update judgment unit 40: connected to the target website traversal unit 30, and used to judge whether any of the target websites has been updated, where an update comprises newly appearing web pages and changed web pages;
Target website update capture unit 50: connected to the target website update judgment unit 40, and used to capture the associated data on the updated target websites into the system and update it, thereby achieving synchronous acquisition.
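The cooperation of units 30, 40, and 50 can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a page counts as changed when a SHA-256 fingerprint of its content differs from the fingerprint stored on the previous traversal; the description does not state how changes are detected, and the names `fingerprint`, `detect_updates`, and `synchronize` are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: str) -> str:
    """Return a stable digest of a page's content (assumed change signal)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def detect_updates(known: Dict[str, str], snapshot: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare one traversal against the stored state.

    known:    url -> fingerprint recorded on the previous traversal
    snapshot: url -> page content seen on the current traversal
    """
    new_pages, changed_pages = [], []
    for url, content in snapshot.items():
        if url not in known:
            new_pages.append(url)          # newly appearing web page
        elif known[url] != fingerprint(content):
            changed_pages.append(url)      # changed web page
    return {"new": sorted(new_pages), "changed": sorted(changed_pages)}


def synchronize(known: Dict[str, str], snapshot: Dict[str, str]) -> Dict[str, List[str]]:
    """Capture only the updated pages and bring the stored state up to date."""
    updates = detect_updates(known, snapshot)
    for url in updates["new"] + updates["changed"]:
        known[url] = fingerprint(snapshot[url])
    return updates
```

In this sketch one call to `synchronize` corresponds to one data acquisition cycle: when both lists come back empty, nothing is re-captured and the system simply waits for the next cycle.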
Referring to Fig. 3, a real-time search system comprises: a searcher 100 that searches websites; a controller 110 connected to the searcher 100 and used to control it; a raw database 120 connected to the controller 110; an indexer 130 connected to the raw database 120; an index database 140 connected to the indexer 130; and a retriever 150 connected to the index database 140, the retriever 150 being connected to a human-computer interaction unit 160. The retriever 150 comprises the above real-time search device 1, a user behavior log database 170, and a log analyzer 180. The user behavior log database 170 is connected to the human-computer interaction unit 160, and the log analyzer 180 is connected to the raw database 120, the index database 140, and the user behavior log database 170 respectively, and is used to handle user queries whose content is uncertain.
The human-computer interaction unit 160 comprises a keypad/display/touch screen. The real-time search system comprises at least one index server, the index server comprises at least one shard server, and the index server performs the search of associated data through the shard server.
The system is a distributed real-time indexing system built on Lucene. It contains one or more master nodes, which we call index servers (indexserver), and one or more data nodes, which we call shard servers (shardserver). The system supports creating multiple indexes, each of which we call an index. Within the system, each index can be divided into one or more index shards, which we call shards; a shard is delimited by the startkey and endkey of the data it contains. The shards of each index can be distributed across multiple shardservers, and all index and shard information is maintained by the indexserver. The indexserver is therefore the hub of the whole cluster: once it fails, the whole cluster becomes unavailable, so a mechanism to prevent a single point of failure must be introduced, and this mechanism is supported by ZooKeeper. In other words, the real-time behavior of the system is exactly the real-time behavior of its shards.
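The routing role of the indexserver, which maps a data key to the shard whose key range contains it, can be sketched as follows. This is a simplified illustration, not the system's actual code: it assumes string keys, non-overlapping ranges with an inclusive start key and an exclusive end key, and the class and field names (`Shard`, `IndexRouting`, `locate`) are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Shard:
    start_key: str  # inclusive lower bound (assumption)
    end_key: str    # exclusive upper bound (assumption)
    server: str     # shardserver that hosts this shard


class IndexRouting:
    """Shard table for one index, as an indexserver might keep it."""

    def __init__(self, shards: List[Shard]) -> None:
        # Sort once so that lookups can use binary search on the start keys.
        self._shards = sorted(shards, key=lambda s: s.start_key)
        self._starts = [s.start_key for s in self._shards]

    def locate(self, key: str) -> Optional[Shard]:
        """Return the shard whose [start_key, end_key) range contains key."""
        pos = bisect.bisect_right(self._starts, key) - 1
        if pos >= 0 and key < self._shards[pos].end_key:
            return self._shards[pos]
        return None  # no shard covers this key; a new shard would be needed
```

A `None` result corresponds to the case in which a request cannot find a matching shard and a new one has to be created.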
Indexbase stores the distribution information of all shards in the system. When a request for new data cannot find a corresponding shard, a new shard is created. Where should this shard be created? This is a load-balancing problem, and our goal is to keep the data volume of each node in the system as even as possible.
Indexbase also stores the information of all nodes, including the data volume on each node, which is the sum of the data volumes of all shards on that node. Each time a client requests a shard, this value is updated: adding data increases it, and deleting data decreases it.
Load balancing then becomes straightforward: each time a new shard is created, it is assigned to the node with the smallest data volume. After the client obtains the shard information, it creates the corresponding shard at the location that the shard information indicates.
In addition, for redundant data backup, a backup shard of each shard is likewise created according to the node data volumes.
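The placement rule described above can be sketched in a few lines of Python. The sketch assumes that the backup shard goes to the second least-loaded node so that it never shares a node with the primary; the text only says that the backup is also placed according to node data volume, so this choice, like the function names, is an assumption.

```python
from typing import Dict, Optional, Tuple


def place_shard(node_volumes: Dict[str, int]) -> Tuple[str, Optional[str]]:
    """Pick the node for a new shard and, if possible, a node for its backup.

    node_volumes maps a node name to the summed data volume of its shards.
    Ties are broken by node name so that the result is deterministic.
    """
    if not node_volumes:
        raise ValueError("no nodes available")
    ranked = sorted(node_volumes, key=lambda name: (node_volumes[name], name))
    primary = ranked[0]                              # node with the least data
    backup = ranked[1] if len(ranked) > 1 else None  # assumed: next least-loaded node
    return primary, backup


def record_change(node_volumes: Dict[str, int], node: str, delta: int) -> None:
    """Add to the node's volume when data is added, subtract when it is deleted."""
    node_volumes[node] = max(0, node_volumes[node] + delta)
```

Because the counters are adjusted on every request, each placement decision sees the current volume of each node rather than a stale snapshot.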
The Aigu search vertical search crawler is an application that collects and discovers information on the Web according to a certain strategy and, after processing and organizing that information, provides users with information query services. It consists mainly of three parts: a crawling system, an indexing system, and a search system.
Crawling system: the spider, which is responsible for capturing data from information sources. A spider normally works from templates constructed in advance; a spider without templates can only handle information with a relatively simple structure. The key technical points of the crawling system include crawl path analysis, incremental and full crawling, information structuring, integrity recognition, information uniqueness, multi-page information integration, automatic indexing, and so on.
Indexing system: builds data files, similar to a bibliography, from the captured information so that high-speed retrieval is possible. The key technical points of the indexing system include word segmentation, pre-scoring and post-scoring, incremental and full indexing, ranking, hot-word caching, parsing of standard retrieval statements, and so on.
Search system: the website that provides the search function.
The Aigu search engine has to deal not only with user queries whose content is uncertain but also with a massive number of web pages that change dynamically. These pages are not actively delivered to the system; the system has to go and capture them.
Even when the network is fairly unobstructed, downloading one web page from the Internet takes about 1 second. If the system were to fetch thousands of pages at query time, analyze them one by one on the spot, and match them against the user's query, it could not meet the response-time requirement of a search engine. Moreover, the efficiency of doing so would be low, because too many pages would be captured repeatedly. Faced with a large volume of user queries, the system cannot be expected to "search" the web once for every query. The aggregated search of some current engines (Goojje, Yulin Mufeng 116) does use instant search, but that is a kind of pseudo-search: they merely call the search interfaces of other search engines, which differs from the technical solution of the real-time search system of the present invention.
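The arithmetic behind this argument can be made explicit with a small Python sketch. The figure of about 1 second per page comes from the text; the page count of 2,000 and the 1-second response budget used in the usage note are illustrative assumptions, and the function names are hypothetical.

```python
def sequential_fetch_seconds(pages: int, seconds_per_page: float = 1.0) -> float:
    """Time needed to download the given number of pages one after another."""
    return pages * seconds_per_page


def fits_budget(pages: int, budget_seconds: float, seconds_per_page: float = 1.0) -> bool:
    """True if fetching at query time would stay within the response-time budget."""
    return sequential_fetch_seconds(pages, seconds_per_page) <= budget_seconds
```

For 2,000 pages, `sequential_fetch_seconds(2000)` gives 2,000 seconds, which is more than half an hour, so `fits_budget(2000, 1.0)` is false. This is why pages are captured ahead of time on a collection cycle rather than at query time.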
The characteristics of Aigu's life-classification vertical search are as follows. People post short, concise classified advertisements on the Internet covering the various services and products needed in daily life, and Internet users with a need can browse them for free. Common forms of classified information include house rentals, job recruitment, second-hand goods transfer, ticket and card trading, same-city dating, and so on.
Classified information websites attract a huge number of page views. Their advantages are:
Convenience: netizens obtain classified information online on their own initiative. If they are interested in a product or service, a click of the mouse brings up more detailed information, so that they can try products and services according to their own choice.
Accuracy: Aigu search classified information is a typical form of advertising with its own readership figures. A visitor traffic statistics system can accurately count the number of views of every classified listing. These quantified data help advertisers assess advertising effectiveness correctly and review their advertising placement strategy.
Massive scale: classified information depends on scale effects. The information capacity of online classified information is almost unlimited; in particular, online classified information uses hyperlinks and detailed hierarchical classification to build huge databases and provide the most detailed advertising information.
Timeliness: classified information is edited and published directly on the network and appears on the Internet immediately, so that target users can actively find it right away.
Referring to Fig. 4, the functional architecture diagram of the present invention, the search system is used as follows:
The user opens the site address and enters the home page. The top of the home page shows five functional modules: Life Taobao, Life Classification, Life Shops, Life Circle, and Life Application. When the mouse is placed on the search box, a dialog box pops up to suggest the search scope: "real estate information, life services, dating and matchmaking, vehicle trading and services, pets/pet supplies, flea market, job-seeker résumés, recruitment information, business services". The user may also enter a query that differs from these scopes and proceed into the system.
The user can also click any of the five functional modules on the home page to enter the corresponding secondary page. On its left, the secondary page has search modules corresponding to the five functional modules of the home page, namely Life Taobao, Life Classification, Life Shops, Life Circle, and Life Application. The middle of the page shows the search information that the system has captured, such as "Looking for a short-term rental, 188 yuan, located in Dameisha" or "Recruiting a web design engineer, monthly salary 2500 yuan, phone: ########". The right of the page shows registered users, who can be followed via microblog; it also contains some of the search subdirectories, such as "rentals, shared rentals, rental wanted, second-hand housing, daily rentals, office buildings, factory buildings", and so on.
In addition, users of the system can not only search for the information they need but, after registering, also post information freely, which strengthens communication and information exchange.
The second-level directories under Life Taobao comprise: real estate information, life services, dating and matchmaking, vehicle trading and services, pets/pet supplies, flea market, job-seeker résumés, recruitment information, and business services. Clicking any second-level directory opens the third-level directories under it. For real estate information, for example, a dialog box pops up with the options "rentals, shared rentals, rental wanted, second-hand housing, daily rentals, office buildings, factory buildings". The user can choose among them, or enter content different from all of these options and proceed to search.
The second-level directories of Life Classification comprise real estate information, flea market, vehicle trading and services, tickets and coupons, and education and training for different cities. Characteristics: it helps people find, trade, and post information in daily life, such as recruitment information, rental information, travel information, and second-hand goods.
The second-level directories of Life Merchant Street comprise: food, shopping, beauty, leisure, hotels, fitness, and tourism for different cities. This service helps users find the most convenient merchant street. For example, a user living in district A can search for where exactly the restaurants near district A are, what the nearby gyms offer, and so on. Characteristics: it helps people find life information on clothing, food, housing, travel, entertainment, and business, so as to solve the problems they encounter in daily life.
The second-level directories of Life Circle comprise: web pages, pictures, and videos. Characteristics: users can store and manage interesting resources they find on the Internet, such as websites, pictures, and videos, and can share and interact with other users.
Note that "web pages" here means the user's collection of web pages of interest. For example, if the websites the user visits are Sina, Baidu, and Sohu, the user can add these three pages for quick access.
Pictures are pictures of interest that the user uploads. After uploading, the picture is kept together with the uploader's user name and the upload time, which makes sharing easier.
Videos are videos of interest that the user uploads, such as a clip from Youku or a clip from Tudou, and all videos can be displayed at the same time.
The second-level directories of Life Application comprise: entertainment, games, and tools. Characteristics: users share and add application resources, and small applications covering every aspect of life can meet user needs in one stop, for example online music, online train timetables, online games, online weather, online radio, and online movies.
The entertainment category comprises: Music Box, Qiyi HD, and so on.
The games category comprises: Angry Birds, a kitchen-knife game, and so on.
All of the above second-level directories have third-level directories under them, and each third-level directory is a concrete program. For example, clicking Music Box plays the music stored in the music box.
In short, Aigu life search relies on domestically leading vertical search technology to provide life-information users with professional search services for domestic rentals, air tickets, hotels, holiday travel, and train tickets. Using technical means such as advanced data mining and intelligent recommendation, it integrates, identifies, and processes mass data in real time to provide users with the latest, most accurate, and most valuable life data, thereby helping users compare and select the life information that suits them efficiently.
Following "Aigu life search", the Aigu life search open data platform further introduces the data resources of "Life Shops", "Life Sharing", and "Life Application", and incorporates an LBS (location-based services) function into the system. The aim is to satisfy users' life needs anytime and anywhere and to provide a more user-friendly data solution: wherever and whenever you are, all kinds of information on clothing, food, housing, and travel are within reach. "Aigu search" focuses on life information search, second-hand classified posting, and the collection, sharing, and life applications of Internet resources for food, shopping, leisure and entertainment, beauty, and fitness, and provides a search and sharing platform for convenient local life information and discount information. It mines a huge quantity of vertical information through an open Internet platform and then offers users a new, simple, and reliable way to obtain information. The combination of the two will lead to a new search habit: users no longer need to log in to any specialized website or work through layer after layer of navigation. They only need to enter the shop they have in mind or the information they want to find, and Aigu mobile search returns the shop's location, business hours, and even average spending per person. It is that simple.
The real-time search method, device, and system embodying the present invention have the following beneficial technical effects:
1. The search strategy adopts an original optimized path algorithm based on product classification, so that only the interest point data specified by the system is acquired. The greatest benefit of this algorithm is that it does not need to traverse paths that hold large amounts of valueless data. By comparison against our preset category-associated path tree, the paths of the target website are automatically classified as paths to be effectively traversed, which greatly reduces the crawling of junk data and greatly improves the speed of data acquisition.
2. The preprocessing stage adopts an original intelligent template technique for automatic structured data extraction (DocView): through sample analysis of mass data, a structural model library is extracted and extraction templates are generated automatically. When data is preprocessed, its similarity to the structural model library is calculated to determine the structure of the data. For web pages that have changed, the template can be adjusted automatically according to historical data.
3. Quasi-real-time data crawling: distributed crawler technology allows newly appearing online data to be collected in closer to real time.
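The path classification of item 1 above can be illustrated with a minimal Python sketch. The patent does not disclose the algorithm itself, so this is only one plausible reading: the preset category path tree is modelled as a prefix tree of URL path segments, and a URL is treated as worth traversing when its path falls under one of the preset category paths. The example paths, the `example.com` URLs, and the function names are all hypothetical.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlparse

_END = object()  # sentinel key marking the end of a preset category path


def build_path_tree(category_paths: Iterable[str]) -> Dict:
    """Build a prefix tree from preset category paths such as '/house/rent'."""
    tree: Dict = {}
    for path in category_paths:
        node = tree
        for segment in path.strip("/").split("/"):
            node = node.setdefault(segment, {})
        node[_END] = True
    return tree


def is_effective(url: str, tree: Dict) -> bool:
    """Return True if the URL's path lies under a preset category path."""
    node = tree
    for segment in urlparse(url).path.strip("/").split("/"):
        if _END in node:
            return True   # already inside a category subtree
        if segment not in node:
            return False  # path leaves the tree: treat as valueless
        node = node[segment]
    return _END in node


def filter_effective(urls: Iterable[str], tree: Dict) -> List[str]:
    """Keep only the URLs that the crawler should traverse."""
    return [url for url in urls if is_effective(url, tree)]
```

With a tree built from `/house/rent` and `/jobs`, a listing page under `/house/rent/` is kept, while a page under `/house/sale/` is skipped without being downloaded, which is where the saving in crawl volume comes from.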
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above specific embodiments, which are merely illustrative rather than restrictive. Under the teaching of the present invention, those of ordinary skill in the art can devise many other forms without departing from the spirit of the present invention and the scope protected by the claims, and all of these fall within the protection of the present invention.

Claims (10)

1. A real-time search method, characterized by comprising:
S1. setting the interest point data specified by the system;
S2. capturing associated data from the target websites into the system according to the interest point data;
S3. traversing the target websites according to a preset data acquisition cycle;
S4. judging whether any of the target websites has been updated, where an update comprises newly appearing web pages and changed web pages; if not, returning to step S2; if so, proceeding to step S5;
S5. capturing the associated data on the updated target websites into the system and updating it, thereby achieving synchronous acquisition and classified display of search information.
2. the method for real-time search according to claim 1 is characterized in that, described step S1 also comprises the following steps: between step S2
S11. through sample analysis, extract the structural model storehouse, generate automatically and extract template mass data;
S12. the described interest point data of pre-service calculates and the similarity in described structural model storehouse, judges the structure of associated data.
3. the method for real-time search according to claim 1 is characterized in that, described classification shows that search information comprises: life Taobao, life classification, life merchant street, life circle and life are used,
Described life Taobao second-level directory down comprises: house property information, service for life, friend-making are marriage-seeking, vehicle is bought and sold service, pet/pet supplies, flea market, job hunting resume, recruitment information, business service;
Described life classification second-level directory comprises house property information, flea market, vehicle dealing and service, ticketing service reward voucher, the educational training of different cities;
The second-level directory in described life merchant street comprises: the cuisines of different cities, shopping, beauty, leisure, hotel, body-building, tourism;
Described life circle second-level directory comprises: webpage, picture, video;
Described life is used second-level directory and is comprised: amusement, recreation, instrument;
Wherein, all have three grades of catalogues under the described second-level directory, described three grades of catalogues are concrete program.
4. the method for real-time search according to claim 2 is characterized in that, described extraction template comprises: banner, type of webpage, content type, title, keyword, summary, text, peer link.
5. A real-time search device for implementing the method of claim 1, characterized by comprising:
an interest point data setting unit: sets the interest point data specified by the system;
an associated data capture unit: used to capture associated data from the target websites into the system according to the interest point data;
a target website traversal unit: connected to the associated data capture unit, and used to traverse the target websites according to a preset data acquisition cycle;
a target website update judgment unit: connected to the target website traversal unit, and used to judge whether any of the target websites has been updated, where an update comprises newly appearing web pages and changed web pages;
a target website update capture unit: connected to the target website update judgment unit, and used to capture the associated data on the updated target websites into the system and update it, thereby achieving synchronous acquisition and classified display of search information.
6. The real-time search device according to claim 5, characterized by further comprising an extraction template generation unit and an associated data structure determination unit,
the extraction template generation unit being connected to the interest point data setting unit, and used to extract a structural model library through sample analysis of mass data and to generate extraction templates automatically;
the associated data structure determination unit being connected to the extraction template generation unit and the associated data capture unit, and used to preprocess the interest point data, calculate its similarity to the structural model library, and determine the structure of the associated data.
7. A real-time search system, comprising: a searcher that searches websites; a controller connected to the searcher and used to control it; a raw database connected to the controller; an indexer connected to the raw database; an index database connected to the indexer; and a retriever connected to the index database, the retriever being connected to a human-computer interaction unit; characterized in that the retriever comprises the real-time search device of claim 5.
8. The real-time search system according to claim 7, characterized by further comprising a user behavior log database and a log analyzer, the user behavior log database being connected to the human-computer interaction unit, and the log analyzer being connected to the raw database, the index database, and the user behavior log database respectively and used to handle user queries whose content is uncertain.
9. The real-time search system according to claim 7, characterized in that the human-computer interaction unit comprises a keypad/display/touch screen.
10. The real-time search system according to claim 7, characterized in that the real-time search system comprises at least one index server, the index server comprises at least one shard server, and the index server performs the search of associated data through the shard server.
CN2012100068607A 2012-01-11 2012-01-11 Real-time search method, device and system Pending CN102446225A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012100068607A CN102446225A (en) 2012-01-11 2012-01-11 Real-time search method, device and system

Publications (1)

Publication Number Publication Date
CN102446225A true CN102446225A (en) 2012-05-09

Family

ID=46008721

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012100068607A Pending CN102446225A (en) 2012-01-11 2012-01-11 Real-time search method, device and system

Country Status (1)

Country Link
CN (1) CN102446225A (en)

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799668A (en) * 2012-07-12 2012-11-28 杜继俊 Recruitment position information processing method and system
CN103092999A (en) * 2013-02-22 2013-05-08 人民搜索网络股份公司 Webpage crawling cycle adjusting method and device
CN103475678A (en) * 2012-06-06 2013-12-25 百度在线网络技术(北京)有限公司 Method and equipment used for providing application data update between distributed equipment
CN103714116A (en) * 2013-10-31 2014-04-09 北京奇虎科技有限公司 Webpage information extracting method and webpage information extracting equipment
CN103902533A (en) * 2012-12-24 2014-07-02 腾讯科技(深圳)有限公司 Fast search method and device
CN104422443A (en) * 2013-09-09 2015-03-18 阿尔派株式会社 Navigation device and information providing method
CN104881501A (en) * 2015-06-19 2015-09-02 四川大学 Automatic Internet information obtaining and pushing method
CN105045684A (en) * 2015-07-16 2015-11-11 北京京东尚科信息技术有限公司 Method and device for switching and controlling indexes
CN105589949A (en) * 2015-12-18 2016-05-18 晶赞广告(上海)有限公司 Distributed type crawler framework capable of customizing responsibility chains and post-processing modules
CN105787074A (en) * 2016-03-01 2016-07-20 深圳市百米生活股份有限公司 Big data system based on combination of offline LBS trajectories and online browsing behaviors of users
TWI570579B (en) * 2015-07-23 2017-02-11 葆光資訊有限公司 An information retrieving method utilizing webpage visual features and webpage language features and a system using thereof
CN106933822A (en) * 2015-12-29 2017-07-07 腾讯科技(深圳)有限公司 A kind of content recommendation method and device
CN106933962A (en) * 2017-02-06 2017-07-07 涂正富 A kind of film micro area network insertion and vertical search precise positioning obtain mesh calibration method
CN107133779A (en) * 2017-05-02 2017-09-05 山东浪潮通软信息科技有限公司 A kind of active method, system and the browser plug-in for collecting resume of multi-domain communication
CN107301253A (en) * 2017-08-23 2017-10-27 杭州安恒信息技术有限公司 A kind of method and device for improving multi-site search key accuracy
CN107679908A (en) * 2017-09-28 2018-02-09 平安科技(深圳)有限公司 Sales force's topic nonproductive poll method, electronic installation and storage medium
WO2018049908A1 (en) * 2016-09-19 2018-03-22 北京京东尚科信息技术有限公司 Web page generation method and device
CN108256067A (en) * 2018-01-16 2018-07-06 平安好房(上海)电子商务有限公司 Calculate method, apparatus, equipment and the storage medium of source of houses similarity
CN108280013A (en) * 2018-02-05 2018-07-13 中国银行股份有限公司 A kind of methods of exhibiting and device of the environmental resource monitoring page
CN108549693A (en) * 2018-04-13 2018-09-18 上海宝尊电子商务有限公司 CMS page generation methods based on crawler technology
US10146588B2 (en) 2014-01-14 2018-12-04 Tencent Technology (Shenzhen) Company Limited Method and apparatus for processing computational task having multiple subflows
CN110083754A (en) * 2019-04-23 2019-08-02 重庆紫光华山智安科技有限公司 The self-adapting data abstracting method of structure change webpage
CN111310069A (en) * 2018-12-11 2020-06-19 阿里巴巴集团控股有限公司 Evaluation method and device for timeliness search
CN115329179A (en) * 2022-10-14 2022-11-11 卡奥斯工业智能研究院(青岛)有限公司 Data acquisition resource amount control method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100332583A1 (en) * 1999-07-21 2010-12-30 Andrew Szabo Database access system
CN102073726A (en) * 2011-01-11 2011-05-25 百度在线网络技术(北京)有限公司 Search engine system and structured data import method for search engine system
CN102184253A (en) * 2011-05-30 2011-09-14 北京搜狗科技发展有限公司 Method and system used for pushing grabbed and updated messages of network resource

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103475678B (en) * 2012-06-06 2018-03-06 百度在线网络技术(北京)有限公司 One kind is used to provide application data more new method and apparatus between distributed apparatus
CN103475678A (en) * 2012-06-06 2013-12-25 百度在线网络技术(北京)有限公司 Method and equipment used for providing application data update between distributed equipment
CN102799668A (en) * 2012-07-12 2012-11-28 杜继俊 Recruitment position information processing method and system
CN103902533B (en) * 2012-12-24 2018-07-06 腾讯科技(深圳)有限公司 It is a kind of to search for through method and apparatus
CN103902533A (en) * 2012-12-24 2014-07-02 腾讯科技(深圳)有限公司 Fast search method and device
WO2014101689A1 (en) * 2012-12-24 2014-07-03 Tencent Technology (Shenzhen) Company Limited A direct search method and apparatus
CN103092999B (en) * 2013-02-22 2016-06-29 人民搜索网络股份公司 A kind of webpage capture period modulation method and apparatus
CN103092999A (en) * 2013-02-22 2013-05-08 人民搜索网络股份公司 Webpage crawling cycle adjusting method and device
CN104422443A (en) * 2013-09-09 2015-03-18 阿尔派株式会社 Navigation device and information providing method
CN103714116A (en) * 2013-10-31 2014-04-09 北京奇虎科技有限公司 Webpage information extracting method and webpage information extracting equipment
US10146588B2 (en) 2014-01-14 2018-12-04 Tencent Technology (Shenzhen) Company Limited Method and apparatus for processing computational task having multiple subflows
CN104881501A (en) * 2015-06-19 2015-09-02 四川大学 Automatic Internet information obtaining and pushing method
CN105045684A (en) * 2015-07-16 2015-11-11 北京京东尚科信息技术有限公司 Method and device for switching and controlling indexes
CN105045684B (en) * 2015-07-16 2018-06-15 北京京东尚科信息技术有限公司 Index switching and the method and device of index control
TWI570579B (en) * 2015-07-23 2017-02-11 葆光資訊有限公司 An information retrieving method utilizing webpage visual features and webpage language features and a system using thereof
CN105589949A (en) * 2015-12-18 2016-05-18 晶赞广告(上海)有限公司 Distributed type crawler framework capable of customizing responsibility chains and post-processing modules
CN105589949B (en) * 2015-12-18 2020-05-29 晶赞广告(上海)有限公司 Distributed crawler method for customizing responsibility chain and post-processing module
CN106933822A (en) * 2015-12-29 2017-07-07 腾讯科技(深圳)有限公司 Content recommendation method and device
CN105787074A (en) * 2016-03-01 2016-07-20 深圳市百米生活股份有限公司 Big data system based on combination of offline LBS trajectories and online browsing behaviors of users
WO2018049908A1 (en) * 2016-09-19 2018-03-22 北京京东尚科信息技术有限公司 Web page generation method and device
CN106933962A (en) * 2017-02-06 2017-07-07 涂正富 Method for obtaining targets through micro-area network access and precise positioning by vertical search
CN107133779A (en) * 2017-05-02 2017-09-05 山东浪潮通软信息科技有限公司 Multi-domain communication method, system and browser plug-in for actively collecting resumes
CN107301253A (en) * 2017-08-23 2017-10-27 杭州安恒信息技术有限公司 Method and device for improving multi-site search key accuracy
WO2019061996A1 (en) * 2017-09-28 2019-04-04 平安科技(深圳)有限公司 Salesperson conversation-topic assisted query method, electronic device, and storage medium
CN107679908A (en) * 2017-09-28 2018-02-09 平安科技(深圳)有限公司 Salesperson topic auxiliary query method, electronic device and storage medium
CN107679908B (en) * 2017-09-28 2021-04-09 平安科技(深圳)有限公司 Salesperson topic auxiliary query method, electronic device and storage medium
CN108256067A (en) * 2018-01-16 2018-07-06 平安好房(上海)电子商务有限公司 Method, apparatus, device and storage medium for calculating housing listing similarity
CN108280013A (en) * 2018-02-05 2018-07-13 中国银行股份有限公司 Method and device for displaying environmental resource monitoring page
CN108280013B (en) * 2018-02-05 2021-07-23 中国银行股份有限公司 Method and device for displaying environmental resource monitoring page
CN108549693A (en) * 2018-04-13 2018-09-18 上海宝尊电子商务有限公司 CMS page generation method based on crawler technology
CN111310069A (en) * 2018-12-11 2020-06-19 阿里巴巴集团控股有限公司 Evaluation method and device for timeliness search
CN111310069B (en) * 2018-12-11 2023-09-26 阿里巴巴集团控股有限公司 Evaluation method and device for timeliness search
CN110083754A (en) * 2019-04-23 2019-08-02 重庆紫光华山智安科技有限公司 Adaptive data extraction method for web pages with changing structure
CN115329179A (en) * 2022-10-14 2022-11-11 卡奥斯工业智能研究院(青岛)有限公司 Data acquisition resource amount control method, device, equipment and storage medium
CN115329179B (en) * 2022-10-14 2023-04-28 卡奥斯工业智能研究院(青岛)有限公司 Data acquisition resource amount control method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN102446225A (en) Real-time search method, device and system
US8032510B2 (en) Social aspects of content aggregation, syndication, sharing, and updating
Chianese et al. An associative engines based approach supporting collaborative analytics in the internet of cultural things
Phaneendra et al. Big Data-solutions for RDBMS problems-A survey
CN103914536B (en) Point-of-interest recommendation method and system for electronic maps
CN107451861B (en) Method for identifying user internet access characteristics under big data
CN105210061B (en) Tagged search result maintenance
Rogers Mapping and the politics of web space
US8682881B1 (en) System and method for extracting structured data from classified websites
CN108073710B (en) Github open source code library recommendation system based on dynamic network graph mining
CN106126646B (en) Method and device for establishing an inverted index of Internet of Things smart devices
EP2159716A1 (en) System and method for interfacing a web browser widget with social indexing
CN107784059A (en) Method, system and machine-readable medium for searching for and selecting images
CN110110221A (en) Government data intelligent recommendation method and system
CN105447186A (en) Big data platform based user behavior analysis system
CN102880624A (en) Website navigation tool system
CN107463591A (en) Method and system for dynamically ranking images to be matched with content in response to a search query
CN102402539A (en) Design technology for object-level personalized vertical search engine
CN103995905A (en) Electronic commerce content multi-dimensional classification, navigation and skipping method
CN106033428B (en) Uniform resource locator selection method and uniform resource locator selection device
CN107491465A (en) Method, apparatus and data processing system for searching for content
Dias et al. Automating the extraction of static content and dynamic behaviour from e-commerce websites
CN102622402B (en) Server, method and system for providing information search service by using sheaf of pages
CN103365868A (en) Data processing method and data processing system
CN107766398A (en) Method, apparatus and data processing system for matching images with content items

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Assignee: Shenzhen city foreign style Technology Co. Ltd.

Assignor: Shenzhen Aigu Technology Co.,Ltd.

Contract record no.: 2012440020127

Denomination of invention: Real-time search method, device and system

License type: Exclusive License

Open date: 20120509

Record date: 20120528

C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20120509