CN101178736A - Web page collecting method and web page collecting server - Google Patents
- Publication number
- CN101178736A (application number CN200710198530A)
- Authority
- CN
- China
- Prior art keywords
- webpage
- buffer area
- web page
- extracting
- grasp
- Legal status
- Granted
Abstract
The invention discloses a web page crawling method and a web page crawling server. The method comprises: A. receiving a web page request; B. determining whether the requested web page has already been crawled; if so, executing step C; otherwise, crawling the web page and ending the flow; C. determining whether the crawl interval of the requested web page is greater than a preset time threshold; if so, executing step D; otherwise, not crawling the web page and ending the flow; D. querying whether the web page has been updated; if so, crawling the web page; otherwise, not crawling it. The server comprises a web page request receiving module, a judging module, a query module and a crawling module. The invention reduces the load on the web page crawling server, reduces the use of network bandwidth resources and improves web page crawling efficiency.
Description
Technical field
The present invention relates to the field of information processing, and in particular to a web page crawling method and a web page crawling server in a wireless search web page conversion system.
Background technology
With the development of Internet technology, wireless Internet technology is also advancing rapidly. People can contact others anytime and anywhere through mobile communication terminals (for example, mobile phones and wireless handheld computers). At the same time, as telecommunication tariffs fall and 3G technology spreads, the wireless Internet is set to grow substantially and change the way we live.
Web pages are currently the largest resource on the Internet, but they are written in HyperText Markup Language (HTML), a format designed for personal computers (PCs). Because of the limited screen size, processing power and network bandwidth of mobile communication terminals, these pages cannot be browsed directly on such terminals. To address this, the Wireless Markup Language (WML) was designed for writing web pages that can be displayed on mobile communication terminals.
Wireless Internet users also need to search for information, so a search engine similar to those on the PC is needed. Because HTML pages currently far outnumber WML pages, most user search results are HTML pages. Wireless search web page conversion systems have therefore emerged, which automatically convert HTML pages into WML pages that wireless Internet users can browse directly on mobile communication terminals.
A wireless search web page conversion system comprises a web page crawling server, a conversion server and a storage server. Its basic process is as follows: the web page crawling server first receives the request of the mobile terminal user and extracts the original HTML page address; it then automatically crawls the HTML page and passes it to the conversion server, which parses it and converts it into a WML page; the WML page is stored in the storage server for access by mobile communication terminals.
The existing technical solution for how the web page crawling server crawls HTML pages is as follows:
The Map data structure of the Standard Template Library (STL) is used as a cache that stores URL objects. The key of a URL object is the Message-Digest Algorithm 5 (MD5) value of the page URL, and the value is the crawl time of the page. A single time threshold is set for the crawl interval of all pages, typically 10 minutes.
A mobile communication terminal finds the corresponding web page through a wireless search engine. After the user clicks a search result, the terminal sends the corresponding web page request to the wireless search web page conversion system. On receiving the request, the system extracts the URL of the requested page and computes the MD5 value of this URL. Using this MD5 value as the key and the current time as the value, it searches the cache of the web page crawling server. If a URL object with the same key exists, its crawl time is retrieved and compared with the current time. If the difference is greater than or equal to the configured time threshold, the URL object in the cache is rewritten, that is, its value is updated to the current time, and the page is crawled again, converted into a WML page by the conversion server and stored in the storage server. If the difference is less than the configured threshold, the page does not need to be crawled again: the web page crawling server can simply discard the request, and the storage server returns the currently stored WML page for this URL object to the requesting mobile communication terminal.
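The prior-art policy can be sketched in Python as below, with a dict standing in for the STL Map and the MD5 hex digest of the URL as the key. This is a minimal illustration, not the original implementation: the class and method names are hypothetical, and the handling of a cache miss (record the time and crawl) is an assumption, since the text only describes the case where the key already exists.

```python
import hashlib
import time
from typing import Dict, Optional


def url_key(url: str) -> str:
    """MD5 hex digest of the URL, used as the cache key."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


class FixedThresholdCache:
    """Prior-art policy: one crawl-interval threshold shared by every page."""

    def __init__(self, threshold_seconds: float = 600.0) -> None:
        self.threshold = threshold_seconds
        self.crawl_times: Dict[str, float] = {}  # MD5(URL) -> last crawl time

    def should_crawl(self, url: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        key = url_key(url)
        last = self.crawl_times.get(key)
        if last is None or now - last >= self.threshold:
            # Cache miss (handling assumed) or expired entry:
            # rewrite the stored time and crawl the page again.
            self.crawl_times[key] = now
            return True
        # Within the threshold: drop the request; the storage server
        # answers with the WML page it already holds.
        return False
```

Every page is re-crawled once per threshold period regardless of whether it changed, which is exactly the weakness discussed next.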
The above prior art has the following shortcomings:
The prior art sets a single crawl interval threshold for all types of web pages and cannot flexibly adapt to how different types of pages are updated. Suppose the threshold is set to 10 minutes. For frequently updated pages, such as forum and comment pages, a 10-minute crawl interval is too long. Conversely, pages with a very low update frequency, such as news pages, are probably never updated after publication, yet the current system cannot adapt to this and still re-crawls them every 10 minutes. Moreover, the fact that a page's crawl interval has exceeded the configured threshold, that is, that the page has expired from the cache, does not mean that its content has been updated, yet the page is crawled again anyway. In fact, most pages on the Internet have long update cycles.
Therefore, the prior-art wireless search web page conversion system cannot cope with long page update cycles: it repeatedly crawls many pages whose content has not changed, which increases the load on the web page crawling server, occupies excessive network bandwidth and lowers crawling efficiency.
Summary of the invention
In view of this, one technical problem to be solved by the present invention is to provide a web page crawling method that reduces the load on the web page crawling server, reduces the use of network bandwidth resources and improves crawling efficiency.
Another technical problem to be solved by the present invention is to provide a web page crawling server that reduces the load on its own system, reduces the use of network bandwidth resources and improves crawling efficiency.
To achieve the above objectives, the main technical solutions of the present invention are as follows:
A web page crawling method, comprising:
A. receiving a web page request;
B. determining whether the requested web page has already been crawled; if so, executing step C; otherwise, crawling the web page and ending the flow;
C. determining whether the crawl interval of the requested web page is greater than a preset time threshold; if so, executing step D; otherwise, not crawling the web page and ending the flow;
D. querying whether the web page has been updated; if so, crawling the web page; otherwise, not crawling it.
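Steps A to D can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function name, the plain dict keyed by URL (rather than by an MD5 key), and the is_updated callback standing in for the update query of step D are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional


def handle_request(
    url: str,
    crawl_times: Dict[str, float],
    threshold: float,
    is_updated: Callable[[str, float], bool],
    now: Optional[float] = None,
) -> bool:
    """Steps B to D for one received request (step A); True means 'crawl the page'."""
    now = time.time() if now is None else now
    last = crawl_times.get(url)
    if last is None:
        # Step B: never crawled, so crawl it and end the flow.
        crawl_times[url] = now
        return True
    if now - last <= threshold:
        # Step C: crawl interval not greater than the threshold, so do not crawl.
        return False
    # Step D: ask whether the page changed since `last`, then record the check time.
    # Refreshing the time even when nothing changed is an assumption of this sketch.
    changed = is_updated(url, last)
    crawl_times[url] = now
    return changed
```

Unlike the fixed-threshold policy, an expired entry only costs a cheap update query; the full crawl happens only when is_updated reports a change.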
Preferably, the method presets a cache area and a time threshold corresponding to the cache area;
and, when a web page is crawled for the first time, an object is created for the page and stored in the cache area; the object contains the identifier and the request time of the page, and in subsequent step D the stored time is further updated to the current time;
in step B, whether the page has been crawled is determined by whether the identifier of the requested page exists in the cache; in step C, the crawl interval is the difference between the current time and the time contained in the page object in the cache area, and the time threshold is the threshold corresponding to this cache area.
Preferably, cache areas of different levels are set according to different page crawl frequencies, with cache areas of different levels corresponding to different crawl interval thresholds; and page objects are migrated between cache areas of different levels according to the crawl frequency of the pages.
Preferably, each level of cache area is provided with a crawl frequency level value, each page object further contains a crawl count, and the initial value of the crawl count is 0;
step D further comprises: if the page has been updated, incrementing the crawl count in the page object by 1; if the page has not been updated, decrementing it by 1; and comparing the crawl count of the page object with the crawl frequency level value of the cache area to which the object belongs: if the crawl count is greater than the level value, the page object is moved to the upper-level cache area with a shorter time threshold; if the crawl count is less than the level value, the page object is moved to the lower-level cache area with a longer time threshold.
Preferably, querying in step D whether the web page has been updated specifically comprises: determining whether the page has been updated according to the Hypertext Transfer Protocol (HTTP) return code.
Preferably, the web page is a HyperText Markup Language (HTML) page.
A web page crawling server, comprising:
a web page request receiving module, configured to receive web page requests;
a judging module, configured to determine whether the requested web page has been crawled and its crawl interval, to trigger the crawling module when the page has not been crawled, and to trigger the query module when the crawl interval is greater than a preset time interval;
a query module, configured to query whether the web page has been updated and to trigger the crawling module when it has;
a crawling module, configured to crawl web pages.
Preferably, the server further comprises a cache area configured to store the objects of crawled pages, the cache area having a corresponding time threshold; the judging module determines, from the page objects in the cache area, whether a page has been crawled and its crawl interval, and the time threshold used for comparison is the threshold corresponding to this cache area.
Preferably, there are at least two levels of cache area, each level corresponding to a different page crawl frequency and crawl interval threshold;
and the web page crawling server further comprises an object migration module, configured to migrate page objects between cache areas of different levels according to the crawl frequency of the pages.
Preferably, the web page is an HTML page.
With the present invention, the web page crawling server does not re-crawl a page requested by a user within a certain time threshold, but returns the result held by the storage server directly; and once a page's crawl interval exceeds the preset threshold, the server further checks whether the page has been updated, crawling it only if it has. This avoids repeatedly crawling the many pages whose content has not changed, which reduces the load on the web page crawling server, reduces the use of network bandwidth resources and improves crawling efficiency.
In addition, the present invention uses a hierarchical cache mechanism to further improve crawling efficiency: cache areas of different levels are set according to different page crawl frequencies, corresponding to pages with different update frequencies, and page objects are migrated between levels according to each page's crawl frequency. This makes the update frequency recorded for each URL object converge toward the actual update frequency of the page content, improving the accuracy of the cache.
Description of drawings
Fig. 1 is a flowchart of an embodiment of the web page crawling method of the present invention;
Fig. 2 is a schematic diagram of the structure of the web page crawling server of the present invention and its relationship with external components.
Embodiment
The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
The web page crawling method of the present invention is applicable to the web page crawling server in a wireless search web page conversion system. The crawling server uses a caching mechanism to ensure that the same HTML page is not crawled repeatedly within a certain time range; after the preset time threshold has elapsed, it checks from the HTTP header information whether the HTML page content has been updated, to decide whether the page needs to be crawled again. When a crawl is needed, the crawling server fetches the HTML page from the server hosting it and sends it to the conversion server of the wireless search web page conversion system, which converts it into a WML page and stores it in the storage server of the system for access by mobile terminal users.
Fig. 1 is a flowchart of an embodiment of the web page crawling method of the present invention. In this embodiment, three cache areas are initialized in the web page crawling server at startup to store URL objects. Each URL object corresponds to one HTML page, is keyed by the MD5 value of the page URL, and contains the request time of the HTML page and its actual crawl count update, which is an integer. Each cache area is implemented internally as an STL Map. The three cache areas are divided into three levels by crawl interval, each with its own crawl interval threshold; in this embodiment, for example, the first cache area is set to 5 minutes, the second to 10 minutes and the third to 20 minutes. Each cache area is also given a corresponding updateLevel value, which represents the crawl frequency of the HTML pages whose URL objects are in that level, that is, their update-frequency grade. The size, time threshold and updateLevel value of each cache area are stored in a configuration file, which the crawling server reads at startup; these values can also be updated dynamically by a management thread of the crawling server while it is running.
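The three-level configuration of this embodiment could be represented as follows. Only the 5, 10 and 20 minute thresholds come from the text; the updateLevel and cache-size numbers are placeholders, the class names are hypothetical, and loading the values from the configuration file is omitted because its format is not specified.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CacheLevelConfig:
    threshold_seconds: int  # crawl-interval threshold for this level
    update_level: int       # updateLevel: bound on the update counter before migration
    max_entries: int        # cache size for this level


@dataclass
class UrlObject:
    request_time: float  # time the HTML page was last requested/crawled
    update: int = 0      # signed count of observed updates, starting at 0


# Thresholds from the embodiment (5, 10, 20 minutes); the other values are placeholders.
LEVELS: List[CacheLevelConfig] = [
    CacheLevelConfig(threshold_seconds=5 * 60, update_level=3, max_entries=100_000),
    CacheLevelConfig(threshold_seconds=10 * 60, update_level=3, max_entries=100_000),
    CacheLevelConfig(threshold_seconds=20 * 60, update_level=3, max_entries=100_000),
]

# One dict per level, keyed by MD5(URL), standing in for the three STL Maps.
caches: List[Dict[str, UrlObject]] = [{} for _ in LEVELS]
```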
Referring to Fig. 1, in this embodiment the web page crawling server performs the following steps:
The web page request originates from a mobile communication terminal. The terminal finds the corresponding page through the wireless search engine; after the user clicks a search result, the terminal sends the corresponding web page request to the wireless search web page conversion system. Both the web page crawling server and the storage server of the system receive this request: the crawling server performs the subsequent crawling operations based on it, and the storage server looks up the corresponding WML page.
When the web page crawling server requests a URL for the first time and successfully crawls the page, the server hosting the page returns status 200 with the page data as the content, together with a Last-Modified attribute giving the time at which the page was last modified on the website, in a format similar to:
Last-Modified: Wed, 17 Oct 2007 12:45:30 GMT
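The Last-Modified value is an HTTP-date and can be parsed with Python's standard email.utils module. This is a sketch; parse_last_modified is a hypothetical helper name, not part of the patent.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_last_modified(value: str) -> datetime:
    """Parse an HTTP-date such as 'Wed, 17 Oct 2007 12:45:30 GMT' into an aware UTC datetime."""
    return parsedate_to_datetime(value).astimezone(timezone.utc)
```

The parsed value could be kept with the URL object as the alternative If-Modified-Since reference time mentioned in step 61.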
If step 104 finds the corresponding URL object in one of the cache areas, the time value stored in the object is subtracted from the current time. If the difference is within the time threshold corresponding to this cache area, step 105 is executed; otherwise, step 106 is executed.
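A sketch of the step 104 lookup across the cache levels, assuming each level is a dict keyed by the MD5 of the URL and that per-level thresholds are given in seconds. The function names and returned labels are illustrative only; the handling of a miss is taken from step B of the method rather than from this step.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class UrlObject:
    request_time: float  # when the page was last requested/crawled
    update: int = 0      # signed update counter


def url_key(url: str) -> str:
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def find_object(caches: List[Dict[str, UrlObject]], url: str) -> Optional[Tuple[int, UrlObject]]:
    """Search every cache level for the URL's MD5 key; return (level, object) or None."""
    key = url_key(url)
    for level, cache in enumerate(caches):
        obj = cache.get(key)
        if obj is not None:
            return level, obj
    return None


def step_104(caches: List[Dict[str, UrlObject]], thresholds: Sequence[float], url: str, now: float) -> str:
    found = find_object(caches, url)
    if found is None:
        return "not cached"  # first request: crawl it (step B of the method)
    level, obj = found
    if now - obj.request_time <= thresholds[level]:
        return "step 105"    # within this level's threshold
    return "step 106"        # interval exceeded: check whether the page changed
```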
The process by which step 106 determines whether the HTML page content has actually been updated comprises:
Step 61: the web page crawling server uses the request time of the URL object in the cache area as the reference time for the update query (If-Modified-Since) and sends the server hosting the HTML page a request, containing the URL from the web page request, that asks whether the page has been updated. Alternatively, the reference time may be the last-modified time contained in the previous status-200 response. For example:
If-Modified-Since: Wed, 17 Oct 2007 12:45:30 GMT
Step 62: the server hosting the page checks whether the HTML page at this URL has been updated and reports the result to the web page crawling server through the HTTP return code, returning 304 (Not Modified) if the page has not changed;
Step 63: the web page crawling server determines from the HTTP return code whether the HTML page content has actually been updated: a 304 response indicates that the page has not been modified; otherwise the page is treated as modified.
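Steps 61 to 63 correspond to an HTTP conditional GET. The sketch below uses Python's standard urllib; note that it folds the update query and the re-crawl into a single request (a 200 response already carries the new content), which simplifies the separate query-then-crawl flow described above. The function names are hypothetical.

```python
import email.utils
import urllib.error
import urllib.request
from typing import Dict, Optional, Tuple


def if_modified_since_header(reference_time: float) -> Dict[str, str]:
    """Step 61: build the If-Modified-Since header from a POSIX timestamp."""
    return {"If-Modified-Since": email.utils.formatdate(reference_time, usegmt=True)}


def page_was_modified(status: int) -> bool:
    """Step 63: 304 (Not Modified) means unchanged; any other successful reply means changed."""
    return status != 304


def query_update(url: str, reference_time: float, timeout: float = 10.0) -> Tuple[bool, Optional[bytes]]:
    """Steps 61 to 63 as one conditional GET; returns (modified, body or None)."""
    request = urllib.request.Request(url, headers=if_modified_since_header(reference_time))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            # 2xx: the origin server sent fresh content, which can be crawled directly.
            return page_was_modified(response.status), response.read()
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx codes, including 304.
        if err.code == 304:
            return False, None
        raise
```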
Next, according to a cache policy, the invention decides whether each page stays in its current cache level or has its cache level adjusted. This is done in steps 107 to 109, as follows.
Step 108: the request time of the URL object in the cache area is first updated to the current time and the update value of the object is incremented by 1; at the same time, a network connection is opened to crawl the actual content of the page again.
In a given cache level, for example the second level, when the update value of a URL object is greater than the updateLevel value of that level, the object is moved to the upper-level cache area with a shorter time threshold (that is, for more frequently updated pages), here the first level, and its update value is reset to zero after the move. A URL object already in the first-level cache area is not moved.
In a given cache level, again the second level for example, when the update value of a URL object is less than the negative of that level's updateLevel value, the object is moved to the lower-level cache area with a longer time threshold (that is, for less frequently updated pages), here the third level, and its update value is reset to zero after the move. A URL object already in the last-level cache area is not moved.
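The promotion and demotion rule can be sketched as follows. This assumes level 0 is the first (shortest-threshold) cache area, that the update counter is decremented when a page is found unchanged (as in the summary above), and that the request time is refreshed after every check whether or not the page changed. The function name and signature are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class UrlObject:
    request_time: float
    update: int = 0


def record_check(
    caches: List[Dict[str, UrlObject]],
    update_levels: Sequence[int],
    level: int,
    key: str,
    modified: bool,
    now: float,
) -> int:
    """Apply one update check to caches[level][key] and return the object's (possibly new) level."""
    obj = caches[level][key]
    obj.request_time = now  # refreshed on every check (assumed for the unchanged case)
    obj.update += 1 if modified else -1
    target = level
    if obj.update > update_levels[level] and level > 0:
        target = level - 1  # promote: shorter threshold, page updates often
    elif obj.update < -update_levels[level] and level < len(caches) - 1:
        target = level + 1  # demote: longer threshold, page rarely updates
    if target != level:
        del caches[level][key]
        obj.update = 0      # counter is reset after a move
        caches[target][key] = obj
    return target
```

Requiring the counter to cross the bound in either direction gives some hysteresis, so a page does not bounce between levels after a single change.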
Through step 109, the update value is used to migrate URL objects dynamically between cache levels, so that the update frequency recorded for each URL object converges toward the actual update frequency of the page content, improving the accuracy of the cache.
Fig. 2 is a schematic diagram of the structure of the web page crawling server of the present invention and its relationship with external components. Referring to Fig. 2, the web page crawling server 200 comprises:
Web page request receiving module 201, configured to receive web page requests.
Judging module 202, configured to determine whether the requested page has been crawled and its crawl interval: if the page has not been crawled, it triggers crawling module 204; if the page has been crawled and the crawl interval is greater than the preset time interval, it triggers query module 203; if the page has been crawled and the crawl interval is less than or equal to the preset time interval, it ignores the web page request.
Query module 203, configured to query whether the page has been updated, triggering crawling module 204 if it has and discarding the web page request if it has not. The specific query method is described in step 106: using the request time of the URL object in the cache area as the reference time, the HTTP 304 return code is used to determine whether the HTML content at the URL has actually been updated.
Crawling module 204, configured to crawl the HTML page corresponding to the URL from server 500 hosting the page and send the crawled page to conversion server 300. Conversion server 300 then converts the HTML page into a WML page and stores it in storage server 400, from which mobile terminal users obtain the WML page.
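The data flow through crawling module 204 can be pictured with the sketch below, in which the origin fetch and the HTML-to-WML conversion are passed in as callables and storage server 400 is modelled as a dict. All names are hypothetical; the patent does not specify these interfaces.

```python
from typing import Callable, Dict


def crawl_and_publish(
    url: str,
    fetch_html: Callable[[str], str],      # stands in for fetching from origin server 500
    convert_to_wml: Callable[[str], str],  # stands in for conversion server 300
    storage: Dict[str, str],               # stands in for storage server 400
) -> str:
    """Crawl the HTML page, convert it to WML and store the result under its URL."""
    html = fetch_html(url)
    wml = convert_to_wml(html)
    storage[url] = wml
    return wml
```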
A cache area 205 is provided in the web page crawling server to store the objects of crawled pages. Each URL object is keyed by the MD5 value of the page URL and contains the request time of the HTML page and its actual crawl count update. A crawl interval threshold is set for the cache area. Judging module 202 uses the MD5 code of the requested URL as the key to look up whether a corresponding URL object exists in the cache area: if it does, the HTML page has already been crawled; otherwise it has not and needs to be crawled. Judging module 202 also takes the current time minus the time value in the URL object as the crawl interval and compares it with the threshold of the cache area: if the crawl interval is greater than the threshold, query module 203 is triggered; otherwise, crawling module 204 is triggered directly to crawl the page.
As in the method described above, the cache area may have at least two levels (three in the figure, for example), each level corresponding to a different page crawl frequency and crawl interval threshold; the web page crawling server further comprises object migration module 206, configured to migrate page objects between cache levels according to the crawl frequency of the pages. The specific migration procedure is described in steps 107 to 109 above.
The above are merely preferred embodiments of the present invention, and the scope of protection of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.
Claims (10)
1. A web page crawling method, characterized by comprising:
A. receiving a web page request;
B. determining whether the requested web page has already been crawled; if so, executing step C; otherwise, crawling the web page and ending the flow;
C. determining whether the crawl interval of the requested web page is greater than a preset time threshold; if so, executing step D; otherwise, not crawling the web page and ending the flow;
D. querying whether the web page has been updated; if so, crawling the web page; otherwise, not crawling it.
2. The web page crawling method according to claim 1, characterized in that the method presets a cache area and a time threshold corresponding to the cache area;
and, when a web page is crawled for the first time, an object is created for the page and stored in the cache area; the object contains the identifier and the request time of the page, and in subsequent step D the stored time is further updated to the current time;
in step B, whether the page has been crawled is determined by whether the identifier of the requested page exists in the cache; in step C, the crawl interval is the difference between the current time and the time contained in the page object in the cache area, and the time threshold is the threshold corresponding to this cache area.
3. The web page crawling method according to claim 2, characterized in that cache areas of different levels are set according to different page crawl frequencies, with cache areas of different levels corresponding to different crawl interval thresholds; and page objects are migrated between cache areas of different levels according to the crawl frequency of the pages.
4. The web page crawling method according to claim 3, characterized in that:
each level of cache area is provided with a crawl frequency level value, each page object further contains a crawl count, and the initial value of the crawl count is 0;
step D further comprises: if the page has been updated, incrementing the crawl count in the page object by 1; if the page has not been updated, decrementing it by 1; and comparing the crawl count of the page object with the crawl frequency level value of the cache area to which the object belongs: if the crawl count is greater than the level value, the page object is moved to the upper-level cache area with a shorter time threshold; if the crawl count is less than the level value, the page object is moved to the lower-level cache area with a longer time threshold.
5. The web page crawling method according to claim 1, characterized in that querying in step D whether the web page has been updated specifically comprises: determining whether the page has been updated according to the Hypertext Transfer Protocol (HTTP) return code.
6. The web page crawling method according to any one of claims 1 to 5, characterized in that the web page is a HyperText Markup Language (HTML) page.
7. A web page crawling server, characterized by comprising:
a web page request receiving module, configured to receive web page requests;
a judging module, configured to determine whether the requested web page has been crawled and its crawl interval, to trigger the crawling module when the page has not been crawled, and to trigger the query module when the crawl interval is greater than a preset time interval;
a query module, configured to query whether the web page has been updated and to trigger the crawling module when it has;
a crawling module, configured to crawl web pages.
8. The web page crawling server according to claim 7, characterized by further comprising a cache area configured to store the objects of crawled pages, the cache area having a corresponding time threshold; the judging module determines, from the page objects in the cache area, whether a page has been crawled and its crawl interval, and the time threshold used for comparison is the threshold corresponding to this cache area.
9. The web page crawling server according to claim 8, characterized in that there are at least two levels of cache area, each level corresponding to a different page crawl frequency and crawl interval threshold;
and the web page crawling server further comprises an object migration module, configured to migrate page objects between cache areas of different levels according to the crawl frequency of the pages.
10. The web page crawling server according to any one of claims 7 to 9, characterized in that the web page is an HTML page.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2007101985301A CN100501746C (en) | 2007-12-11 | 2007-12-11 | Web page collecting method and web page collecting server |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2007101985301A CN100501746C (en) | 2007-12-11 | 2007-12-11 | Web page collecting method and web page collecting server |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101178736A | 2008-05-14 |
CN100501746C | 2009-06-17 |
Family ID: 39404989
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2007101985301A Active CN100501746C (en) | 2007-12-11 | 2007-12-11 | Web page collecting method and web page collecting server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100501746C (en) |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101303700B (en) * | 2008-06-13 | 2010-04-21 | 成都市华为赛门铁克科技有限公司 | Method and system for collecting web page |
CN101826074A (en) * | 2009-03-04 | 2010-09-08 | 上海众恒信息产业股份有限公司 | Data exchange method for isolated system and data exchange device |
CN101902438A (en) * | 2009-05-25 | 2010-12-01 | 北京启明星辰信息技术股份有限公司 | Method and device for automatically identifying web crawlers |
CN101917479A (en) * | 2010-08-20 | 2010-12-15 | 北京新岸线网络技术有限公司 | Method and device for improving grouped data service in mobile network |
WO2010149024A1 (en) * | 2009-06-23 | 2010-12-29 | 北京搜狗科技发展有限公司 | Update notification method and browser |
CN101984634A (en) * | 2010-11-22 | 2011-03-09 | 北京酷我科技有限公司 | Server-side automatic steering method and system adapting to resource synchronous mechanism |
CN101986659A (en) * | 2010-10-27 | 2011-03-16 | 青岛普加智能信息有限公司 | Real-time data transmission method and system |
CN101459571B (en) * | 2008-12-16 | 2011-04-06 | 北京大学 | Method, system and apparatus for website mirroring |
CN102184253A (en) * | 2011-05-30 | 2011-09-14 | 北京搜狗科技发展有限公司 | Method and system used for pushing grabbed and updated messages of network resource |
CN102196506A (en) * | 2010-03-15 | 2011-09-21 | 华为技术有限公司 | Network resource access control method, system and device |
CN102253941A (en) * | 2010-05-21 | 2011-11-23 | 卓望数码技术(深圳)有限公司 | Cache updating method and cache updating device |
CN102347930A (en) * | 2010-07-26 | 2012-02-08 | 中国电信股份有限公司 | Method and system for obtaining webpage content |
CN102364461A (en) * | 2011-06-30 | 2012-02-29 | 广州市动景计算机科技有限公司 | Page content data acquisition method and server |
CN102594787A (en) * | 2011-01-14 | 2012-07-18 | 腾讯科技(深圳)有限公司 | Data grab method, system and routing server |
CN102609416A (en) * | 2011-01-21 | 2012-07-25 | 富泰华工业(深圳)有限公司 | Webpage information storage control and method |
CN102609481A (en) * | 2012-01-20 | 2012-07-25 | 苏州简拔林网络科技有限公司 | Method for updating and gathering comment information in real time |
CN102831252A (en) * | 2012-09-21 | 2012-12-19 | 北京奇虎科技有限公司 | Method and device for updating index database and search method and system |
CN102915363A (en) * | 2012-10-18 | 2013-02-06 | 北京奇虎科技有限公司 | Website storing method and system |
CN102129441B (en) * | 2010-01-14 | 2013-02-27 | 深圳市深信服电子科技有限公司 | Web page information identifying and processing method and device |
CN102982161A (en) * | 2012-12-05 | 2013-03-20 | 北京奇虎科技有限公司 | Method and device for acquiring webpage information |
CN103020313A (en) * | 2013-01-08 | 2013-04-03 | 北京航空航天大学 | Capturing method based on detection of webpage refreshing period |
CN103218452A (en) * | 2013-04-27 | 2013-07-24 | 人民搜索网络股份公司 | Method and device for recognizing valid interlinkage in Hub webpage |
WO2013135003A1 (en) * | 2012-03-15 | 2013-09-19 | 中兴通讯股份有限公司 | Embedded network proxy system, terminal device and proxy method |
CN103399933A (en) * | 2013-08-08 | 2013-11-20 | 人民搜索网络股份公司 | Method and system for grabbing webpage contents of network print media |
CN103905441A (en) * | 2014-03-28 | 2014-07-02 | 广州华多网络科技有限公司 | Data acquisition method and device |
CN104252530A (en) * | 2014-09-10 | 2014-12-31 | 北京京东尚科信息技术有限公司 | Single-computer crawler grabbing method and system |
CN104462493A (en) * | 2014-12-18 | 2015-03-25 | 北京奇虎科技有限公司 | Method and device for grabbing question and answer webpages |
CN104462492A (en) * | 2014-12-18 | 2015-03-25 | 北京奇虎科技有限公司 | Method and device for grabbing question and answer webpages |
CN104967698A (en) * | 2015-02-13 | 2015-10-07 | 腾讯科技(深圳)有限公司 | Network data crawling method and apparatus |
CN106055638A (en) * | 2016-05-30 | 2016-10-26 | 国家基础地理信息中心 | Network geographic information updating method and network geographic information updating system |
CN102609416B (en) * | 2011-01-21 | 2016-12-14 | 富泰华工业(深圳)有限公司 | Webpage information storage control and method |
CN106371830A (en) * | 2016-08-25 | 2017-02-01 | 北京量科邦信息技术有限公司 | Interactive method for realizing close and back control of native APP and WEB pages |
CN106547773A (en) * | 2015-09-21 | 2017-03-29 | 北京国双科技有限公司 | The method and device of adjustment event opening speed |
CN106557484A (en) * | 2015-09-25 | 2017-04-05 | 北京国双科技有限公司 | The update method and device of webpage thermodynamic Background |
CN106897127A (en) * | 2015-12-21 | 2017-06-27 | 北京奇虎科技有限公司 | A kind of method and server for picture capture treatment |
CN106897126A (en) * | 2015-12-21 | 2017-06-27 | 北京奇虎科技有限公司 | A kind of picture grasping means and server |
CN107102997A (en) * | 2016-02-22 | 2017-08-29 | 北京国双科技有限公司 | data crawling method and device |
CN108600342A (en) * | 2018-03-30 | 2018-09-28 | 连尚(新昌)网络科技有限公司 | A kind of message display method, equipment and storage medium |
CN110020065A (en) * | 2017-07-19 | 2019-07-16 | 阿里巴巴集团控股有限公司 | A kind of website identification method and device |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102118361B (en) * | 2009-12-31 | 2014-07-23 | 北京金山软件有限公司 | Method and device for controlling data transmission based on network protocol |
- 2007-12-11: CN application CNB2007101985301A (patent CN100501746C), status Active
Cited By (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101303700B (en) * | 2008-06-13 | 2010-04-21 | 成都市华为赛门铁克科技有限公司 | Method and system for collecting web page |
CN101459571B (en) * | 2008-12-16 | 2011-04-06 | 北京大学 | Method, system and apparatus for website mirroring |
CN101826074A (en) * | 2009-03-04 | 2010-09-08 | 上海众恒信息产业股份有限公司 | Data exchange method for isolated system and data exchange device |
CN101902438A (en) * | 2009-05-25 | 2010-12-01 | 北京启明星辰信息技术股份有限公司 | Method and device for automatically identifying web crawlers |
CN101902438B (en) * | 2009-05-25 | 2013-05-15 | 北京启明星辰信息技术股份有限公司 | Method and device for automatically identifying web crawlers |
WO2010149024A1 (en) * | 2009-06-23 | 2010-12-29 | 北京搜狗科技发展有限公司 | Update notification method and browser |
CN102129441B (en) * | 2010-01-14 | 2013-02-27 | 深圳市深信服电子科技有限公司 | Web page information identifying and processing method and device |
CN102196506A (en) * | 2010-03-15 | 2011-09-21 | 华为技术有限公司 | Network resource access control method, system and device |
CN102196506B (en) * | 2010-03-15 | 2013-12-04 | 华为技术有限公司 | Network resource access control method, system and device |
CN102253941A (en) * | 2010-05-21 | 2011-11-23 | 卓望数码技术(深圳)有限公司 | Cache updating method and cache updating device |
CN102347930B (en) * | 2010-07-26 | 2015-09-09 | 中国电信股份有限公司 | Web page contents acquisition methods and system |
CN102347930A (en) * | 2010-07-26 | 2012-02-08 | 中国电信股份有限公司 | Method and system for obtaining webpage content |
CN101917479A (en) * | 2010-08-20 | 2010-12-15 | 北京新岸线网络技术有限公司 | Method and device for improving grouped data service in mobile network |
CN101986659A (en) * | 2010-10-27 | 2011-03-16 | 青岛普加智能信息有限公司 | Real-time data transmission method and system |
CN101984634B (en) * | 2010-11-22 | 2013-06-26 | 北京酷我科技有限公司 | Server-side automatic steering method and system adapting to resource synchronous mechanism |
CN101984634A (en) * | 2010-11-22 | 2011-03-09 | 北京酷我科技有限公司 | Server-side automatic steering method and system adapting to resource synchronous mechanism |
CN102594787A (en) * | 2011-01-14 | 2012-07-18 | 腾讯科技(深圳)有限公司 | Data grab method, system and routing server |
CN102594787B (en) * | 2011-01-14 | 2016-01-20 | 腾讯科技(深圳)有限公司 | Data grab method, system and routing server |
CN102609416A (en) * | 2011-01-21 | 2012-07-25 | 富泰华工业(深圳)有限公司 | Webpage information storage control and method |
CN102609416B (en) * | 2011-01-21 | 2016-12-14 | 富泰华工业(深圳)有限公司 | Webpage information storage control and method |
CN102184253A (en) * | 2011-05-30 | 2011-09-14 | 北京搜狗科技发展有限公司 | Method and system used for pushing grabbed and updated messages of network resource |
CN102364461A (en) * | 2011-06-30 | 2012-02-29 | 广州市动景计算机科技有限公司 | Page content data acquisition method and server |
CN106599239A (en) * | 2011-06-30 | 2017-04-26 | 广州市动景计算机科技有限公司 | Webpage content data acquisition method and server |
CN102609481A (en) * | 2012-01-20 | 2012-07-25 | 苏州简拔林网络科技有限公司 | Method for updating and gathering comment information in real time |
WO2013135003A1 (en) * | 2012-03-15 | 2013-09-19 | 中兴通讯股份有限公司 | Embedded network proxy system, terminal device and proxy method |
CN102831252B (en) * | 2012-09-21 | 2015-11-25 | 北京奇虎科技有限公司 | A kind of method for upgrading index data base and device, searching method and system |
CN102831252A (en) * | 2012-09-21 | 2012-12-19 | 北京奇虎科技有限公司 | Method and device for updating index database and search method and system |
CN102915363A (en) * | 2012-10-18 | 2013-02-06 | 北京奇虎科技有限公司 | Website storing method and system |
CN102982161A (en) * | 2012-12-05 | 2013-03-20 | 北京奇虎科技有限公司 | Method and device for acquiring webpage information |
CN103020313A (en) * | 2013-01-08 | 2013-04-03 | 北京航空航天大学 | Capturing method based on detection of webpage refreshing period |
CN103218452A (en) * | 2013-04-27 | 2013-07-24 | 人民搜索网络股份公司 | Method and device for recognizing valid interlinkage in Hub webpage |
CN103218452B (en) * | 2013-04-27 | 2016-08-10 | 人民搜索网络股份公司 | A kind of method and apparatus identifying effectively link in Hub page |
CN103399933A (en) * | 2013-08-08 | 2013-11-20 | 人民搜索网络股份公司 | Method and system for grabbing webpage contents of network print media |
CN103399933B (en) * | 2013-08-08 | 2017-01-18 | 人民搜索网络股份公司 | Method and system for grabbing webpage contents of network print media |
CN103905441B (en) * | 2014-03-28 | 2017-08-25 | 广州华多网络科技有限公司 | Data capture method and device |
CN103905441A (en) * | 2014-03-28 | 2014-07-02 | 广州华多网络科技有限公司 | Data acquisition method and device |
CN104252530B (en) * | 2014-09-10 | 2017-09-15 | 北京京东尚科信息技术有限公司 | A kind of unit crawler capturing method and system |
CN104252530A (en) * | 2014-09-10 | 2014-12-31 | 北京京东尚科信息技术有限公司 | Single-computer crawler grabbing method and system |
CN104462492B (en) * | 2014-12-18 | 2018-01-16 | 北京奇虎科技有限公司 | The method and apparatus for capturing question and answer class webpage |
CN104462493A (en) * | 2014-12-18 | 2015-03-25 | 北京奇虎科技有限公司 | Method and device for grabbing question and answer webpages |
CN104462493B (en) * | 2014-12-18 | 2018-08-03 | 北京奇虎科技有限公司 | The method and apparatus for capturing question and answer class webpage |
CN104462492A (en) * | 2014-12-18 | 2015-03-25 | 北京奇虎科技有限公司 | Method and device for grabbing question and answer webpages |
CN104967698B (en) * | 2015-02-13 | 2018-11-23 | 腾讯科技(深圳)有限公司 | A kind of method and apparatus crawling network data |
CN104967698A (en) * | 2015-02-13 | 2015-10-07 | 腾讯科技(深圳)有限公司 | Network data crawling method and apparatus |
CN106547773A (en) * | 2015-09-21 | 2017-03-29 | 北京国双科技有限公司 | The method and device of adjustment event opening speed |
CN106557484A (en) * | 2015-09-25 | 2017-04-05 | 北京国双科技有限公司 | The update method and device of webpage thermodynamic Background |
CN106897127A (en) * | 2015-12-21 | 2017-06-27 | 北京奇虎科技有限公司 | A kind of method and server for picture capture treatment |
CN106897126A (en) * | 2015-12-21 | 2017-06-27 | 北京奇虎科技有限公司 | A kind of picture grasping means and server |
CN107102997A (en) * | 2016-02-22 | 2017-08-29 | 北京国双科技有限公司 | data crawling method and device |
CN106055638A (en) * | 2016-05-30 | 2016-10-26 | 国家基础地理信息中心 | Network geographic information updating method and network geographic information updating system |
CN106371830A (en) * | 2016-08-25 | 2017-02-01 | 北京量科邦信息技术有限公司 | Interactive method for realizing close and back control of native APP and WEB pages |
CN110020065A (en) * | 2017-07-19 | 2019-07-16 | 阿里巴巴集团控股有限公司 | A kind of website identification method and device |
CN110020065B (en) * | 2017-07-19 | 2023-04-25 | 阿里巴巴集团控股有限公司 | Website identification method and device |
CN108600342A (en) * | 2018-03-30 | 2018-09-28 | 连尚(新昌)网络科技有限公司 | A kind of message display method, equipment and storage medium |
CN108600342B (en) * | 2018-03-30 | 2020-01-10 | 连尚(新昌)网络科技有限公司 | Message display method, device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN100501746C (en) | 2009-06-17 |
Similar Documents
Publication | Title |
---|---|
CN100501746C (en) | Web page collecting method and web page collecting server |
US6954754B2 (en) | Apparatus and methods for managing caches on a mobile device |
CN100464308C (en) | Method and system for updating user vocabulary synchronously |
CN101334792B (en) | Personalized service recommendation system and method |
CN104182408B (en) | A kind of webpage offline access method and device |
CN110519401A (en) | Improve method, apparatus, equipment and the storage medium of network Access Success Rate |
CN102164186B (en) | Method and system for realizing cloud search service |
US10489476B2 (en) | Methods and devices for preloading webpages |
CN105095226A (en) | Method and apparatus for loading webpage resource |
CN101702173A (en) | Method and device for increasing access speed of mobile portal dynamic page |
JP2006511134A (en) | Method for automatically replicating data objects between a mobile device and a server |
CN102480397A (en) | Method and equipment for accessing internet pages |
CN104298790A (en) | Browser accelerating method and browser device with accelerator |
EP1512264B1 (en) | Communication system, mobile device and method for storing pages on a mobile device |
CN102819554A (en) | Favorite data processing method and device and server |
CN103701929A (en) | Method and device for realizing business data caching |
CN105468707A (en) | Cache-based data processing method and device |
CN102591887B (en) | Network data pre-head method and system |
CN103916474A (en) | Method, device and system for determining caching time |
CN103473326A (en) | Method and device providing searching advices |
CN103617278A (en) | Control method and device for address bar searching |
CN100489861C (en) | Data searching method, system and device |
CN102129437A (en) | Domain name matching method and browser |
CN101299854B (en) | Mobile terminal and data maintenance method thereof |
CN104348628A (en) | Method and device for obtaining local Root authority |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |