Summary of the invention
The objective of the invention is to address the above problems by providing a method for monitoring information content in a network service in which users publish information independently. The method meets the need of websites with large volumes of content updated in real time for quasi-real-time content monitoring and situational awareness. It solves the problem of ineffective supervision caused by staff shortages, as well as the problem that strict keyword filtering over-restricts what users can publish and thereby degrades the user experience.
Another object of the present invention is to provide a device for monitoring information content in a network service in which users publish information independently. The device likewise provides quasi-real-time content monitoring and situational awareness for websites with large volumes of real-time content, and solves the same problems of ineffective supervision due to staff shortages and of degraded user experience due to over-restrictive keyword filtering. Depending on which content is of concern, the present invention can also be used to scan the entire site automatically with a configured search strategy, retrieve and discover specified content of concern, and report it to the responsible personnel.
The technical solution of the present invention is as follows. The invention discloses a method for monitoring information content in a network service in which users publish information independently, comprising:
(1) automatically crawling and scanning, across the entire site, all web pages of a designated website or only its newly added pages;
(2) checking the web page content against a configured search strategy to obtain the content and page addresses that meet the requirements of that strategy;
(3) reporting the content and page addresses that meet the requirements of the search strategy through a predefined, selectable reporting mode.
In the above method, the search strategy in step (2) comprises any one or a combination of the following: keyword and expression matching on text-based web pages; matching based on identification of specific file-format signatures; image recognition of specific types applied to image and graphic files; and automatic machine recognition of the semantics of human language.
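As a minimal sketch, assuming each individual check can be modeled as a function that receives the raw bytes of a page or file and returns True on a match, a search strategy built from "any one or a combination" of checks becomes a list of such functions. `SearchStrategy`, `keyword_check` and `signature_check` are hypothetical names; image-recognition or semantic checks would plug in as further functions of the same shape. This sketch treats the strategy as triggered when any check fires; requiring several checks to fire together would be an equally plausible reading.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical check type: raw page/file bytes in, True when content of concern is found.
Check = Callable[[bytes], bool]

def keyword_check(keyword: str) -> Check:
    """Text check: fires when the UTF-8 encoded keyword occurs in the content."""
    needle = keyword.encode("utf-8")
    return lambda content: needle in content

def signature_check(magic: bytes) -> Check:
    """File-format check: fires when the content starts with the given signature."""
    return lambda content: content.startswith(magic)

@dataclass
class SearchStrategy:
    name: str
    checks: List[Check] = field(default_factory=list)

    def fired_checks(self, content: bytes) -> List[int]:
        """Indices of the checks that match, useful when building the report."""
        return [i for i, check in enumerate(self.checks) if check(content)]

    def is_triggered(self, content: bytes) -> bool:
        # In this sketch the strategy triggers as soon as any one check matches.
        return any(check(content) for check in self.checks)

strategy = SearchStrategy("demo", [keyword_check("forbidden"), signature_check(b"%PDF-")])
```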
In the above method, the predefined selectable reporting mode in step (3) includes reporting by e-mail, by telephone, by mobile phone and by instant messaging (IM) account.
In the above method, steps (1) and (2) further comprise:
(a) judging, according to the defined time conditions, whether the control condition of a periodic interval or a specific time is met, and if so, proceeding to step (b);
(b) dynamically filling the queue of page URLs awaiting scanning;
(c) judging whether any scannable URL object remains in the page URL queue; if so, extracting one URL object, otherwise proceeding directly to step (3);
(d) connecting and, through a header connection, checking basic information such as the availability of the page file of the URL object; if this information meets the configured test conditions, proceeding to the next step, otherwise marking the URL object as erroneous and recording it in the error queue;
(e) loading the web page content entity and performing a preliminary status check; if the check is passed, continuing to analyse the page content, otherwise marking the URL object as erroneous and recording it in the error queue;
(f) analysing the content entity of the page, performing URL check analysis and checking against the configured search strategy, where URL check analysis comprises adding usable URLs to the queue of URLs to be analysed, and marking URLs that fail the check conditions as discarded and recording them in the error queue;
(g) when the page file of the checked URL is a binary file containing no URL information, performing only the search-strategy check, which comprises matching the content entity against the search strategy; when a problem that triggers the search strategy is found, marking the URL and recording it in the queue of URLs to be alerted, while applying a watermark, a check-time mark and a status mark to the current page URL.
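Steps (b) to (g) above can be illustrated as a queue-driven loop. The sketch below is not the patented implementation: it assumes pages are served by a `fetch` function (a dictionary lookup here, an HTTP client in practice), treats any status other than 200 as unavailable, and leaves the time condition of step (a) to the caller. `Page`, `scan_site`, `strategy_hit` and `accept_url` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class Page:
    """Hypothetical in-memory stand-in for a fetched page."""
    status: int                                     # status code from the header check
    content: bytes                                  # page entity
    links: List[str] = field(default_factory=list)  # URLs found in the page
    binary: bool = False                            # image/audio etc.: no URL analysis

def scan_site(start_url: str,
              fetch: Callable[[str], Optional[Page]],
              strategy_hit: Callable[[bytes], bool],
              accept_url: Callable[[str], bool] = lambda url: True,
              max_pages: int = 1000) -> Tuple[List[str], List[str]]:
    """Run steps (b)-(g) once; step (a), the time condition, is checked by the caller."""
    queue = deque([start_url])           # (b) URL queue, grows as pages are analysed
    seen = {start_url}
    alert_queue: List[str] = []
    error_queue: List[str] = []
    scanned = 0
    while queue and scanned < max_pages:  # (c) is there still a URL to scan?
        url = queue.popleft()
        scanned += 1
        page = fetch(url)                 # (d)/(e) connect and load the page entity
        if page is None or page.status != 200:
            error_queue.append(url)       # unavailable: mark as error
            continue
        if not page.binary:               # (f) URL analysis, skipped for binary files (g)
            for link in page.links:
                if link in seen:
                    continue              # already queued or scanned
                seen.add(link)
                if accept_url(link):
                    queue.append(link)
                else:
                    error_queue.append(link)  # fails the check conditions: discard
        if strategy_hit(page.content):    # (f)/(g) search-strategy check
            alert_queue.append(url)
    return alert_queue, error_queue

site = {
    "/": Page(200, b"home", ["/a", "/b", "/img.png"]),
    "/a": Page(200, b"contains forbidden text", ["/", "/missing"]),
    "/b": Page(200, b"clean", ["/a"]),
    "/img.png": Page(200, b"\x89PNG...", binary=True),
}
alerts, errors = scan_site("/", site.get, lambda body: b"forbidden" in body)
```

Here `/a` lands in the alert queue and `/missing`, which the in-memory site cannot serve, lands in the error queue.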
The present invention also discloses a device for monitoring information content in a network service in which users publish information independently, comprising:
a full-site automatic crawling and scanning module, which automatically crawls and scans, across the entire site, all web pages of a designated website or its newly added pages;
a search-strategy checking module, which checks web page content against the configured search strategy and obtains the content and page addresses that meet the requirements of that strategy;
a selectable reporting module, which reports the content and page addresses that trigger the search strategy through a predefined, selectable reporting mode.
In the above device, the search strategy configured in the search-strategy checking module comprises any one or a combination of: keyword and expression matching on text-based web pages; matching based on identification of specific file-format signatures; image recognition of specific types applied to image and graphic files; and automatic machine recognition of the semantics of human language.
In the above device, the selectable reporting module comprises an e-mail reporting unit, a telephone reporting unit, a mobile-phone reporting unit and an instant messaging (IM) reporting unit.
Compared with the prior art, the present invention has the following beneficial effects. The invention automatically crawls and scans all web pages of a designated website, or its newly added pages, across the entire site; checks the page content against a configured search strategy; obtains the content and page addresses that meet the requirements of the strategy; and reports them through a predefined, selectable reporting mode. The invention thus solves the problem of ineffective supervision caused by staff shortages, as well as the problem that rigid, strict keyword filtering over-restricts what users can publish and degrades the user experience.
Embodiment
The invention is further described below with reference to the drawings and embodiments.
Fig. 1 shows the system topology for website content inspection and alerting according to the present invention. Referring to Fig. 1, the website inspection and alert center 10 is configured as follows. First, the user can create a series of monitoring scan tasks, each corresponding to one website or one start page. Second, when creating a task, the user sets its security strategy (for example keywords and expressions, features, or specific functional matching checks). Third, the websites to be inspected are checked cyclically by a background scanning program: a background scheduler in the system starts widely distributed scanning server nodes to carry out these tasks, and when content is found to trigger a strategy, the corresponding alert devices are started and notifications are sent to the designated user devices and addresses. Fourth, when a security strategy is triggered, the corresponding device types and addresses (telephone and number, MSN and account, e-mail and mailbox address, etc.) receive the notification.
Fig. 2 shows the flow of user information and task setup according to the present invention. Referring to Fig. 2, the system provides menus to manage user information, tasks, website scan logs, graphical statistics and so on. These supply the background scanning system with its monitoring targets and security strategies, and also configure, for the alert system, the numbers and addresses of information terminals such as e-mail, mobile phone and IM. These preconfigured terminals receive the alert when an alarm is triggered.
Fig. 3 shows the flow of a preferred embodiment of the method for monitoring information content in a network service in which users publish information independently. Referring to Fig. 3, each step of the method is described in detail below.
Step S30: automatically crawl and scan, across the entire site, all web pages of the designated website or its newly added pages.
All pages of the website, or its newly added pages, are scanned automatically across the whole site. The addresses of all pages in the designated website are accumulated by analysing the links contained in each page. The automatic crawl can find new pages, skip old pages, avoid infinite loops, avoid repeated scans within a short period, and so on. Because these functions can be implemented with a variety of known algorithms, they are not described further here.
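The original leaves this crawl bookkeeping to known algorithms. One simple possibility, sketched here under the assumption that pages are identified by URL with the fragment removed, is to remember when each URL was last scanned: an unknown URL is a new page, and a URL scanned within a minimum interval is skipped, which both prevents short-term repeat scans and stops link cycles from looping forever. `VisitTracker` and `min_rescan_s` are hypothetical names.

```python
import time
from typing import Dict, Optional
from urllib.parse import urldefrag

class VisitTracker:
    """Hypothetical helper deciding whether a URL should be scanned now."""

    def __init__(self, min_rescan_s: float) -> None:
        self.min_rescan_s = min_rescan_s
        self.last_scan: Dict[str, float] = {}  # normalized URL -> last scan time

    @staticmethod
    def normalize(url: str) -> str:
        # "/page" and "/page#top" are the same document, so drop the fragment.
        return urldefrag(url).url

    def is_new(self, url: str) -> bool:
        return self.normalize(url) not in self.last_scan

    def should_scan(self, url: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        key = self.normalize(url)
        last = self.last_scan.get(key)
        if last is not None and now - last < self.min_rescan_s:
            return False  # scanned recently: skip (this also breaks link cycles)
        self.last_scan[key] = now
        return True
```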
Step S32: check the web page content against the configured search strategy and obtain the content and page addresses that meet its requirements.
A search strategy is a specified policy for retrieving and discovering content of concern; one form of it is the security strategy described in this embodiment.
Search strategies (or security strategies) include, but are not limited to, any one or a combination of: keyword and expression matching on text-based web pages; matching based on identification of specific file-format signatures; image recognition of specific types applied to image and graphic files; and automatic machine recognition of the semantics of human language.
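Matching based on specific file-format signatures can be illustrated by inspecting the first bytes of a file (its magic number) instead of trusting the URL's extension, so that, for example, a renamed archive is still recognized. The signature table below lists a few well-known formats and is deliberately incomplete; `sniff_format` and `format_check` are hypothetical helpers.

```python
from typing import Callable, Optional, Set

# A few well-known file signatures ("magic numbers"); a real table would be much larger.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip",
}

def sniff_format(data: bytes) -> Optional[str]:
    """Return the format name whose signature prefixes the data, or None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None

def format_check(watched: Set[str]) -> Callable[[bytes], bool]:
    """Hypothetical check: fires when the file's real format is one the strategy watches."""
    return lambda data: sniff_format(data) in watched
```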
Step S34: report the content and page addresses that meet the requirements of the search strategy through a predefined, selectable reporting mode.
The reporting mode is set in advance by the user; for example, it can be set to e-mail, telephone, mobile phone or IM reporting.
Fig. 4 shows a further refinement of steps S30, S32 and S34. Referring to Fig. 4, each step is described in detail below.
Step S400: judge whether the time condition is met; if so, go to step S401, otherwise the flow ends.
After the user has set up a monitoring scan task, the scanning system judges, according to the defined time conditions, whether the control condition of a periodic interval or a specific time is met. These conditions include the scan frequency, the scan scope and the scan mode, and they prevent repeated scans within a short period from occupying bandwidth and the resources of the monitored server system.
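A minimal sketch of the time condition of step S400, assuming a task is due when a minimum interval has passed since its last run and, optionally, the current time of day falls inside an allowed window (for example, off-peak hours to save bandwidth). `scan_due` and its parameters are hypothetical; windows that cross midnight are not handled here.

```python
from datetime import datetime, time, timedelta
from typing import Optional, Tuple

def scan_due(now: datetime,
             last_run: Optional[datetime],
             interval: timedelta,
             window: Optional[Tuple[time, time]] = None) -> bool:
    """Return True if a task may be scanned at `now` (hypothetical S400 rule)."""
    if window is not None:
        start, end = window
        if not start <= now.time() <= end:
            return False  # outside the allowed time of day
    if last_run is None:
        return True       # the task has never been scanned
    return now - last_run >= interval  # has enough time passed since the last scan?
```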
Step S401: dynamically load the queue of page URLs to be checked.
The queue of page URLs awaiting scanning is filled dynamically: as each page is downloaded and analysed, the number of URL objects to be scanned keeps growing.
Step S402: judge whether any scannable URL object remains in the scan queue; if so, go to step S403, otherwise go to step S417.
Step S403: extract one URL object from those that can be scanned, and begin the subsequent download and scan steps.
Step S404: connect and, through a header (HEADER) request, check basic information such as the availability of the URL's page file (including the success or error code and any entity information). If the detection conditions are met, go to step S405; otherwise go to step S409 and return to step S402.
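The header check of step S404 can be performed with an HTTP `HEAD` request, which returns the status code and headers without the page body. The sketch below uses Python's standard `urllib.request`; the acceptance rule in `header_acceptable` (a 2xx status and a size limit) and `carries_links`, which decides from the content type whether a page should be parsed for links at all, are assumptions rather than rules stated in the original.

```python
import urllib.error
import urllib.request
from typing import Mapping, Tuple

def head_request(url: str, timeout: float = 10.0) -> Tuple[int, Mapping[str, str]]:
    """Send an HTTP HEAD request and return (status, headers) without the body."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.headers
    except urllib.error.HTTPError as err:
        return err.code, err.headers  # 4xx/5xx still carry a status and headers

def header_acceptable(status: int, headers: Mapping[str, str],
                      max_bytes: int = 5_000_000) -> bool:
    """Hypothetical availability rule applied to the HEAD response."""
    if not 200 <= status < 300:
        return False  # error codes (404, 500, ...) go to the error queue
    length = headers.get("Content-Length")
    if length is not None and length.isdigit() and int(length) > max_bytes:
        return False  # too large to download and analyse
    return True

def carries_links(content_type: str) -> bool:
    """Only HTML-like pages are parsed for links; images, audio, etc. are not."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime in ("text/html", "application/xhtml+xml")
```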
Step S405: load the web page content entity.
Step S406: perform a preliminary status check. If the page entity is in a state that can be analysed, go to steps S407 and S410 to continue analysing the page content; otherwise go to step S409 and return to step S402.
Step S407: analyse the URLs contained in the page.
Step S408: check every URL in the page, one by one, to judge whether it meets the conditions for entering the queue to be scanned (usable URLs include URLs not yet analysed in the queue to be scanned, newly added URLs, and URLs of a type to be analysed), and discard or load each URL according to the result. If a URL meets the conditions for entering the queue, go to step S413, otherwise go to step S412.
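Steps S407 and S408 can be sketched with the standard `html.parser` module: collect the `href` of every `<a>` tag, resolve it against the page URL, and keep only URLs that belong to the designated site, have not been seen before and are of a type worth analysing. The same-host rule and the skipped file suffixes are assumptions standing in for the unspecified conditions for entering the queue; `LinkExtractor` and `filter_links` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Set
from urllib.parse import urldefrag, urljoin, urlparse

class LinkExtractor(HTMLParser):
    """S407: collect the href of every <a> tag in the page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

# Assumed rule for "not a type to be analysed".
SKIPPED_SUFFIXES = (".exe", ".iso")

def filter_links(page_url: str, html: str, seen: Set[str]) -> List[str]:
    """S408: return same-site, not-yet-seen http(s) URLs; all others are discarded."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    host = urlparse(page_url).netloc
    accepted: List[str] = []
    for href in parser.hrefs:
        url = urldefrag(urljoin(page_url, href)).url
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or parts.netloc != host:
            continue  # outside the designated website
        if url in seen or parts.path.lower().endswith(SKIPPED_SUFFIXES):
            continue  # already queued or analysed, or a type we skip
        seen.add(url)
        accepted.append(url)
    return accepted
```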
Step S409: mark as discarded. This data can be referenced by step S412 so that invalid pages are skipped in the subsequent S407 loop check.
When the checked page file is a binary file that contains no URL information, such as an image, picture or audio file, steps S407 to S409 are skipped.
Step S410: analyse the content of the page.
Step S411: judge, according to the configured security strategy, whether an alert or alert keyword is found. If so, go to step S414, otherwise go to step S415.
This includes checking the content for keywords and expressions, or performing substantive content checks (such as recognition of specific graphics and images, or intelligent machine recognition of language) through other functional modules.
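The keyword and expression check of steps S410 and S411 might look like the following, assuming keywords are matched case-insensitively and expressions are regular expressions. `find_alert_terms` is a hypothetical helper; a non-empty result corresponds to the alert-found branch of step S411.

```python
import re
from typing import List, Sequence

def find_alert_terms(text: str, keywords: Sequence[str], patterns: Sequence[str]) -> List[str]:
    """Return every keyword or regex match found in the page text (S410/S411)."""
    lowered = text.lower()
    # Keywords: plain, case-insensitive substring matches.
    hits = [kw for kw in keywords if kw.lower() in lowered]
    # Expressions: every regex match is reported verbatim.
    for pattern in patterns:
        hits.extend(m.group(0) for m in re.finditer(pattern, text))
    return hits
```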
Step S412: reference the data, then go to step S407.
Step S413: load the URL into the array to be scanned, then go to step S401.
Step S414: fill the alert array and mark it accordingly, then go to steps S415 and S416.
Step S415: apply a watermark to the URL and record it in sequence, then go to step S412.
Step S416: reference the data, then go to step S417.
Step S417: drive the alert module to issue the alert; the flow ends.
The start-up and operation of the whole program are handled by scheduling management based on time, tasks and targets; control and start-up of each task are performed by a sequential scheduling program that runs outside this program.
Fig. 5 shows a further refinement of step S34 in the embodiment of Fig. 3. Each step is described in detail below.
Step S500: reference the data to be alerted and the alert-target data.
Step S501: judge whether the alert matches telephone voice alerting; if so, go to step S502.
Step S502: drive the device to send the alert, then go to step S503.
Step S503: check the success status; if successful, the flow ends, otherwise return to step S502.
Step S504: judge whether the alert matches SMS alerting; if so, go to step S505.
Step S505: drive the SMS platform to send the alert.
Step S506: check the success status; if successful, the flow ends, otherwise return to step S505.
Step S507: judge whether the alert matches e-mail alerting; if so, go to step S508.
Step S508: the mail server sends the alert.
Step S509: check the success status; if successful, the flow ends, otherwise return to step S508.
Step S510: judge whether the alert matches IM alerting; if so, go to step S511.
Step S511: the IM platform sends the alert.
Step S512: check the success status; if successful, the flow ends, otherwise return to step S511.
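The Fig. 5 flow can be sketched as a dispatcher that, for each channel configured for the alert target, calls the matching sender and resends until delivery succeeds. Two assumptions are made: every configured channel receives the alert (the flowchart could also be read as stopping after the first channel that succeeds), and retries are capped by `max_attempts` so that a permanently failing channel cannot loop forever. `dispatch_alert` and the sender functions are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical sender type: (address, message) -> True if delivery succeeded.
Sender = Callable[[str, str], bool]

def dispatch_alert(message: str,
                   targets: Dict[str, str],
                   senders: Dict[str, Sender],
                   max_attempts: int = 3) -> Dict[str, bool]:
    """Send one alert over every configured channel, retrying each on failure.

    targets maps a channel name ("phone", "sms", "email", "im") to an address;
    senders maps the same names to driver functions.
    """
    results: Dict[str, bool] = {}
    for channel, address in targets.items():
        send = senders.get(channel)
        if send is None:
            continue  # no matching alert channel: skip it
        delivered = False
        for _ in range(max_attempts):  # resend until success, bounded here
            if send(address, message):
                delivered = True
                break
        results[channel] = delivered
    return results

calls = []

def flaky_sms(address: str, message: str) -> bool:
    calls.append(address)
    return len(calls) >= 2  # fails once, then succeeds

def email(address: str, message: str) -> bool:
    return True

results = dispatch_alert("page hit", {"sms": "+100", "email": "a@b.c", "fax": "1"},
                         {"sms": flaky_sms, "email": email})
```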
Fig. 6 shows the device for monitoring information content in a network service in which users publish information independently according to the present invention. Referring to Fig. 6, the device comprises a full-site automatic crawling and scanning module 60, a search-strategy checking module 62 and a selectable reporting module 64. The whole device consists of servers and gateway devices with different functions, and is built with multithreaded, modular and distributed software programming, integrated database technology, speech synthesis technology, telecommunication technology, load-balancing technology and so on.
The full-site automatic crawling and scanning module 60 automatically crawls and scans, across the entire site, all web pages of the designated website or its newly added pages. The search-strategy checking module 62 automatically checks the page content against the configured search strategy and obtains the content and page addresses that meet its requirements. The search strategy configured in module 62 comprises any one or a combination of: keyword and expression matching on text-based web pages; matching based on identification of specific file-format signatures; image recognition of specific types applied to image and graphic files; and automatic machine recognition of the semantics of human speech. The selectable reporting module 64 reports the content and page addresses that trigger the search strategy through a predefined, selectable mode, and can comprise an e-mail reporting unit, a telephone reporting unit, a mobile-phone reporting unit and an IM reporting unit.
The above embodiments are provided to enable those of ordinary skill in the art to make or use the present invention. Those of ordinary skill in the art may make various modifications or variations to the above embodiments without departing from the inventive idea of the present invention; therefore, the scope of protection of the present invention is not limited by the above embodiments, but should be the widest scope consistent with the inventive features recited in the claims.