CN103870546B - Analysis method and device for comparing online and offline environment pages after transcoding - Google Patents


Publication number
CN103870546B
Authority
CN (China)
Prior art keywords
html page
online
offline
access
dom tree
Legal status
Active
Application number
CN201410066929.4A
Other languages
Chinese (zh)
Other versions
CN103870546A (en)
Inventor
王峰
邹静
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Events
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201410066929.4A
Publication of CN103870546A
Application granted
Publication of CN103870546B


Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F16/00: Information retrieval; database structures therefor; file system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/957: Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577: Optimising the visualization of content, e.g. distillation of HTML documents

Abstract

An analysis method and device for comparing online and offline environment pages after transcoding are provided. The method includes: obtaining and storing configuration files; obtaining, according to the stored configuration files, the HTML pages accessed online and offline; obtaining the differences between the online and offline HTML pages by comparing their DOM trees; scoring the similarity between the online and offline HTML pages according to the obtained differences; highlighting the changes between the online and offline HTML pages according to the obtained differences; and presenting the highlighted online and offline HTML pages together with the scoring results.

Description

Analysis method and device for comparing online and offline environment pages after transcoding
Technical field
The present invention relates to comparing page effects after transcoding and, more particularly, to an analysis method and device for comparing pages in online and offline environments after transcoding.
Background art
In the past, web pages were designed mainly for fixed terminals such as computers. Mobile terminals such as smart terminals (for example, smartphones) can now also browse web pages, but because of their screen sizes or system limitations, not every mobile terminal can display every web page well. A transcoder is therefore used to transcode original web pages designed for computers so that they fit the display screens of various mobile terminals.
Because web pages on the Internet vary widely and their types and forms are too numerous to enumerate, transcoded pages are tested in the prior art manually: a tester compares two web pages one by one by eye to find the differences. Because of human subjectivity, the categories of pages tested also tend to be incomplete, which greatly reduces the reference value of the test results.
Summary of the invention
The invention therefore provides an analysis method for comparing online and offline environment pages after transcoding. The method includes: obtaining and storing configuration files; obtaining, according to the stored configuration files, the HTML pages accessed online and offline; obtaining the differences between the online and offline HTML pages by comparing their DOM trees; scoring the similarity between the online and offline HTML pages according to the obtained differences; highlighting the changes between the online and offline HTML pages according to the obtained differences; and presenting the highlighted online and offline HTML pages and the scoring results.
Preferably, obtaining the HTML pages accessed online and offline may include: splicing the URLs for online and offline access according to the stored configuration files; and obtaining, according to the spliced URLs, the online and offline HTML pages after JavaScript has been executed.
Preferably, the URL may be executed by a simulated browser kernel program, thereby generating the HTML page after JavaScript execution.
Preferably, obtaining the differences between the online and offline HTML pages may include: preprocessing the online and offline HTML pages; and comparing the DOM trees of the preprocessed online and offline HTML pages.
Preferably, comparing the DOM trees of the preprocessed online and offline HTML pages may include: obtaining the DOM trees of the online and offline HTML pages; traversing each DOM tree to obtain the content of each tag; comparing, tag by tag, the content obtained from the online and offline DOM trees; and presenting the comparison result in the form of a hash array.
The invention also provides an analysis device for comparing online and offline environment pages after transcoding. The analysis device may include: a configuration file acquiring unit, which obtains and stores configuration files; an HTML page acquiring unit, which obtains the online and offline HTML pages according to the stored configuration files; an HTML page difference calculating unit, which obtains the differences between the online and offline HTML pages by comparing their DOM trees; a scoring unit, which scores the similarity between the online and offline HTML pages according to the obtained differences; a highlighting unit, which highlights the changes between the online and offline HTML pages according to the obtained differences; and a presenting unit, which presents the highlighted online and offline HTML pages and the scoring results.
Preferably, the HTML page acquiring unit may include: a URL splicing subunit, which splices the URLs for online and offline access according to the stored configuration files; and an HTML page obtaining subunit, which obtains, according to the spliced URLs, the online and offline HTML pages after JavaScript execution.
Preferably, the HTML page obtaining subunit may execute the URL with a simulated browser kernel program, thereby generating the HTML page after JavaScript execution.
Preferably, the HTML page difference calculating unit may include: an HTML page preprocessing unit, which preprocesses the online and offline HTML pages; and a DOM tree comparison unit, which compares the DOM trees of the preprocessed online and offline HTML pages.
Preferably, the DOM tree comparison unit may compare the DOM trees of the preprocessed online and offline HTML pages by obtaining the DOM trees of the online and offline HTML pages, traversing the DOM trees to obtain the content of each tag, comparing the tag contents obtained from the online and offline DOM trees, and presenting the comparison result in the form of a hash array.
Additional aspects and/or advantages of the present invention will be set forth in part in the description that follows, will in part be apparent from the description, or may be learned by practice of the invention.
Brief description of the drawings
The above and other objects of the present invention will become clearer from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of the analysis method for comparing online and offline environment pages after transcoding according to an embodiment of the present invention;
Fig. 2 shows the detailed process of obtaining the online and offline HTML pages according to the stored configuration files, according to an embodiment of the present invention;
Fig. 3 shows the detailed process of obtaining the differences between the online and offline HTML pages according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a DOM tree;
Fig. 5 is a schematic diagram of the text, branch, img, and link arrays obtained by traversing a DOM tree;
Fig. 6 is a schematic diagram of comparing the online DOM tree with the offline DOM tree;
Fig. 7 is a schematic diagram of the data arrays (Branch, Image, Link) obtained after comparison;
Fig. 8 is a schematic diagram of the data array (Text) obtained after comparison;
Fig. 9 shows the hash array after comparison of the Branch, Text, Image, and Link arrays;
Fig. 10 is a schematic diagram of scoring each node;
Fig. 11 is a schematic diagram of an HTML page with the differing content highlighted;
Fig. 12 is a schematic diagram of presenting all results;
Fig. 13 is a block diagram of the structure of the analysis device for comparing online and offline environment pages after transcoding according to an embodiment of the present invention.
Detailed description of embodiments
Fig. 1 is a flowchart of the analysis method for comparing online and offline environment pages after transcoding according to an embodiment of the present invention.
As shown in Fig. 1, in S101, configuration files are obtained and stored.
Specifically, the configuration files may include a weight configuration file for page scoring, a page type configuration file, other configuration files required for running (for example, the online and offline machine names, and the choice of build framework: a site-building framework, a cloud-reading framework, or a simple transcoding framework), and related configuration files such as the e-mail addresses to which results are sent.
These configuration files may be stored (for example, placed) in one directory (for example, a conf directory), so that they can be obtained when needed by reading that directory.
For example, because the offline test machines are not fixed, and the various parameters in the uniform resource locator (URL) of the transcoding service are not fixed during testing, the online and offline machine configuration files are extracted and placed in a predetermined directory. When needed, required parameters can be added in this directory, such as nocache (bypass the cache) and the wdebug parameter for inspecting page modules.
Then, in S102, the online and offline HTML pages are obtained according to the stored configuration files.
As shown in Fig. 2, S102 may include sub-steps S1021 and S1022.
In S1021, the URLs for online and offline access are spliced according to the stored configuration files.
For example, the online and offline URLs can be spliced by reading the URL-related configuration of the online and offline machines from the stored configuration files. Splicing means joining two strings to form one new string.
For example, URL = "test environment IP" + "/" + "parameter 1" + "&" + "parameter 2" + "&" + "..." + "&src=" + "PCURL" is a spliced URL.
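Splicing, as described, is plain string concatenation. The following Python sketch is illustrative only: the function name `splice_url`, the sample hosts and parameters, and the decision to percent-encode the PC URL are assumptions rather than details from the description.

```python
from urllib.parse import quote


def splice_url(host: str, params: list[str], pc_url: str) -> str:
    """Build host + "/" + p1 + "&" + p2 + ... + "&src=" + PC URL.

    host and params are assumed to come from the online or offline
    machine configuration file (hypothetical inputs).
    """
    query = "&".join(params)
    # Assumption: percent-encode the original PC URL so that its own
    # "?" and "&" characters are not mistaken for transcoder parameters.
    return host + "/" + query + "&src=" + quote(pc_url, safe="")


# The same PC page is spliced once per environment. Here the first
# parameter carries the "?" itself, which is also an assumption.
pc_page = "http://www.example.com/a?b=1"
online_url = splice_url("http://10.0.0.1:8080", ["?nocache=1", "wdebug=1"], pc_page)
offline_url = splice_url("http://10.0.0.2:8080", ["?nocache=1", "wdebug=1"], pc_page)
```

Only the host differs between the two calls, which matches the idea of keeping the machine-specific parts in the configuration files.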
In S1022, the online and offline HTML pages after JavaScript execution are obtained according to the spliced URLs.
Specifically, in an embodiment of the present invention, the URL is executed by a simulated browser kernel program, thereby generating the HTML page after JavaScript execution.
Returning to Fig. 1, in S103, the differences between the online and offline HTML pages are obtained by comparing their DOM trees.
As shown in Fig. 3, S103 may include sub-steps S1031 and S1032.
In S1031, the online and offline HTML pages are preprocessed.
The preprocessing may be performed by removing useless characters from the HTML page, for example: removing all whitespace and escape characters from the text; filtering out all control characters; processing timg by removing host, ip, sec, and di; and filtering out small text.
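The preprocessing rules can be sketched in Python as below. This is a minimal sketch under stated assumptions: the helper names are hypothetical, "escape characters" are approximated by whitespace plus ASCII control characters, timg images are recognized simply by the substring `timg` in the URL, host/ip/sec/di are treated as query parameters, and the small-text threshold `MIN_TEXT_LEN` is an invented value.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these query parameters vary between environments, so drop them.
VOLATILE_TIMG_PARAMS = {"host", "ip", "sec", "di"}
# Hypothetical threshold: text shorter than this counts as "small text".
MIN_TEXT_LEN = 2


def clean_text(text: str) -> str:
    """Remove all whitespace, then any remaining ASCII control characters."""
    text = re.sub(r"\s+", "", text)
    return re.sub(r"[\x00-\x1f\x7f]", "", text)


def clean_timg(src: str) -> str:
    """Drop host/ip/sec/di query parameters from URLs that mention timg."""
    if "timg" not in src:
        return src
    parts = urlsplit(src)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k not in VOLATILE_TIMG_PARAMS]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def keep_text(text: str) -> bool:
    """Return False for small text fragments that should be filtered out."""
    return len(text) >= MIN_TEXT_LEN
```

Normalizing both pages the same way means that differences caused only by environment-specific values (such as an image server's host or IP) do not show up later as content changes.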
In S1032, the DOM trees of the preprocessed online and offline HTML pages are compared.
Specifically, the DOM trees of the online and offline HTML pages are first obtained. Fig. 4 is a schematic diagram of a DOM tree.
Then, each DOM tree is traversed to obtain the content of each tag. These tags include, for example, text, title, style, div, img, and a tags.
For example: from text tags, a text array is obtained, an n×3 array containing the node number, the node information (text), and the text length of the node; from the title tag, the title is obtained; from style tags, CSS information is obtained, including the CSS keys and values; from div tags, an n×3 branch array describing whether branches are folded is obtained, containing the folding number, the href value corresponding to the folding point, and the text content of the fold; from img tags, an n×3 img array is obtained, containing the img number, the img src, and a length of -1; and from a tags, img and link arrays are obtained, where the link array is n×3 and contains the link number, the src corresponding to the link, and a length of -1. Fig. 5 is a schematic diagram of the text, branch, img, and link arrays obtained by traversing a DOM tree.
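A rough sketch of the traversal using Python's standard `html.parser` follows. It builds only the text, title, img, and link arrays in the n×3 row layouts described above; the class name, the row tuples, and the skipping of script/style content are assumptions, and the branch (div folding) and CSS arrays are not reproduced.

```python
from html.parser import HTMLParser


class TagContentCollector(HTMLParser):
    """Collect per-tag content as rows: text -> (index, text, length),
    img -> (index, src, -1), link -> (index, href, -1)."""

    def __init__(self) -> None:
        super().__init__()
        self.text: list[tuple[int, str, int]] = []
        self.img: list[tuple[int, str, int]] = []
        self.link: list[tuple[int, str, int]] = []
        self.title = ""
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            self.img.append((len(self.img), attrs.get("src") or "", -1))
        elif tag == "a":
            self.link.append((len(self.link), attrs.get("href") or "", -1))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        data = data.strip()
        if not data or self._skip_depth:
            return
        if self._in_title:
            self.title += data
        else:
            self.text.append((len(self.text), data, len(data)))


def collect(page: str) -> TagContentCollector:
    """Parse one HTML page and return the filled collector."""
    collector = TagContentCollector()
    collector.feed(page)
    collector.close()
    return collector
```

The same `collect` call would be run once on the online page and once on the offline page, producing two sets of arrays to compare.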
Next, the tag contents obtained from the online and offline DOM trees are compared tag by tag. For example, the online and offline branch arrays, text arrays, img arrays, and link arrays are compared respectively. Fig. 6 is a schematic diagram of comparing the online DOM tree with the offline DOM tree.
For example, comparing the branch, img, and link arrays yields a data array (the data) and a data_diff array (the data differences). The data array contains the number of branch, img, or link nodes online; the number of such nodes offline; the number of nodes identical online and offline; the number of nodes that differ between online and offline; the number of nodes added online relative to offline; and the number of nodes removed online relative to offline. Fig. 7 is a schematic diagram of the data arrays (Branch, Image, Link) obtained after comparison. In addition, the data_diff array contains the data array, an n×3 array of the online branch, img, or link node information, an n×3 array of the offline node information, an array of identical node information, an array of differing node information, an array of node information added online relative to offline, and an array of node information removed online relative to offline.
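The counts in the data array and the lists in the data_diff array can be approximated with multiset arithmetic, as in the hypothetical Python sketch below. It compares nodes by their content string (for example an img src) and treats a node as "different" when it occurs on only one side; the description does not specify the matching rule, so both choices are assumptions.

```python
from collections import Counter


def compare_nodes(online: list[str], offline: list[str]) -> tuple[dict, dict]:
    """Return (data, data_diff) for one branch, img or link array pair."""
    on, off = Counter(online), Counter(offline)
    same = on & off        # multiset intersection: present on both sides
    added = on - off       # present online only
    removed = off - on     # present offline only
    data = {
        "online": len(online),
        "offline": len(offline),
        "same": sum(same.values()),
        "different": sum(added.values()) + sum(removed.values()),
        "added": sum(added.values()),
        "removed": sum(removed.values()),
    }
    data_diff = {
        "same": sorted(same.elements()),
        "added": sorted(added.elements()),
        "removed": sorted(removed.elements()),
    }
    return data, data_diff


data, data_diff = compare_nodes(["a.png", "b.png", "c.png"],
                                ["a.png", "c.png", "d.png", "e.png"])
```

Using `Counter` rather than sets keeps duplicate nodes (for example the same icon used twice) from being collapsed into one.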
Likewise, for example, comparing the text arrays yields a data array and a data_diff array. Here, the data array contains the online text size, the offline text size, the size of identical text, the size of differing text, the text size added online relative to offline, the text size removed online relative to offline, and the online text coverage, where online text coverage = (same_size + chang_size) / total_size_online. The data_diff array has the same structure as the data_diff array of the branch array. Fig. 8 is a schematic diagram of the data array (Text) obtained after comparison.
Finally, the comparison result is presented in the form of a hash array, i.e., the results of comparing each tag in the DOM trees of the online and offline HTML pages are presented as a hash array. Fig. 9 shows the hash array after comparison of the Branch, Text, Image, and Link arrays.
Returning to Fig. 1, in S104, the similarity between the online and offline HTML pages is scored according to the obtained differences.
Specifically, each node is traversed and scored according to the hash result, priority, level number, and other conditions, and the node scores are aggregated to calculate a total score. Fig. 10 is a schematic diagram of scoring each node.
The scoring process is described below by way of example.
The scoring rules for nodes in the branch, img, and link arrays are as follows:
$score_branch = ($same_score + $change_score) * 100 / $total_score
$change_score = 1 - $change[$i] / $max_offset * 0.5
$same_score: the number of identical nodes
$change[$i]: the offset of the i-th node
$max_offset: the absolute value of (the number of differing nodes online minus the number of differing nodes offline)
$total_score: the number of nodes
$score_img: the score of the img array (computed in the same way)
$score_link: the score of the link array (computed in the same way)
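The branch/img/link rule can be written as a small Python function. This sketch assumes the per-node term 1 - $change[$i] / $max_offset * 0.5 is summed over the changed nodes (the text rules below do this explicitly) and makes its own choices, flagged in the comments, for an empty array and for $max_offset = 0, neither of which the description covers.

```python
def node_score(same_count: int, offsets: list[float], total_count: int,
               max_offset: float) -> float:
    """Percentage-style score for a branch, img or link array.

    offsets holds $change[$i] for each changed node.
    """
    if total_count == 0:
        return 100.0  # assumption: two empty arrays count as identical
    if max_offset == 0:
        # Not covered by the source; this sketch scores each changed node as 1.
        change_score = float(len(offsets))
    else:
        change_score = sum(1 - off / max_offset * 0.5 for off in offsets)
    return (same_count + change_score) * 100 / total_count


# 8 identical nodes and 2 changed nodes with offsets 2 and 4, out of 10 nodes:
# (8 + 0.75 + 0.5) * 100 / 10 = 92.5
score_branch = node_score(8, [2, 4], 10, 4)
```

A changed node therefore contributes between 0.5 (offset equal to $max_offset) and 1 (offset 0), so shifted content is penalized less than missing content.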
The scoring rules for nodes in the text array are as follows:
Calculating the weight of each text node:
$weigth_text_node[$i] = $length_text_node[$i] * $total_text_num / $total_text_size
$total_text_size: the sum of the text lengths of all nodes
$total_text_num: the total number of nodes
$weigth_text_node[$i]: the weight of the text of the i-th node
$length_text_node[$i]: the length of the i-th node
Calculating the score of identical nodes:
$same_text_score: the sum of the weights of the identical nodes
Calculating the score of changed nodes:
$change_text_score[$i] = 1 * $weigth_text_node[$i] * $off_text_set[$i]
$change_text_score[$i]: the score of the i-th node, i.e., the weight of the current node multiplied by the offset factor of the current node
$change_text_score = $change_text_score[0] + $change_text_score[1] + ... + $change_text_score[n]
$change_text_score: the sum of the scores of all changed nodes
$off_text_set[$i] = 1 - $offset_text_change[$i] / $max_text_offset * 0.5
$off_text_set[$i]: the offset factor of the i-th node
$offset_text_change[$i]: the offset of the i-th node
$max_text_offset: the maximum of the online and offline text amounts
$weigth_text_node[$i]: the weight of the i-th node, calculated above
Calculating the other score ($other_text_score): traverse the add array of nodes added online and the sub array of nodes removed offline, and check whether entries of the two can be matched. For each successful match, $other_text_score += (weight of the node) * (length of the node online) / (length of the node offline).
Calculating the total score: the score of the same (identical) nodes + the score of the change (changed) nodes + the other score:
$total_text_score = ($same_text_score + $change_text_score + $other_text_score) * 100 / $total_text_num
$total_text_num: the number of text nodes.
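Putting the text rules together, a hypothetical Python sketch might look like this. Node indices refer to online text nodes, and the classification of nodes into same, changed, and matched add/sub pairs is assumed to have been done by the comparison step. Because the weights sum to $total_text_num, a page whose text nodes are all identical scores 100.

```python
def text_score(lengths: list[int], same: set[int], changed: dict[int, float],
               matched: dict[int, int], max_text_offset: float) -> float:
    """Compute $total_text_score for one page.

    lengths: text length of every text node ($length_text_node)
    same:    indices of nodes identical online and offline
    changed: index -> offset ($offset_text_change) for changed nodes
    matched: index -> offline length, for add-array nodes matched to a sub-array node
    """
    total_num = len(lengths)        # $total_text_num
    total_size = sum(lengths)       # $total_text_size
    # $weigth_text_node[$i]: longer text nodes get proportionally more weight
    weight = [n * total_num / total_size for n in lengths]
    same_score = sum(weight[i] for i in same)
    # $change_text_score: weight times $off_text_set[$i]
    change_score = sum(weight[i] * (1 - off / max_text_offset * 0.5)
                       for i, off in changed.items())
    # $other_text_score: weight times online length / offline length
    other_score = sum(weight[i] * lengths[i] / off_len
                      for i, off_len in matched.items())
    return (same_score + change_score + other_score) * 100 / total_num


# Weights are [0.4, 1.2, 0.8, 1.6]; score = (1.6 + 0.6 + 0.8) * 100 / 4 = 75
example = text_score([10, 30, 20, 40], {0, 1}, {2: 5}, {3: 80}, 10)
```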
The total score of a page is calculated as follows:
The branch score multiplied by its weight + the text score multiplied by its weight + the image score multiplied by its weight + the link score multiplied by its weight, divided by the sum of the weights:
$html_score = ($score_branch * $weigth_branch + $score_img * $weigth_img + $score_link * $weigth_link + $total_text_score * $weigth_text) / ($weigth_branch + $weigth_img + $weigth_link + $weigth_text)
The values of $weigth_branch, $weigth_img, $weigth_link, and $weigth_text come from the page weight table in the configuration file. An example page weight table is shown below:
In the page weight table above, the first column is the page type, such as the default type, home page, or URL page; the second column is the branch weight, the third the link weight, the fourth the image weight, and the fifth the text weight. The weights differ by page type. For the PAGE_TYPE_TEXT type, TEXT has the largest share, 60, because for text-type pages the comparison is mainly about the content of the text tags.
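The page score is a weighted average, sketched here in Python. Apart from the text weight of 60 for PAGE_TYPE_TEXT, the numbers in the hypothetical `PAGE_WEIGHTS` table are invented placeholders, since the page weight table itself is not reproduced.

```python
# Hypothetical page weight table. Only "text": 60 for PAGE_TYPE_TEXT comes
# from the description; the other weights are made-up placeholders.
PAGE_WEIGHTS = {
    "PAGE_TYPE_TEXT": {"branch": 20, "link": 10, "img": 10, "text": 60},
}


def page_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """$html_score: weighted average of the branch, img, link and text scores."""
    total_weight = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total_weight


html_score = page_score({"branch": 90, "img": 100, "link": 80, "text": 70},
                        PAGE_WEIGHTS["PAGE_TYPE_TEXT"])
```

With these placeholder weights, the scores above give (1800 + 1000 + 800 + 4200) / 100 = 78. Dividing by the weight sum keeps the page score on the same 100-point scale as the individual scores, whatever weights a page type uses.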
In S105, the changes between the online and offline HTML pages are highlighted according to the obtained differences.
Using the previously obtained data and data_diff arrays, the differences between the online and offline HTML pages can be highlighted in color so that they are easy to see.
Specifically, the obtained online HTML page is traversed, and in it the nodes in the change array and the add array are located. Once found, this content is displayed in, for example, yellow, and the yellow content is written into a new HTML page, finally yielding a new HTML page in which the change and add nodes are highlighted.
The obtained offline HTML page is traversed, and in it the nodes in the change array and the add array are located. Once found, this content is displayed in, for example, pink, and the pink content is written into a new HTML page, finally yielding a new HTML page in which the change and add nodes are highlighted.
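One way to realize the color marking is to re-emit the page and wrap the matching text nodes in a colored span, as in this Python sketch. It is an assumption-laden illustration: the class and function names are hypothetical, nodes are matched by their stripped text, script/style content is copied verbatim, and processing instructions are not handled.

```python
import html
from html.parser import HTMLParser


class Highlighter(HTMLParser):
    """Re-emit a page, wrapping text nodes whose stripped text is in targets."""

    def __init__(self, targets: set[str], color: str) -> None:
        super().__init__(convert_charrefs=True)
        self.targets = targets
        self.color = color
        self.out: list[str] = []
        self._raw_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())
        if tag in ("script", "style"):
            self._raw_depth += 1

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._raw_depth:
            self._raw_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._raw_depth:
            self.out.append(data)  # script/style source is copied unchanged
            return
        text = html.escape(data, quote=False)
        if data.strip() in self.targets:
            text = f'<span style="background-color:{self.color}">{text}</span>'
        self.out.append(text)

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def highlight(page: str, targets: set[str], color: str) -> str:
    """Return a new HTML page with the target text nodes colored."""
    parser = Highlighter(targets, color)
    parser.feed(page)
    parser.close()
    return "".join(parser.out)
```

Following the description, the online page would be passed with the color "yellow" and the offline page with "pink", each with the change and add node texts as targets.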
Fig. 11 is a schematic diagram of an HTML page with the differing content highlighted.
It should be understood that the execution order of S104 and S105 is not limited to the above description; S104 and S105 may be executed simultaneously, or S105 may be executed first and S104 afterwards.
Finally, in S106, the highlighted online and offline HTML pages and the scoring results are presented. For example, they can be presented on the same page for the user to view. The results can also be sent to the user at the e-mail address in the configuration file, and the user can view the presented results by clicking the link in the e-mail.
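Building the notification mail mentioned above can be sketched with Python's standard `email` package. The function name, addresses, subject line, and URL are hypothetical, and the actual sending step is only indicated in a comment.

```python
from email.message import EmailMessage


def build_report_mail(sender: str, recipients: list[str], report_url: str) -> EmailMessage:
    """Compose a notification mail that links to the presented results."""
    msg = EmailMessage()
    msg["Subject"] = "Online/offline transcoding comparison report"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(f"The comparison has finished.\nView the results: {report_url}\n")
    return msg


# Sending is not shown; with a reachable SMTP server one could call
# smtplib.SMTP(host).send_message(msg).
msg = build_report_mail("qa@example.com", ["a@example.com", "b@example.com"],
                        "http://report.example.com/1")
```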
Fig. 12 is a schematic diagram of presenting all results.
As shown in Fig. 12, the left part shows the branch, text, link, and image scores of all pages, together with links to the online and offline HTML pages;
As indicated in the upper right corner, after the click button on the far left is clicked, information about the current page is displayed: the original URL of the current page, the page type, and the branch, text, link, image, and overall similarity scores.
The middle part shows the HTML page from the online machine.
The right part shows the HTML page from the offline machine.
Fig. 13 is a block diagram of the structure of the analysis device for comparing online and offline environment pages after transcoding according to an embodiment of the present invention.
As shown in Fig. 13, the analysis device includes: a configuration file acquiring unit 100, an HTML page acquiring unit 200, an HTML page difference calculating unit 300, a scoring unit 400, a highlighting unit 500, and a presenting unit 600.
Each of these modules is described in detail below.
The configuration file acquiring unit 100 obtains and stores configuration files.
Specifically, the configuration files may include a weight configuration file for page scoring, a page type configuration file, other configuration files required for running (for example, the online and offline machine names, and the choice of build framework: a site-building framework, a cloud-reading framework, or a simple transcoding framework), and related configuration files such as the e-mail addresses to which results are sent.
These configuration files may be stored (for example, placed) in one directory (for example, a conf directory), so that they can be obtained when needed by reading that directory.
The HTML page acquiring unit 200 obtains the online and offline HTML pages according to the stored configuration files.
As shown in Fig. 13, the HTML page acquiring unit 200 may include a URL splicing subunit 2001 and an HTML page obtaining subunit 2002.
The URL splicing subunit 2001 splices the URLs for online and offline access according to the stored configuration files.
The HTML page obtaining subunit 2002 obtains, according to the spliced URLs, the online and offline HTML pages after JavaScript execution. Specifically, in an embodiment of the present invention, the HTML page obtaining subunit 2002 executes the URL with a simulated browser kernel program, thereby generating the HTML page after JavaScript execution.
The HTML page difference calculating unit 300 obtains the differences between the online and offline HTML pages by comparing their DOM trees.
As shown in Fig. 13, the HTML page difference calculating unit 300 may include an HTML page preprocessing unit 3001 and a DOM tree comparison unit 3002.
The HTML page preprocessing unit 3001 preprocesses the online and offline HTML pages.
The DOM tree comparison unit 3002 compares the DOM trees of the preprocessed online and offline HTML pages.
Specifically, the DOM tree comparison unit 3002 can compare the DOM trees of the preprocessed online and offline HTML pages through the following process: obtaining the DOM trees of the online and offline HTML pages; traversing the DOM trees to obtain the content of each tag; comparing, tag by tag, the content obtained from the online and offline DOM trees; and presenting the comparison result in the form of a hash array, i.e., presenting as a hash array the results of comparing each tag in the DOM trees of the online and offline HTML pages.
The scoring unit 400 scores the similarity between the online and offline HTML pages according to the obtained differences.
The highlighting unit 500 highlights the changes between the online and offline HTML pages according to the obtained differences.
The presenting unit 600 presents the highlighted online and offline HTML pages and the scores on the same page.
It should be understood that the analysis device for comparing online and offline environment pages after transcoding can perform the analysis method described above with reference to Figs. 1 to 12; for brevity, that description is not repeated here.
With the analysis method and device according to the embodiments of the present invention, the burden of manual testing can be greatly reduced, and the similarity of transcoded pages can be compared more accurately and objectively. When a new feature or version is released, whether existing online functionality is affected can be tested automatically, minimizing the number of rollbacks.
The analysis method for comparing online and offline environment pages after transcoding according to the embodiments of the present invention can be written as a computer program and implemented on a general-purpose digital computer that executes the program from a computer-readable recording medium.
Although the present invention has been particularly shown and described with reference to its embodiments, those skilled in the art should understand that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the claims.

Claims (10)

1. An analysis method for comparing online and offline environment pages after transcoding, the method including:
obtaining and storing configuration files;
obtaining, according to the stored configuration files, the HTML pages accessed online and offline;
obtaining the differences between the online and offline HTML pages by comparing the content of each tag of the DOM trees of the online and offline HTML pages, wherein the result of comparing each tag in the DOM trees of the online and offline HTML pages is presented in the form of a hash array;
scoring the similarity between the online and offline HTML pages according to the obtained differences;
highlighting the changes between the online and offline HTML pages according to the obtained differences;
presenting the highlighted online and offline HTML pages and the scoring results,
wherein scoring the similarity between the online and offline HTML pages according to the obtained differences includes:
traversing each node of the DOM trees, scoring each node according to the hash result, priority, and level number, and aggregating the node scores to calculate a total score.
2. analysis method as claimed in claim 1, wherein, the html page obtaining online and offline access includes:
Configuration file according to storage splices the URL that online and offline access;
The URL that online and offline according to splicing access obtains html page that the online and offline after executing javascript access Face.
3. analysis method as claimed in claim 2, wherein, browses kernel program to execute URL by using simulation, thus raw Become the html page after execution javascript.
4. analysis method as claimed in claim 1, wherein, obtains the difference bag between the html page that online and offline access Include:
The html page that pretreatment online and offline access;
Contrast the Dom tree of the html page that pretreated online and offline access.
5. The analysis method as claimed in claim 4, wherein comparing the DOM trees of the preprocessed online and offline HTML pages comprises:
Obtaining the DOM trees of the HTML pages accessed online and offline;
Traversing the DOM trees to obtain the content of each tag;
Comparing, tag by tag, the content obtained from the DOM trees of the online and offline HTML pages;
Presenting the comparison result in the form of a hash array.
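The tag-by-tag comparison of claim 5 can be sketched with Python's standard `html.parser` module. Assumptions for illustration only: each element is keyed by a hypothetical path such as `html[0]/body[0]/p[1]`, its "content" is its own stripped text, and the "hash array" is a dictionary from path to a status string (`same`, `changed`, `added` for offline-only, `removed` for online-only).

```python
from html.parser import HTMLParser
from typing import Dict, List

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TagContentCollector(HTMLParser):
    """Map each element's path (e.g. 'html[0]/body[0]/p[1]') to its own text."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []                  # paths of open elements
        self.counters: List[Dict[str, int]] = [{}]  # child-tag counters per level
        self.contents: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        counts = self.counters[-1]
        index = counts.get(tag, 0)
        counts[tag] = index + 1
        parent = self.stack[-1] if self.stack else ""
        path = f"{parent}/{tag}[{index}]" if parent else f"{tag}[{index}]"
        self.contents[path] = ""
        if tag not in VOID_TAGS:  # void elements never get an end tag
            self.stack.append(path)
            self.counters.append({})

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag (tolerates unclosed children).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].rsplit("/", 1)[-1].startswith(tag + "["):
                del self.stack[i:]
                del self.counters[i + 1:]
                break

    def handle_data(self, data):
        if self.stack and data.strip():
            self.contents[self.stack[-1]] += data.strip()


def collect(html: str) -> Dict[str, str]:
    parser = TagContentCollector()
    parser.feed(html)
    parser.close()
    return parser.contents


def compare_dom(online_html: str, offline_html: str) -> Dict[str, str]:
    """Return a path -> status dictionary (the assumed 'hash array')."""
    online, offline = collect(online_html), collect(offline_html)
    result: Dict[str, str] = {}
    for path in sorted(online.keys() | offline.keys()):
        if path not in offline:
            result[path] = "removed"
        elif path not in online:
            result[path] = "added"
        else:
            result[path] = "same" if online[path] == offline[path] else "changed"
    return result
```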
6. An analysis device for comparing online and offline environment pages after transcoding, the analysis device comprising:
A configuration file acquiring unit, which obtains and stores a configuration file;
An HTML page acquiring unit, which obtains the HTML pages accessed online and offline according to the stored configuration file;
An HTML page difference computing unit, which obtains the difference between the HTML pages accessed online and offline by comparing the content of each tag in their DOM trees, wherein the result of comparing each tag in the DOM trees of the online and offline HTML pages is presented in the form of a hash array;
A scoring unit, which scores the similarity between the HTML pages accessed online and offline according to the obtained difference between those pages;
A highlighting ("bleaching and dyeing") unit, which highlights the changes between the HTML pages accessed online and offline according to the obtained difference between those pages;
A presenting unit, which presents the highlighted online and offline HTML pages together with the scoring result;
Wherein the scoring unit traverses each node of the DOM tree, scores each node according to its hash result, priority and level, and sums the node scores to calculate a total score.
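The highlighting ("bleaching and dyeing") unit is named but its mechanism is not described. The sketch below swaps in a word-level diff from Python's `difflib` to mark changed text, wrapping online-only words in `<del>` and offline-only words in `<ins>`; the function name and the CSS class names are hypothetical.

```python
import difflib
import html


def highlight_changes(online_text: str, offline_text: str) -> str:
    """Return HTML marking words that differ between the two versions."""
    a, b = online_text.split(), offline_text.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    parts = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            parts.append(html.escape(" ".join(a[i1:i2])))
            continue
        if i2 > i1:  # words present only in the online page
            parts.append('<del class="diff-online">'
                         + html.escape(" ".join(a[i1:i2])) + "</del>")
        if j2 > j1:  # words present only in the offline page
            parts.append('<ins class="diff-offline">'
                         + html.escape(" ".join(b[j1:j2])) + "</ins>")
    return " ".join(parts)
```

In practice such a function would be applied per element whose comparison status is `changed`, leaving unchanged elements untouched.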
7. The analysis device as claimed in claim 6, wherein the HTML page acquiring unit 200 comprises:
A URL splicing subunit, which assembles the online and offline access URLs according to the stored configuration file;
An HTML page obtaining subunit, which obtains, from the assembled online and offline URLs, the HTML pages accessed online and offline after JavaScript has been executed.
8. The analysis device as claimed in claim 7, wherein the HTML page obtaining subunit executes the URLs using a simulated browser kernel program, thereby generating the HTML pages after JavaScript execution.
9. The analysis device as claimed in claim 7, wherein the HTML page difference computing unit comprises:
An HTML page preprocessing unit, which preprocesses the HTML pages accessed online and offline;
A DOM tree comparison unit, which compares the DOM trees of the preprocessed online and offline HTML pages.
10. The analysis device as claimed in claim 9, wherein the DOM tree comparison unit compares the DOM trees of the preprocessed online and offline HTML pages by obtaining the DOM trees of the HTML pages accessed online and offline, traversing the DOM trees to obtain the content of each tag, comparing tag by tag the content obtained from the online and offline DOM trees, and presenting the comparison result in the form of a hash array.
CN201410066929.4A 2014-02-26 2014-02-26 The analysis method of on-line off-line environment page contrast and equipment after transcoding Active CN103870546B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410066929.4A CN103870546B (en) 2014-02-26 2014-02-26 The analysis method of on-line off-line environment page contrast and equipment after transcoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410066929.4A CN103870546B (en) 2014-02-26 2014-02-26 The analysis method of on-line off-line environment page contrast and equipment after transcoding

Publications (2)

Publication Number Publication Date
CN103870546A 2014-06-18
CN103870546B 2017-03-01

Family

ID=50909076

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410066929.4A Active CN103870546B (en) 2014-02-26 2014-02-26 The analysis method of on-line off-line environment page contrast and equipment after transcoding

Country Status (1)

Country Link
CN (1) CN103870546B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106371988A (en) * 2016-08-22 2017-02-01 浪潮(北京)电子信息产业有限公司 Automatic interface test method and device

Citations (6)

Publication number Priority date Publication date Assignee Title
US5761683A (en) * 1996-02-13 1998-06-02 Microtouch Systems, Inc. Techniques for changing the behavior of a link in a hypertext document
CN102760139A (en) * 2011-04-29 2012-10-31 国际商业机器公司 Webpage processing method and webpage processing system
CN103218358A (en) * 2012-01-18 2013-07-24 百度在线网络技术(北京)有限公司 Diff scoring method and system
CN103226475A (en) * 2013-05-16 2013-07-31 百度在线网络技术(北京)有限公司 Method and device for realizing control replacement during transcoding
CN103365967A (en) * 2013-06-21 2013-10-23 百度在线网络技术(北京)有限公司 Automatic difference detection method and device based on crawler
CN103455547A (en) * 2013-07-05 2013-12-18 百度在线网络技术(北京)有限公司 Method and device for webpage load

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7966564B2 (en) * 2008-05-08 2011-06-21 Adchemy, Inc. Web page server process using visitor context and page features to select optimized web pages for display

Also Published As

Publication number Publication date
CN103870546A (en) 2014-06-18

Similar Documents

Publication Publication Date Title
US11294968B2 (en) Combining website characteristics in an automatically generated website
CN104156307B (en) A kind of browser compatibility detection method and system
CN104391786B (en) Webpage automatization test system and its method
CN107229633A (en) Static page generation method, Web access method and device
EP2987088A2 (en) Client side page processing
CN105512285B (en) Adaptive network reptile method based on machine learning
WO2014127535A1 (en) Systems and methods for automated content generation
US20140289612A1 (en) Merging web page style addresses
EP2399200A1 (en) Method and system of processing cookies across domains
CN109033282B (en) Webpage text extraction method and device based on extraction template
US20220114269A1 (en) Page processing method, electronic apparatus and non-transitory computer-readable storage medium
CN106951495A (en) Method and apparatus for information to be presented
CN102760150A (en) Webpage extraction method based on attribute reproduction and labeled path
CN106899549A (en) A kind of network security detection method and device
CN106886530A (en) A kind of dynamic data distinguishes editing and updating method and system
CN107133165A (en) Browser compatibility detection method and device
CN108334484A (en) The method and apparatus of data inputting
CN107015903A (en) A kind of generation method, device and the electronic equipment of interface detection program
CN107590288A (en) Method and apparatus for extracting webpage picture and text block
CN107229653B (en) Pseudo static webpage generation method and device
CN108388796B (en) Dynamic domain name verification method, system, computer device and storage medium
CN106970962A (en) A kind of method and apparatus for obtaining search engine search results
CN109558123A (en) The method of webpage conversion electrons book, electronic equipment, storage medium
CN103870546B (en) The analysis method of on-line off-line environment page contrast and equipment after transcoding
CN108255891A (en) A kind of method and device for differentiating type of webpage

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant