TITLE OF THE INVENTION:
System and Method for Evaluating and Optimizing Web Site Attributes
Cross-reference to Related Patent Applications
This application claims priority to co-pending Italian Patent Application No. UD99A00213 entitled "PROCESSO VI CONTROLLO E GESTIONE VI SITO INFORMATICO" filed on December 9, 1999 in the name of SYA srl, the entirety of which is herein incorporated by reference.
Field of the Invention
The present invention is directed generally to document processing, and more particularly to evaluating and optimizing the attributes of web site documents.
Background of the Invention
Since its introduction, the Internet has grown in popularity, both in the number of users who access it and in the number of entities that launch and maintain web sites on it. The term "web site," as used herein, generally refers to one or more interrelated web pages. Although the computer languages for generating these web sites are largely standardized (e.g. HTML, XML, JAVA), the methods for generating them are not. As a result, attributes such as ease-of-use, appearance and navigability may vary greatly from web site to web site.
Many operators of commercial web sites generate or receive revenue that in some way corresponds to the number of visitors who access their web sites. For example, advertisers may pay a web site operator to display banner ads to visitors. The price to be paid by the advertisers may be determined based on the average number of visitors to the site per day. In another example, the operator of a web site may offer to sell any number of products or services through the site. One or more web pages are presented which include pictures and/or textual descriptions of the products and/or services offered. For sales to be accomplished, visitors to the web site will generally need to view these pages before purchasing the product or service. This, in turn, requires that the visitors stay at the web site long enough to view the product/service pages and make the decision to purchase the offered products and services.
Web sites which demonstrate poor ease-of-use, appearance and navigability generally do not receive as many visitors as web sites which are considered user-friendly. Such poorly designed sites also retain fewer visitors for a shorter period of time. As a result, web site operators who maintain poorly designed web sites may not be able to effectively compete on the Internet due to the lower revenue generation which results from lower visitation.
Several attempts have been made to evaluate particular features of a web site. For example, hypertext markup language (HTML) validation programs have been produced. The purpose of these programs is to ensure that the commands found in the source document of each web page conform to a specific version of HTML. In another example, link validating programs have been developed which confirm that hypertext links placed on a web site to direct a user to a web page external to the current web site are valid. However, these programs do not contemplate reviewing or rating multiple attributes of a web site. Such programs also fail to present suggested corrections for web design errors. Thus, it would be advantageous to develop a method and system for evaluating and optimizing web site attributes so as to promote increased visitation and user retention, thereby increasing revenues and/or popularity of a web site.
Summary of the Invention
The present application is directed to particular features of a method and system for evaluating and optimizing web site attributes. In particular, the present invention includes a first method for evaluating a web site. A server or a software program of the present invention receives a source document corresponding to a web site from a user. The source document may have a plurality of tags. The system compares at least one of the plurality of tags to a usability rule and generates a usability rating based on the comparison. In a second embodiment of the present invention, a server or program of the present invention receives an identification of a source document corresponding to a web site and a user request to compare the source document to a usability rule. In response, the source document is compared to the usability rule and the user receives a usability rating of the web site. In a third embodiment of the present invention, a method and accompanying apparatus for increasing the usability of a web site begins when a user identifies a web site having a plurality of attributes and sends a request for a comparison of the web site to a plurality of usability rules. A server or computer then presents an identification of a problem with at least one of the attributes in response to the request.
Brief Description of the Drawings
Further aspects of the present invention will be more readily appreciated upon review of the detailed description of the preferred embodiments included below when taken in conjunction with the accompanying drawings, of which:
FIG. 1 is a block diagram illustrating an exemplary computer network in accordance with an embodiment of the invention;
FIG. 2 is a block diagram of exemplary components of a server for use with the system of FIG. 1;
FIGS. 3 and 4 are a flowchart of an exemplary comparison process in accordance with an embodiment of the present invention;
FIGS. 5 and 6 are exemplary screen displays depicting selectable usability rules and variable weights according to one embodiment of the present invention;
FIG. 7 is an exemplary screen display of a menu for initiating a web site attribute evaluation according to one embodiment of the present invention;
FIG. 8 is an exemplary screen display depicting an entry field for designating a source document to review according to one embodiment of the present invention;
FIG. 9 is an exemplary screen display of a source document to be evaluated according to one embodiment of the present invention;
FIG. 10 is an exemplary screen display showing the result of a web site evaluation according to one embodiment of the present invention;
FIG. 11 is an exemplary screen display showing an explanation of an identified problem with a web site according to one embodiment of the present invention;
FIG. 12 is an exemplary screen display showing a description and suggested correction of an identified problem with a web site according to one embodiment of the present invention; and
FIG. 13 is an exemplary usability rating generated according to one embodiment of the present invention.
Detailed Description of the Invention
Referring now to FIGS. 1-13, wherein similar components of the present invention are referenced in like manner, preferred embodiments of a method and system for evaluating and optimizing web site attributes are disclosed.
Turning now to FIG. 1, there is depicted an exemplary computer network 1 through which a plurality of remote terminals 2, 4, 6 may communicate with a server 16 via network connection 10 in any known manner. Although computer network 1 is preferably an Internet-based network such as the World Wide Web, it may be any one or more of a local area network (LAN), a wide-area network (WAN), an intranet environment, an extranet environment, a wireless network or any other type of computer network, such as those enabled over public switched telephone networks.
Remote terminals 2, 4, 6 may each be any type of computing device, such as a personal computer, a workstation, a network terminal, a hand-held remote access device or any other device that can communicate with the server 16 over the network connection 10. Users may interact with the server 16 in order to submit source document information and the like and receive a usability rating for a web site corresponding to the source document. Server 16, in turn, may include any number of computer servers operative to evaluate source document information in accordance with the present invention. Specific functions and operations of server 16 are discussed further below.
Turning now to FIG. 2, displayed therein are exemplary components of a computing device, such as server 16. It should be understood that any of remote terminals 2, 4 and 6 may share similar configurations. However, for the sake of brevity, the discussion immediately below will refer to the server 16 only.
The primary component of the server 16 is a processor 20, which may be any commonly available microprocessor, such as the PENTIUM III manufactured by INTEL CORP. The processor 20 may be operatively connected to further exemplary components, such as RAM/ROM 26, a clock 28, input/output devices 30, and a memory 22 which, in turn, stores one or more computer programs 24.
The processor 20 operates in conjunction with random access memory and read-only memory in a manner well known in the art. The random-access memory (RAM) portion of RAM/ROM 26 may be a suitable number of Single In-line Memory Module (SIMM) chips having a storage capacity (typically measured in kilobytes or megabytes) sufficient to store and transfer, inter alia, processing instructions utilized by the processor 20 which may be received from the application programs 24. The read-only memory (ROM) portion of RAM/ROM 26 may be any permanent non-rewritable memory medium capable of storing and transferring, inter alia, processing instructions performed by the processor 20 during a start-up routine of the server 16.
The clock 28 may be an on-board component of the processor 20 which dictates a clock speed (typically measured in MHz) at which the processor 20 performs and synchronizes, inter alia, communication between the internal components of the server 16.
The input/output device(s) 30 may be one or more commonly known devices used for receiving operator inputs, network data, and the like and transmitting outputs resulting therefrom. Accordingly, exemplary input devices may include a keyboard, a mouse, a voice recognition unit and the like for receiving operator inputs. Output devices may include any commonly known devices used to present data to an operator of the server 16 or to transmit data over the computer network 1 to a remote user or customer. Accordingly, suitable output devices may include a display, a printer and a voice synthesizer connected to a speaker.
Other input/output devices 30 may include a telephonic or network connection device, such as a telephone modem, a cable modem, a T-1 connection, a digital subscriber line or a network card, for communicating data to and from other computer devices over the computer network 1, such as remote terminal 2. In an embodiment involving a network server, it is preferred that the communications devices used as input/output devices 30 have capacity to handle high bandwidth traffic in order to accommodate communications with a large number of users.
The memory 22 may be an internal or external large capacity device for storing computer processing instructions, computer-readable data, and the like. The storage capacity of the memory 22 is typically measured in megabytes or gigabytes. Accordingly, the memory 22 may be one or more of the following: a floppy disk in conjunction with a floppy disk drive, a hard disk drive, a CD-ROM disk and reader/writer, a DVD disk and reader/writer, a ZIP disk and a ZIP drive of the type manufactured by IOMEGA CORP., and/or any other computer readable medium that may be encoded with processing instructions in a read-only or read-write format. Further functions of and available devices for memory 22 will be apparent. The memory 22 preferably stores, inter alia, a plurality of programs 24 which may be any one or more of an operating system such as WINDOWS 2000 by MICROSOFT CORP., and one or more application programs, such as a web hosting program, which may be necessary to implement the embodiments of the present invention. The programs 24 preferably include evaluation software for evaluating and optimizing web site attributes in accordance with the present invention.
Turning now to FIGS. 3 and 4, with continuing references to FIGS. 5-13, an exemplary process 31 for evaluating and optimizing web site attributes is presented. The process 31 begins when a user identifies a web site or presents one or more source documents to be evaluated (step 32). This may be accomplished by initiating evaluation software which implements the process 31. The evaluation software may present a screen display 114 in which a dialog box 112 is presented for entering an identification of the web site or files containing source document data to be evaluated, as depicted in FIG. 8. The source document 118 may be written in any suitable format or language, such as HTML, XML and CSS. The corresponding web site may be an e-commerce web site, or any other type of web site which has user-accessible content presented in one or more inter-related source documents 118.
In an online embodiment of the present invention, a user 2 may transmit a URL or other identifier to a remote server 16 which, in turn, downloads source data from a second server (not shown) that hosts the web site corresponding to the URL. Alternatively, the user 2 may transmit source document data 118 (as shown in screen display 116 of FIG. 9) corresponding to the web site directly to the server 16 from one or more computer-readable files stored in a memory of the user terminal 2.
In an offline embodiment, a user 2 may have the evaluation software installed on the server hosting the web site (not shown), in which case network communication over network connection 10 is not necessary. Also, the user 2 may install the evaluation software on a standalone computer and load the source document data from a computer-readable medium.
Next, the user transmits a request to compare the source documents corresponding to the web site to multiple and various usability rules stored by the program (step 34). A screen display 100 of exemplary rules is provided in FIGS. 5 and 6. The rules may be presented according to a number of headings 102, such as page rules, graphics rules, navigation rules and structure rules. Other headings or fewer headings 102 may be provided. The screen display may also present individual rule titles 104 which generally describe the rule to be applied. Each rule may further have a weight 106 associated therewith. The weight 106 is used to generate a usability rating as described further below. The weights may be adjusted by selecting the weight 106 and inputting a new number variable as displayed according to checkbox 106 of FIG. 6. Preferably, weights are to be applied on a scale of one to five. However, other weight systems may be used. A number of checkboxes 105 may be presented by which a user may designate whether particular rules or groups of rules are to be applied. It is further contemplated that a user 2 may define and incorporate their own rules into the evaluation software.
In a preferred embodiment, the rules to be employed as well as the statistical weights to be applied thereto are based on user surveys in which participants are asked to rate various existing web sites according to ease-of-use, navigability and appearance. An analysis of the questionnaires may provide a guideline to identifying and quantifying those attributes which negatively impact a user's perception of the web site. Other criteria for determining problems, statistical weights, and the like may be used.
Returning to process 31, in response to the user's request, the processing instructions of the evaluation software next direct the processor 20 of the server 16 to parse each received source document 118 to generate a corresponding tag list, which includes the document tags stored in each source document 118 (step 36). The tags may be written in a format that corresponds to standard web document languages such as HTML or XML. As used herein, the parsing of a source document may involve transforming a stream of computer-readable data (e.g., representing HTML code) into a useable data structure which is then stored in an accessible memory, such as memory 22 or the RAM component of RAM/ROM 26. Such a data structure preferably represents the content of the page and is optimized for access speed. The evaluation program may then query the data structure to determine what web site attributes are present in the source document 118.
Next, the evaluation software generates a model of the entire web site based on the source documents received (step 38). According to one embodiment of the present invention, once the tag list for each page has been assembled, the evaluation software constructs a model of the web site which includes, for example, the interconnections among the various pages of the web site. The site model designates which page corresponds to the home page of the web site and further identifies which is the current web page chosen for the evaluation. In one embodiment, the site model is a graphic depiction representing all possible connections between any two pages in the web site. The connections are preferably determined from HTML links presented in the source documents as well as anchors, included images, frames, automatic refresh pages, background images, image map links, and the like included in the web site. The site model particularly helps determine problems with respect to site rules and the like as described further below.
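The parsing and site-model steps (steps 36 and 38) can be illustrated with a minimal sketch. It assumes Python's standard `html.parser` module as the parser; the names `TagListParser`, `build_tag_list`, `build_site_model` and the `LINK_ATTRS` table are hypothetical and cover only a subset of the connection types listed above. It is not the evaluation software's actual implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TagListParser(HTMLParser):
    """Collects (tag, attributes) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names.
        self.tags.append((tag, dict(attrs)))


def build_tag_list(source):
    """Parse one source document into its tag list (step 36)."""
    parser = TagListParser()
    parser.feed(source)
    parser.close()
    return parser.tags


# Hypothetical table: which attribute of which tag counts as a connection.
LINK_ATTRS = {
    "a": "href",          # links and anchors
    "area": "href",       # image map links
    "frame": "src",       # frames
    "iframe": "src",
    "img": "src",         # included images
    "body": "background", # background images
}


def build_site_model(pages, home):
    """Build a simple site model (step 38).

    pages maps each page URL to its source text. The result records the
    home page and, for every page, the set of URLs it connects to.
    """
    edges = {}
    for url, source in pages.items():
        targets = set()
        for tag, attrs in build_tag_list(source):
            attr = LINK_ATTRS.get(tag)
            value = attrs.get(attr) if attr else None
            if value:
                targets.add(urljoin(url, value))
        edges[url] = targets
    return {"home": home, "edges": edges}
```

A site rule such as Isolated Pages could then be answered by walking `edges` from the `home` entry, which is why the model is built once and shared by all rules.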
The evaluation software may then direct the processor 20 to compare the attributes from the tag list corresponding to a source document 118 to the stored usability rules (step 40). Each usability rule is preferably a set of processing instructions that are applied to each page systematically and which may be applied in any order. Such processing instructions direct the processor 20 to collect needed information from the tag list corresponding to the source document 118, and from the site model. There may be three types of rules: (1) those applied to individual pages and analyzing features of those particular pages (referred to herein as single page rules); (2) those that compare a page with a model page or another page of the web site (referred to herein as pairwise rules); and (3) rules which check features that do not belong to specific web pages, but rather, are properties of the entire site (referred to herein as site rules). The pairwise rules are generally used to check the consistency between a page chosen as a model and each other page that should conform to the former one. Checking is performed by extracting several features of the model page and verifying that the remaining pages have them as well.
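The three rule types may be easier to follow as a small dispatch loop. The sketch below is illustrative only: the `Rule` and `Problem` classes, the `kind` strings and the `apply_rules` function are assumed names, and each rule's `check` is taken to return True when the rule is satisfied.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Problem:
    rule: str
    page: str
    weight: int


@dataclass
class Rule:
    title: str
    kind: str                  # "single", "pairwise" or "site"
    weight: int                # e.g. on the one-to-five scale
    check: Callable[..., bool] # True means the rule is satisfied


def apply_rules(rules, tag_lists, model_page, site_model):
    """Apply every rule and collect the problems it raises.

    tag_lists maps a page name to its tag list; model_page names the
    page that pairwise rules use as the reference.
    """
    problems = []
    for rule in rules:
        if rule.kind == "site":
            # Site rules look at properties of the whole site.
            if not rule.check(site_model):
                problems.append(Problem(rule.title, "<site>", rule.weight))
            continue
        for page, tags in tag_lists.items():
            if rule.kind == "single":
                ok = rule.check(tags)
            else:
                # Pairwise rules compare each other page with the model page.
                if page == model_page:
                    continue
                ok = rule.check(tag_lists[model_page], tags)
            if not ok:
                problems.append(Problem(rule.title, page, rule.weight))
    return problems
```

Because each rule is independent of the others, the outer loop may visit the rules in any order, as the description notes.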
Next, the processing instructions may direct the processor 20 to compare tag list data to a specific standard and, if the standard is not met, a usability problem is generated and identified to the user (step 42). According to the degree of conformance of such data with respect to the rule and the weight identified with the rule, the problem is assigned a severity level. In accordance with this step, a screen display 120 reporting detected web site attribute problems is depicted in FIG. 10. According to one embodiment of the present invention, the evaluation software may present a user 2 with a list of pages 124 from the identified web site 122. A first column 126 presents the number of detected problems associated with each page and the value 128 representing the average number of problems per page. A second column 130 may present a severity rating associated with each page which may be determined based upon the number of problems detected and the weights of the detected problems. The severity may also be represented in a graphical format. The second column may also include the average severity of the web site 132, which may in turn be used to generate a usability rating for the web site in any useful manner. A third column 134 may further be presented which lists the number of problems fixed and the number of problems originally present.
Next, the user 2 may indicate whether problems identified by the evaluation software are to be corrected (step 44). If so, the process 31 continues to step 46 below. Otherwise, the process continues to step 54, also described below. In order to fix identified problems, the user 2 may select a page listed in column 124 of screen display 120. The evaluation software may then present a dialog box, such as dialog box 136 of FIG. 11, which lists the individual problems associated with the selected page. The dialog box 136 may further contain a window 138 for presenting a description 142, an explanation 139 and one or more suggested corrections 144 for the one or more listed problems.
The user 2 may then decide if one or more of the suggested corrections should be implemented (step 46). If the user 2 decides to use the suggested corrections, the user 2 may activate a fix problem command 145, after which the evaluation software inserts the suggested corrections into the applicable source document 118 (step 50). If, however, the user 2 wishes to correct the attributes manually, the user 2 may instead select the edit page command 146 and insert desired replacement code in the source document 118 (step 48). Steps 44-50 may be repeated iteratively, as needed, for each problem identified with the subject web site.
Next, the user 2 may decide if the web site is to be re-evaluated (step 52). If so, the process 31 returns to step 40 above. Otherwise, the process 31 continues to step 54 wherein the evaluation software generates a usability rating of the web site. The usability rating is determined in any useful manner by considering the number of problems associated with the web site, the weights associated with those problems, and a percentage of the web site for which there are no demonstrable problems. Other factors may be used to generate the usability rating. The usability rating may take the form of a percentage, but may alternately be described in terms of a number scale, a letter grade or any other useful indicium of web site ease-of-use, navigability, appearance, and/or user-friendliness. In one embodiment, the usability rating may be generated from the number of pages having problems with a high weight or severity in comparison to the number of pages of the web site. In another embodiment, usability ratings may be assigned in the following manner. A rating of 100% would be assigned to a web site which does not demonstrate catastrophic problems (i.e. those problems having a highest weight), does not demonstrate major problems (those having an intermediate weight) and has less than a predetermined number of minor problems per page (those with the lowest weights). A web site would receive a usability rating of 80% when there are no catastrophic problems, less than a predetermined number (e.g. 3) of major problems per page, and less than a predetermined number (e.g. 10) of minor problems per page. A usability rating of 60% would be applied to a web site which contains no catastrophic problems, less than a predetermined number (e.g. 10) of major problems per page, and less than a predetermined number (e.g. 10) of minor problems per page. A 40% rating would be applied to web sites which do not demonstrate catastrophic problems, but demonstrate higher than predetermined numbers of major and minor problems. A 0% rating would be applied to a web site demonstrating catastrophic problems. Other rating systems may be used.
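The tiered rating scheme of this embodiment can be written as a short function. The sketch below is one possible reading, not a definitive implementation: the function name and parameter names are hypothetical, the 80% and 60% thresholds use the example values given above (3, 10 and 10), and the minor-problem limit for the 100% tier, which the description leaves unspecified, is an assumed default of 5.

```python
def usability_rating(catastrophic, major_per_page, minor_per_page,
                     minor_limit_top=5,   # assumption: not given in the text
                     major_limit_80=3,    # "e.g. 3"
                     major_limit_60=10,   # "e.g. 10"
                     minor_limit=10):     # "e.g. 10"
    """Return a usability rating as a percentage (0, 40, 60, 80 or 100)."""
    if catastrophic > 0:
        # Any catastrophic problem yields the lowest rating.
        return 0
    if major_per_page == 0 and minor_per_page < minor_limit_top:
        return 100
    if major_per_page < major_limit_80 and minor_per_page < minor_limit:
        return 80
    if major_per_page < major_limit_60 and minor_per_page < minor_limit:
        return 60
    # No catastrophic problems, but major/minor counts exceed the limits.
    return 40
```

For example, under these assumed defaults a site with no catastrophic problems, 2 major problems per page and 8 minor problems per page would fall in the 80% tier.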
Once the rating is generated, the user 2 may decide whether to place the rating on the web site (step 56). This may be accomplished by inserting appropriate web page formatting commands into a source document of the web site. An example of a displayed usability rating 160 is presented in FIG. 13. If the user 2 decides to insert the rating, the process 31 continues to step 58 where the appropriate instructions are inserted into the source document 118, after which the process 31 ends. If, on the other hand, the user 2 decides not to insert the rating, the process 31 ends without performing step 58.
The following are rule titles corresponding to rules which may be employed by a method or apparatus of the present invention. Additional rules may be used and fewer rules may be employed. Preferred rules, however, include: (1) Body Color Portability, (2) Portable Font Color, (3) Known Body Color, (4) Known Font Color, (5) No Blinking, (6) Headings Exist, (7) Body Consistency, (8) Body Color Consistency, (9) Consistent Link Colors, (10) Noframes Exists, (11) Noframes With Content, (12) Noframes Links, (13) IFRAME Exists, (14) Frame Borders, (15) Frame Margin, (16) Frame Targetting, (17) Noresize Frame, (18) Frames Are Bad, (19) Page Size Small, (20) IMG with ALT, (21) Img Alt Short, (22) ALT With Content, (23) IMG With Size, (24) Image Stretching, (25) Forced Download Images, (26) Background Consistency, (27) Description Analysis, (28) Keywords Analysis, (29) Title Analysis, (30) Explicit Mail To, (31) VLINK_differs_from_LINK, (32) Non-Standard LINK, (33) Empty Anchor Links, (34) Consistent Navigationals, (35) Reachable Home Page, (36) Logical Path Rule, (37) Isolated Pages, (38) Self Referential Page, (39) Broken Links, (40) Click Counting, (41) Http Equiv Refresh Is Bad, (42) Http Equiv Refresh Forward, (43) Http Equiv Refresh Backward, (44) Invisible Image Maps, (45) Image Map Links, (46) Show Links, (47) Show Paths, (48) Show Site Navigationals, (49) Max Click Number, (50) List Of Images, (51) Recycle Graphics, (52) Email Consistency, (53) Pre-Filtered Page, (54) Show Parser Errors, (55) Marquee Is Bad, (56) Spacer Is Bad. Each attribute problem, a description of the problem, and suggested fixes for the problem are presented in detail below.
The rule relating to Body Color Portability checks that colors specified in a BODY element of the source document for foreground or background are recognizable by various versions of standard web browsers. Such colors may be those belonging to a predefined list of color names or those specified as RGB values belonging to a specific subset. Using non-standard colors may lead to different renderings between different browsers on displays not supporting 24-bit colors (i.e. millions of colors), thus reducing the graphical quality of the site. To correct a detected problem with this attribute, the evaluation software first extracts the colors specified in the BODY HTML elements of the source document 118 and compares them with the predefined list of acceptable colors or to standard RGB formats. The software then signals a problem for each page that uses colors not satisfying these constraints. Possible fixes for this problem include presenting an interactive tool to choose an acceptable color to substitute for the original, problematic color listed.
The rule relating to Portable Font Color checks that colors specified in <FONT COLOR=...> tags of the source document 118 are portable across standard web browsers. Portable colors are those belonging to a predefined list of color names or those specified as RGB values belonging to a standard format. Using non-standard colors may lead to differing presentations of a web site among different types of web browsers, especially those which do not support 24-bit colors. This, in turn, may reduce the graphical quality of the web site. To identify corrections to this problem, the rule extracts from the source document those colors specified as attributes of FONT elements, generally having the form <FONT COLOR=...>. Next, the evaluation software compares these colors with a predefined list of standard colors and/or the standard RGB formats. The software signals a problem for each page that contains FONT elements with colors not satisfying those constraints. The software may then suggest a fix for this problem by presenting an interactive tool to choose an acceptable color to substitute for the original color.
The rule relating to Known Body Color checks that colors specified in the BODY element of a source document are specified either as RGB strings or as symbolic names for VGA standard colors only. The list of VGA colors consists of 16 color names that all browsers understand. Using symbolic names for colors, except for recognized VGA colors, may lead to different renderings between different browsers, and some names may not even be recognized, thus reducing the graphical quality of the site. In order to correct this problem, the evaluation software extracts the colors specified as attributes of the BODY HTML element in the page and compares them with the predefined list and the VGA standard. The software then identifies a problem for each page that uses colors not satisfying these constraints. The software may then suggest a fix for the problem by presenting an interactive tool to choose an acceptable color to substitute for the original color.
The rule relating to Known Font Color checks that colors specified in a <FONT COLOR=...> tag are specified either as RGB strings or symbolic names for VGA colors only. As stated previously, the list of VGA colors consists of 16 color names that all browsers understand. Using symbolic names for colors, except for the VGA ones, may lead to different renderings between different browsers. Some names may not even be recognized, thus reducing the graphical quality of the site. In order to fix this problem, the evaluation software extracts from the source document 118 those colors specified as attributes of FONT elements in the page which have the form <FONT COLOR=...>. Next, the software compares these colors with the predefined list and the RGB format. The software then identifies a problem for each page that contains FONT elements with colors not satisfying those constraints. The software may suggest a fix to the problem by presenting an interactive tool from which the user may choose an acceptable color to substitute for the original one.
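A check of the kind performed by the Known Body Color and Known Font Color rules might look like the following sketch. It assumes that the 16 VGA names are the 16 color names defined by HTML 4, that an acceptable RGB string is a six-digit `#RRGGBB` value, and that the page has already been reduced to a tag list of `(tag, attributes)` pairs with lower-case names; the function names are hypothetical.

```python
import re

# The 16 color names defined in HTML 4 (assumed here to be the "VGA" list).
VGA_COLOR_NAMES = {
    "aqua", "black", "blue", "fuchsia", "gray", "green", "lime", "maroon",
    "navy", "olive", "purple", "red", "silver", "teal", "white", "yellow",
}

# Assumption: only six-digit hexadecimal strings count as RGB values.
RGB_PATTERN = re.compile(r"^#[0-9a-fA-F]{6}$")


def is_known_color(value):
    """True if value is a VGA color name or a #RRGGBB string."""
    value = value.strip()
    return value.lower() in VGA_COLOR_NAMES or bool(RGB_PATTERN.match(value))


def unknown_colors(tag_list, tag, color_attrs):
    """Return the color values on the given tag that fail the check."""
    found = []
    for name, attrs in tag_list:
        if name != tag:
            continue
        for attr in color_attrs:
            value = attrs.get(attr)
            if value and not is_known_color(value):
                found.append(value)
    return found
```

The font rule would call `unknown_colors(tags, "font", ("color",))`, while the body rule would pass `"body"` with attributes such as `bgcolor`, `text`, `link`, `vlink` and `alink`; a non-empty result corresponds to a problem on that page.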
The rule relating to No Blinking checks that the page does not contain the BLINK element. The BLINK element, a non-standard browser extension rather than part of the HTML 4.0 standard, is considered an undesirable element because surveys have indicated that blinking animated features on a computer screen have a strong negative impact on a user's peripheral visual perception. In other words, a user's attention is constantly distracted away from the interesting parts of the page. It has been further shown that distracted users will move quickly to other sites, thus decreasing visitor retention times.
In order to address this, the evaluation software extracts all the BLINK elements from the page and generates a problem for each of them. In addition, the software suggests fixing the problem by replacing the BLINK element with other means of emphasizing text, including, for example, inserting a new color, increasing font size, using boldface (e.g. by employing a <STRONG> tag), or using italics (e.g. by employing an <EM> tag).
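As a rough illustration of the extraction step, the sketch below locates BLINK elements with Python's standard `html.parser` and records where each one starts, so that a problem can be reported per occurrence. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class BlinkFinder(HTMLParser):
    """Records the (line, column) position of every BLINK start tag."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <BLINK> and <blink> both match.
        if tag == "blink":
            self.positions.append(self.getpos())


def find_blink_problems(source):
    """Return one position per BLINK element found in the page source."""
    finder = BlinkFinder()
    finder.feed(source)
    finder.close()
    return finder.positions
```

An empty result means the page satisfies the rule; otherwise each recorded position marks a place where a replacement such as <STRONG> or <EM> could be offered.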
The rule relating to Headings Exist checks that if the page contains a reasonable amount of text (pages consisting mainly of graphics may be skipped), then the text should be split and heading elements should be inserted. Headings <H1>, <H2>, <H3> are used in HTML pages to structure text into blocks. The advantages of using headings are: (1) the text may be easier to scan and read, (2) the text may be more effectively rendered by speaking machines, and (3) the heading text may be used when a search engine indexes the page.
In order to fix problems of this nature, the evaluation software first counts the words that constitute the content of a page. If the resulting number is above a predefined threshold, then it extracts all the heading elements. If none are found, then a problem is generated. The software then suggests fixing the error by splitting the page content into sections and using heading elements to specify section titles.
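The word-count-then-headings check can be sketched as follows. The threshold of 200 words is an assumed value, since the description only calls it predefined, and the names `HeadingStats` and `headings_missing` are hypothetical. Text inside SCRIPT and STYLE elements is excluded from the count, which is a design choice of this sketch rather than something stated above.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
NON_CONTENT_TAGS = {"script", "style"}


class HeadingStats(HTMLParser):
    """Counts content words and heading elements in one page."""

    def __init__(self):
        super().__init__()
        self.word_count = 0
        self.heading_count = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.heading_count += 1
        elif tag in NON_CONTENT_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NON_CONTENT_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only text outside script/style counts as page content.
        if not self._skip_depth:
            self.word_count += len(data.split())


def headings_missing(source, min_words=200):
    """True if the page has more than min_words words but no headings."""
    stats = HeadingStats()
    stats.feed(source)
    stats.close()
    return stats.word_count > min_words and stats.heading_count == 0
```

Pages consisting mainly of graphics fall below the word threshold and are skipped automatically, matching the exception noted in the rule.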
The rule relating to Body Consistency is a pairwise rule that checks that the subject page contains a particular body element if such element is found in the model page. If this is not true, then the two pages have a different structure that breaks consistency. Changing structure between pages may confuse readers to the point that they may not know if they are still visiting the same site, or if they got the page they requested. In order to address this problem, the evaluation software extracts BODY elements from the two pages and compares results. If the compared elements are different then a problem is generated. The software then suggests that the problem be fixed by making the subject web page use exactly the same structure as that used by the model page.
The rule relating to Body Color Consistency is a pairwise rule that checks that a model page, such as a home page of the web site, and the current page specify the same color for the following components: background, foreground text, visited links, unvisited links and active links. Changing colors between pages may confuse readers to the point that they may not know if they are still visiting the same site, if they got the page they wanted, or if the meaning of colored objects has changed. In order to address this problem, the evaluation software extracts from the model and the current page the color specified for the considered components. If they differ (or only one of the two pages specifies it) then a problem is generated. The evaluation software then suggests that the user replace, add or remove the color used in the current page, accordingly. This may be accomplished through a suitable dialog box that presents the information extracted from the model and offers the means to copy the colors to the current page.
The rule relating to Consistent Link Colors is a pairwise rule that checks if individual instances of links have colors that are inconsistent between the model and the current pages. Inconsistent use of link colors is a source of confusion for users because they may not easily realize that words printed in a new color are links of the same type as those that they have previously seen. In order to address this problem, the rule extracts FONT elements from both pages and the colors specified in those FONT elements. If the two pages differ for the same link type, then the evaluation software signals a problem. The problem may be fixed through a dialog box that shows which elements in the model page use which color and that highlights those elements in the current page using a color not appearing in the model.
The rule relating to No Frames Exists checks that if the page contains a FRAMESET element (that is, an element specifying that the page should have a frame-based structure), then the page should also contain a NOFRAMES element. This is because a frameset without <NOFRAMES> ... </NOFRAMES> will not be displayed by certain browsers which cannot render frames. Such browsers include early versions of well-known browsers as well as certain modern specialized browsers, like speaking machines, browsers for PDAs, and text-only browsers. Thus, by failing to specify a NOFRAMES element, many users will not be able to access such a page and might decide to give up exploring the site.
In order to address this problem, the evaluation software determines whether the source document 118 includes a FRAMESET and, if so, whether a NOFRAMES element exists. If it does not exist, then a problem is generated. The software then suggests fixing the problem by adding a NOFRAMES element. A suggestion presented for inserting such a NOFRAMES element may be similar to the following:
Consider adding the following HTML code after </FRAMESET>:
<NOFRAMES>
<BODY bgcolor=RGBCOLOR text=RGBCOLOR background=IMAGE>.... </BODY>
</NOFRAMES>
The rule relating to No Frames With Content checks that if the page contains a NOFRAMES element, then it should contain meaningful text that is to be substituted for the frame, rather than sentences like 'you need a frame enabled browser to view this page'. The NOFRAMES element is intended to provide users of browsers that are not frame-enabled with the means to access the site, although with a slightly degraded graphic and interaction quality. A NOFRAMES element containing text stating that the page cannot be rendered by the browser being used does not help the user to access the page content that is framed.
In order to address this problem, the evaluation software collects all the text included in the NOFRAMES element, extracts the words that appear most frequently and removes from the resulting list words that are not significant (like 'browser', 'frame', 'download', etc.). If less than a predefined proportion (e.g. 50%) of the words remain, then a problem is generated since too few significant words have been found. Next, the rule extracts the most frequent words contained in the pages that constitute the FRAMESET components of the current page and shows them to the user. The user can then read those words and perhaps use them to fill the content of the NOFRAMES element.
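By way of non-limiting illustration, the significant-word test may be sketched in Python as follows. This is a simplified sketch that measures the proportion of all words surviving the filter rather than only the most frequent ones. The stop list GENERIC_WORDS is hypothetical (only 'browser', 'frame' and 'download' are named above), as is the function name.

```python
import re

# Hypothetical stop list; only 'browser', 'frame' and 'download' come from the text.
GENERIC_WORDS = {
    "browser", "frame", "frames", "download", "enabled",
    "need", "view", "page", "you", "a", "to", "this",
}

def noframes_has_content(text, min_ratio=0.5, generic=GENERIC_WORDS):
    """True when at least min_ratio of the words are not generic filler."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return False  # an empty NOFRAMES element carries no content
    significant = [w for w in words if w not in generic]
    return len(significant) / len(words) >= min_ratio
```

A NOFRAMES element whose text yields False would be reported as a problem under this rule.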
The rule entitled No Frames Links checks that if a page contains a FRAMESET, and further contains a NOFRAMES element, then the NOFRAMES element should include many of the links contained in the framed pages. This is because the NOFRAMES element is intended to provide users of browsers that are not frame-enabled with the means to access the site anyway, although perhaps with a slightly degraded graphic and interaction quality. A NOFRAMES element that does not contain links will not permit navigation from the page, thus unintentionally preventing a user from exiting the page without deactivating their browser.
In order to address this problem, the evaluation software extracts two lists of links: those connecting the NOFRAMES element with other pages and those connecting the framed pages with other pages. It then compares the lists and if fewer than, for example, 70% of the links found in framed pages occur in the NOFRAMES element, then a problem is generated. Since it is necessary to determine where in the web page to place the missing links, the user must generally provide a manual fix to this problem.
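By way of non-limiting illustration, the comparison of the two link lists may be sketched in Python as follows. The function names are assumptions for this example, and the 0.7 default mirrors the exemplary 70% figure given above.

```python
def noframes_link_coverage(framed_links, noframes_links):
    """Fraction of distinct framed-page links that also occur in NOFRAMES."""
    framed = set(framed_links)
    if not framed:
        return 1.0  # nothing to cover
    return len(framed & set(noframes_links)) / len(framed)

def missing_noframes_links(framed_links, noframes_links):
    """Links found in the framed pages but absent from NOFRAMES, sorted."""
    return sorted(set(framed_links) - set(noframes_links))

def noframes_links_problem(framed_links, noframes_links, min_coverage=0.7):
    """True when coverage falls below the required proportion."""
    return noframes_link_coverage(framed_links, noframes_links) < min_coverage
```

The list returned by missing_noframes_links could be shown to the user, who then decides where in the page to place each missing link.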
The rule entitled IFRAME Exists checks if an IFRAME element is present in the page. Though included in the HTML 4.0 standard, the IFRAME element (referring to "inline frame") is not supported by NETSCAPE NAVIGATOR versions 2, 3, or 4.1. Nor is it supported by INTERNET EXPLORER version 2. Thus, users who surf the net with such browsers may not be able to correctly see the page.
In order to correct this problem, the evaluation software extracts any IFRAME elements from the page. If some are found, then a problem is generated. Since there is no easy way to simulate the effect of IFRAME using other standard HTML elements, the software suggests that the user fix the problem by either reorganizing the page without using IFRAME or providing an alternative route for users not using INTERNET EXPLORER 3+.
The rule entitled Frame Borders checks if the page specifies frame borders in a way that is incompatible between the two major browsers in the market: NETSCAPE NAVIGATOR and INTERNET EXPLORER. This is because these browsers use different HTML coding for determining whether frame borders should be drawn, and the width of the frame. A FRAMESET or FRAME element that uses only one type of coding will be properly viewed only on one type of browser. Thus, users surfing with other than that browser will not receive a properly rendered page.
For FRAMESET, NETSCAPE NAVIGATOR 3+ and INTERNET EXPLORER 4+ use the format:
FRAMEBORDER=no|yes
INTERNET EXPLORER 3+ uses the format:
FRAMEBORDER=0|1
FRAMESPACING=number
For FRAME, NETSCAPE NAVIGATOR 3+ and INTERNET EXPLORER 4+ use the format:
FRAMEBORDER=no|yes
INTERNET EXPLORER 3+ uses the format:
FRAMEBORDER=0|1
In order to correct this problem, the evaluation software checks what combination of HTML code is used to specify frame borders. In particular it checks if apparently redundant attributes are used that can be understood by both major browsers. The code that guarantees that both browsers will work is displayed below:
Within FRAMESET or FRAME:
FRAMEBORDER=no co-occurs with FRAMEBORDER=0
FRAMEBORDER=yes co-occurs with FRAMEBORDER=1
Within FRAMESET the following attributes and values should be used:
FRAMEBORDER=no and FRAMEBORDER=0 and BORDER=0 and FRAMESPACING=0
The evaluation software may offer a dialog box showing what kind of attributes are used in the page and highlighting what other attributes should be added to guarantee full compatibility with current major browsers. The dialog box also provides the user with the ability to automatically add the missing attributes.
The rule entitled Frame Margin determines whether the page specifies frame margins in a way that is compatible with major browsers in the market. For example, MARGINHEIGHT and MARGINWIDTH can be used to specify the size of top/bottom and left/right margins (e.g., the distance between text and border of window). Unfortunately, certain browsers do not allow a margin=0. If either of the two attributes is set to 0, then a software error occurs with these browsers.
In order to address this problem, the evaluation software extracts from the page the attributes concerned with frame margins. It checks if MARGINWIDTH and/or MARGINHEIGHT are used with a value equal to 0. The software then shows the FRAME elements with such margins and suggests that the user remove the margin specification and instead use an appropriate background color for frames which will, in turn, camouflage the border when shown by non-supported browsers.
The rule entitled Frame Targeting concerns a problem where pages linked from within a frame may be displayed in different windows. The TARGET attribute in HTML is one attribute controlling such an effect. A common design mistake is to forget to specify the target for pages that should not be framed. A user following such links will be trapped inside a frame until a new browser window is explicitly opened. Another mistake is to use TARGET=_blank, which will tell the browser to open a new window each time the user clicks on the link, quickly clogging the user's desktop.
In order to address these problems, the evaluation software collects all the links leaving the page, if the page is rendered within a frameset. It then removes from such a list those links pointing towards other framed pages belonging to the same frameset. It further removes those links that are not important. (Important links are those pointing to pages internal to the site and to pages external to the site that are often used within the site. In practice, important links are determined by scanning all the pages of the site, collecting the frequencies with which their destination URLs are used, and selecting those most frequently used. Links connecting two frames are generally not considered.) Next, the rule extracts the TARGET attribute for such links. If the TARGET is '_self' or is missing, then a problem is generated (because the link destination will be rendered by browsers within the current frame). If it is '_blank' then another kind of problem is generated, because a new browser window will be opened for rendering the link destination. In order to fix this problem, the evaluation software opens a dialog box which shows the list of links having wrong targets and suggests a correct target for each.
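By way of non-limiting illustration, the selection of important links and the TARGET check may be sketched in Python as follows. The function names, the top_n cut-off and the problem labels are assumptions made for this example.

```python
from collections import Counter

def important_links(site_links, top_n=10):
    """site_links: destination URLs collected from every page of the site.
    Returns the top_n most frequently used destinations."""
    counts = Counter(site_links)
    return {url for url, _ in counts.most_common(top_n)}

def target_problems(page_links, same_frameset, important):
    """page_links: list of (url, target) pairs; target is None when absent.
    same_frameset: URLs of the other framed pages in the same frameset."""
    problems = []
    for url, target in page_links:
        if url in same_frameset or url not in important:
            continue  # links between frames and unimportant links are skipped
        t = (target or "_self").lower()
        if t == "_self":
            problems.append((url, "renders inside current frame"))
        elif t == "_blank":
            problems.append((url, "opens a new window"))
    return problems
```

Each pair returned by target_problems corresponds to one entry in the dialog box listing links with wrong targets.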
The rule entitled Noresize Frame confirms that the FRAME elements of the subject source document 118 do not contain the attribute NORESIZE. This is because users may want to move frame borders to better render web site content within their browser. Often this is needed because the configuration of their browser (e.g., font size, image downloading, etc.) may lead to an inconvenient layout of the frame. By specifying NORESIZE, the width or height of a frame is frozen, thus reducing user flexibility.
In order to address this problem, the evaluation software collects the FRAME elements contained in the page and checks if the NORESIZE attribute is set. If so, it identifies the problem to the user. A dialog box is then opened which shows the non-resizable frames and offers to automatically remove that attribute.
The rule entitled Frames Are Bad determines whether frames are present in a page and, if so, simply gives a warning message to the user. Frames suffer from a number of usability drawbacks, such as: (1) framed pages cannot be bookmarked; (2) users often get confused as to which frame is active; (3) frames cannot be rendered by specialized browsers, like text-based browsers or speaking machines; and (4) framed pages are often not indexed by search engines, as they have no recognizable content.
The rule entitled Page Size Small determines whether the page can be quickly downloaded by visitors. Many visitors use slow internet connections and will be inconvenienced if the download time for a page is too long.
In order to address this problem, the evaluation software determines the size of all the files involved in the page (e.g., background and other images, framed pages, etc.), computes a sum and then computes the estimated download time for website users that may use different modem speeds (e.g. 14.4, 28.8, 33.6, 56.0 kbps). Special attention is paid if the page being analyzed is a home page of the web site, since this will be the first page a visitor is presented with. If a download time for the page exceeds a predetermined or user-selected threshold, the evaluation software recommends a reduction in the size of the page. The software may further highlight the size of particular page components, such as images, in order to assist the user in limiting the page size.
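By way of non-limiting illustration, the download time estimate may be sketched in Python as follows. The sketch assumes that 1 kbps equals 1000 bits per second and ignores protocol overhead and latency. The function names and the 10-second default threshold are assumptions for this example.

```python
MODEM_SPEEDS_KBPS = (14.4, 28.8, 33.6, 56.0)

def download_seconds(total_bytes, kbps):
    """Idealized transfer time: bits to send divided by bits per second."""
    return total_bytes * 8 / (kbps * 1000)

def page_size_report(file_sizes, threshold_seconds=10.0):
    """file_sizes: dict mapping each file of the page to its size in bytes.
    Returns (total bytes, seconds per modem speed, speeds over the threshold)."""
    total = sum(file_sizes.values())
    times = {kbps: download_seconds(total, kbps) for kbps in MODEM_SPEEDS_KBPS}
    too_slow = [kbps for kbps, secs in times.items() if secs > threshold_seconds]
    return total, times, too_slow
```

For example, a page totaling 36,000 bytes corresponds to 288,000 bits, which takes about 20 seconds at 14.4 kbps and about 10 seconds at 28.8 kbps under these assumptions.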
The rule entitled IMG With ALT determines whether images embedded in a page contain an alternative textual description. Images are not always downloaded by users and rendered by browsers. Image downloading may be disabled in many web browsers in order to speed up access to a web site. Thus, providing an alternative textual description of the image with the ALT attribute helps these users to determine what kind of image they missed and might be interested in. Furthermore, ALT descriptions are usually downloaded and rendered immediately by browsers, whereas the image itself might require several seconds to download and render. Finally, browsers without graphics capability (speaking machines or text-based browsers) are still able to provide some information to users.
In order to address this problem, the evaluation software extracts all the IMG elements from a page and then checks if they specify an ALT attribute. If not, a problem is generated and the software provides a recommendation that an ALT string with suitable text be inserted.
The rule entitled IMG Alt Short determines whether images embedded in the page have an alternative textual description that is not too long. Long strings will typically be truncated by browsers or will not be rendered at all. Therefore, long ALT strings contribute negatively to the download time of a web page. The absolute limit of an ALT string is 1024 characters, but it is reasonable for a web designer to adopt a much smaller limit.
In order to address the problem, the evaluation software presents a dialog in which the ALT string of each collected image is shown, together with the image itself, and the user can edit or reword the string so that it becomes shorter. Then the new string can be saved attached to its image in the current page or in all the occurrences of that image in other pages of the site.
The rule entitled ALT With Content confirms that the alternative string associated with an image does not consist merely of words suggesting that the user use a graphic-enabled browser. The ALT attribute should be used to convey some information about the content of the associated graphics. In this manner, non-graphical browsers (e.g., text-based, speaking machines, browsers with image download disabled and browsers for blind people) can still provide some meaningful information about the graphic image that cannot be presented. Furthermore, search engines can better index the page.
In order to address this problem, these strings are extracted from the page and words belonging to a specific subset of generic terms (e.g., 'browser', 'graphic', 'enable') are removed. The number of remaining words is compared to the original size of the string, and if the ratio is below a predefined threshold, then a problem is signaled. The software then presents a dialog box in which the ALT string of each collected image is shown, together with the image itself. The user can then edit or re-word the text so that it becomes shorter. Then the new string can be saved attached to its image in the current page, as well as in all the occurrences of that image in other pages of the site.
The rule entitled IMG With Size determines whether images that are embedded in the current page declare their sizes in HTML. It is important to specify the height and width of an image in order to speed up the rendering process of the browser. In fact, if such information is missing, the browser needs to download the entire image before drawing it. This, in turn, wastes time by increasing the time a user needs to download the page.
In order to address this problem, the evaluation software collects all the embedded images (e.g., IMG elements) and confirms whether they each specify height and width attributes. If at least one of those elements does not specify the height/width attributes, the software identifies the element as an attribute problem. The software then suggests fixing this problem by showing the user a dialog box in which each image is shown with its physical size, if it can be obtained from the file containing the image itself. (This is currently possible for GIF and JPEG files.) The user can then input new values for the image geometry or adopt the ones that the software proposes.
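By way of non-limiting illustration, the checks on IMG elements described above (a missing ALT attribute, an overly long ALT string, and missing height/width attributes) may be sketched together in Python as follows. The class name, function name and problem labels are assumptions for this example.

```python
from html.parser import HTMLParser

class ImgScan(HTMLParser):
    """Collects (src, problem) pairs for the IMG elements of a page."""
    def __init__(self, max_alt=1024):
        super().__init__()
        self.max_alt = max_alt
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        a = dict(attrs)  # attribute names arrive lowercased
        src = a.get("src")
        if "alt" not in a:
            self.problems.append((src, "missing-alt"))
        elif len(a["alt"] or "") > self.max_alt:
            self.problems.append((src, "alt-too-long"))
        if "width" not in a or "height" not in a:
            self.problems.append((src, "missing-size"))

def image_problems(html, max_alt=1024):
    scan = ImgScan(max_alt)
    scan.feed(html)
    scan.close()
    return scan.problems
```

A designer adopting a limit smaller than 1024 characters would pass that limit as max_alt.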
The rule entitled Image Stretching checks that if an embedded image is represented as a GIF file, and the IMG element specifies a height/width attribute of the image, then the value of these attributes should be equal to the physical size of the image that is extracted from the file. If the sizes are not equal, then the image is stretched for presentation on the web site. Stretching images, particularly those stored in GIF format, generally does not work and leads to pictures that are badly rendered.
In order to address this problem, the evaluation software collects the geometry specified in the HTML tag corresponding to the image and the geometry extracted from the file containing the image. If they differ, then a problem is generated. The same process is repeated for all the images embedded in the current page. The software then suggests that this problem be fixed by displaying a dialog box in which each image is shown along with its actual size. The user can fix the images as the software proposes, or instead may abort the fixing process.
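By way of non-limiting illustration, the geometry comparison for GIF files may be sketched in Python as follows. The sketch reads the width and height stored in the logical screen descriptor that follows the six-byte GIF signature, on the assumption that this matches the size of the image. The function names are assumptions for this example.

```python
import struct

def gif_size(data):
    """Return (width, height) from the logical screen descriptor of a GIF."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Two little-endian unsigned 16-bit integers follow the signature.
    return struct.unpack("<HH", data[6:10])

def is_stretched(declared, data):
    """declared: (width, height) integers taken from the IMG element."""
    return tuple(declared) != gif_size(data)
```

Only the first ten bytes of the file are needed, so the check does not require decoding the image.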
The rule entitled Forced Download Images determines whether the page contains images embedded in links that need to be downloaded in order to follow the link, even if the user disabled automatic downloading of images. Images whose downloading cannot be disabled by end users are those within <A> ... </A>. Enabling or disabling downloading of images can make the difference between a usable site and a site that cannot be experienced without very long waiting times. Embedding an IMG element within the label of an anchor, as in <A href=...> <IMG src=...> </A>, forces the downloading of the image even when the user does not want to download it. Accordingly, embedding images in anchors is acceptable only if it can safely be assumed that downloading them will not require long waiting times for the user.
In order to address this problem, the evaluation software collects the IMG elements from the page that are contained within <A> elements. A problem with these elements is identified to the user who may manually fix the link data.
The rule entitled Background Consistency is a pairwise rule that determines whether a model page and the current page use the same background image. Changing backgrounds between pages may confuse visitors, who may think that they have inadvertently left the site.
In order to address this problem, the evaluation software first determines whether the two pages specify the same background image. If one does and the other does not, then a problem is identified. A problem is also identified if the two pages specify different background images, which may be determined from file names and sizes associated with the background. The software may then present a dialog box that presents the background images that it has found, and gives the user an option to change the background image of the current page.
The rule entitled Description Analysis determines whether the current page contains a description that is suitable to be shown by search engines when they present the page in their hit list. In order to be properly handled by a search engine, the description string should not exceed 1024 characters and should not contain HTML tags.
For this attribute, the evaluation software determines whether the page contains a <META name=description content="..."> element. If it does not, then a problem is identified and the software suggests that a meta tag be formed and placed in the source document.
Similar to the description analysis, a rule entitled Keywords Analysis determines whether the current page contains a set of keywords that can be used by search engines when they index the page. The list of keywords should be shorter than 1024 characters and should not contain HTML tags. The keyword list is important for search engines when this page is indexed because it affects whether the page is retrieved and where the page is positioned in a ranked list of search results computed by the search engines.
In order to address this, the evaluation software determines whether the page contains a <META name=keywords content="..."> element. If not, a problem is identified. The software then suggests that the user provide a list of keywords. Additionally, the software may generate a suitable list by automatically extracting and ranking text from the page.
The rule entitled Title Analysis confirms that the page has a title and that it is not longer than the maximum length allowed for a title. The page title is used in bookmark lists and by search engines to display results lists. Titles that are too long are truncated. Thus, it is important that titles are brief, informative and contextual, so that the page can be bookmarked and clearly identified.
In order to address this, the evaluation software extracts the TITLE element, if any. If no such element is found, then a problem is generated. If it is found, then its content is analyzed to determine if the title length is longer than 64 characters. If so, a second problem is identified.
The software then presents a dialog box through which a user may define a new title, or change the content of the current title. The user inputs the desired string and an appropriate TITLE element is inserted in the page.
The rule entitled Explicit Mail To relates to MAILTO: links which enable a user to send an email message by selecting a link. Accordingly, the link should contain an actual email address.
In order to address this problem, the evaluation software extracts the labels of all the links having MAILTO: as protocol. It then checks if the email address specified in the URL of the link is present in the text of the <A> element containing it. If not, a problem is identified. To fix this kind of problem the user can open a dialog box in which the email address and the label of each MAILTO: link may be entered. By pressing a button the label can be changed to the address and then saved.
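By way of non-limiting illustration, the MAILTO: label check may be sketched in Python as follows. The class and function names are assumptions for this example. The sketch also strips any query part (such as a subject) from the URL before comparing, which is a choice made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

class MailtoScan(HTMLParser):
    """Collects (address, label) pairs for every MAILTO: link in a page."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._address = None
        self._label = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().startswith("mailto:"):
                # Drop the scheme and any ?subject=... query part.
                self._address = unquote(href[7:].split("?", 1)[0])
                self._label = []

    def handle_data(self, data):
        if self._address is not None:
            self._label.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._address is not None:
            self.links.append((self._address, "".join(self._label).strip()))
            self._address = None

def hidden_mailto_links(html):
    """MAILTO: links whose visible label does not show the address."""
    scan = MailtoScan()
    scan.feed(html)
    scan.close()
    return [(a, label) for a, label in scan.links if a.lower() not in label.lower()]
```

Each pair returned corresponds to one link whose label could be changed to the address through the dialog box.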
The rule entitled VLINK_Differs_From_LINK confirms that a color used for visited links is different than the one used for non-visited links. Many users rely on the fact that links that have been already followed change their color. In this way they can tell if part of a site has already been visited.
In order to address this problem, the evaluation software reads the values of the attributes LINK and VLINK from the BODY element of the page, and compares them. If they are the same color (even if their name is different, like for 'BLACK' and '#000000', which are two ways to specify the same color) a problem is generated. To fix this problem a dialog can be opened which presents the two attributes and their values, as well as a color list from which the user can choose the desired colors. The conventional colors are highlighted, so that the user can select also those that are typical for websites.
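By way of non-limiting illustration, the comparison of colors that may be written in different ways may be sketched in Python as follows. The table holds the sixteen color names defined by HTML 4; the function names are assumptions for this example, and values that are neither a listed name nor in #RRGGBB form are compared as plain lowercase strings.

```python
NAMED_COLORS = {  # the sixteen color names defined by HTML 4
    "black": "#000000", "silver": "#c0c0c0", "gray": "#808080", "white": "#ffffff",
    "maroon": "#800000", "red": "#ff0000", "purple": "#800080", "fuchsia": "#ff00ff",
    "green": "#008000", "lime": "#00ff00", "olive": "#808000", "yellow": "#ffff00",
    "navy": "#000080", "blue": "#0000ff", "teal": "#008080", "aqua": "#00ffff",
}

def normalize_color(value):
    """Map a color name or #RRGGBB string to a lowercase #rrggbb string."""
    v = value.strip().lower()
    return NAMED_COLORS.get(v, v)

def same_color(a, b):
    return normalize_color(a) == normalize_color(b)

def vlink_problem(link, vlink):
    """True when LINK and VLINK resolve to the same color."""
    return same_color(link, vlink)
```

With this normalization, 'BLACK' and '#000000' are treated as the same color, as required above.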
The rule entitled Non-Standard LINK determines whether the page uses the conventional colors for displaying non-visited and visited links, namely blue and purple, respectively. Many users rely on the fact that followed links are purple and new links are blue.
In order to address this problem, the evaluation software extracts the BODY element from the page and confirms whether it contains the attributes LINK and VLINK. If it does and the two colors differ from the conventional ones, then a problem is identified. A dialog box may then be opened which enables the user to set the default colors for these attributes.
The rule entitled Empty Anchor Links is used to determine whether the page contains link specifications that do not have a clickable label. These links can occur where a web designer misplaces a tag for the A element. These <A href=...> links cannot be followed by users because they have no label. They are generally harmless, but they add to the page size, thus increasing page download time.
In order to address this problem, the evaluation software extracts from the page all the A elements and checks what they have as labels. Those having an image or those having a nonempty string are skipped. The remaining elements are then identified as problematic. A dialog box can be used to fix this problem. The dialog box displays the list of links. The user can then type the label for each of those links and save them within the source document 118.
The rule entitled Consistent Navigationals is a pairwise rule which determines whether the navigationals that are used in the model page are also used in the current page. Navigationals are defined as those links pointing to pages within the site and pages external to the site that are often used. In practice, important links are determined by scanning all the pages of the site, collecting the frequencies with which their destination URLs are used, and selecting those most frequently used. Links connecting two frames are not considered. When links that are often used in the site are missing from a page, it may mean that an inconsistent navigational structure is present in the page. If this is the case, then users will have a hard time exploring the site because relevant paths are not available and paths that users have already seen in previous pages no longer exist.
In order to address this problem, the evaluation software examines all the links leaving the model page and those leaving the current page. It then considers only those links that are navigationals. If the remaining sets differ substantially, a problem is identified. To fix the problem the user needs to decide which of the navigational links that are missing in the current page have to be added.
The rule entitled Reachable Home Page determines whether the page contains a link to the site home page. If it does not, it may pose a major navigational obstacle for users who, for example, bookmark and visit an internal web page, but wish to visit the home page for the site therefrom.
In order to address this problem, the evaluation software collects from the site model all the pages that are connected to the current one. If the home page does not appear among such a list, then a problem is identified. To fix this kind of problem, the software prompts the user to determine where and how to include a link to the home page. The rule provides a template for a textual link to the home page, but the actual fixing itself has to be performed by the user because of the determinations that have to be made.
The rule entitled Logical Path Rule determines which is the shortest path from the home page to the current page that a visitor can follow by clicking on links. The software then checks whether from the current page there is a link to each of the intermediate pages in such a path. Users may forget the route they followed to reach this page. Providing navigational aids in each page to help users to reach intermediate and initial pages of a site makes it much more navigable and usable. In addition, such navigational aids are clues for users to better understand the structure of a site. Relying only on the "BACK" button of the browser to support effective navigation of the site is not enough since some users may be visiting the page from a search engine or a bookmark, and the "BACK" button would not work.
In order to address this problem, the evaluation software scans all the links present in the page (e.g., those specified as textual <A> elements or embedded in images as <AREA> elements) and checks whether they include each page on the shortest path connecting the home page and the current page. If some are missing then a problem is identified. The problem can then be manually fixed by selecting the navigation structure that the software presents to the user and including it in the page. In this case, automatic fixing of the source document is not possible since adding links may require substantial reorganization of the page.
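By way of non-limiting illustration, the shortest path determination and the check for missing links may be sketched in Python as follows. The sketch assumes that the site model is available as a dictionary mapping each page to the pages it links to, and uses a breadth-first search, which is one possible technique and is not prescribed herein. The function names are assumptions for this example.

```python
from collections import deque

def shortest_path(site, home, page):
    """site: dict mapping each page URL to the list of URLs it links to.
    Returns the list of pages from home to page, or None if unreachable."""
    parents = {home: None}
    queue = deque([home])
    while queue:
        current = queue.popleft()
        if current == page:
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for nxt in site.get(current, ()):
            if nxt not in parents:
                parents[nxt] = current
                queue.append(nxt)
    return None

def missing_path_links(site, home, page):
    """Pages on the shortest path that the current page does not link to."""
    path = shortest_path(site, home, page)
    if path is None:
        return None
    linked = set(site.get(page, ()))
    return [p for p in path[:-1] if p not in linked]
```

The length of the returned path minus one is the number of clicks needed to reach the page from the home page, and a result of None indicates that no path was found.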
The rule entitled Isolated Pages is used to determine whether the current page can be reached from the home page. If isolated pages cannot be reached by users that start from the home page, then such users will not be able to view the page, even if it contains important material. The only way they would be able to view this page is via a search engine or through some bookmark. Isolated pages may occur due to spelling errors in links of other pages.
In order to address this problem, the evaluation software searches the site model for paths connecting the home page to the current one. If none is found (within preset time limits, which may be selected by users) then the page is declared to be unreachable and a problem is identified. To fix this problem the user needs to reorganize part of the site, which has to be done manually.
The rule entitled Self Referential Page is used to determine whether the page contains links that point to itself. Such links are generally useless.
In order to address this problem, the evaluation software collects all the links in the page that point to the very same page. It does not consider A elements whose destinations are named locations within the page (i.e. elements having the form <A HREF=thispage#namedlocation> ... </A> are skipped). The software then recommends that these links be deleted.
The rule entitled Broken Links is used to determine whether links contained in the page point to resources that are valid. The presence of invalid links on the page may confuse and irritate users. Such links may be the result of a typing error in the listed URL.
In order to address this problem, the evaluation software collects all the links pointing to resources external to the page (including URLs to other pages and images embedded in the page). If the links point to resources that are locally stored, then these resources are checked. If the software cannot access them, they are assumed to be unavailable, and a problem is identified. If links point to resources that are external to the site (i.e. they reside on a web server that is different from the one serving the current page), then these links are probed. If this fails because of network problems then nothing is done. If probing instead fails because the resources cannot be accessed, then a further problem is identified. Fixing this problem requires checking and modifying the URL of the broken links and possibly removing the link from the page if the resource no longer exists. A dialog box can be presented so that a user may perform both these activities.
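By way of illustration only, the decision logic described above may be sketched as follows. The names `classify_link`, `NetworkProblem`, `local_exists` and `probe` are hypothetical. The two callables are supplied by the caller so that the sketch makes no assumption about how local storage is checked or how remote resources are probed.

```python
from urllib.parse import urlparse

class NetworkProblem(Exception):
    """Hypothetical signal that the network itself, not the resource, failed."""

def classify_link(url, site_host, local_exists, probe):
    """Return 'ok', 'broken' or 'unknown' for a single link.

    local_exists(url) -> bool  checks a locally stored resource (assumed).
    probe(url) -> bool         probes a remote resource and may raise
                               NetworkProblem (assumed).
    """
    host = urlparse(url).netloc
    if host in ("", site_host):
        # Resource belongs to the site: check it directly.
        return "ok" if local_exists(url) else "broken"
    try:
        return "ok" if probe(url) else "broken"
    except NetworkProblem:
        # Probing failed because of the network: nothing is reported.
        return "unknown"
```

A caller would report a problem only for links classified as `'broken'`, which mirrors the distinction drawn above between network failures and inaccessible resources.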
The rule entitled Click Counting is used to determine whether the number of clicks that a user needs to perform to reach the current page starting from the home page is greater than a given threshold (which may be predefined by the user). An inordinate number of clicks may deter users from reaching and reading the page. Furthermore, it may confuse novice users because they would not be able to find what they need, leading them to conclude that the site does not contain such information.
In order to address this problem, the evaluation software searches for the path in the site model and counts the number of necessary clicks. If it exceeds the threshold, a problem is identified. To fix such a problem, site reorganization by the user is required.
The rule entitled Http Equiv Refresh Is Bad is used to determine whether the page contains HTML code that has the effect of automatically loading another page after some time has elapsed from the time when the current page was first selected by the browser. Such automatic loading of a new page has a number of drawbacks: (1) it may disorient users while they are reading the content of this page; (2) it may not allow users to examine and follow links present in this page; and (3) it reduces the effectiveness of the browser's BACK button (which is the second most used feature among web users, after link following).
In order to address this problem, the evaluation software determines whether the page has an element similar to the following one:
<META HTTP-EQUIV="refresh" CONTENT="5; url=new-page.html:">
If so, and if the time specified in it is below a threshold (defined in the preferences), a problem is identified. Fixing this problem may require a reorganization of the page. However, the software may offer a dialog box that shows the destination page and offers a button that can be used to automatically remove the META element.
The rule entitled Http Equiv Refresh Forward is used to determine whether the page contains HTML code that has the effect of automatically loading another page. If so, then the page should also contain an explicit link to the same destination page. If the link does not exist, users wishing to access the page are forced to wait for the redirection to occur.
In order to address this problem, the evaluation software checks whether the page has an element similar to the following one:
<META HTTP-EQUIV="refresh" CONTENT="5; url=new-page.html:">
If so, and if the page does not contain another link to 'new-page.html', then a problem is identified. Fixing this problem may require adding a new link to the destination page, and in general this cannot be done automatically. The rule, however, offers a dialog showing the relevant information and enables the user to automatically remove the META element.
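By way of illustration only, the check described above may be sketched as follows. The names `_RefreshScanner` and `refresh_without_link` are hypothetical. As a simplification, the sketch compares the refresh destination with each HREF value as a literal string, without resolving relative URLs.

```python
import re
from html.parser import HTMLParser

class _RefreshScanner(HTMLParser):
    """Records a META refresh, if any, and every A element destination."""
    def __init__(self):
        super().__init__()
        self.refresh = None      # (delay in seconds, destination URL)
        self.hrefs = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("http-equiv") or "").lower() == "refresh":
            # CONTENT has the form "<seconds>; url=<destination>".
            match = re.match(r"\s*(\d+)\s*;\s*url\s*=\s*(.+?)\s*$",
                             attrs.get("content") or "", re.IGNORECASE)
            if match:
                self.refresh = (int(match.group(1)), match.group(2))
        elif tag == "a" and attrs.get("href"):
            self.hrefs.add(attrs["href"])

def refresh_without_link(html):
    """Return the refresh destination if no A element links to it, else None."""
    scanner = _RefreshScanner()
    scanner.feed(html)
    scanner.close()
    if scanner.refresh and scanner.refresh[1] not in scanner.hrefs:
        return scanner.refresh[1]
    return None

page = '<META HTTP-EQUIV="refresh" CONTENT="5; url=new-page.html"><P>Moved.</P>'
```

The delay recorded in `scanner.refresh[0]` could likewise be compared with a threshold in the manner of the Http Equiv Refresh Is Bad rule.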
The rule entitled Http Equiv Refresh Backward determines whether the current page is the destination of some automatic redirection taking place in some other page. If so, then the current page should contain a link pointing back to that page. This is because users with a slow connection will not be given the time to read that page and perhaps follow its links. The new page (i.e. the current one) will be automatically downloaded instead, and no links to the previous page are provided to the user.
In order to address this problem, the software first determines whether the current page can be reached from another page through a link specified as:
<META HTTP-EQUIV="refresh" CONTENT="5; url=current_page.html:">
If so, and if the current page does not contain another link to that page, then a problem is identified. Fixing this problem may require the addition of a new link to the destination page, and in general this cannot be done automatically. The rule, however, offers a dialog showing the relevant information and enabling the user to automatically remove the META element from the other page.
The rule entitled Unvisible Image Maps is used to determine whether in the HTML code there are structures that are useless because no user would ever see them displayed by any sort of browser. Sometimes web designers use cut-and-paste methods to create new pages from old ones. It sometimes happens that the new page contains some 'left over' elements, like image maps (i.e. sets of links) without having an associated image. Such MAP elements do not appear in any image and are useless. Unused HTML code contributes to the page size and therefore its download time.
In order to address this problem, the evaluation software is programmed to scan the page and search for MAP elements that are not associated with any image in the page. If such elements are found, a problem is identified to the user 2. The software or the user 2 may fix the problem by deleting the unneeded element.
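By way of illustration only, the scan described above may be sketched as follows. The names `_MapScanner` and `unused_maps` are hypothetical. The sketch assumes that a MAP element is associated with an image when an IMG element names it in a USEMAP attribute, and it does not consider other elements that may carry that attribute.

```python
from html.parser import HTMLParser

class _MapScanner(HTMLParser):
    """Records declared MAP names and the maps referenced by IMG elements."""
    def __init__(self):
        super().__init__()
        self.map_names = []
        self.used = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "map" and attrs.get("name"):
            self.map_names.append(attrs["name"])
        elif tag == "img" and attrs.get("usemap"):
            # USEMAP="#nav" refers to <MAP NAME="nav">.
            self.used.add(attrs["usemap"].lstrip("#"))

def unused_maps(html):
    """Return the names of MAP elements that no image refers to."""
    scanner = _MapScanner()
    scanner.feed(html)
    scanner.close()
    return [name for name in scanner.map_names if name not in scanner.used]

page = ('<IMG SRC="bar.gif" USEMAP="#nav">'
        '<MAP NAME="nav"><AREA HREF="a.html"></MAP>'
        '<MAP NAME="old"><AREA HREF="b.html"></MAP>')
```

For the example page, only the left-over map named `old` is reported.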
The rule entitled Imagemap Links is used to determine whether links present in image maps are duplicated as textual links. Links specified by the image map should also appear as normal textual links, for three reasons: (1) many users browse the web with "image loading" disabled; such users would not be able to follow the links; (2) many browsers are not capable of rendering all the graphics (e.g. speaking machines, microbrowsers on hand-held devices) and therefore those links would not be accessible; and (3) imagemap links are not followed by all search engines, and therefore pages they point to would not be indexed.
In order to address this problem, the evaluation software is programmed to scan the page and extract all the visible AREA elements and their link lists. It then checks that these links also appear within A elements elsewhere in the page. It collects those that do not appear elsewhere and identifies them as a problem. Fixing this problem requires the addition of one or more A elements with these links. Since user input is involved in the correction, the software presents all the relevant information to the user, who then may modify the page using other tools.
The rule entitled Show Links is a rule that displays some information about the page to the user. It does not identify usability problems but instead presents some statistics about the page. In particular, it shows the links (of any sort) leaving the page. It displays their label (if textual) and their destination URL, and highlights local links.
The rule entitled Show Paths is a rule that displays some information about the page to the user. It does not identify usability problems, but instead presents some statistics about the page. In particular it shows a number (by default 50) of the shortest paths connecting the home page to the current page.
The rule entitled Show Site Navigationals is a site rule that displays some information about the site to the user. It does not identify usability problems but instead provides some statistics about the site. In particular it shows the list of local links that are important navigationals for the site. Links connecting two frames are not considered.
The rule entitled Max Clicks Number is a site rule that checks the average number of clicks on links that a user has to perform in order to reach each page. If the average number is above a predefined threshold (set in the preferences), then a problem is signaled.
If the site contains many pages that, in order to be reached from the home page, require too many links to be followed, this may deter users from reaching and reading those pages. Furthermore, it may confuse novice users because they would not be able to find what they need, leading them to conclude that the site does not contain such information. In order to address these problems, the evaluation software computes the length of the shortest path from the home to every other site page. It then computes the average. It finally compares such a value with a predefined threshold.
Such a problem is not automatically fixable by the software, as it requires major reorganization of the site. However, the software may highlight those pages that are farthest from the home page.
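By way of illustration only, the computation performed by the Max Clicks Number rule may be sketched as follows. The sketch assumes the site model is available as a dictionary mapping each page URL to the pages it links to, and it averages only over pages that can be reached from the home page. The function names and the example site are hypothetical.

```python
from collections import deque

def click_depths(site_model, home):
    """Return the length of the shortest path from home to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for nxt in site_model.get(page, []):
            if nxt not in depths:
                depths[nxt] = depths[page] + 1
                queue.append(nxt)
    return depths

def average_clicks(site_model, home):
    """Average shortest-path length over every reachable page except home."""
    depths = click_depths(site_model, home)
    others = [d for page, d in depths.items() if page != home]
    return sum(others) / len(others) if others else 0.0

def farthest_pages(site_model, home):
    """Pages at the greatest distance from home, for highlighting."""
    depths = click_depths(site_model, home)
    deepest = max(depths.values())
    return sorted(p for p, d in depths.items() if d == deepest)

# Hypothetical site model.
site = {
    "index.html": ["a.html", "b.html"],
    "a.html": ["c.html"],
    "c.html": ["d.html"],
}
```

A problem would be signaled when `average_clicks(site, "index.html")` exceeds the threshold set in the preferences, and `farthest_pages` returns the pages that might be highlighted.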
The rule entitled List Of Images is a site rule that displays a list of the images used by pages of the site. It displays each image's URL and the pages that use that image. No usability problem is associated with this rule. However, it is helpful for tracking the images used on the web site.
The rule entitled Recycle Graphics is a site rule which confirms that image files on the site can be cached by a browser. Generally, different pages pointing to the same image should refer to the same file. Images that are included in web pages are cached by the user's browser before being rendered, in order to avoid downloading them again the next time the same image has to be displayed. However, if the image is referred to by different names (i.e. different URLs), each reference is cached separately and hence has to be downloaded separately, thus increasing the download time the user has to wait.
In order to address this problem, the evaluation software generates a table which maps the name and size of every image file on a site to a list of URLs referring to those files. The table is then used to check that image files are reused instead of copied. The software displays the images that are stored in duplicative files, including their size and the pages that use them. A dialog box can then be presented so that the user may replace or edit the URLs so that all requisite pages point to the same image file.
The rule entitled Email Consistency is a site rule that checks whether each e-mail address included in a link element has the same label throughout the site. Inconsistent labels for e-mail addresses may confuse users of the site, requiring them to spend time and effort to investigate which of the apparently different links should be followed.
In order to address this problem, the evaluation software collects all the labels associated with <A HREF=MAILTO: ...> links in all the pages of the site and then compares their e-mail addresses. If two links refer to the same address but have different labels, the problem is identified to the user 2. The software then presents all the addresses that have more than one label, and a dialog box can be opened so that the user may choose a single label to be associated with each e-mail address and replace previously used labels in some or all of the pages.
The rule entitled PreFilteredPage is an HTML validation rule. It checks whether the values associated with element attributes are correctly written. In particular, it checks whether a value requires double quotes. The rule also automatically fixes this problem. Certain browsers may fail to process such pages because their HTML does not satisfy the standard (HTML 3.2 or HTML 4.0). The software of the present invention scans the page and locates attribute values that violate the above-mentioned requirement. It may then automatically add double quotes so that browsers do not fail on such values, and re-saves the source document.
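By way of illustration only, the automatic quoting performed by the PreFilteredPage rule may be sketched with regular expressions. The function name `quote_attribute_values` is hypothetical. The sketch quotes every unquoted attribute value rather than deciding which values strictly require quotes, and it assumes that no attribute value contains a '>' character.

```python
import re

# A start tag: '<', a letter, then anything up to the closing '>'.
_TAG = re.compile(r"<[A-Za-z][^<>]*>")
# An attribute assignment whose value is double-quoted, single-quoted or bare.
_ATTR = re.compile(r"""(\s[\w:-]+\s*=\s*)("[^"]*"|'[^']*'|[^\s"'>]+)""")

def quote_attribute_values(html):
    """Return html with double quotes added around unquoted attribute values."""
    def fix_attr(match):
        prefix, value = match.group(1), match.group(2)
        if value[0] in "\"'":
            return match.group(0)        # already quoted: leave untouched
        return f'{prefix}"{value}"'

    def fix_tag(match):
        return _ATTR.sub(fix_attr, match.group(0))

    # Only text inside start tags is rewritten; page content is left as is.
    return _TAG.sub(fix_tag, html)
```

For example, `<IMG SRC=logo.gif ALT="Our logo" WIDTH=100>` is rewritten so that SRC and WIDTH are quoted while the already quoted ALT value is left unchanged.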
The rule entitled Show Parser Errors is used to report the HTML errors that were found when parsing the page and those elements within the source document 118 that depend on the use of non-standard HTML. If the HTML contained in the page does not conform to a current standard, then some browsers may render the page in an unpredictable way.
In order to address this problem, the evaluation software compares each HTML structure used in the page to its standard definition, as identified by the DOCTYPE specification in the page. If the DOCTYPE is missing, the user can select in the preferences which standard to use. Each error is described concisely, together with its location in the HTML code. To fix this kind of problem, extensive re-coding may be required. The software, therefore, enables the user to open an HTML editor positioned at the error location.
The rule entitled Marquee Is Bad confirms that the current page does not contain the MARQUEE element, which creates a scrolling text area. <MARQUEE> .. </MARQUEE> can currently only be used with INTERNET EXPLORER. It is not standard HTML 4.0, nor can it be understood by other browsers. Animated features within a page such as a marquee can also negatively affect the way in which a user looks at and reads the page content. Users will be continuously distracted from the most important page content. In addition, users browsing this page with a browser that cannot understand the MARQUEE element will be presented with a page that does not work as intended.
In order to address this problem, the program scans the page searching for MARQUEE elements. If any are found, then a problem is identified. The software then shows in detail the MARQUEE elements that it has found and recommends that they be disabled or deleted.
The rule entitled SpacerIsBad is used to determine whether the current page contains a SPACER element, which creates a blank space. Currently, <SPACER> .. </SPACER> can only be used with NETSCAPE NAVIGATOR. It is not a standard HTML 4.0 command. Thus, a page incorporating this command may not be properly displayed by some browsers.
In order to address this problem, the software scans the page searching for SPACER elements. If any are found, then a problem is identified. The software shows in detail the SPACER elements that it has found and proposes their replacement with alternative (but more standard) ways to achieve the same effect (e.g. using '&nbsp;' or transparent gifs). A dialog box can be presented to a user so that they may replace the SPACER elements with selected alternatives.
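By way of illustration only, the scans performed by the Marquee Is Bad and SpacerIsBad rules may be sketched with a single pass over the page. The names `NONSTANDARD`, `_ElementScanner` and `find_nonstandard_elements` are hypothetical. The sketch reports the line and column of each element so that it can be shown in detail to the user.

```python
from html.parser import HTMLParser

# Browser-specific elements named by the two rules.
NONSTANDARD = {"marquee", "spacer"}

class _ElementScanner(HTMLParser):
    """Records every occurrence of a non-standard element with its position."""
    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # Tag names are reported in lower case regardless of the source.
        if tag in NONSTANDARD:
            line, column = self.getpos()
            self.found.append((tag, line, column))

def find_nonstandard_elements(html):
    """Return (tag, line, column) for each MARQUEE or SPACER element."""
    scanner = _ElementScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.found

page = ("<P>Hello</P>\n"
        "<MARQUEE>News</MARQUEE>\n"
        "<SPACER TYPE=horizontal SIZE=10>")
```

A problem would be identified whenever the returned list is non-empty.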
The process and apparatus of the invention are further contemplated to include the following features. In a fine-tuning procedure contemplated for the present invention, a user may modify both the number of usability rules to be employed as well as the weight or severity of each rule after an initial analysis, according to process 31, is performed. The user may adjust both the number of rules used and the weight or severity of any number of the rules, and then reanalyze the web site, all in an iterative manner. This may be performed so as to tailor the analysis to web sites with particularized functionality.
The method and apparatus of the present invention are further contemplated to include a recorder feature and a downloading feature. The recorder feature can be used to receive dynamic web pages (e.g. those created by a web server on the fly) that are generated in response to user inputs. The recorder may copy the generated document and store it as a static file for analysis. The download feature can be used to download one or more source documents based on a submitted URL. The download feature may be customized to include user-definable features such as how links in the source documents are to be followed.
An authoring plug-in feature is also contemplated to be included in the present invention. This plug-in may be used to interface with web authoring software applications to receive source document information. This feature then enables a user to fix and modify proprietary web page templates used by such web authoring software.
A server log analysis feature is further contemplated to be used with the present invention. The server log analysis plug-in may use server logs to decipher user activity on a web site and identify problematic attributes therefrom. The server log plug-in may be adapted to accommodate any format used by a web server to monitor such user activity. The server plug-in may contain a separate set of rules from those described previously which employ server log information to identify attribute problems.
Finally, a customized rules plug-in feature allows a user to generate and use customized rules which the user may develop to identify problems with web site attributes. The method and apparatus of the present invention may be adapted to present the user with a specialized user interface for entering new rule criteria. The computer language of the new rules does not have to match the language in which the predefined rules are stored. This feature may further allow a user to remotely store new usability rules and to instruct a server performing an analysis where to locate the stored rules on a network or the Internet.
Although the invention has been described in detail in the foregoing embodiments, it is to be understood that the descriptions have been provided for purposes of illustration only and that other variations both in form and detail can be made thereupon by those skilled in the art without departing from the spirit and scope of the invention, which is defined solely by the appended claims.