US20120198342A1 - Automatic generation of task scripts from web browsing interaction history - Google Patents

Info

Publication number
US20120198342A1
Authority
US
United States
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/016,182
Inventor
Jalal U. Mahmud
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Application filed by International Business Machines Corp
Priority to US13/016,182
Assigned to INTERNATIONAL BUSINESS MACHNES CORPORATION reassignment INTERNATIONAL BUSINESS MACHNES CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MAHMUD, JALAL U.
Priority to US13/561,378 (US9626159B2)
Publication of US20120198342A1
Abandoned (current legal status)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F8/00 Arrangements for software engineering
            • G06F8/30 Creation or generation of source code
          • G06F11/00 Error detection; Error correction; Monitoring
            • G06F11/30 Monitoring
              • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
                • G06F11/3409 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
                  • G06F11/3414 Workload generation, e.g. scripts, playback
                • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/95 Retrieval from the web
                • G06F16/954 Navigation, e.g. using categorised browsing
                • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]

Definitions

  • The task identifier 118 comprises an action analyzer 202, a web page extractor 204, a DOM analyzer 206, a feature extractor 208, a feature vector constructor 210, a feature vector merger 212, a label generator 214, a subsequence generator 216, and a subsequence comparator 218, as shown in FIG. 2.
  • The task identifier 118 uses these components to automatically identify personalized tasks for the user from the web browsing interaction history 124. More specifically, the task identifier 118 identifies repeated sequences of similar actions on a single website from the user's web browsing interaction history 124 and the web pages (and DOMs) 126 associated with each action. These sequences are labeled as a task (i.e., a web browsing task). For example, the task identifier 118 identifies the following sequence of web actions as a task.
  • Task models 128 are created for each task.
  • The task models 128 identify other instances of the task from web interactions on the same website or on other websites.
  • The script generator 122 uses these identified tasks to generate task-scripts that can be automatically performed at the website(s). More specifically, after the task is identified, the script generator 122 uses the identified task to generate a script for the actions.
  • The script is a sequence of instructions, with each instruction corresponding to an action. For example, the following script is generated for the exemplary sequence of web actions listed.
  • The illustrated embodiment uses the script generator that is described in Gilly Leshed et al. “CoScripter: automating & sharing how-to knowledge in the enterprise”. Other script generators are used in further embodiments of the present invention.
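The one-instruction-per-action mapping described above can be sketched as follows. This is a minimal illustration, not the CoScripter implementation: the `WebAction` record, its `kind` values, and the exact step wording are assumptions, loosely modeled on CoScripter-style human-readable steps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WebAction:
    """Hypothetical record of one recorded web action."""
    kind: str                    # assumed kinds: "goto", "click_link", "click_button", "enter"
    target: str                  # a URL, or the visible label of the page element
    value: Optional[str] = None  # the typed text, used only by "enter" actions


def to_instruction(action: WebAction) -> str:
    """Render one action as one human-readable script instruction."""
    if action.kind == "goto":
        return f'go to "{action.target}"'
    if action.kind == "click_link":
        return f'click the "{action.target}" link'
    if action.kind == "click_button":
        return f'click the "{action.target}" button'
    if action.kind == "enter":
        return f'enter "{action.value}" into the "{action.target}" textbox'
    raise ValueError(f"unsupported action kind: {action.kind}")


def generate_script(actions: List[WebAction]) -> str:
    """One instruction per action, in the order the actions were performed."""
    return "\n".join("* " + to_instruction(a) for a in actions)


login = [
    WebAction("click_link", "IBMemail"),
    WebAction("enter", "User name", "jdoe"),
    WebAction("click_button", "Sign in"),
]
print(generate_script(login))
```

Keeping exactly one instruction per action means each line of the generated script can be traced back to one entry in the recorded history.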
  • Identifying personalized tasks from a user's web browsing interaction history enables automatic creation of task-specific scripts for later execution by a web automaton tool. Such scripts can later be reused by the same user or by other users. Also, a user can easily bootstrap their personalized task-script repository to get the full benefit of existing web automaton systems. Furthermore, task inference from a user's web browsing interaction history 124 can be used in creating a user's personal profile. For example, keywords identified from the personalized task scripts can be added to the user's interest profile: if the keywords “book” and “buy” are identified from a user's task script, they can be added to that user's interest profile, and this can also be used to categorize the user as a frequent book buyer. Thus, task inference can assist in building a task-based profile for the user, which can be used by adaptive and context-aware systems, social networking applications, and mobile applications.
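The keyword-to-profile step can be sketched as a simple word count over the generated task scripts. The stop-word list, the `update_interest_profile` and `is_frequent` helpers, and the "seen at least twice" rule for a frequent interest are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical stop list: script verbs and element nouns that say nothing
# about the user's interests.
STOP_WORDS = {"a", "an", "the", "to", "into", "go", "click", "enter",
              "link", "button", "textbox"}


def update_interest_profile(profile: Counter, scripts: Iterable[str]) -> Counter:
    """Add every non-stop-word keyword found in the task scripts to the profile."""
    for script in scripts:
        for word in re.findall(r"[a-z]+", script.lower()):
            if word not in STOP_WORDS:
                profile[word] += 1
    return profile


def is_frequent(profile: Counter, keyword: str, min_count: int = 2) -> bool:
    """Assumed rule: a keyword seen at least `min_count` times marks a frequent interest."""
    return profile[keyword] >= min_count


scripts = [
    '* click the "Fiction" link\n* click the "Buy book" button',
    '* click the "Buy book" button',
]
profile = update_interest_profile(Counter(), scripts)
```

Here "buy" and "book" each reach a count of 2, which under the assumed rule would mark the user as a frequent book buyer.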
  • FIG. 3 shows an exemplary web browsing interaction history that is used in one embodiment of the present invention.
  • A user interface 302 displays the user's web browsing interaction history.
  • Web browsing session information 304 is displayed in a first area 306 of the user interface 302.
  • Actions 308 performed at a web page of a website, time information 310, web page title information 312, and web page URL information 314 are displayed in a second area 316 of the interface 302.
  • Other types of information can also be maintained and displayed in the web browsing interaction history 124.
  • The task identifier 118 segments the interaction history 124 by websites and sessions.
  • FIG. 4 illustrates the segmentation of web browsing actions based on tasks in this embodiment.
  • The exemplary segmented interaction history 124 includes the following sequence of actions: clicking on an “IBMemail” link 402, entering a user name 404, entering a password 406, and clicking a sign in button 408. After such segmentation, there are one or more sequences of actions for each website.
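Segmenting the history by website and session can be sketched as a grouping step. The `Action` record, the integer session identifier, and the example host names are assumptions for illustration; how session boundaries are detected is not specified in this section.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Action:
    """Hypothetical history entry: where and when an action happened, and what it was."""
    website: str
    session: int
    description: str


def segment(history: List[Action]) -> Dict[Tuple[str, int], List[str]]:
    """Split a chronological history into one action sequence per (website, session).

    Dicts keep insertion order (Python 3.7+), so segments appear in the order
    they were first seen and actions keep their original order within a segment.
    """
    segments: Dict[Tuple[str, int], List[str]] = {}
    for action in history:
        segments.setdefault((action.website, action.session), []).append(action.description)
    return segments


history = [
    Action("mail.example.com", 1, "click IBMemail link"),
    Action("mail.example.com", 1, "enter user name"),
    Action("shop.example.com", 1, "click Fiction link"),
    Action("mail.example.com", 1, "enter password"),
    Action("mail.example.com", 2, "click IBMemail link"),
]
segments = segment(history)
```

The interleaved `shop.example.com` action does not break the mail-site sequence for session 1; it lands in its own segment.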
  • The task identifier 118, via the subsequence comparator 218, computes a similarity characteristic of the subsequences and identifies repeating subsequences based on this similarity. Each repeating subsequence that is identified is an “identified task”.
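Finding repeating subsequences can be sketched as counting, for each contiguous run of action-class labels, how many segmented sequences contain it. This sketch treats two subsequences as similar only when they are identical label sequences; generalization is not handled. The minimum length of 2 and the "occurs in two or more sequences" threshold are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Set, Tuple


def contiguous_subsequences(seq: Sequence[str], min_len: int = 2) -> Iterator[Tuple[str, ...]]:
    """Yield every contiguous run of at least `min_len` labels."""
    n = len(seq)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            yield tuple(seq[start:end])


def repeating_subsequences(sequences: List[Sequence[str]],
                           min_len: int = 2) -> Dict[Tuple[str, ...], List[int]]:
    """Map each subsequence that occurs in two or more sequences to those sequences' indices."""
    occurrences: Dict[Tuple[str, ...], Set[int]] = defaultdict(set)
    for index, seq in enumerate(sequences):
        for sub in contiguous_subsequences(seq, min_len):
            occurrences[sub].add(index)
    return {sub: sorted(ids) for sub, ids in occurrences.items() if len(ids) >= 2}


sessions = [
    ["l1", "l2", "l3", "l4"],
    ["l1", "l2", "l3", "l4", "l5"],
    ["l6", "l7"],
]
repeats = repeating_subsequences(sessions)
```

Enumerating all contiguous runs is quadratic in sequence length, which is usually acceptable for per-site, per-session sequences; the longest shared run, here `l1 l2 l3 l4`, is the natural task candidate.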
  • The similarity computation considers two subsequences similar if: (1) they are identical (i.e., they are the same sequences of feature-vector class labels), or (2) one of the subsequences is a generalization of the other. In this embodiment, the following heuristic is used to determine generalization, with p being the prefix, m being the middle part, and s being the suffix.
  • The third actions of both subsequences are put into the same cluster because they share similar words in the action node as well as in their context.
  • The fourth actions are also clustered together because of contextual similarity.
  • These two abstracted sequences are considered to be equal because the first is a generalization of the second. As a result, these are identified as sequences corresponding to a task.
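The clustering of actions into action-class labels can be approximated with a word-overlap rule. This is a swapped-in technique for illustration: the sketch compares only the words of each action's own description, using Jaccard similarity with an assumed threshold of 0.5, whereas the text above also uses the words of the surrounding context. The label names l1, l2, and so on follow the document's convention.

```python
from typing import List, Set


def words(text: str) -> Set[str]:
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def assign_action_labels(descriptions: List[str], threshold: float = 0.5) -> List[str]:
    """Greedy leader clustering: an action joins the first existing cluster whose
    leader's word set is at least `threshold`-similar; otherwise it starts a new cluster."""
    leaders: List[Set[str]] = []
    labels: List[str] = []
    for description in descriptions:
        current = words(description)
        for i, leader in enumerate(leaders):
            if jaccard(current, leader) >= threshold:
                labels.append(f"l{i + 1}")
                break
        else:  # no sufficiently similar cluster was found
            leaders.append(current)
            labels.append(f"l{len(leaders)}")
    return labels


actions = [
    "click IBMemail link", "enter user name", "enter password", "click sign in button",
    "click IBMemail link", "enter your user name", "enter password", "click sign in button",
]
labels = assign_action_labels(actions)
```

Both login sessions abstract to `l1 l2 l3 l4` even though one says "enter your user name". A greedy leader scheme depends on input order, which is one reason a trained classifier over richer feature vectors may be preferable.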
  • The script generator 122 can generate executable scripts from the action sequences corresponding to the tasks.
  • These action sequences that are identified as a task can also be displayed to the user via another portion 420 of the user interface 302, as shown in FIG. 4.
  • This “task view” shows the user a given task 422 (such as “login”) and the associated activities 402, 404, 406, and 408 that have been identified as an instance of this task 422.
  • The task model generator 120 constructs a task model 128 for each of the identified tasks.
  • The model 128 comprises all of the identified action sequences that are instances of this task, and it is used to identify future sequences of actions as instances of this task.
  • The model 128 also comprises feature classifiers that classify the features extracted from the user's actions into an abstracted action, and the generalization heuristics are used to compute the similarity of abstracted action sequences.
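A task model as described here bundles the stored instances of a task with the classifier that abstracts new actions. The sketch below is a minimal illustration under assumptions: the `TaskModel` class, the dictionary-backed classifier, and the exact-match test are hypothetical, and the generalization heuristics are left out to keep it short.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class TaskModel:
    """Hypothetical task model: a classifier plus the abstracted instances of one task."""
    name: str
    classify: Callable[[str], str]  # assumed interface: action description -> action-class label
    instances: List[List[str]] = field(default_factory=list)

    def abstract(self, actions: Sequence[str]) -> List[str]:
        """Map concrete actions to their abstracted action-class labels."""
        return [self.classify(action) for action in actions]

    def add_instance(self, actions: Sequence[str]) -> None:
        self.instances.append(self.abstract(actions))

    def is_instance(self, actions: Sequence[str]) -> bool:
        """Exact-match check only; the generalization heuristics are not applied here."""
        return self.abstract(actions) in self.instances


LABELS = {"click IBMemail link": "l1", "enter user name": "l2",
          "enter password": "l3", "click sign in button": "l4"}
login = TaskModel("login", lambda action: LABELS.get(action, "unknown"))
login.add_instance(["click IBMemail link", "enter user name", "enter password", "click sign in button"])
```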
  • An exemplary task model 128 for the two sequences above consists of the following two action sequences (listed with their scripts).
  • The feature classifier classifies the actions for this task model into a set of clusters that each correspond to an action-class label.
  • The feature classifier places an action (i.e., an interaction with a web page element) into a cluster, which classifies the action as an abstracted action with the corresponding action-class label.
  • The feature classifier places the actions into six clusters. After classification, the following two abstracted sequences are obtained from the two action sequences.
  • a subsequence $p\,m^{+}\,s$ is a generalization of the subsequence $p\,m\,m\,s$;
  • a subsequence $p\,m\,(y\,m)^{+}\,s$ is a generalization of the subsequence $p\,m\,y\,m\,s$;
  • a subsequence $p\,(m\,y)^{+}\,m\,s$ is a generalization of the subsequence $p\,m\,y\,m\,s$.
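The three generalization rules can be checked mechanically if the superscript + is read as "one or more repetitions", as in regular expressions; that reading is an assumption, as is treating y as just another label. In this sketch, a generalized subsequence is a list whose items are either single action-class labels or a hypothetical `Rep` unit, and the check compiles it to a Python regular expression and tests a concrete label sequence against it with `re.fullmatch`.

```python
import re
from typing import NamedTuple, Sequence, Tuple, Union


class Rep(NamedTuple):
    """One or more consecutive repetitions of a unit of labels, e.g. (y m)+."""
    unit: Tuple[str, ...]


PatternItem = Union[str, Rep]


def _label(label: str) -> str:
    # Each label is matched literally, followed by one space separator.
    return re.escape(label) + " "


def pattern_to_regex(pattern: Sequence[PatternItem]) -> str:
    parts = []
    for item in pattern:
        if isinstance(item, Rep):
            parts.append("(?:" + "".join(_label(label) for label in item.unit) + ")+")
        else:
            parts.append(_label(item))
    return "".join(parts)


def is_generalization(pattern: Sequence[PatternItem], sequence: Sequence[str]) -> bool:
    """True if the concrete label sequence is an instance of the generalized pattern."""
    text = "".join(label + " " for label in sequence)
    return re.fullmatch(pattern_to_regex(pattern), text) is not None


# The three rules from the list above, as (generalization, concrete subsequence) pairs.
RULES = [
    (["p", Rep(("m",)), "s"], ["p", "m", "m", "s"]),
    (["p", "m", Rep(("y", "m")), "s"], ["p", "m", "y", "m", "s"]),
    (["p", Rep(("m", "y")), "m", "s"], ["p", "m", "y", "m", "s"]),
]
```

Two subsequences can then be treated as similar when they are identical or when either one's generalized form matches the other.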
  • Once a tester has task models for each task (across multiple websites in a similar domain and across multiple users) and a user's browsing history for the website for which test cases are to be developed, the tester can use the task models to identify task-specific actions from the browsing history and generate scripts from them automatically. Thus, the burden of manual test script generation is avoided.
  • The following is performed for each task model to construct the script for a website.
  • The task management system 115 loads the home page of the website. Then, for each actionable element of the current web page, the task management system 115 obtains the features and classifies them with the feature labels for that task model.
  • The actionable elements that are classified into some action class of that task model form candidate sets of relevant actionable elements on the web page. An abstracted subsequence of actions is constructed for each of these candidate sets of relevant actionable elements.
  • The action class for the first two links is l2 and for the last link is l3, and the abstracted action-class label l1 corresponds to the action of visiting the website “xyz.com”. So the abstracted subsequence constructed is l1.l2 for the first link, l1.l2 for the second link, and l1.l3 for the third link.
  • The current abstracted subsequences are compared to the abstracted sequences stored in the task model to determine whether the current abstracted subsequences are a prefix of any of the abstracted sequences in the task model.
  • The prefix comparison also uses the generalization heuristics.
  • The first two abstracted subsequences are prefixes of the following abstracted sequence: l1 l2 l3 l4 l5 l5 l6. Therefore, both links are selected.
  • The task management system 115 then follows the first link, loads the corresponding page, and uses the algorithm discussed above to synthesize scripts.
  • The task management system 115 then follows the second link, loads the next page, and uses the same algorithm to synthesize scripts.
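The page-by-page search described above amounts to a depth-first traversal pruned by the prefix test. The sketch below is hypothetical throughout: the site is a toy in-memory graph, each link carries an already-classified action-class label, and the prefix comparison is exact (the generalization heuristics are omitted). It has the same shape as the example above: the two l2 links are followed and the l3 link is not.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy site: page -> [(link text, action-class label, target page)].
SiteGraph = Dict[str, List[Tuple[str, str, str]]]


def is_prefix(prefix: Sequence[str], sequence: Sequence[str]) -> bool:
    return len(prefix) <= len(sequence) and list(sequence[:len(prefix)]) == list(prefix)


def synthesize_scripts(site: SiteGraph, home: str, instances: List[List[str]],
                       visit_label: str = "l1") -> List[List[str]]:
    """Depth-first search that only follows links whose abstracted label path is
    still a prefix of some task-model instance; a path that completes an instance
    yields one script. Paths grow by one label per step and can never exceed the
    longest instance, so the search terminates even if the site has cycles."""
    scripts: List[List[str]] = []

    def visit(page: str, labels: List[str], steps: List[str]) -> None:
        if labels in instances:
            scripts.append(steps)
            return
        for text, label, target in site.get(page, []):
            candidate = labels + [label]
            if any(is_prefix(candidate, instance) for instance in instances):
                visit(target, candidate, steps + [f'click the "{text}" link'])

    visit(home, [visit_label], [f'go to "{home}"'])
    return scripts


site: SiteGraph = {
    "xyz.com": [("Fiction", "l2", "fiction"), ("Mystery", "l2", "mystery"), ("Help", "l3", "help")],
    "fiction": [("Book A", "l3", "book-a")],
    "book-a": [("Add to cart", "l4", "cart")],
}
scripts = synthesize_scripts(site, "xyz.com", [["l1", "l2", "l3", "l4"]])
```

The "Mystery" branch is explored but yields no script because its page offers no link that extends the prefix, while "Help" is pruned immediately because l1.l3 is not a prefix of the stored instance.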
  • Personalized tasks are automatically identified from a user's web browsing interaction history. Repeated sequences of similar actions on a single website are identified from the user's web browsing interaction history. These sequences are labeled as a task, which assists in the creation of task-scripts by web automaton systems and lowers the barrier to task-script generation.
  • FIG. 5 is an operational flow diagram illustrating a process for identifying tasks based on repeated action sequences according to one embodiment of the present invention.
  • The operational flow diagram of FIG. 5 begins at step 500 and flows directly to step 502.
  • At step 502, the task management system 115 receives a user's web browsing interaction history 124.
  • At step 504, the task management system 115 extracts web pages 126 associated with actions identified in the interaction history 124.
  • Aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • The computer readable medium may be a computer readable signal medium or a computer readable storage medium.
  • A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing.
  • A computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof.
  • A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • The information processing system 700 includes a computer 702.
  • The computer 702 has one or more processors 704 connected to a main memory 706, a mass storage interface 708, and network adapter hardware 710.
  • A system bus 712 interconnects these system components.
  • Although only one CPU 704 is illustrated for computer 702, computer systems with multiple CPUs can be used equally effectively.
  • The main memory 706, in this embodiment, comprises the task manager 115 and its components, the browsing interaction history 124, the web pages 126, and the task models 128.
  • The mass storage interface 708 is used to connect mass storage devices, such as mass storage device 714, to the information processing system 700.
  • One specific type of data storage device is an optical drive such as a CD/DVD drive, which can be used to store data to and read data from a computer readable medium or storage product such as (but not limited to) a CD/DVD 716 .
  • Another type of data storage device is a data storage device configured to support, for example, NTFS type file system operations.
  • An operating system included in the main memory is a suitable multitasking operating system such as any of the Linux, UNIX, Windows, and Windows Server based operating systems. Embodiments of the present invention are also able to use any other suitable operating system. Some embodiments of the present invention utilize architectures, such as an object oriented framework mechanism, that allow instructions of the components of the operating system to be executed on any processor located within the information processing system 700.
  • The network adapter hardware 710 is used to provide an interface to a network 104. Embodiments of the present invention are able to be adapted to work with any data communications connections, including present-day analog and/or digital techniques, or via a future networking mechanism.
  • Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
  • The functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.

Abstract

Embodiments of the invention relate to automatically identifying web browsing tasks based on a web browsing interaction history. According to one embodiment of the invention, a web browsing interaction history of a user is analyzed to identify web browsing actions associated with web sites. Abstracted action sequences for the web browsing actions that are identified are generated, and action subsequences for the abstracted action sequences are generated. A similarity between each of the action subsequences is determined, and similar action subsequences are designated as a web browsing task.

Description

    BACKGROUND
  • The present invention generally relates to web automaton systems, and more particularly relates to generating automated web browsing scripts.
  • The World Wide Web has become an integral part of our daily activities. People browse the Web for information (e.g., reading newspapers), to conduct transactions (e.g., buying a book), and so on. Most often the goal of web browsing is to accomplish a task, such as buying a book, checking flight status, or paying a bill. Each such task is a sequence of web actions, such as visiting a website, clicking a link to select a category (e.g., the fiction category on a book seller website), clicking a link to select an item (e.g., a particular book), and clicking a button to add that item to a shopping cart. The execution of the actions on the website accomplishes a goal (e.g., buying the book). Some of these tasks are performed repeatedly by users. Web automaton systems allow users to record scripts while conducting such tasks. The recorded scripts can be saved in a repository and reused at later times. For example, a user can create a script for “buying a book” that can be repeatedly executed later. Such a script is known as a “task-script” because the script accomplishes a task.
    BRIEF SUMMARY
  • One embodiment of the present invention provides a method. According to the method, a web browsing interaction history of a user is analyzed to identify web browsing actions associated with web sites. Abstracted action sequences for the web browsing actions that are identified are generated, and action subsequences for the abstracted action sequences are generated. A similarity between each of the action subsequences is determined, and similar action subsequences are designated as a web browsing task.
  • Another embodiment of the present invention provides a computer program product comprising a computer readable storage medium having computer readable program code embodied therewith. The computer readable program code comprises computer readable program code configured to analyze a web browsing interaction history of a user to identify web browsing actions associated with web sites, generate abstracted action sequences for the web browsing actions that are identified, generate action subsequences for the abstracted action sequences, determine a similarity between each of the action subsequences, and designate similar action subsequences as a web browsing task.
  • A further embodiment of the present invention provides a system that includes a task manager. The task manager analyzes a web browsing interaction history of a user to identify web browsing actions associated with web sites, generates abstracted action sequences for the web browsing actions that are identified, generates action subsequences for the abstracted action sequences, determines a similarity between each of the action subsequences, and designates similar action subsequences as a web browsing task.
  • Other objects, features, and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and specific examples, while indicating various embodiments of the present invention, are given by way of illustration only and various modifications may naturally be performed without deviating from the present invention.
    BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS
  • FIG. 1 is a block diagram illustrating an operating environment according to one embodiment of the present invention;
  • FIG. 2 is a block diagram illustrating a more detailed view of the task identifier of FIG. 1;
  • FIG. 3 shows an exemplary web browsing interaction history that is used in one embodiment of the present invention;
  • FIG. 4 illustrates the segmenting of web browsing actions based on tasks according to one embodiment of the present invention;
  • FIG. 5 is an operational flow diagram for identifying web browsing tasks based on repeated web browsing actions according to one embodiment of the present invention;
  • FIG. 6 is an operational flow diagram for identifying new action sequences as an instance of an existing task based on a task model according to one embodiment of the present invention; and
  • FIG. 7 is a block diagram illustrating an information processing system applicable to embodiments of the present invention.
    DETAILED DESCRIPTION
  • Various embodiments of the present invention will be discussed in detail herein below with reference to the attached drawings.
  • Web automaton systems allow users to record scripts while conducting a task. One of the key benefits of such systems is that a user can reuse a script recorded by another user. However, manually creating and sharing scripts has limitations. Most often a user has personalized task needs for which no scripts have been created by other users. For example, a user may regularly visit a travel website to check airline ticket prices, and another user may not have created and shared a script for accomplishing this task. In this situation, the user has to manually create the script. Similarly, if the user frequently checks airline ticket prices on different websites and another user has not created a script for those websites, the user has to create a script for each of the websites in order to reuse them later. Although some conventional web automaton systems facilitate the recording of scripts, recording remains a labor-intensive process. As a result, many users do not record scripts and thus cannot take advantage of conventional web automaton systems.
  • In some conventional systems, a user can manually select actions from an interaction history in order to create a script. Thus, with such systems, a user does not need to repeat what they have already done. In particular, if the user visited a particular website and performed a task, then the user can later manually inspect their web browsing interaction history, select the actions for which they want to create a script, and create the script. However, a great deal of manual effort is required to find the correct actions in the web browsing interaction history for creating the scripts, and even more effort is required when the browsing history contains a large number of interactions. Additionally, such a manual approach does not scale across websites: the user has to manually create a task script for each of the websites, even though the scripts perform a similar task on the different websites.
  • Embodiments of the present invention automatically identify personalized tasks from a user's web browsing interaction history. Repeated sequences of similar actions on a single website are identified from the user's web browsing interaction history, and these sequences are labeled as a task. The identification of such tasks assists the creation of task-scripts by a web automaton system, and thus makes task-script generation easier for the user.
  • FIG. 1 illustrates an operating environment according to one embodiment of the present invention. As shown, one or more user systems 102 are communicatively coupled to one or more networks 104. Additionally, N web servers 106 and 108 are communicatively coupled to the network(s) 104. The network(s) 104, in this embodiment, is a wide area network, local area network, wired network, wireless network, and/or the like. Each web server 106 and 108 comprises web content 110 and 112 such as websites and their web pages that are accessible by a user of the user system 102 via an application such as a web browser 114.
  • The user system 102 comprises the web browser 114 and a task management system (task manager) 115. The task manager 115 includes a browsing monitor 116, a task identifier 118, a task model generator 120, and a script generator 122. The user system 102 also comprises browsing history information 124, web pages 126 (and their document object models (DOMs)), and task models 128. In further embodiments, one or more of these components resides outside of the user system 102.
  • The browsing monitor 116 monitors the user's browsing history, including the various actions taken by the user with respect to the web content using the web browser 114. The browsing monitor 116 continually records web browsing history at the level of interactions, such as entering a value into a form field, turning on a checkbox, or clicking a button. This goes beyond a conventional web history interface, which records only page titles and URLs, and gives the user a more complete picture of the actions performed on every web page that is visited. The information recorded by the browsing monitor 116 is stored as the browsing history information 124. The illustrated embodiment uses the method for recording such a browsing history that is described in Ian Li et al. “Here's what I did: sharing and reusing web activity with ActionShot” (CHI 2010: Proceedings of the 28th international conference on Human factors in computing systems, 2010, pp. 723-732), which is herein incorporated by reference in its entirety. Other methods for recording a user's web browsing history are used in further embodiments of the present invention.
  • The task identifier 118 comprises an action analyzer 202, a web page extractor 204, a DOM analyzer 206, a feature extractor 208, a feature vector constructor 210, a feature vector merger 212, a label generator 214, a subsequence generator 216, and a subsequence comparator 218, as shown in FIG. 2. The task identifier 118 uses these components to automatically identify personalized tasks for the user from the web browsing interaction history 124. More specifically, the task identifier 118 identifies repeated sequences of similar actions on a single website from the user's web browsing interaction history 124 and web pages (and DOMs) 126 associated with each action. These sequences are labeled as a task (i.e., web browsing task). For example, the task identifier 118 identifies the following sequence of web actions as a task.
  • visiting the website “www.abc.com”
  • clicking the link “tv”
  • clicking the link “lcd tv”
  • clicking the link “brand1 lcd”
  • clicking the button “add to shopping cart”
  • clicking the “check out” button
  • Task models 128 are created for each task and are used to identify other instances of the task from web interactions on the same website or on other websites. The script generator 122 uses these identified tasks to generate task-scripts that can be automatically performed at the website(s). More specifically, after the task is identified, the script generator 122 uses the identified task to generate a script for the actions. The script is a sequence of instructions, with each instruction corresponding to an action. For example, the following script is generated for the exemplary sequence of web actions listed above.
  • go to “www.abc.com”
  • click the “tv” link
  • click the “lcd tv” link
  • click the “brand1 lcd” link
  • click the “add to shopping cart” button
  • click the “check out” button
  • The illustrated embodiment uses the script generator that is described in Gilly Leshed et al. “CoScripter: automating & sharing how-to knowledge in the enterprise”. Other script generators are used in further embodiments of the present invention.
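The one-instruction-per-action mapping shown in the example script can be sketched in Python as follows. This is not CoScripter's generator: the `Action` record, its fields (`kind`, `label`, `target_type`, `value`), and the exact instruction wording for typed input are hypothetical, chosen only to reproduce the sample script above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Action:
    """Hypothetical record of one recorded web interaction (not from the patent)."""
    kind: str                          # "visit", "click", or "enter"
    label: str                         # URL, or the link/button/field label
    target_type: Optional[str] = None  # "link", "button", "textbox", ...
    value: Optional[str] = None        # text typed into a field, if any


def to_instruction(action: Action) -> str:
    """Render one action as a CoScripter-style instruction (assumed wording)."""
    if action.kind == "visit":
        return f'go to "{action.label}"'
    if action.kind == "click":
        return f'click the "{action.label}" {action.target_type}'
    if action.kind == "enter":
        return f'enter "{action.value}" into the "{action.label}" {action.target_type}'
    raise ValueError(f"unsupported action kind: {action.kind}")


def generate_script(actions: List[Action]) -> List[str]:
    """Emit one instruction per action, preserving the order of the task."""
    return [to_instruction(a) for a in actions]


if __name__ == "__main__":
    task = [
        Action("visit", "www.abc.com"),
        Action("click", "tv", "link"),
        Action("click", "lcd tv", "link"),
        Action("click", "brand1 lcd", "link"),
        Action("click", "add to shopping cart", "button"),
        Action("click", "check out", "button"),
    ]
    for line in generate_script(task):
        print(line)
```

Keeping the mapping as a pure function of each action means the same routine can render any action sequence later identified as a task instance.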
  • Identifying personalized tasks from a user's web browsing interaction history enables automatic creation of task-specific scripts for later execution by a web automaton tool. Such scripts can later be reused by the same user or by other users. Also, a user can easily bootstrap their personalized task-script repository to have the full benefit of existing web automaton systems. Even further, task inference from a user's web browsing interaction history 124 can be used in creating a user's personal profile. For example, keywords identified from the personalized task scripts can be added to a user's interest profile: if the keywords “book” and “buy” are identified from a user's task script, then those keywords can be added to the user's interest profile, and the user can be categorized as a frequent book buyer. Thus, task inference can assist the building of a task-based profile for the user, which can be used by adaptive and context-aware systems, social networking applications, and mobile applications.
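A minimal sketch of the keyword-to-profile idea, assuming keywords are simply the non-stopword words inside the quoted labels of a script's instructions. The stopword list, the `min_count` threshold, and both function names are illustrative assumptions; the text does not say how keywords are chosen.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Illustrative (assumed) stopword list: script verbs, object types, URL fragments.
STOPWORDS = {"the", "a", "an", "to", "go", "click", "enter", "into",
             "link", "button", "textbox", "www", "com"}


def script_keywords(script: Iterable[str]) -> Counter:
    """Count candidate interest keywords found in the quoted labels of a script."""
    counts: Counter = Counter()
    for line in script:
        for label in re.findall(r'"([^"]*)"', line):
            for word in re.findall(r"[a-z0-9]+", label.lower()):
                if word not in STOPWORDS:
                    counts[word] += 1
    return counts


def update_profile(profile: Set[str], script: Iterable[str], min_count: int = 1) -> Set[str]:
    """Return a copy of the interest profile extended with the script's keywords."""
    keywords = script_keywords(script)
    return profile | {word for word, n in keywords.items() if n >= min_count}
```

For a script such as `go to "www.abc.com"`, `click the "book" link`, `click the "buy" button`, this adds `abc`, `book`, and `buy` to the profile. Raising `min_count` would keep only terms that recur across the user's scripts.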
  • The following is a more detailed discussion on generating scripts from tasks that are identified from repeated action sequences. To identify tasks from a user's web interaction history 124, the action analyzer 202 analyzes the interaction history 124. FIG. 3 shows an exemplary web browsing interaction history that is used in one embodiment of the present invention. A user interface 302 displays the user's web browsing interaction history. In this embodiment, web browsing session information 304 is displayed in a first area 306 of the user interface 302. Actions 308 performed at a web page of a website, time information 310, web page title information 312, and web page URL information 314 are displayed in a second area 316 of the interface 302. Other types of information can also be maintained and displayed in the web browsing interaction history 124.
  • In this embodiment, the task identifier 118 segments the interaction history 124 by websites and sessions. FIG. 4 illustrates the segmentation of web browsing actions based on tasks in this embodiment. The exemplary segmented interaction history 124 includes the following sequence of actions: clicking on an “IBMemail” link 402, entering a user name 404, entering a password 406, and clicking a sign in button 408. After such segmentation, there are one or more sequences of actions for each website.
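The segmentation step can be sketched as a two-level grouping: first by website, then by session, keeping the recorded order of actions. Two assumptions are made here. The `HistoryEntry` fields are a hypothetical stand-in for the FIG. 3 columns. A "website" is taken to be the URL's host name, so `www.abc.com` and `abc.com` would be treated as different sites.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List
from urllib.parse import urlparse


@dataclass
class HistoryEntry:
    """Hypothetical interaction-history row (cf. the columns shown in FIG. 3)."""
    session_id: str
    url: str
    description: str   # e.g. 'click the "sign in" button'


def segment(history: List[HistoryEntry]) -> Dict[str, List[List[HistoryEntry]]]:
    """Group actions by website (host name), then split each site's actions by session.

    Returns {website: [sequence S1, sequence S2, ...]}, one sequence per session,
    preserving the original order of actions within each sequence.
    """
    by_site: Dict[str, Dict[str, List[HistoryEntry]]] = defaultdict(dict)
    for entry in history:
        site = urlparse(entry.url).hostname or entry.url
        by_site[site].setdefault(entry.session_id, []).append(entry)
    return {site: list(sessions.values()) for site, sessions in by_site.items()}
```

Because Python dictionaries preserve insertion order, the sessions for each site come out in the order they first appear in the history.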
  • For each action in each sequence of actions Sj for each website Wi identified in the interaction history 124, the following is performed by the task identifier 118.
      • (1) The task identifier 118, via the web page extractor 204, extracts an associated web page 126 for each action. These web pages 126 are extracted from the user's web browsing history 124. For example, for the actions 402, 404, 406, and 408 shown in FIG. 4, the corresponding web pages 412, 414, 416, and 418 are respectively extracted. Each such web page has a DOM that is used for further analysis.
      • (2) For each action, the task identifier 118, via the DOM analyzer 206, also identifies the node for the web object (e.g., the “sign in” link) that was accessed during the web action from the DOM of the web page 126. In this embodiment, an XPath expression for each web object is present in the web browsing history 124. This makes the retrieval of the node corresponding to the web object straightforward. The identified node is referred to as an “action node”. In an alternative embodiment, the DOM analyzer 206 searches the DOM to find the node that has a matching object type (e.g., link) and a matching object label (e.g., “lord of the rings”).
      • (3) After the action node for each web object is identified from the extracted web pages 126, the task identifier 118, via the feature extractor 208, extracts features from the action node and from surrounding nodes that share similar textual context (i.e., context nodes). In this embodiment, the extracted features for each action node are words, phrases (bi-grams and tri-grams), and object type (e.g., button), and the extracted features for context nodes are words and phrases (bi-grams and tri-grams). The illustrated embodiment performs contextual analysis for web browsing according to the method in Jalal Mahmud et al. “CSurf: a context-driven non-visual web-browser” (WWW '07: Proceedings of the 16th international conference on World Wide Web, 2007, pp. 31-40), which is herein incorporated by reference in its entirety.
      • (4) After features are extracted, the task identifier 118, via the feature vector constructor 210, constructs a feature vector for each action. For example, if the action is “click the ‘ok’ button” and the text {shipping, delivery} is extracted from the context of the ‘ok’ button, then the feature vector is <click, button, ok, shipping, delivery>.
      • (5) After the feature vector is constructed for each action in each sequence for a website, the task identifier 118 performs a clustering process that merges similar feature vectors into a single cluster. For example, the two feature vectors <click, button, sign, in> and <click, button, sign, in, now> are merged into a single cluster. Similarly, the two feature vectors <click, radiobutton, check, out, now> and <turn, on, radiobutton, check, out> are merged into the same cluster. In this embodiment, the clustering process measures similarity as the cosine similarity of the feature vectors. Each cluster serves as a feature-vector class that is used to classify (categorize) feature vectors. The task identifier 118, via the label generator 214, generates and assigns an action-class label (i.e., a “feature-vector class label”) to each of the clusters.
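Steps (4) and (5) can be sketched as bag-of-words vectors compared by cosine similarity. The text names cosine similarity but not a clustering algorithm, so this sketch assumes a greedy single-pass ("leader") clustering against summed cluster centroids, with an assumed threshold of 0.5. The vectors here use only the word features shown in the exemplary vectors above; bi-grams and tri-grams could be added as extra dimensions. Labels `l1`, `l2`, ... are assigned in order of cluster creation.

```python
import math
from collections import Counter
from typing import List, Sequence


def feature_vector(tokens: Sequence[str]) -> Counter:
    """Bag-of-words vector built from action-node and context-node tokens."""
    return Counter(t.lower() for t in tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(vectors: List[Counter], threshold: float = 0.5) -> List[str]:
    """Greedy single-pass clustering (an assumed algorithm); one label per vector.

    Each vector joins the most similar existing cluster, compared against that
    cluster's summed centroid, if the cosine similarity reaches `threshold`;
    otherwise it starts a new cluster labeled l1, l2, ...
    """
    centroids: List[Counter] = []
    labels: List[str] = []
    for vec in vectors:
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            centroids[best].update(vec)        # add counts into the centroid
        else:
            centroids.append(Counter(vec))
            best = len(centroids) - 1
        labels.append(f"l{best + 1}")
    return labels
```

With the four exemplary vectors above, the two sign-in vectors score about 0.894 and the two check-out vectors score 0.6, so the sketch yields the labels `l1, l1, l2, l2`. A single pass is order-dependent; a production clustering step might instead use agglomerative clustering over all pairs.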
  • After the clustering process has completed, the task identifier 118 replaces each action (such as action 402) with its associated feature-vector class label and generates a sequence of feature-vector class labels for each action sequence. Therefore, for each sequence Sj for each website Wi, an abstracted sequence F(Sj) is generated with each action ak in the sequence being replaced with F(ak), where F(ak)=lk is the label of the feature-vector class of action ak. If the length of the abstracted sequence F(Sj) is m, then the task identifier 118, via the subsequence generator 216, generates the following m subsequences.
  • l1
    l1 l2
    l1 l2 l3
    . . .
    l1 l2 l3 . . . lm
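The abstraction F(Sj) and the m subsequences listed above, which are the prefixes of the abstracted sequence, can be sketched as follows. The `action_class` dictionary, mapping an action identifier to its feature-vector class label, is a hypothetical stand-in for the output of the clustering step.

```python
from typing import Dict, List, Sequence, Tuple


def abstract_sequence(actions: Sequence[str], action_class: Dict[str, str]) -> List[str]:
    """Replace each action a_k with its feature-vector class label F(a_k) = l_k."""
    return [action_class[a] for a in actions]


def prefix_subsequences(abstracted: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return the m subsequences l1, l1 l2, ..., l1 ... lm of an abstracted sequence."""
    return [tuple(abstracted[:k]) for k in range(1, len(abstracted) + 1)]
```

For an abstracted sequence of length m, this yields exactly m tuples, each extending the previous one by one label.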
  • After the subsequences have been generated for each abstracted sequence for the website, the task identifier 118, via the subsequence comparator 218, computes a similarity characteristic of the subsequences and identifies repeating subsequences based on this similarity. Each repeating subsequence that is identified is an “identified task”. The similarity computation of subsequences considers two subsequences as similar if: (1) they are identical (i.e., they are the same sequences of feature-vector class labels), or (2) one of the subsequences is a generalization of the other. In this embodiment, the following heuristic is used to determine generalization, with p being the prefix, m being the middle part, and s being the suffix.
      • A subsequence pm+s is a generalization of the subsequence pmms.
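One way to approximate the "identical or generalization" test is to reduce both sequences to a canonical form. The sketch below repeatedly collapses any block that is immediately repeated (w w → w) and compares the results. This canonicalization is an assumption, not the embodiment's stated procedure. It is somewhat more permissive than the single heuristic above, since it collapses repeated blocks of any length anywhere in the sequence, which also covers the multi-label repetitions in the heuristics listed later for task models.

```python
from typing import Sequence, Tuple


def collapse_repeats(seq: Sequence[str]) -> Tuple[str, ...]:
    """Collapse adjacent repeated blocks (w w -> w) until none remain.

    The shortest, leftmost repeat is collapsed first, so the result is deterministic.
    """
    out = list(seq)
    changed = True
    while changed:
        changed = False
        n = len(out)
        for size in range(1, n // 2 + 1):
            for i in range(0, n - 2 * size + 1):
                if out[i:i + size] == out[i + size:i + 2 * size]:
                    del out[i + size:i + 2 * size]   # drop the second copy
                    changed = True
                    break
            if changed:
                break
    return tuple(out)


def similar(a: Sequence[str], b: Sequence[str]) -> bool:
    """Identical sequences, or sequences that agree once repeated blocks collapse."""
    return tuple(a) == tuple(b) or collapse_repeats(a) == collapse_repeats(b)
```

Applied to the two abstracted sequences of the following example, `l1 l2 l3 l4 l5 l5 l6` collapses to `l1 l2 l3 l4 l5 l6`, so the two are treated as similar.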
  • These generalization heuristics are based on the observation that if a sequence of actions completes a task, then adding subsequences of repeated actions also completes the task. For example, consider the following two subsequences of actions for the website “abc.com”.
  • Subsequence 1:
      • visiting the website www.abc.com
      • clicking the link “tv”
      • clicking the link “lcd tv”
      • clicking the link “brand1 lcd tv”
      • clicking the button “add to shopping cart”
      • clicking the link “view shopping cart”
      • clicking the “check out” button
  • Subsequence 2:
      • visiting the website www.abc.com
      • clicking the link “tv”
      • clicking the link “plasma tv”
      • clicking the link “brand2 plasma tv”
      • clicking the button “add to cart”
      • clicking the “check out” button
  • After clustering of the feature vectors for the actions in these subsequences, the following abstracted sequences are obtained.
  • Abstracted Sequence 1: l1 l2 l3 l4 l5 l5 l6
  • Abstracted Sequence 2: l1 l2 l3 l4 l5 l6
  • In this example, the third actions of both subsequences are put into the same cluster because they share similar words in the action node as well as in their context. The fourth actions are also clustered together because of contextual similarity. These two abstracted sequences are considered similar because they differ only in the repetition of l5: with p = l1 l2 l3 l4, m = l5, and s = l6, the generalization l1 l2 l3 l4 l5+ l6 of the first sequence also matches the second. As a result, these are identified as sequences corresponding to a task. Once a sequence of actions is identified as an instance of a task, the script generator 122 can generate executable scripts from the action sequences corresponding to the tasks. These action sequences that are identified as a task can also be displayed to the user via another portion 420 of the user interface 302, as shown in FIG. 4. This “task view” shows the user a given task 422 (such as “login”) and the associated activities 402, 404, 406, and 408 that have been identified as an instance of this task 422.
  • After the task sequences are identified, the task model generator 120 constructs a task model 128 for each of the identified tasks. The model 128 comprises all of the identified action sequences that are an instance of this task and is used to identify future sequences of actions as an instance of this task. The model 128 also comprises feature classifiers that classify the features extracted from the user's actions into an abstracted action, and the generalization heuristics are used to compute the similarity of abstracted action sequences. An exemplary task model 128 for the two sequences above is the following two action sequences (listed with their scripts).
  • Action Sequence 1 (action | script):
      • visiting the website www.abc.com | go to “www.abc.com”
      • clicking the link “tv” | click the “tv” link
      • clicking the link “lcd tv” | click the “lcd tv” link
      • clicking the link “brand1 lcd tv” | click the “brand1 lcd tv” link
      • clicking the button “add to shopping cart” | click the “add to shopping cart” button
      • clicking the link “view shopping cart” | click the “view shopping cart” link
      • clicking the “check out” button | click the “check out” button
  • Action Sequence 2 (action | script):
      • visiting the website www.abc.com | go to “www.abc.com”
      • clicking the link “tv” | click the “tv” link
      • clicking the link “plasma tv” | click the “plasma tv” link
      • clicking the link “brand2 plasma tv” | click the “brand2 plasma tv” link
      • clicking the button “add to cart” | click the “add to cart” button
      • clicking the “check out” button | click the “check out” button
  • The feature classifier classifies the actions for this task model into a set of clusters that each correspond to an action-class label. In particular, the feature classifier places an action (i.e., interaction on a web page element) into a cluster that classifies the action as an abstracted-action with a corresponding action-class label. For the above example, the feature classifier places the actions into six clusters. After classification, the following two abstracted sequences are obtained from the two action sequences.
  • Abstracted Sequence 1: l1 l2 l3 l4 l5 l5 l6
  • Abstracted Sequence 2: l1 l2 l3 l4 l5 l6
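A task model's feature classifier can be sketched as a nearest-centroid classifier over the action-class clusters. In this sketch, a vector is assigned to the most cosine-similar centroid if the score reaches a threshold. Otherwise a new action class is created, as described for unmatched feature vectors. The `TaskModel` container, its field names, and the 0.5 threshold are all hypothetical illustrations, not structures defined in the text.

```python
import math
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class TaskModel:
    """Hypothetical container for one task model; names are illustrative."""
    name: str
    centroids: Dict[str, Counter] = field(default_factory=dict)  # action-class label -> centroid
    instances: List[List[str]] = field(default_factory=list)     # stored abstracted sequences
    threshold: float = 0.5                                        # assumed similarity cutoff

    def classify(self, vec: Counter) -> Optional[str]:
        """Return the label of the most similar cluster, or None if none is close enough."""
        best_label, best_score = None, 0.0
        for label, centroid in self.centroids.items():
            score = cosine(vec, centroid)
            if score > best_score:
                best_label, best_score = label, score
        return best_label if best_score >= self.threshold else None

    def label_for(self, vec: Counter) -> str:
        """Classify a vector, creating a new action class when no cluster matches."""
        label = self.classify(vec)
        if label is None:
            label = f"l{len(self.centroids) + 1}"
            self.centroids[label] = Counter(vec)
        return label
```

Mapping every action of a sequence through `label_for` produces abstracted sequences like the two shown above, which can then be stored in `instances`.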
  • The generalization heuristics for this task model are:
  • a) a subsequence pm+s is a generalization of the subsequence pmms;
  • b) a subsequence pm(ym)+s is a generalization of the subsequence pmyms; and
  • c) a subsequence p(my)+ms is a generalization of the subsequence pmyms.
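The "+" in these heuristics reads like a regular-expression quantifier ("one or more repetitions"). The sketch below takes that reading literally, which is an interpretation rather than something the text states. It compiles each generalized pattern into a Python regex over space-terminated labels. A single helper, `repeat_pattern(prefix, block, suffix)`, covers all three heuristics:

- (a) is `repeat_pattern(p, m, s)`
- (b) is `repeat_pattern(p + m, y + m, s)`
- (c) is `repeat_pattern(p, m + y, m + s)`

```python
import re
from typing import Sequence


def _literal(labels: Sequence[str]) -> str:
    """Regex fragment matching the labels in order, each followed by one space."""
    return "".join(re.escape(label) + " " for label in labels)


def repeat_pattern(prefix: Sequence[str], block: Sequence[str], suffix: Sequence[str]) -> str:
    """Regex for: prefix, then block repeated one or more times, then suffix."""
    return _literal(prefix) + "(?:" + _literal(block) + ")+" + _literal(suffix)


def instance_of(pattern: str, labels: Sequence[str]) -> bool:
    """True if the whole abstracted label sequence matches the generalized pattern."""
    subject = "".join(label + " " for label in labels)  # the space stops l1 matching l10
    return re.fullmatch(pattern, subject) is not None
```

For heuristic (a) with p = l1 l2 l3 l4, m = l5, and s = l6, the pattern accepts both abstracted sequences above and rejects l1 l2 l3 l4 l6, in which m does not occur at all.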
  • This process for identifying action sequences associated with a task is used to determine whether or not new action sequences are instances of a task based on the task models 128. A website-specific task model constructed for a website helps identify new action sequences as instances of that task. This facilitates generating scripts for new action sequences. In this way, a user can bootstrap their personalized script repository.
  • For example, to identify whether a sequence of actions that is not currently identified as an instance of a given task is an instance of that task, the following is performed. First, the task identifier 118 applies the feature extraction technique described above to the sequence of actions to generate a feature vector for each action in the sequence. The task model generator 120 then classifies the feature vectors using the feature vector class labels that were constructed as a result of the clustering performed to construct task models, as described above. If a feature vector is not mapped/classified to any of the clusters, then the task identifier 118 creates a new cluster for it. After this is done, the task identifier 118 generates an abstracted sequence from the action sequence by replacing each action with its feature vector cluster label. Next, the task identifier 118 computes the similarity of this abstracted sequence with the sequences stored in the task models 128. To determine whether two sequences are similar, the generalization heuristics described above are used. A sequence is identified as an instance of a task if the sequence is found to be similar to any of the sequences stored in the associated task model 128. In some instances, a sequence may be identified as an instance of multiple tasks. After a sequence of actions is identified as an instance of a task, the script generator 122 can generate executable scripts from the action sequences corresponding to the task.
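The final matching step, including the case where one sequence is an instance of several tasks, can be sketched as follows. The sketch assumes that the caller has already abstracted the new sequence separately with each task model's own action-class labels, which is why the input is a per-task dictionary. For brevity, similarity here uses only heuristic (a) with a one-action middle part: runs of the same label are collapsed with `itertools.groupby` before comparison. The function names are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple


def collapse_runs(seq: Sequence[str]) -> Tuple[str, ...]:
    """Collapse runs of one repeated label (pm+s with a one-action m), e.g. l5 l5 -> l5."""
    return tuple(label for label, _ in groupby(seq))


def similar(a: Sequence[str], b: Sequence[str]) -> bool:
    """Identical, or equal once runs of a repeated label are collapsed."""
    return tuple(a) == tuple(b) or collapse_runs(a) == collapse_runs(b)


def matching_tasks(abstracted_by_task: Dict[str, Sequence[str]],
                   stored: Dict[str, List[Sequence[str]]]) -> List[str]:
    """Return every task the new action sequence is an instance of.

    abstracted_by_task[t] is the new sequence abstracted with task t's labels;
    stored[t] holds the abstracted sequences kept in task t's model.
    """
    return [task for task, seq in abstracted_by_task.items()
            if any(similar(seq, s) for s in stored.get(task, []))]
```

Returning a list rather than a single task keeps the multiple-task case explicit; the script generator can then be invoked once per matched task.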
  • In addition, the task models can be built across websites and across multiple users so that test cases can be automatically generated for testing websites. A detailed discussion on testing websites is given in Jeffrey Bigham et al. “TrailBlazer: Enabling Blind Users to Blaze Trails Through the Web” (IUI '09: Proceedings of the 13th international conference on Intelligent user interfaces, 2009, pp. 177-186), which is herein incorporated by reference in its entirety.
  • After a tester has task models for each task (across multiple websites in the similar domain and across multiple users), and a user's browsing history for the website for which test cases are to be developed, the tester can use task models to identify task-specific actions from the browsing history and generate scripts from them automatically. Thus, the burden of manual test script generation is avoided.
  • Task models 128 constructed from multiple websites are also useful to a website tester. Even if the tester does not have a user's browsing history for the website for which they want to automatically generate test cases, the tester can still use task models to automatically construct scripts for the website.
  • In one embodiment, the following is performed for each task model to construct the script for a website. The task management system 115 loads the home page of the website. Then, for each actionable element of the current web page, the task management system 115 obtains the features and classifies them with the feature labels of that task model. The actionable elements that are classified into some action class of that task model form candidate sets of relevant actionable elements on the web page. An abstracted subsequence of actions is constructed for each of these candidate sets of relevant actionable elements. For example, assume that the website for which a tester wants to construct scripts is “www.xyz.com” and the following links are classified as instances of action classes: the “lcd tv” link, the “flat screen tv” link, and the “plasma tv and others” link.
  • The action class for the first two links is l2 and the action class for the last link is l3, and the abstracted action-class label l1 corresponds to the action of visiting the website “xyz.com”. So the abstracted subsequence constructed is l1 l2 for the first link, l1 l2 for the second link, and l1 l3 for the third link. The current abstracted subsequences are compared to the abstracted sequences stored in the task model to determine whether each current abstracted subsequence is a prefix of any of the stored abstracted sequences. The prefix comparison also uses the generalization heuristics.
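The link-selection step can be sketched as a prefix test. For each candidate link, the sketch forms the subsequence of the start label followed by the link's action class, then keeps the link if that subsequence prefixes some stored sequence. A subsequence counts as a prefix either literally or after runs of repeated labels are collapsed, which is a simple, assumed stand-in for "prefix comparison also uses generalization heuristics". The function names and dictionary layout are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple


def collapse_runs(seq: Sequence[str]) -> Tuple[str, ...]:
    """Collapse runs of a repeated label, e.g. l5 l5 -> l5."""
    return tuple(label for label, _ in groupby(seq))


def is_prefix(candidate: Sequence[str], seq: Sequence[str]) -> bool:
    """True if `candidate` equals the first len(candidate) labels of `seq`."""
    return tuple(seq[:len(candidate)]) == tuple(candidate)


def generalized_prefix(candidate: Sequence[str], seq: Sequence[str]) -> bool:
    """Literal prefix, or prefix once runs of repeated labels are collapsed."""
    return is_prefix(candidate, seq) or is_prefix(collapse_runs(candidate), collapse_runs(seq))


def select_links(start_label: str, link_classes: Dict[str, str],
                 stored: List[Sequence[str]]) -> List[str]:
    """Keep links whose subsequence (start_label, link class) prefixes a stored sequence."""
    return [link for link, label in link_classes.items()
            if any(generalized_prefix([start_label, label], s) for s in stored)]
```

With the three “xyz.com” links classified as l2, l2, and l3 and the stored sequence l1 l2 l3 l4 l5 l5 l6, only the “lcd tv” and “flat screen tv” links are selected for further exploration.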
  • For illustration, the first two abstracted subsequences are prefixes of the following abstracted sequence: l1 l2 l3 l4 l5 l5 l6. Therefore, both links are selected. The task management system 115 then follows the first link, loads the corresponding page, and uses the algorithm discussed above to synthesize scripts. Next, the task management system 115 follows the second link, loads the next page, and uses the above algorithm to synthesize scripts. In this example, once the first link is selected, the task management system 115 can generate the scripts that correspond to the following abstracted sequences: l1 l2 l2 l3 l4 l5 l6, and l1 l2 l3 l3 l4 l4 l3 l4 l5 l6. Once the second link is selected, the task management system 115 can generate the scripts that correspond to the following abstracted sequence: l1 l2 l3 l4 l3 l4 l5 l6. An example of script fragments generated for the abstracted sequences l1 l2 l2 l3 l4 l5 l6 and l1 l2 l3 l3 l4 l4 l3 l4 l5 l6 is shown in TABLE 1 below.
  • TABLE 1
    For Abstracted Sequence 1 | For Abstracted Sequence 2
    go to “xyz.com” | go to “xyz.com”
    click the “lcd tv” link | click the “flat screen tv” link
    . . . | . . .
    . . . | . . .
  • Thus, personalized tasks are automatically identified from a user's web browsing interaction history. Repeated sequences of similar actions on a single website are identified from the user's web browsing interaction history. These sequences are labeled as a task, which assists in the creation of task-scripts by web automaton systems and lowers the barrier to task-script generation.
  • FIG. 5 is an operational flow diagram illustrating a process for identifying tasks based on repeated action sequences according to one embodiment of the present invention. The operational flow diagram of FIG. 5 begins at step 500 and flows directly to step 502. The task management system 115 receives a user's web browsing interaction history 124, at step 502. The task management system 115 extracts web pages 126 associated with actions identified in the interaction history 124, at step 504.
  • The task management system 115 identifies, for each identified action, a DOM node for each web object accessed during the action, at step 506. The task management system 115 extracts features associated with each node and constructs feature vectors for the actions, at step 508. The task management system 115 clusters the feature vectors to map each action to an action class in order to obtain an abstracted sequence of actions, at step 510. The task management system 115 computes a set of action subsequences for each abstracted sequence of actions and determines the similarity of the action subsequences, at step 512. The task management system 115 then identifies repeating action sequences and designates these repeating sequences as a task, at step 514. The control flow then exits at step 516.
  • FIG. 6 is an operational flow diagram illustrating a process for determining if new action sequences are an instance of an existing task based on a task model according to one embodiment of the present invention. The operational flow diagram of FIG. 6 begins at step 602 and flows directly to step 604. The task management system 115 identifies a new set of actions based on a user's web browsing interaction history 124, at step 604. The task management system 115 retrieves one or more task models 128, at step 606. The task management system 115 generates feature vectors for each action in the set of actions based on web pages 126 associated with the action, at step 608.
  • The task management system 115 classifies the feature vectors using feature vector cluster labels associated with the task model 128, at step 610. The task management system 115 generates an abstracted action sequence based on the set of actions by replacing each action with its feature vector cluster label, at step 612. The task management system 115 computes the similarity between the abstracted action sequence and each action sequence stored in the task model 128, at step 614. The task management system 115 identifies the set of actions as an instance of the task associated with the task model 128 if its abstracted action sequence is similar to a sequence stored in the task model 128. The control flow then exits at step 618.
  • As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
  • Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
  • The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • Referring now to FIG. 7, this figure is a block diagram illustrating an information processing system that can be utilized in embodiments of the present invention. The information processing system 700 is based upon a suitably configured processing system adapted to implement one or more embodiments of the present invention (e.g., the user system 102 of FIG. 1). Any suitably configured processing system can be used as the information processing system 700 in embodiments of the present invention.
  • The information processing system 700 includes a computer 702. The computer 702 has a processor(s) 704 that is connected to a main memory 706, mass storage interface 708, and network adapter hardware 710. A system bus 712 interconnects these system components. Although only one CPU 704 is illustrated for computer 702, computer systems with multiple CPUs can be used equally effectively. The main memory 706, in this embodiment, comprises the task manager 115 and its components, the browsing interaction history 124, the web pages 126, and the task models 128.
  • The mass storage interface 708 is used to connect mass storage devices, such as mass storage device 714, to the information processing system 700. One specific type of data storage device is an optical drive such as a CD/DVD drive, which can be used to store data to and read data from a computer readable medium or storage product such as (but not limited to) a CD/DVD 716. Another type of data storage device is a data storage device configured to support, for example, NTFS type file system operations.
  • An operating system included in the main memory is a suitable multitasking operating system such as any of the Linux, UNIX, Windows, and Windows Server based operating systems. Embodiments of the present invention can also use any other suitable operating system. Some embodiments of the present invention utilize architectures, such as an object-oriented framework mechanism, that allow instructions of the components of the operating system to be executed on any processor located within the information processing system 700. The network adapter hardware 710 is used to provide an interface to a network 104. Embodiments of the present invention can be adapted to work with any data communications connection, including present-day analog and/or digital techniques or a future networking mechanism.
  • The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

Claims (20)

1. A method comprising:
analyzing a web browsing interaction history of a user to identify a plurality of web browsing actions associated with a plurality of web sites;
generating a plurality of abstracted action sequences for the web browsing actions that are identified;
generating a plurality of action subsequences for the abstracted action sequences;
determining a similarity between each of the action subsequences in the plurality of action subsequences; and
designating similar action subsequences as a web browsing task.
2. The method of claim 1, wherein generating the plurality of abstracted action sequences comprises, for each of the web browsing actions that is identified:
extracting a web page on which the web browsing action was performed;
analyzing a document object model associated with the web page that is extracted; and
identifying a node in the document object model for a web object accessed by the web browsing action.
3. The method of claim 2, wherein generating the plurality of abstracted action sequences further comprises, for each node that is identified:
extracting a first set of features from the node; and
extracting a second set of features from contextual nodes that have substantially similar textual content as the node.
4. The method of claim 3, wherein generating the plurality of abstracted action sequences further comprises:
constructing a plurality of feature vectors for each of the web browsing actions that is identified based on the first set of features and the second set of features that are extracted;
creating a plurality of clusters, each of the clusters comprising web browsing actions with similar feature vectors;
generating an action-class label for each of the clusters; and
replacing each of the web browsing actions in each of the clusters with the action-class label for that cluster.
5. The method of claim 4, wherein determining the similarity between each of the action subsequences comprises:
determining if one of the action subsequences comprises substantially identical sequences of action-class labels as one or more other of the action subsequences; and
determining if the one action subsequence is a generalization of the one or more other action subsequences.
6. The method of claim 1, further comprising:
automatically generating a set of executable scripts based on the similar action subsequences designated as the web browsing task.
7. The method of claim 1, further comprising:
generating a web browsing task model based on the similar action subsequences designated as the web browsing task, the web browsing task model comprising the similar action subsequences, a plurality of feature classifiers that classify features associated with each of the web browsing actions in the similar action subsequences, and a plurality of generalization heuristics that compute similarity of abstracted action sequences.
8. The method of claim 7, further comprising:
identifying a set of web browsing actions from the web browsing interaction history;
comparing the set of web browsing actions to the similar action subsequences in the web browsing task model; and
designating the set of web browsing actions as an instance of the web browsing task if the set of web browsing actions is substantially similar to the similar action subsequences in the web browsing task model.
9. A computer program product comprising a computer readable storage medium having computer readable program code embodied therewith, the computer readable program code comprising:
computer readable program code configured to:
analyze a web browsing interaction history of a user to identify a plurality of web browsing actions associated with a plurality of web sites;
generate a plurality of abstracted action sequences for the web browsing actions that are identified;
generate a plurality of action subsequences for the abstracted action sequences;
determine a similarity between each of the action subsequences in the plurality of action subsequences; and
designate similar action subsequences as a web browsing task.
10. The computer program product of claim 9, wherein generating the plurality of abstracted action sequences comprises, for each of the web browsing actions that is identified:
extracting a web page on which the web browsing action was performed;
analyzing a document object model associated with the web page that is extracted; and
identifying a node in the document object model for a web object accessed by the web browsing action.
11. The computer program product of claim 10, wherein generating the plurality of abstracted action sequences further comprises, for each node that is identified:
extracting a first set of features from the node; and
extracting a second set of features from contextual nodes that have substantially similar textual content as the node.
12. The computer program product of claim 11, wherein generating the plurality of abstracted action sequences further comprises:
constructing a plurality of feature vectors for each of the web browsing actions that is identified based on the first set of features and the second set of features that are extracted;
creating a plurality of clusters, each of the clusters comprising web browsing actions with similar feature vectors;
generating an action-class label for each of the clusters; and
replacing each of the web browsing actions in each of the clusters with the action-class label for that cluster.
13. The computer program product of claim 9, wherein the computer readable program code is further configured to:
automatically generate a set of executable scripts based on the similar action subsequences designated as the web browsing task.
14. The computer program product of claim 9, wherein the computer readable program code is further configured to:
generate a web browsing task model based on the similar action subsequences designated as the web browsing task, the web browsing task model comprising the similar action subsequences, a plurality of feature classifiers that classify features associated with each of the web browsing actions in the similar action subsequences, and a plurality of generalization heuristics that compute similarity of abstracted action sequences.
15. A system comprising:
a task manager for:
analyzing a web browsing interaction history of a user to identify a plurality of web browsing actions associated with a plurality of web sites;
generating a plurality of abstracted action sequences for the web browsing actions that are identified;
generating a plurality of action subsequences for the abstracted action sequences;
determining a similarity between each of the action subsequences in the plurality of action subsequences; and
designating similar action subsequences as a web browsing task.
16. The system of claim 15, wherein in generating the plurality of abstracted action sequences, the task manager, for each of the web browsing actions that is identified:
extracts a web page on which the web browsing action was performed;
analyzes a document object model associated with the web page that is extracted; and
identifies a node in the document object model for a web object accessed by the web browsing action.
17. The system of claim 16, wherein in generating the plurality of abstracted action sequences, the task manager further, for each node that is identified:
extracts a first set of features from the node; and
extracts a second set of features from contextual nodes that have substantially similar textual content as the node.
18. The system of claim 17, wherein in generating the plurality of abstracted action sequences, the task manager further:
constructs a plurality of feature vectors for each of the web browsing actions that is identified based on the first set of features and the second set of features that are extracted;
creates a plurality of clusters, each of the clusters comprising web browsing actions with similar feature vectors;
generates an action-class label for each of the clusters; and
replaces each of the web browsing actions in each of the clusters with the action-class label for that cluster.
19. The system of claim 15, wherein the task manager further:
automatically generates a set of executable scripts based on the similar action subsequences designated as the web browsing task.
20. The system of claim 15, wherein the task manager further:
generates a web browsing task model based on the similar action subsequences designated as the web browsing task, the web browsing task model comprising the similar action subsequences, a plurality of feature classifiers that classify features associated with each of the web browsing actions in the similar action subsequences, and a plurality of generalization heuristics that compute similarity of abstracted action sequences.
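The pipeline recited in claims 1 through 5 can be illustrated with a small sketch. It assumes each recorded action has already been reduced to a set of string features (standing in for the node and contextual-node features of claims 2 and 3), uses greedy Jaccard-threshold clustering as one possible clustering method, and treats exact equality of action-class label tuples as the similarity test. The function names, the `A0`, `A1`, ... label scheme, and the `threshold` and `min_sessions` parameters are all hypothetical; this is not the patent's implementation.

```python
from collections import defaultdict


def jaccard(a, b):
    """Overlap of two feature sets: 1.0 means identical, 0.0 means disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def abstract_actions(actions, threshold=0.5):
    """Greedy threshold clustering (a stand-in for the clustering step of claim 4).

    Each action joins the first cluster whose representative is at least
    `threshold` similar; otherwise it starts a new cluster. Returns one
    action-class label per action, in input order.
    """
    reps, labels = [], []
    for features in actions:
        for idx, rep in enumerate(reps):
            if jaccard(features, rep) >= threshold:
                labels.append(f"A{idx}")
                break
        else:
            reps.append(features)
            labels.append(f"A{len(reps) - 1}")
    return labels


def subsequences(seq, min_len=2, max_len=4):
    """Yield every contiguous subsequence whose length is within the bounds."""
    for n in range(min_len, max_len + 1):
        for i in range(len(seq) - n + 1):
            yield tuple(seq[i:i + n])


def find_tasks(sessions, min_sessions=2, threshold=0.5):
    """Designate label subsequences shared by at least `min_sessions` sessions as tasks.

    Exact equality of label tuples stands in for the "substantially identical"
    test of claim 5; the generalization test is not modeled here.
    """
    # Cluster all actions together so a label means the same thing in every session.
    flat = [features for session in sessions for features in session]
    labels = abstract_actions(flat, threshold)
    seen_in = defaultdict(set)
    pos = 0
    for sid, session in enumerate(sessions):
        abstracted = labels[pos:pos + len(session)]
        pos += len(session)
        for sub in subsequences(abstracted):
            seen_in[sub].add(sid)
    return {sub for sub, sids in seen_in.items() if len(sids) >= min_sessions}


sessions = [
    [{"type", "search-box"}, {"click", "search-button"}, {"click", "result-link"}],
    [{"type", "search-box"}, {"click", "search-button"}, {"click", "add-to-cart"}],
]
print(find_tasks(sessions))  # {('A0', 'A1')}
```

Clustering every session's actions in one pass is what makes a label such as `A1` denote the same kind of action in every session, so that subsequences recorded at different times or on different sites can be compared at all.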
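Claims 7 and 8 describe a task model that bundles the designated subsequences with feature classifiers and generalization heuristics, and then use it to recognize new instances of the task. The sketch below assumes a nearest-representative classifier over feature sets and a single, hypothetical generalization heuristic (collapsing consecutive repeats of the same action class). `TaskModel`, `classify`, `is_instance`, and `collapse_repeats` are illustrative names, not ones taken from the source.

```python
from dataclasses import dataclass


def jaccard(a, b):
    """Overlap of two feature sets: 1.0 means identical, 0.0 means disjoint."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def collapse_repeats(labels):
    """Hypothetical generalization heuristic: treat a run of the same action
    class (e.g. filling several similar form fields) as a single step."""
    out = []
    for label in labels:
        if not out or out[-1] != label:
            out.append(label)
    return tuple(out)


@dataclass
class TaskModel:
    name: str
    subsequences: set        # action-class label tuples designated as this task
    representatives: dict    # label -> representative feature set (classifier stand-in)
    threshold: float = 0.5

    def classify(self, features):
        """Return the label of the most similar representative, or None if
        no representative reaches the threshold."""
        best_label, best_score = None, self.threshold
        for label, rep in self.representatives.items():
            score = jaccard(features, rep)
            if score >= best_score:
                best_label, best_score = label, score
        return best_label

    def is_instance(self, actions):
        """Claim 8 check: classify each action, generalize, and compare."""
        labels = [self.classify(features) for features in actions]
        if None in labels:
            return False  # some action matches no known action class
        generalized = {collapse_repeats(sub) for sub in self.subsequences}
        return collapse_repeats(labels) in generalized


model = TaskModel(
    name="site search",
    subsequences={("A0", "A1")},
    representatives={"A0": {"type", "search-box"}, "A1": {"click", "search-button"}},
)
print(model.is_instance([{"type", "search-box"}, {"type", "search-box"},
                         {"click", "search-button"}]))  # True
print(model.is_instance([{"click", "search-button"}, {"type", "search-box"}]))  # False
```

Applying the same heuristic to both the stored subsequences and the new label sequence keeps the comparison symmetric, so a user who types into the search box twice is still recognized as performing the designated task.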
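Claims 6, 13, and 19 recite generating executable scripts from a designated task, but the claims do not fix a script format. As a minimal sketch, assuming a plain-text, keyword-style step format and a hypothetical `label_info` mapping from each action-class label to a verb and a target description, a task's label sequence could be rendered as follows:

```python
def generate_script(task, label_info):
    """Render a designated task (a tuple of action-class labels) as numbered steps.

    `label_info` is a hypothetical mapping from each label to a (verb, target)
    pair, for example derived from the text of the cluster's representative node.
    """
    lines = []
    for step, label in enumerate(task, start=1):
        verb, target = label_info[label]
        if verb == "enter":
            # Typed text differs between task instances, so it becomes a named
            # parameter such as ${A0} instead of a recorded literal value.
            lines.append(f'{step}. enter ${{{label}}} into the "{target}" field')
        else:
            lines.append(f'{step}. {verb} the "{target}"')
    return "\n".join(lines)


script = generate_script(
    ("A0", "A1"),
    {"A0": ("enter", "search"), "A1": ("click", "Search button")},
)
print(script)
# 1. enter ${A0} into the "search" field
# 2. click the "Search button"
```

Turning typed text into a parameter is one design choice among several; it lets a single generated script replay the task with different inputs.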
US13/016,182 2011-01-28 2011-01-28 Automatic generation of task scripts from web browsing interaction history Abandoned US20120198342A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US13/016,182 US20120198342A1 (en) 2011-01-28 2011-01-28 Automatic generation of task scripts from web browsing interaction history
US13/561,378 US9626159B2 (en) 2011-01-28 2012-07-30 Automatic generation of task scripts from web browsing interaction history

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US13/016,182 US20120198342A1 (en) 2011-01-28 2011-01-28 Automatic generation of task scripts from web browsing interaction history

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/561,378 Continuation US9626159B2 (en) 2011-01-28 2012-07-30 Automatic generation of task scripts from web browsing interaction history

Publications (1)

Publication Number Publication Date
US20120198342A1 true US20120198342A1 (en) 2012-08-02

Family

ID=46578443

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/016,182 Abandoned US20120198342A1 (en) 2011-01-28 2011-01-28 Automatic generation of task scripts from web browsing interaction history
US13/561,378 Expired - Fee Related US9626159B2 (en) 2011-01-28 2012-07-30 Automatic generation of task scripts from web browsing interaction history

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13/561,378 Expired - Fee Related US9626159B2 (en) 2011-01-28 2012-07-30 Automatic generation of task scripts from web browsing interaction history

Country Status (1)

Country Link
US (2) US20120198342A1 (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130055068A1 (en) * 2011-08-26 2013-02-28 International Business Machines Corporation Automatic detection of item lists within a web page
US20130055268A1 (en) * 2011-08-31 2013-02-28 International Business Machines Corporation Automated web task procedures based on an analysis of actions in web browsing history logs
US20130254642A1 (en) * 2012-03-20 2013-09-26 Samsung Electronics Co., Ltd. System and method for managing browsing histories of web browser
US8583648B1 (en) * 2011-09-30 2013-11-12 Google Inc. Merging semantically similar clusters based on cluster labels
US8881303B2 (en) * 2011-07-28 2014-11-04 Xerox Corporation System and method for executing web services
US9335987B2 (en) * 2013-12-09 2016-05-10 International Business Machines Corporation Data object with common statement series
US20160253295A1 (en) * 2013-10-11 2016-09-01 Zte Corporation Method, device, terminal and computer storage medium for realizing intelligent reading of a browser
US20160378736A1 (en) * 2015-06-24 2016-12-29 International Business Machines Corporation Managing a domain specific ontology collection
US20170104841A1 (en) * 2015-10-07 2017-04-13 Impossible Ventures, LLC Automated sequential site navigation
US20200183550A1 (en) * 2018-12-06 2020-06-11 Canon Kabushiki Kaisha Information processing apparatus, control method, and storage medium
US11068921B1 (en) 2014-11-06 2021-07-20 Capital One Services, Llc Automated testing of multiple on-line coupons
US11120461B1 (en) 2014-11-06 2021-09-14 Capital One Services, Llc Passive user-generated coupon submission
US11205188B1 (en) 2017-06-07 2021-12-21 Capital One Services, Llc Automatically presenting e-commerce offers based on browse history
US20220308904A1 (en) * 2018-10-15 2022-09-29 AIble Inc. Interface for generating models with customizable interface configurations
US11892932B2 (en) 2018-10-15 2024-02-06 AIble Inc. Interface for visualizing and improving model performance

Families Citing this family (8)

Publication number Priority date Publication date Assignee Title
US9515879B2 (en) 2014-01-09 2016-12-06 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Establishing an action list for reconfiguration of a remote hardware system
US10740831B1 (en) * 2015-02-09 2020-08-11 Amazon Technologies, Inc. Method, medium, and system for context based configuration of content
US10250705B2 (en) * 2015-08-26 2019-04-02 International Business Machines Corporation Interaction trajectory retrieval
US10579400B2 (en) 2016-11-11 2020-03-03 International Business Machines Corporation Path-sensitive contextual help system
US9805306B1 (en) * 2016-11-23 2017-10-31 Accenture Global Solutions Limited Cognitive robotics analyzer
US11368752B2 (en) 2017-01-03 2022-06-21 Bliss Point Media, Inc. Optimization of broadcast event effectiveness
US10884769B2 (en) * 2018-02-17 2021-01-05 Adobe Inc. Photo-editing application recommendations
US11036811B2 (en) 2018-03-16 2021-06-15 Adobe Inc. Categorical data transformation and clustering for machine learning using data repository systems

Citations (1)

Publication number Priority date Publication date Assignee Title
US6957390B2 (en) * 2000-11-30 2005-10-18 Mediacom.Net, Llc Method and apparatus for providing dynamic information to a user via a visual display

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
WO2001014952A2 (en) * 1999-08-26 2001-03-01 Memetrics Inc. On-line experimentation
US7295995B1 (en) * 2001-10-30 2007-11-13 A9.Com, Inc. Computer processes and systems for adaptively controlling the display of items
US7081093B2 (en) 2003-12-05 2006-07-25 Vermon Array transducer for 3D tilting probes
US8452720B2 (en) 2009-01-30 2013-05-28 Ntt Docomo, Inc. System and methods for optimizing user interaction in web-related activites

Cited By (35)

Publication number Priority date Publication date Assignee Title
US8881303B2 (en) * 2011-07-28 2014-11-04 Xerox Corporation System and method for executing web services
US8806330B2 (en) * 2011-08-26 2014-08-12 International Business Machines Corporation Automatic detection of item lists within a web page
US9251287B2 (en) 2011-08-26 2016-02-02 International Business Machines Corporation Automatic detection of item lists within a web page
US20130055068A1 (en) * 2011-08-26 2013-02-28 International Business Machines Corporation Automatic detection of item lists within a web page
US9766922B2 (en) * 2011-08-31 2017-09-19 International Business Machines Corporation Automated web task procedures based on an analysis of actions in web browsing history logs
US20130055268A1 (en) * 2011-08-31 2013-02-28 International Business Machines Corporation Automated web task procedures based on an analysis of actions in web browsing history logs
US20140019979A1 (en) * 2011-08-31 2014-01-16 International Business Machines Corporation Automated web task procedures based on an analysis of actions in web browsing history logs
US8583648B1 (en) * 2011-09-30 2013-11-12 Google Inc. Merging semantically similar clusters based on cluster labels
US9336301B2 (en) 2011-09-30 2016-05-10 Google Inc. Merging semantically similar clusters based on cluster labels
US20130254642A1 (en) * 2012-03-20 2013-09-26 Samsung Electronics Co., Ltd. System and method for managing browsing histories of web browser
US20160253295A1 (en) * 2013-10-11 2016-09-01 Zte Corporation Method, device, terminal and computer storage medium for realizing intelligent reading of a browser
US9892099B2 (en) * 2013-10-11 2018-02-13 Zte Corporation Intelligent reading for accessing multi-page data from a web browser
US9335987B2 (en) * 2013-12-09 2016-05-10 International Business Machines Corporation Data object with common statement series
US11748775B2 (en) 2014-11-06 2023-09-05 Capital One Services, Llc Passive user-generated coupon submission
US11068921B1 (en) 2014-11-06 2021-07-20 Capital One Services, Llc Automated testing of multiple on-line coupons
US11727428B2 (en) 2014-11-06 2023-08-15 Capital One Services, Llc Automated testing of multiple on-line coupons
US11507969B2 (en) 2014-11-06 2022-11-22 Capital One Services, Llc Passive user-generated coupon submission
US11120461B1 (en) 2014-11-06 2021-09-14 Capital One Services, Llc Passive user-generated coupon submission
US20160378736A1 (en) * 2015-06-24 2016-12-29 International Business Machines Corporation Managing a domain specific ontology collection
US10552008B2 (en) * 2015-06-24 2020-02-04 International Business Machines Corporation Managing a domain specific ontology collection
US10482083B2 (en) * 2015-10-07 2019-11-19 Capital One Services, Llc Automated sequential site navigation
US11681699B2 (en) 2015-10-07 2023-06-20 Capital One Services, Llc Automated extraction of data from web pages
US11016967B2 (en) * 2015-10-07 2021-05-25 Capital One Services, Llc Automated sequential site navigation
US11860866B2 (en) * 2015-10-07 2024-01-02 Capital One Services, Llc Automated sequential site navigation
US20170104841A1 (en) * 2015-10-07 2017-04-13 Impossible Ventures, LLC Automated sequential site navigation
US10452653B2 (en) 2015-10-07 2019-10-22 Capital One Services, Llc Automated extraction of data from web pages
US11055281B2 (en) 2015-10-07 2021-07-06 Capital One Services, Llc Automated extraction of data from web pages
US11537607B2 (en) * 2015-10-07 2022-12-27 Capital One Services, Llc Automated sequential site navigation
US20230085209A1 (en) * 2015-10-07 2023-03-16 Capital One Services, Llc Automated sequential site navigation
US11651387B2 (en) 2017-06-07 2023-05-16 Capital One Services, Llc Automatically presenting e-commerce offers based on browse history
US11205188B1 (en) 2017-06-07 2021-12-21 Capital One Services, Llc Automatically presenting e-commerce offers based on browse history
US20220308904A1 (en) * 2018-10-15 2022-09-29 AIble Inc. Interface for generating models with customizable interface configurations
US11892932B2 (en) 2018-10-15 2024-02-06 AIble Inc. Interface for visualizing and improving model performance
US20200183550A1 (en) * 2018-12-06 2020-06-11 Canon Kabushiki Kaisha Information processing apparatus, control method, and storage medium
US11269496B2 (en) * 2018-12-06 2022-03-08 Canon Kabushiki Kaisha Information processing apparatus, control method, and storage medium

Also Published As

Publication number Publication date
US20120290936A1 (en) 2012-11-15
US9626159B2 (en) 2017-04-18

Similar Documents

Publication Publication Date Title
US9626159B2 (en) Automatic generation of task scripts from web browsing interaction history
US11244011B2 (en) Ingestion planning for complex tables
US20170091838A1 (en) Product recommendation using sentiment and semantic analysis
US20200401621A1 (en) Cognitive video and audio search aggregation
US8806330B2 (en) Automatic detection of item lists within a web page
US20110302103A1 (en) Popularity prediction of user-generated content
US20180089197A1 (en) Internet search result intention
US10169005B2 (en) Consolidating and reusing portal information
US20200265074A1 (en) Searching multilingual documents based on document structure extraction
US10755332B2 (en) Multi-perceptual similarity detection and resolution
US20180217986A1 (en) Automated extraction tools and their use in social content tagging systems
US8639560B2 (en) Brand analysis using interactions with search result items
US20210073255A1 (en) Analyzing the tone of textual data
US11775759B2 (en) Systems and methods for training and evaluating machine learning models using generalized vocabulary tokens for document processing
CN110825941A (en) Content management system identification method, device and storage medium
US20200159819A1 (en) Electronic document processing system
CN111104599A (en) Method and apparatus for outputting information
US20230161948A1 (en) Iteratively updating a document structure to resolve disconnected text in element blocks
CN113743973A (en) Method and device for analyzing market hotspot trend
US11762928B2 (en) Feature recommendation based on user-generated content
US20230129473A1 (en) Efficiently manage and share resources during e-learning
US20230093468A1 (en) Cognitive image searching based on personalized image components of a composite image
CN111191152A (en) News data display method and device
KR20230072600A (en) Automatic information alarm mothod, device and system for enterprise customers based on artificial intelligence
CN112948028A (en) Method and device for detecting page display information

Legal Events

Date Code Title Description
AS Assignment

Owner name: INTERNATIONAL BUSINESS MACHNES CORPORATION, NEW YO

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MAHMUD, JALAL U.;REEL/FRAME:025713/0279

Effective date: 20110124

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION