WO2012134889A2 - Markov modeling of service usage patterns - Google Patents

Markov modeling of service usage patterns

Publication number
WO2012134889A2
WO2012134889A2 PCT/US2012/029820 US2012029820W WO2012134889A2 WO 2012134889 A2 WO2012134889 A2 WO 2012134889A2 US 2012029820 W US2012029820 W US 2012029820W WO 2012134889 A2 WO2012134889 A2 WO 2012134889A2
Authority
WO
WIPO (PCT)
Prior art keywords
action
time
pattern
computer
precedent
Prior art date
Application number
PCT/US2012/029820
Other languages
French (fr)
Other versions
WO2012134889A3 (en)
Inventor
Karan BHATNAGAR
Rohan KHOT
Original Assignee
Google Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Google Inc.
Publication of WO2012134889A2
Publication of WO2012134889A3

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00: Computing arrangements based on specific mathematical models
    • G06N7/01: Probabilistic graphical models, e.g. probabilistic networks

Abstract

A system for analyzing service usage utilizing Markov models. Records of client requests to the service are extracted from at least one log. The records are grouped by client and sorted by timestamp. A pattern of requests that form an action is detected. Each action has a time. A probability is calculated of a transition from a precedent action to a subsequent action, where the precedent action has a time prior to the subsequent action. A delay time is also calculated between a precedent action and a subsequent action. A probability is calculated for a delay time, such as the likelihood that a delay from a precedent action to a subsequent action will fall within a given time interval.

Description

MARKOV MODELING OF SERVICE USAGE PATTERNS
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to Indian Patent Application No. 953/CHE/2011, filed March 28, 2011, and U.S. Nonprovisional Application the disclosure of which is incorporated by reference in its entirety.
BACKGROUND OF THE INVENTION
[0002] Log analysis (or system and network log analysis) can generate information about how a system is used. A log can include one or more computer-generated records of events that occur in a system, such as page visits, Remote Procedure Calls (RPCs) and downloads. The results of log analysis can help to improve compliance with security policies, perform audits of system usage, aid in system troubleshooting and assist in responding to security incidents.
[0003] Logs are emitted by network devices, operating systems, applications and different kinds of intelligent or programmable devices. A log can include a stream of messages ordered by the time at which events occur or are recorded. Logs may be directed to files, stored on disk, or directed as a network stream to a log collector.
[0004] The usage of a service can be measured by the number of occurrences of given individual events, such as page visits, Remote Procedure Calls, etc. Such individual events can be stored in a log and can be analyzed in view of the source of a request or call, its frequency, the times of day the event occurred and so on. The results of the analysis can help spot usage trends, such as the popularity of a given web page, the amount of time spent by a user on a page and the time of day when an RPC receives the most usage.
SUMMARY OF THE INVENTION
[0005] System usage can be analyzed and presented as a Markov model. Records of client requests to the service can be extracted from at least one log. The records can be grouped by client and sorted by timestamp. A pattern of requests that form an action can be detected using one or more pattern matching systems. Each action has a time. A probability of a transition from a precedent action to a subsequent action can be calculated, where the precedent action has a time prior to the subsequent action. A delay time can also be calculated between a precedent action and a subsequent action. A probability can be calculated for a delay time, such as the likelihood that a delay from a precedent action to a subsequent action will fall within a given time interval. The results can be presented as a Markov model with nodes representing actions and each edge representing a transition between the actions that it connects. Each edge can be labeled with the probability of the transition. A probability distribution of the delay for that transition can also be shown with the edge.
BRIEF DESCRIPTION OF THE DRAWINGS
[0006] Figure 1 shows a Markov model of usage according to the presently described subject matter.
[0007] Figure 2 shows components for generating Markov usage models according to an embodiment of the described subject matter.
DETAILED DESCRIPTION
[0008] Although analyzing individual events (e.g., RPC calls) in logs can help to identify certain usage rates and trends, it can be useful to understand the higher-level actions that are made of collections of individual events. For example, the action of opening a user interface can be the result of a set of individual requests that operate together to cause the action to occur. One action can be followed by one or more subsequent actions. For example, after a user interface is open, subsequent actions can include opening an e-mail message, composing and sending a reply message, forwarding the message, etc. It would be useful to understand sequences of actions to better configure and optimize a service.
[0009] Sets of related actions can be better understood in terms of the probability that a given action ("Action A") will be followed by another action ("Action B" or "Action C"). For example, after an e-mail message is opened, it would be informative to know that there is, say, a 0.66 probability that the e-mail message will be closed, a 0.24 probability that a reply-to message will be opened and a 0.10 probability that a forward-to message will be opened.
[0010] It would also be informative to understand the time delay between actions. For example, it would be informative to know that, say, there is a 0.8 probability that there is a delay of less than 0.9ms between the time an e-mail message is sent and the time an autoreply message is received, and a 0.2 probability that the delay is greater than 0.9ms.
[0011] Techniques of the described subject matter can extract higher-level actions from chains of requests from usage logs. For example, an entry in a log may be a,b,b,a,a,b,b,b,a,b,b, which shows a repeating pattern "a,b,b", indicating that these three requests may signify a higher level action, such as opening a user interface. Examples of actions for e-mail include open email UI, compose email, send email, save draft, view inbox, etc. For contact management, examples of actions include add contact, delete contact, add contact to group, etc. An airline flight booking system can have actions such as search for options, show detailed view of option, switch to previous/next date, go-back, book ticket, etc. Examples of online shopping actions include search item, modify criteria, next-page, go back to previous page, buy item, add to wishlist, etc. Groups of actions (such as next-page + modify-price range) can be clustered to form higher level actions.
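As a rough illustration of how the repeating pattern in the example entry might be spotted, the following sketch counts every contiguous run of three requests. The function name ngram_counts and the choice of a fixed window of three are assumptions made for illustration; the source does not prescribe a particular counting method.

```python
from collections import Counter

def ngram_counts(requests, n):
    """Count every contiguous run of n requests in the sequence."""
    return Counter(
        tuple(requests[i:i + n]) for i in range(len(requests) - n + 1)
    )

# The example log entry from the text.
requests = "a,b,b,a,a,b,b,b,a,b,b".split(",")

counts = ngram_counts(requests, 3)
# The most frequent trigram is a candidate for a higher level action.
most_common_trigram, occurrences = counts.most_common(1)[0]
print(most_common_trigram, occurrences)
```

On this input the run "a,b,b" is the most frequent trigram, which is consistent with treating it as a candidate higher level action.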
[0012] The system can generate a Markov model, which can be a state transition diagram. The nodes of the diagram are actions, i.e., states, such as "create profile", "read profile", "delete profile", etc. The edges between the nodes can show the probabilities of transitions between the actions. For example, an edge from action A to action B shows the probability that action A will transition to action B. The edges can also show a probability distribution of the time between action A and action B when a transition occurs.
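One possible way to represent such a state transition diagram is a mapping from (source action, target action) to a probability, rendered as Graphviz DOT text. This is only a sketch: the function name to_dot is hypothetical, and the probabilities below are invented placeholders, not values from the source.

```python
def to_dot(edges):
    """Render {(source, target): probability} as a Graphviz DOT digraph."""
    lines = ["digraph usage {"]
    for (source, target), probability in sorted(edges.items()):
        # Each edge is labeled with its transition probability.
        lines.append(f'  "{source}" -> "{target}" [label="{probability:.2f}"];')
    lines.append("}")
    return "\n".join(lines)

# Illustrative values only; they are not taken from the source.
edges = {
    ("create profile", "read profile"): 0.7,
    ("create profile", "delete profile"): 0.3,
}

dot_text = to_dot(edges)
print(dot_text)
```

A delay distribution for each transition could be appended to the edge label in the same way.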
[0013] A log file can contain numerous logs. The client or user identifier and atomic request can be extracted from the logs. As used herein, the term "client" can refer to any entity that is a source of requests or can be associated with a source of requests, such as a client, a user, a process, etc. An atomic request can be a low level request that a service can receive. For example, for an RPC call, the atomic request is the RPC payload. For a frontend service, the atomic request can be the HTTP request parameters. The atomic request from a log entry can be extracted using logic in a configuration file that specifies the fields of logs that should be considered, or a piece of code that takes a log entry as an input and returns the atomic request data. Chains of atomic requests can be sorted by timestamp for a given user. Pattern extraction algorithms can then be used to extract frequently occurring atomic requests into higher level actions.
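The configuration-driven extraction described above might look like the following minimal sketch, in which a list of field names plays the role of the configuration. The names make_extractor, extract_http and the fields of the sample entry are all hypothetical.

```python
def make_extractor(request_fields):
    """Build a function that keeps only the configured fields of a log entry."""
    def extract(entry):
        # Missing fields become None rather than raising an error.
        return tuple(entry.get(field) for field in request_fields)
    return extract

# Hypothetical configuration and log entry for an HTTP frontend.
extract_http = make_extractor(["method", "path"])
entry = {
    "client_id": "c1",
    "timestamp": 123,
    "method": "GET",
    "path": "/inbox",
    "bytes": 512,
}

atomic_request = extract_http(entry)
print(atomic_request)
```

Fields that are irrelevant to the action, such as the response size here, are dropped so that equivalent requests compare equal during pattern detection.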
[0014] The logs in a log file can be analyzed to extract client identifiers, atomic requests and timestamps. Thus,
Each Log Entry → <clientID, AtomicRequest, timestamp>
[0015] These sets of information can be grouped according to clientID. Each subset grouped according to clientID can be sorted by timestamp.
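The grouping and sorting step can be sketched as follows, assuming each record is a (client ID, atomic request, timestamp) tuple. The function name group_and_sort and the sample records are illustrative assumptions.

```python
from collections import defaultdict

def group_and_sort(records):
    """Group (client_id, request, timestamp) records by client,
    then sort each client's chain by timestamp."""
    by_client = defaultdict(list)
    for client_id, request, timestamp in records:
        by_client[client_id].append((timestamp, request))
    return {
        client: sorted(chain, key=lambda item: item[0])
        for client, chain in by_client.items()
    }

# Hypothetical records, deliberately out of order.
records = [("c1", "a", 125), ("c2", "x", 10), ("c1", "b", 123)]
chains = group_and_sort(records)
print(chains)
```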
[0016] Each sorted, grouped subset can be analyzed to predict one or more clusters of one or more high level actions. This can be done by using N-gram models, FP-growth models, or any suitable pattern detection technique. This yields chains of <action, timestamp> pairs for each client. For actions made of numerous log entries, the timestamp for the action can be the timestamp for the first log entry, for the last log entry, for an average time of all of the timestamps of the entries, or any other suitable time. For example, consider a set of (time, log) entries: (123, a), (124, b), (125, a), (126, c), (127, b). Suppose that 'aba' is a higher level action A and 'cb' is a higher level action B. After pattern extraction, the timestamps for higher level actions can be (123, A), (126, B). Each per-client chain can be analyzed and multiple key value pairs of the form <action A, <action B, delaytime>> are produced across all or some of the clients. Each such key value pair indicates that action B (succeeding action) follows action A (preceding action) with a delay equal to delaytime after action A.
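The worked example above can be reproduced with a small sketch. It assumes the patterns are already known and matches them greedily from left to right, using the first entry's timestamp as the action time; this stands in for whatever pattern detection technique is actually used. The function names are hypothetical.

```python
def extract_actions(chain, patterns):
    """Greedy left-to-right match of known request patterns.

    chain: list of (timestamp, request) sorted by timestamp.
    patterns: {tuple_of_requests: action_name}.
    The action time is the timestamp of the first matched entry;
    requests that match no pattern are skipped.
    """
    # Try longer patterns first so a short one cannot shadow a longer one.
    ordered = sorted(patterns.items(), key=lambda item: -len(item[0]))
    actions = []
    i = 0
    while i < len(chain):
        for pattern, action in ordered:
            window = tuple(request for _, request in chain[i:i + len(pattern)])
            if window == pattern:
                actions.append((chain[i][0], action))
                i += len(pattern)
                break
        else:
            i += 1  # no pattern starts here
    return actions

def transition_pairs(actions):
    """Produce <precedent, <subsequent, delay>> pairs from consecutive actions."""
    return [
        (first, (second, t_second - t_first))
        for (t_first, first), (t_second, second) in zip(actions, actions[1:])
    ]

chain = [(123, "a"), (124, "b"), (125, "a"), (126, "c"), (127, "b")]
patterns = {("a", "b", "a"): "A", ("c", "b"): "B"}

actions = extract_actions(chain, patterns)
pairs = transition_pairs(actions)
print(actions, pairs)
```

With the first-entry timestamp convention, the delay from A to B in this example is 126 - 123 = 3 time units.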
[0017] The set of key value pairs for an action can then be analyzed to generate probabilities for state transitions. For example, consider the following set of <action, delay> key value pairs for preceding action, Action A:
{<B, 0.5>, <C, 0.25>, <B, 0.4>, <C, 1.5>, <D, 0.2>, <D, 0.5>, <B, 0.5>, <C, 1.5>}, where the delay is in milliseconds.
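The probability of a transition from A to each of the states B, C and D can be calculated as follows:
$$\text{Probability of Transition to State } X = \frac{\text{Number of Transitions to State } X}{\text{Total Number of Transitions to All States}}$$
Thus, for example,
$$\text{Probability to B} = \frac{3}{8} = 0.375, \qquad \text{Probability to C} = \frac{3}{8} = 0.375, \qquad \text{Probability to D} = \frac{2}{8} = 0.250$$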
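The transition probabilities for the example set of <action, delay> pairs can be checked with a short sketch. The function name transition_probabilities is an assumption; the data is the set listed for preceding Action A.

```python
from collections import Counter

# <action, delay in ms> pairs for preceding Action A, as listed in the text.
pairs = [
    ("B", 0.5), ("C", 0.25), ("B", 0.4), ("C", 1.5),
    ("D", 0.2), ("D", 0.5), ("B", 0.5), ("C", 1.5),
]

def transition_probabilities(pairs):
    """Number of transitions to each state over the total number of transitions."""
    counts = Counter(action for action, _ in pairs)
    total = sum(counts.values())
    return {action: count / total for action, count in counts.items()}

probabilities = transition_probabilities(pairs)
print(probabilities)
```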
[0018] The probabilities of delay can be calculated for each transition. For example, the probability of a delay Y for a transition to a given State can be calculated by:
$$\text{Probability of Delay } Y = \frac{\text{Number of Delays at } Y}{\text{Total Number of Transitions to State}}$$
where Y is a delay value or range of delay values. For example, if the set of delays for transition from Action A to Action B is {0.5ms, 0.4ms, 0.5ms}; to Action C is {0.25ms, 1.5ms, 1.5ms}; and to Action D is {0.2ms, 0.5ms}, then:
Probabilities of Delays to Action B: 0.5ms = 0.667; 0.4ms = 0.333.
Probabilities of Delays to Action C: 0.25ms = 0.333; 1.5ms = 0.667.
Probabilities of Delays to Action D: 0.2ms = 0.5; 0.5ms = 0.5.
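The per-transition delay probabilities can be computed directly from the <action, delay> pairs listed in paragraph [0017], which contain three transitions each to B and C and two transitions to D. The sketch below is illustrative; the function name delay_probabilities is an assumption.

```python
from collections import Counter, defaultdict

# <action, delay in ms> pairs for preceding Action A, from paragraph [0017].
pairs = [
    ("B", 0.5), ("C", 0.25), ("B", 0.4), ("C", 1.5),
    ("D", 0.2), ("D", 0.5), ("B", 0.5), ("C", 1.5),
]

def delay_probabilities(pairs):
    """For each target state, number of delays at Y over transitions to that state."""
    delays = defaultdict(list)
    for action, delay in pairs:
        delays[action].append(delay)
    return {
        action: {
            delay: count / len(values)
            for delay, count in Counter(values).items()
        }
        for action, values in delays.items()
    }

distribution = delay_probabilities(pairs)
for action in sorted(distribution):
    print(action, distribution[action])
```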
Delay times can be expressed as a range, such as occurring in the intervals [0.1ms, 0.3ms); [0.3ms, 0.5ms); etc., or in intervals bounded by powers of 2 between an initial value and a final value, e.g., [10ms, 16ms); [16ms, 32ms); [32ms, 64ms) and [64ms, 100ms]. Here, the initial value is 10ms and the final value is 100ms.
[0019] A Markov model with nodes representing actions (states) and edges labeled with transition and delay probabilities can be produced based upon the foregoing calculations for a given system. Such a model for the above example is shown in Figure 1.
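The power-of-two interval scheme described for delay ranges can be sketched as follows. The sketch assumes that interval boundaries are the initial value, every power of 2 strictly between the initial and final values, and the final value, with only the last interval closed on the right; the function names are hypothetical.

```python
from bisect import bisect_right

def power_of_two_edges(initial, final):
    """Interval boundaries: initial, powers of 2 strictly in between, final."""
    edges = [initial]
    power = 1
    while power <= initial:
        power *= 2
    while power < final:
        edges.append(power)
        power *= 2
    edges.append(final)
    return edges

def bucket_index(edges, delay):
    """Index of the half-open interval containing delay.
    The last interval is closed on the right; out-of-range delays give None."""
    if delay < edges[0] or delay > edges[-1]:
        return None
    if delay == edges[-1]:
        return len(edges) - 2
    return bisect_right(edges, delay) - 1

edges = power_of_two_edges(10, 100)
print(edges)
print(bucket_index(edges, 16), bucket_index(edges, 100))
```

Counting delays per bucket index and dividing by the number of transitions then gives the probability that a delay falls within each interval.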
[0020] Markov models representing sets of clients can also be generated. Sets of Actions grouped by clients can be segregated to enable different types of analysis. For example, the <Action A, <Action B, delaytime>> data can be grouped by various attributes, such as the geographical location of clients (all clients in New York State); by date; by a time of day or a range of time of day; by network or subnetwork; by source address; by protocol or protocol version; etc.
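Segregating the transition data by an attribute might be sketched as below, so that a separate model can be built per group. The record layout (keys such as "region", "precedent", "subsequent" and "delay_ms") and the sample values are assumptions for illustration only.

```python
from collections import defaultdict

def partition_by(transitions, attribute):
    """Split transition records into one list per value of the chosen attribute."""
    groups = defaultdict(list)
    for record in transitions:
        groups[record[attribute]].append(
            (record["precedent"], record["subsequent"], record["delay_ms"])
        )
    return dict(groups)

# Hypothetical transition records annotated with client attributes.
transitions = [
    {"region": "NY", "protocol": "HTTP/1.1", "precedent": "A", "subsequent": "B", "delay_ms": 0.5},
    {"region": "CA", "protocol": "HTTP/1.1", "precedent": "A", "subsequent": "C", "delay_ms": 1.5},
    {"region": "NY", "protocol": "HTTP/2", "precedent": "A", "subsequent": "D", "delay_ms": 0.2},
]

by_region = partition_by(transitions, "region")
print(by_region)
```

The same function can be called with "protocol" or any other recorded attribute to obtain a different segmentation.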
[0021] Various embodiments may include or be embodied in the form of computer-implemented processes and apparatuses for practicing those processes. Embodiments also may be embodied in the form of a computer program product having computer program code containing instructions embodied in tangible media, such as floppy diskettes, CD-ROMs, hard drives, USB (universal serial bus) drives, or any other machine readable storage medium, wherein, when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the described subject matter. Embodiments also may be embodied in the form of computer program code, for example, whether stored in a storage medium, loaded into and/or executed by a computer, or transmitted over some transmission medium, such as over electrical wiring or cabling, through fiber optics, or via electromagnetic radiation, wherein when the computer program code is loaded into and executed by a computer, the computer becomes an apparatus for practicing the described subject matter. When implemented on a general-purpose microprocessor, the computer program code segments configure the microprocessor to create specific logic circuits. In some configurations, a set of computer-readable instructions stored on a computer-readable storage medium may be implemented by a general-purpose processor, which may transform the general-purpose processor or a device containing the general-purpose processor into a special-purpose device configured to implement or carry out the instructions. Embodiments may be implemented using hardware that may include a processor, such as a general purpose microprocessor and/or an Application Specific Integrated Circuit (ASIC) that embodies all or part of the technique in accordance with the described subject matter in hardware and/or firmware. 
The processor may be coupled to memory, such as RAM, ROM, flash memory, a hard disk or any other device capable of storing electronic information. The memory may store instructions adapted to be executed by the processor to perform the technique in accordance with an embodiment of the described subject matter.
[0022] Any of the functionality described herein may be implemented by modules, which can be software, hardware and/or a combination thereof. A module can perform a single function, multiple functions or a function may be partially performed by each of a number of modules. For example, signature and domain validation may be performed by two dedicated modules or by a single module capable of performing both functions. Further, the domain validator and the signature validator module may run on different machines that may be controlled by different parties. Likewise, the system for storing customer domains and generated tickets may be implemented as Software as a Service (SaaS) in the cloud. The data may be stored in a single database, in a single table, in multiple tables, in multiple databases or in one or more distributed databases. The functionality thereof may be implemented using virtualized machines across multiple computers and data centers in multiple locations. The data stores and databases may be monolithic or distributed across numerous machines and locations.
[0023] Figure 2 shows a Markov modeling system 201. Log interface 202 extracts a plurality of records from logs 203, 204 and 205. Each record can have at least one request and a timestamp. The log interface 202 can be in communication with a client grouping module 206 that groups the extracted records by client. The grouped records are sorted by sorter 207, which sorts each group of client records by timestamp. A pattern detector 208 detects a set of requests in at least one record in the sorted group that form a pattern that constitutes a higher-level action. For example, a series of HTTP requests can together open a user interface in a SaaS application. The action ("Open UI") can thus be assigned to that set of requests. Each action has a time. A transition probability module 209 calculates the probability of a transition from a precedent action to a subsequent action. A precedent action is an action that has a time prior to the time of a subsequent action.
[0024] Examples provided herein are merely illustrative and are not meant to be an exhaustive list of all possible embodiments, applications, or modifications of the described subject matter. Thus, various modifications and variations of the described techniques and systems of the described subject matter will be apparent to those skilled in the art without departing from the scope and spirit of the described subject matter.
[0025] For example, the disclosed subject matter is not restricted to the analysis of HTTP logs of a UI server. It can also be used, for example, for backend systems, such as those designed to serve RPC requests. As in the examples cited above, a low level action corresponding to an RPC log can be obtained based on the RPC request name and payload information for backend systems. For example, consider an RPC server which serves the following requests:
GetItemList : returns List<Item>
UpdateItem(Item item, integer quantity) : returns void
RemoveItem(Item) : returns void
In this case, the GetItemList, UpdateItem and RemoveItem procedure calls made by a remote client over the network can be the low level requests. The series of these low level requests, when grouped by client-id, sorted by timestamp and analyzed for patterns, can be processed into Markov models of high level client-access scenarios, in much the same way as described above. The Markov models can be used to help optimize the server, e.g., by suggesting more efficient ways of prefetching, caching or buffering writes to a disk.
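As a hedged illustration of how such a model might inform prefetching, the sketch below counts which RPC follows which in each client's sorted chain and reports the most likely next request. For simplicity it treats each RPC name as its own action, which is an assumption; the per-client chains and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def transition_counts(chains):
    """Count how often each request is immediately followed by each other request."""
    counts = defaultdict(Counter)
    for chain in chains:
        for current, following in zip(chain, chain[1:]):
            counts[current][following] += 1
    return counts

def most_likely_next(counts, request):
    """Return (next_request, probability) for the most frequent follower, or None."""
    followers = counts.get(request)
    if not followers:
        return None
    following, count = followers.most_common(1)[0]
    return following, count / sum(followers.values())

# Hypothetical per-client chains of RPC names, already sorted by timestamp.
chains = [
    ["GetItemList", "UpdateItem", "GetItemList", "RemoveItem"],
    ["GetItemList", "UpdateItem"],
]

counts = transition_counts(chains)
prediction = most_likely_next(counts, "GetItemList")
print(prediction)
```

A server could use such a prediction as a hint, for example by warming a cache for the request most likely to follow, although the source does not specify any particular optimization mechanism.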
[0026] Although the subject matter herein has been described in connection with specific embodiments, it should be understood that the described subject matter as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the described subject matter which are obvious to those skilled in the relevant arts or fields are intended to be within the scope of the appended claims.

Claims

What is claimed is:
1. A computer-implemented method for modeling service usage, comprising:
extracting a plurality of records from at least one log, each record having a timestamp and at least one request;
grouping the extracted records by client;
sorting a group of client records by timestamp;
detecting an action comprising a pattern of requests, where the action has a time;
calculating, by a processor, the probability of a transition from a precedent action to a subsequent action, where the precedent action has a time prior to the time of the subsequent action;
calculating a delay time for a precedent action and a subsequent action, where the delay time is the difference between the precedent action time and the subsequent action time; and
calculating the probability of a delay time for a transition from the precedent action to the subsequent action.
2. The computer-implemented method of claim 1, wherein the pattern comprises a plurality of requests from one record.
3. The computer-implemented method of claim 1, wherein the pattern comprises a plurality of requests from one record.
4. The computer-implemented method of claim 1, wherein a client is at least one from the group of a computer, a user, an application, a process and a peer.
5. A computer-implemented method for modeling service usage, comprising:
extracting a plurality of records from at least one log, each record having a timestamp and at least one request;
grouping the extracted records by client;
sorting a group of client records by timestamp;
detecting an action comprising a pattern of requests, where the action has a time;
calculating, by a processor, the probability of a transition from a precedent action to a subsequent action, where the precedent action has a time prior to the time of the subsequent action.
6. The computer-implemented method of claim 5, wherein the pattern comprises a plurality of requests from one record.
7. The computer-implemented method of claim 5, wherein the pattern comprises a plurality of requests from one record.
8. The computer-implemented method of claim 5, wherein a client is at least one from the group of a computer, a user, an application, a process and a peer.
9. The computer-implemented method of claim 5, wherein the detecting a set of requests that form a pattern in the sorted group comprises using an N-gram model to detect the pattern.
10. The computer-implemented method of claim 5, wherein the detecting a set of requests that form a pattern in the sorted group comprises using a FP growth model to detect the pattern.
11. The computer-implemented method of claim 5, wherein an action comprises opening a user interface.
12. The computer-implemented method of claim 5, further comprising:
calculating a delay time for a precedent action and a subsequent action, where the delay time is the difference between the precedent action time and the subsequent action time; and
calculating the probability of a delay time for a transition from the precedent action to the subsequent action.
13. The computer-implemented method of claim 12, wherein the calculating the probability of a delay time comprises calculating the likelihood that the delay time falls within a given time interval.
14. An apparatus for determining system usage, comprising:
a log interface that extracts a plurality of records from at least one log, each record having at least one request and a timestamp;
a client grouping module that groups the extracted records by client;
a sorter that sorts a group of client records by timestamp;
a pattern detector that detects a set of requests in at least one record in the sorted group that form a pattern comprising an action, where an action has a time;
a transition probability module that calculates the probability of a transition from a precedent action to a subsequent action, where the precedent action has a time prior to the time of the subsequent action.
15. The system of claim 14, wherein a client is at least one from the group of a computer, a user, an application, a process and a peer.
16. The system of claim 14, wherein the pattern detector detects a pattern of requests in the sorted group using an N-gram model to detect the pattern.
17. The system of claim 14, wherein the pattern detector detects a pattern of requests in the sorted group using a FP growth model to detect the pattern.
18. The system of claim 14, further comprising a delay time probability module that calculates a delay time between the time of a precedent action and the time of a subsequent action, where the delay time is the difference between the precedent action time and the subsequent action time and further calculates the probability of a delay time for a transition from the precedent action to the subsequent action.
19. The system of claim 18, wherein the probability of a delay time comprises the likelihood that the delay time falls within a given time interval.
20. A non-transitory computer readable medium storing a plurality of instructions that cause a computer to perform a method comprising:
extracting a plurality of records from at least one log, each record having a timestamp and at least one request;
grouping the extracted records by client;
sorting a group of client records by timestamp;
detecting a set of requests in at least one record in the sorted group that form a pattern comprising an action, wherein the action has a time;
calculating the probability of a transition from a precedent action to a subsequent action, where the precedent action has a time prior to the time of the subsequent action.
21. The non-transitory computer readable medium of claim 20 storing a plurality of instructions that cause a computer to further perform a method comprising detecting a set of requests that form a pattern in the sorted group by using an N-gram model to detect the pattern.
22. The non-transitory computer readable medium of claim 20 storing a plurality of instructions that cause a computer to further perform a method comprising detecting a set of requests that form a pattern in the sorted group by using a FP growth model to detect the pattern.
23. The non-transitory computer readable medium of claim 20 storing a plurality of instructions that cause a computer to further perform a method comprising
calculating a delay time for a precedent action and a subsequent action, where the delay time is the difference between the precedent action time and the subsequent action time; and
calculating the probability of a delay time for a transition from the precedent action to the subsequent action.
24. The non-transitory computer readable medium of claim 23 storing a plurality of instructions that cause a computer to further perform a method wherein the calculating the probability of a delay time comprises calculating the likelihood that the delay time falls within a given time interval.
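The delay-time calculation recited in the claims above can be illustrated with a short Python sketch. It estimates, from observed samples, the likelihood that the delay for a given transition falls within a given time interval. The input layout (a list of `(action, time)` pairs per client), the half-open interval `[low, high)` and the sample values are assumptions for illustration, and the empirical frequency used here is only one possible estimator.

```python
from collections import defaultdict


def delay_samples(action_sequences):
    """Map each (precedent, subsequent) transition to its observed delay times."""
    delays = defaultdict(list)
    for actions in action_sequences:
        for (prev, t_prev), (nxt, t_next) in zip(actions, actions[1:]):
            # Delay time: difference between the two action times.
            delays[(prev, nxt)].append(t_next - t_prev)
    return delays


def delay_probability(delays, transition, low, high):
    """Empirical probability that the delay for `transition` lies in [low, high)."""
    samples = delays.get(transition, [])
    if not samples:
        return 0.0  # no observations for this transition
    return sum(low <= d < high for d in samples) / len(samples)


# Hypothetical per-client action sequences with times in seconds.
sequences = [
    [("Open UI", 0), ("Save", 4)],
    [("Open UI", 10), ("Save", 22)],
    [("Open UI", 5), ("Save", 8)],
    [("Open UI", 0), ("Save", 7)],
]
delays = delay_samples(sequences)
p_fast = delay_probability(delays, ("Open UI", "Save"), 0, 5)
```

The four observed delays are 4, 12, 3 and 7 seconds, so two of the four fall in the interval from 0 to 5 seconds and `p_fast` is 0.5.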
PCT/US2012/029820 2011-03-28 2012-03-20 Markov modeling of service usage patterns WO2012134889A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
IN953CH2011 2011-03-28
IN953/CHE/2011 2011-03-28
US13/157,006 2011-06-09
US13/157,006 US8909562B2 (en) 2011-03-28 2011-06-09 Markov modeling of service usage patterns

Publications (2)

Publication Number Publication Date
WO2012134889A2 (en) 2012-10-04
WO2012134889A3 (en) 2012-12-27

Family

ID=46928581

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/029820 WO2012134889A2 (en) 2011-03-28 2012-03-20 Markov modeling of service usage patterns

Country Status (2)

Country Link
US (2) US8909562B2 (en)
WO (1) WO2012134889A2 (en)

Also Published As

Publication number Publication date
WO2012134889A3 (en) 2012-12-27
US20120254080A1 (en) 2012-10-04
US8620839B2 (en) 2013-12-31
US8909562B2 (en) 2014-12-09
US20120254078A1 (en) 2012-10-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12762922

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12762922

Country of ref document: EP

Kind code of ref document: A2