WO2003090081A1 - A hierarchical system for analysing data streams - Google Patents

A hierarchical system for analysing data streams

Info

Publication number
WO2003090081A1
Authority
WO
WIPO (PCT)
Prior art keywords
analysis
target activity
sub
alert
data
Prior art date
Application number
PCT/AU2003/000460
Other languages
French (fr)
Inventor
George Bolt
John Manslow
Original Assignee
Neural Technologies Ltd
Toms, Alvin, David
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neural Technologies Ltd and Toms, Alvin, David
Priority to EP03714539A (EP1499969A1)
Priority to AU2003218899A (AU2003218899A1)
Publication of WO2003090081A1
Priority to US10/965,703 (US20050190905A1)
Priority to US12/340,504 (US20090164761A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2203/00 Aspects of automatic or semi-automatic exchanges
    • H04M2203/60 Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
    • H04M2203/6027 Fraud preventions

Definitions

  • the present invention relates to a hierarchical system for analysing data streams.
  • the present invention relates to analysing data streams to identify target events.
  • a target event may be an instance of fraud on a telephone system; however, the present invention has applications in other high data volume environments to identify other target events/activities.
  • Fraud is a serious problem in modern telecommunication systems, and can result in revenue loss by the telecommunications service provider, reduced operational efficiency, and an increased risk of subscribers moving to other providers that are perceived to offer better security.
  • any provider that can reduce revenue loss resulting from fraud - either by its prevention or early detection - has a significant advantage over its competitors.
  • Telecommunications networks support many hundreds or thousands of transactions per second, and one of the challenges in developing effective fraud detection systems is to achieve the high throughput necessary to analyse all network traffic in detail and in real time.
  • fraud detection systems frequently ignore services that are considered to be low risk (e.g. low cost calls), or limit the sophistication of the fraud detection algorithms in order to achieve the required throughput.
  • the present invention provides a system of hierarchical data analysis that seeks to provide high throughput and sensitivity with fewer false positive alerts of possible target activity.
  • a method for analysing data streams comprising at least the steps of: receiving a data stream; conducting a first analysis of the data stream for a possible target activity, and if a possible target activity is indicated generating a first alert; if the first alert is generated, conducting a second analysis for the possible target activity to determine whether the target activity is indicated in the data stream with a high degree of certainty, if a possible target activity is indicated by the second analysis, generating a second alert; and providing the second alert to an external system for action.
  • the first analysis step comprises at least: conducting a first sub-analysis of the data stream for the possible target activity to determine whether the target activity is indicated in the data stream, if the possible target activity is indicated by the first sub-analysis then a first sub-alert is generated; and conducting a second sub-analysis of the data stream for the possible target activity to determine whether the target activity is indicated in the data stream with a higher degree of certainty than in the first sub-analysis, if the possible target activity is indicated by the second sub-analysis then the first alert is generated.
  • the second sub-analysis provides an indication of the target activity with a higher degree of certainty than in the first sub-analysis.
  • the second analysis provides an indication of the target activity with a higher degree of certainty than in the second sub-analysis.
  • the method further comprises propagating data from the data stream relevant to the second sub-analysis for conducting the second sub-analysis.
  • the method further comprises the step of propagating data from the data stream relevant to the second analysis for conducting the second analysis.
  • the second sub-analysis is conducted on additional data to the propagated data.
  • the second analysis is conducted using additional data to the data propagated for the second analysis.
  • one or more additional levels of sub-analysis are conducted between the first sub-analysis and the second sub-analysis wherein an alert is generated by one of the additional levels and passed to a next of the additional levels.
  • a subsequent analysis is conducted while determining whether the target activity is indicated to a higher degree of certainty than the previous level.
  • the first sub-alert triggers the first of one or more additional levels of sub-analysis and the alert generated by the final level of additional sub-analysis triggers the second sub-analysis.
  • data is propagated from one additional level of sub-analysis to the next and includes data necessary in the subsequent levels of additional sub-analysis.
  • each additional level of sub-analysis is conducted on additional data specific to the type of analysis conducted in addition to the propagated data.
  • each level of the sub-analysis creates a third alert if a fraudulent activity is indicated with a relatively high degree of certainty, any one of the second alerts and third alerts triggering an action in the external system.
  • the first analysis may conduct one or more types of analysis in parallel.
  • one or more of the additional levels of sub-analysis may conduct one or more types of analysis in parallel.
  • the target activity is fraudulent activity.
  • a system for analysing data streams comprising at least: a first analyser arranged to analyse a data stream for possible target activity and if a possible target activity is indicated to generate a first alert; a second analyser arranged to conduct an analysis for possible target activity if the first alert is generated, and if a possible target activity is indicated with a relatively high probability by the second analysis to generate a second alert for an external system to act on.
  • a system for analysing data streams comprising at least: one or more sequential analysers arranged to conduct an analysis for possible target activity, a first analyser of the sequence of analysers analysing a data stream, each subsequent analyser of the sequence of analysers only conducting its analysis if the previous analyser indicates a possible target activity, and if a possible target activity is indicated by each analysis generating a subsequent alert for the next analyser; and a final analyser arranged to conduct an analysis for possible target activity if the last analyser of the sequence of analysers generates an alert, and if a possible target activity is indicated with a relatively high probability by the analysis of the final analyser, the final analyser generates an alert for an external system to act on.
  • a method of analysing data streams comprising at least: conducting one or more sequential analyses of a data stream for possible target activity, the first of the analyses being conducted directly on the data stream, subsequent analyses after the first only being conducted if the previous analysis indicated a possible target activity; conducting a final analysis for possible target activity if the last of the sequential analyses indicated a possible target activity; and if the final analysis indicates a possible target activity with a relatively high degree of certainty generating an alert to an external system for action.
  • Figure 1 is a schematic representation of a preferred embodiment of a system for analysing data streams in accordance with the present invention.
  • Figure 2 is a schematic representation exemplifying data analysis using the system of Figure 1.
  • Referring to Figure 1, there is shown a system 10 that receives a data stream 12 (that may include one or more sub-streams) and outputs a data stream of alerts 34 for use by an external system.
  • the system 10 includes a plurality of data analysis modules, in this case three are shown 14, 16 and 18.
  • Each of the analysis modules 14, 16 and 18 receives respective additional data 20, 22 and 24 used in the analysis of the data stream 12 provided to the first data module 14.
  • Each data module 14, 16 and 18 propagates data to the next data module indicated by propagated data 26 and 30.
  • Each data module provides internal alerts 28 and 32 to the subsequent data module.
  • the system 10 is configured to identify suspicious telephone activity that may indicate fraud. Due to the high volume of telephone call data required to be processed, each data analysis module can provide a different analysis technique to progressively increase the certainty that the data indicates the presence of fraudulent telephone activity.
  • the system 10 may be implemented in the form of a computer or a network of computers programmed to perform the analysis of each of the modules.
  • a single computer can be programmed to run the system or a dedicated computer may be programmed to conduct the analysis of each of the modules, with communication being provided between the computers of the whole system 10.
  • Each of the data analysis modules 14, 16 and 18 cascades data initially provided by data stream 12 to the subsequent module.
  • the data stream 12 could, for example, include call data records (CDRs, which contain details of the calls made on a telecommunication network). For example, a portion of a CDR produced from a real call is given in Table 1. The fields contained in the CDR are (from top to bottom) A-number (the number of the phone from which the call was made), B-number (the number to which the call was made), B-number type (whether it was local, national, international etc encoded as a number), the call's cost, its duration and the date and time at which it started.
  • the data stream 12 can also include several substreams from different sources.
  • one substream could be a CDR stream, while another could provide customer information such as postcodes and payment histories.
  • Each of the data analysis modules 14, 16 and 18 contains one or more fraud detection engines that analyse their input data for signs of fraudulent activity, in response to which they generate alerts. Each fraud detection engine can process different subsets of the modules' input data.
  • Each data analysis module after the first receives propagated data that is passed from the analysis module immediately preceding it in the hierarchy. The additional data available to each data analysis module may be specific to the type of analysis conducted by that particular data analysis module. The propagated data may contain low level data from the original data stream 12 or additional data used by data analysis modules lower in the hierarchy, depending on the configuration of the system 10.
  • Additional data is important for the efficiency of the system because the analyses performed within particular analysis modules may require access to potentially large quantities of data that are not required elsewhere within the system. Propagating data that is not required in other analysis modules is a waste of resources and is likely to reduce the rate at which the system can process incoming data. Propagated data consists of information that is used in more than one data analysis module. For example, the A-number field, which identifies the calling party, is provided within the CDR stream that usually forms part of the system's input 12 and is usually required throughout the system; it is therefore usually propagated through the system rather than forming part of the additional data inputs.
  • Each of the data analysis modules 14, 16 and 18 can generate internal and external alerts.
  • External alerts from all of the modules 14, 16 and 18 are combined to form the output 34 of the system. Combining the outputs may be the equivalent of applying a logical OR across the alerts, so that if any of the modules generates an external alert, the system as a whole generates the alert.
  • External alerts are only produced by the modules when the calculated probability of a target activity (fraud) is sufficiently high to reasonably conclude that fraud has occurred. What is considered a high probability depends on the particular application, its expected throughput, and the desired degree of certainty. When individual calls are analysed for fraud within telecommunication networks, a probability as large as 0.99995 to 0.99999 may be required to keep the number of alerts to a manageable level (since large networks can experience as many as 100 million calls per day).
  • Each of the data analysis modules 14, 16 and 18 can generate internal alerts if its analysis reveals something unusual, but does not provide sufficiently high probability that target activity is indicated to warrant an external alert.
  • Internal alerts are important for regulating the activity of subsequent data analysis modules within the hierarchy of the system, because subsequent data analysis modules may only be activated if an internal alert is received, indicating that further analysis of the data is required to obtain the sufficient degree of certainty to generate an external alert.
  • Subsequent data analysis modules 16 and 18 may only be activated if they receive an internal alert 28 or 32 from a preceding analysis module or if any of their input data is updated.
  • the additional data is only provided in response to a request made by a lower module and the input additional data is not configured to activate an analysis module.
  • an analysis module 14, 16 or 18 may identify a short term increase in the total cost of calls made by a particular subscriber, which may not be severe enough to conclude that fraud has occurred and hence to generate an external alert.
  • a subsystem may therefore generate an internal alert that causes the next module in the system to perform its analysis.
  • This cascaded activation of analysis modules within the system means that lower level subsystems are activated most frequently and that the throughput of the system can be maximised by designing the lower level subsystems to require a minimum amount of processing.
  • Higher level analysis, which is activated less frequently can thus use more expensive processes (such as nonlinear or iterative functions) and can perform expensive operations (such as database reads and writes) or make use of human intervention, with minimal effect on the throughput of the entire system.
  • a neural network could be trained to estimate the probability that a particular telephone call was fraudulent based on its characteristics (cost, duration, etc.) or Fourier analysis could be used to see if a short term fluctuation in the calling activity was part of a cycle of a subscriber's normal behaviour in an analysis module that becomes active only once a lower level system has generated an alert.
  • the lower level subsystems may need some level of parallelism in order to achieve the required throughput and thus can be distributed across several computers. Later stages may require so few resources that several can be run simultaneously on a single computer, while others may require user interaction or database access, placing specific requirements on their geographic location.
  • By building a fraud detection system from a hierarchy of subsystems of increasing sophistication it is possible to produce a superior trade off between fraud detection accuracy and throughput.
  • Each of the data analysis modules should be designed to generate many more internal false positives (that is, internal alerts for events that are not actually fraudulent) than internal false negatives (where an internal alert was not generated when fraud did in fact occur). This is because the higher level subsystems that are activated by the internal alerts may be able to provide a higher degree of certainty to confirm or refute the internal alert, based on different analysis techniques and/or the inclusion of additional data in the analysis, to clarify whether, with the required degree of certainty, the data indicates that a fraud is actually present. If the system is not designed in this way, then when false negatives occur the higher level subsystems are never activated and thus are not able to correct an error made by the lower level subsystem.
  • the analysis modules 14, 16 and 18 are designed to generate a small number of external false positives (external alerts generated for events that are not actually fraudulent) and a large number of external false negatives (resulting in no external alert being generated when in fact a fraud did occur). This is because, provided that an internal alert was generated, the external false negative can be corrected by higher level analysis modules generating their own external alerts. Where a false positive external alert is generated, by contrast, the system as a whole will generate an alert that cannot be prevented by analysis conducted at subsequent modules, even if those modules were activated.
  • FIG. 2 shows an example of a real telecommunications fraud detection system based on the system 10.
  • the input data stream 12 includes a CDR stream that provides details of each call made on the telecommunications network shortly after the call is terminated.
  • the CDR stream is passed to the lowest level data analysis module 14 which is configured as a candidate fraud detector (CFD).
  • the CFD contains two separate fraud detection algorithms: a set of rules 36 that search directly for common fraud indicators (such as more than 8 hours of calls to the Caribbean in any 24 hour period), and a change detection algorithm 38 that searches for unusual changes in the pattern of behaviour associated with individual subscribers (which can indicate that a line has been taken over by fraudsters). These two components 36 and 38 of the lowest level data analysis module 14 operate independently.
  • An internal alert 28 is generated when either of its components 36 and 38 indicates that a particular telephone call is a fraud candidate.
  • the rules 36 and change detector 38 are designed to be fast and simple because the CDR stream 12 can present the data analysis module with as many as 100 million CDRs per day.
  • the internal alerts 28 are passed to the next level data analysis module which operates as an intelligent alarm analyser (IAA) which is only activated when an internal alert is generated by the CFD.
  • the ratio of the number of CDRs to the number of internal alerts 28 is about 1000:1 meaning that statistically the IAA is activated only once for every 1000 times the CFD is activated.
  • the IAA is a rule based system that removes some of the false alerts generated by the CFD by performing complex analysis on the distributions of the alerts themselves. These complex analyses are possible due to the low level of activity demanded of the IAA compared to the CFD. The analyses also require time information (real world, date and time) which is provided to the IAA as additional data 22.
  • the third level data analysis module operates as a case manager.
  • the case manager may be a team employed by the telecommunications operator for the purpose of investigating the events that caused internal alerts to be generated by the IAA. Because the case manager is a higher level subsystem, it is activated only once every 500,000 or so CDRs and hence can use much slower and more expensive processing methods, such as manual investigations of potential frauds, than either the CFD or IAA without being overwhelmed.
  • the case manager uses customer information (names, addresses, payment histories, etc.) as further additional data 24 and frequently a wide variety of additional data sources (six month history of calls made by a particular customer) to investigate internal alerts 32 generated by the IAA to determine whether they are likely to be cases of actual fraud. If it is determined that they are, the case manager subsystem generates an external alert 34 which is passed out of the system.
  • the alert could be used for a variety of purposes, such as to inform billing services within the network operator to remove fraudulent calls from a customer's bill, or to inform law enforcement agencies.
  • in this example the additional data 20 provided to the CFD is null, that is, no additional data is supplied to the CFD.
  • no data is propagated from the CFD to the IAA or from the IAA to the case manager.
  • additional data may be provided to the CFD or data may be propagated from the CFD to the IAA and possibly then from the IAA to the case manager.
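
The distinction between internal and external alerts described in the list above can be sketched as follows. This is a minimal illustration, not the patented implementation: each module is modelled as a hypothetical function returning a fraud probability, `EXTERNAL_THRESHOLD` uses the 0.99995 figure quoted for per-call analysis, and `INTERNAL_THRESHOLD` is an assumed low value chosen so that internal false positives are favoured over internal false negatives.

```python
from typing import Callable, Sequence, Tuple

# Assumed thresholds. Only the external figure appears in the text; the
# internal value is hypothetical and deliberately permissive.
INTERNAL_THRESHOLD = 0.01
EXTERNAL_THRESHOLD = 0.99995


def classify(probability: float) -> Tuple[bool, bool]:
    """Return (internal_alert, external_alert) for one module's estimate."""
    internal = probability >= INTERNAL_THRESHOLD
    external = probability >= EXTERNAL_THRESHOLD
    return internal, external


def system_alert(record: dict, modules: Sequence[Callable[[dict], float]]) -> bool:
    """Logical OR of the modules' external alerts with cascaded activation."""
    for estimate in modules:
        internal, external = classify(estimate(record))
        if external:
            return True   # any module's external alert becomes system output
        if not internal:
            return False  # nothing unusual, so later modules are not activated
    return False          # last module raised no external alert
```

Because a module that raises no internal alert ends the cascade, an internal false negative cannot be corrected later, which is why the internal threshold is kept low. Using the figures quoted above, 100 million CDRs per day at a ratio of 1000:1 corresponds to roughly 100,000 IAA activations per day, and one case manager activation per 500,000 CDRs corresponds to roughly 200 per day.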

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

A method for analysing data streams comprises receiving a data stream (12), conducting a first analysis of the data stream (14) for a possible target activity, and if a possible target activity is indicated generating a first alert (28). If the first alert (28) is generated, a second analysis (16) for the possible target activity is conducted to determine whether the target activity is indicated in the data stream with a high degree of certainty. If a possible target activity is indicated by the second analysis, a second alert (34) is generated and provided to an external system for action.

Description

A HIERARCHICAL SYSTEM FOR ANALYSING DATA STREAMS
FIELD OF THE INVENTION
[0001] The present invention relates to a hierarchical system for analysing data streams. In particular, the present invention relates to analysing data streams to identify target events. A target event may be an instance of fraud on a telephone system; however, the present invention has applications in other high data volume environments to identify other target events/activities.
BACKGROUND OF THE INVENTION
[0002] Fraud is a serious problem in modern telecommunication systems, and can result in revenue loss by the telecommunications service provider, reduced operational efficiency, and an increased risk of subscribers moving to other providers that are perceived to offer better security. In the highly competitive telecommunications sector, any provider that can reduce revenue loss resulting from fraud - either by its prevention or early detection - has a significant advantage over its competitors.
[0003] Telecommunications networks support many hundreds or thousands of transactions per second, and one of the challenges in developing effective fraud detection systems is to achieve the high throughput necessary to analyse all network traffic in detail and in real time. In practice, fraud detection systems frequently ignore services that are considered to be low risk (e.g. low cost calls), or limit the sophistication of the fraud detection algorithms in order to achieve the required throughput.
[0004] Each of these has critical disadvantages. Ignoring services automatically precludes the detection of fraud on those services, which is particularly hazardous because fraudsters actively search for unprotected services. Similarly, the use of fast but inaccurate algorithms increases the range of frauds that cannot be detected without increasing the number of false alerts. Telecommunications service providers are therefore often forced to accept higher false alert rates in order to maintain sensitivity at high throughput, and hence incur additional costs resulting from an enlarged fraud investigation team that is required to process the extra alerts.
SUMMARY OF THE PRESENT INVENTION
[0005] The present invention provides a system of hierarchical data analysis that seeks to provide high throughput and sensitivity with fewer false positive alerts of possible target activity.
[0006] According to a first aspect of the present invention there is provided a method for analysing data streams comprising at least the steps of: receiving a data stream; conducting a first analysis of the data stream for a possible target activity, and if a possible target activity is indicated generating a first alert; if the first alert is generated, conducting a second analysis for the possible target activity to determine whether the target activity is indicated in the data stream with a high degree of certainty, if a possible target activity is indicated by the second analysis, generating a second alert; and providing the second alert to an external system for action.
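The two-stage method of the first aspect can be sketched as below. This is an illustrative sketch only: the record format, the function names and the use of plain callables for the two analyses and the external system are assumptions, not part of the specification.

```python
from typing import Callable, Iterable

Record = dict  # hypothetical record type; the text does not fix a format


def analyse_stream(
    stream: Iterable[Record],
    first_analysis: Callable[[Record], bool],
    second_analysis: Callable[[Record], bool],
    external_system: Callable[[Record], None],
) -> None:
    """Run the first analysis on every record and the second analysis only
    on records for which a first alert was generated."""
    for record in stream:
        first_alert = first_analysis(record)
        if not first_alert:
            continue  # no first alert, so the second analysis never runs
        second_alert = second_analysis(record)
        if second_alert:
            external_system(record)  # provide the second alert for action
```

In such a sketch the second analysis can afford to be slower than the first, because it is only invoked for the subset of records that produced a first alert.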
[0007] Preferably the first analysis step comprises at least: conducting a first sub-analysis of the data stream for the possible target activity to determine whether the target activity is indicated in the data stream, if the possible target activity is indicated by the first sub-analysis then a first sub-alert is generated; and conducting a second sub-analysis of the data stream for the possible target activity to determine whether the target activity is indicated in the data stream with a higher degree of certainty than in the first sub-analysis, if the possible target activity is indicated by the second sub-analysis then the first alert is generated.
[0008] Preferably the second sub-analysis provides an indication of the target activity with a higher degree of certainty than in the first sub-analysis. Preferably the second analysis provides an indication of the target activity with a higher degree of certainty than in the second sub-analysis.
[0009] Preferably the method further comprises propagating data from the data stream relevant to the second sub-analysis for conducting the second sub-analysis.
[0010] Preferably the method further comprises the step of propagating data from the data stream relevant to the second analysis for conducting the second analysis.
[0011] Preferably the second sub-analysis is conducted on additional data to the propagated data. Preferably the second analysis is conducted using additional data to the data propagated for the second analysis.
[0012] Preferably one or more additional levels of sub-analysis are conducted between the first sub-analysis and the second sub-analysis wherein an alert is generated by one of the additional levels and passed to a next of the additional levels. Preferably a subsequent analysis is conducted while determining whether the target activity is indicated to a higher degree of certainty than the previous level. Preferably the first sub-alert triggers the first of one or more additional levels of sub-analysis and the alert generated by the final level of additional sub-analysis triggers the second sub-analysis.
[0013] Preferably data is propagated from one additional level of sub-analysis to the next and includes data necessary in the subsequent levels of additional sub-analysis.
[0014] Preferably each additional level of sub-analysis is conducted on additional data specific to the type of analysis conducted in addition to the propagated data.
[0015] Preferably each level of the sub-analysis creates a third alert if a fraudulent activity is indicated with a relatively high degree of certainty, any one of the second alerts and third alerts triggering an action in the external system.
[0016] Preferably the first analysis may conduct one or more types of analysis in parallel.
[0017] Preferably one or more of the additional levels of sub-analysis may conduct one or more types of analysis in parallel.
[0018] Preferably the target activity is fraudulent activity.
[0019] According to a second aspect of the present invention there is provided a system for analysing data streams comprising at least: a first analyser arranged to analyse a data stream for possible target activity and if a possible target activity is indicated to generate a first alert; a second analyser arranged to conduct an analysis for possible target activity if the first alert is generated, and if a possible target activity is indicated with a relatively high probability by the second analysis to generate a second alert for an external system to act on.
[0020] According to a third aspect of the present invention there is provided a system for analysing data streams comprising at least: one or more sequential analysers arranged to conduct an analysis for possible target activity, a first analyser of the sequence of analysers analysing a data stream, each subsequent analyser of the sequence of analysers only conducting its analysis if the previous analyser indicates a possible target activity, and if a possible target activity is indicated by each analysis generating a subsequent alert for the next analyser; and a final analyser arranged to conduct an analysis for possible target activity if the last analyser of the sequence of analysers generates an alert, and if a possible target activity is indicated with a relatively high probability by the analysis of the final analyser, the final analyser generates an alert for an external system to act on.
[0021] According to another aspect of the present invention there is provided a method of analysing data streams comprising at least: conducting one or more sequential analyses of a data stream for possible target activity, the first of the analyses being conducted directly on the data stream, subsequent analyses after the first only being conducted if the previous analysis indicated a possible target activity; conducting a final analysis for possible target activity if the last of the sequential analyses indicated a possible target activity; and if the final analysis indicates a possible target activity with a relatively high degree of certainty generating an alert to an external system for action.
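A chain of sequential analyses followed by a final analysis might be sketched as follows. The function name `run_sequence` and the `activations` counter are hypothetical; the counter is included only to show how rarely the later levels run compared with the first.

```python
from collections import Counter
from typing import Callable, Sequence

Analysis = Callable[[dict], bool]  # returns True when target activity is indicated


def run_sequence(
    record: dict,
    sequential: Sequence[Analysis],
    final: Analysis,
    activations: Counter,
) -> bool:
    """Return True if the final analysis indicates target activity.

    Each analysis after the first runs only if the previous one indicated
    possible target activity; the final analysis runs only if all did.
    """
    for level, analysis in enumerate(sequential):
        activations[level] += 1
        if not analysis(record):
            return False  # chain stops; later analyses are never activated
    activations["final"] += 1
    return final(record)
```

Counting activations per level over a batch of records gives a direct view of how much work each level actually performs.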
BRIEF DESCRIPTION OF THE DRAWINGS
[0022] In order to facilitate a better understanding of the nature of the invention, preferred embodiments will now be described in greater detail, by way of example only, with reference to the accompanying drawings in which:
Figure 1 is a schematic representation of a preferred embodiment of a system for analysing data streams in accordance with the present invention; and
Figure 2 is a schematic representation exemplifying data analysis using the system of Figure 1.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
[0023] Referring to Figure 1 there is shown a system 10 that receives a data stream 12 (that may include one or more sub-streams) and outputs a data stream of alerts 34 for use by an external system. The system 10 includes a plurality of data analysis modules, in this case three are shown 14, 16 and 18. Each of the analysis modules 14, 16 and 18 receives respective additional data 20, 22 and 24 used in the analysis of the data stream 12 provided to the first data module 14. Each data module 14, 16 and 18 propagates data to the next data module indicated by propagated data 26 and 30. Each data module provides internal alerts 28 and 32 to the subsequent data module.
[0024] In the present example the system 10 is configured to identify suspicious telephone activity that may indicate fraud. Due to the high volume of telephone call data required to be processed, each data analysis module can provide a different analysis technique to progressively increase the certainty that the data indicates the presence of fraudulent telephone activity.
[0025] The system 10 may be implemented in the form of a computer or a network of computers programmed to perform the analysis of each of the modules. For example, a single computer can be programmed to run the system, or a dedicated computer may be programmed to conduct the analysis of each module, with communication being provided between the computers of the whole system 10.
[0026] Each of the data analysis modules 14, 16 and 18 cascades data initially provided by data stream 12 to the subsequent module. The data stream 12 could, for example, include call data records (CDRs, which contain details of the calls made on a telecommunication network). For example, a portion of a CDR produced from a real call is given in Table 1. The fields contained in the CDR are (from top to bottom) A-number (the number of the phone from which the call was made), B-number (the number to which the call was made), B-number type (whether it was local, national, international etc., encoded as a number), the call's cost, its duration and the date and time at which it started. Note that the four rightmost digits of the A- and B-numbers have been masked to conceal the identities of the calling and called parties. The data stream 12 can also include several substreams from different sources. For example, one substream could be a CDR stream, while another could provide customer information such as postcodes and payment histories.
TABLE 1
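[Table 1 appears as an image in the original publication and is not reproduced here. Its rows, from top to bottom, are: A-number, B-number, B-number type, call cost, call duration, and start date and time.]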
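The CDR fields described for Table 1 can be pictured as a simple record type. The following is a minimal sketch only: the class name, the field names, the unit of duration and the helper `mask_number` are illustrative assumptions and do not appear in the original specification.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDataRecord:
    """One CDR holding the six fields listed for Table 1 (names are hypothetical)."""
    a_number: str        # number of the phone from which the call was made
    b_number: str        # number to which the call was made
    b_number_type: int   # local / national / international etc., encoded as a number
    cost: float          # cost of the call (currency unit is not stated in the text)
    duration_s: int      # duration of the call; seconds is an assumption
    start: datetime      # date and time at which the call started


def mask_number(number: str, masked_digits: int = 4) -> str:
    """Replace the rightmost digits with 'X' to conceal the party's identity."""
    keep = max(len(number) - masked_digits, 0)
    return number[:keep] + "X" * (len(number) - keep)
```

A record built this way could carry masked numbers, for example `CallDataRecord(mask_number("0123456789"), mask_number("0987654321"), 2, 1.5, 90, datetime(2003, 4, 16, 9, 30))`.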
[0027] Each of the data analysis modules 14, 16 and 18 contains one or more fraud detection engines that analyse their input data for signs of fraudulent activity, in response to which they generate alerts. Each fraud detection engine can process a different subset of the module's input data. Each data analysis module after the first receives propagated data that is passed from the analysis module immediately preceding it in the hierarchy. The additional data available to each data analysis module may be specific to the type of analysis conducted by that particular data analysis module. The propagated data may contain low level data from the original data stream 12 or additional data used by data analysis modules lower in the hierarchy, depending on the configuration of the system 10.
[0028] The distinction between propagated data and additional data is important for the efficiency of the system because the analyses performed within particular analysis modules may require access to potentially large quantities of data that are not required elsewhere within the system. Propagating data that is not required in other analysis modules is a waste of resources and is likely to reduce the rate at which the system can process incoming data. Propagated data consists of information that is used in more than one data analysis module. For example, the A-number field is used to identify the calling party, is provided within the CDR stream that usually forms part of the system's input 12, and is usually required throughout the system, and is hence usually propagated through the system rather than forming part of the additional data inputs.
[0029] Each of the data analysis modules 14, 16 and 18 can generate internal and external alerts. External alerts 34 are combined from all of the modules 14, 16 and 18 to form the output 34 of the system. Combining the outputs may be the equivalent of providing a logical OR to each of the alerts, so that if any of the modules generates an external alert, the system as a whole generates the alert. External alerts are only produced by the modules when the calculated probability of a target activity (fraud) is sufficiently high to reasonably conclude that fraud has occurred. What is considered a high probability depends on the particular application, its expected throughput, and the desired degree of certainty. When individual calls are analysed for fraud within telecommunication networks, a probability as large as 0.99995 to 0.99999 may be required to keep the number of alerts to a manageable level (since large networks can experience as many as 100 million calls per day).
[0030] Each of the data analysis modules 14, 16 and 18 can generate internal alerts if its analysis reveals something unusual, but does not provide sufficiently high probability that target activity is indicated to warrant an external alert. Internal alerts are important for regulating the activity of subsequent data analysis modules within the hierarchy of the system, because subsequent data analysis modules may only be activated if an internal alert is received, indicating that further analysis of the data is required to obtain the sufficient degree of certainty to generate an external alert. Subsequent data analysis modules 16 and 18 may only be activated if they receive an internal alert 28 or 32 from a preceding analysis module or if any of their input data is updated. Preferably the additional data is only provided in response to a request made by a lower module and the input additional data is not configured to activate an analysis module.
[0031] For example, an analysis module 14, 16 or 18 may identify a short term increase in the total cost of calls made by a particular subscriber, which may not be severe enough to conclude that fraud has occurred and hence to generate an external alert. A subsystem may therefore generate an internal alert that causes the next module in the system to perform its analysis. This cascaded activation of analysis modules within the system means that lower level subsystems are activated most frequently and that the throughput of the system can be maximised by designing the lower level subsystems to require a minimum amount of processing. Higher level analysis, which is activated less frequently, can thus use more expensive processes (such as nonlinear or iterative functions) and can perform expensive operations (such as database reads and writes) or make use of human intervention, with minimal effect on the throughput of the entire system. For example, in an analysis module that becomes active only once a lower level system has generated an alert, a neural network could be trained to estimate the probability that a particular telephone call was fraudulent based on its characteristics (cost, duration, etc.), or Fourier analysis could be used to see if a short term fluctuation in the calling activity was part of a cycle of a subscriber's normal behaviour.
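The cascaded activation described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation of the system 10: the function `run_cascade`, the `(internal, external)` tuple convention and the two example modules with their thresholds are all hypothetical.

```python
from typing import Callable, Iterable, Tuple

# Each module returns (internal_alert, external_alert) for the record it analyses.
Verdict = Tuple[bool, bool]
Module = Callable[[dict], Verdict]


def run_cascade(record: dict, modules: Iterable[Module]) -> Tuple[bool, int]:
    """Run modules in order; a module only runs if the previous one raised
    an internal alert. Returns (external_alert, modules_activated)."""
    external = False
    activated = 0
    for module in modules:
        activated += 1
        internal, ext = module(record)
        external = external or ext   # external alerts are combined with a logical OR
        if not internal:
            break                    # nothing unusual: later modules stay idle
    return external, activated


def cheap_check(record: dict) -> Verdict:
    """Hypothetical fast first-level test: flag unusually costly calls."""
    return record["cost"] > 50, False


def costly_check(record: dict) -> Verdict:
    """Hypothetical slower second-level test: confirm only very long calls."""
    return False, record["duration_s"] > 3600
```

With these example modules, `run_cascade({"cost": 10, "duration_s": 9999}, [cheap_check, costly_check])` activates only the first module, so the expensive second check is never paid for on ordinary traffic.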
[0032] Dividing the system into a series of stages of different (and in particular, increasing) complexity also simplifies the problem of targeting different resources at different subsystems. For example, the lower level subsystems may need some level of parallelism in order to achieve the required throughput and thus can be distributed across several computers. Later stages may require so few resources that several can be run simultaneously on a single computer, while others may require user interaction or database access, placing specific requirements on their geographic location. By building a fraud detection system from a hierarchy of subsystems of increasing sophistication it is possible to produce a superior trade off between fraud detection accuracy and throughput.
[0033] Each of the data analysis modules should be designed to generate many more internal false positives (that is, internal alerts for events that are not actually fraudulent) than internal false negatives (where an internal alert was not generated when fraud did in fact occur). This is because the higher level subsystems that are activated by the internal alerts may be able to provide a higher degree of certainty to confirm or refute the internal alert based on different analysis techniques and/or the inclusion of additional data in the analysis to clarify whether, with the required degree of certainty, the data indicates that a fraud is actually present. If the system is not designed in this way, then when false negatives occur the higher level subsystems are never activated and thus are not able to correct an error made by the lower level subsystem.
[0034] Conversely, the analysis modules 14, 16 and 18 are designed to generate a small number of external false positives (external alerts generated for events that are not actually fraudulent) and a large number of external false negatives (resulting in no external alert being generated when in fact a fraud did occur). This is because, provided that an internal alert was generated, the external false negative can be corrected by higher level analysis modules generating their own external alerts. In a situation where a false positive external alert is generated, the system as a whole will generate an alert that cannot be prevented by analysis conducted by subsequent level modules, even if subsequent modules were activated.
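One way to picture the asymmetry between internal and external alerts is as two thresholds applied to a module's estimated fraud probability: a permissive one for internal alerts and a strict one for external alerts. The sketch below is an assumption made for illustration. The external threshold of 0.99995 is taken from the range quoted earlier for individual calls, while the internal threshold of 0.01, the names, and the choice to report only the stronger of the two alerts are hypothetical.

```python
from enum import Enum


class Alert(Enum):
    NONE = "none"
    INTERNAL = "internal"
    EXTERNAL = "external"


def classify(probability: float,
             internal_threshold: float = 0.01,      # hypothetical, deliberately permissive
             external_threshold: float = 0.99995    # strict, to keep external false positives rare
             ) -> Alert:
    """Map an estimated fraud probability to the alert a module would raise."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability >= external_threshold:
        return Alert.EXTERNAL
    if probability >= internal_threshold:
        return Alert.INTERNAL   # suspicious but unproven: defer to the next level
    return Alert.NONE
```

A low internal threshold tolerates many internal false positives, which later modules can refute, while the high external threshold accepts external false negatives that later modules can still correct.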
[0035] Figure 2 shows an example of a real telecommunications fraud detection system based on the system 10. The input data stream 12 includes a CDR stream that provides details of each call made on the telecommunications network shortly after the call is terminated. The CDR stream is passed to the lowest level data analysis module 14, which is configured as a candidate fraud detector (CFD). The CFD contains two separate fraud detection components: a set of rules 36 that searches directly for common fraud indicators (such as more than 8 hours of calls to the Caribbean in any 24 hour period), and a change detection algorithm 38 that searches for unusual changes in the pattern of behaviour associated with individual subscribers (which can indicate that a line has been taken over by fraudsters). These two components 36 and 38 of the lowest level data analysis module 14 operate independently. An internal alert 28 is generated when either of the components 36 and 38 indicates that a particular telephone call is a fraud candidate. The rules 36 and change detector 38 are designed to be fast and simple because the CDR stream 12 can present the data analysis module with as many as 100 million CDRs per day. The internal alerts 28 are passed to the next level data analysis module, which operates as an intelligent alarm analyser (IAA) and is only activated when an internal alert is generated by the CFD.
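The example rule of more than 8 hours of calls in any 24 hour period can be sketched as a sliding window over one subscriber's calls to the watched destination. This is a hypothetical illustration rather than the rule set 36 itself: the function name and parameters are assumptions, and for simplicity each call's whole duration is attributed to its start time.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def exceeds_window_limit(calls: List[Tuple[datetime, timedelta]],
                         limit: timedelta = timedelta(hours=8),
                         window: timedelta = timedelta(hours=24)) -> bool:
    """Return True if the total duration of calls starting within any
    `window`-long period exceeds `limit`.

    `calls` holds (start, duration) pairs for one subscriber and one destination.
    """
    calls = sorted(calls)
    left = 0
    total = timedelta(0)
    for start, duration in calls:
        total += duration
        # Drop calls that started a full window or more before this one.
        while start - calls[left][0] >= window:
            total -= calls[left][1]
            left += 1
        if total > limit:
            return True
    return False
```

Because the window only ever advances, each call is added and removed at most once after sorting, which suits a first-level check that must stay cheap.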
[0036] With a typical fraud detection configuration, the ratio of the number of CDRs to the number of internal alerts 28 is about 1000:1 meaning that statistically the IAA is activated only once for every 1000 times the CFD is activated. The IAA is a rule based system that removes some of the false alerts generated by the CFD by performing complex analysis on the distributions of the alerts themselves. These complex analyses are possible due to the low level of activity demanded of the IAA compared to the CFD. The analyses also require time information (real world, date and time) which is provided to the IAA as additional data 22. When the IAA considers the distribution of alerts to be sufficiently suspicious, it generates an internal alert 32 which is passed to the next level data analysis module 18. The ratio of the numbers of alerts generated by the CFD compared to those generated by the IAA is usually around 500:1, meaning that statistically the third level of data analysis is activated once every 500 times the IAA is activated.
[0037] The third level data analysis module operates as a case manager. The case manager may be a team employed by the telecommunications operator for the purpose of investigating the events that caused internal alerts to be generated by the IAA. Because the case manager is a higher level subsystem, it is activated only once every 500,000 or so CDRs and hence can use much slower and more expensive processing methods than either the CFD or IAA, such as manual investigations of potential frauds, without being overwhelmed.
[0038] The case manager uses customer information (names, addresses, payment histories, etc.) as further additional data 24, and frequently a wide variety of additional data sources (for example, a six month history of calls made by a particular customer), to investigate internal alerts 32 generated by the IAA to determine whether they are likely to be cases of actual fraud. If it is determined that they are, the case manager subsystem generates an external alert 34 which is passed out of the system. The alert could be used for a variety of purposes, such as to inform billing services within the network operator to remove fraudulent calls from a customer's bill, or to inform law enforcement agencies.
[0039] In this example, neither the CFD nor the IAA generates external alerts because of the technical difficulties in guaranteeing the extremely low false alert rates that are required for the purposes for which the external alerts are intended. However it will be appreciated that in other configurations, these modules may be suited to generating external alerts. It is also noted that in this example, null additional data 20 is provided to the CFD. Furthermore, it is also noted that no data is propagated from the CFD to the IAA or from the IAA to the case manager. It is further noted that in an alternative configuration, additional data may be provided to the CFD or data may be propagated from the CFD to the IAA and possibly then from the IAA to the case manager.
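The activation figures quoted for the three levels follow from multiplying the two ratios. The short calculation below is a sketch that uses only numbers stated in this description (1000:1, 500:1, and up to 100 million CDRs per day); the function and key names are hypothetical.

```python
def daily_activations(cdrs_per_day: int = 100_000_000,
                      cdrs_per_cfd_alert: int = 1000,
                      cfd_alerts_per_iaa_alert: int = 500) -> dict:
    """Estimate how often each level runs per day from the quoted ratios."""
    iaa_runs = cdrs_per_day // cdrs_per_cfd_alert                # one IAA run per CFD alert
    case_manager_runs = iaa_runs // cfd_alerts_per_iaa_alert     # one case per IAA alert
    return {
        "cfd": cdrs_per_day,
        "iaa": iaa_runs,
        "case_manager": case_manager_runs,
        "cdrs_per_case": cdrs_per_cfd_alert * cfd_alerts_per_iaa_alert,
    }
```

At the upper bound of 100 million CDRs per day this gives about 100,000 IAA activations and about 200 case manager activations per day, that is, one case for every 500,000 CDRs.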
[0040] It will be appreciated by the person skilled in the art that the hierarchical system and method of the present invention may be applied to data streams that originate from a variety of sources to identify target events. The above example of fraud detection on a telecommunications network is not intended to be limiting.
[0041] It will be appreciated that modifications may be made to the preferred forms of the present invention without departing from the basic inventive concept. Such modifications are intended to fall within the scope of the present invention, the nature of which is to be determined from the foregoing description and appended claims.

Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:
1. A method for analysing data streams comprising at least the steps of: receiving a data stream; conducting a first analysis of the data stream for a possible target activity, and if a possible target activity is indicated generating a first alert; if the first alert is generated, conducting a second analysis for the possible target activity to determine whether the target activity is indicated in the data stream with a high degree of certainty, if a possible target activity is indicated by the second analysis, generating a second alert; and providing the second alert to an external system for action.
2. A method according to claim 1, wherein the first analysis step comprises at least: conducting a first sub-analysis of the data stream for the possible target activity to determine whether the target activity is indicated in the data stream, if the possible target activity is indicated by the first sub-analysis then a first sub-alert is generated; and conducting a second sub-analysis of the data stream for the possible target activity to determine whether the target activity is indicated in the data stream with a higher degree of certainty than in the first sub-analysis, if the possible target activity is indicated by the second sub-analysis then the first alert is generated.
3. A method according to claim 2, wherein the second sub-analysis provides an indication of the target activity with a higher degree of certainty than in the first sub-analysis.
4. A method according to claim 3, wherein the second analysis provides an indication of the target activity with a higher degree of certainty than in the second sub-analysis.
5. A method according to claim 2, wherein the method further comprises propagating data from the data stream relevant to the second sub-analysis for conducting the second sub-analysis.
6. A method according to claim 2, wherein the method further comprises the step of propagating data from the data stream relevant to the second analysis for conducting the second analysis.
7. A method according to claim 2, wherein the second sub-analysis is conducted on additional data to the propagated data.
8. A method according to claim 7, wherein the second analysis is conducted using additional data to the data propagated for the second analysis.
9. A method according to claim 2, wherein one or more additional levels of sub-analysis are conducted between the first sub-analysis and the second sub-analysis wherein an alert is generated by one of the additional levels and passed to a next of the additional levels.
10. A method according to claim 9, wherein a subsequent analysis is conducted while determining whether the target activity is indicated to a higher degree of certainty than the previous level.
11. A method according to claim 10, wherein the first sub-alert triggers the first of one or more additional levels of sub-analysis and the alert generated by the final level of additional sub-analysis triggers the second sub-analysis.
12. A method according to claim 11, wherein data is propagated from one additional level of sub-analysis to the next and includes data necessary in the subsequent levels of additional sub-analysis.
13. A method according to claim 12, wherein each additional level of sub-analysis is conducted on additional data specific to the type of analysis conducted in addition to the propagated data.
14. A method according to claim 13, wherein each level of the sub-analysis creates a third alert if a fraudulent activity is indicated with a relatively high degree of certainty, any one of the second alerts and third alerts triggering an action in the external system.
15. A method according to claim 1, wherein the first analysis may conduct one or more types of analysis in parallel.
16. A method according to claim 2, wherein one or more of the additional levels of sub-analysis may conduct one or more types of analysis in parallel.
17. A system for analysing data streams comprising at least: a first analyser arranged to analyse a data stream for possible target activity and if a possible target activity is indicated to generate a first alert; a second analyser arranged to conduct an analysis for possible target activity if the first alert is generated, and if a possible target activity is indicated with a relatively high probability by the second analysis to generate a second alert for an external system to act on.
18. A system for analysing data streams comprising at least: one or more sequential analysers arranged to conduct an analysis for possible target activity, a first analyser of the sequence of analysers analysing a data stream, each subsequent analyser of the sequence of analysers only conducting its analysis if the previous analyser indicates a possible target activity, and if a possible target activity is indicated by each analysis generating a subsequent alert for the next analyser; and a final analyser arranged to conduct an analysis for possible target activity if the last analyser of the sequence of analysers generates an alert, and if a possible target activity is indicated with a relatively high probability by the analysis of the final analyser, the final analyser generates an alert for an external system to act on.
19. A system for analysing data streams comprising at least: conducting one or more sequential analyses of a data stream for possible target activity, the first of the analyses being conducted directly on the data stream, subsequent analyses after the first, only being conducted if the previous analysis indicated a possible target activity; conducting a final analysis for possible target activity if the last of the sequential analyses indicated a possible target activity; and if the final analysis indicates a possible target activity with a relatively high degree of certainty generating an alert to an external system for action.
PCT/AU2003/000460 2002-04-16 2003-04-16 A hierarchical system for analysing data streams WO2003090081A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP03714539A EP1499969A1 (en) 2002-04-16 2003-04-16 A hierarchical system for analysing data streams
AU2003218899A AU2003218899A1 (en) 2002-04-16 2003-04-16 A hierarchical system for analysing data streams
US10/965,703 US20050190905A1 (en) 2002-04-16 2004-10-14 Hierarchical system and method for analyzing data streams
US12/340,504 US20090164761A1 (en) 2002-04-16 2008-12-19 Hierarchical system and method for analyzing data streams

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0208711.2 2002-04-16
GB0208711A GB0208711D0 (en) 2002-04-16 2002-04-16 A hierarchical system for analysing data streams

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US10/965,703 Continuation US20050190905A1 (en) 2002-04-16 2004-10-14 Hierarchical system and method for analyzing data streams

Publications (1)

Publication Number Publication Date
WO2003090081A1 (en) 2003-10-30

Family

ID=9934941

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2003/000460 WO2003090081A1 (en) 2002-04-16 2003-04-16 A hierarchical system for analysing data streams

Country Status (4)

Country Link
EP (1) EP1499969A1 (en)
AU (1) AU2003218899A1 (en)
GB (1) GB0208711D0 (en)
WO (1) WO2003090081A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5680542A (en) * 1995-06-07 1997-10-21 Motorola, Inc. Method and apparatus for synchronizing data in a host memory with data in target MCU memory
GB2328043A (en) * 1997-07-26 1999-02-10 Ibm Managing a distributed data processing system
EP0985995A1 (en) * 1998-09-09 2000-03-15 International Business Machines Corporation Method and apparatus for intrusion detection in computers and computer networks
EP0833489B1 (en) * 1996-09-26 2002-05-15 Eyretel Limited Signal monitoring apparatus

Also Published As

Publication number Publication date
GB0208711D0 (en) 2002-05-29
EP1499969A1 (en) 2005-01-26
AU2003218899A1 (en) 2003-11-03

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 10965703

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2003218899

Country of ref document: AU

WWE Wipo information: entry into national phase

Ref document number: 2003714539

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 3582/DELNP/2004

Country of ref document: IN

WWE Wipo information: entry into national phase

Ref document number: 3696/DELNP/2004

Country of ref document: IN

WWP Wipo information: published in national office

Ref document number: 2003714539

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Ref document number: JP