CN104715027A - Distributed data transaction judging and positioning method and system - Google Patents

Distributed data transaction judging and positioning method and system

Info

Publication number
CN104715027A
Authority
CN
China
Prior art keywords
dimension
distributed data
unusual fluctuation
level
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510096586.0A
Other languages
Chinese (zh)
Other versions
CN104715027B (en)
Inventor
李亮
刘朋飞
牟川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201510096586.0A priority Critical patent/CN104715027B/en
Publication of CN104715027A publication Critical patent/CN104715027A/en
Priority to HK15109484.7A priority patent/HK1208927A1/en
Priority to PCT/CN2016/072348 priority patent/WO2016138805A1/en
Application granted granted Critical
Publication of CN104715027B publication Critical patent/CN104715027B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2264Multidimensional index structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)
  • Position Fixing By Use Of Radio Waves (AREA)

Abstract

The invention discloses a method and system for determining and locating anomalies (abnormal fluctuations) in distribution data. In the method, multiple dimensions are cross-combined to obtain multiple dimension combinations; multiple sets of current first-level dimension distribution data relating to the first-level dimensions and multiple sets of current dimension-combination distribution data relating to the dimension combinations are generated; and multiple sets of historical first-level dimension baseline distribution data relating to the first-level dimensions and multiple sets of historical dimension-combination baseline distribution data relating to the dimension combinations are generated. The structural anomaly of each set of current first-level dimension distribution data and of each set of current dimension-combination distribution data is obtained, and an alarm is raised if there exist anomalous first-level dimension distribution data or anomalous dimension-combination distribution data whose structural anomaly exceeds the anomaly threshold. The multi-dimensional distribution data are tested on the first-level dimensions and on the dimension combinations, which overcomes the shortcomings of existing anomaly determination and locating methods and makes anomaly determination faster and more accurate.

Description

A method and system for determining and locating anomalies in distribution data
Technical field
The present invention relates to the technical field of anomalies (abnormal fluctuations) in distribution data, and in particular to a method and system for determining and locating anomalies in distribution data.
Background
In the internet industry, and especially in e-commerce websites, massive amounts of data are generated continuously. These data usually contain various indicators, and each indicator can be viewed from different dimensions. Examples of indicators are order volume and order amount; examples of dimensions are province, order type, and payment method. When an indicator fluctuates, the corresponding data in each dimension fluctuate with it. For example, when the online payment system fails, indicators such as order volume and order amount are affected across the board; correspondingly, the order volume and order amount for each payment method fluctuate, and the data in other dimensions, such as province and order type, are affected as well. In that case, how can one find out from the data that the anomaly was caused by a fault in the payment system?
Superimposed factors such as a changing market environment, business optimization and upgrades, and one promotion after another also cause these data to fluctuate. When data fluctuate, under what conditions can the fluctuation be judged an anomaly (abnormal fluctuation)? And how can one accurately and quickly locate, among numerous data, which dimensions the anomalous indicator mainly stems from? These are the core problems of data anomaly mining.
For locating data anomalies, the prior art generally compares fluctuation amplitudes against a threshold. Specifically, a weighted average of recent data (for example, the most recent week or the most recent month) in a particular dimension gives the historical baseline values. The latest data are compared with the historical baseline values and the fluctuation amplitude of each data item is examined. If a fluctuation amplitude exceeds a certain threshold (generally set manually from experience), the data are judged to be anomalous, and the item with the largest fluctuation amplitude is selected as the main cause of the data anomaly.
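The threshold-based prior-art approach described above can be sketched as follows. This is only an illustration: the function names, the relative-change definition of "fluctuation amplitude", the weights, and the sample numbers are all assumptions, since the text does not specify them.

```python
def weighted_baseline(history, weights):
    """Weighted average of recent observations (same order as weights)."""
    if len(history) != len(weights):
        raise ValueError("history and weights must have the same length")
    return sum(h * w for h, w in zip(history, weights)) / sum(weights)


def threshold_check(latest_by_item, history_by_item, weights, threshold):
    """Flag items whose relative change from the baseline exceeds threshold.

    Returns (flagged amplitudes, item with the largest amplitude or None).
    """
    amplitudes = {}
    for item, latest in latest_by_item.items():
        base = weighted_baseline(history_by_item[item], weights)
        # Assumed definition: amplitude = |latest - baseline| / baseline.
        amplitudes[item] = abs(latest - base) / base
    flagged = {k: v for k, v in amplitudes.items() if v > threshold}
    main_cause = max(flagged, key=flagged.get) if flagged else None
    return flagged, main_cause


# Hypothetical order volumes per payment method (made-up numbers).
history = {"online": [100, 100, 100], "cash_on_delivery": [50, 50, 50]}
latest = {"online": 60, "cash_on_delivery": 55}
flagged, main_cause = threshold_check(latest, history, [1, 2, 3], threshold=0.2)
print(flagged, main_cause)
```

The manually chosen `threshold` argument is exactly the subjective element that the text goes on to criticize.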
The main shortcomings of existing data anomaly locating techniques are as follows. Overall, existing manual anomaly monitoring and locating is highly subjective; going from a suspected anomaly, through layer-by-layer decomposition, to the specific anomalous dimension involves many steps, a long workflow, and a cumbersome, inefficient process. Specifically, first, the threshold is set manually and subjectively, which is neither scientific nor objective enough; second, in some scenarios (such as the habitual drop in data during holidays), threshold-based comparison easily leads to misjudgment; finally, when several groups of data exceed their respective thresholds at the same time, it is usually difficult to locate the main cause of the data anomaly.
Summary of the invention
In view of this, and to address the technical problem that the prior art has difficulty accurately determining data anomalies, it is necessary to provide a method and system for determining and locating anomalies in distribution data.
A method for determining and locating anomalies in distribution data comprises:
A distribution data preparation step, comprising: obtaining multi-dimensional distribution data and multi-dimensional baseline distribution data, the multi-dimensional baseline distribution data being the historical baseline values corresponding to each data item of the multi-dimensional distribution data; cross-combining multiple dimensions to obtain multiple dimension combinations; generating, from the multi-dimensional distribution data, multiple sets of current first-level dimension distribution data relating to the first-level dimensions and multiple sets of current dimension-combination distribution data relating to the dimension combinations; and generating, from the multi-dimensional baseline distribution data, multiple sets of historical first-level dimension baseline distribution data relating to the first-level dimensions and multiple sets of historical dimension-combination baseline distribution data relating to the dimension combinations;
An anomaly determination step, comprising: comparing the current first-level dimension distribution data with the corresponding historical first-level dimension baseline distribution data to obtain the structural anomaly of each set of current first-level dimension distribution data relative to its corresponding historical first-level dimension baseline distribution data, current first-level dimension distribution data whose structural anomaly exceeds an anomaly threshold being anomalous first-level dimension distribution data; comparing the current dimension-combination distribution data with the historical dimension-combination baseline distribution data to obtain the structural anomaly of each set of current dimension-combination distribution data relative to its corresponding historical dimension-combination baseline distribution data, current dimension-combination distribution data whose structural anomaly exceeds the anomaly threshold being anomalous dimension-combination distribution data; and raising an alarm if any anomalous first-level dimension distribution data or anomalous dimension-combination distribution data exist.
A system for determining and locating anomalies in distribution data comprises:
A distribution data preparation module, configured to: obtain multi-dimensional distribution data and multi-dimensional baseline distribution data, the multi-dimensional baseline distribution data being the historical baseline values corresponding to each data item of the multi-dimensional distribution data; cross-combine multiple dimensions to obtain multiple dimension combinations; generate, from the multi-dimensional distribution data, multiple sets of current first-level dimension distribution data relating to the first-level dimensions and multiple sets of current dimension-combination distribution data relating to the dimension combinations; and generate, from the multi-dimensional baseline distribution data, multiple sets of historical first-level dimension baseline distribution data relating to the first-level dimensions and multiple sets of historical dimension-combination baseline distribution data relating to the dimension combinations;
An anomaly determination module, configured to: compare the current first-level dimension distribution data with the corresponding historical first-level dimension baseline distribution data to obtain the structural anomaly of each set of current first-level dimension distribution data relative to its corresponding historical first-level dimension baseline distribution data, current first-level dimension distribution data whose structural anomaly exceeds an anomaly threshold being anomalous first-level dimension distribution data; compare the current dimension-combination distribution data with the historical dimension-combination baseline distribution data to obtain the structural anomaly of each set of current dimension-combination distribution data relative to its corresponding historical dimension-combination baseline distribution data, current dimension-combination distribution data whose structural anomaly exceeds the anomaly threshold being anomalous dimension-combination distribution data; and raise an alarm if any anomalous first-level dimension distribution data or anomalous dimension-combination distribution data exist.
The present invention tests the multi-dimensional distribution data separately on the first-level dimensions and on the dimension combinations, which overcomes the shortcomings of existing anomaly determination and locating methods and makes anomaly determination faster and more accurate.
Brief description of the drawings
Fig. 1 is a workflow diagram of a method for determining and locating anomalies in distribution data according to the present invention;
Fig. 2 is a structural module diagram of a system for determining and locating anomalies in distribution data according to the present invention;
Fig. 3 is a module diagram of a preferred embodiment.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 shows the workflow of a method for determining and locating anomalies in distribution data according to the present invention, which comprises:
Step S101, comprising: obtaining multi-dimensional distribution data and multi-dimensional baseline distribution data, the multi-dimensional baseline distribution data being the historical baseline values corresponding to each data item of the multi-dimensional distribution data; cross-combining multiple dimensions to obtain multiple dimension combinations; generating, from the multi-dimensional distribution data, multiple sets of current first-level dimension distribution data relating to the first-level dimensions and multiple sets of current dimension-combination distribution data relating to the dimension combinations; and generating, from the multi-dimensional baseline distribution data, multiple sets of historical first-level dimension baseline distribution data relating to the first-level dimensions and multiple sets of historical dimension-combination baseline distribution data relating to the dimension combinations;
Step S102, comprising: comparing the current first-level dimension distribution data with the corresponding historical first-level dimension baseline distribution data to obtain the structural anomaly of each set of current first-level dimension distribution data relative to its corresponding historical first-level dimension baseline distribution data, current first-level dimension distribution data whose structural anomaly exceeds an anomaly threshold being anomalous first-level dimension distribution data; comparing the current dimension-combination distribution data with the historical dimension-combination baseline distribution data to obtain the structural anomaly of each set of current dimension-combination distribution data relative to its corresponding historical dimension-combination baseline distribution data, current dimension-combination distribution data whose structural anomaly exceeds the anomaly threshold being anomalous dimension-combination distribution data; and raising an alarm if any anomalous first-level dimension distribution data or anomalous dimension-combination distribution data exist.
In step S101, the multi-dimensional distribution data are decomposed into current first-level dimension distribution data relating to the first-level dimensions and multiple sets of current dimension-combination distribution data relating to the dimension combinations. Take as an example the distribution data of an online payment system with order volume as the indicator: dimensions such as province, order type, and payment method are first-level dimensions, while "province_order type", "province_payment method", "order type_payment method", and "province_order type_payment method" are dimension combinations. Each dimension contains multiple data items. For example, the province dimension may contain the data of province A, the data of province B, and the data of province C; the order type dimension may contain the data of order type D, the data of order type E, and the data of order type F; and the payment method dimension may contain the data of payment method G, the data of payment method H, and the data of payment method I. "Province_order type" then contains: the data of province A with order type D, province A with order type E, province A with order type F, province B with order type D, province B with order type E, province B with order type F, province C with order type D, province C with order type E, and province C with order type F. "Province_payment method", "order type_payment method", and "province_order type_payment method" follow by analogy.
Similarly, the historical first-level dimension baseline distribution data and the historical dimension-combination baseline distribution data are obtained from the multi-dimensional baseline distribution data. The multi-dimensional baseline distribution data hold the baseline value of each data item in each dimension of the corresponding multi-dimensional distribution data, for example the baseline value for province A, the baseline value for province B, and so on. The earlier non-anomalous data corresponding to each data item in the multi-dimensional distribution data are processed by weighted averaging to generate historical baseline values, which are stored as a multi-dimensional data table to give the multi-dimensional baseline distribution data. The multi-dimensional distribution data may be stored at a time granularity such as hourly, daily, weekly, monthly, or yearly; the earlier non-anomalous data are then the anomaly-free data among the earlier data stored at the same time granularity as the multi-dimensional distribution data. For example, if the data of province A are stored at daily granularity, the anomaly-free data among the data of province A for the preceding N days are taken, and their weighted average gives the historical baseline value of province A.
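The decomposition in step S101 can be sketched as below. This is a minimal illustration, assuming that the raw data are a list of records keyed by dimension name with an order-volume field; the record layout, the field name `orders`, and both helper functions are hypothetical and not taken from the source.

```python
from collections import defaultdict
from itertools import combinations


def dimension_combinations(dimensions):
    """All cross-combinations of two or more dimensions, in input order."""
    combos = []
    for size in range(2, len(dimensions) + 1):
        combos.extend(combinations(dimensions, size))
    return combos


def distribution(records, dims, metric="orders"):
    """Sum the metric over records grouped by their values on dims."""
    totals = defaultdict(int)
    for rec in records:
        key = tuple(rec[d] for d in dims)
        totals[key] += rec[metric]
    return dict(totals)


dims = ["province", "order_type", "payment_method"]
# Made-up records for illustration only.
records = [
    {"province": "A", "order_type": "D", "payment_method": "G", "orders": 10},
    {"province": "A", "order_type": "E", "payment_method": "H", "orders": 5},
    {"province": "B", "order_type": "D", "payment_method": "G", "orders": 7},
]

# One distribution per first-level dimension and one per dimension combination.
first_level = {d: distribution(records, (d,)) for d in dims}
combined = {c: distribution(records, c) for c in dimension_combinations(dims)}
print(dimension_combinations(dims))
print(first_level["province"])
```

With three first-level dimensions this yields the four combinations named in the text. The same two functions could be applied to the baseline table to obtain the historical baseline distributions.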
In step S102, the structural anomaly of the current first-level dimension distribution data relative to the historical first-level dimension baseline distribution data, and of the current dimension-combination distribution data relative to the historical dimension-combination baseline distribution data, is calculated respectively. A structural diagnosis based on hypothesis testing is performed on the two groups of data to find whether their structures are consistent; if they are inconsistent, an anomaly is considered to exist. That is, the structural anomaly is used to judge whether the structure of the current first-level dimension distribution data is consistent with that of the historical first-level dimension baseline distribution data, and whether the structure of the current dimension-combination distribution data is consistent with that of the historical dimension-combination baseline distribution data. The idea behind hypothesis testing is proof by contradiction based on small probabilities, where the small-probability principle states that a small-probability event (for example P<0.01 or P<0.05) essentially does not occur in a single trial. Step S102 applies this idea as follows: first assume that the structures of the two groups of data are consistent, then use a statistical test to determine how likely the assumption is to hold. If the likelihood is very small, the assumption is rejected, which indicates that the structure of the two groups of data has changed, and it is concluded that an anomaly exists in this dimension.
Because the technical solution of the present invention is based on hypothesis testing and tests the structure of the indicator data within a dimension, or the structure of the data after dimensions are crossed, it can determine anomalies more accurately than the method of comparing fluctuation amplitudes against a threshold, and can locate anomalies quickly.
Continuing the earlier example: when the online payment system fails, order volume and order amount fluctuate. The data in the payment method dimension certainly fluctuate, and the data in the province and order type dimensions fluctuate as well. The existing threshold-based method of comparing fluctuation amplitudes will generally find that the data in all three dimensions are anomalous, but has difficulty determining that the anomaly was caused by the payment step. With the hypothesis-testing method of the present invention, the data in the three dimensions of payment method, province, and order type are tested separately. It is then easy to see that, compared with the historical baseline values, the province and order type data may all have dropped in value but remain essentially consistent in overall structure (for the province dimension, for example, the share of each province changes little), so the structural test does not judge them to be abnormal. In the payment method dimension, however, when online payment fails, the share of order volume or order amount paid online is bound to fall sharply, while the shares of other payment methods, such as cash on delivery and postal remittance, rise markedly in turn. The structure has changed noticeably, and a structural test on the data captures this abnormality, thereby locating the data anomaly. The present invention thus remedies the shortcomings of existing anomaly determination and locating methods.
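The payment-failure example can be made concrete with a small sketch. The numbers are invented for illustration, and the choice of a Pearson chi-square statistic with the baseline rescaled to the current total (so that only shares, not volumes, are compared) is an assumption of this sketch rather than something the text specifies at this point.

```python
def chi_square(current, baseline):
    """Pearson chi-square of current counts against baseline shares.

    The baseline is rescaled to the current total, so a uniform drop in
    volume leaves the statistic at zero. Assumes both inputs cover the
    same items and every baseline value is positive.
    """
    cur_total = sum(current.values())
    base_total = sum(baseline.values())
    stat = 0.0
    for item, base in baseline.items():
        expected = cur_total * base / base_total
        stat += (current.get(item, 0) - expected) ** 2 / expected
    return stat


# Province dimension: every province is down 40%, shares unchanged.
province_base = {"A": 500, "B": 300, "C": 200}
province_now = {"A": 300, "B": 180, "C": 120}

# Payment method dimension: online collapses, the other methods gain share.
payment_base = {"online": 700, "cash_on_delivery": 250, "postal": 50}
payment_now = {"online": 150, "cash_on_delivery": 380, "postal": 70}

province_stat = chi_square(province_now, province_base)
payment_stat = chi_square(payment_now, payment_base)
print(province_stat, payment_stat)
```

Both dimensions lose the same total volume, yet only the payment method dimension shows a large statistic, which is the behaviour the paragraph above describes.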
In one embodiment, the method further comprises:
An anomaly locating step, comprising: taking the dimension corresponding to the anomalous first-level dimension distribution data with the highest structural anomaly as the key anomaly dimension; taking the dimension combinations corresponding to the anomalous dimension-combination distribution data as anomalous dimension combinations; taking the anomalous dimension combinations that include the key anomaly dimension as the dimension combinations affected by the key anomaly dimension; taking the dimensions other than the key anomaly dimension that are included in the dimension combinations affected by the key anomaly dimension as the dimensions affected by the key anomaly dimension; and displaying the key anomaly dimension and the dimensions affected by the key anomaly dimension.
In this embodiment, determining the key anomaly dimension makes it possible to find the dimension combinations affected by the key anomaly dimension and, from them, the other dimensions affected by the key anomaly dimension.
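The bookkeeping in the anomaly locating step can be sketched as follows, assuming the key anomaly dimension and the list of anomalous dimension combinations have already been determined. The function name and the tuple representation of a combination are hypothetical.

```python
def affected_by_key_dimension(key_dimension, anomalous_combinations):
    """Return (affected combinations, affected dimensions).

    A combination is affected if it contains the key anomaly dimension;
    the affected dimensions are the other members of those combinations.
    """
    affected_combos = [c for c in anomalous_combinations if key_dimension in c]
    affected_dims = []
    for combo in affected_combos:
        for dim in combo:
            if dim != key_dimension and dim not in affected_dims:
                affected_dims.append(dim)
    return affected_combos, affected_dims


# Hypothetical result of the determination step.
anomalous = [("province", "payment_method"), ("province", "order_type")]
combos, dims_hit = affected_by_key_dimension("payment_method", anomalous)
print(combos, dims_hit)
```

Here only the combination containing `payment_method` is reported, and `province` is listed as the dimension affected through it.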
In one embodiment:
The anomaly determination step specifically comprises: calculating the chi-square value of the current first-level dimension distribution data against the corresponding historical first-level dimension baseline distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculating the chi-square value of the current dimension-combination distribution data against the corresponding historical dimension-combination baseline distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
The anomaly locating step specifically comprises: the anomalous first-level dimension distribution data with the highest structural anomaly are the anomalous first-level dimension distribution data corresponding to the minimum χ² value.
Chi-square test: the chi-square test is a hypothesis testing method that measures the degree of deviation between the actual observed values of a statistical sample and the theoretically inferred values. This degree of deviation determines the size of the chi-square value: the larger the chi-square value, the poorer the agreement; the smaller the deviation, the smaller the chi-square value and the closer the agreement. If the two sets of values are exactly equal, the chi-square value is 0, indicating that the observations match the theoretical values completely. The chi-square value yields the probability that the hypothesis holds, that is, the significance level or P value; the smaller the P value, the less likely the hypothesis is to hold.
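The text does not write out the statistic. For reference, the standard Pearson chi-square statistic can be expressed as below, under the following assumptions of this sketch: O_i is the current value of data item i, B_i is its historical baseline value, k is the number of data items in the dimension, and the expected values E_i are the baseline rescaled to the current total so that only the shares are compared.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad
E_i = \Bigl(\sum_{j=1}^{k} O_j\Bigr) \frac{B_i}{\sum_{j=1}^{k} B_j},
\qquad
P = \Pr\bigl(X \ge \chi^2\bigr), \quad X \sim \chi^2_{k-1}
```

When O_i = E_i for every i the statistic is 0, which matches the description above, and a larger statistic corresponds to a smaller P value.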
This embodiment uses the minimum χ² value to determine the anomaly, which makes anomaly determination and locating more reliable.
In one embodiment:
The anomaly determination step specifically comprises: calculating the chi-square value of the current first-level dimension distribution data against the corresponding historical first-level dimension baseline distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculating the chi-square value of the current dimension-combination distribution data against the corresponding historical dimension-combination baseline distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
The anomaly locating step specifically comprises: selecting the anomalous first-level dimension distribution data corresponding to the minimum χ² value as the minimum anomalous first-level dimension distribution data; selecting, from the other anomalous first-level dimension distribution data, those whose chi-square value differs from the minimum χ² value by less than a difference threshold; performing a goodness-of-fit test against the corresponding historical first-level dimension baseline distribution data and calculating the coefficient of determination; the anomalous first-level dimension distribution data with the highest structural anomaly being the anomalous first-level dimension distribution data corresponding to the minimum coefficient of determination.
Goodness of fit refers to how well a regression line fits the observed values. The statistic that measures goodness of fit is the coefficient of determination (also called the determination coefficient) R^2, whose range is [0,1]. The closer R^2 is to 1, the better the regression line fits the observed values; conversely, the closer R^2 is to 0, the worse the fit.
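One way to compute the coefficient of determination for a candidate dimension is sketched below. It assumes that the regression in question is a simple least-squares line of the current values on the baseline values, item by item; the text does not state which regression is used, and the sample numbers are made up.

```python
def r_squared(baseline, current):
    """R^2 of the least-squares line current = a + b * baseline.

    For simple linear regression this equals the squared correlation.
    """
    n = len(baseline)
    mean_x = sum(baseline) / n
    mean_y = sum(current) / n
    sxx = sum((x - mean_x) ** 2 for x in baseline)
    syy = sum((y - mean_y) ** 2 for y in current)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baseline, current))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for constant data")
    return (sxy * sxy) / (sxx * syy)


# Two hypothetical candidate dimensions: (baseline values, current values).
candidates = {
    "province": ([500, 300, 200], [310, 170, 120]),
    "payment_method": ([700, 250, 50], [150, 380, 70]),
}
scores = {dim: r_squared(base, cur) for dim, (base, cur) in candidates.items()}
# The candidate with the lowest R^2 fits its baseline worst.
key_dimension = min(scores, key=scores.get)
print(scores, key_dimension)
```

In this sketch the province data still track their baseline closely, while the payment method data do not, so the payment method dimension has the minimum coefficient of determination.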
This embodiment determines and locates anomalies by combining the chi-square value with the goodness-of-fit test, which makes anomaly determination and locating more accurate.
In one embodiment, the anomaly locating step further comprises: taking the anomalous first-level dimension distribution data corresponding to the key anomaly dimension as the key anomalous first-level dimension distribution data; calculating, for each data item, the difference between the key anomalous first-level dimension distribution data and the corresponding historical first-level dimension baseline distribution data; taking the data item with the largest absolute difference as the main cause of the anomaly; and displaying the main cause of the anomaly.
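Picking the main cause within the key anomaly dimension reduces to a per-item difference and an arg-max, as the following sketch shows. The function name and the sample values are hypothetical, and the sketch assumes the current and baseline data cover the same items.

```python
def main_cause(current, baseline):
    """Return (item, difference) for the item with the largest |difference|."""
    diffs = {item: current[item] - baseline[item] for item in baseline}
    item = max(diffs, key=lambda k: abs(diffs[k]))
    return item, diffs[item]


# Made-up payment method data for the key anomaly dimension.
payment_base = {"online": 700, "cash_on_delivery": 250, "postal": 50}
payment_now = {"online": 150, "cash_on_delivery": 380, "postal": 70}
cause, delta = main_cause(payment_now, payment_base)
print(cause, delta)
```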
This embodiment can display the main cause of the anomaly, which makes anomaly determination and locating more accurate.
Fig. 2 is a structural module diagram of a system for determining and locating anomalies in distribution data according to the present invention, which comprises:
A distribution data preparation module 201, configured to: obtain multi-dimensional distribution data and multi-dimensional baseline distribution data, the multi-dimensional baseline distribution data being the historical baseline values corresponding to each data item of the multi-dimensional distribution data; cross-combine multiple dimensions to obtain multiple dimension combinations; generate, from the multi-dimensional distribution data, multiple sets of current first-level dimension distribution data relating to the first-level dimensions and multiple sets of current dimension-combination distribution data relating to the dimension combinations; and generate, from the multi-dimensional baseline distribution data, multiple sets of historical first-level dimension baseline distribution data relating to the first-level dimensions and multiple sets of historical dimension-combination baseline distribution data relating to the dimension combinations;
An anomaly determination module 202, configured to: compare the current first-level dimension distribution data with the corresponding historical first-level dimension baseline distribution data to obtain the structural anomaly of each set of current first-level dimension distribution data relative to its corresponding historical first-level dimension baseline distribution data, current first-level dimension distribution data whose structural anomaly exceeds an anomaly threshold being anomalous first-level dimension distribution data; compare the current dimension-combination distribution data with the historical dimension-combination baseline distribution data to obtain the structural anomaly of each set of current dimension-combination distribution data relative to its corresponding historical dimension-combination baseline distribution data, current dimension-combination distribution data whose structural anomaly exceeds the anomaly threshold being anomalous dimension-combination distribution data; and raise an alarm if any anomalous first-level dimension distribution data or anomalous dimension-combination distribution data exist.
Wherein in an embodiment, also comprise:
Unusual fluctuation locating module, for: using the dimension corresponding to unusual fluctuation one-level dimension distributed data the highest for structure unusual fluctuation as crucial unusual fluctuation dimension, the dimension of described unusual fluctuation dimension combination corresponding to distributed data is combined as the combination of unusual fluctuation dimension, the dimension that described unusual fluctuation dimension combination comprises described crucial unusual fluctuation dimension is combined as the dimension affected by crucial unusual fluctuation dimension and combines, described other dimensions except crucial unusual fluctuation dimension included by dimension combination affected by crucial unusual fluctuation dimension are the dimension affected by crucial unusual fluctuation dimension, the dimension showing described crucial unusual fluctuation dimension and affect by crucial unusual fluctuation dimension.
In one embodiment:
The anomaly determination module is specifically configured to: calculate the chi-square value of each current first-level dimension distribution data against its corresponding historical first-level dimension reference distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculate the chi-square value of each current dimension-combination distribution data against its corresponding historical dimension-combination reference distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
The anomaly locating module is specifically configured such that: the anomalous first-level dimension distribution data with the highest structural anomaly is the anomalous first-level dimension distribution data corresponding to the minimum chi-square (χ²) value.
In one embodiment:
The anomaly determination module is specifically configured to: calculate the chi-square value of each current first-level dimension distribution data against its corresponding historical first-level dimension reference distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculate the chi-square value of each current dimension-combination distribution data against its corresponding historical dimension-combination reference distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
The anomaly locating module is specifically configured to: select the anomalous first-level dimension distribution data corresponding to the minimum chi-square (χ²) value as the minimum anomalous first-level dimension distribution data; select, from the other anomalous first-level dimension distribution data, those whose chi-square value differs from the minimum chi-square value by less than a difference threshold; and perform a goodness-of-fit test against the corresponding historical first-level dimension reference distribution data and calculate the coefficient of determination, the anomalous first-level dimension distribution data with the highest structural anomaly being the one corresponding to the minimum coefficient of determination.
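The selection rule of this embodiment can be illustrated with a short sketch. The function name `pick_key_dimension`, the sample values, and the choice to keep the minimum-value dimension itself among the tie-break candidates are assumptions made for illustration; the text does not state whether the minimum is included in the goodness-of-fit comparison.

```python
from typing import Dict


def pick_key_dimension(
    test_values: Dict[str, float],
    r_squared: Dict[str, float],
    diff_threshold: float,
) -> str:
    """Pick the key anomaly dimension among the anomalous first-level dimensions.

    test_values: per-dimension value from the chi-square test, as used in the text
                 (the dimension with the minimum value is the primary candidate).
    r_squared:   per-dimension coefficient of determination against the baseline.
    """
    if not test_values:
        raise ValueError("no anomalous dimensions to choose from")
    best = min(test_values, key=lambda d: test_values[d])
    # Dimensions whose value lies within diff_threshold of the minimum are near-ties.
    candidates = [
        d
        for d, v in test_values.items()
        if d == best or v - test_values[best] < diff_threshold
    ]
    if len(candidates) == 1:
        return best
    # Among near-ties, the poorest fit to the baseline (smallest R^2) is chosen.
    return min(candidates, key=lambda d: r_squared[d])


# Hypothetical values for three anomalous first-level dimensions.
values = {"province": 0.0010, "order_type": 0.0012, "payment": 0.0400}
r2 = {"province": 0.93, "order_type": 0.71, "payment": 0.88}
print(pick_key_dimension(values, r2, diff_threshold=0.001))  # order_type
```

With a difference threshold of zero the tie-break never applies and the rule reduces to the simpler embodiment, which picks the minimum value directly.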
In one embodiment, the anomaly locating module is further configured to: take the anomalous first-level dimension distribution data corresponding to the key anomaly dimension as the key anomalous first-level dimension distribution data; calculate, for each data item, the difference between the key anomalous first-level dimension distribution data and the corresponding historical first-level dimension reference distribution data; take the data item whose difference has the largest absolute value as the main cause of the anomaly; and display the main cause of the anomaly.
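As a minimal sketch of the main-cause step, the per-item differences can be computed and ranked by absolute value. The function name `rank_item_differences` and the figures are hypothetical, and the sketch assumes the differences are taken on raw values rather than on shares, which the text leaves open.

```python
from typing import Dict, List, Tuple


def rank_item_differences(
    current: Dict[str, float], baseline: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (item, current - baseline) pairs, largest absolute difference first."""
    items = set(current) | set(baseline)
    diffs = {i: current.get(i, 0.0) - baseline.get(i, 0.0) for i in items}
    # Ties on absolute difference are broken by item name to keep the order stable.
    return sorted(diffs.items(), key=lambda kv: (-abs(kv[1]), kv[0]))


# Hypothetical distribution of order volume over the key anomaly dimension.
current = {"online": 900.0, "cash_on_delivery": 80.0, "bank_transfer": 20.0}
baseline = {"online": 700.0, "cash_on_delivery": 250.0, "bank_transfer": 50.0}

ranked = rank_item_differences(current, baseline)
main_cause, delta = ranked[0]
print(main_cause, delta)  # online 200.0
```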
Fig. 3 is a module diagram of a preferred embodiment, comprising:
Data preparation module 310: the main function of the data preparation module is to preprocess multi-indicator, multidimensional data. It specifically comprises:
Data input submodule 311, configured to obtain the latest data stored at day granularity in the multidimensional data table;
Data preprocessing submodule 312, which preprocesses the latest data: for the data stored at day granularity in the multidimensional data table, it performs data aggregation, null-value column handling, and small-share column handling on each first-level dimension and on each multi-level dimension obtained by dimension crossing, thereby generating the distribution data of the indicator on each first-level dimension and on each crossed multi-level dimension. Specifically, taking the order-volume indicator as an example, data preprocessing is performed separately on dimensions (first-level dimensions) such as province, order type, and payment method.
Dimension crossing submodule 313, which crosses these dimensions in every possible combination to generate new multi-level dimensions for the corresponding data preprocessing, such as "province_order type", "province_payment method", "order type_payment method", and "province_order type_payment method". In this way, the data can be examined not only from the perspective of each first-level dimension but also at the finer granularity of multi-level dimensions, to uncover whether local data shows an anomaly.
Historical baseline processing submodule 314, which processes the earlier non-anomalous data stored at day granularity in the multidimensional data table and takes its weighted mean to generate historical baseline values, which are stored as a multidimensional data table. The same preprocessing flow is then applied to this baseline table by data preprocessing submodule 312 and dimension crossing submodule 313, yielding the historical baseline distribution data of the indicator on each first-level dimension and on each crossed multi-level dimension.
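The data preparation flow described above (dimension crossing, aggregation, and weighted-mean baselines) can be sketched as follows. All function names, the record layout, the "UNKNOWN" placeholder for null values, and the weights are assumptions for illustration; the text does not specify how null columns are handled or which weights are used, and small-share column handling is not covered here.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any, Dict, List, Sequence, Tuple

Key = Tuple[str, ...]


def dimension_combinations(dims: Sequence[str]) -> List[Key]:
    """All first-level dimensions plus every crossed combination of two or more."""
    return [c for r in range(1, len(dims) + 1) for c in combinations(dims, r)]


def aggregate(records: List[Dict[str, Any]], combo: Key, metric: str) -> Dict[Key, float]:
    """Sum the metric over each distinct value tuple of the given dimension combination."""
    out: Dict[Key, float] = defaultdict(float)
    for row in records:
        # Null or empty dimension values are grouped under an "UNKNOWN" item.
        key = tuple(str(row.get(d) or "UNKNOWN") for d in combo)
        out[key] += float(row[metric])
    return dict(out)


def weighted_baseline(history: List[Dict[Key, float]], weights: List[float]) -> Dict[Key, float]:
    """Weighted mean of per-day distributions; items missing on a day count as 0."""
    total_w = sum(weights)
    out: Dict[Key, float] = defaultdict(float)
    for dist, w in zip(history, weights):
        for key, value in dist.items():
            out[key] += w * value / total_w
    return dict(out)


combos = dimension_combinations(["province", "order_type", "payment"])
print(["_".join(c) for c in combos])  # 3 first-level dimensions + 4 crossed ones

today = [
    {"province": "Beijing", "order_type": "normal", "payment": "online", "orders": 120},
    {"province": "Beijing", "order_type": "normal", "payment": "cash", "orders": 30},
    {"province": "Shanghai", "order_type": "presale", "payment": None, "orders": 5},
]
by_payment = aggregate(today, ("payment",), "orders")
by_province_payment = aggregate(today, ("province", "payment"), "orders")

# Two earlier non-anomalous days, the more recent one weighted three times as heavily.
history = [
    {("online",): 100.0, ("cash",): 50.0},
    {("online",): 200.0, ("cash",): 50.0},
]
baseline = weighted_baseline(history, weights=[1.0, 3.0])
print(baseline)
```

Keying every distribution by a tuple of dimension values lets first-level dimensions and crossed dimensions share one code path, which mirrors the way submodules 312 and 313 are reused for the baseline table.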
Anomaly determination module 320: after passing through the preprocessing flow of the data preparation module, the data is output as two groups, namely the current day's distribution data and the historical baseline distribution data on each first-level dimension and on each crossed multi-level dimension. The main function of the anomaly determination module is to perform a structural diagnosis of these two groups of data based on hypothesis testing, checking whether the structures of the two groups are consistent; if they are inconsistent, an anomaly is considered to exist. Hypothesis testing rests on proof by contradiction under the small-probability principle, which holds that a small-probability event (e.g. P<0.01 or P<0.05) will essentially not occur in a single trial. Applying this idea, the anomaly determination module first assumes that the two groups of data have the same structure and then uses a statistical test to determine how likely the assumption is to hold; if the likelihood is very small, the assumption is rejected, which indicates that the structure of the two groups of data has changed and therefore that an anomaly exists on this dimension. This module comprises chi-square test submodule 321 and goodness-of-fit submodule 322, and uses the chi-square test together with the goodness-of-fit test. In some scenarios, when the overall data fluctuates strongly, the P values obtained from the chi-square test on several dimensions may all be approximately equal; in that case the coefficient of determination R^2 computed by the goodness-of-fit test can be used to help verify the magnitude of the structural change on these dimensions. An alarm is raised when an anomaly exists.
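A minimal sketch of the two statistics follows, written with the standard library only. The function name `chi_square_and_r2` is hypothetical, and two choices are assumptions rather than something the text specifies: the baseline is rescaled to the current total so that only structure is compared, and R^2 is computed as 1 - SS_res/SS_tot with the baseline treated as the fitted values. The sketch compares the statistic with a tabulated critical value instead of computing a P value.

```python
from typing import Dict, Tuple


def chi_square_and_r2(
    current: Dict[str, float], baseline: Dict[str, float]
) -> Tuple[float, float]:
    """Return (chi-square statistic, R^2) of the current data against the baseline.

    Assumes every baseline item is positive, for example after null and
    small-share items have been handled during preprocessing.
    """
    items = sorted(baseline)
    observed = [current.get(i, 0.0) for i in items]
    # Rescale the baseline to the current total so only the shares are compared.
    scale = sum(observed) / sum(baseline[i] for i in items)
    expected = [baseline[i] * scale for i in items]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, expected))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return chi2, r2


# Hypothetical order volume by payment method.
current = {"online": 900.0, "cash": 80.0, "transfer": 20.0}
baseline = {"online": 700.0, "cash": 250.0, "transfer": 50.0}

chi2, r2 = chi_square_and_r2(current, baseline)
# Critical value of the chi-square distribution for 2 degrees of freedom at P = 0.05.
CRITICAL_DF2_P005 = 5.991
structure_changed = chi2 > CRITICAL_DF2_P005
print(round(chi2, 2), round(r2, 3), structure_changed)
```

A statistic above the critical value corresponds to a P value below 0.05, so the hypothesis that the two structures are consistent is rejected for this dimension.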
Anomaly locating module 330: the main function of this module is to identify, from all the structurally changed dimensions obtained by the anomaly determination module, the key anomaly dimension and the other dimensions at each level that are affected by it. It comprises dimension locating submodule 331 and cross-dimension drill-down submodule 332, which correspond to the dimension locating algorithm and the cross-dimension drill-down algorithm respectively. The dimension locating algorithm finds the key anomaly dimension among the first-level and second-level dimensions: within dimensions of the same level, it compares P values first and R^2 values as an auxiliary criterion, and the dimension with the minimum value is taken as the key anomaly dimension. It then calculates and sorts the per-item differences between the current day's distribution data and the historical baseline distribution data on the key anomaly dimension, and takes the data item with the largest absolute difference as the main cause of the anomaly. The cross-dimension drill-down algorithm, once the key anomaly dimension has been located, takes the dimensions in those dimension combinations that include the key anomaly dimension and are themselves judged anomalous as the dimensions affected by the key anomaly dimension. For example, if "payment method" is located as the key anomaly dimension after its hypothesis-test results are compared with those of other dimensions such as "province" and "order type", the fluctuation of each item within the "payment method" dimension is then compared; if the online-payment data fluctuates the most, the fluctuation of the online-payment data is taken as the main cause of the anomaly. Finally, the dimensions affected by the key anomaly dimension are found within the crossed dimensions that include the key anomaly dimension "payment method" (i.e. "province_payment method", "order type_payment method", etc.). The module finally outputs the key anomaly dimension, the dimensions affected by it, and the main cause of the anomaly.
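The cross-dimension drill-down step amounts to filtering the anomalous dimension combinations by the key anomaly dimension. The following sketch shows this under the assumption that each combination is represented as a tuple of dimension names; the function name `affected_dimensions` and the sample data are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def affected_dimensions(
    key_dim: str, anomalous_combos: Iterable[Tuple[str, ...]]
) -> Tuple[List[Tuple[str, ...]], Set[str]]:
    """Return the anomalous combinations containing key_dim and the other dimensions in them."""
    hit = [c for c in anomalous_combos if key_dim in c]
    others = {d for c in hit for d in c if d != key_dim}
    return hit, others


# Hypothetical set of dimension combinations already judged anomalous.
anomalous = [
    ("province", "payment"),
    ("order_type", "payment"),
    ("province", "order_type"),
]
combos, dims = affected_dimensions("payment", anomalous)
print(combos)        # [('province', 'payment'), ('order_type', 'payment')]
print(sorted(dims))  # ['order_type', 'province']
```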
The embodiments described above express only several implementations of the present invention, and their description is relatively specific and detailed, but they shall not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the inventive concept, all of which fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (10)

1. A method for determining and locating anomalies in distribution data, characterized by comprising:
A distribution data preparation step, comprising: obtaining multidimensional distribution data and multidimensional reference distribution data, the multidimensional reference distribution data being the historical baseline values corresponding to each datum of the multidimensional distribution data; crossing a plurality of dimensions to obtain a plurality of dimension combinations; generating, from the multidimensional distribution data, a plurality of current first-level dimension distribution data relating to first-level dimensions and a plurality of current dimension-combination distribution data relating to the dimension combinations; and generating, from the multidimensional reference distribution data, a plurality of historical first-level dimension reference distribution data relating to first-level dimensions and a plurality of historical dimension-combination reference distribution data relating to the dimension combinations;
An anomaly determination step, comprising: comparing each current first-level dimension distribution data with its corresponding historical first-level dimension reference distribution data to obtain the structural anomaly of the former relative to the latter, any current first-level dimension distribution data whose structural anomaly exceeds the anomaly threshold being anomalous first-level dimension distribution data; comparing each current dimension-combination distribution data with its corresponding historical dimension-combination reference distribution data to obtain the structural anomaly of the former relative to the latter, any current dimension-combination distribution data whose structural anomaly exceeds the anomaly threshold being anomalous dimension-combination distribution data; and raising an alarm if any anomalous first-level dimension distribution data or anomalous dimension-combination distribution data exists.
2. The method for determining and locating anomalies in distribution data according to claim 1, characterized by further comprising:
An anomaly locating step, comprising: taking the dimension corresponding to the anomalous first-level dimension distribution data with the highest structural anomaly as the key anomaly dimension; taking the dimension combinations corresponding to the anomalous dimension-combination distribution data as anomalous dimension combinations; taking those anomalous dimension combinations that include the key anomaly dimension as the dimension combinations affected by the key anomaly dimension; taking the other dimensions, apart from the key anomaly dimension, included in the affected dimension combinations as the dimensions affected by the key anomaly dimension; and displaying the key anomaly dimension and the dimensions affected by it.
3. The method for determining and locating anomalies in distribution data according to claim 2, characterized in that:
The anomaly determination step specifically comprises: calculating the chi-square value of each current first-level dimension distribution data against its corresponding historical first-level dimension reference distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculating the chi-square value of each current dimension-combination distribution data against its corresponding historical dimension-combination reference distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
The anomaly locating step specifically comprises: the anomalous first-level dimension distribution data with the highest structural anomaly being the anomalous first-level dimension distribution data corresponding to the minimum chi-square (χ²) value.
4. The method for determining and locating anomalies in distribution data according to claim 2, characterized in that:
The anomaly determination step specifically comprises: calculating the chi-square value of each current first-level dimension distribution data against its corresponding historical first-level dimension reference distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculating the chi-square value of each current dimension-combination distribution data against its corresponding historical dimension-combination reference distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
The anomaly locating step specifically comprises: selecting the anomalous first-level dimension distribution data corresponding to the minimum chi-square (χ²) value as the minimum anomalous first-level dimension distribution data; selecting, from the other anomalous first-level dimension distribution data, those whose chi-square value differs from the minimum chi-square value by less than a difference threshold; and performing a goodness-of-fit test against the corresponding historical first-level dimension reference distribution data and calculating the coefficient of determination, the anomalous first-level dimension distribution data with the highest structural anomaly being the one corresponding to the minimum coefficient of determination.
5. The method for determining and locating anomalies in distribution data according to claim 2, characterized in that the anomaly locating step further comprises: taking the anomalous first-level dimension distribution data corresponding to the key anomaly dimension as the key anomalous first-level dimension distribution data; calculating, for each data item, the difference between the key anomalous first-level dimension distribution data and the corresponding historical first-level dimension reference distribution data; taking the data item whose difference has the largest absolute value as the main cause of the anomaly; and displaying the main cause of the anomaly.
6. A system for determining and locating anomalies in distribution data, characterized by comprising:
A distribution data preparation module, configured to: obtain multidimensional distribution data and multidimensional reference distribution data, the multidimensional reference distribution data being the historical baseline values corresponding to each datum of the multidimensional distribution data; cross a plurality of dimensions to obtain a plurality of dimension combinations; generate, from the multidimensional distribution data, a plurality of current first-level dimension distribution data relating to first-level dimensions and a plurality of current dimension-combination distribution data relating to the dimension combinations; and generate, from the multidimensional reference distribution data, a plurality of historical first-level dimension reference distribution data relating to first-level dimensions and a plurality of historical dimension-combination reference distribution data relating to the dimension combinations;
An anomaly determination module, configured to: compare each current first-level dimension distribution data with its corresponding historical first-level dimension reference distribution data to obtain the structural anomaly of the former relative to the latter, any current first-level dimension distribution data whose structural anomaly exceeds the anomaly threshold being anomalous first-level dimension distribution data; compare each current dimension-combination distribution data with its corresponding historical dimension-combination reference distribution data to obtain the structural anomaly of the former relative to the latter, any current dimension-combination distribution data whose structural anomaly exceeds the anomaly threshold being anomalous dimension-combination distribution data; and raise an alarm if any anomalous first-level dimension distribution data or anomalous dimension-combination distribution data exists.
7. The system for determining and locating anomalies in distribution data according to claim 6, characterized by further comprising:
An anomaly locating module, configured to: take the dimension corresponding to the anomalous first-level dimension distribution data with the highest structural anomaly as the key anomaly dimension; take the dimension combinations corresponding to the anomalous dimension-combination distribution data as anomalous dimension combinations; take those anomalous dimension combinations that include the key anomaly dimension as the dimension combinations affected by the key anomaly dimension; take the other dimensions, apart from the key anomaly dimension, included in the affected dimension combinations as the dimensions affected by the key anomaly dimension; and display the key anomaly dimension and the dimensions affected by it.
8. The system for determining and locating anomalies in distribution data according to claim 6, characterized in that:
The anomaly determination module is specifically configured to: calculate the chi-square value of each current first-level dimension distribution data against its corresponding historical first-level dimension reference distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculate the chi-square value of each current dimension-combination distribution data against its corresponding historical dimension-combination reference distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
The anomaly locating module is specifically configured such that: the anomalous first-level dimension distribution data with the highest structural anomaly is the anomalous first-level dimension distribution data corresponding to the minimum chi-square (χ²) value.
9. The system for determining and locating anomalies in distribution data according to claim 6, characterized in that:
The anomaly determination module is specifically configured to: calculate the chi-square value of each current first-level dimension distribution data against its corresponding historical first-level dimension reference distribution data, the anomalous first-level dimension distribution data being the current first-level dimension distribution data whose chi-square value exceeds the anomaly threshold; and calculate the chi-square value of each current dimension-combination distribution data against its corresponding historical dimension-combination reference distribution data, the anomalous dimension-combination distribution data being the current dimension-combination distribution data whose chi-square value exceeds the anomaly threshold;
The anomaly locating module is specifically configured to: select the anomalous first-level dimension distribution data corresponding to the minimum chi-square (χ²) value as the minimum anomalous first-level dimension distribution data; select, from the other anomalous first-level dimension distribution data, those whose chi-square value differs from the minimum chi-square value by less than a difference threshold; and perform a goodness-of-fit test against the corresponding historical first-level dimension reference distribution data and calculate the coefficient of determination, the anomalous first-level dimension distribution data with the highest structural anomaly being the one corresponding to the minimum coefficient of determination.
10. The system for determining and locating anomalies in distribution data according to claim 6, characterized in that the anomaly locating module is further configured to: take the anomalous first-level dimension distribution data corresponding to the key anomaly dimension as the key anomalous first-level dimension distribution data; calculate, for each data item, the difference between the key anomalous first-level dimension distribution data and the corresponding historical first-level dimension reference distribution data; take the data item whose difference has the largest absolute value as the main cause of the anomaly; and display the main cause of the anomaly.
CN201510096586.0A 2015-03-04 2015-03-04 Method and system for determining and locating anomalies in distribution data Active CN104715027B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201510096586.0A CN104715027B (en) 2015-03-04 2015-03-04 Method and system for determining and locating anomalies in distribution data
HK15109484.7A HK1208927A1 (en) 2015-03-04 2015-09-25 Method and system for judging and positioning unusual change in distributed data
PCT/CN2016/072348 WO2016138805A1 (en) 2015-03-04 2016-01-27 Method and system for determining and locating distributed data transaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510096586.0A CN104715027B (en) 2015-03-04 2015-03-04 Method and system for determining and locating anomalies in distribution data

Publications (2)

Publication Number Publication Date
CN104715027A (en) 2015-06-17
CN104715027B (en) 2018-03-30

Family

ID=53414354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510096586.0A Active CN104715027B (en) 2015-03-04 2015-03-04 Method and system for determining and locating anomalies in distribution data

Country Status (3)

Country Link
CN (1) CN104715027B (en)
HK (1) HK1208927A1 (en)
WO (1) WO2016138805A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030105660A1 (en) * 2001-02-20 2003-06-05 Walsh Kenneth Peter Method of relating multiple independent databases
US20030200134A1 (en) * 2002-03-29 2003-10-23 Leonard Michael James System and method for large-scale automatic forecasting
US20070239753A1 (en) * 2006-04-06 2007-10-11 Leonard Michael J Systems And Methods For Mining Transactional And Time Series Data
CN102129525A (en) * 2011-03-24 2011-07-20 华北电力大学 Method for searching and analyzing abnormality of signals during vibration and process of steam turbine set
CN103793601A (en) * 2014-01-20 2014-05-14 广东电网公司电力科学研究院 Turbine set online fault early warning method based on abnormality searching and combination forecasting

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104715027B (en) * 2015-03-04 2018-03-30 北京京东尚科信息技术有限公司 A kind of distributed data unusual fluctuation judges localization method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
侯威 等: "一种确定极端事件阈值的新方法:随机重排去趋势波动分析方法", 《物理学报》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016138805A1 (en) * 2015-03-04 2016-09-09 北京京东尚科信息技术有限公司 Method and system for determining and locating distributed data transaction
CN108880845A (en) * 2017-05-16 2018-11-23 腾讯科技(深圳)有限公司 A kind of method and relevant apparatus of information alert
CN107908533A (en) * 2017-06-15 2018-04-13 平安科技(深圳)有限公司 A kind of monitoring method, device, computer-readable recording medium and the equipment of database performance index
WO2018228049A1 (en) * 2017-06-15 2018-12-20 平安科技(深圳)有限公司 Database performance index monitoring method, apparatus and device, and storage medium
CN109697203A (en) * 2017-10-23 2019-04-30 腾讯科技(深圳)有限公司 Index unusual fluctuation analysis method and equipment, computer storage medium, computer equipment
CN109697203B (en) * 2017-10-23 2023-03-24 腾讯科技(深圳)有限公司 Index transaction analysis method and device, computer storage medium, and computer device
CN111090644A (en) * 2019-12-26 2020-05-01 成都康赛信息技术有限公司 Data consistency evaluation method based on data distribution fluctuation rate

Also Published As

Publication number Publication date
HK1208927A1 (en) 2016-03-18
WO2016138805A1 (en) 2016-09-09
CN104715027B (en) 2018-03-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 1208927; Country of ref document: HK)
GR01 Patent grant
REG Reference to a national code (Ref country code: HK; Ref legal event code: GR; Ref document number: 1208927; Country of ref document: HK)