CN105608329A - Organizational behavior anomaly detection method based on community evolution - Google Patents

Publication number
CN105608329A
Authority
CN
China
Prior art keywords
community
subsequence
node
neighborhood
sigma
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610051992.XA
Other languages
Chinese (zh)
Inventor
程光权
韩养胜
黄金才
刘忠
谢福利
胡松超
马扬
李帅
修保新
冯旸赫
陈超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Application filed by National University of Defense Technology
Priority to CN201610051992.XA
Publication of CN105608329A

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Z: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
    • G16Z99/00: Subject matter not provided for in other main groups of this subclass

Abstract

The invention discloses an organizational behavior anomaly detection method based on community evolution. The method comprises fuzzy community partition based on the EM (Expectation-Maximization) algorithm, community evolution analysis, and anomalous subsequence detection. The method describes organizational change at a medium scale. It is highly sensitive to changes in the status and roles of members, in interaction volume and interaction frequency, and to the direction of organizational evolution, and it avoids the loss of detail that can occur when organizational dynamics are examined only for the organization as a whole. Anomalies at different time scales can be obtained by adjusting the subsequence length and the number of neighborhood subsequences. The consistency factor, constructed from the reconstruction weights and the reconstruction error, amplifies the difference between a subsequence and its neighborhood, which improves the resolution and robustness of anomaly detection.

Description

An organizational behavior anomaly detection method based on community evolution
Technical field
The invention belongs to the field of organizational dynamics analysis and specifically relates to an organizational behavior anomaly detection algorithm based on community evolution, suitable for analyzing organizational behavior.
Background
An organization is a group formed by closely connected social individuals. Organizations evolve dynamically, and their function relies on cooperation and interaction among members. Take social organizations as an example: with the rapid development of information technology and the deepening of globalization, the internal connections of social organizations have become tighter and the dependence between organizations has grown. This brings convenience and efficiency, but it also means that a local change, once it occurs, can trigger large-scale cascading effects. For example, the outbreak of the subprime mortgage crisis in the United States spread to the worldwide economy, and chains of terrorist incidents have seriously affected normal social order. It is therefore important to describe the dynamics of organizational evolution accurately from the available information and to discover anomalous changes quickly. Organizational evolution depends on, and is reflected in, the interactions among members. The organizational network formed by these interactions carries the exchange of material, information, or energy among members, so the network can serve as the carrier for studying organizational behavior, and the methods of network science can be applied to it. This is currently the usual approach to organizational behavior analysis.
Organizational behavior anomaly detection can be divided into two processes: describing the organizational dynamics, and detecting anomalies in the resulting dynamic sequence. The behavioral dynamics of an organization as a whole can be described by the time series of similarities between the organizational networks at adjacent times. Such methods are usually based on the network adjacency matrix, node metrics, and changes in edges. Five kinds of network similarity measures are mainly in use: methods based on element overlap, on node ranking, on vector similarity, on sequence similarity, and on matrix cosine.
Anomaly detection on the dynamic sequence is commonly carried out with the Shewhart control chart, first proposed by W. A. Shewhart of the United States in 1924. Since then the Shewhart control chart has been an important tool of scientific management and an indispensable tool in quality management. It is a chart with control limits, used to distinguish whether quality fluctuation is due to chance or to systematic causes. It indicates the presence of systematic causes and so allows one to judge whether the production process is in a controlled state.
Let $y_t$ be the value of the time series currently being monitored and $u_t$ the baseline value. According to the Shewhart model, the current data point is called anomalous when $|y_t - u_t| > c\sigma_t$, where
$$u_t = \frac{1}{B}\sum_{b=1}^{B} y_{t-b-g}, \qquad \sigma_t^2 = \frac{1}{B-1}\sum_{b=1}^{B}\left(y_{t-b-g} - u_t\right)^2 .$$
In the Shewhart control chart model, both the length $B$ of the preceding time span used to compute the expected value and the time gap $g$ must be chosen according to the characteristics of the data.
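For illustration, the Shewhart rule above can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the function name `shewhart_flags` and the example values of `B`, `g`, and `c` are assumptions chosen for the demonstration.

```python
from statistics import mean, stdev

def shewhart_flags(y, B=5, g=0, c=3.0):
    """Return the indices t at which |y[t] - u_t| > c * sigma_t.

    u_t and sigma_t are the sample mean and sample standard deviation
    (B - 1 in the denominator) of the B baseline values y[t-b-g], b = 1..B.
    """
    flags = []
    for t in range(B + g, len(y)):
        baseline = [y[t - b - g] for b in range(1, B + 1)]
        u_t = mean(baseline)
        sigma_t = stdev(baseline)
        if abs(y[t] - u_t) > c * sigma_t:
            flags.append(t)
    return flags

# Made-up series with a single spike at index 7.
series = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 15.0, 10.0, 10.2]
print(shewhart_flags(series, B=5, g=0, c=3.0))
```

In this made-up series only the spike at index 7 is flagged. Once the spike enters the baseline window it inflates $\sigma_t$, which is one way a point-wise chart can misjudge the values that follow an anomaly.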
Driven by internal functional adjustment or by external environmental factors, the status and roles of members and the volume and frequency of their interactions change at the microscopic level. At a larger scale, members form new aggregation regions, which changes the community structure of the organization. Experiments show that describing organizational dynamics by network similarity has the following shortcomings: 1) The similarity measure is itself undirected and is therefore insensitive to the direction of evolution. For example, community splitting and community merging, two opposite evolutionary processes, may produce identical similarity curves. 2) Organizational evolution is usually gradual: quantitative change accumulates into qualitative change, and the organization enters a new stage. Similarity-based methods cannot resolve this gradual change and cannot describe the details of each evolutionary stage.
The shortcomings of the Shewhart control chart are that its ability to detect small shifts is low, that it is very sensitive to the normality assumption, and that it is easily affected by outliers. Moreover, the Shewhart control chart detects anomalies at single points, whereas organizational behavior anomalies tend to last for a period of time. When the target data lie inside an anomalous period, the Shewhart control chart can give a wrong judgment.
Summary of the invention
The general idea of the invention is as follows.
To address the shortcomings of describing organizational dynamics by network similarity, quantitative indices are defined to describe community evolution and thereby characterize organizational dynamics. Analyzing community evolution captures the dynamic features of the organization at the mesoscopic scale and provides more detail than dynamic analysis based on the overall similarity of the organization.
To address the shortcomings of the Shewhart control chart: 1) An index based on the F-test for evaluating whether the number of communities is reasonable is proposed and applied to fuzzy community partition. The main purpose of this part is to partition each network in the organizational network sequence into communities effectively and accurately, laying a solid foundation for the subsequent evolution analysis. 2) A community evolution analysis index based on community role entropy is proposed. The distribution of roles within the communities of an organization is closely related to its function and behavior. The invention uses the local clustering coefficient of a node in the organizational network to describe the node's role and, borrowing the idea of information entropy, proposes the concept of community role entropy, which reflects the heterogeneity of the role distribution of members within communities. 3) An anomalous subsequence detection method based on neighborhood consistency is proposed. An anomalous subsequence is defined as a subsequence of the time series that deviates strongly from its neighboring subsequences, and this deviation can be characterized by the consistency between the subsequence and its neighborhood. A multiple linear regression model describes the reconstruction of a subsequence from its neighborhood, and the regression coefficients (reconstruction weights) and the reconstruction deviation are used to define a consistency factor that describes the consistency between the subsequence and its neighborhood. Two ways of computing the consistency factor are given, one based on least-squares estimation and one based on deviation optimization.
Specifically, an organizational behavior anomaly detection method based on community evolution is characterized by comprising the following steps:
Step 1: Fuzzy community partition based on the EM algorithm
Step 1.1: Extract node feature vectors
The eigenvectors corresponding to the $p$ largest eigenvalues of the network adjacency matrix are taken to form the $n \times p$ feature matrix $A_t$. Each row of the feature matrix is taken as the attribute vector of the corresponding node, so that every node is mapped into a $p$-dimensional space. Here $n$ is the number of network nodes, and the attribute vector of node $m$ is
$$l_m = \left(\tilde{a}_m^{(1)}, \tilde{a}_m^{(2)}, \ldots, \tilde{a}_m^{(p)}\right)$$
Step 1.2: Partition communities with the EM algorithm
For the set of organization members $v_1, v_2, \ldots, v_n$, let $C_1, C_2, \ldots, C_k$ be $k$ fuzzy communities and $c_1, c_2, \ldots, c_k$ the centers of communities $C_1, C_2, \ldots, C_k$ respectively. $W = [w_{ij}]$ ($1 \le i \le n$, $1 \le j \le k$) is the partition matrix, where
$$w_{ij} = \frac{1/\mathrm{dist}(v_i, c_j)}{\sum_{t=1}^{k} 1/\mathrm{dist}(v_i, c_t)}$$
Given that the network is to be divided into $k$ communities, the fuzzy community partition is carried out with the EM algorithm as follows:
(1) Initialize the $k$ community centers and the partition matrix;
(2) Expectation step (E-step): compute the membership degree of each member in each community to obtain the partition matrix $W$;
(3) Maximization step (M-step): adjust the community centers according to the partition matrix obtained in the previous step;
(4) Iterate the expectation and maximization steps until the set number of iterations is reached, the community centers converge to within the expected range, or the sum of squared errors falls below the set threshold;
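The iteration in Step 1.2 can be sketched as below. The E-step follows the inverse-distance membership formula given above. The text does not state how the centers are adjusted in the M-step, so this sketch borrows the fuzzy c-means center update with fuzzifier 3, the value for which the fuzzy c-means membership formula reduces to the inverse-distance form; that choice, the function names, the small `eps` guard, and the toy points are all assumptions made for illustration.

```python
import math

def memberships(points, centers, eps=1e-12):
    """E-step: w[i][j] = (1/dist(v_i, c_j)) / sum_t (1/dist(v_i, c_t))."""
    W = []
    for v in points:
        inv = [1.0 / (math.dist(v, c) + eps) for c in centers]  # eps avoids 1/0
        total = sum(inv)
        W.append([x / total for x in inv])
    return W

def update_centers(points, W, k, m=3):
    """M-step (assumed form): mean of the points weighted by w_ij ** m."""
    dim = len(points[0])
    centers = []
    for j in range(k):
        weights = [W[i][j] ** m for i in range(len(points))]
        total = sum(weights)
        centers.append(tuple(
            sum(w * v[d] for w, v in zip(weights, points)) / total
            for d in range(dim)))
    return centers

def fuzzy_em(points, init_centers, max_iter=100, tol=1e-9):
    """Alternate E-step and M-step until the centers stop moving."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        W = memberships(points, centers)
        new_centers = update_centers(points, W, len(centers))
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return memberships(points, centers), centers

# Toy attribute vectors forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
W, centers = fuzzy_em(points, init_centers=[(0, 0), (10, 10)])
hard_labels = [max(range(len(centers)), key=lambda j: row[j]) for row in W]
print(hard_labels)
```

Each row of `W` sums to 1, so a hard partition can be read off by taking the community with the largest membership in each row.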
Step 1.3: Determine the number of communities
Let the node set of the network be $N = \{v_1, v_2, \ldots, v_n\}$ and the feature vector of node $m$ be $l_m = \left(\tilde{a}_m^{(1)}, \tilde{a}_m^{(2)}, \ldots, \tilde{a}_m^{(p)}\right)$. Let $r$ be the number of communities in the partition, $\{C_1, C_2, \ldots, C_r\}$ the set of communities, and $n_i$ the number of members of the $i$-th community. The nodes of community $C_i$ are $v_{i1}, v_{i2}, \ldots, v_{in_i}$, with attribute vectors $l_{i1}, l_{i2}, \ldots, l_{in_i}$ respectively.
Let
$$T_i = \sum_{j=1}^{n_i} l_{ij}, \quad i = 1, 2, \ldots, r$$
$$Q_1 = \sum_{i=1}^{r} T_i, \qquad Q_2 = \sum_{i=1}^{r}\sum_{j=1}^{n_i} l_{ij}^{T} l_{ij}$$
where $l_{ij}$ denotes the attribute vector of the $j$-th node in the $i$-th community.
Let
$$S_A = \sum_{i=1}^{r} \frac{T_i^{T} T_i}{n_i} - \frac{Q_1^{T} Q_1}{n}$$
$$S_e = Q_2 - \frac{Q_1^{T} Q_1}{n} - S_A$$
Introduce the F statistic
$$F = \frac{S_A/(r-1)}{S_e/(n-r)} \overset{H_0}{\sim} F\big(p(r-1,\, n-r)\big)$$
For a given significance level $\alpha$ and number of communities $r$, $F_{1-\alpha}(p(r-1, n-r))$ can be looked up in an F distribution table. If $F > F_{1-\alpha}(p(r-1, n-r))$, statistical theory indicates a significant difference between the communities, which means the partition is reasonable. Among all partitions with different numbers of communities that satisfy $F > F_{1-\alpha}(p(r-1, n-r))$, the number of communities that maximizes the difference $F - F_{1-\alpha}$ is taken as the most reasonable number of communities, which yields the best community partition.
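As a worked illustration of Step 1.3, the following sketch computes $T_i$, $Q_1$, $Q_2$, $S_A$, $S_e$, and $F$ for a hard assignment of attribute vectors to communities. The function name `f_statistic` and the one-dimensional toy data are assumptions; the critical value $F_{1-\alpha}$ would still come from an F table or a statistics library and is not computed here.

```python
def f_statistic(groups):
    """F = (S_A / (r - 1)) / (S_e / (n - r)) for a list of communities,
    each given as a list of attribute vectors (tuples of equal length)."""
    r = len(groups)
    n = sum(len(g) for g in groups)
    dim = len(groups[0][0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # T_i: sum of the attribute vectors in community i.
    T = [[sum(v[d] for v in g) for d in range(dim)] for g in groups]
    # Q_1: sum of all T_i; Q_2: sum of squared norms of all vectors.
    Q1 = [sum(t[d] for t in T) for d in range(dim)]
    Q2 = sum(dot(v, v) for g in groups for v in g)

    S_A = sum(dot(t, t) / len(g) for t, g in zip(T, groups)) - dot(Q1, Q1) / n
    S_e = Q2 - dot(Q1, Q1) / n - S_A
    return (S_A / (r - 1)) / (S_e / (n - r))

good = [[(1.0,), (2.0,), (3.0,)], [(7.0,), (8.0,), (9.0,)]]
poor = [[(1.0,), (2.0,), (7.0,)], [(3.0,), (8.0,), (9.0,)]]
print(f_statistic(good), f_statistic(poor))
```

For the toy data, the partition that separates the low values from the high values gives a much larger $F$ than the mixed partition, which is the behavior the selection rule relies on.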
Step 2: Community evolution analysis
Step 2.1: Organizational roles
The clustering coefficient describes the edge density of a node's neighborhood. In an organization, the interaction patterns of different roles are often reflected in differences in clustering coefficient, so the local clustering coefficient of a node reflects, to some extent, differences in the status and role of the node in the network. The clustering coefficient of node $i$ is defined as
$$\tilde{C}_i = \frac{|E(\Gamma_i)|}{\binom{k_i}{2}}$$
where $\Gamma_i$ is the neighborhood of node $i$, that is, the subgraph formed by node $i$ and all nodes directly adjacent to it; $|E(\Gamma_i)|$ is the number of edges in $\Gamma_i$; and $\binom{k_i}{2} = \frac{1}{2}k_i(k_i-1)$ is the number of edges when all nodes in $\Gamma_i$ are connected to one another;
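A small sketch of the local clustering coefficient follows. It uses the standard reading in which $k_i$ is the degree of node $i$ and the counted edges are those among the neighbors of $i$, so that the ratio lies in $[0, 1]$; that reading, the adjacency-dictionary representation, and the function name are assumptions for illustration.

```python
def local_clustering(adj, i):
    """Local clustering coefficient of node i.

    adj maps each node to the set of its neighbours (undirected graph,
    no self-loops). Nodes with fewer than two neighbours get 0.0.
    """
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Each edge between two neighbours is seen from both ends, hence // 2.
    links = sum(1 for u in nbrs for v in adj[u] if v in nbrs) // 2
    return links / (k * (k - 1) / 2)

# Triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print([local_clustering(adj, v) for v in sorted(adj)])
```

Node 0 has three neighbors with one edge among them, giving 1/3; nodes 1 and 2 each give 1.0; the pendant node gives 0.0.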
Step 2.2: Organizational role entropy
Suppose the organizational network $G$ has $n$ members in total and that there are $t$ kinds of roles $\{j_1, j_2, \ldots, j_t\}$ in the network. By analogy with the definition of information entropy, the organizational role entropy is defined as
$$E_h(G) = -\sum_{k=1}^{t} p_k \log_2 p_k$$
where $p_k$ is the proportion of members of the organization that have role $j_k$,
$$p_k = \frac{|j_k|}{n}$$
Step 2.3: Community role entropy
Suppose a community partition divides the network into $m$ communities, $\{C_1, C_2, \ldots, C_m\}$, each of which still contains different roles. Regarding each community as a sub-organization, the community role entropy is defined as
$$E_m(G) = -\sum_{i=1}^{m} \frac{|C_i|}{n} \times E_h(C_i)$$
where $\frac{|C_i|}{n}$ is the proportion of the whole organization taken up by the $i$-th community, and $E_m(G)$ is the expected amount of information needed to identify a member's role after the organization has been partitioned into the $m$ communities;
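The two entropies can be sketched as follows. `role_entropy` implements $E_h$ directly. `community_role_entropy` returns the size-weighted sum of the per-community values of $E_h$ as a non-negative quantity; the leading minus sign printed in the community formula is not applied in this sketch, which is an interpretation rather than something the text settles. The function names and toy role labels are assumptions.

```python
import math
from collections import Counter

def role_entropy(roles):
    """E_h: base-2 Shannon entropy of the role labels in `roles`."""
    n = len(roles)
    return -sum((c / n) * math.log2(c / n) for c in Counter(roles).values())

def community_role_entropy(communities):
    """Sum over communities of (|C_i| / n) * E_h(C_i).

    `communities` is a list of lists of role labels, one list per community.
    """
    n = sum(len(c) for c in communities)
    return sum(len(c) / n * role_entropy(c) for c in communities)

# Two communities of four members each; roles are integer labels.
mixed = [0, 0, 1, 1]      # two roles, evenly split -> 1 bit
uniform = [2, 2, 2, 2]    # a single role -> 0 bits
print(role_entropy(mixed), community_role_entropy([mixed, uniform]))
```

With these labels the mixed community contributes one bit and the uniform community contributes none, so the weighted value is 0.5.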
Step 3: Anomalous subsequence detection
Step 3.1: Determine parameters
Given a time series of length $L$:
$$X = \{x_1, x_2, \ldots, x_L\}$$
$L$ is the length of the time series. The length of the subsequences to be detected is $l$, which is used as the window length, with $l \ll L$. Intercepting subsequences starting from $x_1$ yields $n = L - l + 1$ subsequences of length $l$ in total. The length-$l$ subsequence $X_j$ of the time series is written as follows:
$$X_j = \{x_j, x_{j+1}, \ldots, x_{j+l-1}\}$$
For subsequence $X_j$, its $p$-neighborhood ($p$ even) of subsequences is defined as:
$$Nb_p(X_j) = \begin{cases} \{X_2, \ldots, X_{p+1}\}, & j = 1 \\ \{X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_{p+1}\}, & 1 < j < 1 + p/2 \\ \{X_{j-p/2}, X_{j-p/2+1}, \ldots, X_{j-1}, X_{j+1}, \ldots, X_{j+p/2}\}, & 1 + p/2 \le j \le n - p/2 \\ \{X_{n-p-1}, \ldots, X_{j-1}, X_{j+1}, \ldots, X_n\}, & n - p/2 < j < n \\ \{X_{n-p}, \ldots, X_{n-1}\}, & j = n \end{cases}$$
where each element is a length-$l$ subsequence of the original time series, abbreviated here as
$$Nb_p(X_j) = \{X_{j(1)}, X_{j(2)}, \ldots, X_{j(p)}\}$$
Here $l$ is the subsequence length and $p$ is the number of neighbors; $l$ determines the resolution of the anomalous subsequences, and $p$ determines the range over which an anomaly takes effect;
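The windowing and neighborhood selection of Step 3.1 can be sketched as below, using 1-based indices to match the text. One assumption is worth flagging: near the right boundary the sketch takes the window $X_{n-p}, \ldots, X_n$ so that every neighborhood has exactly $p$ members, mirroring the left boundary; the fourth case as printed starts one position earlier. The function names are also assumptions.

```python
def subsequences(x, l):
    """All n = L - l + 1 contiguous windows of length l."""
    return [x[j:j + l] for j in range(len(x) - l + 1)]

def neighbor_indices(j, n, p):
    """1-based indices of the p neighbourhood subsequences of X_j.

    Assumes p is even and n >= p + 1. The window of p + 1 consecutive
    indices is centred on j where possible and pushed inwards at the
    two ends of the series; j itself is then removed.
    """
    half = p // 2
    if j <= half:                      # left boundary
        window = range(1, p + 2)
    elif j > n - half:                 # right boundary
        window = range(n - p, n + 1)
    else:                              # interior
        window = range(j - half, j + half + 1)
    return [k for k in window if k != j]

print(subsequences([1, 2, 3, 4, 5], 3))
print([neighbor_indices(j, n=10, p=4) for j in (1, 2, 5, 9, 10)])
```

For `n = 10` and `p = 4`, the neighborhood of $X_5$ is $\{X_3, X_4, X_6, X_7\}$, while $X_1$ and $X_{10}$ draw all four neighbors from one side.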
Step 3.2: Build the subsequence regression model
$X_j$ is regarded as a set of $l$ observations of the dependent variable, and the $p$ length-$l$ subsequences in $Nb_p(X_j)$ are regarded as the $p$ factors that influence $X_j$. To measure how consistent $X_j$ is with its neighborhood, $X_j$ is reconstructed as a weighted sum of the elements of $Nb_p(X_j)$:
$$\hat{X}_j = \sum_{i=1}^{p} w_{j(i)} X_{j(i)}$$
$\hat{X}_j$ is called the neighborhood reconstruction sequence of $X_j$, and the weights with which the $p$ neighborhood subsequences take part in the reconstruction are
$$W_j = \{w_{j(1)}, w_{j(2)}, \ldots, w_{j(p)}\}$$
This process can be expressed as a linear model
$$X_j^{(i)} = w_{j(1)} X_{j(1)}^{(i)} + \cdots + w_{j(p)} X_{j(p)}^{(i)} + \varepsilon_j^{(i)}, \quad i = 1, 2, \ldots, l$$
where $\varepsilon_j$ is the deviation between the reconstructed value and the actual value of $X_j$. Let
$$X_j = \begin{pmatrix} X_j^{(1)} \\ X_j^{(2)} \\ \vdots \\ X_j^{(l)} \end{pmatrix}, \quad N_{X_j} = \begin{pmatrix} X_{j(1)}^{(1)} & X_{j(2)}^{(1)} & \cdots & X_{j(p)}^{(1)} \\ X_{j(1)}^{(2)} & X_{j(2)}^{(2)} & \cdots & X_{j(p)}^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ X_{j(1)}^{(l)} & X_{j(2)}^{(l)} & \cdots & X_{j(p)}^{(l)} \end{pmatrix}, \quad W_j = \begin{pmatrix} w_{j(1)} \\ w_{j(2)} \\ \vdots \\ w_{j(p)} \end{pmatrix}, \quad \varepsilon_j = \begin{pmatrix} \varepsilon_j^{(1)} \\ \varepsilon_j^{(2)} \\ \vdots \\ \varepsilon_j^{(l)} \end{pmatrix}$$
Then $X_j = N_{X_j} W_j + \varepsilon_j$ is called the subsequence neighborhood regression model;
Step 3.3: Compute the consistency factor
$w_{j(i)}$ is the $i$-th regression coefficient in the model and is also the weight of the $i$-th neighborhood subsequence of $X_j$ in its linear reconstruction. Each subsequence likewise has $p$ weights with which it takes part in reconstructing the $p$ subsequences of its neighborhood; these are denoted as the reconstruction weight vector of $X_j$, $F_j = \left(f_j^{(1)}, f_j^{(2)}, \ldots, f_j^{(p)}\right)$;
A consistency factor that measures the consistency between a subsequence and its neighborhood is constructed from $\|F_j\|$ and $\|\varepsilon_j\|$. The consistency factor of subsequence $X_j$ is defined as
$$ac_j = \frac{\|F_j\|}{\|\varepsilon_j\|}$$
The invention solves for the reconstruction weights by optimizing the reconstruction deviation, with the normalization condition on the weights as a constraint;
Let the reconstruction of $X_j$ be $\hat{\hat{X}}_j$. The reconstruction deviation of $X_j$ is naturally defined as the two-norm of the difference between $\hat{\hat{X}}_j$ and $X_j$, that is
$$\varepsilon_j = \left\| X_j - \hat{\hat{X}}_j \right\|_2$$
where
$$\hat{\hat{X}}_j = \sum_{i=1}^{p} w_{j(i)} X_{j(i)}$$
The optimization problem is defined as follows
$$\min \left\| X_j - \hat{\hat{X}}_j \right\|_2$$
$$\mathrm{s.t.} \quad \sum_{i=1}^{p} w_{j(i)} = 1$$
In the above, the first expression is the objective function, which minimizes the reconstruction deviation, and the second is the normalization constraint on the reconstruction weights. Solving this optimization yields the weights $W_j$ with which $X_j$ is reconstructed and the final reconstruction error $\varepsilon_j$.
The least-squares estimation or optimization process above is carried out for every subsequence $X_i$. From all the reconstruction weights, the $p$ weights (regression coefficients) $F_i = \left(f_i^{(1)}, f_i^{(2)}, \ldots, f_i^{(p)}\right)$ with which $X_i$ takes part in reconstruction are collected, giving the consistency factor sequence
$$ac = \left\{ \frac{\|F_1\|}{\|\varepsilon_1\|}, \frac{\|F_2\|}{\|\varepsilon_2\|}, \ldots, \frac{\|F_{L-l+1}\|}{\|\varepsilon_{L-l+1}\|} \right\}$$
In anomalous subsequence detection, the consistency factor curve of the subsequences is plotted, and the subsequences corresponding to the valleys of the curve are classified as anomalous subsequences.
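Steps 3.2 and 3.3 can be sketched end to end as follows. The sketch solves the sum-to-one constrained least-squares problem through its Lagrangian (KKT) linear system, then collects for each subsequence the weights it receives in the reconstructions of other subsequences. Several choices are assumptions rather than part of the text: the consistency factor is read as the ratio of the weight norm to the error norm, a tiny ridge term keeps the system solvable when neighbors are collinear, boundary subsequences may collect more or fewer than $p$ weights, each neighborhood is taken to have exactly $p$ members, and all names and data are made up.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def reconstruction(target, neighbours, ridge=1e-8):
    """Weights summing to 1 that minimise ||target - sum_i w_i * neighbours[i]||_2."""
    p = len(neighbours)
    dot = lambda a, b: sum(u * v for u, v in zip(a, b))
    # KKT system: [[N^T N + ridge*I, 1], [1^T, 0]] [w; mu] = [N^T target; 1]
    K = [[dot(neighbours[a], neighbours[b]) + (ridge if a == b else 0.0)
          for b in range(p)] + [1.0] for a in range(p)]
    K.append([1.0] * p + [0.0])
    rhs = [dot(nb, target) for nb in neighbours] + [1.0]
    w = solve(K, rhs)[:p]
    recon = [sum(w[i] * neighbours[i][t] for i in range(p)) for t in range(len(target))]
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(target, recon)))
    return w, err

def consistency_factors(x, l, p, tiny=1e-12):
    """ac_j = ||F_j|| / ||eps_j|| for every length-l window of x (0-based j)."""
    subs = [x[j:j + l] for j in range(len(x) - l + 1)]
    n, half = len(subs), p // 2
    received = [[] for _ in range(n)]   # F_j: weights X_j gets in other fits
    errors = []
    for j in range(n):
        start = min(max(j - half, 0), n - p - 1)      # window of p + 1 indices
        nbrs = [k for k in range(start, start + p + 1) if k != j]
        w, err = reconstruction(subs[j], [subs[k] for k in nbrs])
        errors.append(err)
        for k, wk in zip(nbrs, w):
            received[k].append(wk)
    return [math.sqrt(sum(v * v for v in f)) / (e + tiny)
            for f, e in zip(received, errors)]

x = [0.3, 0.8, 0.5, 0.1, 0.9, 0.4, 0.7, 0.2, 0.6, 0.35,
     0.85, 0.15, 0.55, 0.95, 0.25, 0.65]
print(consistency_factors(x, l=6, p=4))
```

Low values in the returned list mark windows that are poorly explained by their neighbors and contribute little to explaining them, which is the sense in which valleys of the curve are read as anomalies.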
Preferably, in step 2.1, min-max normalization is used to quantize all node clustering coefficients to [0, 5], and the rounded value is used as the role label of each node.
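The quantization just described can be sketched as follows. The function name, the handling of the case where all coefficients are equal, and the use of Python's built-in `round` (which sends exact .5 ties to the nearest even integer) are assumptions.

```python
def quantize_roles(coeffs, top=5):
    """Min-max scale clustering coefficients to [0, top] and round to integers."""
    lo, hi = min(coeffs), max(coeffs)
    if hi == lo:
        # All nodes share one coefficient, so they share one role label.
        return [0 for _ in coeffs]
    return [round(top * (c - lo) / (hi - lo)) for c in coeffs]

print(quantize_roles([0.0, 0.2, 0.55, 1.0]))
```

With the range [0, 5] there are at most six distinct role labels, 0 through 5.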
Preferably, when the parameters are determined in step 3.1, $p$ is set to a relatively large value less than $l$, with $l/p \in (1, 2)$.
The beneficial effects of the invention are:
1. The community evolution analysis method based on community role entropy describes organizational change at a medium scale. It is highly sensitive to changes in member status, roles, interaction volume, and interaction frequency, and to the direction of organizational evolution, and it avoids the loss of detail that can occur when organizational dynamics are examined for the organization as a whole.
2. The anomalous subsequence detection method based on the consistency factor obtains anomalies at different time scales by adjusting the subsequence length and the number of neighborhood subsequences. The consistency factor, constructed from the reconstruction weights and the reconstruction error, amplifies the difference between a subsequence and its neighborhood, improving the resolution and robustness of anomaly detection.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 compares the accuracy of four community partition methods;
Fig. 3 shows the relative accuracy of the fuzzy clustering community partition method;
Fig. 4 is the community role entropy curve of the Drosophila gene regulatory network;
Fig. 5 is the similarity curve of the Drosophila gene regulatory network;
Fig. 6 is the SeqS similarity of the Drosophila gene regulatory network;
Fig. 7 is the consistency factor curve.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific embodiments. The invention was tested on simulated data and a public dataset; it is convenient to apply, and the results are satisfactory and consistent with the design expectations.
Experimental data:
The GN benchmark network model divides the $n$ nodes of a network into $l$ groups of $g$ nodes each. The connection probability between nodes within a group is $p_{in}$ and the connection probability between groups is $p_{out}$; the subgraph within each group is an ER random network with $p = p_{in}$. The average node degree is $\langle k \rangle = p_{in}(g-1) + p_{out}\,g(l-1)$. If $p_{in} > p_{out}$, the edge density within groups exceeds the edge density between groups, and the network has community structure. Usually $l = 4$, $g = 32$, and average node degree $\langle k \rangle = 16$ are set, in which case $p_{in} + 3p_{out} \approx 1/2$. In calculations, $z_{in} = p_{in}(g-1) = 31p_{in}$ and $z_{out} = p_{out}\,g(l-1)$ are often used to denote the average node degree within groups and between groups. Intuitively, the smaller $z_{out}$ is, the more pronounced the community structure of the network and the more easily it is partitioned correctly; in practice, the accuracy of most community partition algorithms begins to drop noticeably when $z_{out}$ reaches 8.
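To make the benchmark parameters concrete, the sketch below derives $p_{in}$ and $p_{out}$ from a chosen $z_{out}$ with $z_{in} + z_{out} = \langle k \rangle$ and then samples one network. The function names, the seeded random generator, and the edge-list output format are assumptions for illustration.

```python
import random

def gn_probabilities(z_out, l=4, g=32, k_avg=16):
    """Return (p_in, p_out) with z_in = k_avg - z_out,
    z_in = p_in * (g - 1) and z_out = p_out * g * (l - 1)."""
    z_in = k_avg - z_out
    return z_in / (g - 1), z_out / (g * (l - 1))

def gn_benchmark(z_out, l=4, g=32, k_avg=16, seed=0):
    """Sample one benchmark network; returns (group label per node, edge list)."""
    rng = random.Random(seed)
    p_in, p_out = gn_probabilities(z_out, l, g, k_avg)
    n = l * g
    group = [v // g for v in range(n)]
    edges = [(u, v) for u in range(n) for v in range(u + 1, n)
             if rng.random() < (p_in if group[u] == group[v] else p_out)]
    return group, edges

p_in, p_out = gn_probabilities(z_out=8)
group, edges = gn_benchmark(z_out=2)
print(p_in, p_out, 2 * len(edges) / len(group))
```

At $z_{out} = 8$ the within-group probability is 8/31 and the between-group probability is 8/96, so $31p_{in} + 96p_{out} = 16$ as required; the sampled average degree fluctuates around 16.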
The Drosophila gene regulatory network dataset covers 66 time points spanning the whole developmental cycle of the fruit fly: embryo (times 1~30), larva (times 31~40), pupa (times 41~58), and adult (times 59~66). Based on gene ontology, the dataset contains the interactions among 588 genes closely related to the Drosophila developmental process.
1. Community partition experiment
The experimental data are GN benchmark networks for community partition. The methods compared are the fuzzy clustering community partition method and three classical community partition methods: the GN algorithm, the FN algorithm, and the SpectralClust algorithm. The number of communities for the fuzzy clustering community partition method and the SpectralClust algorithm is determined by the F-test-based method, while the number of communities for the GN and FN algorithms is obtained by optimizing modularity.
For the GN benchmark network, the number of communities is set to 4. Networks with $z_{out}$ increasing from 1 to 8 are each partitioned with the four algorithms above, and the accuracy of each algorithm is computed with the normalized mutual information measure. Each experiment is repeated 5 times and the accuracy is averaged.
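A normalized mutual information score between a detected partition and the planted one can be sketched as below. The text does not say which normalization was used; this sketch assumes the common form $2I(A;B)/(H(A)+H(B))$, and the function name and toy labels are likewise assumptions.

```python
import math
from collections import Counter

def nmi(a, b):
    """Normalized mutual information 2*I(A;B) / (H(A) + H(B)) of two labelings."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))

    def entropy(counts):
        return -sum(v / n * math.log(v / n) for v in counts.values())

    mi = sum(v / n * math.log(v * n / (ca[x] * cb[y]))
             for (x, y), v in cab.items())
    ha, hb = entropy(ca), entropy(cb)
    if ha == 0 and hb == 0:
        return 1.0  # both labelings put every node in a single community
    return 2 * mi / (ha + hb)

print(nmi([0, 0, 1, 1], [1, 1, 0, 0]), nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

The score does not depend on how communities are numbered: relabeling an identical partition still gives 1, while the two independent labelings in the example give 0.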
Fig. 2 shows the community partition accuracy of the four algorithms at different $z_{out}$. The accuracy of all four algorithms declines as $z_{out}$ increases, but the decline of the fuzzy clustering community partition method is gentler, and it shows higher accuracy when $z_{out}$ is large. Fig. 3 shows the accuracy of the fuzzy clustering community partition method relative to the other three algorithms; the advantage at higher $z_{out}$ of the fuzzy clustering community partition method that uses the F-test to determine the number of communities is clearly visible.
2. Community evolution analysis experiment
The data are the Drosophila gene regulatory network data. Fig. 4 shows the community role entropy curve, which has two obvious peaks, one in the mid-embryonic period (t=10) and one in the larval period (t=30~40). This result can be reasonably explained with biological knowledge. In the initial stage of development, the functional roles of genes are more local and specific, and interactions between genes tend to occur between genes with similar roles. This makes the role distributions of different communities differ greatly, hence the larger community role entropy. From the late embryonic stage onward, in keeping with rapid development, gene function becomes more general and the heterogeneity of gene roles decreases, so the community role entropy begins to fall and reaches its minimum in the pupal stage. On entering the adult period, growth slows, genes again take on specific roles, and the community role entropy rises again somewhat. The community role entropy can thus effectively describe the details of organizational evolution.
The evolution dynamics of the Drosophila gene regulatory network are also described from the perspective of network similarity; the result is shown in Fig. 5. Because network evolution is gradual, the similarity between the gene regulatory networks of adjacent time points changes little most of the time and cannot reflect the details of change in the Drosophila developmental process.
3. Anomalous subsequence detection experiment
This experiment takes as its object the SeqS similarity sequence from Experiment 2, which describes the dynamics of the Drosophila gene regulatory network only vaguely, as shown in Fig. 6. The parameter setting uses the prior information that the minimum length of a Drosophila developmental stage is 10, so $l$ should be set clearly below 10; taking $l/p \in (1, 2)$ into account as well, $l = 6$ and $p = 4$ are finally selected.
The resulting consistency factor curve is shown in Fig. 7 and is analyzed in comparison with the community role entropy of Fig. 4. The analysis above showed that the community role entropy curve has a reasonable biological meaning; the consistency factor curve also describes the Drosophila developmental process well, in the following respects.
1) The times at which the valleys of the consistency factor occur are exactly the times at which the trend of the community role entropy changes markedly. For example, at t=10, t=20~30, and around t=50, where the community role entropy shows steps, the consistency factor shows corresponding valleys. This also confirms that the positions of the valleys of the consistency factor curve effectively indicate inconsistent behavior in the evolutionary process, and this inconsistent behavior is exactly the anomaly to be detected.
2) The four obvious peaks of the consistency factor curve fall within the four developmental stages of Drosophila. A peak of the consistency factor curve indicates that the subsequences in that time period are highly consistent, and gene regulation behavior within each developmental stage should also be strongly consistent; the two agree well.
3) The consistency factor is obtained from the SeqS similarity, whose meaning is rather vague, yet the result matches the background of the problem well and identifies the time points of interest in the evolution of the gene regulatory network. This shows that the consistency factor does have the ability to refine and highlight changes in organizational behavior, which is exactly the ability that anomaly detection requires.
The invention has been described above by way of example. Clearly, its implementation is not limited by the manner described above; any improvement that adopts the technical solution of the invention, or any direct application of the concept and technical solution of the invention to other occasions without improvement, falls within the protection scope of the invention.

Claims (3)

1. An organizational behavior anomaly detection method based on community evolution, characterized by comprising the following steps:
Step 1: Fuzzy community partition based on the EM algorithm
Step 1.1: Extract node feature vectors
The eigenvectors corresponding to the $p$ largest eigenvalues of the network adjacency matrix are taken to form the $n \times p$ feature matrix $A_t$. Each row of the feature matrix is taken as the attribute vector of the corresponding node, so that every node is mapped into a $p$-dimensional space. Here $n$ is the number of network nodes, and the attribute vector of node $m$ is
$$l_m = \left(\tilde{a}_m^{(1)}, \tilde{a}_m^{(2)}, \ldots, \tilde{a}_m^{(p)}\right)$$
Step 1.2: Partition communities with the EM algorithm
For the set of organization members $v_1, v_2, \ldots, v_n$, let $C_1, C_2, \ldots, C_k$ be $k$ fuzzy communities and $c_1, c_2, \ldots, c_k$ the centers of communities $C_1, C_2, \ldots, C_k$ respectively. $W = [w_{ij}]$ ($1 \le i \le n$, $1 \le j \le k$) is the partition matrix, where
$$w_{ij} = \frac{1/\mathrm{dist}(v_i, c_j)}{\sum_{t=1}^{k} 1/\mathrm{dist}(v_i, c_t)}$$
Given that the network is to be divided into $k$ communities, the fuzzy community partition is carried out with the EM algorithm as follows:
(1) Initialize the $k$ community centers and the partition matrix;
(2) Expectation step (E-step): compute the membership degree of each member in each community to obtain the partition matrix $W$;
(3) Maximization step (M-step): adjust the community centers according to the partition matrix obtained in the previous step;
(4) Iterate the expectation and maximization steps until the set number of iterations is reached, the community centers converge to within the expected range, or the sum of squared errors falls below the set threshold;
Step 1.3: Determine the number of communities
Let the node set of the network be $N = \{v_1, v_2, \ldots, v_n\}$ and the feature vector of node $m$ be $l_m = \left(\tilde{a}_m^{(1)}, \tilde{a}_m^{(2)}, \ldots, \tilde{a}_m^{(p)}\right)$. Let $r$ be the number of communities in the partition, $\{C_1, C_2, \ldots, C_r\}$ the set of communities, and $n_i$ the number of members of the $i$-th community. The nodes of community $C_i$ are $v_{i1}, v_{i2}, \ldots, v_{in_i}$, with attribute vectors $l_{i1}, l_{i2}, \ldots, l_{in_i}$ respectively.
Let
$$T_i = \sum_{j=1}^{n_i} l_{ij}, \quad i = 1, 2, \ldots, r$$
$$Q_1 = \sum_{i=1}^{r} T_i, \qquad Q_2 = \sum_{i=1}^{r}\sum_{j=1}^{n_i} l_{ij}^{T} l_{ij}$$
where $l_{ij}$ denotes the attribute vector of the $j$-th node in the $i$-th community,
$$S_A = \sum_{i=1}^{r} \frac{T_i^{T} T_i}{n_i} - \frac{Q_1^{T} Q_1}{n}$$
$$S_e = Q_2 - \frac{Q_1^{T} Q_1}{n} - S_A$$
Introduce the F statistic
$$F = \frac{S_A/(r-1)}{S_e/(n-r)} \overset{H_0}{\sim} F\big(p(r-1,\, n-r)\big)$$
For a given significance level $\alpha$ and number of communities $r$, $F_{1-\alpha}(p(r-1, n-r))$ can be looked up in an F distribution table. If $F > F_{1-\alpha}(p(r-1, n-r))$, statistical theory indicates a significant difference between the communities, which means the partition is reasonable. Among all partitions with different numbers of communities that satisfy $F > F_{1-\alpha}(p(r-1, n-r))$, the number of communities that maximizes the difference $F - F_{1-\alpha}$ is taken as the most reasonable number of communities, which yields the best community partition;
Step 2 corporations EVOLUTION ANALYSIS
Step 2.1 organizational roles
Cluster coefficients has been described the limit density of neighbor domain of node, and in tissue, the interactive mode of different role often can be embodied in cluster systemIn the difference of number, therefore the Local Clustering coefficient of node can reflect status and the goniochromatism of node in network to a certain extentDifferent, the cluster coefficients of nodes i is defined as follows
C ~ i = | E ( &Gamma; i ) | k i 2
Wherein ΓiFor the neighborhood of node i, i.e. node i and all direct adjacent subgraphs forming thereof, E (Γi) expression ΓiThe number on middle limitAmount, k i 2 = 1 2 k i ( k i - 1 ) For ΓiIn the limit quantity of all nodes when interconnected;
Step 2.2 Organizational role entropy
Suppose the organizational network $G$ has $n$ members in total and $t$ roles $\{j_1, j_2, \ldots, j_t\}$. By analogy with the definition of information entropy, define the organizational role entropy
$$E_h(G) = -\sum_{k=1}^{t} p_k \log_2 p_k$$
where $p_k$ is the proportion of the organization's members that hold role $j_k$,
$$p_k = \frac{|j_k|}{n}$$
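The organizational role entropy above is the Shannon entropy of the role distribution. A minimal sketch, assuming the roles are given as one label per member (the function name `role_entropy` is hypothetical):

```python
import math
from collections import Counter


def role_entropy(roles):
    """E_h = -sum_k p_k * log2(p_k), where p_k = |j_k| / n.

    roles: non-empty sequence with one role label per member.
    """
    n = len(roles)
    counts = Counter(roles)  # |j_k| for every role that occurs
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

An organization split evenly between two roles has an entropy of one bit, while an organization in which every member holds the same role has an entropy of zero.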
Step 2.3 Community role entropy
Suppose a community partition $l$ divides the network into $m$ communities, i.e., $\{C_1, C_2, \ldots, C_m\}$, each of which still contains different roles. Treating each community as a sub-organization, define the community role entropy
$$E_m(G) = -\sum_{i=1}^{m} \frac{|C_i|}{n} \times E_h(C_i)$$
where $\frac{|C_i|}{n}$ is the proportion of the whole organization accounted for by the $i$-th community, and $E_m(G)$ is the expected amount of information required to identify a member's role after the organization has been partitioned into communities by algorithm $m$;
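The community role entropy can be sketched as a size-weighted combination of the per-community role entropies. The sketch below follows the formula exactly as printed, including its leading minus sign, so the returned value is non-positive whenever $E_h$ is non-negative; the function names and the list-of-lists input format are assumptions made for illustration.

```python
import math
from collections import Counter


def role_entropy(roles):
    """Shannon entropy (base 2) of the role labels of one group."""
    n = len(roles)
    return -sum((c / n) * math.log2(c / n) for c in Counter(roles).values())


def community_role_entropy(communities):
    """E_m = -sum_i (|C_i| / n) * E_h(C_i), as printed in the source.

    communities: list of communities, each a non-empty list of the role
    labels of its members. The sign follows the printed formula, so the
    result is <= 0 when the per-community entropies are >= 0.
    """
    n = sum(len(c) for c in communities)
    return -sum(len(c) / n * role_entropy(c) for c in communities)
```

With two equally sized communities, one with a single role and one split evenly between two roles, the weighted sum of the entropies is 0.5 and the function returns -0.5.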
Step 3 Abnormal subsequence detection
Step 3.1 Determine the parameters
Given a time series of length $L$:
$$X = \{x_1, x_2, \ldots, x_L\}$$
$L$ is the length of the time series. The length of the subsequences to be detected is $l$, which is used as the window length, where $l \ll L$. Intercepting subsequences starting from $x_1$ yields $n = L - l + 1$ subsequences of length $l$ in total. The $j$-th length-$l$ subsequence $X_j$ of the time series is expressed as follows:
$$X_j = \{x_j, x_{j+1}, \ldots, x_{j+l-1}\}$$
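The windowing described above can be illustrated with a short sketch. The function name `subsequences` is hypothetical, and Python's 0-based slicing is used, so the list element at position `j - 1` corresponds to $X_j$.

```python
def subsequences(series, l):
    """Return all n = L - l + 1 sliding windows of length l.

    series: the time series x_1 ... x_L as a list.
    The window at list position j - 1 is the subsequence X_j.
    """
    if not 0 < l <= len(series):
        raise ValueError("window length l must satisfy 0 < l <= L")
    return [series[j:j + l] for j in range(len(series) - l + 1)]
```

For a series of length 5 and a window length of 3, this produces 5 - 3 + 1 = 3 subsequences.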
For subsequence $X_j$, its $p$-neighborhood subsequences ($p$ even) are defined as:
$$
Nb_p(X_j) =
\begin{cases}
\{X_2, \ldots, X_{p+1}\}, & j = 1 \\
\{X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_{p+1}\}, & 1 < j < 1 + p/2 \\
\{X_{j-p/2}, X_{j-p/2+1}, \ldots, X_{j-1}, X_{j+1}, \ldots, X_{j+p/2}\}, & 1 + p/2 \le j \le n - p/2 \\
\{X_{n-p-1}, \ldots, X_{j-1}, X_{j+1}, \ldots, X_n\}, & n - p/2 < j < n \\
\{X_{n-p}, \ldots, X_{n-1}\}, & j = n
\end{cases}
$$
where each element is a length-$l$ subsequence of the original time series, abbreviated here as
$$Nb_p(X_j) = \{X_{j(1)}, X_{j(2)}, \ldots, X_{j(p)}\}$$
Here $l$ is the subsequence length and $p$ is the neighborhood size; $l$ determines the resolution of abnormal subsequence detection, and $p$ determines the range over which an anomaly has influence;
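The piecewise neighborhood definition amounts to taking a window of $p + 1$ consecutive subsequences around $X_j$, shifted inward at both ends of the series, and removing $X_j$ itself. A minimal sketch under that reading follows; the function name `neighborhood_indices` is hypothetical, indices are 1-based as in the text, and the function always returns exactly $p$ neighbors, which means that near the right boundary the window starts at $X_{n-p}$ rather than at the $X_{n-p-1}$ printed in the fourth case.

```python
def neighborhood_indices(j, n, p):
    """Return the 1-based indices of the p neighbors of subsequence X_j.

    n: number of subsequences, p: even neighborhood size with n >= p + 1.
    Assumption: the window is clamped so that exactly p neighbors are
    returned for every j, including positions near both boundaries.
    """
    if p < 2 or p % 2 or n < p + 1 or not 1 <= j <= n:
        raise ValueError("need even p >= 2, n >= p + 1 and 1 <= j <= n")
    # First index of a window of p + 1 consecutive subsequences containing j.
    start = min(max(j - p // 2, 1), n - p)
    return [k for k in range(start, start + p + 1) if k != j]
```

With `n = 10` and `p = 4`, an interior position such as `j = 5` gets the symmetric neighbors 3, 4, 6 and 7, while `j = 1` gets 2 through 5.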
Step 3.2 Build the subsequence regression model
$X_j$ is regarded as a set of $l$ observations of the dependent variable, and the length-$l$ subsequences in $Nb_p(X_j)$ are regarded as the $p$ factors that affect $X_j$. To measure how consistent $X_j$ is with its neighborhood, $X_j$ is reconstructed as a weighted sum of the elements of $Nb_p(X_j)$, as follows:
$$\hat{X}_j = \sum_{i=1}^{p} w_{j(i)} X_{j(i)}$$
$\hat{X}_j$ is called the neighborhood reconstruction sequence of $X_j$, where the weights with which the $p$ neighborhood subsequences participate in the reconstruction are
$$W_j = \{w_{j(1)}, w_{j(2)}, \ldots, w_{j(p)}\}$$
This process can be expressed by the linear model
$$X_j(i) = w_{j(1)} X_{j(1)}(i) + \cdots + w_{j(p)} X_{j(p)}(i) + \varepsilon_j(i), \quad i = 1, 2, \ldots, l$$
Here $\varepsilon_j$ is the deviation between the reconstructed value and the actual value of $X_j$. Denote
$$
X_j = \begin{bmatrix} X_j(1) \\ X_j(2) \\ \vdots \\ X_j(l) \end{bmatrix}, \quad
N_{X_j} = \begin{bmatrix}
X_{j(1)}(1) & X_{j(2)}(1) & \cdots & X_{j(p)}(1) \\
X_{j(1)}(2) & X_{j(2)}(2) & \cdots & X_{j(p)}(2) \\
\vdots & \vdots & \ddots & \vdots \\
X_{j(1)}(l) & X_{j(2)}(l) & \cdots & X_{j(p)}(l)
\end{bmatrix}, \quad
W_j = \begin{bmatrix} w_{j(1)} \\ w_{j(2)} \\ \vdots \\ w_{j(p)} \end{bmatrix}, \quad
\varepsilon_j = \begin{bmatrix} \varepsilon_j(1) \\ \varepsilon_j(2) \\ \vdots \\ \varepsilon_j(l) \end{bmatrix}
$$
Then $X_j = N_{X_j} W_j + \varepsilon_j$ is called the subsequence neighborhood regression model;
Step 3.3 Compute the consistency factor
$w_{j(i)}$ is the $i$-th regression coefficient in the model, and is also the weight of the $i$-th neighborhood subsequence in the linear reconstruction of $X_j$. Each subsequence likewise has $p$ weights with which it participates in reconstructing the $p$ subsequences of its neighborhood; these are denoted as the reconstruction weight vector of $X_j$, $F_j = (f_{j(1)}, f_{j(2)}, \ldots, f_{j(p)})$;
$\|F_j\|$ and $\|\varepsilon_j\|$ are used to construct a consistency factor that measures how consistent a subsequence is with its neighborhood. The consistency factor of subsequence $X_j$ is defined as
$$ac_j = \frac{\|F_j\|}{\|\varepsilon_j\|}$$
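The ratio above can be computed directly once $F_j$ and $\varepsilon_j$ are available. The following sketch assumes Euclidean norms and returns infinity when the reconstruction deviation is exactly zero; both choices, and the function name `consistency_factor`, are assumptions for illustration.

```python
import math


def consistency_factor(f_j, eps_j):
    """ac_j = ||F_j|| / ||eps_j|| using Euclidean norms.

    f_j:   weights with which X_j takes part in reconstructing its neighbors.
    eps_j: reconstruction deviation of X_j, given as a sequence
           (a scalar deviation can be passed as a one-element list).
    Assumption: a zero deviation maps to math.inf instead of raising.
    """
    norm_f = math.sqrt(sum(w * w for w in f_j))
    norm_eps = math.sqrt(sum(e * e for e in eps_j))
    return math.inf if norm_eps == 0 else norm_f / norm_eps
```

A subsequence that contributes little to its neighbors and is poorly reconstructed by them gets a small factor, which is what the later valley search looks for.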
The present invention solves for the reconstruction weights by minimizing the reconstruction deviation, with the normalization condition on the weights as the constraint;
If the reconstruction result for $X_j$ is $\hat{\hat{X}}_j$, the reconstruction deviation of $X_j$ is naturally defined as the two-norm of the difference between the vector $\hat{\hat{X}}_j$ and $X_j$,
$$\varepsilon_j = \|X_j - \hat{\hat{X}}_j\|_2$$
where
$$\hat{\hat{X}}_j = \sum_{i=1}^{p} w_{j(i)} X_{j(i)}$$
The optimization problem is defined as follows
$$\min \|X_j - \hat{\hat{X}}_j\|_2$$
$$\text{s.t.} \quad \sum_{i=1}^{p} w_{j(i)} = 1$$
In the formulas above, the first expression is the objective function, which minimizes the reconstruction deviation, and the second is the normalization constraint on the reconstruction weights. Solving this optimization problem yields the weights $W_j$ with which $X_j$ is reconstructed and the final reconstruction error $\varepsilon_j$
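One way to solve this equality-constrained least-squares problem is through its Lagrangian (KKT) linear system. The sketch below is an illustration under assumptions, not the method prescribed by the patent: it minimizes the squared two-norm (which has the same minimizer), assumes the neighbor subsequences are linearly independent, and uses a small hand-written Gaussian elimination so that it needs only the standard library. The names `solve_linear` and `reconstruction_weights` are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def reconstruction_weights(target, neighbors):
    """Minimize ||target - sum_i w_i * neighbors[i]||_2 with sum_i w_i = 1.

    target:    the subsequence X_j (length l).
    neighbors: its p neighborhood subsequences, each of length l.
    Returns (weights, reconstruction_error).
    """
    p, l = len(neighbors), len(target)
    # Gram matrix G = N^T N and vector c = N^T x
    gram = [[sum(neighbors[a][k] * neighbors[b][k] for k in range(l))
             for b in range(p)] for a in range(p)]
    c = [sum(neighbors[a][k] * target[k] for k in range(l)) for a in range(p)]
    # KKT system: 2 G w + lambda * 1 = 2 c  and  1^T w = 1
    kkt = [[2 * gram[a][b] for b in range(p)] + [1.0] for a in range(p)]
    kkt.append([1.0] * p + [0.0])
    solution = solve_linear(kkt, [2 * v for v in c] + [1.0])
    weights = solution[:p]
    recon = [sum(weights[i] * neighbors[i][k] for i in range(p)) for k in range(l)]
    error = math.sqrt(sum((target[k] - recon[k]) ** 2 for k in range(l)))
    return weights, error
```

When a subsequence is an exact average of its two neighbors, the recovered weights are 0.5 each and the reconstruction error is numerically zero.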
Carry out the above least-squares estimation or optimization process for every subsequence $X_i$. From all of the resulting reconstruction weights, collect the $p$ weights with which $X_i$ participates in reconstruction, i.e., the regression coefficients $F_i$, and obtain the consistency factor sequence
$$ac = \left\{ \frac{\|F_1\|}{\|\varepsilon_1\|}, \frac{\|F_2\|}{\|\varepsilon_2\|}, \ldots, \frac{\|F_{L-l+1}\|}{\|\varepsilon_{L-l+1}\|} \right\}$$
For abnormal subsequence detection, the consistency factor curve of the subsequences is plotted; the subsequences corresponding to the valleys of the curve are the abnormal subsequences.
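The text identifies anomalies visually, from the valleys of the plotted curve. As a rough programmatic stand-in, the sketch below returns the positions of strict local minima of the consistency factor sequence. Treating every strict local minimum as a valley is an assumption, and in practice some additional depth or threshold criterion would probably be needed; the function name `valley_indices` is hypothetical.

```python
def valley_indices(ac):
    """Return 0-based positions of strict local minima in the sequence ac.

    Position i in the result corresponds to subsequence X_{i+1}.
    Assumption: a valley is any interior point strictly lower than both
    of its immediate neighbors; endpoints are never reported.
    """
    return [i for i in range(1, len(ac) - 1)
            if ac[i] < ac[i - 1] and ac[i] < ac[i + 1]]
```

The returned positions can then be mapped back to the time range covered by each flagged subsequence.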
2. The organizational behavior anomaly detection method based on community evolution according to claim 1, characterized in that: in step 2.1, min-max normalization is used to quantize the clustering coefficients of all nodes to the range [0, 5], and the rounded value is used as the role label of each node.
3. The organizational behavior anomaly detection method based on community evolution according to claim 1, characterized in that: when determining the parameters in step 3.1, $p$ is set to a relatively large value smaller than $l$, with $l/p \in (1, 2)$.
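The role labelling of claim 2 can be sketched as a min-max rescaling of the clustering coefficients followed by rounding. The function name `role_labels`, the round-half-up rule, and the handling of the case where all coefficients are equal are assumptions; the claim itself only specifies min-max normalization to [0, 5] and rounding.

```python
def role_labels(coefficients, upper=5):
    """Map clustering coefficients to integer role labels in [0, upper].

    Assumptions: values are rounded half up, and if all coefficients are
    equal (so min-max scaling is undefined) every node gets label 0.
    """
    lo, hi = min(coefficients), max(coefficients)
    if hi == lo:
        return [0] * len(coefficients)
    # Min-max normalization to [0, upper], then round half up.
    return [int((c - lo) / (hi - lo) * upper + 0.5) for c in coefficients]
```

The resulting labels can be fed, one per member, into the role entropy computations of steps 2.2 and 2.3.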
CN201610051992.XA 2016-01-26 2016-01-26 Organizational behavior anomaly detection method based on community evolution Pending CN105608329A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610051992.XA CN105608329A (en) 2016-01-26 2016-01-26 Organizational behavior anomaly detection method based on community evolution

Publications (1)

Publication Number Publication Date
CN105608329A true CN105608329A (en) 2016-05-25

Family

ID=55988258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610051992.XA Pending CN105608329A (en) 2016-01-26 2016-01-26 Organizational behavior anomaly detection method based on community evolution

Country Status (1)

Country Link
CN (1) CN105608329A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060080422A1 (en) * 2004-06-02 2006-04-13 Bernardo Huberman System and method for discovering communities in networks
CN102682050A (en) * 2011-11-14 2012-09-19 吉林大学 Multiple structure mode characterization and discovery method for complex network
CN103136337A (en) * 2013-02-01 2013-06-05 北京邮电大学 Distributed knowledge data mining device and mining method used for complex network

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127231A (en) * 2016-06-16 2016-11-16 中国人民解放军国防科学技术大学 A kind of crime individual discrimination method based on the information Internet
CN106327340A (en) * 2016-08-04 2017-01-11 中国银联股份有限公司 Method and device for detecting abnormal node set in financial network
CN106327340B (en) * 2016-08-04 2022-01-07 中国银联股份有限公司 Abnormal node set detection method and device for financial network
CN106533742A (en) * 2016-10-31 2017-03-22 天津大学 Time sequence mode representation-based weighted directed complicated network construction method
CN106533742B (en) * 2016-10-31 2019-05-14 天津大学 Weighting directed complex networks networking method based on time sequence model characterization
CN106792523A (en) * 2016-12-10 2017-05-31 武汉白虹软件科技有限公司 A kind of anomaly detection method based on extensive WiFi event traces
CN106780263A (en) * 2017-01-13 2017-05-31 中电科新型智慧城市研究院有限公司 High-risk personnel analysis and recognition methods based on big data platform
CN106780263B (en) * 2017-01-13 2020-10-02 中电科新型智慧城市研究院有限公司 High-risk personnel analysis and identification method based on big data platform
CN107133782B (en) * 2017-05-03 2020-06-09 扬州大学 E-mail network evolution method based on user information characteristics
CN107133782A (en) * 2017-05-03 2017-09-05 扬州大学 Electronic mail network evolution method based on user profile feature
WO2019042060A1 (en) * 2017-08-30 2019-03-07 腾讯科技(深圳)有限公司 Method and apparatus for determining member role, and storage medium
CN110309134A (en) * 2019-05-31 2019-10-08 国网上海市电力公司 The power distribution network multiplexing electric abnormality detection method to be developed based on electricity consumption transfer of behavior and community
CN115841654A (en) * 2023-02-20 2023-03-24 松立控股集团股份有限公司 Abnormal event detection method based on high-order monitoring video

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20160525
