CN105608329A

CN105608329A - Organizational behavior anomaly detection method based on community evolution

Info

Publication number: CN105608329A
Application number: CN201610051992.XA
Authority: CN
Inventors: 程光权; 韩养胜; 黄金才; 刘忠; 谢福利; 胡松超; 马扬; 李帅; 修保新; 冯旸赫; 陈超
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2016-01-26
Filing date: 2016-01-26
Publication date: 2016-05-25

Abstract

The invention discloses an organizational behavior anomaly detection method based on community evolution. The method comprises fuzzy community partition based on the EM (Expectation-Maximization) algorithm, community evolution analysis, and anomalous subsequence detection. With this method, organizational change can be described at a meso scale. The method is highly sensitive to changes in the status and roles of organization members, in the amount and frequency of interaction, and in the direction of organizational evolution, and it avoids the loss of detail that can occur when organizational dynamics are examined only at the level of the whole organization. Anomalies at different time scales can be obtained by adjusting the subsequence length and the number of neighborhood subsequences. A consistency factor constructed from the reconstruction weights and the reconstruction error amplifies the difference between a subsequence and its neighborhood, improving the resolution and robustness of anomaly detection.

Description

Organizational behavior anomaly detection method based on community evolution

Technical field

The invention belongs to the field of organizational dynamics analysis and specifically relates to an organizational behavior anomaly detection algorithm based on community evolution, applicable to the analysis of organizational behavior.

Background art

An organization is a group formed by closely connected social individuals. Organizations evolve dynamically, and their functioning depends on cooperation and interaction among their members. Taking social organizations as an example, with the rapid development of information technology and deepening globalization, the internal connections of social organizations have become tighter and the dependencies between organizations increasingly strong. While this brings convenience and higher efficiency, it also means that a local change can trigger large-scale cascading effects. For example, in the economic field the outbreak of the US subprime crisis spread to the worldwide economy, and in the field of social security a chain of terrorist incidents of various kinds has seriously affected normal social order. It is therefore very important to describe organizational evolution dynamics accurately from the available information and to detect anomalous changes quickly. Organizational evolution depends on, and is embodied in, the interactions among organization members; the organization network formed in this way carries the exchange of material, information, or energy among members. The organization network can therefore serve as the carrier for studying organizational behavior, and applying network-science methods to study organizations is currently the usual means of analyzing organizational behavior.

Organizational behavior anomaly detection can be divided into two processes: first, describing organizational dynamics; second, detecting anomalies in the resulting sequence of organizational dynamics. Organizational dynamics can be described, on the whole, by the time series of similarities between the organization networks at adjacent moments. Such methods are usually based on changes in the network's adjacency matrix, node metrics, and edges. At present there are mainly five kinds of network similarity measures: methods based on element overlap, on node ranking, on vector similarity, on sequence similarity, and on matrix cosine.

Anomaly detection on the sequence of organizational dynamics is commonly carried out with a Shewhart control chart, first proposed by W. A. Shewhart of the United States in 1924. Since then the Shewhart control chart has been an important tool of scientific management and, in quality management in particular, an indispensable one. It is a chart with control limits used to distinguish whether a quality fluctuation has a chance cause or a systematic cause; it can signal the presence of a systematic cause and thereby show whether a production process is in a state of control.

Let y_t be the current value of the monitored time series and u_t its baseline value. According to the Shewhart model, the current data point is declared anomalous when |y_t - u_t| > cσ_t, where

u_{t} = \frac{1}{B} Σ_{b = 1}^{B} y_{t - b - g}, \quad σ_{t}^{2} = \frac{1}{B - 1} Σ_{b = 1}^{B} (y_{t - b - g} - u_{t})^{2} .

In the Shewhart control chart model, the look-back span B used to compute the expected value and the time gap g must both be chosen according to the characteristics of the data under study.
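As a rough illustration of this baseline rule, the following Python sketch flags points of a series with the Shewhart test. It assumes NumPy, a user-chosen control-limit coefficient c (the text does not fix its value), and the hypothetical function name `shewhart_flags`; it only illustrates the prior-art rule the invention sets out to improve on.

```python
import numpy as np


def shewhart_flags(y, B, g, c):
    """Return a boolean array marking points with |y_t - u_t| > c * sigma_t.

    u_t and sigma_t use the B baseline values y[t-b-g], b = 1..B, so the first
    B + g points have no baseline and are never flagged. Requires B >= 2.
    """
    y = np.asarray(y, dtype=float)
    flags = np.zeros(len(y), dtype=bool)
    for t in range(B + g, len(y)):
        base = y[t - g - B : t - g]      # y[t-B-g], ..., y[t-1-g]
        u = base.mean()                  # u_t
        sigma = base.std(ddof=1)         # sigma_t with the 1/(B-1) factor
        flags[t] = abs(y[t] - u) > c * sigma
    return flags
```

For example, `shewhart_flags([1, 2, 1, 2, 1, 2, 1, 2, 10], B=4, g=0, c=3)` flags only the final jump. Because each point is judged on its own, a sustained anomaly that has already entered the baseline window is no longer flagged, which is the weakness discussed below.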

Driven by the need for internal functional adjustment or by external environmental factors, the status and roles of organization members, as well as the amount and frequency of their interaction, change at the micro scale; at a larger scale, members may form new aggregation regions, which changes the community structure of the organization. Experiments show that describing organizational dynamics by organization network similarity has the following shortcomings: 1) similarity is by definition undirected and therefore insensitive to the direction of organizational evolution; for example, community splitting and community merging, two evolutionary processes in opposite directions, can yield the same similarity curve; 2) organizational evolution is usually gradual, with quantitative change accumulating into qualitative change that leads to a new stage, and similarity-based methods cannot resolve this gradual change or describe the details of the stages of organizational evolution.

The shortcomings of the Shewhart control chart are its low power to detect small shifts, its high sensitivity to the normality assumption, and its susceptibility to outliers. Moreover, the Shewhart control chart detects anomalies at single points, whereas anomalous organizational behavior tends to persist for a period of time; when the target data fall inside an anomalous time period, the Shewhart control chart can give a wrong judgment.

Summary of the invention

The general idea of the invention is as follows:

To address the shortcomings of describing organizational dynamics by organization network similarity, quantitative indices are defined to describe community evolution and, through it, organizational dynamics. Analyzing community evolution captures the dynamic features of an organization at the meso scale and provides more detail than dynamic analysis based on the overall similarity of the organization.

To address the above shortcomings, including those of the Shewhart control chart, the invention: 1) proposes an index, based on the F test, for evaluating the appropriate number of communities, and applies it to fuzzy community partition; the main purpose of this part is to partition every network in the organization network sequence into communities effectively and accurately, laying a solid foundation for the subsequent analysis of organizational evolution; 2) proposes a community evolution analysis index based on community role entropy. In an organization, the distribution of roles within communities is closely related to their function and behavior. The invention describes the role of a node by its local clustering coefficient in the organization network and, borrowing the idea of information entropy, introduces community role entropy, which reflects the heterogeneity of the distribution of member roles within communities; 3) proposes an anomalous subsequence test based on neighborhood consistency. An anomalous subsequence is defined as a subsequence of a time series that deviates strongly from its neighborhood subsequences, and this deviation can be characterized by the consistency between the subsequence and its neighborhood. A multiple linear regression model describes the reconstruction of a subsequence from its neighborhood subsequences; the regression coefficients (reconstruction weights) and the reconstruction deviation define a consistency factor describing the consistency between the subsequence and its neighborhood, and two methods, one based on least-squares estimation and one based on deviation optimization, are given for computing the consistency factor.

Specifically, an organizational behavior anomaly detection method based on community evolution is characterized by comprising the following steps:

Step 1: Fuzzy community partition based on the EM algorithm

Step 1.1: Extract node feature vectors

Take the eigenvectors of the network's adjacency matrix that correspond to its p largest eigenvalues, obtaining the n × p feature matrix A_t, and take each row of the feature matrix as the attribute vector of the corresponding node, so that each node is mapped into a p-dimensional space; n is the number of network nodes. The attribute vector of node m is

l_{m} = ({\tilde{a}}_{m}^{(1)}, {\tilde{a}}_{m}^{(2)}, ..., {\tilde{a}}_{m}^{(p)})
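Step 1.1 can be sketched in a few lines of NumPy. This is a minimal sketch assuming an undirected network, so that the adjacency matrix is symmetric and `numpy.linalg.eigh` applies; the function name `node_attribute_vectors` is hypothetical.

```python
import numpy as np


def node_attribute_vectors(adj, p):
    """Row m of the result is the attribute vector l_m of node m (step 1.1)."""
    A = np.asarray(adj, dtype=float)
    # eigh assumes a symmetric (undirected) adjacency matrix; it returns the
    # eigenvalues in ascending order and the eigenvectors as columns.
    eigvals, eigvecs = np.linalg.eigh(A)
    top = np.argsort(eigvals)[::-1][:p]   # indices of the p largest eigenvalues
    return eigvecs[:, top]                # n x p feature matrix A_t
```

For two disjoint triangles and p = 2, nodes of the same triangle receive identical attribute vectors, which is what lets the subsequent clustering separate them.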

Step 1.2: Partition communities with the EM algorithm

Let {v₁,v₂,…,v_n} be the set of organization members, C₁,C₂,…,C_k the k fuzzy communities, c₁,c₂,…,c_k the centers of communities C₁,C₂,…,C_k respectively, and W=[w_ij] (1≤i≤n, 1≤j≤k) the partition matrix, where

w_{i j} = \frac{1 / \mathrm{dist}(v_{i}, c_{j})}{Σ_{t = 1}^{k} 1 / \mathrm{dist}(v_{i}, c_{t})}

With the number of communities k given, the fuzzy community partition is computed with the EM algorithm as follows:

(1) Initialize the k community centers and the partition matrix;

(2) Expectation step (E-step): compute the membership degree of each member in each community to obtain the partition matrix W;

(3) Maximization step (M-step): adjust the community centers according to the partition matrix obtained in the previous step;

(4) Iterate the E-step and M-step until the set number of iterations is reached, the community centers converge to within the expected range, or the sum of squared errors falls below a set threshold;
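The four steps above can be sketched as follows. The text gives the E-step membership formula but not the M-step update or the initialization, so this sketch assumes Euclidean distance, a farthest-first initialization, and a membership-weighted mean as the center update; a small floor avoids division by zero when a center coincides with a node. The name `fuzzy_em_partition` is hypothetical.

```python
import numpy as np


def fuzzy_em_partition(X, k, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy partition of node attribute vectors X (n x p) into k communities.

    Assumptions (not fixed by the text): Euclidean dist, farthest-first
    initialisation, and a membership-weighted mean as the M-step update.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # (1) initialise centres: a random node, then repeatedly the farthest node
    idx = [int(rng.integers(len(X)))]
    for _ in range(1, k):
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2).min(axis=1)
        idx.append(int(np.argmax(d)))
    centers = X[idx].copy()
    for _ in range(n_iter):
        # (2) E-step: w_ij proportional to 1 / dist(v_i, c_j), rows sum to 1
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        inv = 1.0 / np.maximum(dist, 1e-12)
        W = inv / inv.sum(axis=1, keepdims=True)
        # (3) M-step (assumed form): membership-weighted mean of the nodes
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # (4) stop when the centres no longer move
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return W, centers
```

A hard community assignment, if one is needed later, can be taken as `W.argmax(axis=1)`.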

Step 1.3: Determine the number of communities

Let N={v₁,v₂,…,v_n} be the node set and l_m the feature vector of node m. Let r be the number of communities in the partition, {C₁,C₂,…,C_r} the set of communities, and n_i the number of members of the i-th community; the nodes of community C_i are v_i1,v_i2,…,v_in_i, with attribute vectors l_i1,l_i2,…,l_in_i respectively.

Let

T_{i} = Σ_{j = 1}^{n_{i}} l_{i j}, i = 1, 2, ..., r

Q_{1} = Σ_{i = 1}^{r} T_{i}, Q_{2} = Σ_{i = 1}^{r} Σ_{j = 1}^{n_{i}} l_{i j}^{T} l_{i j}

where l_ij denotes the attribute vector of the j-th node in the i-th community.

Further, let

S_{A} = Σ_{i = 1}^{r} \frac{T_{i}^{T} T_{i}}{n_{i}} - \frac{Q_{1}^{T} Q_{1}}{n}

S_{e} = Q_{2} - \frac{Q_{1}^{T} Q_{1}}{n} - S_{A}

Introduce the F statistic

F = \frac{S_{A} / (r - 1)}{S_{e} / (n - r)} \overset{H_0}{\sim} F(p(r - 1), p(n - r))

For a given significance level α and number of communities r, the critical value F_{1-α}(p(r-1), p(n-r)) can be read from an F-distribution table. If F > F_{1-α}(p(r-1), p(n-r)), statistical theory indicates that the communities differ significantly, i.e. the partition is reasonable. Among the partitions with different numbers of communities that satisfy F > F_{1-α}(p(r-1), p(n-r)), the number of communities that maximizes the difference F - F_{1-α} is taken as the most reasonable number of communities, which in turn yields the best community partition.
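The F criterion above can be sketched as follows for a hard partition (for example the argmax of the fuzzy partition matrix, which is an assumption, since the text does not say how fuzzy memberships enter the test). The sums of squares are pooled over the p coordinates as in the formulas above; `f_statistic` and `best_community_count` are hypothetical names, and the critical value is supplied by the caller.

```python
import numpy as np


def f_statistic(L, labels):
    """F statistic of step 1.3 for attribute vectors L (n x p) and hard labels."""
    L = np.asarray(L, dtype=float)
    labels = np.asarray(labels)
    n, groups = len(L), np.unique(labels)
    r = len(groups)
    Q1 = L.sum(axis=0)                 # Q_1 = sum_i T_i
    Q2 = float(np.sum(L * L))          # Q_2 = sum_ij l_ij^T l_ij
    S_A = -float(Q1 @ Q1) / n
    for g in groups:
        members = L[labels == g]
        T = members.sum(axis=0)        # T_i
        S_A += float(T @ T) / len(members)
    S_e = Q2 - float(Q1 @ Q1) / n - S_A
    return (S_A / (r - 1)) / (S_e / (n - r))


def best_community_count(F_by_r, crit):
    """Among r with F > crit(r), return the r maximising F - crit(r) (None if none)."""
    margins = {r: F - crit(r) for r, F in F_by_r.items() if F > crit(r)}
    return max(margins, key=margins.get) if margins else None
```

With SciPy available, a critical-value function could be `lambda r: scipy.stats.f.ppf(1 - alpha, p * (r - 1), p * (n - r))`; reading the degrees of freedom as p(r-1) and p(n-r) is our assumption about the notation.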

Step 2: Community evolution analysis

Step 2.1: Organizational roles

The clustering coefficient describes the edge density in the neighborhood of a node. In an organization, the interaction patterns of different roles are often reflected in differences in clustering coefficient, so the local clustering coefficient of a node can, to a certain extent, reflect the node's status and role in the network. The clustering coefficient of node i is defined as follows

{\tilde{C}}_{i} = \frac{| E (Γ_{i}) |}{\binom{k_{i}}{2}}

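where Γ_i is the neighborhood of node i, i.e. the subgraph formed by node i and all nodes directly adjacent to it; |E(Γ_i)| is the number of edges in Γ_i between neighbors of node i; k_i is the degree of node i; and \binom{k_i}{2} = k_i(k_i - 1)/2 is the number of such edges when all the neighbors are interconnected;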
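The clustering coefficient and the role labels of the preferred embodiment (min-max normalization to [0, 5], then rounding) can be sketched as below. The sketch counts |E(Γ_i)| as the edges among the neighbors of i, the usual convention that matches the binomial denominator, and assigns 0 to nodes with fewer than two neighbors; both are our assumptions, as are the names `local_clustering` and `roles_from_clustering`.

```python
import numpy as np


def local_clustering(adj):
    """C_i = |E(Gamma_i)| / C(k_i, 2) for an undirected, unweighted adjacency matrix.

    Edges among the neighbours of i equal the triangles through i, read off the
    diagonal of A^3 / 2. Nodes with fewer than two neighbours get 0 (assumed).
    """
    A = np.asarray(adj, dtype=float)
    k = A.sum(axis=1)
    links = np.diag(A @ A @ A) / 2.0
    pairs = k * (k - 1) / 2.0
    return np.divide(links, pairs, out=np.zeros_like(links), where=pairs > 0)


def roles_from_clustering(C, top=5):
    """Preferred embodiment: min-max normalise C to [0, top] and round to integers."""
    C = np.asarray(C, dtype=float)
    span = C.max() - C.min()
    if span == 0:                     # all coefficients equal: a single role
        return np.zeros(len(C), dtype=int)
    return np.rint((C - C.min()) / span * top).astype(int)
```

For a triangle 0-1-2 with a pendant node 3 attached to node 0, the coefficients are [1/3, 1, 1, 0] and the role labels [2, 5, 5, 0].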

Step 2.2: Organizational role entropy

Suppose the organization network G has n members in total and t kinds of roles {j₁,j₂,…,j_t}. By analogy with the definition of information entropy, the organizational role entropy is defined as

E_{h} (G) = - Σ_{k = 1}^{t} p_{k} \log_{2} p_{k}

where p_k is the proportion of organization members having role j_k,

p_{k} = \frac{| j_{k} |}{n}
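The role entropy E_h is the Shannon entropy (base 2) of the role distribution. A minimal sketch, taking a list of role labels per member; `role_entropy` is a hypothetical name.

```python
import numpy as np
from collections import Counter


def role_entropy(roles):
    """E_h(G) = -sum_k p_k log2 p_k, with p_k = |j_k| / n over the roles present."""
    counts = np.array(list(Counter(roles).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))
```

Two equally common roles give 1 bit, four equally common roles give 2 bits, and a single role gives 0.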

Step 2.3: Community role entropy

Suppose a community partition divides the network into m communities {C₁,C₂,…,C_m}, each of which may still contain different roles. Treating each community as a sub-organization, the community role entropy is defined as

E_{m} (G) = Σ_{i = 1}^{m} \frac{| C_{i} |}{n} \times E_{h} (C_{i})

where |C_i|/n is the proportion of the whole organization accounted for by the i-th community, and E_m(G) is the expected amount of information needed to identify member roles after the organization has been partitioned into communities by the partition algorithm;
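Because E_h is already nonnegative, the sketch below computes E_m(G) as the size-weighted average of E_h(C_i) over communities, which matches its description as an expected information content. It assumes a hard partition given as one community label per member; `community_role_entropy` is a hypothetical name.

```python
import numpy as np
from collections import Counter


def _entropy(roles):
    # E_h of a group: -sum_k p_k log2 p_k over the roles present in the group
    counts = np.array(list(Counter(roles).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


def community_role_entropy(roles, communities):
    """E_m(G): size-weighted average of E_h(C_i) over a hard community partition."""
    roles = np.asarray(roles)
    communities = np.asarray(communities)
    n = len(roles)
    total = 0.0
    for c in np.unique(communities):
        members = roles[communities == c].tolist()
        total += len(members) / n * _entropy(members)
    return total
```

If every community holds a single role the value is 0; if every community mixes the roles as evenly as the whole organization, E_m(G) equals E_h(G).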

Step 3: Anomalous subsequence detection

Step 3.1: Determine parameters

Given a time series of length L:

X = {x₁,x₂,…,x_L}

L is the length of the time series. Let l be the length of the subsequences to be detected, used as the window length, with l ≪ L. Subsequences are extracted starting from x₁, giving n = L - l + 1 subsequences of length l in total; the j-th subsequence X_j of the time series is

X_j = {x_j, x_{j+1}, …, x_{j+l-1}}

For subsequence X_j, its p-neighborhood (p even) is defined as:

\mathrm{Nb}_{p}(X_{j}) = \begin{cases} \{X_{2}, ..., X_{p + 1}\}, & j = 1 \\ \{X_{1}, ..., X_{j - 1}, X_{j + 1}, ..., X_{p + 1}\}, & 1 < j < 1 + p / 2 \\ \{X_{j - p / 2}, ..., X_{j - 1}, X_{j + 1}, ..., X_{j + p / 2}\}, & 1 + p / 2 \leq j \leq n - p / 2 \\ \{X_{n - p}, ..., X_{j - 1}, X_{j + 1}, ..., X_{n}\}, & n - p / 2 < j < n \\ \{X_{n - p}, ..., X_{n - 1}\}, & j = n \end{cases}

where each element is a length-l subsequence of the original time series; for brevity this is written as

\mathrm{Nb}_{p}(X_{j}) = \{X_{j}^{(1)}, X_{j}^{(2)}, ..., X_{j}^{(p)}\}

Here l is the subsequence length and p is the number of neighbors; l determines the resolution of the anomalous subsequences and p determines the range over which an anomaly has effect;
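The windowing and neighborhood rule of step 3.1 can be sketched as follows, using the 1-based indices of the text. Every neighborhood has exactly p members, so the two right-boundary cases start at X_{n-p}. The sketch assumes NumPy, p even, and n ≥ p + 1; `subsequences` and `neighborhood` are hypothetical names.

```python
import numpy as np


def subsequences(x, l):
    """All n = L - l + 1 length-l windows; row j-1 is X_j (1-based in the text)."""
    x = np.asarray(x, dtype=float)
    return np.array([x[s:s + l] for s in range(len(x) - l + 1)])


def neighborhood(j, n, p):
    """1-based indices of Nb_p(X_j); p even and n >= p + 1 are assumed."""
    h = p // 2
    if j < 1 + h:                      # j = 1 and 1 < j < 1 + p/2
        window = range(1, p + 2)
    elif j > n - h:                    # n - p/2 < j < n and j = n
        window = range(n - p, n + 1)
    else:                              # 1 + p/2 <= j <= n - p/2
        window = range(j - h, j + h + 1)
    return [i for i in window if i != j]
```

For n = 8 and p = 4, X_1 has neighbors X_2 to X_5, X_5 has X_3, X_4, X_6, X_7, and X_8 has X_4 to X_7.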

Step 3.2: Build the subsequence regression model

Regard X_j as a set of l observations of the dependent variable and the p subsequences in Nb_p(X_j) as p factors affecting X_j. To measure the consistency of X_j with its neighborhood, X_j is reconstructed as a weighted sum of the elements of Nb_p(X_j):

{\hat{X}}_{j} = Σ_{i = 1}^{p} w_{j}^{(i)} X_{j}^{(i)}

\hat{X}_j is called the neighborhood reconstruction sequence of X_j, where the weights with which the p neighborhood subsequences participate in the reconstruction are

W_{j} = {w_{j}^{(1)}, w_{j}^{(2)}, ..., w_{j}^{(p)}}

This process can be expressed by the linear model

X_{j}(i) = w_{j}^{(1)} X_{j}^{(1)}(i) + ... + w_{j}^{(p)} X_{j}^{(p)}(i) + ϵ_{j}(i), \quad i = 1, 2, ..., l

where ε_j is the deviation between the reconstruction of X_j and its actual value. Let

X_{j} = \begin{bmatrix} X_{j}(1) \\ X_{j}(2) \\ \vdots \\ X_{j}(l) \end{bmatrix}, \quad N_{X_{j}} = \begin{bmatrix} X_{j}^{(1)}(1) & X_{j}^{(2)}(1) & \cdots & X_{j}^{(p)}(1) \\ X_{j}^{(1)}(2) & X_{j}^{(2)}(2) & \cdots & X_{j}^{(p)}(2) \\ \vdots & \vdots & \ddots & \vdots \\ X_{j}^{(1)}(l) & X_{j}^{(2)}(l) & \cdots & X_{j}^{(p)}(l) \end{bmatrix}, \quad W_{j} = \begin{bmatrix} w_{j}^{(1)} \\ w_{j}^{(2)} \\ \vdots \\ w_{j}^{(p)} \end{bmatrix}, \quad ϵ_{j} = \begin{bmatrix} ϵ_{j}(1) \\ ϵ_{j}(2) \\ \vdots \\ ϵ_{j}(l) \end{bmatrix}

X_j = N_{X_j} W_j + ε_j is called the subsequence neighborhood regression model;
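The least-squares route mentioned in the summary can be sketched as an ordinary least-squares fit of this regression model. A well-posed fit needs l ≥ p, which the preferred setting l/p ∈ (1,2) satisfies. The sketch assumes NumPy; `neighborhood_regression` is a hypothetical name.

```python
import numpy as np


def neighborhood_regression(x_j, N):
    """Least-squares fit of X_j = N_{X_j} W_j + eps_j.

    x_j: length-l target subsequence; N: l x p matrix whose columns are the
    neighbourhood subsequences. Returns the coefficients W_j and residuals eps_j.
    """
    x_j = np.asarray(x_j, dtype=float)
    N = np.asarray(N, dtype=float)
    W, *_ = np.linalg.lstsq(N, x_j, rcond=None)
    eps = x_j - N @ W
    return W, eps
```

Unlike the constrained variant described next, this fit does not force the weights to sum to one.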

Step 3.3: Compute the consistency factor

w_j^{(i)}, the i-th regression coefficient of the model, is the weight of the i-th neighborhood subsequence in the linear reconstruction of X_j. Conversely, each subsequence also has p weights with which it participates in reconstructing the p subsequences of its neighborhood; these are recorded as the reconstruction weight vector of X_j

F_{j} = (f_{j}^{(1)}, f_{j}^{(2)}, ..., f_{j}^{(p)});

From ||F_j|| and ||ε_j||, a consistency factor measuring the consistency of a subsequence with its neighborhood is constructed; the consistency factor of subsequence X_j is defined as

ac_{j} = \frac{\| F_{j} \|}{\| ϵ_{j} \|}

The invention solves for the reconstruction weights by optimizing the reconstruction deviation, with the normalization condition on the weights as a constraint;

If the reconstruction of X_j is \hat{\hat{X}}_j, the reconstruction deviation of X_j is naturally defined as the 2-norm of the difference between \hat{\hat{X}}_j and X_j, i.e.

ϵ_{j} = \| X_{j} - {\hat{\hat{X}}}_{j} \|_{2}

Wherein

{\hat{\hat{X}}}_{j} = Σ_{i = 1}^{p} w_{j}^{(i)} X_{j}^{(i)}

The optimization problem is defined as follows

\min \| X_{j} - {\hat{\hat{X}}}_{j} \|_{2}

\text{s.t.} \ Σ_{i = 1}^{p} w_{j}^{(i)} = 1

In the formulation above, the first line is the objective, which minimizes the reconstruction deviation, and the second line is the normalization constraint on the reconstruction weights. Solving this optimization yields the weights W_j with which X_j is reconstructed and the final reconstruction error ε_j.
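One way to solve this constrained problem is through its Lagrangian: minimizing the squared norm ||X_j - N w||² + λ(Σw - 1) gives a small linear (KKT) system. The sketch below assumes NumPy and solves that system with `numpy.linalg.lstsq`, which still returns a solution when the neighborhood matrix is rank-deficient; `constrained_reconstruction` is a hypothetical name, and the text does not prescribe this particular solver.

```python
import numpy as np


def constrained_reconstruction(x_j, N):
    """Solve min ||x_j - N w||_2 subject to sum(w) = 1.

    Minimising the squared norm gives the same w; its Lagrangian
    ||x_j - N w||^2 + lam * (sum(w) - 1) yields the KKT system
    [2 N^T N, 1; 1^T, 0] [w; lam] = [2 N^T x_j; 1].
    """
    x_j = np.asarray(x_j, dtype=float)
    N = np.asarray(N, dtype=float)
    p = N.shape[1]
    K = np.zeros((p + 1, p + 1))
    K[:p, :p] = 2.0 * N.T @ N
    K[:p, p] = 1.0
    K[p, :p] = 1.0
    rhs = np.append(2.0 * N.T @ x_j, 1.0)
    sol = np.linalg.lstsq(K, rhs, rcond=None)[0]   # min-norm solution if K is singular
    w = sol[:p]
    return w, float(np.linalg.norm(x_j - N @ w))
```

For x_j = (1, 1) and the identity as N, the constraint forces w = (0.5, 0.5) and a residual of √0.5.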

Carrying out the least-squares estimation or optimization above for every subsequence X_i and collecting, from all the reconstruction weights, the p weights (regression coefficients) F_i = (f_i^{(1)}, f_i^{(2)}, …, f_i^{(p)}) with which X_i participates in reconstruction, the consistency factor sequence is obtained

ac = \{\frac{\| F_{1} \|}{\| ϵ_{1} \|}, \frac{\| F_{2} \|}{\| ϵ_{2} \|}, ..., \frac{\| F_{L - l + 1} \|}{\| ϵ_{L - l + 1} \|}\}

For subsequence anomaly detection, the consistency factor curve of the subsequences is plotted, and the subsequences corresponding to the valleys of the curve are classified as anomalous subsequences.

Preferably, in step 2.1, min-max normalization is used to quantize all node clustering coefficients into [0,5], and the rounded integer is used as the role label of each node.

Preferably, when determining the parameters in step 3.1, p is set to a relatively large value smaller than l, such that l/p ∈ (1,2).

The beneficial effects of the invention are:

1. The community evolution analysis method based on community role entropy can describe organizational change at a meso scale, is highly sensitive to changes in member status and roles, in the amount and frequency of interaction, and in the direction of organizational evolution, and avoids the loss of detail that may occur when organizational dynamics are examined only at the level of the whole organization.

2. The anomalous subsequence detection method based on the consistency factor can obtain anomalies at different time scales by adjusting the subsequence length and the number of neighborhood subsequences; the consistency factor, constructed from the reconstruction weights and the reconstruction error, amplifies the difference between a subsequence and its neighborhood, improving the resolution and robustness of anomaly detection.

Brief description of the drawings

Fig. 1 is a flow chart of the method of the invention;

Fig. 2 compares the accuracy of four community partition methods;

Fig. 3 shows the relative accuracy of the fuzzy community partition method;

Fig. 4 shows the community role entropy curve of the Drosophila gene regulatory network;

Fig. 5 shows the similarity curve of the Drosophila gene regulatory network;

Fig. 6 shows the SeqS similarity of the Drosophila gene regulatory network;

Fig. 7 shows the variation curve of the consistency factor.

Detailed description of the invention

The invention is further described below with reference to the accompanying drawings and specific embodiments. The invention was tested on simulated data and public datasets; it is convenient to apply, and the results are satisfactory and consistent with the design expectations.

Experimental data:

GN benchmark network model: a network of n nodes is divided into l groups of g nodes each. Nodes within a group are connected with probability p_in and nodes in different groups with probability p_out, so the subgraph within each group is an ER random network with p = p_in. The average node degree is <k> = p_in(g-1) + p_out g(l-1). If p_in > p_out, the edge density within groups exceeds that between groups and the network has community structure. Usually l = 4, g = 32, and average degree <k> = 16, in which case p_in + 3p_out ≈ 1/2. In calculations, z_in = p_in(g-1) = 31p_in and z_out = p_out g(l-1) = 96p_out are often used to denote a node's average degree within its group and between groups. Intuitively, the smaller z_out is, the more pronounced the community structure and the more easily it is partitioned correctly; in fact, the accuracy of most community partition algorithms begins to drop markedly once z_out reaches 8.
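The benchmark above can be generated with a short planted-partition sampler. This is a minimal sketch assuming NumPy; it derives p_in and p_out from z_out and <k> = 16 through z_in = 31p_in and z_out = 96p_out, and the name `gn_benchmark` is hypothetical.

```python
import numpy as np


def gn_benchmark(z_out, groups=4, g=32, mean_degree=16, seed=None):
    """GN benchmark graph: returns (adjacency matrix, planted group labels)."""
    rng = np.random.default_rng(seed)
    n = groups * g
    p_in = (mean_degree - z_out) / (g - 1)        # from z_in = p_in (g - 1)
    p_out = z_out / (g * (groups - 1))            # from z_out = p_out g (l - 1)
    labels = np.repeat(np.arange(groups), g)
    prob = np.where(labels[:, None] == labels[None, :], p_in, p_out)
    upper = np.triu(rng.random((n, n)) < prob, k=1)   # each node pair sampled once
    adj = (upper | upper.T).astype(int)
    return adj, labels
```

For z_out = 4 this gives p_in = 12/31 ≈ 0.387 and p_out = 4/96 ≈ 0.042, so p_in + 3p_out ≈ 0.51, in line with the relation stated above; the realized mean degree fluctuates around 16.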

The Drosophila gene regulatory network dataset covers 66 time points spanning the whole Drosophila life cycle: the embryonic stage (times 1-30), the larval stage (times 31-40), the pupal stage (times 41-58), and the adult stage (times 59-66). Based on gene ontology, the dataset contains 588 genes closely related to the Drosophila developmental process and the interactions between them.

1. Community partition experiment

The experimental data are GN benchmark networks. The methods compared are the fuzzy community partition method and the classical community partition algorithms GN, FN, and SpectralClust. The number of communities for the fuzzy community partition method and for SpectralClust is determined by the F-test method, while the number of communities for the GN and FN algorithms is obtained by optimizing modularity.

For the GN benchmark network, the number of communities is set to 4, and each of the four algorithms above is used to partition networks with z_out increasing from 1 to 8. The accuracy of each algorithm is measured by normalized mutual information; each experiment is repeated 5 times and the accuracies are averaged.
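Normalized mutual information compares a detected partition with the planted one. The text does not say which normalization was used, so the sketch below assumes the common form 2I(A;B)/(H(A)+H(B)); it assumes NumPy, and `nmi` is a hypothetical name.

```python
import numpy as np


def nmi(a, b):
    """Normalized mutual information 2 I(A;B) / (H(A) + H(B)) of two labelings."""
    _, ai = np.unique(np.asarray(a), return_inverse=True)
    _, bi = np.unique(np.asarray(b), return_inverse=True)
    ai, bi = ai.ravel(), bi.ravel()
    joint = np.zeros((ai.max() + 1, bi.max() + 1))
    np.add.at(joint, (ai, bi), 1.0)           # contingency table of the labelings
    pij = joint / joint.sum()
    pi, pj = pij.sum(axis=1), pij.sum(axis=0)
    nz = pij > 0
    mi = np.sum(pij[nz] * np.log(pij[nz] / np.outer(pi, pj)[nz]))
    h = -np.sum(pi * np.log(pi)) - np.sum(pj * np.log(pj))
    return 1.0 if h == 0 else float(2.0 * mi / h)
```

Relabeling communities does not change the score: identical partitions with swapped labels give 1, and independent partitions give 0.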

Fig. 2 shows the community partition accuracy of the four algorithms under different z_out. The accuracy of all four algorithms decreases as z_out increases, but that of the fuzzy community partition method declines more gently and remains higher when z_out is large. Fig. 3 shows the accuracy of the fuzzy community partition method relative to the other three algorithms; it clearly shows the advantage, at higher z_out, of the fuzzy community partition method that determines the number of communities with the F test.

2. Community evolution analysis experiment

The data are the Drosophila gene regulatory network data. Fig. 4 shows the community role entropy curve, which has two clear peaks, in the middle of the embryonic stage (t = 10) and in the larval stage (t = 30-40). This result can be reasonably explained with biological knowledge. In the early stage of development, gene functional roles are more local and specialized, and interactions tend to occur between genes with similar roles, so role distributions differ greatly between communities, giving a larger community role entropy. After the late embryonic stage, in line with rapid development, gene functions become more general and the heterogeneity of gene roles decreases, so community role entropy begins to fall, reaching its minimum in the pupal stage. On entering the adult stage, development slows and genes again take on specialized roles, so community role entropy rises somewhat. Community role entropy can thus effectively describe the details of organizational evolution.

For comparison, the evolution of the Drosophila gene regulatory network is also described from the perspective of network similarity; the results are shown in Fig. 5. Because network evolution is gradual, the similarity between gene regulatory networks at adjacent time points changes little most of the time and does not reflect the details of the Drosophila developmental process.

3. Anomalous subsequence detection experiment

This experiment takes as its object the SeqS similarity sequence from experiment 2, which described the dynamics of the Drosophila gene regulatory network rather vaguely, as shown in Fig. 6. The parameters were set using the prior information that the shortest Drosophila developmental stage has length 10, so l should be clearly smaller than 10; also taking l/p ∈ (1,2) into account, l = 6 and p = 4 were finally chosen.

The resulting consistency factor curve is shown in Fig. 7 and is compared with the community role entropy of Fig. 4. The analysis above showed that the community role entropy curve has a reasonable biological meaning; the consistency factor curve also describes the Drosophila developmental process well, in the following respects.

1) The valleys of the consistency factor occur exactly when the trend of community role entropy changes markedly: for example, at t = 10, at t = 20-30, and around t = 50, where community role entropy shows steps, the consistency factor has corresponding troughs. This confirms that the positions of the valleys of the consistency factor curve effectively indicate inconsistent behavior in the evolutionary process, and such inconsistent behavior is the anomaly we aim to detect.

2) The four distinct peaks of the consistency factor curve fall within the four stages of Drosophila development. A peak in the consistency factor curve indicates high subsequence consistency in that time period, and gene regulatory behavior within each developmental stage should also be strongly consistent; the two agree very well.

3) The consistency factor is computed from the SeqS similarity, whose meaning is rather vague, yet the result agrees well with the problem background and identifies the interesting time points in the evolution of the gene regulatory network. This shows that the consistency factor can indeed refine and highlight changes in organizational behavior, which is exactly the ability anomaly detection requires.

The invention has been described above by way of example. Its implementation is evidently not limited to the manner described above: any improvement made using the technical solution of the invention, or any direct application of the concept and technical solution of the invention to other occasions without improvement, falls within the scope of protection of the invention.

Claims

1. An organizational behavior anomaly detection method based on community evolution, characterized by comprising the following steps:

Step 1: Fuzzy community partition based on the EM algorithm

Step 1.1: Extract node feature vectors

Take the eigenvectors of the network's adjacency matrix that correspond to its p largest eigenvalues, obtaining the n × p feature matrix A_t, and take each row of the feature matrix as the attribute vector of the corresponding node, so that each node is mapped into a p-dimensional space; n is the number of network nodes. The attribute vector of node m is

l_{m} = ({\tilde{a}}_{m}^{(1)}, {\tilde{a}}_{m}^{(2)}, ..., {\tilde{a}}_{m}^{(p)})

Step 1.2: Partition communities with the EM algorithm

Let {v₁,v₂,…,v_n} be the set of organization members, C₁,C₂,…,C_k the k fuzzy communities, c₁,c₂,…,c_k the centers of communities C₁,C₂,…,C_k respectively, and W=[w_ij] (1≤i≤n, 1≤j≤k) the partition matrix, where

w_{i j} = \frac{1 / \mathrm{dist}(v_{i}, c_{j})}{Σ_{t = 1}^{k} 1 / \mathrm{dist}(v_{i}, c_{t})}

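(1) Initialize the k community centers and the partition matrix;

(2) Expectation step (E-step): compute the membership degree of each member in each community to obtain the partition matrix W;

(3) Maximization step (M-step): adjust the community centers according to the partition matrix obtained in the previous step;

(4) Iterate the E-step and M-step until the set number of iterations is reached, the community centers converge to within the expected range, or the sum of squared errors falls below a set threshold;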

Step 1.3: Determine the number of communities

Let N={v₁,v₂,…,v_n} be the node set and l_m the feature vector of node m. Let r be the number of communities in the partition, {C₁,C₂,…,C_r} the set of communities, and n_i the number of members of the i-th community; the nodes of community C_i are v_i1,v_i2,…,v_in_i, with attribute vectors l_i1,l_i2,…,l_in_i respectively.

Let

T_{i} = Σ_{j = 1}^{n_{i}} l_{i j}, i = 1, 2, ..., r

Q_{1} = Σ_{i = 1}^{r} T_{i}, Q_{2} = Σ_{i = 1}^{r} Σ_{j = 1}^{n_{i}} l_{i j}^{T} l_{i j}

where l_ij denotes the attribute vector of the j-th node in the i-th community, and

S_{A} = Σ_{i = 1}^{r} \frac{T_{i}^{T} T_{i}}{n_{i}} - \frac{Q_{1}^{T} Q_{1}}{n}

S_{e} = Q_{2} - \frac{Q_{1}^{T} Q_{1}}{n} - S_{A}

Introduce the F statistic

F = \frac{S_{A} / (r - 1)}{S_{e} / (n - r)} \overset{H_0}{\sim} F(p(r - 1), p(n - r))

For a given significance level α and number of communities r, the critical value F_{1-α}(p(r-1), p(n-r)) can be read from an F-distribution table. If F > F_{1-α}(p(r-1), p(n-r)), statistical theory indicates that the communities differ significantly, i.e. the partition is reasonable. Among the partitions with different numbers of communities that satisfy F > F_{1-α}(p(r-1), p(n-r)), the number of communities that maximizes the difference F - F_{1-α} is taken as the most reasonable number of communities, which in turn yields the best community partition;

Step 2: Community evolution analysis

Step 2.1: Organizational roles

The clustering coefficient describes the edge density in the neighborhood of a node. In an organization, the interaction patterns of different roles are often reflected in differences in clustering coefficient, so the local clustering coefficient of a node can, to a certain extent, reflect the node's status and role in the network. The clustering coefficient of node i is defined as follows

{\tilde{C}}_{i} = \frac{| E (Γ_{i}) |}{\binom{k_{i}}{2}}

where Γ_i is the neighborhood of node i, i.e. the subgraph formed by node i and all nodes directly adjacent to it, |E(Γ_i)| is the number of edges in Γ_i between neighbors of node i, k_i is the degree of node i, and

\binom{k_{i}}{2} = \frac{1}{2} k_{i} (k_{i} - 1)

is the number of such edges when all the neighbors are interconnected;

Step 2.2: Organizational role entropy

Suppose the organization network G has n members in total and t kinds of roles {j₁,j₂,…,j_t}. By analogy with the definition of information entropy, the organizational role entropy is defined as

E_{h} (G) = - Σ_{k = 1}^{t} p_{k} \log_{2} p_{k}

p_{k} = \frac{| j_{k} |}{n}

Step 2.3: Community role entropy

Suppose a community partition divides the network into m communities {C₁,C₂,…,C_m}, each of which may still contain different roles. Treating each community as a sub-organization, the community role entropy is defined as

E_{m} (G) = Σ_{i = 1}^{m} \frac{| C_{i} |}{n} \times E_{h} (C_{i})

where |C_i|/n is the proportion of the whole organization accounted for by the i-th community, and E_m(G) is the expected amount of information needed to identify member roles after the organization has been partitioned into communities by the partition algorithm;

Step 3 anomalous subsequence detection

Step 3.1 determine the parameters

Given a time series of length L:

X = \{x_1, x_2, \ldots, x_L\}

where L is the length of the time series. Let l be the length of the subsequences to be detected, used as the window length, with l ≪ L. Intercepting subsequences starting from x_1 yields n = L - l + 1 subsequences of length l in total. The j-th subsequence X_j of the time series is expressed as:

X_j = \{x_j, x_{j+1}, \ldots, x_{j+l-1}\}
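The sliding-window extraction can be sketched in a few lines. This assumes the series is a plain Python list. The function name is hypothetical, and list index 0 corresponds to X_1 in the text.

```python
def subsequences(x, l):
    """All n = L - l + 1 windows X_j = (x_j, ..., x_{j+l-1}) of length l.

    Python index 0 of the result corresponds to X_1 in the text.
    """
    big_l = len(x)
    if not 0 < l <= big_l:
        raise ValueError("window length must satisfy 0 < l <= L")
    return [list(x[j:j + l]) for j in range(big_l - l + 1)]


print(subsequences([1, 2, 3, 4, 5], 3))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```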

The p-neighborhood of X_j consists of p nearby subsequences, with the window shifted inward at the two ends of the series:

Nb_p(X_j) = \begin{cases} \{X_2, \ldots, X_{p+1}\}, & j = 1 \\ \{X_1, \ldots, X_{j-1}, X_{j+1}, \ldots, X_{p+1}\}, & 1 < j < 1 + p/2 \\ \{X_{j-p/2}, X_{j-p/2+1}, \ldots, X_{j-1}, X_{j+1}, \ldots, X_{j+p/2}\}, & 1 + p/2 \le j \le n - p/2 \\ \{X_{n-p}, \ldots, X_{j-1}, X_{j+1}, \ldots, X_n\}, & n - p/2 < j < n \\ \{X_{n-p}, \ldots, X_{n-1}\}, & j = n \end{cases}

Its elements are denoted Nb_p(X_j) = \{X_j^{(1)}, X_j^{(2)}, \ldots, X_j^{(p)}\}

where l is the subsequence length and p is the number of neighbors. l determines the resolution of the anomalous subsequences, and p determines the scope over which an anomaly is judged;
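The neighborhood rule can be sketched as a function returning 1-based subsequence indices. The sketch assumes p is even, because the definition uses p/2, and that p < n. For the right-boundary cases it uses X_{n-p} as the lower end, so every neighborhood has exactly p members. The function name is hypothetical.

```python
def neighborhood(j, n, p):
    """1-based indices of Nb_p(X_j) for j = 1..n.

    p must be even (the definition uses p/2) and smaller than n.
    """
    if p % 2 or not 0 < p < n:
        raise ValueError("p must be even with 0 < p < n")
    half = p // 2
    if j < 1 + half:                     # left edge: X_1 .. X_{p+1}
        window = range(1, p + 2)
    elif j <= n - half:                  # interior: p/2 on each side
        window = range(j - half, j + half + 1)
    else:                                # right edge: X_{n-p} .. X_n
        window = range(n - p, n + 1)
    return [k for k in window if k != j]


for j in (1, 2, 5, 9, 10):
    print(j, neighborhood(j, n=10, p=4))
```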

Step 3.2 build the subsequence regression model

Regard X_j as a set of l observations of the dependent variable, and the p subsequences in Nb_p(X_j) as p factors affecting X_j. To measure the consistency between X_j and its neighborhood, reconstruct X_j as a weighted sum of the elements of Nb_p(X_j):

{\hat{X}}_{j} = Σ_{i = 1}^{p} w_{j}^{(i)} X_{j}^{(i)}

W_{j} = {w_{j}^{(1)}, w_{j}^{(2)}, ..., w_{j}^{(p)}}

This process can be expressed as a linear model:

X_j(i) = w_j^{(1)} X_j^{(1)}(i) + \cdots + w_j^{(p)} X_j^{(p)}(i) + \epsilon_j(i), \quad i = 1, 2, \ldots, l

where ε_j is the deviation between the reconstructed value and the actual value of X_j. Denote

X_j = \begin{bmatrix} X_j(1) \\ X_j(2) \\ \vdots \\ X_j(l) \end{bmatrix}, \quad N_{X_j} = \begin{bmatrix} X_j^{(1)}(1) & X_j^{(2)}(1) & \cdots & X_j^{(p)}(1) \\ X_j^{(1)}(2) & X_j^{(2)}(2) & \cdots & X_j^{(p)}(2) \\ \vdots & \vdots & \ddots & \vdots \\ X_j^{(1)}(l) & X_j^{(2)}(l) & \cdots & X_j^{(p)}(l) \end{bmatrix}, \quad W_j = \begin{bmatrix} w_j^{(1)} \\ w_j^{(2)} \\ \vdots \\ w_j^{(p)} \end{bmatrix}, \quad \epsilon_j = \begin{bmatrix} \epsilon_j(1) \\ \epsilon_j(2) \\ \vdots \\ \epsilon_j(l) \end{bmatrix}

The model X_j = N_{X_j} W_j + \epsilon_j is called the subsequence neighborhood regression model;
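A minimal sketch of the matrix form, assuming the windows are stored as lists and the neighbors are given by 1-based indices. It builds N_{X_j} column by column from the neighbor subsequences and evaluates the residual ε_j = X_j - N_{X_j} W_j for a given weight vector. Both function names are hypothetical.

```python
def design_matrix(subseqs, nb_indices):
    """Build N_{X_j} (l rows, p columns): column k is the k-th neighbor.

    subseqs is the list of windows (index 0 is X_1); nb_indices are the
    1-based indices of Nb_p(X_j).
    """
    l = len(subseqs[0])
    return [[subseqs[k - 1][i] for k in nb_indices] for i in range(l)]


def residual(target, n_matrix, weights):
    """eps_j = X_j - N_{X_j} W_j, one entry per observation i = 1..l."""
    return [t - sum(a * w for a, w in zip(row, weights))
            for t, row in zip(target, n_matrix)]


windows = [[1, 2], [3, 4], [5, 6]]
N = design_matrix(windows, [1, 3])          # neighbors X_1 and X_3 of X_2
print(N)                                    # [[1, 5], [2, 6]]
print(residual(windows[1], N, [0.5, 0.5]))  # [0.0, 0.0]
```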

Step 3.3 compute the consistency factor

w_j^{(i)} is the i-th regression coefficient of the model, and also the weight of the i-th neighboring subsequence in the linear reconstruction of X_j. Conversely, each subsequence X_j also has p weights corresponding to its participation in reconstructing the p subsequences of its neighborhood. These form the reconstruction weight vector of X_j:

F_{j} = (f_{j}^{(1)}, f_{j}^{(2)}, ..., f_{j}^{(p)});

Using ||F_j|| and ||ε_j||, construct a consistency factor that measures the agreement between a subsequence and its neighborhood. The consistency factor of subsequence X_j is defined as

ac_j = \frac{\|F_j\|}{\|\epsilon_j\|}

If the reconstruction of X_j is \hat{\hat{X}}_j, the reconstruction error of X_j is naturally defined as the 2-norm of the difference between the vector \hat{\hat{X}}_j and X_j:

\epsilon_j = \|X_j - \hat{\hat{X}}_j\|_2

Wherein

{\hat{\hat{X}}}_{j} = Σ_{i = 1}^{p} w_{j}^{(i)} X_{j}^{(i)}

The optimization problem is defined as follows:

\min_{W_j} \|X_j - \hat{\hat{X}}_j\|_2

\text{s.t.} \quad \sum_{i=1}^{p} w_j^{(i)} = 1

In the formulation above, the first line minimizes the reconstruction error (the objective function), and the second line is the normalization constraint on the reconstruction weights. Solving this optimization yields the reconstruction weights W_j of X_j and the final reconstruction error ε_j.
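One way to solve the sum-to-one constrained least-squares problem is the closed form also used in locally linear embedding. Under the constraint, X_j - Σ_k w_k X_j^{(k)} = Σ_k w_k (X_j - X_j^{(k)}). The minimizer is therefore W_j ∝ C^{-1}1, rescaled to sum to 1, where C is the Gram matrix of the differences X_j - X_j^{(k)}. Minimizing the 2-norm and its square gives the same weights. The small ridge term added to C is an assumption, not part of the text. It keeps C invertible when neighbors are collinear. All names are hypothetical, and a dense linear solver such as `numpy.linalg.solve` could replace the hand-written elimination.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[piv][c] == 0.0:
            raise ValueError("singular system")
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def reconstruct(target, neighbors, reg=1e-9):
    """Sum-to-one weights W_j minimizing ||X_j - sum_i w_i X_j^(i)||_2.

    Returns (weights, error), error being the 2-norm reconstruction error.
    """
    p = len(neighbors)
    # With sum(w) = 1, X_j - sum_k w_k X^(k) = sum_k w_k (X_j - X^(k)).
    d = [[t - v for t, v in zip(target, nb)] for nb in neighbors]
    # Local Gram matrix C[r][c] = <X_j - X^(r), X_j - X^(c)>.
    gram = [[sum(x * y for x, y in zip(d[r], d[c])) for c in range(p)]
            for r in range(p)]
    trace = sum(gram[k][k] for k in range(p)) or 1.0
    for k in range(p):
        gram[k][k] += reg * trace  # assumed ridge term; keeps C invertible
    z = solve(gram, [1.0] * p)     # C^{-1} 1, rescaled below to sum to 1
    total = sum(z)
    w = [zk / total for zk in z]
    recon = [sum(wk * nb[i] for wk, nb in zip(w, neighbors))
             for i in range(len(target))]
    error = math.sqrt(sum((t - r) ** 2 for t, r in zip(target, recon)))
    return w, error


w, err = reconstruct([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
print(w, round(err, 4))  # [0.5, 0.5] 0.7071
```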

Carry out the above least-squares estimation (optimization) for every subsequence X_i. From all the resulting reconstruction weights, collect the p weights (regression coefficients) with which X_i participates in reconstructing its neighbors, giving F_i. Together with the reconstruction errors ε_i, this yields the consistency factor sequence

ac = \left\{ \frac{\|F_1\|}{\|\epsilon_1\|}, \frac{\|F_2\|}{\|\epsilon_2\|}, \ldots, \frac{\|F_{L-l+1}\|}{\|\epsilon_{L-l+1}\|} \right\}

During anomalous subsequence detection, plot the consistency factor curve of the subsequences. The subsequences corresponding to the valleys (low points) of the curve are the anomalous subsequences.
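A minimal sketch of assembling F_j, the consistency factors and the valleys, assuming the weights W_k and errors ε_k of every subsequence are already available (for example from a constrained least-squares step). Several choices here are assumptions rather than details from the text:

- F_j takes the weight X_j received in each neighbor's reconstruction, or 0 when X_j is not in that neighbor's own neighborhood, which can happen near the ends of the series.
- A `floor` guards against a zero reconstruction error.
- A "valley" is read as a strict local minimum.
- The right-boundary neighborhood starts at X_{n-p}.

All names are hypothetical.

```python
import math


def neighborhood(j, n, p):
    """1-based indices of Nb_p(X_j) (p even), boundary windows shifted inward."""
    half = p // 2
    if j < 1 + half:
        window = range(1, p + 2)
    elif j <= n - half:
        window = range(j - half, j + half + 1)
    else:
        window = range(n - p, n + 1)
    return [k for k in window if k != j]


def consistency_factors(weights, errors, n, p, floor=1e-12):
    """Return [ac_1, ..., ac_n] with ac_j = ||F_j|| / ||eps_j||.

    weights[k] is W_k, ordered like neighborhood(k, n, p); errors[k] is the
    reconstruction error of X_k. Both are dicts keyed by 1..n.
    """
    ac = []
    for j in range(1, n + 1):
        f_j = []
        for k in neighborhood(j, n, p):
            nb_k = neighborhood(k, n, p)
            # weight X_j received when X_k was reconstructed (0 if unused)
            f_j.append(weights[k][nb_k.index(j)] if j in nb_k else 0.0)
        norm_f = math.sqrt(sum(f * f for f in f_j))
        ac.append(norm_f / max(errors[j], floor))  # floor: assumed guard
    return ac


def valleys(ac):
    """1-based positions of strict local minima of the consistency curve."""
    return [i + 1 for i in range(1, len(ac) - 1)
            if ac[i] < ac[i - 1] and ac[i] < ac[i + 1]]


# Hypothetical toy input: X_3 is reconstructed much worse than the others.
weights = {j: [0.5, 0.5] for j in range(1, 6)}
errors = {1: 1.0, 2: 1.0, 3: 10.0, 4: 1.0, 5: 1.0}
print(valleys(consistency_factors(weights, errors, n=5, p=2)))  # [3]
```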

2. The organizational behavior anomaly detection method based on community evolution according to claim 1, characterized in that: in step 2.1, min-max normalization is used to map the clustering coefficients of all nodes to [0, 5], and the rounded values are used as the role labels of the nodes.
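The labeling rule of claim 2 can be sketched as follows. The rounding mode is not specified, so round-half-up is an assumption here (Python's built-in `round` would round 2.5 down to 2). Giving every node label 0 when all coefficients are equal is also an assumption. The function name is hypothetical.

```python
import math


def role_labels(coefficients, top=5):
    """Map clustering coefficients to integer role labels 0..top.

    Min-max normalization to [0, top], then rounding half up (assumed; the
    rounding mode is not specified).
    """
    lo, hi = min(coefficients), max(coefficients)
    if hi == lo:
        return [0] * len(coefficients)  # assumption: all nodes share one role
    return [math.floor((c - lo) / (hi - lo) * top + 0.5) for c in coefficients]


print(role_labels([0.0, 0.5, 1.0, 1 / 3]))  # [0, 3, 5, 2]
```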

3. The organizational behavior anomaly detection method based on community evolution according to claim 1, characterized in that: when determining the parameters in step 3.1, p is set to a relatively large value less than l, with l/p ∈ (1, 2).
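A small helper illustrating one reading of claim 3, under two assumptions. First, p is taken as the largest value below l that satisfies l/p ∈ (1, 2). Second, p is restricted to even values, because the neighborhood definition uses p/2. The helper is hypothetical. Keeping p < l also means the regression has more observations (l) than weights (p).

```python
def choose_p(l):
    """Largest even p < l with 1 < l / p < 2 (hypothetical helper)."""
    for p in range(l - 1, 0, -1):
        if p % 2 == 0 and 1 < l / p < 2:
            return p
    raise ValueError(f"no even p satisfies l/p in (1, 2) for l = {l}")


print([choose_p(l) for l in (5, 6, 10, 21)])  # [4, 4, 8, 20]
```

Under these assumptions, some short windows have no valid p. For example, l = 4 allows only p = 2, and then l/p = 2 falls outside the open interval.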