CN102663142A - Knowledge extraction method - Google Patents


Info

Publication number
CN102663142A
Authority
CN
China
Prior art keywords
search
individual
pos
dimension
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101572047A
Other languages
Chinese (zh)
Other versions
CN102663142B (en
Inventor
刘洪波
冯士刚
陈荣
张维石
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University filed Critical Dalian Maritime University
Priority to CN201210157204.7A priority Critical patent/CN102663142B/en
Publication of CN102663142A publication Critical patent/CN102663142A/en
Application granted granted Critical
Publication of CN102663142B publication Critical patent/CN102663142B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a knowledge extraction method comprising the following steps: calculating the initial values for reduction; enabling a dual-matrix coding strategy; initializing the search; evaluating the termination criterion; calculating the fitness values of the search individuals; preserving the best states; and performing the state-transition joint operation. Under the dual-matrix coding strategy, the position of each search individual is encoded as a string of 0s and 1s whose number of dimensions equals the number of condition attributes. When the number of dimensions exceeds 23, the time spent on reduction does not grow exponentially, which saves both space dimensions and time. Fitness is evaluated with a rough-set positive-region test: if POS'_E equals U'_pos, the fitness value is the number of corresponding condition attributes; if POS'_E is not equal to U'_pos, the fitness value is penalized to the total number of condition attributes. This simple and reasonable strategy safeguards the effect of knowledge extraction. The search is driven dynamically by the collective advantage of the population of search individuals, and multi-knowledge is obtained by combining features through an effective positive-region comparison.

Description

A knowledge extraction method
Technical field
The present invention relates to knowledge discovery technology, and in particular to a knowledge extraction method.
Background art
Rough set theory is a mathematical tool for handling imprecise, inconsistent and incomplete data. It was proposed by the Polish scientist Pawlak in 1982 and obtains classification rules through knowledge reduction while keeping the classification ability unchanged. Compared with decision trees, Bayesian methods and the like, the rough set method needs no prior knowledge and discovers knowledge using only the information that the data itself provides. In the real world, the knowledge contained in an information system is usually not unique. There is knowledge from several angles, which may correspond to several different combinations of the attributes of the information system with comparable classification performance. Such multi-knowledge may play different roles in particular environments. For example, in real-time path selection for multiple robots, given sufficient memory capacity, multi-knowledge offers more path choices and can provide stronger obstacle avoidance. For knowledge extraction, each reduct can be expressed as one piece of single knowledge, and the multi-knowledge system formed jointly by many reducts is of considerable value in practical applications.
It has been proved that finding all reducts and finding the minimal reduct of a decision table are NP-hard problems. For this reason, heuristic methods are usually adopted for attribute reduction. Commonly used heuristic algorithms include attribute reduction algorithms based on information entropy, on the discernibility matrix, and on the positive region. The basic idea of most heuristic reduction algorithms is to start from the core attributes and then, according to some measure of attribute importance, repeatedly select the most important attribute not yet in the reduct set and add it to the reduct set until a termination condition is satisfied, which yields one reduct of the decision table. Such a reduct can only be expressed as single knowledge in a knowledge system. At present, multi-knowledge extraction is a major problem facing knowledge discovery technology.
Summary of the invention
To solve the above problems of the prior art, the present invention proposes a knowledge extraction method that obtains multi-knowledge from an existing information system.
To achieve this, the technical scheme of the present invention is as follows: a knowledge extraction method comprising the following steps:
A. Calculating the initial values for reduction
Calculate the positive region POS'_E, the reduced universe U' and the positive region U'_pos according to formulas (1), (2) and (3):
[Formula (1), which defines POS'_E, is an image in the original publication and is not reproduced in this text.]
Let U/C = {[u'_1]_C, [u'_2]_C, ..., [u'_m]_C}; then
$$U' = \{u'_1, u'_2, \ldots, u'_m\} \qquad (2)$$
$$U'_{pos} = \{u'_{i_1}, u'_{i_2}, \ldots, u'_{i_t}\} \qquad (3)$$
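As an illustration of step A, the following sketch partitions the sample data of Table 1 by the condition attributes and derives U' and U'_pos. Because formula (1) is not reproduced in this text, the sketch assumes the standard rough-set definition of the positive region (objects whose condition class maps to a single decision value). The names TABLE_1, partition and positive_region are illustrative and do not come from the patent.

```python
from collections import defaultdict

# Table 1 of this document: object id -> (c1, c2, c3, c4, d).
TABLE_1 = {
    1: (1, 1, 1, 1, 0), 2: (2, 2, 2, 1, 1), 3: (1, 1, 1, 1, 0),
    4: (2, 3, 2, 3, 0), 5: (2, 2, 2, 1, 1), 6: (3, 1, 2, 1, 0),
    7: (1, 2, 3, 2, 2), 8: (2, 3, 1, 2, 3), 9: (3, 1, 2, 1, 1),
    10: (1, 2, 3, 2, 2), 11: (3, 1, 2, 1, 1), 12: (2, 3, 1, 2, 3),
    13: (4, 3, 4, 2, 1), 14: (1, 2, 3, 2, 1), 15: (4, 3, 4, 2, 2),
}


def partition(table, attrs):
    """Group object ids into equivalence classes by their values on attrs."""
    classes = defaultdict(list)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].append(obj)
    return list(classes.values())


def positive_region(table, attrs):
    """Assumed definition: objects whose class has one decision value (last column)."""
    pos = set()
    for block in partition(table, attrs):
        if len({table[obj][-1] for obj in block}) == 1:
            pos.update(block)
    return pos


C = (0, 1, 2, 3)                                  # column indices of c1..c4
classes = partition(TABLE_1, C)                   # U/C
reduced_universe = [block[0] for block in classes]            # U', formula (2)
pos_full = positive_region(TABLE_1, C)
reduced_pos = [u for u in reduced_universe if u in pos_full]  # U'_pos, formula (3)

print(reduced_universe)  # [1, 2, 4, 6, 7, 8, 13]
print(reduced_pos)       # [1, 2, 4, 8]
```

Taking the first member of each class as its representative is a choice made for this sketch; any member would serve equally well.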
B. Enabling the dual-matrix coding strategy
When the search individuals search the solution space, they must be encoded according to the dimensions of the solution space. The encoding maps the condition attributes directly onto the position dimensions of a search individual. When the universe of the information system contains more than 4000 objects and the number of dimensions exceeds 23, every 3 attributes correspond to one coding unit, so that they occupy a single position dimension whose value is an integer from 0 to 7;
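The two representations of step B can be sketched as decoding functions that turn a position vector back into the list of selected attributes. The bit order inside a 3-attribute coding unit is not stated in the text, so the sketch assumes that the most significant bit maps to the lowest attribute index; the function names are hypothetical.

```python
def decode_one_bit(position):
    """1-bit coding: position[j] == 1 means attribute j is in the candidate reduct."""
    return [j for j, bit in enumerate(position) if bit == 1]


def decode_compressed(position, n_attrs, bits_per_unit=3):
    """Compressed coding: each unit is an integer in 0..7 that packs 3 attribute flags."""
    selected = []
    for unit_index, value in enumerate(position):
        for k in range(bits_per_unit):
            attr = unit_index * bits_per_unit + k
            if attr >= n_attrs:
                break  # the last unit may be only partly used
            # Assumed bit order: most significant bit -> lowest attribute index.
            if (value >> (bits_per_unit - 1 - k)) & 1:
                selected.append(attr)
    return selected


print(decode_one_bit([1, 0, 0, 1]))      # [0, 3]
print(decode_compressed([5, 2], 6))      # [0, 2, 4]  (5 = 101b, 2 = 010b)
```

With this packing, 24 condition attributes need 8 position dimensions instead of 24, which is the saving in dimensions that the compressed representation aims at.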
C. Initializing the search
Without loss of generality, assume that the domain of a reduct is [0, r], i.e. the maximum value in the solution space is r and the minimum value is 0, and that the solution space has d dimensions. If the 1-bit coding representation is adopted in step B, then r = 1; if the compressed coding representation is adopted in step B, then r = 7;
A population formed by n search individuals searches in parallel, and the maximum velocity of a search individual in the solution space is set to v_max = r. At time step t = 0, the n search individuals are initialized at random: the position of the j-th dimension of the i-th search individual is p_ij = Rand(0, r), and the velocity of the j-th dimension of the i-th search individual is v_ij = Rand(-v_max, v_max), where r is the upper bound of the domain and t is the time step;
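A minimal sketch of the random initialization in step C follows. It assumes that Rand(0, r) draws an integer position in 0..r and that velocities are drawn uniformly from [-v_max, v_max]; the function name init_swarm and its parameters are illustrative.

```python
import random


def init_swarm(n, d, r, rng=random):
    """Return (positions, velocities, v_max) for n search individuals in d dimensions."""
    v_max = r  # the text sets the maximum velocity equal to r
    # Assumption: positions are integers in 0..r (0/1 for 1-bit coding, 0..7 for compressed).
    positions = [[rng.randint(0, r) for _ in range(d)] for _ in range(n)]
    velocities = [[rng.uniform(-v_max, v_max) for _ in range(d)] for _ in range(n)]
    return positions, velocities, v_max


# Example: 20 search individuals over the 4 condition attributes of Table 1, 1-bit coding.
positions, velocities, v_max = init_swarm(n=20, d=4, r=1)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.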
D. Evaluating the termination criterion
If the predetermined maximum number of iterations is reached, or the result has not improved for 10 iterations, output the results p* and f(p*) and end the calculation; otherwise, go to step E;
Here p* is the best individual state in the population formed by the search individuals, p_i^# is the best state that the i-th search individual has reached from t = 0 up to the current iteration, and f(p*) is the fitness value determined by the best individual state in the population.
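The termination test of step D might be written as below. The sketch assumes that the caller records f(p*) once per iteration in a list and that "no improvement" means the best fitness has not decreased over the last 10 recorded iterations; should_stop and its arguments are hypothetical names.

```python
def should_stop(best_history, max_iterations, patience=10):
    """best_history[t] holds f(p*) after iteration t (lower is better)."""
    if len(best_history) >= max_iterations:
        return True
    if len(best_history) > patience:
        # Stop when the current best is no better than the best 'patience' iterations ago.
        return best_history[-1] >= best_history[-1 - patience]
    return False


print(should_stop([4] * 11, max_iterations=100))        # True: 10 iterations without gain
print(should_stop([5] + [4] * 10, max_iterations=100))  # False: improved within the window
```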
E. Calculating the fitness values of the search individuals
Apply the rough-set positive-region test: if POS'_E = U'_pos, the fitness value is the number of corresponding condition attributes; if POS'_E ≠ U'_pos, the fitness value is penalized to the total number of condition attributes;
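The fitness rule of step E can be sketched as follows. As a simplification, the sketch compares positive regions on the full set of rows instead of on the reduced universe U', and it assumes the standard positive-region definition. The small decision table is invented purely for illustration, and an empty attribute selection is treated as penalized, which the text does not specify.

```python
from collections import defaultdict


def positive_region(rows, attrs):
    """Indices of rows whose class under attrs has a single decision (last column)."""
    decisions = defaultdict(set)
    members = defaultdict(list)
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        decisions[key].add(row[-1])
        members[key].append(i)
    return {i for key, idx in members.items() if len(decisions[key]) == 1 for i in idx}


def fitness(mask, rows):
    """Number of selected attributes if the positive region is preserved, else the penalty."""
    n_cond = len(rows[0]) - 1
    selected = [j for j, bit in enumerate(mask) if bit]
    target = positive_region(rows, range(n_cond))
    if selected and positive_region(rows, selected) == target:
        return len(selected)
    return n_cond  # penalty: total number of condition attributes


# Invented toy table: (c1, c2, c3, d); c1 alone determines d, c2 does not.
TOY = [(0, 0, 0, "n"), (0, 1, 0, "n"), (1, 0, 1, "y"), (1, 1, 1, "y")]
print(fitness([1, 0, 0], TOY))  # 1
print(fitness([0, 1, 0], TOY))  # 3 (penalized)
```

Since lower values are better, a candidate that preserves the positive region with fewer attributes always beats the penalized value, while the full attribute set scores the same as a failed candidate.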
F. Preserving the best states
Let t = t + 1 and apply the best-preservation strategy, namely:
$$p_i^{\#}(t) = \arg\min_{1 \le i \le n}\bigl(f(p_i^{\#}(t-1)),\ f(p_i(t))\bigr)$$
$$p^{*}(t) = \arg\min_{1 \le i \le n}\bigl(f(p^{*}(t-1)),\ f(p_1(t)),\ \ldots,\ f(p_n(t))\bigr)$$
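The two update rules of step F can be sketched as one helper that refreshes each individual's best state and the population's best state. The fitness function f is passed in as a parameter; keeping the previous best on ties is an assumption of this sketch, and preserve_best is a hypothetical name.

```python
def preserve_best(positions, pbest, gbest, f):
    """Update per-individual bests (p_i^#) and the population best (p*), minimizing f."""
    for i, p in enumerate(positions):
        if f(p) < f(pbest[i]):          # strict '<' keeps the old best on ties
            pbest[i] = list(p)
    # min() returns the first minimal candidate, so gbest is kept on ties.
    best = min([gbest] + positions, key=f)
    return pbest, list(best)


# Usage with a stand-in fitness (number of selected attributes):
pbest, gbest = preserve_best(
    positions=[[1, 1, 0], [0, 1, 0]],
    pbest=[[1, 0, 0], [1, 1, 1]],
    gbest=[1, 0, 0],
    f=sum,
)
print(pbest, gbest)  # [[1, 0, 0], [0, 1, 0]] [1, 0, 0]
```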
G. State-transition joint operation
The collective advantage of the population formed by the search individuals is used to search dynamically. For each dimension of each search individual, perform the state-transition joint operation according to formulas (4), (5) and (6):
$$v_{ij}(t) = w\,v_{ij}(t-1) + c_1 r_1 \bigl(p^{\#}_{ij}(t-1) - p_{ij}(t-1)\bigr) + c_2 r_2 \bigl(p^{*}_{j}(t-1) - p_{ij}(t-1)\bigr) \qquad (4)$$
$$p_{ij}(t) = \begin{cases} 1 & \text{if } \rho < sig(v_{ij}(t)) \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$
where
$$sig(v_{ij}(t)) = \frac{1}{1 + e^{-v_{ij}(t)}} \qquad (6)$$
Go to step D.
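Formulas (4), (5) and (6) can be sketched for one search individual under 1-bit coding as follows. The text gives no values for w, c_1 and c_2, so the defaults below are placeholders; r_1, r_2 and ρ are assumed to be uniform random numbers in [0, 1), and the clamp to [-v_max, v_max] is inferred from the maximum velocity defined in step C.

```python
import math
import random


def sig(v):
    """Formula (6): logistic function of the velocity."""
    return 1.0 / (1.0 + math.exp(-v))


def transition(p, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, v_max=1.0, rng=random):
    """One state transition of a single search individual (1-bit coding)."""
    new_p, new_v = [], []
    for j in range(len(p)):
        r1, r2 = rng.random(), rng.random()
        # Formula (4): inertia + pull toward own best + pull toward population best.
        vj = w * v[j] + c1 * r1 * (pbest[j] - p[j]) + c2 * r2 * (gbest[j] - p[j])
        vj = max(-v_max, min(v_max, vj))  # assumed clamp to the maximum velocity
        # Formula (5): set the bit with probability sig(vj).
        rho = rng.random()
        new_p.append(1 if rho < sig(vj) else 0)
        new_v.append(vj)
    return new_p, new_v


p, v = transition([0, 1, 0, 1], [0.0] * 4, pbest=[1, 1, 0, 0], gbest=[1, 0, 0, 1])
```

Because the new bit is sampled from sig(v_ij(t)) alone, the previous bit influences the result only through the velocity, which is how a binary swarm keeps exploring even after most individuals agree.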
Compared with the prior art, the present invention has the following beneficial effects:
1. The present invention adopts a dual-matrix coding strategy. The condition attributes in knowledge reduction are mapped directly onto the position dimensions of a search individual, and each position dimension takes a value in {0, 1}: '0' means that the corresponding attribute is not included in the reduct, and '1' means that the corresponding attribute is included in the reduct. The position of a search individual is thus encoded as a string of 0s and 1s whose number of dimensions equals the number of condition attributes. When the number of dimensions exceeds 23, the time needed to complete the reduction does not grow exponentially, which saves space dimensions and time.
2. The present invention applies the rough-set positive-region test: if POS'_E = U'_pos, the fitness value is the number of corresponding condition attributes; if POS'_E ≠ U'_pos, the fitness value is penalized to the total number of condition attributes. This simple and reasonable strategy safeguards the effect of knowledge extraction.
3. The present invention uses the collective advantage of the population formed by the search individuals to search dynamically and performs the state-transition joint operation according to formulas (4), (5) and (6). It thereby obtains a reasonable distribution of multi-reduct knowledge and effectively solves the prior-art difficulty of obtaining multi-knowledge from an existing information system.
4. The present invention proposes a dual-matrix-coded swarm-intelligence rough-set multi-reduct algorithm that facilitates multi-knowledge extraction. It searches dynamically using the collective advantage of the population formed by the search individuals and obtains multi-knowledge by combining features through an effective positive-region comparison.
Description of drawings
The present invention has 4 drawings, wherein:
Fig. 1 shows the 1-bit independent coding representation of the dual-matrix coding strategy.
Fig. 2 shows the multi-bit compressed coding representation of the dual-matrix coding strategy.
Fig. 3 shows the performance comparison curves for the soybean-large-test data set.
Fig. 4 is a flow chart of the present invention.
Embodiment
The present invention is further described below with reference to the drawings.
Fig. 1 is a schematic diagram of the 1-bit coding representation in the dual-matrix coding strategy of the present invention. The condition attributes in knowledge reduction are mapped directly onto the position dimensions of a search individual, and each position dimension takes a value in {0, 1}: '0' means that the corresponding attribute is not included in the reduct, and '1' means that the corresponding attribute is included in the reduct. The position of a search individual is thus encoded as a string of 0s and 1s whose number of dimensions equals the number of condition attributes.
Fig. 2 is a schematic diagram of the compressed coding representation in the dual-matrix coding strategy of the present invention. When the universe of the information system contains more than 4000 objects and the number of dimensions exceeds 23, every 3 attributes are combined into one unit that occupies a single position dimension whose value is an integer from 0 to 7.
Fig. 3 shows the performance curves of two methods reducing the soybean-large-test data set. The curves show that the present invention obtains better results than the genetic algorithm (GA) in a shorter time. For the soybean-large-test data set, the result of the three GA reductions is a reduct of 12 attributes: {1, 3, 4, 5, 6, 7, 13, 15, 16, 22, 32, 35}. The results of the three PSO reductions are three reducts of 10 attributes each: {1, 3, 5, 6, 7, 12, 15, 18, 22, 31}, {1, 3, 5, 6, 7, 15, 23, 26, 28, 30} and {1, 2, 3, 6, 7, 9, 15, 21, 22, 30}. By comparison, the present invention tends to provide multiple reducts, and the resulting reducts contain fewer condition attributes.
Table 1 is an example raw data set, i.e. the data before the reduction of the present invention is applied. c1, c2, c3 and c4 are four condition attributes, d is the decision attribute determined by the four condition attributes, and x_i (i = 1, ..., 15) are the 15 instances of this information system after discretization. Table 2 is the result obtained after reduction following the flow shown in Fig. 4. It contains two minimal reducts, {1, 4} and {2, 3} (note: the numbers in braces are condition attribute indices). The instances in Table 2, obtained by reducing Table 1, can be converted into the two pieces of knowledge {1, 4} and {2, 3}.
Table 1 Sample data set
       c1   c2   c3   c4   d
x1     1    1    1    1    0
x2     2    2    2    1    1
x3     1    1    1    1    0
x4     2    3    2    3    0
x5     2    2    2    1    1
x6     3    1    2    1    0
x7     1    2    3    2    2
x8     2    3    1    2    3
x9     3    1    2    1    1
x10    1    2    3    2    2
x11    3    1    2    1    1
x12    2    3    1    2    3
x13    4    3    4    2    1
x14    1    2    3    2    1
x15    4    3    4    2    2
Table 2 Result set
Reduct {1,4}
       c1   c4   d
x1     1    1    0
x2     2    1    1
x4     2    3    0
x6     3    1    0
x7     1    2    2
x8     2    2    3
x9     3    1    1
x13    4    2    1
x14    1    2    1
x15    4    2    2
Reduct {2,3}
       c2   c3   d
x1     1    1    0
x2     2    2    1
x4     3    2    0
x6     1    2    0
x7     2    3    2
x8     3    1    3
x9     1    2    1
x13    3    4    1
x14    2    3    1
x15    3    4    2
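The two minimal reducts reported for Table 1 can be checked independently of the swarm search. With only four condition attributes, an exhaustive enumeration of attribute subsets is cheap. The sketch below is such a brute-force check, not the patented method, and it assumes the standard positive-region definition; the names ROWS, pos and minimal_reducts are illustrative.

```python
from itertools import combinations

# Table 1 rows x1..x15 as (c1, c2, c3, c4, d).
ROWS = [
    (1, 1, 1, 1, 0), (2, 2, 2, 1, 1), (1, 1, 1, 1, 0), (2, 3, 2, 3, 0),
    (2, 2, 2, 1, 1), (3, 1, 2, 1, 0), (1, 2, 3, 2, 2), (2, 3, 1, 2, 3),
    (3, 1, 2, 1, 1), (1, 2, 3, 2, 2), (3, 1, 2, 1, 1), (2, 3, 1, 2, 3),
    (4, 3, 4, 2, 1), (1, 2, 3, 2, 1), (4, 3, 4, 2, 2),
]


def pos(rows, attrs):
    """Row indices whose class under attrs maps to a single decision value."""
    seen = {}
    for row in rows:
        seen.setdefault(tuple(row[a] for a in attrs), set()).add(row[-1])
    return {i for i, row in enumerate(rows)
            if len(seen[tuple(row[a] for a in attrs)]) == 1}


def minimal_reducts(rows):
    """Smallest attribute subsets (1-based indices) that preserve the positive region."""
    n = len(rows[0]) - 1
    target = pos(rows, range(n))
    for size in range(1, n + 1):
        found = [tuple(a + 1 for a in subset)
                 for subset in combinations(range(n), size)
                 if pos(rows, subset) == target]
        if found:
            return found
    return []


print(minimal_reducts(ROWS))  # [(1, 4), (2, 3)]
```

The enumeration visits 2^n - 1 subsets in the worst case, which is exactly the growth that makes a population-based search attractive once the number of condition attributes becomes large.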

Claims (1)

1. A knowledge extraction method, characterized in that it comprises the following steps:
A. Calculating the initial values for reduction
Calculate the positive region POS'_E, the reduced universe U' and the positive region U'_pos according to formulas (1), (2) and (3):
[Formula (1), which defines POS'_E, is an image in the original publication and is not reproduced in this text.]
Let U/C = {[u'_1]_C, [u'_2]_C, ..., [u'_m]_C}; then
$$U' = \{u'_1, u'_2, \ldots, u'_m\} \qquad (2)$$
$$U'_{pos} = \{u'_{i_1}, u'_{i_2}, \ldots, u'_{i_t}\} \qquad (3)$$
B. Enabling the dual-matrix coding strategy
When the search individuals search the solution space, they must be encoded according to the dimensions of the solution space. The encoding maps the condition attributes directly onto the position dimensions of a search individual. When the universe of the information system contains more than 4000 objects and the number of dimensions exceeds 23, every 3 attributes correspond to one coding unit, so that they occupy a single position dimension whose value is an integer from 0 to 7;
C. Initializing the search
Without loss of generality, assume that the domain of a reduct is [0, r], i.e. the maximum value in the solution space is r and the minimum value is 0, and that the solution space has d dimensions. If the 1-bit coding representation is adopted in step B, then r = 1; if the compressed coding representation is adopted in step B, then r = 7;
A population formed by n search individuals searches in parallel, and the maximum velocity of a search individual in the solution space is set to v_max = r. At time step t = 0, the n search individuals are initialized at random: the position of the j-th dimension of the i-th search individual is p_ij = Rand(0, r), and the velocity of the j-th dimension of the i-th search individual is v_ij = Rand(-v_max, v_max), where r is the upper bound of the domain and t is the time step;
D. Evaluating the termination criterion
If the predetermined maximum number of iterations is reached, or the result has not improved for 10 iterations, output the results p* and f(p*) and end the calculation; otherwise, go to step E;
Here p* is the best individual state in the population formed by the search individuals, p_i^# is the best state that the i-th search individual has reached from t = 0 up to the current iteration, and f(p*) is the fitness value determined by the best individual state in the population;
E. Calculating the fitness values of the search individuals
Apply the rough-set positive-region test: if POS'_E = U'_pos, the fitness value is the number of corresponding condition attributes; if POS'_E ≠ U'_pos, the fitness value is penalized to the total number of condition attributes;
F. Preserving the best states
Let t = t + 1 and apply the best-preservation strategy, namely:
$$p_i^{\#}(t) = \arg\min_{1 \le i \le n}\bigl(f(p_i^{\#}(t-1)),\ f(p_i(t))\bigr)$$
$$p^{*}(t) = \arg\min_{1 \le i \le n}\bigl(f(p^{*}(t-1)),\ f(p_1(t)),\ \ldots,\ f(p_n(t))\bigr)$$
G. State-transition joint operation
The collective advantage of the population formed by the search individuals is used to search dynamically. For each dimension of each search individual, perform the state-transition joint operation according to formulas (4), (5) and (6):
$$v_{ij}(t) = w\,v_{ij}(t-1) + c_1 r_1 \bigl(p^{\#}_{ij}(t-1) - p_{ij}(t-1)\bigr) + c_2 r_2 \bigl(p^{*}_{j}(t-1) - p_{ij}(t-1)\bigr) \qquad (4)$$
$$p_{ij}(t) = \begin{cases} 1 & \text{if } \rho < sig(v_{ij}(t)) \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$
where
$$sig(v_{ij}(t)) = \frac{1}{1 + e^{-v_{ij}(t)}} \qquad (6)$$
Go to step D.
CN201210157204.7A 2012-05-18 2012-05-18 Knowledge extraction method Expired - Fee Related CN102663142B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210157204.7A CN102663142B (en) 2012-05-18 2012-05-18 Knowledge extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210157204.7A CN102663142B (en) 2012-05-18 2012-05-18 Knowledge extraction method

Publications (2)

Publication Number Publication Date
CN102663142A true CN102663142A (en) 2012-09-12
CN102663142B CN102663142B (en) 2014-02-26

Family

ID=46772633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210157204.7A Expired - Fee Related CN102663142B (en) 2012-05-18 2012-05-18 Knowledge extraction method

Country Status (1)

Country Link
CN (1) CN102663142B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530505A (en) * 2013-09-29 2014-01-22 大连海事大学 Human brain language cognition modeling method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060074962A1 (en) * 2004-09-24 2006-04-06 Fontoura Marcus F Method, system, and program for searching documents for ranges of numeric values
CN101187927A (en) * 2007-12-17 2008-05-28 电子科技大学 Criminal case joint investigation intelligent analysis method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HONGBO LIU ET AL: "A SWARM-BASED ROUGH SET APPROACH FOR FMRI DATA ANALYSIS", 《INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL》, vol. 7, no. 6, 30 June 2011 (2011-06-30), pages 3121 - 3132 *
HONGBO LIU ET AL: "Extracting Multi-knowledge from fMRI Data through Swarm-Based Rough Set Reduction", 《HYBRID ARTIFICIAL INTELLIGENCE SYSTEMS LECTURE NOTES IN COMPUTER SCIENCE》, vol. 5271, 31 December 2008 (2008-12-31), pages 281 - 288, XP019106497 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530505A (en) * 2013-09-29 2014-01-22 大连海事大学 Human brain language cognition modeling method
CN103530505B (en) * 2013-09-29 2017-02-08 大连海事大学 Human brain language cognition modeling method

Also Published As

Publication number Publication date
CN102663142B (en) 2014-02-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140226

Termination date: 20150518

EXPY Termination of patent right or utility model