WO2014195782A2 - Differential evolution-based feature selection - Google Patents

Differential evolution-based feature selection

Info

Publication number
WO2014195782A2
Authority
WO
WIPO (PCT)
Prior art keywords
features
variation factor
class variation
feature
class
Prior art date
Application number
PCT/IB2014/000939
Other languages
French (fr)
Other versions
WO2014195782A3 (en)
Inventor
Kingshuk CHAKRAVARTY
Diptesh DAS
Aniruddha Sinha
Amit Konar
Original Assignee
Tata Consultancy Services Limited
Priority date
Filing date
Publication date
Application filed by Tata Consultancy Services Limited
Publication of WO2014195782A2
Publication of WO2014195782A3


Classifications

    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B - DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 - Measuring for diagnostic purposes; Identification of persons
    • A61B 5/72 - Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B 5/7235 - Details of waveform analysis
    • A61B 5/7264 - Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A - HUMAN NECESSITIES
    • A61 - MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B - DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 5/00 - Measuring for diagnostic purposes; Identification of persons
    • A61B 5/24 - Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B 5/316 - Modalities, i.e. specific diagnostic methods
    • A61B 5/369 - Electroencephalography [EEG]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211 - Selection of the most significant subset of features
    • G06F 18/2111 - Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211 - Selection of the most significant subset of features
    • G06F 18/2115 - Selection of the most significant subset of features by evaluating different subsets according to an optimisation criterion, e.g. class separability, forward selection or backward elimination
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2218/00 - Aspects of pattern recognition specially adapted for signal processing
    • G06F 2218/08 - Feature extraction
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20 - ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems

Definitions

  • the present subject matter relates, in general, to selection of features and particularly to selection of optimum features using differential evolution.
  • Objects such as people, materials, diseases, etc.
  • the identification and classification of the objects into the classes requires knowledge and information of object features which correlate with their types or characteristics.
  • the known features can be used for the purposes of identification and classification of the objects.
  • Figure 1 illustrates a system environment implementing an optimum feature selection system, in accordance with an implementation of the present subject matter.
  • Figure 2 illustrates a method for selection of an optimum feature subset, in accordance with an implementation of the present subject matter.
  • Figure 3 illustrates a method for identification and classification of objects into classes using an optimum feature subset, in accordance with an implementation of the present subject matter.
  • feature selection refers to selection of a feature subset for identification and classification of objects into classes
  • the optimum feature subset is a feature subset having a number of independent features, substantially sufficient for identification and classification of objects into classes.
  • the identification and classification of objects into the different classes using features that characterize the objects is known.
  • data sets of the objects are gathered, and a plurality of features is extracted from the gathered data sets.
  • biometric data such as skeleton data
  • various gait features are extracted from the obtained biometric data.
  • the plurality of features extracted is mapped to various possible classes, which is then used to train a supervised learning algorithm, also referred to as classifier, for subsequent identification and classification of unknown objects into the classes.
  • the number of features extracted from the data sets of the objects is substantially large.
  • Some conventional classification methodologies utilize all the extracted features for the purpose of identification and classification of the objects. Such conventional methodologies thus require a large number of computational steps to identify and classify the objects, which makes them computationally expensive. Also, some of the extracted features may not be relevant or may be redundant for the classification of objects.
  • the extracted features which may not be relevant or may be redundant for the classification of objects, may contribute to misclassification of the objects.
  • a subset of features from the set of extracted features, is selected.
  • the selection of a subset of features using a classifier is also known. Conventionally, multiple random subsets of features are individually used in a classifier to identify an optimum feature subset, from amongst the subsets of features, which can identify and classify the objects. This optimum feature subset is then used to train the classifier to identify and classify the objects into the classes.
  • the feature selection technique is classifier dependent.
  • the present subject matter describes system(s) and method(s) for selection of optimum feature subset from a plurality of extracted features.
  • the selection of optimum feature subset in accordance with the present subject matter, is classifier independent.
  • For the selection of an optimum feature subset a plurality of features extracted from data sets associated with objects representing multiple classes is obtained. The obtained features are analyzed and an optimum feature subset is selected based on differential evolution process.
  • the selection of optimum feature subset is based on computation of an intra-class variation factor and an inter-class variation factor for a plurality of feature subsets.
  • the intra-class variation factor refers to variations of individual or combination of features within a class.
  • the inter-class variation factor refers to variations of individual or combination of features across multiple classes, i.e., variation of feature from one class with respect to another.
  • the intra-class variation factor is minimized and the inter-class variation factor is maximized using differential evolution process.
  • the differential evolution process refers to an optimization search process which iteratively generates a solution (for example, a feature subset) to a problem (for example, an objective function, a fitness function, etc.,) with regard to a given condition (for example, minimization, maximization, etc.).
  • the methodology of present subject matter can be implemented for selection of an optimum feature subset, to identify and classify objects into different classes using the optimum feature subset.
  • With the optimum feature subset, the number of computations and size of storage space involved in the identification and classification stage is substantially less and the classification or recognition accuracy substantially improves.
  • the usage of the optimum feature subset also substantially reduces the runtime complexity of identifying and classifying the objects into the classes.
  • the methodology of the present subject matter may be implemented for people identification, where the objects may be individuals who are to be classified as distinct individuals.
  • the gait features extracted from skeleton data sets at different instances for the individuals, are obtained, and an optimum gait feature subset is selected based on differential evolution process of the present subject matter. The optimum gait feature subset is then used in a classifier for the classification of the individuals.
  • the methodology of the present subject matter may be implemented for classification of cognitive loads on individuals, where the objects may be cognitive loads to be classified in different classes.
  • the electroencephalography (EEG) features extracted from EEG signals at different instances for the individuals, are obtained, and an optimum EEG feature subset is selected based on differential evolution process of the present subject matter. The optimum EEG feature subset is then used in a classifier for the classification of the cognitive loads on the individuals.
  • the selection of the optimum feature subset does not involve a classifier and, thus, is independent of the classifier. This removes restrictions on the use of a particular classifier for which the optimum feature subset is obtained and which is trained to use the optimum feature subset for the identification and classification of the objects. Further, the optimum feature subset selected based on differential evolution by the minimization of intra-class variation factor and by the maximization of the inter-class variation factor is substantially accurate. With the optimum feature subset selection of the present subject matter, a substantially accurate identification and classification can be achieved.
  • FIG. 1 illustrates a system environment 100 implementing an optimum feature selection system 102, in accordance with an implementation of the present subject matter.
  • the optimum feature selection system 102 is hereinafter referred to as a system 102.
  • the system 102 can be implemented as a computing device, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, and the like.
  • the system 102 is enabled to select an optimum feature subset based on differential evolution process, in accordance with the present subject matter.
  • the system 102 includes processor(s) 104.
  • the processor(s) 104 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions.
  • the processor(s) 104 is configured to fetch and execute computer-readable instructions stored in a memory.
  • the system 102 includes interface(s) 106.
  • the interface(s) 106 may include a variety of machine readable instruction-based and hardware-based interfaces that allow the system 102 to communicate with other devices, including servers, data sources, and external repositories. Further, the interface(s) 106 may enable the system 102 to communicate with other communication devices, such as network entities, over a communication network.
  • the system 102 includes a memory 108.
  • the memory 108 may be coupled to the processor(s) 104.
  • the memory 108 can include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
  • the system 102 includes module(s) 110 and data 112.
  • the module(s) 110 and the data 112 may be coupled to the processor(s) 104.
  • the modules 110, amongst other things, include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types.
  • the modules 110 may also be implemented as signal processor(s), state machine(s), logic circuitries, and/or any other device or component that manipulate signals based on operational instructions.
  • the data 112 serves, amongst other things, as a repository for storing data that may be fetched, processed, received, or generated by the module(s) 110.
  • the data 112 is shown internal to the system 102, it may be understood that the data 112 can reside in an external repository (not shown in the Figure), which may be coupled to the system 102.
  • the system 102 may communicate with the external repository through the interface(s) 106.
  • the module(s) 110 can be implemented in hardware, as instructions executed by a processing unit, or by a combination thereof.
  • the processing unit can comprise a computer, a processor, a state machine, a logic array or any other suitable devices capable of processing instructions.
  • the processing unit can be a general-purpose processor which executes instructions to cause the general-purpose processor to perform the required tasks or, the processing unit can be dedicated to perform the required functions.
  • the module(s) 110 may be machine-readable instructions (software) which, when executed by a processor/processing unit, perform any of the desired functionalities.
  • the machine-readable instructions may be stored on an electronic memory device, hard disk, optical disk or other machine-readable storage medium or non-transitory medium.
  • the machine-readable instructions can also be downloaded to the storage medium via a network connection.
  • the module(s) 110 include a differential evolution feature selection (DEFS) module 114, and other module(s) 116.
  • the other module(s) 116 may include programs or coded instructions that supplement applications or functions performed by the system 102.
  • the data 112 includes feature data 120, fitness function data 122, optimum feature data 124, and other data 126.
  • the other data 126, amongst other things, may serve as a repository for storing data that is processed, received, or generated as a result of the execution of one or more modules in the module(s) 110.
  • the system 102 is coupled to a data source 130 to obtain a plurality of features for the selection of an optimum feature subset.
  • the data source 130 refers to an entity that has the data associated with the plurality of features extracted from data sets for multiple objects representing different classes.
  • the system 102 is coupled to a classifier 132 for classification of objects under the classes using the optimum feature subset.
  • the classifier 132 may be trained for the optimum feature subset over different classes and, subsequently, used for the classification of unknown objects using the optimum feature subset.
  • the DEFS module 114 obtains the plurality of features from the data source 130.
  • the features are extracted from data sets of objects representing multiple classes, taken at multiple instances of time.
  • the data associated with plurality of features is stored in the feature data 120.
  • g is the size of data sets taken for class 1.
  • w is the size of data sets taken for class 2, and (x_wp)^2 denotes the value of the feature d_p extracted at the wth instance for class 2.
  • t is the size of data sets taken for class c, and (x_tp)^c denotes the value of the feature d_p extracted at the tth instance for class c.
  • the DEFS module 114 may normalize the values of each of the plurality of features to zero mean and unit covariance. With this, the values of the features are substantially scaled for subsequent processing.
  • the DEFS module 114 identifies an optimum feature subset based on a differential evolution process, such that the intra-class variation factor is minimum and the inter-class variation factor is maximum.
  • the description below describes the procedure followed for identification of the optimum feature subset based on the differential evolution process.
  • a population set comprising multiple parameter vectors for the differential evolution process is formulated.
  • Each of the parameter vectors comprises a feature subset and a Lagrange's multiplier λ.
  • the feature subset represents and is indicative of features selected from amongst all the features obtained by the DEFS module 114.
  • Each feature subset may have a set of features randomly selected from all of the obtained features.
  • the Lagrange's multiplier λ is obtained from a range determined by a ratio of an inter-class variation factor and an intra-class variation factor of each of the features. The procedure of obtaining the Lagrange's multiplier λ is described later in the description.
  • each feature subset is in the form of a binary encoded decimal (BED) pattern indicative of those features which are selected to be a part of the feature subset.
  • the BED pattern is of a size equal to the number of features obtained by the DEFS module 114.
  • the BED pattern is represented as a binary bit pattern, with the number of bits equal to the number of features obtained, where each bit corresponds to one feature and the values of the bits indicate the selection or the non-selection of the features in the feature subset.
  • the 1's in the BED pattern represent the features which are selected to be the part of the feature subset and the 0's represent the features which are not selected to be the part of the feature subset.
  • each feature subset is a BED pattern of p bits.
  • the BED pattern for a feature subset may be '1011011001001'. This indicates that the features {d_1, d_3, d_4, d_6, d_7, d_10, d_13} are selected to be the part of that feature subset.
  • the population set for the differential evolution includes N number of feature subsets, where N is usually at least three times the number of the obtained features p, i.e., N ≥ 3p. For example, if the total number of obtained features is 5, the total number of feature subsets N is at least equal to 15. N also denotes the number of parameter vectors in the population set.
  • a range of upper limits of the Lagrange's multiplier λ is determined. For this, the intra-class variation factor and inter-class variation factor for each of the features is computed. The intra-class variation factor of each feature is divided by the inter-class variation factor of the same feature to obtain the upper limit of the Lagrange's multiplier λ for that feature.
  • the upper limit of the Lagrange's multiplier λ for the jth feature is given by equation (4) below: λ_j = IntraVar_j / InterVar_j, where IntraVar_j is the intra-class variation factor of the jth feature, given by equation (5).
  • k governs the data set at the kth instance, i.e., the kth datapoint
  • i governs the class
  • c is the total number of classes
  • n is the size of data sets in class i
  • j governs the feature for which the intra-class variation factor is to be calculated.
  • InterVar_j is the inter-class variation factor of the jth feature and is given by equation (6) below:
  • k governs the data set at the kth instance, i.e., the kth datapoint
  • i governs the class
  • c is the total number of classes
  • n is the size of data sets in class i
  • j governs the feature for which the inter-class variation factor is to be calculated.
  • the upper limits of the Lagrange's multiplier λ_1, λ_2, ..., λ_p for all the p features are obtained.
  • the lower limits of the Lagrange's multiplier for the features are considered as significantly small values, let's say epsilon, where epsilon is nearly equal to zero.
  • the Lagrange's multipliers for the parameter vectors are determined.
  • the Lagrange's multiplier for each parameter vector is determined as a random value between the range of lower limits and the range of upper limits as obtained above.
  • the BED pattern for each of the parameter vectors is randomly generated initially.
  • in an example with 4 obtained features, the total number of feature subsets that can be represented using the BED pattern is 2^4 - 1 = 15, i.e., the range of the BED patterns is from '0001' to '1111'.
  • the population set has at least 12 parameter vectors with the BED patterns randomly generated and selected from within the range of possible BED patterns.
  • the BED pattern for each of the parameter vectors is uniformly randomly generated initially.
  • the 12 BED patterns for the population set are initially generated and selected randomly from within different ranges within '0001' and '1111'.
  • the BED patterns selected uniformly may be from different ranges of '0001' to '0011', '0100' to '0111', '1000' to '1011', and '1100' to '1111'.
  • a fitness function, denoted by J, is formulated based on the intra-class variation factor, inter-class variation factor, and the Lagrange's multiplier.
  • the fitness function J is given by equation (9):
  • the intra-class variation factor and inter-class variation factor for each of the feature subsets in the population set are computed in order to evaluate the fitness function.
  • the intra-class variation factor for each of the feature subsets is computed using the values of the features, as represented by equations (1) to (3), in equation (10) below:
  • the inter-class variation factor for each of the feature subsets is computed using the values of the features, as represented by equations (1) to (3), in equation (11) below:
  • k governs the data set at the kth instance
  • i governs the class
  • c is the total number of classes
  • n is the total size of data sets in class i
  • j governs the feature
  • p is the total number of features.
  • j belongs to those features which are selected in the feature subset for which the inter-class variation is to be computed.
  • since the values of a feature for objects within a class should lie in close proximity, the intra-class variation factor of the optimum feature subset should be minimum. So the intra-class variation factor for each of the feature subsets is to be minimized. Also, since the objects in any two classes should have a minimum amount of similarity, the inter-class variation factor of the optimum feature subset should be maximum. So the inter-class variation factor for each of the feature subsets is to be maximized. Since the intra-class variation factor has to be minimized and the inter-class variation factor has to be maximized for each of the feature subsets, the fitness function J given by equation (9) has to be minimized for the feature subsets. The feature subset, from amongst the feature subsets, which has the minimum value of the fitness function J is considered as the optimum feature subset. The data related to the optimum feature subset is stored in the optimum feature data 124.
  • the DEFS module 114 follows the differential evolution process for the minimization of the fitness function of the feature subsets and, thereby, the identification of the optimum feature subset.
  • the differential evolution process involves four steps: initialization, mutation, recombination (also known as crossover), and selection.
  • a parameter vector, from amongst the parameter vectors in the population set, is selected as a target vector.
  • let the target vector be denoted by u_m, where m may be from 1 to the number of feature subsets N (or the number of parameter vectors).
  • the BED pattern and the Lagrange's multiplier associated with the target vector u_m are denoted by BED_um and λ_um, respectively.
  • in the mutation stage, for the selected target vector u_m, three other parameter vectors u_p, u_q, and u_r are randomly selected from amongst the population set such that p ≠ q ≠ r ≠ m.
  • a donor vector v_m is generated by adding a weighted difference of any two vectors, from amongst the parameter vectors u_p, u_q, and u_r, to the remaining parameter vector as given by equation (14):
  • v_m = u_p + m_f * (u_q - u_r)  (14)
  • m_f is the mutation factor, taking a value between 0 and 2.
  • the mutation factor controls the rate of evolution of the population set.
  • the mutation factor m_f is 0.9.
  • a trial vector t_m is generated, where each element of the trial vector t_m is selected from the elements of the target vector u_m or the donor vector v_m, depending on the value of a crossover ratio (CR).
  • the crossover ratio CR takes a value between 0 and 1. In an implementation, the crossover ratio CR is 0.7.
  • the trial vector t_m is generated using equation (15) below:
  • rand(0,1) is a random number generator that generates a random number between 0 and 1.
  • the fitness function value of the trial vector t_m is calculated and compared with the fitness function value for the target vector u_m. If the fitness function value for the trial vector t_m is lower than that for the target vector u_m, then the target vector u_m and its corresponding fitness function value are replaced by the trial vector t_m and its corresponding fitness value. Based on this revision, the target vector u_m, the Lagrange's multiplier λ_um, and the corresponding fitness function value are stored in the fitness function data 122. The above procedure of mutation, recombination, and selection is iteratively repeated for all the parameter vectors as the target vectors in the population set. (A minimal code sketch of this mutation, recombination, and selection loop is given after this list.)
  • a new population set comprising the new set of target vectors as the parameter vectors and its corresponding fitness values are obtained.
  • the differential evolution process is again performed on the new population set in a manner as described above.
  • the differential evolution process is continued until a stopping criterion is reached.
  • the stopping criterion may be that the values of the fitness function J for the target vectors (or the parameter vectors) stops changing and has the minimum value.
  • the differential evolution process may be performed for a predefined number of times.
  • the optimum feature subset is identified. For this, the values of the fitness function for all parameter vectors of the population set are compared with each other to identify the parameter vector for which the fitness function value is minimum. The BED pattern associated with that identified parameter vector is considered as the optimum feature subset.
  • the DEFS module 114 selects the features in the identified optimum feature subset as the optimum set of features. This optimum set of features is substantially sufficient for distinct identification and classification of objects in different classes.
  • the data related to the optimum feature subset is stored in the optimum feature data 124.
  • the DEFS module 114 provides the optimum feature subset to the classifier 132 for training the classifier 132 for identification and classification of the objects into the classes.
  • the classifier 132 may include a supervised learning algorithm, such as a support vector machine, naive Bayes, a decision tree, linear discriminant analysis, a neural network, and the like.
  • the data source 130 that provides the plurality of features for the selection of the optimum feature subset, and the classifier 132 that receives the optimum feature subset from the system 102 for identification and classification of the objects into classes, reside outside the system 102; it may be understood by a person skilled in the art that the system 102 may itself obtain the data sets for objects under the classes from devices such as a skeleton recording device, an EEG acquisition device, and the like, extract the plurality of features from the obtained data sets, identify and select the optimum feature subset, and then classify the objects into classes using a classifier.
  • the system 102 may have modules, such as a data acquisition module, a feature extraction module, the DEFS module 114, and a classification module, coupled to the processor(s) 104.
  • Figure 2 illustrates a method for selection of an optimum feature subset, in accordance with an implementation of the present subject matter.
  • the method 200 can be implemented in the optimum feature selection system 102.
  • the order in which the method 200 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 200, or any alternative methods. Additionally, individual blocks may be deleted from the method 200 without departing from the spirit and scope of the subject matter described herein.
  • the method 200 can be implemented in any suitable hardware.
  • the method 200 may be described in the general context of computer executable instructions.
  • computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types.
  • the method 200 may be implemented in any computing device; in an example described in Figure 2, the method 200 is explained in context of the aforementioned optimum feature selection system 102, for the ease of explanation.
  • a plurality of features extracted from data sets associated with objects representing multiple classes is obtained.
  • the features are obtained by the system 102 from the data source 130.
  • the data source 130 may obtain data sets for the objects representing multiple classes, and may extract the plurality of features from the obtained data sets.
  • the values of the plurality of features are normalized to zero mean and unit covariance.
  • a population set comprising parameter vectors is formulated for a differential evolution process.
  • Each of the parameter vectors has a feature subset and a Lagrange's multiplier λ.
  • the formulation of the population set is as described earlier in the description.
  • an intra-class variation factor and an inter-class variation factor for multiple feature subsets are computed.
  • the multiple feature subsets are the feature subsets associated with the parameter vectors of the population set.
  • the intra-class variation factor and the inter-class variation factor for the feature subsets associated with the parameter vectors in the population set are computed as described earlier in the description. Using the intra-class variation factor, the inter-class variation factor and the Lagrange's multiplier, the values of the fitness function are obtained for the parameter vectors in the population set.
  • the optimum feature subset is identified, from amongst the multiple feature subsets, based on minimization of the intra-class variation factor and the maximization of the inter-class variation factor using differential evolution.
  • the multiple feature subsets are the feature subsets associated with the parameter vectors of the population set.
  • the minimization of the intra-class variation factor and the maximization of the inter-class variation factor are done through the differential evolution process as described earlier in the description and the optimum feature subset is based on the feature subset having minimum value of the fitness function.
  • the features in the identified optimum feature subset are selected as the optimum features for further processing.
  • the method 200 may include one of obtaining data sets for the objects representing multiple classes, extracting the plurality of features from the obtained data sets, classifying the objects into the classes based on the optimum feature subset, and a combination thereof.
  • Figure 3 illustrates a method 300 for identification and classification of objects into classes using an optimum feature subset, in accordance with an implementation of the present subject matter.
  • the method 300 can be implemented in the optimum feature selection system 102.
  • the order in which the method 300 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 300, or any alternative methods. Additionally, individual blocks may be deleted from the method 300 without departing from the spirit and scope of the subject matter described herein.
  • the method 300 can be implemented in any suitable hardware.
  • the method 300 may be described in the general context of computer executable instructions.
  • computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types.
  • data sets for the objects representing multiple classes are obtained from a data acquisition device.
  • the data acquisition device may be a skeleton recording device, an EEG acquisition device, and the like, depending on the application for which the method 300 is applied.
  • the data set may include skeleton points, of individuals, obtained using the skeleton recording device.
  • the data sets may include EEG signals, of the individuals, obtained using the EEG acquisition device.
  • a plurality of features is extracted from the data sets obtained at the block 302.
  • the plurality of features may include area-related gait features of the object, dynamic centroid distance-related gait features of the object, angle-related gait features of the object, other static and dynamic gait features of the object and a combination thereof, or the plurality of features may include EEG features.
  • the values of the plurality of features are normalized to zero mean and unit covariance.
  • an optimum feature subset is selected from amongst the plurality of features.
  • the optimum feature subset is identified and selected based on minimization of intra-class variation factor and maximization of inter-class variation factor for multiple feature subsets through differential evolution process, described earlier in the description.
  • a population set comprising parameter vectors having feature subsets and a Lagrange's multiplier λ is formulated for a differential evolution process, as described earlier in the description.
  • a fitness function is formulated as described earlier in the description.
  • an intra-class variation factor and an inter-class variation factor for the feature subsets associated with the parameter vectors in the population set are computed as described earlier in the description.
  • the values of the fitness function are obtained for the parameter vectors in the population set.
  • the differential evolution process is iteratively performed on the population set to minimize the intra-class variation factor and maximize the inter-class variation factor for each of the feature subsets.
  • the differential evolution process is iteratively carried out till a stopping criterion is reached as explained in the description earlier.
  • the optimum feature subset is selected based on the feature subset having minimum value of the fitness function.
  • the features in the identified optimum feature subset are selected as the optimum features for further processing.
  • the objects are classified into classes based on the optimum feature subset.
  • a classifier is used.
  • the classifier may include a supervised learning algorithm, such as a support vector machine, naive Bayes, a decision tree, linear discriminant analysis, a neural network, and the like.
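
The mutation, recombination, and selection steps outlined above can be sketched in code. The sketch below is illustrative only: equation (9) is assumed to take the form J = IntraVar - λ·InterVar (consistent with minimizing intra-class and maximizing inter-class variation), equation (15) is assumed to be a standard binomial crossover, the mapping of the integer-valued mutation of equation (14) back to a valid bit pattern is an assumption, the intra_fn and inter_fn callables stand in for equations (10) and (11), and λ is held fixed per parameter vector here although the described process evolves it as part of the vector.

```python
import random

def fitness(bits, lam, intra_fn, inter_fn):
    """Assumed form of equation (9): J = IntraVar - lam * InterVar for the subset."""
    subset = [j for j, b in enumerate(bits) if b == "1"]
    if not subset:
        return float("inf")               # an empty subset is not a usable solution
    return intra_fn(subset) - lam * inter_fn(subset)

def mutate(u_p, u_q, u_r, m_f=0.9):
    """Equation (14): v_m = u_p + m_f * (u_q - u_r), with the BED pattern read as
    an integer; wrapping back into the valid range is an illustrative choice."""
    p_bits = len(u_p)
    v = int(u_p, 2) + m_f * (int(u_q, 2) - int(u_r, 2))
    v = int(round(v)) % (2 ** p_bits)
    return format(v, f"0{p_bits}b")

def crossover(target, donor, cr=0.7):
    """Recombination in the spirit of equation (15): take the donor bit when
    rand(0,1) <= CR, otherwise keep the target bit."""
    return "".join(d if random.random() <= cr else t for t, d in zip(target, donor))

def de_generation(population, lambdas, intra_fn, inter_fn):
    """One pass of mutation, recombination, and greedy selection over all targets."""
    new_population = []
    for m, (u_m, lam) in enumerate(zip(population, lambdas)):
        p, q, r = random.sample([i for i in range(len(population)) if i != m], 3)
        donor = mutate(population[p], population[q], population[r])
        trial = crossover(u_m, donor)
        better = fitness(trial, lam, intra_fn, inter_fn) < fitness(u_m, lam, intra_fn, inter_fn)
        new_population.append(trial if better else u_m)
    return new_population
```

Such a generation step would be repeated until the stopping criterion mentioned above is reached (the fitness values stop changing, or a predefined number of generations is completed), after which the BED pattern with the lowest fitness value is read off as the optimum feature subset.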

Abstract

The subject matter discloses systems and methods for selection of an optimum feature subset. According to the present subject matter, the system (102) implements the described method, where the method includes obtaining a plurality of features extracted from data sets associated with objects representing multiple classes, computing an intra-class variation factor and an inter-class variation factor for multiple feature subsets, from amongst the plurality of features, and identifying an optimum feature subset, from amongst the multiple feature subsets, based on minimization of the intra-class variation factor and maximization of the inter-class variation factor using differential evolution.

Description

DIFFERENTIAL EVOLUTION-BASED FEATURE SELECTION
TECHNICAL FIELD
[0001] The present subject matter relates, in general, to selection of features and particularly to selection of optimum features using differential evolution.
BACKGROUND
[0002] Objects, such as people, materials, diseases, etc., are generally identified and classified into distinct classes based on their types or characteristics. The identification and classification of the objects into the classes requires knowledge and information of object features which correlate with their types or characteristics. The known features can be used for the purposes of identification and classification of the objects. In some cases, there may be some correlated features or irrelevant features of objects. So, to identify and classify objects into their respective classes, a set of features which can distinguish the objects with respect to the classes is required. Such a set of features enables distinct classification of different objects in distinct classes.
BRIEF DESCRIPTION OF DRAWINGS
[0003] The detailed description is described with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to features and components.
[0004] Figure 1 illustrates a system environment implementing an optimum feature selection system, in accordance with an implementation of the present subject matter.
[0005] Figure 2 illustrates a method for selection of an optimum feature subset, in accordance with an implementation of the present subject matter.
[0006] Figure 3 illustrates a method for identification and classification of objects into classes using an optimum feature subset, in accordance with an implementation of the present subject matter.
[0007] It should be appreciated by those skilled in the art that any block diagrams herein represent conceptual views of illustrative systems embodying the principles of the present subject matter. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudo code, and the like represent various processes which may be substantially represented in computer readable medium and so executed by a computing device or processor, whether or not such computing device or processor is explicitly shown.
DETAILED DESCRIPTION
[0008] The subject matter disclosed herein relates to system(s) and method(s) for selection of an optimum feature subset from a plurality of features. For the purposes of the present subject matter, feature selection refers to selection of a feature subset for identification and classification of objects into classes, and the optimum feature subset is a feature subset having a number of independent features, substantially sufficient for identification and classification of objects into classes.
[0009] The identification and classification of objects into the different classes using features that characterize the objects is known. Conventionally, for the identification and classification of objects into classes, data sets of the objects are gathered, and a plurality of features is extracted from the gathered data sets. In an example, to identify and classify people (objects) as distinct individuals (classes) based on their biometric characteristics, biometric data, such as skeleton data, is obtained for the individuals, and various gait features are extracted from the obtained biometric data. The plurality of features extracted is mapped to various possible classes, which is then used to train a supervised learning algorithm, also referred to as classifier, for subsequent identification and classification of unknown objects into the classes. [0010] Generally, the number of features extracted from the data sets of the objects is substantially large. Some conventional classification methodologies utilize all the extracted features for the purpose of identification and classification of the objects. Such conventional methodologies thus require a large number of computational steps to identify and classify the objects, which makes them computationally expensive. Also, some of the extracted features may not be relevant or may be redundant for the classification of objects.
[0011] The extracted features which may not be relevant or may be redundant for the classification of objects, may contribute to misclassification of the objects. For this, a subset of features, from the set of extracted features, is selected. The selection of a subset of features using a classifier is also known. Conventionally, multiple random subsets of features are individually used in a classifier to identify an optimum feature subset, from amongst the subsets of features, which can identify and classify the objects. This optimum feature subset is then used to train the classifier to identify and classify the objects into the classes. Here also, only that classifier, which is trained using the optimum feature subset, can be used for the classification and identification of objects into classes. If another classifier is to be used then that classifier has to be trained using that or another optimum feature subset. Thus, the feature selection technique is classifier dependent.
[0012] The present subject matter describes system(s) and method(s) for selection of optimum feature subset from a plurality of extracted features. The selection of optimum feature subset, in accordance with the present subject matter, is classifier independent. For the selection of an optimum feature subset, a plurality of features extracted from data sets associated with objects representing multiple classes is obtained. The obtained features are analyzed and an optimum feature subset is selected based on differential evolution process.
[0013] The selection of optimum feature subset, in accordance with the present subject matter, is based on computation of an intra-class variation factor and an inter- class variation factor for a plurality of feature subsets. The intra-class variation factor refers to variations of individual or combination of features within a class. The inter- class variation factor refers to variations of individual or combination of features across multiple classes, i.e., variation of feature from one class with respect to another. In an implementation, for the selection of optimum feature subset, the intra- class variation factor is minimized and the inter-class variation factor is maximized using differential evolution process. The differential evolution process refers to an optimization search process which iteratively generates a solution (for example, a feature subset) to a problem (for example, an objective function, a fitness function, etc.,) with regard to a given condition (for example, minimization, maximization, etc.). By minimizing the intra-class variation factor for the optimum feature subset, it can be substantially ensured that the value of a particular feature, in the optimum feature subset, for a class lie in close proximity. Also, by maximizing the inter-class variation factor for the optimum feature subset, it is substantially ensured that the features, in the optimum feature subset, for each class are distinct with respect to the other classes.
[0014] The methodology of present subject matter can be implemented for selection of an optimum feature subset, to identify and classify objects into different classes using the optimum feature subset. With the optimum feature subset, the number of computations and size of storage space involved in the identification and classification stage is substantially less and the classification or recognition accuracy substantially improves. The usage of the optimum feature subset also substantially reduces the runtime complexity of identifying and classifying the objects into the classes.
[0015] In an example, the methodology of the present subject matter may be implemented for people identification, where the objects may be individuals who are to be classified as distinct individuals. In said example, the gait features, extracted from skeleton data sets at different instances for the individuals, are obtained, and an optimum gait feature subset is selected based on differential evolution process of the present subject matter. The optimum gait feature subset is then used in a classifier for the classification of the individuals.
[0016] In another example, the methodology of the present subject matter may be implemented for classification of cognitive loads on individuals, where the objects may be cognitive loads to be classified in different classes. In said example, the electroencephalography (EEG) features, extracted from EEG signals at different instances for the individuals, are obtained, and an optimum EEG feature subset is selected based on differential evolution process of the present subject matter. The optimum EEG feature subset is then used in a classifier for the classification of the cognitive loads on the individuals.
[0017] The selection of the optimum feature subset, in accordance with the present subject matter, does not involve a classifier and, thus, is independent of the classifier. This removes restrictions on the use of a particular classifier for which the optimum feature subset is obtained and which is trained to use the optimum feature subset for the identification and classification of the objects. Further, the optimum feature subset selected based on differential evolution by the minimization of intra- class variation factor and by the maximization of the inter-class variation factor is substantially accurate. With the optimum feature subset selection of the present subject matter, a substantially accurate identification and classification can be achieved.
[0018] The manner in which the system(s) and method(s) shall be implemented has been explained in details with respect to Figure 1 to Figure 3. Although the description herein is with reference to personal computer(s), the method(s) and system(s) may be implemented in other computing device(s) as well, albeit with a few variations, as will be understood by a person skilled in the art. While aspects of described methods can be implemented in any number of different computing devices, transmission environments, and/or configurations, the implementations are described in the context of the following computing device(s).
[0019] Figure 1 illustrates a system environment 100 implementing an optimum feature selection system 102, in accordance with an implementation of the present subject matter. For the purpose of description and simplicity, the optimum feature selection system 102 is hereinafter referred to as a system 102. The system 102 can be implemented as a computing device, such as a laptop computer, a desktop computer, a notebook, a workstation, a mainframe computer, and the like. The system 102 is enabled to select an optimum feature subset based on differential evolution process, in accordance with the present subject matter.
[0020] In an implementation, the system 102 includes processor(s) 104. The processor(s) 104 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processor(s) 104 is configured to fetch and execute computer-readable instructions stored in a memory.
[0021] The system 102 includes interface(s) 106. The interface(s) 106 may include a variety of machine readable instruction-based and hardware-based interfaces that allow the system 102 to communicate with other devices, including servers, data sources, and external repositories. Further, the interface(s) 106 may enable the system 102 to communicate with other communication devices, such as network entities, over a communication network.
[0022] Further, the system 102 includes a memory 108. The memory 108 may be coupled to the processor(s) 104. The memory 108 can include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
[0023] Further, the system 102 includes module(s) 110 and data 112. The module(s) 110 and the data 112 may be coupled to the processor(s) 104. The modules 110, amongst other things, include routines, programs, objects, components, data structures, etc., which perform particular tasks or implement particular abstract data types. The modules 110 may also be implemented as signal processor(s), state machine(s), logic circuitries, and/or any other device or component that manipulate signals based on operational instructions. The data 112 serves, amongst other things, as a repository for storing data that may be fetched, processed, received, or generated by the module(s) 110. Although the data 112 is shown internal to the system 102, it may be understood that the data 112 can reside in an external repository (not shown in the Figure), which may be coupled to the system 102. The system 102 may communicate with the external repository through the interface(s) 106.
[0024] Further, the module(s) 110 can be implemented in hardware, as instructions executed by a processing unit, or by a combination thereof. The processing unit can comprise a computer, a processor, a state machine, a logic array or any other suitable devices capable of processing instructions. The processing unit can be a general-purpose processor which executes instructions to cause the general-purpose processor to perform the required tasks or, the processing unit can be dedicated to perform the required functions. In another aspect of the present subject matter, the module(s) 110 may be machine-readable instructions (software) which, when executed by a processor/processing unit, perform any of the desired functionalities. The machine-readable instructions may be stored on an electronic memory device, hard disk, optical disk or other machine-readable storage medium or non-transitory medium. In an implementation, the machine-readable instructions can also be downloaded to the storage medium via a network connection. [0025] In an implementation, the module(s) 110 include a differential evolution feature selection (DEFS) module 114, and other module(s) 116. The other module(s) 116 may include programs or coded instructions that supplement applications or functions performed by the system 102. In said implementation, the data 112 includes feature data 120, fitness function data 122, optimum feature data 124, and other data 126. The other data 126, amongst other things, may serve as a repository for storing data that is processed, received, or generated as a result of the execution of one or more modules in the module(s) 110.
[0026] As shown in Figure 1, the system 102 is coupled to a data source 130 to obtain a plurality of features for the selection of an optimum feature subset. The data source 130 refers to an entity that has the data associated with the plurality of features extracted from data sets for multiple objects representing different classes. Further, as shown in Figure 1, the system 102 is coupled to a classifier 132 for classification of objects under the classes using the optimum feature subset. The classifier 132 may be trained for the optimum feature subset over different classes and, subsequently, used for the classification of unknown objects using the optimum feature subset.
[0027] For the purpose of selection of an optimum feature subset by the system 102, in an implementation, the DEFS module 114 obtains the plurality of features from the data source 130. As mentioned earlier, the features are extracted from data sets of objects representing multiple classes, taken at multiple instances of time. The data associated with plurality of features is stored in the feature data 120.
[0028] To illustrate the representation of the data associated with the plurality of features obtained by the DEFS module 114, let us consider a case where p number of features is obtained for objects under 'c' number of classes. Let the ith feature be denoted by d_i, and the ith class be denoted by 'class i'. Further, as the features d_1 to d_p are obtained for each class, the data associated with the features d_1 to d_p for class 1 can be represented under the features as shown by equation (1):

class 1 = [ (x_kj)^1 ],  k = 1, ..., g;  j = 1, ..., p    (1)

where g is the size of data sets taken for class 1. Here, (x_gp)^1 denotes the value of the feature d_p extracted at the gth instance for class 1.
[0029] Similarly, the data associated with the features d_1 to d_p for class 2 can be represented under the features as shown by equation (2):

class 2 = [ (x_kj)^2 ],  k = 1, ..., w;  j = 1, ..., p    (2)

where w is the size of data sets taken for class 2. Here, (x_wp)^2 denotes the value of the feature d_p extracted at the wth instance for class 2.
[0030] Similarly, the data associated with the features d_1 to d_p for class c can be represented under the features as shown by equation (3):

class c = [ (x_kj)^c ],  k = 1, ..., t;  j = 1, ..., p    (3)

where t is the size of data sets for class c. Here, (x_tp)^c denotes the value of the feature d_p extracted at the tth instance for class c.
[0031] In an implementation, after obtaining the plurality of features, the DEFS module 114 may normalize the values of each of the plurality of features to zero mean and unit covariance. With this, the values of the features are substantially scaled for subsequent processing.
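
As an illustration of the data layout of equations (1) to (3) and of this normalization step, the following minimal sketch (not part of the patent; the array shapes, the use of NumPy, and the choice of pooling the scaling statistics over all classes are assumptions) arranges the extracted feature values as one instances-by-features array per class and scales every feature to zero mean and unit variance.

```python
import numpy as np

# Hypothetical example: p = 4 features, c = 3 classes, with a different number
# of data-set instances per class (g, w, t in the description above).
rng = np.random.default_rng(0)
class_data = {
    "class 1": rng.normal(size=(6, 4)),   # 6 instances x 4 features
    "class 2": rng.normal(size=(5, 4)),
    "class 3": rng.normal(size=(7, 4)),
}

# Normalize each feature to zero mean and unit variance, using statistics
# pooled over all classes (one common scaling for every data set).
pooled = np.vstack(list(class_data.values()))
mean, std = pooled.mean(axis=0), pooled.std(axis=0)
normalized = {name: (x - mean) / std for name, x in class_data.items()}

for name, x in normalized.items():
    print(name, x.shape)
```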
[0032] Based on the obtained features, the DEFS module 114 identifies an optimum feature subset based on a differential evolution process, such that the intra-class variation factor is minimum and the inter-class variation factor is maximum. The description below describes the procedure followed for identification of the optimum feature subset based on the differential evolution process.
[0033] In an implementation, for the identification of the optimum feature subset using differential evolution process, a population set comprising multiple parameter vectors for the differential evolution process is formulated. Each of the parameter vectors comprises a feature subset and a Lagrange's multiplier λ. The feature subset represents and is indicative of features selected from amongst all the features obtained by the DEFS module 114. Each feature subset may have a set of features randomly selected from all of the obtained features. The Lagrange's multiplier λ is obtained from a range determined by a ratio of an inter-class variation factor and an intra-class variation factor of each of the features. The procedure of obtaining the Lagrange's multiplier λ is described later in the description.
[0034] In an implementation, each feature subset is in the form of a binary encoded decimal (BED) pattern indicative of those features which are selected to be a part of the feature subset. The BED pattern is of a size equal to the number of features obtained by the DEFS module 114. In other words, the BED pattern is represented as a binary bit pattern, with the number of bits equal to the number of features obtained, where each bit corresponds to one feature and the value of the bit indicates the selection or the non-selection of that feature in the feature subset. The 1's in the BED pattern represent the features which are selected to be a part of the feature subset and the 0's represent the features which are not selected to be a part of the feature subset. For p number of features, each feature subset is a BED pattern of p bits. In an example, for 13 features obtained by the DEFS module 114, the BED pattern for a feature subset may be '1011011001001'. This indicates that the features {d_1, d_3, d_4, d_6, d_7, d_10, d_13} are selected to be a part of that feature subset.
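The correspondence between a BED pattern and the features it selects can be illustrated as follows; the pattern '1011011001001' is the 13-feature example above, and the helper names are assumptions introduced for the illustration only.

def bed_to_indices(bed):
    # Bit positions holding '1' identify the selected features d_(i+1).
    return [i + 1 for i, bit in enumerate(bed) if bit == "1"]

def indices_to_bed(indices, p):
    # Build a p-bit pattern with '1' at every selected feature position.
    return "".join("1" if i + 1 in indices else "0" for i in range(p))

print(bed_to_indices("1011011001001"))               # [1, 3, 4, 6, 7, 10, 13]
print(indices_to_bed([1, 3, 4, 6, 7, 10, 13], 13))   # '1011011001001'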
[0035] With such a representation of feature subsets, for p number of features, a total of 2^p BED patterns is possible. In an implementation, the population set for the differential evolution includes N feature subsets, where N is usually at least three times the number of obtained features p, i.e., N ≥ 3*p. For example, if the total number of obtained features is 5, the total number of feature subsets N is at least 15. N also denotes the number of parameter vectors in the population set.
[0036] To determine the Lagrange's multiplier λ for each parameter vector, at first a range of upper limits of the Lagrange's multiplier λ is determined. For this, the intra-class variation factor and the inter-class variation factor for each of the features are computed. The intra-class variation factor of each feature is divided by the inter-class variation factor of the same feature to obtain the upper limit of the Lagrange's multiplier λ for that feature. The upper limit of the Lagrange's multiplier λ for the j-th feature is given by equation (4) below:
λ_j = IntraVar_j / InterVar_j    ... (4)

where IntraVar_j is the intra-class variation factor of the j-th feature and is given by equation (5) below:

IntraVar_j = \sum_{i=1}^{c} \sum_{k=1}^{n} \left( (x_{d_j})_k^i - (m_j)^i \right)^2    ... (5)

where k governs the data set at the k-th instance, i.e., the k-th datapoint, i governs the class, c is the total number of classes, n is the size of the data sets in class i, and j governs the feature for which the intra-class variation factor is to be calculated. InterVar_j is the inter-class variation factor of the j-th feature and is given by equation (6) below:

InterVar_j = \sum_{i=1}^{c} \left( (m_j)^i - \bar{m}_j \right)^2    ... (6)

where (m_j)^i is the mean value of the j-th feature over the data sets of class i and \bar{m}_j is the mean of the class means, as given by equations (7) and (8) below:

(m_j)^i = \frac{1}{n} \sum_{k=1}^{n} (x_{d_j})_k^i    ... (7)

\bar{m}_j = \frac{1}{c} \sum_{i=1}^{c} (m_j)^i    ... (8)

where k governs the data set at the k-th instance, i.e., the k-th datapoint, i governs the class, c is the total number of classes, n is the size of the data sets in class i, and j governs the feature for which the inter-class variation factor is to be calculated. Similarly, the upper limits of the Lagrange's multiplier λ_1, λ_2, ..., λ_p for all the p features are obtained.
[0037] In an implementation, the lower limits of the Lagrange's multiplier for the features are taken as significantly small values, say epsilon, where epsilon is nearly equal to zero. After obtaining the range of upper limits and the range of lower limits for the Lagrange's multiplier, the Lagrange's multipliers for the parameter vectors are determined. In an implementation, the Lagrange's multiplier for each parameter vector is determined as a random value between the range of lower limits and the range of upper limits as obtained above.
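A minimal sketch of the determination of the Lagrange's multiplier range of paragraphs [0036] and [0037] is given below, assuming the per-feature scatter sums of equations (5) to (8) as reconstructed above; the function names and the sampling strategy are assumptions, not a prescribed implementation.

import numpy as np

EPSILON = 1e-9   # near-zero lower limit for the Lagrange's multiplier

def per_feature_limits(class_data):
    # Upper limit lambda_j = IntraVar_j / InterVar_j for every feature j,
    # using the per-class means and the mean of the class means.
    matrices = list(class_data.values())
    class_means = np.array([x.mean(axis=0) for x in matrices])   # shape (c, p)
    grand_mean = class_means.mean(axis=0)                        # shape (p,)
    intra = sum(((x - x.mean(axis=0)) ** 2).sum(axis=0) for x in matrices)
    inter = ((class_means - grand_mean) ** 2).sum(axis=0)
    return intra / np.maximum(inter, EPSILON)

def sample_lambda(upper_limits, rng):
    # A random multiplier between the near-zero lower limit and one of the upper limits.
    return rng.uniform(EPSILON, rng.choice(upper_limits))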
[0038] In an implementation, the BED pattern for each of the parameter vectors is randomly generated initially. Consider an example with 4 features. The total number of feature subsets that can be represented using the BED pattern is 2^4 - 1, i.e., the range of the BED patterns is from '0001' to '1111'. The population set has at least 12 parameter vectors with the BED patterns randomly generated and selected from within the range of possible BED patterns.

[0039] In an implementation, the BED pattern for each of the parameter vectors is uniformly randomly generated initially. Consider again the example with the 4 features. The 12 BED patterns for the population set are initially generated and selected randomly from within different sub-ranges between '0001' and '1111'. In an example, the BED patterns selected uniformly may be from the ranges '0001' to '0011', '0100' to '0111', '1000' to '1011', and '1100' to '1111'.
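The random and the uniformly random initializations of the BED patterns described in paragraphs [0038] and [0039] may, for example, be realized as below with N = 3 * p parameter vectors; all names in this sketch are illustrative assumptions.

import numpy as np

def random_population(p, rng):
    # N = 3 * p parameter-vector BED patterns, each a non-empty p-bit string.
    n_vectors = 3 * p
    values = rng.integers(1, 2 ** p, size=n_vectors)        # 1 .. 2^p - 1, never all zeros
    return [format(int(v), "0{}b".format(p)) for v in values]

def uniform_population(p, rng, bins=4):
    # BED patterns drawn uniformly from equal sub-ranges of [1, 2^p - 1],
    # e.g. '0001'-'0011', '0100'-'0111', '1000'-'1011', '1100'-'1111' for p = 4.
    n_vectors = 3 * p
    edges = np.linspace(1, 2 ** p, bins + 1, dtype=int)
    patterns = []
    for i in range(n_vectors):
        lo, hi = edges[i % bins], edges[i % bins + 1]
        patterns.append(format(int(rng.integers(lo, hi)), "0{}b".format(p)))
    return patterns

rng = np.random.default_rng(1)
print(random_population(4, rng))    # twelve randomly drawn 4-bit patterns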
[0040] Further, a fitness function, denoted by J, is formulated based on the intra-class variation factor, the inter-class variation factor, and the Lagrange's multiplier. The fitness function J is given by equation (9):

J = IntraVar - λ * InterVar    ... (9)
[0041] After formulating the population set with the parameter vectors having distinct feature subsets and the Lagrange's multiplier, the intra-class variation factor and the inter-class variation factor for each of the feature subsets in the population set are computed in order to evaluate the fitness function. The intra-class variation factor for each of the feature subsets is computed using the values of the features, as represented by equations (1) to (3), in equation (10) below:
IntraVar = \sum_{i=1}^{c} \sum_{k=1}^{n} \sum_{j} \left( (x_{d_j})_k^i - (m_j)^i \right)^2    ... (10)

where k governs the data set at the k-th instance, i.e., the k-th datapoint, i governs the class, c is the total number of classes, and n is the size of the data sets in class i. Further, j governs the feature, and p is the total number of features, where j ranges over those features which are selected in the feature subset for which the intra-class variation factor is to be computed.
[0042] The inter-class variation factor for each of the feature subsets is computed using the values of the features, as represented by equations (1) to (3), in equation (11) below:

InterVar = \sum_{i=1}^{c} \sum_{j} \left( (m_j)^i - \bar{m}_j \right)^2    ... (11)

where (m_j)^i = \frac{1}{n} \sum_{k=1}^{n} (x_{d_j})_k^i    ... (12)

and \bar{m}_j is the mean of (m_j)^i over all the classes, where k governs the data set at the k-th instance, i governs the class, c is the total number of classes, n is the size of the data sets in class i, j governs the feature, and p is the total number of features. Here again, j ranges over those features which are selected in the feature subset for which the inter-class variation factor is to be computed.
[0043] Using the Lagrange's multiplier and the above calculated intra-class variation factor and the inter-class variation factor in equation (9), the values of the fitness function are obtained for the parameter vectors in the population set.
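Gathering equations (9) to (12), the fitness of one parameter vector might be evaluated as in the following sketch; the reconstruction of the scatter sums and all helper names are assumptions rather than a prescribed implementation.

import numpy as np

def fitness(bed, lam, class_data):
    # J = IntraVar - lambda * InterVar over the features selected in the BED pattern.
    selected = [i for i, bit in enumerate(bed) if bit == "1"]
    if not selected:
        return np.inf                       # an empty subset is never optimal
    matrices = [x[:, selected] for x in class_data.values()]
    class_means = np.array([x.mean(axis=0) for x in matrices])
    grand_mean = class_means.mean(axis=0)
    intra = sum(((x - x.mean(axis=0)) ** 2).sum() for x in matrices)
    inter = ((class_means - grand_mean) ** 2).sum()
    return intra - lam * inter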
[0044] Since the features selected in the optimum feature subset should have a maximum amount of similarity across objects of the same class, the intra-class variation factor of the optimum feature subset should be minimum. So the intra-class variation factor for each of the feature subsets is to be minimized. Also, since the optimum feature subset of objects in any two classes should have a minimum amount of similarity, the inter-class variation factor of the optimum feature subset should be maximum. So the inter-class variation factor for each of the feature subsets is to be maximized. Since the intra-class variation factor has to be minimized and the inter-class variation factor has to be maximized for each of the feature subsets, the fitness function J given by equation (9) has to be minimized for the feature subsets. The feature subset, from amongst the feature subsets, which has the minimum value of the fitness function J is considered as the optimum feature subset. The data related to the optimum feature subset is stored in the optimum feature data 124.
[0045] The DEFS module 114 follows the differential evolution process for the minimization of the fitness function of the feature subsets and, thereby, the identification of the optimum feature subset. The differential evolution process involves four steps: initialization, mutation, recombination (also known as crossover), and selection.
[0046] In the initialization stage, a parameter vector, from amongst the parameter vectors in the population set, is selected as a target vector. Let the target vector be denoted by u_m, where m may be from 1 to the number of feature subsets N (or the number of parameter vectors). Let the BED pattern and the Lagrange's multiplier associated with the target vector u_m be denoted by BED_um and λ_um, respectively.

[0047] In the mutation stage, for the selected target vector u_m, three other parameter vectors u_p, u_q, and u_r are randomly selected from amongst the population set such that p ≠ q ≠ r ≠ m. Based on the three selected parameter vectors, a donor vector v_m is generated by adding a weighted difference of any two vectors, from amongst the parameter vectors u_p, u_q, and u_r, to the remaining parameter vector, as given by equation (14):

v_m = u_p + mf * (u_q - u_r)    ... (14)

where mf is the mutation factor, taking a value between 0 and 2. The mutation factor controls the rate of evolution of the population set. In an implementation, the mutation factor mf is 0.9.
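The weighted-difference mutation of equation (14) is defined on numeric vectors; the sketch below assumes, for illustration only, that each parameter vector is stored as a numeric array holding its BED bits followed by its multiplier λ.

import numpy as np

def mutate(population, m, mf=0.9, rng=None):
    # Donor vector v_m = u_p + mf * (u_q - u_r), with indices p, q, r all different from m.
    rng = rng or np.random.default_rng()
    candidates = [i for i in range(len(population)) if i != m]
    p_idx, q_idx, r_idx = rng.choice(candidates, size=3, replace=False)
    u_p, u_q, u_r = (np.asarray(population[i], dtype=float) for i in (p_idx, q_idx, r_idx))
    return u_p + mf * (u_q - u_r)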
[0048] In the recombination or crossover stage, a trial vector t_m is generated, where each element of the trial vector t_m is selected from the elements of the target vector u_m or the donor vector v_m, depending on the value of a crossover ratio (CR). The crossover ratio CR takes a value between 0 and 1. In an implementation, the crossover ratio CR is 0.7. The trial vector t_m is generated using equation (15) below:

t_mj = v_mj, if rand(0,1) < CR
t_mj = u_mj, otherwise    ... (15)

where t_mj, v_mj, and u_mj are the components (BED pattern and the Lagrange's multiplier λ) of the trial vector, donor vector, and target vector, respectively. Further, rand(0,1) is a random number generator that generates a random number between 0 and 1.
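Under the same assumed numeric representation of a parameter vector, the crossover of equation (15) may be sketched as follows; the helper name is illustrative.

import numpy as np

def crossover(target, donor, cr=0.7, rng=None):
    # Trial vector: take the donor component where rand(0,1) < CR, else keep the target's.
    rng = rng or np.random.default_rng()
    target = np.asarray(target, dtype=float)
    donor = np.asarray(donor, dtype=float)
    mask = rng.random(target.shape) < cr
    return np.where(mask, donor, target)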
[0049] In the selection stage, the fitness function value of the trial vector t_m is calculated and compared with the fitness function value of the target vector u_m. If the fitness function value for the trial vector t_m is lower than that for the target vector u_m, then the target vector u_m and its corresponding fitness function value are replaced by the trial vector t_m and its corresponding fitness value. Based on this revision, the target vector u_m, the Lagrange's multiplier λ_um and the corresponding fitness function value are stored in the fitness function data 122.

[0050] The above procedure of mutation, recombination, and selection is iteratively repeated for all the parameter vectors as the target vectors in the population set. After all the parameter vectors of the population set are processed for the mutation, recombination and selection, a new population set comprising the new set of target vectors as the parameter vectors and their corresponding fitness values is obtained. The differential evolution process is again performed on the new population set in a manner as described above.
[0051] The differential evolution process is continued until a stopping criterion is reached. In an implementation, the stopping criterion may be that the values of the fitness function J for the target vectors (or the parameter vectors) stop changing and have reached the minimum value. In an implementation, the differential evolution process may be performed for a predefined number of iterations.
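Putting the initialization, mutation, recombination and selection stages of paragraphs [0045] to [0051] together, and reusing the mutate and crossover helpers sketched above, a simplified driver loop might look as follows; the re-binarization step and the fixed iteration budget are assumptions, since the description leaves those details open.

import numpy as np

def binarize(vector):
    # Snap the bit positions back to {0, 1}; keep the multiplier (last element) positive.
    bits = (vector[:-1] >= 0.5).astype(float)
    return np.append(bits, max(vector[-1], 1e-9))

def run_differential_evolution(population, fitness_fn, generations=100,
                               mf=0.9, cr=0.7, rng=None):
    # population: list of numeric vectors (BED bits followed by the multiplier).
    rng = rng or np.random.default_rng()
    population = [np.asarray(u, dtype=float) for u in population]
    scores = [fitness_fn(u) for u in population]
    for _ in range(generations):                     # assumed stopping criterion: fixed budget
        for m in range(len(population)):
            donor = mutate(population, m, mf, rng)                       # equation (14)
            trial = binarize(crossover(population[m], donor, cr, rng))   # equation (15)
            trial_score = fitness_fn(trial)
            if trial_score < scores[m]:                                  # selection stage
                population[m], scores[m] = trial, trial_score
    best = int(np.argmin(scores))
    return population[best], scores[best]

# fitness_fn may wrap the earlier fitness sketch, for example:
# fitness_fn = lambda u: fitness("".join(str(int(b)) for b in u[:-1]), u[-1], class_data)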
[0052] Based on the values of the fitness function corresponding to each of the target vectors in the population set after the differential evolution process, the optimum feature subset is identified. For this, the values of the fitness function for all the parameter vectors of the population set are compared with each other to identify that parameter vector for which the fitness function value is minimum. The BED pattern associated with that identified parameter vector is considered as the optimum feature subset. The DEFS module 114 selects the features in the identified optimum feature subset as the optimum set of features. This optimum set of features is substantially sufficient for distinct identification and classification of objects in different classes. The data related to the optimum feature subset is stored in the optimum feature data 124.
[0053] In an implementation, the DEFS module 114 provides the optimum feature subset to the classifier 132 for training the classifier 132 for identification and classification of the objects into the classes. In the implementation, the classifier 132 may include a supervised learning algorithm, such as a support vector machine, a naive Bayes classifier, a decision tree, linear discriminant analysis, a neural network, and the like.

[0054] Although, as shown in Figure 1, the data source 130 that provides the plurality of features for the selection of the optimum feature subset, and the classifier 132 that receives the optimum feature subset from the system 102 for identification and classification of the objects into classes, reside outside the system 102, it may be understood by a person skilled in the art that the system 102 may obtain the data sets for objects under the classes from devices such as a skeleton recording device, an EEG acquisition device, and the like, extract the plurality of features from the obtained data sets, identify and select the optimum feature subset, and then classify the objects into classes using a classifier. For this, in the implementation, the system 102 may have modules, such as a data acquisition module, a feature extraction module, the DEFS module 114, and a classification module, coupled to the processor(s) 104.
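As one illustration of the training described in paragraph [0053], the retained feature columns could be handed to an off-the-shelf supervised learner; scikit-learn's SVC is used below purely as an example and is not prescribed by the description, and the zero-based column indices are an assumption of this sketch.

import numpy as np
from sklearn.svm import SVC

def train_classifier(class_data, selected):
    # Fit a support vector machine on the columns of the optimum feature subset only;
    # `selected` holds zero-based column indices of the retained features.
    X = np.vstack([x[:, selected] for x in class_data.values()])
    y = np.concatenate([[label] * len(x) for label, x in class_data.items()])
    return SVC(kernel="rbf").fit(X, y)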
[0055] Figure 2 illustrates a method for selection of an optimum feature subset, in accordance with an implementation of the present subject matter. The method 200 can be implemented in the optimum feature selection system 102. The order in which the method 200 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 200, or any alternative methods. Additionally, individual blocks may be deleted from the method 200 without departing from the spirit and scope of the subject matter described herein. Furthermore, the method 200 can be implemented in any suitable hardware.
[0056] The method 200 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types. Further, although the method 200 may be implemented in any computing device, in the example described in Figure 2 the method 200 is explained in the context of the aforementioned optimum feature selection system 102, for ease of explanation.

[0057] Referring to Figure 2, at block 202, a plurality of features extracted from data sets associated with objects representing multiple classes is obtained. The features are obtained by the system 102 from the data source 130. In an implementation, the data source 130 may obtain data sets for the objects representing multiple classes, and may extract the plurality of features from the obtained data sets.
[0058] Further, in an implementation, the values of the plurality of features are normalized to zero mean and unit covariance. After this, a population set comprising parameter vectors is formulated for a differential evolution process. Each of the parameter vectors has a feature subset and a Lagrange's multiplier λ. The formulation of the population set is as described earlier in the description.
[0059] At block 204, an intra-class variation factor and an inter-class variation factor for multiple feature subsets, from amongst the plurality of features, are computed. The multiple feature subsets are the feature subsets associated with the parameter vectors of the population set. In an implementation, the intra-class variation factor and the inter-class variation factor for the feature subsets associated with the parameter vectors in the population set are computed as described earlier in the description. Using the intra-class variation factor, the inter-class variation factor and the Lagrange's multiplier, the values of the fitness function are obtained for the parameter vectors in the population set.
[0060] Further, at block 206, the optimum feature subset is identified, from amongst the multiple feature subsets, based on minimization of the intra-class variation factor and maximization of the inter-class variation factor using differential evolution. The multiple feature subsets are the feature subsets associated with the parameter vectors of the population set. In an implementation, the minimization of the intra-class variation factor and the maximization of the inter-class variation factor are done through the differential evolution process as described earlier in the description, and the optimum feature subset is identified as the feature subset having the minimum value of the fitness function. The features in the identified optimum feature subset are selected as the optimum features for further processing.
[0061] In an implementation, the method 200, besides identifying the optimum feature subset, may include one of obtaining data sets for the objects representing multiple classes, extracting the plurality of features from the obtained data sets, classifying the objects into the classes based on the optimum feature subset, and a combination thereof.
[0062] Figure 3 illustrates a method 300 for identification and classification of objects into classes using an optimum feature subset, in accordance with an implementation of the present subject matter. The method 300 can be implemented in the optimum feature selection system 102. The order in which the method 300 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 300, or any alternative methods. Additionally, individual blocks may be deleted from the method 300 without departing from the spirit and scope of the subject matter described herein. Furthermore, the method 300 can be implemented in any suitable hardware.
[0063] The method 300 may be described in the general context of computer executable instructions. Generally, computer executable instructions can include routines, programs, objects, components, data structures, procedures, modules, functions, etc., that perform particular functions or implement particular abstract data types.
[0064] Referring to Figure 3, at block 302, data sets for the objects representing multiple classes are obtained from a data acquisition device. The data acquisition device may be a skeleton recording device, an EEG acquisition device, and the like, depending on the application for which the method 300 is applied. Depending on the application, in an implementation, the data set may include skeleton points, of individuals, obtained using the skeleton recording device. In another implementation, the data sets may include EEG signals, of the individuals, obtained using the EEG acquisition device.
[0065] At block 304, a plurality of features is extracted from the data sets obtained at the block 302. In an implementation, depending on the application, the plurality of features may include area-related gait features of the object, dynamic centroid distance-related gait features of the object, angle-related gait features of the object, other static and dynamic gait features of the object and a combination thereof, or the plurality of features may include EEG features.
[0066] In an implementation, the values of the plurality of features are normalized to zero mean and unit covariance.
[0067] At block 306, an optimum feature subset is selected from amongst the plurality of features. The optimum feature subset is identified and selected based on minimization of the intra-class variation factor and maximization of the inter-class variation factor for multiple feature subsets through the differential evolution process described earlier in the description.
[0068] For this, a population set comprising parameter vectors having feature subsets and Lagrange's multipliers λ is formulated for a differential evolution process, as described earlier in the description. In addition, a fitness function is formulated as described earlier in the description. After formulating the population set and the fitness function, an intra-class variation factor and an inter-class variation factor for the feature subsets associated with the parameter vectors in the population set are computed as described earlier in the description. Using the intra-class variation factor, the inter-class variation factor and the Lagrange's multiplier, the values of the fitness function are obtained for the parameter vectors in the population set. The differential evolution process is iteratively performed on the population set to minimize the intra-class variation factor and maximize the inter-class variation factor for each of the feature subsets. The differential evolution process is iteratively carried out until a stopping criterion is reached, as explained earlier in the description. After this, the optimum feature subset is selected as the feature subset having the minimum value of the fitness function. The features in the identified optimum feature subset are selected as the optimum features for further processing.
[0069] At block 308, the objects are classified into classes based on the optimum feature subset. For this purpose, a classifier is used. In an implementation, the classifier may include a supervised learning algorithm, such as a support vector machine, a naive Bayes classifier, a decision tree, linear discriminant analysis, a neural network, and the like.
[0070] Although implementations for system(s) and method(s) for optimum feature subset selection are described, it is to be understood that the present subject matter is not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as implementations to select an optimum feature subset from a plurality of features.

Claims

1. A computer-implemented method for differential evolution-based feature selection of an optimum feature subset from a plurality of features of objects for classification of the objects into multiple classes, the method comprising: obtaining the plurality of features extracted from data sets associated with the objects representing the multiple classes;
computing, by a computing system, an intra-class variation factor and an inter-class variation factor for multiple feature subsets, from amongst the plurality of features; and
identifying, by the computing system, the optimum feature subset, from amongst the multiple feature subsets, based on minimization of the intra-class variation factor and maximization of the inter-class variation factor using differential evolution.
2. The method as claimed in claim 1 further comprising formulating a fitness function based on the intra-class variation factor, the inter-class variation factor, and a Lagrange's multiplier.
3. The method as claimed in claim 2 further comprising formulating, by the computing system, a population set comprising parameter vectors for the differential evolution, wherein each of the parameter vectors has:
a binary encoded decimal pattern corresponding to a feature subset, from amongst the multiple feature subsets, and
a Lagrange's multiplier obtained from a range determined by a ratio of an inter-class variation factor and an intra-class variation factor of each of the features.
4. The method as claimed in claim 3, wherein the binary encoded decimal pattern is initially generated randomly.
5. The method as claimed in claim 3, wherein the binary encoded decimal pattern is initially generated uniformly randomly.
6. The method as claimed in claim 2, wherein the identifying of the optimum feature subset is based on the feature subset for which the corresponding fitness function has a minimum value.
7. The method as claimed in claim 1 further comprising classifying, by the computing system, the objects based on a classifier using the optimum feature subset, wherein the classifier is a learning algorithm comprising a support vector machine, a naive Bayes, a decision tree, linear discriminant analysis and a neural network.
8. The method as claimed in claim 1, wherein, for classification of individuals, the data sets are three-dimensional coordinates of skeleton points of each of the individuals, wherein the three-dimensional coordinates of skeleton points are obtained by a skeleton recording device; the plurality of features are gait features of the each of the individuals; and the each of the individuals is an object classified under a distinct class, from amongst the multiple classes.
9. The method as claimed in claim 1, wherein, for cognition load determination of individuals, the data sets are EEG signals obtained from an EEG acquisition device for each of the individuals; the plurality of features are electroencephalography (EEG) features of the each of the individuals; and cognition load of the each of the individuals is classified under one of the multiple classes.
10. A system (102) for differential evolution-based feature selection of an optimum feature subset from a plurality of features of objects for classification of the objects into multiple classes, the system (102) comprising:
a processor (104);
a differential evolution feature selection (DEFS) module (114) coupled to the processor (104), to obtain the plurality of features extracted from data sets associated with the objects representing the multiple classes;
compute an intra-class variation factor and an inter-class variation factor for multiple feature subsets, from amongst the plurality of features; and
identify the optimum feature subset, from amongst the multiple feature subsets, based on minimization of the intra-class variation factor and maximization of the inter-class variation factor using differential evolution.
11. The system (102) as claimed in claim 10, wherein the DEFS module (114) formulates a fitness function based on the intra-class variation factor, the inter-class variation factor, and a Lagrange's multiplier.
12. The system (102) as claimed in claim 11, wherein the DEFS module (114) formulates a population set comprising parameter vectors for the differential evolution, wherein each of the parameter vectors has:
a binary encoded decimal pattern corresponding to a feature subset, from amongst the multiple feature subsets, and
a Lagrange's multiplier obtained from a range determined by a ratio of an inter-class variation factor and an intra-class variation factor of each of the features.
13. The system (102) as claimed in claim 12, wherein the binary encoded decimal pattern is initially generated randomly.
14. The system (102) as claimed in claim 11, wherein the DEFS module (114) minimizes the fitness function for identifying the optimum feature subset.

15. A non-transitory computer readable medium having a set of computer readable instructions that, when executed, cause a computing system to:
obtain a plurality of features extracted from data sets associated with objects representing multiple classes; compute an intra-class variation factor and an inter-class variation factor for multiple feature subsets, from amongst the plurality of features; and
identify an optimum feature subset, from amongst the multiple feature subsets, based on minimization of the intra-class variation factor and maximization of the inter-class variation factor using differential evolution.
PCT/IB2014/000939 2013-06-03 2014-06-03 Differential evolution-based feature selection WO2014195782A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN1938/MUM/2013 2013-06-03
IN1938MU2013 IN2013MU01938A (en) 2013-06-03 2014-06-03

Publications (2)

Publication Number Publication Date
WO2014195782A2 true WO2014195782A2 (en) 2014-12-11
WO2014195782A3 WO2014195782A3 (en) 2015-02-05

Family

ID=52008655

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2014/000939 WO2014195782A2 (en) 2013-06-03 2014-06-03 Differential evolution-based feature selection

Country Status (2)

Country Link
IN (1) IN2013MU01938A (en)
WO (1) WO2014195782A2 (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105184803A (en) * 2015-09-30 2015-12-23 西安电子科技大学 Attitude measurement method and device


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040071363A1 (en) * 1998-03-13 2004-04-15 Kouri Donald J. Methods for performing DAF data filtering and padding
US20100158334A1 (en) * 2004-07-01 2010-06-24 Johanne Martel-Pelletier Non-invasive joint evaluation
US20100094155A1 (en) * 2005-10-31 2010-04-15 New York University System and Method for Prediction of Cognitive Decline
US20080101705A1 (en) * 2006-10-31 2008-05-01 Motorola, Inc. System for pattern recognition with q-metrics
US20100111396A1 (en) * 2008-11-06 2010-05-06 Los Alamos National Security Object and spatial level quantitative image analysis

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105137717A (en) * 2015-08-05 2015-12-09 哈尔滨工业大学 Compact Differential Evolution algorithm-based soft-measurement method for mechanical parameters of mask table micropositioner of lithography machine
US10558933B2 (en) 2016-03-30 2020-02-11 International Business Machines Corporation Merging feature subsets using graphical representation
US10565521B2 (en) 2016-03-30 2020-02-18 International Business Machines Corporation Merging feature subsets using graphical representation
US11574011B2 (en) 2016-03-30 2023-02-07 International Business Machines Corporation Merging feature subsets using graphical representation
CN108573338A (en) * 2018-03-14 2018-09-25 中山大学 A kind of distributed differential evolution algorithm and device based on MPI
CN109636487A (en) * 2019-01-14 2019-04-16 平安科技(深圳)有限公司 Advertisement sending method, server, computer equipment and storage medium
CN109885710A (en) * 2019-01-14 2019-06-14 平安科技(深圳)有限公司 User's portrait depicting method and server based on Differential Evolution Algorithm
CN109885710B (en) * 2019-01-14 2022-03-18 平安科技(深圳)有限公司 User image depicting method based on differential evolution algorithm and server
CN109636487B (en) * 2019-01-14 2023-09-29 平安科技(深圳)有限公司 Advertisement pushing method, server, computer device and storage medium
CN111553530A (en) * 2020-04-27 2020-08-18 华侨大学 Inter-city network car booking and packing travel capacity prediction and travel recommendation method and system
CN111553530B (en) * 2020-04-27 2022-08-02 华侨大学 Inter-city network car booking and packing travel capacity prediction and travel recommendation method and system

Also Published As

Publication number Publication date
IN2013MU01938A (en) 2015-05-29
WO2014195782A3 (en) 2015-02-05

Similar Documents

Publication Publication Date Title
Zhang et al. Deep fuzzy k-means with adaptive loss and entropy regularization
WO2014195782A2 (en) Differential evolution-based feature selection
Celebi et al. A comparative study of efficient initialization methods for the k-means clustering algorithm
Zeng et al. Deep convolutional neural networks for multi-instance multi-task learning
WO2017003666A1 (en) Method and apparatus for large scale machine learning
Guo et al. A centroid-based gene selection method for microarray data classification
Lekamalage et al. Extreme learning machine for clustering
Zhang et al. Cgmos: Certainty guided minority oversampling
Demidova et al. Improving the Classification Quality of the SVM Classifier for the Imbalanced Datasets on the Base of Ideas the SMOTE Algorithm
Cord et al. Feature selection in robust clustering based on Laplace mixture
Bahrololoum et al. A data clustering approach based on universal gravity rule
You et al. Totalpls: local dimension reduction for multicategory microarray data
Banijamali et al. Fast spectral clustering using autoencoders and landmarks
Hassan et al. Oversampling method based on Gaussian distribution and K-Means clustering
Tanha A multiclass boosting algorithm to labeled and unlabeled data
Krishnan et al. A modified Kohonen map algorithm for clustering time series data
Sungheetha et al. Extreme learning machine and fuzzy K-nearest neighbour based hybrid gene selection technique for cancer classification
Saez et al. KSUFS: A novel unsupervised feature selection method based on statistical tests for standard and big data problems
Bhardwaj et al. Dynamic feature scaling for k-nearest neighbor algorithm
Salman et al. Gene expression analysis via spatial clustering and evaluation indexing
Ahishakiye et al. Comparative performance of machine leaning algorithms in prediction of cervical cancer
Paulk et al. A supervised learning approach for fast object recognition from RGB-D data
Barchiesi et al. Learning incoherent subspaces: classification via incoherent dictionary learning
Payne et al. Fly wing biometrics
Chaudhari et al. Performance evaluation of SVM based semi-supervised classification algorithm

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14807253

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 14807253

Country of ref document: EP

Kind code of ref document: A2