WO2003072701A1 - A system for analyzing dna-chips using gene ontology and a method thereof - Google Patents

A system for analyzing dna-chips using gene ontology and a method thereof

Info

Publication number
WO2003072701A1
Authority
WO
WIPO (PCT)
Prior art keywords
optimal branch
gene
distance
pseudo
cluster
Prior art date
Application number
PCT/KR2003/000400
Other languages
French (fr)
Inventor
Yang-Suk Kim
Jung-Uk Hur
Sung-Geun Lee
Original Assignee
Istech Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Istech Co., Ltd.
Priority to AU2003212669A
Publication of WO2003072701A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6813 Hybridisation assays
    • C12Q1/6834 Enzymatic or biochemical coupling of nucleic acids to a solid phase
    • C12Q1/6837 Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips

Definitions

  • the present invention relates to a system for DNA microarray analysis using Gene Ontology™ and a method thereof, and more specifically to a system and a method for biologically analyzing a gene expression pattern of a DNA chip or microarray assays by modeling of a hierarchical structure of gene ontology (hereinafter referred to as "GO").
  • GO hierarchical structure of gene ontology
  • Biochips are broadly divided into microarray chips and microfluidics chips.
  • a microarray chip contains thousands of or tens of thousands of DNA or protein samples arranged at regular intervals, and thus can process analyte to identify its binding pattern.
  • Microarray chips generally refer to DNA chips and protein chips. DNA chips have been the most dominant biochips up to date. Microfluidics chips pass over a small amount of analyte in controlled flow and analyze the reaction pattern of the analyte with the molecule on a chip or with a sensor.
  • DNA chips are made by spotting a target DNA, cDNA or oligonucleotide on a glass slide, nitrocellulose membrane or silicon.
  • DNA chips consist of a small-sized solid on which cDNA or oligonucleotide probes with known base sequences are micro-arrayed at predetermined positions.
  • DNA chips if hybridized with a probe labeled with a radioactive isotope or fluorescent dye, can be used in identification of gene mutations and levels of gene expression, single nucleotide polymorphism (SNP), diagnosis of diseases, high-throughput screening (HTS) and so on.
  • SNP single nucleotide polymorphism
  • HTS high-throughput screening
  • when a sample DNA fragment to be analyzed is applied to a DNA chip, the probe affixed to the DNA chip and the base sequence of the sample DNA fragment hybridize depending on their level of complementarity. It is possible to analyze the base sequence of the sample DNA by detecting and interpreting the hybridization by an optical or radioactive chemical method. If DNA chips are utilized, expression information of genes can be easily and rapidly obtained. DNA chips are now used for development of new drugs and medical diagnosis.
  • Swiss-Prot provides protein information and classifies the functions of proteins by keywords.
  • there is no correlation or hierarchy between the keywords used for the classification, which makes it difficult to perform automated biological analyses of DNA chips.
  • the group information of particular fields, such as the CGAP (Cancer Genome Anatomy Project), provides highly focused information for those fields only.
  • CGAP Cancer Genome Anatomy Project
  • Ontology means a classification system for biological terms and vocabularies.
  • the goal of the Gene Ontology Consortium is to construct a controlled and unified system of biological terms.
  • the Consortium provides about 10,000 dynamic controlled terms (the number of terms varies as necessary) that can be applied to describe the roles of genes and gene products in all organisms.
  • Gene Ontology™ (GO) shows the relationships between genes and the keywords assigned to each gene, and it is applicable to bioinformatics.
  • the GO terms, amounting to about 10,000, have a tree-like hierarchical structure called a DAG (Directed Acyclic Graph) and are divided into three categories.
  • the GO terms can be used to find a biological meaning when analyzing a DNA chip.
  • GO terms are classified into three categories reflecting the biological roles of genes: i) molecular function, ii) biological process and iii) cellular component.
  • Hierarchically controlled vocabularies are established for each category.
  • the three categories are not exclusive but all descriptive of a gene.
  • the present invention has been made in view of the above-mentioned problems, and it is an object of the present invention to provide a system and a method for DNA chip analysis utilizing Gene Ontology™ to enable a systematic biological analysis of a gene expression pattern of a DNA chip test by modeling of a GO hierarchy.
  • Another object of the present invention is to provide a method for extracting representative functions which are most common and ideal among the genes contained in a cluster formed by statistical clustering of DNA chip test results, utilizing GO terms and hierarchical tree structure.
  • FIG. 1 is a view showing the construction of a system for DNA chip analysis using Gene Ontology™ according to the present invention.
  • FIG. 2 shows one example of the GO tree structure according to the present invention.
  • FIG. 3 shows one example of a modification of the GO tree structure in text format according to the present invention.
  • FIG. 4 shows one example of a conversion of extracted GO codes according to the present invention.
  • FIG. 5 is a view briefly showing the principle of finding an optimal branch using GO according to the present invention.
  • FIG. 6 is a view showing the principle of measuring a pseudo-distance according to the present invention.
  • FIG. 7 is an operation flow of the analysis of a DNA chip using GO according to the present invention.
  • a system for DNA chip analysis using Gene Ontology™, comprising: a) means for receiving statistical clustering results of DNA chip data and for assigning appropriate GO identifiers to each gene pertaining to a given cluster; b) means for converting each GO identifier assigned to said gene into a GO code using a GO code file; c) means for selecting a proper process among three predetermined processes that adopt pseudo-distance to designate necessary parameters and for extracting an optimal branch; d) means for extracting biological meanings from each extracted optimal branch; and e) optionally, visualizing means for displaying the optimal branch, GO code and biological meanings of a given cluster of genes.
  • the visualizing means displays summarized information on the GO code, optimal branch and biological meanings of a given cluster of genes in a form of table, or in a form of a graphical tree structure.
  • the optimal branch gives a proper weight to each level of the GO tree structure.
  • the pseudo-distance "Pd(v1,v2)," wherein v1 and v2 represent nodes, is a weight of the level corresponding to the GO code of the optimal branch formed by nodes v1 and v2.
  • when v1 and v2 are the same, the Pd value is zero.
  • $\mathrm{max\_pd}(G) = \max\{\, \mathrm{pd}(v_i, v_j) \,\}, \quad 1 \le i < j \le n$
  • $\mathrm{aver\_pd}(G) = \dfrac{\sum_{1 \le i < j \le n} \mathrm{pd}(v_i, v_j)}{{}_{n}C_{2}} = \dfrac{2 \sum_{1 \le i < j \le n} \mathrm{pd}(v_i, v_j)}{n(n-1)}$
  • the average pseudo-distance (aver_pd) shows how well genes are clustered with the same functional categories, and how frequently similar codes are observed.
  • the predetermined process of the means for extracting an optimal branch comprises i) basic process, ii) N-level selective process and iii) percentage selective process.
  • the means designates a proper process among them and necessary parameters to extract an optimal branch.
  • the basic process utilizes the maximum pseudo-distance (max_pd) and average pseudo-distance (aver_pd) of all nodes in the GO tree structure. The results obtained by the basic process roughly show the biological meanings of a given cluster.
  • the N-level selective process predesignates and computes each level of the optimal branch, observes formation of the optimal branch at a particular level N and analogizes the biological meaning at a lower level.
  • the percentage selective process predesignates the percentage of genes pertaining to the optimal branch and shows all combinations of genes in percentages desired by a user.
  • the N-level selective process shows both the first candidate of GO code combination and the next candidate of combinations to reflect the diversity that a single gene can be involved in two or more functions.
  • the predetermined process comprises i) basic process, ii) N-level selective process and iii) percentage selective process. A proper one of these processes and the necessary parameters are designated to extract an optimal branch.
  • FIG. 1 is a view showing the construction of a system for DNA chip analysis using GO.
  • the system for DNA chip analysis comprises: an input section (110) for inputting statistical clustering results of said DNA chip data; a GO identifier assigning section (130) for assigning GO identifiers to each gene pertaining to each cluster of the inputted clustering results using a GO identifier index file (120); a GO identifier/GO code converting section (140) for converting each GO identifier assigned to the corresponding genes into a GO code utilizing a GO code file; an optimal branch extracting section (220) for selecting a predetermined process according to a pseudo-distance algorithm (210) to designate a necessary parameter for said GO code and extracting an optimal branch; and a biological meanings extracting section (230) for extracting biological meanings from each optimal branch.
  • the system may further comprise a visualization module(310) for displaying the extracted optimal branch, the GO code and the biological meanings over a cluster.
  • the present invention assigns GO terms to each gene, extracts an optimal branch by mathematically utilizing the GO hierarchical tree structure, and efficiently displays the results of the optimal branch extraction.
  • FIG. 2 shows an example of the GO tree structure according to the present invention.
  • the highest level refers to GO level.
  • the second level refers to three categories, i.e., molecular function, biological process and cellular component.
  • the lower levels (3rd, 4th and 5th) form a tree-like inheritance structure.
  • FIG. 3 shows an example of a modification of the GO tree in a text format according to the present invention.
  • GO in its original form is not a tree but a mathematical graph called a DAG, i.e., a directed graph without cycles.
  • the GO structure can be simply changed to a GO tree structure in the present invention.
  • FIG. 4 shows an example of conversion from GO terms to GO codes according to the present invention. This drawing illustrates the outputs of GO codes converted by the GO code converting section(140).
  • An optimal branch refers to the lowest node among the nodes that include the greatest number of genes at the bottom of the tree structure.
  • the optimal branch is a broad term representing all the functions of genes included in the nodes at the bottom.
  • the system of the present invention assigns genes pertaining to a given cluster in the GO tree structure, finds the optimal branch through the pseudo-distance algorithm and displays the results.
  • GO terms are assigned to corresponding genes by text mining of various biological databases. For the allocation of GO terms, such information, at the DNA level or the protein level as provided by UniGene, LocusLink, Swiss-Prot and MGI, is utilized together with direct comparison of identifiers and sequence similarity searches. Also, gene identifier conversion files provided by each database that participated in the GO Consortium are utilized to assign GO terms.
  • UniGene of the NCBI (National Center for Biotechnology Information)
  • LocusLink which is the result of the reference sequence project of the NCBI, provides information about the functions of genes and representative sequences.
  • Swiss-Prot of the Swiss Institute of Bioinformatics provides gene information at the protein level.
  • MGI Mouse Genome Informatics
  • each GO code is a sequence of numbers, here 15 numbers. Note that the sequence length 15 is variable according to the versions of GO syntax files. Each number in a GO code sequence represents the positional information at each step. Since a unique GO code is assigned to each node in GO tree, GO terms are distinguishable even if the same terms are used in different nodes in the GO tree structure.
  • FIG. 5 is a view for briefly explaining the principle of finding an optimal branch using GO tree according to the present invention.
  • the optimal branch can be found using GO codes.
  • a weight and a pseudo-distance between GO codes are defined.
  • FIG. 6 is a view for explaining the principle of measuring a pseudo-distance between nodes according to the present invention.
  • the pseudo-distance is defined as follows.
  • Pd(v1,v2) is a weight of the level at which there exists the optimal branch formed by nodes v1 and v2. When v1 and v2 are the same, the Pd value is zero. Ultimately, a combination of GO codes is selected through the following pseudo-distance concept.
  • pd(v1,v2) = weight of the level where the optimal branch between v1 and v2 is located (when v1 ≠ v2)
  • maximum pseudo-distance (max_pd) is used to roughly evaluate clusters. If the optimal branch of a cluster is located at a higher level, the cluster is likely to include bad genes which do not share the common characteristics with the other genes in that cluster.
  • the average pseudo-distance (aver_pd) shows how well genes are clustered in a given cluster with similar functional categories, and how frequently similar GO codes are observed.
  • the pseudo-distance is applicable to three processes: a basic process, an N-level selective process and a percentage selective process.
  • the basic process has two modules using the maximum pseudo-distance (max_pd) and average pseudo-distance (aver_pd) of all nodes in the GO tree structure.
  • the results of basic process show the overall biological meanings of a given cluster.
  • a user can designate particular limits.
  • the N-level selective process predesignates the level of an optimal branch so that the formation of the optimal branch at a particular level N can be easily computed.
  • the N-level selective process enables the user to easily analogize the biological meanings at a lower level, which is not possible in the basic process.
  • the N-level selective process shows both the first candidate of GO code combination and the next candidates of combinations to reflect the diversity that a single gene can be involved in two or more functions.
  • the percentage selective process predesignates the percentage of genes pertaining to the optimal branch and finds all combinations of genes in percentages desired by a user. Like the N-level selective process, the percentage selective process can fully show the functional diversity of genes.
  • FIG. 7 is the flow diagram of DNA chip analysis using GO according to the present invention.
  • the method of analysis comprises the steps of: receiving statistical clustering results of DNA chip data (S10) and assigning GO identifiers to each gene pertaining to given cluster (S20); converting each GO identifier assigned to the corresponding genes into a GO code using GO code file (S30); selecting a process among basic process (S41), N-level selective process (S42) and percentage selective process (S43) according to pseudo-distance algorithm (S40) to designate a necessary parameter for said GO code and extracting an optimal branch (S50); extracting a biological meaning of each extracted optimal branch (S60); and displaying an optimal branch of a cluster and its GO code (S70).
  • the system for biological analysis of the gene expression pattern of DNA chip using GO structure according to the present invention is comprised of three broad sections 100, 200 and 300. The operation of each section will be described in detail by reference to FIG. 7.
  • GO identifiers and their GO codes are assigned to each gene in a given cluster that is obtained from a statistical clustering method. More specifically, when clustering results are inputted (S10), GO identifiers are assigned to each gene within a cluster (S20) based on the index file that has previously assigned GO identifiers to genes through data mining of various databases. Subsequently, each GO identifier assigned to genes in a given cluster is converted into a GO code (S30) using the GO code file, in which all nodes of the GO tree structure are coded.
  • a proper process among basic process (S41), N-level selective process (S42) and percentage selective process (S43) is chosen using the pseudo-distance algorithm, and necessary parameters are designated.
  • An optimal branch is then computed (S50) based on the pseudo distance in each process. Also, biological meanings of the optimal branch are extracted.
  • the optimal branch extracted for genes in each cluster and the GO code assigned to the genes are displayed. Summarized information on the GO code for each gene, the optimal branch and the biological meanings can be displayed in the form of a table or a graphical tree.
  • the pseudo-distance algorithm is also applicable to a different kind of biochip, the protein chip.
  • the pseudo-distance algorithm can be utilized to analyze a protein chip in the same way as utilized to analyze a DNA chip in FIGs. 1 and 7.
  • the present invention enables a systematic and automated biological analysis of gene expression patterns of DNA chip assays by a mathematical modeling of GO hierarchy. Also, the present invention can extract the biological functions that are commonest and most optimal among genes within a cluster formed by a statistical clustering method of DNA chip data, utilizing GO terms and tree structure.

Abstract

The present invention relates to a system and a method for biological analysis of gene expression patterns of DNA chips or DNA microarrays by a mathematical modeling of the Gene Ontology™ hierarchical structure. Disclosed is a system for analyzing DNA chip data using Gene Ontology™, comprising: means for receiving statistical clustering results of DNA chip data and for assigning appropriate GO identifiers to each gene pertaining to a given cluster; means for converting each GO identifier assigned to said gene into a GO code using a GO code file; means for selecting a proper process among three predetermined processes that adopt pseudo-distance to designate necessary parameters and for extracting an optimal branch; means for extracting biological meanings from each extracted optimal branch; and optionally, visualizing means for displaying the optimal branch, GO code and biological meanings of a given cluster of genes. The present invention enables a systematic and automated biological analysis of gene expression patterns of DNA chips by modeling the GO hierarchy.

Description

A SYSTEM FOR ANALYZING DNA-CHIPS USING GENE ONTOLOGY AND A METHOD THEREOF
Technical Field
The present invention relates to a system for DNA microarray analysis using Gene Ontology™ and a method thereof, and more specifically to a system and a method for biologically analyzing a gene expression pattern of a DNA chip or microarray assays by modeling of a hierarchical structure of gene ontology (hereinafter referred to as "GO").
Background Art
Since the discovery of the double helix structure of DNA by Watson and Crick in 1953, the discovery of restriction enzymes and the development of hybridization techniques and the polymerase chain reaction (PCR) have greatly contributed to the understanding of life phenomena at the molecular level. Also, to meet the rising need for a comprehensive and non-fragmentary understanding of life phenomena that show complicated regulation mechanisms, studies have been made, through the human genome project (HGP) and the like, to identify the functions of base sequences, resulting in the development of DNA chips. In order to efficiently utilize the results of the HGP and DNA chips, studies on bioinformatics and functional genomics are highly active.
Biochips are broadly divided into microarray chips and microfluidics chips. A microarray chip contains thousands of or tens of thousands of DNA or protein samples arranged at regular intervals, and thus can process analyte to identify its binding pattern. Microarray chips generally refer to DNA chips and protein chips. DNA chips have been the most dominant biochips up to date. Microfluidics chips pass over a small amount of analyte in controlled flow and analyze the reaction pattern of the analyte with the molecule on a chip or with a sensor.
DNA chips are made by spotting a target DNA, cDNA or oligonucleotide on a glass slide, nitrocellulose membrane or silicon. In other words, DNA chips consist of a small-sized solid on which cDNA or oligonucleotide probes with known base sequences are micro-arrayed at predetermined positions.
DNA chips, if hybridized with a probe labeled with a radioactive isotope or fluorescent dye, can be used in identification of gene mutations and levels of gene expression, single nucleotide polymorphism (SNP), diagnosis of diseases, high-throughput screening (HTS) and so on. When a sample DNA fragment to be analyzed is combined to a DNA chip, the probe affixed to the DNA chip and the base sequence of the sample DNA fragment are hybridized depending on the level of complementarity. It is possible to analyze the base sequence of the sample DNA by detecting and understanding the hybridization by an optical or radioactive chemical method. If DNA chips are utilized, expression information of genes can be easily and rapidly obtained. DNA chips are now used for development of new drugs and medical diagnosis.
Both statistical methods and biological methods are used to analyze DNA microarray data. Based on the gene expression levels detected by an image analysis, genes showing the common expression pattern are clustered by a statistical method. It is possible to give a general biological meaning to the cluster and validate the biological reliability of the cluster based on the known function of each gene.
Previously, biological validation has been made by obtaining information about the functions of genes from biomedical literature or existing biological databases and then comparing the functions of genes with the DNA microarray data. Available databases are the NCBI (National Center for Biotechnology Information) providing basic DNA information, the MIPS (Munich Information Center for Protein Sequences) and CGAP (Cancer Genome Anatomy Project) providing functional category information, and the Swiss-Prot Protein Knowledgebase providing annotated protein sequence information. However, most biological identification works have been done manually by researchers or scientists. Due to the diversity of biological terms, it has been difficult to perform systematic and automated biological analyses.
As a well-known biological database, for example, Swiss-Prot provides protein information and classifies the functions of proteins by keywords. However, there is no correlation or hierarchy between the keywords used for the classification, which makes it difficult to perform automated biological analyses of DNA chips. Further, the group information of particular fields, such as the CGAP (Cancer Genome Anatomy Project), provides highly focused information for those fields only. And since the group information covers an overly broad function, it is difficult to cover a detailed function.
As an effort to overcome such difficulties, GO terms offered by the Gene Ontology Consortium can be used. Ontology here means a classification system for biological terms and vocabularies. The goal of the Gene Ontology Consortium is to construct a controlled and unified system of biological terms. The Consortium provides about 10,000 dynamic controlled terms (the number of terms varies as necessary) that can be applied to describe the roles of genes and gene products in all organisms. Gene Ontology™ (GO) shows the relationships between genes and the keywords assigned to each gene, and it is applicable to bioinformatics.
The GO terms, amounting to about 10,000, have a tree-like hierarchical structure called a DAG (Directed Acyclic Graph) and are divided into three categories. The GO terms can be used to find a biological meaning when analyzing a DNA chip. Generally, GO terms are classified into three categories reflecting the biological roles of genes: i) molecular function, ii) biological process and iii) cellular component. Hierarchically controlled vocabularies are established for each category. The three categories are not exclusive but all descriptive of a gene.
Disclosure of the Invention
Therefore, the present invention has been made in view of the above-mentioned problems, and it is an object of the present invention to provide a system and a method for DNA chip analysis utilizing Gene Ontology™ to enable a systematic biological analysis of a gene expression pattern of a DNA chip test by modeling of a GO hierarchy.
Another object of the present invention is to provide a method for extracting representative functions which are most common and ideal among the genes contained in a cluster formed by statistical clustering of DNA chip test results, utilizing GO terms and hierarchical tree structure.
Brief Description of Drawings
The foregoing and other objects, features and advantages of the present invention will become more apparent from the following detailed description when taken in conjunction with the accompanying drawings in which:
FIG. 1 is a view showing the construction of a system for DNA chip analysis using Gene Ontology™ according to the present invention.
FIG. 2 shows one example of the GO tree structure according to the present invention.
FIG. 3 shows one example of a modification of the GO tree structure in text format according to the present invention.
FIG. 4 shows one example of a conversion of extracted GO codes according to the present invention.
FIG. 5 is a view briefly showing the principle of finding an optimal branch using GO according to the present invention.
FIG. 6 is a view showing the principle of measuring a pseudo-distance according to the present invention.
FIG. 7 is an operation flow of the analysis of a DNA chip using GO according to the present invention.
Best Mode for Carrying Out the Invention
In order to accomplish the objects of the present invention, there is provided a system for DNA chip analysis using Gene Ontology™, comprising: a) means for receiving statistical clustering results of DNA chip data and for assigning appropriate GO identifiers to each gene pertaining to a given cluster; b) means for converting each GO identifier assigned to said gene into a GO code using a GO code file; c) means for selecting a proper process among three predetermined processes that adopt pseudo-distance to designate necessary parameters and for extracting an optimal branch; d) means for extracting biological meanings from each extracted optimal branch; and e) optionally, visualizing means for displaying the optimal branch, GO code and biological meanings of a given cluster of genes. The visualizing means displays summarized information on the GO code, optimal branch and biological meanings of a given cluster of genes in the form of a table or in the form of a graphical tree structure.
The optimal branch gives a proper weight to each level of the GO tree structure. The pseudo-distance "Pd(v1,v2)," wherein v1 and v2 represent nodes, is a weight of the level corresponding to the GO code of the optimal branch formed by nodes v1 and v2. When v1 and v2 are the same, the Pd value is zero.
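The definition above can be sketched in Python. This is only a minimal illustration under assumptions that the text does not state: a GO code is modelled as a tuple of integers padded with zeros, the branch shared by two nodes is taken to be their longest common prefix, and the values in `LEVEL_WEIGHTS` are hypothetical weights chosen so that higher levels (closer to the GO root) weigh more.

```python
from typing import Sequence, Tuple

GoCode = Tuple[int, ...]

# Hypothetical weights indexed by level (0 = GO root). Higher levels get
# larger weights, so a small pseudo-distance means a specific shared branch.
LEVEL_WEIGHTS = [5, 4, 3, 2, 1, 0]


def common_branch_level(v1: GoCode, v2: GoCode) -> int:
    """Level of the deepest node shared by both codes (0 = root)."""
    level = 0
    for a, b in zip(v1, v2):
        # A zero is assumed to be padding, not a real position in the tree.
        if a != b or a == 0:
            break
        level += 1
    return level


def pseudo_distance(v1: GoCode, v2: GoCode,
                    weights: Sequence[int] = LEVEL_WEIGHTS) -> int:
    """Pd(v1, v2): weight of the level holding the branch shared by v1 and v2."""
    if v1 == v2:
        return 0  # identical nodes have pseudo-distance zero
    level = min(common_branch_level(v1, v2), len(weights) - 1)
    return weights[level]
```

With these assumed weights, two sibling codes such as `(1, 2, 3, 0)` and `(1, 2, 4, 0)` share a branch at level 2, while codes from different categories only share the root and therefore receive the largest weight.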
In a cluster or a group "G" of GO codes, each optimal branch is computed using the maximum pseudo-distance (max_pd) and the average pseudo-distance (aver_pd). Given a multiset G={v1, v2, v3, v4, ..., vn} of GO codes, max_pd and aver_pd are defined as follows:
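$\mathrm{max\_pd}(G) = \max\{\, \mathrm{pd}(v_i, v_j) \,\}, \quad 1 \le i < j \le n$
$\mathrm{aver\_pd}(G) = \dfrac{\sum_{1 \le i < j \le n} \mathrm{pd}(v_i, v_j)}{{}_{n}C_{2}} = \dfrac{2 \sum_{1 \le i < j \le n} \mathrm{pd}(v_i, v_j)}{n(n-1)}$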
Among possible combinations of codes, the combination with the lowest values of max_pd and aver_pd is finally considered optimal. The maximum pseudo-distance (max_pd) is used to roughly evaluate clusters. If the optimal branch of a cluster is located at a higher level, the cluster is likely to include bad genes, which do not share common biological characteristics with the other genes in that cluster.
The average pseudo-distance (aver_pd) shows how well genes are clustered with the same functional categories, and how frequently similar codes are observed.
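As a small worked illustration of the two statistics, the following Python sketch evaluates max_pd and aver_pd over all unordered pairs of a multiset. The function name `max_and_aver_pd`, the lookup table `PD_TABLE` and its pairwise values are hypothetical and serve only to make the arithmetic concrete; any pseudo-distance function with Pd(v, v) = 0 could be passed in instead.

```python
from itertools import combinations
from typing import Callable, Hashable, Sequence, Tuple


def max_and_aver_pd(codes: Sequence[Hashable],
                    pd: Callable[[Hashable, Hashable], float]) -> Tuple[float, float]:
    """Return (max_pd, aver_pd) over all nC2 unordered pairs of the multiset."""
    pair_values = [pd(a, b) for a, b in combinations(codes, 2)]
    if not pair_values:  # fewer than two codes: no pairs to compare
        return 0.0, 0.0
    # len(pair_values) equals nC2 = n(n-1)/2, the divisor in aver_pd.
    return max(pair_values), sum(pair_values) / len(pair_values)


# Hypothetical pairwise pseudo-distances for three codes v1, v2, v3.
PD_TABLE = {
    frozenset({"v1", "v2"}): 1,
    frozenset({"v1", "v3"}): 3,
    frozenset({"v2", "v3"}): 3,
}


def table_pd(a: str, b: str) -> float:
    """Look up a pseudo-distance; identical codes have distance zero."""
    return 0 if a == b else PD_TABLE[frozenset({a, b})]


max_pd, aver_pd = max_and_aver_pd(["v1", "v2", "v3"], table_pd)
```

For these three hypothetical codes there are 3 pairs, so max_pd is 3 and aver_pd is (1 + 3 + 3)/3 = 7/3.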
The predetermined process of the means for extracting an optimal branch comprises i) basic process, ii) N-level selective process and iii) percentage selective process. The means designates a proper process among them and necessary parameters to extract an optimal branch. The basic process utilizes the maximum pseudo-distance (max_pd) and average pseudo-distance (aver_pd) of all nodes in the GO tree structure. The results obtained by the basic process roughly show the biological meanings of a given cluster. The N-level selective process predesignates and computes each level of the optimal branch, observes formation of the optimal branch at a particular level N and analogizes the biological meaning at a lower level. The percentage selective process predesignates the percentage of genes pertaining to the optimal branch and shows all combinations of genes in percentages desired by a user. The N-level selective process shows both the first candidate of GO code combination and the next candidate of combinations to reflect the diversity that a single gene can be involved in two or more functions.
In order to accomplish the objects of the present invention, there is also provided a method for DNA chip analysis using Gene Ontology™, comprising: a) receiving statistical clustering results of DNA chip data and assigning appropriate GO identifiers to each gene pertaining to a given cluster; b) converting each GO identifier assigned to said gene into a GO code using a GO code file; c) selecting a proper process among three predetermined processes that adopt pseudo-distance to designate necessary parameters and extracting an optimal branch; d) extracting biological meanings from each extracted optimal branch; and e) optionally, displaying an optimal branch, GO code and biological meanings of a given cluster of genes.
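The N-level selective and percentage selective processes named above can be illustrated with a short Python sketch. It is an interpretation under stated assumptions rather than the disclosed procedure: GO codes are modelled as unpadded tuples of integers, a gene may carry several codes, a node "contains" a gene when the node is a prefix of one of the gene's codes, and the function names and example genes are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

GoCode = Tuple[int, ...]


def genes_per_node(gene_codes: Dict[str, Set[GoCode]]) -> Counter:
    """Count, for every tree node, the genes annotated at or below it."""
    counts: Counter = Counter()
    for codes in gene_codes.values():
        nodes = {code[:depth] for code in codes
                 for depth in range(1, len(code) + 1)}
        counts.update(nodes)  # a set, so each gene counts once per node
    return counts


def n_level_candidates(gene_codes: Dict[str, Set[GoCode]],
                       n: int) -> List[Tuple[GoCode, int]]:
    """Rank the level-n nodes by gene count: first candidate, then the next ones."""
    counts = genes_per_node(gene_codes)
    level_n = [(node, c) for node, c in counts.items() if len(node) == n]
    return sorted(level_n, key=lambda item: (-item[1], item[0]))


def percentage_branches(gene_codes: Dict[str, Set[GoCode]],
                        percent: float) -> List[GoCode]:
    """Deepest nodes that still contain at least `percent` % of the genes."""
    needed = len(gene_codes) * percent / 100.0
    counts = genes_per_node(gene_codes)
    covering = {node for node, c in counts.items() if c >= needed}
    # A node is dropped when one of its children also qualifies.
    return sorted(
        node for node in covering
        if not any(len(other) == len(node) + 1 and other[:len(node)] == node
                   for other in covering)
    )


# Hypothetical cluster: geneB takes part in two functions.
genes = {
    "geneA": {(1, 2, 3)},
    "geneB": {(1, 2, 3), (2, 1)},
    "geneC": {(1, 2, 4)},
    "geneD": {(2, 1, 5)},
}
```

In this example, `n_level_candidates(genes, 2)` ranks node `(1, 2)` with three genes ahead of node `(2, 1)` with two, and `percentage_branches(genes, 50)` returns the two deepest nodes that each hold at least half of the genes, `(1, 2, 3)` and `(2, 1)`.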
At the step of extracting an optimal branch, the predetermined process comprises i) basic process, ii) N-level selective process and iii) percentage selective process. Proper one of these processes and necessary parameters are designated to extract an optimal branch. Hereinafter, a preferred embodiment of the present invention will be described with reference to the accompanying drawings.
FIG. 1 is a view showing the construction of a system for DNA chip analysis using GO. The system for DNA chip analysis comprises: an input section (110) for inputting statistical clustering results of said DNA chip data; a GO identifier assigning section (130) for assigning GO identifiers to each gene pertaining to each cluster of the inputted clustering results using a GO identifier index file (120); a GO identifier/GO code converting section (140) for converting each GO identifier assigned to the corresponding genes into a GO code utilizing a GO code file; an optimal branch extracting section (220) for selecting a predetermined process according to a pseudo-distance algorithm (210) to designate a necessary parameter for said GO code and extracting an optimal branch; and a biological meanings extracting section (230) for extracting biological meanings from each optimal branch. The system may further comprise a visualization module (310) for displaying the extracted optimal branch, the GO code and the biological meanings over a cluster.
The present invention aims to extract the GO terms that are the commonest or most functionally-related among genes pertaining to a cluster formed by statistical clustering of DNA chip data, utilizing GO terms and GO tree structure.
To this end, the present invention assigns GO terms to each gene, extracts an optimal branch by mathematically utilizing the GO hierarchical tree structure, and efficiently displays the results of the optimal branch extraction.
FIG. 2 shows an example of the GO tree structure according to the present invention. The highest level is the GO level. The second level consists of three categories, i.e., molecular function, biological process and cellular component. The lower levels (3rd, 4th and 5th) form a tree-like inheritance structure. FIG. 3 shows an example of a modification of the GO tree in a text format according to the present invention. Strictly speaking, GO in its original form is not a tree but a directed acyclic graph (DAG), i.e., a directed graph without cycles. In the present invention, the GO structure can be simply changed to a GO tree structure. Further, FIG. 4 shows an example of conversion from GO terms to GO codes according to the present invention. This drawing illustrates the GO codes output by the GO identifier/GO code converting section (140).
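The text does not state how the DAG is changed into a tree. One common way, assumed here purely for illustration, is to unfold the graph so that every distinct root-to-node path becomes its own tree node; a term with two parents then appears once under each parent. The function name and the toy graph below are hypothetical.

```python
def dag_to_tree_paths(children, root):
    """Enumerate every root-to-node path in a DAG.

    Each distinct path becomes one node of the unfolded tree, so a term
    with two parents appears twice, once under each parent.
    """
    paths = []
    stack = [(root,)]
    while stack:
        path = stack.pop()
        paths.append(path)
        # Push children in reverse so they are visited in listed order.
        for child in reversed(children.get(path[-1], [])):
            stack.append(path + (child,))
    return paths


# Toy DAG with made-up terms: "D" has two parents, "B" and "C".
toy_dag = {"GO": ["A", "B"], "A": ["C"], "B": ["D"], "C": ["D"]}
tree_nodes = dag_to_tree_paths(toy_dag, "GO")
for path in tree_nodes:
    print(" > ".join(path))
```

Unfolding keeps every inheritance route visible at the cost of duplicated nodes, which is one reason a unique positional code per tree node is useful.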
An optimal branch refers to the lowest nodes among the nodes including the greatest number of genes at the bottom in a tree structure. The optimal branch is a broad term representing all the functions of genes included in the nodes at the bottom. The system of the present invention assigns genes pertaining to a given cluster in the GO tree structure, finds the optimal branch through the pseudo-distance algorithm and displays the results. GO terms are assigned to corresponding genes by text mining of various biological databases. For the allocation of GO terms, such information, at the DNA level or the protein level as provided by UniGene, LocusLink, Swiss-Prot and MGI, is utilized together with direct comparison of identifiers and sequence similarity searches. Also, gene identifier conversion files provided by each database that participated in the GO Consortium are utilized to assign GO terms.
UniGene of the NCBI (National Center for Biotechnology Information) provides gene information at the DNA level. LocusLink, which is the result of the reference sequence project of the NCBI, provides information about the functions of genes and representative sequences. Swiss-Prot of the Swiss Institute of Bioinformatics provides gene information at the protein level. MGI (Mouse Genome Informatics) provides integrated access to data on the genomics of laboratory mice.
In the present invention, in order to compute an optimal branch in GO tree and to find representative GO terms over a given cluster based on the optimal branch, all nodes in the GO tree are GO-coded. As shown in FIG. 4, each GO code is a sequence of numbers, here 15 numbers. Note that the sequence length 15 is variable according to the versions of GO syntax files. Each number in a GO code sequence represents the positional information at each step. Since a unique GO code is assigned to each node in GO tree, GO terms are distinguishable even if the same terms are used in different nodes in the GO tree structure.
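As a minimal sketch of such positional coding, the Python below assigns a fixed-length code to every node of a small tree. The layout is an assumption: position k stores the 1-based index of the branch taken at step k below the root, and unused positions are padded with 0. The actual GO code file format is not given in the text, and the terms "process X" and "process Y" are placeholders.

```python
CODE_LENGTH = 15  # the text notes that this length varies with the GO file version


def assign_go_codes(children, root, code_length=CODE_LENGTH):
    """Give every tree node a unique fixed-length positional code.

    Position k holds the 1-based index of the branch taken at step k on the
    way down from the root; remaining positions are padded with 0 (assumed).
    """
    codes = {}
    stack = [(root, ())]
    while stack:
        node, prefix = stack.pop()
        if len(prefix) > code_length:
            raise ValueError("tree is deeper than the code length")
        codes[node] = prefix + (0,) * (code_length - len(prefix))
        for index, child in enumerate(children.get(node, []), start=1):
            stack.append((child, prefix + (index,)))
    return codes


# The three second-level categories come from the text; the children are placeholders.
toy_tree = {
    "GO": ["molecular function", "biological process", "cellular component"],
    "biological process": ["process X", "process Y"],
}
codes = assign_go_codes(toy_tree, "GO")
print(codes["process Y"])
```

With this layout, two nodes share an ancestor exactly as deep as their codes share a non-zero prefix, which is what makes the branch search a simple prefix comparison.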
FIG. 5 is a view for briefly explaining the principle of finding an optimal branch using GO tree according to the present invention.
Computationally, the optimal branch can be found using GO codes. At each level of the GO tree, a weight is defined, and from these weights a pseudo-distance between GO codes is defined.
FIG. 6 is a view for explaining the principle of measuring a pseudo-distance between nodes according to the present invention. The pseudo-distance is defined as follows.
The pseudo-distance pd(v1, v2) is the weight of the level at which the optimal branch formed by nodes v1 and v2 is located. When v1 and v2 are the same, the pd value is zero. Ultimately, a combination of GO codes is selected through the following pseudo-distance concept.
$$
\mathrm{pd}(v_1, v_2) =
\begin{cases}
\text{weight of the level where the optimal branch between } v_1 \text{ and } v_2 \text{ is located}, & v_1 \neq v_2 \\
0, & v_1 = v_2
\end{cases}
$$
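The definition can be sketched in Python as follows. Two points are assumptions, not taken from the text: GO codes are modeled as zero-padded tuples whose shared non-zero prefix identifies the branch, and the level weight is chosen as the code length minus the branch depth, so that a branch nearer the root weighs more. The text does not give the actual weights, and the function names are hypothetical.

```python
def branch_level(code_a, code_b):
    """Depth of the deepest node shared by two codes (0 means the root)."""
    level = 0
    for a, b in zip(code_a, code_b):
        if a != b or a == 0:
            break
        level += 1
    return level


def level_weight(level, code_length):
    """Hypothetical weighting: branches nearer the root weigh more."""
    return code_length - level


def pseudo_distance(code_a, code_b):
    """pd(v1, v2): 0 for identical codes, else the weight of the branch level."""
    if code_a == code_b:
        return 0
    return level_weight(branch_level(code_a, code_b), len(code_a))


# Length-4 codes are used only to keep the example short.
v1, v2, v3 = (2, 1, 3, 0), (2, 1, 4, 0), (1, 0, 0, 0)
print(pseudo_distance(v1, v2), pseudo_distance(v1, v3), pseudo_distance(v1, v1))
```

Any weighting that decreases with depth would preserve the intended ordering; the linear choice here is only the simplest one.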
In a cluster or a group "G" of GO codes, each optimal branch is computed using the maximum pseudo-distance (max_pd) and the average pseudo-distance (aver_pd). Given a multiset G = {v1, v2, v3, v4, ..., vn} of GO codes, max_pd and aver_pd are defined as follows:
$$
\mathrm{max\_pd}(G) = \max\{\mathrm{pd}(v_i, v_j)\}, \quad 1 \le i < j \le n
$$
$$
\mathrm{aver\_pd}(G) = \frac{\sum_{1 \le i < j \le n} \mathrm{pd}(v_i, v_j)}{{}_{n}C_{2}} = \frac{2 \sum_{1 \le i < j \le n} \mathrm{pd}(v_i, v_j)}{n(n-1)}
$$
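A short Python sketch of the two statistics over all unordered pairs of a multiset is given below. The pseudo-distance used here relies on a hypothetical level weight (code length minus the depth of the shared branch), because the text does not specify the weights; only the max and average formulas are taken from the definitions above.

```python
from itertools import combinations


def pseudo_distance(a, b):
    """Hypothetical pd: 0 if equal, else code length minus shared-branch depth."""
    if a == b:
        return 0
    level = 0
    for x, y in zip(a, b):
        if x != y or x == 0:
            break
        level += 1
    return len(a) - level


def max_pd(codes):
    """Largest pseudo-distance over all pairs i < j."""
    return max(pseudo_distance(a, b) for a, b in combinations(codes, 2))


def aver_pd(codes):
    """Sum of pairwise pseudo-distances divided by nC2 = n(n-1)/2."""
    n = len(codes)
    total = sum(pseudo_distance(a, b) for a, b in combinations(codes, 2))
    return 2 * total / (n * (n - 1))


# A multiset: the code (2, 1, 3, 0) occurs twice.
G = [(2, 1, 3, 0), (2, 1, 4, 0), (2, 1, 3, 0), (1, 0, 0, 0)]
print(max_pd(G), aver_pd(G))
```

In this toy cluster the single code under a different top-level branch drives max_pd to its largest value, while aver_pd stays lower because three of the four codes sit close together.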
The maximum pseudo-distance (max_pd) is used to roughly evaluate clusters. If the optimal branch of a cluster is located at a higher level, the cluster is likely to include bad genes which do not share the common characteristics with the other genes in that cluster.
The average pseudo-distance (aver_pd) shows how well genes are clustered in a given cluster with similar functional categories, and how frequently similar GO codes are observed.
The pseudo-distance is applicable to three processes: a basic process, an N-level selective process and a percentage selective process.
The basic process has two modules using the maximum pseudo-distance (max_pd) and average pseudo-distance (aver_pd) of all nodes in the GO tree structure. The results of the basic process show the overall biological meanings of a given cluster. In the N-level selective and percentage selective processes, a user can designate particular limits. The N-level selective process predesignates the level of an optimal branch so that the formation of the optimal branch at a particular level N can be easily computed. Also, the N-level selective process enables the user to easily analogize the biological meanings at a lower level, which is not possible in the basic process. The N-level selective process shows both the first candidate of GO code combination and the next candidates of combinations to reflect the diversity that a single gene can be involved in two or more functions. The percentage selective process predesignates the percentage of genes pertaining to the optimal branch and finds all combinations of genes in percentages desired by a user. Like the N-level selective process, the percentage selective process can fully show the functional diversity of genes.
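The three processes are described only in outline, so the Python below is one possible reading rather than the patented procedure. It assumes that each gene may carry several GO codes (zero-padded tuples), that combinations of one code per gene are ranked by the lowest (max_pd, aver_pd), that the N-level process groups genes by their code prefix at depth N, and that the percentage process returns the deepest branches containing at least the requested share of genes. All function names, the exhaustive search and the level weight are hypothetical.

```python
from itertools import combinations, product


def pseudo_distance(a, b):
    """Hypothetical pd: 0 if equal, else code length minus shared-branch depth."""
    if a == b:
        return 0
    level = 0
    for x, y in zip(a, b):
        if x != y or x == 0:
            break
        level += 1
    return len(a) - level


def score(codes):
    """Return (max_pd, aver_pd) for one combination; needs at least two codes."""
    dists = [pseudo_distance(a, b) for a, b in combinations(codes, 2)]
    return max(dists), sum(dists) / len(dists)


def ranked_combinations(gene_codes):
    """Basic reading: rank every one-code-per-gene combination, best first."""
    genes = sorted(gene_codes)
    ranked = [
        (score(combo), dict(zip(genes, combo)))
        for combo in product(*(gene_codes[g] for g in genes))
    ]
    ranked.sort(key=lambda item: item[0])
    return ranked  # ranked[0] is the first candidate, ranked[1:] the next ones


def branches_at_level(gene_codes, level):
    """N-level reading: genes grouped by the branch they reach at depth N."""
    groups = {}
    for gene, codes in gene_codes.items():
        for code in codes:
            if 0 not in code[:level]:
                groups.setdefault(code[:level], set()).add(gene)
    return groups


def branches_covering(gene_codes, percent):
    """Percentage reading: deepest branches holding >= percent of the genes."""
    counts = {}
    for codes in gene_codes.values():
        prefixes = set()
        for code in codes:
            depth = next((i for i, x in enumerate(code) if x == 0), len(code))
            prefixes.update(code[:k] for k in range(depth + 1))
        for prefix in prefixes:
            counts[prefix] = counts.get(prefix, 0) + 1
    needed = percent / 100 * len(gene_codes)
    hits = [p for p, c in counts.items() if c >= needed]
    # Keep only branches with no qualifying branch beneath them.
    return [p for p in hits
            if not any(len(q) > len(p) and q[:len(p)] == p for q in hits)]


# Gene g2 is annotated with two codes, i.e. two candidate functions.
gene_codes = {
    "g1": [(2, 1, 3, 0)],
    "g2": [(2, 1, 4, 0), (1, 2, 0, 0)],
    "g3": [(2, 1, 3, 0)],
}
best_score, best_combo = ranked_combinations(gene_codes)[0]
print(best_score, best_combo["g2"])
print(branches_covering(gene_codes, 100), branches_covering(gene_codes, 60))
```

The exhaustive product grows multiplicatively with the number of multi-annotated genes, so a real implementation would likely need pruning; the sketch favors clarity over scale.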
FIG. 7 is the flow diagram of DNA chip analysis using GO according to the present invention. The method of analysis comprises the steps of: receiving statistical clustering results of DNA chip data (S10) and assigning GO identifiers to each gene pertaining to given cluster (S20); converting each GO identifier assigned to the corresponding genes into a GO code using GO code file (S30); selecting a process among basic process (S41), N-level selective process (S42) and percentage selective process (S43) according to pseudo-distance algorithm (S40) to designate a necessary parameter for said GO code and extracting an optimal branch (S50); extracting a biological meaning of each extracted optimal branch (S60); and displaying an optimal branch of a cluster and its GO code (S70).
Referring to FIG. 1 and FIG. 7, the system for biological analysis of the gene expression pattern of DNA chip using GO structure according to the present invention is comprised of three broad sections 100, 200 and 300. The operation of each section will be described in detail by reference to FIG. 7.
First, GO identifiers and their GO codes are assigned to each gene in a given cluster that is obtained from a statistical clustering method. More specifically, when clustering results are inputted (S10), GO identifiers are assigned to each gene within a cluster (S20) based on the index file, in which GO identifiers have previously been assigned to genes through data mining of various databases. Subsequently, each GO identifier assigned to genes in a given cluster is converted into a GO code (S30) using the GO code file, in which all nodes of the GO tree structure are coded.
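Steps S20 and S30 amount to two table lookups, which can be sketched as below. The file formats are not described in the text, so both the index file and the GO code file are modeled as in-memory dictionaries, and every gene name, identifier and code shown is a made-up placeholder.

```python
def annotate_cluster(cluster_genes, go_index, go_code_table):
    """Map each gene to its GO codes via gene -> GO identifier -> GO code.

    Genes with no identifier, or whose identifiers have no code, are
    returned separately so they can be reported rather than silently lost.
    """
    annotated = {}
    unannotated = []
    for gene in cluster_genes:
        go_ids = go_index.get(gene, [])                                      # S20
        codes = [go_code_table[i] for i in go_ids if i in go_code_table]    # S30
        if codes:
            annotated[gene] = codes
        else:
            unannotated.append(gene)
    return annotated, unannotated


# Placeholder data standing in for the index file and the GO code file.
go_index = {"geneA": ["GO:a"], "geneB": ["GO:a", "GO:b"]}
go_code_table = {"GO:a": (2, 1, 3, 0), "GO:b": (1, 2, 0, 0)}

annotated, unannotated = annotate_cluster(
    ["geneA", "geneB", "geneC"], go_index, go_code_table
)
print(annotated)
print(unannotated)
```

Keeping the unannotated genes visible matters in practice, since the branch statistics are computed only over genes that received at least one code.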
A proper process among the basic process (S41), N-level selective process (S42) and percentage selective process (S43) is chosen according to the pseudo-distance algorithm (S40), and necessary parameters are designated. An optimal branch is then computed (S50) based on the pseudo-distance in each process. Also, biological meanings of the optimal branch are extracted (S60).
The optimal branch extracted for genes in each cluster and the GO code assigned to the genes are displayed. Summarized information on the GO code for each gene, the optimal branch and the biological meanings can be displayed in the form of a table or a graphical tree.
The pseudo-distance algorithm is also applicable to a different type of biochip, the protein chip. It can be utilized to analyze a protein chip in the same way as it is utilized to analyze a DNA chip in FIGs. 1 and 7.
While this invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not limited to the disclosed embodiment and the drawings but, on the contrary, is intended to cover various modifications and variations within the spirit and scope of the appended claims.
Industrial applicability
As can be seen from the foregoing, the present invention enables a systematic and automated biological analysis of gene expression patterns of DNA chip assays by a mathematical modeling of GO hierarchy. Also, the present invention can extract the biological functions that are commonest and most optimal among genes within a cluster formed by a statistical clustering method of DNA chip data, utilizing GO terms and tree structure.

Claims

What is claimed is:
1. A system for DNA chip analysis using Gene Ontology™, comprising: a) means for receiving statistical clustering results of DNA chip data and for assigning appropriate GO identifiers to each gene pertaining to given cluster; b) means for converting each GO identifier assigned to said gene into a GO code using a GO code file; c) means for selecting a proper process among three predetermined process that adopt pseudo-distance to designate necessary parameters and for extracting an optimal branch; and d) means for extracting biological meanings from each extracted optimal branch.
2. The system according to claim 1, further comprising (e) visualizing means for displaying the optimal branch, GO code and biological meanings of a given cluster of gene.
3. The system according to claim 2, wherein said visualizing means displays the summarized information on the GO code, optimal branch and biological meanings of a given cluster of genes in a form of table, or in a form of a graphical tree structure.
4. The system according to claim 1, wherein said optimal branch gives a proper weight to each level of the GO tree structure.
5. The system according to claim 1, wherein said pseudo-distance "Pd(v1,v2)," wherein v1 and v2 represent nodes, is a weight of the level corresponding to a code of the optimal branch formed by nodes v1 and v2, the Pd value being zero when v1 and v2 are the same, and wherein the optimal branch is obtained based on the maximum pseudo-distance (max_pd) and the average pseudo-distance (aver_pd), which are defined as follows in a group "G" of GO codes or in a given cluster (G = {v1, v2, v3, v4, ..., vn}):
$$
\mathrm{max\_pd}(G) = \max\{\mathrm{pd}(v_i, v_j)\}, \quad 1 \le i < j \le n
$$
$$
\mathrm{aver\_pd}(G) = \frac{\sum_{1 \le i < j \le n} \mathrm{pd}(v_i, v_j)}{{}_{n}C_{2}} = \frac{2 \sum_{1 \le i < j \le n} \mathrm{pd}(v_i, v_j)}{n(n-1)},
$$
the lowest values of max_pd and aver_pd being finally obtained among possible combinations of GO codes.
6. The system according to claim 5, wherein said maximum pseudo-distance (max_pd) is used to roughly evaluate clusters and shows that, if the optimal branch of a cluster is located at a higher level, the cluster is likely to include bad genes which do not share the common biological characteristic with the other genes in that cluster.
7. The system according to claim 5, wherein said average pseudo-distance (aver_pd) shows how well GO codes are clustered in a given cluster with similar functional categories and how frequently similar GO codes are observed.
8. The system according to claim 1, wherein the predefined process of corresponding means for extracting an optimal branch comprises i) basic process, ii) N-level selecting process and iii) percentage selecting process, proper one of the three processes and necessary parameters being designated to extract an optimal branch.
9. The system according to claim 8, wherein said basic process utilizes the maximum pseudo-distance (max_pd) and average pseudo-distance (aver_pd) of all nodes in the GO tree structure, the results obtained by said basic process showing the overall biological meanings of a given cluster.
10. The system according to claim 8, wherein said N-level selecting process computes an optimal branch of a cluster at pre-designated level N, observes formation of the optimal branch at a particular level N and analogizes the biological meanings at a lower level.
11. The system according to claim 10, wherein said N-level selecting process shows both the first candidate of GO code combination and the next candidates of combinations to reflect the diversity that a single gene can be involved in two or more functions.
12. The system according to claim 8, wherein said percentage selecting process predesignates the percentage of genes pertaining to the optimal branch and shows all combinations of genes in percentages desired by a user.
13. A method for DNA chip analysis using Gene Ontology™, comprising the steps of: a) receiving statistical clustering results of DNA chip data and for assigning appropriate GO identifiers to each gene pertaining to given cluster; b) converting each GO identifier assigned to said gene into a GO code using a GO code file; c) selecting a proper process among three predetermined process that adopt pseudo-distance to designate necessary parameters and for extracting an optimal branch; and d) extracting biological meanings from each extracted optimal branch.
14. The method according to claim 13, further comprising (e) step of displaying the optimal branch, GO code and biological meanings of a given cluster of gene.
15. The method according to claim 13, wherein said pseudo-distance "Pd(v1,v2)," wherein v1 and v2 represent nodes, is a weight of the level corresponding to a code of the optimal branch formed by nodes v1 and v2, the Pd value being zero when v1 and v2 are the same, and wherein the optimal branch is obtained based on the maximum pseudo-distance (max_pd) and the average pseudo-distance (aver_pd), which are defined as follows in a group "G" of GO codes or in a given cluster (G = {v1, v2, v3, v4, ..., vn}):
$$
\mathrm{max\_pd}(G) = \max\{\mathrm{pd}(v_i, v_j)\}, \quad 1 \le i < j \le n
$$
$$
\mathrm{aver\_pd}(G) = \frac{\sum_{1 \le i < j \le n} \mathrm{pd}(v_i, v_j)}{{}_{n}C_{2}} = \frac{2 \sum_{1 \le i < j \le n} \mathrm{pd}(v_i, v_j)}{n(n-1)},
$$
the lowest values of max_pd and aver_pd being finally obtained among possible combinations of GO codes.
16. The method according to claim 15, wherein said maximum pseudo-distance (max_pd) is used to roughly evaluate clusters and shows that, if the optimal branch of a cluster is located at a higher level, the cluster is likely to include bad genes which do not share the common biological characteristic with the other genes in that cluster.
17. The method according to claim 15, wherein said average pseudo-distance (aver_pd) shows how well GO codes are clustered in a given cluster with similar functional categories and how frequently similar GO codes are observed.
18. The method according to claim 13, wherein the predefined process of corresponding step for extracting an optimal branch comprises i) basic process, ii) N-level selecting process and iii) percentage selecting process, proper one of the three processes and necessary parameters being designated to extract an optimal branch.
PCT/KR2003/000400 2002-02-28 2003-02-28 A system for analyzing dna-chips using gene ontology and a method thereof WO2003072701A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003212669A AU2003212669A1 (en) 2002-02-28 2003-02-28 A system for analyzing dna-chips using gene ontology and a method thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2002-0010826 2002-02-28
KR10-2002-0010826A KR100431620B1 (en) 2002-02-28 2002-02-28 A system for analyzing dna-chips using gene ontology, and a method thereof

Publications (1)

Publication Number Publication Date
WO2003072701A1 true WO2003072701A1 (en) 2003-09-04

Family

ID=27764625

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2003/000400 WO2003072701A1 (en) 2002-02-28 2003-02-28 A system for analyzing dna-chips using gene ontology and a method thereof

Country Status (3)

Country Link
KR (1) KR100431620B1 (en)
AU (1) AU2003212669A1 (en)
WO (1) WO2003072701A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20050096044A (en) * 2004-03-29 2005-10-05 주식회사 이즈텍 A method for analyzing function of gene
US7848890B2 (en) 2004-12-08 2010-12-07 Electronics And Telecommunications Research Institute Method and system for predicting gene pathway using gene expression pattern data and protein interaction data
KR100849497B1 (en) * 2006-09-29 2008-07-31 한국전자통신연구원 Method of Protein Name Normalization Using Ontology Mapping
KR100836865B1 (en) * 2006-09-29 2008-06-11 고려대학교 산학협력단 Method for integrated management of microarray experiment informaion and Recording medium thereof
KR100897523B1 (en) * 2006-12-05 2009-05-15 한국전자통신연구원 Apparatus and method for giving an organism pathway name using Gene Homologue information
KR101067352B1 (en) * 2009-11-19 2011-09-23 한국생명공학연구원 System and method comprising algorithm for mode-of-action of microarray experimental data, experiment/treatment condition-specific network generation and experiment/treatment condition relation interpretation using biological network analysis, and recording media having program therefor
KR101151785B1 (en) * 2010-01-18 2012-05-31 한국기초과학지원연구원 The method for the discovery of orthologue gene using gene ontology
CN116150864B (en) * 2023-04-25 2023-07-04 中国建筑第五工程局有限公司 Method for automatically generating building structure analysis model from BIM model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5887120A (en) * 1995-05-31 1999-03-23 Oracle Corporation Method and apparatus for determining theme for discourse
WO1999039174A2 (en) * 1998-01-29 1999-08-05 Yissum Research Development Company Of The Hebrew University Of Jerusalem An automatic method of classifying molecules
WO1999067727A1 (en) * 1998-06-25 1999-12-29 Microsoft Corporation Method and system for visualization of clusters and classifications

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BAKER P.G. ET AL.: "An ontology for bioinformatics applications", BIOINFORMATICS, vol. 15, no. 6, 1999, pages 510 - 520, XP002230457, DOI: doi:10.1093/bioinformatics/15.6.510 *
BERTONE P., GERSTEIN M.: "Integrative data mining: the new direction in bioinformatics", ENGINEERING IN MEDICINE AND BIOLOGY MAGAZINE, IEEE, vol. 20, no. 4, July 2001 (2001-07-01) - August 2001 (2001-08-01), pages 33 - 40 *
PATON N.W. ET AL.: "A query processing in the TAMBIS bioinformatics source integration system", PROC. 11TH INT. CONF. ON SCIENTIFIC AND STATISTICAL DATABASES (SSDBM), IEEE PRESS, 1999, pages 138 - 147, XP010348735, DOI: doi:10.1109/SSDM.1999.787629 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8396872B2 (en) 2010-05-14 2013-03-12 National Research Council Of Canada Order-preserving clustering data analysis system and method
CN102567314A (en) * 2010-12-07 2012-07-11 中国电信股份有限公司 Device and method for inquiring knowledge
CN102567314B (en) * 2010-12-07 2015-03-04 中国电信股份有限公司 Device and method for inquiring knowledge
CN103366098A (en) * 2013-07-24 2013-10-23 国家电网公司 Experimental ability quantitative evaluation method based on experiment resource tree

Also Published As

Publication number Publication date
KR100431620B1 (en) 2004-05-17
KR20030071225A (en) 2003-09-03
AU2003212669A1 (en) 2003-09-09

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP