US20030014191A1 - System, method and computer program product for identifying chemical compounds having desired properties - Google Patents
- Publication number
- US20030014191A1 (application US 10/170,628)
- Authority
- US
- United States
- Prior art keywords
- compounds
- property
- chemical
- compound
- models
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
- G06T11/20—Drawing from basic elements, e.g. lines or circles
- G06T11/206—Drawing of charts or graphs
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01J—CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
- B01J19/00—Chemical, physical or physico-chemical processes in general; Their relevant apparatus
- B01J19/0046—Sequential or parallel reactions, e.g. for the synthesis of polypeptides or polynucleotides; Apparatus and devices for combinatorial chemistry or for making molecular arrays
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2137—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on criteria of topology preservation, e.g. multidimensional scaling or self-organising maps
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/40—Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/60—In silico combinatorial chemistry
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/80—Data visualisation
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01J—CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
- B01J2219/00—Chemical, physical or physico-chemical processes in general; Their relevant apparatus
- B01J2219/00274—Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
- B01J2219/0068—Means for controlling the apparatus of the process
- B01J2219/00686—Automatic
- B01J2219/00689—Automatic using computers
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01J—CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
- B01J2219/00—Chemical, physical or physico-chemical processes in general; Their relevant apparatus
- B01J2219/00274—Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
- B01J2219/0068—Means for controlling the apparatus of the process
- B01J2219/00686—Automatic
- B01J2219/00691—Automatic using robots
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01J—CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
- B01J2219/00—Chemical, physical or physico-chemical processes in general; Their relevant apparatus
- B01J2219/00274—Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
- B01J2219/0068—Means for controlling the apparatus of the process
- B01J2219/00695—Synthesis control routines, e.g. using computer programs
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01J—CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
- B01J2219/00—Chemical, physical or physico-chemical processes in general; Their relevant apparatus
- B01J2219/00274—Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
- B01J2219/0068—Means for controlling the apparatus of the process
- B01J2219/007—Simulation or vitual synthesis
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B01—PHYSICAL OR CHEMICAL PROCESSES OR APPARATUS IN GENERAL
- B01J—CHEMICAL OR PHYSICAL PROCESSES, e.g. CATALYSIS OR COLLOID CHEMISTRY; THEIR RELEVANT APPARATUS
- B01J2219/00—Chemical, physical or physico-chemical processes in general; Their relevant apparatus
- B01J2219/00274—Sequential or parallel reactions; Apparatus and devices for combinatorial chemistry or for making arrays; Chemical library technology
- B01J2219/0068—Means for controlling the apparatus of the process
- B01J2219/00702—Processes involving means for analysing and characterising the products
-
- C—CHEMISTRY; METALLURGY
- C40—COMBINATORIAL TECHNOLOGY
- C40B—COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
- C40B40/00—Libraries per se, e.g. arrays, mixtures
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/70—Machine learning, data mining or chemometrics
Definitions
- the present invention relates generally to the generation of chemical entities with defined physical, chemical and/or bioactive properties, and more particularly, to iterative selection and testing of chemical entities.
- new chemical entities with useful properties are generated by identifying a chemical compound (called a “lead compound”) with some desirable property or activity, creating variants of the lead compound, and evaluating the property and activity of those variant compounds.
- chemical entities with useful properties include paints, finishes, plasticizers, surfactants, scents, flavorings, and bioactive compounds, but can also include chemical compounds with any other useful property that depends upon chemical structure, composition, or physical state.
- Chemical entities with desirable biological activities include drugs, herbicides, pesticides, veterinary products, etc.
- One deficiency pertains to the first step of the conventional approach, i.e., the identification of lead compounds.
- the search for lead compounds has been limited to an analysis of compound banks, for example, available commercial, custom, or natural products chemical libraries. Consequently, a fundamental limitation of the conventional approach is the dependence upon the availability, size, and structural diversity of these chemical libraries.
- Although chemical libraries cumulatively total an estimated 9 million identified compounds, they reflect only a small sampling of all possible organic compounds with molecular weights less than 1200. Moreover, only a small subset of these libraries is usually accessible for biological testing.
- the conventional approach is limited by the relatively small pool of previously identified chemical compounds which may be screened to identify new lead compounds.
- lead compound variants are generated by chemists using conventional chemical synthesis procedures. Such chemical synthesis procedures are manually performed by chemists.
- the generation of lead compound variants is very labor intensive and time consuming. For example, it typically takes many chemist-years to produce even a small subset of the compound variants for a single lead compound.
- a combinatorial chemical library is a collection of diverse chemical compounds generated by either chemical synthesis or biological synthesis by combining a number of chemical “building blocks” such as reagents.
- a linear combinatorial chemical library such as a polypeptide library is formed by combining a set of chemical building blocks called amino acids in every possible way for a given compound length (i.e., the number of amino acids in a polypeptide compound). Millions of chemical compounds theoretically can be synthesized through such combinatorial mixing of chemical building blocks.
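The size of such a linear library grows as the number of building blocks raised to the compound length. A minimal Python sketch, assuming the 20 standard amino acids as building blocks; the names `AMINO_ACIDS`, `library_size`, and `enumerate_library` are illustrative and do not come from the source:

```python
from itertools import product

# Assumption: one-letter codes of the 20 standard amino acids as building blocks.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def library_size(n_blocks: int, length: int) -> int:
    """Number of linear compounds of a given length built from n_blocks units."""
    return n_blocks ** length


def enumerate_library(blocks, length):
    """Lazily yield every linear combination of building blocks."""
    for combo in product(blocks, repeat=length):
        yield "".join(combo)


print(library_size(len(AMINO_ACIDS), 4))  # tetramers: 160000
print(library_size(len(AMINO_ACIDS), 5))  # pentamers: 3200000
```

Adding a single position multiplies the library size by the number of building blocks, which is why full enumeration and synthesis quickly become impractical.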
- combinatorial chemical libraries to generate lead compounds exacerbates this problem.
- Greater and greater diversity has often been achieved in combinatorial chemical libraries by using larger and larger compounds (that is, compounds having a greater number of variable subunits, such as pentameric compounds instead of tetrameric compounds in the case of polypeptides).
- bioactive agents such as drugs and agricultural products possess diversity that could never be achieved with available peptide and oligonucleotide libraries since the available peptide and oligonucleotide components only possess limited functional group diversity and limited topology imposed through the inherent nature of the available components.
- the difficulties associated with synthesizing variants of lead compounds are exacerbated by using typical peptide and oligonucleotide combinatorial chemical libraries to produce such lead compounds.
- the issues described above are not limited to bioactive agents but apply to any lead generating paradigm for which a chemical agent of defined and specific activity is desired.
- the present invention is an automatic, partially automatic, and/or manual iterative system, method and/or computer program product for generating chemical entities having desired or specified physical, chemical, functional, and/or bioactive properties.
- the present invention is also directed to the chemical entities produced by this system, method and/or computer program product. In an embodiment, the following steps are performed during each iteration:
- the system and computer program product includes an Experiment Planner, a Selector, a Synthesis Module and an Analysis Module.
- the system also includes one or more databases, such as a Structure-Property database, a Compound Database, a Reagent database and a Compound Library.
- the Experiment Planner receives, among other things, Historical Structure-Property data from the Structure-Property database and current Structure-Property data that was generated by the Analysis Module during a prior iteration of the invention.
- the Experiment Planner generates Selection Criteria for use by the Selector.
- One or more of the Selection Criteria can be combined into one or more Objective Functions.
- An Objective Function describes the collective ability of a given subset of compounds from the Compound Library to simultaneously satisfy all the prescribed Selection Criteria.
- An Objective Function defines the influence of each Selection Criterion in the final selection.
- the Selection Criteria and the exact form of the Objective Function can be specified by a human operator or can be automatically generated by a computer program or other process, or can be specified via human/computer interaction.
- the one or more Selection Criteria and/or Objective Functions can represent: one or more desired characteristics that the resulting compounds should possess, individually or collectively; one or more undesired characteristics that the resulting compounds should not possess, individually or collectively; and/or one or more constraints that exclude certain compounds and/or combinations of compounds in order to limit the scope of the selection.
- the Selection Criteria can be in the form of mathematical functions or computer algorithms, and can be calculated using a digital computer.
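One way to read the relationship between Selection Criteria and an Objective Function is as a weighted combination, where each weight sets the influence of one criterion. The sketch below assumes that reading; the source does not prescribe a weighted sum, and the criteria `mean_activity` and `mean_cost`, the dictionary keys, and the weights are hypothetical:

```python
from typing import Callable, Sequence

# A criterion scores a whole subset of compounds, not a single compound.
Criterion = Callable[[Sequence[dict]], float]


def make_objective(criteria: Sequence[Criterion], weights: Sequence[float]):
    """Combine criteria into one objective; the weighted sum is an assumption."""
    if len(criteria) != len(weights):
        raise ValueError("one weight is required per criterion")

    def objective(subset: Sequence[dict]) -> float:
        return sum(w * c(subset) for c, w in zip(criteria, weights))

    return objective


def mean_activity(subset):
    """Hypothetical desired characteristic: average predicted activity."""
    return sum(c["activity"] for c in subset) / len(subset)


def mean_cost(subset):
    """Hypothetical undesired characteristic: average cost per compound."""
    return sum(c["cost"] for c in subset) / len(subset)


# A positive weight rewards a criterion; a negative weight penalizes it.
objective = make_objective([mean_activity, mean_cost], [1.0, -0.5])
```

Hard constraints that exclude compounds outright could be applied as a filter before the objective is evaluated, rather than as a weighted term.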
- the Selector receives the Selection Criteria and Objective Functions and searches the Compound Library to identify a subset of compounds that maximizes or minimizes the Objective Functions.
- the Compound Library can be a collection of pre-existing or virtual chemical compounds.
- the Selector identifies a smaller subset of these compounds, referred to herein as a Directed Diversity Library, based on one or more Selection Criteria and/or Objective Functions.
- the number of compounds in this subset can be specified by the operator or can be determined automatically or partially automatically within any limits specified by the operator.
- the Selection Criteria can be applied either simultaneously or sequentially. For example, in one embodiment, one part of the Directed Diversity Library can be selected based on a first set of Selection Criteria and/or Objective Function, while another part of that Directed Diversity Library can be selected based on a second set of Selection Criteria and/or Objective Function.
- the compounds comprising the Directed Diversity Library are then collected, acquired or synthesized, and are analyzed to evaluate their physical, chemical and/or bioactive properties of interest.
- the compound is retrieved from the Chemical Inventory. This avoids unnecessary time and expense of synthesizing a compound that is already available.
- Compounds that are not available from a Chemical Inventory are synthesized in the Synthesis Module.
- the Synthesis Module is an automated robotic module that receives synthesis instructions from a Synthesis Protocol Generator. Alternatively, synthesis can be performed manually or semi-automatically.
- the Analysis Module receives the compounds of the Directed Diversity Library from the Chemical Inventory and/or the Synthesis Module.
- the Analysis Module analyzes the compounds and outputs Structure-Property data.
- the Structure-Property data is provided to the Experiment Planner and is also stored in the Structure-Property database.
- the Experiment Planner defines one or more new Selection Criteria and/or Objective Functions for the next iteration of the invention.
- the new Selection Criteria and/or Objective Functions can be defined through operator input, through an automated process, through a partially automated process, or any combination thereof.
- Structure-Property Model Generator can include structure-property activity data from all previous iterations or from a subset of all previous iterations, as specified by user input, for example.
- the Structure-Property Model Generator generates Structure-Property Models that conform to the observed data.
- the Structure-Property Models are provided to the Experiment Planner which uses the Models to generate subsequent Selection Criteria and/or Objective Function.
- the Selection Criteria and/or Objective Functions are provided to the Selector which selects the next Directed Diversity Library therefrom.
- the functions of the Experiment Planner, the Selector and the optional Synthesis Protocol Generator are performed by automated machines under the control of one or more computer programs executed on one or more processors and/or human operators.
- one or more of the functions of the Experiment Planner, the Selector and the optional Synthesis Protocol Generator can be performed manually.
- FIG. 1 is a flow diagram depicting the flow of data and materials among elements of a lead generation system, in accordance with the present invention
- FIG. 2 is a flow diagram depicting the flow of data and materials among elements of an embodiment of the lead generation system, in accordance with the present invention
- FIG. 3 is a block diagram of the lead generation system, in accordance with the present invention.
- FIG. 4 is a block diagram of an analysis module that can be employed by the lead generation system illustrated in FIG. 3;
- FIG. 5 is a block diagram of a structure-property database that can be employed by the lead generation system illustrated in FIG. 3;
- FIG. 6 is a process flowchart illustrating an iterative method for identifying chemical compounds having desired properties
- FIG. 7 is a process flowchart illustrating a method for performing steps 612 and 614 of the method illustrated in FIG. 6;
- FIG. 8 is a flow diagram depicting the flow of data among elements of a structure-property model generator that can be employed by a lead generation system;
- FIG. 9 is an illustration of a generalized regression neural network model that can be generated by the structure-property model generator illustrated in FIG. 8 and that can employ K-Nearest-Neighbor classifiers;
- FIG. 10 is a flow diagram depicting the flow of data among elements of a fuzzy structure-property model that can be generated by the structure-property model generator illustrated in FIG. 8;
- FIG. 11 is a Neuro-Fuzzy structure-property model that can be generated by the structure-property model generator illustrated in FIG. 8;
- FIG. 12 is a flow diagram depicting the flow of data among an experiment planner and a selector in a lead generation system
- FIG. 13 is a flow diagram depicting the flow of data during selection of a directed diversity library
- FIG. 14 illustrates a distribution of compounds in a directed diversity library
- FIG. 15 illustrates another distribution of compounds in a directed diversity library
- FIG. 16 illustrates another distribution of compounds in a directed diversity library
- FIG. 17 is a process flowchart illustrating a method for generating structure-property models in accordance with the present invention
- FIG. 18 is a process flowchart illustrating a method for selecting a directed diversity library, in accordance with the present invention.
- FIG. 19 is a block diagram of a computer system that can be used to implement one or more portions of the lead generation system illustrated in FIG. 3.
- the present invention is an iterative system, method and computer program product for generating chemical entities having desired physical, chemical and/or bioactive properties.
- the present invention iteratively selects, analyzes and evaluates Directed Diversity Libraries for desired properties.
- the present invention can be implemented as a fully or partially automated, computer-aided robotic system, or without any robotics.
- the present invention is also directed to the chemical entities generated by operation of the present invention.
- the present invention employs a dynamic Compound Library.
- the Compound Library is dynamic in that the compounds comprising the Compound Library can change from one iteration of the present invention to the next.
- the dynamic Compound Library can expand and/or contract.
- the Compound Library includes chemical compounds that already exist and/or chemical compounds that can be synthesized on demand, either individually or combinatorially.
- the Compound Library can be a combinatorial chemical library, a set of combinatorial chemical libraries and/or non-combinatorial chemical libraries.
- the Compound Library is not limited to a combinatorial chemical library.
- Directed Diversity Libraries provide an optimization approach that is focused and directed.
- a lead generation/optimization system 100 includes an Experiment Planner 130 , a Selector 106 , a Synthesis Module 112 and an Analysis Module 118 .
- the system also includes one or more databases, such as: a Structure-Property database 126 , a Compound Database 134 , a Reagent database 138 and a Compound Library 102 .
- the Selector 106 receives Selection Criteria 104 from the Experiment Planner 130 .
- the Selector 106 can also receive one or more Objective Functions 105 from the Experiment Planner 130 .
- the Selection Criteria 104 represent desired or undesired characteristics that the resulting compounds should or should not possess, either individually or collectively, and/or constraints that exclude certain compounds and/or combinations of compounds.
- the Selection Criteria 104 can be in the form of mathematical functions or computer algorithms, and can be calculated using a digital computer.
- One or more of the Selection Criteria 104 can be combined into one or more Objective Functions 105 by the Experiment Planner 130 .
- the Objective Functions 105 describe the extent to which a given set of compounds should satisfy all the prescribed Selection Criteria 104 .
- the Objective Functions 105 can define the influence of each Selection Criterion 104 in the selection of a Directed Diversity Library.
- the Selection Criteria 104 and the exact form of the Objective Functions 105 can be specified by a human operator or can be automatically or semi-automatically generated (with human input) by the Experiment Planner 130 .
- the Selector 106 searches the Compound Library 102 to identify one or more subsets of compounds that maximize or minimize the Selection Criteria 104 and/or Objective Function 105 .
- the subset of compounds is referred to herein as a Directed Diversity Library 108 .
- the Directed Diversity Library 108 is a list of compounds. These compounds may or may not already exist (i.e., they may or may not be in the Chemical Inventory 110 ).
- the properties of the Directed Diversity Library 108 of compounds are generally hitherto unknown.
- the number of compounds in a Directed Diversity Library can be specified by the operator, or can be determined automatically within any limits specified by the operator.
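The search for a subset of fixed size that maximizes an objective can be sketched with a simple greedy heuristic. This is an illustrative stand-in, not the search procedure of the source, and it does not guarantee an optimal subset; `greedy_select` and its parameters are hypothetical names:

```python
def greedy_select(library, objective, k):
    """Build a subset of up to k compounds, adding whichever compound
    most improves the objective at each step (greedy heuristic)."""
    selected = []
    remaining = list(library)
    while remaining and len(selected) < k:
        # Evaluate the objective on the current subset plus each candidate.
        best = max(remaining, key=lambda c: objective(selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected
```

To minimize rather than maximize, the objective can be negated before it is passed in. Here `k` plays the role of the operator-specified library size.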
- the Selection Criteria 104 can be applied either simultaneously or sequentially.
- one part of the Directed Diversity Library 108 can be selected based on a given set of Selection Criteria 104 and/or Objective Function 105
- another part of that Directed Diversity Library 108 can be selected based on a different set of Selection Criteria 104 and/or Objective Function 105 .
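Sequential application can be pictured as filling the library in parts, each part chosen with its own scoring function from the compounds not yet chosen. A minimal sketch under that assumption; `select_in_parts` and the per-compound scoring functions are hypothetical:

```python
def select_top(candidates, score, n):
    """Return the n highest-scoring candidates."""
    return sorted(candidates, key=score, reverse=True)[:n]


def select_in_parts(library, parts):
    """parts is a list of (score_function, count) pairs applied in order.
    Each part is drawn from compounds not chosen by an earlier part."""
    chosen = []
    for score, count in parts:
        remaining = [c for c in library if c not in chosen]
        chosen.extend(select_top(remaining, score, count))
    return chosen
```

For example, passing `[(activity_score, 2), (novelty_score, 1)]` would pick two compounds by one criterion and then one more by a different criterion.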
- the present invention represents a multi-objective property refinement system, in the sense that one or more Selection Criteria 104 can be used, and one or more Objective Functions 105 can be pursued, during each iteration.
- Compounds from the Directed Diversity Libraries 108 are provided to the Analysis Module 118 for analysis.
- the compounds can be manually analyzed or partially manually analyzed and partially automatically analyzed.
- one or more compounds in a Directed Diversity Library 108 that have previously been synthesized are retrieved from a Chemical Inventory 110 instead of being synthesized again. This saves time and costs associated with re-synthesizing the selected compounds.
- the Chemical Inventory 110 represents any source of available compounds including, but not limited to, a corporate chemical inventory, a supplier of commercially available chemical compounds, a natural product collection, etc.
- a system and computer program product that determines whether a compound in a Directed Diversity Library 108 exists in the Chemical Inventory 110 can be implemented within the Selector Module 106, the Synthesis Module 112 or in any other module.
- the Selector Module 106 can include instructions for searching the Chemical Inventory 110 to identify and retrieve any previously synthesized compounds therefrom that are listed in the Directed Diversity Library 108 (or a subset of the Directed Diversity Library 108 , as determined by user input, for example).
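The inventory check amounts to partitioning the selected compounds into those to retrieve and those to synthesize. A minimal sketch, assuming compounds are represented by hashable identifiers; the function name is hypothetical:

```python
def split_by_availability(directed_library, inventory):
    """Partition selected compounds into (to_retrieve, to_synthesize)."""
    in_stock = set(inventory)  # set membership keeps each lookup cheap
    to_retrieve = [c for c in directed_library if c in in_stock]
    to_synthesize = [c for c in directed_library if c not in in_stock]
    return to_retrieve, to_synthesize
```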
- the Synthesis Module 112 can retrieve and selectively combine Reagents 114 from the Reagent Inventory 116, in accordance with a prescribed chemical synthesis protocol.
- the Synthesis Module 112 is used to robotically synthesize compounds.
- the term “robotically” refers to any method that involves an automated or partially automated device that performs functions specified by instructions that the Synthesis Module 112 receives from the operator or some other component of the system of the present invention.
- FIG. 2 is similar to FIG. 1, but illustrates a Synthesis Protocol Generator 202 in the path to the Synthesis Module 112.
- the Synthesis Protocol Generator 202 provides Robotic Synthesis Instructions 204 to the Synthesis Module 112 .
- the Synthesis Protocol Generator 202 receives a list of compounds in the Directed Diversity Library 108 to be synthesized.
- the Synthesis Protocol Generator 202 extracts, under computer control, Reagent Data 136 from a Reagent Database 138 , and generates Robotic Synthesis Instructions 204 that will enable the Synthesis Module 112 to automatically or partially automatically synthesize the compounds in the Directed Diversity Library 108 .
- the Robotic Synthesis Instructions 204 identify Reagents 114 from a Reagent Inventory 116 that are to be mixed by the Synthesis Module 112 .
- the Robotic Synthesis Instructions 204 also identify the manner in which such Reagents 114 are to be mixed by the Synthesis Module 112 .
- the Robotic Synthesis Instructions 204 can specify which Reagents 114 are to be mixed together.
- the Robotic Synthesis Instructions 204 can also specify chemical and/or physical conditions, such as temperature, length of time, stirring, etc. for mixing of the specified Reagents 114 .
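As a rough picture of what such instructions could carry, the sketch below models them as a list of mixing steps, each naming reagents and conditions. The source does not define a data format, so every class and field name here is hypothetical:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MixStep:
    """One mixing step: which reagents to combine and under what conditions."""
    reagent_ids: tuple
    temperature_c: float
    duration_min: float
    stirring: bool = True


@dataclass
class SynthesisInstructions:
    """Ordered steps for synthesizing a single compound."""
    compound_id: str
    steps: list = field(default_factory=list)

    def reagents_needed(self):
        """Distinct reagents to pull from the inventory, in order of first use."""
        seen = []
        for step in self.steps:
            for reagent in step.reagent_ids:
                if reagent not in seen:
                    seen.append(reagent)
        return seen
```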
- compounds from the Directed Diversity Library 108 are manually synthesized and then delivered to the Analysis Module 118 for analysis.
- a Compound Library 102 includes a single combinatorial chemical library that can be synthesized from approximately one hundred commercially available reagents that are suitable for generating thrombin inhibitors.
- the Synthesis Module 112 combines these reagents using well-known synthetic chemistry techniques to synthesize inhibitors of the enzyme thrombin.
- Each inhibitor is generally composed of, but not restricted to, three chemical building blocks.
- the Directed Diversity Library 108 preferably comprises a plurality of thrombin inhibitors generally composed of, but not restricted to, three sites of variable structure (i.e. trimers).
- Compound Library 102 can include many other types of libraries.
- the present invention is equally adapted and intended to generate other chemical compounds having other desired properties, such as paints, finishes, plasticizers, surfactants, scents, flavorings, bioactive compounds, drugs, herbicides, pesticides, veterinary products, etc., and/or lead compounds for any of the above.
- the present invention can generate chemical compounds having any useful properties that depend upon structure, composition, or state.
- the compounds in the Directed Diversity Library 108 after being synthesized or retrieved from the Chemical Inventory 110 , are provided to the Analysis Module 118 for analysis. Analysis can include chemical, biochemical, physical, and/or biological analysis.
- the Analysis Module 118 assays the compounds in the Directed Diversity Library 108 to obtain, for example, enzyme activity data, cellular activity data, toxicology data, and/or bioavailability data.
- the Analysis Module 118 analyzes the compounds to identify which of the compounds were adequately synthesized and which of the compounds were not adequately synthesized.
- the Analysis Module 118 further analyzes the compounds to obtain other pertinent data, such as structure and electronic structure data.
- the Analysis Module 118 also classifies any compounds that possess the Desired Properties 120 as Leads (lead compounds) 122 .
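Classifying a compound as a lead can be sketched as checking each measured property against a desired range. This assumes Desired Properties are expressed as numeric intervals, which the source does not state; the property names in the usage note are hypothetical:

```python
def is_lead(measured, desired):
    """Return True if every desired property was measured and falls
    within its (low, high) range, bounds inclusive."""
    return all(
        prop in measured and low <= measured[prop] <= high
        for prop, (low, high) in desired.items()
    )
```

For instance, with `desired = {"ic50_nm": (0.0, 100.0)}`, a compound measured at `{"ic50_nm": 35.0}` qualifies, while one measured at `{"ic50_nm": 350.0}` does not.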
- this function can be performed by another module such as, for example, the Experiment Planner 130 or the Selector Module 106.
- Analysis can be performed automatically, manually or semi-automatically/semi-manually.
- the Analysis Module 118 generates Structure-Property Data 124 for the analyzed compounds.
- Structure-Property Data 124 can include structure-property and/or structure-activity data.
- Structure-Property Data 124 can include physical data, synthesis data, enzyme activity data, cellular activity data, toxicology data, bioavailability data, etc.
- the Structure-Property Data 124 is stored in a Structure-Property Database 126 .
- the Structure-Property Data 124 is also provided to the Experiment Planner 130 .
- the Experiment Planner 130 receives current Structure-Property Data 124 from the Analysis Module 118 and Historical Structure-Property Data 128 from the Structure-Property Database 126 .
- Historical Structure-Property Data 128 can include well known structure-property or structure-activity relationship data, collectively referred to as Structure-Property Relationships or SPR, pertaining to one or more relationships between the properties and activities of a compound and the chemical structure of the compound.
- the Experiment Planner 130 also receives Compound Data 132 from the Compound Database 134 , Reagent Data 136 from Reagent Database 138 and Desired Properties 120 . Desired Properties 120 can be sent from an automated system or database (not shown) or from user input. In one embodiment, the Experiment Planner 130 also receives one or more Structure-Property Models 192 from one or more optional Structure-Property Model Generators 190 . The Experiment Planner 130 uses the above inputs to generate one or more Selection Criteria 104 and Objective Functions 105 .
- Compound Data 132 and Reagent Data 136 permit the Experiment Planner 130 to include, for example, one or more of the following criteria in the Selection Criteria 104 :
- molecular diversity refers to a collective propensity of a set of compounds to exhibit a variety of a prescribed set of structural, physical, chemical and/or biological characteristics
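Molecular diversity can be quantified in many ways. One simple measure, shown here only as an assumption and not as the measure used by the source, is the mean pairwise Euclidean distance between numeric descriptor vectors of the compounds:

```python
from itertools import combinations
from math import dist


def diversity(descriptors):
    """Mean pairwise Euclidean distance between descriptor vectors.
    Returns 0.0 when there are fewer than two compounds."""
    pairs = list(combinations(descriptors, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)
```

A set of identical compounds scores 0.0, and the score grows as the compounds spread out in descriptor space.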
- the optional Structure-Property Models 192 can be used by the Experiment Planner 130 to predict the properties of compounds in the Compound Library 102 whose real properties are hitherto unknown.
- the Structure-Property Models 192 are used by the Experiment Planner 130 to define and/or refine a set of Selection Criteria 104 that depend upon the predictions of one or more Structure-Property Models 192 .
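To illustrate how a model fitted to tested compounds can predict properties of untested ones, the sketch below uses plain k-nearest-neighbor averaging over descriptor vectors. It is a simplified stand-in for the neural network and fuzzy models named in the figures, and `knn_predict` is a hypothetical name:

```python
from math import dist


def knn_predict(training, query, k=3):
    """Predict a property as the mean over the k nearest tested compounds.
    training is a list of (descriptor_vector, measured_property) pairs."""
    if not training:
        raise ValueError("at least one tested compound is required")
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)
```

Predictions of this kind could then feed a Selection Criterion, for example by favoring untested compounds whose predicted property lies near the desired value.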
- Structure-Property Models 192 permit the Experiment Planner 130 to include one or more of the following in Selection Criteria 104 :
- Experiment Planner 130 can be achieved by an automated or partially automated process, or by a trained operator, aided or not by a computer. Further details of Structure-Property Models 192 are provided below.
- the one or more new Selection Criteria 104 and Objective Functions 105 are sent to the Selector 106 which uses them to select a new Directed Diversity Library 108 for the next iteration of the present invention.
- the compounds in the new Directed Diversity Library 108 are retrieved from the Chemical Inventory 110 and/or synthesized by the Synthesis Module 112 .
- the Analysis Module 118 analyzes the new Directed Diversity Library 108 to obtain Structure-Property Data 124 pertaining to the compounds in the new Directed Diversity Library 108 .
- the Experiment Planner 130 analyzes the new Structure-Property Data 124 , Historical Structure-Property Data 128 , and any of Compound Data 132 , Reagent Data 136 , Desired Properties 120 and Structure-Property Models 192 , to identify a new set of Selection Criteria 104 .
- the new set of Selection Criteria 104 can be used by the Selector 106 to select yet another Directed Diversity Library 108 for another iteration.
- the present invention is an iterative system, method and/or computer program product for generating chemical entities, including new chemical entities, having a set of physical, chemical, and/or biological properties optimized towards a prescribed set of targets.
- a Directed Diversity Library 108 is generated, the compounds in the Directed Diversity Library 108 are analyzed, Structure-Property Models are optionally derived and elaborated, a list of Selection Criteria 104 is defined, and a new Directed Diversity Library 108 is selected for the next iteration.
- elements of the present invention are controlled by a data processing device (with or without operator input, intervention or control), such as a computer operating in accordance with software. Consequently, it is possible in the present invention to store massive amounts of data, and to utilize this data in a current iteration to generate Selection Criteria 104 for the next iteration.
- because the elements of the present invention are controlled by a data processing device, it is possible to store the Structure-Property Data 124 obtained during each iteration. It is also possible to utilize the Historical Structure-Property Data 128 obtained during previous iterations, as well as other pertinent structure-property data obtained by other experiments, to generate Selection Criteria 104 for the next iteration. In other words, the selection of the Directed Diversity Library 108 for the next iteration is guided by the results of all previous iterations (or any subset of the previous iterations, as determined by user input, for example). Thus, the present invention “learns” from its past performance such that the present invention is “intelligent”. As a result, the Leads 122 identified in subsequent iterations are better (i.e. exhibit physical, chemical, and/or biological properties closer to the prescribed values) than the Leads 122 identified in prior iterations.
- the Compound Library 102 includes one or more combinatorial chemical libraries, comprised exclusively of compounds that can be synthesized by combining a set of chemical building blocks in a variety of combinations.
- the Synthesis Module 112 is used to robotically synthesize the Directed Diversity Library 108 during each iteration.
- one or more Structure-Property Model Generators 190 generate Structure-Property Models 192 that conform to observed data.
- the Structure-Property Models 192 are used by the Experiment Planner 130 to generate Selection Criteria 104 and/or Objective Functions 105.
- one embodiment of a Structure-Property Model Generator 190 is illustrated as Structure-Property Model Generator 800.
- the Structure-Property Model Generator 800 defines a Model Structure 820 based on Statistics 802 , Neural Networks 804 , Fuzzy Logic 806 , and/or other Model-Specific Methods 808 .
- Model-Specific Methods 808 refer to methods that are specific to the application domain of the model. Examples of such Model-Specific Methods 808 are methods that compute the energy of a particular molecular conformation or receptor-ligand complex such as an empirical force field or a quantum-mechanical method, methods that align the 3-dimensional structures of two or more chemical compounds based on their shape, electronic fields and/or other criteria, methods that predict the affinity and binding conformation of a ligand to a particular receptor binding site, methods that construct receptor models based on the 3-dimensional structures of known ligands, etc. Examples of such Model-Specific Methods 808 are described in greater detail below.
- the Model Structure 820 can combine elements of Statistics 802 , Neural Networks 804 , Fuzzy Logic 806 , and/or Model-Specific Methods 808 . Such Model Structures 820 are hereafter referred to as Hybrid Model Structures or Hybrid Models.
- Hybrid Model Architecture 820 is a Model Architecture that combines elements of Neural Networks 804 and Fuzzy Logic 806 , hereafter referred to as a Neuro-Fuzzy Model Architecture or Neuro-Fuzzy Model.
- An example of a Neuro-Fuzzy Model Architecture is discussed in greater detail below.
- One embodiment of a Structure-Property Model Generator 800 includes a Trainer 822 that generates one or more Structure-Property Models 842 for a given Model Architecture 820 .
- the Trainer 822 optimizes a particular Model Structure 820 using selected Structure-Property Data 124 and 128 from the Structure-Property Database 126 , as determined by user input, for example.
- the Trainer 822 optimizes the Model Structure 820 by minimizing the error between the actual properties of selected compounds, as determined by the Analysis Module 118 (Structure-Property Data 124 , 128 ), and the predicted properties of the compounds as determined by the Structure-Property Model 842 .
- the error is referred to hereafter as the Structure-Property Prediction Error or Prediction Error.
- the process of minimizing the Prediction Error shall hereafter be referred to as Training.
- the Trainer 822 minimizes the Prediction Error using a search/optimization method such as Gradient Minimization 832 , Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 .
- the Trainer 822 minimizes the Prediction Error using a hybrid search/optimization method that combines elements of Gradient Minimization 832 , Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 .
- An example of a hybrid method is a method that combines Simulated Annealing 836 with Gradient Minimization 832 .
- Another example of a hybrid method is a method that combines Monte-Carlo Sampling 834 with Gradient Minimization 832 . Examples of such methods are described in greater detail below.
- the Structure-Property Data 124 , 128 are divided into Structure Data 824 and Property Data 828 .
- Structure Data 824 and Property Data 828 are preferably encoded as Encoded Structure Data 826 and Encoded Property Data 830 .
- Encoding should be of a form that is appropriate for the particular Model Structure 820 .
- the Encoded Structure Data 826 and Encoded Property Data 830 are used by the Trainer 822 to derive one or more final Structure-Property Models 842 .
- the Trainer 822 can employ Gradient Minimization 832 , Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 .
- the Trainer 822 trains the Model Structure 820 using a suitably encoded version of the Structure-Property Data 124 , 128 , or a selected subset of the Structure-Property Data 124 , 128 , as determined by user-input, for example.
- the Trainer 822 generates one or more Structure-Property Models 842 for a given Model Structure 820 .
- Structure-Property Models 842 are represented as a linear combination of basis functions of one or more molecular features (descriptors). The descriptors collectively represent the Encoded Structure Data 826 .
- a Statistical Module 802 can define a Statistical Model Structure 820 .
- the resultant Structure-Property Model 842 is referred to as a Statistical Structure-Property Model 842 .
- Structure-Property Models 192 are represented as a linear combination of basis functions of one or more molecular features (descriptors).
- the descriptors can include topological indices, physicochemical properties, electrostatic field parameters, volume and surface parameters, etc.
- the number of descriptors can range from a few tens to tens of thousands.
- the descriptors can include, but are not limited to, molecular volume and surface areas, dipole moments, octanol-water partition coefficients, molar refractivities, heats of formation, total energies, ionization potentials, molecular connectivity indices, substructure keys, hashed fingerprints, atom pairs and/or topological torsions, atom layers, 2D and 3D auto-correlation vectors, 3D structural and/or pharmacophoric keys, electronic fields, etc.
- the coefficients of the linear combination of the basis functions of Statistical Structure-Property Models 842 are determined using linear regression techniques. If many features are used, linear regression can be combined with principal component analysis, factor analysis, and/or multi-dimensional scaling. These are well known techniques for reducing the dimensionality and extracting the most important features from a large table.
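As a minimal sketch of the regression step described above, the following fits the coefficients of a linear combination of basis functions by ordinary least squares. It assumes a small, well-conditioned problem solved through the normal equations; the function names `fit_linear_model` and `solve` and the data layout are illustrative assumptions, not part of the source.

```python
def fit_linear_model(rows, targets, basis_functions):
    """Fit coefficients c_k so that sum_k c_k * f_k(x) approximates the targets."""
    # Design matrix: one row per compound, one column per basis function.
    design = [[f(x) for f in basis_functions] for x in rows]
    m = len(basis_functions)
    # Normal equations: (A^T A) c = A^T y.
    ata = [[sum(r[i] * r[j] for r in design) for j in range(m)] for i in range(m)]
    aty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(m)]
    return solve(ata, aty)


def solve(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting.

    Assumes the matrix is non-singular (hypothetical helper for illustration).
    """
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x
```

Each row of `rows` is a descriptor vector for one compound, and each basis function maps that vector to a number. With many descriptors, a dimensionality reduction step would normally precede this fit, as the text notes.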
- the basis functions and/or features used by the Trainer 822 to optimize the Statistical Structure-Property Models 842 are selected using Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 .
- a method for selecting the basis functions and/or features using a Genetic Algorithm 840 known as a genetic function approximation (GFA), is described in Rogers and Hopfinger, J. Chem. Inf. Comput. Sci., 34: 854 (1994) incorporated herein by reference in its entirety.
- a Structure-Property Model 842 is represented as a linear string that encodes the features and basis functions employed by the model.
- a population of linearly encoded Structure-Property Models 842 is then initialized by a random process, and allowed to evolve through the repeated application of genetic operators, such as crossover, mutation and selection. Selection is based on the relative fitness of the models, as measured by a least-squares error procedure, for example.
- Friedman's lack-of-fit algorithm described in J. Friedman, Technical Report No. 100, Laboratory for Computational Statistics, Department of Statistics, Stanford University, Stanford, Calif., November 1988, herein incorporated by reference in its entirety, or other suitable metrics well known to persons skilled in the art, can also be used.
- GFA can build models using linear polynomials as well as higher-order polynomials, splines and Gaussians. Upon completion, the procedure yields a population of models, ranked according to their fitness score.
- the basis functions and/or features can be selected using a Monte-Carlo Sampling 834 or Simulated Annealing 836 technique.
- an initial model is generated at random, and is gradually refined by a series of small stochastic ‘steps’.
- the term ‘step’ is taken to imply a stochastic (random or semi-random) modification of the model's underlying structure.
- the model in this embodiment is also defined as a linear combination of basis functions, whose coefficients are determined by linear regression.
- the model is modified by making a ‘small’ stochastic step.
- the model can be modified by inserting a new basis function, by removing an existing basis function, by modifying an existing basis function (i.e. by modifying one or more of the features and/or parameters associated with that particular basis function), and/or by swapping features and/or parameters between two (compatible) basis functions.
- the quality of the model is assessed using a least-squares error criterion. Alternatively, Friedman's lack-of-fit criterion, or any other suitable error criterion can be used.
- the new model is compared to the old model using the Metropolis criterion. Alternatively, any other suitable comparison criterion can be used. If the new model is approved, it replaces the old model and the process is repeated. If the new model is not approved, the old model is retained as the current model, and the process is repeated. This general process is controlled by a Monte-Carlo Sampling protocol 834 , a Simulated Annealing protocol 836 , or variants thereof, which are well known to the people skilled in the art.
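The accept/reject loop described above can be sketched as follows. This is an illustrative outline only: the names `metropolis_accept` and `refine`, the `propose` and `error` callbacks, and the fixed temperature are assumptions made for the example. A Simulated Annealing protocol would lower the temperature over time according to a cooling schedule, which is omitted here.

```python
import math
import random


def metropolis_accept(old_error, new_error, temperature, rng=random):
    """Metropolis criterion: always accept improvements; accept a worse
    model with probability exp(-(new_error - old_error) / temperature)."""
    if new_error <= old_error:
        return True
    if temperature <= 0.0:
        return False
    return rng.random() < math.exp(-(new_error - old_error) / temperature)


def refine(model, propose, error, temperature, n_steps, rng):
    """Repeatedly take a small stochastic step and keep it if approved."""
    current, current_error = model, error(model)
    for _ in range(n_steps):
        candidate = propose(current, rng)    # small stochastic modification
        candidate_error = error(candidate)   # e.g. a least-squares error
        if metropolis_accept(current_error, candidate_error, temperature, rng):
            current, current_error = candidate, candidate_error
        # Otherwise the old model is retained as the current model.
    return current, current_error
```

Here `propose` stands in for the model modifications listed earlier (inserting, removing, or modifying a basis function), and `error` stands in for whichever error criterion is chosen.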
- the Trainer 822 can be configured to retain a list of models according to some predefined criteria. For example, the Trainer 822 can be configured to retain the ten best Structure-Property Models 842 discovered during the simulation. Alternatively, the Trainer 822 can be configured to retain the ten best Structure-Property Models 842 discovered during the simulation, which differ from each other by some predetermined amount.
- the difference between two models can be defined ‘genotypically’ or ‘phenotypically’.
- a ‘genotypical’ comparison between two models involves a comparison of their underlying structure (i.e. the basis functions and/or coefficients used to represent the Structure-Property Models 842 ). Conversely, a ‘phenotypical’ comparison between two models involves a comparison based on their respective predictions.
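To make the two notions of model difference concrete, here is a small sketch. The source does not prescribe particular metrics, so the choices below (Jaccard distance over the sets of features used, and root-mean-square difference between predictions on a shared set of compounds) are assumptions for illustration, as are the function names.

```python
def genotypic_distance(features_a, features_b):
    """Structural comparison: Jaccard distance between the feature sets
    used by two models (0.0 = identical sets, 1.0 = disjoint sets)."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def phenotypic_distance(predictions_a, predictions_b):
    """Behavioral comparison: root-mean-square difference between the
    predictions of two models on the same list of compounds."""
    n = len(predictions_a)
    squared = sum((p - q) ** 2 for p, q in zip(predictions_a, predictions_b))
    return (squared / n) ** 0.5
```

Either distance could serve as the "predetermined amount" used to keep the retained models mutually distinct.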
- the Structure-Property Model Generator 800 can generate Structure-Property Models 842 based on Neural Networks 804 .
- Neural Networks 804 are physical cellular systems that can acquire, store, and utilize experimental knowledge. Neural Networks 804 are extensively reviewed in Haykin, Neural Networks. A Comprehensive Foundation, MacMillan, New York (1994), incorporated herein by reference in its entirety.
- Structure Data 824 can be encoded using one or more molecular features (descriptors). Molecular features collectively represent the Encoded Structure Data 826 . Molecular features can include topological indices, physicochemical properties, electrostatic field parameters, volume and surface parameters, etc., and their number can range from a few tens to tens of thousands.
- these features can include, but are not limited to, molecular volume and surface areas, dipole moments, octanol-water partition coefficients, molar refractivities, heats of formation, total energies, ionization potentials, molecular connectivity indices, substructure keys, hashed fingerprints, atom pairs and/or topological torsions, atom layers, 2D and 3D auto-correlation vectors, 3D structural and/or pharmacophoric keys, electronic fields, etc.
- neural network training can be combined with principal component analysis, factor analysis, and/or multi-dimensional scaling, which are well known techniques for reducing the dimensionality and extracting the most important features from a large table.
- a Neural Network Model Structure 820 is a Multi-Layer Feed-Forward Neural Network or Multi-Layer Perceptron, trained using the error back-propagation algorithm.
- the Multi-Layered Perceptron can be trained using Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 .
- Neural Network training is the process of adjusting the number of neurons, synaptic weights, and/or transfer functions in the input, output and hidden layers of the Neural Network, so that the overall prediction error is minimized. Many variants of such training algorithms have been reported, and are well known to those skilled in the art.
- the Trainer 822 can be configured to retain more than one Neural Network Model 842 during the training phase (flow arrow 890 in FIG. 8).
- the Trainer 822 can be configured to retain the ten best Neural Network Models 842 discovered during the training phase.
- the Trainer 822 can be configured to retain the ten best Neural Network Models 842 discovered during training, which differ from each other by some predetermined amount.
- the difference between two models can be defined ‘genotypically’ or ‘phenotypically’, i.e. by comparing the models based either on their internal structure, or their predictions.
- Neural Network Model Structure 820 is a Generalized Regression Neural Network Model Structure (or Generalized Regression Neural Network).
- Generalized Regression Neural Networks are described in Specht, D. IEEE Trans. Neural Networks, 2(6): 568 (1991), and Masters, T., Advanced Algorithms for Neural Networks, Wiley (1995), incorporated herein by reference.
- a Generalized Regression Neural Network 900 is comprised of four layers of neurons (units). The first layer is the Input Layer 902 , the second layer is the Pattern Layer 904 , the third layer is the Summation Layer 906 , and the fourth layer is the Output Layer 908 , which is comprised of a single unit.
- the Pattern Layer 904 contains one unit per input-output pair or structure-property pair (referred to hereafter as a Training Case).
- the collection of all Training Cases used in the Pattern Layer 904 is hereafter referred to as the Training Set.
- the input vector (or input case, which in the example shown in FIG. 9 consists of 3 variables) is simultaneously presented to all units in the Pattern Layer 904 .
- Each of these units computes a distance measure separating the Training Case represented by that unit from the input case. This distance is acted on by the transfer function associated with that unit, to compute the output of that particular unit.
- the transfer function is also referred to as an activation function or kernel.
- the Summation Layer 906 of the Generalized Regression Neural Network 900 (i.e. the third layer) is comprised of two units.
- the first unit is called the Numerator 910
- the second unit is called the Denominator 912 .
- Each unit in the Pattern Layer 904 is fully connected to the Numerator 910 and Denominator 912 units in the Summation Layer 906 .
- Both the Numerator 910 and Denominator 912 units are simple summation units, i.e. they accumulate the input received from all units in the Pattern Layer 904 .
- for the Denominator 912, the weight vector is unity, so a simple sum is performed.
- for the Numerator 910, the weight connecting each pattern unit is equal to the value of the dependent variable for the training case of that pattern unit (i.e. the output in the input-output pair, or the property in the structure-property pair).
- the outputs of the Numerator 910 and Denominator 912 units in the Summation Layer 906 are forwarded to the Output unit 908.
- the Output unit 908 divides the output of the Numerator 910 unit by the output of the Denominator 912 unit, to compute the output of the network for a particular input case.
- the activation used by the units in the Pattern Layer 904 is typically a Parzen Window.
- the Parzen Window is a well known method for estimating a univariate or multivariate probability density function from a random sample. It is described in Parzen, Annals Math. Stat., 33: 1065 (1962), and Cacoullos, Annals Inst. Stat. Math., 18(2): 179 (1966), incorporated herein by reference in their entirety.
- the Parzen Window can involve different scaling parameters for each input variable.
- a Parzen Window can be configured to perform feature scaling in the vicinity of the Training Case on which it is centered. If the Parzen Windows associated with each Training Case share common feature weights, the Generalized Regression Neural Network 900 is said to be globally weighted. Conversely, if the Parzen Windows associated with each Training Case do not share common feature weights, the Generalized Regression Neural Network 900 is said to be locally weighted.
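The forward pass of the four-layer network described above can be sketched in a few lines. This example assumes a Gaussian kernel as the Parzen Window and a single set of feature weights shared by all Training Cases (the globally weighted case); the function name, the `sigma` parameter, and the data layout are illustrative assumptions.

```python
import math


def grnn_predict(query, training_cases, feature_weights, sigma=1.0):
    """Predict a property for `query` from (features, property) Training Cases."""
    numerator = 0.0    # sum of activations weighted by each case's property
    denominator = 0.0  # plain sum of activations (unit weights)
    for features, prop in training_cases:
        # Pattern unit: weighted squared distance from the stored Training Case.
        d2 = sum(
            w * (q - f) ** 2
            for w, q, f in zip(feature_weights, query, features)
        )
        activation = math.exp(-d2 / (2.0 * sigma ** 2))  # Gaussian kernel
        numerator += prop * activation
        denominator += activation
    # Output unit: divide the Numerator by the Denominator.
    # Note: raises ZeroDivisionError if every activation underflows to zero.
    return numerator / denominator
```

A locally weighted variant would store a separate weight vector with each Training Case instead of passing one shared `feature_weights` list.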
- a Generalized Regression Neural Network 900 can be trained to minimize the prediction error using Gradient Minimization 832 , Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 .
- the Generalized Regression Neural Network 900 can be trained to minimize the prediction error using a combination of Gradient Minimization 832 , Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 .
- the training process involves adjusting the parameters of the activation function associated with each unit in the Pattern Layer 904 to minimize the mean prediction error for the entire Training Set, or some other suitable error criterion.
- the input-output pairs in the Training Set are presented to the network, and a prediction error for the entire Training Set is computed.
- each Training Case is presented to each of the units (Training Cases) in the Pattern Layer 904, and the outputs of these units are summed by the units in the Summation Layer 906.
- the outputs of the summation units 910 and 912 are then divided to compute the output of the network for that particular Training Case.
- This process is repeated for each Training Case in the Training Set.
- the parameters of the transfer functions are then adjusted so that the prediction error is reduced.
- This process is repeated until the prediction error for the entire Training Set is minimized, within some prescribed tolerance.
- the process is repeated for a prescribed number of cycles (as determined by user input, for example), even though the prediction error for the entire Training Set may not be at a minimum, within a prescribed tolerance.
- each Training Case is not presented to itself, i.e. the output of each Training Case is computed based on every Training Case other than itself.
- the resulting Generalized Regression Neural Network Models 842 are cross-validated, in the sense that they were designed to resist overfitting.
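The training procedure above, in which each Training Case is predicted from every case other than itself, amounts to a leave-one-out error estimate. The sketch below assumes a Gaussian kernel with a single width parameter `sigma` and replaces the search methods named in the text with a plain scan over candidate values; the function names and the zero-denominator fallback are assumptions for illustration.

```python
import math


def loo_mean_squared_error(training_cases, sigma):
    """Mean squared prediction error, with no Training Case presented to itself."""
    total = 0.0
    for i, (query, actual) in enumerate(training_cases):
        num = den = 0.0
        for j, (features, prop) in enumerate(training_cases):
            if i == j:
                continue  # skip the case being predicted
            d2 = sum((q - f) ** 2 for q, f in zip(query, features))
            activation = math.exp(-d2 / (2.0 * sigma ** 2))
            num += prop * activation
            den += activation
        # Assumed fallback: predict 0.0 if all activations underflow to zero.
        predicted = num / den if den > 0.0 else 0.0
        total += (predicted - actual) ** 2
    return total / len(training_cases)


def tune_sigma(training_cases, candidates):
    """Pick the kernel width with the lowest leave-one-out error."""
    return min(candidates, key=lambda s: loo_mean_squared_error(training_cases, s))
```

In the embodiments described here, the scan in `tune_sigma` would be replaced by Gradient Minimization, Monte-Carlo Sampling, Simulated Annealing, or one of the other optimizers.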
- the Trainer 822 can also perform feature selection in addition to scaling (i.e. adjusting the parameters of the transfer functions).
- Feature selection refers to the process of selecting a subset of features, and applying the Generalized Regression Neural Network 900 algorithm only on that subset of features.
- the Generalized Regression Neural Network 900 is trained using a Monte-Carlo Sampling 834 or Simulated Annealing 836 algorithm.
- an initial model is generated at random, by selecting a random set of features and randomizing the transfer functions associated with each Training Case.
- the model is then gradually refined by a series of small stochastic ‘steps’.
- the term ‘step’ is taken to imply a stochastic (random or semi-random) modification of the model's underlying structure.
- the model can be modified by inserting a new feature, by removing an existing feature, by modifying an existing feature weight if the model is globally weighted, and/or by modifying a randomly chosen transfer function (i.e. by modifying one or more of the parameters associated with that particular transfer function, such as a feature weight).
- the quality of the resulting model is assessed, and the new model is compared to the old model using the Metropolis criterion. Alternatively, any other suitable comparison criterion can be used. If the new model is approved, it replaces the old model and the process is repeated. If the new model is not approved, the old model is retained as the current model, and the process is repeated.
- This general process is controlled by a Monte-Carlo Sampling protocol 834 , a Simulated Annealing protocol 836 , or variants thereof, which are well known to people skilled in the art.
- the system of the present invention is not limited to these embodiments.
- the Generalized Regression Neural Network 900 can be trained using Evolutionary Programming 838 , Genetic Algorithms 840 , or any other suitable search/optimization algorithm. The implementation of these methods should be straightforward to persons skilled in the art.
- K-Nearest-Neighbor prediction is a well known technique for property prediction and classification. It is described in detail in Dasarathy, Nearest Neighbor ( NN ) Norms: NN pattern classification techniques, IEEE Computer Society Press, Los Alamitos, Calif. (1991), incorporated herein by reference in its entirety. K-Nearest-Neighbor prediction forms the basis of many ‘lazy learning’ algorithms, that are commonly used in artificial intelligence and control. The K-Nearest-Neighbor algorithm predicts the output (property) of a particular input query by retrieving the K nearest (most similar) Training Cases to that query, and averaging their (known) outputs according to some weighting scheme. Therefore, the quality of K-Nearest-Neighbor generalization depends on which Training Cases are considered most similar, which is, in turn, determined by the distance function.
- Generalized Regression Neural Networks 900 are combined with K-Nearest-Neighbor classifiers, to generate a hybrid Model Structure 820 referred to hereafter as a Nearest Neighbor Generalized Regression Neural Network.
- the operation of a Nearest Neighbor Generalized Regression Neural Network is similar to that of a regular Generalized Regression Neural Network, except that the query (input case) is not presented to all Training Cases in the Pattern Layer 904 . Instead, the query is presented to the K nearest Training Cases in the Pattern Layer 904 , as determined by a suitable distance metric.
- the K nearest neighbors are retrieved using a nearest neighbor detection algorithm such as a k-d tree (Bentley, Comm. ACM, 18(9): 509 (1975), Friedman et al., ACM Trans. Math. Soft., 3(3): 209 (1977)).
- any other suitable algorithm can be used including, but not limited to, ball trees (Omohundro, International Computer Science Institute Report TR-89-063, Berkeley, Calif. (1989)), bump trees (Omohundro, Advances in Neural Information Processing Systems 3, Morgan Kaufmann, San Mateo, Calif. (1991)), gridding, and/or Voronoi tessellation (Sedgewick, Algorithms in C, Addison-Wesley, Princeton (1990)).
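A Nearest Neighbor Generalized Regression Neural Network differs from the regular network only in which Training Cases take part in the sums. The sketch below illustrates this under stated assumptions: it uses a brute-force search via `heapq.nsmallest` as a stand-in for a k-d tree or one of the other spatial indexes mentioned above, a Gaussian kernel, and squared Euclidean distance. The function name and parameters are hypothetical.

```python
import heapq
import math


def nn_grnn_predict(query, training_cases, k, sigma=1.0):
    """Predict from only the k Training Cases nearest to `query`."""
    def squared_distance(features):
        return sum((q - f) ** 2 for q, f in zip(query, features))

    # Brute-force retrieval of the k nearest (features, property) pairs.
    nearest = heapq.nsmallest(
        k, training_cases, key=lambda case: squared_distance(case[0])
    )
    num = den = 0.0
    for features, prop in nearest:
        activation = math.exp(-squared_distance(features) / (2.0 * sigma ** 2))
        num += prop * activation
        den += activation
    return num / den
```

For large Training Sets the linear scan would be the bottleneck, which is why the text points to tree-based retrieval.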
- the Generalized Regression Neural Network 900 can be trained in multiple phases using different optimization algorithms (i.e. Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or Genetic Algorithms 840 ), and/or different kernel parameters and number of nearest-neighbors during each phase.
- the Generalized Regression Neural Network 900 can be initially trained to perform feature detection using Simulated Annealing 836 , ten nearest neighbors, a uniform kernel (i.e. the same kernel for all Training Cases), and a common scaling factor for all features.
- the resulting (partially optimized) network can then be further refined using Gradient Minimization 832 using fifty nearest neighbors, a uniform kernel, and a different scaling factor for each feature. Any number of phases and training schemes can be used as appropriate.
- the Trainer 822 can be configured to retain more than one Generalized Regression Neural Network Model 842 during the training phase (flow arrow 890 in FIG. 8).
- the Trainer 822 can be configured to retain the ten best Generalized Regression Neural Network Models 842 discovered during the training phase.
- the Trainer 822 can be configured to retain the ten best Generalized Regression Neural Network Models 842 discovered during training, which differ from each other by some predetermined amount.
- the difference between two models can be defined ‘genotypically’ or ‘phenotypically’, i.e. by comparing the models based either on their internal structure, or their predictions.
- the Structure-Property Model Generator 800 can generate Structure-Property Models 842 based on Fuzzy Logic 806 .
- Fuzzy Logic was developed by Zadeh (Zadeh, Information and Control, 8: 338 (1965); Zadeh, Information and Control, 12: 94 (1968)) as a means of representing and manipulating data that is fuzzy rather than precise.
- Central to the theory of Fuzzy Logic is the concept of a fuzzy set. In contrast to a traditional crisp set where an item either belongs to the set or does not belong to the set, fuzzy sets allow partial membership. That is, an item can belong to a fuzzy set to a degree that ranges from 0 to 1. A membership degree of 1 indicates complete membership, whereas a membership value of 0 indicates non-membership. Any value between 0 and 1 indicates partial membership. Fuzzy sets can be used to construct rules for fuzzy expert systems and to perform fuzzy inference.
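Partial membership can be illustrated with a membership function that maps a crisp value to a degree between 0 and 1. The triangular shape used below is a common choice but is an assumption here, since the source does not specify the shape of the membership functions; the function name and the molecular-weight breakpoints in the usage note are likewise hypothetical.

```python
def triangular_membership(x, left, peak, right):
    """Degree (0.0 to 1.0) to which x belongs to a triangular fuzzy set.

    Assumes left < peak < right. Membership is 1.0 at `peak` and falls
    linearly to 0.0 at `left` and `right`.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)
```

For example, with a hypothetical fuzzy set "medium molecular weight" defined by the breakpoints 200, 300 and 400, a compound of weight 250 would be a partial member of that set rather than being forced inside or outside it.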
- Fuzzy Logic 806 is the ability to reason (draw conclusions from facts or partial facts) using fuzzy sets, fuzzy rules, and fuzzy inference.
- a fuzzy model is a representation of the essential features of a system by the apparatus of fuzzy set theory (Yager and Filev, Essentials of Fuzzy Modeling and Control, Wiley (1994)). The aforementioned publication is incorporated herein by reference in its entirety.
- Fuzzy Logic 806 has been employed to control complex or adaptive systems that defy exact mathematical modeling. Applications of fuzzy logic controllers range from cement-kiln process control, to robot control, image processing, motor control, camcorder auto-focusing, etc. However, to date, there has been no report on the use of Fuzzy Logic 806 for chemical structure-property prediction. A preferred embodiment of a Structure-Property Model Generator 800 using Fuzzy Logic 806 shall now be described in detail.
- the Structure-Property Model Generator 800 generates Fuzzy Structure-Property Models 842 , i.e. models that represent the essential features of the system using the apparatus of fuzzy set theory.
- a Fuzzy Structure-Property Model 842 makes predictions using fuzzy rules from a fuzzy rule base describing the system of interest.
- a fuzzy rule is an IF-THEN rule with one or more antecedent and consequent variables.
- a fuzzy rule can be single-input-single-output (SISO), multiple-input-single-output (MISO), or multiple-input-multiple-output (MIMO).
- a fuzzy rule base is comprised of a collection of one or more such fuzzy rules.
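- a MISO fuzzy rule base is of the form: IF $x_1$ is $X_{i1}$ AND ... AND $x_n$ is $X_{in}$ THEN $y$ is $Y_i$, for $i \in (1, \ldots, r)$, where:
- $x_1, \ldots, x_n$ are the input variables
- $y$ is the output (dependent) variable
- $X_{ij}$ and $Y_i$, with $i \in (1, \ldots, r)$ and $j \in (1, \ldots, n)$, are the fuzzy sets associated with the input variables and the output variable in rule $i$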
- the fuzzy model described above is referred to as a linguistic model.
- ‘high’ and ‘low’ are fuzzy sets in the universe of discourse of molecular weight, logP, and activity.
- a Takagi-Sugeno-Kang (TSK) model involves functional consequents, typically implemented as a linear function of the input variables.
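A TSK model with linear consequents can be sketched as follows. The example assumes that the firing strength of a rule is the minimum of its antecedent membership degrees and that the model output is the firing-strength-weighted average of the rule consequents; both are common conventions rather than details given in the source. The function name and the rule layout are hypothetical.

```python
def tsk_predict(inputs, rules):
    """Evaluate a TSK rule base.

    Each rule is a tuple (memberships, coefficients, intercept):
      memberships  - one membership function per input variable
      coefficients - slope of the linear consequent for each input
      intercept    - constant term of the linear consequent
    """
    weighted_sum = 0.0
    total_strength = 0.0
    for memberships, coefficients, intercept in rules:
        # Firing strength: AND of the antecedents, taken as the minimum.
        strength = min(mu(x) for mu, x in zip(memberships, inputs))
        # Functional consequent: a linear function of the inputs.
        consequent = intercept + sum(c * x for c, x in zip(coefficients, inputs))
        weighted_sum += strength * consequent
        total_strength += strength
    if total_strength == 0.0:
        raise ValueError("no rule fired for these inputs")
    return weighted_sum / total_strength
```

Unlike a linguistic model, no separate defuzzification step is needed because each consequent already yields a crisp number.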
- the Fuzzy Knowledge Base 1002 is comprised of a Rule Base 1004 and a Data Base 1006 .
- the Data Base 1006 defines the membership functions of the fuzzy sets used as values for each system variable, while the Rule Base 1004 is a collection of fuzzy rules of the type described above.
- the system variables are of two main types: input variables and output variables.
- the input variables in a Fuzzy Structure-Activity Model 842 can be molecular features (descriptors).
- molecular features, which collectively represent the Encoded Structure Data 826, can include topological indices, physicochemical properties, electrostatic field parameters, volume and surface parameters, etc., and their number can range from a few tens to tens of thousands.
- these features can include, but are not limited to, molecular volume and surface areas, dipole moments, octanol-water partition coefficients, molar refractivities, heats of formation, total energies, ionization potentials, molecular connectivity indices, substructure keys, hashed fingerprints, atom pairs and/or topological torsions, atom layers, 2D and 3D auto-correlation vectors, 3D structural and/or pharmacophoric keys, electronic fields, etc.
- Fuzzy Logic 806 can be combined with principal component analysis, factor analysis, and/or multi-dimensional scaling, which are well known techniques for reducing the dimensionality and extracting the most important features from a large table.
- the input variables i.e. the Encoded Structure Data 826 , which are usually crisp
- the Fuzzy Inference Module 1010 evaluates all the rules in the Rule Base 1004 to produce the output, using the method described below.
- the Fuzzy Inference Module 1010 performs the following steps:
- [0215] (1) determines the degree of match between the fuzzified input data and the fuzzy sets defined for the input variables in the Data Base 1006 ;
- [0217] (3) derives the output based on the firing strength of each rule computed in step 2 and the fuzzy sets defined for the output variable in the Data Base 1006 .
- the Fuzzy Structure-Property Model is a linguistic model
- the fuzzy output of the Fuzzy Inference Module 1010 is finally defuzzified by the Defuzzification Unit 1012 , using the output fuzzy set definitions in the Data Base 1006 , and a defuzzification strategy such as the mean-of-maximum method.
- the center-of-area or any other suitable defuzzification method can be used.
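The inference and defuzzification steps described above can be illustrated with a minimal sketch. Everything named here is hypothetical: the triangular fuzzy sets, the normalized variables `mol_weight`, `logP` and `activity`, and the two rules are toy stand-ins for the Data Base 1006 and Rule Base 1004. The firing strength is taken as the minimum of the antecedent matches, which is an assumption, since the step that computes it is not listed in this section.

```python
from typing import Callable, Dict, List, Tuple

Membership = Callable[[float], float]


def triangular(a: float, b: float, c: float) -> Membership:
    """Triangular fuzzy set rising from a to a peak at b, then falling to c (a < b < c)."""
    def mu(x: float) -> float:
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu


# Hypothetical Data Base: fuzzy sets for each variable, all normalized to [0, 1].
LOW, HIGH = triangular(-1.0, 0.0, 1.0), triangular(0.0, 1.0, 2.0)
DATA_BASE: Dict[str, Dict[str, Membership]] = {
    "mol_weight": {"low": LOW, "high": HIGH},
    "logP": {"low": LOW, "high": HIGH},
    "activity": {"low": LOW, "high": HIGH},
}

# Hypothetical Rule Base: (antecedent, fuzzy value assigned to 'activity').
RULE_BASE: List[Tuple[Dict[str, str], str]] = [
    ({"mol_weight": "high", "logP": "low"}, "high"),
    ({"mol_weight": "low"}, "low"),
]


def infer(inputs: Dict[str, float], grid: List[float]) -> List[float]:
    """Return the aggregated output membership sampled on `grid`."""
    aggregated = [0.0] * len(grid)
    for antecedent, consequent in RULE_BASE:
        # Degree of match between each crisp input and its fuzzy set.
        matches = [DATA_BASE[var][label](inputs[var]) for var, label in antecedent.items()]
        # Firing strength of the rule (assumed AND operator: minimum).
        strength = min(matches)
        out_set = DATA_BASE["activity"][consequent]
        # Clip the consequent set at the firing strength and merge with max.
        for k, y in enumerate(grid):
            aggregated[k] = max(aggregated[k], min(strength, out_set(y)))
    return aggregated


def mean_of_maximum(grid: List[float], aggregated: List[float]) -> float:
    """Defuzzify by averaging the grid points where the membership peaks."""
    peak = max(aggregated)
    maxima = [y for y, m in zip(grid, aggregated) if peak - m < 1e-9]
    return sum(maxima) / len(maxima)


GRID = [k / 100 for k in range(101)]
crisp_activity = mean_of_maximum(GRID, infer({"mol_weight": 0.9, "logP": 0.1}, GRID))
```

Swapping `mean_of_maximum` for a center-of-area routine would change only the final step; the rule evaluation stays the same.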
- the Trainer 822 of the Fuzzy Structure-Property Model Generator 800 preferably trains the Fuzzy Knowledge Base 1002 using Gradient Minimization 832 , Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 , in order to minimize the overall prediction error for a prescribed set of Training Cases.
- the Trainer 822 can use a pre-existing Fuzzy Knowledge Base 1002 or may construct one directly from the Structure-Property Data 124 , 128 . Training is the process of creating, modifying and/or refining the fuzzy set definitions and fuzzy rules in the Fuzzy Knowledge Base 1002 .
- the Fuzzy Knowledge Base 1002 is trained using a Monte-Carlo Sampling 834 or Simulated Annealing 836 algorithm.
- an initial model is generated at random, by selecting a random set of rules and randomizing the membership functions associated with each input variable.
- the model is then gradually refined by a series of small stochastic ‘steps’.
- the term ‘step’ is taken to imply a stochastic (random or semi-random) modification of the model's underlying structure.
- the model can be modified by inserting a new rule, by removing an existing rule, by modifying an existing rule (i.e. by inserting or removing a variable from the antecedent part of the fuzzy rule), by modifying the membership function of an existing fuzzy set, and/or by modifying the number of fuzzy partitions of a fuzzy variable (i.e. by increasing or decreasing the number of fuzzy partitions of the fuzzy variable).
- the quality of the resulting model is assessed, and the new model is compared to the old model using the Metropolis criterion. Alternatively, any other suitable comparison criterion can be used. If the new model is approved, it replaces the old model and the process is repeated. If the new model is not approved, the old model is retained as the current model, and the process is repeated.
- This general process is controlled by a Monte-Carlo Sampling protocol 834 , a Simulated Annealing protocol 836 , or variants thereof, which are well known to people skilled in the art.
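A minimal sketch of the stochastic refinement loop described above follows. It assumes a geometric cooling schedule and a toy two-parameter model; `anneal`, `toy_error` and `toy_step` are hypothetical names, and a real Trainer would replace the step function with rule insertions, rule removals and membership-function changes.

```python
import math
import random


def metropolis_accept(old_error, new_error, temperature, rng):
    """Metropolis criterion: always accept improvements, sometimes accept worse models."""
    if new_error <= old_error:
        return True
    return rng.random() < math.exp(-(new_error - old_error) / temperature)


def anneal(model, error_fn, step_fn, n_steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Refine `model` by small random steps under a cooling temperature."""
    rng = random.Random(seed)
    current, current_err = model, error_fn(model)
    best, best_err = current, current_err
    for k in range(n_steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (k / max(1, n_steps - 1))
        candidate = step_fn(current, rng)
        candidate_err = error_fn(candidate)
        if metropolis_accept(current_err, candidate_err, temperature, rng):
            # The new model is approved and replaces the old one.
            current, current_err = candidate, candidate_err
            if current_err < best_err:
                best, best_err = current, current_err
        # Otherwise the old model is retained and the loop repeats.
    return best, best_err


def toy_error(params):
    """Toy prediction error with its minimum at (3, -1)."""
    return (params[0] - 3.0) ** 2 + (params[1] + 1.0) ** 2


def toy_step(params, rng):
    """Perturb one randomly chosen parameter by a small Gaussian amount."""
    changed = list(params)
    changed[rng.randrange(len(changed))] += rng.gauss(0.0, 0.1)
    return tuple(changed)


best_model, best_error = anneal((0.0, 0.0), toy_error, toy_step)
```

Holding the temperature constant instead of cooling it would turn the same loop into plain Monte-Carlo sampling.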
- the system of the present invention is not limited to these embodiments.
- the Fuzzy Knowledge Base 1002 can be trained using Evolutionary Programming 838 , Genetic Algorithms 840 , or any other suitable search/optimization algorithm. The implementation of these methods should be straightforward to persons skilled in the art.
- the Trainer 822 can be configured to retain more than one Fuzzy Structure-Property Model 842 during the training phase (flow arrow 890 in FIG. 8).
- the Trainer 822 can be configured to retain the ten best Fuzzy Structure-Property Models 842 discovered during the training phase.
- the Trainer 822 can be configured to retain the ten best Fuzzy Structure-Property Models 842 discovered during training, which differ from each other by some predetermined amount.
- the difference between two models can be defined ‘genotypically’ or ‘phenotypically’, i.e. by comparing the models based either on their internal structure, or their predictions.
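One way to retain the best models that also differ from each other is a greedy filter over candidates sorted by error. The sketch below assumes a 'phenotypic' difference measured as the root-mean-square gap between two models' predictions on a shared set of compounds; the function names and the `(error, predictions)` representation are illustrative only.

```python
import math


def phenotypic_distance(preds_a, preds_b):
    """RMS difference between two models' predictions on the same compounds."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(preds_a, preds_b)) / len(preds_a))


def retain_diverse_best(candidates, k=10, min_distance=0.1):
    """Keep up to k lowest-error models that differ pairwise by at least min_distance.

    `candidates` is a list of (error, predictions) pairs.
    """
    kept = []
    for error, preds in sorted(candidates, key=lambda c: c[0]):
        if all(phenotypic_distance(preds, other) >= min_distance for _, other in kept):
            kept.append((error, preds))
        if len(kept) == k:
            break
    return kept


candidates = [(0.1, [1.0, 2.0]), (0.2, [1.0, 2.01]), (0.3, [2.0, 3.0])]
kept = retain_diverse_best(candidates, k=2, min_distance=0.5)
```

A 'genotypic' variant would keep the same loop and compare rule sets or parameters instead of predictions.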
- the Structure-Property Model Generator 800 can generate Model Structures 820 that combine elements of Statistics 802 , Neural Networks 804 , Fuzzy Logic 806 , and/or Model-Specific Methods 808 .
- Model Structures 820 are referred to as Hybrid Model Structures, and the corresponding models are referred to as Hybrid Models.
- a preferred embodiment of such a Hybrid Model Structure 820 that combines elements of Neural Networks 804 and Fuzzy Logic 806 is referred to as a Neuro-Fuzzy Model Structure, and shall now be described in detail.
- Hybrid Model Structure 820 is a Model Structure that combines elements of Neural Networks 804 and Fuzzy Logic 806 , hereafter referred to as a Neuro-Fuzzy Model Structure or Neuro-Fuzzy Model.
- a Neuro-Fuzzy Model Structure is discussed in greater detail below.
- a Neuro-Fuzzy Model Structure is a Model Structure 820 that combines the advantages of Fuzzy Logic 806 (e.g. human-like rule-based reasoning, ease of incorporating expert knowledge) and Neural Networks 804 (e.g. learning ability, optimization ability, and connectionist structure).
- the parameters that control the performance of a fuzzy model can be tuned using techniques similar to those used in neural network systems.
- neural networks can improve their transparency, making them closer to fuzzy systems, while fuzzy systems can self-adapt, making them closer to neural networks.
- Neuro-Fuzzy systems can be of three main types:
- a Neuro-Fuzzy Structure-Property Model is a Neural Fuzzy Model with Fuzzy Singleton Rules described in Nomura et al., Proc. IEEE Int. Conf. Fuzzy Syst., 1320, San Diego (1992), incorporated herein by reference in its entirety.
- the Structure of a Neural Fuzzy Model with Fuzzy Singleton Rules 1100 is shown in FIG. 11. Fuzzy singleton rules are of the form:
- x 1 , . . . , x n are the input variables
- y is the output (dependent) variable
- X ij , i ∈ (1, . . . , m)
- $\mu_i = \mu_{X_{1i}}(x_1)\,\mu_{X_{2i}}(x_2) \cdots \mu_{X_{ni}}(x_n)$ EQ. 2
- the Trainer 822 of the Neuro-Fuzzy Structure-Property Model Generator 800 preferably trains (i.e. constructs and/or refines) the Neuro-Fuzzy Structure-Property Model Structure 820 using Gradient Minimization 832 , Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 , in order to minimize the overall prediction error for a prescribed set of Training Cases.
- the Trainer 822 can use a pre-existing Neuro-Fuzzy Structure-Property Model 842 or can construct a new one directly from the Structure-Property Data 124 , 128 .
- training is the process of constructing and/or refining the rules, membership functions $\mu_{X_{ij}}(x_i)$, and/or the real numbers $w_i$.
- membership functions can be Gaussians, triangular functions, or trapezoidal functions. Alternatively, any other suitable functional form can be used.
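As a rough illustration of a model with fuzzy singleton rules, the sketch below multiplies Gaussian memberships to get each rule's firing strength (the product form of EQ. 2) and combines the singleton weights by a strength-weighted average. The weighted-average output is the usual formulation for singleton rules but is an assumption here, because the output equation does not appear in this section; all names and the two toy rules are hypothetical.

```python
import math


def gaussian(center, width):
    """Gaussian membership function with the given center and width."""
    def mu(x):
        return math.exp(-((x - center) / width) ** 2)
    return mu


def firing_strength(memberships, x):
    """Product of the memberships of each input variable (EQ. 2)."""
    strength = 1.0
    for mu, value in zip(memberships, x):
        strength *= mu(value)
    return strength


def predict(rules, x):
    """Strength-weighted average of the singleton consequents w_i (assumed output form)."""
    strengths = [firing_strength(memberships, x) for memberships, _ in rules]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fires for this input")
    return sum(s * w for s, (_, w) in zip(strengths, rules)) / total


# Two toy rules over two inputs: (membership functions, singleton weight w_i).
rules = [
    ((gaussian(0.0, 1.0), gaussian(0.0, 1.0)), 0.0),
    ((gaussian(1.0, 1.0), gaussian(1.0, 1.0)), 1.0),
]
midpoint_prediction = predict(rules, (0.5, 0.5))
```

Training would then adjust the centers, widths and weights `w_i`, for example with the search methods listed above.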
- the Trainer 822 can be configured to retain more than one Neuro-Fuzzy Structure-Property Model 842 during the training phase (flow arrow 890 in FIG. 8).
- the Trainer 822 can be configured to retain the ten best Neuro-Fuzzy Structure-Property Models 842 discovered during the training phase.
- the Trainer 822 can be configured to retain the ten best Neuro-Fuzzy Structure-Property Models 842 discovered during training, which differ from each other by some predetermined amount.
- the difference between two models can be defined ‘genotypically’ or ‘phenotypically’, i.e. by comparing the models based either on their internal structure, or their predictions.
- Model-Specific Methods 808 refer to methods that are specific to the application domain of the model. Examples of such Model-Specific Methods 808 are methods that compute the energy of a particular molecular conformation or receptor-ligand complex such as an empirical force field or a quantum-mechanical method, methods that align the 3-dimensional structures of two or more chemical compounds based on their shape, electronic fields and/or other criteria, methods that predict the affinity and binding conformation of a ligand to a particular receptor binding site, methods that construct receptor models based on the 3-dimensional structures of known ligands, etc. Examples of such Model-Specific Methods 808 are described in greater detail below.
- Model-Specific Methods 808 can include methods that take into account the 3-dimensional structures of the chemical compounds and/or their biological targets. Such methods are of two main types: docking methods and 3D QSAR methods. Examples of such methods that can be used shall now be described.
- Docking methods are methods that attempt to predict the binding conformation between a ligand and a receptor based on their 3-dimensional fit, and/or provide an absolute or relative measure of the affinity of a particular ligand for a particular receptor, based on the quality of their 3-dimensional fit. Docking methods require a 3-dimensional model of the receptor (or parts of the receptor), which can be determined directly through X-ray crystallography, nuclear magnetic resonance, or some other 3D structure-determination technique, or indirectly through homology modeling based on the 3-dimensional structure of a related receptor, for example.
- Such energy functions are usually combined with a search/optimization method such as Gradient Minimization 832 , Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 , to identify one or more low energy binding conformations, and to predict the affinity of a particular ligand for a particular receptor.
- a docking method can be used to derive 3-dimensional structural models of ligands bound to a particular receptor(s), and/or to obtain estimates of the binding affinity of ligands for a particular receptor(s).
- the Analysis Module 118 determines the 3-dimensional structures of selected receptor-ligand complexes from the Directed Diversity Library 108 .
- the 3-dimensional structures of the complexes are determined using X-ray crystallography, nuclear magnetic resonance, or some other suitable 3D structure-determination technique.
- Every compound in the Directed Diversity Library 108 is analyzed by the Analysis Module 118 to derive a 3-dimensional receptor map. It should be understood, however, that it is possible that none of the compounds in a given Directed Diversity Library 108 , or in a sequence of Directed Diversity Libraries 108 , will be analyzed by the Analysis Module 118 to obtain a 3-dimensional receptor map, and equally possible that every compound will be. Which compounds from the Directed Diversity Library 108 are actually analyzed by the Analysis Module 118 to derive a 3-dimensional receptor map can be determined manually (as specified by operator input, for example) or automatically by the Directed Diversity Manager 310 .
- the 3D Receptor Map Data 522 (FIG. 5) generated by the 3D Receptor Mapping Module 418 is used by the Trainer 822 to train (i.e. construct and/or refine) the energy function that is used by the docking method to evaluate the energy of a particular receptor-ligand binding conformation.
- the training of the energy function is carried out using Gradient Minimization 832 , Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 , so that the prediction error for a prescribed Training Set of 3D Receptor Map Data 522 is minimized.
- the prediction error is specified based on the difference between the actual and predicted 3-dimensional structures of the receptor-ligand complexes in the Training Set (such as the RMSD criterion, for example), and/or based on the difference between the actual and predicted affinities of the receptor-ligand complexes in the Training Set.
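The error terms mentioned above can be sketched as follows. The RMSD here is computed directly on matched atom coordinates, assuming both structures are already expressed in the same receptor frame (no superposition is performed). The weighted sum that merges the structural and affinity terms is an illustrative assumption, as are all names.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z) atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def prediction_error(cases, structure_weight=1.0, affinity_weight=1.0):
    """Mean RMSD plus mean absolute affinity error over a training set.

    Each case is (actual_coords, predicted_coords, actual_affinity, predicted_affinity).
    """
    mean_rmsd = sum(rmsd(actual, predicted) for actual, predicted, _, _ in cases) / len(cases)
    mean_affinity_gap = sum(abs(a - p) for _, _, a, p in cases) / len(cases)
    return structure_weight * mean_rmsd + affinity_weight * mean_affinity_gap


actual = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
error = prediction_error([(actual, predicted, 7.0, 6.5)])
```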
- the Structure-Property Model Generator 800 can also be used to generate one or more 3D QSAR models.
- 3D QSAR models are models that are based on an analysis of the 3-dimensional structures of a series of ligands whose biological activities/properties are known. Unlike docking methods, however, 3D QSAR methods do not require knowledge of the 3-dimensional structure of the receptor or receptor-ligand complex. 3D QSAR methods are reviewed in Kubinyi (Ed.), 3 D QSAR in Drug Design, ESCOM, Leiden (1993), incorporated herein by reference in its entirety.
- the Structure-Property Model Generator 800 generates Structure-Property Models 842 based on one or more 3D QSAR methods.
- 3D QSAR methods include, but are not limited to, pharmacophore identification, structural alignment and molecular superposition, molecular shape analysis, mini-receptors and pseudo-receptors, distance geometry, hypothetical active site lattice, and/or molecular interaction fields.
- any other suitable 3D QSAR method can be used.
- a 3D QSAR Model Structure 820 can be trained to minimize the prediction error using Gradient Minimization 832 , Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 .
- the 3D QSAR Model Structure 820 can be trained to minimize the prediction error using a combination of Gradient Minimization 832 , Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or Genetic Algorithms 840 .
- the training process involves adjusting the free parameters of the 3D QSAR Structure-Property Model Structure 820 to minimize the mean prediction error (or some other suitable error criterion) for a Training Set of Structure-Property Data 124 , 128 within some prescribed tolerance.
- the implementation of such method should be straightforward to persons skilled in the art.
- the Trainer 822 can be configured to retain more than one 3D QSAR Model 842 during the training phase (flow arrow 890 in FIG. 8).
- the Trainer 822 can be configured to retain the ten best 3D QSAR Models 842 discovered during the training phase.
- the Trainer 822 can be configured to retain the ten best 3D QSAR Models 842 discovered during training, which differ from each other by some predetermined amount.
- the difference between two models can be defined ‘genotypically’ or ‘phenotypically’, i.e. by comparing the models based either on their internal structure, or their predictions.
- the Experiment Planner 130 can define two general types of Selection Criteria 104 .
- the first type of Selection Criteria 104 represents functions or algorithms that receive a compound and/or a list of compounds from the Compound Library 102 , and that return a numerical value that represents an individual or collective property of these compounds.
- the second type of Selection Criteria 104 represents specific constraints and/or methods for generating such lists of compounds. Both types of Selection Criteria 104 are discussed below.
- the first type of Selection Criteria 104 represent functions or algorithms that receive a compound and/or a list of compounds from the Compound Library 102 , and return a numerical value that represents an individual or collective property of these compounds. Examples of such Selection Criteria 104 that can be used in a preferred embodiment shall now be described. However, it should be understood that the present invention is not limited to this embodiment, and that other suitable Selection Criteria 104 can also be used.
- One such Selection Criterion 104 receives as input a list of compounds from the Compound Library 102 , and returns the number or fraction of these compounds that are available from the Chemical Inventory 110 .
- Another such Selection Criterion 104 receives as input a list of compounds from the Compound Library 102 , and returns the number of Reagents 114 that must be mixed together in the Synthesis Module 112 in order to synthesize these compounds according to a prescribed synthetic scheme.
- Another such Selection Criterion 104 receives as input a list of compounds from the Compound Library 102 , identifies which Reagents 114 must be mixed together in the Synthesis Module 112 in order to synthesize these compounds according to a prescribed synthetic scheme, and returns the number or fraction of these Reagents 114 that are available from the Reagent Inventory 116 .
- Another such Selection Criterion 104 receives as input a list of compounds from the Compound Library 102 , identifies which Reagents 114 must be mixed together in the Synthesis Module 112 in order to synthesize these compounds according to a prescribed synthetic scheme, identifies which of these Reagents 114 need to be purchased from an external source, and returns the cost of purchasing these Reagents 114 from such an external source.
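The four inventory-related criteria above reduce to simple set operations once a synthetic scheme is available as a mapping from each compound to its reagents. The sketch below makes that assumption; the data structures (`scheme`, `price_list`) and function names are hypothetical and are not part of the described system.

```python
def fraction_available(compounds, chemical_inventory):
    """Fraction of the listed compounds already held in the chemical inventory."""
    if not compounds:
        return 0.0
    return sum(c in chemical_inventory for c in compounds) / len(compounds)


def required_reagents(compounds, scheme):
    """Set of reagents needed to synthesize all listed compounds under `scheme`."""
    reagents = set()
    for compound in compounds:
        reagents |= set(scheme[compound])
    return reagents


def reagent_fraction_in_stock(compounds, scheme, reagent_inventory):
    """Fraction of the required reagents that the reagent inventory can supply."""
    needed = required_reagents(compounds, scheme)
    if not needed:
        return 1.0
    return len(needed & set(reagent_inventory)) / len(needed)


def purchase_cost(compounds, scheme, reagent_inventory, price_list):
    """Cost of buying every required reagent that is not in stock."""
    missing = required_reagents(compounds, scheme) - set(reagent_inventory)
    return sum(price_list[reagent] for reagent in missing)


# Toy data: two products that share reagent "A".
scheme = {"A-B": ("A", "B"), "A-C": ("A", "C")}
library = ["A-B", "A-C"]
reagent_count = len(required_reagents(library, scheme))
cost = purchase_cost(library, scheme, {"A"}, {"B": 10.0, "C": 5.0})
```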
- Another such Selection Criterion 104 receives as input a list of compounds from the Compound Library 102 , and returns a numerical value that represents the molecular diversity of these compounds.
- Molecular diversity refers to the ability of a given set of compounds to exhibit a variety of prescribed structural, physical, chemical and/or biological characteristics. The field of molecular diversity is reviewed in Martin et al., Reviews in Computational Chemistry, Vol 10, VCH, Weinheim (1997), and Agrafiotis, Encyclopedia of Computational Chemistry, Wiley (in press), incorporated herein by reference in their entirety.
- Molecular diversity is a collective property, and is usually defined in a prescribed ‘chemical space’, i.e. in a space defined by a prescribed set of molecular properties or characteristics. Consequently, a diverse collection of compounds in one definition of chemical space may not necessarily be diverse in another definition of chemical space.
- the system of the present invention represents molecular diversity as a Selection Criterion 104 , i.e. as a function or algorithm that receives as input a list of compounds, and returns a numerical value that represents the molecular diversity of these compounds.
- the Diversity Criterion can be used as part of an Objective Function that is used by the Selector 106 to select a Directed Diversity Library 108 for the next iteration.
- $D(S) = \dfrac{\sum_{i}^{n} \min_{j \neq i} d_{ij}}{n(n-1)/2}$ EQ. 4
- S is a set of compounds
- D(S) is the diversity of the compounds in S
- n is the number of compounds in S
- i, j are used to index the elements of S
- d ij is the distance between the i-th and j-th compounds in S.
- the distance d ij is a Minkowski metric (e.g. Manhattan distance, Euclidean distance, ultrametric distance, etc.) in a multivariate property space.
- the property space is defined using one or more molecular features (descriptors).
- molecular features can include topological indices, physicochemical properties, electrostatic field parameters, volume and surface parameters, etc.
- these features can include, but are not limited to, molecular volume and surface areas, dipole moments, octanol-water partition coefficients, molar refractivities, heats of formation, total energies, ionization potentials, molecular connectivity indices, substructure keys, hashed fingerprints, atom pairs and/or topological torsions, atom layers, 2D and 3D auto-correlation vectors, 3D structural and/or pharmacophoric keys, electronic fields, etc.
- the Hamming distance: $d_{ij} = \dfrac{|\mathrm{XOR}(x_i, x_j)|}{k}$ EQ. 5
- x i and x j represent binary strings encoding the i-th and j-th structures, respectively (e.g. a substructure key, pharmacophore key, or hashed fingerprint)
- k is the length of the binary sets x i and x j
- AND(x i , x j ), IOR(x i , x j ) and XOR(x i , x j ) are the binary intersection, union (‘inclusive or’) and ‘exclusive or’ of x i and x j , respectively, and
- the present invention is not limited to these embodiments, and any suitable distance measure and/or definition of chemical space can alternatively be used.
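The binary measures can be sketched with fingerprints stored as Python integers of bit length k. The Hamming function follows EQ. 5 read as the number of differing bits divided by k. The Tanimoto and Dice functions use their standard textbook definitions, which is an assumption because those equations are not reproduced in this section; turning a coefficient into a distance as one minus its value is likewise an assumption.

```python
def popcount(bits):
    """Number of set bits in a non-negative integer."""
    return bin(bits).count("1")


def hamming_distance(x, y, k):
    """Differing bits between two k-bit fingerprints, divided by k."""
    return popcount(x ^ y) / k


def tanimoto_coefficient(x, y):
    """Shared bits divided by the bits set in either fingerprint."""
    union = popcount(x | y)
    return popcount(x & y) / union if union else 1.0


def dice_coefficient(x, y):
    """Twice the shared bits divided by the total bits set in both fingerprints."""
    total = popcount(x) + popcount(y)
    return 2 * popcount(x & y) / total if total else 1.0


def as_distance(coefficient):
    """Convert a similarity coefficient in [0, 1] to a distance (assumed convention)."""
    return 1.0 - coefficient


fp_a, fp_b = 0b1100, 0b1010  # toy 4-bit substructure keys
d_hamming = hamming_distance(fp_a, fp_b, k=4)
```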
- EQ. 4 exhibits quadratic time complexity, i.e. the time required to compute D(S) scales with the square of the number of compounds in the set S.
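A brute-force sketch of the diversity measure makes the quadratic cost visible: for each compound it scans every other compound for the nearest neighbor. It assumes Euclidean distance in a descriptor space and reads EQ. 4 as the sum of nearest-neighbor distances divided by n(n-1)/2; that reading of the normalization is an interpretation of the extracted equation, and the function name is hypothetical.

```python
import math


def diversity(points):
    """Sum of nearest-neighbor distances divided by n(n-1)/2 (EQ. 4 as read here)."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two compounds are required")
    total = 0.0
    for i, p in enumerate(points):
        # Inner scan over all other compounds: this is the O(n^2) part.
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / (n * (n - 1) / 2)


# Three toy compounds described by two descriptors each.
toy_set = [(0.0, 0.0), (0.0, 3.0), (4.0, 0.0)]
toy_diversity = diversity(toy_set)
```

Replacing the inner scan with a spatial index lookup is what removes the quadratic scaling.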
- the method can be combined with a nearest neighbor algorithm such as a k-d tree (Bentley, Comm. ACM, 18(9): 509 (1975), Friedman et al., ACM Trans. Math. Soft., 3(3): 209 (1977)), incorporated herein by reference in their entirety.
- any other suitable algorithm can be used, including, but not limited to:
- Another such Selection Criterion 104 receives as input a list of compounds from the Compound Library 102 and a list of reference compounds, and returns a numerical value that represents the molecular similarity of these compounds to the reference compounds.
- S is a set of compounds
- L is a set of reference compounds
- M(S, L) is the measure of similarity of the compounds in S to the compounds in L
- n is the number of compounds in S
- k is the number of compounds in L
- i and j are used to index the elements of S and L, respectively
- d ij is the distance between the i-th compound in S and the j-th compound in L.
- EQ. 8 represents the mean distance of a compound in S from its nearest reference compound in L.
- the distance d ij is a Minkowski metric (e.g. Manhattan distance, Euclidean distance, ultrametric distance, etc.) in a multivariate property space.
- the property space is defined using one or more molecular features (descriptors).
- molecular features can include topological indices, physicochemical properties, electrostatic field parameters, volume and surface parameters, etc.
- these features can include, but are not limited to, molecular volume and surface areas, dipole moments, octanol-water partition coefficients, molar refractivities, heats of formation, total energies, ionization potentials, molecular connectivity indices, substructure keys, hashed fingerprints, atom pairs and/or topological torsions, atom layers, 2D and 3D auto-correlation vectors, 3D structural and/or pharmacophoric keys, electronic fields, etc.
- the distance d ij can be computed by the Hamming (EQ. 5), Tanimoto (EQ. 6), or Dice coefficients (EQ. 7) using a binary molecular representation, such as a substructure key, pharmacophore key, or hashed fingerprint, for example.
- the present invention is not limited to these embodiments, and any suitable definition of chemical space, distance measure, and/or Similarity Criterion can alternatively be used.
- the set of reference compounds may or may not represent real or synthesizable compounds.
- the set of reference compounds can represent an ‘ideal’ or ‘target’ set of properties that the selected compounds should possess.
- the Similarity Criterion in EQ. 8 measures how well a particular set of compounds matches a prescribed set of target properties.
- the Similarity Criterion can be used to design a set of compounds close to a reference set of compounds, or to design a set of compounds far from a reference set of compounds. For example, if EQ. 8 is used, this can be achieved by simply reversing the sign of M(S, L).
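The similarity measure of EQ. 8, described above as the mean distance of a compound in S from its nearest reference compound in L, can be sketched directly. Euclidean distance in a descriptor space is assumed, and the `far_from_reference` flag is a hypothetical convenience that applies the sign reversal mentioned above.

```python
import math


def similarity(selected, references, far_from_reference=False):
    """Mean distance from each selected compound to its nearest reference compound."""
    if not selected or not references:
        raise ValueError("both sets must be non-empty")
    mean_nearest = sum(
        min(math.dist(s, r) for r in references) for s in selected
    ) / len(selected)
    # Reversing the sign turns 'stay close to the references' into 'stay away'.
    return -mean_nearest if far_from_reference else mean_nearest


selected = [(0.0, 0.0), (2.0, 0.0)]
references = [(0.0, 1.0), (2.0, 2.0)]
m_close = similarity(selected, references)
m_far = similarity(selected, references, far_from_reference=True)
```

Because the reference points are only coordinates in property space, they can stand for an 'ideal' property profile as readily as for real compounds.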
- Another Selection Criterion 104 receives as input a compound (or list of compounds) from the Compound Library 102 , and returns a confidence factor that this compound can be synthesized by the Synthesis Module 112 using a prescribed synthetic scheme.
- this confidence factor can be computed by an expert system for computer-assisted organic synthesis.
- the present invention is not limited to this embodiment.
- Another such Selection Criterion 104 receives as input a compound (or list of compounds) from the Compound Library 102 , and returns a predicted yield for the compound(s), if the compound(s) were to be synthesized by the Synthesis Module 112 according to a prescribed synthetic scheme.
- the synthetic yield can be computed by an expert system for computer-assisted organic synthesis.
- the present invention is not limited to this embodiment.
- Another such Selection Criterion 104 receives as input a list of compounds from the Compound Library 102 , and returns a numerical value that represents the ease of planning and executing the synthesis of these compounds in the Synthesis Module 112 according to a prescribed synthetic scheme.
- a Synthetic Planning Criterion can be a value indicating if (and by how much) a particular collection of compounds exceeds the synthetic capacity of an automated robotic Synthesis Module 112 .
- Another example of such a Synthetic Planning Criterion may be the number of different synthetic schemes that must be executed by the Synthesis Module 112 in order to synthesize a particular collection of compounds.
- the present invention is not limited to these embodiments.
- Another such Selection Criterion 104 receives as input a list of compounds from the Compound Library 102 and a Structure-Property Model 842 , and returns the mean predicted property (or activity) of these compounds, as inferred by the specified model.
- a Structure-Property Model Confirmatory Criterion receives as input a list of compounds from the Compound Library 102 and a Structure-Property Model 842 , and returns the mean predicted property (or activity) of these compounds, as inferred by the specified model.
- any other suitable numerical value that can be derived from the predicted properties of the specified compounds as inferred by the specified Structure-Property Model can be used.
- the Structure-Property Model Confirmatory Criterion can return the minimum property, maximum property, or deviation of properties of the specified list of compounds, as inferred by the specified Structure-Property Model.
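A Confirmatory Criterion of this kind is a reduction over a model's predictions. The sketch below treats a Structure-Property Model as any callable that maps a compound to a number, which is an assumption made for illustration; the `summary` options and the use of the population standard deviation for 'deviation' are likewise illustrative choices.

```python
import statistics


def confirmatory_criterion(compounds, model, summary="mean"):
    """Summarize the properties predicted by `model` for a list of compounds."""
    predictions = [model(compound) for compound in compounds]
    reducers = {
        "mean": statistics.mean,
        "min": min,
        "max": max,
        "deviation": statistics.pstdev,  # population standard deviation (assumed)
    }
    return reducers[summary](predictions)


def toy_model(compound):
    """Hypothetical stand-in model: the 'property' is the length of the compound string."""
    return float(len(compound))


compounds = ["C", "CC", "CCC"]
mean_property = confirmatory_criterion(compounds, toy_model)
```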
- the Structure-Property Models 842 can include models derived from Statistics 802 , Neural Networks 804 , Fuzzy Logic 806 , and/or Model-Specific Methods 808 , and/or models derived from a combination of Statistics 802 , Neural Networks 804 , Fuzzy Logic 806 , and/or Model-Specific Methods 808 , such as the Neuro-Fuzzy Structure Property Model 1100 described above, for example.
- These Structure-Property Models 842 can also include models derived from docking methods and/or 3D QSAR methods including, but not limited to, pharmacophore identification, structural alignment and molecular superposition, molecular shape analysis, mini-receptors and pseudo-receptors, distance geometry, hypothetical active site lattice, and/or molecular interaction fields.
- Another such Selection Criterion 104 receives as input a compound (or list of compounds) from the Compound Library 102 and two or more Structure-Property Models 842 , and returns a numerical value that represents the ability (or collective ability) of this compound (or list of compounds) to discriminate between the specified models.
- the term ‘discriminate’ is used herein to denote the ability of a compound (or list of compounds) to distinguish between two or more models. A compound is said to possess high discriminatory ability if the models differ substantially in their predictions of the properties of that compound.
- Structure-Property Model Discriminatory Criteria 104 can be used if the Structure-Property Models 842 are weak or under-determined, for example. In such cases, it is often difficult to select which Structure-Property Model(s) 842 should be used to select the Directed Diversity Library 108 for the next iteration. Thus, it may be desirable to select compounds that can discriminate between two or more Structure-Property Models 842 , so that the Structure-Property Models 842 that reflect true correlations are reinforced, while the Structure-Property Models 842 that do not reflect true correlations are eliminated.
- An example of a Structure-Property Model Discriminatory Criterion is the difference between the minimum and maximum property predictions for a given compound as inferred by the specified Structure-Property Models 842 , or the deviation of the property predictions for a given compound as inferred by the specified Structure-Property Models 842 .
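The discriminatory score of a compound can be sketched as the spread of the predictions made by competing models. Models are again assumed to be plain callables. Averaging the per-compound scores to get a collective score for a list is an illustrative assumption, not something stated in the text.

```python
import statistics


def discriminatory_score(compound, models, use_deviation=False):
    """Spread of the models' predictions for one compound (range or deviation)."""
    predictions = [model(compound) for model in models]
    if use_deviation:
        return statistics.pstdev(predictions)
    return max(predictions) - min(predictions)


def collective_discriminatory_score(compounds, models):
    """Mean per-compound score over a list of compounds (assumed aggregation)."""
    return statistics.mean(discriminatory_score(c, models) for c in compounds)


# Three hypothetical models that disagree about every compound.
models = [lambda c: 1.0, lambda c: 3.0, lambda c: 2.0]
score = discriminatory_score("CCO", models)
```

A compound on which all models agree scores zero and therefore gives no help in deciding which model to keep.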
- As with Structure-Property Model Confirmatory Criteria 104 , any form of a Structure-Property Model 842 can be used in this regard.
- the Structure-Property Models 842 can include models derived from Statistics 802 , Neural Networks 804 , Fuzzy Logic 806 , and/or Model-Specific Methods 808 , and/or models derived from a combination of Statistics 802 , Neural Networks 804 , Fuzzy Logic 806 , and/or Model-Specific Methods 808 , such as the Neuro-Fuzzy Structure Property Model 1100 described above, for example.
- These Structure-Property Models 842 can also include models derived from docking methods and/or 3D QSAR methods including, but not limited to, pharmacophore identification, structural alignment and molecular superposition, molecular shape analysis, mini-receptors and pseudo-receptors, distance geometry, hypothetical active site lattice, and/or molecular interaction fields.
- Structure-Property Model Discriminatory Criteria can also be used to determine if a particular compound or list of compounds exhibits selective properties.
- Structure-Property Model Discriminatory Criteria 104 can be used to determine whether a particular compound can bind selectively to a specific target (also referred to herein as a Selectivity Criterion).
- s i denotes the selectivity of a particular compound for the i-th property (EQ. 9 assumes that the properties p i are normalized).
- EQ. 9 can be used to describe whether a particular compound binds selectively to the enzyme Thrombin versus the enzymes Trypsin and Urokinase, by substituting p i with the binding affinities of that compound for Thrombin, Trypsin and Urokinase as predicted by a Thrombin, Trypsin and Urokinase Structure-Property Model 842 , respectively.
- p ij is the i-th property of the compound as predicted by the j-th Structure-Property Model 842
- mean(.) is a function that returns the mean of its arguments.
- Another such Selection Criterion 104 receives as input a compound (or list of compounds) from the Compound Library 102 , and returns a value indicating whether this compound is protected by an issued US or foreign patent.
- the Experiment Planner 130 searches a patent database to determine whether the specified compound (or list of compounds) has been patented.
- Another such Selection Criterion 104 receives as input a compound (or list of compounds) from the Compound Library 102 , and returns a value that represents the predicted bioavailability of that compound, as inferred by a suitable Bioavailability Structure-Property Model.
- Toxicity Criterion receives as input a compound (or list of compounds) from the Compound Library 102 , and returns a value that represents the predicted toxicity of that compound, as inferred by a suitable Toxicity Structure-Property Model.
- the Experiment Planner 130 can define other Selection Criteria 104 that can be derived from information pertaining to a given compound or list of compounds, and that can be used to guide the selection of the Directed Diversity Library 108 for the next iteration.
- the second type of Selection Criteria 104 represent specific constraints and/or methods for generating such lists of compounds. A few examples of such Selection Criteria 104 shall now be described.
- One such Selection Criterion 104 defines a list of compounds that should not be included in the Directed Diversity Library 108 for the next iteration (referred to herein as the Excluded Compounds Criterion).
- these compounds can be compounds whose properties of interest are already known (e.g. compounds previously analyzed by the Analysis Module 118 ).
- the Excluded Compounds can be compounds whose bioavailability, as predicted by a Bioavailability Structure-Property Model, is below a prescribed threshold; compounds whose toxicity, as predicted by a Toxicity Structure-Property Model, is above a prescribed threshold; compounds that require expensive Reagents 114 to be mixed together in order to be generated by the Synthesis Module 112 (e.g. Reagents 114 whose cost exceeds a prescribed value); compounds that cannot be made in an automated or partially automated fashion by the Synthesis Module 112; etc.
- the Excluded Compounds can also represent combinations of compounds that cannot all be part of a Directed Diversity Library 108 for the next iteration.
- the Excluded Compounds can be a set of compounds that require more than one synthetic scheme to be executed by the Synthesis Module 112 in order to be synthesized.
- the Excluded Compounds Criterion can be used to exclude combinations of compounds that cannot all be made using a single synthetic scheme, or to limit the selection of compounds for the next Directed Diversity Library 108 to a specific combinatorial library (or libraries).
- the Excluded Compounds can represent combinations of compounds that require more than a prescribed number of Reagents 114 to be mixed together by the Synthesis Module 112 in order for these compounds to be synthesized.
- the present invention is not limited to these embodiments.
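- The per-compound forms of the Excluded Compounds Criterion described above can be sketched as a simple filter. The `Candidate` record, its field names and the default thresholds are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    bioavailability: float      # predicted value, higher is better
    toxicity: float             # predicted value, higher is worse
    reagent_cost: float         # cost of the most expensive reagent required
    already_analyzed: bool = False


def is_excluded(c, min_bioavailability=0.3, max_toxicity=0.7, max_reagent_cost=100.0):
    """Return True if the candidate violates any of the assumed thresholds."""
    return (
        c.already_analyzed
        or c.bioavailability < min_bioavailability
        or c.toxicity > max_toxicity
        or c.reagent_cost > max_reagent_cost
    )


def apply_excluded_compounds_criterion(candidates, **thresholds):
    """Keep only candidates that are not excluded."""
    return [c for c in candidates if not is_excluded(c, **thresholds)]
```

- Exclusions that apply to combinations of compounds (for example, sets that need more than one synthetic scheme) would need a check on the whole set rather than on each compound, which this sketch does not attempt.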
- Another such Selection Criterion 104 defines the number and/or subset of Reagents 114 that can be mixed together by the Synthesis Module 112 .
- Such a Selection Criterion limits the selection of the Directed Diversity Library 108 for the next iteration to a specific number and/or subset of building blocks.
- Another such Selection Criterion 104 defines the way in which the Reagents 114 are to be mixed together by the Synthesis Module 112 .
- a Selection Criterion 104 can specify that twenty Reagents 114 must be divided into two sets of ten, and these two sets of ten Reagents 114 must be mixed together in a combinatorial fashion to generate all one hundred combinations of a combinatorial library with two variable sites (referred to as an Array Design hereafter).
- an Array Design two variable sites
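- The Array Design example (twenty Reagents split into two sets of ten and combined exhaustively) can be illustrated as follows. The reagent labels and the `array_design` helper are hypothetical; splitting the list into equal consecutive slices is an assumption made for brevity.

```python
from itertools import product


def array_design(reagents, n_sites=2):
    """Split the reagents into n_sites equal sets and enumerate every combination."""
    if len(reagents) % n_sites != 0:
        raise ValueError("reagents must divide evenly among the variable sites")
    size = len(reagents) // n_sites
    sites = [reagents[i * size:(i + 1) * size] for i in range(n_sites)]
    return list(product(*sites))


reagents = [f"R{i}" for i in range(1, 21)]   # twenty placeholder reagent labels
library = array_design(reagents)             # 10 x 10 combinations
```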
- the Experiment Planner 130 uses one or more Selection Criteria 104 to define one or more Objective Functions 105.
- the Objective Function 105 represents a function and/or algorithm that receives a list of compounds from the Compound Library 102 and a list of Selection Criteria 104 , and returns a numerical value that represents a collective property of the specified compounds.
- S is a set of compounds
- c_i(S) is the value of the i-th Selection Criterion 104 for the set S
- w_i is a weighting factor
- f(S) is the value of the Objective Function 105 for the set of compounds S.
- any other suitable functional form can be used.
- An Objective Function 105 might combine, for example, a Molecular Diversity Criterion with a Molecular Similarity Criterion using EQ. 11.
- the weights w_i determine the relative influence of the Molecular Diversity Criterion and the Molecular Similarity Criterion.
- EQ. 11 can be used to compute a numerical value that reflects the collective ability of a given set of compounds S to satisfy both the Molecular Diversity Criterion and Molecular Similarity Criterion under the specified weights w_i.
- Such Objective Functions 105 that combine multiple Selection Criteria 104 are referred to hereafter as Multi-Objective Functions or Multi-Criteria Functions.
- an Objective Function 105 can include a single Selection Criterion 104 .
- an Objective Function 105 can simply return the molecular diversity of a collection of compounds, as computed by a Molecular Diversity Criterion. Examples of the use of such Objective Functions 105 and Multi-Objective Functions (not shown) to select a Directed Diversity Library 108 for the next iteration are described below.
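- A minimal sketch of a Multi-Objective Function, assuming that the criteria values c_i(S) are combined as a weighted sum with weights w_i (the text notes that any other suitable functional form can be used). The one-dimensional descriptors and the toy `diversity` and `similarity` measures are illustrative assumptions, not the patent's criteria.

```python
from itertools import combinations


def diversity(S, descriptor):
    """Toy diversity: mean pairwise absolute difference of a 1-D descriptor."""
    pairs = list(combinations(S, 2))
    if not pairs:
        return 0.0
    return sum(abs(descriptor[a] - descriptor[b]) for a, b in pairs) / len(pairs)


def similarity(S, descriptor, reference):
    """Toy similarity: negative mean distance to a reference descriptor value."""
    return -sum(abs(descriptor[c] - reference) for c in S) / len(S)


def objective(S, criteria, weights):
    """Assumed form: f(S) = sum_i w_i * c_i(S)."""
    return sum(w * c(S) for c, w in zip(criteria, weights))


descriptor = {"a": 0.0, "b": 1.0, "c": 4.0}   # made-up descriptor values
criteria = [
    lambda S: diversity(S, descriptor),
    lambda S: similarity(S, descriptor, reference=1.0),
]
```

- Setting one weight to zero reduces this sketch to an Objective Function based on a single Selection Criterion.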
- the Selector 106 selects a Directed Diversity Library 108 for analysis, according to the Selection Criteria 104 and any Objective Functions 105 .
- the Directed Diversity Library 108 is comprised of compounds that are optimal or nearly optimal with respect to the specified Selection Criteria 104 and Objective Functions 105 .
- the Directed Diversity Library 108 should be comprised of compounds that satisfy any constraints specified by some of these Selection Criteria 104 .
- the task of identifying an optimal or nearly optimal set of compounds for the next Directed Diversity Library 108 involves a search of all subsets of compounds from the Compound Library 102 that satisfy the constraints defined by the Experiment Planner 130 .
- the term ‘constraint’ denotes a Selection Criterion 104 that excludes certain compounds or certain combinations of compounds from being selected as part of the Directed Diversity Library 108 for the next iteration. Contrast constraints to other Selection Criteria 104 , which specify desired properties that the selected compounds should possess, either individually or collectively.
- the Directed Diversity Library 108 for the next iteration should satisfy any specified constraints and should maximize the desired properties, to the extent possible.
- the task of identifying an optimal or nearly optimal set of compounds for the next Directed Diversity Library 108 can be an enormous combinatorial problem.
- one Selection Criterion 104 limits the selection to an n-membered Compound Library 102
- another Selection Criterion 104 specifies that the Directed Diversity Library 108 for the next iteration should be comprised of k compounds from the aforementioned n-membered library
- This task is combinatorially explosive because, in all but the simplest cases, N (the number of distinct k-membered subsets of the n-membered library) is far too large to allow for the construction and evaluation of every possible subset given current data processing technology.
- a variety of stochastic modeling techniques can be employed, that are capable of providing good approximate solutions to combinatorial problems in realistic time frames.
- the present invention envisions and includes the construction and evaluation of every individual k-membered subset once computer technology advances to an appropriate point.
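- To give a sense of scale, and assuming N denotes the number of distinct k-membered subsets of an n-membered library, N is the binomial coefficient n! / (k! (n - k)!). The helper name and the example sizes below are illustrative only.

```python
import math


def count_subsets(n, k):
    """Number of distinct k-membered subsets of an n-membered library."""
    return math.comb(n, k)


small = count_subsets(10, 3)        # small enough to enumerate exhaustively
large = count_subsets(1000, 100)    # far beyond exhaustive enumeration
```

- Even a modest library of 1000 compounds yields more than 10^100 possible 100-compound subsets, which is why approximate search techniques are attractive here.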
- the Selector 106 receives the Selection Criteria 104 and Objective Functions 105 and returns the Directed Diversity Library 108 .
- the Selector 106 preferably uses a stochastic (or exhaustive, if possible) search/optimization technique.
- the Selector 106 is coupled to the Compound Database 134 , the Reagent Database 138 and the Structure-Property Database 126 via dedicated Servers 1204 .
- the Selector 106 can send a proposed Compound List 1302 to the Servers 1204.
- the Servers 1204 can retrieve property values for the Compound List 1302 and return them to the Selector 106 as Values 1304 .
- Preferably, the Selector 106 generates an initial list of proposed compounds based on Selection Criteria 104 and then refines the list through an iterative process.
- the Selector 106 can employ Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 , to produce a list of compounds that best satisfy all the Selection Criteria 104 in the manner specified by the Objective Function 105 .
- the list can be refined to become the Directed Diversity Library 108 for the next iteration.
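- The iterative refinement described above can be sketched with a generic simulated annealing loop. This is a textbook formulation under stated assumptions (single-compound swap moves, geometric cooling, a maximized objective), not the procedure of FIG. 18; all names and parameters are hypothetical.

```python
import math
import random


def anneal_select(library, k, objective, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search for a k-membered subset of `library` that maximizes `objective`."""
    rng = random.Random(seed)
    current = rng.sample(library, k)          # initial proposed list
    current_score = objective(current)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        outside = [c for c in library if c not in current]
        if not outside:
            break
        # Propose a neighbor: swap one selected compound for an unselected one.
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(outside)
        candidate_score = objective(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score
```

- For example, `anneal_select(list(range(20)), 3, sum)` searches for three integers with a large sum. Constraints such as excluded compounds could be enforced by filtering `library` before the search.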
- each Server 1204 can receive a Compound List 1302 from the Selector 106 .
- the Servers 1204 can access one or more of the databases 126 , 134 and 138 to retrieve property values associated with the compounds in the Compound List 1302 , and use these property values to compute the values of the respective Selection Criteria 104 .
- the Servers 1204 can return their respective computed values as Selection Criteria Values 1304 for Compound List 1302 .
- the Server 1204 can be configured by user input. For example, a user might want to select a particular method for computing molecular diversity. Similarly, a user might want to select one or more particular Structure-Property Models 192 for predicting the properties of compounds.
- the Selector 106 selects the Directed Diversity Library 108 for the next iteration using a Monte-Carlo Sampling 834 or Simulated Annealing 836 algorithm. Operation of this embodiment is described below with reference to FIG. 18.
- a lead generation/optimization system 100 can be implemented as a fully automated system or as a partially automated system that relies, in part, on human interaction.
- human interaction can be employed to perform or assist in the functions described herein with respect to the Synthesis Module 112 and/or the Analysis Module 118 and/or the Directed Diversity Manager 310.
- the automated portion of the lead generation/optimization system 100 can be implemented as hardware, firmware, software or any combination thereof, and can be implemented in one or more computer systems and/or other processing systems. In one embodiment, the automated portion of the invention is directed toward one or more computer systems capable of carrying out the functionality described herein.
- an example computer system 1901 includes one or more processors, such as processor 1904 .
- Processor 1904 is connected to a communication bus 1902 .
- Various software embodiments are described in terms of this example computer system 1901 . After reading this description, it will become apparent to a person skilled in the relevant art how to implement the invention using other computer systems and/or computer architectures.
- Computer system 1901 also includes a main memory 1906, preferably random access memory (RAM), and can also include a secondary memory 1908.
- Secondary memory 1908 can include, for example, a hard disk drive 1910 and/or a removable storage drive 1912 , representing a floppy disk drive, a magnetic tape drive, an optical disk drive, etc.
- Removable storage drive 1912 reads from and/or writes to a removable storage unit 1914 in a well known manner.
- Removable storage unit 1914 represents a floppy disk, magnetic tape, optical disk, etc. which is read by and written to by removable storage drive 1912 .
- Removable storage unit 1914 includes a computer usable storage medium having stored therein computer software and/or data.
- secondary memory 1908 can include other similar means for allowing computer programs or other instructions to be loaded into computer system 1901 .
- Such means can include, for example, a removable storage unit 1922 and an interface 1920 .
- Examples of such can include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 1922 and interfaces 1920 which allow software and data to be transferred from the removable storage unit 1922 to computer system 1901 .
- Computer system 1901 can also include a communications interface 1924 .
- Communications interface 1924 allows software and data to be transferred between computer system 1901 and external devices. Examples of communications interface 1924 include, but are not limited to a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, etc.
- Software and data transferred via communications interface 1924 are in the form of signals which can be electronic, electromagnetic, optical or other signals capable of being received by communications interface 1924 .
- These signals 1926 are provided to communications interface 1924 via a channel 1928.
- This channel 1928 carries signals 1926 and can be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link and other communications channels.
- the terms “computer program medium” and “computer usable medium” are used to generally refer to media such as removable storage device 1912, a hard disk installed in hard disk drive 1910, and signals 1926. These computer program products are means for providing software to computer system 1901.
- Computer programs are stored in main memory and/or secondary memory 1908 . Computer programs can also be received via communications interface 1924 . Such computer programs, when executed, enable the computer system 1901 to perform the features of the present invention as discussed herein. In particular, the computer programs, when executed, enable the processor 1904 to perform the features of the present invention. Accordingly, such computer programs represent controllers of the computer system 1901 .
- the software can be stored in a computer program product and loaded into computer system 1901 using removable storage drive 1912 , hard drive 1910 or communications interface 1924 .
- the control logic when executed by the processor 1904 , causes the processor 1904 to perform the functions of the invention as described herein.
- the automated portion of the invention is implemented primarily in hardware using, for example, hardware components such as application specific integrated circuits (ASICs).
- the invention is implemented using a combination of both hardware and software.
- a lead generation/optimization system 300 includes one or more central processing units (CPUs) 302 a , 302 b and 302 c , which can be one or more of processors 1904 .
- CPUs 302 operate according to control logic 304 , 306 , and 308 , which can be software, firmware, hardware or any combination thereof.
- Processors 302 a , 302 b and 302 c can represent a single processor 302 or can represent multiple processors.
- Control logic 304 , 306 , and 308 can be executed on a single processor or on multiple processors 302 .
- Control logic 304 , 306 , and 308 preferably represent one or more computer programs such that the processor 302 operates according to software instructions contained in the control logic 304 , 306 , and 308 .
- the processor 302 and/or the control logic 304 , 306 , and 308 are implemented as a hardware state machine.
- Processor 302 a and control logic 304 collectively represent the Experiment Planner 130 .
- Processor 302 b and control logic 306 collectively represent the Selector 106 .
- Processor 302 c and control logic 308 collectively represent the Synthesis Protocol Generator 202.
- the Experiment Planner 130 , the Selector 106 , and the Synthesis Protocol Generator 202 collectively represent a Directed Diversity Manager 310 .
- Directed Diversity Manager 310 can be implemented as part of a variety of computer systems.
- Directed Diversity Manager 310 can be implemented on an Indigo, Indy, Onyx, Challenge, Power Challenge, Octane or Origin 2000 computer made by Silicon Graphics, Inc., of Mountain View, Calif.
- Another suitable form for the processor 302 is a DEC Alpha Workstation computer made by Digital Equipment Corporation of Maynard, Mass.
- Another suitable form for the Processor 302 is one of the Pentium family of processors from Intel, such as the Pentium Pro or Pentium II. Any other suitable computer system could alternatively be used.
- a Communication Medium 312, comprising one or more data buses and/or IO (input/output) interface devices, connects the Experiment Planner 130, the Selector 106, and the Synthesis Protocol Generator 202 to a number of peripheral devices, such as one or more Input Devices 316, one or more Output Devices 318, one or more Synthesis Modules 112, one or more Analysis Modules 118, and one or more Data Storage Devices 314.
- the Input Device(s) 316 receive input (such as data, commands, etc.) from human operators and forward such input to the Experiment Planner 130 , the Selector 106 , and/or the Synthesis Protocol Generator 202 via the Communication Medium 312 .
- Any well known, suitable input device can be used in the present invention to receive input, commands, selections, etc., from operators 317 , such as a keyboard, pointing device (mouse, roller ball, track ball, light pen, etc.), touch screen, voice recognition, etc.
- User input can also be stored and then retrieved, as appropriate, from data/command files.
- the Output Device(s) 318 output information to human operators 317 .
- the Experiment Planner 130 , the Selector 106 , and/or the Synthesis Protocol Generator 202 transfer such information to the Output Device(s) 318 via the Communication Medium 312 .
- Any well known, suitable output device can be used in the present invention, such as a monitor, a printer, a floppy disk drive, a text-to-speech synthesizer, etc.
- the Synthesis Module 112 receives Robotic Synthesis Instructions 204 (FIG. 2) from the Synthesis Protocol Generator 202 via the Communication Medium 312 .
- the Synthesis Module 112 operates according to the Robotic Synthesis Instructions 204 to selectively combine a particular set of Reagents 114 from the Reagent Inventory 116 to thereby generate the compounds from the Directed Diversity Library 108 specified by the Selector 106 , that are not retrieved from the Chemical Inventory 110 .
- Directed Diversity Manager 310 is implemented as part of a computer system
- Communication Medium 312 , Input Device(s) 316 and Output Device(s) 318 can be an integral part of the computer system.
- the Synthesis Module 112 is preferably a robot capable of mix-and-split, solid phase chemistry for coupling chemical building blocks.
- robot refers to any automated or partially automated device that automatically or semi-automatically performs functions specified by instructions such as the Robotic Synthesis Instructions 204 (FIG. 2) generated by the Synthesis Protocol Generator 202 .
- the Synthesis Module 112 preferably performs selective micro-scale solid state synthesis of a specific combinatorial library of Directed Diversity Library 108 compounds, but is not limited to this embodiment.
- the Synthesis Module 112 preferably cleaves and separates the compounds of the Directed Diversity Library 108 from support resin and distributes the compounds into preferably 96 wells with from 1 to 20 Directed Diversity Library 108 compounds per well, corresponding to an output of 96 to 1920 compounds per synthetic cycle iteration, but is not limited to this embodiment.
- This function can alternatively be performed by a well known liquid transfer robot (not shown).
- Synthesis Module(s) suitable for use with the present invention are well known and are commercially available from a number of manufacturers, such as the following:
  TABLE 1
  Manufacturer | City | State | Model
  Advanced ChemTech | Louisville | KY | 357 MPS, 390 MPS
  Rainin | Woburn | MA | Symphony
  Perkin-Elmer Corporation, Applied Biosystems Division | Foster City | CA | 433A
  Millipore | Bedford | MA | 9050 Plus
- Peptide synthesis is by no means the only approach envisioned and intended for use with the present invention.
- Other chemistries for generating the Directed Diversity Libraries 108 can also be used.
- peptoids PCT Publication No. WO 91/19735, Dec. 26, 1991
- encoded peptides PCT Publication WO 93/20242, Oct. 14, 1993
- random bio-oligomers PCT Publication WO 92/00091, Jan. 9, 1992
- benzodiazepines U.S. Pat. No. 5,288,514
- diversomers such as hydantoins, benzodiazepines and dipeptides
- Hobbs DeWitt S.
- the Synthesis Module 112 can be a robot capable of solution-phase synthesis, or a workstation that enables manual synthesis of the compounds in the Directed Diversity Library 108 .
- a number of well-known robotic systems have also been developed for solution phase chemistries. These systems include automated workstations like the automated synthesis apparatus developed by Takeda Chemical Industries, LTD. (Osaka, Japan) and many robotic systems utilizing robotic arms (Zymate II, Zymark Corporation, Hopkinton, Mass.; Orca, Hewlett-Packard, Palo Alto, Calif.) that mimic the manual synthetic operations performed by a chemist. Any of the above devices are suitable for use with the present invention. The nature and implementation of modifications to these devices (if any) so that they can operate as discussed herein will be apparent to persons skilled in the relevant art.
- the functions of the Synthesis Module 112 can alternatively be performed by human operators, aided or not aided by robots and/or computers.
- the Analysis Module(s) 118 receives the chemical compounds synthesized by the Synthesis Module(s) 112 or retrieved from the Chemical Inventory 110 .
- the Analysis Module(s) 118 analyzes these compounds to obtain Structure-Property Data 124 pertaining to the compounds.
- FIG. 4 is a more detailed structural block diagram of an embodiment of the Analysis Module(s) 118 .
- the Analysis Module(s) 118 include one or more Assay Modules 402 , such as an Enzyme Activity Assay Module 404 , a Cellular Activity Assay Module 406 , a Toxicology Assay Module 408 , and/or a Bioavailability Assay Module 410 .
- the Enzyme Activity Assay Module 404 assays the compounds synthesized by the Synthesis Module(s) 112 using well known procedures to obtain enzyme activity data relating to the compounds.
- the Cellular Activity Assay Module 406 assays the compounds using well known procedures to obtain cellular activity data relating to the compounds.
- the Toxicology Assay Module 408 assays the compounds using well known procedures to obtain toxicology data relating to the compounds.
- the Bioavailability Assay Module 410 assays the compounds using well known procedures to obtain bioavailability data relating to the compounds.
- the Enzyme Activity Assay Module 404 , Cellular Activity Assay Module 406 , Toxicology Assay Module 408 , and Bioavailability Assay Module 410 are implemented in a well known manner to facilitate the preparation of solutions, initiation of the biological or chemical assay, termination of the assay (optional depending on the type of assay) and measurement of the results, commonly using a counting device, spectrophotometer, fluorometer or radioactivity detection device. Each of these steps can be done manually (with or without the aid of robots or computers) or by robots, in a well known manner.
- Raw data is collected and stored on magnetic media under computer control or input manually into a computer.
- Useful measurement parameters such as dissociation constants or 50% inhibition concentrations can then be manually or automatically calculated from the observed data, stored on magnetic media and output to a relational database.
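- As one hedged illustration of deriving a 50% inhibition concentration from observed assay data, the sketch below interpolates linearly in log-concentration between the two measurements that bracket 50% inhibition. This is an assumed, simplified approach (practical analyses commonly fit a full dose-response curve), and the function name and data layout are hypothetical.

```python
import math


def ic50_by_interpolation(concentrations, percent_inhibition):
    """Estimate the concentration giving 50% inhibition.

    Interpolates in log10(concentration) between the two bracketing points.
    Returns None if the data never cross 50% inhibition.
    """
    points = sorted(zip(concentrations, percent_inhibition))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(points, points[1:]):
        if y_lo <= 50.0 <= y_hi:
            if y_hi == y_lo:
                return c_lo
            frac = (50.0 - y_lo) / (y_hi - y_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None
```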
- the Analysis Module(s) 118 optionally include a Structure and Composition Analysis Module 414 to obtain two dimensional structure and composition data relating to the compounds.
- the structure and composition analysis module 414 is implemented using a liquid chromatograph device and/or a mass spectrometer.
- a sampling robot (not shown) transfers aliquots from the 96 wells to a coupled liquid chromatography-mass spectrometry system to perform sample analysis.
- the Structure and Composition Analysis Module 414 can be utilized to determine product composition and to monitor reaction progress by comparison of the experimental results to the theoretical results predicted by the Synthesis Protocol Generator 202 .
- the Analysis Module(s) 118 can use, but is not limited to, infra-red spectroscopy, decoding of a molecular tag, mass spectrometry (MS), gas chromatography (GC), liquid chromatography (LC), or combinations of these techniques (e.g., GC-MS, LC-MS, or MS-MS).
- the Structure and Composition Analysis Module 414 is implemented using a mass spectrometric technique such as Fast Atom Bombardment Mass Spectrometry (FABSMS) or triple quadrupole ion spray mass spectrometry, optionally coupled to a liquid chromatograph, or matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS).
- Liquid chromatograph devices, gas chromatograph devices, and mass spectrometers suitable for use with the present invention are well known and are commercially available from a number of manufacturers, such as the following:
  TABLE 2: GAS CHROMATOGRAPHY
  Manufacturer | City | State | Model
  Hewlett-Packard Company | Palo Alto | CA | 5890
  Varian Associates Inc. | Palo Alto | CA | (not given)
  Shimadzu Scientific Inst. | Columbia | MD | GC-17A
  Fisons Instruments | Beverly | MA | GC 8000
- the Analysis Module(s) 118 can optionally further include a Chemical Synthesis Indicia Generator 412 that analyzes the structure and composition data obtained by the Structure and Composition Analysis Module 414 to determine which compounds were adequately synthesized by the Synthesis Module(s) 112 , and which compounds were not adequately synthesized by the Synthesis Module(s) 112 .
- the Chemical Synthesis Indicia Generator 412 is implemented using a processor, such as Processor 302 , operating in accordance with appropriate control logic, such as Control Logic 304 , 306 , and/or 308 .
- the Control Logic 304 , 306 , and/or 308 represents a computer program such that the Processor 302 operates in accordance with instructions in the Control Logic 304 , 306 , and/or 308 to determine which compounds were adequately synthesized by the Synthesis Module(s) 112 , and which compounds were not adequately synthesized by the Synthesis Module(s) 112 .
- Persons skilled in the relevant art will be able to produce such Control Logic 304 , 306 , and/or 308 based on the discussion of the Chemical Synthesis Indicia Generator 412 contained herein.
- the Analysis Module(s) 118 can also include a three dimensional (3D) Receptor Mapping Module 418 to obtain three dimensional structure data relating to a receptor binding site.
- the 3D Receptor Mapping Module 418 preferably determines the three dimensional structure of a receptor binding site empirically through x-ray crystallography and/or nuclear magnetic resonance spectroscopy, and/or as a result of the application of extensive 3D QSAR (quantitative structure-activity relationship) and receptor field analysis procedures, well known to persons skilled in the art and described in: “Strategies for Indirect Computer-Aided Drug Design”, Gilda H. Loew et al., Pharmaceutical Research, Volume 10, No.
- the functions performed by the Analysis Modules 118 can alternatively be performed by human operators, with or without the aid of robots and/or computers.
- the Analysis Module(s) 118 can additionally include a Physical and/or Electronic Property Analysis Module(s) 416 that analyzes the compounds synthesized by the Synthesis Module(s) 112 to obtain physical and/or electronic property data relating to the compounds.
- Such properties can include water/octanol partition coefficients, molar refractivity, dipole moment, fluorescence etc. Such properties can either be measured experimentally or computed using methods well known to persons skilled in the art.
- the Data Storage Device 314 is a read/write high storage capacity device such as a tape drive unit or a hard disk unit.
- Data storage devices suitable for use with the present invention are well known and are commercially available from a number of manufacturers, such as the 2 gigabyte Differential System Disk, part number FTO-SD8-2NC, and the 10 gigabyte DLT tape drive, part number P-W-DLT, both made by Silicon Graphics, Inc., of Mountain View, Calif.
- the Reagent Database 138 , Compound Database 134 , and Structure-Property Database 126 are stored in the Data Storage Device 314 .
- the Reagent Database 138 contains information pertaining to the reagents in the Reagent Inventory 116 .
- the Reagent Database 138 contains information pertaining to the chemical substructures, chemical properties, physical properties, biological properties, and electronic properties of the reagents in the Reagent Inventory 116 .
- the Structure-Property Database 126 stores Structure-Property Data 124 , 128 (FIG. 1) pertaining to the compounds that were synthesized by the Synthesis Module(s) 112 . Such Structure-Property Data 124 , 128 is obtained as a result of the analysis of the compounds performed by the Analysis Module(s) 118 , as described above. The Structure-Property Data 124 , 128 obtained by the Analysis Module(s) 118 is transferred to and stored in the Structure-Property Database 126 via the Communication Medium 312 .
- FIG. 5 is a more detailed block diagram of an embodiment of the Structure-Property Database 126 .
- the Structure-Property Database 126 includes a Structure and Composition Database 502 , a Physical and Electronic Properties Database 504 , a Chemical Synthesis Database 506 , a Chemical Properties Database 508 , a 3D Receptor Map Database 510 , and a Biological Properties Database 512 .
- the Structure and Composition Database 502 stores Structure and Composition Data 514 pertaining to compounds synthesized by the Synthesis Module(s) 112 and analyzed by the Analysis Module(s) 118 .
- the Physical and Electronic Properties Database 504 , Chemical Synthesis Database 506 , Chemical Properties Database 508 , 3D Receptor Map Database 510 , and Biological Properties Database 512 store Physical and Electronic Properties Data 516 , Chemical Synthesis Data 518 , Chemical Properties Data 520 , 3D Receptor Map Data 522 , and Biological Properties Data 524 , respectively, pertaining to compounds retrieved from the Chemical Inventory 110 and/or synthesized by the Synthesis Module(s) 112 , and analyzed by the Analysis Module(s) 118 .
- the Structure and Composition Data 514 , Physical and Electronic Properties Data 516 , Chemical Synthesis Data 518 , Chemical Properties Data 520 , 3D Receptor Map Data 522 , and Biological Properties Data 524 collectively represent the Structure-Property Data 124 , 128 .
- the Structure and Composition Database 502 , Physical and Electronic Properties Database 504 , Chemical Synthesis Database 506 , Chemical Properties Database 508 , 3D Receptor Map Database 510 , and Biological Properties Database 512 each include one record for each chemical compound retrieved from the Chemical Inventory 110 and/or synthesized by the Synthesis Module(s) 112 and analyzed by the Analysis Module(s) 118 (other database structures could alternatively be used).
- Steps 602 - 618 in process flowchart 600 represent a preferred method for identifying chemical compounds having desired properties.
- the lead generation/optimization system 100 implements an iterative process where, during each iteration:
- a set of Selection Criteria 104 and/or one or more Objective Functions are defined (step 602 );
- steps 602 - 618 of flowchart 600 are performed during each iteration of the iterative process as indicated by control line 620 in flowchart 600 .
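- The overall iterative process can be outlined as a loop in which each stage is supplied as a function. This is only a skeleton under assumptions: the stage names, their signatures, and the reduction of steps 602-618 to four calls are illustrative and do not reproduce flowchart 600.

```python
def run_iterations(define_criteria, select_library, synthesize, analyze, n_iterations):
    """Run a fixed number of plan/select/synthesize/analyze cycles.

    Each stage is a caller-supplied function; results accumulate in `history`,
    which plays the role of historical structure-property data.
    """
    history = []
    for _ in range(n_iterations):
        criteria = define_criteria(history)   # planner uses data gathered so far
        library = select_library(criteria)    # selector proposes the next library
        compounds = synthesize(library)       # synthesis (or inventory retrieval)
        data = analyze(compounds)             # assays produce new data records
        history.extend(data)
    return history
```

- In practice the loop would likely stop on a convergence or budget condition rather than after a fixed count; the fixed count is used here only to keep the sketch short.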
- the process begins at step 602 , where the Experiment Planner 130 defines Selection Criteria 104 and/or one or more Objective Functions 105 .
- the Experiment Planner 130 defines Selection Criteria 104 and/or Objective Functions 105 based on current Structure-Property Data 124 and Historical Structure-Property Data 128 . Historical Structure-Property Data 128 can be identified from previous iterations of the lead generation/optimization system 100 and/or from other independent experiments.
- the Experiment Planner 130 can also define Selection Criteria 104 and/or Objective Functions 105 based on one or more of: Compound Data 132 ; Reagent Data 136 ; Desired Properties 120 ; and Structure-Property Models 192 .
- the Selection Criteria 104 and/or Objective Functions 105 are sent to the Selector 106 . Additional details of step 602 are provided below, in the description of the next iteration of the process.
- in step 604 , the Selector 106 selects a Directed Diversity Library 108 .
- the Selector 106 uses the Selection Criteria 104 and/or Objective Functions 105 that were defined by the Experiment Planner 130 in step 602 .
- the Selector 106 can use a stochastic (or exhaustive, if possible) search/optimization technique.
- the search can include, but is not limited to, Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 , to produce a list of compounds that best satisfy all the Selection Criteria 104 in the manner specified by the Objective Function 105 , and will comprise the Directed Diversity Library 108 for the next iteration.
- the Selector 106 selects the Directed Diversity Library 108 for the next iteration using a Monte-Carlo Sampling 834 or Simulated Annealing 836 algorithm.
- a collection of compounds that satisfies all the constraints specified by the Experiment Planner 130 represents a ‘state’, and is encoded in a manner that is most appropriate given those constraints.
- the precise encoding of a state can vary, depending on some of the Selection Criteria 104 specified by the Experiment Planner 130 .
- the process of step 604 is illustrated in greater detail for the case where a Monte-Carlo Sampling 834 or Simulated Annealing 836 algorithm is used.
- a state i.e., the collection of compounds that will comprise the Directed Diversity Library 108 for the next iteration, is initialized preferably at random.
- Other initialization approaches could alternatively be used, such as biased or human input.
- the state is initialized by selecting a set of compounds and/or a set of reagents preferably at random.
- in steps 1806 - 1816 , the state is gradually refined by a series of small stochastic ‘steps’.
- a ‘step’ means a stochastic (random or partially random) modification of the state's composition, i.e. the compounds comprising the state.
- in step 1806 , the state is modified. Modification can include sending a randomly generated state to the Server 1204 as Compound List 1302 and receiving Values 1304 for the compounds in the Compound List 1302 .
- the initial state can then be modified, for example, by replacing a compound currently in the state with a compound not currently in the state, or by replacing a building block of one or more compounds currently in the state.
- the new state can be sent to the Server 1204 as Compound List 1302 and Values 1304 can be returned for the new state.
- the quality of the new state can be assessed using the Objective Function 105 specified by the Experiment Planner 130 .
- the quality can be assessed by comparing the new state to the old state using the Metropolis criterion. Alternatively, any other suitable comparison criterion can be used.
- in step 1810 , if the new state is approved, processing proceeds to step 1812 , where the Selector 106 replaces the old state with the new state. If the new state is not approved, processing proceeds to step 1814 , where the Selector 106 discards the new state.
- in step 1816 , the Selector 106 determines whether to repeat steps 1806 - 1814 or use the current state as the next Directed Diversity Library 108 .
- Steps 1806 - 1816 can be performed under control of a Monte-Carlo Sampling protocol 834 , a Simulated Annealing protocol 836 , or variants thereof, which are well known to persons skilled in the art. However, it should be understood that the system of the present invention is not limited to these embodiments.
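The refinement loop of steps 1806 - 1816 can be sketched as follows. This is a minimal sketch under stated assumptions: the objective is maximized, the temperature is held fixed (plain Monte-Carlo sampling), and the names `metropolis_accept`, `refine_state`, `propose`, and `objective` are hypothetical stand-ins for the state modification and the Objective Function 105. The call to the Server 1204 is not modeled.

```python
import math
import random


def metropolis_accept(old_score, new_score, temperature, rng):
    """Metropolis criterion for maximization: always accept an improvement,
    accept a worse state with probability exp((new - old) / T)."""
    if new_score >= old_score:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp((new_score - old_score) / temperature)


def refine_state(state, propose, objective, n_steps, temperature, seed=0):
    """Refine a state through a series of small stochastic steps."""
    rng = random.Random(seed)
    score = objective(state)
    for _ in range(n_steps):                    # step 1816: repeat or stop
        candidate = propose(state, rng)         # step 1806: modify the state
        candidate_score = objective(candidate)  # assess quality of new state
        if metropolis_accept(score, candidate_score, temperature, rng):  # step 1810
            state, score = candidate, candidate_score  # step 1812: keep new state
        # otherwise the candidate is simply dropped (step 1814)
    return state, score
```

For example, `refine_state(0, lambda s, r: s + r.choice([-1, 1]), lambda s: -abs(s - 5), 500, 0.0)` walks an integer toward 5; at temperature zero only non-worsening moves are accepted.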
- the Selector 106 can use Evolutionary Programming 838 or Genetic Algorithms 840 , where the population of states (or chromosomes) is initialized at random and is allowed to evolve through the repeated application of genetic operators, such as crossover, mutation, and selection.
- the genetic operators alter the composition of the states, either individually (e.g. mutation), or by mixing elements of two or more states (e.g. crossover) in some prescribed manner.
- Selection is probabilistic, and is based on the relative fitness of these states as measured by the Objective Function 105 .
- the states (or chromosomes) are encoded in a manner that is most appropriate given the constraints specified by the Experiment Planner 130 .
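A population-based search of this kind can be sketched as below. The encoding (each state is a fixed-size set of compound indices), the particular crossover and mutation operators, the elitism, and all parameter defaults are assumptions chosen for illustration; the specification leaves the encoding and operators to the constraints set by the Experiment Planner 130 . The sketch assumes the state size is smaller than the library size.

```python
import random


def crossover(parent_a, parent_b, size, rng):
    """Mix two states: draw the child's members from the union of both parents."""
    pool = sorted(set(parent_a) | set(parent_b))
    return frozenset(rng.sample(pool, size))


def mutate(state, n_compounds, rng):
    """Alter one state: replace one member with a compound not in the state."""
    outside = [i for i in range(n_compounds) if i not in state]
    removed = rng.choice(sorted(state))
    added = rng.choice(outside)
    return frozenset((set(state) - {removed}) | {added})


def evolve(n_compounds, size, fitness, generations=50,
           population_size=20, mutation_rate=0.3, seed=0):
    """Evolve a randomly initialized population of states; return the best one."""
    rng = random.Random(seed)
    population = [frozenset(rng.sample(range(n_compounds), size))
                  for _ in range(population_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(s) for s in population]
        low = min(scores)
        # Probabilistic selection weighted by relative fitness (shifted to be positive).
        weights = [s - low + 1e-9 for s in scores]
        next_population = [best]  # elitism: an assumption, keeps the best state
        while len(next_population) < population_size:
            a, b = rng.choices(population, weights=weights, k=2)
            child = crossover(a, b, size, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, n_compounds, rng)
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best
```

Here `fitness` plays the role of the Objective Function 105 ; any callable that scores a set of indices can be passed in.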
- the Selector 106 can also use any other suitable search/optimization algorithm to identify the optimal (or a nearly optimal) Directed Diversity Library 108 .
- step 604 can vary, depending on, among other things, the Selection Criteria 104 specified by the Experiment Planner 130 .
- the implementation of these methods should be straightforward to persons skilled in the art.
- the Selector 106 uses Simulated Annealing 836 to identify a set of 50 compounds from a 10,000-membered Compound Library 102 that maximize the Objective Function 105 given by EQ. 13:
- the system encodes a state by a pair of index lists, one containing the indices of the compounds currently in the set (Included Set), and another containing the indices of the compounds not currently in the set (Excluded Set).
- a step, i.e. a modification of the composition of the current state, is performed by swapping one or more indices between the Included and Excluded Sets. The search was carried out in 30 temperature cycles, using 1,000 sampling steps per cycle, an exponential cooling schedule, and the Metropolis acceptance criterion.
- the results of the simulation are shown in FIG. 14, where, as the simulation progresses, the selected compounds assume an optimal distribution, i.e. the diversity (spread) of these compounds is maximized.
- the set of compounds highlighted in FIG. 14 represent a Directed Diversity Library 108 for the next iteration, selected according to the prescribed Selection Criteria 104 and the Objective Function 105 in EQ. 13.
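The Included/Excluded Set encoding and the annealing schedule described in this example can be sketched as follows. The defaults mirror the stated 50 compounds, 30 temperature cycles and 1,000 steps per cycle, but EQ. 13 is not reproduced in this text, so the `spread` function (mean nearest-neighbor distance) is only a hypothetical stand-in for the diversity objective. The start and end temperatures are likewise assumptions, and the sketch requires `k` to be smaller than the library size.

```python
import math
import random


def spread(points):
    """Hypothetical diversity score: mean distance to the nearest other point."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def anneal_selection(library, k=50, cycles=30, steps_per_cycle=1000,
                     t_start=1.0, t_end=0.01, objective=spread, seed=0):
    """Select k library members that (approximately) maximize the objective."""
    rng = random.Random(seed)
    indices = list(range(len(library)))
    rng.shuffle(indices)
    included, excluded = indices[:k], indices[k:]  # the two index lists
    score = objective([library[i] for i in included])
    # Exponential cooling: multiply the temperature by a constant ratio per cycle.
    ratio = (t_end / t_start) ** (1.0 / max(cycles - 1, 1))
    temperature = t_start
    for _ in range(cycles):
        for _ in range(steps_per_cycle):
            a = rng.randrange(len(included))
            b = rng.randrange(len(excluded))
            # One step: swap an index between the Included and Excluded Sets.
            included[a], excluded[b] = excluded[b], included[a]
            new_score = objective([library[i] for i in included])
            delta = new_score - score
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                score = new_score  # Metropolis criterion: accept
            else:
                included[a], excluded[b] = excluded[b], included[a]  # undo swap
        temperature *= ratio
    return sorted(included), score
```

Swapping a single pair of indices keeps the selection size constant, so every visited state automatically satisfies the fixed-size constraint.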
- the Selector 106 uses Simulated Annealing 836 to identify a set of 50 compounds from a 10,000-membered Compound Library 102 that maximize the Objective Function 105 given by EQ. 14:
- the results of the simulation are shown in FIG. 15. As the simulation progresses, the selected compounds assume an optimal distribution, i.e. the selected compounds cluster tightly around the specified reference compounds.
- the set of compounds highlighted in FIG. 15 represent a Directed Diversity Library 108 for the next iteration, selected according to the prescribed Selection Criteria 104 and the Objective Function 105 in EQ. 14.
- the Selector 106 uses Simulated Annealing 836 to identify a set of 50 compounds from a 10,000-membered Compound Library 102 that maximize the Objective Function 105 given by EQ. 15:
- EQ. 15 represents a Multi-Objective Function, i.e. an Objective Function 105 that combines two, rather than one, Selection Criteria 104 .
- the Objective Function 105 in EQ. 15 represents an Objective Function 105 that combines molecular diversity and molecular similarity. That is, the Objective Function 105 in EQ. 15 favors solutions that are both diverse and focused.
- as can be seen from FIG. 16, as the simulation progresses, the selected compounds assume an optimal distribution, i.e. the selected compounds become both diverse and focused.
- the set of compounds highlighted in FIG. 16 represent a Directed Diversity Library 108 for the next iteration, selected according to the prescribed Selection Criteria 104 and the Objective Function 105 in EQ. 15.
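A Multi-Objective Function that rewards both diversity and focus can be sketched as a weighted sum. EQ. 15 is not reproduced in this text, so the functional form, the weights, and the names `diversity`, `focus`, and `combined_objective` are all assumptions made purely for illustration.

```python
import math


def diversity(points):
    """Hypothetical diversity term: mean distance to the nearest other point."""
    total = 0.0
    for i, p in enumerate(points):
        total += min(math.dist(p, q) for j, q in enumerate(points) if j != i)
    return total / len(points)


def focus(points, references):
    """Hypothetical similarity term: negated mean distance to the nearest
    reference compound, so that tighter clustering scores higher."""
    total = sum(min(math.dist(p, r) for r in references) for p in points)
    return -total / len(points)


def combined_objective(points, references, w_diversity=1.0, w_focus=1.0):
    """Weighted sum of the two criteria; the weights set each criterion's influence."""
    return w_diversity * diversity(points) + w_focus * focus(points, references)
```

For the two points `(0, 0)` and `(3, 4)` with a single reference at `(0, 0)`, the diversity term is 5.0 and the focus term is -2.5, so equal weights give 2.5.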
- in steps 606 and 608 , compounds specified in the Directed Diversity Library 108 are retrieved or synthesized. Each of steps 606 and 608 is optional in the sense that either one or both can be performed. In one embodiment, steps 606 and 608 are both employed: when compounds specified in the Directed Diversity Library 108 were previously synthesized, they are retrieved from a chemical inventory in step 606 rather than re-synthesized; when compounds specified in the Directed Diversity Library 108 were not previously synthesized, they are synthesized in step 608 . Alternatively, either of steps 606 and 608 could be employed exclusively or could be employed with other methods.
- the Directed Diversity Manager 310 retrieves compounds specified in the Directed Diversity Library 108 that are available in the Chemical Inventory 110 .
- the Chemical Inventory 110 represents any source of available compounds including, but not limited to, a corporate chemical inventory, a supplier of commercially available chemical compounds, a natural product collection, etc.
- the Directed Diversity Manager 310 searches the Chemical Inventory 110 to identify and retrieve existing compounds of the Directed Diversity Library 108 .
- a subset of the Directed Diversity Library 108 as determined by user input, for example, can be searched for and retrieved from the Chemical Inventory 110 .
- in step 608 , the compounds in the Directed Diversity Library 108 that were not retrieved from the Chemical Inventory 110 in step 606 are synthesized.
- step 608 is performed by one or more automated robotic Synthesis Modules 112 that receive Robotic Synthesis Instructions 204 from the Synthesis Protocol Generator 202 .
- the Directed Diversity Manager 310 selects Reagent Data 136 from the Reagent Database 138 and generates Robotic Synthesis Instructions 204 .
- the Reagent Data 136 identifies Reagents 114 in the Reagent Inventory 116 that are to be mixed by the one or more Synthesis Modules 112 .
- the Robotic Synthesis Instructions 204 identify the manner in which such Reagents 114 are to be mixed.
- the manner of mixing can include identifying Reagents 114 to be mixed together, and specifying chemical and/or physical conditions for mixing, such as temperature, length of time, stirring, etc.
- the one or more Synthesis Modules 112 synthesize compounds in the Directed Diversity Library 108 , using selected Reagents 114 from the Reagent Inventory 116 , in accordance with the Robotic Synthesis Instructions 204 .
- optional step 608 is performed semi-automatically or manually.
- the chemical compounds that were retrieved from the Chemical Inventory 110 and/or synthesized by the Synthesis Modules 112 (or synthesized manually) collectively represent physical compounds from a Directed Diversity Library 108 .
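The split between retrieval (step 606 ) and synthesis (step 608 ) can be sketched as a simple partition of the library. The function name `plan_acquisition` and the representation of the Chemical Inventory 110 as a set of compound identifiers are assumptions for illustration.

```python
def plan_acquisition(library, inventory):
    """Partition a Directed Diversity Library into compounds to retrieve
    from inventory and compounds that still have to be synthesized."""
    available = set(inventory)
    to_retrieve = [c for c in library if c in available]        # step 606
    to_synthesize = [c for c in library if c not in available]  # step 608
    return to_retrieve, to_synthesize
```

Together the two lists cover the whole library, which matches the statement that the retrieved and synthesized compounds collectively represent the physical library.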
- one or more Analysis Modules 118 analyze the compounds in the Directed Diversity Library 108 to obtain Structure-Property data 124 , pertaining to the compounds.
- the Analysis Modules 118 receive compounds that were retrieved from the Chemical Inventory 110 in step 606 and compounds that were synthesized by the Synthesis Modules 112 in step 608 .
- one or more Assay Modules 402 can robotically assay the chemical compounds in the Directed Diversity Library 108 to obtain Physical Properties Data 516 , Chemical Properties Data 520 and Biological Properties Data 524 , pertaining to the chemical compounds.
- the Enzyme Activity Assay Module 404 can robotically assay the chemical compounds using well known assay techniques to obtain enzyme activity data relating to the compounds.
- Enzyme activity data can include inhibition constants K i , maximal velocity V max , etc.
- the Cellular Activity Assay Module 406 can robotically assay the compounds using well known assay techniques to obtain cellular activity data relating to the compounds.
- the Toxicology Assay Module 408 can robotically assay the compounds using well known assay techniques to obtain toxicology data relating to the compounds.
- the Bioavailability Assay Module 410 can robotically assay the compounds using well known assay techniques to obtain bioavailability data relating to the compounds.
- the enzyme activity data, cellular activity data, toxicology data, and bioavailability data represent the Physical Properties Data 516 , Chemical Properties Data 520 and Biological Properties Data 524 .
- Physical Properties Data 516 can be obtained by the Physical and Electronic Properties Analysis Module 416 .
- the Physical and Electronic Properties Analysis Module 416 can analyze the chemical compounds contained in the Directed Diversity Library 108 to obtain Electronic Properties Data 516 pertaining to the chemical compounds.
- the Electronic Properties Data 516 is stored in the Physical and Electronic Properties Database 504 during step 614 .
- the 3D Receptor Mapping Module 418 can obtain 3D Receptor Map Data 522 representing the three-dimensional structure pertaining to a receptor binding site being tested.
- the 3D Receptor Mapping Module 418 preferably determines the three-dimensional structure of the receptor binding site empirically through X-ray crystallography, nuclear magnetic resonance spectroscopy, and/or as a result of the application of 3D QSAR and receptor field analysis procedures.
- the Receptor Map Data 522 is stored in the Receptor Map Database 510 during step 614 .
- an optional Structure and Composition Analysis Module 414 can analyze the chemical compounds contained in the Directed Diversity Library 108 to obtain Structure and Composition Data 514 pertaining to the chemical compounds.
- the Structure and Composition Data 514 is stored in the Structure and Composition Database 502 during step 614 .
- step 612 is performed robotically, under control of one or more computer programs. Alternatively, step 612 can be performed manually or by some combination of the two.
- the one or more Analysis Modules 118 store the Structure-Property Data 124 obtained in step 612 .
- the Structure-Property Data 124 can be stored in the Structure-Property Database 126 of the Data Storage Device 314 .
- the Structure-Property Database 126 can also store Historical Structure-Property Data 128 .
- Historical Structure-Property Data 128 can be associated with chemical compounds that were synthesized and analyzed in previous iterations by the Synthesis Modules 112 and the Analysis Modules 118 , respectively.
- Historical Structure-Property Data 128 can also include other pertinent Structure-Property Data obtained from independent experiments.
- the Physical Properties Data 516 can be stored in the Physical and Electronic Properties Database 504 .
- the Chemical Properties Data 520 can be stored in the Chemical Properties Database 508 .
- the Biological Properties Data 524 can be stored in the Biological Properties Database 512 .
- the process begins at step 702 , where the Structure and Composition Analysis Module 414 analyzes chemical compounds to obtain Structure and Composition Data 514 .
- the Structure and Composition Analysis Module 414 analyzes the chemical compounds using well known mass spectra analysis techniques.
- step 704 the Structure and Composition Data 514 is stored in a Structure and Composition Database 502 that forms part of the Structure-Property Database 126 .
- the Chemical Synthesis Indicia Generator 412 retrieves predicted Structure and Composition Data 514 relating to the compounds.
- the data is retrieved from the Structure-Property Database 126 .
- the retrieved data includes predicted mass and structural data for the compounds.
- step 708 the Chemical Synthesis Indicia Generator 412 compares the measured Structure and Composition Data 514 to the predicted data to generate Chemical Synthesis Indicia 518 . Based on the comparisons, the Chemical Synthesis Indicia 518 identifies chemical compounds that were adequately synthesized and chemical compounds that were not adequately synthesized.
- the Chemical Synthesis Indicia Generator 412 compares the measured mass of each compound to the predicted mass of the compound. If the measured mass and the predicted mass differ by less than a predetermined amount, the Chemical Synthesis Indicia Generator 412 determines that the chemical compound was adequately synthesized. If the measured mass and the predicted mass differ by more than the predetermined amount, the Chemical Synthesis Indicia Generator 412 determines that the chemical compound was not adequately synthesized. This predetermined amount can depend on the sensitivity of the instrument used for the structure and composition analysis.
- step 710 the Chemical Synthesis Indicia Generator 412 generates Chemical Synthesis Indicia 518 pertaining to the compounds in the Directed Diversity Library 108 , and stores such Chemical Synthesis Indicia 518 in the Chemical Synthesis Database 506 .
- the Chemical Synthesis Indicia 518 for each compound is a first value (such as “1”) if the compound was adequately synthesized (as determined in step 708 ), and is a second value (such as “0”) if the compound was not adequately synthesized.
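The mass comparison of step 708 and the resulting indicia value can be sketched as below. The function name and the example tolerance are hypothetical. The text does not say how a difference exactly equal to the predetermined amount is treated, so this sketch assumes it counts as not adequately synthesized.

```python
def synthesis_indicia(measured_mass, predicted_mass, tolerance):
    """Return 1 if the measured and predicted masses differ by less than the
    predetermined amount (adequately synthesized), otherwise 0."""
    return 1 if abs(measured_mass - predicted_mass) < tolerance else 0


def indicia_for_library(measured, predicted, tolerance):
    """Compute the indicia for every compound that has both a measured
    and a predicted mass; both inputs map compound id -> mass."""
    return {
        compound_id: synthesis_indicia(mass, predicted[compound_id], tolerance)
        for compound_id, mass in measured.items()
        if compound_id in predicted
    }
```

In practice the tolerance would be chosen from the sensitivity of the instrument used for the structure and composition analysis, as noted above.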
- after step 710 , control passes to step 616 .
- the Directed Diversity Manager 310 compares the Structure-Property Data 124 , pertaining to the compounds in the Directed Diversity Library 108 , to the Desired Properties 120 .
- the Desired Properties 120 might have been entered by human operators using the input device 316 , or read from a computer file.
- the Directed Diversity Manager 310 compares the data to determine whether any of the compounds substantially conforms to the Desired Properties 120 . When a compound substantially conforms to the Desired Properties 120 , it can be classified as a Lead compound 122 .
- the compounds can be rated in order to select new Leads 122 .
- the Directed Diversity Manager 310 can assign one or more rating factors to each compound in the Directed Diversity Library 108 , based on how closely the compound's properties match the Desired Properties 120 .
- the one or more rating factors can be represented by numerical or linguistic values. Numerical rating factors represent a sliding scale between a low value, corresponding to a property profile far from the Prescribed Set of Properties 120 , and a high value, corresponding to a property profile identical, or very similar, to the Prescribed Set of Properties 120 . Linguistic rating factors can include values such as “poor,” “average,” “good,” “very good,” etc.
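Numerical and linguistic rating factors of this kind can be sketched as follows. The scoring formula (one over one plus the mean relative deviation), the thresholds for the linguistic labels, and the property names used in the example are assumptions; the text only requires a sliding scale from low (far from the desired profile) to high (identical or very similar).

```python
def numeric_rating(observed, desired):
    """Score in (0, 1]: 1.0 when every property equals its desired value,
    approaching 0 as the mean relative deviation grows."""
    deviations = [
        abs(observed[name] - target) / (abs(target) or 1.0)
        for name, target in desired.items()
    ]
    return 1.0 / (1.0 + sum(deviations) / len(deviations))


def linguistic_rating(score):
    """Map a numerical rating onto hypothetical linguistic labels."""
    if score >= 0.9:
        return "very good"
    if score >= 0.7:
        return "good"
    if score >= 0.4:
        return "average"
    return "poor"
```

For instance, an observed value of 20.0 against a desired value of 10.0 has a relative deviation of 1.0 and so receives a numerical rating of 0.5, labeled "average" under these thresholds.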
- one or more Structure-Property Models 192 are generated and/or refined. Structure-Property Models 192 are generated and/or refined to conform to observed Structure-Property Data 124 and Historical Structure-Property Data 128 .
- the resulting Structure-Property Models 192 can be used by the Experiment Planner 130 and/or the Selector 106 to predict the properties of compounds in the Compound Library 102 whose real properties are hitherto unknown.
- the Structure-Property Models can be used by the Experiment Planner 130 to define and/or refine a set of Selection Criteria 104 that depend upon the predictions of the Structure-Property Models.
- Model Structures 820 are defined by Structure-Property Model Generator 800 .
- the Structure-Property Model Generator 800 can define Model Structures 820 based on Statistics 802 , Neural Networks 804 , Fuzzy Logic 806 , and/or other Model-Specific Methods 808 .
- the Model Structure 820 can combine elements of Statistics 802 , Neural Networks 804 , Fuzzy Logic 806 , and/or Model-Specific Methods 808 .
- Such Model Structures 820 are hereafter referred to as Hybrid Model Structures or Hybrid Models.
- step 1704 Structure-Property Model Generator 800 receives Structure-Property Data 124 and 128 .
- Structure-Property Data 124 and 128 is separated into Structure Data 824 and Property Data 828 .
- step 1706 Structure Data 824 is encoded as Encoded Structure Data 826 .
- Structure Data 824 is encoded in a form that is appropriate for the particular Model Structure 820 .
- Property Data 828 is encoded as Encoded Property Data 830 .
- Property Data 828 is encoded in a form that is appropriate for the particular Model Structure.
- the Trainer 822 optimizes, or trains, the Model Structure 820 that was generated in step 1702 .
- Trainer 822 uses Encoded Structure Data 826 , and Encoded Property Data 830 to derive one or more Structure-Property Models 842 .
- Trainer 822 uses one or more of Gradient Minimization 832 , Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 , depending upon the type of Model Structure 820 that is being optimized.
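Training a Model Structure by gradient minimization can be sketched with the simplest possible case: a linear model fitted to encoded data by batch gradient descent on the mean squared error. The choice of a linear model, the encoding of each structure as a list of numeric descriptors, and the names `train_linear_model` and `predict` are assumptions for illustration; the specification also allows neural, fuzzy, and hybrid Model Structures.

```python
def train_linear_model(encoded_structures, encoded_properties,
                       learning_rate=0.05, epochs=5000):
    """Fit property ~ w . x + b by batch gradient descent on mean squared error."""
    n = len(encoded_structures)
    n_features = len(encoded_structures[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(encoded_structures, encoded_properties):
            error = sum(w * xi for w, xi in zip(weights, x)) + bias - y
            for k, xi in enumerate(x):
                grad_w[k] += 2.0 * error * xi / n
            grad_b += 2.0 * error / n
        # Move against the gradient of the error.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


def predict(model, encoded_structure):
    """Predict the property of a compound whose real property is unknown."""
    weights, bias = model
    return sum(w * xi for w, xi in zip(weights, encoded_structure)) + bias
```

A trained model can then be applied to compounds in the Compound Library 102 that have not been analyzed, which is how the models feed back into the Selection Criteria. The learning rate must be small relative to the scale of the descriptors for the descent to converge.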
- step 618 is complete and control passes back to step 602 for defining another set of Selection Criteria 104 and/or Objective Functions 105 and then to step 604 for selecting another Directed Diversity Library 108 to analyze.
- the Directed Diversity Library 108 for the next iteration can be selected using one or more Selection Criteria 104 , one or more Objective Functions 105 , and one or more selection phases.
- a selection phase refers to a single run of the Selector 106 using a Monte-Carlo Sampling 834 , Simulated Annealing 836 , Evolutionary Programming 838 , and/or a Genetic Algorithm 840 .
Abstract
Description
- This application is related to commonly owned U.S. provisional patent application No. 60/030,187, filed Nov. 4, 1996.
- The present invention relates generally to the generation of chemical entities with defined physical, chemical and/or bioactive properties, and more particularly, to iterative selection and testing of chemical entities.
- Conventionally, new chemical entities with useful properties are generated by identifying a chemical compound (called a “lead compound”) with some desirable property or activity, creating variants of the lead compound, and evaluating the property and activity of those variant compounds. Examples of chemical entities with useful properties include paints, finishes, plasticizers, surfactants, scents, flavorings, and bioactive compounds, but can also include chemical compounds with any other useful property that depends upon chemical structure, composition, or physical state. Chemical entities with desirable biological activities include drugs, herbicides, pesticides, veterinary products, etc. There are a number of flaws with this conventional approach to lead generation, particularly as it pertains to the discovery of bioactive compounds.
- One deficiency pertains to the first step of the conventional approach, i.e., the identification of lead compounds. Traditionally, the search for lead compounds has been limited to an analysis of compound banks, for example, available commercial, custom, or natural products chemical libraries. Consequently, a fundamental limitation of the conventional approach is the dependence upon the availability, size, and structural diversity of these chemical libraries. Although chemical libraries cumulatively total an estimated 9 million identified compounds, they reflect only a small sampling of all possible organic compounds with molecular weights less than 1200. Moreover, only a small subset of these libraries is usually accessible for biological testing. Thus, the conventional approach is limited by the relatively small pool of previously identified chemical compounds which may be screened to identify new lead compounds.
- Also, compounds in a chemical library are traditionally screened (for the purpose of identifying new lead compounds) using a combination of empirical science and chemical intuition. However, as stated by Rudy M. Baum in his article “Combinatorial Approaches Provide Fresh Leads for Medicinal Chemistry,” C&EN, Feb. 7, 1994, pages 20-26, “chemical intuition, at least to date, has not proven to be a particularly good source of lead compounds for the drug discovery process.”
- Another deficiency pertains to the second step of the conventional approach, i.e., the creation of variants of lead compounds. Traditionally, lead compound variants are generated by chemists using conventional chemical synthesis procedures. Such chemical synthesis procedures are manually performed by chemists. Thus, the generation of lead compound variants is very labor intensive and time consuming. For example, it typically takes many chemist years to produce even a small subset of the compound variants for a single lead compound. Baum, in the article referenced above, states that “medicinal chemists, using traditional synthetic techniques, could never synthesize all of the possible analogs of a given, promising lead compound.” Thus, the use of conventional, manual procedures for generating lead compound variants operates to impose a limit on the number of compounds that can be evaluated as new drug leads. Overall, the traditional approach to new lead generation is an inefficient, labor-intensive, time consuming process of limited scope.
- Recently, attention has focused on the use of combinatorial chemical libraries to assist in the generation of new chemical compound leads. A combinatorial chemical library is a collection of diverse chemical compounds generated by either chemical synthesis or biological synthesis by combining a number of chemical “building blocks” such as reagents. For example, a linear combinatorial chemical library such as a polypeptide library is formed by combining a set of chemical building blocks called amino acids in every possible way for a given compound length (i.e., the number of amino acids in a polypeptide compound). Millions of chemical compounds theoretically can be synthesized through such combinatorial mixing of chemical building blocks. For example, one commentator has observed that the systematic, combinatorial mixing of 100 interchangeable chemical building blocks results in the theoretical synthesis of 100 million tetrameric compounds or 10 billion pentameric compounds (Gallop et al., “Applications of Combinatorial Technologies to Drug Discovery, Background and Peptide Combinatorial Libraries,” J. Med. Chem. 37, 1233-1250 (1994)).
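The library sizes quoted from Gallop et al. follow from simple counting: with n interchangeable building blocks and a linear compound of length L, each position can be filled independently, giving n to the power L compounds. A short check, using a hypothetical helper name:

```python
def library_size(n_building_blocks, length):
    """Number of linear compounds of the given length when each position
    can hold any one of the interchangeable building blocks."""
    return n_building_blocks ** length


tetramers = library_size(100, 4)  # 100 million
pentamers = library_size(100, 5)  # 10 billion
```

Each additional position multiplies the library size by the number of building blocks, which is why larger compounds increase diversity so quickly.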
- To date, most work with combinatorial chemical libraries has been limited only to peptides and oligonucleotides for the purpose of identifying bioactive agents; little research has been performed using non-peptide, non-nucleotide based combinatorial chemical libraries. It has been shown that the compounds in peptide and oligonucleotide based combinatorial chemical libraries can be assayed to identify ones having bioactive properties. However, there is no consensus on how such compounds (identified as having desirable bioactive properties and desirable profile for medicinal use) can be used.
- Some commentators speculate that such compounds could be used as orally efficacious drugs. This is unlikely, however, for a number of reasons. First, such compounds would likely lack metabolic stability. Second, such compounds would be very expensive to manufacture, since the chemical building blocks from which they are made most likely constitute high priced reagents. Third, such compounds would tend to have a large molecular weight, such that they would have bioavailability problems (i.e., they could only be taken by injection).
- Others believe that the compounds from a combinatorial chemical library that are identified as having desirable biological properties could be used as lead compounds. Variants of these lead compounds could be generated and evaluated in accordance with the conventional procedure for generating new bioactive compound leads, described above. However, the use of combinatorial chemical libraries in this manner does not solve all of the problems associated with the conventional lead generation procedure. Specifically, the problem associated with manually synthesizing variants of the lead compounds is not resolved.
- In fact, the use of combinatorial chemical libraries to generate lead compounds exacerbates this problem. Greater and greater diversity has often been achieved in combinatorial chemical libraries by using larger and larger compounds (that is, compounds having a greater number of variable subunits, such as pentameric compounds instead of tetrameric compounds in the case of polypeptides). However, it is more difficult, time consuming, and costly to synthesize variants of larger compounds. Furthermore, the real issues of structural and functional group diversity are still not directly addressed; bioactive agents such as drugs and agricultural products possess diversity that could never be achieved with available peptide and oligonucleotide libraries since the available peptide and oligonucleotide components only possess limited functional group diversity and limited topology imposed through the inherent nature of the available components. Thus, the difficulties associated with synthesizing variants of lead compounds are exacerbated by using typical peptide and oligonucleotide combinatorial chemical libraries to produce such lead compounds. The issues described above are not limited to bioactive agents but rather to any lead generating paradigm for which a chemical agent of defined and specific activity is desired.
- Additional drawbacks to conventional systems are described in U.S. Pat. No. 5,574,656, titled, “System and Method of Automatically Generating Chemical Compounds with Desired Properties,” issued Nov. 12, 1996, incorporated herein in its entirety by reference.
- Thus, the need remains for a system and method for efficiently and effectively generating new leads designed for specific utilities.
- The present invention is an automatic, partially automatic, and/or manual iterative system, method and/or computer program product for generating chemical entities having desired or specified physical, chemical, functional, and/or bioactive properties. The present invention is also directed to the chemical entities produced by this system, method and/or computer program product. In an embodiment, the following steps are performed during each iteration:
- (1) identify a set of compounds for analysis;
- (2) collect, acquire or synthesize the identified compounds;
- (3) analyze the compounds to determine one or more physical, chemical and/or bioactive properties (structure-property data); and
- (4) use the structure-property data to identify another set of compounds for analysis in the next iteration.
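The four steps above can be sketched as a loop in which the data gathered in one iteration drives the selection in the next. The three callables `select`, `acquire`, and `analyze` are hypothetical placeholders for the selection, collection/synthesis, and analysis stages; nothing here models the actual modules.

```python
def run_iterations(select, acquire, analyze, n_iterations):
    """Run the iterative process and return the accumulated structure-property data."""
    structure_property_data = {}
    for _ in range(n_iterations):
        # (1)/(4): identify compounds, guided by the data gathered so far.
        compounds = select(structure_property_data)
        # (2): collect, acquire or synthesize the identified compounds.
        samples = acquire(compounds)
        # (3): analyze the compounds and record their properties.
        structure_property_data.update(analyze(samples))
    return structure_property_data


def demo():
    """Toy run: pick the next two untested integers and 'measure' their squares."""
    def select(data):
        return [c for c in range(6) if c not in data][:2]

    return run_iterations(select, lambda cs: cs, lambda cs: {c: c * c for c in cs}, 2)
```

In the toy run, the second iteration selects different compounds from the first because the selection function sees the data recorded so far.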
- For purposes of illustration, the present invention is described herein with respect to the production of drug leads. However, the present invention is not limited to this embodiment.
- In one embodiment, the system and computer program product includes an Experiment Planner, a Selector, a Synthesis Module and an Analysis Module. The system also includes one or more databases, such as a Structure-Property database, a Compound Database, a Reagent database and a Compound Library.
- The Experiment Planner receives, among other things, Historical Structure-Property data from the Structure-Property database and current Structure-Property data that was generated by the Analysis Module during a prior iteration of the invention.
- The Experiment Planner generates Selection Criteria for use by the Selector. One or more of the Selection Criteria can be combined into one or more Objective Functions. An Objective Function describes the collective ability of a given subset of compounds from the Compound Library to simultaneously satisfy all the prescribed Selection Criteria. An Objective Function defines the influence of each Selection Criterion in the final selection. The Selection Criteria and the exact form of the Objective Function can be specified by a human operator or can be automatically generated by a computer program or other process, or can be specified via human/computer interaction.
- The one or more Selection Criteria and/or Objective Functions can represent: one or more desired characteristics that the resulting compounds should possess, individually or collectively; one or more undesired characteristics that the resulting compounds should not possess, individually or collectively; and/or one or more constraints that exclude certain compounds and/or combinations of compounds in order to limit the scope of the selection. The Selection Criteria can be in the form of mathematical functions or computer algorithms, and can be calculated using a digital computer.
- The Selector receives the Selection Criteria and Objective Functions and searches the Compound Library to identify a subset of compounds that maximizes or minimizes the Objective Functions. The Compound Library can be a collection of pre-existing or virtual chemical compounds.
- The Selector identifies a smaller subset of these compounds, referred to herein as a Directed Diversity Library, based on one or more Selection Criteria and/or Objective Functions. The number of compounds in this subset can be specified by the operator or can be determined automatically or partially automatically within any limits specified by the operator.
- The Selection Criteria can be applied either simultaneously or sequentially. For example, in one embodiment, one part of the Directed Diversity Library can be selected based on a first set of Selection Criteria and/or Objective Functions, while another part of that Directed Diversity Library can be selected based on a second set of Selection Criteria and/or Objective Functions.
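- As a rough illustration of how Selection Criteria might be combined into an Objective Function, the sketch below scores a subset by a weighted sum of a diversity criterion and a cost criterion, then searches for the best subset. The compound records, the two criteria, the weights and the exhaustive search are all assumptions made for this example; a realistic library would need a heuristic search instead.

```python
from itertools import combinations

# Hypothetical compound records: name -> (descriptor value, cost). Assumed data.
COMPOUNDS = {
    "A": (0.0, 1.0),
    "B": (0.1, 1.0),
    "C": (0.5, 2.0),
    "D": (1.0, 1.0),
    "E": (0.9, 5.0),
}


def diversity(subset):
    """Smallest pairwise descriptor distance in the subset (higher = more diverse)."""
    return min(abs(COMPOUNDS[a][0] - COMPOUNDS[b][0]) for a, b in combinations(subset, 2))


def total_cost(subset):
    """Sum of the (assumed) acquisition costs of the compounds in the subset."""
    return sum(COMPOUNDS[c][1] for c in subset)


def objective(subset, w_div=1.0, w_cost=0.1):
    """Weighted sum of criteria; the weights set each criterion's influence."""
    return w_div * diversity(subset) - w_cost * total_cost(subset)


def select(k, **weights):
    """Exhaustive search over all k-subsets; feasible only for tiny libraries."""
    return max(combinations(sorted(COMPOUNDS), k), key=lambda s: objective(s, **weights))


best = select(3)  # the subset that maximizes the objective
```

- With these assumed weights the selected subset is A, C and D: it spreads the descriptor values widely while avoiding the expensive compound E. Changing `w_cost` changes how strongly cost influences the final selection.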
- The compounds comprising the Directed Diversity Library are then collected, acquired or synthesized, and are analyzed to evaluate their physical, chemical and/or bioactive properties of interest. In one embodiment, when a compound in a Directed Diversity Library is available in a Chemical Inventory, the compound is retrieved from the Chemical Inventory. This avoids unnecessary time and expense of synthesizing a compound that is already available. Compounds that are not available from a Chemical Inventory are synthesized in the Synthesis Module.
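- The retrieve-or-synthesize decision described above amounts to splitting the Directed Diversity Library against the Chemical Inventory. A minimal sketch follows; the function name and the compound identifiers are hypothetical.

```python
def partition_library(directed_diversity_library, chemical_inventory):
    """Split a list of compounds into those to retrieve and those to synthesize."""
    inventory = set(chemical_inventory)
    to_retrieve = [c for c in directed_diversity_library if c in inventory]
    to_synthesize = [c for c in directed_diversity_library if c not in inventory]
    return to_retrieve, to_synthesize


# Hypothetical identifiers, for illustration only.
retrieve, synthesize = partition_library(
    ["cpd-1", "cpd-2", "cpd-3"],
    {"cpd-2", "cpd-9"},
)
```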
- In one embodiment, the Synthesis Module is an automated robotic module that receives synthesis instructions from a Synthesis Protocol Generator. Alternatively, synthesis can be performed manually or semi-automatically.
- The Analysis Module receives the compounds of the Directed Diversity Library from the Chemical Inventory and/or the Synthesis Module. The Analysis Module analyzes the compounds and outputs Structure-Property data. The Structure-Property data is provided to the Experiment Planner and is also stored in the Structure-Property database.
- The Experiment Planner defines one or more new Selection Criteria and/or Objective Functions for the next iteration of the invention. The new Selection Criteria and/or Objective Functions can be defined through operator input, through an automated process, through a partially automated process, or any combination thereof.
- In one embodiment, current and historical Structure-Property data are provided to an optional Structure-Property Model Generator. The Structure-Property data can include structure-property activity data from all previous iterations or from a subset of all previous iterations, as specified by user input, for example.
- The Structure-Property Model Generator generates Structure-Property Models that conform to the observed data. The Structure-Property Models are provided to the Experiment Planner, which uses the Models to generate subsequent Selection Criteria and/or Objective Functions. The Selection Criteria and/or Objective Functions are provided to the Selector, which selects the next Directed Diversity Library therefrom.
- In one embodiment, the functions of the Experiment Planner, the Selector and the optional Synthesis Protocol Generator are performed by automated machines under the control of one or more computer programs executed on one or more processors and/or human operators. Alternatively, one or more of the functions of the Experiment Planner, the Selector and the optional Synthesis Protocol Generator can be performed manually.
- The functions of the Synthesis Module and the Analysis Module can be performed manually, robotically, or by any combination thereof.
- Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Also, the leftmost digit(s) of the reference numbers identify the drawings in which the associated elements are first introduced.
- The present invention will be described with reference to the accompanying drawings, wherein:
- FIG. 1 is a flow diagram depicting the flow of data and materials among elements of a lead generation system, in accordance with the present invention;
- FIG. 2 is a flow diagram depicting the flow of data and materials among elements of an embodiment of the lead generation system, in accordance with the present invention;
- FIG. 3 is a block diagram of the lead generation system, in accordance with the present invention;
- FIG. 4 is a block diagram of an analysis module that can be employed by the lead generation system illustrated in FIG. 3;
- FIG. 5 is a block diagram of a structure-property database that can be employed by the lead generation system illustrated in FIG. 3;
- FIG. 6 is a process flowchart illustrating an iterative method for identifying chemical compounds having desired properties;
- FIG. 7 is a process flowchart illustrating a method for performing steps
- FIG. 8 is a flow diagram depicting the flow of data among elements of a structure-property model generator that can be employed by a lead generation system;
- FIG. 9 is an illustration of a generalized regression neural network model that can be generated by the structure-property model generator illustrated in FIG. 8 and that can employ K-Nearest-Neighbor classifiers;
- FIG. 10 is a flow diagram depicting the flow of data among elements of a fuzzy structure-property model that can be generated by the structure-property model generator illustrated in FIG. 8;
- FIG. 11 is a Neuro-Fuzzy structure-property model that can be generated by the structure-property model generator illustrated in FIG. 8;
- FIG. 12 is a flow diagram depicting the flow of data among an experiment planner and a selector in a lead generation system;
- FIG. 13 is a flow diagram depicting the flow of data during selection of a directed diversity library;
- FIG. 14 illustrates a distribution of compounds in a directed diversity library;
- FIG. 15 illustrates another distribution of compounds in a directed diversity library;
- FIG. 16 illustrates another distribution of compounds in a directed diversity library;
- FIG. 17 is a process flowchart illustrating a method for generating structure-property models in accordance with the present invention;
- FIG. 18 is a process flowchart illustrating a method for selecting a directed diversity library, in accordance with the present invention; and
- FIG. 19 is a block diagram of a computer system that can be used to implement one or more portions of the lead generation system illustrated in FIG. 3.
- 1. General Overview
- 2. Example Environment
- 3. Structure-Property Models
- a. Statistical Models
- b. Neural Network Models
- i. Generalized Regression Neural Networks
- c. Fuzzy Logic Models
- d. Hybrid Models
- i. Neuro-fuzzy Models
- e. Model-Specific Methods
- i. Docking Models
- ii. 3D QSAR Models
- 4. Experiment Planner 130
- a. Selection Criteria 104
- i. First Type of Selection Criteria 104
- ii. Second Type of Selection Criteria 104
- b. Objective Functions 105
- 5. Selector 106
- 6. Structure of the Present Invention
- 7. Operation of the Present Invention
- 8. Conclusions
- 1. General Overview
- The present invention is an iterative system, method and computer program product for generating chemical entities having desired physical, chemical and/or bioactive properties. The present invention iteratively selects, analyzes and evaluates Directed Diversity Libraries for desired properties. The present invention can be implemented as a fully or partially automated, computer-aided robotic system, or without any robotics. The present invention is also directed to the chemical entities generated by operation of the present invention.
- Conventional systems perform combinatorial chemical synthesis and analysis of static compound libraries. This tends to be scattershot and random, essentially constituting a “needle in a haystack” research paradigm.
- In contrast, the present invention employs a dynamic Compound Library. The Compound Library is dynamic in that the compounds comprising the Compound Library can change from one iteration of the present invention to the next. The dynamic Compound Library can expand and/or contract.
- The Compound Library includes chemical compounds that already exist and/or chemical compounds that can be synthesized on demand, either individually or combinatorially. The Compound Library can be a combinatorial chemical library, a set of combinatorial chemical libraries and/or non-combinatorial chemical libraries. However, the Compound Library is not limited to a combinatorial chemical library.
- Instead of searching and analyzing the whole Compound Library, the present invention identifies and analyzes particular subsets of the Compound Library. These subsets of the Compound Library are referred to herein as Directed Diversity Libraries. As opposed to conventional techniques, Directed Diversity Libraries provide an optimization approach that is focused and directed.
- 2. Example Environment
- Referring to the flow diagram in FIG. 1, a lead generation/optimization system 100 includes an Experiment Planner 130, a Selector 106, a Synthesis Module 112 and an Analysis Module 118. The system also includes one or more databases, such as: a Structure-Property Database 126, a Compound Database 134, a Reagent Database 138 and a Compound Library 102.
- The Selector 106 receives Selection Criteria 104 from the Experiment Planner 130. The Selector 106 can also receive one or more Objective Functions 105 from the Experiment Planner 130.
- The Selection Criteria 104 represent desired or undesired characteristics that the resulting compounds should or should not possess, either individually or collectively, and/or constraints that exclude certain compounds and/or combinations of compounds. The Selection Criteria 104 can be in the form of mathematical functions or computer algorithms, and can be calculated using a digital computer.
- One or more of the Selection Criteria 104 can be combined into one or more Objective Functions 105 by the Experiment Planner 130. The Objective Functions 105 describe the extent to which a given set of compounds should satisfy all the prescribed Selection Criteria 104. The Objective Functions 105 can define the influence of each Selection Criterion 104 in the selection of a Directed Diversity Library. The Selection Criteria 104 and the exact form of the Objective Functions 105 can be specified by a human operator or can be automatically or semi-automatically generated (with human input) by the Experiment Planner 130.
- The Selector 106 searches the Compound Library 102 to identify one or more subsets of compounds that maximize or minimize the Selection Criteria 104 and/or Objective Functions 105. Each such subset of compounds is referred to herein as a Directed Diversity Library 108. Note that the Directed Diversity Library 108 is a list of compounds. These compounds may or may not already exist (i.e., they may or may not be in the Chemical Inventory 110). The properties of the compounds in the Directed Diversity Library 108 are generally hitherto unknown. The number of compounds in a Directed Diversity Library can be specified by the operator, or can be determined automatically within any limits specified by the operator.
- The Selection Criteria 104 can be applied either simultaneously or sequentially. For example, in one embodiment of the present invention, one part of the Directed Diversity Library 108 can be selected based on a given set of Selection Criteria 104 and/or Objective Functions 105, while another part of that Directed Diversity Library 108 can be selected based on a different set of Selection Criteria 104 and/or Objective Functions 105. Thus, the present invention represents a multi-objective property refinement system, in the sense that one or more Selection Criteria 104 can be used, and one or more Objective Functions 105 can be pursued, during each iteration.
- Compounds from the Directed Diversity Libraries 108 are provided to the Analysis Module 118 for analysis. Alternatively, the compounds can be manually analyzed or partially manually analyzed and partially automatically analyzed. In one embodiment, one or more compounds in a Directed Diversity Library 108 that have previously been synthesized are retrieved from a Chemical Inventory 110 instead of being synthesized again. This saves the time and cost of re-synthesizing the selected compounds. The Chemical Inventory 110 represents any source of available compounds including, but not limited to, a corporate chemical inventory, a supplier of commercially available chemical compounds, a natural product collection, etc.
- A system and computer program product that determines whether a compound in a Directed Diversity Library 108 exists in the Chemical Inventory 110 can be implemented within the Selector Module 106, the Synthesis Module 112 or in any other module. For example, the Selector Module 106 can include instructions for searching the Chemical Inventory 110 to identify and retrieve any previously synthesized compounds therefrom that are listed in the Directed Diversity Library 108 (or a subset of the Directed Diversity Library 108, as determined by user input, for example).
- Compounds in the Directed Diversity Library 108 that are not retrieved from the Chemical Inventory 110 are synthesized individually or combinatorially by the Synthesis Module 112. The Synthesis Module 112 can retrieve and selectively combine Reagents 114 from the Reagent Inventory 116, in accordance with a prescribed chemical synthesis protocol.
- In one embodiment, the Synthesis Module 112 is used to robotically synthesize compounds. As used herein, the term “robotically” refers to any method that involves an automated or partially automated device that performs functions specified by instructions that the Synthesis Module 112 receives from the operator or some other component of the system of the present invention.
- For example, refer to FIG. 2, which is similar to FIG. 1, but which illustrates a Synthesis Protocol Generator 202 in the path to the Synthesis Module 112. The Synthesis Protocol Generator 202 provides Robotic Synthesis Instructions 204 to the Synthesis Module 112. The Synthesis Protocol Generator 202 receives a list of compounds in the Directed Diversity Library 108 to be synthesized. The Synthesis Protocol Generator 202 extracts, under computer control, Reagent Data 136 from a Reagent Database 138, and generates Robotic Synthesis Instructions 204 that will enable the Synthesis Module 112 to automatically or partially automatically synthesize the compounds in the Directed Diversity Library 108.
- The Robotic Synthesis Instructions 204 identify Reagents 114 from a Reagent Inventory 116 that are to be mixed by the Synthesis Module 112. The Robotic Synthesis Instructions 204 also identify the manner in which such Reagents 114 are to be mixed by the Synthesis Module 112. For example, the Robotic Synthesis Instructions 204 can specify which Reagents 114 are to be mixed together. The Robotic Synthesis Instructions 204 can also specify chemical and/or physical conditions, such as temperature, length of time, stirring, etc. for mixing of the specified Reagents 114.
- In one embodiment, compounds from the Directed Diversity Library 108 are manually synthesized and then delivered to the Analysis Module 118 for analysis.
- In one embodiment, a Compound Library 102 includes a single combinatorial chemical library that can be synthesized from approximately one hundred commercially available reagents that are suitable for generating thrombin inhibitors. Preferably, the Synthesis Module 112 combines these reagents using well-known synthetic chemistry techniques to synthesize inhibitors of the enzyme thrombin. Each inhibitor is generally composed of, but not restricted to, three chemical building blocks. Thus, the Directed Diversity Library 108 preferably comprises a plurality of thrombin inhibitors generally composed of, but not restricted to, three sites of variable structure (i.e. trimers).
- The present invention, however, is not limited to this thrombin example. One skilled in the art will recognize that the Compound Library 102 can include many other types of libraries. For example, the present invention is equally adapted and intended to generate other chemical compounds having other desired properties, such as paints, finishes, plasticizers, surfactants, scents, flavorings, bioactive compounds, drugs, herbicides, pesticides, veterinary products, etc., and/or lead compounds for any of the above. In fact, the present invention can generate chemical compounds having any useful properties that depend upon structure, composition, or state.
- As noted above, the compounds in the Directed Diversity Library 108, after being synthesized or retrieved from the Chemical Inventory 110, are provided to the Analysis Module 118 for analysis. Analysis can include chemical, biochemical, physical, and/or biological analysis.
- Preferably, the Analysis Module 118 assays the compounds in the Directed Diversity Library 108 to obtain, for example, enzyme activity data, cellular activity data, toxicology data, and/or bioavailability data. Optionally, the Analysis Module 118 analyzes the compounds to identify which of the compounds were adequately synthesized and which of the compounds were not adequately synthesized. The Analysis Module 118 further analyzes the compounds to obtain other pertinent data, such as structure and electronic structure data.
- The Analysis Module 118 also classifies any compounds that possess the Desired Properties 120 as Leads (lead compounds) 122. Alternatively, this function can be performed by another module such as, for example, the Experiment Planner 130 or the Selector Module 106.
- Analysis can be performed automatically, manually or semi-automatically/semi-manually.
- The Analysis Module 118 generates Structure-Property Data 124 for the analyzed compounds. Structure-Property Data 124 can include structure-property and/or structure-activity data. For example, Structure-Property Data 124 can include physical data, synthesis data, enzyme activity data, cellular activity data, toxicology data, bioavailability data, etc. The Structure-Property Data 124 is stored in a Structure-Property Database 126. The Structure-Property Data 124 is also provided to the Experiment Planner 130.
- The Experiment Planner 130 receives current Structure-Property Data 124 from the Analysis Module 118 and Historical Structure-Property Data 128 from the Structure-Property Database 126. Historical Structure-Property Data 128 can include well known structure-property or structure-activity relationship data, collectively referred to as Structure-Property Relationships or SPR, pertaining to one or more relationships between the properties and activities of a compound and the chemical structure of the compound.
- The Experiment Planner 130 also receives Compound Data 132 from the Compound Database 134, Reagent Data 136 from the Reagent Database 138 and Desired Properties 120. Desired Properties 120 can be sent from an automated system or database (not shown) or from user input. In one embodiment, the Experiment Planner 130 also receives one or more Structure-Property Models 192 from one or more optional Structure-Property Model Generators 190. The Experiment Planner 130 uses the above inputs to generate one or more Selection Criteria 104 and Objective Functions 105.
- Compound Data 132 and Reagent Data 136 permit the Experiment Planner 130 to include, for example, one or more of the following criteria in the Selection Criteria 104:
- (1) the molecular diversity of a given set of compounds (as used herein, molecular diversity refers to a collective propensity of a set of compounds to exhibit a variety of a prescribed set of structural, physical, chemical and/or biological characteristics);
- (2) the molecular similarity of a given compound or set of compounds with respect to one or more reference compounds (typically known leads);
- (3) the cost of a given compound or set of compounds if these compounds are to be retrieved from the Chemical Inventory 110, or the cost of the Reagents 114 if the compound(s) are to be synthesized by the Synthesis Module 112;
- (4) the availability of a given compound or set of compounds from the Chemical Inventory 110, or the availability of the Reagents 114 if the compound(s) are to be synthesized by the Synthesis Module 112;
- (5) the predicted ease of synthesis of a given compound or set of compounds if these compound(s) are to be synthesized by the Synthesis Module 112;
- (6) the predicted yield of synthesis of a given compound or set of compounds if these compound(s) are to be synthesized by the Synthesis Module 112;
- (7) the method of synthesis of a given compound or set of compounds if these compound(s) are to be synthesized by the Synthesis Module 112;
- (8) the predicted ability of a given compound or set of compounds to fit a receptor binding site;
- (9) the predicted ability of a given compound or set of compounds to bind selectively to a receptor binding site;
- (10) the predicted ability of a given compound or set of compounds to fit a 3-dimensional receptor map model;
- (11) the predicted bioavailability of a given compound or set of compounds as determined by one or more bioavailability models;
- (12) the predicted toxicity of a given compound or set of compounds as determined by one or more toxicity models; and/or
- (13) other selection criteria that can be derived from information pertaining to a given compound or set of compounds and that can be used to guide the selection of the Directed Diversity Library 108 for the next iteration of the system of the present invention.
- The optional Structure-Property Models 192 can be used by the Experiment Planner 130 to predict the properties of compounds in the Compound Library 102 whose real properties are hitherto unknown. The Structure-Property Models 192 are used by the Experiment Planner 130 to define and/or refine a set of Selection Criteria 104 that depend upon the predictions of one or more Structure-Property Models 192.
- Structure-Property Models 192 permit the Experiment Planner 130 to include one or more of the following in Selection Criteria 104:
- (1) the predicted ability of a given compound or set of compounds to exhibit one or more desired properties as predicted by one or more Structure-Property Models;
- (2) the predicted ability of a given compound or set of compounds to test the validity of one or more Structure-Property Models; and/or
- (3) the predicted ability of a given compound or set of compounds to discriminate between two or more Structure-Property Models (one or more Structure-Property models can be tested and evaluated in parallel).
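- Criterion (3) above can be illustrated with a small sketch: given two competing models, the compounds most useful for telling them apart are those on which the models disagree most. The two model functions, the single numeric descriptor and the candidate values below are hypothetical and chosen only for illustration.

```python
def most_discriminating(compounds, model_a, model_b, k):
    """Rank compounds by how strongly two models disagree about them."""
    return sorted(compounds, key=lambda x: abs(model_a(x) - model_b(x)), reverse=True)[:k]


# Two hypothetical competing models of one property as a function of a
# single descriptor x. Neither is taken from this document.
def linear_model(x):
    return 2.0 * x


def quadratic_model(x):
    return x * x


candidates = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
picked = most_discriminating(candidates, linear_model, quadratic_model, k=2)
```

- Here the two models agree exactly at 0.0 and 2.0, so measuring those compounds would say nothing about which model is better; the sketch instead picks 3.0 and 1.0, where the predictions differ most.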
- The functionality of the
Experiment Planner 130 can be achieved by an automated or partially automated process, or by a trained operator, aided or not by a computer. Further details of Structure-Property Models 192 are provided below. - The one or more
new Selection Criteria 104 andObjective Functions 105 are sent to theSelector 106 which uses them to select a newDirected Diversity Library 108 for the next iteration of the present invention. - Thus, in summary, the compounds in the new
Directed Diversity Library 108 are retrieved from theChemical Inventory 110 and/or synthesized by theSynthesis Module 112. TheAnalysis Module 118 analyzes the newDirected Diversity Library 108 to obtain Structure-Property Data 124 pertaining to the compounds in the newDirected Diversity Library 108. TheExperiment Planner 130 analyzes the new Structure-Property Data 124, Historical Structure-Property Data 128, and any ofCompound Data 132,Reagent Data 136, DesiredProperties 120 and Structure-Property Models 192, to identify a new set ofSelection Criteria 104. The new set ofSelection Criteria 104 can be used by theSelector 106 to select yet anotherDirected Diversity Library 108 for another iteration. - Thus, the present invention is an iterative system, method and/or computer program product for generating chemical entities, including new chemical entities, having a set of physical, chemical, and/or biological properties optimized towards a prescribed set of targets. During each iteration, a
Directed Diversity Library 108 is generated, the compounds in theDirected Diversity Library 108 are analyzed, Structure-Property Models are optionally derived and elaborated, a list ofSelection Criteria 104 are defined, and a newDirected Diversity Library 108 is selected for the next iteration. - Preferably, elements of the present invention are controlled by a data processing device (with or without operator input, intervention or control), such as a computer operating in accordance with software. Consequently, it is possible in the present invention to store massive amounts of data, and to utilize this data in a current iteration to generate
Selection Criteria 104 for the next iteration. - In particular, since the elements of the present invention are controlled by a data processing device, it is possible to store the Structure-
Property Data 124 obtained during each iteration. It is also possible to utilize the Historical Structure-Property Data 128 obtained during previous iterations, as well as other pertinent structure-property data obtained by other experiments, to generateSelection Criteria 104 for the next iteration. In other words, the selection of theDirected Diversity Library 108 for the next iteration is guided by the results of all previous iterations (or any subset of the previous iterations, as determined by user input, for example). Thus, the present invention “learns” from its past performance such that the present invention is “intelligent”. As a result, theLeads 122 identified in subsequent iterations are better (i.e. exhibit physical, chemical, and/or biological properties closer to the prescribed values) that theLeads 122 identified in prior iterations. - In one embodiment of the present invention, the
Compound Library 102 includes one or more combinatorial chemical libraries, comprised exclusively of compounds that can be synthesized by combining a set of chemical building blocks an a variety of combinations. According to this embodiment, theSynthesis Module 112 is used to robotically synthesize theDirected Diversity Library 108 during each iteration. - The integrated use of data processing devices (i.e. the
Experiment Planner 130, theSelector 106, theSynthesis Protocol Generator 202, theSynthesis Module 112, and the Analysis Module 118) in the present invention enables the automatic or semi-automatic and intelligent synthesis and screening of very large numbers of chemical compounds. - Additional details of the Structure-
Property Models 192,Selection Criteria 104,Objective Functions 105,Experiment Planner 130 and theSelector 106 are now provided. - 3. Structure-
Property Models 192 - In one embodiment of the present invention, one or more Structure-
Property Model Generators 190 generate Structure-Property Models 192 that conform to observed data. The Structure-Property Models 192 are used by theExperiment planner 130 to generateSelection Criteria 104 and/orObjective Functions 105. - Referring to FIG. 8, one embodiment of a Structure-
Property Model Generator 190 is illustrated as Structure-Property Model Generator 800. The Structure-Property Model Generator 800 defines aModel Structure 820 based onStatistics 802,Neural Networks 804,Fuzzy Logic 806, and/or other Model-Specific Methods 808. - Model-
Specific Methods 808 refer to methods that are specific to the application domain of the model. Examples of such Model-Specific Methods 808 are methods that compute the energy of a particular molecular conformation or receptor-ligand complex such as an empirical force field or a quantum-mechanical method, methods that align the 3-dimensional structures of two or more chemical compounds based on their shape, electronic fields and/or other criteria, methods that predict the affinity and binding conformation of a ligand to a particular receptor binding site, methods that construct receptor models based on the 3-dimensional structures of known ligands, etc. Examples of such Model-Specific Methods 808 are described in greater detail below. - The
Model Structure 820 can combine elements ofStatistics 802,Neural Networks 804,Fuzzy Logic 806, and/or Model-Specific Methods 808.Such Model Structures 820 are hereafter referred to as Hybrid Model Structures or Hybrid Models. An example of such aHybrid Model Architecture 820 is a Model Architecture that combines elements ofNeural Networks 804 andFuzzy Logic 806, hereafter referred to as a Neuro-Fuzzy Model Architecture or Neuro-Fuzzy Model. An example of a Neuro-Fuzzy Model Architecture is discussed in greater detail below. - One embodiment of a Structure-
Property Model Generator 800 includes a Trainer 822 that generates one or more Structure-Property Models 842 for a given Model Architecture 820. The Trainer 822 optimizes a particular Model Structure 820 using selected Structure-Property Data 124, 128 from the Structure-Property Database 126, as determined by user input, for example. Preferably, the Trainer 822 optimizes the Model Structure 820 by minimizing the error between the actual properties of selected compounds, as determined by the Analysis Module 118 (Structure-Property Data 124, 128), and the predicted properties of the compounds as determined by the Structure-Property Model 842. The error is referred to hereafter as the Structure-Property Prediction Error or Prediction Error.
- The process of minimizing the Prediction Error shall hereafter be referred to as Training. Preferably, the Trainer 822 minimizes the Prediction Error using a search/optimization method such as Gradient Minimization 832, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840. Alternatively, the Trainer 822 minimizes the Prediction Error using a hybrid search/optimization method that combines elements of Gradient Minimization 832, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840. An example of a hybrid method is a method that combines Simulated Annealing 836 with Gradient Minimization 832. Another example of a hybrid method is a method that combines Monte-Carlo Sampling 834 with Gradient Minimization 832. Examples of such methods are described in greater detail below.
- Preferably, the Structure-Property Data Structure Data 824 and Property Data 828. Structure Data 824 and Property Data 828 are preferably encoded as Encoded Structure Data 826 and Encoded Property Data 830. Encoding should be of a form that is appropriate for the particular Model Structure 820. The Encoded Structure Data 826 and Encoded Property Data 830 are used by the Trainer 822 to derive one or more final Structure-Property Models 842. The Trainer 822 can employ Gradient Minimization 832, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840. The Trainer 822 trains the Model Structure 820 using a suitably encoded version of the Structure-Property Data Property Data
- The Trainer 822 generates one or more Structure-Property Models 842 for a given Model Structure 820. In one embodiment, Structure-Property Models 842 are represented as a linear combination of basis functions of one or more molecular features (descriptors). The descriptors collectively represent the Encoded Structure Data 826.
- To illustrate the present invention, several example embodiments and implementations of the Structure-Property Model Generator 800 shall now be discussed in detail. These examples are provided to illustrate the present invention. The present invention is not limited to these examples.
- a. Statistical Models
- A Statistical Module 802 can define a Statistical Model Structure 820. When the Trainer 822 optimizes the Statistical Model Structure 820, the resultant Structure-Property Model 842 is referred to as a Statistical Structure-Property Model 842.
- In one embodiment, Structure-Property Models 192 are represented as a linear combination of basis functions of one or more molecular features (descriptors). The descriptors can include topological indices, physicochemical properties, electrostatic field parameters, volume and surface parameters, etc. The number of descriptors can range from a few tens to tens of thousands. For example, the descriptors can include, but are not limited to, molecular volume and surface areas, dipole moments, octanol-water partition coefficients, molar refractivities, heats of formation, total energies, ionization potentials, molecular connectivity indices, substructure keys, hashed fingerprints, atom pairs and/or topological torsions, atom layers, 2D and 3D auto-correlation vectors, 3D structural and/or pharmacophoric keys, electronic fields, etc.
- Such descriptors and their use in the fields of Quantitative Structure-Activity Relationships (QSAR) and molecular diversity are reviewed in Kier, L. B. and Hall, L. H., Molecular Connectivity in Chemistry and Drug Research, Academic Press, New York (1976); Kier, L. B. and Hall, L. H., Molecular Connectivity in Structure-Activity Analysis, Research Studies Press, Wiley, Letchworth (1986); Kubinyi, H., Methods and Principles in Medicinal Chemistry, Vol. 1, VCH, Weinheim (1993); and Agrafiotis, D. K., Encyclopedia of Computational Chemistry, Wiley (in press), the contents of which are incorporated herein by reference.
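- The representation of a model as a linear combination of basis functions of molecular descriptors can be illustrated with a minimal sketch. The particular basis functions, the descriptor ordering, and the coefficient values below are hypothetical and chosen only for illustration; they are not taken from the specification.

```python
import math

# Hypothetical basis functions. Each maps a descriptor vector to one number.
# descriptors[0] and descriptors[1] stand for any two molecular features.
BASIS_FUNCTIONS = [
    lambda d: 1.0,                            # constant term
    lambda d: d[0],                           # linear in descriptor 0
    lambda d: d[1] ** 2,                      # quadratic in descriptor 1
    lambda d: math.exp(-(d[0] - 1.0) ** 2),   # Gaussian centered at 1.0
]


def predict(coefficients, descriptors):
    """Evaluate the model: sum over k of c_k * f_k(descriptors)."""
    return sum(c * f(descriptors)
               for c, f in zip(coefficients, BASIS_FUNCTIONS))


if __name__ == "__main__":
    # Illustrative coefficients and descriptor values only.
    print(predict([0.5, 2.0, -1.0, 3.0], [1.0, 2.0]))
```

- In this sketch only the coefficients would be fitted to data; the choice of which basis functions and features to include is a separate selection problem.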
- In one embodiment, the coefficients of the linear combination of the basis functions of Statistical Structure-Property Models 842 are determined using linear regression techniques. If many features are used, linear regression can be combined with principal component analysis, factor analysis, and/or multi-dimensional scaling. These are well-known techniques for reducing the dimensionality and extracting the most important features from a large table.
- In one embodiment, the basis functions and/or features used by the Trainer 822 to optimize the Statistical Structure-Property Models 842 are selected using Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840. A method for selecting the basis functions and/or features using a Genetic Algorithm 840, known as a genetic function approximation (GFA), is described in Rogers and Hopfinger, J. Chem. Inf. Comput. Sci., 34: 854 (1994), incorporated herein by reference in its entirety.
- In the GFA algorithm, a Structure-Property Model 842 is represented as a linear string that encodes the features and basis functions employed by the model. A population of linearly encoded Structure-Property Models 842 is then initialized by a random process, and allowed to evolve through the repeated application of genetic operators, such as crossover, mutation and selection. Selection is based on the relative fitness of the models, as measured by a least-squares error procedure, for example. Friedman's lack-of-fit algorithm, described in J. Friedman, Technical Report No. 100, Laboratory for Computational Statistics, Department of Statistics, Stanford University, Stanford, Calif., November 1988, herein incorporated by reference in its entirety, or other suitable metrics well known to persons skilled in the art, can also be used. GFA can build models using linear polynomials as well as higher-order polynomials, splines and Gaussians. Upon completion, the procedure yields a population of models, ranked according to their fitness score.
- Another method for selecting basis functions and/or features is described in Luke, J. Chem. Inf. Comput. Sci., 34: 1279 (1994), incorporated herein by reference in its entirety. This method is similar to the GFA method of Rogers and Hopfinger described above, but uses Evolutionary Programming 838 instead of a Genetic Algorithm 840 to control the evolution of the population of models.
- Alternatively, the basis functions and/or features can be selected using a Monte-Carlo Sampling 834 or Simulated Annealing 836 technique. In this embodiment, an initial model is generated at random, and is gradually refined by a series of small stochastic ‘steps’. Here, the term ‘step’ is taken to imply a stochastic (random or semi-random) modification of the model's underlying structure.
- As in the GFA algorithm, the model in this embodiment is also defined as a linear combination of basis functions, whose coefficients are determined by linear regression. At each step, the model is modified by a ‘small’ stochastic change. For example, the model can be modified by inserting a new basis function, by removing an existing basis function, by modifying an existing basis function (i.e. by modifying one or more of the features and/or parameters associated with that particular basis function), and/or by swapping features and/or parameters between two (compatible) basis functions.
- The quality of the model is assessed using a least-squares error criterion. Alternatively, Friedman's lack-of-fit criterion, or any other suitable error criterion can be used. At the end of each step, the new model is compared to the old model using the Metropolis criterion. Alternatively, any other suitable comparison criterion can be used. If the new model is approved, it replaces the old model and the process is repeated. If the new model is not approved, the old model is retained as the current model, and the process is repeated. This general process is controlled by a Monte-Carlo Sampling protocol 834, a Simulated Annealing protocol 836, or variants thereof, which are well known to people skilled in the art.
- During the training process, the Trainer 822 can be configured to retain a list of models according to some predefined criteria. For example, the Trainer 822 can be configured to retain the ten best Structure-Property Models 842 discovered during the simulation. Alternatively, the Trainer 822 can be configured to retain the ten best Structure-Property Models 842 discovered during the simulation, which differ from each other by some predetermined amount. The difference between two models can be defined ‘genotypically’ or ‘phenotypically’. A ‘genotypical’ comparison between two models involves a comparison of their underlying structure (i.e. the basis functions and/or coefficients used to represent the Structure-Property Models 842). Conversely, a ‘phenotypical’ comparison between two models involves a comparison based on their respective predictions.
- b. Neural Network Models
- The Structure-Property Model Generator 800 can generate Structure-Property Models 842 based on Neural Networks 804. Neural Networks 804 are physical cellular systems that can acquire, store, and utilize experimental knowledge. Neural Networks 804 are extensively reviewed in Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York (1994), incorporated herein by reference in its entirety.
- As in the functional models described above, Structure Data 824 can be encoded using one or more molecular features (descriptors). Molecular features collectively represent the Encoded Structure Data 826. Molecular features can include topological indices, physicochemical properties, electrostatic field parameters, volume and surface parameters, etc., and their number can range from a few tens to tens of thousands. For example, these features can include, but are not limited to, molecular volume and surface areas, dipole moments, octanol-water partition coefficients, molar refractivities, heats of formation, total energies, ionization potentials, molecular connectivity indices, substructure keys, hashed fingerprints, atom pairs and/or topological torsions, atom layers, 2D and 3D auto-correlation vectors, 3D structural and/or pharmacophoric keys, electronic fields, etc. If many features are used, neural network training can be combined with principal component analysis, factor analysis, and/or multi-dimensional scaling, which are well-known techniques for reducing the dimensionality and extracting the most important features from a large table.
- One embodiment of a Neural Network Model Structure 820 is a Multi-Layer Feed-Forward Neural Network or Multi-Layer Perceptron, trained using the error back-propagation algorithm. Alternatively, the Multi-Layer Perceptron can be trained using Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840. In general, Neural Network training is the process of adjusting the number of neurons, synaptic weights, and/or transfer functions in the input, output and hidden layers of the Neural Network, so that the overall prediction error is minimized. Many variants of such training algorithms have been reported, and are well known to those skilled in the art.
- As in the functional models described above, the Trainer 822 can be configured to retain more than one Neural Network Model 842 during the training phase (flow arrow 890 in FIG. 8). For example, the Trainer 822 can be configured to retain the ten best Neural Network Models 842 discovered during the training phase. Alternatively, the Trainer 822 can be configured to retain the ten best Neural Network Models 842 discovered during training, which differ from each other by some predetermined amount. Again, the difference between two models can be defined ‘genotypically’ or ‘phenotypically’, i.e. by comparing the models based either on their internal structure, or their predictions.
- i. Generalized Regression Neural Networks
- Another embodiment of a Neural Network Model Structure 820 is a Generalized Regression Neural Network Model Structure (or Generalized Regression Neural Network). Generalized Regression Neural Networks are described in Specht, D., IEEE Trans. Neural Networks, 2(6): 568 (1991), and Masters, T., Advanced Algorithms for Neural Networks, Wiley (1995), incorporated herein by reference.
- An example of a Generalized Regression Neural Network 900 is shown in FIG. 9. A Generalized Regression Neural Network 900 is comprised of four layers of neurons (units). The first layer is the Input Layer 902, the second layer is the Pattern Layer 904, the third layer is the Summation Layer 906, and the fourth layer is the Output Layer 908, which is comprised of a single unit.
- The Pattern Layer 904 contains one unit per input-output pair or structure-property pair (referred to hereafter as a Training Case). The collection of all Training Cases used in the Pattern Layer 904 is hereafter referred to as the Training Set. In the example shown in FIG. 9, there are four Training Cases. The input vector (or input case, which in the example shown in FIG. 9 consists of 3 variables) is simultaneously presented to all units in the Pattern Layer 904. Each of these units computes a distance measure separating the Training Case represented by that unit from the input case. This distance is acted on by the transfer function associated with that unit, to compute the output of that particular unit. The transfer function is also referred to as an activation function or kernel.
- The Summation Layer 906 of the Generalized Regression Neural Network 900 (i.e. the third layer) is comprised of two units. The first unit is called the Numerator 910, and the second unit is called the Denominator 912. Each unit in the Pattern Layer 904 is fully connected to the Numerator 910 and Denominator 912 units in the Summation Layer 906. Both the Numerator 910 and Denominator 912 units are simple summation units, i.e. they accumulate the input received from all units in the Pattern Layer 904. For the Denominator 912 unit, the weight vector is unity, so a simple sum is performed. For the Numerator 910 unit, the weight connecting each pattern unit is equal to the value of the dependent variable for the training case of that pattern unit (i.e. the output in the input-output pair, or the property in the structure-property pair).
- The outputs of the Numerator 910 and Denominator 912 units in the Summation Layer 906 are forwarded to the Output unit 908. The Output unit 908 divides the output of the Numerator 910 unit by the output of the Denominator 912 unit, to compute the output of the network for a particular input case.
- The activation used by the units in the Pattern Layer 904 is typically a Parzen Window. The Parzen Window method is a well-known method for estimating a univariate or multivariate probability density function from a random sample. It is described in Parzen, Annals Math. Stat., 33: 1065 (1962), and Cacoullos, Annals Inst. Stat. Math., 18(2): 179 (1966), incorporated herein by reference in their entirety. The Parzen Window is a weight function w(d) that has its largest value at d=0, and decreases rapidly as the absolute value of d increases. Examples of such weight functions are histogram bins, Gaussians, triangular functions, reciprocal functions, etc. If the number of input variables (features) exceeds one, the Parzen Window can involve different scaling parameters for each input variable. Thus, a Parzen Window can be configured to perform feature scaling in the vicinity of the Training Case on which it is centered. If the Parzen Windows associated with each Training Case share common feature weights, the Generalized Regression Neural Network 900 is said to be globally weighted. Conversely, if the Parzen Windows associated with each Training Case do not share common feature weights, the Generalized Regression Neural Network 900 is said to be locally weighted.
- Referring back to FIG. 8, a Generalized Regression Neural Network 900 can be trained to minimize the prediction error using Gradient Minimization 832, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840. Alternatively, the Generalized Regression Neural Network 900 can be trained to minimize the prediction error using a combination of Gradient Minimization 832, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840.
- The training process involves adjusting the parameters of the activation function associated with each unit in the Pattern Layer 904 to minimize the mean prediction error for the entire Training Set, or some other suitable error criterion. During training, the input-output pairs in the Training Set are presented to the network, and a prediction error for the entire Training Set is computed. In particular, each Training Case is presented to each of the units (Training Cases) in the Pattern Layer 904, and the outputs of these units are summed by the units in the Summation Layer 906. The output of the summation units
- This process is repeated for each Training Case in the Training Set. The parameters of the transfer functions are then adjusted so that the prediction error is reduced. This process is repeated until the prediction error for the entire Training Set is minimized, within some prescribed tolerance. Alternatively, the process is repeated for a prescribed number of cycles (as determined by user input, for example), even though the prediction error for the entire Training Set may not be at a minimum, within a prescribed tolerance. Preferably, during the training phase, each Training Case is not presented to itself, i.e. the output of each Training Case is computed based on every Training Case other than itself. Thus, it is said that the resulting Generalized Regression Neural Network Models 842 are cross-validated, in the sense that they were designed to resist overfitting.
- If the number of features is large, the Trainer 822 can also perform feature selection in addition to scaling (i.e. adjusting the parameters of the transfer functions). Feature selection refers to the process of selecting a subset of features, and applying the Generalized Regression Neural Network 900 algorithm only on that subset of features.
- For example, in one embodiment, the Generalized Regression Neural Network 900 is trained using a Monte-Carlo Sampling 834 or Simulated Annealing 836 algorithm. In this embodiment, an initial model is generated at random, by selecting a random set of features and randomizing the transfer functions associated with each Training Case.
- The model is then gradually refined by a series of small stochastic ‘steps’. Here, the term ‘step’ is taken to imply a stochastic (random or semi-random) modification of the model's underlying structure. For example, the model can be modified by inserting a new feature, by removing an existing feature, by modifying an existing feature weight if the model is globally weighted, and/or by modifying a randomly chosen transfer function (i.e. by modifying one or more of the parameters associated with that particular transfer function, such as a feature weight). After the ‘step’ is performed, the quality of the resulting model is assessed, and the new model is compared to the old model using the Metropolis criterion. Alternatively, any other suitable comparison criterion can be used. If the new model is approved, it replaces the old model and the process is repeated. If the new model is not approved, the old model is retained as the current model, and the process is repeated.
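- The stochastic refinement described above can be sketched as follows for a globally weighted network. This is a minimal illustration under stated assumptions, not the implementation of the invention: the Gaussian kernel, the mean squared leave-one-out error, the fixed temperature, the move probabilities, and all function names (grnn_predict, loo_error, metropolis_train) are hypothetical choices. A feature weight of zero is used here to stand for a removed feature.

```python
import math
import random


def grnn_predict(query, cases, weights, skip=None):
    """Numerator/Denominator output of a globally weighted network.

    cases is a list of (feature_vector, property) Training Cases.
    skip is the index of a Training Case to leave out, if any.
    """
    num = 0.0
    den = 0.0
    for i, (x, y) in enumerate(cases):
        if i == skip:
            continue
        d2 = sum(w * (q - xi) ** 2 for w, q, xi in zip(weights, query, x))
        kernel = math.exp(-d2)   # Gaussian kernel (an assumption)
        num += y * kernel        # Numerator unit: weighted by the property
        den += kernel            # Denominator unit: unit weights
    return num / den if den > 0.0 else 0.0


def loo_error(cases, weights):
    """Mean squared error when no Training Case is presented to itself."""
    total = 0.0
    for i, (x, y) in enumerate(cases):
        total += (grnn_predict(x, cases, weights, skip=i) - y) ** 2
    return total / len(cases)


def metropolis_train(cases, weights, steps=300, temperature=0.01, seed=0):
    """Refine the feature weights by small stochastic steps."""
    rng = random.Random(seed)
    weights = list(weights)
    error = loo_error(cases, weights)
    best_weights, best_error = list(weights), error
    for _ in range(steps):
        trial = list(weights)
        j = rng.randrange(len(trial))
        if rng.random() < 0.5:
            # Remove an existing feature or insert a missing one.
            trial[j] = 0.0 if trial[j] > 0.0 else 1.0
        else:
            # Modify an existing feature weight.
            trial[j] = max(0.0, trial[j] + rng.gauss(0.0, 0.3))
        trial_error = loo_error(cases, trial)
        delta = trial_error - error
        # Metropolis criterion: always accept improvements, sometimes
        # accept a worse model with probability exp(-delta / T).
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            weights, error = trial, trial_error
            if error < best_error:
                best_weights, best_error = list(weights), error
    return best_weights, best_error
```

- With a fixed temperature the loop corresponds to plain Monte-Carlo sampling; lowering the temperature over the course of the run would give a simulated annealing schedule.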
- This general process is controlled by a Monte-Carlo Sampling protocol 834, a Simulated Annealing protocol 836, or variants thereof, which are well known to people skilled in the art. However, it should be understood that the system of the present invention is not limited to these embodiments. Alternatively, the Generalized Regression Neural Network 900 can be trained using Evolutionary Programming 838, Genetic Algorithms 840, or any other suitable search/optimization algorithm. The implementation of these methods should be straightforward to persons skilled in the art.
- The training of a Generalized Regression Neural Network 900 using the method described above involves N*(N−1) distance comparisons during each optimization cycle, where N is the number of Training Cases. That is, in order to compute the prediction error for the entire Training Set, each of the N Training Cases must be presented to all other (N−1) Training Cases in the network. Thus, it is said that the system operating in the manner described above exhibits quadratic time complexity.
- For large Training Sets, such as those anticipated in a typical operation of the system of the present invention, this process can become computationally intractable. To remedy this problem, a preferred embodiment of the system of the present invention uses a hybrid approach that combines Generalized Regression Neural Networks 900 with K-Nearest-Neighbor classifiers.
- K-Nearest-Neighbor prediction is a well-known technique for property prediction and classification. It is described in detail in Dasarathy, Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques, IEEE Computer Society Press, Los Alamitos, Calif. (1991), incorporated herein by reference in its entirety. K-Nearest-Neighbor prediction forms the basis of many ‘lazy learning’ algorithms that are commonly used in artificial intelligence and control. The K-Nearest-Neighbor algorithm predicts the output (property) of a particular input query by retrieving the K nearest (most similar) Training Cases to that query, and averaging their (known) outputs according to some weighting scheme. Therefore, the quality of K-Nearest-Neighbor generalization depends on which Training Cases are considered most similar, which is, in turn, determined by the distance function.
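- K-Nearest-Neighbor prediction as described above can be sketched in a few lines. The Euclidean distance function and the inverse-distance weighting scheme are assumptions made for this illustration, as is the function name knn_predict; any other distance function or weighting scheme could be substituted.

```python
import math


def knn_predict(query, cases, k=3):
    """Predict a property from the k most similar Training Cases.

    cases is a list of (feature_vector, property) pairs. The outputs of
    the k nearest cases are averaged with inverse-distance weights.
    """
    nearest = sorted(cases, key=lambda c: math.dist(query, c[0]))[:k]
    # Small constant avoids division by zero for an exact match.
    weights = [1.0 / (math.dist(query, x) + 1e-9) for x, _ in nearest]
    weighted_sum = sum(w * y for w, (_, y) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```

- Changing the distance function changes which Training Cases count as neighbors, which is why the choice of distance function dominates the quality of the prediction.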
- In the embodiment described herein, Generalized Regression Neural Networks 900 are combined with K-Nearest-Neighbor classifiers, to generate a hybrid Model Structure 820 referred to hereafter as a Nearest Neighbor Generalized Regression Neural Network. The operation of a Nearest Neighbor Generalized Regression Neural Network is similar to that of a regular Generalized Regression Neural Network, except that the query (input case) is not presented to all Training Cases in the Pattern Layer 904. Instead, the query is presented to the K nearest Training Cases in the Pattern Layer 904, as determined by a suitable distance metric.
- To accelerate the performance of a Nearest Neighbor Generalized Regression Neural Network, the K nearest neighbors are retrieved using a nearest neighbor detection algorithm such as a k-d tree (Bentley, Comm. ACM, 18(9): 509 (1975); Friedman et al., ACM Trans. Math. Soft., 3(3): 209 (1977)). Alternatively, any other suitable algorithm can be used including, but not limited to, ball trees (Omohundro, International Computer Science Institute Report TR-89-063, Berkeley, Calif. (1989)), bump trees (Omohundro, Advances in Neural Information Processing Systems 3, Morgan Kaufmann, San Mateo, Calif. (1991)), gridding, and/or Voronoi tessellation (Sedgewick, Algorithms in C, Addison-Wesley, Princeton (1990)). The contents of all of the aforementioned publications are incorporated herein by reference.
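- The following sketch illustrates the Nearest Neighbor Generalized Regression Neural Network idea: the query is presented only to its K nearest pattern units, whose kernel outputs feed the Numerator and Denominator sums. The Gaussian kernel, the single smoothing parameter sigma, and the function name nn_grnn_predict are assumptions. For brevity the neighbors are found by a linear scan; a k-d tree or similar structure would replace that scan in a faster version.

```python
import heapq
import math


def nn_grnn_predict(query, cases, k, sigma=1.0):
    """Network output computed from the k nearest Training Cases only.

    cases is a list of (feature_vector, property) pairs.
    """
    # Linear-scan neighbor retrieval (stand-in for a k-d tree lookup).
    nearest = heapq.nsmallest(k, cases,
                              key=lambda c: math.dist(query, c[0]))
    num = 0.0
    den = 0.0
    for x, y in nearest:
        kernel = math.exp(-(math.dist(query, x) / sigma) ** 2)
        num += y * kernel   # Numerator unit
        den += kernel       # Denominator unit
    if den == 0.0:
        # All kernels underflowed; fall back to the plain neighbor mean.
        return sum(y for _, y in nearest) / len(nearest)
    return num / den
```

- Setting k to the size of the Training Set reduces this sketch to an ordinary Generalized Regression Neural Network in which every pattern unit sees the query.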
- The Generalized Regression Neural Network 900 can be trained in multiple phases using different optimization algorithms (i.e. Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or Genetic Algorithms 840), and/or different kernel parameters and numbers of nearest neighbors during each phase. For example, the Generalized Regression Neural Network 900 can be initially trained to perform feature detection using Simulated Annealing 836, ten nearest neighbors, a uniform kernel (i.e. the same kernel for all Training Cases), and a common scaling factor for all features. The resulting (partially optimized) network can then be further refined using Gradient Minimization 832 with fifty nearest neighbors, a uniform kernel, and a different scaling factor for each feature. Any number of phases and training schemes can be used as appropriate.
- As in the functional models and multi-layer perceptrons described above, the Trainer 822 can be configured to retain more than one Generalized Regression Neural Network Model 842 during the training phase (flow arrow 890 in FIG. 8). For example, the Trainer 822 can be configured to retain the ten best Generalized Regression Neural Network Models 842 discovered during the training phase. Alternatively, the Trainer 822 can be configured to retain the ten best Generalized Regression Neural Network Models 842 discovered during training, which differ from each other by some predetermined amount. Again, the difference between two models can be defined ‘genotypically’ or ‘phenotypically’, i.e. by comparing the models based either on their internal structure, or their predictions.
- c. Fuzzy Logic Models
- The Structure-Property Model Generator 800 can generate Structure-Property Models 842 based on Fuzzy Logic 806. Fuzzy Logic was developed by Zadeh (Zadeh, Information and Control, 8: 338 (1965); Zadeh, Information and Control, 12: 94 (1968)) as a means of representing and manipulating data that is fuzzy rather than precise. The aforementioned publications are incorporated herein by reference in their entirety.
- Central to the theory of Fuzzy Logic is the concept of a fuzzy set. In contrast to a traditional crisp set where an item either belongs to the set or does not belong to the set, fuzzy sets allow partial membership. That is, an item can belong to a fuzzy set to a degree that ranges from 0 to 1. A membership degree of 1 indicates complete membership, whereas a membership degree of 0 indicates non-membership. Any value between 0 and 1 indicates partial membership. Fuzzy sets can be used to construct rules for fuzzy expert systems and to perform fuzzy inference.
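- The contrast between crisp and fuzzy membership can be made concrete with a small sketch. The set ‘high molecular weight’, the breakpoints 300 and 500, and the piecewise-linear membership shape are all hypothetical values picked for illustration only.

```python
def ramp_up(x, lo, hi):
    """Piecewise-linear membership: 0 below lo, 1 above hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def crisp_high_mw(mw):
    """Crisp set: a compound is either 'high' or it is not."""
    return 1.0 if mw >= 500.0 else 0.0


def fuzzy_high_mw(mw):
    """Fuzzy set: membership rises gradually between 300 and 500."""
    return ramp_up(mw, 300.0, 500.0)


if __name__ == "__main__":
    for mw in (250.0, 400.0, 550.0):
        print(mw, crisp_high_mw(mw), fuzzy_high_mw(mw))
```

- A compound of molecular weight 400 is excluded outright by the crisp set but belongs to the fuzzy set to degree 0.5, which is the partial membership the text refers to.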
- Usually, knowledge in a fuzzy system is expressed as rules of the form “if x is A, then y is B”, where x and y are fuzzy variables, and A and B are fuzzy values. Such fuzzy rules are stored in a fuzzy rule base or fuzzy knowledge base describing the system of interest. Fuzzy Logic 806 is the ability to reason (draw conclusions from facts or partial facts) using fuzzy sets, fuzzy rules, and fuzzy inference. Thus, following Yager's definition, a fuzzy model is a representation of the essential features of a system by the apparatus of fuzzy set theory (Yager and Filev, Essentials of Fuzzy Modeling and Control, Wiley (1994)). The aforementioned publication is incorporated herein by reference in its entirety.
- Fuzzy Logic 806 has been employed to control complex or adaptive systems that defy exact mathematical modeling. Applications of fuzzy logic controllers range from cement-kiln process control, to robot control, image processing, motor control, camcorder auto-focusing, etc. However, to date, there has been no report on the use of Fuzzy Logic 806 for chemical structure-property prediction. A preferred embodiment of a Structure-Property Model Generator 800 using Fuzzy Logic 806 shall now be described in detail.
- In one embodiment, the Structure-
Property Model Generator 800 generates Fuzzy Structure-Property Models 842, i.e. models that represent the essential features of the system using the apparatus of fuzzy set theory. In particular, a Fuzzy Structure-Property Model 842 makes predictions using fuzzy rules from a fuzzy rule base describing the system of interest. A fuzzy rule is an IF-THEN rule with one or more antecedent and consequent variables. A fuzzy rule can be single-input-single-output (SISO), multiple-input-single-output (MISO), or multiple-input-multiple-output (MIMO). A fuzzy rule base is comprised of a collection of one or more such fuzzy rules. A MISO fuzzy rule base is of the form:
- IF $x_1$ is $X_{11}$ AND $x_2$ is $X_{12}$ AND $\ldots$ AND $x_n$ is $X_{1n}$ THEN $y$ is $Y_1$
- ALSO
- IF $x_1$ is $X_{21}$ AND $x_2$ is $X_{22}$ AND $\ldots$ AND $x_n$ is $X_{2n}$ THEN $y$ is $Y_2$
- ALSO
- $\ldots$
- ALSO
- IF $x_1$ is $X_{r1}$ AND $x_2$ is $X_{r2}$ AND $\ldots$ AND $x_n$ is $X_{rn}$ THEN $y$ is $Y_r$,
- where $x_1, \ldots, x_n$ are the input variables, $y$ is the output (dependent) variable, and $X_{ij}$, $Y_i$, $i = 1, \ldots, r$, $j = 1, \ldots, n$, are fuzzy subsets of the universes of discourse of $X_1, \ldots, X_n$, and $Y_1, \ldots, Y_n$, respectively. The fuzzy model described above is referred to as a linguistic model.
- An example of a fuzzy structure-activity rule is:
- IF molecular weight is high AND logP is low THEN activity is low
- where ‘high’ and ‘low’ are fuzzy sets in the universe of discourse of molecular weight, logP, and activity.
- Alternatively, a Takagi-Sugeno-Kang (TSK) model can be used. A TSK fuzzy rule base is of the form:
- IF $x_1$ is $X_{11}$ AND $x_2$ is $X_{12}$ AND $\ldots$ AND $x_n$ is $X_{1n}$ THEN $y = b_{10} + b_{11} x_1 + \ldots + b_{1n} x_n$
- ALSO
- IF $x_1$ is $X_{21}$ AND $x_2$ is $X_{22}$ AND $\ldots$ AND $x_n$ is $X_{2n}$ THEN $y = b_{20} + b_{21} x_1 + \ldots + b_{2n} x_n$
- ALSO
- $\ldots$
- ALSO
- IF $x_1$ is $X_{r1}$ AND $x_2$ is $X_{r2}$ AND $\ldots$ AND $x_n$ is $X_{rn}$ THEN $y = b_{r0} + b_{r1} x_1 + \ldots + b_{rn} x_n$
- Thus, unlike a linguistic model that involves fuzzy consequents, a TSK model involves functional consequents, typically implemented as a linear function of the input variables.
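- A TSK rule base of the form above can be evaluated with a short sketch. Several choices here are assumptions rather than statements from the specification: the AND connective is taken as the minimum of the membership degrees, the rule outputs are combined by a firing-strength-weighted average (a common TSK convention), and the membership functions, coefficients, and names (tsk_predict, RULES, low, high) are hypothetical.

```python
def ramp_up(x, lo, hi):
    """Membership rising linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def low(x):
    """Hypothetical fuzzy set 'low' on the interval [0, 10]."""
    return 1.0 - ramp_up(x, 0.0, 10.0)


def high(x):
    """Hypothetical fuzzy set 'high' on the interval [0, 10]."""
    return ramp_up(x, 0.0, 10.0)


# Each rule: (membership functions X_i1..X_in, coefficients b_i0..b_in).
RULES = [
    ([low, low], [1.0, 2.0, 0.0]),      # IF x1 low AND x2 low THEN y = 1 + 2*x1
    ([high, high], [10.0, 0.0, -1.0]),  # IF x1 high AND x2 high THEN y = 10 - x2
]


def tsk_predict(x, rules):
    """Weighted average of the linear consequents of all rules."""
    num = 0.0
    den = 0.0
    for antecedents, coeffs in rules:
        # Firing strength: AND taken as the minimum membership degree.
        strength = min(mu(xi) for mu, xi in zip(antecedents, x))
        # Functional consequent: b_i0 + b_i1*x1 + ... + b_in*xn.
        y = coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))
        num += strength * y
        den += strength
    if den == 0.0:
        raise ValueError("no rule fires for this input")
    return num / den
```

- Because the consequents are crisp functions of the inputs, this sketch needs no separate defuzzification step, in contrast to a linguistic model.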
- Referring to FIG. 10, a Fuzzy Structure-Property Model 1000 is illustrated. In this embodiment, the Fuzzy Knowledge Base 1002 is comprised of a Rule Base 1004 and a Data Base 1006. The Data Base 1006 defines the membership functions of the fuzzy sets used as values for each system variable, while the Rule Base 1004 is a collection of fuzzy rules of the type described above. The system variables are of two main types: input variables and output variables.
- In one embodiment, the input variables in a Fuzzy Structure-Activity Model 842 can be molecular features (descriptors). Such molecular features, which collectively represent the Encoded Structure Data 826, can include topological indices, physicochemical properties, electrostatic field parameters, volume and surface parameters, etc., and their number can range from a few tens to tens of thousands.
- For example, these features can include, but are not limited to, molecular volume and surface areas, dipole moments, octanol-water partition coefficients, molar refractivities, heats of formation, total energies, ionization potentials, molecular connectivity indices, substructure keys, hashed fingerprints, atom pairs and/or topological torsions, atom layers, 2D and 3D auto-correlation vectors, 3D structural and/or pharmacophoric keys, electronic fields, etc.
- If many features are used, Fuzzy Logic 806 can be combined with principal component analysis, factor analysis, and/or multi-dimensional scaling, which are well-known techniques for reducing the dimensionality and extracting the most important features from a large table.
- In one embodiment, the input variables (i.e. the Encoded Structure Data 826, which are usually crisp) are first converted into fuzzy sets by the Fuzzification Unit 1008 using the fuzzy set definitions in the Data Base 1006. Then, the Fuzzy Inference Module 1010 evaluates all the rules in the Rule Base 1004 to produce the output, using the method described below. In particular, the Fuzzy Inference Module 1010 performs the following steps:
- (1) determines the degree of match between the fuzzified input data and the fuzzy sets defined for the input variables in the Data Base 1006;
- (2) calculates the firing strength of each rule based on the degree of match of the fuzzy sets computed in step 1 and the connectives used in the antecedent part of the fuzzy rule (i.e. AND, OR, etc.); and
- (3) derives the output based on the firing strength of each rule computed in step 2 and the fuzzy sets defined for the output variable in the Data Base 1006.
- If the Fuzzy Structure-Property Model is a linguistic model, the fuzzy output of the Fuzzy Inference Module 1010 is finally defuzzified by the Defuzzification Unit 1012, using the output fuzzy set definitions in the Data Base 1006, and a defuzzification strategy such as the mean-of-maximum method. Alternatively, the center-of-area or any other suitable defuzzification method can be used.
- Referring back to FIG. 8, the Trainer 822 of the Fuzzy Structure-Property Model Generator 800 preferably trains the Fuzzy Knowledge Base 1002 using Gradient Minimization 832, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840, in order to minimize the overall prediction error for a prescribed set of Training Cases. The Trainer 822 can use a pre-existing Fuzzy Knowledge Base 1002 or may construct one directly from the Structure-Property Data Fuzzy Knowledge Base 1002.
- For example, in a preferred embodiment, the Fuzzy Knowledge Base 1002 is trained using a Monte-Carlo Sampling 834 or Simulated Annealing 836 algorithm. In this embodiment, an initial model is generated at random, by selecting a random set of rules and randomizing the membership functions associated with each input variable. The model is then gradually refined by a series of small stochastic ‘steps’. Here, the term ‘step’ is taken to imply a stochastic (random or semi-random) modification of the model's underlying structure.
- For example, the model can be modified by inserting a new rule, by removing an existing rule, by modifying an existing rule (i.e. by inserting or removing a variable from the antecedent part of the fuzzy rule), by modifying the membership function of an existing fuzzy set, and/or by modifying the number of fuzzy partitions of a fuzzy variable (i.e. by increasing or decreasing the number of fuzzy partitions of the fuzzy variable). After the ‘step’ is performed, the quality of the resulting model is assessed, and the new model is compared to the old model using the Metropolis criterion. Alternatively, any other suitable comparison criterion can be used. If the new model is approved, it replaces the old model and the process is repeated. If the new model is not approved, the old model is retained as the current model, and the process is repeated.
- This general process is controlled by a Monte-Carlo Sampling protocol 834, a Simulated Annealing protocol 836, or variants thereof, which are well known to people skilled in the art. However, it should be understood that the system of the present invention is not limited to these embodiments. Alternatively, the Fuzzy Knowledge Base 1002 can be trained using Evolutionary Programming 838, Genetic Algorithms 840, or any other suitable search/optimization algorithm. The implementation of these methods should be straightforward to persons skilled in the art.
- As in the functional and neural network models described above, the Trainer 822 can be configured to retain more than one Fuzzy Structure-Property Model 842 during the training phase (flow arrow 890 in FIG. 8). For example, the Trainer 822 can be configured to retain the ten best Fuzzy Structure-Property Models 842 discovered during the training phase. Alternatively, the Trainer 822 can be configured to retain the ten best Fuzzy Structure-Property Models 842 discovered during training that differ from each other by some predetermined amount. Again, the difference between two models can be defined ‘genotypically’ or ‘phenotypically’, i.e. by comparing the models based either on their internal structure or on their predictions.
- d. Hybrid Models
- The Structure-Property Model Generator 800 can generate Model Structures 820 that combine elements of Statistics 802, Neural Networks 804, Fuzzy Logic 806, and/or Model-Specific Methods 808. Such Model Structures 820 are referred to as Hybrid Model Structures, and the corresponding models are referred to as Hybrid Models. A preferred embodiment of such a Hybrid Model Structure 820 that combines elements of Neural Networks 804 and Fuzzy Logic 806 is referred to as a Neuro-Fuzzy Model Structure, and shall now be described in detail.
- An example of such a Hybrid Model Structure 820 is a Model Structure that combines elements of Neural Networks 804 and Fuzzy Logic 806, hereafter referred to as a Neuro-Fuzzy Model Structure or Neuro-Fuzzy Model. An example of a Neuro-Fuzzy Model Structure is discussed in greater detail below.
- i. Neuro-Fuzzy Models
- A Neuro-Fuzzy Model Structure is a Model Structure 820 that combines the advantages of Fuzzy Logic 806 (e.g. human-like rule-based reasoning, ease of incorporating expert knowledge) and Neural Networks 804 (e.g. learning ability, optimization ability, and connectionist structure). On the neural side, more transparency is obtained by pre-structuring a neural network to improve its performance, or by interpreting the weight matrix that results from training. On the fuzzy side, the parameters that control the performance of a fuzzy model can be tuned using techniques similar to those used in neural network systems. Thus, neural networks can improve their transparency, making them closer to fuzzy systems, while fuzzy systems can self-adapt, making them closer to neural networks.
- Neuro-Fuzzy systems can be of three main types:
- (1) neural fuzzy systems that use neural networks as tools in fuzzy models;
- (2) fuzzy neural networks that fuzzify conventional neural networks; and
- (3) Neuro-Fuzzy hybrid systems that incorporate neural networks and fuzzy systems into hybrid systems.
- Neuro-Fuzzy modeling is reviewed in Lin and Lee, Neural Fuzzy Systems, Prentice-Hall (1996), incorporated herein by reference in its entirety.
- One embodiment of a Neuro-Fuzzy Structure-Property Model is a Neural Fuzzy Model with Fuzzy Singleton Rules described in Nomura et al., Proc. IEEE Int. Conf. Fuzzy Syst., 1320, San Diego (1992), incorporated herein by reference in its entirety. The structure of a Neural Fuzzy Model with Fuzzy Singleton Rules 1100 is shown in FIG. 11. Fuzzy singleton rules are of the form:
- IF $x_1$ is $X_{11}$ AND $x_2$ is $X_{12}$ AND . . . AND $x_n$ is $X_{1n}$ THEN $y = w_1$,
- where $x_1, \ldots, x_n$ are the input variables, $y$ is the output (dependent) variable, $X_{ij}$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, are fuzzy subsets of the universes of discourse of $X_1, \ldots, X_n$ with fuzzy membership functions $\mu_{X_{ij}}(x_j)$, and $w_i$ is a real number of the consequent part. If product inference and a centroid defuzzifier are used, the output $y$ of such a Neuro-Fuzzy Structure-Property Model 1100 is computed by EQ. 1:
- $y = \dfrac{\sum_{i=1}^{m} \mu_i w_i}{\sum_{i=1}^{m} \mu_i}$ EQ. 1
- where:
- $\mu_i = \mu_{X_{i1}}(x_1)\,\mu_{X_{i2}}(x_2) \cdots \mu_{X_{in}}(x_n)$ EQ. 2
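- The evaluation of a model built from fuzzy singleton rules can be sketched as follows. This is an illustration under stated assumptions: the firing strength of each rule is taken as the product of its antecedent memberships (product inference), the output is taken as the firing-strength-weighted average of the singleton consequents (the usual centroid form for singletons), and Gaussian membership functions are used. The function names and the two example rules are hypothetical.

```python
import math

def gaussian(center, width):
    """Return a Gaussian membership function with the given center and width."""
    def mu(x):
        return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))
    return mu

def singleton_output(rules, x):
    """Evaluate a fuzzy singleton rule base for the input vector x.

    rules: list of (memberships, w) pairs, where memberships holds one
           membership function per input variable and w is the real-valued
           consequent of the rule.
    x:     list of input values, one per input variable.
    """
    numerator = 0.0
    denominator = 0.0
    for memberships, w in rules:
        # Firing strength: product of the antecedent memberships.
        mu_i = 1.0
        for mu, value in zip(memberships, x):
            mu_i *= mu(value)
        numerator += mu_i * w
        denominator += mu_i
    if denominator == 0.0:
        raise ValueError("no rule fires for this input")
    # Output: weighted average of the singleton consequents.
    return numerator / denominator

# Two hypothetical rules over two input variables.
rules = [
    ([gaussian(0.0, 1.0), gaussian(0.0, 1.0)], 1.0),
    ([gaussian(2.0, 1.0), gaussian(2.0, 1.0)], 5.0),
]
print(singleton_output(rules, [1.0, 1.0]))  # input equidistant from both rules
```

- In such a representation, the free parameters available to a trainer would be the centers and widths of the membership functions and the consequents `w`.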
- Referring back to FIG. 8, the Trainer 822 of the Neuro-Fuzzy Structure-Property Model Generator 800 preferably trains (i.e. constructs and/or refines) the Neuro-Fuzzy Structure-Property Model Structure 820 using Gradient Minimization 832, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840, in order to minimize the overall prediction error for a prescribed set of Training Cases. The Trainer 822 can use a pre-existing Neuro-Fuzzy Structure-Property Model 842 or can construct a new one directly from the Structure-Property Data [...] $\mu_{X_{ij}}(x_j)$, and/or the real numbers $w_i$. As in traditional fuzzy systems, the membership functions can be Gaussians, triangular functions, or trapezoidal functions. Alternatively, any other suitable functional form can be used.
- An example of a training procedure for a Neural Fuzzy Model with Fuzzy Singleton Rules based on Gradient Minimization 832 is given in Nomura et al., and Lin and Lee, supra. However, the present invention is not limited to this embodiment. Alternatively, the Trainer 822 can train the Neuro-Fuzzy Structure-Property Model Structure 820 using Gradient Minimization 832, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840. Each of these methods requires a suitable encoding of the free parameters of the model, and their implementation should be straightforward to persons skilled in the art.
- Again, the Trainer 822 can be configured to retain more than one Neuro-Fuzzy Structure-Property Model 842 during the training phase (flow arrow 890 in FIG. 8). For example, the Trainer 822 can be configured to retain the ten best Neuro-Fuzzy Structure-Property Models 842 discovered during the training phase. Alternatively, the Trainer 822 can be configured to retain the ten best Neuro-Fuzzy Structure-Property Models 842 discovered during training that differ from each other by some predetermined amount. Again, the difference between two models can be defined ‘genotypically’ or ‘phenotypically’, i.e. by comparing the models based either on their internal structure or on their predictions.
- e. Model-Specific Methods
- The Structure-Property Model Generator 800 can generate Structure-Property Models 842 based on Model-Specific Methods 808. Model-Specific Methods 808 are methods that are specific to the application domain of the model. Examples of such Model-Specific Methods 808 include methods that compute the energy of a particular molecular conformation or receptor-ligand complex, such as an empirical force field or a quantum-mechanical method; methods that align the 3-dimensional structures of two or more chemical compounds based on their shape, electronic fields and/or other criteria; methods that predict the affinity and binding conformation of a ligand to a particular receptor binding site; methods that construct receptor models based on the 3-dimensional structures of known ligands; etc. Examples of such Model-Specific Methods 808 are described in greater detail below.
- Model-Specific Methods 808 can include methods that take into account the 3-dimensional structures of the chemical compounds and/or their biological targets. Such methods are of two main types: docking methods and 3D QSAR methods. Examples of such methods that can be used shall now be described.
- i. Docking Methods
- Docking methods are methods that attempt to predict the binding conformation between a ligand and a receptor based on their 3-dimensional fit, and/or provide an absolute or relative measure of the affinity of a particular ligand for a particular receptor, based on the quality of their 3-dimensional fit. Docking methods require a 3-dimensional model of the receptor (or parts of the receptor), which can be determined directly through X-ray crystallography, nuclear magnetic resonance, or some other 3D structure-determination technique, or indirectly through homology modeling based on the 3-dimensional structure of a related receptor, for example.
- Most docking methods reported to date are static in nature. That is, a suitable energy function is derived based on an analysis of the 3-dimensional structures of known receptor-ligand complexes, and that energy function is subsequently used to evaluate the energy of a particular receptor-ligand binding conformation. The terms ‘energy’ and ‘energy function’ are used herein to denote any numerical method for evaluating the quality of the interaction between a ligand and a receptor at a particular binding conformation. Such energy functions are usually combined with a search/optimization method such as Gradient Minimization 832, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840, to identify one or more low energy binding conformations, and to predict the affinity of a particular ligand for a particular receptor.
- Docking methods are reviewed in Lybrand, Curr. Opin. Struct. Biol. (April 1995), Shoichet et al., Chem. Biol. (March 1996), Lengauer et al., Curr. Opin. Struct. Biol. (June 1996), Willett, Trends Biotechnol. (1995), and Jackson, Curr. Opin. Biotechnol. (December 1995), incorporated herein by reference in their entirety.
- A docking method can be used to derive 3-dimensional structural models of ligands bound to a particular receptor(s), and/or to obtain estimates of the binding affinity of ligands for a particular receptor(s). In a preferred embodiment, the Analysis Module 118 determines the 3-dimensional structures of selected receptor-ligand complexes from the Directed Diversity Library 108. Preferably, the 3-dimensional structures of the complexes are determined using X-ray crystallography, nuclear magnetic resonance, or some other suitable 3D structure-determination technique.
- It is not necessary that every compound in the Directed Diversity Library 108 be analyzed by the Analysis Module 118 to derive a 3-dimensional receptor map. It should be understood that it is possible that none of the compounds in a given Directed Diversity Library 108 or a sequence of Directed Diversity Libraries 108 will be analyzed by the Analysis Module 118 to obtain a 3-dimensional receptor map. It is also possible that every compound in the Directed Diversity Library 108 is analyzed by the Analysis Module 118 to derive a 3-dimensional receptor map. The determination as to which compounds from the Directed Diversity Library 108 will actually be analyzed by the Analysis Module 118 to derive a 3-dimensional receptor map can be made manually (as specified by operator input, for example) or automatically by the Directed Diversity Manager 310.
- In one embodiment, the 3D Receptor Map Data 522 (FIG. 5) generated by the 3D Receptor Mapping Module 418 is used by the Trainer 822 to train (i.e. construct and/or refine) the energy function that is used by the docking method to evaluate the energy of a particular receptor-ligand binding conformation. The training of the energy function is carried out using Gradient Minimization 832, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840, so that the prediction error for a prescribed Training Set of 3D Receptor Map Data 522 is minimized. The prediction error is specified based on the difference between the actual and predicted 3-dimensional structures of the receptor-ligand complexes in the Training Set (such as the RMSD criterion, for example), and/or based on the difference between the actual and predicted affinities of the receptor-ligand complexes in the Training Set. Several energy functions and several methods for training such energy functions have been reported, and their implementation should be straightforward to persons skilled in the art.
- ii. 3D QSAR Methods
- The Structure-Property Model Generator 800 can also be used to generate one or more 3D QSAR models. 3D QSAR models are models that are based on an analysis of the 3-dimensional structures of a series of ligands whose biological activities/properties are known. Unlike docking methods, however, 3D QSAR methods do not require knowledge of the 3-dimensional structure of the receptor or receptor-ligand complex. 3D QSAR methods are reviewed in Kubinyi (Ed.), 3D QSAR in Drug Design, ESCOM, Leiden (1993), incorporated herein by reference in its entirety.
- In one embodiment, the Structure-Property Model Generator 800 generates Structure-Property Models 842 based on one or more 3D QSAR methods. Such 3D QSAR methods include, but are not limited to, pharmacophore identification, structural alignment and molecular superposition, molecular shape analysis, mini-receptors and pseudo-receptors, distance geometry, hypothetical active site lattice, and/or molecular interaction fields. Alternatively, any other suitable 3D QSAR method can be used.
- Referring back to FIG. 8, a 3D QSAR Model Structure 820 can be trained to minimize the prediction error using Gradient Minimization 832, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840. Alternatively, the 3D QSAR Model Structure 820 can be trained to minimize the prediction error using a combination of Gradient Minimization 832, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or Genetic Algorithms 840. The training process involves adjusting the free parameters of the 3D QSAR Structure-Property Model Structure 820 to minimize the mean prediction error (or some other suitable error criterion) for a Training Set of Structure-Property Data.
- As in the functional, Neural Network, Fuzzy, and Neuro-Fuzzy models described above, the
Trainer 822 can be configured to retain more than one 3D QSAR Model 842 during the training phase (flow arrow 890 in FIG. 8). For example, the Trainer 822 can be configured to retain the ten best 3D QSAR Models 842 discovered during the training phase. Alternatively, the Trainer 822 can be configured to retain the ten best 3D QSAR Models 842 discovered during training that differ from each other by some predetermined amount. Again, the difference between two models can be defined ‘genotypically’ or ‘phenotypically’, i.e. by comparing the models based either on their internal structure or on their predictions.
- 4. Experiment Planner 130
- a. Selection Criteria 104
- The Experiment Planner 130 can define two general types of Selection Criteria 104. The first type of Selection Criteria 104 represents functions or algorithms that receive a compound and/or a list of compounds from the Compound Library 102, and that return a numerical value that represents an individual or collective property of these compounds. The second type of Selection Criteria 104 represents specific constraints and/or methods for generating such lists of compounds. Both types of Selection Criteria 104 are discussed below.
- i. First Type of Selection Criteria 104
- The first type of
Selection Criteria 104 represents functions or algorithms that receive a compound and/or a list of compounds from the Compound Library 102, and return a numerical value that represents an individual or collective property of these compounds. Examples of such Selection Criteria 104 that can be used in a preferred embodiment shall now be described. However, it should be understood that the present invention is not limited to this embodiment, and that other suitable Selection Criteria 104 can also be used.
- One such Selection Criterion 104 (referred to hereafter as a Compound Availability Criterion) receives as input a list of compounds from the Compound Library 102, and returns the number or fraction of these compounds that are available from the Chemical Inventory 110.
- Another such Selection Criterion 104 (referred to hereafter as a Reagent Count Criterion) receives as input a list of compounds from the Compound Library 102, and returns the number of Reagents 114 that must be mixed together in the Synthesis Module 112 in order to synthesize these compounds according to a prescribed synthetic scheme.
- Another such Selection Criterion 104 (referred to hereafter as a Reagent Availability Criterion) receives as input a list of compounds from the Compound Library 102, identifies which Reagents 114 must be mixed together in the Synthesis Module 112 in order to synthesize these compounds according to a prescribed synthetic scheme, and returns the number or fraction of these Reagents 114 that are available from the Reagent Inventory 116.
- Another such Selection Criterion 104 (referred to hereafter as a Reagent Cost Criterion) receives as input a list of compounds from the Compound Library 102, identifies which Reagents 114 must be mixed together in the Synthesis Module 112 in order to synthesize these compounds according to a prescribed synthetic scheme, identifies which of these Reagents 114 need to be purchased from an external source, and returns the cost of purchasing these Reagents 114 from such an external source.
- Another such Selection Criterion 104 (referred to hereafter as a Molecular Diversity Criterion) receives as input a list of compounds from the Compound Library 102, and returns a numerical value that represents the molecular diversity of these compounds. Molecular diversity refers to the ability of a given set of compounds to exhibit a variety of prescribed structural, physical, chemical and/or biological characteristics. The field of molecular diversity is reviewed in Martin et al., Reviews in Computational Chemistry, Vol 10, VCH, Weinheim (1977), and Agrafiotis, Encyclopedia of Computational Chemistry, Wiley (in press), incorporated herein by reference in their entirety.
- Molecular diversity is a collective property, and is usually defined in a prescribed ‘chemical space’, i.e. in a space defined by a prescribed set of molecular properties or characteristics. Consequently, a diverse collection of compounds in one definition of chemical space may not necessarily be diverse in another definition of chemical space.
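- The inventory-based criteria described earlier in this section (Compound Availability, Reagent Count, Reagent Availability, and Reagent Cost) each reduce to a simple function over a list of compounds. The following is a minimal sketch, not a definitive implementation. The data structures are assumptions made for illustration: a synthetic scheme is modeled as a dictionary from compound identifier to the reagents it is built from, inventories are sets, prices are a dictionary, and the Reagent Count is interpreted as the number of distinct reagents.

```python
def compound_availability(compounds, chemical_inventory):
    """Fraction of the listed compounds already present in the inventory."""
    if not compounds:
        return 0.0
    return sum(1 for c in compounds if c in chemical_inventory) / len(compounds)

def required_reagents(compounds, scheme):
    """Distinct reagents needed to make the compounds under one scheme."""
    needed = set()
    for c in compounds:
        needed.update(scheme[c])
    return needed

def reagent_count(compounds, scheme):
    """Number of distinct reagents that must be used."""
    return len(required_reagents(compounds, scheme))

def reagent_availability(compounds, scheme, reagent_inventory):
    """Fraction of the required reagents that are in stock."""
    needed = required_reagents(compounds, scheme)
    if not needed:
        return 1.0
    return len(needed & reagent_inventory) / len(needed)

def reagent_cost(compounds, scheme, reagent_inventory, price):
    """Cost of buying every required reagent that is not in stock."""
    to_buy = required_reagents(compounds, scheme) - reagent_inventory
    return sum(price[r] for r in to_buy)

# Hypothetical two-compound example.
scheme = {"A": ["r1", "r2"], "B": ["r2", "r3"]}
stock = {"r1", "r2"}
price = {"r1": 1.0, "r2": 2.0, "r3": 7.5}
print(reagent_count(["A", "B"], scheme))
print(reagent_availability(["A", "B"], scheme, stock))
print(reagent_cost(["A", "B"], scheme, stock, price))
```

- Because each function returns a single number for a candidate list, such criteria could be combined with other criteria in a weighted objective.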
- A number of methods and algorithms to extract a diverse subset of compounds from a larger collection have been reported. Such algorithms include clustering, maximin, stepwise elimination, cluster sampling, d-optimal design, etc. Most of these methods are ‘greedy’ methods that select compounds in an incremental manner. The system of the present invention represents molecular diversity as a Selection Criterion 104, i.e. as a function or algorithm that receives as input a list of compounds, and returns a numerical value that represents the molecular diversity of these compounds. Moreover, the Diversity Criterion can be used as part of an Objective Function that is used by the Selector 106 to select a Directed Diversity Library 108 for the next iteration.
- In EQ. 4, $S$ is a set of compounds, $D(S)$ is the diversity of the compounds in $S$, $n$ is the number of compounds in $S$, $i$ and $j$ are used to index the elements of $S$, and $d_{ij}$ is the distance between the i-th and j-th compounds in $S$. In a preferred embodiment, the distance $d_{ij}$ is a Minkowski metric (e.g. Manhattan distance, Euclidean distance, ultrametric distance, etc.) in a multivariate property space. Preferably, the property space is defined using one or more molecular features (descriptors). Such molecular features can include topological indices, physicochemical properties, electrostatic field parameters, volume and surface parameters, etc. For example, these features can include, but are not limited to, molecular volume and surface areas, dipole moments, octanol-water partition coefficients, molar refractivities, heats of formation, total energies, ionization potentials, molecular connectivity indices, substructure keys, hashed fingerprints, atom pairs and/or topological torsions, atom layers, 2D and 3D auto-correlation vectors, 3D structural and/or pharmacophoric keys, electronic fields, etc. Alternatively, the Hamming distance (EQ. 5), Tanimoto coefficient (EQ. 6), or Dice coefficient (EQ. 7) can be used. In EQ. 5-7, $x_i$ and $x_j$ represent binary strings encoding the i-th and j-th structures, respectively (e.g. a substructure key, pharmacophore key, or hashed fingerprint), $k$ is the length of the binary sets $x_i$ and $x_j$, AND($x_i$, $x_j$), IOR($x_i$, $x_j$) and XOR($x_i$, $x_j$) are the binary intersection, union (‘inclusive or’) and ‘exclusive or’ of $x_i$ and $x_j$, respectively, and $|x_i|$ is the number of bits that are ‘on’ in $x_i$. However, the present invention is not limited to these embodiments, and any suitable distance measure and/or definition of chemical space can alternatively be used.
- EQ. 4 exhibits quadratic time complexity, i.e. the time required to compute $D(S)$ scales with the square of the number of compounds in the set $S$. To remedy this problem, in a preferred embodiment, the method can be combined with a nearest neighbor algorithm such as a k-d tree (Bentley, Comm. ACM, 18(9): 509 (1975), Friedman et al., ACM Trans. Math. Soft., 3(3): 209 (1977)), incorporated herein by reference in their entirety. Alternatively, any other suitable algorithm can be used, including, but not limited to:
- (1) ball trees (Omohundro, International Computer Science Institute Report TR-89-063, Berkeley, Calif. (1989)), incorporated herein by reference in its entirety;
- (2) bump trees (Omohundro, Advances in Neural Information Processing Systems 3, Morgan Kaufmann, San Mateo, Calif. (1991)), incorporated herein by reference in its entirety; and
- (3) gridding and Voronoi tessellation (Sedgewick, Algorithms in C, Addison-Wesley, Princeton (1990)), incorporated herein by reference in its entirety.
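- The binary distance measures and the set-level diversity score discussed above can be illustrated with a brute-force sketch. Everything below is an assumption made for illustration: the three distances use commonly used normalized forms (differing bits over string length; one minus shared bits over bits set in either string; one minus twice the shared bits over the total bits set), and the diversity score is taken to be the mean distance from each compound to its nearest neighbor in the set, which is one form consistent with the quadratic cost and the nearest-neighbor speedups mentioned above. Fingerprints are modeled as equal-length tuples of 0/1 integers.

```python
def hamming(x, y):
    """Differing bit positions divided by the string length k."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def tanimoto(x, y):
    """One minus |AND(x, y)| / |IOR(x, y)|."""
    both = sum(a & b for a, b in zip(x, y))
    either = sum(a | b for a, b in zip(x, y))
    return 0.0 if either == 0 else 1.0 - both / either

def dice(x, y):
    """One minus 2 |AND(x, y)| / (|x| + |y|)."""
    both = sum(a & b for a, b in zip(x, y))
    total = sum(x) + sum(y)
    return 0.0 if total == 0 else 1.0 - 2.0 * both / total

def diversity(compounds, distance):
    """Mean nearest-neighbor distance within the set (assumed form).

    The double loop makes the cost quadratic in the number of compounds;
    a spatial index could replace the inner minimum for vector descriptors.
    """
    n = len(compounds)
    if n < 2:
        return 0.0
    total = 0.0
    for i, ci in enumerate(compounds):
        total += min(distance(ci, cj)
                     for j, cj in enumerate(compounds) if j != i)
    return total / n

# Three hypothetical 4-bit fingerprints.
fps = [(1, 1, 0, 0), (1, 0, 0, 0), (0, 0, 1, 1)]
print(hamming(fps[0], fps[1]), tanimoto(fps[0], fps[1]), dice(fps[0], fps[1]))
print(diversity(fps, tanimoto))
```

- Passing the distance as an argument keeps the diversity score independent of the choice of chemical space, in line with the observation that a set that is diverse under one definition need not be diverse under another.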
- Another such Selection Criterion 104 (referred to hereafter as a Molecular Similarity Criterion) receives as input a list of compounds from the Compound Library 102 and a list of reference compounds, and returns a numerical value that represents the molecular similarity of these compounds to the reference compounds. In a preferred embodiment, the similarity of a list of compounds to a prescribed set of reference compounds is computed using EQ. 8:
- $M(S, L) = \dfrac{1}{n} \sum_{i=1}^{n} \min_{1 \le j \le k} d_{ij}$ EQ. 8
- where $S$ is a set of compounds, $L$ is a set of reference compounds, $M(S, L)$ is the measure of similarity of the compounds in $S$ to the compounds in $L$, $n$ is the number of compounds in $S$, $k$ is the number of compounds in $L$, $i$ and $j$ are used to index the elements of $S$ and $L$, respectively, and $d_{ij}$ is the distance between the i-th compound in $S$ and the j-th compound in $L$. Thus, EQ. 8 represents the mean distance of a compound in $S$ from its nearest reference compound in $L$. In a preferred embodiment, the distance $d_{ij}$ is a Minkowski metric (e.g. Manhattan distance, Euclidean distance, ultrametric distance, etc.) in a multivariate property space. Preferably, the property space is defined using one or more molecular features (descriptors). Such molecular features can include topological indices, physicochemical properties, electrostatic field parameters, volume and surface parameters, etc. For example, these features can include, but are not limited to, molecular volume and surface areas, dipole moments, octanol-water partition coefficients, molar refractivities, heats of formation, total energies, ionization potentials, molecular connectivity indices, substructure keys, hashed fingerprints, atom pairs and/or topological torsions, atom layers, 2D and 3D auto-correlation vectors, 3D structural and/or pharmacophoric keys, electronic fields, etc. Alternatively, the distance $d_{ij}$ can be computed by the Hamming (EQ. 5), Tanimoto (EQ. 6), or Dice coefficients (EQ. 7) using a binary molecular representation, such as a substructure key, pharmacophore key, or hashed fingerprint, for example. However, the present invention is not limited to these embodiments, and any suitable definition of chemical space, distance measure, and/or Similarity Criterion can alternatively be used.
- The set of reference compounds may or may not represent real or synthesizable compounds. For example, the set of reference compounds can represent an ‘ideal’ or ‘target’ set of properties that the selected compounds should possess. In this case, the Similarity Criterion in EQ. 8 (or any other suitable Similarity Criterion) measures how well a particular set of compounds matches a prescribed set of target properties.
- The Similarity Criterion can be used to design a set of compounds close to a reference set of compounds, or to design a set of compounds far from a reference set of compounds. For example, if EQ. 8 is used, this can be achieved by simply reversing the sign of $M(S, L)$.
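- The Similarity Criterion described above, i.e. the mean distance of each compound from its nearest reference compound, can be sketched as follows. This is an illustration under stated assumptions: compounds and references are modeled as equal-length tuples of numeric descriptors, Euclidean distance is used as the default metric, and the `avoid` flag is a hypothetical name for the sign reversal used to favor sets far from the references.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def similarity_criterion(compounds, references, distance=euclidean, avoid=False):
    """Mean distance from each compound to its nearest reference compound.

    Smaller values mean the set lies closer to the references. With
    avoid=True the sign is reversed, so that an optimizer minimizing the
    returned value is pushed away from the references instead.
    """
    if not compounds or not references:
        raise ValueError("both lists must be non-empty")
    mean_nearest = sum(
        min(distance(c, r) for r in references) for c in compounds
    ) / len(compounds)
    return -mean_nearest if avoid else mean_nearest

# Hypothetical 2-D property space.
candidates = [(0.0, 0.0), (3.0, 4.0)]
targets = [(0.0, 0.0), (6.0, 8.0)]
print(similarity_criterion(candidates, targets))
print(similarity_criterion(candidates, targets, avoid=True))
```

- Since the references need not be real compounds, `targets` could equally hold an ‘ideal’ property profile rather than descriptors of known structures.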
- Another Selection Criterion 104 (referred to hereafter as a Synthetic Confidence Criterion) receives as input a compound (or list of compounds) from the Compound Library 102, and returns a confidence factor that this compound can be synthesized by the Synthesis Module 112 using a prescribed synthetic scheme. For example, this confidence factor can be computed by an expert system for computer-assisted organic synthesis. However, it should be understood that the present invention is not limited to this embodiment.
- Another such Selection Criterion 104 (referred to hereafter as a Synthetic Yield Criterion) receives as input a compound (or list of compounds) from the Compound Library 102, and returns a predicted yield for the compound(s), if the compound(s) were to be synthesized by the Synthesis Module 112 according to a prescribed synthetic scheme. For example, the synthetic yield can be computed by an expert system for computer-assisted organic synthesis. However, it should be understood that the present invention is not limited to this embodiment.
- Another such Selection Criterion 104 (referred to hereafter as a Synthetic Ease or Synthetic Planning Criterion) receives as input a list of compounds from the Compound Library 102, and returns a numerical value that represents the ease of planning and executing the synthesis of these compounds in the Synthesis Module 112 according to a prescribed synthetic scheme. For example, one such Synthetic Planning Criterion can be a value indicating if (and by how much) a particular collection of compounds exceeds the synthetic capacity of an automated robotic Synthesis Module 112. Another example of such a Synthetic Planning Criterion may be the number of different synthetic schemes that must be executed by the Synthesis Module 112 in order to synthesize a particular collection of compounds. However, it should be understood that the present invention is not limited to these embodiments.
- Another such Selection Criterion 104 (referred to hereafter as a Structure-Property Model Confirmatory Criterion) receives as input a list of compounds from the
Compound Library 102 and a Structure-Property Model 842, and returns the mean predicted property (or activity) of these compounds, as inferred by the specified model. Alternatively, any other suitable numerical value that can be derived from the predicted properties of the specified compounds as inferred by the specified Structure-Property Model can be used. For example, the Structure-Property Model Confirmatory Criterion can return the minimum property, maximum property, or deviation of properties of the specified list of compounds, as inferred by the specified Structure-Property Model. However, it should be understood that the present invention is not limited to these embodiments. Any form of a Structure-Property Model 842 can be used in this regard. For example, the Structure-Property Models 842 can include models derived from Statistics 802, Neural Networks 804, Fuzzy Logic 806, and/or Model-Specific Methods 808, and/or models derived from a combination of Statistics 802, Neural Networks 804, Fuzzy Logic 806, and/or Model-Specific Methods 808, such as the Neuro-Fuzzy Structure-Property Model 1100 described above, for example. These Structure-Property Models 842 can also include models derived from docking methods and/or 3D QSAR methods including, but not limited to, pharmacophore identification, structural alignment and molecular superposition, molecular shape analysis, mini-receptors and pseudo-receptors, distance geometry, hypothetical active site lattice, and/or molecular interaction fields. However, it should be understood that the present invention is not limited to these embodiments.
- Another such Selection Criterion 104 (referred to hereafter as a Structure-Property Model Discriminatory Criterion) receives as input a compound (or list of compounds) from the Compound Library 102 and two or more Structure-Property Models 842, and returns a numerical value that represents the ability (or collective ability) of this compound (or list of compounds) to discriminate between the specified models. The term ‘discriminate’ is used herein to denote the ability of a compound (or list of compounds) to distinguish between two or more models. A compound is said to possess high discriminatory ability if the models differ substantially in their predictions of the properties of that compound. Structure-Property Model Discriminatory Criteria 104 can be used if the Structure-Property Models 842 are weak or under-determined, for example. In such cases, it is often difficult to select which Structure-Property Model(s) 842 should be used to select the Directed Diversity Library 108 for the next iteration. Thus, it may be desirable to select compounds that can discriminate between two or more Structure-Property Models 842, so that the Structure-Property Models 842 that reflect true correlations are reinforced, while the Structure-Property Models 842 that do not reflect true correlations are eliminated. An example of a Structure-Property Model Discriminatory Criterion is the difference between the minimum and maximum property predictions for a given compound as inferred by the specified Structure-Property Models 842, or the deviation of the property predictions for a given compound as inferred by the specified Structure-Property Models 842. However, it should be understood that the present invention is not limited to these embodiments. As with Structure-Property Model Confirmatory Criteria 104, any form of a Structure-Property Model 842 can be used in this regard.
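- The two example discriminatory measures named above, i.e. the spread between the minimum and maximum predictions and the deviation of the predictions across models, can be sketched as follows. This is an illustration under stated assumptions: each model is represented as a plain Python callable that maps a compound to a predicted property, ‘deviation’ is interpreted as the population standard deviation, and the collective score for a list of compounds is taken to be the mean of the per-compound scores. None of these choices is prescribed here.

```python
import statistics

def prediction_range(compound, models):
    """Difference between the largest and smallest model predictions."""
    predictions = [m(compound) for m in models]
    return max(predictions) - min(predictions)

def prediction_deviation(compound, models):
    """Population standard deviation of the model predictions."""
    return statistics.pstdev([m(compound) for m in models])

def discriminatory_score(compounds, models, per_compound=prediction_range):
    """Mean per-compound score over a list (an assumed aggregation)."""
    return statistics.fmean([per_compound(c, models) for c in compounds])

# Three hypothetical models over a one-number "compound" descriptor.
models = [lambda c: c, lambda c: 2.0 * c, lambda c: 0.0]
print(prediction_range(2.0, models))
print(prediction_deviation(2.0, models))
print(discriminatory_score([1.0, 2.0], models))
```

- A compound on which all models agree scores zero under both measures, so it would contribute nothing toward telling the models apart.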
For example, the Structure-Property Models 842 can include models derived from Statistics 802, Neural Networks 804, Fuzzy Logic 806, and/or Model-Specific Methods 808, and/or models derived from a combination of Statistics 802, Neural Networks 804, Fuzzy Logic 806, and/or Model-Specific Methods 808, such as the Neuro-Fuzzy Structure-Property Model 1100 described above, for example. These Structure-Property Models 842 can also include models derived from docking methods and/or 3D QSAR methods including, but not limited to, pharmacophore identification, structural alignment and molecular superposition, molecular shape analysis, mini-receptors and pseudo-receptors, distance geometry, hypothetical active site lattice, and/or molecular interaction fields. However, it should be understood that the present invention is not limited to these embodiments.
- Structure-Property Model Discriminatory Criteria can also be used to determine if a particular compound or list of compounds exhibits selective properties. For example, Structure-Property Model Discriminatory Criteria 104 can be used to determine whether a particular compound can bind selectively to a specific target (also referred to herein as a Selectivity Criterion). For example, a Selectivity Criterion 104 can be implemented using EQ. 9,
- where $s_i$ denotes the selectivity of a particular compound for the i-th property (EQ. 9 assumes that the properties $p_i$ are normalized). For example, EQ. 9 can be used to describe whether a particular compound binds selectively to the enzyme Thrombin versus the enzymes Trypsin and Urokinase, by substituting $p_i$ with the binding affinities of that compound for Thrombin, Trypsin and Urokinase as predicted by a Thrombin, Trypsin and Urokinase Structure-Property Model 842, respectively. If more than one Structure-Property Model 842 is available for a particular property (or properties), EQ. 9 can be replaced by EQ. 10,
- where $p_{ij}$ is the i-th property of the compound as predicted by the j-th Structure-Property Model 842, and mean(.) is a function that returns the mean of its arguments.
- Another such Selection Criterion 104 (referred to hereafter as a Patentability Criterion) receives as input a compound (or list of compounds) from the
Compound Library 102, and returns a value indicating whether this compound is protected by an issued US or foreign patent. Preferably, theExperiment Planner 130 searches a patent database to determine is the specified compound (or list of compounds) has been patented or has not been patented. - Another such Selection Criterion104 (referred to hereafter as a Bioavailability Criterion) receives as input a compound (or list of compounds) from the
Compound Library 102, and returns a value that represents the predicted bioavailability of that compound, as inferred by a suitable Bioavailability Structure-Property Model.
- Another such Selection Criterion 104 (referred to hereafter as a Toxicity Criterion) receives as input a compound (or list of compounds) from the Compound Library 102, and returns a value that represents the predicted toxicity of that compound, as inferred by a suitable Toxicity Structure-Property Model.
- Alternatively, the Experiment Planner 130 can define other Selection Criteria 104 that can be derived from information pertaining to a given compound or list of compounds, and that can be used to guide the selection of the Directed Diversity Library 108 for the next iteration.
- ii. Second Type of Selection Criteria 104
- The second type of Selection Criteria 104 represents specific constraints and/or methods for generating such lists of compounds. A few examples of such Selection Criteria 104 shall now be described.
- One such Selection Criterion 104 defines a list of compounds that should not be included in the Directed Diversity Library 108 for the next iteration (referred to herein as the Excluded Compounds Criterion). For example, these compounds (referred to herein as the Excluded Compounds) can be compounds whose properties of interest are already known (e.g., compounds previously analyzed by the Analysis Module 118). Alternatively, the Excluded Compounds can be compounds whose bioavailability, as predicted by a Bioavailability Structure-Property Model, is below a prescribed threshold; compounds whose toxicity, as predicted by a Toxicity Structure-Property Model, is above a prescribed threshold; compounds that require expensive Reagents 114 to be mixed together in order to be generated by the Synthesis Module 112 (e.g., Reagents 114 whose cost exceeds a prescribed value); compounds that cannot be made in an automated or partially automated fashion by the Synthesis Module 112; etc.
- The Excluded Compounds can also represent combinations of compounds that cannot all be part of a Directed Diversity Library 108 for the next iteration. For example, the Excluded Compounds can be a set of compounds that require more than one synthetic scheme to be executed by the Synthesis Module 112 in order to be synthesized. For example, if the Compound Library 102 is comprised of two or more combinatorial chemical libraries, each of which requires a different synthetic scheme to be executed by the Synthesis Module 112 in order for the compounds in these libraries to be synthesized, the Excluded Compounds Criterion can be used to exclude combinations of compounds that cannot all be made using a single synthetic scheme, or to limit the selection of compounds for the next Directed Diversity Library 108 to a specific combinatorial library (or libraries). Alternatively, the Excluded Compounds can represent combinations of compounds that require more than a prescribed number of Reagents 114 to be mixed together by the Synthesis Module 112 in order for these compounds to be synthesized. However, the present invention is not limited to these embodiments.
- Another such Selection Criterion 104 defines the number and/or subset of Reagents 114 that can be mixed together by the Synthesis Module 112. Such a Selection Criterion 104 limits the selection of the Directed Diversity Library 108 for the next iteration to a specific number and/or subset of building blocks.
- Another such Selection Criterion 104 defines the way in which the Reagents 114 are to be mixed together by the Synthesis Module 112. For example, such a Selection Criterion 104 can specify that twenty Reagents 114 must be divided into two sets of ten, and that these two sets of ten Reagents 114 must be mixed together in a combinatorial fashion to generate all one hundred combinations of a combinatorial library with two variable sites (referred to hereafter as an Array Design). However, the present invention is not limited to this embodiment.
- b.
Objective Functions 105 - The
Experiment Planner 130 uses one or more Selection Criteria 104 to define one or more Objective Functions 105. The Objective Function 105 represents a function and/or algorithm that receives a list of compounds from the Compound Library 102 and a list of Selection Criteria 104, and returns a numerical value that represents a collective property of the specified compounds. -
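For example, an Objective Function 105 can take the form of a weighted sum of Selection Criteria 104, as in EQ. 11:
$$f(S) = \sum_i w_i \, c_i(S) \qquad \text{(EQ. 11)}$$
where S is a set of compounds, $c_i(S)$ is the value of the i-th Selection Criterion 104 for the set S, $w_i$ is a weighting factor, and f(S) is the value of the Objective Function 105 for the set of compounds S. Alternatively, any other suitable functional form can be used. - An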
Objective Function 105 might combine, for example, a Molecular Diversity Criterion with a Molecular Similarity Criterion using EQ. 11. In this case, the weights $w_i$ determine the relative influence of the Molecular Diversity Criterion and the Molecular Similarity Criterion. For example, when the Molecular Diversity Criterion and Molecular Similarity Criterion are defined on a similar scale, EQ. 11 can be used to compute a numerical value that reflects the collective ability of a given set of compounds S to satisfy both the Molecular Diversity Criterion and Molecular Similarity Criterion under the specified weights $w_i$. Such Objective Functions 105 that combine multiple Selection Criteria 104 are referred to hereafter as Multi-Objective Functions or Multi-Criteria Functions. Alternatively, an Objective Function 105 can include a single Selection Criterion 104. For example, an Objective Function 105 can simply return the molecular diversity of a collection of compounds, as computed by a Molecular Diversity Criterion. Examples of the use of such Objective Functions 105 and Multi-Objective Functions (not shown) to select a Directed Diversity Library 108 for the next iteration are described below. - 5. The
Selector 106 - The
Selector 106 selects a Directed Diversity Library 108 for analysis, according to the Selection Criteria 104 and any Objective Functions 105. Preferably, the Directed Diversity Library 108 is comprised of compounds that are optimal or nearly optimal with respect to the specified Selection Criteria 104 and Objective Functions 105. Moreover, the Directed Diversity Library 108 should be comprised of compounds that satisfy any constraints specified by some of these Selection Criteria 104.
- The task of identifying an optimal or nearly optimal set of compounds for the next Directed Diversity Library 108, given the Selection Criteria 104 and Objective Functions 105, involves a search of all subsets of compounds from the Compound Library 102 that satisfy the constraints defined by the Experiment Planner 130. As used herein, the term ‘constraint’ denotes a Selection Criterion 104 that excludes certain compounds or certain combinations of compounds from being selected as part of the Directed Diversity Library 108 for the next iteration. Constraints contrast with other Selection Criteria 104, which specify desired properties that the selected compounds should possess, either individually or collectively. The Directed Diversity Library 108 for the next iteration should satisfy any specified constraints and should maximize the desired properties, to the extent possible. - The task of identifying an optimal or nearly optimal set of compounds for the next
Directed Diversity Library 108 can be an enormous combinatorial problem. For example, when one Selection Criterion 104 limits the selection to an n-membered Compound Library 102, and another Selection Criterion 104 specifies that the Directed Diversity Library 108 for the next iteration should be comprised of k compounds from the aforementioned n-membered library, the number N of different k-membered subsets of the n-membered library is given by the binomial coefficient:
$$N = \binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$
- This task is combinatorially explosive because, in all but the simplest cases, N is far too large to allow for the construction and evaluation of every possible subset given current data processing technology. As a result, a variety of stochastic modeling techniques can be employed that are capable of providing good approximate solutions to combinatorial problems in realistic time frames. However, the present invention envisions and includes the construction and evaluation of every individual k-membered subset once computer technology advances to an appropriate point.
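To make the scale of the problem concrete, and to illustrate what a stochastic alternative to exhaustive enumeration might look like, the following sketch counts the k-membered subsets and then searches them with a simple simulated annealing loop driven by a weighted-sum objective. It is an illustration under stated assumptions, not the Selector 106 itself: the swap move, the geometric cooling schedule, the default parameters, and all function names are hypothetical choices made for this example.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical criterion interface: scores a list of compounds collectively.
Criterion = Callable[[Sequence], float]


def subset_count(n: int, k: int) -> int:
    """Number of distinct k-membered subsets of an n-membered library."""
    return math.comb(n, k)


def weighted_objective(criteria: Sequence[Criterion],
                       weights: Sequence[float]) -> Criterion:
    """Combine several criteria into one objective as a weighted sum."""
    def f(subset: Sequence) -> float:
        return sum(w * c(subset) for w, c in zip(weights, criteria))
    return f


def anneal_select(library: Sequence, k: int, objective: Criterion,
                  steps: int = 5000, t_start: float = 1.0,
                  t_end: float = 0.01, seed: int = 0) -> Tuple[List, float]:
    """Search for a high-scoring k-membered subset by simulated annealing."""
    rng = random.Random(seed)
    indices = list(range(len(library)))
    rng.shuffle(indices)
    inside, outside = indices[:k], indices[k:]

    def score_of(ids: Sequence[int]) -> float:
        return objective([library[i] for i in ids])

    score = score_of(inside)
    best_ids, best_score = list(inside), score
    if not inside or not outside:
        return [library[i] for i in best_ids], best_score

    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        t = t_start * (t_end / t_start) ** (step / steps)
        i, j = rng.randrange(len(inside)), rng.randrange(len(outside))
        inside[i], outside[j] = outside[j], inside[i]  # propose a swap
        new_score = score_of(inside)
        delta = new_score - score
        if delta >= 0 or rng.random() < math.exp(delta / t):
            score = new_score  # accept the move
            if score > best_score:
                best_ids, best_score = list(inside), score
        else:
            inside[i], outside[j] = outside[j], inside[i]  # reject: undo swap
    return [library[i] for i in best_ids], best_score


if __name__ == "__main__":
    # Selecting 10 compounds out of only 100 already gives ~1.7e13 subsets.
    print(subset_count(100, 10))
```

Each step evaluates a single candidate subset, so the cost of the search is set by the number of steps rather than by N. The result is a good approximate solution, not a guaranteed optimum.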
- The Selector 106 receives the Selection Criteria 104 and Objective Functions 105 and returns the Directed Diversity Library 108. The Selector 106 preferably uses a stochastic (or exhaustive, if possible) search/optimization technique.
- Referring to FIGS. 12 and 13, in one embodiment, the Selector 106 is coupled to the Compound Database 134, the Reagent Database 138 and the Structure-Property Database 126 via dedicated Servers 1204. The Selector 106 can send a proposed Compound List 1302 to the Servers 1204. The Servers 1204 can retrieve property values for the Compound List 1302 and return them to the Selector 106 as Values 1304.
- Preferably, the Selector 106 generates an initial list of proposed compounds based on Selection Criteria 104 and then refines the list through an iterative process. For example, the Selector 106 can employ Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840, to produce a list of compounds that best satisfy all the Selection Criteria 104 in the manner specified by the Objective Function 105. The list can be refined to become the Directed Diversity Library 108 for the next iteration.
- For example, referring to FIG. 13, each Server 1204 can receive a Compound List 1302 from the Selector 106. The Servers 1204 can access one or more of the databases 126, 134, 138 to retrieve property values for the compounds in the Compound List 1302, and use these property values to compute the values of the respective Selection Criteria 104. The Servers 1204 can return their respective computed values as Selection Criteria Values 1304 for the Compound List 1302.
- Preferably, the Server 1204 can be configured by user input. For example, a user might want to select a particular method for computing molecular diversity. Similarly, a user might want to select one or more particular Structure-Property Models 192 for predicting the properties of compounds.
- In one embodiment, the Selector 106 selects the Directed Diversity Library 108 for the next iteration using a Monte-Carlo Sampling 834 or Simulated Annealing 836 algorithm. Operation of this embodiment is described below with reference to FIG. 18.
- 6. Structure of the Present Invention
- A lead generation/optimization system 100 can be implemented as a fully automated system or as a partially automated system that relies, in part, on human interaction. For example, human interaction can be employed to perform or assist in the functions described herein with respect to the Synthesis Module 112, the Analysis Module 118, and/or the Directed Diversity Manager 310.
- The automated portion of the lead generation/optimization system 100 can be implemented as hardware, firmware, software or any combination thereof, and can be implemented in one or more computer systems and/or other processing systems. In one embodiment, the automated portion of the invention is directed toward one or more computer systems capable of carrying out the functionality described herein.
- Referring to FIG. 19, an example computer system 1901 includes one or more processors, such as processor 1904. Processor 1904 is connected to a communication bus 1902. Various software embodiments are described in terms of this example computer system 1901. After reading this description, it will become apparent to a person skilled in the relevant art how to implement the invention using other computer systems and/or computer architectures.
- Computer system 1901 also includes a main memory 1906, preferably random access memory (RAM), and can also include a secondary memory 1908. Secondary memory 1908 can include, for example, a hard disk drive 1910 and/or a removable storage drive 1912, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. Removable storage drive 1912 reads from and/or writes to a removable storage unit 1914 in a well known manner. Removable storage unit 1914 represents a floppy disk, magnetic tape, optical disk, etc., which is read by and written to by removable storage drive 1912. Removable storage unit 1914 includes a computer usable storage medium having stored therein computer software and/or data.
- In alternative embodiments, secondary memory 1908 can include other similar means for allowing computer programs or other instructions to be loaded into computer system 1901. Such means can include, for example, a removable storage unit 1922 and an interface 1920. Examples of such can include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM or PROM) and associated socket, and other removable storage units 1922 and interfaces 1920 which allow software and data to be transferred from the removable storage unit 1922 to computer system 1901.
- Computer system 1901 can also include a communications interface 1924. Communications interface 1924 allows software and data to be transferred between computer system 1901 and external devices. Examples of communications interface 1924 include, but are not limited to, a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, etc. Software and data transferred via communications interface 1924 are in the form of signals which can be electronic, electromagnetic, optical or other signals capable of being received by communications interface 1924. These signals 1926 are provided to communications interface 1924 via a channel 1928. This channel 1928 carries signals 1926 and can be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link and other communications channels.
- In this document, the terms “computer program medium” and “computer usable medium” are used to generally refer to media such as removable storage device 1912, a hard disk installed in hard disk drive 1910, and signals 1926. These computer program products are means for providing software to computer system 1901.
- Computer programs (also called computer control logic) are stored in main memory 1906 and/or secondary memory 1908. Computer programs can also be received via communications interface 1924. Such computer programs, when executed, enable the computer system 1901 to perform the features of the present invention as discussed herein. In particular, the computer programs, when executed, enable the processor 1904 to perform the features of the present invention. Accordingly, such computer programs represent controllers of the computer system 1901.
- In an embodiment where the invention is implemented using software, the software can be stored in a computer program product and loaded into computer system 1901 using removable storage drive 1912, hard drive 1910 or communications interface 1924. The control logic (software), when executed by the processor 1904, causes the processor 1904 to perform the functions of the invention as described herein.
- In another embodiment, the automated portion of the invention is implemented primarily in hardware using, for example, hardware components such as application specific integrated circuits (ASICs). Implementation of the hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art(s).
- In yet another embodiment, the invention is implemented using a combination of both hardware and software.
- Referring to FIG. 3, a lead generation/optimization system 300 includes one or more central processing units (CPUs) 302a, 302b and 302c, which can be one or more of processors 1904. CPUs 302 operate according to control logic 304, 306, and 308, which can be software, firmware, hardware or any combination thereof.
- Processors 302a, 302b and 302c can represent a single processor 302 or can represent multiple processors. Control logic 304, 306, and 308 can be executed on a single processor or on multiple processors 302.
- Control logic 304, 306, and 308 preferably represent one or more computer programs such that the processor 302 operates according to software instructions contained in the control logic 304, 306, and 308. Alternatively, the processor 302 and/or the control logic 304, 306, and 308 are implemented as a hardware state machine.
- Processor 302a and control logic 304 collectively represent the Experiment Planner 130. Processor 302b and control logic 306 collectively represent the Selector 106. Processor 302c and control logic 308 collectively represent the Synthesis Protocol Generator 202. The Experiment Planner 130, the Selector 106, and the Synthesis Protocol Generator 202 collectively represent a Directed Diversity Manager 310.
- Directed Diversity Manager 310 can be implemented as part of a variety of computer systems. For example, Directed Diversity Manager 310 can be implemented on an Indigo, Indy, Onyx, Challenge, Power Challenge, Octane or Origin 2000 computer made by Silicon Graphics, Inc., of Mountain View, Calif. Another suitable form for the Processor 302 is a DEC Alpha Workstation computer made by Digital Equipment Corporation of Maynard, Mass. Another suitable form for the Processor 302 is one of the Pentium family of processors from Intel, such as the Pentium Pro or Pentium II. Any other suitable computer system could alternatively be used.
- A Communication Medium 312, comprising one or more data buses and/or IO (input/output) interface devices, connects the Experiment Planner 130, the Selector 106, and the Synthesis Protocol Generator 202 to a number of peripheral devices, such as one or more Input Devices 316, one or more Output Devices 318, one or more Synthesis Modules 112, one or more Analysis Modules 118, and one or more Data Storage Devices 314.
- The Input Device(s) 316 receive input (such as data, commands, etc.) from human operators 317 and forward such input to the Experiment Planner 130, the Selector 106, and/or the Synthesis Protocol Generator 202 via the Communication Medium 312. Any well known, suitable input device can be used in the present invention to receive input, commands, selections, etc., from operators 317, such as a keyboard, pointing device (mouse, roller ball, track ball, light pen, etc.), touch screen, voice recognition, etc. User input can also be stored and then retrieved, as appropriate, from data/command files.
- The Output Device(s) 318 output information to human operators 317. The Experiment Planner 130, the Selector 106, and/or the Synthesis Protocol Generator 202 transfer such information to the Output Device(s) 318 via the Communication Medium 312. Any well known, suitable output device can be used in the present invention, such as a monitor, a printer, a floppy disk drive, a text-to-speech synthesizer, etc.
- Preferably, the Synthesis Module 112 receives Robotic Synthesis Instructions 204 (FIG. 2) from the Synthesis Protocol Generator 202 via the Communication Medium 312. The Synthesis Module 112 operates according to the Robotic Synthesis Instructions 204 to selectively combine a particular set of Reagents 114 from the Reagent Inventory 116 to thereby generate those compounds from the Directed Diversity Library 108 specified by the Selector 106 that are not retrieved from the Chemical Inventory 110.
- Where Directed Diversity Manager 310 is implemented as part of a computer system, Communication Medium 312, Input Device(s) 316 and Output Device(s) 318 can be an integral part of the computer system.
- The Synthesis Module 112 is preferably a robot capable of mix-and-split, solid phase chemistry for coupling chemical building blocks. As used herein, the term “robot” refers to any automated or partially automated device that automatically or semi-automatically performs functions specified by instructions such as the Robotic Synthesis Instructions 204 (FIG. 2) generated by the Synthesis Protocol Generator 202.
- The
Synthesis Module 112 preferably performs selective micro-scale solid state synthesis of a specific combinatorial library of Directed Diversity Library 108 compounds, but is not limited to this embodiment. The Synthesis Module 112 preferably cleaves and separates the compounds of the Directed Diversity Library 108 from support resin and distributes the compounds into preferably 96 wells with from 1 to 20 Directed Diversity Library 108 compounds per well, corresponding to an output of 96 to 1920 compounds per synthetic cycle iteration, but is not limited to this embodiment. This function can alternatively be performed by a well known liquid transfer robot (not shown). Synthesis Module(s) suitable for use with the present invention are well known and are commercially available from a number of manufacturers, such as the following:

TABLE 1

| Manufacturer | City | State | Model |
|---|---|---|---|
| Advanced ChemTech | Louisville | KY | 357 MPS, 390 MPS |
| Rainin | Woburn | MA | Symphony |
| Perkin-Elmer Corporation, Applied Biosystems Division | Foster City | CA | 433A |
| Millipore | Bedford | MA | 9050 Plus |

- All of the instruments listed in Table 1 perform solid support-based peptide synthesis only. The Applied Biosystems and the Millipore instruments are single peptide synthesizers. The Rainin Symphony is a multiple peptide synthesizer capable of producing up to twenty peptides simultaneously. The Advanced ChemTech instruments are also multiple peptide synthesizers, but the 357 MPS has a feature utilizing an automated mix-and-split technology. The peptide synthesis technology is preferred in producing the
Directed Diversity Libraries 108 associated with the present invention. See, for example, Gallop, M. A. et al., J. Med. Chem. 37: 1233-1250 (1994), incorporated herein by reference in its entirety.
- Peptide synthesis is by no means the only approach envisioned and intended for use with the present invention. Other chemistries for generating the Directed Diversity Libraries 108 can also be used. For example, the following are suitable: peptoids (PCT Publication No. WO 91/19735, Dec. 26, 1991), encoded peptides (PCT Publication WO 93/20242, Oct. 14, 1993), random bio-oligomers (PCT Publication WO 92/00091, Jan. 9, 1992), benzodiazepines (U.S. Pat. No. 5,288,514), diversomers such as hydantoins, benzodiazepines and dipeptides (Hobbs DeWitt, S. et al., Proc. Natl. Acad. Sci. USA 90: 6909-6913 (1993)), vinylogous polypeptides (Hagihara et al., J. Amer. Chem. Soc. 114: 6568 (1992)), nonpeptidal peptidomimetics with a Beta-D-Glucose scaffolding (Hirschmann, R. et al., J. Amer. Chem. Soc. 114: 9217-9218 (1992)), analogous organic syntheses of small compound libraries (Chen, C. et al., J. Amer. Chem. Soc. 116: 2661 (1994)), oligocarbamates (Cho, C. Y. et al., Science 261: 1303 (1993)), and/or peptidyl phosphonates (Campbell, D. A. et al., J. Org. Chem. 59: 658 (1994)). See, generally, Gordon, E. M. et al., J. Med. Chem. 37: 1385 (1994). The contents of all of the aforementioned publications are incorporated herein by reference.
- Alternatively, the Synthesis Module 112 can be a robot capable of solution-phase synthesis, or a workstation that enables manual synthesis of the compounds in the Directed Diversity Library 108. A number of well-known robotic systems have also been developed for solution phase chemistries. These systems include automated workstations like the automated synthesis apparatus developed by Takeda Chemical Industries, LTD. (Osaka, Japan) and many robotic systems utilizing robotic arms (Zymate II, Zymark Corporation, Hopkinton, Mass.; Orca, Hewlett-Packard, Palo Alto, Calif.) that mimic the manual synthetic operations performed by a chemist. Any of the above devices is suitable for use with the present invention. The nature and implementation of modifications to these devices (if any) so that they can operate as discussed herein will be apparent to persons skilled in the relevant art.
- It is noted that the functions performed by the Synthesis Module 112 can be alternatively performed by human operators, aided or not aided by robots and/or computers.
- The Analysis Module(s) 118 receives the chemical compounds synthesized by the Synthesis Module(s) 112 or retrieved from the Chemical Inventory 110. The Analysis Module(s) 118 analyzes these compounds to obtain Structure-Property Data 124 pertaining to the compounds.
- FIG. 4 is a more detailed structural block diagram of an embodiment of the Analysis Module(s) 118. The Analysis Module(s) 118 include one or
more Assay Modules 402, such as an Enzyme Activity Assay Module 404, a Cellular Activity Assay Module 406, a Toxicology Assay Module 408, and/or a Bioavailability Assay Module 410. The Enzyme Activity Assay Module 404 assays the compounds synthesized by the Synthesis Module(s) 112 using well known procedures to obtain enzyme activity data relating to the compounds. The Cellular Activity Assay Module 406 assays the compounds using well known procedures to obtain cellular activity data relating to the compounds. The Toxicology Assay Module 408 assays the compounds using well known procedures to obtain toxicology data relating to the compounds. The Bioavailability Assay Module 410 assays the compounds using well known procedures to obtain bioavailability data relating to the compounds.
- The Enzyme Activity Assay Module 404, Cellular Activity Assay Module 406, Toxicology Assay Module 408, and Bioavailability Assay Module 410 are implemented in a well known manner to facilitate the preparation of solutions, initiation of the biological or chemical assay, termination of the assay (optional depending on the type of assay) and measurement of the results, commonly using a counting device, spectrophotometer, fluorometer or radioactivity detection device. Each of these steps can be done manually (with or without the aid of robots or computers) or by robots, in a well known manner. Raw data is collected and stored on magnetic media under computer control or input manually into a computer. Useful measurement parameters such as dissociation constants or 50% inhibition concentrations can then be manually or automatically calculated from the observed data, stored on magnetic media and output to a relational database.
- The Analysis Module(s) 118 optionally include a Structure and Composition Analysis Module 414 to obtain two dimensional structure and composition data relating to the compounds. Preferably, the Structure and Composition Analysis Module 414 is implemented using a liquid chromatograph device and/or a mass spectrometer. In one embodiment, a sampling robot (not shown) transfers aliquots from the 96 wells to a coupled liquid chromatography-mass spectrometry system to perform sample analysis.
- The Structure and Composition Analysis Module 414 can be utilized to determine product composition and to monitor reaction progress by comparison of the experimental results to the theoretical results predicted by the Synthesis Protocol Generator 202. The Analysis Module(s) 118 can use, but is not limited to, infra-red spectroscopy, decoding of a molecular tag, mass spectrometry (MS), gas chromatography (GC), liquid chromatography (LC), or combinations of these techniques (e.g., GC-MS, LC-MS, or MS-MS). Preferably, the Structure and Composition Analysis Module 414 is implemented using a mass spectrometric technique such as Fast Atom Bombardment Mass Spectrometry (FAB-MS) or triple quadrupole ion spray mass spectrometry, optionally coupled to a liquid chromatograph, or matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS). MALDI-TOF MS is well known and is described in a number of references, such as: Brummell et al., Science 264:399 (1994); Zambias et al., Tetrahedron Lett. 35:4283 (1994), both incorporated herein by reference in their entireties.
- Liquid chromatograph devices, gas chromatograph devices, and mass spectrometers suitable for use with the present invention are well known and are commercially available from a number of manufacturers, such as the following:
TABLE 2: GAS CHROMATOGRAPHY

| Manufacturer | City | State | Model |
|---|---|---|---|
| Hewlett-Packard Company | Palo Alto | CA | 5890 |
| Varian Associates Inc. | Palo Alto | CA | |
| Shimadzu Scientific Inst. | Columbia | MD | GC-17A |
| Fisons Instruments | Beverly | MA | GC 8000 |

TABLE 3: LIQUID CHROMATOGRAPHY

| Manufacturer | City | State | Model |
|---|---|---|---|
| Hewlett-Packard Company | Palo Alto | CA | 1050, 1090 |
| Varian Associates Inc. | Palo Alto | CA | |
| Rainin Instrument Co. | Woburn | MA | |
| Shimadzu Scientific Inst. | Columbia | MD | LC-10A |
| Waters Chromatography | Milford | MA | Millenium |
| Perkin-Elmer Corporation | Norwalk | CT | |
| Hitachi Instruments Inc. | San Jose | CA | |

TABLE 4: MASS SPECTROSCOPY

| Manufacturer | City | State | Model |
|---|---|---|---|
| Hewlett-Packard Company | Palo Alto | CA | |
| Varian Associates Inc. | Palo Alto | CA | |
| Kratos Analytical Inc. | Ramsey | NJ | MS80RFAQ |
| Finnigan MAT | San Jose | CA | Vision 2000, TSQ-700 |
| Fisons Instruments | Beverly | MA | API LC/MS, AutoSpec |
| Perkin-Elmer Corporation | Norwalk | CT | API-III |

- Modifications to these devices may be necessary to fully automate both the loading of samples on the systems and the comparison of the experimental and predicted results. The extent of the modification can vary from instrument to instrument. The nature and implementation of such modifications will be apparent to persons skilled in the art.
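One way the comparison of experimental and predicted results might be automated for a mass spectrum is sketched below. The sketch assumes that the predicted result is a monoisotopic mass, that the compound is observed as a protonated ion, and that a peak counts as a match when it lies within a fixed mass tolerance and exceeds a relative intensity cut-off. The names `Peak`, `expected_mz` and `adequately_synthesized`, and the default thresholds, are hypothetical and are not taken from the description above.

```python
from dataclasses import dataclass
from typing import Sequence

PROTON_MASS = 1.007276  # mass of a proton in daltons


@dataclass(frozen=True)
class Peak:
    mz: float         # observed mass-to-charge ratio
    intensity: float  # observed signal intensity (arbitrary units)


def expected_mz(monoisotopic_mass: float, charge: int = 1) -> float:
    """m/z of the ion formed by adding `charge` protons to the molecule."""
    return (monoisotopic_mass + charge * PROTON_MASS) / charge


def adequately_synthesized(predicted_mass: float, peaks: Sequence[Peak],
                           tol_da: float = 0.5,
                           min_rel_intensity: float = 0.1) -> bool:
    """True if a sufficiently intense peak lies near the predicted m/z.

    The tolerance and intensity cut-off are illustrative defaults only.
    """
    if not peaks:
        return False
    base = max(p.intensity for p in peaks)  # base peak intensity
    if base <= 0:
        return False
    target = expected_mz(predicted_mass)
    return any(
        abs(p.mz - target) <= tol_da
        and p.intensity >= min_rel_intensity * base
        for p in peaks
    )
```

A real instrument workflow would also need to handle adducts, isotope patterns and multiply charged ions, which this sketch deliberately leaves out.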
- The Analysis Module(s) 118 can optionally further include a Chemical Synthesis Indicia Generator 412 that analyzes the structure and composition data obtained by the Structure and Composition Analysis Module 414 to determine which compounds were adequately synthesized by the Synthesis Module(s) 112, and which compounds were not adequately synthesized by the Synthesis Module(s) 112. In an embodiment, the Chemical Synthesis Indicia Generator 412 is implemented using a processor, such as Processor 302, operating in accordance with appropriate control logic, such as Control Logic 304, 306, and/or 308. Preferably, the Control Logic 304, 306, and/or 308 represents a computer program such that the Processor 302 operates in accordance with instructions in the Control Logic 304, 306, and/or 308 to determine which compounds were adequately synthesized by the Synthesis Module(s) 112, and which compounds were not adequately synthesized by the Synthesis Module(s) 112. Persons skilled in the relevant art will be able to produce such Control Logic 304, 306, and/or 308 based on the discussion of the Chemical Synthesis Indicia Generator 412 contained herein.
- The Analysis Module(s) 118 can also include a three dimensional (3D) Receptor Mapping Module 418 to obtain three dimensional structure data relating to a receptor binding site. The 3D Receptor Mapping Module 418 preferably determines the three dimensional structure of a receptor binding site empirically through x-ray crystallography and/or nuclear magnetic resonance spectroscopy, and/or as a result of the application of extensive 3D QSAR (quantitative structure-activity relationship) and receptor field analysis procedures, well known to persons skilled in the art and described in: “Strategies for Indirect Computer-Aided Drug Design”, Gilda H. Loew et al., Pharmaceutical Research, Volume 10, No. 4, pages 475-486 (1993); “Three Dimensional Structure Activity Relationships”, G. R. Marshall et al., Trends in Pharmacological Sciences, 9: 285-289 (1988). Both of these documents are herein incorporated by reference in their entireties.
- The functions performed by the Analysis Modules 118 can alternatively be performed by human operators, with or without the aid of robots and/or computers.
- The Analysis Module(s) 118 can additionally include one or more Physical and/or Electronic Property Analysis Module(s) 416 that analyze the compounds synthesized by the Synthesis Module(s) 112 to obtain physical and/or electronic property data relating to the compounds. Such properties can include water/octanol partition coefficients, molar refractivity, dipole moment, fluorescence, etc. Such properties can either be measured experimentally or computed using methods well known to persons skilled in the art.
- Referring again to FIG. 3, the Data Storage Device 314 is a read/write high storage capacity device such as a tape drive unit or a hard disk unit. Data storage devices suitable for use with the present invention are well known and are commercially available from a number of manufacturers, such as the 2 gigabyte Differential System Disk, part number FTO-SD8-2NC, and the 10 gigabyte DLT tape drive, part number P-W-DLT, both made by Silicon Graphics, Inc., of Mountain View, Calif. The Reagent Database 138, Compound Database 134, and Structure-Property Database 126 are stored in the Data Storage Device 314.
- The
Reagent Database 138 contains information pertaining to the reagents in the Reagent Inventory 116. In particular, the Reagent Database 138 contains information pertaining to the chemical substructures, chemical properties, physical properties, biological properties, and electronic properties of the reagents in the Reagent Inventory 116.
- The Structure-Property Database 126 stores Structure-Property Data 124, 128 (FIG. 1) pertaining to the compounds that were synthesized by the Synthesis Module(s) 112. Such Structure-Property Data 124, 128 are transferred to the Structure-Property Database 126 via the Communication Medium 312.
- FIG. 5 is a more detailed block diagram of an embodiment of the Structure-Property Database 126. The Structure-Property Database 126 includes a Structure and Composition Database 502, a Physical and Electronic Properties Database 504, a Chemical Synthesis Database 506, a Chemical Properties Database 508, a 3D Receptor Map Database 510, and a Biological Properties Database 512. The Structure and Composition Database 502 stores Structure and Composition Data 514 pertaining to compounds synthesized by the Synthesis Module(s) 112 and analyzed by the Analysis Module(s) 118. Similarly, the Physical and Electronic Properties Database 504, Chemical Synthesis Database 506, Chemical Properties Database 508, 3D Receptor Map Database 510, and Biological Properties Database 512 store Physical and Electronic Properties Data 516, Chemical Synthesis Data 518, Chemical Properties Data 520, 3D Receptor Map Data 522, and Biological Properties Data 524, respectively, pertaining to compounds retrieved from the Chemical Inventory 110 and/or synthesized by the Synthesis Module(s) 112, and analyzed by the Analysis Module(s) 118. The Structure and Composition Data 514, Physical and Electronic Properties Data 516, Chemical Synthesis Data 518, Chemical Properties Data 520, 3D Receptor Map Data 522, and Biological Properties Data 524 collectively represent the Structure-Property Data 124, 128.
- In an embodiment, the Structure and
Composition Database 502, Physical and Electronic Properties Database 504, Chemical Synthesis Database 506, Chemical Properties Database 508, 3D Receptor Map Database 510, and Biological Properties Database 512 each include one record for each chemical compound retrieved from the Chemical Inventory 110 and/or synthesized by the Synthesis Module(s) 112 and analyzed by the Analysis Module(s) 118 (other database structures could alternatively be used).
- 7. Operation of the Present Invention
- The operation of the lead generation/optimization system 100 shall now be described in detail with reference to the process flowchart 600 of FIG. 6. Steps 602-618 in process flowchart 600 represent a preferred method for identifying chemical compounds having desired properties.
- The lead generation/optimization system 100 implements an iterative process where, during each iteration:
- (1) a set of Selection Criteria 104 and/or one or more Objective Functions 105 are defined (step 602);
- (2) a Directed Diversity Library 108 is selected (step 604);
- (3a) compounds in the Directed Diversity Library 108 are retrieved from the Chemical Inventory 110 (step 606); and/or
- (3b) compounds in the Directed Diversity Library 108 that were not retrieved from the Chemical Inventory 110 are synthesized (step 608);
- (4) the compounds in the Directed Diversity Library 108 are analyzed to obtain Structure-Property Data 124 pertaining to the compounds (step 612);
- (5) the Structure-Property Data 124 are stored in a Structure-Property Database 126 (step 614);
- (6) new Leads 122 are identified and classified (step 616); and
- (7) Structure-Property Models with enhanced predictive and discriminating capabilities are constructed and/or refined to allow the selection and/or refinement of a new set of Selection Criteria 104 for the next iteration (step 618).
- In an embodiment, steps 602-618 of flowchart 600 are performed during each iteration of the iterative process, as indicated by control line 620 in flowchart 600.
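- The iterative process of steps 602-618 can be pictured as a simple control loop. The following Python fragment is a minimal sketch only, not the implementation of the system: the class IterationHooks, the function run_iterations, and all callable names are hypothetical placeholders introduced for illustration, and each callable merely stands in for the component that performs the corresponding step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class IterationHooks:
    """Hypothetical callables standing in for the system components."""
    define_criteria: Callable[[Dict[str, float]], dict]   # step 602
    select_library: Callable[[dict], List[str]]           # step 604
    acquire: Callable[[List[str]], List[str]]             # steps 606/608
    analyze: Callable[[List[str]], Dict[str, float]]      # step 612
    is_lead: Callable[[float], bool]                      # step 616
    refine_models: Callable[[Dict[str, float]], None]     # step 618


def run_iterations(hooks: IterationHooks,
                   n_iterations: int) -> Tuple[Dict[str, float], List[str]]:
    """Run the loop a fixed number of times (an assumed stopping rule)."""
    database: Dict[str, float] = {}   # stands in for the structure-property database
    leads: List[str] = []
    for _ in range(n_iterations):
        criteria = hooks.define_criteria(database)    # step 602
        library = hooks.select_library(criteria)      # step 604
        compounds = hooks.acquire(library)            # steps 606 and/or 608
        data = hooks.analyze(compounds)               # step 612
        database.update(data)                         # step 614
        for compound, value in data.items():          # step 616
            if hooks.is_lead(value) and compound not in leads:
                leads.append(compound)
        hooks.refine_models(database)                 # step 618
    return database, leads
```

- In this sketch a single number stands in for the measured property data of each compound; a real system would store a full record per compound and would typically stop on a convergence or budget criterion rather than after a fixed iteration count.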
- Referring to FIG. 6, the process begins at step 602, where the Experiment Planner 130 defines Selection Criteria 104 and/or one or more Objective Functions 105. The Experiment Planner 130 defines Selection Criteria 104 and/or Objective Functions 105 based on current Structure-Property Data 124 and Historical Structure-Property Data 128. Historical Structure-Property Data 128 can be identified from previous iterations of the lead generation/optimization system 100 and/or from other independent experiments. The Experiment Planner 130 can also define Selection Criteria 104 and/or Objective Functions 105 based on one or more of: Compound Data 132; Reagent Data 136; Desired Properties 120; and Structure-Property Models 192. The Selection Criteria 104 and/or Objective Functions 105 are sent to the Selector 106. Additional details of step 602 are provided below, in the description of the next iteration of the process.
- In step 604, the Selector 106 selects a Directed Diversity Library 108. The Selector 106 uses the Selection Criteria 104 and/or Objective Functions 105 that were defined by the Experiment Planner 130 in step 602. The Selector 106 can use a stochastic (or exhaustive, if possible) search/optimization technique. The search can include, but is not limited to, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840, to produce a list of compounds that best satisfy all the Selection Criteria 104 in the manner specified by the Objective Function 105. This list comprises the Directed Diversity Library 108 for the next iteration.
- In one embodiment, the
Selector 106 selects the Directed Diversity Library 108 for the next iteration using a Monte-Carlo Sampling 834 or Simulated Annealing 836 algorithm. In this embodiment, a collection of compounds that satisfies all the constraints specified by the Experiment Planner 130 represents a ‘state’, and is encoded in a manner that is most appropriate given those constraints. Thus, the precise encoding of a state can vary, depending on some of the Selection Criteria 104 specified by the Experiment Planner 130.
- Referring to the process flowchart of FIG. 18, the process of step 604 is illustrated in greater detail for the case where a Monte-Carlo Sampling 834 or Simulated Annealing 836 algorithm is used.
- In step 1804, a state, i.e., the collection of compounds that will comprise the Directed Diversity Library 108 for the next iteration, is initialized, preferably at random. Other initialization approaches could alternatively be used, such as biased initialization or human input. The state is initialized by selecting a set of compounds and/or a set of reagents, preferably at random.
- In steps 1806-1816, the state is gradually refined by a series of small stochastic ‘steps’. The term ‘step’ means a stochastic (random or partially random) modification of the state's composition, i.e. the compounds comprising the state.
- In step 1806, the state is modified. Modification can include sending a randomly generated state to the Server 1204 as Compound List 1302 and receiving Values 1304 for the compounds in the Compound List 1302. The initial state can then be modified, for example, by replacing a compound currently in the state with a compound not currently in the state, or by replacing a building block of one or more compounds currently in the state. The new state can be sent to the Server 1204 as Compound List 1302 and Values 1304 can be returned for the new state.
- In step 1808, the quality of the new state can be assessed using the Objective Function 105 specified by the Experiment Planner 130. The quality can be assessed by comparing the new state to the old state using the Metropolis criterion. Alternatively, any other suitable comparison criterion can be used.
- In step 1810, if the new state is approved, processing proceeds to step 1812, where the Selector 106 replaces the old state with the new state. If the new state is not approved, processing proceeds to step 1814, where the Selector 106 discards the new state.
- From steps 1812 and 1814, processing proceeds to step 1816, where the
Selector 106 determines whether to repeat steps 1806-1814 or use the current state as the next Directed Diversity Library 108.
- Steps 1806-1816 can be performed under control of a Monte-Carlo Sampling protocol 834, a Simulated Annealing protocol 836, or variants thereof, which are well known to persons skilled in the art. However, it should be understood that the system of the present invention is not limited to these embodiments.
- For example, the Selector 106 can use Evolutionary Programming 838 or Genetic Algorithms 840, where the population of states (or chromosomes) is initialized at random and is allowed to evolve through the repeated application of genetic operators, such as crossover, mutation, and selection. The genetic operators alter the composition of the states, either individually (e.g. mutation), or by mixing elements of two or more states (e.g. crossover) in some prescribed manner. Selection is probabilistic, and is based on the relative fitness of these states as measured by the Objective Function 105. As in the case of Monte-Carlo Sampling 834 and Simulated Annealing 836 described above, the states (or chromosomes) are encoded in a manner that is most appropriate given the constraints specified by the Experiment Planner 130.
- In addition to Evolutionary Programming 838 and Genetic Algorithms 840, the Selector 106 can also use any other suitable search/optimization algorithm to identify the optimal (or a nearly optimal) Directed Diversity Library 108.
- Thus, the precise encoding of a state in step 604 can vary, depending on, among other things, the Selection Criteria 104 specified by the Experiment Planner 130. The implementation of these methods should be straightforward to persons skilled in the art.
- Several examples are provided below to illustrate how one or
more Selection Criteria 104 can be combined by one or more Objective Functions 105, and how the Selection Criteria 104 and Objective Functions 105 can be used to select a Directed Diversity Library 108 for a next iteration. These examples are provided to illustrate the present invention, not to limit it.
- In the first example, the Selector 106 uses Simulated Annealing 836 to identify a set of 50 compounds from a 10,000-membered Compound Library 102 that maximizes the Objective Function 105 given by EQ. 13:
- ƒ(S)=D(S) EQ. 13
- using the Molecular Diversity Criterion described in EQ. 4, and a Euclidean distance measure defined in a normalized 2-dimensional property space (in the example below, the properties of these 10,000 compounds represent uniformly distributed random deviates in the unit square). In a preferred embodiment, the system encodes a state by a pair of index lists, one containing the indices of the compounds currently in the set (Included Set), and another containing the indices of the compounds not currently in the set (Excluded Set). A step (i.e. a modification of the composition of the current state) is performed by swapping one or more indices from the Included and Excluded Sets. The search was carried out in 30 temperature cycles, using 1,000 sampling steps per cycle, an exponential cooling schedule, and the Metropolis acceptance criterion.
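- The search described in this first example can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used to produce the reported results. EQ. 4 is not reproduced in this passage, so the function diversity below assumes a mean nearest-neighbor Euclidean distance as a stand-in for D(S). The function name anneal, the start and end temperatures, and the choice to return the best state visited (rather than the final state) are likewise assumptions. The state is encoded as an Included Set and an Excluded Set of indices, and a step swaps one index between the two.

```python
import math
import random


def diversity(points, included):
    """Assumed stand-in for D(S): mean nearest-neighbour Euclidean distance."""
    total = 0.0
    for i in included:
        total += min(math.dist(points[i], points[j]) for j in included if j != i)
    return total / len(included)


def anneal(points, k, objective, cycles=30, steps=1000,
           t_start=0.1, t_end=1e-4, seed=0):
    """Maximize objective(points, included) over subsets of size k (k < len(points))."""
    rng = random.Random(seed)
    indices = list(range(len(points)))
    rng.shuffle(indices)                          # random initial state
    included, excluded = indices[:k], indices[k:]
    current = objective(points, included)
    best, best_set = current, list(included)
    # Exponential cooling: multiply the temperature by a constant each cycle.
    cooling = (t_end / t_start) ** (1.0 / max(cycles - 1, 1))
    temperature = t_start
    for _ in range(cycles):
        for _ in range(steps):
            a = rng.randrange(len(included))
            b = rng.randrange(len(excluded))
            included[a], excluded[b] = excluded[b], included[a]   # swap step
            candidate = objective(points, included)
            delta = candidate - current
            # Metropolis criterion for maximization.
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current = candidate
                if current > best:
                    best, best_set = current, list(included)
            else:
                included[a], excluded[b] = excluded[b], included[a]  # undo
        temperature *= cooling
    return best_set, best
```

- With the default arguments the sketch follows the schedule stated above (30 temperature cycles of 1,000 steps each); because every step recomputes the objective from scratch in pure Python, a small trial run such as anneal(points, 10, diversity, cycles=5, steps=100) is a more practical way to try it out.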
- The results of the simulation are shown in FIG. 14, where, as the simulation progresses, the selected compounds assume an optimal distribution, i.e. the diversity (spread) of these compounds is maximized. The set of compounds highlighted in FIG. 14 represents a Directed Diversity Library 108 for the next iteration, selected according to the prescribed Selection Criteria 104 and the Objective Function 105 in EQ. 13.
- In the second example, the Selector 106 uses Simulated Annealing 836 to identify a set of 50 compounds from a 10,000-membered Compound Library 102 that maximizes the Objective Function 105 given by EQ. 14:
- ƒ(S)=−M(S,L) EQ. 14
- using the Molecular Similarity Criterion described in EQ. 8, a set of 4 reference compounds (chosen at random), and a Euclidean distance measure defined in a normalized 2-dimensional property space. As in the previous example, the properties of these 10,000 compounds represent uniformly distributed random deviates in the unit square. The search was carried out in 30 temperature cycles, using 1,000 sampling steps per cycle, an exponential cooling schedule, and the Metropolis acceptance criterion.
- The results of the simulation are shown in FIG. 15. As can be seen from FIG. 15, as the simulation progresses, the selected compounds assume an optimal distribution, i.e. the selected compounds cluster tightly around the specified reference compounds. The set of compounds highlighted in FIG. 15 represents a Directed Diversity Library 108 for the next iteration, selected according to the prescribed Selection Criteria 104 and the Objective Function 105 in EQ. 14.
- In the third example, the Selector 106 uses Simulated Annealing 836 to identify a set of 50 compounds from a 10,000-membered Compound Library 102 that maximizes the Objective Function 105 given by EQ. 15:
- ƒ(S)=2D(S)−M(S,L) EQ. 15
- using the Molecular Diversity Criterion described in EQ. 4, the Molecular Similarity Criterion described in EQ. 8, a set of 4 reference compounds (chosen at random), and a Euclidean distance measure defined in a normalized 2-dimensional property space. As in the previous example, the properties of these 10,000 compounds represent uniformly distributed random deviates in the unit square. The search was carried out in 30 temperature cycles, using 1,000 sampling steps per cycle, an exponential cooling schedule, and the Metropolis acceptance criterion.
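- The way EQ. 15 weights the two criteria can be illustrated with a short Python sketch. EQ. 4 and EQ. 8 are not reproduced in this passage, so both measures below are assumptions chosen only to make the example concrete: diversity uses a mean nearest-neighbor distance for D(S), and similarity_m uses the mean distance from each selected compound to its nearest reference compound for M(S,L), so that a smaller M means a more focused selection. The function names and the weight parameters are hypothetical.

```python
import math


def diversity(points, included):
    """Assumed stand-in for D(S): mean nearest-neighbour Euclidean distance."""
    return sum(
        min(math.dist(points[i], points[j]) for j in included if j != i)
        for i in included
    ) / len(included)


def similarity_m(points, included, references):
    """Assumed stand-in for M(S, L): mean distance to the nearest reference.

    Lower values mean the selection sits closer to the reference compounds.
    """
    return sum(
        min(math.dist(points[i], ref) for ref in references)
        for i in included
    ) / len(included)


def objective_eq15(points, included, references, w_div=2.0, w_sim=1.0):
    """f(S) = 2 D(S) - M(S, L), with the weights exposed as parameters."""
    return (w_div * diversity(points, included)
            - w_sim * similarity_m(points, included, references))


# Three compounds in a 2-dimensional property space, one reference at the origin.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
references = [(0.0, 0.0)]
print(objective_eq15(points, [0, 1], references))   # 2 * 1.0 - 0.5 = 1.5
```

- Because the objective is an ordinary function of the selected indices, it can be handed to any search routine that accepts a callable, for example by fixing the reference set in a closure such as lambda pts, inc: objective_eq15(pts, inc, references).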
- EQ. 15 represents a Multi-Objective Function, i.e. an Objective Function 105 that combines two, rather than one, Selection Criteria 104. The Objective Function 105 in EQ. 15 combines molecular diversity and molecular similarity. That is, the Objective Function 105 in EQ. 15 favors solutions that are both diverse and focused. The results of the simulation are shown in FIG. 16. As can be seen from FIG. 16, as the simulation progresses, the selected compounds assume an optimal distribution, i.e. the selected compounds become both diverse and focused. The set of compounds highlighted in FIG. 16 represents a Directed Diversity Library 108 for the next iteration, selected according to the prescribed Selection Criteria 104 and the Objective Function 105 in EQ. 15.
- In
optional steps 606 and 608, compounds specified in the Directed Diversity Library 108 are retrieved or synthesized. In steps 606 and 608, when compounds specified in the Directed Diversity Library 108 were previously synthesized, they are retrieved from a chemical inventory in step 606 rather than re-synthesized; when compounds specified in the Directed Diversity Library 108 were not previously synthesized, they are synthesized in step 608. Alternatively, either of steps 606 and 608 can be omitted.
- In
optional step 606, the Directed Diversity Manager 310 retrieves compounds specified in the Directed Diversity Library 108 that are available in the Chemical Inventory 110. The Chemical Inventory 110 represents any source of available compounds including, but not limited to, a corporate chemical inventory, a supplier of commercially available chemical compounds, a natural product collection, etc.
- In one embodiment, the Directed Diversity Manager 310 searches the Chemical Inventory 110 to identify and retrieve existing compounds of the Directed Diversity Library 108. Alternatively, a subset of the Directed Diversity Library 108, as determined by user input, for example, can be searched for and retrieved from the Chemical Inventory 110.
- In
optional step 608, the compounds in the Directed Diversity Library 108 that were not retrieved from the Chemical Inventory 110 in step 606 are synthesized. In one embodiment, step 608 is performed by one or more automated robotic Synthesis Modules 112 that receive Robotic Synthesis Instructions 204 from the Synthesis Protocol Generator 202.
- More specifically, the Directed Diversity Manager 310 selects Reagent Data 136 from the Reagent Database 138 and generates Robotic Synthesis Instructions 204. The Reagent Data 136 identifies Reagents 114 in the Reagent Inventory 116 that are to be mixed by the one or more Synthesis Modules 112. The Robotic Synthesis Instructions 204 identify the manner in which such Reagents 114 are to be mixed. The manner of mixing can include identifying Reagents 114 to be mixed together, and specifying chemical and/or physical conditions for mixing, such as temperature, length of time, stirring, etc. The one or more Synthesis Modules 112 synthesize compounds in the Directed Diversity Library 108, using selected Reagents 114 from the Reagent Inventory 116, in accordance with the Robotic Synthesis Instructions 204.
- In another embodiment, optional step 608 is performed semi-automatically or manually. The chemical compounds that were retrieved from the Chemical Inventory 110 and/or synthesized by the Synthesis Modules 112 (or synthesized manually) collectively represent physical compounds from a Directed Diversity Library 108.
- In
step 612, one or more Analysis Modules 118 analyze the compounds in the Directed Diversity Library 108 to obtain Structure-Property Data 124 pertaining to the compounds. The Analysis Modules 118 receive compounds that were retrieved from the Chemical Inventory 110 in step 606 and compounds that were synthesized by the Synthesis Modules 112 in step 608.
- In one embodiment of
step 612, one or more Assay Modules 402 can robotically assay the chemical compounds in the Directed Diversity Library 108 to obtain Physical Properties Data 516, Chemical Properties Data 520 and Biological Properties Data 524 pertaining to the chemical compounds.
- For example, the Enzyme Activity Assay Module 404 can robotically assay the chemical compounds using well known assay techniques to obtain enzyme activity data relating to the compounds. Enzyme activity data can include inhibition constants Ki, maximal velocity Vmax, etc. The Cellular Activity Assay Module 406 can robotically assay the compounds using well known assay techniques to obtain cellular activity data relating to the compounds. The Toxicology Assay Module 408 can robotically assay the compounds using well known assay techniques to obtain toxicology data relating to the compounds. The Bioavailability Assay Module 410 can robotically assay the compounds using well known assay techniques to obtain bioavailability data relating to the compounds. The enzyme activity data, cellular activity data, toxicology data, and bioavailability data represent the Physical Properties Data 516, Chemical Properties Data 520 and Biological Properties Data 524. Alternatively, Physical Properties Data 516 can be obtained by the Physical and Electronic Property Analysis Module 416.
- Also during step 612, the Physical and Electronic Properties Analysis Module 416 can analyze the chemical compounds contained in the Directed Diversity Library 108 to obtain Electronic Properties Data 516 pertaining to the chemical compounds. The Electronic Properties Data 516 is stored in the Physical and Electronic Properties Database 504 during step 614.
- Also during step 612, the 3D Receptor Mapping Module 418 can obtain 3D Receptor Map Data 522 representing the three-dimensional structure pertaining to a receptor binding site being tested. The 3D Receptor Mapping Module 418 preferably determines the three-dimensional structure of the receptor binding site empirically through X-ray crystallography, nuclear magnetic resonance spectroscopy, and/or as a result of the application of 3D QSAR and receptor field analysis procedures. The 3D Receptor Map Data 522 is stored in the 3D Receptor Map Database 510 during step 614.
- Also during
step 612, an optional Structure and Composition Analysis Module 414 can analyze the chemical compounds contained in the Directed Diversity Library 108 to obtain Structure and Composition Data 514 pertaining to the chemical compounds. The Structure and Composition Data 514 is stored in the Structure and Composition Database 502 during step 614.
- In one embodiment, step 612 is performed robotically, under control of one or more computer programs. Alternatively, step 612 can be performed manually or by some combination of the two.
- In step 614, the one or more Analysis Modules 118 store the Structure-Property Data 124 obtained in step 612. The Structure-Property Data 124 can be stored in the Structure-Property Database 126 of the Data Storage Device 314. The Structure-Property Database 126 can also store Historical Structure-Property Data 128. Historical Structure-Property Data 128 can be associated with chemical compounds that were synthesized and analyzed in previous iterations by the Synthesis Modules 112 and the Analysis Modules 118, respectively. Historical Structure-Property Data 128 can also include other pertinent Structure-Property Data obtained from independent experiments.
- Using the example from step 612, the Physical Properties Data 516 can be stored in the Physical and Electronic Properties Database 504, the Chemical Properties Data 520 can be stored in the Chemical Properties Database 508 and the Biological Properties Data 524 can be stored in the Biological Properties Database 512.
- In one embodiment of the present invention, during execution of
steps 612 and 614, the Analysis Modules 118 determine whether the compounds were adequately synthesized, as shall now be described.
- Referring to FIG. 7, the process begins at
step 702, where the Structure and Composition Analysis Module 414 analyzes chemical compounds to obtain Structure and Composition Data 514. Preferably, the Structure and Composition Analysis Module 414 analyzes the chemical compounds using well known mass spectra analysis techniques.
- In step 704, the Structure and Composition Data 514 is stored in a Structure and Composition Database 502 that forms part of the Structure-Property Database 126.
- In step 706, the Chemical Synthesis Indicia Generator 412 retrieves predicted Structure and Composition Data 514 relating to the compounds. The data is retrieved from the Structure-Property Database 126. Preferably, the retrieved data includes predicted mass and structural data for the compounds.
- In step 708, the Chemical Synthesis Indicia Generator 412 compares the measured Structure and Composition Data 514 to the predicted data to generate Chemical Synthesis Indicia 518. Based on the comparisons, the Chemical Synthesis Indicia 518 identify chemical compounds that were adequately synthesized and chemical compounds that were not adequately synthesized.
- Preferably, during step 708, the Chemical Synthesis Indicia Generator 412 compares the measured mass of each compound to the predicted mass of the compound. If the measured mass and the predicted mass differ by less than a predetermined amount, the Chemical Synthesis Indicia Generator 412 determines that the chemical compound was adequately synthesized. If the measured mass and the predicted mass differ by more than the predetermined amount, the Chemical Synthesis Indicia Generator 412 determines that the chemical compound was not adequately synthesized. This predetermined amount can depend on the sensitivity of the instrument used for the structure and composition analysis.
- In step 710, the Chemical Synthesis Indicia Generator 412 generates Chemical Synthesis Indicia 518 pertaining to the compounds in the Directed Diversity Library 108, and stores such Chemical Synthesis Indicia 518 in the Chemical Synthesis Database 506. The Chemical Synthesis Indicia 518 for each compound is a first value (such as “1”) if the compound was adequately synthesized (as determined in step 708), and is a second value (such as “0”) if the compound was not adequately synthesized.
- After step 710, control passes to step 616.
- In
step 616, the Directed Diversity Manager 310 compares the Structure-Property Data 124, pertaining to the compounds in the Directed Diversity Library 108, to the Desired Properties 120. The Desired Properties 120 might have been entered by human operators using the input device 316, or read from a computer file. The Directed Diversity Manager 310 compares the data to determine whether any of the compounds substantially conforms to the Desired Properties 120. When a compound substantially conforms to the Desired Properties 120, it can be classified as a Lead compound 122.
- When an insufficient number of compounds substantially exhibit the Desired Properties 120 (i.e., there is an insufficient number of Lead Compounds 122), the compounds can be rated in order to select new Leads 122. The Directed Diversity Manager 310 can assign one or more rating factors to each compound in the Directed Diversity Library 108, based on how closely the compound's properties match the Desired Properties 120. The one or more rating factors can be represented by numerical or linguistic values. Numerical rating factors represent a sliding scale between a low value, corresponding to a property profile far from the Desired Properties 120, and a high value, corresponding to a property profile identical, or very similar, to the Desired Properties 120. Linguistic rating factors can include values such as “poor,” “average,” “good,” “very good,” etc.
- In
optional step 618, one or more Structure-Property Models 192 are generated and/or refined. Structure-Property Models 192 are generated and/or refined to conform to observed Structure-Property Data 124 and Historical Structure-Property Data 128. The resulting Structure-Property Models 192 can be used by the Experiment Planner 130 and/or the Selector 106 to predict the properties of compounds in the Compound Library 102 whose real properties are hitherto unknown. The Structure-Property Models 192 can be used by the Experiment Planner 130 to define and/or refine a set of Selection Criteria 104 that depend upon the predictions of the Structure-Property Models 192.
- Referring to the process flowchart of FIG. 17, step 618 shall now be described in detail. The process begins at step 1702, where one or more Model Structures 820 are defined by the Structure-Property Model Generator 800. The Structure-Property Model Generator 800 can define Model Structures 820 based on Statistics 802, Neural Networks 804, Fuzzy Logic 806, and/or other Model-Specific Methods 808. A Model Structure 820 can combine elements of Statistics 802, Neural Networks 804, Fuzzy Logic 806, and/or Model-Specific Methods 808. Such Model Structures 820 are hereafter referred to as Hybrid Model Structures or Hybrid Models.
- In
step 1704, the Structure-Property Model Generator 800 receives Structure-Property Data 124, 128. The Structure-Property Data 124, 128 includes Structure Data 824 and Property Data 828.
- In
step 1706, Structure Data 824 is encoded as Encoded Structure Data 826. Structure Data 824 is encoded in a form that is appropriate for the particular Model Structure 820.
- In step 1708, Property Data 828 is encoded as Encoded Property Data 830. Property Data 828 is encoded in a form that is appropriate for the particular Model Structure 820.
- In step 1710, the Trainer 822 optimizes, or trains, the Model Structure 820 that was generated in step 1702. The Trainer 822 uses Encoded Structure Data 826 and Encoded Property Data 830 to derive one or more Structure-Property Models 842. The Trainer 822 uses one or more of Gradient Minimization 832, Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840, depending upon the type of Model Structure 820 that is being optimized.
- After step 1710, step 618 is complete and control passes back to step 602 for defining another set of Selection Criteria 104 and/or Objective Functions 105 and then to step 604 for selecting another Directed Diversity Library 108 to analyze. The Directed Diversity Library 108 for the next iteration can be selected using one or more Selection Criteria 104, one or more Objective Functions 105, and one or more selection phases. As used herein, a selection phase refers to a single run of the Selector 106 using Monte-Carlo Sampling 834, Simulated Annealing 836, Evolutionary Programming 838, and/or a Genetic Algorithm 840.
- 8. Conclusions
- The present invention has been described above with the aid of functional building blocks illustrating the performance of specified functions and relationships thereof. The boundaries of these functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternate boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Any such alternate boundaries are thus within the scope and spirit of the claimed invention. One skilled in the art will recognize that these functional building blocks can be implemented by discrete components, application specific integrated circuits, processors executing appropriate software and the like or any combination thereof.
- While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/170,628 US20030014191A1 (en) | 1996-11-04 | 2002-06-14 | System, method and computer program product for identifying chemical compounds having desired properties |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US3018796P | 1996-11-04 | 1996-11-04 | |
US08/963,870 US6421612B1 (en) | 1996-11-04 | 1997-11-04 | System, method and computer program product for identifying chemical compounds having desired properties |
US10/170,628 US20030014191A1 (en) | 1996-11-04 | 2002-06-14 | System, method and computer program product for identifying chemical compounds having desired properties |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US08/963,870 Continuation US6421612B1 (en) | 1996-11-04 | 1997-11-04 | System, method and computer program product for identifying chemical compounds having desired properties |
Publications (1)
Publication Number | Publication Date |
---|---|
US20030014191A1 true US20030014191A1 (en) | 2003-01-16 |
Family
ID=21852972
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US08/963,870 Expired - Lifetime US6421612B1 (en) | 1996-11-04 | 1997-11-04 | System, method and computer program product for identifying chemical compounds having desired properties |
US08/963,872 Expired - Lifetime US6295514B1 (en) | 1996-11-04 | 1997-11-04 | Method, system, and computer program product for representing similarity/dissimilarity between chemical compounds |
US09/802,956 Expired - Fee Related US7188055B2 (en) | 1996-11-04 | 2001-03-12 | Method, system, and computer program for displaying chemical data |
US10/170,628 Abandoned US20030014191A1 (en) | 1996-11-04 | 2002-06-14 | System, method and computer program product for identifying chemical compounds having desired properties |
Family Applications Before (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/963,870 Expired - Lifetime US6421612B1 (en) | 1996-11-04 | 1997-11-04 | System, method and computer program product for identifying chemical compounds having desired properties |
US08/963,872 Expired - Lifetime US6295514B1 (en) | 1996-11-04 | 1997-11-04 | Method, system, and computer program product for representing similarity/dissimilarity between chemical compounds |
US09/802,956 Expired - Fee Related US7188055B2 (en) | 1996-11-04 | 2001-03-12 | Method, system, and computer program for displaying chemical data |
Country Status (7)
Country | Link |
---|---|
US (4) | US6421612B1 (en) |
EP (2) | EP0935784A2 (en) |
JP (2) | JP2001507675A (en) |
AU (2) | AU732397B2 (en) |
CA (2) | CA2269669A1 (en) |
IL (2) | IL129498A0 (en) |
WO (2) | WO1998020459A1 (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040105817A1 (en) * | 2002-10-30 | 2004-06-03 | Sylvain Gilat | Identifying therapeutic compounds based on their physical-chemical properties |
US20040148265A1 (en) * | 1998-06-19 | 2004-07-29 | Schwartz Steven D. | Neural network methods to predict enzyme inhibitor or receptor ligand potency |
US20050125210A1 (en) * | 2003-11-21 | 2005-06-09 | Pearlman Robert S. | System and method for providing a canonical structural representation of chemical compounds |
US20060106487A1 (en) * | 2004-10-05 | 2006-05-18 | Allen Robert M | Programmable load forming system, components thereof, and methods of use |
US20080256016A1 (en) * | 2007-04-11 | 2008-10-16 | Oracle International Corporation | Bayes-like classifier with fuzzy likelihood |
US20080262467A1 (en) * | 2005-02-16 | 2008-10-23 | Humphrey Joseph A C | Blood Flow Bypass Catheters and Methods for the Delivery of Medium to the Vasculature and Body Ducts |
US20080306966A1 (en) * | 2000-05-05 | 2008-12-11 | Cambridgesoft Corporation | Deriving Product Information |
US20100299294A1 (en) * | 2009-05-20 | 2010-11-25 | Mott Jack E | Apparatus, system, and method for determining a partial class membership of a data record in a class |
US20110046910A1 (en) * | 2005-04-15 | 2011-02-24 | Thermo Crs Ltd. | Method and System for Sample Testing |
WO2013163068A1 (en) * | 2012-04-23 | 2013-10-31 | Targacept, Inc. | Chemical entity search, for a collaboration and content management system |
WO2015060960A3 (en) * | 2013-10-25 | 2015-06-04 | The Boeing Company | Product chemical profile system |
US9336302B1 (en) | 2012-07-20 | 2016-05-10 | Zuci Realty Llc | Insight and algorithmic clustering for automated synthesis |
US20170053218A9 (en) * | 2012-09-28 | 2017-02-23 | Rex Wiig | System and method of a requirement, compliance and resource management methodology |
WO2017074455A1 (en) * | 2015-10-30 | 2017-05-04 | Halliburton Energy Services, Inc. | Producing chemical formulations with cognitive computing |
US20190043544A1 (en) * | 2012-09-10 | 2019-02-07 | Texas Instruments Incorporated | Processing device with nonvolatile logic array backup |
US10268974B2 (en) * | 2012-09-28 | 2019-04-23 | Rex Wiig | System and method of a requirement, compliance and resource management |
US20190286792A1 (en) * | 2018-03-13 | 2019-09-19 | International Business Machines Corporation | Chemical compound discovery using machine learning technologies |
WO2021154740A1 (en) * | 2020-01-27 | 2021-08-05 | Potion Ai, Inc. | Methods, systems and apparatus for generating chemical data sequences using neural networks for de novo chemical formulations |
US11205103B2 (en) | 2016-12-09 | 2021-12-21 | The Research Foundation for the State University | Semisupervised autoencoder for sentiment analysis |
Families Citing this family (192)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5463564A (en) | 1994-09-16 | 1995-10-31 | 3-Dimensional Pharmaceuticals, Inc. | System and method of automatically generating chemical compounds with desired properties |
WO1998020459A1 (en) | 1996-11-04 | 1998-05-14 | 3-Dimensional Pharmaceuticals, Inc. | System, method, and computer program product for the visualization and interactive processing and analysis of chemical data |
US6453246B1 (en) * | 1996-11-04 | 2002-09-17 | 3-Dimensional Pharmaceuticals, Inc. | System, method, and computer program product for representing proximity data in a multi-dimensional space |
US6571227B1 (en) | 1996-11-04 | 2003-05-27 | 3-Dimensional Pharmaceuticals, Inc. | Method, system and computer program product for non-linear mapping of multi-dimensional data |
EP1007737A4 (en) * | 1997-07-25 | 2002-07-03 | Affymetrix Inc | System for providing a polymorphism database |
US6968342B2 (en) * | 1997-12-29 | 2005-11-22 | Abel Wolman | Energy minimization for data merging and fusion |
US7321828B2 (en) * | 1998-04-13 | 2008-01-22 | Isis Pharmaceuticals, Inc. | System of components for preparing oligonucleotides |
US20040186071A1 (en) | 1998-04-13 | 2004-09-23 | Bennett C. Frank | Antisense modulation of CD40 expression |
US20030228597A1 (en) * | 1998-04-13 | 2003-12-11 | Cowsert Lex M. | Identification of genetic targets for modulation by oligonucleotides and generation of oligonucleotides for gene modulation |
JP3892166B2 (en) * | 1998-09-11 | 2007-03-14 | 独立行政法人理化学研究所 | Method for predicting molecular reaction characteristics |
US6594673B1 (en) * | 1998-09-15 | 2003-07-15 | Microsoft Corporation | Visualizations for collaborative information |
US7199809B1 (en) * | 1998-10-19 | 2007-04-03 | Symyx Technologies, Inc. | Graphic design of combinatorial material libraries |
SE9804127D0 (en) | 1998-11-27 | 1998-11-27 | Astra Ab | New method |
US7912689B1 (en) | 1999-02-11 | 2011-03-22 | Cambridgesoft Corporation | Enhancing structure diagram generation through use of symmetry |
US7295931B1 (en) * | 1999-02-18 | 2007-11-13 | Cambridgesoft Corporation | Deriving fixed bond information |
US6647341B1 (en) * | 1999-04-09 | 2003-11-11 | Whitehead Institute For Biomedical Research | Methods for classifying samples and ascertaining previously unknown classes |
US7324926B2 (en) * | 1999-04-09 | 2008-01-29 | Whitehead Institute For Biomedical Research | Methods for predicting chemosensitivity or chemoresistance |
US6937330B2 (en) | 1999-04-23 | 2005-08-30 | Ppd Biomarker Discovery Sciences, Llc | Disposable optical cuvette cartridge with low fluorescence material |
US6721754B1 (en) * | 1999-04-28 | 2004-04-13 | Arena Pharmaceuticals, Inc. | System and method for database similarity join |
AU4565600A (en) * | 1999-06-18 | 2001-01-09 | Synt:Em (S.A.) | Identifying active molecules using physico-chemical parameters |
US7225172B2 (en) * | 1999-07-01 | 2007-05-29 | Yeda Research And Development Co. Ltd. | Method and apparatus for multivariable analysis of biological measurements |
US6687395B1 (en) * | 1999-07-21 | 2004-02-03 | Surromed, Inc. | System for microvolume laser scanning cytometry |
DE19936148A1 (en) * | 1999-07-31 | 2001-02-01 | Abb Research Ltd | Procedure for determining spray parameters for a paint spraying system |
US6243615B1 (en) * | 1999-09-09 | 2001-06-05 | Aegis Analytical Corporation | System for analyzing and improving pharmaceutical and other capital-intensive manufacturing processes |
US6665685B1 (en) * | 1999-11-01 | 2003-12-16 | Cambridge Soft Corporation | Deriving database interaction software |
US20020156587A1 (en) * | 2000-02-10 | 2002-10-24 | Woolf Peter James | Method of analyzing gene expression data using fuzzy logic |
US6587845B1 (en) * | 2000-02-15 | 2003-07-01 | Benjamin B. Braunheim | Method and apparatus for identification and optimization of bioactive compounds using a neural network |
US7416524B1 (en) | 2000-02-18 | 2008-08-26 | Johnson & Johnson Pharmaceutical Research & Development, L.L.C. | System, method and computer program product for fast and efficient searching of large chemical libraries |
US7113919B1 (en) * | 2000-02-29 | 2006-09-26 | Chemdomain, Inc. | System and method for configuring products over a communications network |
US6671627B2 (en) | 2000-02-29 | 2003-12-30 | 3-D Pharmaceuticals, Inc. | Method and computer program product for designing combinatorial arrays |
US6907350B2 (en) * | 2000-03-13 | 2005-06-14 | Chugai Seiyaku Kabushiki Kaisha | Method, system and apparatus for handling information on chemical substances |
US7039621B2 (en) | 2000-03-22 | 2006-05-02 | Johnson & Johnson Pharmaceutical Research & Development, L.L.C. | System, method, and computer program product for representing object relationships in a multidimensional space |
US7216113B1 (en) * | 2000-03-24 | 2007-05-08 | Symyx Technologies, Inc. | Remote Execution of Materials Library Designs |
AU2001249805A1 (en) | 2000-04-03 | 2001-10-15 | 3-Dimensional Pharmaceuticals, Inc. | Method, system, and computer program product for representing object relationships in a multidimensional space |
US7272509B1 (en) | 2000-05-05 | 2007-09-18 | Cambridgesoft Corporation | Managing product information |
AU2001261311A1 (en) * | 2000-05-09 | 2001-11-20 | Pharmacia And Upjohn | Chemical structure identification |
JP2001331509A (en) * | 2000-05-22 | 2001-11-30 | Hitachi Ltd | Relational database processor, relational database processing method, and computer-readable recording medium recorded with relational database processing program |
KR20030051435A (en) | 2000-06-19 | 2003-06-25 | 코렐로직 시스템즈, 인크. | Heuristic method of classification |
MXPA03000506A (en) * | 2000-07-18 | 2004-09-10 | Correlogic Systems Inc | A process for discriminating between biological states based on hidden patterns from biological data. |
US20020010555A1 (en) * | 2000-07-20 | 2002-01-24 | Pfizer Inc. | Ionization polarity prediction of compounds for efficient mass spectrometry |
AU2001280889A1 (en) * | 2000-07-31 | 2002-02-13 | Gene Logic, Inc. | Molecular toxicology modeling |
US7590493B2 (en) * | 2000-07-31 | 2009-09-15 | Ocimum Biosolutions, Inc. | Methods for determining hepatotoxins |
CA2419600A1 (en) * | 2000-08-22 | 2002-02-28 | 3-Dimensional Pharmaceuticals, Inc. | Method, system, and computer program product for determining properties of combinatorial library products from features of library building blocks |
WO2002025504A2 (en) | 2000-09-20 | 2002-03-28 | Lobanov Victor S | Method, system, and computer program product for encoding and building products of a virtual combinatorial library |
US6787761B2 (en) * | 2000-11-27 | 2004-09-07 | Surromed, Inc. | Median filter for liquid chromatography-mass spectrometry data |
CA2429824A1 (en) * | 2000-11-28 | 2002-06-06 | Surromed, Inc. | Methods for efficiently mining broad data sets for biological markers |
GB2375536A (en) * | 2000-12-01 | 2002-11-20 | Univ Sheffield | Combinatorial molecule design system and method |
EP1350214A4 (en) * | 2000-12-15 | 2009-06-10 | Symyx Technologies Inc | Methods and apparatus for designing high-dimensional combinatorial experiments |
US7085773B2 (en) * | 2001-01-05 | 2006-08-01 | Symyx Technologies, Inc. | Laboratory database system and methods for combinatorial materials research |
AU2002240131A1 (en) * | 2001-01-26 | 2002-08-06 | Bioinformatics Dna Codes, Llc | Modular computational models for predicting the pharmaceutical properties of chemical compounds |
US7250950B2 (en) * | 2001-01-29 | 2007-07-31 | Symyx Technologies, Inc. | Systems, methods and computer program products for determining parameters for chemical synthesis |
WO2002061419A1 (en) * | 2001-01-29 | 2002-08-08 | 3-Dimensional Pharmaceuticals, Inc. | Method, system, and computer program product for analyzing combinatorial libraries |
US7167851B2 (en) * | 2001-01-31 | 2007-01-23 | Accelrys Software Inc. | One dimensional molecular representations |
US6615211B2 (en) * | 2001-03-19 | 2003-09-02 | International Business Machines Corporation | System and methods for using continuous optimization for ordering categorical data sets |
EP1402454A2 (en) * | 2001-04-06 | 2004-03-31 | Axxima Pharmaceuticals Aktiengesellschaft | Method for generating a quantitative structure property activity relationship |
DE10119853A1 (en) | 2001-04-24 | 2003-01-09 | Bayer Ag | Hybrid model and method for determining mechanical properties and processing properties of an injection molded part |
AU2002256166A1 (en) * | 2001-04-25 | 2002-11-11 | Bristol-Myers Squibb Company | Method of molecular structure recognition |
US20020194187A1 (en) * | 2001-05-16 | 2002-12-19 | Mcneil John | Multi-paradigm knowledge-bases |
AU2002259258A1 (en) * | 2001-05-17 | 2002-11-25 | Entelos, Inc. | Apparatus and method for validating a computer model |
US7246329B1 (en) * | 2001-05-18 | 2007-07-17 | Autodesk, Inc. | Multiple menus for use with a graphical user interface |
EP1392871A4 (en) * | 2001-05-22 | 2006-04-19 | Gene Logic Inc | Molecular toxicology modeling |
US20070015146A1 (en) * | 2001-05-22 | 2007-01-18 | Gene Logic, Inc. | Molecular nephrotoxicology modeling |
US6584413B1 (en) * | 2001-06-01 | 2003-06-24 | Sandia Corporation | Apparatus and system for multivariate spectral analysis |
WO2003001391A1 (en) * | 2001-06-21 | 2003-01-03 | Bell Robert A | Method and apparatus for spatially coordinating, storing and manipulating computer aided design drawings |
US20070054269A1 (en) * | 2001-07-10 | 2007-03-08 | Mendrick Donna L | Molecular cardiotoxicology modeling |
US7447594B2 (en) * | 2001-07-10 | 2008-11-04 | Ocimum Biosolutions, Inc. | Molecular cardiotoxicology modeling |
WO2003068908A2 (en) * | 2001-07-10 | 2003-08-21 | Gene Logic, Inc. | Cardiotoxin molecular toxicology modeling |
US20030018598A1 (en) * | 2001-07-19 | 2003-01-23 | Cawse James Norman | Neural network method and system |
US6873915B2 (en) * | 2001-08-24 | 2005-03-29 | Surromed, Inc. | Peak selection in multidimensional data |
US6954744B2 (en) * | 2001-08-29 | 2005-10-11 | Honeywell International, Inc. | Combinatorial approach for supervised neural network learning |
US7106903B2 (en) * | 2001-09-24 | 2006-09-12 | Vima Technologies, Inc. | Dynamic partial function in measurement of similarity of objects |
US6835927B2 (en) * | 2001-10-15 | 2004-12-28 | Surromed, Inc. | Mass spectrometric quantification of chemical mixture components |
US20050010603A1 (en) * | 2001-10-31 | 2005-01-13 | Berks Andrew H. | Display for Markush chemical structures |
DE10156245A1 (en) * | 2001-11-15 | 2003-06-05 | Bayer Ag | Methods for the identification of pharmacophores |
US7363311B2 (en) * | 2001-11-16 | 2008-04-22 | Nippon Telegraph And Telephone Corporation | Method of, apparatus for, and computer program for mapping contents having meta-information |
US20030139907A1 (en) * | 2002-01-24 | 2003-07-24 | Mccarthy Robert J | System, Method, and Product for Nanoscale Modeling, Analysis, Simulation, and Synthesis (NMASS) |
CA2471661A1 (en) * | 2002-01-31 | 2003-08-07 | Gene Logic, Inc. | Molecular hepatotoxicology modeling |
KR20030066095A (en) * | 2002-02-04 | 2003-08-09 | 주식회사 넥스트테크 | Chemical Information providing system on search engine for development of new-material |
DE10209146A1 (en) * | 2002-03-01 | 2003-09-18 | Bayer Ag | Method and system for the automatic planning of experiments |
US20030168585A1 (en) * | 2002-03-05 | 2003-09-11 | Michael Wall | Determination of sample purity through mass spectroscopy analysis |
US20050177280A1 (en) * | 2002-03-22 | 2005-08-11 | Morphochem Aktiengesellschaft Fur Kombinatorische Chemie | Methods and systems for discovery of chemical compounds and their syntheses |
CA2480202A1 (en) * | 2002-04-10 | 2003-10-23 | Transtech Pharma, Inc. | System and method for data analysis, manipulation, and visualization |
AU2003239409A1 (en) * | 2002-05-09 | 2003-11-11 | Surromed, Inc. | Methods for time-alignment of liquid chromatography-mass spectrometry data |
US7805437B1 (en) * | 2002-05-15 | 2010-09-28 | Spotfire Ab | Interactive SAR table |
US7046247B2 (en) * | 2002-05-15 | 2006-05-16 | Hewlett-Packard Development Company, L.P. | Method for visualizing graphical data sets having a non-uniform graphical density for display |
IL164829A0 (en) * | 2002-05-22 | 2005-12-18 | Grant Plummer | Processing system for remote chemical identification |
CA2487454A1 (en) * | 2002-05-28 | 2003-12-04 | The Trustees Of The University Of Pennsylvania | Methods, systems, and computer program products for computational analysis and design of amphiphilic polymers |
US7395160B2 (en) * | 2002-07-29 | 2008-07-01 | Correlogic Systems, Inc. | Quality assurance/quality control for electrospray ionization processes |
AT412678B (en) * | 2002-09-30 | 2005-05-25 | Gerhard Dr Kranner | METHOD FOR COMPUTER-ASSISTED PREPARATION OF PROGNOSES FOR OPERATIONAL SYSTEMS AND SYSTEM FOR CREATING PROGNOSES FOR OPERATIONAL SYSTEMS |
US6947579B2 (en) | 2002-10-07 | 2005-09-20 | Technion Research & Development Foundation Ltd. | Three-dimensional face recognition |
US7580903B2 (en) * | 2002-10-24 | 2009-08-25 | Complex Systems Engineering, Inc. | Process for the creation of fuzzy cognitive maps from Monte Carlo simulation generated Meta Model |
TW200411574A (en) * | 2002-12-31 | 2004-07-01 | Ind Tech Res Inst | Artificial intelligent system for classification of protein family |
US7213034B2 (en) * | 2003-01-24 | 2007-05-01 | Symyx Technologies, Inc. | User-configurable generic experiment class for combinatorial materials research |
US7013238B1 (en) * | 2003-02-24 | 2006-03-14 | Microsoft Corporation | System for delivering recommendations |
WO2005007806A2 (en) * | 2003-05-07 | 2005-01-27 | Duke University | Protein design for receptor-ligand recognition and binding |
CA2466792A1 (en) * | 2003-05-16 | 2004-11-16 | Affinium Pharmaceuticals, Inc. | Evaluation of spectra |
US7853406B2 (en) * | 2003-06-13 | 2010-12-14 | Entelos, Inc. | Predictive toxicology for biological systems |
GB2403636A (en) * | 2003-07-02 | 2005-01-05 | Sony Uk Ltd | Information retrieval using an array of nodes |
WO2005008444A2 (en) * | 2003-07-14 | 2005-01-27 | Matt Pallakoff | System and method for a portable multimedia client |
MXPA06001170A (en) * | 2003-08-01 | 2006-05-15 | Correlogic Systems Inc | Multiple high-resolution serum proteomic features for ovarian cancer detection. |
CA2533922A1 (en) * | 2003-08-07 | 2005-02-17 | Gene Logic, Inc. | Primary rat hepatocyte toxicity modeling |
US20050065733A1 (en) * | 2003-08-08 | 2005-03-24 | Paul Caron | Visualization of databases |
US20050079476A1 (en) * | 2003-10-10 | 2005-04-14 | Sutherland Scot M. | Method of predictive assessment |
US20050114331A1 (en) * | 2003-11-26 | 2005-05-26 | International Business Machines Corporation | Near-neighbor search in pattern distance spaces |
WO2005052819A1 (en) * | 2003-11-28 | 2005-06-09 | Fujitsu Limited | Material name setting support device, material name setting support program, and material name setting support method |
EP1709442A4 (en) * | 2003-12-11 | 2010-01-20 | Correlogic Systems Inc | Method of diagnosing biological states through the use of a centralized, adaptive model, and remote sample processing |
WO2005059779A2 (en) * | 2003-12-16 | 2005-06-30 | Symyx Technologies, Inc. | Indexing scheme for formulation workflows |
US20050177318A1 (en) * | 2004-02-10 | 2005-08-11 | National Institute Of Statistical Sciences | Methods, systems and computer program products for identifying pharmacophores in molecules using inferred conformations and inferred feature importance |
US20080281526A1 (en) * | 2004-03-22 | 2008-11-13 | Diggans James C | Methods For Molecular Toxicology Modeling |
US20050222828A1 (en) * | 2004-04-02 | 2005-10-06 | Ehtibar Dzhafarov | Method for computing subjective dissimilarities among discrete entities |
US7248360B2 (en) * | 2004-04-02 | 2007-07-24 | Ppd Biomarker Discovery Sciences, Llc | Polychronic laser scanning system and method of use |
CA2562343A1 (en) * | 2004-04-07 | 2005-10-27 | Gene Logic, Inc. | Hepatotoxicity molecular models |
US20050278308A1 (en) * | 2004-06-01 | 2005-12-15 | Barstow James F | Methods and systems for data integration |
EP1603035A3 (en) * | 2004-06-03 | 2008-03-05 | MDL Information Systems, Inc. | Methods and apparatus for visual application design |
US7702467B2 (en) * | 2004-06-29 | 2010-04-20 | Numerate, Inc. | Molecular property modeling using ranking |
US20060052943A1 (en) * | 2004-07-28 | 2006-03-09 | Karthik Ramani | Architectures, queries, data stores, and interfaces for proteins and drug molecules |
US20060031027A1 (en) * | 2004-08-03 | 2006-02-09 | Alman David H | Method and apparatus for predicting properties of a chemical mixture |
NZ553527A (en) | 2004-09-15 | 2009-12-24 | Bp Oil Int | Process for evaluating a refinery feedstock |
DE102006001780A1 (en) * | 2005-01-14 | 2006-08-24 | Siemens Corp. Research, Inc. | Method for diagnosis of amyotrophic lateral sclerosis, comprising surface-enhanced desorption-ionisation mass spectrometry of proteins from patients and analysing peak values on an alternating decision tree |
WO2006081428A2 (en) * | 2005-01-27 | 2006-08-03 | Symyx Technologies, Inc. | Parser for generating structure data |
US20070003996A1 (en) * | 2005-02-09 | 2007-01-04 | Hitt Ben A | Identification of bacteria and spores |
FI20055198A (en) * | 2005-04-28 | 2006-10-29 | Valtion Teknillinen | Visualization technology for biological information |
US20080312514A1 (en) * | 2005-05-12 | 2008-12-18 | Mansfield Brian C | Serum Patterns Predictive of Breast Cancer |
WO2007022110A2 (en) * | 2005-08-12 | 2007-02-22 | Symyx Technologies, Inc. | Event-based library process design |
US20070150424A1 (en) * | 2005-12-22 | 2007-06-28 | Pegasus Technologies, Inc. | Neural network model with clustering ensemble approach |
US7978889B2 (en) * | 2006-01-27 | 2011-07-12 | Michael Valdiserri | Automatic engine for 3D object generation from volumetric scan data and method |
US20080015833A1 (en) * | 2006-07-13 | 2008-01-17 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Methods and systems for molecular inhibition of protein misfolding |
US20080015834A1 (en) * | 2006-07-13 | 2008-01-17 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Methods and systems for molecular inhibition |
US20090082344A1 (en) * | 2006-07-13 | 2009-03-26 | Searete Llc | Methods and systems for treating disease |
US20080014572A1 (en) * | 2006-07-13 | 2008-01-17 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Methods and systems for molecular inhibition |
US20080165135A1 (en) * | 2007-01-10 | 2008-07-10 | Jao-Ching Lin | Functional expansion system for a touch pad |
WO2008096424A1 (en) * | 2007-02-07 | 2008-08-14 | Fujitsu Limited | Molecule designing method, program, and storage medium |
WO2008100633A1 (en) * | 2007-02-15 | 2008-08-21 | The Board Of Trustees Of The Leland Stanford Junior University | Precursor selection method for chemical vapor deposition techniques |
US7785218B2 (en) * | 2007-03-26 | 2010-08-31 | Acushnet Company | Custom milled iron set |
WO2008116495A1 (en) * | 2007-03-26 | 2008-10-02 | Molcode Ltd | Method and apparatus for the design of chemical compounds with predetermined properties |
US8645440B2 (en) | 2007-06-11 | 2014-02-04 | Guy Rosman | Acceleration of multidimensional scaling by vector extrapolation techniques |
MY150234A (en) | 2007-06-29 | 2013-12-31 | Ahn Gook Pharmaceutical Company Ltd | Predictive markers for ovarian cancer |
JP5052985B2 (en) * | 2007-07-31 | 2012-10-17 | 住友重機械工業株式会社 | Molecular simulation method, molecular simulation apparatus, molecular simulation program, and recording medium recording the program |
US8993714B2 (en) * | 2007-10-26 | 2015-03-31 | Imiplex Llc | Streptavidin macromolecular adaptor and complexes thereof |
USRE47320E1 (en) | 2007-11-20 | 2019-03-26 | Ionis Pharmaceuticals, Inc. | Modulation of CD40 expression |
US9087164B2 (en) * | 2008-01-26 | 2015-07-21 | National Semiconductor Corporation | Visualization of tradeoffs between circuit designs |
US7966588B1 (en) | 2008-01-26 | 2011-06-21 | National Semiconductor Corporation | Optimization of electrical circuits |
US20090228445A1 (en) * | 2008-03-04 | 2009-09-10 | Systems Biology (1) Pvt. Ltd. | Automated molecular mining and activity prediction using xml schema, xml queries, rule inference and rule engines |
US8032469B2 (en) * | 2008-05-06 | 2011-10-04 | Microsoft Corporation | Recommending similar content identified with a neural network |
US20100030035A1 (en) * | 2008-08-04 | 2010-02-04 | The Hong Kong Polytechnic University | Fuzzy system for cardiovascular disease and stroke risk assessment |
US9102526B2 (en) | 2008-08-12 | 2015-08-11 | Imiplex Llc | Node polypeptides for nanostructure assembly |
US20100070200A1 (en) * | 2008-09-17 | 2010-03-18 | Mehmet Sarikaya | Method and system for designing polypeptides and polypeptide-like polymers with specific chemical and physical characteristics |
WO2010132363A1 (en) | 2009-05-11 | 2010-11-18 | Imiplex Llc | Method of protein nanostructure fabrication |
US20110202328A1 (en) * | 2009-10-02 | 2011-08-18 | Exxonmobil Research And Engineering Company | System for the determination of selective absorbent molecules through predictive correlations |
US8672685B2 (en) * | 2009-10-07 | 2014-03-18 | Bitwixt Software Systems Llc | Electron configuration teaching systems and methods |
GB2475473B (en) * | 2009-11-04 | 2015-10-21 | Nds Ltd | User request based content ranking |
US8954893B2 (en) * | 2009-11-06 | 2015-02-10 | Hewlett-Packard Development Company, L.P. | Visually representing a hierarchy of category nodes |
WO2011081950A1 (en) * | 2009-12-14 | 2011-07-07 | Massachusetts Institute Of Technology | Methods, systems and media utilizing ranking techniques in machine learning |
US8530838B2 (en) * | 2009-12-29 | 2013-09-10 | Saint-Gobain Ceramics & Plastics, Inc. | Radiation detection system and method of indicating presence of radiation |
JP2011160379A (en) * | 2010-02-04 | 2011-08-18 | Sony Corp | Image processing device and method, and program therefor |
AU2011214465A1 (en) | 2010-02-10 | 2012-08-30 | Novartis Ag | Methods and compounds for muscle growth |
US8712741B2 (en) | 2010-06-28 | 2014-04-29 | National Semiconductor Corporation | Power supply architecture system designer |
CN102541286B (en) * | 2010-12-24 | 2015-09-16 | 北大方正集团有限公司 | For building the method and apparatus of organic chemical structural formula |
CN102566876B (en) * | 2010-12-24 | 2015-03-25 | 北大方正集团有限公司 | Method and device for switching focus of organic chemical structural formula |
CN102541423A (en) * | 2010-12-24 | 2012-07-04 | 北大方正集团有限公司 | Method and device for compiling organic chemical structural formulas |
US9218460B2 (en) * | 2011-05-09 | 2015-12-22 | The Regents Of The University Of California | Defining and mining a joint pharmacophoric space through geometric features |
CN102855230A (en) * | 2011-06-30 | 2013-01-02 | 北大方正集团有限公司 | Method and device for editing organic chemical structural formula |
US20130041894A1 (en) * | 2011-08-10 | 2013-02-14 | International Business Machines Corporation | Mitigating Environment, Health, and Safety Complications |
CN103150296B (en) * | 2011-12-06 | 2016-01-20 | 北大方正集团有限公司 | The edit methods of atom belonging and device |
EP2828779B1 (en) * | 2012-03-21 | 2019-05-08 | Zymeworks Inc. | Systems and methods for making two dimensional graphs of macromolecules |
US10168885B2 (en) | 2012-03-21 | 2019-01-01 | Zymeworks Inc. | Systems and methods for making two dimensional graphs of complex molecules |
KR102029055B1 (en) * | 2013-02-08 | 2019-10-07 | 삼성전자주식회사 | Method and apparatus for high-dimensional data visualization |
CN105593861A (en) * | 2013-10-23 | 2016-05-18 | 陶氏环球技术有限责任公司 | Methods, systems, and devices for designing molecules |
US10733499B2 (en) | 2014-09-02 | 2020-08-04 | University Of Kansas | Systems and methods for enhancing computer assisted high throughput screening processes |
KR101684742B1 (en) | 2014-11-27 | 2016-12-09 | 이화여자대학교 산학협력단 | Method and system for drug virtual screening and construction of focused screening library |
WO2017161250A1 (en) | 2016-03-17 | 2017-09-21 | Elsevier, Inc. | Systems and methods for electronic searching of materials and material properties |
US10915808B2 (en) * | 2016-07-05 | 2021-02-09 | International Business Machines Corporation | Neural network for chemical compounds |
WO2018018025A1 (en) * | 2016-07-21 | 2018-01-25 | Ayasdi, Inc. | Topological data analysis of data from a fact table and related dimension tables |
US10998087B2 (en) | 2016-08-25 | 2021-05-04 | The Government of the United States of America as represented by the Secretary of Homeland Security | Systems and methodologies for designing simulant compounds |
WO2018098588A1 (en) * | 2016-12-02 | 2018-06-07 | Lumiant Corporation | Computer systems for and methods of identifying non-elemental materials based on atomistic properties |
US10832800B2 (en) | 2017-01-03 | 2020-11-10 | International Business Machines Corporation | Synthetic pathway engine |
US10430395B2 (en) | 2017-03-01 | 2019-10-01 | International Business Machines Corporation | Iterative widening search for designing chemical compounds |
JP2020513126A (en) * | 2017-04-03 | 2020-04-30 | アメリカン ケミカル ソサイエティ | System and method for query and index optimization for retrieving data in an instance of a formulation data structure from a database |
EA201992476A1 (en) * | 2017-04-18 | 2020-02-25 | Икс-Чем, Инк. | METHODS FOR IDENTIFICATION OF COMPOUNDS |
JP7201981B2 (en) * | 2017-06-30 | 2023-01-11 | 学校法人 明治薬科大学 | Prediction device, prediction method and prediction program |
CN111201085B (en) * | 2017-06-30 | 2022-05-13 | 思研(Sri)国际顾问与咨询公司 | Apparatus for reaction screening and optimization and method thereof |
US10229092B2 (en) | 2017-08-14 | 2019-03-12 | City University Of Hong Kong | Systems and methods for robust low-rank matrix approximation |
US10426424B2 (en) | 2017-11-21 | 2019-10-01 | General Electric Company | System and method for generating and performing imaging protocol simulations |
CN109539596B (en) * | 2018-11-28 | 2020-10-23 | 西安工程大学 | GA-GRNN-based solar heat collection system photo-thermal efficiency prediction method |
JP7330712B2 (en) * | 2019-02-12 | 2023-08-22 | 株式会社日立製作所 | Material property prediction device and material property prediction method |
CN110444250A (en) * | 2019-03-26 | 2019-11-12 | 广东省微生物研究所(广东省微生物分析检测中心) | High-throughput drug virtual screening system based on molecular fingerprint and deep learning |
US20200372977A1 (en) * | 2019-05-22 | 2020-11-26 | International Business Machines Corporation | Automated transitive read-behind analysis in big data toxicology |
US11710261B2 (en) * | 2019-07-29 | 2023-07-25 | University Of Southern California | Scan-specific recurrent neural network for image reconstruction |
EP4206971A4 (en) * | 2020-08-31 | 2024-03-06 | Panasonic Ip Man Co Ltd | Property display method, property display device, information processing device, and program |
US20220092454A1 (en) * | 2020-09-24 | 2022-03-24 | Microsoft Technology Licensing, Llc | Mixing techniques for probabilistic quantum circuits with fallback |
WO2022130648A1 (en) | 2020-12-18 | 2022-06-23 | 富士通株式会社 | Information processing program, information processing method, and information processing device |
JP2022118555A (en) * | 2021-02-02 | 2022-08-15 | 富士通株式会社 | Optimization device, optimization method, and optimization program |
Citations (60)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US543796A (en) * | | 1895-07-30 | | Apparatus for separating dust |
US4773099A (en) * | 1985-10-10 | 1988-09-20 | The Palantir Corporation | Pattern classification means for use in a pattern recognition system |
US4811217A (en) * | 1985-03-29 | 1989-03-07 | Japan Association For International Chemical Information | Method of storing and searching chemical structure data |
US4859736A (en) * | 1987-03-30 | 1989-08-22 | Ciba-Geigy Corporation | Synthetic polystyrene resin and its use in solid phase peptide synthesis |
US4908773A (en) * | 1987-04-06 | 1990-03-13 | Genex Corporation | Computer designed stabilized proteins and method for producing same |
US4935875A (en) * | 1987-12-02 | 1990-06-19 | Data Chem, Inc. | Chemical analyzer |
US4939666A (en) * | 1987-09-02 | 1990-07-03 | Genex Corporation | Incremental macromolecule construction methods |
US5010175A (en) * | 1988-05-02 | 1991-04-23 | The Regents Of The University Of California | General method for producing and selecting peptides with specific properties |
US5025388A (en) * | 1988-08-26 | 1991-06-18 | Cramer Richard D Iii | Comparative molecular field analysis (CoMFA) |
US5095443A (en) * | 1988-10-07 | 1992-03-10 | Ricoh Company, Ltd. | Plural neural network system having a successive approximation learning method |
US5155801A (en) * | 1990-10-09 | 1992-10-13 | Hughes Aircraft Company | Clustered neural networks |
US5167009A (en) * | 1990-08-03 | 1992-11-24 | E. I. Du Pont De Nemours & Co. (Inc.) | On-line process control neural network using data pointers |
US5181259A (en) * | 1990-09-25 | 1993-01-19 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | General method of pattern classification using the two domain theory |
US5240680A (en) * | 1991-12-19 | 1993-08-31 | Chiron Corporation | Automated apparatus for use in peptide synthesis |
US5260882A (en) * | 1991-01-02 | 1993-11-09 | Rohm And Haas Company | Process for the estimation of physical and chemical properties of a proposed polymeric or copolymeric substance or material |
US5265030A (en) * | 1990-04-24 | 1993-11-23 | Scripps Clinic And Research Foundation | System and method for determining three-dimensional structures of proteins |
US5270170A (en) * | 1991-10-16 | 1993-12-14 | Affymax Technologies N.V. | Peptide library and screening method |
US5288514A (en) * | 1992-09-14 | 1994-02-22 | The Regents Of The University Of California | Solid phase and combinatorial synthesis of benzodiazepine compounds on a solid support |
US5323471A (en) * | 1991-09-12 | 1994-06-21 | Atr Auditory And Visual Perception Research Laboratories | Pattern recognition apparatus and pattern learning apparatus employing neural net including excitatory element-inhibitory element pair couplings |
US5331573A (en) * | 1990-12-14 | 1994-07-19 | Balaji Vitukudi N | Method of design of compounds that mimic conformational features of selected peptides |
US5434796A (en) * | 1993-06-30 | 1995-07-18 | Daylight Chemical Information Systems, Inc. | Method and apparatus for designing molecules with desired properties by evolving successive populations |
US5436850A (en) * | 1991-07-11 | 1995-07-25 | The Regents Of The University Of California | Method to identify protein sequences that fold into a known three-dimensional structure |
US5442122A (en) * | 1992-11-09 | 1995-08-15 | Shimadzu Corporation | Dibenzosuberyl and dibenzosuberenyl derivatives |
US5463564A (en) * | 1994-09-16 | 1995-10-31 | 3-Dimensional Pharmaceuticals, Inc. | System and method of automatically generating chemical compounds with desired properties |
US5499193A (en) * | 1991-04-17 | 1996-03-12 | Takeda Chemical Industries, Ltd. | Automated synthesis apparatus and method of controlling the apparatus |
US5519635A (en) * | 1993-09-20 | 1996-05-21 | Hitachi Ltd. | Apparatus for chemical analysis with detachable analytical units |
US5524065A (en) * | 1992-02-07 | 1996-06-04 | Canon Kabushiki Kaisha | Method and apparatus for pattern recognition |
US5526281A (en) * | 1993-05-21 | 1996-06-11 | Arris Pharmaceutical Corporation | Machine-learning approach to modeling biological activity for molecular design and to modeling other characteristics |
US5549974A (en) * | 1994-06-23 | 1996-08-27 | Affymax Technologies Nv | Methods for the solid phase synthesis of thiazolidinones, metathiazanones, and derivatives thereof |
US5553225A (en) * | 1994-10-25 | 1996-09-03 | International Business Machines Corporation | Method and apparatus for combining a zoom function in scroll bar sliders |
US5565325A (en) * | 1992-10-30 | 1996-10-15 | Bristol-Myers Squibb Company | Iterative methods for screening peptide libraries |
US5585277A (en) * | 1993-06-21 | 1996-12-17 | Scriptgen Pharmaceuticals, Inc. | Screening method for identifying ligands for target proteins |
US5598510A (en) * | 1993-10-18 | 1997-01-28 | Loma Linda University Medical Center | Self organizing adaptive replicate (SOAR) |
US5602938A (en) * | 1994-05-20 | 1997-02-11 | Nippon Telegraph And Telephone Corporation | Method of generating dictionary for pattern recognition and pattern recognition method using the same |
US5602755A (en) * | 1995-06-23 | 1997-02-11 | Exxon Research And Engineering Company | Method for predicting chemical or physical properties of complex mixtures |
US5621861A (en) * | 1993-07-27 | 1997-04-15 | Matsushita Electric Industrial Co., Ltd. | Method of reducing amount of data required to achieve neural network learning |
US5634017A (en) * | 1994-09-22 | 1997-05-27 | International Business Machines Corporation | Computer system and method for processing atomic data to calculate and exhibit the properties and structure of matter based on relativistic models |
US5635598A (en) * | 1993-06-21 | 1997-06-03 | Selectide Corporation | Selectively cleavabe linners based on iminodiacetic acid esters for solid phase peptide synthesis |
US5670326A (en) * | 1994-04-05 | 1997-09-23 | Pharmagenics, Inc. | Reiterative method for screening combinatorial libraries |
US5679582A (en) * | 1993-06-21 | 1997-10-21 | Scriptgen Pharmaceuticals, Inc. | Screening method for identifying ligands for target proteins |
US5703792A (en) * | 1993-05-21 | 1997-12-30 | Arris Pharmaceutical Corporation | Three dimensional measurement of molecular diversity |
US5712565A (en) * | 1994-06-22 | 1998-01-27 | Seagate Technology, Inc. | MR sensor having thick active region between two thinner inactive MR regions topped with respective permanent magnets |
US5712171A (en) * | 1995-01-20 | 1998-01-27 | Arqule, Inc. | Method of generating a plurality of chemical compounds in a spatially arranged array |
US5734796A (en) * | 1995-09-29 | 1998-03-31 | Ai Ware, Inc. | Self-organization of pattern data with dimension reduction through learning of non-linear variance-constrained mapping |
US5740326A (en) * | 1994-07-28 | 1998-04-14 | International Business Machines Corporation | Circuit for searching/sorting data in neural networks |
US5789160A (en) * | 1990-06-11 | 1998-08-04 | Nexstar Pharmaceuticals, Inc. | Parallel selex |
US5807754A (en) * | 1995-05-11 | 1998-09-15 | Arqule, Inc. | Combinatorial synthesis and high-throughput screening of a Rev-inhibiting arylidenediamide array |
US5811241A (en) * | 1995-09-13 | 1998-09-22 | Cortech, Inc. | Method for preparing and identifying N-substitued 1,4-piperazines and N-substituted 1,4-piperazinediones |
US5832494A (en) * | 1993-06-14 | 1998-11-03 | Libertech, Inc. | Method and apparatus for indexing, searching and displaying data |
US5861532A (en) * | 1997-03-04 | 1999-01-19 | Chiron Corporation | Solid-phase synthesis of N-alkyl amides |
US5875284A (en) * | 1990-03-12 | 1999-02-23 | Fujitsu Limited | Neuro-fuzzy-integrated data processing system |
US5908960A (en) * | 1997-05-07 | 1999-06-01 | Smithkline Beecham Corporation | Compounds |
US5933819A (en) * | 1997-05-23 | 1999-08-03 | The Scripps Research Institute | Prediction of relative binding motifs of biologically active peptides and peptide mimetics |
US6014661A (en) * | 1996-05-06 | 2000-01-11 | Ivee Development Ab | System and method for automatic analysis of data bases and for user-controlled dynamic querying |
US6026397A (en) * | 1996-05-22 | 2000-02-15 | Electronic Data Systems Corporation | Data analysis system and method |
US6037135A (en) * | 1992-08-07 | 2000-03-14 | Epimmune Inc. | Methods for making HLA binding peptides and their uses |
US6049797A (en) * | 1998-04-07 | 2000-04-11 | Lucent Technologies, Inc. | Method, apparatus and programmed medium for clustering databases with categorical attributes |
US6185506B1 (en) * | 1996-01-26 | 2001-02-06 | Tripos, Inc. | Method for selecting an optimally diverse library of small molecules based on validated molecular structural descriptors |
US6295514B1 (en) * | 1996-11-04 | 2001-09-25 | 3-Dimensional Pharmaceuticals, Inc. | Method, system, and computer program product for representing similarity/dissimilarity between chemical compounds |
US6571227B1 (en) * | 1996-11-04 | 2003-05-27 | 3-Dimensional Pharmaceuticals, Inc. | Method, system and computer program product for non-linear mapping of multi-dimensional data |
Family Cites Families (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4939668A (en) * | 1987-08-24 | 1990-07-03 | International Business Machines Corp. | System for designing intercommunications networks |
US5993819A (en) | 1987-09-08 | 1999-11-30 | Duke University | Synthetic vaccine for protection against human immunodeficiency virus infection |
US5147608A (en) | 1988-04-29 | 1992-09-15 | Millipore Corporation | Apparatus and process for performing repetitive chemical processing |
DE58906153D1 (en) | 1988-08-24 | 1993-12-16 | Siemens Ag | Process for the chemical decontamination of the surface of a metallic component of a nuclear reactor plant. |
IE66205B1 (en) | 1990-06-14 | 1995-12-13 | Paul A Bartlett | Polypeptide analogs |
US5650489A (en) | 1990-07-02 | 1997-07-22 | The Arizona Board Of Regents | Random bio-oligomer library, a method of synthesis thereof, and a method of use thereof |
US5573905A (en) | 1992-03-30 | 1996-11-12 | The Scripps Research Institute | Encoded combinatorial chemical libraries |
US5699268A (en) * | 1995-03-24 | 1997-12-16 | University Of Guelph | Computational method for designing chemical structures having common functional characteristics |
CA2203832A1 (en) | 1995-09-08 | 1997-03-13 | Jaime E. Arenas | Screen for compounds with affinity for rna |
CA2184195C (en) | 1995-10-25 | 2002-04-16 | Andrew Pakula | Screening method for identifying ligands for target proteins |
CA2236339A1 (en) | 1995-12-07 | 1997-06-12 | Robert T. Sauer | A fluorescence-based screening method for identifying ligands |
US5712564A (en) | 1995-12-29 | 1998-01-27 | Unisys Corporation | Magnetic ink recorder calibration apparatus and method |
WO1997027559A1 (en) | 1996-01-26 | 1997-07-31 | Patterson David E | Method of creating and searching a molecular virtual library using validated molecular structure descriptors |
EP0818744A3 (en) | 1996-07-08 | 1998-07-08 | Proteus Molecular Design Limited | Process for selecting candidate drug compounds |
Family Applications (16)
Date | Country | Application number | Publication | Status |
---|---|---|---|---|
1997-11-04 | WO | PCT/US1997/020919 | WO1998020459A1 | Not active (Application Discontinuation) |
1997-11-04 | AU | AU54407/98A | AU732397B2 | Not active (Ceased) |
1997-11-04 | US | US08/963,870 | US6421612B1 | Not active (Expired - Lifetime) |
1997-11-04 | AU | AU51800/98A | AU722989B2 | Not active (Ceased) |
1997-11-04 | IL | IL12949897A | IL129498A0 | Unknown |
1997-11-04 | WO | PCT/US1997/020918 | WO1998020437A2 | Not active (Application Discontinuation) |
1997-11-04 | CA | CA002269669A | CA2269669A1 | Not active (Abandoned) |
1997-11-04 | IL | IL12972897A | IL129728A0 | Unknown |
1997-11-04 | EP | EP97948320A | EP0935784A2 | Not active (Withdrawn) |
1997-11-04 | JP | JP52190298A | JP2001507675A | Not active (Withdrawn) |
1997-11-04 | JP | JP52190398A | JP2001503546A | Not active (Withdrawn) |
1997-11-04 | CA | CA002270527A | CA2270527A1 | Not active (Abandoned) |
1997-11-04 | EP | EP97946679A | EP0935789A1 | Not active (Withdrawn) |
1997-11-04 | US | US08/963,872 | US6295514B1 | Not active (Expired - Lifetime) |
2001-03-12 | US | US09/802,956 | US7188055B2 | Not active (Expired - Fee Related) |
2002-06-14 | US | US10/170,628 | US20030014191A1 | Not active (Abandoned) |
Patent Citations (72)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US543796A (en) * | | 1895-07-30 | | Apparatus for separating dust |
US4811217A (en) * | 1985-03-29 | 1989-03-07 | Japan Association For International Chemical Information | Method of storing and searching chemical structure data |
US4773099A (en) * | 1985-10-10 | 1988-09-20 | The Palantir Corporation | Pattern classification means for use in a pattern recognition system |
US4859736A (en) * | 1987-03-30 | 1989-08-22 | Ciba-Geigy Corporation | Synthetic polystyrene resin and its use in solid phase peptide synthesis |
US4908773A (en) * | 1987-04-06 | 1990-03-13 | Genex Corporation | Computer designed stabilized proteins and method for producing same |
US4939666A (en) * | 1987-09-02 | 1990-07-03 | Genex Corporation | Incremental macromolecule construction methods |
US4935875A (en) * | 1987-12-02 | 1990-06-19 | Data Chem, Inc. | Chemical analyzer |
US5010175A (en) * | 1988-05-02 | 1991-04-23 | The Regents Of The University Of California | General method for producing and selecting peptides with specific properties |
US5025388A (en) * | 1988-08-26 | 1991-06-18 | Cramer Richard D Iii | Comparative molecular field analysis (CoMFA) |
US5307287A (en) * | 1988-08-26 | 1994-04-26 | Tripos Associates, Inc. | Comparative molecular field analysis (COMFA) |
US5095443A (en) * | 1988-10-07 | 1992-03-10 | Ricoh Company, Ltd. | Plural neural network system having a successive approximation learning method |
US5875284A (en) * | 1990-03-12 | 1999-02-23 | Fujitsu Limited | Neuro-fuzzy-integrated data processing system |
US5265030A (en) * | 1990-04-24 | 1993-11-23 | Scripps Clinic And Research Foundation | System and method for determining three-dimensional structures of proteins |
US5789160A (en) * | 1990-06-11 | 1998-08-04 | Nexstar Pharmaceuticals, Inc. | Parallel selex |
US5167009A (en) * | 1990-08-03 | 1992-11-24 | E. I. Du Pont De Nemours & Co. (Inc.) | On-line process control neural network using data pointers |
US5181259A (en) * | 1990-09-25 | 1993-01-19 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | General method of pattern classification using the two domain theory |
US5155801A (en) * | 1990-10-09 | 1992-10-13 | Hughes Aircraft Company | Clustered neural networks |
US5331573A (en) * | 1990-12-14 | 1994-07-19 | Balaji Vitukudi N | Method of design of compounds that mimic conformational features of selected peptides |
US5612895A (en) * | 1990-12-14 | 1997-03-18 | Balaji; Vitukudi N. | Method of rational drug design based on ab initio computer simulation of conformational features of peptides |
US5260882A (en) * | 1991-01-02 | 1993-11-09 | Rohm And Haas Company | Process for the estimation of physical and chemical properties of a proposed polymeric or copolymeric substance or material |
US5499193A (en) * | 1991-04-17 | 1996-03-12 | Takeda Chemical Industries, Ltd. | Automated synthesis apparatus and method of controlling the apparatus |
US5436850A (en) * | 1991-07-11 | 1995-07-25 | The Regents Of The University Of California | Method to identify protein sequences that fold into a known three-dimensional structure |
US5323471A (en) * | 1991-09-12 | 1994-06-21 | Atr Auditory And Visual Perception Research Laboratories | Pattern recognition apparatus and pattern learning apparatus employing neural net including excitatory element-inhibitory element pair couplings |
US5270170A (en) * | 1991-10-16 | 1993-12-14 | Affymax Technologies N.V. | Peptide library and screening method |
US5240680A (en) * | 1991-12-19 | 1993-08-31 | Chiron Corporation | Automated apparatus for use in peptide synthesis |
US5524065A (en) * | 1992-02-07 | 1996-06-04 | Canon Kabushiki Kaisha | Method and apparatus for pattern recognition |
US6037135A (en) * | 1992-08-07 | 2000-03-14 | Epimmune Inc. | Methods for making HLA binding peptides and their uses |
US5288514A (en) * | 1992-09-14 | 1994-02-22 | The Regents Of The University Of California | Solid phase and combinatorial synthesis of benzodiazepine compounds on a solid support |
US5545568A (en) * | 1992-09-14 | 1996-08-13 | The Regents Of The University Of California | Solid phase and combinatorial synthesis of compounds on a solid support |
US5565325A (en) * | 1992-10-30 | 1996-10-15 | Bristol-Myers Squibb Company | Iterative methods for screening peptide libraries |
US5442122A (en) * | 1992-11-09 | 1995-08-15 | Shimadzu Corporation | Dibenzosuberyl and dibenzosuberenyl derivatives |
US5526281A (en) * | 1993-05-21 | 1996-06-11 | Arris Pharmaceutical Corporation | Machine-learning approach to modeling biological activity for molecular design and to modeling other characteristics |
US5703792A (en) * | 1993-05-21 | 1997-12-30 | Arris Pharmaceutical Corporation | Three dimensional measurement of molecular diversity |
US5832494A (en) * | 1993-06-14 | 1998-11-03 | Libertech, Inc. | Method and apparatus for indexing, searching and displaying data |
US5585277A (en) * | 1993-06-21 | 1996-12-17 | Scriptgen Pharmaceuticals, Inc. | Screening method for identifying ligands for target proteins |
US5679582A (en) * | 1993-06-21 | 1997-10-21 | Scriptgen Pharmaceuticals, Inc. | Screening method for identifying ligands for target proteins |
US5635598A (en) * | 1993-06-21 | 1997-06-03 | Selectide Corporation | Selectively cleavabe linners based on iminodiacetic acid esters for solid phase peptide synthesis |
US5434796A (en) * | 1993-06-30 | 1995-07-18 | Daylight Chemical Information Systems, Inc. | Method and apparatus for designing molecules with desired properties by evolving successive populations |
US5621861A (en) * | 1993-07-27 | 1997-04-15 | Matsushita Electric Industrial Co., Ltd. | Method of reducing amount of data required to achieve neural network learning |
US5519635A (en) * | 1993-09-20 | 1996-05-21 | Hitachi Ltd. | Apparatus for chemical analysis with detachable analytical units |
US5598510A (en) * | 1993-10-18 | 1997-01-28 | Loma Linda University Medical Center | Self organizing adaptive replicate (SOAR) |
US5866334A (en) * | 1994-04-05 | 1999-02-02 | Genzyme Corporation | Determination and identification of active compounds in a compound library |
US5670326A (en) * | 1994-04-05 | 1997-09-23 | Pharmagenics, Inc. | Reiterative method for screening combinatorial libraries |
US5602938A (en) * | 1994-05-20 | 1997-02-11 | Nippon Telegraph And Telephone Corporation | Method of generating dictionary for pattern recognition and pattern recognition method using the same |
US5712565A (en) * | 1994-06-22 | 1998-01-27 | Seagate Technology, Inc. | MR sensor having thick active region between two thinner inactive MR regions topped with respective permanent magnets |
US5549974A (en) * | 1994-06-23 | 1996-08-27 | Affymax Technologies Nv | Methods for the solid phase synthesis of thiazolidinones, metathiazanones, and derivatives thereof |
US5740326A (en) * | 1994-07-28 | 1998-04-14 | International Business Machines Corporation | Circuit for searching/sorting data in neural networks |
US5463564A (en) * | 1994-09-16 | 1995-10-31 | 3-Dimensional Pharmaceuticals, Inc. | System and method of automatically generating chemical compounds with desired properties |
US6434490B1 (en) * | 1994-09-16 | 2002-08-13 | 3-Dimensional Pharmaceuticals, Inc. | Method of generating chemical compounds having desired properties |
US5684711A (en) * | 1994-09-16 | 1997-11-04 | 3-Dimensional Pharmaceuticals, Inc. | System, method, and computer program for at least partially automatically generating chemical compounds having desired properties |
US5574656A (en) * | 1994-09-16 | 1996-11-12 | 3-Dimensional Pharmaceuticals, Inc. | System and method of automatically generating chemical compounds with desired properties |
US5901069A (en) * | 1994-09-16 | 1999-05-04 | 3-Dimensional Pharmaceuticals, Inc. | System, method, and computer program product for at least partially automatically generating chemical compounds with desired properties from a list of potential chemical compounds to synthesize |
US5858660A (en) * | 1994-09-20 | 1999-01-12 | Nexstar Pharmaceuticals, Inc. | Parallel selex |
US5634017A (en) * | 1994-09-22 | 1997-05-27 | International Business Machines Corporation | Computer system and method for processing atomic data to calculate and exhibit the properties and structure of matter based on relativistic models |
US5553225A (en) * | 1994-10-25 | 1996-09-03 | International Business Machines Corporation | Method and apparatus for combining a zoom function in scroll bar sliders |
US5736412A (en) * | 1995-01-20 | 1998-04-07 | Arqule, Inc. | Method of generating a plurality of chemical compounds in a spatially arranged array |
US5712171A (en) * | 1995-01-20 | 1998-01-27 | Arqule, Inc. | Method of generating a plurality of chemical compounds in a spatially arranged array |
US5807754A (en) * | 1995-05-11 | 1998-09-15 | Arqule, Inc. | Combinatorial synthesis and high-throughput screening of a Rev-inhibiting arylidenediamide array |
US5602755A (en) * | 1995-06-23 | 1997-02-11 | Exxon Research And Engineering Company | Method for predicting chemical or physical properties of complex mixtures |
US5811241A (en) * | 1995-09-13 | 1998-09-22 | Cortech, Inc. | Method for preparing and identifying N-substitued 1,4-piperazines and N-substituted 1,4-piperazinediones |
US5734796A (en) * | 1995-09-29 | 1998-03-31 | Ai Ware, Inc. | Self-organization of pattern data with dimension reduction through learning of non-linear variance-constrained mapping |
US6185506B1 (en) * | 1996-01-26 | 2001-02-06 | Tripos, Inc. | Method for selecting an optimally diverse library of small molecules based on validated molecular structural descriptors |
US6014661A (en) * | 1996-05-06 | 2000-01-11 | Ivee Development Ab | System and method for automatic analysis of data bases and for user-controlled dynamic querying |
US6026397A (en) * | 1996-05-22 | 2000-02-15 | Electronic Data Systems Corporation | Data analysis system and method |
US6421612B1 (en) * | 1996-11-04 | 2002-07-16 | 3-Dimensional Pharmaceuticals Inc. | System, method and computer program product for identifying chemical compounds having desired properties |
US6571227B1 (en) * | 1996-11-04 | 2003-05-27 | 3-Dimensional Pharmaceuticals, Inc. | Method, system and computer program product for non-linear mapping of multi-dimensional data |
US6295514B1 (en) * | 1996-11-04 | 2001-09-25 | 3-Dimensional Pharmaceuticals, Inc. | Method, system, and computer program product for representing similarity/dissimilarity between chemical compounds |
US5861532A (en) * | 1997-03-04 | 1999-01-19 | Chiron Corporation | Solid-phase synthesis of N-alkyl amides |
US5908960A (en) * | 1997-05-07 | 1999-06-01 | Smithkline Beecham Corporation | Compounds |
US5933819C1 (en) * | 1997-05-23 | 2001-11-13 | Scripps Research Inst | Prediction of relative binding motifs of biologically active peptides and peptide mimetics |
US5933819A (en) * | 1997-05-23 | 1999-08-03 | The Scripps Research Institute | Prediction of relative binding motifs of biologically active peptides and peptide mimetics |
US6049797A (en) * | 1998-04-07 | 2000-04-11 | Lucent Technologies, Inc. | Method, apparatus and programmed medium for clustering databases with categorical attributes |
Cited By (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040148265A1 (en) * | 1998-06-19 | 2004-07-29 | Schwartz Steven D. | Neural network methods to predict enzyme inhibitor or receptor ligand potency |
US6895396B2 (en) * | 1998-06-19 | 2005-05-17 | Albert Einstein College Of Medicine Of Yeshiva University | Neural network methods to predict enzyme inhibitor or receptor ligand potency |
US20080306966A1 (en) * | 2000-05-05 | 2008-12-11 | Cambridgesoft Corporation | Deriving Product Information |
US20090163529A1 (en) * | 2002-10-30 | 2009-06-25 | Edison Pharmaceuticals, Inc. | Identifying therapeutic compounds based on their physical-chemical properties |
US20040105817A1 (en) * | 2002-10-30 | 2004-06-03 | Sylvain Gilat | Identifying therapeutic compounds based on their physical-chemical properties |
US7491312B2 (en) | 2002-10-30 | 2009-02-17 | Edison Pharmaceuticals, Inc. | Identifying therapeutic compounds based on their physical-chemical properties |
US20050125210A1 (en) * | 2003-11-21 | 2005-06-09 | Pearlman Robert S. | System and method for providing a canonical structural representation of chemical compounds |
US20050159900A1 (en) * | 2003-11-21 | 2005-07-21 | Pearlman Robert S. | System and method for identifying structures for a chemical compound |
WO2005052746A3 (en) * | 2003-11-21 | 2006-03-02 | Optive Res Inc | Method for providing a canonical structural representation |
US20060106487A1 (en) * | 2004-10-05 | 2006-05-18 | Allen Robert M | Programmable load forming system, components thereof, and methods of use |
US8000837B2 (en) | 2004-10-05 | 2011-08-16 | J&L Group International, Llc | Programmable load forming system, components thereof, and methods of use |
US8255193B2 (en) | 2005-02-16 | 2012-08-28 | University Of Virginia Patent Foundation | Blood flow bypass catheters and methods for the delivery of medium to the vasculature and body ducts |
US20100160896A1 (en) * | 2005-02-16 | 2010-06-24 | University Of Virginia Patent Foundation | Blood Flow Bypass Catheters and Methods for the Delivery of Medium to the Vasculature and Body Ducts |
US20080262467A1 (en) * | 2005-02-16 | 2008-10-23 | Humphrey Joseph A C | Blood Flow Bypass Catheters and Methods for the Delivery of Medium to the Vasculature and Body Ducts |
US8655798B2 (en) | 2005-02-16 | 2014-02-18 | University Of Virginia Patent Foundation | Blood flow bypass catheters and methods for the delivery of medium to the vasculature and body ducts |
US20110046910A1 (en) * | 2005-04-15 | 2011-02-24 | Thermo Crs Ltd. | Method and System for Sample Testing |
US8229875B2 (en) * | 2007-04-11 | 2012-07-24 | Oracle International Corporation | Bayes-like classifier with fuzzy likelihood |
US20080256016A1 (en) * | 2007-04-11 | 2008-10-16 | Oracle International Corporation | Bayes-like classifier with fuzzy likelihood |
US20100299294A1 (en) * | 2009-05-20 | 2010-11-25 | Mott Jack E | Apparatus, system, and method for determining a partial class membership of a data record in a class |
US8103672B2 (en) | 2009-05-20 | 2012-01-24 | Detectent, Inc. | Apparatus, system, and method for determining a partial class membership of a data record in a class |
WO2013163068A1 (en) * | 2012-04-23 | 2013-10-31 | Targacept, Inc. | Chemical entity search, for a collaboration and content management system |
US9607023B1 (en) | 2012-07-20 | 2017-03-28 | Ool Llc | Insight and algorithmic clustering for automated synthesis |
US9336302B1 (en) | 2012-07-20 | 2016-05-10 | Zuci Realty Llc | Insight and algorithmic clustering for automated synthesis |
US11216428B1 (en) | 2012-07-20 | 2022-01-04 | Ool Llc | Insight and algorithmic clustering for automated synthesis |
US10318503B1 (en) | 2012-07-20 | 2019-06-11 | Ool Llc | Insight and algorithmic clustering for automated synthesis |
US20190043544A1 (en) * | 2012-09-10 | 2019-02-07 | Texas Instruments Incorporated | Processing device with nonvolatile logic array backup |
US9646279B2 (en) * | 2012-09-28 | 2017-05-09 | Rex Wiig | System and method of a requirement, compliance and resource management |
US10268974B2 (en) * | 2012-09-28 | 2019-04-23 | Rex Wiig | System and method of a requirement, compliance and resource management |
US20170053218A9 (en) * | 2012-09-28 | 2017-02-23 | Rex Wiig | System and method of a requirement, compliance and resource management methodology |
WO2015060960A3 (en) * | 2013-10-25 | 2015-06-04 | The Boeing Company | Product chemical profile system |
US10943194B2 (en) | 2013-10-25 | 2021-03-09 | The Boeing Company | Product chemical profile system |
GB2557113A (en) * | 2015-10-30 | 2018-06-13 | Halliburton Energy Services Inc | Producing chemical formulations with cognitive computing |
WO2017074455A1 (en) * | 2015-10-30 | 2017-05-04 | Halliburton Energy Services, Inc. | Producing chemical formulations with cognitive computing |
AU2015413368B2 (en) * | 2015-10-30 | 2021-09-09 | Halliburton Energy Services, Inc. | Producing chemical formulations with cognitive computing |
GB2557113B (en) * | 2015-10-30 | 2022-07-20 | Halliburton Energy Services Inc | Producing chemical formulations with cognitive computing |
US11205103B2 (en) | 2016-12-09 | 2021-12-21 | The Research Foundation for the State University | Semisupervised autoencoder for sentiment analysis |
US20190286792A1 (en) * | 2018-03-13 | 2019-09-19 | International Business Machines Corporation | Chemical compound discovery using machine learning technologies |
WO2021154740A1 (en) * | 2020-01-27 | 2021-08-05 | Potion Ai, Inc. | Methods, systems and apparatus for generating chemical data sequences using neural networks for de novo chemical formulations |
US11742060B2 (en) | 2020-01-27 | 2023-08-29 | Potion Ai, Inc. | Methods, systems and apparatus for generating chemical data sequences using neural networks for de novo chemical formulations |
Also Published As
Publication number | Publication date |
---|---|
JP2001507675A (en) | 2001-06-12 |
IL129498A0 (en) | 2000-02-29 |
US6421612B1 (en) | 2002-07-16 |
AU732397B2 (en) | 2001-04-26 |
EP0935784A2 (en) | 1999-08-18 |
AU722989B2 (en) | 2000-08-17 |
IL129728A0 (en) | 2000-02-29 |
CA2270527A1 (en) | 1998-05-14 |
US20020069043A1 (en) | 2002-06-06 |
WO1998020437A3 (en) | 1998-06-25 |
AU5440798A (en) | 1998-05-29 |
EP0935789A1 (en) | 1999-08-18 |
WO1998020437A2 (en) | 1998-05-14 |
CA2269669A1 (en) | 1998-05-14 |
US6295514B1 (en) | 2001-09-25 |
US7188055B2 (en) | 2007-03-06 |
AU5180098A (en) | 1998-05-29 |
WO1998020459A1 (en) | 1998-05-14 |
JP2001503546A (en) | 2001-03-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6421612B1 (en) | | System, method and computer program product for identifying chemical compounds having desired properties |
US6434490B1 (en) | | Method of generating chemical compounds having desired properties |
Mitra et al. | | Bioinformatics with soft computing |
EP0943131B1 (en) | | Method, system and program for synthesis-based simulation of chemicals having biological functions |
Bunin et al. | | Chemoinformatics theory |
US20050177280A1 (en) | | Methods and systems for discovery of chemical compounds and their syntheses |
EA005286B1 (en) | | Method of operating a computer system to perform a discrete substructural analysis |
US20040117164A1 (en) | | Method and system for artificial intelligence directed lead discovery in high throughput screening data |
Gorse et al. | | Functional diversity of compound libraries |
US9218460B2 (en) | | Defining and mining a joint pharmacophoric space through geometric features |
WO1998046998A1 (en) | | An optimal dissimilarity method for choosing distinctive items of information from a large body of information |
Downs | | 3.2 Clustering of Chemical Structure Databases for Compound Selection |
Wiggins et al. | | Process pathway inference via time series analysis |
US6834239B2 (en) | | Method, system, and computer program product for determining properties of combinatorial library products from features of library building blocks |
US6671627B2 (en) | | Method and computer program product for designing combinatorial arrays |
Gillet | | Applications of evolutionary computation in drug design |
Mishra et al. | | Insilco qsar modeling and drug development process |
Willett | | Subset-Selection Methods For Chemical Databases: Methods for Subset Selection |
Zhang et al. | | Fuzzy system methods in modeling gene expression and analyzing protein networks |
Arnold et al. | | Metabolomics |
Gillet et al. | | Dissimilarity-based compound selection for library design |
Ma | | Effective techniques for gene expression data mining |
Jourdan et al. | | Evolutionary feature selection for bioinformatics |
Wrede | | An Explorative Parameter Sweep: Spatial-temporal Data Mining in Stochastic Reaction-diffusion Simulations |
Gan | | The Application of Spectral Clustering in Drug Discovery |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: 3-DIMENSIONAL PHARMACEUTICALS, INC., PENNSYLVANIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: AGRAFIOTIS, DIMITRIS K.; BONE, ROGER F.; SALEMME, FRANCIS R.; AND OTHERS; REEL/FRAME: 013002/0198; SIGNING DATES FROM 19980501 TO 19980506 |
| | AS | Assignment | Owner name: JOHNSON & JOHNSON PHARMACEUTICAL RESEARCH AND DEVE. Free format text: MERGER; ASSIGNOR: 3-DIMENSIONAL PHARMACEUTICALS, INC.; REEL/FRAME: 017523/0403. Effective date: 20040624 |
| | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |