WO1991018364A1 - Method for predicting the future occurrence of clinically occult or non-existent medical conditions - Google Patents

Method for predicting the future occurrence of clinically occult or non-existent medical conditions

Info

Publication number
WO1991018364A1
WO1991018364A1 PCT/US1991/003302 US9103302W WO9118364A1 WO 1991018364 A1 WO1991018364 A1 WO 1991018364A1 US 9103302 W US9103302 W US 9103302W WO 9118364 A1 WO9118364 A1 WO 9118364A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
unit
histograms
units
dna
Prior art date
Application number
PCT/US1991/003302
Other languages
French (fr)
Inventor
Peter M. Ravdin
William L. Mcguire
Gary M. Clark
Original Assignee
Board Of Regents, The University Of Texas System
Priority date
Filing date
Publication date
Application filed by Board Of Regents, The University Of Texas System

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/065: Analogue means
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10: TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S: TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S706/00: Data processing: artificial intelligence
    • Y10S706/902: Application using ai with detail of the ai system
    • Y10S706/924: Medical

Definitions

  • the present invention relates to a method for predicting the future occurrence of medical conditions that have not yet occurred or which are clinically occult.
  • Neural networks are well known and have been used to implement computational methods that learn to distinguish between objects or classes of events.
  • the networks are first trained by presentation of known data about objects or classes of events, and then are applied to distinguish between unknown objects or classes of events.
  • Although neural networks have been applied in medicine to diagnose diseases based on existing symptoms and to prescribe treatments for the diagnosed diseases, to date there has been no application of such networks to predict the future occurrence of disease which is clinically occult or which has not yet occurred, or to predict the relapse of disease that has presumably been cured.
  • prognostication is important in all branches of medicine. For example, it is useful in the field of oncology in order to improve the prediction of prognosis of patients so that appropriate therapy can be selected. This goal is of particular importance in the selection of treatment of breast cancer patients who are presumably rendered disease free after the removal of the primary tumor within the breast, and who have no pathological evidence of axillary lymph node involvement. Most of these patients will have been surgically cured, but a substantial minority will relapse.
  • Multivariate analysis is a powerful tool but suffers from the disadvantage that it is often unable to effectively analyze outcome based on a highly non-linear input variable. In addition, it is at a particular disadvantage in analyzing interactions between several non-linear variables (where, for example, multiple peaks and troughs of recurrence probability may exist).
  • DNA cytophotometry processes images of cells or cell components to quantitatively estimate a number of nuclear and cellular parameters.
  • DNA flow cytometry processes images of cells or cell components to quantitatively estimate a number of nuclear and cellular parameters.
  • the basis of DNA flow cytometry is the measurement of the level of DNA in individual cells.
  • the technique results in DNA histograms indicating the number of cells having different levels of DNA.
  • DNA histograms obtained through flow cytometry are interpreted as having cells in three basic regions: cells in the G1/G0 phase of the cell cycle before replication of DNA; cells in the S-phase which are actively replicating DNA; and cells in the G2/M phase of the cell cycle, after DNA replication but before cell division.
  • Tumor cells are conventionally interpreted as diploid if they have a G0/G1 peak with a DNA content that is that of normal cells, if there are no other peaks in the histogram with an arbitrary cut-off percentage of counts (usually 10%) of that peak value, and if the G0/G1 peak in the histogram is narrow enough to be considered to represent cells of one population.
  • S-phase counts of a DNA histogram lie in that region between the G0/G1 peak and the G2/M peak.
  • the present invention avoids the drawbacks of the prior art by presenting a medical prognostication method employing a neural network which is trained using sets of data including prognostic variables and corresponding disease or medical condition occurrence. After training, sets of test data, including the same prognostic variables with unknown disease occurrence, are tested to predict the future occurrence of the disease or medical condition.
  • a neural network of the back-propagation class is trained using the back-propagation of errors training algorithm with data of patients with known prognostic variables and disease occurrence. After training, the method then uses the trained neural network to predict future disease occurrence using sets of prognostic variables for which disease occurrence is not known.
  • a neural network is used to implement a method to analyze the outcome of breast cancer patients who have been apparently cured, but who are at risk for relapse.
  • a neural network is used to implement a method to analyze the risk of developing diabetes mellitus.
  • the present invention uses as prognostic variables progesterone receptor values, tumor size, cathepsin D protein levels and HER-2/neu protein levels. These prognostic variables are quantized into discrete variables and applied to input units of a three-level neural network having two output units, one representing relapse and one representing non-relapse.
  • the present invention when analyzing the risk of developing diabetes mellitus, uses as prognostic variables age, fasting glucose level, two-hour glucose level, fasting insulin level, and body mass index.
  • a neural network is used to implement a method to analyze the risk of relapse of axillary node positive breast cancer patients based on histograms of DNA flow cytometric analysis of primary tumors. This method can be used to replace, or in conjunction with, conventional DNA flow cytometric analysis.
  • the present invention can be used as a tool in the prognosis of diseases and medical conditions such as other forms of cancer, cardiovascular disease, post-operative complications for various operative procedures, anesthesia related complications for various anesthetics, obstetrical complications, psychiatric problems, and other health related events.
  • Fig. 1 is a schematic representation of a neural network used to practice the method of the present invention to predict relapse of breast cancer.
  • Fig. 2 is a graph of the performance of the present invention in predicting relapse of breast cancer plotted against the number of training iterations.
  • Figs. 3A and B are graphs of the performance of the present invention to predict relapse rates and mortality of breast cancer patients plotted against follow-up time.
  • Fig. 4 is a schematic representation of a neural network used to practice the method of the present invention to predict future occurrence of diabetes.
  • Fig. 5 is a schematic representation of a neural network used to practice another embodiment of the method of the present invention to predict relapse of breast cancer using DNA flow cytometric histograms.
  • Fig. 6 is a graph of a typical diploid DNA histogram.
  • Fig. 7 is a graph of a typical aneuploid DNA histogram.
  • Fig. 8 is a graph of compressed DNA histograms.
  • Referring to Fig. 1, presented is a schematic representation of a neural network used to practice the method of the present invention to predict the relapse rate of breast cancer patients.
  • a three-layer feed-forward neural network of the back-propagation class is used, and is trained using the back-propagation of errors algorithm, both of which are known in the art.
  • the network was simulated on a Macintosh IIx computer using MacBrain 2.0 software developed by Neuronics, Inc. of Cambridge, Massachusetts.
  • Other neural networks, for example adaline networks, adaptive resonance theory networks, bi-directional associative memory networks, back propagation networks, Boltzmann back propagation networks, counter propagation networks, Hamming networks, Hopfield networks, Madaline networks, probabilistic neural networks, recirculation networks, spatio-temporal pattern recognition networks, and others, can be used for this and other embodiments without departing from the spirit and scope of the present invention.
  • Other training algorithms, such as, for example, the pocket algorithm, delta rule, counter propagation, Hebb rule, Hopfield rule, Widrow-Hoff rule, adaline rule, Kohonen rule, and similar neural network training algorithms, can also be used without departing from the spirit and scope of the present invention.
  • the neural network includes three layers, an input layer having sixteen units, i1-i16, a hidden layer having twelve units, h17-h28, and an output layer including two output units, o29 and o30.
  • each of the input units i1-i16 is connected to each of the hidden layer units h17-h28
  • each of the hidden layer units h17-h28 is connected to each of the output units o29 and o30.
  • connections between the various units of the neural network are weighted, and the output, or "activation state" of one unit is multiplied by the weight of the connection before application as an input to a unit in the next layer.
  • each input unit i1-i16 is simply determined by the input of a one or a zero value depending on whether the relevant prognostic variable of a particular patient was positive or not.
  • the inputs of each input unit i1-i16 are all applied at the same time, and are used to determine the activation states of the hidden layer units, h17-h28.
  • the activation state of each hidden layer unit is calculated according to the sigmoidal activation equation:
  • Wi is the sum of all weighted inputs to the unit (i.e., the sum of the activation states of the inputs to the unit, each multiplied by the relevant connection weight), and where x is equal to 0.2.
  • the values for the activation states of each unit calculated using this activation equation are values between zero and one.
  • the activation states of all hidden layer units, h17-h28, are calculated at the same time.
  • the hidden layer activation states are then used to calculate the activation states of output units, o29 and o30, using the same activation equation used to calculate the states of the hidden layer units. For each output unit, the contribution of each hidden layer unit to each output unit is calculated by multiplying the activation state of each hidden layer unit by the connection weight between it and the relevant output unit.
  • Non-relapse is defined as the output unit representing non-relapse having an activation state greater than 0.5 and the output unit representing relapse having an activation state less than 0.5.
  • Relapse is defined as all other states.
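The forward calculation and decision rule described for the network of Fig. 1 can be sketched as follows. This is a minimal illustration, not the MacBrain implementation. The activation equation itself is not reproduced in this text, so the sketch assumes a standard logistic function in which x = 0.2 acts as a gain on the summed weighted input. The order of the two output units (relapse first) and the random weights are likewise assumptions made only for illustration.

```python
import math
import random

GAIN = 0.2  # assumption: the text's x = 0.2 is treated as a gain on the summed input


def sigmoid(weighted_sum, gain=GAIN):
    """Logistic function; its output always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-gain * weighted_sum))


def layer_activations(inputs, weights):
    """Activation of every unit in the next layer.

    weights[j][i] is the connection weight from unit i of the previous
    layer to unit j of this layer.
    """
    return [sigmoid(sum(w * a for w, a in zip(row, inputs))) for row in weights]


def predict(inputs, w_hidden, w_output):
    """Forward pass through a 16-12-2 network, then the decision rule."""
    hidden = layer_activations(inputs, w_hidden)
    relapse, non_relapse = layer_activations(hidden, w_output)
    # Non-relapse requires both output units to agree; all other states are relapse.
    if non_relapse > 0.5 and relapse < 0.5:
        label = "non-relapse"
    else:
        label = "relapse"
    return label, relapse, non_relapse


rng = random.Random(0)  # illustrative weights only, not those of Table II
w_hidden = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(12)]
w_output = [[rng.uniform(-1, 1) for _ in range(12)] for _ in range(2)]
example_inputs = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
label, relapse, non_relapse = predict(example_inputs, w_hidden, w_output)
```

Because relapse is defined as every state other than the agreed non-relapse state, ambiguous outputs (for example both units above 0.5) fall on the relapse side.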
  • the activation states of output units o29 and o30 are compared with known relapse data and are used to train the neural network.
  • Input units are divided into groups according to a particular prognostic input variable, and each input unit is dedicated to a particular range for the corresponding input variable.
  • the input prognostic variables used to predict the occurrence of relapse in breast cancer are: progesterone receptor values (PgR) measured in femtomoles per milligram; tumor size, measured in centimeters; cathepsin D protein level, measured in expression units; and HER-2/neu protein level, also measured in expression units.
  • PgR progesterone receptor values
  • Units i1-i4 are used for coding progesterone receptor values, units i5-i7 for tumor size, units i8-i11 for cathepsin D levels, and units i12-i16 for HER-2/neu protein levels. These four prognostic variables are quantized into discrete values, with each input unit representing approximately 1/N of the patients within the training data set, where N equals the number of input units used to input the information for a particular prognostic variable. Table I shows the actual cut-off values used for each prognostic input variable.
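The quantization described above, in which each input unit covers roughly 1/N of the training patients, can be sketched as below. The tumor sizes are hypothetical and the resulting cut points are not those of Table I. The function names are illustrative, and taking cut points directly from the sorted training values is one simple way to obtain equal-sized groups, assumed here for clarity.

```python
def quantile_cut_points(training_values, n_units):
    """Cut points that split the sorted training values into n_units
    groups of roughly equal size (about 1/N of the patients each)."""
    ordered = sorted(training_values)
    return [ordered[(k * len(ordered)) // n_units] for k in range(1, n_units)]


def encode(value, cut_points):
    """One-of-N code: a 1 for the unit whose range holds the value, else 0."""
    index = sum(1 for cut in cut_points if value >= cut)
    return [1 if i == index else 0 for i in range(len(cut_points) + 1)]


# Hypothetical tumor sizes in centimeters, coded on three units (i5-i7).
tumor_sizes_cm = [0.8, 1.0, 1.2, 1.5, 1.9, 2.0, 2.4, 3.0, 4.1]
cuts = quantile_cut_points(tumor_sizes_cm, 3)
codes = [encode(size, cuts) for size in tumor_sizes_cm]
```

With nine hypothetical patients and three units, each unit ends up active for three patients, which is the 1/N balance the text describes.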
  • a data set including 199 breast cancer patients having no evidence of axillary lymph node involvement was used.
  • the information for each patient included values for the prognostic variables, as well as follow-up information including relapse occurrence and mortality.
  • connection weights between the units were initially randomly assigned to values between negative one and one, and using the known back propagation of errors algorithm for training, the entire training set of 133 was applied to the neural network for several iterations, during which the individual connection weights are adjusted.
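A minimal sketch of the training procedure described above is given here: weights start at random values between negative one and one and are adjusted by back propagation of errors. The learning rate, the plain logistic activation, the per-patient update, and the single hypothetical patient are assumptions for illustration. The text does not give these details, and this is not the MacBrain implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_out, n_in, rng):
    """Random starting weights between -1 and 1, as in the text."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def forward(x, w_hid, w_out):
    hidden = [sigmoid(sum(w * a for w, a in zip(row, x))) for row in w_hid]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]
    return hidden, output


def train_pattern(x, target, w_hid, w_out, rate=0.5):
    """One back-propagation update for a single patient.

    Returns the squared error measured before the weights were changed.
    """
    hidden, output = forward(x, w_hid, w_out)
    # Output error terms: error scaled by the sigmoid derivative o * (1 - o).
    delta_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, output)]
    # Hidden error terms: output error propagated back through the old weights.
    delta_hid = [
        h * (1.0 - h) * sum(d * w_out[k][j] for k, d in enumerate(delta_out))
        for j, h in enumerate(hidden)
    ]
    for k, d in enumerate(delta_out):
        for j, h in enumerate(hidden):
            w_out[k][j] += rate * d * h
    for j, d in enumerate(delta_hid):
        for i, a in enumerate(x):
            w_hid[j][i] += rate * d * a
    return sum((t - o) ** 2 for t, o in zip(target, output))


rng = random.Random(1)
w_hid = init_weights(12, 16, rng)
w_out = init_weights(2, 12, rng)
# One hypothetical patient: one-of-N coded inputs and a [relapse, non-relapse] target.
patient = [1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
target = [0.0, 1.0]
errors = [train_pattern(patient, target, w_hid, w_out) for _ in range(35)]
```

In the procedure the text describes, every patient of the 133-patient training set would be presented in each iteration; the single repeated patient here only shows that the error shrinks as the weights are adjusted.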
  • Table II presents the connection weights between all of the units of Fig. 1 after 35 training iterations of the training set.
  • Fig. 2 is a graph illustrating the learning process through successive learning iterations in order to make successful predictions of patient relapse in the set of 66 patients in the testing set.
  • the network is at first essentially making random guesses. After 20 learning iterations, the network is making predictions that are significantly better than chance with p less than 0.012. After 35 training iterations through the entire training set, the network was correct in 50% of its predictions of relapse (14 of 28) and 87% of its predictions of non-relapse (33 of 38).
  • the method of the present invention identified patients with a high (50%) and low (13%) overall relapse rate (p less than 0.002).
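The reported counts can be checked with a short calculation. The text does not state which test produced the p value for this embodiment, so a Pearson chi-square test without continuity correction is assumed here. The count of 5 relapses in the predicted non-relapse group is derived from the stated 33 correct predictions out of 38.

```python
import math

# Rows: predicted relapse, predicted non-relapse.
# Columns: relapsed, did not relapse (counts from the 66-patient testing set).
table = [[14, 14], [5, 33]]


def chi_square_2x2(t):
    """Pearson chi-square statistic (no continuity correction) and its
    p value for one degree of freedom."""
    (a, b), (c, d) = t
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = chi_square_2x2(table)
high_risk_rate = table[0][0] / sum(table[0])  # 14 of 28
low_risk_rate = table[1][0] / sum(table[1])   # 5 of 38
```

Under this assumed test the p value falls below 0.002, which agrees with the figure quoted above, and the two relapse rates round to the stated 50% and 13%.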
  • Disease free survival and overall survival curves for the neural network of Fig. 1, with the connection weights shown in Table II, are shown in graphical form in Figs. 3A and 3B. Projected disease free survival at five years was 86% and 46% in the low and high risk groups, respectively. Projected overall survival at five years was 94% and 67% in the low and high risk groups, respectively. Thus, the predictions produced by the method of the present invention identified a subset of high risk patients that had 3.3 times the relapse rate and 4.4 times the mortality rate of the low risk group.
  • Table III shows the disease free and overall survival rates at five years predicted using the neural network method of the present invention, compared with predictions made according to individual prognostic variables using known cut points for the individual variables.
  • Table III indicates that the method of the present invention is superior to any single prognostic variable for prediction of disease free survival (which was the criterion used to train the network), and is better than progesterone receptor status, tumor size, or HER-2/neu, and is equivalent to cathepsin-D in predicting overall survival.
  • prognostic variables other than those presented above could be used.
  • prognostic variables that may be useful for this purpose include % S-phase, nuclear grade, histologic grade, epidermal growth factor content, insulin like growth factor content, other growth factor receptors, transforming growth factor content, epidermal growth factor content, other growth factor and hormone contents, heat shock protein 27 content, other heat shock proteins, Ki67 content, DNA polymerase content, etc.
  • prognostic variables include proposed treatments.
  • possible treatments usable as prognostic variables include various surgical procedures, radiotherapy, or chemotherapy, and combinations thereof.
  • prognostic variables are chosen for their capability, either alone or in combination with other variables, to assist in the prediction of the particular medical condition under consideration.
  • the present invention can be applied to predict the possible onset or occurrence of diseases and medical conditions other than breast cancer.
  • the present invention can be applied to evaluate the probability of people developing diabetes mellitus.
  • Referring to Fig. 4, presented is a schematic representation of a neural network used to implement the method of the present invention to predict the occurrence of diabetes.
  • the neural network of Fig. 4, similar to that of Fig. 1, is a three layer back-propagation network, and includes an input layer having 23 input units, i31-i53; a hidden layer having 10 hidden units, h54-h63; and an output layer having two output units, o64 and o65.
  • the weighted connections between only one input unit, i31, and the hidden units are shown, and the weighted connections between only one hidden unit, h54, and the output units are shown.
  • the activation states of each input unit i31-i53 are applied through weighted connections to each hidden unit h54-h63, and the activation states of each hidden unit h54-h63 are applied to each output unit, o64 and o65.
  • the activation states of input units i31-i53 are either a one or a zero depending upon the value of the relevant input variable
  • the activation states of hidden units h54-h63 and output units o64 and o65 are calculated using the weighted activation states of the previous layer in the sigmoidal activation equation
  • the value of the connection weights is determined during training of the network using the known back propagation of errors training algorithm.
  • the neural network of Fig. 4 was simulated on a Sun Sparc Station computer using the NeuralWorks Professional II software package available from NeuralWare Inc., of Pittsburgh, Pennsylvania. To train the neural network of Fig.
  • the 23 input units i31-i53 are divided into five groups, each group dedicated to a particular prognostic variable.
  • the prognostic variables were patient age in years, fasting glucose level in milligrams per deciliter, two-hour post-prandial glucose level in milligrams per deciliter, fasting serum insulin level in micro International Units per milliliter and body mass index (BMI) in kilograms per meter squared.
  • Input units i31-i34 are dedicated to age
  • i35-i38 are dedicated to fasting glucose level
  • i39-i43 are dedicated to two-hour glucose level
  • i44-i48 are dedicated to fasting insulin level
  • i49-i53 are dedicated to BMI.
  • The particular cut points for these input variables are shown in Table IV.
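The grouping of the 23 input units into five prognostic variables can be sketched as a lookup from variable to unit range. The unit ranges follow the text; the cut points below are hypothetical placeholders, since the values of Table IV are not reproduced here, and the function and dictionary names are illustrative.

```python
# Unit ranges as listed in the text (inclusive unit numbers).
GROUPS = {
    "age": (31, 34),
    "fasting_glucose": (35, 38),
    "two_hour_glucose": (39, 43),
    "fasting_insulin": (44, 48),
    "bmi": (49, 53),
}

# Hypothetical cut points; the actual values are those of Table IV.
CUT_POINTS = {
    "age": [35, 45, 55],
    "fasting_glucose": [85, 95, 105],
    "two_hour_glucose": [90, 110, 130, 150],
    "fasting_insulin": [5, 10, 15, 25],
    "bmi": [22, 25, 28, 32],
}


def input_vector(patient):
    """Return {unit_name: 0 or 1} for units i31-i53, one active unit per group."""
    states = {}
    for name, (first, last) in GROUPS.items():
        active = first + sum(1 for cut in CUT_POINTS[name] if patient[name] >= cut)
        for unit in range(first, last + 1):
            states[f"i{unit}"] = 1 if unit == active else 0
    return states


states = input_vector({
    "age": 48,
    "fasting_glucose": 92,
    "two_hour_glucose": 135,
    "fasting_insulin": 12,
    "bmi": 30.5,
})
```

Each group needs one fewer cut point than it has units, so the five groups of 4, 4, 5, 5 and 5 units account for all 23 inputs, with exactly five units active for any patient.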
  • the neural network of Fig. 4 was trained using the known back-propagation of errors algorithm with data from 466 of the 699 participants of the data set being used for training. After 12 training iterations, the connection weights between the various units of Fig. 4 are summarized in Table V.
  • Table VI summarizes the results of the performance of the neural network method of the present invention to predict the occurrence of diabetes compared with the prognostic capabilities of the individual input variables separately.
  • the results indicate that the present invention is capable of identifying a subset of patients with a higher risk of developing diabetes (30%) than has been shown by prior techniques. For example, the technique mentioned in the Haffner article was capable of identifying a group of people with only a 10-20% risk of developing diabetes.
  • the higher accuracy attainable by the present invention has obvious clinical utility as it allows the identification of a subset with a very high risk of developing diabetes for whom special intervention and special further screening would be justified.
  • the pattern recognition capabilities of neural network computational systems are exploited to analyze DNA flow cytophotometric measurements for features that correlate with prognosis.
  • a neural network is used to predict prognosis of disease from DNA cytophotometric measurement data.
  • a neural network is trained to directly predict the risk of relapse of axillary node positive breast cancer patients based on the DNA histograms of their primary tumors.
  • the results of the present invention are compared to and used in conjunction with conventional DNA flow cytometric analysis in order to improve the prediction of prognosis in breast cancer.
  • the analysis was done using the DNA flow histograms of 381 patients who had histologically proven axillary lymph node involvement at the time of a diagnosis of breast cancer, and who had been clinically followed for at least two years or until relapse. All 381 patients had histograms that by use of conventional techniques were interpretable for both ploidy and S-phase.
  • the 381 patients were randomly assigned to independent training and testing subsets.
  • the 381 patients were randomly assigned to a 191 patient training set or to a 190 patient testing set.
  • the training and testing subsets were generated from the training and testing sets used for the combined model.
  • the training set included 98 patients, and the testing set included 84 patients.
  • the training set included 93 patients and the testing set included 106 patients.
  • tumor specimens were prepared by freezing and pulverizing fresh tumor specimens into a coarse powder. The powder was then homogenized into a Tris sucrose buffer, filtered through 210 and 53 micron nylon meshes and debris was removed by a sucrose cushion technique. After centrifugation at 1500g for 45 minutes, the pellet was resuspended in MEM containing 10% fetal bovine serum. The DNA in the nuclei was then stained with propidium iodide.
  • Nuclei were then pelleted by centrifugation, resuspended in staining buffer, syringed through a 27 gauge needle to break up any clumps, filtered through a 37 micron mesh and injected into a flow cytometer.
  • the flow cytometer used was an Epic V flow cytometer available from Coulter Electronics, of Hialeah, Florida, fitted with a single Inova 90 argon laser available from Coherent Laser Products Division, of Palo Alto, California. Laser emission was 400mW at 488nm. Approximately 50,000 tumor events were acquired on a single-parameter 256 channel integrated fluorescence histogram.
  • DNA content or ploidy in a sample was confirmed as diploid if the G0/G1 peak fell between channels 60 and 64 of the 256 channel histogram.
  • the DNA content was defined as aneuploid if two discrete G0/G1 peaks occurred with the aneuploid G0/G1 peak having at least 10% of the 50,000 sample events collected, and having a corresponding G2/M peak.
  • Samples were rejected as uninterpretable if the sample quality was poor (for example, excess cell debris or too few cells), or if the histogram lacked resolution to distinguish two separate peaks. Coefficients of variation of the G0/G1 peak width were required to be less than or equal to 5% to be considered valuable for this conventional study.
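The conventional ploidy rules stated above can be written down as a small rule check. This is a sketch under assumptions: peaks are taken as already detected and summarized in a hypothetical `Peak` record, the order in which the rules are applied is a choice made here, and treating a sample that satisfies neither the diploid nor the aneuploid rule as uninterpretable is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Optional

TOTAL_EVENTS = 50_000  # approximate number of events collected per sample


@dataclass
class Peak:
    channel: int       # position in the 256-channel histogram
    events: int        # number of events attributed to the peak
    cv_percent: float  # coefficient of variation of the peak width
    has_g2m: bool      # whether a corresponding G2/M peak was found


def classify(diploid_peak: Peak, second_peak: Optional[Peak] = None) -> str:
    """Apply the stated rules in one possible order (the order is an assumption)."""
    if diploid_peak.cv_percent > 5.0:
        return "uninterpretable"
    if second_peak is not None:
        enough_events = second_peak.events >= 0.10 * TOTAL_EVENTS
        if enough_events and second_peak.has_g2m:
            return "aneuploid"
    if 60 <= diploid_peak.channel <= 64:
        return "diploid"
    return "uninterpretable"


sample = classify(Peak(62, 30_000, 3.0, True), Peak(95, 8_000, 4.0, True))
```

The neural network approach described in this document works from the histogram itself rather than from such hand-set thresholds, which is why the two can be compared or combined.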
  • a neural network is used to assess data derived from the 256 channel DNA flow cytometric histograms in order to determine the risk of relapse of breast cancer.
  • Referring to Fig. 5, the neural network used in this embodiment of the invention is presented.
  • This neural network was simulated using Nworks software available from NeuralWare, Inc. of Pittsburgh, Pennsylvania.
  • the network includes 33 input units i66-i98, two hidden layer units, h99 and h100, and one output layer unit, o101.
  • One of the input units, i66, is a bias unit that has a constant input value of 1.0.
  • Bias unit i66 has connection weights to both units of the hidden layer, h99 and h100, and a connection weight to the unit of the output layer, o101.
  • each input unit i66-i98 is also connected through weighted connections to hidden layer unit h100.
  • the transfer functions were linear in the input layer units i66-i98, and were hyperbolic tangent functions, TANH(x), in hidden layer units h99 and h100 and in output layer unit o101.
  • Use of the hyperbolic tangent function allows scaling of the activation weights to values between -1.0 and 1.0.
  • the data from the original 256 channel histograms were compressed into 32 channels for application to the neural network of Fig. 5. This compression was done in order to improve the convergence of the network, and was achieved by summing the number of events in 8 consecutive channels of the 256 channel histogram, adding 1.0, taking the log base 10, and dividing by 6.
  • the values in each of the resulting 32 channels in the training examples were further normalized by finding the maximum and minimum values in each channel and by linearly transforming all values in a given channel to lie between -1.0 and 1.0 for presentation to the network.
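The compression and normalization steps just described can be sketched as follows. The function names are illustrative, and mapping a channel whose training minimum and maximum coincide to 0.0 is an assumption, since the text does not cover that case.

```python
import math


def compress(histogram_256):
    """Sum each run of 8 channels, add 1.0, take log base 10, divide by 6."""
    assert len(histogram_256) == 256
    return [
        math.log10(sum(histogram_256[start:start + 8]) + 1.0) / 6.0
        for start in range(0, 256, 8)
    ]


def fit_channel_ranges(training_rows):
    """Minimum and maximum of every compressed channel over the training set."""
    columns = list(zip(*training_rows))
    return [(min(col), max(col)) for col in columns]


def normalize(row, ranges):
    """Linearly map each channel so the training minimum is -1.0 and maximum 1.0."""
    scaled = []
    for value, (low, high) in zip(row, ranges):
        if high == low:  # assumption: a constant channel maps to 0.0
            scaled.append(0.0)
        else:
            scaled.append(2.0 * (value - low) / (high - low) - 1.0)
    return scaled


# Hypothetical histogram: 999 events in channel 60, none elsewhere.
histogram = [0] * 256
histogram[60] = 999
compressed = compress(histogram)
```

Adding 1.0 before taking the logarithm keeps empty channels at zero, and with roughly 50,000 events per histogram the division by 6 keeps every compressed value below 1.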
  • the cumulative back propagation of errors learning algorithm was used for training the network of Fig. 5 with an epoch of 10 (i.e., correction of connection weight was done after every ten data representations).
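The idea of a cumulative update with an epoch of 10 can be sketched as follows. This is a simplified illustration, not the Nworks implementation: a single hyperbolic tangent unit stands in for the full network of Fig. 5 so that the accumulation logic stays visible, and the learning rate is a hypothetical value.

```python
import math

EPOCH = 10  # presentations between weight corrections, as in the text


def train_cumulative(examples, weights, rate=0.1, epoch=EPOCH):
    """Accumulate weight changes and apply them once per `epoch` presentations.

    Simplification (an assumption): one tanh unit replaces the full network.
    Returns the number of weight corrections that were applied.
    """
    pending = [0.0] * len(weights)
    corrections = 0
    for count, (inputs, target) in enumerate(examples, start=1):
        output = math.tanh(sum(w * x for w, x in zip(weights, inputs)))
        delta = (target - output) * (1.0 - output ** 2)  # tanh derivative
        for i, x in enumerate(inputs):
            pending[i] += rate * delta * x
        if count % epoch == 0:
            # Apply the summed changes, then start accumulating afresh.
            for i in range(len(weights)):
                weights[i] += pending[i]
                pending[i] = 0.0
            corrections += 1
    return corrections


weights = [0.0, 0.0]
examples = [([1.0, 0.5], 1.0)] * 25  # hypothetical inputs with a relapse target of 1
corrections = train_cumulative(examples, weights)
```

In this sketch, changes accumulated after the last full group of ten presentations are not applied; whether the original software handled a partial group differently is not stated in the text.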
  • the 32 compressed histogram channels served as inputs for the 32 input units i67-i98, and the relapse status was presented by the output unit o101 with a 1 representing relapse and a 0 representing non-relapse.
  • the network was tested for its ability to generalize using a second independent test set of patients. The network was tested after each 250 histogram presentations. The network appeared to reach the best solution within 3,000 data representations, and the performance degraded thereafter.
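The testing schedule described above, in which the network is scored after each 250 presentations and the best point is kept, can be sketched as a checkpoint-selection loop. The `train_step` and `evaluate` callables and the score values are hypothetical stand-ins; the text does not say how performance was scored at each check.

```python
def select_best_checkpoint(train_step, evaluate, total_presentations, every=250):
    """Train, score on the test set every `every` presentations, keep the best.

    train_step(n) presents n more histograms; evaluate() returns a score
    where higher is better. Both are hypothetical callables.
    """
    best_score, best_at = float("-inf"), 0
    history = []
    for presented in range(every, total_presentations + 1, every):
        train_step(every)
        score = evaluate()
        history.append((presented, score))
        if score > best_score:
            best_score, best_at = score, presented
    return best_at, best_score, history


# Made-up scores that rise until 3,000 presentations and then degrade.
scores = iter([0.50, 0.55, 0.58, 0.60, 0.61, 0.62, 0.63, 0.64,
               0.65, 0.66, 0.67, 0.68, 0.66, 0.63, 0.60, 0.58])
best_at, best_score, history = select_best_checkpoint(
    lambda n: None, lambda: next(scores), 4000
)
```

Keeping the score history makes the later degradation visible, which is the pattern the text reports after about 3,000 presentations.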
  • the first used all of the histogram data divided into a training set and a testing set.
  • the second and third models used only patients from the combined model who were defined by conventional histogram analysis as being diploid or aneuploid.
  • the relapse rate of high and low risk groups was calculated, and the differences in relapse rate were calculated using the chi-square test.
  • the 381 histograms were randomly assigned to a 191 patient training set or a 190 patient testing set, and the 191 patients in the training set were used to train a series of four neural networks, each of which had a structure identical to that of Fig. 5, with the only difference being the use of four different sets of initial conditions for the connection weights at the beginning of the training session.
  • the connection weights in the neural network were initially set before training to random values between -0.2 and 0.2.
  • Table VII presents the connection weights for network no. 1 of the combined model.
  • Table VIII shows the differences in relapse rates between the groups that the neural networks defined as low and high risk.
  • the weakest network was capable of separating the patients in the testing subsets into a low risk half with a risk of relapse of 13.6% versus a high risk half with a risk of relapse of 26.4%. This discrimination was better than that provided by conventional analysis of ploidy status, which separated the patients in the testing set into a diploid set with a relapse rate of 15.5% (43.1% of the patients), and an aneuploid set with a relapse rate of 23.6% (56.9% of the patients).
  • the p value for differences in relapse rate based on ploidy status alone did not reach statistical significance (p > 0.10).
  • a combination of conventional techniques and a neural network approach was used. This was accomplished by training a series of four networks, identical in structure to that of Fig. 5, each with different initial conditions, to analyze prognosis after presentation of exclusively diploid DNA histograms (diploid model), and then by training a series of four networks, each with structure identical to that of Fig. 5 with different initial conditions, to analyze prognosis after presentation of exclusively aneuploid DNA histograms as defined by conventional histogram analysis (aneuploid model).
  • Table IX presents the connection weights for network no. 1 of the diploid model.
  • Table X illustrates that the four neural networks of the diploid model were able quickly and consistently to learn to discriminate between patients with diploid tumors who had a low risk versus a high risk for relapse.
  • each of the four networks has the structure of Fig. 5, with each having different initial conditions.
  • Table XI presents the connection weights for network no. 1 of the aneuploid model.
  • Table XII shows the differences in relapse rates between the groups that the aneuploid model neural networks defined as low and high risk.
  • Figs. 6 and 7 respectively show typical examples of 256 channel diploid and aneuploid histograms.
  • the diploid histogram of Fig. 6 demonstrates a diploid G0/G1 peak 102, a diploid G2/M peak 103 and a S-phase region 104 between the two.
  • the aneuploid histogram of Fig. 7 also demonstrates a G0/G1 peak 105 and a diploid G2/M peak 106.
  • the aneuploid DNA histogram of Fig. 7 demonstrates an aneuploid G0/G1 peak 107 and an aneuploid G2/M peak 108.
  • Fig. 8 shows typical examples of diploid and aneuploid histograms after compression into 32 channels in accordance with the present invention.
  • the compressed diploid histogram is shown by unfilled circles, and the compressed aneuploid histogram is shown by filled circles.
  • the diploid histogram shows the typical early diploid G0/G1 peak 109 in channels 7 and 8 followed by a G2/M peak 110 in channels 14, 15 and 16.
  • the aneuploid histogram shows, in addition to these diploid peaks, an aneuploid G0/G1 peak 111 in channels 10 and 11, and an aneuploid G2/M peak 112 in channels 21 and 22.
  • Fig. 8 illustrates the complexity of the histograms. For example, there are a large number of nuclei that do not stain with an intensity represented by any of the channels. This background noise exhibits an exponential decay from low numbered channels to high numbered channels. Discrimination of these background counts from those that are truly S-phase nuclei lying between the G0/G1 peaks and the G2/M peaks is a complex task particularly when multiple G0/G1 peaks exist. There are also frequently peaks that do not correspond to those expected by simple models, such as peak 113 appearing in the diploid histogram in channel 20.
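By way of illustration only, the compression of a 256 channel histogram into 32 channels referred to above might be sketched as follows. This excerpt does not state how the compression is performed; summing each group of eight adjacent channels is an assumption, and the function name and the sample counts are hypothetical.

```python
def compress_histogram(counts, n_out=32):
    """Sum adjacent channels so that len(counts) channels become n_out channels.

    Assumption: compression is done by summing equal-width groups of channels.
    """
    if len(counts) % n_out != 0:
        raise ValueError("channel count must be a multiple of n_out")
    width = len(counts) // n_out  # 256 // 32 = 8 source channels per output channel
    return [sum(counts[i:i + width]) for i in range(0, len(counts), width)]


# Hypothetical 256-channel histogram with a single flat peak in channels 56-63 (0-based)
raw = [0] * 256
for channel in range(56, 64):
    raw[channel] = 100

compressed = compress_histogram(raw)
```

The total number of counted events is preserved by this scheme, which is one reason a summing approach is a natural first guess.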

Abstract

A method is presented for evaluating data to predict the future occurrence of a medical condition that is presently clinically occult or which has not yet occurred. Specifically, the method uses a neural network to analyze and interpret DNA flow cytometric histograms. A first set of DNA histograms taken from tumors from patients having known relapse rates are used to train the neural network, and then the trained network is applied to predict the relapse rates of patients using DNA histograms of tumors from those patients. Prognosis according to this method can be performed using only diploid histograms, using only aneuploid histograms, or using a combination of diploid and aneuploid histograms.

Description

METHOD FOR PREDICTING THE FUTURE OCCURRENCE OF CLINICALLY OCCULT OR NON-EXISTENT MEDICAL CONDITIONS
The present invention relates to a method for predicting the future occurrence of medical conditions that have not yet occurred or which are clinically occult.
Neural networks are well known and have been used to implement computational methods that learn to distinguish between objects or classes of events. The networks are first trained by presentation of known data about objects or classes of events, and then are applied to distinguish between unknown objects or classes of events. While neural networks have been applied in medicine to diagnose diseases based on existing symptoms, and to prescribe treatments for the diagnosed diseases, to date, there has been no application of such networks to predict future occurrence of disease which is clinically occult or which has not yet occurred, or to predict the relapse of disease that has presumably been cured.
Such prognostication is important in all branches of medicine. For example, it is useful in the field of oncology in order to improve the prediction of prognosis of patients so that appropriate therapy can be selected. This goal is of particular importance in the selection of treatment of breast cancer patients who are presumably rendered disease free after the removal of the primary tumor within the breast, and who have no pathological evidence of axillary lymph node involvement. Most of these patients will have been surgically cured, but a substantial minority will relapse.
Several recent studies suggest that certain breast cancer patients without axillary lymph node involvement can benefit by adjuvant chemotherapy or hormonal therapy. However, not all the individual patients actually benefit from this therapy, and a majority of these patients receive therapy that is not necessary.
Prior efforts to predict breast cancer prognosis use a number of biochemical, molecular biologic and biophysical input variables that can be used to describe the cells in a tumor. When such multiple input variables are available, typically various combinations of the input variables are assessed using multivariate analysis. Multivariate analysis is a powerful tool but suffers from the disadvantage that it is often unable to effectively analyze outcome based on a highly non-linear input variable. In addition, it is at a particular disadvantage in analyzing interactions between several non-linear variables (where, for example, multiple peaks and troughs of recurrence probability may exist). All of this is particularly true when a given input variable is included in one of two input states (as is commonly done in clinical medicine), with an optimum threshold or cut-point between the two states being determined by maximizing a likelihood function using regression analysis. While such multivariate analysis is not without advantage, it suffers from drawbacks because defining a single cut-point between two states of an input variable effectively ignores important non-linearities in the input variable. In addition, multivariate analysis can miss cross-correlation effects between input variables. Other clinically occult diseases which have known multiple risk factors, for example, coronary heart disease or diabetes, would also benefit from improved prognostication methods.
Other methods have also proven to be important tools in the prediction of prognosis in breast cancer, and other tumor types. Such methods, known as DNA cytophotometry, process images of cells or cell components to quantitatively estimate a number of nuclear and cellular parameters. Of particular interest is DNA flow cytometry. The basis of DNA flow cytometry is the measurement of the level of DNA in individual cells. The technique results in DNA histograms indicating the number of cells having different levels of DNA. Conventionally, DNA histograms obtained through flow cytometry are interpreted as having cells in three basic regions: cells in the G0/G1 phase of the cell cycle before replication of DNA; cells in the S-phase which are actively replicating DNA; and cells in the G2/M phase of the cell cycle after DNA replication but before cell replication.
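Purely as an illustrative sketch, a DNA histogram of the kind described above may be formed by counting, for each channel, the cells whose measured DNA level falls in that channel. The linear mapping from DNA level to channel, the parameter max_value, and the function name are assumptions made for this example and are not taken from any particular flow cytometer.

```python
def dna_histogram(dna_levels, n_channels=256, max_value=1.0):
    """Count cells per channel; each channel covers an equal range of DNA level.

    Assumes non-negative measurements scaled so that max_value is the top of the range.
    """
    counts = [0] * n_channels
    for level in dna_levels:
        # Values at or above max_value are placed in the last channel
        channel = min(int(level / max_value * n_channels), n_channels - 1)
        counts[channel] += 1
    return counts


# Hypothetical measurements for four cells, binned into four channels for readability
example = dna_histogram([0.0, 0.5, 0.5, 1.0], n_channels=4)
```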
Tumor cells are conventionally interpreted as diploid if they have a G0/G1 peak with a DNA content that is that of normal cells, if there are no other peaks in the histogram with an arbitrary cut-off percentage of counts (usually 10%) of that peak value, and if the G0/G1 peak in the histogram is narrow enough to be considered to represent cells of one population. S-phase counts of a DNA histogram lie in that region between the G0/G1 peak and the G2/M peak.
Several complex mathematical formulae have been developed to count the number of S-phase events while subtracting out events due to the tails of the G0/G1 and G2/M peaks, and while subtracting out the effects of contaminating cell debris. A particularly sophisticated method known as SFIT uses second degree polynomials to perform this subtraction. These mathematical formulae are particularly complex for aneuploid histograms, where they often have to deal with cell kinetics from cell populations that are both diploid and aneuploid. All of these mathematical approaches are, however, based on a mechanistic view of cells being in either the G0/G1, S or G2/M phases of the cell cycle.
As such, present techniques for analyzing DNA histograms resulting from flow cytometry ignore other patterns occurring in the DNA histograms which correlate with the risk of cancer relapse.
The present invention avoids the drawbacks of the prior art by presenting a medical prognostication method employing a neural network which is trained using sets of data including prognostic variables and corresponding disease or medical condition occurrence. After training, sets of test data, including the same prognostic variables with unknown disease occurrence, are tested to predict the future occurrence of the disease or medical condition.
According to the present invention, a neural network of the back-propagation class is trained using the back-propagation of errors training algorithm with data of patients with known prognostic variables and disease occurrence. After training, the method then uses the trained neural network to predict future disease occurrence using sets of prognostic variables for which disease occurrence is not known.
In two exemplary embodiments of the invention, a neural network is used to implement a method to analyze the outcome of breast cancer patients who have been apparently cured, but who are at risk for relapse. In another exemplary embodiment, a neural network is used to implement a method to analyze the risk of developing diabetes mellitus.
Specifically, in a first embodiment when predicting the relapse of cancer in breast cancer patients, the present invention uses as prognostic variables progesterone receptor values, tumor size, cathepsin D protein levels and HER-2/neu protein levels. These prognostic variables are quantized into discrete variables and applied to input units of a three-level neural network having two output units, one representing relapse and one representing non-relapse.
In a second embodiment of the invention, when analyzing the risk of developing diabetes mellitus, the present invention uses as prognostic variables age, fasting glucose level, two-hour glucose level, fasting insulin level, and body mass index.
In accordance with yet another embodiment of the present invention, a neural network is used to implement a method to analyze the risk of relapse of axillary node positive breast cancer patients based on histograms of DNA flow cytometric analysis of primary tumors. This method can be used to replace, or in conjunction with, conventional DNA flow cytometric analysis.
Application of the present invention to predict the occurrence or relapse of other diseases, or to predict the mortality rate of diseases and other medical conditions (such as for an actuarial analysis) is also possible. For example, the present invention can be used as a tool in the prognosis of diseases and medical conditions such as other forms of cancer, cardiovascular disease, post-operative complications for various operative procedures, anesthesia related complications for various anesthetics, obstetrical complications, psychiatric problems, and other health related events.
Fig. 1 is a schematic representation of a neural network used to practice the method of the present invention to predict relapse of breast cancer.
Fig. 2 is a graph of the performance of the present invention in predicting relapse of breast cancer plotted against the number of training iterations.
Figs. 3A and 3B are graphs of the performance of the present invention to predict relapse rates and mortality of breast cancer patients plotted against follow-up time.
Fig. 4 is a schematic representation of a neural network used to practice the method of the present invention to predict future occurrence of diabetes.
Fig. 5 is a schematic representation of a neural network used to practice another embodiment of the method of the present invention to predict relapse of breast cancer using DNA flow cytometric histograms.
Fig. 6 is a graph of a typical diploid DNA histogram.
Fig. 7 is a graph of a typical aneuploid DNA histogram.
Fig. 8 is a graph of compressed DNA histograms.
Referring to Fig. 1, presented is a schematic representation of a neural network used to practice the method of the present invention to predict the relapse rate of breast cancer patients. In this embodiment, a three-layer feed-forward neural network of the back-propagation class is used, and is trained using the back propagation of errors algorithm, both of which are known in the art. The network was simulated on a Macintosh IIx computer using MacBrain 2.0 software developed by Neuronics, Inc. of Cambridge, Massachusetts.
It should be emphasized that other forms of neural networks, for example adaline networks, adaptive resonance theory networks, bi-directional associative memory networks, back propagation networks, Boltzmann back propagation networks, counter propagation networks, Hamming networks, Hopfield networks, Madaline networks, probabilistic neural networks, recirculation networks, spatio-temporal pattern recognition networks, and others, can be used for this and other embodiments without departing from the spirit and scope of the present invention.
In addition, different training algorithms, such as, for example, the pocket algorithm, delta rule, counter propagation, Hebb rule, Hopfield rule, Widrow-Hoff rule, adaline rule, Kohonen rule, and similar neural network training algorithms, can also be used without departing from the spirit and scope of the present invention.
In Fig. 1, the neural network includes three layers, an input layer having sixteen units, i1-i16, a hidden layer having twelve units, h17-h28, and an output layer including two output units, o29 and o30. For clarity of presentation, only the connections between one of the input units, i1, and the hidden layer units are shown, and only the connections between one of the hidden units, h17, and the output layer units are shown. In actuality, each of the input units i1-i16 is connected to each of the hidden layer units h17-h28, and each of the hidden layer units h17-h28 is connected to each of the output units o29 and o30. Thus, there are a total of 192 connections between the input and hidden layer units, and a total of 24 connections between the hidden layer and output layer units.
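As a simple check, the number of connections implied by full connectivity between adjacent layers can be computed from the layer sizes alone. The following sketch is illustrative; the function name is hypothetical, and it assumes no additional bias connections are counted.

```python
def count_connections(layer_sizes):
    """Number of connections between each pair of adjacent, fully connected layers."""
    return [n_from * n_to for n_from, n_to in zip(layer_sizes, layer_sizes[1:])]


# Layer sizes of Fig. 1: 16 input units, 12 hidden units, 2 output units
input_to_hidden, hidden_to_output = count_connections([16, 12, 2])
print(input_to_hidden, hidden_to_output)  # prints "192 24"
```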
After training (described in more detail below), the connections between the various units of the neural network are weighted, and the output, or "activation state," of one unit is multiplied by the weight of the connection before application as an input to a unit in the next layer.
The activation state of each input unit i1-i16 is simply determined by the input of a one or a zero value depending on whether the relevant prognostic variable of a particular patient was positive or not. The inputs of each input unit i1-i16 are all applied at the same time, and are used to determine the activation states of the hidden layer units, h17-h28. The activation state of each hidden layer unit is calculated according to the sigmoidal activation equation:
Activation State = $\dfrac{1}{1 + e^{-W_i / x}}$
where $W_i$ is the sum of all weighted inputs to the unit (i.e., the sum of the activation states of the inputs to the unit, each multiplied by the relevant connection weight), and where $x$ is equal to 0.2. The values for the activation states of each unit calculated using this activation equation are values between zero and one. The activation states of all hidden layer units, h17-h28, are calculated at the same time. The hidden layer activation states are then used to calculate the activation states of output units, o29 and o30, using the same activation equation used to calculate the states of the hidden layer units. For each output unit, the contribution of each hidden layer unit to each output unit is calculated by multiplying the activation state of each hidden layer unit by the connection weight between it and the relevant output unit.
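For illustration, the activation equation 1/(1 + e^(-Wi/x)) with x = 0.2, and the layer-by-layer calculation described above, might be sketched as follows. The function names and the miniature network are hypothetical; the weights shown are illustrative values and are not those of any table in this document.

```python
import math

X = 0.2  # value of x stated in the text


def activation_state(weighted_sum, x=X):
    """Sigmoidal activation: 1 / (1 + e^(-Wi / x)), always between zero and one."""
    return 1.0 / (1.0 + math.exp(-weighted_sum / x))


def next_layer_states(states, weights):
    """Activation states of the next layer.

    weights[j][k] is the connection weight from unit j to unit k of the next layer.
    """
    n_next = len(weights[0])
    return [
        activation_state(sum(s * row[k] for s, row in zip(states, weights)))
        for k in range(n_next)
    ]


# Hypothetical miniature network: 3 input units, 2 hidden units, 2 output units
inputs = [1, 0, 1]
w_input_hidden = [[0.2, -0.1], [0.5, 0.3], [-0.2, 0.4]]
w_hidden_output = [[0.3, -0.3], [-0.1, 0.2]]

hidden = next_layer_states(inputs, w_input_hidden)
outputs = next_layer_states(hidden, w_hidden_output)
```

Because x is small, the sigmoid is steep: a weighted sum of only 0.2 already yields an activation state of about 0.73.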
The activation states of output units o29 and o30, one representing relapse and the other representing non-relapse, are then used to predict prognosis based on the input data. Non-relapse is defined as the output unit representing non-relapse having an activation state greater than 0.5 and the output unit representing relapse having an activation state less than 0.5. Relapse is defined as all other states. Alternatively, in a known manner, during training, the activation states of output units o29 and o30 are compared with known relapse data and are used to train the neural network.
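The decision rule just described can be expressed compactly. In the following sketch the function and argument names are hypothetical; the text does not state which of o29 and o30 represents relapse, so the two activation states are passed by role rather than by unit number.

```python
def predict_outcome(relapse_state, non_relapse_state):
    """Apply the stated rule: non-relapse only when the non-relapse unit is above 0.5
    and the relapse unit is below 0.5; every other combination is read as relapse."""
    if non_relapse_state > 0.5 and relapse_state < 0.5:
        return "non-relapse"
    return "relapse"


print(predict_outcome(0.2, 0.8))  # prints "non-relapse"
print(predict_outcome(0.7, 0.8))  # prints "relapse" (both units high)
```

Note that ambiguous outputs, such as both units being high or both being low, are assigned to the relapse class under this rule.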
Input units, i1-i16, are divided into groups according to a particular prognostic input variable, and each input unit is dedicated to a particular range for the corresponding input variable. In this exemplary embodiment, the input prognostic variables used to predict the occurrence of relapse in breast cancer are: progesterone receptor values (PgR) measured in femtomoles per milligram; tumor size, measured in centimeters; cathepsin D protein level, measured in expression units; and HER-2/neu protein level, also measured in expression units. Units i1-i4 are used for coding progesterone receptor values, units i5-i7 for tumor size, units i8-i11 for cathepsin D levels, and units i12-i16 for HER-2/neu protein levels. These four prognostic variables are quantized into discrete values, with each input unit representing approximately 1/N of the patients within the training data set, where N equals the number of input units used to input the information for a particular prognostic variable. Table I shows the actual cut-off values used for each prognostic input variable.
TABLE I
PARTITIONING OF PROGNOSTIC VARIABLES FOR RELAPSE OF BREAST CANCER
Break Points for Input Units
Input Unit: i1 i2 i3 i4 i5 i6 i7 i8 i9 i10 i11 i12 i13 i14 i15 i16
[Break-point values appear only as an image in the source (imgf000013_0001).]
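As a sketch of the quantization described above, break points may be chosen so that each of the N input units for a variable covers roughly 1/N of the training patients, and a patient value is then coded by setting the unit for its range to one. The function names and the tumor sizes below are hypothetical, the sample values are not patient data, and coding exactly one unit per variable as one is an assumption about how the ranges are applied.

```python
import bisect


def quantile_break_points(training_values, n_units):
    """Return n_units - 1 cut-offs splitting the sorted training values into roughly equal groups."""
    ordered = sorted(training_values)
    return [ordered[(k * len(ordered)) // n_units] for k in range(1, n_units)]


def encode(value, break_points):
    """Return one activation state per input unit; the unit whose range contains value is 1."""
    units = [0] * (len(break_points) + 1)
    units[bisect.bisect_right(break_points, value)] = 1
    return units


# Hypothetical tumor sizes in centimeters, coded with three units as for i5-i7
sizes = [0.8, 1.0, 1.2, 1.5, 1.9, 2.0, 2.4, 3.0, 4.1]
cuts = quantile_break_points(sizes, 3)
coded = [encode(size, cuts) for size in sizes]
```

With these nine sample values, each of the three units is active for exactly three of them, which is the roughly equal partition the text describes.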
To demonstrate the present invention, a data set including 199 breast cancer patients having no evidence of axillary lymph node involvement was used. The information for each patient included values for the prognostic variables, as well as follow-up information including relapse occurrence and mortality.
Patients were randomly assigned either to a training set of 133 patients, which was used to teach the neural network, or to a test set of 66 patients, which was used to test the ability of the invention to generalize from the training set to patients that the network had not previously processed. For teaching, connection weights between the units were initially randomly assigned to values between negative one and one, and using the known back propagation of errors algorithm for training, the entire training set of 133 was applied to the neural network for several iterations, during which the individual connection weights were adjusted.
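The training procedure just described may be sketched, under stated assumptions, as follows. The random split, the initial weights between negative one and one, and the use of back propagation of errors follow the text; the squared-error measure, the learning rate, the function names, and the toy network used for the demonstration are assumptions made only for this example and do not reproduce the MacBrain software.

```python
import math
import random

X = 0.2  # x in the sigmoidal activation equation


def act(weighted_sum):
    return 1.0 / (1.0 + math.exp(-weighted_sum / X))


def split_patients(ids, n_train, rng):
    """Randomly assign patients to a training set and a test set."""
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def init_weights(n_from, n_to, rng):
    """Connection weights drawn uniformly between negative one and one."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_to)] for _ in range(n_from)]


def forward(x, w_ih, w_ho):
    hidden = [act(sum(x[i] * w_ih[i][j] for i in range(len(x)))) for j in range(len(w_ih[0]))]
    out = [act(sum(hidden[j] * w_ho[j][k] for j in range(len(hidden)))) for k in range(len(w_ho[0]))]
    return hidden, out


def train_step(x, target, w_ih, w_ho, rate=0.05):
    """One back-propagation update for one patient; returns the squared error before the update."""
    hidden, out = forward(x, w_ih, w_ho)
    # Error signal at each output unit: (output - target) times the sigmoid slope
    d_out = [(o - t) * o * (1.0 - o) / X for o, t in zip(out, target)]
    # Error signal propagated back to each hidden unit through the output weights
    d_hid = [
        sum(d_out[k] * w_ho[j][k] for k in range(len(d_out))) * hidden[j] * (1.0 - hidden[j]) / X
        for j in range(len(hidden))
    ]
    for j in range(len(hidden)):
        for k in range(len(d_out)):
            w_ho[j][k] -= rate * d_out[k] * hidden[j]
    for i in range(len(x)):
        for j in range(len(hidden)):
            w_ih[i][j] -= rate * d_hid[j] * x[i]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, target))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
train_ids, test_ids = split_patients(range(199), 133, rng)
w_input_hidden = init_weights(16, 12, rng)
w_hidden_output = init_weights(12, 2, rng)

# Toy demonstration on a hypothetical 2-2-1 network with small fixed weights
toy_ih = [[0.1, -0.1], [0.2, 0.05]]
toy_ho = [[0.1], [-0.1]]
errors = [train_step([1, 0], [1.0], toy_ih, toy_ho) for _ in range(50)]
```

In the toy demonstration the error on the single training example falls over the 50 updates, which is the behavior expected of gradient-based weight adjustment with a sufficiently small learning rate.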
Table II presents the connection weights between all of the units of Fig. 1 after 35 training iterations of the training set.
TABLE II
NEURAL NETWORK CONNECTION WEIGHTS FOR BREAST CANCER PROGNOSIS
WEIGHTS TO OTHER UNITS FROM UNIT NO. i1
[Weights appear only as an image in the source (imgf000014_0001).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i3
TO UNIT NO.   WEIGHT
h17   .2177
h18   .3684
h19   -.6216
h20   .3037
h21   .6173
h22   -.5931
h23   .3495
h24   .5605
h25   .4034
h26   .9060
h27   -.4727
h28   -.2158
WEIGHTS TO OTHER UNITS FROM UNIT NO. i4
TO UNIT NO.   WEIGHT
h17   -.9462
h18   .2226
h19   -.2883
h20   -.6021
h21   -.7095
h22   1.0270
h23   -.3167
h24   -.1635
h25   -.6571
h26   .8319
h27   .8953
h28   -.5648
WEIGHTS TO OTHER UNITS FROM UNIT NO. i5
TO UNIT NO.   WEIGHT
h17   .1531
h18   .1268
h19   -.7091
h20   .3348
h21   .6779
h22   .4736
h23   -.9540
h24   .4526
h25   .2061
h26   .4309
h27   -.4772
h28   .0857
WEIGHTS TO OTHER UNITS FROM UNIT NO. i6
[Weights appear only as an image in the source (imgf000016_0001).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i7
TO UNIT NO.   WEIGHT
h17   .2973
h18   .5655
h19   -.0527
h20   -.5303
h21   .3262
h22   .3978
h23   .2357
h24   .2703
h25   -.0321
h26   .1099
h27   .5590
h28   -.9910
WEIGHTS TO OTHER UNITS FROM UNIT NO. i8
TO UNIT NO.   WEIGHT
h17   -.4898
h18   .1949
h19   .7300
h20   .4900
h21   .4681
h22   -.2509
h23   .8567
h24   .7537
h25   -.9103
h26   -.8041
h27   -.7081
h28   .2244
WEIGHTS TO OTHER UNITS FROM UNIT NO. i9
[Weights appear only as an image in the source (imgf000017_0001).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i10
[Weights appear only as an image in the source (imgf000017_0002).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i11
TO UNIT NO.   WEIGHT
h17   .8557
h18   -.0135
h19   .6326
h20   .8184
h21   -.5939
h22   .6688
h23   .2576
h24   .3858
h25   -.6883
h26   .7371
h27   .0915
h28   .0777
WEIGHTS TO OTHER UNITS FROM UNIT NO. i12
TO UNIT NO.   WEIGHT
h17   .2551
h18   -.7630
h19   .4250
h20   -.6815
h21   -.6349
h22   -.7121
h23   -.5656
h24   -.9477
h25   -.6794
h26   .8768
h27   -.2034
h28   -.8815
WEIGHTS TO OTHER UNITS FROM UNIT NO. i13
TO UNIT NO.   WEIGHT
h17   -.4228
h18   .3897
h19   -.0624
h20   -.1254
h21   -.3533
h22   .2385
h23   .3233
h24   -.0850
h25   -.1422
h26   -.2288
h27   -.0247
h28   -.2515
WEIGHTS TO OTHER UNITS FROM UNIT NO. i14
TO UNIT NO.   WEIGHT
h17   -.0540
h18   -.6162
h19   -.0116
h20   -.5482
h21   .7184
h22   .6320
h23   -.1000
h24   .9118
h25   .4984
h26   -.1832
h27   -.00292
h28   -.4219
WEIGHTS TO OTHER UNITS FROM UNIT NO. i15
TO UNIT NO.   WEIGHT
h17   .7094
h18   -.3693
h19   -.6769
h20   .02239
h21   -.5683
h22   -.0709
h23   -.7757
h24   .8521
h25   -.1306
h26   -.1435
h27   .4659
h28   .0634
WEIGHTS TO OTHER UNITS FROM UNIT NO. i16
TO UNIT NO.   WEIGHT
h17   -.2348
h18   -.1824
h19   .00003935
h20   -.7847
h21   .2633
h22   -.6176
h23   .3287
h24   .07218
h25   .4740
h26   -.6646
h27   -.5862
h28   .2196
WEIGHTS TO OTHER UNITS FROM UNIT NO. h17
TO UNIT NO.   WEIGHT
o29   .2265
o30   -.04278
WEIGHTS TO OTHER UNITS FROM UNIT NO. h18
TO UNIT NO.   WEIGHT
o29   -.08828
o30   .3799
WEIGHTS TO OTHER UNITS FROM UNIT NO. h19
TO UNIT NO.   WEIGHT
o29   .1962
o30   .6201
WEIGHTS TO OTHER UNITS FROM UNIT NO. h20
TO UNIT NO.   WEIGHT
o29   -.2701
o30   .07405
WEIGHTS TO OTHER UNITS FROM UNIT NO. h21
TO UNIT NO.   WEIGHT
o29   -.8166
o30   .1395
WEIGHTS TO OTHER UNITS FROM UNIT NO. h22
TO UNIT NO.   WEIGHT
o29   .2786
o30   -.3387
WEIGHTS TO OTHER UNITS FROM UNIT NO. h23
TO UNIT NO.   WEIGHT
o29   .7487
o30   -.3231
WEIGHTS TO OTHER UNITS FROM UNIT NO. h24
TO UNIT NO.   WEIGHT
o29   -.5020
o30   .4133
WEIGHTS TO OTHER UNITS FROM UNIT NO. h25
TO UNIT NO.   WEIGHT
o29   .7717
o30   -.1384
WEIGHTS TO OTHER UNITS FROM UNIT NO. h26
TO UNIT NO.   WEIGHT
o29   .5795
o30   .3488
WEIGHTS TO OTHER UNITS FROM UNIT NO. h27
TO UNIT NO.   WEIGHT
o29   -.5913
o30   .6090
WEIGHTS TO OTHER UNITS FROM UNIT NO. h28
TO UNIT NO.   WEIGHT
o29   -.2601
o30   1.0440
Fig. 2 is a graph illustrating the learning process through successive learning iterations in order to make successful predictions of patient relapse in the set of 66 patients in the testing set. As can be seen in Fig. 2, the network is at first essentially making random guesses. After 20 learning iterations, the network is making predictions that are significantly better than chance with p less than 0.012. After 35 training iterations through the entire training set, the network was correct in 50% of its predictions of relapse (14 of 28) and 87% of its predictions of non-relapse (33 of 38). Thus, the method of the present invention identified patients with a high (50%) and low (13%) overall relapse rate (p less than 0.002).
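The percentages quoted above follow directly from the stated counts, as the short calculation below illustrates. The variable names are hypothetical; only the four counts are taken from the text.

```python
# Counts stated in the text for the 66-patient test set after 35 training iterations
predicted_relapse_total = 28
predicted_relapse_correct = 14        # predicted to relapse and did relapse
predicted_non_relapse_total = 38
predicted_non_relapse_correct = 33    # predicted not to relapse and did not relapse

relapse_rate_high_risk = predicted_relapse_correct / predicted_relapse_total
non_relapse_accuracy = predicted_non_relapse_correct / predicted_non_relapse_total
# Patients predicted not to relapse who nevertheless relapsed
relapse_rate_low_risk = (
    predicted_non_relapse_total - predicted_non_relapse_correct
) / predicted_non_relapse_total

print(round(100 * relapse_rate_high_risk),
      round(100 * non_relapse_accuracy),
      round(100 * relapse_rate_low_risk))  # prints "50 87 13"
```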
Disease free survival and overall survival curves for the neural network of Fig. 1 with the connection weights shown in Table II, are shown in graphical form in Figs. 3A and 3B. Projected disease free survival at five years is 86% and 46% in the low and high risk groups, respectively. Projected overall survival at five years was 94% and 67% in the low and high risk groups, respectively. Thus, the predictions produced by the method of the present invention identified a subset of high risk patients that had 3.3 times the relapse rate and 4.4 times the mortality rate of the low risk group. The data presented in Table III shows the disease free and overall survival rates at five years predicted using the neural network method of the present invention, compared with predictions made according to individual prognostic variables using known cut points for the individual variables.
TABLE III
PROGNOSTIC UTILITY OF ANALYTICAL TECHNIQUE WITH PATIENT DATA FOR PREDICTING RELAPSE OF OR DEATH DUE TO BREAST CANCER
Risk of Relapse at 5 years (%)
Predictor Favorable Unfavorable
Neural Network 16
Progesterone Receptor 34
Tumor Size 34
Cathepsin-D 25
HER-2/neu 31
[Remaining values appear only as an image in the source (imgf000022_0001).]
Overall Risk of Death at 5 Years (%)
Predictor Favorable Unfavorable
[Values appear only as an image in the source (imgf000022_0002).]
In Table III, percentages of patients in the unfavorable groups (i.e., predicted to relapse) were 42, 49, 64, 32 and 14% of the test set for the neural network method of the present invention, PgR, tumor size, cathepsin-D and HER-2/neu subsets, respectively. For the prognostic variables considered individually, patients were partitioned between favorable and unfavorable subsets as follows: for progesterone receptor with a cut point of > 4.9 fm/mg protein, for tumor size with a cut point of > 2 cm, for cathepsin-D with a cut point of 75 expression units, and for HER-2/neu with a cut point of 100 expression units. Table III indicates that the method of the present invention is superior to any single prognostic variable for prediction of disease free survival (which was the criterion used to train the network), and is better than progesterone receptor status, tumor size, or HER-2/neu, and is equivalent to cathepsin-D in predicting overall survival.
It should be emphasized that in predicting the possibility of relapse and mortality rate of breast cancer, prognostic variables other than those presented above could be used. For example, other prognostic variables that may be useful for this purpose include % S-phase, nuclear grade, histologic grade, epidermal growth factor content, insulin-like growth factor content, other growth factor receptors, transforming growth factor content, epidermal growth factor content, other growth factor and hormone contents, heat shock protein 27 content, other heat shock proteins, Ki67 content, DNA polymerase content, etc. Still other possible prognostic variables include proposed treatments. For example, for breast cancer, possible treatments usable as prognostic variables include various surgical procedures, radiotherapy, or chemotherapy, and combinations thereof. In general, prognostic variables are chosen for their capability, either alone or in combination with other variables, to assist in the prediction of the particular medical condition under consideration. In addition, the present invention can be applied to predict the possible onset or occurrence of diseases and medical conditions other than breast cancer.
For example, and according to a second exemplary embodiment, the present invention can be applied to evaluate the probability of people developing diabetes mellitus. Referring to Fig. 4, presented is a schematic representation of a neural network used to implement the method of the present invention to predict the occurrence of diabetes. The neural network of Fig. 4, similar to that of Fig. 1, is a three-layer back-propagation network, and includes an input layer having 23 input units, i31-i53; a hidden layer having 10 hidden units, h54-h63; and an output layer having two output units, o64 and o65. Once again, for clarity of presentation the weighted connections between only one input unit i31 and the hidden units are shown, and the weighted connections between only one hidden unit h54 and the output units are shown. In fact, the activation states of each input unit i31-i53 are applied through weighted connections to each hidden unit h54-h63, and the activation states of each hidden unit h54-h63 are applied to each output unit, o64 and o65. As before, the activation states of input units i31-i53 are either a one or a zero depending upon the value of the relevant input variable, the activation states of hidden units h54-h63 and output units o64 and o65 are calculated using the weighted activation states of the previous layer in the sigmoidal activation equation, and the value of the connection weights is determined during training of the network using the known back propagation of errors training algorithm.
The neural network of Fig. 4 was simulated on a Sun Sparc Station computer using the NeuralWorks Professional II software package available from NeuralWare Inc., of Pittsburgh, Pennsylvania. To train the neural network of Fig. 4, the data was taken from a large survey of non-diabetics in San Antonio, Texas reported in Haffner, et al. Diabetes. 39:283-288, 1990. In this survey, 699 participants were subjected to laboratory tests and physical examination. Then, after eight years, the participants were re-surveyed and their medical records were reviewed to see who had developed diabetes. Approximately 5% of these individuals in the initial survey group developed diabetes.
The 23 input units i31-i53 are divided into five groups, each group dedicated to a particular prognostic variable. In this embodiment, the prognostic variables were patient age in years, fasting glucose level in milligrams per deciliter, two-hour post-prandial glucose level in milligrams per deciliter, fasting serum insulin level in micro International Units per milliliter and body mass index (BMI) in kilograms per meter squared. Input units i31-i34 are dedicated to age, i35-i38 are dedicated to fasting glucose level, i39-i43 are dedicated to two-hour glucose level, i44-i48 are dedicated to fasting insulin level, and i49-i53 are dedicated to BMI. The particular cut points for these input variables are shown in Table IV.
TABLE IV
PARTITIONING OF PROGNOSTIC VARIABLES FOR LATE ONSET (TYPE II) DIABETES MELLITUS
Break Points for Input Units
Unit: i31 i32 i33 i34 i35 i36 i37 i38 i39 i40 i41 i42 i43 i44 i45 i46 i47 i48 i49 i50 i51 i52 i53
[Break-point values appear only as an image in the source (imgf000026_0001).]
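The assignment of the 23 input units to the five prognostic variables can be written down as a small lookup structure, which also makes it easy to confirm that the groups cover units i31 through i53 without gaps. The dictionary and function names below are hypothetical; the unit ranges and measurement units are those given in the text.

```python
# First and last unit number for each prognostic variable, as stated in the text
INPUT_GROUPS = {
    "age (years)": (31, 34),
    "fasting glucose (mg/dl)": (35, 38),
    "two-hour glucose (mg/dl)": (39, 43),
    "fasting insulin (micro IU/ml)": (44, 48),
    "body mass index (kg/m^2)": (49, 53),
}


def unit_names(first, last):
    """Names of the input units in an inclusive range, e.g. i31, i32, ..."""
    return [f"i{n}" for n in range(first, last + 1)]


units_per_variable = {name: len(unit_names(*span)) for name, span in INPUT_GROUPS.items()}
all_units = [unit for span in INPUT_GROUPS.values() for unit in unit_names(*span)]
total_input_units = len(all_units)  # 4 + 4 + 5 + 5 + 5 = 23
```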
The neural network of Fig. 4 was trained using the known back-propagation of errors algorithm with data from 466 of the 699 participants of the data set being used for training. After 12 training iterations, the connection weights between the various units of Fig. 4 are summarized in Table V.
TABLE V
NEURAL NETWORK CONNECTION WEIGHTS FOR THE PREDICTION OF DIABETES MELLITUS
WEIGHTS TO OTHER UNITS FROM UNIT NO. i31
[Weights appear only as an image in the source (imgf000027_0001).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i33
[Weights appear only as an image in the source (imgf000027_0002).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i34
[Weights appear only as an image in the source (imgf000028_0001).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i37
[Weights appear only as an image in the source (imgf000029_0001).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i38
[Weights appear only as an image in the source (imgf000029_0002).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i39
TO UNIT NO.   WEIGHT
h54   -.1423
h55   .6264
h56   -.1112
h57   .2649
h58   -.5834
h59   .7100
h60   -.0177
h61   .4389
h62   .5030
h63   .0028
WEIGHTS TO OTHER UNITS FROM UNIT NO. i40
[Weights appear only as an image in the source (imgf000030_0001).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i43
[Weights appear only as an image in the source (imgf000031_0001).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i44
TO UNIT NO.   WEIGHT
h54   -.6180
h55   .3870
h56   -.6375
h57   .0820
h58   -.2286
h59   .6442
h60   .7407
h61   .8418
h62   -.2558
h63   -.2181
WEIGHTS TO OTHER UNITS FROM UNIT NO. i45
TO UNIT NO.   WEIGHT
h54   -.7152
h55   .8204
h56   -.2097
h57   .3840
h58   .5409
h59   -.3277
h60   .2814
h61   .7492
h62   -.3517
h63   .4750
WEIGHTS TO OTHER UNITS FROM UNIT NO. i46
[Weights appear only as an image in the source (imgf000032_0001).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i49
[Weights appear only as an image in the source (imgf000033_0001).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i51
[Weights appear only as an image in the source (imgf000033_0002).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. i52
TO UNIT NO.   WEIGHT
h54   -.8303
h55   .6444
h56   -.1304
h57   .6764
h58   .2331
h59   -.5741
h60   .4140
h61   .6549
h62   -.3536
h63   -.6422
WEIGHTS TO OTHER UNITS FROM UNIT NO. i53
[Weights appear only as an image in the source (imgf000034_0001).]
WEIGHTS TO OTHER UNITS FROM UNIT NO. h54
TO UNIT NO.   WEIGHT
o64   .7988
o65   -.7955
WEIGHTS TO OTHER UNITS FROM UNIT NO. h55
TO UNIT NO.   WEIGHT
o64   -2.5500
o65   2.7674
WEIGHTS TO OTHER UNITS FROM UNIT NO. h56
TO UNIT NO.   WEIGHT
o64   -.7810
o65   .1818
WEIGHTS TO OTHER UNITS FROM UNIT NO. h57
TO UNIT NO.   WEIGHT
o64   -1.3432
o65   1.5711
WEIGHTS TO OTHER UNITS FROM UNIT NO. h58
TO UNIT NO.   WEIGHT
o64   -.0919
o65   -.4624
WEIGHTS TO OTHER UNITS FROM UNIT NO. h59
TO UNIT NO.   WEIGHT
o64   -.9766
o65   .7307
WEIGHTS TO OTHER UNITS FROM UNIT NO. h60
TO UNIT NO.   WEIGHT
o64   -1.2446
o65   1.2364
WEIGHTS TO OTHER UNITS FROM UNIT NO. h61
TO UNIT NO.   WEIGHT
o64   -2.4225
o65   2.0069
WEIGHTS TO OTHER UNITS FROM UNIT NO. h62
TO UNIT NO.   WEIGHT
o64   .4785
o65   .6323
WEIGHTS TO OTHER UNITS FROM UNIT NO. h63
TO UNIT NO.   WEIGHT
o64   -1.1616
o65   1.2554
After training, data from the remaining 233 patients of the data set was used to evaluate the ability of the network to predict the occurrence of diabetes.
Table VI summarizes the results of the performance of the neural network method of the present invention to predict the occurrence of diabetes compared with the prognostic capabilities of the individual input variables separately. The results indicate that the present invention is capable of identifying a subset of patients with a higher risk of developing diabetes (30%) than has been shown by prior techniques. For example, the technique mentioned in the Haffner article was capable of identifying a group of people with only a 10-20% risk of developing diabetes. The higher accuracy attainable by the present invention has obvious clinical utility, as it allows the identification of a subset with a very high risk of developing diabetes for whom special intervention and special further screening would be justified.
TABLE VI
PROGNOSTIC UTILITY OF ANALYTICAL TECHNIQUE WITH PATIENT DATA FOR PREDICTING ONSET OF DIABETES MELLITUS
Risk of Developing Type II Diabetes After 8 Years of Follow Up (%)
Predictor    Favorable    Unfavorable
Figure imgf000036_0001
According to yet another embodiment of the present invention, the pattern recognition capabilities of neural network computational systems are exploited to analyze DNA flow cytophotometric measurements for features that correlate with prognosis. At present, several techniques exist to quantitatively estimate a number of nuclear and cellular parameters by processing images of cells obtained by microscopy. Such parameters include nuclear size, nuclear DNA staining, number of nucleoli, and other cellular and nuclear parameters. Data from such studies can be represented in a statistical format such as a histogram of events for a single measured parameter, or in other complex forms with a plurality of parameters for individual cells plotted against each other or plotted against the number of events.
Present analysis of such data sets is often quite complex, and can ignore significant attributes of the data that may be valuable in predicting clinical outcome. In accordance with the present invention, a neural network is used to predict prognosis of disease from DNA cytophotometric measurement data.
In particular, according to an exemplary embodiment of the present invention, a neural network is trained to directly predict the risk of relapse of axillary node positive breast cancer patients based on the DNA histograms of their primary tumors. As presented below, the results of the present invention are compared to and used in conjunction with conventional DNA flow cytometric analysis in order to improve the prediction of prognosis in breast cancer.
The analysis was done using the DNA flow histograms of 381 patients who had histologically proven axillary lymph node involvement at the time of a diagnosis of breast cancer, and who had been clinically followed for at least two years or until relapse. All 381 patients had histograms that by use of conventional techniques were interpretable for both ploidy and S-phase.
The preparation of specimens, DNA flow cytometry and the conventional interpretation presented for comparison below were done by Nichols Institute in San Juan Capistrano, California. Clinical follow-up for recording information about relapse was performed by the Nichols Institute research network. Patients were defined as disease free if they had been followed for at least two years, and had not shown any signs of relapse. Patients were defined as having relapsed if they had relapsed within two years of diagnosis.
In order to train and test the neural network of this embodiment of the present invention, the 381 patients were randomly assigned to independent training and testing subsets. To form the combined model of the present invention, the 381 patients were randomly assigned to a 191 patient training set or to a 190 patient testing set. In the diploid and aneuploid models of the present invention, the training and testing subsets were generated from the training and testing sets used for the combined model. For the diploid model, the training set included 98 patients, and the testing set included 84 patients. For the aneuploid model, the training set included 93 patients and the testing set included 106 patients.
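The random assignment described above can be sketched in a few lines. This is only an illustration: the function name `split_patients`, the integer patient identifiers, and the fixed seed are assumptions made for the example, not details given in the text.

```python
import random

def split_patients(patient_ids, n_train, seed=0):
    """Randomly assign patients to independent training and testing sets."""
    rng = random.Random(seed)   # fixed seed (hypothetical) so the split is repeatable
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# 381 patients: a 191 patient training set and a 190 patient testing set.
train_ids, test_ids = split_patients(range(381), n_train=191)
```

The diploid and aneuploid subsets would then be drawn from these two sets according to the conventional ploidy call for each patient, so that no patient moves between training and testing.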
To generate the flow histograms for the 381 patients, tumor specimens were prepared by freezing and pulverizing fresh tumor specimens into a coarse powder. The powder was then homogenized into a Tris sucrose buffer, filtered through 210 and 53 micron nylon meshes, and debris was removed by a sucrose cushion technique. After centrifugation at 1500g for 45 minutes, the pellet was resuspended in MEM containing 10% fetal bovine serum. The DNA in the nuclei was then stained with propidium iodide. Nuclei were then pelleted by centrifugation, resuspended in staining buffer, syringed through a 27 gauge needle to break up any clumps, filtered through a 37 micron mesh and injected into a flow cytometer. The flow cytometer used was an Epic V flow cytometer available from Coulter Electronics, of Hialeah, Florida, fitted with a single Inova 90 argon laser available from Coherent Laser Products Division, of Palo Alto, California. Laser emission was 400mW at 488nm. Approximately 50,000 tumor events were acquired on a single-parameter 256 channel integrated fluorescence histogram.
Using conventional analysis techniques, DNA content or ploidy in a sample was confirmed as diploid if the G0/G1 peak fell between channels 60 and 64 of the 256 channel histogram. The DNA content was defined as aneuploid if two discrete G0/G1 peaks occurred with the aneuploid G0/G1 peak having at least 10% of the 50,000 sample events collected, and having a corresponding G2/M peak. Samples were rejected as uninterpretable if the sample quality was poor (for example, excess cell debris or too few cells), or if the histogram lacked resolution to distinguish two separate peaks. Coefficients of variation of the G0/G1 peak width were required to be less than or equal to 5% to be considered evaluable for this conventional study.
In accordance with the present invention, a neural network is used to assess data derived from the 256 channel DNA flow cytometric histograms in order to determine the risk of relapse of breast cancer.
Referring to Fig. 5, the neural network used in this embodiment of the invention is presented. This neural network was simulated using Nworks software available from NeuralWare, Inc. of Pittsburgh, Pennsylvania. The network includes 33 input units i66-i98, two hidden layer units, h99 and h100, and one output layer unit, o101. One of the input units, i66, is a bias unit that has a constant input value of 1.0. Bias unit i66 has connection weights to both units of the hidden layer, h99 and h100, and a connection weight to the unit of the output layer, o101. For the sake of clarity, only the connections between input units i66-i98 and one hidden layer unit, h99, are illustrated. However, it is to be understood that each input unit i66-i98 is also connected through weighted connections to hidden layer unit h100.
The transfer functions were linear in the input layer units i66-i98, and were hyperbolic tangent functions (TANH(x)) in hidden layer units h99 and h100 and in output layer unit o101. Use of the hyperbolic tangent function allows scaling of the activation weights to values between -1.0 and 1.0.
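The network structure just described can be sketched as a forward pass in plain Python. This is a minimal illustration, not the Nworks implementation: the function names `init_weights` and `forward` and the weight layout (bias weight stored at index 0 of each row) are assumptions, and the random starting range of -0.2 to 0.2 is the one this description gives for the combined model.

```python
import math
import random

N_INPUTS = 32   # compressed histogram channels (input units i67-i98)
N_HIDDEN = 2    # hidden layer units h99 and h100

def init_weights(rng, scale=0.2):
    """Random starting weights; index 0 of each row is the bias connection (unit i66)."""
    hidden = [[rng.uniform(-scale, scale) for _ in range(N_INPUTS + 1)]
              for _ in range(N_HIDDEN)]
    output = [rng.uniform(-scale, scale) for _ in range(N_HIDDEN + 1)]
    return hidden, output

def forward(x, hidden, output):
    """Return (hidden activations, output activation) for one 32-value input."""
    inputs = [1.0] + list(x)   # bias unit is constant 1.0; input units are linear
    h = [math.tanh(sum(w * v for w, v in zip(row, inputs))) for row in hidden]
    # The bias unit also connects directly to the output unit o101.
    o = math.tanh(output[0] + sum(w * a for w, a in zip(output[1:], h)))
    return h, o
```

Because every non-input unit applies TANH, the value produced by `forward` always lies strictly between -1.0 and 1.0.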
The data from the original 256 channel histograms were compressed into 32 channels for application to the neural network of Fig. 5. This compression was done in order to improve the convergence of the network, and was achieved by summing the number of events in 8 consecutive channels of the 256 channel histogram, adding 1.0, taking the log base 10, and dividing by 6. The values in each of the resulting 32 channels in the training examples were further normalized by finding the maximum and minimum values in each channel and by linearly transforming all values in a given channel to lie between -1.0 and 1.0 for presentation to the network.
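The compression and normalization steps can be written out as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the handling of a channel whose minimum and maximum coincide (mapped to 0.0) is a choice made for the example, since the text does not address that case.

```python
import math

def compress_histogram(counts256):
    """Sum each run of 8 consecutive channels, add 1.0, take log10, divide by 6."""
    if len(counts256) != 256:
        raise ValueError("expected a 256 channel histogram")
    return [math.log10(sum(counts256[i:i + 8]) + 1.0) / 6.0
            for i in range(0, 256, 8)]

def fit_channel_ranges(training_rows):
    """Per-channel (minimum, maximum) over the compressed training examples."""
    return [(min(col), max(col)) for col in zip(*training_rows)]

def normalize(row, ranges):
    """Linearly map each channel to the interval [-1.0, 1.0]."""
    out = []
    for value, (lo, hi) in zip(row, ranges):
        # A channel with no spread cannot be rescaled; 0.0 is an assumed fallback.
        out.append(0.0 if hi == lo else 2.0 * (value - lo) / (hi - lo) - 1.0)
    return out
```

Adding 1.0 before taking the logarithm keeps empty channels at exactly 0, and dividing by 6 keeps the compressed value at or below 1.0 for any channel sum under one million events, which comfortably covers the roughly 50,000 events acquired per histogram.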
The cumulative back propagation of errors learning algorithm was used for training the network of Fig. 5 with an epoch of 10 (i.e., the connection weights were corrected after every ten data presentations). During training, the 32 compressed histogram channels served as inputs for the 32 input units i67-i98, and the relapse status was presented at the output unit o101, with a 1 representing relapse and a 0 representing non-relapse. After training the network with the training subsets of patients, the network was tested for its ability to generalize using a second independent test set of patients. The network was tested after each 250 histogram presentations. The network appeared to reach the best solution within 3,000 data presentations, and the performance degraded thereafter.
To evaluate the ability of the neural network to discriminate between patients with good and poor prognosis, during testing the value generated by the output unit in response to a histogram was recorded. These output values would ideally be either 1's or 0's corresponding to relapse or non-relapse, but in fact were over a continuous range from 1 to 0. The output values were then ranked from highest to lowest, and the 50% of the patients in the testing set with the highest output values were defined as being in the high risk group, and their relapse rate was calculated. The remaining patients were defined as being in the low risk group, and this group's relapse rate was also calculated.
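The training schedule described above, in which weight corrections are accumulated and applied once every ten presentations, can be sketched as below. This is an illustrative reconstruction rather than the Nworks algorithm itself: the squared-error objective, the learning rate of 0.02, the random choice of which example to present, and all function names are assumptions.

```python
import math
import random

def forward(x, w_hid, w_out):
    """Forward pass; index 0 of every weight row is the bias connection."""
    inputs = [1.0] + list(x)   # bias unit fixed at 1.0
    h = [math.tanh(sum(w * v for w, v in zip(row, inputs))) for row in w_hid]
    o = math.tanh(w_out[0] + sum(w * a for w, a in zip(w_out[1:], h)))
    return inputs, h, o

def train(examples, n_inputs, n_hidden=2, epoch=10, presentations=3000,
          rate=0.02, seed=0):
    """Cumulative back-propagation: sum corrections over `epoch` presentations, then apply."""
    rng = random.Random(seed)
    w_hid = [[rng.uniform(-0.2, 0.2) for _ in range(n_inputs + 1)]
             for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.2, 0.2) for _ in range(n_hidden + 1)]
    d_hid = [[0.0] * (n_inputs + 1) for _ in range(n_hidden)]
    d_out = [0.0] * (n_hidden + 1)
    for step in range(1, presentations + 1):
        x, target = rng.choice(examples)   # target: 1 = relapse, 0 = non-relapse
        inputs, h, o = forward(x, w_hid, w_out)
        delta_o = (target - o) * (1.0 - o * o)   # derivative of tanh is 1 - tanh^2
        d_out[0] += rate * delta_o
        for j in range(n_hidden):
            d_out[j + 1] += rate * delta_o * h[j]
            delta_h = delta_o * w_out[j + 1] * (1.0 - h[j] * h[j])
            for k, v in enumerate(inputs):
                d_hid[j][k] += rate * delta_h * v
        if step % epoch == 0:   # apply the accumulated correction, then reset it
            for j in range(n_hidden):
                for k in range(n_inputs + 1):
                    w_hid[j][k] += d_hid[j][k]
                    d_hid[j][k] = 0.0
            for j in range(n_hidden + 1):
                w_out[j] += d_out[j]
                d_out[j] = 0.0
    return w_hid, w_out
```

Changing `seed` gives a different set of starting weights, which is one way to obtain several networks of identical structure that differ only in their initial conditions. Evaluating on the independent testing set at regular intervals, and keeping the best weights, would guard against the degradation that the description reports after prolonged training.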
To test the neural network of the present invention, three different models were used. The first (the combined model) used all of the histogram data divided into a training set and a testing set. The second and third models (diploid and aneuploid models) used only patients from the combined model who were defined by conventional histogram analysis as being diploid or aneuploid. With each of these models, the relapse rate of high and low risk groups was calculated, and the differences in relapse rate were tested using the chi-square test. In the combined model, the 381 histograms were randomly assigned to a 191 patient training set or a 190 patient testing set, and the 191 patients in the training set were used to train a series of four neural networks, each of which had a structure identical to that of Fig. 5, with the only difference being the use of four different sets of initial conditions for the connection weights at the beginning of the training session. For each of the four networks, the connection weights in the neural network were initially set before training to random values between -0.2 and 0.2. Table VII presents the connection weights for network no. 1 of the combined model.
TABLE VII
NEURAL NETWORK CONNECTION WEIGHTS
FOR DNA FLOW CYTOMETRIC HISTOGRAMS
(COMBINED MODEL)
Weights From Other Units to Unit No. h99
From Unit No.    Weight
Figure imgf000042_0001
i87    -0.1958
i88    -0.1127
i89    +0.0718
i90    +0.0733
i91    +0.0264
i92    -0.1320
i93    -0.0914
i94    -0.1158
i95    +0.1360
i96    +0.0466
i97    -0.1036
i98    +0.1672
Weights From Other Units to Unit No. h100
From Unit No.    Weight
Figure imgf000043_0001
Weights From Other Units to Unit No. o101
From Unit No.    Weight
i66 (Bias)    -0.3720
h99    -0.1787
h100    +0.0536
After training the four combined model networks, the 190 patients assigned to the testing set were used to test the performance of the networks. Table VIII shows the differences in relapse rates between the groups that the neural networks defined as low and high risk.
TABLE VIII
ACTUAL RELAPSE RATES IN THE TESTING SET (COMBINED MODELS)
Figure imgf000044_0001
Referring to Table VIII, all combined model neural networks achieved a level of discrimination between low and high risk subsets that was statistically significant.
Even the weakest network was capable of separating the patients in the testing subsets into a low risk half with a risk of relapse of 13.6% versus a high risk half with a risk of relapse of 26.4%. This discrimination was better than that provided by conventional analysis of ploidy status, which separated the patients in the testing set into a diploid set with a relapse rate of 15.5% (43.1% of the patients), and an aneuploid set with a relapse rate of 23.6% (56.9% of the patients). The p value for differences in relapse rate based on ploidy status alone did not reach statistical significance (p > 0.10).
To further increase the accuracy of the neural network method of the present invention when analyzing DNA histograms, a combination of conventional techniques and a neural network approach was used. This was accomplished by training a series of four networks, identical in structure to that of Fig. 5, each with different initial conditions, to analyze prognosis after presentation of exclusively diploid DNA histograms (diploid model), and then by training a series of four networks, each with structure identical to that of Fig. 5 with different initial conditions, to analyze prognosis after presentation of exclusively aneuploid DNA histograms as defined by conventional histogram analysis (aneuploid model).
To create the diploid model, training and testing subsets of 98 and 84 patients were selected from the training and testing sets used with the combined model. Table IX presents the connection weights for network no. 1 of the diploid model.
TABLE IX
NEURAL NETWORK CONNECTION WEIGHTS FOR DNA FLOW CYTOMETRIC HISTOGRAMS
(DIPLOID MODEL)
Weights From Other Units to Unit No. h99
From Unit No.    Weight
Figure imgf000045_0001
i76    +0.1240
i77    +0.1853
i78    +0.0796
i79    +0.2817
i80    +0.0874
i81    -0.2549
i82    -0.3846
i83    -0.0600
i84    +0.3048
i85    +0.4717
i86    +0.4189
i87    +0.2851
i88    +0.0190
i89    +0.1962
i90    +0.0608
i91    +0.0670
i92    +0.1140
i93    +0.0654
i94    +0.0846
i95    +0.2185
i96    -0.0280
i97    +0.1448
i98    -0.4853
Weights From Other Units to Unit No. h100
From Unit No.    Weight
Figure imgf000046_0001
i89    +0.0107
i90    -0.0194
i91    -0.0151
i92    -0.0377
i93    -0.2963
i94    +0.0460
i95    -0.2160
i96    +0.1115
i97    -0.0356
i98    +0.4837
Weights From Other Units to Unit No. o101
From Unit No.    Weight
i66 (Bias)    +0.0237
h99    +0.4966
h100    -0.4390
Table X illustrates that the four neural networks of the diploid model were able quickly and consistently to learn to discriminate between patients with diploid tumors who had a low risk versus a high risk for relapse.
TABLE X
ACTUAL RELAPSE RATES IN THE TESTING SETS
(DIPLOID MODEL)
Figure imgf000047_0001
Iterations: 1,500; 1,250; 2,000; 1,250
Referring to Table X, all four networks were able to generate low and high risk subsets with at least a three-fold difference in prognosis (7.1% versus 23.8% relapse, respectively, p = 0.03). All four networks of the diploid model were able to do this with statistical significance, and when compared to conventional analysis, which did not reach statistical significance, the neural networks were superior. Conventional analysis for S-phase dichotomized these patients into subsets with 11.9% and 19.0% risk of relapse.
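The evaluation procedure, ranking the test patients by network output, splitting them into halves, and comparing relapse rates with a chi-square test, can be sketched as follows. Several points are assumptions rather than statements from the text: the function names, the use of an uncorrected Pearson chi-square statistic, and the example counts of 3 relapses in 42 low risk patients and 10 in 42 high risk patients, which are inferred from the reported 7.1% and 23.8% rates and the 84 patient diploid testing set.

```python
import math

def split_by_rank(outputs, relapsed):
    """Rank patients by network output; the top half forms the high risk group."""
    order = sorted(range(len(outputs)), key=lambda i: outputs[i], reverse=True)
    half = len(order) // 2
    high, low = order[:half], order[half:]

    def relapses(group):
        return sum(1 for i in group if relapsed[i])

    # Each group is reported as (number relapsed, group size).
    return (relapses(low), len(low)), (relapses(high), len(high))

def chi_square_2x2(low, high):
    """Uncorrected Pearson chi-square and p value (1 degree of freedom)."""
    (a, n1), (c, n2) = low, high
    total = n1 + n2
    column_totals = [a + c, total - a - c]   # relapsed, not relapsed
    if 0 in column_totals or 0 in (n1, n2):
        raise ValueError("both outcomes and both groups must be present")
    chi2 = 0.0
    for row, n in (([a, n1 - a], n1), ([c, n2 - c], n2)):
        for observed, column_total in zip(row, column_totals):
            expected = n * column_total / total
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

# Inferred counts for the diploid example (see the assumptions above).
chi2, p = chi_square_2x2((3, 42), (10, 42))
```

With these inferred counts the statistic is about 4.46 and the p value falls between 0.03 and 0.04, which is consistent with the reported p = 0.03, although the text does not say which variant of the test was used.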
Finally, four neural networks were trained using histograms from patients who had tumor DNA histograms classified as aneuploid by conventional techniques. The training and testing subsets used for these four networks included 93 and 106 patients, respectively, selected from the training and testing sets used in the combined model. Once again, each of the four networks has the structure of Fig. 5, with each having different initial conditions.
Table XI presents the connection weights for network no. 1 of the aneuploid model.
TABLE XI
NEURAL NETWORK CONNECTION WEIGHTS FOR DNA FLOW CYTOMETRIC HISTOGRAMS
(ANEUPLOID MODEL)
Weights From Other Units to Unit No. h99
From Unit No.    Weight
Figure imgf000048_0001
i84    -0.3574
i85    -0.3031
i86    -0.1846
i87    -0.2023
i88    -0.1719
i89    -0.3219
i90    -0.3280
i91    -0.4161
i92    -0.1316
i93    +0.4717
i94    +0.2640
i95    +0.4035
i96    -0.1262
i97    +0.5744
i98    +0.2289
Weights From Other Units to Unit No. h100
From Unit No.    Weight
Figure imgf000049_0001
i97    -0.1360
i98    -0.0752
Weights From Other Units to Unit No. o101
From Unit No.    Weight
i66 (Bias)    -0.2751
h99    -0.5412
h100    +0.0751
Table XII shows the differences in relapse rates between the groups that the aneuploid model neural networks defined as low and high risk.
TABLE XII ACTUAL RELAPSE RATES IN THE TESTING SETS
(ANEUPLOID MODELS)
Network Low Risk High Risk p value Iterations
Figure imgf000050_0001
Referring to Table XII, all four neural networks were able to discriminate between histograms with high and low relapse risk. In three out of four cases, this reached statistical significance. In the case of aneuploid tumors, the low risk group had a risk of relapse at two years of 15.1%, while that assigned by conventional S-phase analysis was 13.2%. These two results are not statistically significantly different.
Figs. 6 and 7 respectively show typical examples of 256 channel diploid and aneuploid histograms. The diploid histogram of Fig. 6 demonstrates a diploid G0/G1 peak 102, a diploid G2/M peak 103 and an S-phase region 104 between the two. The aneuploid histogram of Fig. 7 also demonstrates a diploid G0/G1 peak 105 and a diploid G2/M peak 106. In addition, the aneuploid DNA histogram of Fig. 7 demonstrates an aneuploid G0/G1 peak 107 and an aneuploid G2/M peak 108.
Fig. 8 shows typical examples of diploid and aneuploid histograms after compression into 32 channels in accordance with the present invention. The compressed diploid histogram is shown by unfilled circles, and the compressed aneuploid histogram is shown by filled circles. The diploid histogram shows the typical early diploid G0/G1 peak 109 in channels 7 and 8 followed by a G2/M peak 110 in channels 14, 15 and 16. The aneuploid histogram shows, in addition to these diploid peaks, an aneuploid G0/G1 peak 111 in channels 10 and 11, and an aneuploid G2/M peak 112 in channels 21 and 22.
Examination of Fig. 8 illustrates the complexity of the histograms. For example, there are a large number of nuclei that do not stain with an intensity represented by any of the channels. This background noise exhibits an exponential decay from low numbered channels to high numbered channels. Discrimination of these background counts from those that are truly S-phase nuclei lying between the G0/G1 peaks and the G2/M peaks is a complex task particularly when multiple G0/G1 peaks exist. There are also frequently peaks that do not correspond to those expected by simple models, such as peak 113 appearing in the diploid histogram in channel 20.
Although the present invention has been described with reference to exemplary embodiments, those of ordinary skill in the art will understand that modifications, additions and deletions can be made to these exemplary embodiments without departing from the spirit and scope of the present invention.

Claims

CLAIMS :
1. A method of predicting the future occurrence of a medical condition that is presently occult or non-existing, comprising:
providing a neural network;
training said neural network using first sets of known data, each of said first sets including a predetermined number of prognostic input variables, and corresponding known medical condition occurrence, said prognostic input variables chosen according to capability to predict occurrence of said medical condition; and
predicting future occurrence of said medical condition for second sets of data using said trained neural network, each of said second sets including only said predetermined number of prognostic input variables.
2. The method of claim 1, said neural network comprising a neural network of the back-propagation class.
3. The method of claim 1, said training step comprising conditioning said first sets of known data using a back-propagation of errors neural network training algorithm.
4. The method of claim 1, further comprising quantizing each of said predetermined number of input variables.
5. The method of claim 4, said quantizing step comprising:
defining a range for each of said input variables;
dividing each of said ranges into subranges; and
determining a subrange within which each of said input variables falls.
6. The method of claim 1, said prognostic input variables comprising measurable medical quantities.
7. The method of claim 1, said prognostic input variable comprising proposed treatments for said medical condition.
8. The method of claim 1, said medical condition comprising breast cancer.
9. The method of claim 8, said input variables comprising progesterone receptor status, tumor size, cathepsin D protein level and HER-2/neu protein level.
10. The method of claim 1, said medical condition comprising diabetes mellitus.
11. The method of claim 10, said input variables comprising age, fasting glucose level, two-hour glucose level, fasting insulin level, and body mass index.
12. A method of predicting mortality of individuals, comprising:
providing a neural network;
training said neural network using data about a first set of individuals, including prognostic information and known mortality information related to said prognostic information; and
predicting mortality of a second set of individuals for which mortality is unknown and from which said prognostic information has been attained, using said trained neural network.
13. A method of predicting the relapse of cancer that is presently clinically occult or non-existing, comprising:
providing a neural network;
training said neural network using a first set of DNA flow cytometric histograms generated from tumor cells of patients having known cancer relapse rates;
obtaining at least one DNA flow cytometric histogram from tumor cells of a patient having an unknown cancer relapse rate; and
predicting relapse of cancer in said patient having an unknown relapse rate using said at least one DNA flow cytometric histogram and said trained neural network.
14. The method of claim 13, said first set of DNA flow cytometric histograms including diploid histograms and aneuploid histograms.
15. The method of claim 13, said first set of DNA flow cytometric histograms including only diploid histograms.
16. The method of claim 13, said first set of DNA flow cytometric histograms including only aneuploid histograms.
17. The method of claim 13, said cancer comprising breast cancer.
18. The method of claim 13, further comprising:
data compressing a plurality of DNA flow cytometric histograms to produce said first set of histograms.
19. A method of characterizing DNA flow cytometric histograms to predict the occurrence of a target medical condition that is presently clinically occult or non-existing, comprising:
providing a neural network;
training said neural network using a first set of DNA flow cytometric histograms obtained from cells of a first set of patients having corresponding known occurrence or non-occurrence of said target medical condition; and
predicting future occurrence of said target medical condition for at least one additional patient using a DNA flow cytometric histogram obtained from cells of said at least one additional patient and said trained neural network.
20. A method of characterizing DNA cytophotometric data to predict the occurrence of a target medical condition that is presently clinically occult or non-existing, comprising:
providing a neural network;
training said neural network using a first set of DNA cytophotometric data generated from cells of patients having known occurrence or non-occurrence of said target medical condition; and
predicting future occurrence of said target medical condition for at least one additional patient using DNA cytophotometric data obtained from cells of said at least one additional patient and said trained neural network.
PCT/US1991/003302 1990-05-21 1991-05-10 Method for predicting the future occurrence of clinically occult or non-existent medical conditions WO1991018364A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US52622490A 1990-05-21 1990-05-21
US526,224 1990-05-21
US07/607,120 US5862304A (en) 1990-05-21 1990-10-31 Method for predicting the future occurrence of clinically occult or non-existent medical conditions
US607,120 1990-10-31

Publications (1)

Publication Number Publication Date
WO1991018364A1 true WO1991018364A1 (en) 1991-11-28

Family

ID=27062055

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1991/003302 WO1991018364A1 (en) 1990-05-21 1991-05-10 Method for predicting the future occurrence of clinically occult or non-existent medical conditions

Country Status (3)

Country Link
US (1) US5862304A (en)
AU (1) AU7980691A (en)
WO (1) WO1991018364A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1997012247A1 (en) * 1995-09-29 1997-04-03 Urocor, Inc. A sextant core biopsy predictive mechanism for non-organ confined disease status
WO2001095253A2 (en) * 2000-06-09 2001-12-13 Medimage Aps Quality and safety assurance of medical applications
WO2002047026A2 (en) 2000-12-07 2002-06-13 Kates Ronald E Method for determining competing risks
US7747389B2 (en) 2001-07-23 2010-06-29 F. Hoffmann-La Roche Ag Scoring system for the prediction of cancer recurrence

Families Citing this family (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
USRE43433E1 (en) 1993-12-29 2012-05-29 Clinical Decision Support, Llc Computerized medical diagnostic and treatment advice system
US5935060A (en) 1996-07-12 1999-08-10 First Opinion Corporation Computerized medical diagnostic and treatment advice system including list based processing
US6206829B1 (en) 1996-07-12 2001-03-27 First Opinion Corporation Computerized medical diagnostic and treatment advice system including network access
US5660176A (en) * 1993-12-29 1997-08-26 First Opinion Corporation Computerized medical diagnostic and treatment advice system
US5708591A (en) * 1995-02-14 1998-01-13 Akzo Nobel N.V. Method and apparatus for predicting the presence of congenital and acquired imbalances and therapeutic conditions
US6429017B1 (en) * 1999-02-04 2002-08-06 Biomerieux Method for predicting the presence of haemostatic dysfunction in a patient sample
US6321164B1 (en) 1995-06-07 2001-11-20 Akzo Nobel N.V. Method and apparatus for predicting the presence of an abnormal level of one or more proteins in the clotting cascade
US6898532B1 (en) 1995-06-07 2005-05-24 Biomerieux, Inc. Method and apparatus for predicting the presence of haemostatic dysfunction in a patient sample
CN1252877A (en) * 1997-03-13 2000-05-10 第一咨询公司 Disease management system
ID20131A (en) * 1997-03-31 1998-10-08 Mitsubishi Heavy Ind Ltd METHODS AND EQUIPMENT OF COAL DRYING, METHODS FOR OLD STORAGE OF REFORMED COAL AND REFORMED OLD COAL STORAGE, AND PROCESSES AND SYSTEMS FOR PRODUCTION OF REFORMED COAL STONE
US5933819C1 (en) * 1997-05-23 2001-11-13 Scripps Research Inst Prediction of relative binding motifs of biologically active peptides and peptide mimetics
US7006866B1 (en) * 1997-11-07 2006-02-28 Siemens Aktiengesellschaft Arrangement for predicting an abnormality of a system and for carrying out an action which counteracts the abnormality
US6090044A (en) * 1997-12-10 2000-07-18 Bishop; Jeffrey B. System for diagnosing medical conditions using a neural network
US6502040B2 (en) 1997-12-31 2002-12-31 Biomerieux, Inc. Method for presenting thrombosis and hemostasis assay data
US6178382B1 (en) * 1998-06-23 2001-01-23 The Board Of Trustees Of The Leland Stanford Junior University Methods for analysis of large sets of multiparameter data
US7297479B2 (en) 1998-08-06 2007-11-20 Lucent Technologies Inc. DNA-based analog neural networks
WO2000046603A1 (en) * 1999-02-04 2000-08-10 Akzo Nobel N.V. A method and apparatus for predicting the presence of haemostatic dysfunction in a patient sample
US6110109A (en) * 1999-03-26 2000-08-29 Biosignia, Inc. System and method for predicting disease onset
DE1233366T1 (en) * 1999-06-25 2003-03-20 Genaissance Pharmaceuticals Method for producing and using haplotype data
US7058517B1 (en) 1999-06-25 2006-06-06 Genaissance Pharmaceuticals, Inc. Methods for obtaining and using haplotype data
WO2001004343A2 (en) 1999-07-09 2001-01-18 The Burnham Institute A method for determining the prognosis of cancer patients by measuring levels of bag expression
US20020068857A1 (en) * 2000-02-14 2002-06-06 Iliff Edwin C. Automated diagnostic system and method including reuse of diagnostic objects
US7058616B1 (en) 2000-06-08 2006-06-06 Virco Bvba Method and system for predicting resistance of a disease to a therapeutic agent using a neural network
WO2001095230A2 (en) * 2000-06-08 2001-12-13 Virco Bvba Method for predicting therapeutic agent resistance using neural networks
US7179612B2 (en) 2000-06-09 2007-02-20 Biomerieux, Inc. Method for detecting a lipoprotein-acute phase protein complex and predicting an increased risk of system failure or mortality
US6931326B1 (en) 2000-06-26 2005-08-16 Genaissance Pharmaceuticals, Inc. Methods for obtaining and using haplotype data
US20030144879A1 (en) * 2001-02-28 2003-07-31 Shinji Hata System for providing medical service
US6917926B2 (en) * 2001-06-15 2005-07-12 Medical Scientists, Inc. Machine learning method
US7181054B2 (en) * 2001-08-31 2007-02-20 Siemens Medical Solutions Health Services Corporation System for processing image representative data
US20030101076A1 (en) * 2001-10-02 2003-05-29 Zaleski John R. System for supporting clinical decision making through the modeling of acquired patient medical information
US8458082B2 (en) 2001-11-13 2013-06-04 Interthinx, Inc. Automated loan risk assessment system and method
US20040010481A1 (en) * 2001-12-07 2004-01-15 Whitehead Institute For Biomedical Research Time-dependent outcome prediction using neural networks
US20040267458A1 (en) * 2001-12-21 2004-12-30 Judson Richard S. Methods for obtaining and using haplotype data
US7153691B2 (en) * 2002-11-13 2006-12-26 G6 Science Corp. Method of identifying and assessing DNA euchromatin in biological cells for detecting disease, monitoring wellness, assessing bio-activity, and screening pharmacological agents
US20070015971A1 (en) * 2003-05-14 2007-01-18 Atignal Shankara R A Disease predictions
US7780595B2 (en) * 2003-05-15 2010-08-24 Clinical Decision Support, Llc Panel diagnostic method and system
US20050010444A1 (en) * 2003-06-06 2005-01-13 Iliff Edwin C. System and method for assisting medical diagnosis using an anatomic system and cause matrix
US20050186577A1 (en) * 2004-02-20 2005-08-25 Yixin Wang Breast cancer prognostics
WO2005086068A2 (en) * 2004-02-27 2005-09-15 Aureon Laboratories, Inc. Methods and systems for predicting occurrence of an event
US9081879B2 (en) * 2004-10-22 2015-07-14 Clinical Decision Support, Llc Matrix interface for medical diagnostic and treatment advice system and method
US7478192B2 (en) * 2004-11-03 2009-01-13 Saffron Technology, Inc. Network of networks of associative memory networks
US7809188B1 (en) * 2006-05-11 2010-10-05 Texas Instruments Incorporated Digital camera and method
AU2008245433A1 (en) * 2007-04-30 2008-11-06 Clinical Decision Support, Llc Arbiter system and method of computerized medical diagnosis and advice
US20090055150A1 (en) * 2007-08-25 2009-02-26 Quantum Leap Research, Inc. Scalable, computationally efficient and rapid simulation suited to decision support, analysis and planning
KR101064908B1 (en) * 2008-11-12 2011-09-16 연세대학교 산학협력단 Method for patterning nanowires on substrate using novel sacrificial layer material
US8990135B2 (en) 2010-06-15 2015-03-24 The Regents Of The University Of Michigan Personalized health risk assessment for critical care
US8914319B2 (en) * 2010-06-15 2014-12-16 The Regents Of The University Of Michigan Personalized health risk assessment for critical care
WO2017053592A1 (en) * 2015-09-23 2017-03-30 The Regents Of The University Of California Deep learning in label-free cell classification and machine vision extraction of particles
US11476004B2 (en) * 2019-10-11 2022-10-18 Hualien Armed Forces General Hospital System capable of establishing model for cardiac ventricular hypertrophy screening

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3909792A (en) * 1973-02-26 1975-09-30 American Optical Corp Electrocardiographic review system
DE2351167C3 (en) * 1973-10-11 1980-05-29 Fritz Schwarzer Gmbh, 8000 Muenchen EKG machine
US4290114A (en) * 1976-07-01 1981-09-15 Sinay Hanon S Medical diagnostic computer
US4041468A (en) * 1976-10-07 1977-08-09 Pfizer Inc. Method and system for analysis of ambulatory electrocardiographic tape recordings
US4965725B1 (en) * 1988-04-08 1996-05-07 Neuromedical Systems Inc Neural network based automated cytological specimen classification system and method

Non-Patent Citations (2)

Title
IJCNN INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS vol. II, 1989, NEW YORK page 609; STUBBS: 'Three applications of neurocomputing in biomedical research' see page 609, right column, line 1 - line 20 *
PROCEEDINGS OF THE 13-TH ANNUAL SYMPOSIUM ON COMPUTER APPLICATIONS IN MEDICAL CARE 1989, pages 288 - 294; BLUMENFELD: 'A Connectionist Approach to the Recognition of Trends in Time Ordered Medical Parameters' see page 289 - page 291 *

Cited By (7)

Publication number Priority date Publication date Assignee Title
WO1997012247A1 (en) * 1995-09-29 1997-04-03 Urocor, Inc. A sextant core biopsy predictive mechanism for non-organ confined disease status
WO2001095253A2 (en) * 2000-06-09 2001-12-13 Medimage Aps Quality and safety assurance of medical applications
WO2001095253A3 (en) * 2000-06-09 2002-05-10 Medimage Aps Quality and safety assurance of medical applications
WO2002047026A2 (en) 2000-12-07 2002-06-13 Kates Ronald E Method for determining competing risks
WO2002047026A3 (en) * 2000-12-07 2003-11-06 Ronald E Kates Method for determining competing risks
US7747389B2 (en) 2001-07-23 2010-06-29 F. Hoffmann-La Roche Ag Scoring system for the prediction of cancer recurrence
US8655597B2 (en) 2001-07-23 2014-02-18 F. Hoffmann-La Roche Ag Scoring system for the prediction of cancer recurrence

Also Published As

Publication number Publication date
US5862304A (en) 1999-01-19
AU7980691A (en) 1991-12-10

Similar Documents

Publication Publication Date Title
US5862304A (en) Method for predicting the future occurrence of clinically occult or non-existent medical conditions
Saerens et al. Adjusting the outputs of a classifier to new a priori probabilities: a simple procedure
KR101047575B1 (en) Heuristic Method of Classification
US7228295B2 (en) Methods for selecting, developing and improving diagnostic tests for pregnancy-related conditions
CN108717867A (en) Disease forecasting method for establishing model and device based on Gradient Iteration tree
Adjouadi et al. Classification of leukemia blood samples using neural networks
US8725668B2 (en) Classifying an item to one of a plurality of groups
Higa Diagnosis of breast cancer using decision tree and artificial neural network algorithms
Pillai et al. Prediction of heart disease using rnn algorithm
Valentini et al. Bagged ensembles of support vector machines for gene expression data analysis
US6968327B1 (en) Method for training a neural network
Hung et al. Estimating breast cancer risks using neural networks
CN113270191A (en) Data correction and classification method and storage medium
US20230215571A1 (en) Automated classification of immunophenotypes represented in flow cytometry data
Denny et al. Cloud based acute lymphoblastic leukemia detection using deep convolutional neural networks
RU2733077C1 (en) Diagnostic technique for acute coronary syndrome
Walker Efficient assessment of confounder effects in matched follow-up studies
CN108346471A (en) A kind of analysis method and device of pathological data
US20090006055A1 (en) Automated Reduction of Biomarkers
Plomann Choosing a patient classification system to describe the hospital product
CN112259231A (en) High-risk gastrointestinal stromal tumor patient postoperative recurrence risk assessment method and system
CN110175926A (en) A kind of medical insurance payment measuring method based on big data analysis
Yarnold et al. Predicting in‐hospital mortality of patients receiving cardiopulmonary resuscitation: unit‐weighted MultiODA for binary data
Umami et al. Analysis of Classification Algorithm for Wisconsin Diagnosis Breast Cancer Data Study
Shkanov et al. Express diagnosis of COVID-19 on cough audiograms with machine learning algorithms from Scikit-learn library and GMDH Shell tool

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AT AU BB BG BR CA CH DE DK ES FI GB HU JP KP KR LK LU MC MG MW NL NO PL RO SD SE SU

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE BF BJ CF CG CH CI CM DE DK ES FR GA GB GR IT LU ML MR NL SE SN TD TG

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

NENP Non-entry into the national phase

Ref country code: CA