OJHAS: 2006-3-1, Kavitha S, Sarbadhikari SN, Rao AN. Automated Screening for Three Inborn Metabolic Disorders: A Pilot Study

Abstract:

Background: Inborn metabolic disorders (IMDs) form a large group of rare, but often serious, metabolic disorders. Aims: Our objective was to construct a decision tree, based on a classification algorithm, for data on three metabolic disorders, enabling us to take decisions on the screening and clinical diagnosis of a patient. Settings and Design: A non-incremental concept learning classification algorithm was applied to a set of patient data, and the procedure was followed to obtain a decision on a patient's disorder. Materials and Methods: Initially a training set containing 13 cases was investigated for three inborn errors of metabolism. Results: A total of thirty test cases were investigated for the three inborn errors of metabolism. The program identified 10 cases with galactosemia, another 10 cases with fructosemia and the remaining 10 with propionic acidemia. The program successfully identified all 30 cases. Conclusions: Decision support systems of this kind can greatly help healthcare delivery personnel in the early screening of IMDs.
Key Words: Decision support techniques, Metabolic diseases, Computer-assisted diagnosis, Expert system

Inborn Metabolic Disorders (IMDs) (1) are inherited disorders that are caused by a defect in a single gene. Some IMDs produce relatively unimportant physical features or skeletal abnormalities. Others produce serious disease and even death. Most inborn errors of metabolism are monitored by routine blood or urine tests.

Galactosemia (1) is the inability of the body to use the simple sugar galactose. Galactose 1-phosphate accumulates and reaches high levels in the body, causing damage to the liver, central nervous system and various other body systems. Hereditary fructose intolerance (1) is a metabolic disease caused by the absence of the enzyme 1-phosphofructaldolase. Propionic acidemia (1) is an inherited disorder in which the body is unable to process certain proteins and lipids properly. The gene defect for propionic acidemia is an autosomal recessive genetic trait and is unknowingly passed down from generation to generation.

Many of the inborn errors of metabolism can be treated (1) effectively. Treatment depends on the enzyme defect itself and how readily the compounds involved can be eliminated or replaced.

Decision trees (2) are excellent tools for making decisions where a lot of complex information needs to be taken into account. They also help us to form an accurate, balanced picture of the risks and rewards that can result from a particular choice.

A decision tree is a representation of a decision procedure for determining the class of a given instance. In many problems chance (or probability) plays an important role. Decision analysis is the general name that is given to techniques for analyzing problems containing risk/uncertainty/probabilities. Decision trees are one specific decision analysis technique. A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision.(2)

Research on information retrieval and database management systems has advanced very quickly over the past few decades, especially in healthcare applications. One such information system was the Internet-based program developed by Hofestadt et al (3), which provided complex and specific knowledge about inborn errors of metabolism (IEMs). Another program, developed by Kauert et al (4), provided diagnosis and therapy for IEMs. This information system could be helpful for scientists and physicians working in the field of inborn metabolic diseases. PROLOG, another computer-aided expert system developed by Pince et al (5), was used to obtain uniform and fast phenotyping and reporting of dyslipoproteinemia.

MetaNet (6) is a neural network based computer program for the diagnosis of inherited metabolic diseases in children, using plasma or urinary amino acid results as input. The expert system asks the user questions about the presence or absence of common clinical and/or biochemical abnormalities. Using both the amino acid data and the answers to the questions, the MetaNet program provides a provisional diagnosis. The diagnostic output indicates the degree of confidence of the program in the diagnosis.

The objective of the present study was to construct a decision tree from the clinical data on certain inborn errors of metabolism so that the predictions with the tree can be used for early diagnosis and interventions.

To the best of our knowledge, this is the first report from India for screening IMDs by automated classifiers.

The inborn errors of metabolism handled in this study were galactosemia, fructosemia and propionic acidemia. Real patient data were obtained from the hospital to which the authors are affiliated.

The cases had been referred from the Pediatrics and Neurology departments of the hospital, and were confirmed by chemical analysis from serum/urine. Moreover, the incidences of such disorders are not high and the clinical features are rarely documented properly.

A decision tree is constructed by looking for regularities in data. A decision tree is an arrangement of tests that prescribes an appropriate test at every step in an analysis. More specifically, decision trees classify instances by sorting them down the tree from the root node to some leaf node, which provides the classification of the instance. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute. An instance is classified by starting at the root node of the decision tree, testing the attribute specified by this node, then moving down the tree branch corresponding to the value of the attribute. This process is then repeated at the node on this branch and so on until a leaf node is reached.
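The root-to-leaf traversal described above can be sketched in a few lines of Python. This is a minimal illustration only: the nested-dictionary layout, the function name classify, the choice of attributes at each node and the placeholder labels class_A, class_B and class_C are all assumptions made for the example and do not reproduce the tree built in this study.

```python
# Hypothetical tree for illustration only: the attribute order and the
# class labels are assumptions, not findings from the study.
TREE = {
    "attribute": "hepatomegaly",
    "branches": {
        "-": {"label": "class_A"},
        "+": {
            "attribute": "seizures",
            "branches": {
                "-": {"label": "class_B"},
                "+": {"label": "class_C"},
            },
        },
    },
}


def classify(node, instance):
    """Walk from the root to a leaf, following the branch that matches
    the instance's value for the attribute tested at each node."""
    while "label" not in node:
        value = instance[node["attribute"]]
        if value not in node["branches"]:
            return None  # no branch for this value: leave unclassified
        node = node["branches"][value]
    return node["label"]


example = {"hepatomegaly": "+", "seizures": "-"}
print(classify(TREE, example))  # prints "class_B"
```

Returning None when no branch matches is one possible design choice; a screening tool might instead refer such a case for manual review.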

The basic classification algorithm (7) was a non-incremental concept learning method that produced a hypothesis in the form of a decision tree. The algorithm accepted a training set of positive and negative examples of a concept. The decision tree and rules produced from applying the algorithm attempted to give a generalization that could be used to classify objects not in the original training set. The classification algorithm could be summarized as follows.

The input to the decision tree was a set of instances containing information about the patients and their medical conditions. The output enabled us to decide the type of inborn error of metabolism in the patient. The input data consisted of seven attributes, namely seizures, mental retardation, gastrointestinal infection, respiratory tract infection, liver dysfunction, hypoglycemia, and hepatomegaly, each taking one of four values (-, +, ++, +++) indicating absent, mild/low, moderate and severe respectively. Decision trees used in medical diagnosis consist structurally of two types of nodes: non-terminal (intermediate) and terminal (leaf). The former correspond to questions asked about the characteristic features of the diagnosed case. These may be factorial, e.g., "Does the patient have symptom X?", or ordinal, e.g., "Is symptom X absent, mild, moderate or severe?", or continuous, e.g., "What is the patient's median motor latency at the wrist?". (These questions are not related to IEMs; they are only examples of the mode of questioning in a diagnostic process.) The questions about the characteristic features of the cases used in this study were ordinal, i.e., they asked whether the symptom was absent, mild, moderate or severe. The best value was selected for the intermediate node. The terminal node, on the other hand, generated decisions/diagnosis.
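One possible way to represent the seven ordinal attributes for a program is sketched below. The attribute names and the four grades are taken from the text; the identifier spellings, the integer coding 0 to 3 and the function name encode are assumptions, and the example record is made up rather than drawn from the patient data.

```python
# The seven attributes and four ordinal grades come from the text; the
# identifier spellings and the 0-3 integer coding are assumptions.
ATTRIBUTES = (
    "seizures",
    "mental_retardation",
    "gastrointestinal_infection",
    "respiratory_tract_infection",
    "liver_dysfunction",
    "hypoglycemia",
    "hepatomegaly",
)
GRADES = {"-": 0, "+": 1, "++": 2, "+++": 3}  # absent, mild/low, moderate, severe


def encode(record):
    """Turn a {attribute: grade symbol} record into a tuple of integers,
    in ATTRIBUTES order, rejecting missing attributes or unknown grades."""
    missing = [a for a in ATTRIBUTES if a not in record]
    if missing:
        raise ValueError(f"missing attributes: {missing}")
    try:
        return tuple(GRADES[record[a]] for a in ATTRIBUTES)
    except KeyError as exc:
        raise ValueError(f"unknown grade: {exc.args[0]!r}") from None


# Made-up record, not patient data from the study.
record = dict.fromkeys(ATTRIBUTES, "-")
record["hepatomegaly"] = "+++"
record["hypoglycemia"] = "+"
print(encode(record))  # prints (0, 0, 0, 0, 0, 1, 3)
```

Keeping the grades as ordered integers preserves the ordinal nature of the questions, so a later step could test either for an exact grade or for a threshold such as "moderate or worse".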

Diagnosis was achieved by a stepwise decision-making process in which a single question was asked at each step and, depending on the answer, a different branch of the tree containing another set of questions was followed. The root of the tree contained the first diagnostic question asked by the classifier. Depending on the answer for each particular case, a different branch of the tree was traversed until a decision node was reached. This method of building decision trees is called top-down induction of decision trees and is based on recursive partitioning of the data. In brief, at each recursive step, one question was asked that selected a best value. This question partitioned the design set, and the process was repeated for each subset until a final decision was reached. In this way the classification algorithm was applied to the real patient data to build the decision tree classifier and to understand the decision process.
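Top-down induction by recursive partitioning can be sketched as follows. The text does not state how the "best" question was chosen at each step, so this sketch assumes an information-gain criterion of the kind used in ID3-style learners; the function names, the dictionary tree layout and the four toy rows with placeholder class labels are likewise assumptions for illustration, not the authors' program or data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by partitioning on one attribute.
    (Assumed selection criterion; the paper does not name one.)"""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in partitions.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, attributes):
    """Recursively partition the data until each subset is pure."""
    if len(set(labels)) == 1:
        return {"label": labels[0]}
    if not attributes:  # nothing left to test: fall back to majority class
        return {"label": Counter(labels).most_common(1)[0][0]}
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    node = {"attribute": best, "branches": {}}
    remaining = [a for a in attributes if a != best]
    for value in sorted({row[best] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows = [r for r, _ in subset]
        sub_labels = [l for _, l in subset]
        node["branches"][value] = build_tree(sub_rows, sub_labels, remaining)
    return node


# Toy training rows with placeholder classes, not data from the study.
rows = [
    {"hepatomegaly": "+", "seizures": "-"},
    {"hepatomegaly": "+", "seizures": "+"},
    {"hepatomegaly": "-", "seizures": "+"},
    {"hepatomegaly": "-", "seizures": "-"},
]
labels = ["class_A", "class_A", "class_B", "class_C"]
tree = build_tree(rows, labels, ["seizures", "hepatomegaly"])
print(tree["attribute"])  # prints "hepatomegaly"
```

In the toy data, splitting on hepatomegaly isolates class_A in one step, so it is asked first; seizures then separates the two remaining classes.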

Initially a training set containing 13 cases was investigated for the three inborn errors of metabolism. Using a small set of training data for classification is an essential step in any decision tree construction process, because the initial classification of data using a training set gives an idea of how future instances will be categorized. Thirty cases of the three types of inborn errors of metabolism were then studied with the program, which correctly identified the 10 cases with galactosemia, another 10 cases with fructosemia and the remaining 10 with propionic acidemia. There was no wrong identification of the normal cases as disordered cases.
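The counts reported above can be tallied per disorder with a short script. This is a sketch under stated assumptions: the function name per_class_accuracy is hypothetical, and the list of predictions simply mirrors the true labels to reflect the reported result that all 30 cases were identified; it is not the program's actual output.

```python
from collections import Counter


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: (correct, total)} for each class in true_labels."""
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {cls: (correct[cls], totals[cls]) for cls in totals}


# Counts reported in the text: 10 cases per disorder, all identified.
classes = ["galactosemia", "fructosemia", "propionic acidemia"]
true_labels = [c for c in classes for _ in range(10)]
predicted_labels = list(true_labels)  # stands in for the program's output

summary = per_class_accuracy(true_labels, predicted_labels)
overall = sum(hit for hit, _ in summary.values()) / len(true_labels)
for cls, (hit, total) in summary.items():
    print(f"{cls}: {hit}/{total}")
print(f"overall accuracy: {overall:.0%}")
```

With only 10 test cases per disorder, a perfect score of this kind is best read as a pilot result rather than an estimate of performance on a wider population.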

Table 1: Performance of the Program for Screening Inborn Metabolic Disorders