No Arabic abstract
A lot of time is spent searching for the most performing data mining algorithms applied in clinical diagnosis. The study set out to identify the most performing predictive data mining algorithms applied in the diagnosis of Erythemato-squamous diseases. The study used Naive Bayes, Multilayer Perceptron and J48 decision tree induction to build predictive data mining models on 366 instances of Erythemato-squamous diseases datasets. Also, 10-fold cross-validation and sets of performance metrics were used to evaluate the baseline predictive performance of the classifiers. The comparative analysis shows that the Naive Bayes performed best with accuracy of 97.4%, Multilayer Perceptron came out second with accuracy of 96.6%, and J48 came out the worst with accuracy of 93.5%. The evaluation of these classifiers on clinical datasets, gave an insight into the predictive ability of different data mining algorithms applicable in clinical diagnosis especially in the diagnosis of Erythemato-squamous diseases.
Longitudinal observational databases have become a recent interest in the post marketing drug surveillance community due to their ability of presenting a new perspective for detecting negative side effects. Algorithms mining longitudinal observation databases are not restricted by many of the limitations associated with the more conventional methods that have been developed for spontaneous reporting system databases. In this paper we investigate the robustness of four recently developed algorithms that mine longitudinal observational databases by applying them to The Health Improvement Network (THIN) for six drugs with well document known negative side effects. Our results show that none of the existing algorithms was able to consistently identify known adverse drug reactions above events related to the cause of the drug and no algorithm was superior.
It is crucial to provide compatible treatment schemes for a disease according to various symptoms at different stages. However, most classification methods might be ineffective in accurately classifying a disease that holds the characteristics of multiple treatment stages, various symptoms, and multi-pathogenesis. Moreover, there are limited exchanges and cooperative actions in disease diagnoses and treatments between different departments and hospitals. Thus, when new diseases occur with atypical symptoms, inexperienced doctors might have difficulty in identifying them promptly and accurately. Therefore, to maximize the utilization of the advanced medical technology of developed hospitals and the rich medical knowledge of experienced doctors, a Disease Diagnosis and Treatment Recommendation System (DDTRS) is proposed in this paper. First, to effectively identify disease symptoms more accurately, a Density-Peaked Clustering Analysis (DPCA) algorithm is introduced for disease-symptom clustering. In addition, association analyses on Disease-Diagnosis (D-D) rules and Disease-Treatment (D-T) rules are conducted by the Apriori algorithm separately. The appropriate diagnosis and treatment schemes are recommended for patients and inexperienced doctors, even if they are in a limited therapeutic environment. Moreover, to reach the goals of high performance and low latency response, we implement a parallel solution for DDTRS using the Apache Spark cloud platform. Extensive experimental results demonstrate that the proposed DDTRS realizes disease-symptom clustering effectively and derives disease treatment recommendations intelligently and accurately.
The work presented in this paper is part of the cooperative research project AUTO-OPT carried out by twelve partners from the automotive industries. One major work package concerns the application of data mining methods in the area of automotive design. Suitable methods for data preparation and data analysis are developed. The objective of the work is the re-use of data stored in the crash-simulation department at BMW in order to gain deeper insight into the interrelations between the geometric variations of the car during its design and its performance in crash testing. In this paper a method for data analysis of finite element models and results from crash simulation is proposed and application to recent data from the industrial partner BMW is demonstrated. All necessary steps from data pre-processing to re-integration into the working environment of the engineer are covered.
The research in anomaly detection lacks a unified definition of what represents an anomalous instance. Discrepancies in the nature itself of an anomaly lead to multiple paradigms of algorithms design and experimentation. Predictive maintenance is a special case, where the anomaly represents a failure that must be prevented. Related time-series research as outlier and novelty detection or time-series classification does not apply to the concept of an anomaly in this field, because they are not single points which have not been seen previously and may not be precisely annotated. Moreover, due to the lack of annotated anomalous data, many benchmarks are adapted from supervised scenarios. To address these issues, we generalise the concept of positive and negative instances to intervals to be able to evaluate unsupervised anomaly detection algorithms. We also preserve the imbalance scheme for evaluation through the proposal of the Preceding Window ROC, a generalisation for the calculation of ROC curves for time-series scenarios. We also adapt the mechanism from a established time-series anomaly detection benchmark to the proposed generalisations to reward early detection. Therefore, the proposal represents a flexible evaluation framework for the different scenarios. To show the usefulness of this definition, we include a case study of Big Data algorithms with a real-world time-series problem provided by the company ArcelorMittal, and compare the proposal with an evaluation method.
The current state-of-the-art deep neural networks (DNNs) for Alzheimers Disease diagnosis use different biomarker combinations to classify patients, but do not allow extracting knowledge about the interactions of biomarkers. However, to improve our understanding of the disease, it is paramount to extract such knowledge from the learned model. In this paper, we propose a Deep Factorization Machine model that combines the ability of DNNs to learn complex relationships and the ease of interpretability of a linear model. The proposed model has three parts: (i) an embedding layer to deal with sparse categorical data, (ii) a Factorization Machine to efficiently learn pairwise interactions, and (iii) a DNN to implicitly model higher order interactions. In our experiments on data from the Alzheimers Disease Neuroimaging Initiative, we demonstrate that our proposed model classifies cognitive normal, mild cognitive impaired, and demented patients more accurately than competing models. In addition, we show that valuable knowledge about the interactions among biomarkers can be obtained.