ترغب بنشر مسار تعليمي؟ اضغط هنا

Feature Selection on Lyme Disease Patient Survey Data

86   0   0.0 ( 0 )
 نشر من قبل Joshua Vendrow
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Lyme disease is a rapidly growing illness that remains poorly understood within the medical community. Critical questions about when and why patients respond to treatment or stay ill, what kinds of treatments are effective, and even how to properly diagnose the disease remain largely unanswered. We investigate these questions by applying machine learning techniques to a large scale Lyme disease patient registry, MyLymeData, developed by the nonprofit LymeDisease.org. We apply various machine learning methods in order to measure the effect of individual features in predicting participants answers to the Global Rating of Change (GROC) survey questions that assess the self-reported degree to which their condition improved, worsened, or remained unchanged following antibiotic treatment. We use basic linear regression, support vector machines, neural networks, entropy-based decision tree models, and $k$-nearest neighbors approaches. We first analyze the general performance of the model and then identify the most important features for predicting participant answers to GROC. After we identify the key features, we separate them from the dataset and demonstrate the effectiveness of these features at identifying GROC. In doing so, we highlight possible directions for future study both mathematically and clinically.



قيم البحث

اقرأ أيضاً

153 - Yimeng Xie , Li Xu , Jie Li 2018
Lyme disease is an infectious disease that is caused by a bacterium called Borrelia burgdorferi sensu stricto. In the United States, Lyme disease is one of the most common infectious diseases. The major endemic areas of the disease are New England, M id-Atlantic, East-North Central, South Atlantic, and West North-Central. Virginia is on the front-line of the diseases diffusion from the northeast to the south. One of the research objectives for the infectious disease community is to identify environmental and economic variables that are associated with the emergence of Lyme disease. In this paper, we use a spatial Poisson regression model to link the spatial disease counts and environmental and economic variables, and develop a spatial variable selection procedure to effectively identify important factors by using an adaptive elastic net penalty. The proposed methods can automatically select important covariates, while adjusting for possible spatial correlations of disease counts. The performance of the proposed method is studied and compared with existing methods via a comprehensive simulation study. We apply the developed variable selection methods to the Virginia Lyme disease data and identify important variables that are new to the literature. Supplementary materials for this paper are available online.
The feature-selection problem is formulated from an information-theoretic perspective. We show that the problem can be efficiently solved by an extension of the recently proposed info-clustering paradigm. This reveals the fundamental duality between feature selection and data clustering,which is a consequence of the more general duality between the principal partition and the principal lattice of partitions in combinatorial optimization.
Falls have serious consequences and are prevalent in acute hospitals and nursing homes caring for older people. Most falls occur in bedrooms and near the bed. Technological interventions to mitigate the risk of falling aim to automatically monitor be d-exit events and subsequently alert healthcare personnel to provide timely supervisions. We observe that frequency-domain information related to patient activities exist predominantly in very low frequencies. Therefore, we recognise the potential to employ a low resolution acceleration sensing modality in contrast to powering and sensing with a conventional MEMS (Micro Electro Mechanical System) accelerometer. Consequently, we investigate a batteryless sensing modality with low cost wirelessly powered Radio Frequency Identification (RFID) technology with the potential for convenient integration into clothing, such as hospital gowns. We design and build a passive accelerometer-based RFID sensor embodiment---ID-Sensor---for our study. The sensor design allows deriving ultra low resolution acceleration data from the rate of change of unique RFID tag identifiers in accordance with the movement of a patients upper body. We investigate two convolutional neural network architectures for learning from raw RFID-only data streams and compare performance with a traditional shallow classifier with engineered features. We evaluate performance with 23 hospitalized older patients. We demonstrate, for the first time and to the best of knowledge, that: i) the low resolution acceleration data embedded in the RF powered ID-Sensor data stream can provide a practicable method for activity recognition; and ii) highly discriminative features can be efficiently learned from the raw RFID-only data stream using a fully convolutional network architecture.
The electroencephalographic (EEG) signals provide highly informative data on brain activities and functions. However, their heterogeneity and high dimensionality may represent an obstacle for their interpretation. The introduction of a priori knowled ge seems the best option to mitigate high dimensionality problems, but could lose some information and patterns present in the data, while data heterogeneity remains an open issue that often makes generalization difficult. In this study, we propose a genetic algorithm (GA) for feature selection that can be used with a supervised or unsupervised approach. Our proposal considers three different fitness functions without relying on expert knowledge. Starting from two publicly available datasets on cognitive workload and motor movement/imagery, the EEG signals are processed, normalized and their features computed in the time, frequency and time-frequency domains. The feature vector selection is performed by applying our GA proposal and compared with two benchmarking techniques. The results show that different combinations of our proposal achieve better results in respect to the benchmark in terms of overall performance and feature reduction. Moreover, the proposed GA, based on a novel fitness function here presented, outperforms the benchmark when the two different datasets considered are merged together, showing the effectiveness of our proposal on heterogeneous data.
This paper describes a distributed MapReduce implementation of the minimum Redundancy Maximum Relevance algorithm, a popular feature selection method in bioinformatics and network inference problems. The proposed approach handles both tall/narrow and wide/short datasets. We further provide an open source implementation based on Hadoop/Spark, and illustrate its scalability on datasets involving millions of observations or features.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا