No Arabic abstract
Objectives: Most cancer data sources lack information on metastatic recurrence. Electronic medical records (EMRs) and population-based cancer registries contain complementary information on cancer treatment and outcomes, yet are rarely used synergistically. To enable detection of metastatic breast cancer (MBC), we applied a semi-supervised machine learning framework to linked EMR-California Cancer Registry (CCR) data. Materials and Methods: We studied 11,459 female patients treated at Stanford Health Care who received an incident breast cancer diagnosis from 2000-2014. The dataset consisted of structured data and unstructured free-text clinical notes from EMR, linked to CCR, a component of the Surveillance, Epidemiology and End Results (SEER) database. We extracted information on metastatic disease from patient notes to infer a class label and then trained a regularized logistic regression model for MBC classification. We evaluated model performance on a gold standard set of set of 146 patients. Results: There are 495 patients with de novo stage IV MBC, 1,374 patients initially diagnosed with Stage 0-III disease had recurrent MBC, and 9,590 had no evidence of metastatis. The median follow-up time is 96.3 months (mean 97.8, standard deviation 46.7). The best-performing model incorporated both EMR and CCR features. The area under the receiver-operating characteristic curve=0.925 [95% confidence interval: 0.880-0.969], sensitivity=0.861, specificity=0.878 and overall accuracy=0.870. Discussion and Conclusion: A framework for MBC case detection combining EMR and CCR data achieved good sensitivity, specificity and discrimination without requiring expert-labeled examples. This approach enables population-based research on how patients die from cancer and may identify novel predictors of cancer recurrence.
The International Symposium on Biomedical Imaging (ISBI) held a grand challenge to evaluate computational systems for the automated detection of metastatic breast cancer in whole slide images of sentinel lymph node biopsies. Our team won both competitions in the grand challenge, obtaining an area under the receiver operating curve (AUC) of 0.925 for the task of whole slide image classification and a score of 0.7051 for the tumor localization task. A pathologist independently reviewed the same images, obtaining a whole slide image classification AUC of 0.966 and a tumor localization score of 0.733. Combining our deep learning systems predictions with the human pathologists diagnoses increased the pathologists AUC to 0.995, representing an approximately 85 percent reduction in human error rate. These results demonstrate the power of using deep learning to produce significant improvements in the accuracy of pathological diagnoses.
Febrile neutropenia (FN) has been associated with high mortality, especially among adults with cancer. Understanding the patient and provider level heterogeneity in FN hospital admissions has potential to inform personalized interventions focused on increasing survival of individuals with FN. We leverage machine learning techniques to disentangling the complex interactions among multi domain risk factors in a population with FN. Data from the Healthcare Cost and Utilization Project (HCUP) National Inpatient Sample and Nationwide Inpatient Sample (NIS) were used to build machine learning based models of mortality for adult cancer patients who were diagnosed with FN during a hospital admission. In particular, the importance of risk factors from different domains (including demographic, clinical, and hospital associated information) was studied. A set of more interpretable (decision tree, logistic regression) as well as more black box (random forest, gradient boosting, neural networks) models were analyzed and compared via multiple cross validation. Our results demonstrate that a linear prediction score of FN mortality among adults with cancer, based on admission information is effective in classifying high risk patients; clinical diagnoses is the domain with the highest predictive power. A number of the risk variables (e.g. sepsis, kidney failure, etc.) identified in this study are clinically actionable and may inform future studies looking at the patients prior medical history are warranted.
With an aging and growing population, the number of women requiring either screening or symptomatic mammograms is increasing. To reduce the number of mammograms that need to be read by a radiologist while keeping the diagnostic accuracy the same or better than current clinical practice, we develop Man and Machine Mammography Oracle (MAMMO) - a clinical decision support system capable of triaging mammograms into those that can be confidently classified by a machine and those that cannot be, thus requiring the reading of a radiologist. The first component of MAMMO is a novel multi-view convolutional neural network (CNN) with multi-task learning (MTL). MTL enables the CNN to learn the radiological assessments known to be associated with cancer, such as breast density, conspicuity, suspicion, etc., in addition to learning the primary task of cancer diagnosis. We show that MTL has two advantages: 1) learning refined feature representations associated with cancer improves the classification performance of the diagnosis task and 2) issuing radiological assessments provides an additional layer of model interpretability that a radiologist can use to debug and scrutinize the diagnoses provided by the CNN. The second component of MAMMO is a triage network, which takes as input the radiological assessment and diagnostic predictions of the first networks MTL outputs and determines which mammograms can be correctly and confidently diagnosed by the CNN and which mammograms cannot, thus needing to be read by a radiologist. Results obtained on a private dataset of 8,162 patients show that MAMMO reduced the number of radiologist readings by 42.8% while improving the overall diagnostic accuracy in comparison to readings done by radiologists alone. We analyze the triage of patients decided by MAMMO to gain a better understanding of what unique mammogram characteristics require radiologists expertise.
If Electronic Health Records contain a large amount of information about the patients condition and response to treatment, which can potentially revolutionize the clinical practice, such information is seldom considered due to the complexity of its extraction and analysis. We here report on a first integration of an NLP framework for the analysis of clinical records of lung cancer patients making use of a telephone assistance service of a major Spanish hospital. We specifically show how some relevant data, about patient demographics and health condition, can be extracted; and how some relevant analyses can be performed, aimed at improving the usefulness of the service. We thus demonstrate that the use of EHR texts, and their integration inside a data analysis framework, is technically feasible and worth of further study.
Breast cancer is one of the leading fatal disease worldwide with high risk control if early discovered. Conventional method for breast screening is x-ray mammography, which is known to be challenging for early detection of cancer lesions. The dense breast structure produced due to the compression process during imaging lead to difficulties to recognize small size abnormalities. Also, inter- and intra-variations of breast tissues lead to significant difficulties to achieve high diagnosis accuracy using hand-crafted features. Deep learning is an emerging machine learning technology that requires a relatively high computation power. Yet, it proved to be very effective in several difficult tasks that requires decision making at the level of human intelligence. In this paper, we develop a new network architecture inspired by the U-net structure that can be used for effective and early detection of breast cancer. Results indicate a high rate of sensitivity and specificity that indicate potential usefulness of the proposed approach in clinical use.