ﻻ يوجد ملخص باللغة العربية
$textbf{Background:}$ At the onset of a pandemic, such as COVID-19, data with proper labeling/attributes corresponding to the new disease might be unavailable or sparse. Machine Learning (ML) models trained with the available data, which is limited in quantity and poor in diversity, will often be biased and inaccurate. At the same time, ML algorithms designed to fight pandemics must have good performance and be developed in a time-sensitive manner. To tackle the challenges of limited data, and label scarcity in the available data, we propose generating conditional synthetic data, to be used alongside real data for developing robust ML models. $textbf{Methods:}$ We present a hybrid model consisting of a conditional generative flow and a classifier for conditional synthetic data generation. The classifier decouples the feature representation for the condition, which is fed to the flow to extract the local noise. We generate synthetic data by manipulating the local noise with fixed conditional feature representation. We also propose a semi-supervised approach to generate synthetic samples in the absence of labels for a majority of the available data. $textbf{Results:}$ We performed conditional synthetic generation for chest computed tomography (CT) scans corresponding to normal, COVID-19, and pneumonia afflicted patients. We show that our method significantly outperforms existing models both on qualitative and quantitative performance, and our semi-supervised approach can efficiently synthesize conditional samples under label scarcity. As an example of downstream use of synthetic data, we show improvement in COVID-19 detection from CT scans with conditional synthetic data augmentation.
Thanks to the increasing availability of genomics and other biomedical data, many machine learning approaches have been proposed for a wide range of therapeutic discovery and development tasks. In this survey, we review the literature on machine lear
The increasing take-up of machine learning techniques requires ever-more application-specific training data. Manually collecting such training data is time-consuming and error-prone process. Data marketplaces represent a compelling alternative, provi
We consider a novel data driven approach for designing learning algorithms that can effectively learn with only a small number of labeled examples. This is crucial for modern machine learning applications where labels are scarce or expensive to obtai
Synthetic medical data which preserves privacy while maintaining utility can be used as an alternative to real medical data, which has privacy costs and resource constraints associated with it. At present, most models focus on generating cross-sectio
In the field of machine learning there is a growing interest towards more robust and generalizable algorithms. This is for example important to bridge the gap between the environment in which the training data was collected and the environment where