ﻻ يوجد ملخص باللغة العربية
We introduce an improved unsupervised clustering protocol specially suited for large-scale structured data. The protocol follows three steps: a dimensionality reduction of the data, a density estimation over the low dimensional representation of the data, and a final segmentation of the density landscape. For the dimensionality reduction step we introduce a parallelized implementation of the well-known t-Stochastic Neighbouring Embedding (t-SNE) algorithm that significantly alleviates some inherent limitations, while improving its suitability for large datasets. We also introduce a new adaptive Kernel Density Estimation particularly coupled with the t-SNE framework in order to get accurate density estimates out of the embedded data, and a variant of the rainfalling watershed algorithm to identify clusters within the density landscape. The whole mapping protocol is wrapped in the bigMap R package, together with visualization and analysis tools to ease the qualitative and quantitative assessment of the clustering.
Modern datasets and models are notoriously difficult to explore and analyze due to their inherent high dimensionality and massive numbers of samples. Existing visualization methods which employ dimensionality reduction to two or three dimensions are
A first line of attack in exploratory data analysis is data visualization, i.e., generating a 2-dimensional representation of data that makes clusters of similar points visually identifiable. Standard Johnson-Lindenstrauss dimensionality reduction do
Current metagenomic analysis algorithms require significant computing resources, can report excessive false positives (type I errors), may miss organisms (type II errors / false negatives), or scale poorly on large datasets. This paper explores using
Complete understanding of the mechanisms regulating the proliferation and differentiation that takes place during human immune CD8+ T cell responses is still lacking. Human clinical data is usually limited to blood cell counts, yet the initiation of
Big imaging data is becoming more prominent in brain sciences across spatiotemporal scales and phylogenies. We have developed a computational ecosystem that enables storage, visualization, and analysis of these data in the cloud, thusfar spanning 20+