ترغب بنشر مسار تعليمي؟ اضغط هنا

An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists

105   0   0.0 ( 0 )
 نشر من قبل Bertrand Michel
 تاريخ النشر 2017
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English
 تأليف Frederic Chazal




اسأل ChatGPT حول البحث

Topological Data Analysis is a recent and fast growing field providing a set of new topological and geometric tools to infer relevant features for possibly complex data. This paper is a brief introduction, through a few selected topics, to basic fundamental and practical aspects of tda for non experts.



قيم البحث

اقرأ أيضاً

We introduce a novel gradient descent algorithm extending the well-known Gradient Sampling methodology to the class of stratifiably smooth objective functions, which are defined as locally Lipschitz functions that are smooth on some regular pieces-ca lled the strata-of the ambient Euclidean space. For this class of functions, our algorithm achieves a sub-linear convergence rate. We then apply our method to objective functions based on the (extended) persistent homology map computed over lower-star filters, which is a central tool of Topological Data Analysis. For this, we propose an efficient exploration of the corresponding stratification by using the Cayley graph of the permutation group. Finally, we provide benchmark and novel topological optimization problems, in order to demonstrate the utility and applicability of our framework.
Persistent homology is a vital tool for topological data analysis. Previous work has developed some statistical estimators for characteristics of collections of persistence diagrams. However, tools that provide statistical inference for observations that are persistence diagrams are limited. Specifically, there is a need for tests that can assess the strength of evidence against a claim that two samples arise from the same population or process. We propose the use of randomization-style null hypothesis significance tests (NHST) for these situations. The test is based on a loss function that comprises pairwise distances between the elements of each sample and all the elements in the other sample. We use this method to analyze a range of simulated and experimental data. Through these examples we experimentally explore the power of the p-values. Our results show that the randomization-style NHST based on pairwise distances can distinguish between samples from different processes, which suggests that its use for hypothesis tests upon persistence diagrams is reasonable. We demonstrate its application on a real dataset of fMRI data of patients with ADHD.
Understanding protein structure-function relationships is a key challenge in computational biology, with applications across the biotechnology and pharmaceutical industries. While it is known that protein structure directly impacts protein function, many functional prediction tasks use only protein sequence. In this work, we isolate protein structure to make functional annotations for proteins in the Protein Data Bank in order to study the expressiveness of different structure-based prediction schemes. We present PersGNN - an end-to-end trainable deep learning model that combines graph representation learning with topological data analysis to capture a complex set of both local and global structural features. While variations of these techniques have been successfully applied to proteins before, we demonstrate that our hybridized approach, PersGNN, outperforms either method on its own as well as a baseline neural network that learns from the same information. PersGNN achieves a 9.3% boost in area under the precision recall curve (AUPR) compared to the best individual model, as well as high F1 scores across different gene ontology categories, indicating the transferability of this approach.
Deep generative models have emerged as a powerful tool for learning informative molecular representations and designing novel molecules with desired properties, with applications in drug discovery and material design. Deep generative auto-encoders de fined over molecular SMILES strings have been a popular choice for that purpose. However, capturing salient molecular properties like quantum-chemical energies remains challenging and requires sophisticated neural net models of molecular graphs or geometry-based information. As a simpler and more efficient alternative, we present a SMILES Variational Auto-Encoder (VAE) augmented with topological data analysis (TDA) representations of molecules, known as persistence images. Our experiments show that this TDA augmentation enables a SMILES VAE to capture the complex relation between 3D geometry and electronic properties, and allows generation of novel, diverse, and valid molecules with geometric features consistent with the training data, which exhibit a varying range of global electronic structural properties, such as a small HOMO-LUMO gap - a critical property for designing organic solar cells. We demonstrate that our TDA augmentation yields better success in downstream tasks compared to models trained without these representations and can assist in targeted molecule discovery.
402 - Michelle Feng , Abigail Hickok , 2021
In this chapter, we discuss applications of topological data analysis (TDA) to spatial systems. We briefly review the recently proposed level-set construction of filtered simplicial complexes, and we then examine persistent homology in two cases stud ies: street networks in Shanghai and hotspots of COVID-19 infections. We then summarize our results and provide an outlook on TDA in spatial systems.

الأسئلة المقترحة

التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا