No Arabic abstract
Persistent homology is a vital tool for topological data analysis. Previous work has developed some statistical estimators for characteristics of collections of persistence diagrams. However, tools that provide statistical inference for observations that are persistence diagrams are limited. Specifically, there is a need for tests that can assess the strength of evidence against a claim that two samples arise from the same population or process. We propose the use of randomization-style null hypothesis significance tests (NHST) for these situations. The test is based on a loss function that comprises pairwise distances between the elements of each sample and all the elements in the other sample. We use this method to analyze a range of simulated and experimental data. Through these examples we experimentally explore the power of the p-values. Our results show that the randomization-style NHST based on pairwise distances can distinguish between samples from different processes, which suggests that its use for hypothesis tests upon persistence diagrams is reasonable. We demonstrate its application on a real dataset of fMRI data of patients with ADHD.
In this chapter, we discuss applications of topological data analysis (TDA) to spatial systems. We briefly review the recently proposed level-set construction of filtered simplicial complexes, and we then examine persistent homology in two cases studies: street networks in Shanghai and hotspots of COVID-19 infections. We then summarize our results and provide an outlook on TDA in spatial systems.
We describe the utility of point processes and failure rates and the most common point process for modeling failure rates, the Poisson point process. Next, we describe the uniformly most powerful test for comparing the rates of two Poisson point processes for a one-sided test (henceforth referred to as the rate test). A common argument against using this test is that real world data rarely follows the Poisson point process. We thus investigate what happens when the distributional assumptions of tests like these are violated and the test still applied. We find a non-pathological example (using the rate test on a Compound Poisson distribution with Binomial compounding) where violating the distributional assumptions of the rate test make it perform better (lower error rates). We also find that if we replace the distribution of the test statistic under the null hypothesis with any other arbitrary distribution, the performance of the test (described in terms of the false negative rate to false positive rate trade-off) remains exactly the same. Next, we compare the performance of the rate test to a version of the Wald test customized to the Negative Binomial point process and find it to perform very similarly while being much more general and versatile. Finally, we discuss the applications to Microsoft Azure. The code for all experiments performed is open source and linked in the introduction.
Topological Data Analysis is a recent and fast growing field providing a set of new topological and geometric tools to infer relevant features for possibly complex data. This paper is a brief introduction, through a few selected topics, to basic fundamental and practical aspects of tda for non experts.
This paper introduces a simple yet powerful approach based on topological data analysis (TDA) for detecting the true steps in a piecewise constant (PWC) signal. The signal is a two-state square wave with randomly varying in-between-pulse spacing, and subject to spurious steps at the rising or falling edges which we refer to as digital ringing. We use persistent homology to derive mathematical guarantees for the resulting change detection which enables accurate identification and counting of the true pulses. The approach is described and tested using both synthetic and experimental data obtained using an engine lathe instrumented with a laser tachometer. The described algorithm enables the accurate calculation of the spindle speed with the appropriate error bounds. The results of the described approach are compared to the frequency domain approach via Fourier transform. It is found that both our approach and the Fourier analysis yield comparable results for numerical and experimental pulses with regular spacing and digital ringing. However, the described approach significantly outperforms Fourier analysis when the spacing between the peaks is varied. We also generalize the approach to higher dimensional PWC signals, although utilizing this extension remains an interesting question for future research.
Making binary decisions is a common data analytical task in scientific research and industrial applications. In data sciences, there are two related but distinct strategies: hypothesis testing and binary classification. In practice, how to choose between these two strategies can be unclear and rather confusing. Here we summarize key distinctions between these two strategies in three aspects and list five practical guidelines for data analysts to choose the appropriate strategy for specific analysis needs. We demonstrate the use of those guidelines in a cancer driver gene prediction example.