Making binary decisions is a common data analysis task in scientific research and industrial applications. Data science offers two related but distinct strategies for this task: hypothesis testing and binary classification. In practice, it is often unclear which of the two to choose. Here we summarize the key distinctions between these strategies in three aspects and list five practical guidelines that help data analysts choose the appropriate strategy for a specific analysis need. We demonstrate the use of these guidelines in a cancer driver gene prediction example.
Persistent homology is a vital tool for topological data analysis. Previous work has developed some statistical estimators for characteristics of collections of persistence diagrams. However, tools that provide statistical inference for observations
We describe the utility of point processes and failure rates and the most common point process for modeling failure rates, the Poisson point process. Next, we describe the uniformly most powerful test for comparing the rates of two Poisson point proc
Efficient automatic protein classification is of central importance in genomic annotation. As an independent way to check the reliability of the classification, we propose a statistical approach to test if two sets of protein domain sequences coming
We consider the problem of distributed binary hypothesis testing of two sequences that are generated by an i.i.d. doubly-binary symmetric source. Each sequence is observed by a different terminal. The two hypotheses correspond to different levels of
The classical binary hypothesis testing problem is revisited. We notice that when one of the hypotheses is composite, there is an inherent difficulty in defining an optimality criterion that is both informative and well-justified. For testing in the