
Neyman-Pearson (NP) classification algorithms and NP receiver operating characteristics (NP-ROC)

Published by: Jingyi Jessica Li
Publication date: 2016
Research field: Mathematical statistics
Language: English





In many binary classification applications, such as disease diagnosis and spam detection, practitioners need to keep the type I error (i.e., the conditional probability of misclassifying a class 0 observation as class 1) below a desired threshold. The Neyman-Pearson (NP) classification paradigm is a natural choice for this need: it minimizes the type II error (i.e., the conditional probability of misclassifying a class 1 observation as class 0) while enforcing an upper bound, $\alpha$, on the type I error. Although the NP paradigm has a century-long history in hypothesis testing, it has not been well recognized or properly implemented in classification. Common practices that directly limit the empirical type I error to no more than $\alpha$ do not achieve type I error control, because the resulting classifiers are still likely to have type I errors much larger than $\alpha$. In this work, we develop the first umbrella algorithm that implements the NP paradigm for all scoring-type classification methods, including popular methods such as logistic regression, support vector machines, and random forests. Powered by this umbrella algorithm, we propose a novel graphical tool for NP classification methods: NP receiver operating characteristic (NP-ROC) bands, motivated by the popular receiver operating characteristic (ROC) curves. NP-ROC bands help choose $\alpha$ in a data-adaptive way and compare different NP classifiers. We demonstrate the use and properties of the NP umbrella algorithm and NP-ROC bands, available in the R package nproc, through simulation and real data case studies.
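The gap between limiting the empirical type I error and controlling the true type I error can be illustrated with a short Python sketch. It is not the nproc implementation. It assumes a thresholding rule of the kind the abstract describes for scoring-type classifiers: the threshold is set to the k-th smallest score among n held-out class 0 observations, and k is chosen so that the probability of the true type I error exceeding $\alpha$ is at most a tolerance delta. The tolerance delta and the function names are assumptions introduced here for illustration; they do not appear in the abstract.

```python
import math


def violation_bound(n, k, alpha):
    """Bound on P(true type I error > alpha) when the threshold is the
    k-th smallest (1-indexed) of n held-out class 0 scores.

    This equals P(Binomial(n, 1 - alpha) >= k); it is exact when the
    scores are continuous and i.i.d. (an assumption of this sketch).
    """
    return sum(
        math.comb(n, j) * (1 - alpha) ** j * alpha ** (n - j)
        for j in range(k, n + 1)
    )


def np_threshold_rank(n, alpha, delta):
    """Smallest rank k whose violation bound is at most delta.

    Returns None when n is too small for any rank to qualify.
    """
    for k in range(1, n + 1):
        if violation_bound(n, k, alpha) <= delta:
            return k
    return None


def np_threshold(class0_scores, alpha, delta):
    """Score threshold: predict class 1 when a score exceeds it."""
    k = np_threshold_rank(len(class0_scores), alpha, delta)
    if k is None:
        raise ValueError("too few class 0 scores for this alpha and delta")
    return sorted(class0_scores)[k - 1]


# Naive rule: allow at most 5% of 1000 class 0 scores above the threshold,
# i.e. use rank 950. The chance that the true type I error exceeds 0.05
# is then roughly one half, not small.
naive_violation = violation_bound(1000, 950, 0.05)

# Rank chosen with an explicit tolerance on that chance.
safe_rank = np_threshold_rank(1000, 0.05, 0.05)
```

Under these assumptions, the safe rank is higher than the naive rank, so the classifier trades some type II error for a guarantee on the type I error. The sketch also shows a minimum sample size: with $\alpha$ = 0.05 and delta = 0.05, no rank qualifies until there are at least 59 held-out class 0 scores.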




Read also

This paper presents a logic language for expressing NP search and optimization problems. Specifically, first a language obtained by extending (positive) Datalog with intuitive and efficient constructs (namely, stratified negation, constraints and exclusive disjunction) is introduced. Next, a further restricted language only using a restricted form of disjunction to define (non-deterministically) subsets (or partitions) of relations is investigated. This language, called NP Datalog, captures the power of Datalog with unstratified negation in expressing search and optimization problems. A system prototype implementing NP Datalog is presented. The system translates NP Datalog queries into OPL programs which are executed by the ILOG OPL Development Studio. Our proposal combines easy formulation of problems, expressed by means of a declarative logic language, with the efficiency of the ILOG System. Several experiments show the effectiveness of this approach.
In the Nikoli pencil-and-paper game Tatamibari, a puzzle consists of an $m \times n$ grid of cells, where each cell possibly contains a clue among +, -, |. The goal is to partition the grid into disjoint rectangles, where every rectangle contains exactly one clue, rectangles containing + are square, rectangles containing - are strictly longer horizontally than vertically, rectangles containing | are strictly longer vertically than horizontally, and no four rectangles share a corner. We prove this puzzle NP-complete, establishing a Nikoli gap of 16 years. Along the way, we introduce a gadget framework for proving hardness of similar puzzles involving area coverage, and show that it applies to an existing NP-hardness proof for Spiral Galaxies. We also present a mathematical puzzle font for Tatamibari.
Rikudo is a number-placement puzzle, where the player is asked to complete a Hamiltonian path on a hexagonal grid, given some clues (numbers already placed and edges of the path). We prove that the game is complete for NP, even if the puzzle has no hole. When all odd numbers are placed it is in P, whereas it is still NP-hard when all numbers of the form $3k+1$ are placed.
In this paper, we show that deciding rigid foldability of a given crease pattern using all creases is weakly NP-hard by a reduction from Partition, and that deciding rigid foldability with optional creases is strongly NP-hard by a reduction from 1-in-3 SAT. Unlike flat foldability of origami or flexibility of other kinematic linkages, whose complexity originates in the complexity of the layer ordering and possible self-intersection of the material, rigid foldability from a planar state is hard even though there is no potential self-intersection. In fact, the complexity comes from the combinatorial behavior of the different possible rigid folding configurations at each vertex. The results underpin the fact that it is harder to fold from an unfolded sheet of paper than to unfold a folded state back to a plane, a frequently encountered problem when realizing folding-based systems such as self-folding matter and reconfigurable robots.
In complexity theory, whether NP equals P is a famous unsolved problem. In this paper, we discuss this question for the SAT (satisfiability) problem, and show that SAT can be solved in polynomial time by means of a quantum algorithm.