
Measuring the Stability of Learned Features

Authors: Kris Sankaran
Publication date: 2021
Language: English

Many modern datasets don't fit neatly into $n \times p$ matrices, but most techniques for measuring statistical stability expect rectangular data. We study methods for stability assessment on non-rectangular data, using statistical learning algorithms to extract rectangular latent features. We design controlled simulations to characterize the power and practicality of competing approaches. This motivates new strategies for visualizing feature stability. Our stability curves supplement the direct analysis, providing information about the reliability of inferences based on learned features. Finally, we illustrate our approach using a spatial proteomics dataset, where machine learning tools can augment the scientist's workflow, but where guarantees of statistical reproducibility are still central. Our raw data, packaged code, and experimental outputs are publicly available.
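The general idea of a stability curve can be illustrated with a minimal sketch: refit a feature learner on bootstrap resamples and track how similar each learned feature is to its full-data counterpart. This is only an illustration of the concept, not the paper's procedure; PCA stands in here for an arbitrary latent-feature extractor, and the function name `stability_curve` is ours.

```python
import numpy as np
from sklearn.decomposition import PCA

def stability_curve(X, k=5, n_boot=50, seed=0):
    """Bootstrap stability of learned features (illustrative sketch).

    Fits PCA on the full data, refits on bootstrap resamples, and
    returns the mean absolute cosine similarity between each full-data
    component and the resample component of the same rank.
    """
    rng = np.random.default_rng(seed)
    ref = PCA(n_components=k).fit(X).components_          # (k, p) loadings
    sims = np.zeros((n_boot, k))
    for b in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))        # bootstrap resample
        comp = PCA(n_components=k).fit(X[idx]).components_
        # absolute cosine similarity handles PCA's sign indeterminacy;
        # rank swaps among near-equal eigenvalues show up as low stability
        sims[b] = np.abs(np.sum(ref * comp, axis=1)) / (
            np.linalg.norm(ref, axis=1) * np.linalg.norm(comp, axis=1)
        )
    return sims.mean(axis=0)                              # one value per feature

# toy data with one strong direction: the first feature should be stable
X = np.random.default_rng(1).normal(size=(200, 10))
X[:, 0] += 3 * X[:, 1]
curve = stability_curve(X)
```

Plotting `curve` against component rank gives a crude stability curve: features whose similarity stays near 1 across resamples support more reliable downstream inferences than features that wander.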


Read More

Matching surfaces is a challenging 3D Computer Vision problem typically addressed by local features. Although a variety of 3D feature detectors and descriptors has been proposed in the literature, they have seldom been proposed together, and it is not yet clear how to identify the most effective detector-descriptor pair for a specific application. A promising solution is to leverage machine learning to learn the optimal 3D detector for any given 3D descriptor [15]. In this paper, we report a performance evaluation of the detector-descriptor pairs obtained by learning a paired 3D detector for the most popular 3D descriptors. In particular, we address experimental settings dealing with object recognition and surface registration.
The ergm package supports the statistical analysis and simulation of network data. It anchors the statnet suite of packages for network analysis in R, introduced in a special issue of the Journal of Statistical Software in 2008. This article provides an overview of the functionality and performance improvements in the 2021 ergm 4.0 release. These include more flexible handling of nodal covariates, operator terms that extend and simplify model specification, new models for networks with valued edges, improved handling of constraints on the sample space of networks, performance enhancements to the Markov chain Monte Carlo and maximum likelihood estimation algorithms, broader and faster searching for networks with certain target statistics using simulated annealing, and estimation with missing edge data. We also identify the new packages in the statnet suite that extend ergm's functionality to other network data types and structural features, and the robust set of online resources that support the statnet development process and applications.
We tackle the problem of visual localization under changing conditions, such as time of day, weather, and seasons. Recent learned local features based on deep neural networks have shown superior performance over classical hand-crafted local features. However, in a real-world scenario, there often exists a large domain gap between training and target images, which can significantly degrade localization accuracy. While existing methods utilize a large amount of data to tackle the problem, we present a novel and practical approach, where only a few examples are needed to reduce the domain gap. In particular, we propose a few-shot domain adaptation framework for learned local features that deals with varying conditions in visual localization. The experimental results demonstrate superior performance over baselines, while using only a small number of training examples from the target domain.
Skewness plays a relevant role in several multivariate statistical techniques. Sometimes it is used to recover data features, as in cluster analysis. In other circumstances, skewness impairs the performance of statistical methods, as in Hotelling's one-sample test. In both cases, there is the need to check the symmetry of the underlying distribution, either by visual inspection or by formal testing. The R packages MaxSkew and MultiSkew address these issues by measuring, testing and removing skewness from multivariate data. Skewness is assessed by the third multivariate cumulant and its functions. The hypothesis of symmetry is tested either nonparametrically, with the bootstrap, or parametrically, under the normality assumption. Skewness is removed or at least alleviated by projecting the data onto appropriate linear subspaces. Usages of MaxSkew and MultiSkew are illustrated with the Iris dataset.
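A standard third-cumulant symmetry test of the kind the abstract describes is Mardia's multivariate skewness test. The sketch below is an assumption-laden Python illustration of that classical test (MaxSkew and MultiSkew themselves are R packages with their own implementations): under multivariate normality, $n b_{1,p}/6$ is approximately $\chi^2$ with $p(p+1)(p+2)/6$ degrees of freedom.

```python
import numpy as np
from scipy import stats

def mardia_skewness_test(X):
    """Mardia's multivariate skewness test of symmetry (sketch).

    Computes b_{1,p} = (1/n^2) * sum_{i,j} [(x_i - xbar)' S^{-1} (x_j - xbar)]^3
    and the parametric p-value under the normality assumption.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))  # MLE covariance
    G = Xc @ S_inv @ Xc.T              # pairwise Mahalanobis cross-products
    b1p = (G ** 3).sum() / n ** 2      # third-cumulant skewness measure
    df = p * (p + 1) * (p + 2) // 6
    pval = stats.chi2.sf(n * b1p / 6, df)
    return b1p, pval

# symmetric (normal) data should typically not reject the null of symmetry
rng = np.random.default_rng(0)
b1p, pval = mardia_skewness_test(rng.normal(size=(300, 3)))
```

A nonparametric alternative, as the abstract notes, is to calibrate the same statistic with a bootstrap instead of the chi-squared reference distribution.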
Databases of electronic health records (EHRs) are increasingly used to inform clinical decisions. Machine learning methods can find patterns in EHRs that are predictive of future adverse outcomes. However, statistical models may be built upon patterns of health-seeking behavior that vary across patient subpopulations, leading to poor predictive performance when training on one patient population and predicting on another. This note proposes two tests to better measure and understand model generalization. We use these tests to compare models derived from two data sources: (i) historical medical records, and (ii) electrocardiogram (EKG) waveforms. In a predictive task, we show that EKG-based models can be more stable than EHR-based models across different patient populations.