
Measuring the Stability of Learned Features

Published by: Kris Sankaran
Publication date: 2021
Research field: Mathematical Statistics
Language: English
Author: Kris Sankaran





Many modern datasets don't fit neatly into $n \times p$ matrices, but most techniques for measuring statistical stability expect rectangular data. We study methods for stability assessment on non-rectangular data, using statistical learning algorithms to extract rectangular latent features. We design controlled simulations to characterize the power and practicality of competing approaches. This motivates new strategies for visualizing feature stability. Our stability curves supplement the direct analysis, providing information about the reliability of inferences based on learned features. Finally, we illustrate our approach using a spatial proteomics dataset, where machine learning tools can augment the scientist's workflow, but where guarantees of statistical reproducibility are still central. Our raw data, packaged code, and experimental outputs are publicly available.
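As a rough illustration of what assessing the stability of learned features can look like, the sketch below resamples the data, relearns the features on each resample, aligns them to a reference, and reports how much each sample's coordinates move. Everything in it is an assumption made for illustration and is not taken from the paper: PCA stands in for the feature learner, the bootstrap stands in for the resampling scheme, orthogonal Procrustes stands in for the alignment step, and the names `fit_loadings`, `procrustes_align`, and `bootstrap_stability` are hypothetical.

```python
import numpy as np


def fit_loadings(X, k):
    """Stand-in feature learner (assumption): top-k PCA loadings via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T  # shape (p, k)


def procrustes_align(ref, Z):
    """Rotate/reflect Z to best match ref in the least-squares sense."""
    U, _, Vt = np.linalg.svd(Z.T @ ref)
    return Z @ (U @ Vt)


def bootstrap_stability(X, k=2, n_boot=50, seed=0):
    """Return reference features and the per-sample spread across resamples."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    ref = Xc @ fit_loadings(X, k)  # features learned on the full data
    aligned = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # bootstrap resample of the rows
        V = fit_loadings(X[idx], k)        # relearn features on the resample
        Z = Xc @ V                         # embed the original samples
        Z = Z - Z.mean(axis=0)
        aligned.append(procrustes_align(ref, Z))
    aligned = np.stack(aligned)            # shape (n_boot, n, k)
    spread = aligned.std(axis=0)           # shape (n, k); small means stable
    return ref, spread
```

A large spread relative to the scale of the reference features would suggest that conclusions drawn from those features are sensitive to the particular sample.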




Read also

Matching surfaces is a challenging 3D Computer Vision problem typically addressed by local features. Although a variety of 3D feature detectors and descriptors has been proposed in literature, they have seldom been proposed together and it is yet not clear how to identify the most effective detector-descriptor pair for a specific application. A promising solution is to leverage machine learning to learn the optimal 3D detector for any given 3D descriptor [15]. In this paper, we report a performance evaluation of the detector-descriptor pairs obtained by learning a paired 3D detector for the most popular 3D descriptors. In particular, we address experimental settings dealing with object recognition and surface registration.
The ergm package supports the statistical analysis and simulation of network data. It anchors the statnet suite of packages for network analysis in R introduced in a special issue in Journal of Statistical Software in 2008. This article provides an overview of the functionality and performance improvements in the 2021 ergm 4.0 release. These include more flexible handling of nodal covariates, operator terms that extend and simplify model specification, new models for networks with valued edges, improved handling of constraints on the sample space of networks, performance enhancements to the Markov chain Monte Carlo and maximum likelihood estimation algorithms, broader and faster searching for networks with certain target statistics using simulated annealing, and estimation with missing edge data. We also identify the new packages in the statnet suite that extend ergm's functionality to other network data types and structural features, and the robust set of online resources that support the statnet development process and applications.
We tackle the problem of visual localization under changing conditions, such as time of day, weather, and seasons. Recent learned local features based on deep neural networks have shown superior performance over classical hand-crafted local features. However, in a real-world scenario, there often exists a large domain gap between training and target images, which can significantly degrade the localization accuracy. While existing methods utilize a large amount of data to tackle the problem, we present a novel and practical approach, where only a few examples are needed to reduce the domain gap. In particular, we propose a few-shot domain adaptation framework for learned local features that deals with varying conditions in visual localization. The experimental results demonstrate the superior performance over baselines, while using a scarce number of training examples from the target domain.
Skewness plays a relevant role in several multivariate statistical techniques. Sometimes it is used to recover data features, as in cluster analysis. In other circumstances, skewness impairs the performance of statistical methods, as in Hotelling's one-sample test. In both cases, there is the need to check the symmetry of the underlying distribution, either by visual inspection or by formal testing. The R packages MaxSkew and MultiSkew address these issues by measuring, testing and removing skewness from multivariate data. Skewness is assessed by the third multivariate cumulant and its functions. The hypothesis of symmetry is tested either nonparametrically, with the bootstrap, or parametrically, under the normality assumption. Skewness is removed or at least alleviated by projecting the data onto appropriate linear subspaces. Usages of MaxSkew and MultiSkew are illustrated with the Iris dataset.
Databases of electronic health records (EHRs) are increasingly used to inform clinical decisions. Machine learning methods can find patterns in EHRs that are predictive of future adverse outcomes. However, statistical models may be built upon patterns of health-seeking behavior that vary across patient subpopulations, leading to poor predictive performance when training on one patient population and predicting on another. This note proposes two tests to better measure and understand model generalization. We use these tests to compare models derived from two data sources: (i) historical medical records, and (ii) electrocardiogram (EKG) waveforms. In a predictive task, we show that EKG-based models can be more stable than EHR-based models across different patient populations.