
Too good to be true: when overwhelming evidence fails to convince

Published by Lachlan Gunn
Publication date: 2016
Research field: Mathematical statistics
Paper language: English





Is it possible for a large sequence of measurements or observations, which support a hypothesis, to counterintuitively decrease our confidence? Can unanimous support be too good to be true? The assumption of independence is often made in good faith; however, consideration is rarely given to whether a systemic failure has occurred. Taking this into account can cause certainty in a hypothesis to decrease as the evidence for it becomes apparently stronger. We perform a probabilistic Bayesian analysis of this effect with examples based on (i) archaeological evidence, (ii) weighing of legal evidence, and (iii) cryptographic primality testing. We find that even with surprisingly low systemic failure rates, high confidence is very difficult to achieve; in particular, we find that certain analyses of cryptographically important numerical tests are highly optimistic, underestimating their false-negative rate by as much as a factor of $2^{80}$.
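The core effect is easy to reproduce. The sketch below is a minimal illustration under simplifying assumptions (a uniform prior, per-observation accuracy p, and a failure mode that forces unanimous agreement whenever it occurs); the function and parameter names are ours, not the paper's notation.

```python
def posterior_unanimous(n, prior=0.5, p=0.9, f=0.01):
    """P(hypothesis | n out of n observations confirm it), allowing a
    systemic failure, of probability f, that forces unanimity."""
    # P(unanimous | H): either the system works and all n independent
    # observations confirm (probability p each), or it has failed.
    like_h = (1 - f) * p**n + f
    # P(unanimous | not H): honest unanimity now needs n false
    # positives (probability 1 - p each), so failure soon dominates.
    like_not_h = (1 - f) * (1 - p)**n + f
    # Bayes' rule.
    return prior * like_h / (prior * like_h + (1 - prior) * like_not_h)

for n in (1, 3, 5, 10, 30, 100):
    print(f"n={n:3d}  P(H | unanimous) = {posterior_unanimous(n):.3f}")
```

With these illustrative numbers, confidence peaks near n = 5 and then decays back toward the prior: past that point, unanimity is better explained by a systemic failure than by honest agreement.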




Read also

Despite their impressive performance in object recognition and other tasks under standard testing conditions, deep networks often fail to generalize to out-of-distribution (o.o.d.) samples. One cause for this shortcoming is that modern architectures tend to rely on shortcuts: superficial features that correlate with categories without capturing deeper invariants that hold across contexts. Real-world concepts often possess a complex structure that can vary superficially across contexts, which can make the most intuitive and promising solutions in one context not generalize to others. One potential way to improve o.o.d. generalization is to assume simple solutions are unlikely to be valid across contexts and avoid them, which we refer to as the too-good-to-be-true prior. A low-capacity network (LCN) with a shallow architecture should only be able to learn surface relationships, including shortcuts. We find that LCNs can serve as shortcut detectors. Furthermore, an LCN's predictions can be used in a two-stage approach to encourage a high-capacity network (HCN) to rely on deeper invariant features that should generalize broadly. In particular, items that the LCN can master are downweighted when training the HCN. Using a modified version of the CIFAR-10 dataset in which we introduced shortcuts, we found that the two-stage LCN-HCN approach reduced reliance on shortcuts and facilitated o.o.d. generalization.
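A minimal sketch of the second stage described above, assuming PyTorch and a simple confidence-based weighting (the weighting scheme and the floor parameter are illustrative assumptions, not necessarily the paper's exact method):

```python
import torch
import torch.nn.functional as F

def lcn_weights(lcn_logits, labels, floor=0.1):
    # Confidence the low-capacity network assigns to the true class;
    # items it already masters are presumed shortcut-solvable.
    p_true = F.softmax(lcn_logits, dim=1)[torch.arange(len(labels)), labels]
    # Downweight LCN-mastered items, with a floor so that no example
    # is discarded outright.
    return (1.0 - p_true).clamp(min=floor)

def hcn_loss(hcn_logits, labels, weights):
    # Per-example cross-entropy on the high-capacity network,
    # reweighted so that shortcut-free examples dominate training.
    per_item = F.cross_entropy(hcn_logits, labels, reduction="none")
    return (weights * per_item).mean()
```

In use, the weights would be computed once from a trained LCN and held fixed while the HCN is optimized against `hcn_loss`.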
Background: All states in the US have enacted at least some naloxone access laws (NALs) in an effort to reduce opioid overdose lethality. Previous evaluations found NALs increased naloxone dispensing but showed mixed results in terms of opioid overdose mortality. One reason for mixed results could be failure to address violations of the positivity assumption caused by the co-occurrence of NAL enactment with enactment of related laws, ultimately resulting in bias, increased variance, and low statistical power. Methods: We reformulated the research question to alleviate some challenges related to law co-occurrence. Because NAL enactment was closely correlated with Good Samaritan Law (GSL) enactment, we bundled NAL with GSL, and estimated the hypothetical associations of enacting NAL/GSL up to 2 years earlier (an amount supported by the observed data) on naloxone dispensation and opioid overdose mortality. Results: We estimated that such a shift in NAL/GSL duration would have been associated with increased naloxone dispensations (0.28 dispensations/100,000 people (95% CI: 0.18-0.38) in 2013 among early NAL/GSL enactors; 47.58 (95% CI: 28.40-66.76) in 2018 among late enactors). We estimated that such a shift would have been associated with increased opioid overdose mortality (1.88 deaths/100,000 people (95% CI: 1.03-2.72) in 2013; 2.06 (95% CI: 0.92-3.21) in 2018). Conclusions: Consistent with prior research, increased duration of NAL/GSL enactment was associated with increased naloxone dispensing. Contrary to expectation, we did not find a protective association with opioid overdose mortality, though residual bias due to unobserved confounding and interference likely remains.
Adversarial reprogramming allows repurposing a machine-learning model to perform a different task. For example, a model trained to recognize animals can be reprogrammed to recognize digits by embedding an adversarial program in the digit images provided as input. Recent work has shown that adversarial reprogramming may not only be used to abuse machine-learning models provided as a service, but also beneficially, to improve transfer learning when training data is scarce. However, the factors affecting its success are still largely unexplained. In this work, we develop a first-order linear model of adversarial reprogramming to show that its success inherently depends on the size of the average input gradient, which grows when input gradients are more aligned, and when inputs have higher dimensionality. The results of our experimental analysis, involving fourteen distinct reprogramming tasks, show that the above factors are correlated with the success and the failure of adversarial reprogramming.
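The central quantity of that first-order analysis can be probed directly. A hedged sketch in PyTorch (the function name and the choice of cross-entropy are assumptions for illustration; it presumes the model processes examples independently, e.g. no batch norm in training mode):

```python
import torch
import torch.nn.functional as F

def avg_input_gradient_norm(model, inputs, targets):
    # Gradient of the summed loss w.r.t. the inputs yields one input
    # gradient per example, since examples do not interact in the sum.
    inputs = inputs.clone().requires_grad_(True)
    loss = F.cross_entropy(model(inputs), targets, reduction="sum")
    (grad,) = torch.autograd.grad(loss, inputs)
    # Size of the batch-averaged input gradient: large when per-example
    # gradients are mutually aligned, small when they cancel.
    return grad.mean(dim=0).norm().item()
```

Under the abstract's argument, tasks for which this value is large should be easier targets for adversarial reprogramming.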
It has recently become possible to prepare ultrastable glassy materials characterised by structural relaxation times which vastly exceed the duration of any feasible experiment. Similarly, new algorithms have led to the production of ultrastable comp uter glasses. Is it possible to obtain a reliable estimate of a structural relaxation time that is too long to be measured? We review, organise, and critically discuss various methods to estimate very long relaxation times. We also perform computer simulations of three dimensional ultrastable hard spheres glasses to test and quantitatively compare some of these methods for a single model system. The various estimation methods disagree significantly and it is not yet clear how to accurately estimate extremely long relaxation times.
Subtractive dither is a powerful method for removing the signal dependence of quantization noise for coarsely-quantized signals. However, estimation from dithered measurements often naively applies the sample mean or midrange, even when the total noise is not well described with a Gaussian or uniform distribution. We show that the generalized Gaussian distribution approximately describes subtractively-dithered, quantized samples of a Gaussian signal. Furthermore, a generalized Gaussian fit leads to simple estimators based on order statistics that match the performance of more complicated maximum likelihood estimators requiring iterative solvers. The order-statistics-based estimators outperform both the sample mean and midrange for nontrivial sums of Gaussian and uniform noise. Additional analysis of the generalized Gaussian approximation yields rules of thumb for determining when and how to apply dither to quantized measurements. Specifically, we find subtractive dither to be beneficial when the ratio between the Gaussian standard deviation and quantization interval length is roughly less than 1/3. If that ratio is also greater than 0.822/$K^{0.930}$ for the number of measurements $K>20$, we present estimators more efficient than the midrange.
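For readers unfamiliar with the mechanism, here is a minimal sketch of subtractive dither in NumPy (the function name is ours; the signal parameters are chosen to sit in the regime the abstract identifies as beneficial, with a standard-deviation-to-step ratio of 0.2 < 1/3):

```python
import numpy as np

def subtractive_dither_quantize(x, delta, rng):
    # Add uniform dither on [-delta/2, delta/2) before a mid-tread
    # uniform quantizer of step delta, then subtract the same dither;
    # the residual error becomes uniform and signal-independent.
    d = rng.uniform(-delta / 2, delta / 2, size=x.shape)
    return delta * np.round((x + d) / delta) - d

rng = np.random.default_rng(0)
x = rng.normal(0.0, 0.2, size=100_000)  # Gaussian signal, sigma/delta = 0.2
y = subtractive_dither_quantize(x, delta=1.0, rng=rng)
print(np.std(y - x))                    # ~ delta/sqrt(12) ≈ 0.289
```

The printed error standard deviation matches the uniform value delta/sqrt(12) regardless of the input, which is what lets the total noise be modeled as a Gaussian-plus-uniform sum.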