New community

Subscribe to the gold package and get unlimited access to Shamra Academy

A Statistical Analysis of Noisy Crowdsourced Weather Data

143 0 0.0 ( 0 )

Download Cite

Added by Arnab Chakraborty

Publication date 2019

fields Mathematical Statistics

and research's language is English

Authors Arnab Chakraborty - Soumendra Nath Lahiri - Alyson Wilson

Applications Methodology

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Spatial prediction of weather-elements like temperature, precipitation, and barometric pressure are generally based on satellite imagery or data collected at ground-stations. None of these data provide information at a more granular or hyper-local resolution. On the other hand, crowdsourced weather data, which are captured by sensors installed on mobile devices and gathered by weather-related mobile apps like WeatherSignal and AccuWeather, can serve as potential data sources for analyzing environmental processes at a hyper-local resolution. However, due to the low quality of the sensors and the non-laboratory environment, the quality of the observations in crowdsourced data is compromised. This paper describes methods to improve hyper-local spatial prediction using this varying-quality noisy crowdsourced information. We introduce a reliability metric, namely Veracity Score (VS), to assess the quality of the crowdsourced observations using a coarser, but high-quality, reference data. A VS-based methodology to analyze noisy spatial data is proposed and evaluated through extensive simulations. The merits of the proposed approach are illustrated through case studies analyzing crowdsourced daily average ambient temperature readings for one day in the contiguous United States.

rate research

The Banff Challenge: Statistical Detection of a Noisy Signal

237 - A. C. Davison , N. Sartori 2011

Particle physics experiments such as those run in the Large Hadron Collider result in huge quantities of data, which are boiled down to a few numbers from which it is hoped that a signal will be detected. We discuss a simple probability model for this and derive frequentist and noninformative Bayesian procedures for inference about the signal. Both are highly accurate in realistic cases, with the frequentist procedure having the edge for interval estimation, and the Bayesian procedure yielding slightly better point estimates. We also argue that the significance, or $p$-value, function based on the modified likelihood root provides a comprehensive presentation of the information in the data and should be used for inference.

Applications Methodology

How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition

84 - Christine M. Anderson-Cook , Kary L. Myers , Lu Lu 2019

Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. This paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition based on our experience. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of the regions of the input space that are well-solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore, they also provide more detailed analysis of individual sub-questions including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, the methods provide richer and more informative summaries to enhance the interpretation of results beyond just the rankings on the leaderboard. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment.

Applications Machine Learning

A CURE for noisy magnetic resonance images: Chi-square unbiased risk estimation

192 - Florian Luisier , Thierry Blu , Patrick J. Wolfe 2011

In this article we derive an unbiased expression for the expected mean-squared error associated with continuously differentiable estimators of the noncentrality parameter of a chi-square random variable. We then consider the task of denoising squared-magnitude magnetic resonance image data, which are well modeled as independent noncentral chi-square random variables on two degrees of freedom. We consider two broad classes of linearly parameterized shrinkage estimators that can be optimized using our risk estimate, one in the general context of undecimated filterbank transforms, and another in the specific case of the unnormalized Haar wavelet transform. The resultant algorithms are computationally tractable and improve upon state-of-the-art methods for both simulated and actual magnetic resonance image data.

Applications Methodology

Presenting the Probabilities of Different Effect Sizes: Towards a Better Understanding and Communication of Statistical Uncertainty

175 - Akisato Suzuki 2020

How should social scientists understand and communicate the uncertainty of statistically estimated causal effects? It is well-known that the conventional significance-vs.-insignificance approach is associated with misunderstandings and misuses. Behavioral research suggests people understand uncertainty more appropriately in a numerical, continuous scale than in a verbal, discrete scale. Motivated by these backgrounds, I propose presenting the probabilities of different effect sizes. Probability is an intuitive continuous measure of uncertainty. It allows researchers to better understand and communicate the uncertainty of statistically estimated effects. In addition, my approach needs no decision threshold for an uncertainty measure or an effect size, unlike the conventional approaches, allowing researchers to be agnostic about a decision threshold such as p<5% and a justification for that. I apply my approach to a previous social scientific study, showing it enables richer inference than the significance-vs.-insignificance approach taken by the original study. The accompanying R package makes my approach easy to implement.

Applications Methodology Other Statistics

Statistical Analysis of weather variables of Antofagasta

94 - H. Farfan , A. Castillo , S. Curilef 2019

The statistical behavior of weather variables of Antofagasta is described, especially the daily data of air as temperature, pressure and relative humidity measured at 08:00, 14:00 and 20:00. In this article, we use a time series deseasonalization technique, Q-Q plot, skewness, kurtosis and the Pearson correlation coefficient. We found that the distributions of the records are symmetrical and have positive kurtosis, so they have heavy tails. In addition, the variables are highly autocorrelated, extending up to one year in the case of pressure and temperature.

Atmospheric and Oceanic Physics Data Analysis Statistics and Probability

comments

Fetching comments

Wadi International University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

A Statistical Analysis of Noisy Crowdsourced Weather Data

Ask ChatGPT about the research

No Arabic abstract

Read More