In this work we define a spatial concordance coefficient for second-order stationary processes. This problem has been widely addressed in a non-spatial context, but here we consider a coefficient that, for a fixed spatial lag, allows one to compare two spatial sequences along a 45-degree line. The proposed coefficient was explored for the bivariate Matérn and Wendland covariance functions. The asymptotic normality of a sample version of the spatial concordance coefficient under an increasing-domain sampling framework was established for the Wendland covariance function. To work with large digital images, we developed a local approach for estimating the concordance that uses local spatial models on non-overlapping windows. Monte Carlo simulations were used to gain additional insights into the asymptotic properties for finite sample sizes. As an illustrative example, we applied this methodology to two similar images of a deciduous forest canopy. The images were recorded with different cameras but with similar fields of view and within minutes of each other. Our analysis showed that the local approach helped to explain a percentage of the non-spatial concordance and to provide additional information about its decay as a function of the spatial lag.
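To make the idea concrete, the following is a minimal Python sketch of a Lin-type sample spatial concordance coefficient on gridded images, together with the windowed local variant described above. The specific form of the coefficient (an empirical cross-covariance at the lag, scaled by the variances and squared mean difference, in the spirit of Lin's concordance correlation) and all function names are illustrative assumptions, not the authors' exact estimator.

```python
import numpy as np

def spatial_concordance(x, y, lag):
    """Sample spatial concordance between two gridded images at a
    horizontal lag (Lin-type coefficient; illustrative, not the
    authors' exact estimator)."""
    mx, my = x.mean(), y.mean()
    # empirical cross-covariance at spatial lag h = (0, lag)
    if lag == 0:
        c12 = ((x - mx) * (y - my)).mean()
    else:
        c12 = ((x[:, :-lag] - mx) * (y[:, lag:] - my)).mean()
    return 2.0 * c12 / (x.var() + y.var() + (mx - my) ** 2)

def local_concordance(x, y, lag, win=32):
    """Local approach: estimate the coefficient on non-overlapping
    windows, as suggested for large digital images."""
    rows = range(0, x.shape[0] - win + 1, win)
    cols = range(0, x.shape[1] - win + 1, win)
    return np.array([[spatial_concordance(x[i:i+win, j:j+win],
                                          y[i:i+win, j:j+win], lag)
                      for j in cols] for i in rows])

# toy example: two correlated random images
rng = np.random.default_rng(0)
base = rng.standard_normal((256, 256))
img1 = base + 0.3 * rng.standard_normal((256, 256))
img2 = base + 0.3 * rng.standard_normal((256, 256))
print(spatial_concordance(img1, img2, lag=1))
print(local_concordance(img1, img2, lag=1).mean())
```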
There are several cutting-edge applications needing PCA methods for data on tori, and we propose a novel torus-PCA method with important properties that can be generally applied. There are two existing general methods: tangent space PCA and geodesic PCA. However, unlike tangent space PCA, our torus-PCA honors the cyclic topology of the data space, whereas, unlike geodesic PCA, our torus-PCA produces a variety of non-winding, non-dense descriptors. This is achieved by deforming tori into spheres and then using a variant of the recently developed principal nested spheres analysis. This PCA analysis involves a step of small-sphere fitting, and we provide an improved test to avoid overfitting. However, deforming tori into spheres creates singularities. We introduce a data-adaptive pre-clustering technique to keep the singularities away from the data. For the frequently encountered case in which the residual variance around the PCA main component is small, we use a post-mode hunting technique for more fine-grained clustering. Thus, in general, there are three successive interrelated key steps of torus-PCA in practice: pre-clustering, deformation, and post-mode hunting. We illustrate our method with two recently studied RNA structure (tori) data sets: one is a small RNA data set that has become established as a benchmark for PCA, and we validate our method on these data; the other is a large RNA data set (containing the small one), for which we show that our method provides interpretable principal components as well as further insight into its structure.
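The deformation-based torus-PCA itself is involved, but the tangent space PCA baseline that the abstract contrasts it with is easy to sketch. The following Python snippet (function names and toy data are our own) unwraps each angular coordinate about its circular mean and applies ordinary PCA to the residuals, which is precisely the step where the cyclic topology gets ignored.

```python
import numpy as np

def circular_mean(theta):
    """Mean direction of a 1-D array of angles in radians."""
    return np.arctan2(np.sin(theta).mean(), np.cos(theta).mean())

def tangent_space_pca(angles):
    """Baseline tangent-space PCA on torus-valued data: unwrap each
    angular coordinate about its circular mean, then apply ordinary
    PCA to the residuals.  This is the method torus-PCA improves on,
    shown here only for orientation."""
    mu = np.apply_along_axis(circular_mean, 0, angles)
    # residuals wrapped into (-pi, pi]: this is where the cyclic
    # topology of the torus is discarded
    resid = np.angle(np.exp(1j * (angles - mu)))
    u, s, vt = np.linalg.svd(resid - resid.mean(0), full_matrices=False)
    return mu, vt, u * s, s**2 / len(angles)

# toy torus data: pairs of dihedral-like angles on [0, 2*pi)
rng = np.random.default_rng(1)
t = rng.normal(0.0, 0.4, size=200)
data = np.column_stack([np.pi/4 + t, -np.pi/3 + 0.5*t]) % (2*np.pi)
mu, comps, scores, var = tangent_space_pca(data)
print("circular means:", mu, "explained variances:", var)
```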
In Functional Data Analysis, data are commonly assumed to be smooth functions on a fixed interval of the real line. In this work, we introduce a comprehensive framework for the analysis of functional data whose domain is a two-dimensional manifold that is itself subject to variability from sample to sample. We formulate a statistical model for such data, here called Functions on Surfaces, which enables a joint representation of their geometric and functional aspects, and propose an associated estimation framework. We assess the validity of the framework through a simulation study and finally apply it to the analysis of neuroimaging data of cortical thickness, acquired from the brains of different subjects and thus lying on domains with different geometries.
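As a toy illustration of the setting (a scalar function observed at the vertices of a two-dimensional mesh), the sketch below smooths noisy vertex values with a simple graph-Laplacian relaxation. This is only a stand-in for handling Functions on Surfaces; it does not implement the paper's joint geometric and functional model, and all names are illustrative.

```python
import numpy as np

def graph_laplacian_smooth(vertices, faces, f, lam=0.5, n_iter=20):
    """Smooth a scalar function f defined on mesh vertices by iterated
    umbrella (graph-Laplacian) averaging.  A toy stand-in for smoothing
    functional data on a 2D manifold."""
    n = len(vertices)
    nbrs = [set() for _ in range(n)]
    for a, b, c in faces:                      # build 1-ring neighbourhoods
        nbrs[a] |= {b, c}; nbrs[b] |= {a, c}; nbrs[c] |= {a, b}
    f = f.astype(float).copy()
    for _ in range(n_iter):
        avg = np.array([f[list(s)].mean() for s in nbrs])
        f = (1 - lam) * f + lam * avg          # relax toward neighbour mean
    return f

# toy mesh: a flat two-triangle patch with noisy "cortical thickness"
verts = np.array([[0, 0], [1, 0], [0, 1], [1, 1]], float)
faces = [(0, 1, 2), (1, 3, 2)]
thick = np.array([2.0, 2.5, 2.2, 5.0])        # vertex 3 is an outlier
print(graph_laplacian_smooth(verts, faces, thick))
```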
As the most important tool for providing high-level evidence-based medicine, meta-analysis allows researchers to statistically summarize and combine data from multiple studies. In meta-analysis, mean differences, such as Cohen's d statistic and Hedges' g statistic, are frequently used effect size measurements for continuous data. To calculate the mean-difference-based effect sizes, the sample mean and standard deviation are two essential summary measures. However, many clinical reports do not directly record the sample mean and standard deviation. Instead, the sample size, median, minimum and maximum values, and/or the first and third quartiles are reported. As a result, researchers have to transform the reported information into the sample mean and standard deviation before computing the effect size. Since most of the popular transformation methods were developed under the normality assumption on the underlying data, it is necessary to perform a pre-test before transforming the summary statistics. In this article, we introduce test statistics for three popular scenarios in meta-analysis, and we suggest that medical researchers perform a normality test on the selected studies before using them in further analysis. Moreover, we present three case studies to demonstrate the use of the newly proposed test statistics. The real-data case studies indicate that the new test statistics are easy to apply in practice, and that by following the recommended path to conduct the meta-analysis, researchers can obtain more reliable conclusions.
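For context, the sketch below implements one popular family of transformations of the kind referred to above, in the style of Wan et al. (2014), for the scenario where all five summary numbers are reported, together with a crude quartile-based symmetry check. The paper's own pre-test statistics are not reproduced here; this is only an illustration of what such a conversion step looks like in practice.

```python
from scipy.stats import norm

def mean_sd_from_five_numbers(a, q1, m, q3, b, n):
    """Estimate sample mean and SD from {min a, quartiles q1/q3,
    median m, max b, sample size n} using Wan et al. (2014)-style
    formulas, which assume normality of the underlying data."""
    mean = (a + 2*q1 + 2*m + 2*q3 + b) / 8
    z_range = 2 * norm.ppf((n - 0.375) / (n + 0.25))      # E[range]/SD
    z_iqr = 2 * norm.ppf((0.75*n - 0.125) / (n + 0.25))   # E[IQR]/SD
    sd = ((b - a) / z_range + (q3 - q1) / z_iqr) / 2
    return mean, sd

def quartile_skewness(q1, m, q3):
    """Bowley's skewness: a crude symmetry check one might run before
    trusting normality-based conversions (0 for symmetric data)."""
    return (q3 + q1 - 2*m) / (q3 - q1)

# reported summary: n=100, min=1.2, q1=3.1, median=4.0, q3=5.2, max=9.5
print(mean_sd_from_five_numbers(1.2, 3.1, 4.0, 5.2, 9.5, 100))
print(quartile_skewness(3.1, 4.0, 5.2))
```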
Assessing the technical efficiency of a set of observations requires that the associated data, composed of inputs and outputs, are perfectly known. If this is not the case, then biased estimates will likely be obtained. Data Envelopment Analysis (DEA) is one of the most extensively used mathematical models to estimate efficiency. It constructs a piecewise linear frontier against which all observations are compared. Since the frontier is empirically defined, any deviation resulting from low data quality (imperfect knowledge of data, or IKD) may lead to efficiency under- or overestimation. In this study, we model IKD and then apply the so-called Hit & Run procedure to randomly generate admissible observations, following some prespecified probability density functions. The sets used to model IKD limit the domain of the data associated with each observation; any point belonging to that domain is a candidate to act as the observation in the efficiency assessment. Hence, this sampling procedure must run a sizable number of times (infinitely many, in theory) so that it populates the whole sets. The DEA technique is used within each iteration to estimate bootstrapped efficiency scores for each observation. We use several scenarios to show that the proposed routine can outperform some of the available alternatives, and we explain how the efficiency estimates can be used for statistical inference. An empirical case study based on the Portuguese public hospitals database (2013-2016) is addressed using the proposed method.
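A minimal sketch of the pipeline, assuming box-shaped IKD sets and an input-oriented CCR DEA model solved as a linear program: each bootstrap iteration draws admissible inputs with a Hit & Run step and re-evaluates efficiency. The box sets, the ±5% perturbation, and all names are illustrative choices, not the paper's specification.

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_input(X, Y, j0):
    """Input-oriented CCR efficiency of DMU j0.
    X: (m inputs, n DMUs), Y: (s outputs, n DMUs).
    Solves min theta s.t. X@lam <= theta*x0, Y@lam >= y0, lam >= 0."""
    m, n = X.shape
    s = Y.shape[0]
    c = np.r_[1.0, np.zeros(n)]                      # minimise theta
    A_ub = np.vstack([np.c_[-X[:, [j0]], X],         # X@lam - theta*x0 <= 0
                      np.c_[np.zeros((s, 1)), -Y]])  # -Y@lam <= -y0
    b_ub = np.r_[np.zeros(m), -Y[:, j0]]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (n + 1))
    return res.fun

def hit_and_run(lo, hi, x, rng):
    """One Hit & Run step inside the box [lo, hi] (a simple choice of
    admissible set; the paper allows general sets and densities)."""
    d = rng.standard_normal(len(x))
    d /= np.linalg.norm(d)
    with np.errstate(divide="ignore"):               # step sizes keeping
        t1, t2 = (lo - x) / d, (hi - x) / d          # x + t*d in the box
    return x + rng.uniform(np.max(np.minimum(t1, t2)),
                           np.min(np.maximum(t1, t2))) * d

# toy run: 5 DMUs, 2 inputs, 1 output, +/-5% input uncertainty
rng = np.random.default_rng(2)
X = rng.uniform(1, 10, (2, 5)); Y = rng.uniform(1, 10, (1, 5))
scores = []
for _ in range(200):                                 # bootstrap iterations
    Xs = np.array([hit_and_run(0.95*X[:, j], 1.05*X[:, j], X[:, j], rng)
                   for j in range(5)]).T
    scores.append(dea_ccr_input(Xs, Y, j0=0))
print(np.mean(scores), np.percentile(scores, [2.5, 97.5]))
```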
Treatment effects on asymmetric and heavy-tailed distributions are better reflected at extreme tails than at averages or intermediate quantiles. In such distributions, standard methods for estimating quantile treatment effects can provide misleading inference due to the high variability of the estimators at the extremes. In this work, we propose a novel method which incorporates a heavy-tailed component in the outcome distribution to estimate the extreme tails and simultaneously employs quantile regression to model the remainder of the distribution. The threshold between the bulk of the distribution and the extreme tails is estimated using a state-of-the-art technique. Simulation results show the superiority of the proposed method over existing estimators of quantile causal effects at the extremes in the case of heavy-tailed distributions. The method is applied to analyse a real dataset on the London transport network. In this application, the proposed methodology can assist effective decision making to improve network performance, where causal inference in the extremes for heavy-tailed distributions is often a key aim.
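A minimal sketch of the splicing idea, assuming a binary treatment: quantile regression handles the bulk, and a generalized Pareto distribution fitted to the exceedances handles the tail. The threshold is fixed at the 0.90 quantile here purely for illustration; the paper estimates it with a dedicated technique, and all names below are our own.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import genpareto

def extreme_qte(y, treat, tau, bulk_tau=0.90):
    """Quantile treatment effect at level tau, splicing a quantile
    regression for the bulk with a generalized Pareto tail above a
    fixed threshold quantile (fixed here only for illustration)."""
    if tau <= bulk_tau:                       # bulk: plain quantile regression
        X = sm.add_constant(treat.astype(float))
        fit = sm.QuantReg(y, X).fit(q=tau)
        return fit.params[1]                  # coefficient on treatment
    qs = []
    for g in (0, 1):                          # tail: per-group GPD above u
        yg = y[treat == g]
        u = np.quantile(yg, bulk_tau)
        exc = yg[yg > u] - u
        xi, _, sigma = genpareto.fit(exc, floc=0.0)
        p_exc = (tau - bulk_tau) / (1 - bulk_tau)
        qs.append(u + genpareto.ppf(p_exc, xi, loc=0.0, scale=sigma))
    return qs[1] - qs[0]

# toy data: heavy-tailed outcomes, treatment inflates the tail
rng = np.random.default_rng(3)
n = 5000
treat = rng.integers(0, 2, n)
y = rng.pareto(3.0, n) * (1 + 0.5 * treat)
print("QTE at 0.5 :", extreme_qte(y, treat, 0.50))
print("QTE at 0.99:", extreme_qte(y, treat, 0.99))
```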