Network tomography has been regarded as one of the most promising methodologies for performance evaluation and diagnosis of the massive and decentralized Internet. This paper proposes a new estimation approach for solving a class of inverse problems in network tomography, based on marginal distributions of a sequence of one-dimensional linear projections of the observed data. We give a general identifiability result for the proposed method and study the design of these one-dimensional projections in terms of statistical efficiency. We show that for a simple Gaussian tomography model, there is an optimal set of one-dimensional projections such that the estimator obtained from these projections is asymptotically as efficient as the maximum likelihood estimator based on the joint distribution of the observed data. For practical applications, we carry out simulation studies of the proposed method for two instances of network tomography. The first is traffic demand tomography using a Gaussian Origin-Destination traffic model with a power relation between its mean and variance, and the second is network delay tomography, where the link delays are to be estimated from the end-to-end path delays. We compare estimators obtained from our method with those obtained from the joint distribution and from other lower-dimensional projections, and show that in both cases the proposed method yields satisfactory results.
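As a minimal sketch of what a one-dimensional projection looks like in the Gaussian traffic demand setting, the snippet below assumes a measurement model $Y = AX$ with independent $X_j \sim N(\lambda_j, \phi\lambda_j^c)$, so that $b^\top Y$ is Gaussian with mean $\sum_j w_j\lambda_j$ and variance $\sum_j w_j^2\phi\lambda_j^c$, where $w = A^\top b$. The routing matrix, the parameter values and the function name `project_gaussian` are illustrative assumptions, not taken from the paper.

```python
def project_gaussian(A, lam, phi, c, b):
    """Mean and variance of b^T Y for Y = A X with independent
    X_j ~ N(lam_j, phi * lam_j**c) (assumed power mean-variance relation)."""
    # w = A^T b: the weight each origin-destination flow gets in the projection
    w = [sum(b[i] * A[i][j] for i in range(len(A))) for j in range(len(lam))]
    mean = sum(wj * lj for wj, lj in zip(w, lam))
    var = sum(wj * wj * phi * lj ** c for wj, lj in zip(w, lam))
    return mean, var


# Hypothetical toy network: 2 measured links, 3 origin-destination flows.
A = [[1, 1, 0],
     [0, 1, 1]]
lam = [4.0, 9.0, 1.0]

# The projection b = (1, -1) cancels the flow shared by both links.
mean, var = project_gaussian(A, lam, phi=1.0, c=2.0, b=[1.0, -1.0])
```

Each choice of `b` gives one Gaussian marginal whose two parameters depend on the unknowns; the design question raised in the abstract is which set of such projections to use.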
Large graphs are natural mathematical models for describing the structure of the data in a wide variety of fields, such as web mining, social networks, information retrieval, biological networks, etc. For all these applications, automatic tools are required to get a synthetic view of the graph and to reach a good understanding of the underlying problem. In particular, discovering groups of tightly connected vertices and understanding the relations between those groups is very important in practice. This paper shows how a kernel version of the batch Self Organizing Map can be used to achieve these goals via kernels derived from the Laplacian matrix of the graph, especially when it is used in conjunction with more classical methods based on the spectral analysis of the graph. The proposed method is used to explore the structure of a medieval social network modeled through a weighted graph that has been directly built from a large corpus of agrarian contracts.
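To make "kernels derived from the Laplacian matrix" concrete, here is a small sketch that builds the graph Laplacian $L = D - W$ of a weighted graph and evaluates the heat (diffusion) kernel $K = e^{-\beta L}$ by a truncated Taylor series. The heat kernel is used here as one common Laplacian-based choice and is an assumption about the kind of kernel meant; the toy graph, `beta` and all function names are hypothetical.

```python
def laplacian(W):
    """Graph Laplacian L = D - W of a symmetric weight matrix without self-loops."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]


def matmul(P, Q):
    """Plain square matrix product."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def heat_kernel(W, beta, terms=40):
    """Approximate exp(-beta * L) by the first `terms` terms of its Taylor series."""
    n = len(W)
    L = laplacian(W)
    K = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in K]
    for m in range(1, terms):
        # term_m = term_{m-1} * (-beta * L) / m
        term = [[-beta * v / m for v in row] for row in matmul(term, L)]
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K


# Hypothetical graph: two tightly connected pairs joined by one weak edge.
W = [[0.0, 2.0, 0.0, 0.0],
     [2.0, 0.0, 0.1, 0.0],
     [0.0, 0.1, 0.0, 2.0],
     [0.0, 0.0, 2.0, 0.0]]
K = heat_kernel(W, beta=0.5)
```

Entries of `K` act as similarities between vertices, which is what a kernel Self Organizing Map needs in place of Euclidean inner products.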
Let $X:=(X_1, \ldots, X_p)$ be random objects (the inputs), defined on some probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and valued in some measurable space $E=E_1\times\ldots\times E_p$. Further, let $Y := f(X_1, \ldots, X_p)$ be the output. Here, $f$ is a measurable function from $E$ to some Hilbert space $\mathbb{H}$ ($\mathbb{H}$ may be of finite or infinite dimension). In this work, we give a natural generalization of the Sobol indices (which are classically defined when $Y\in\mathbb{R}$) to the case where the output belongs to $\mathbb{H}$. These indices have very nice properties. First, they are invariant under isometry and scaling. Further, they can be easily estimated, as in dimension $1$, by using the so-called Pick and Freeze method. We investigate the asymptotic behaviour of this estimation scheme.
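The Pick and Freeze idea can be sketched for a finite-dimensional output as follows. The snippet assumes the generalized index takes the form $\sum_k \mathrm{Cov}(Y_k, Y_k^u) / \sum_k \mathrm{Var}(Y_k)$, where $Y^u$ is obtained by keeping ("freezing") the inputs in $u$ and redrawing the others. That form, the toy model and the function names are assumptions made for illustration and may differ from the paper's exact definition.

```python
import random


def model(x1, x2):
    """Hypothetical two-dimensional output used only for illustration."""
    return (x1 + x2, 2.0 * x1)


def pick_freeze_sobol(f, n, seed=0):
    """Pick-and-freeze estimate of the index of X1 for a vector-valued output,
    assuming the ratio-of-traces form stated in the lead-in."""
    rng = random.Random(seed)
    ys, yus = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(0.0, 1.0)
        x2_new = rng.gauss(0.0, 1.0)   # X1 is frozen, X2 is redrawn
        ys.append(f(x1, x2))
        yus.append(f(x1, x2_new))
    num = den = 0.0
    for k in range(len(ys[0])):
        mean_y = sum(y[k] for y in ys) / n
        mean_yu = sum(y[k] for y in yus) / n
        # Empirical covariance between component k of Y and of Y^u
        num += sum(y[k] * yu[k] for y, yu in zip(ys, yus)) / n - mean_y * mean_yu
        # Empirical variance of component k of Y
        den += sum(y[k] ** 2 for y in ys) / n - mean_y ** 2
    return num / den


estimate = pick_freeze_sobol(model, 20000)
```

For this toy model with independent standard Gaussian inputs, the target value under the stated assumption is $(1 + 4)/(2 + 4) = 5/6$, so the estimate should land close to that.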
Several statistics-based detectors, based on unimodal matrix models, for determining the number of sources in a field are designed. A new variance ratio statistic is proposed, and its asymptotic distribution is analyzed. The variance ratio detector is shown to outperform the alternatives. It is shown that further improvements are achievable via optimally selected rotations. Numerical experiments demonstrate the performance gains of our detection methods over the baseline approach.
The statistical problem for network tomography is to infer the distribution of $\mathbf{X}$, with mutually independent components, from a measurement model $\mathbf{Y}=A\mathbf{X}$, where $A$ is a given binary matrix representing the routing topology of a network under consideration. The challenge is that the dimension of $\mathbf{X}$ is much larger than that of $\mathbf{Y}$, and thus the problem is often called ill-posed. This paper studies some statistical aspects of network tomography. We first address the identifiability issue and prove that the distribution of $\mathbf{X}$ is identifiable up to a shift parameter under mild conditions. We then use a mixture model of characteristic functions to derive a fast algorithm for estimating the distribution of $\mathbf{X}$ based on the generalized method of moments. Through extensive model simulation and real Internet trace-driven simulation, the proposed approach is shown to compare favorably with previous methods that use simple discretization for inferring link delays in a heterogeneous network.
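Characteristic functions are convenient here because independence of the components of $\mathbf{X}$ gives $\varphi_{\mathbf{Y}}(t) = \prod_j \varphi_{X_j}\big((A^\top t)_j\big)$. The sketch below checks this identity numerically on a toy network; the routing matrix, the exponential link delays and their rates are illustrative assumptions, and the code is not the paper's estimation algorithm.

```python
import cmath
import random

# Hypothetical routing matrix: 2 paths over 3 links, link 0 shared by both paths.
A = [[1, 1, 0],
     [1, 0, 1]]
rates = [1.0, 2.0, 4.0]   # assumed exponential rates of the link delays


def exp_cf(rate, s):
    """Characteristic function of an Exponential(rate) variable at s."""
    return rate / (rate - 1j * s)


def model_cf(t):
    """Product form of the characteristic function of Y = A X at t."""
    s = [sum(t[i] * A[i][j] for i in range(len(A))) for j in range(len(rates))]
    out = 1.0 + 0j
    for rate, sj in zip(rates, s):
        out *= exp_cf(rate, sj)
    return out


def empirical_cf(t, n, seed=0):
    """Empirical characteristic function of Y from n simulated measurements."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(n):
        x = [rng.expovariate(r) for r in rates]
        y = [sum(a * xj for a, xj in zip(row, x)) for row in A]
        total += cmath.exp(1j * sum(ti * yi for ti, yi in zip(t, y)))
    return total / n


t = (0.7, -0.4)
gap = abs(empirical_cf(t, 20000) - model_cf(t))
```

A moment-based estimator in this spirit would choose the parameters of the link distributions to make such gaps small over a set of points `t`.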
A general asymptotic theory is given for the panel data AR(1) model with time series independent in different cross sections. The theory covers the cases of stationary, nearly non-stationary, unit root, mildly integrated, mildly explosive and explosive processes. It is assumed that the cross-sectional dimension and time-series dimension are respectively $N$ and $T$. The results in this paper illustrate that whatever the process is, with an appropriate regularization, the least squares estimator of the autoregressive coefficient converges to a normal distribution with rate at least $O(N^{-1/3})$. Since the variance is the key to characterizing the normal distribution, it is important to discuss the variance of the least squares estimator. We will show that when the autoregressive coefficient $\rho$ satisfies $|\rho|<1$, the variance declines at the rate $O((NT)^{-1})$, while the rate changes to $O(N^{-1}T^{-2})$ when $\rho=1$ and $O(N^{-1}\rho^{-2T+4})$ when $|\rho|>1$. The value $\rho=1$ is the critical point where the convergence rate changes radically. The transition process is studied by assuming that $\rho$ depends on $T$ and tends to $1$. An interesting phenomenon discovered in this paper is that, in the explosive case, the least squares estimator of the autoregressive coefficient has a standard normal limiting distribution in the panel data case, while it may not have a limiting distribution in the univariate time series case.
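For concreteness, the pooled least squares estimator can be sketched on simulated data. The snippet assumes the simplest version of the model, $y_{i,t} = \rho\, y_{i,t-1} + \varepsilon_{i,t}$ with $y_{i,0} = 0$, no individual effects and standard Gaussian innovations; these choices and the function names are assumptions for illustration.

```python
import random


def simulate_panel(N, T, rho, seed=0):
    """Simulate N independent AR(1) series of length T, each started at 0."""
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        y = [0.0]
        for _ in range(T):
            y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
        panel.append(y)
    return panel


def pooled_ls(panel):
    """Pooled least squares estimate of rho: sum(y_t * y_{t-1}) / sum(y_{t-1}^2),
    with both sums taken over all series and all time points."""
    num = den = 0.0
    for y in panel:
        for t in range(1, len(y)):
            num += y[t] * y[t - 1]
            den += y[t - 1] ** 2
    return num / den


rho_hat_stationary = pooled_ls(simulate_panel(50, 50, 0.5))
rho_hat_unit_root = pooled_ls(simulate_panel(50, 50, 1.0))
```

Repeating the simulation over many seeds and over growing $N$ and $T$ is one way to see how the spread of the estimator shrinks at different speeds in the stationary and unit root cases.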