ﻻ يوجد ملخص باللغة العربية
Information theoretic criteria (ITC) have been widely adopted in engineering and statistics for selecting, among an ordered set of candidate models, the one that better fits the observed sample data. The selected model minimizes a penalized likelihood metric, where the penalty is determined by the criterion adopted. While rules for choosing a penalty that guarantees a consistent estimate of the model order are known, theoretical tools for its design with finite samples have never been provided in a general setting. In this paper, we study model order selection for finite samples under a design perspective, focusing on the generalized information criterion (GIC), which embraces the most common ITC. The theory is general, and as case studies we consider: a) the problem of estimating the number of signals embedded in additive white Gaussian noise (AWGN) by using multiple sensors; b) model selection for the general linear model (GLM), which includes e.g. the problem of estimating the number of sinusoids in AWGN. The analysis reveals a trade-off between the probabilities of overestimating and underestimating the order of the model. We then propose to design the GIC penalty to minimize underestimation while keeping the overestimation probability below a specified level. For the considered problems, this method leads to analytical derivation of the optimal penalty for a given sample size. A performance comparison between the penalty optimized GIC and common AIC and BIC is provided, demonstrating the effectiveness of the proposed design strategy.
Selecting a minimal feature set that is maximally informative about a target variable is a central task in machine learning and statistics. Information theory provides a powerful framework for formulating feature selection algorithms -- yet, a rigoro
In high-dimensional and/or non-parametric regression problems, regularization (or penalization) is used to control model complexity and induce desired structure. Each penalty has a weight parameter that indicates how strongly the structure correspond
Current data-driven traffic prediction models are usually trained with large datasets, e.g. several months of speeds and flows. Such models provide very good fit for ordinary road conditions, but often fail just when they are most needed: when traffi
In this paper, we study the information theoretic bounds for exact recovery in sub-hypergraph models for community detection. We define a general model called the $m-$uniform sub-hypergraph stochastic block model ($m-$ShSBM). Under the $m-$ShSBM, we
Obtaining labels can be costly and time-consuming. Active learning allows a learning algorithm to intelligently query samples to be labeled for efficient learning. Fisher information ratio (FIR) has been used as an objective for selecting queries in