
An information theoretic model for summarization, and some basic results

Published by Eric Graves
Publication date: 2019
Research field: Informatics Engineering
Research language: English





A basic information-theoretic model for summarization is formulated. Here summarization is considered as the process of taking a report of $v$ binary objects and producing from it a $j$-element subset that captures most of the important features of the original report, with importance being defined via an arbitrary set function endemic to the model. The loss of information is then measured by a weighted average of variational distances, which we term the semantic loss. Our results cover both the case where the probability distribution generating the $v$-length reports is known and the case where it is unknown. When it is known, our results demonstrate how to construct summarizers that minimize the semantic loss. When it is unknown, we show how to construct summarizers whose semantic loss, averaged uniformly over all possible distributions, converges to the minimum.




Read also

Neri Merhav, Igal Sason 2020
This work is an extension of our earlier article, where a well-known integral representation of the logarithmic function was explored, and was accompanied with demonstrations of its usefulness in obtaining compact, easily-calculable, exact formulas for quantities that involve expectations of the logarithm of a positive random variable. Here, in the same spirit, we derive an exact integral representation (in one or two dimensions) of the moment of a nonnegative random variable, or the sum of such independent random variables, where the moment order is a general positive noninteger real (also known as fractional moments). The proposed formula is applied to a variety of examples with an information-theoretic motivation, and it is shown how it facilitates their numerical evaluations. In particular, when applied to the calculation of a moment of the sum of a large number, $n$, of nonnegative random variables, it is clear that integration over one or two dimensions, as suggested by our proposed integral representation, is significantly easier than the alternative of integrating over $n$ dimensions, as needed in the direct calculation of the desired moment.
In this paper, we study a support set reconstruction problem in which the signals of interest are jointly sparse with a common support set, and sampled by joint sparsity model-2 (JSM-2) in the presence of noise. Using mathematical tools, we develop upper and lower bounds on the failure probability of support set reconstruction in terms of the sparsity, the ambient dimension, the minimum signal-to-noise ratio, the number of measurement vectors and the number of measurements. These bounds can be used to provide a guideline to determine the system parameters in various applications of compressed sensing with noisy JSM-2. Based on the bounds, we develop necessary and sufficient conditions for reliable support set reconstruction. We interpret these conditions to give theoretical explanations about the benefits enabled by joint sparsity structure in noisy JSM-2. We compare our sufficient condition with the existing result for the noisy multiple measurement vectors model (MMV). As a result, we show that noisy JSM-2 may require fewer measurements than noisy MMV for reliable support set reconstruction.
A key practical constraint on the design of hybrid automatic repeat request (HARQ) schemes is the size of the on-chip buffer that is available at the receiver to store previously received packets. In fact, in modern wireless standards such as LTE and LTE-A, the HARQ buffer size is one of the main drivers of the modem area and power consumption. This has recently highlighted the importance of HARQ buffer management, that is, of the use of buffer-aware transmission schemes and of advanced compression policies for the storage of received data. This work investigates HARQ buffer management by leveraging information-theoretic achievability arguments based on random coding. Specifically, standard HARQ schemes, namely Type-I, Chase Combining and Incremental Redundancy, are first studied under the assumption of a finite-capacity HARQ buffer by considering both coded modulation, via Gaussian signaling, and Bit Interleaved Coded Modulation (BICM). The analysis sheds light on the impact on the throughput of different compression strategies, namely the conventional compression of log-likelihood ratios and the direct digitization of baseband signals. Then, coding strategies based on layered modulation and optimized coding blocklength are investigated, highlighting the benefits of HARQ buffer-aware transmission schemes. The optimization of baseband compression for multiple-antenna links is also studied, demonstrating the optimality of a transform coding approach.
Given a probability measure $\mu$ over $\mathbb{R}^n$, it is often useful to approximate it by the convex combination of a small number of probability measures, such that each component is close to a product measure. Recently, Ronen Eldan used a stochastic localization argument to prove a general decomposition result of this type. In Eldan's theorem, the "number of components" is characterized by the entropy of the mixture, and "closeness to product" is characterized by the covariance matrix of each component. We present an elementary proof of Eldan's theorem which makes use of an information theory (or estimation theory) interpretation. The proof is analogous to the one of an earlier decomposition result known as the "pinning lemma".
A communication setup is considered where a transmitter wishes to convey a message to a receiver and simultaneously estimate the state of that receiver through a common waveform. The state is estimated at the transmitter by means of generalized feedback, i.e., a strictly causal channel output, and the known waveform. The scenario at hand is motivated by joint radar and communication, which aims to co-design radar sensing and communication over shared spectrum and hardware. For the case of memoryless single receiver channels with i.i.d. time-varying state sequences, we fully characterize the capacity-distortion tradeoff, defined as the largest achievable rate below which a message can be conveyed reliably while satisfying some distortion constraints on state sensing. We propose a numerical method to compute the optimal input that achieves the capacity-distortion tradeoff. Then, we address memoryless state-dependent broadcast channels (BCs). For physically degraded BCs with i.i.d. time-varying state sequences, we characterize the capacity-distortion tradeoff region as a rather straightforward extension of single receiver channels. For general BCs, we provide inner and outer bounds on the capacity-distortion region, as well as a sufficient condition under which this capacity-distortion region is equal to the product of the capacity region and the set of achievable distortions. A number of illustrative examples demonstrate that the optimal co-design schemes outperform conventional schemes that split the resources between sensing and communication.