For a stationary stochastic process $\{X_n\}$ with values in some set $A$, a finite word $w \in A^K$ is called a memory word if the conditional probability of $X_0$ given the past is constant on the cylinder set defined by $X_{-K}^{-1}=w$. It is called a minimal memory word if no proper suffix of $w$ is also a memory word. For example, in a $K$-step Markov process all words of length $K$ are memory words, but they are not necessarily minimal. We consider the problem of determining the length of the longest minimal memory word and the length of the shortest memory word of an unknown process $\{X_n\}$ by sequentially observing the outputs of a single sample path $\{\xi_1,\xi_2,\ldots,\xi_n\}$. We give a universal estimator which converges almost surely to the length of the longest minimal memory word and show that no such universal estimator exists for the length of the shortest memory word. The alphabet $A$ may be finite or countable.
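To make the defining property concrete, here is one way to write it out (an illustrative restatement, not taken verbatim from the paper): $w=(w_1,\ldots,w_K)\in A^K$ is a memory word if, for every $a\in A$ and every past $x_{-\infty}^{-K-1}$ compatible with the process,
$$\Pr\bigl(X_0=a \mid X_{-\infty}^{-1}=x_{-\infty}^{-K-1}w\bigr) = \Pr\bigl(X_0=a \mid X_{-K}^{-1}=w\bigr),$$
that is, once the most recent $K$ symbols equal $w$, the earlier past carries no further information about $X_0$.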
Let $A$ be an $n \times m$ matrix with $m>n$, and suppose that the underdetermined linear system $As=x$ admits a sparse solution $s_0$ for which $\|s_0\|_0 < \frac{1}{2}\,\mathrm{spark}(A)$. Such a sparse solution is unique due to a well-known uniqueness theorem. Suppose now that we have somehow obtained a solution $\hat{s}$ as an estimate of $s_0$, and that $\hat{s}$ is only approximately sparse, that is, many of its components are very small and nearly zero, but not mathematically equal to zero. Is such a solution necessarily close to the true sparsest solution? More generally, is it possible to construct an upper bound on the estimation error $\|\hat{s}-s_0\|_2$ without knowing $s_0$? The answer is positive, and in this paper we construct such a bound based on the minimal singular values of submatrices of $A$. We also state a tight bound, which is more complicated but, besides being tight, enables us to study the case of random dictionaries and obtain probabilistic upper bounds. We also study the noisy case, that is, $x=As+n$. Moreover, we show that as $\|s_0\|_0$ grows, $\hat{s}$ must be approximately sparse to a better degree in order to achieve a predetermined guarantee on the maximum of $\|\hat{s}-s_0\|_2$. This can be seen as an explanation of the fact that the estimation quality of sparse recovery algorithms degrades as $\|s_0\|_0$ grows.
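For reference, the uniqueness theorem alluded to above can be stated as follows (a standard result, recalled here in our own words): if $As_0=x$ and $\|s_0\|_0 < \frac{1}{2}\,\mathrm{spark}(A)$, where $\mathrm{spark}(A)$ is the smallest number of columns of $A$ that are linearly dependent, then $s_0$ is the unique sparsest solution of $As=x$. Indeed, if $As=As_0=x$ with $s\neq s_0$, then $A(s-s_0)=0$, hence $\|s-s_0\|_0 \geq \mathrm{spark}(A)$, and therefore
$$\|s\|_0 \;\geq\; \mathrm{spark}(A) - \|s_0\|_0 \;>\; \|s_0\|_0.$$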
Differential privacy has become a widely accepted notion of privacy, leading to the introduction and deployment of numerous privatization mechanisms. However, ensuring the privacy guarantee is an error-prone process, both in designing mechanisms and in implementing them. Both types of errors would be greatly reduced if we had a data-driven approach to verify privacy guarantees from black-box access to a mechanism. We pose this as a property estimation problem and study the fundamental trade-off between the accuracy of the estimated privacy guarantee and the number of samples required. We introduce a novel estimator that uses polynomial approximation of a carefully chosen degree to optimally trade off bias and variance. With $n$ samples, we show that this estimator achieves the performance of a straightforward plug-in estimator with $n \ln n$ samples, a phenomenon referred to as effective sample size amplification. The minimax optimality of the proposed estimator is proved by comparing it to a matching fundamental lower bound.
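To illustrate what a data-driven, black-box check of a privacy guarantee might look like, here is a minimal Python sketch (ours, not the paper's estimator) of the straightforward plug-in approach: sample the mechanism's output on two neighbouring datasets, estimate the two output distributions empirically, and read off the worst-case log-likelihood ratio as a naive estimate of $\epsilon$. The function names and the discretized Laplace-style mechanism used for the demonstration are illustrative assumptions.

    import math
    import random
    from collections import Counter

    def plug_in_epsilon(samples_p, samples_q):
        # Naive plug-in estimate of epsilon: empirical distributions on each
        # sample set, then the largest absolute log-ratio over outcomes seen
        # in both sets (biased, and needs many samples to be reliable).
        n_p, n_q = len(samples_p), len(samples_q)
        p_hat, q_hat = Counter(samples_p), Counter(samples_q)
        eps = 0.0
        for x in set(p_hat) & set(q_hat):
            ratio = (p_hat[x] / n_p) / (q_hat[x] / n_q)
            eps = max(eps, abs(math.log(ratio)))
        return eps

    def mechanism(true_count, scale=1.0):
        # Illustrative mechanism: count plus rounded Laplace noise
        # (difference of two exponentials with mean `scale`).
        return round(true_count + random.expovariate(1 / scale) - random.expovariate(1 / scale))

    n = 100_000
    samples_p = [mechanism(0) for _ in range(n)]  # neighbouring dataset with count 0
    samples_q = [mechanism(1) for _ in range(n)]  # neighbouring dataset with count 1
    print("plug-in epsilon estimate:", plug_in_epsilon(samples_p, samples_q))

The paper's point is precisely that such a plug-in approach needs roughly $n \ln n$ samples to match what the proposed polynomial-approximation estimator achieves with $n$ samples.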
Practically good error-correcting codes should have good parameters and efficient decoding algorithms. Some algebraically defined good codes, such as cyclic codes, Reed-Solomon codes, and Reed-Muller codes, have nice decoding algorithms. However, many optimal linear codes do not have an efficient decoding algorithm except for general syndrome decoding, which requires a lot of memory. Therefore, it is a natural question to ask which optimal linear codes admit efficient decoding. We show that two binary optimal $[36,19,8]$ linear codes and two binary optimal $[40,22,8]$ codes have an efficient decoding algorithm; no efficient decoding algorithm was previously known for the binary optimal $[36,19,8]$ and $[40,22,8]$ codes. We project them onto the much shorter linear $[9,5,4]$ and $[10,6,4]$ codes over $GF(4)$, respectively. This decoding algorithm, called {\em projection decoding}, can correct errors of weight up to 3. These $[36,19,8]$ and $[40,22,8]$ codes have more codewords than any optimal self-dual $[36,18,8]$ and $[40,20,8]$ codes, respectively, for the given length and minimum weight, implying that these codes are more practical.
Total correlation (TC) is a fundamental concept in information theory that measures the statistical dependency of multiple random variables. Recently, TC has shown effectiveness as a regularizer in many machine learning tasks where minimizing/maximizing the correlation among random variables is required. However, obtaining precise TC values is challenging, especially when the closed-form distributions of the variables are unknown. In this paper, we introduce several sample-based variational TC estimators. Specifically, we connect TC with mutual information (MI) and construct two calculation paths to decompose TC into MI terms. In our experiments, we estimate the true TC values with the proposed estimators in different simulation scenarios and analyze the properties of the TC estimators.
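As a concrete illustration of one such decomposition (a standard identity, not necessarily one of the two calculation paths used in the paper), the total correlation of $X_1,\ldots,X_n$,
$$\mathrm{TC}(X_1,\ldots,X_n) = \sum_{i=1}^{n} H(X_i) - H(X_1,\ldots,X_n),$$
can be rewritten, using the chain rule for entropy, as a telescoping sum of mutual information terms,
$$\mathrm{TC}(X_1,\ldots,X_n) = \sum_{i=2}^{n} I(X_i;\, X_1,\ldots,X_{i-1}),$$
so that each term can in principle be handled by a sample-based variational MI estimator.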
Polar codes with memory (PCM) are proposed in this paper: a pair of consecutive code blocks share a controlled number of mutual information bits. The shared mutual information bits of the successfully decoded block can help the failed block to recover. The underlying polar codes can employ any decoding scheme, such as successive cancellation (SC) decoding (PCM-SC), belief propagation (BP) decoding (PCM-BP), and successive cancellation list (SCL) decoding (PCM-SCL). The analysis shows that the packet error rate (PER) of PCM decreases to the order of the PER squared while maintaining the same complexity as the underlying polar codes. Simulation results indicate that for PCM-SC, the PER is comparable to (within 0.3 dB of) stand-alone SCL decoding with two lists for block length $N=256$. The PER of PCM-SCL with $L$ lists can match that of stand-alone SCL decoding with $2L$ lists. Two hardware decoders for PCM are also implemented: the in-serial (IS) decoder and the low-latency interleaved (LLI) decoder. For $N=256$, synthesis results show that, in the worst case, the latency of the PCM LLI decoder is only $16.1\%$ of that of the adaptive SCL decoder with $L=2$, while the throughput is improved by 13 times.
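A rough heuristic for the quoted PER behaviour (our simplification, assuming independent block failures and that a failed block is always recovered whenever its paired block decodes correctly): if each stand-alone block fails with probability $p$, then a block remains in error only when both blocks of the pair fail, so
$$\mathrm{PER}_{\mathrm{PCM}} \approx p^{2},$$
i.e. the PER drops to the order of the square of the underlying polar code's PER.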