
The Imaginary Sliding Window As a New Data Structure for Adaptive Algorithms

Posted by Boris Ryabko
Publication date: 2008
Research field: Informatics Engineering
Paper language: English
Author: Boris Ryabko





The scheme of the sliding window is known in Information Theory, Computer Science, the problem of prediction, and in statistics. Let a source with unknown statistics generate some word $\ldots x_{-1}x_{0}x_{1}x_{2}\ldots$ over some alphabet $A$. For every moment $t$, $t = \ldots, -1, 0, 1, \ldots$, one stores the word (window) $x_{t-w} x_{t-w+1} \ldots x_{t-1}$, where $w$, $w \geq 1$, is called the window length. In the theory of universal coding, the code of $x_{t}$ depends on the source statistics estimated from the window; in the problem of prediction, each letter $x_{t}$ is predicted using the information in the window, etc. After that, the letter $x_{t}$ is included in the window on the right, while $x_{t-w}$ is removed from the window. This is the sliding window scheme. The scheme has two merits: it allows one i) to estimate the source statistics quite precisely and ii) to adapt the code in case of a change in the source statistics. However, the scheme has a defect, namely, the necessity to store the window (i.e. the word $x_{t-w} \ldots x_{t-1}$), which requires a large memory for large $w$. A new scheme named the Imaginary Sliding Window (ISW) is constructed. The gist of this scheme is that not the oldest element $x_{t-w}$ but rather a randomly chosen one is removed from the window. This allows one to retain both merits of the sliding window while avoiding the need to store the window itself, thus significantly decreasing the required memory.
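As a rough illustration of the idea, the sketch below keeps only per-letter counts of an imaginary window of length $w$ (a minimal sketch under assumed names and initialization, not the paper's exact construction): when a new letter arrives, a letter chosen at random in proportion to its current count is removed, and the new letter's count is incremented, so the window itself need never be stored.

```python
import random
from collections import Counter

class ImaginarySlidingWindow:
    """Minimal sketch of the ISW idea (names and initialization are assumptions):
    only letter counts for an imaginary window of length w are kept,
    never the window contents themselves."""

    def __init__(self, w, alphabet):
        self.w = w
        # any count vector summing to w can serve as an initial "window"
        self.counts = Counter()
        for i in range(w):
            self.counts[alphabet[i % len(alphabet)]] += 1

    def update(self, letter):
        # remove a *random* element of the imaginary window:
        # letter a is removed with probability counts[a] / w
        letters, weights = zip(*self.counts.items())
        removed = random.choices(letters, weights=weights, k=1)[0]
        self.counts[removed] -= 1
        if self.counts[removed] == 0:
            del self.counts[removed]
        # the newly observed letter enters the window "on the right"
        self.counts[letter] += 1

    def estimate(self, letter):
        # frequency of `letter` in the imaginary window, usable for coding or prediction
        return self.counts[letter] / self.w

# usage: binary source, imaginary window of length 100
isw = ImaginarySlidingWindow(w=100, alphabet="01")
for symbol in "0110100100":
    isw.update(symbol)
print(isw.estimate("1"))
```

Only one counter per alphabet letter is kept, so memory grows with $|A|$ rather than with $w$.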


Read also

The basic goal of threshold group testing is to identify up to $d$ defective items among a population of $n$ items, where $d$ is usually much smaller than $n$. The outcome of a test on a subset of items is positive if the subset has at least $u$ defective items, negative if it has up to $\ell$ defective items, where $0 \leq \ell < u$, and arbitrary otherwise. This is called threshold group testing. The parameter $g = u - \ell - 1$ is called \textit{the gap}. In this paper, we focus on the case $g > 0$, i.e., threshold group testing with a gap. Note that the results presented here are also applicable to the case $g = 0$; however, the results are not as efficient as those in related work. Currently, a few reported studies have investigated test designs and decoding algorithms for identifying defective items. Most of the previous studies have not been feasible because there are numerous constraints on their problem settings or the decoding complexities of their proposed schemes are relatively large. Therefore, it is compulsory to reduce the number of tests as well as the decoding complexity, i.e., the time for identifying the defective items, in order to achieve practical schemes. The work presented here makes five contributions. The first is a more accurate theorem for a non-adaptive algorithm for threshold group testing proposed by Chen and Fu. The second is an improvement in the construction of disjunct matrices, which are the main tools for tackling (threshold) group testing and other tasks such as constructing cover-free families or learning hidden graphs. The third and fourth contributions are a reduced exact upper bound on the number of tests and a reduced asymptotic bound on the decoding time for identifying defective items in a noisy setting on test outcomes. The fifth contribution is a simulation on the number of tests of the resulting improvements for previous work and the proposed theorems.
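To make the test model concrete, here is a small simulation of a single threshold test with a gap (an illustrative sketch only; the function name and the random outcome in the gap region are choices made here, not part of the paper):

```python
import random

def threshold_test(tested_items, defectives, u, ell):
    """Outcome of one threshold group test with gap g = u - ell - 1.
    Positive if the tested subset contains at least u defectives,
    negative if it contains at most ell, and arbitrary otherwise."""
    k = len(set(tested_items) & set(defectives))
    if k >= u:
        return True                    # positive
    if k <= ell:
        return False                   # negative
    return random.random() < 0.5       # gap region: outcome is arbitrary

# example: n = 10 items, defectives {2, 5, 7}, thresholds ell = 1, u = 3 (gap g = 1)
print(threshold_test(range(10), {2, 5, 7}, u=3, ell=1))   # True: 3 >= u defectives tested
```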
The goal of group testing is to efficiently identify a few specific items, called positives, in a large population of items via tests. A test is an action on a subset of items which returns positive if the subset contains at least one positive and negative otherwise. In non-adaptive group testing, all tests are independent, can be performed in parallel, and can be represented as a measurement matrix. In this work, we consider non-adaptive group testing with consecutive positives, in which the items are linearly ordered and the positives are consecutive in that order. We propose two improved algorithms for efficiently identifying consecutive positives. In particular, without storing measurement matrices, we can identify up to $d$ consecutive positives with $2 \log_2{\frac{n}{d}} + 2d$ ($4 \log_2{\frac{n}{d}} + 2d$, resp.) tests in $O\left( \log_2^2{\frac{n}{d}} + d \right)$ ($O\left( \log_2{\frac{n}{d}} + d \right)$, resp.) time. These results significantly improve on the state-of-the-art scheme, which takes $5 \log_2{\frac{n}{d}} + 2d + 21$ tests to identify the positives in $O\left( \frac{n}{d} \log_2{\frac{n}{d}} + d^2 \right)$ time with the measurement matrices associated with the scheme stored somewhere.
Transcripts generated by automatic speech recognition (ASR) systems for spoken documents lack structural annotations such as paragraphs, significantly reducing their readability. Automatically predicting paragraph segmentation for spoken documents may improve both readability and downstream NLP performance such as summarization and machine reading comprehension. We propose a sequence model with a self-adaptive sliding window for accurate and efficient paragraph segmentation. We also propose an approach to exploit phonetic information, which significantly improves the robustness of spoken document segmentation to ASR errors. Evaluations are conducted on the English Wiki-727K document segmentation benchmark, a Chinese Wikipedia-based document segmentation dataset we created, and an in-house Chinese spoken document dataset. Our proposed model outperforms the state-of-the-art (SOTA) model based on the same BERT-Base, increasing segmentation F1 on the English benchmark by 4.2 points and on the Chinese datasets by 4.3-10.1 points, while reducing inference time to less than 1/6 of that of the current SOTA.
The advent of massive datasets (and the consequent design of high-performing distributed storage systems) has reignited the interest of the scientific and engineering community in the design of lossless data compressors which achieve an effective compression ratio and very efficient decompression speed. Lempel-Ziv's LZ77 algorithm is the de facto choice in this scenario because of its decompression speed and its flexibility in trading decompression speed versus compressed-space efficiency. Each of the existing implementations offers a trade-off between space occupancy and decompression speed, so software engineers have to content themselves with picking the one which comes closest to the requirements of the application at hand. Starting from these premises, and for the first time in the literature, we address in this paper the problem of trading optimally, and in a principled way, the consumption of these two resources by introducing the Bicriteria LZ77-Parsing problem, which formalizes what data compressors have traditionally approached by means of heuristics. The goal is to determine an LZ77 parsing which minimizes the space occupancy in bits of the compressed file, provided that the decompression time is bounded by a fixed amount (or vice versa). This way, the software engineer can set the space (or time) requirements and then derive the LZ77 parsing which optimizes the decompression speed (or the space occupancy, respectively). We solve this problem efficiently in $O(n \log^2 n)$ time and optimal linear space within a small, additive approximation, by proving and deploying some specific structural properties of the weighted graph derived from the possible LZ77 parsings of the input file. A preliminary set of experiments shows that our novel proposal dominates all the highly engineered competitors, hence offering a win-win situation in theory and practice.
We investigate error propagation in sliding window decoding of braided convolutional codes (BCCs). Previous studies of BCCs have focused on iterative decoding thresholds, minimum distance properties, and their bit error rate (BER) performance at small to moderate frame length. Here, we consider a sliding window decoder in the context of large frame length or one that continuously outputs blocks in a streaming fashion. In this case, decoder error propagation, due to the feedback inherent in BCCs, can be a serious problem. In order to mitigate the effects of error propagation, we propose several schemes: a \emph{window extension algorithm} where the decoder window size can be extended adaptively, a resynchronization mechanism where we reset the encoder to the initial state, and a retransmission strategy where erroneously decoded blocks are retransmitted. In addition, we introduce a soft BER stopping rule to reduce computational complexity, and the tradeoff between performance and complexity is examined. Simulation results show that, using the proposed window extension algorithm, resynchronization mechanism, and retransmission strategy, the BER performance of BCCs can be improved by up to four orders of magnitude in the signal-to-noise ratio operating range of interest, and in addition the soft BER stopping rule can be employed to reduce computational complexity.
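The way the mitigation schemes fit together can be pictured as a decoder control loop like the one below (a schematic sketch only; the helper functions, parameters, and thresholds are hypothetical placeholders, not the paper's actual BCC decoder):

```python
import random

# --- hypothetical placeholders standing in for the real BCC decoder machinery ---
def decode_window(blocks, t, w):
    """Placeholder: decode target block t using a window of w blocks;
    returns (decoded_block, estimated_soft_BER)."""
    return blocks[t], random.uniform(0.0, 0.02)

def retransmit(t):
    """Placeholder: request a fresh copy of block t."""
    return f"block-{t}"

def resynchronize():
    """Placeholder: reset encoder and decoder to the initial state."""
    pass

def decode_stream(blocks, w0=6, w_max=12, ber_threshold=0.01, max_retx=1):
    """Schematic control loop: soft BER stopping rule, adaptive window
    extension, retransmission, and resynchronization as a last resort."""
    decoded_blocks = []
    for t in range(len(blocks)):
        w, retx = w0, 0
        while True:
            decoded, soft_ber = decode_window(blocks, t, w)
            if soft_ber <= ber_threshold:      # soft BER stopping rule satisfied
                break
            if w < w_max:                      # window extension: grow the window adaptively
                w += 1
            elif retx < max_retx:              # retransmit the erroneously decoded block
                blocks[t] = retransmit(t)
                retx, w = retx + 1, w0
            else:                              # give up on this block: resynchronize and move on
                resynchronize()
                break
        decoded_blocks.append(decoded)
    return decoded_blocks

print(decode_stream([f"block-{i}" for i in range(5)]))
```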