
Word statistics in Blogs and RSS feeds: Towards empirical universal evidence

Published by: Renaud Lambiotte
Publication date: 2007
Research field: Informatics Engineering
Research language: English





We focus on the statistics of word occurrences and of the waiting times between such occurrences in Blogs. Because word frequencies are heterogeneous, the empirical analysis is performed on classes of frequency-equivalent words, i.e. by grouping words according to their frequencies. Two limiting cases are considered: the dilute limit, i.e. words that are used less than once a day, and the dense limit, for frequent words. In both cases, extreme events occur more frequently than expected under the Poisson hypothesis. These deviations from Poisson statistics reveal non-trivial time correlations between events, which are associated with bursts of activity. The distribution of waiting times is shown to behave like a stretched exponential and to have the same shape for different sets of words sharing a common frequency, thereby revealing universal features.
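The comparison with the Poisson hypothesis can be illustrated with a minimal sketch. It is not the paper's analysis: the function names, the two diagnostics (coefficient of variation and tail ratio) and the toy timestamps are assumptions chosen for illustration. For a Poisson process the waiting times are exponentially distributed, so the coefficient of variation is 1 and the tail ratio is close to 1; bursty activity pushes both above 1.

```python
import math

def waiting_times(timestamps):
    """Return the gaps between consecutive occurrences of a word."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]

def coefficient_of_variation(gaps):
    """Standard deviation over mean; 1 for an exponential distribution."""
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return math.sqrt(var) / mean

def tail_excess(gaps, threshold):
    """Empirical P(gap > threshold) divided by the exponential
    prediction exp(-threshold / mean) with the same mean gap."""
    mean = sum(gaps) / len(gaps)
    empirical = sum(g > threshold for g in gaps) / len(gaps)
    return empirical / math.exp(-threshold / mean)

# Hypothetical occurrence times (in days): two bursts separated by a long silence.
bursty = [0.0, 0.1, 0.2, 0.3, 10.0, 10.1, 10.2, 10.3]
# Hypothetical perfectly regular usage, once a day.
regular = [0.0, 1.0, 2.0, 3.0, 4.0]

bursty_gaps = waiting_times(bursty)
regular_gaps = waiting_times(regular)
```

On the bursty toy series, the long silence between bursts is an extreme waiting time that an exponential law with the same mean would make much rarer, which is the kind of deviation the abstract describes.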


Read also

Random linear network codes can be designed and implemented in a distributed manner, with low computational complexity. However, these codes are classically implemented over finite fields whose size depends on some global network parameters (the size of the network, the number of sinks) that may not be known prior to code design. Also, if new nodes join, the entire network code may have to be redesigned. In this work, we present the first universal and robust distributed linear network coding schemes. Our schemes are universal since they are independent of all network parameters. They are robust since if nodes join or leave, the remaining nodes do not need to change their coding operations and the receivers can still decode. They are distributed since nodes need only have topological information about the part of the network upstream of them, which can be naturally streamed as part of the communication protocol. We present both probabilistic and deterministic schemes that are all asymptotically rate-optimal in the coding block-length, and have guarantees of correctness. Our probabilistic designs are computationally efficient, with order-optimal complexity. Our deterministic designs guarantee zero-error decoding, albeit via codes with high computational complexity in general. Our coding schemes are based on network codes over "scalable fields". Instead of choosing coding coefficients from one field at every node, each node uses linear coding operations over an "effective field size" that depends on the node's distance from the source node. The analysis of our schemes requires technical tools that may be of independent interest. In particular, we generalize the Schwartz-Zippel lemma by proving a non-uniform version, wherein variables are chosen from sets of possibly different sizes. We also provide a novel robust distributed algorithm to assign unique IDs to network nodes.
Boris Ryabko, 2018
Suppose there is a large file which should be transmitted (or stored) and there are several (say, m) admissible data compressors. It seems natural to try all the compressors and then choose the best, i.e. the one that gives the shortest compressed file, and then transfer (or store) the index number of the best compressor (which requires log m bits) together with the compressed file. The only problem is the time, which increases substantially because the file must be compressed m times in order to find the best compressor. We propose a method that encodes the file with the optimal compressor but uses relatively little additional time: the ratio of this extra time to the total time of calculation can be bounded by an arbitrary positive constant. Generally speaking, in many situations it may be necessary to find the best data compressor out of a given set, which is often done by comparing them empirically. One of the goals of this work is to turn such a selection process into a part of the data compression method, automating and optimizing it.
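The "try all compressors" baseline that the abstract starts from can be sketched as follows. This is only the naive scheme whose cost motivates the paper, not the proposed time-efficient method. The choice of three standard-library codecs, the names `COMPRESSORS`, `encode` and `decode`, and the use of a whole byte for the index (rather than log m bits) are assumptions made for illustration.

```python
import bz2
import lzma
import zlib

# Hypothetical set of m = 3 admissible compressors (standard-library codecs).
COMPRESSORS = [
    (zlib.compress, zlib.decompress),
    (bz2.compress, bz2.decompress),
    (lzma.compress, lzma.decompress),
]

def encode(data: bytes) -> bytes:
    """Compress with every codec, keep the shortest output, prefix its index."""
    outputs = [compress(data) for compress, _ in COMPRESSORS]
    best = min(range(len(outputs)), key=lambda i: len(outputs[i]))
    # A whole byte for the index is a simplification; log2(m) bits would suffice.
    return bytes([best]) + outputs[best]

def decode(blob: bytes) -> bytes:
    """Read the index byte and apply the matching decompressor."""
    _, decompress = COMPRESSORS[blob[0]]
    return decompress(blob[1:])
```

Here `encode` runs all m compressors over the whole input, so its running time grows with m; that is the overhead the abstract says the proposed method avoids.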
Non-Orthogonal Multiple Access (NOMA) has been proposed to enhance Spectrum Efficiency (SE) and cell-edge capacity. This paper considers massive Multi-Input Multi-Output (MIMO) with NOMA encoding. A closed-form expression for the capacity of massive MIMO with NOMA is given. Apart from the previous Successive Interference Cancellation (SIC) method, the Power Hard Limiter (PHD) is introduced for a more realistic implementation.
Shirin Jalali, 2018
Quantized maximum a posteriori (Q-MAP) is a recently proposed Bayesian compressed sensing algorithm that, given the source distribution, recovers $X^n$ from its linear measurements $Y^m=AX^n$, where $A\in\mathbb{R}^{m\times n}$ denotes the known measurement matrix. On the other hand, Lagrangian minimum entropy pursuit (L-MEP) is a universal compressed sensing algorithm that aims at recovering $X^n$ from its linear measurements $Y^m=AX^n$, without having access to the source distribution. Both Q-MAP and L-MEP provably achieve the minimum required sampling rates, in noiseless cases where such fundamental limits are known. L-MEP is based on minimizing a cost function that consists of a linear combination of the conditional empirical entropy of a potential reconstruction vector and its corresponding measurement error. In this paper, using a first-order linear approximation of the conditional empirical entropy function, L-MEP is connected with Q-MAP. The established connection between L-MEP and Q-MAP leads to variants of Q-MAP which have the same asymptotic performance as Q-MAP in terms of their required sampling rates. Moreover, these variants suggest that Q-MAP is robust to small errors in estimating the source distribution. This robustness is theoretically proven and the effect of a non-vanishing estimation error on the required sampling rate is characterized.
We study a problem of sequential frame synchronization for a frame transmitted uniformly in $A$ slots. For a discrete memoryless channel (DMC), Venkat Chandar et al. showed that the frame length $N$ must scale with $A$ as $e^{N\alpha(Q)} > A$ for the frame synchronization error to go to zero (asymptotically with $A$). Here, $Q$ denotes the transition probabilities of the DMC and $\alpha(Q)$, defined as the synchronization threshold, characterizes the scaling of $N$ needed for asymptotically error-free frame synchronization. We show that the asynchronous communication framework permits a natural tradeoff between the sync frame length $N$ and the channel (usually parameterised by the input). For an AWGN channel, we study this tradeoff between the sync frame length $N$ and the input symbol power $P$ and characterise the scaling needed of the sync frame energy $E = N P$ for optimal frame synchronisation.
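As a small worked step (plain algebra on the stated condition, assuming $\alpha(Q) > 0$), taking logarithms shows that the required frame length grows only logarithmically with the number of slots:

```latex
e^{N\alpha(Q)} > A
\quad\Longleftrightarrow\quad
N\,\alpha(Q) > \ln A
\quad\Longleftrightarrow\quad
N > \frac{\ln A}{\alpha(Q)} .
```

A larger synchronization threshold $\alpha(Q)$ therefore permits a shorter sync frame for the same $A$.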