This paper presents the first evaluation framework for Web search query segmentation that is based directly on IR performance. In the past, segmentation strategies were mainly validated against manual annotations. Our work shows that the quality of a segmentation algorithm, as judged by evaluation against a handful of human-annotated segmentations, hardly reflects its effectiveness in an IR-based setup. In fact, state-of-the-art algorithms are shown to perform as well as, and sometimes even better than, human annotations, a fact masked by previous validations. The proposed framework also provides an objective measure of the gap between the present best and the best possible segmentation algorithm. We draw these conclusions from an extensive evaluation of six segmentation strategies, including the three most recent algorithms, vis-à-vis segmentations from three human annotators. The evaluation framework also gives insight into which segments must be detected by an algorithm to achieve the best retrieval results. The meticulously constructed dataset used in our experiments has been made public for use by the research community.
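To make the idea of a "best possible segmentation" concrete, here is a minimal sketch, not the authors' framework. It assumes two things not stated in the abstract: that a segmentation is turned into a search query by wrapping each multiword segment in double quotes, and that some retrieval metric is available as a black-box `score` function (a hypothetical stand-in for, e.g., an nDCG value computed against relevance judgments). The per-query oracle then simply tries all 2^(n-1) segmentations of an n-word query and keeps the best-scoring one.

```python
from itertools import product
from typing import Callable, Iterator, List

Segmentation = List[List[str]]


def segmentations(words: List[str]) -> Iterator[Segmentation]:
    """Yield every split of `words` into contiguous segments (2**(n-1) in total)."""
    n = len(words)
    if n == 0:
        yield []
        return
    # Each of the n-1 gaps between adjacent words either is or is not a boundary.
    for breaks in product([False, True], repeat=n - 1):
        segments, current = [], [words[0]]
        for word, is_break in zip(words[1:], breaks):
            if is_break:
                segments.append(current)
                current = [word]
            else:
                current.append(word)
        segments.append(current)
        yield segments


def to_quoted_query(segments: Segmentation) -> str:
    """Assumed query formulation: quote multiword segments, leave single words bare."""
    return " ".join('"' + " ".join(s) + '"' if len(s) > 1 else s[0] for s in segments)


def oracle_segmentation(words: List[str], score: Callable[[str], float]) -> Segmentation:
    """Return the segmentation whose quoted query gets the highest (hypothetical) IR score."""
    return max(segmentations(words), key=lambda segs: score(to_quoted_query(segs)))
```

For example, with a toy score that rewards only `"new york" hotels`, `oracle_segmentation(["new", "york", "hotels"], score)` returns `[["new", "york"], ["hotels"]]`. Exhaustive enumeration is only practical because Web queries are short; the gap between an algorithm's score and this oracle's score is one way to read the "present best versus best possible" comparison.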
It is well known that the degree distribution (DD) of the nodes in one partition of a bipartite network influences the DD of its one-mode projection on that partition. However, no studies have explored the effect of the DD of the other partition on the one-mode projection. In this article, we show that the DD of the other partition in fact has a very strong influence on the DD of the one-mode projection. We establish this by deriving exact or approximate closed forms of the DD of the one-mode projection, applying the generating function formalism followed by the method of iterative convolution. The results are cross-validated through appropriate simulations.
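The role of the other partition can be illustrated numerically with the standard generating-function argument for bipartite projections, sketched here under a simplifying assumption that may differ from the article's exact derivation: neighbours are counted with multiplicity (a locally tree-like network with no overlapping neighbourhoods). If a projected node has degree k with probability p[k], and each of its k neighbours in the other partition contributes its excess degree, drawn from q1[j] = (j+1) q[j+1] / <j>, then the projected DD is the mixture over k of the k-fold convolution of q1. The function names below are illustrative only.

```python
import numpy as np


def excess_distribution(q):
    """Excess-degree distribution q1[j] = (j+1) q[j+1] / mean(q); requires mean(q) > 0."""
    q = np.asarray(q, dtype=float)
    k = np.arange(len(q))
    mean_degree = np.sum(k * q)
    return (k[1:] * q[1:]) / mean_degree


def projection_distribution(p, q):
    """DD of the one-mode projection onto the partition with DD p, given the other partition's DD q.

    Computes sum_k p[k] * (q1 convolved with itself k times) by iterative convolution.
    """
    q1 = excess_distribution(q)
    result = np.zeros(1)
    power = np.array([1.0])  # zero-fold convolution: all mass at degree 0
    for k, pk in enumerate(p):
        if k > 0:
            power = np.convolve(power, q1)  # now the k-fold convolution of q1
        if pk > 0:
            if len(power) > len(result):
                result = np.pad(result, (0, len(power) - len(result)))
            result[: len(power)] += pk * power
    return result
```

As a sanity check, if every node of the other partition has degree 2, each neighbour contributes exactly one projected neighbour, so `projection_distribution([0, 0.5, 0.5], [0, 0, 1])` reproduces `[0, 0.5, 0.5]`. Broader distributions q spread the projected DD out, which is the effect the abstract describes.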
Life and language are discrete combinatorial systems (DCSs) in which the basic building blocks are finite sets of elementary units: nucleotides or codons in a DNA sequence, and letters or words in a language. Different combinations of these finite units give rise to a potentially infinite number of genes or sentences. This type of DCS can be represented as an Alphabetic Bipartite Network ($\alpha$-BiN) with two kinds of nodes: one type represents the elementary units, while the other represents their combinations. There is an edge between a node corresponding to an elementary unit $u$ and a node corresponding to a particular combination $v$ if $u$ is present in $v$. Naturally, the partition of nodes representing elementary units is fixed, while the other partition is allowed to grow without bound. Here, we extend recent analytical findings for $\alpha$-BiNs derived in [Peruani et al., Europhys. Lett. 79, 28001 (2007)] and empirically investigate two real-world systems: the codon-gene network and the phoneme-language network. The evolution equations for $\alpha$-BiNs under different growth rules are derived, and the corresponding degree distributions are computed. It is shown that, asymptotically, the degree distribution of $\alpha$-BiNs can be described by a family of beta distributions. The one-mode projections of the theoretical as well as the real-world $\alpha$-BiNs are also studied. We propose comparing real-world degree distributions with our theoretical predictions as a means of inferring the mechanisms underlying the growth of real-world systems.
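A growing $\alpha$-BiN is easy to simulate, which is how theoretical degree distributions are typically checked against data. The sketch below assumes one hypothetical growth rule, not necessarily any of the rules analysed in the paper: at each step a new combination node arrives with `mu` edges, attached to `mu` distinct elementary units chosen with probability proportional to (current degree + `gamma`). The fixed partition has `n_units` nodes; all names and parameters are illustrative.

```python
import numpy as np


def grow_alpha_bin(n_units, n_steps, mu, gamma, seed=0):
    """Simulate a growing alpha-BiN and return the degrees of the fixed (unit) partition.

    Hypothetical growth rule: each new combination node links to `mu` distinct units,
    picked with probability proportional to (degree + gamma). Requires gamma > 0
    and mu <= n_units.
    """
    rng = np.random.default_rng(seed)
    degree = np.zeros(n_units, dtype=float)
    for _ in range(n_steps):
        weights = degree + gamma  # gamma > 0 keeps every unit selectable
        probs = weights / weights.sum()
        chosen = rng.choice(n_units, size=mu, replace=False, p=probs)
        degree[chosen] += 1  # one new edge to each chosen unit
    return degree.astype(int)
```

Because each unit's degree can be at most `n_steps`, the normalized degrees `degree / n_steps` lie in [0, 1]; a histogram of these values is the natural object to compare with a beta family, in the spirit of the asymptotic result stated above. Small `gamma` strengthens the preferential (rich-get-richer) effect, while large `gamma` makes attachment nearly uniform.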