We address the problem of cross-referencing text fragments with Wikipedia pages so that synonymy and polysemy are resolved accurately and efficiently. We take inspiration from a recent line of work [Cucerzan 2007, Mihalcea and Csomai 2007, Milne and Witten 2008, Chakrabarti et al. 2009] and extend its scenario from the annotation of long documents to the annotation of short texts, such as snippets of search-engine results, tweets, news, and blogs. These short and poorly composed texts pose new challenges for the efficiency and effectiveness of the annotation process, which we address by designing and engineering TAGME, the first system that performs accurate, on-the-fly annotation of these short textual fragments. A large set of experiments shows that TAGME outperforms state-of-the-art algorithms when they are adapted to work on short texts, and that it is fast and competitive on long texts.
We study the problem of entity salience by proposing the design and implementation of SWAT, a system that identifies the salient Wikipedia entities occurring in an input document. SWAT consists of several modules that are able to detect and classify
To provide quick and accurate search results, a search engine maintains a local snapshot of the entire web. To keep this local cache fresh, it employs a crawler that tracks changes across web pages. It would have been ideal if the cr
We argue that relationships between Web pages are functions of the user's intent. We identify a class of Web tasks - information-gathering - that can be facilitated by a search engine that provides links to pages which are related to the page the user
Blogs are becoming an increasingly popular medium for information publishing. Besides the main content, most blog pages nowadays also contain noisy information such as advertisements. Removing these unrelated elements can improve user experience
In multi-label text classification, each textual document can be assigned one or more labels. For this reason, the multi-label text classification task is often considered more challenging than the binary or multi-class text clas