Advanced search powered by artificial intelligence

New community

Subscribe to the gold package and get unlimited access to Shamra Academy

A Simple Mechanism for Focused Web-harvesting

439 0 0.0 ( 0 )

Download Cite

Added by L.T. Handoko

Publication date 2008

fields Informatics Engineering

and research's language is English

Authors Z. Akbar - L.T. Handoko

Information Retrieval Computers and Society

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

The focused web-harvesting is deployed to realize an automated and comprehensive index databases as an alternative way for virtual topical data integration. The web-harvesting has been implemented and extended by not only specifying the targeted URLs, but also predefining human-edited harvesting parameters to improve the speed and accuracy. The harvesting parameter set comprises three main components. First, the depth-scale of being harvested final pages containing desired information counted from the first page at the targeted URLs. Secondly, the focus-point number to determine the exact box containing relevant information. Lastly, the combination of keywords to recognize encountered hyperlinks of relevant images or full-texts embedded in those final pages. All parameters are accessible and fully customizable for each target by the administrators of participating institutions over an integrated web interface. A real implementation to the Indonesian Scientific Index which covers all scientific information across Indonesia is also briefly introduced.

rate research

WTR: A Test Collection for Web Table Retrieval

171 - Zhiyu Chen , Shuo Zhang , Brian D. Davison 2021

We describe the development, characteristics and availability of a test collection for the task of Web table retrieval, which uses a large-scale Web Table Corpora extracted from the Common Crawl. Since a Web table usually has rich context information such as the page title and surrounding paragraphs, we not only provide relevance judgments of query-table pairs, but also the relevance judgments of query-table context pairs with respect to a query, which are ignored by previous test collections. To facilitate future research with this benchmark, we provide details about how the dataset is pre-processed and also baseline results from both traditional and recently proposed table retrieval methods. Our experimental results show that proper usage of context labels can benefit previous table retrieval methods.

Information Retrieval

A Study of CAPTCHAs for Securing Web Services

358 - M. Tariq Banday , N. A. Shah 2011

Atomizing various Web activities by replacing human to human interactions on the Internet has been made indispensable due to its enormous growth. However, bots also known as Web-bots which have a malicious intend and pretending to be humans pose a severe threat to various services on the Internet that implicitly assume a human interaction. Accordingly, Web service providers before allowing access to such services use various Human Interaction Proofs (HIPs) to authenticate that the user is a human and not a bot. Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a class of HIPs tests and are based on Artificial Intelligence. These tests are easier for humans to qualify and tough for bots to simulate. Several Web services use CAPTCHAs as a defensive mechanism against automated Web-bots. In this paper, we review the existing CAPTCHA schemes that have been proposed or are being used to protect various Web services. We classify them in groups and compare them with each other in terms of security and usability. We present general method used to generate and break text-based and image-based CAPTCHAs. Further, we discuss various security and usability issues in CAPTCHA design and provide guidelines for improving their robustness and usability.

Cryptography and Security Computers and Society

ReadNet: A Hierarchical Transformer Framework for Web Article Readability Analysis

113 - Changping Meng , Muhao Chen , Jie Mao 2021

Analyzing the readability of articles has been an important sociolinguistic task. Addressing this task is necessary to the automatic recommendation of appropriate articles to readers with different comprehension abilities, and it further benefits education systems, web information systems, and digital libraries. Current methods for assessing readability employ empirical measures or statistical learning techniques that are limited by their ability to characterize complex patterns such as article structures and semantic meanings of sentences. In this paper, we propose a new and comprehensive framework which uses a hierarchical self-attention model to analyze document readability. In this model, measurements of sentence-level difficulty are captured along with the semantic meanings of each sentence. Additionally, the sentence-level features are incorporated to characterize the overall readability of an article with consideration of article structures. We evaluate our proposed approach on three widely-used benchmark datasets against several strong baseline approaches. Experimental results show that our proposed method achieves the state-of-the-art performance on estimating the readability for various web articles and literature.

Information Retrieval Human-Computer Interaction

Context Models For Web Search Personalization

573 - Maksims Volkovs 2015

We present our solution to the Yandex Personalized Web Search Challenge. The aim of this challenge was to use the historical search logs to personalize top-N document rankings for a set of test users. We used over 100 features extracted from user- and query-depended contexts to train neural net and tree-based learning-to-rank and regression models. Our final submission, which was a blend of several different models, achieved an NDCG@10 of 0.80476 and placed 4th amongst the 194 teams winning 3rd prize.

Information Retrieval

Staging Transformations for Multimodal Web Interaction Management

95 - Michael Narayan , Chris Williams , Saverio Perugini 2003

Multimodal interfaces are becoming increasingly ubiquitous with the advent of mobile devices, accessibility considerations, and novel software technologies that combine diverse interaction media. In addition to improving access and delivery capabilities, such interfaces enable flexible and personalized dialogs with websites, much like a conversation between humans. In this paper, we present a software framework for multimodal web interaction management that supports mixed-initiative dialogs between users and websites. A mixed-initiative dialog is one where the user and the website take turns changing the flow of interaction. The framework supports the functional specification and realization of such dialogs using staging transformations -- a theory for representing and reasoning about dialogs based on partial input. It supports multiple interaction interfaces, and offers sessioning, caching, and co-ordination functions through the use of an interaction manager. Two case studies are presented to illustrate the promise of this approach.

Information Retrieval Programming Languages

comments

Fetching comments

Arab International University

Additional details More universities

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد

A Simple Mechanism for Focused Web-harvesting

Ask ChatGPT about the research

No Arabic abstract

Read More