ترغب بنشر مسار تعليمي؟ اضغط هنا

DTATG: An Automatic Title Generator based on Dependency Trees

82   0   0.0 ( 0 )
 نشر من قبل Liqun Shao
 تاريخ النشر 2017
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

We study automatic title generation for a given block of text and present a method called DTATG to generate titles. DTATG first extracts a small number of central sentences that convey the main meanings of the text and are in a suitable structure for conversion into a title. DTATG then constructs a dependency tree for each of these sentences and removes certain branches using a Dependency Tree Compression Model we devise. We also devise a title test to determine if a sentence can be used as a title. If a trimmed sentence passes the title test, then it becomes a title candidate. DTATG selects the title candidate with the highest ranking score as the final title. Our experiments showed that DTATG can generate adequate titles. We also showed that DTATG-generated titles have higher F1 scores than those generated by the previous methods.



قيم البحث

اقرأ أيضاً

In this paper, we revisit the challenging problem of unsupervised single-document summarization and study the following aspects: Integer linear programming (ILP) based algorithms, Parameterized normalization of term and sentence scores, and Title-dri ven approaches for summarization. We describe a new framework, NewsSumm, that includes many existing and new approaches for summarization including ILP and title-driven approaches. NewsSumms flexibility allows to combine different algorithms and sentence scoring schemes seamlessly. Our results combining sentence scoring with ILP and normalization are in contrast to previous work on this topic, showing the importance of a broader search for optimal parameters. We also show that the new title-driven reduction idea leads to improvement in performance for both unsupervised and supervised approaches considered.
In this work we discuss the related challenges and describe an approach towards the fusion of state-of-the-art technologies from the Spoken Dialogue Systems (SDS) and the Semantic Web and Information Retrieval domains. We envision a dialogue system n amed LD-SDS that will support advanced, expressive, and engaging user requests, over multiple, complex, rich, and open-domain data sources that will leverage the wealth of the available Linked Data. Specifically, we focus on: a) improving the identification, disambiguation and linking of entities occurring in data sources and user input; b) offering advanced query services for exploiting the semantics of the data, with reasoning and exploratory capabilities; and c) expanding the typical information seeking dialogue model (slot filling) to better reflect real-world conversational search scenarios.
With the increase of complexity of modern software, social collaborative coding and reuse of open source software packages become more and more popular, which thus greatly enhances the development efficiency and software quality. However, the explosi ve growth of open source software packages exposes developers to the challenge of information overload. While this can be addressed by conventional recommender systems, they usually do not consider particular constraints of social coding such as social influence among developers and dependency relations among software packages. In this paper, we aim to model the dynamic interests of developers with both social influence and dependency constraints, and propose the Session-based Social and Dependency-aware software Recommendation (SSDRec) model. This model integrates recurrent neural network (RNN) and graph attention network (GAT) into a unified framework. A RNN is employed to model the short-term dynamic interests of developers in each session and two GATs are utilized to capture social influence from friends and dependency constraints from dependent software packages, respectively. Extensive experiments are conducted on real-world datasets and the results demonstrate that our model significantly outperforms the competitive baselines.
Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally una vailable for specialized applications, such as detecting security related entities; moreover, manual annotation of corpora is very costly and often not a viable solution. In response, we develop a very precise method to automatically label text from several data sources by leveraging related, domain-specific, structured data and provide public access to a corpus annotated with cyber-security entities. Next, we implement a Maximum Entropy Model trained with the average perceptron on a portion of our corpus ($sim$750,000 words) and achieve near perfect precision, recall, and accuracy, with training times under 17 seconds.
The number of documents available into Internet moves each day up. For this reason, processing this amount of information effectively and expressibly becomes a major concern for companies and scientists. Methods that represent a textual document by a topic representation are widely used in Information Retrieval (IR) to process big data such as Wikipedia articles. One of the main difficulty in using topic model on huge data collection is related to the material resources (CPU time and memory) required for model estimate. To deal with this issue, we propose to build topic spaces from summarized documents. In this paper, we present a study of topic space representation in the context of big data. The topic space representation behavior is analyzed on different languages. Experiments show that topic spaces estimated from text summaries are as relevant as those estimated from the complete documents. The real advantage of such an approach is the processing time gain: we showed that the processing time can be drastically reduced using summarized documents (more than 60% in general). This study finally points out the differences between thematic representations of documents depending on the targeted languages such as English or latin languages.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا