Efficient Urdu Caption Generation using Attention based LSTM

64 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Inaam Ilahi

تاريخ النشر 2020

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Inaam Ilahi - Hafiz Muhammad Abdullah Zia - Muhammad Ahtazaz Ahsan

الحساب واللغة التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Recent advancements in deep learning have created many opportunities to solve real-world problems that remained unsolved for more than a decade. Automatic caption generation is a major research field, and the research community has done a lot of work on it in most common languages like English. Urdu is the national language of Pakistan and also much spoken and understood in the sub-continent region of Pakistan-India, and yet no work has been done for Urdu language caption generation. Our research aims to fill this gap by developing an attention-based deep learning model using techniques of sequence modeling specialized for the Urdu language. We have prepared a dataset in the Urdu language by translating a subset of the Flickr8k dataset containing 700 man images. We evaluate our proposed technique on this dataset and show that it can achieve a BLEU score of 0.83 in the Urdu language. We improve on the previous state-of-the-art by using better CNN architectures and optimization techniques. Furthermore, we provide a discussion on how the generated captions can be made correct grammar-wise.

قيم البحث

101 - Yu Yan , Jiusheng Chen , Weizhen Qi 2021

Transformer model with multi-head attention requires caching intermediate results for efficient inference in generation tasks. However, cache brings new memory-related costs and prevents leveraging larger batch size for faster speed. We propose memor y-efficient lossless attention (called EL-attention) to address this issue. It avoids heavy operations for building multi-head keys and values, cache for them is not needed. EL-attention constructs an ensemble of attention results by expanding query while keeping key and value shared. It produces the same result as multi-head attention with less GPU memory and faster inference speed. We conduct extensive experiments on Transformer, BART, and GPT-2 for summarization and question generation tasks. The results show EL-attention speeds up existing models by 1.6x to 5.3x without accuracy loss.

الحساب واللغة التعلم الآلي

Sea Ice Forecasting using Attention-based Ensemble LSTM

549 - Sahara Ali , Yiyi Huang , Xin Huang 2021

Accurately forecasting Arctic sea ice from subseasonal to seasonal scales has been a major scientific effort with fundamental challenges at play. In addition to physics-based earth system models, researchers have been applying multiple statistical an d machine learning models for sea ice forecasting. Looking at the potential of data-driven sea ice forecasting, we propose an attention-based Long Short Term Memory (LSTM) ensemble method to predict monthly sea ice extent up to 1 month ahead. Using daily and monthly satellite retrieved sea ice data from NSIDC and atmospheric and oceanic variables from ERA5 reanalysis product for 39 years, we show that our multi-temporal ensemble method outperforms several baseline and recently proposed deep learning models. This will substantially improve our ability in predicting future Arctic sea ice changes, which is fundamental for forecasting transporting routes, resource development, coastal erosion, threats to Arctic coastal communities and wildlife.

الفيزياء الجوية والمحيطية الذكاء الاصطناعي التعلم الآلي

Earlier Attention? Aspect-Aware LSTM for Aspect-Based Sentiment Analysis

117 - Bowen Xing , Lejian Liao , Dandan Song 2019

Aspect-based sentiment analysis (ABSA) aims to predict fine-grained sentiments of comments with respect to given aspect terms or categories. In previous ABSA methods, the importance of aspect has been realized and verified. Most existing LSTM-based m odels take aspect into account via the attention mechanism, where the attention weights are calculated after the context is modeled in the form of contextual vectors. However, aspect-related information may be already discarded and aspect-irrelevant information may be retained in classic LSTM cells in the context modeling process, which can be improved to generate more effective context representations. This paper proposes a novel variant of LSTM, termed as aspect-aware LSTM (AA-LSTM), which incorporates aspect information into LSTM cells in the context modeling stage before the attention mechanism. Therefore, our AA-LSTM can dynamically produce aspect-aware contextual representations. We experiment with several representative LSTM-based models by replacing the classic LSTM cells with the AA-LSTM cells. Experimental results on SemEval-2014 Datasets demonstrate the effectiveness of AA-LSTM.

الحساب واللغة

Fast Multi-language LSTM-based Online Handwriting Recognition

134 - Victor Carbune , Pedro Gonnet , Thomas Deselaers 2019

We describe an online handwriting system that is able to support 102 languages using a deep neural network architecture. This new system has completely replaced our previous Segment-and-Decode-based system and reduced the error rate by 20%-40% relati ve for most languages. Further, we report new state-of-the-art results on IAM-OnDB for both the open and closed dataset setting. The system combines methods from sequence recognition with a new input encoding using Bezier curves. This leads to up to 10x faster recognition times compared to our previous system. Through a series of experiments we determine the optimal configuration of our models and report the results of our setup on a number of additional public datasets.

الحساب واللغة التعلم الآلي التعلم الالي

Clue: Cross-modal Coherence Modeling for Caption Generation

156 - Malihe Alikhani , Piyush Sharma , Shengjie Li 2020

We use coherence relations inspired by computational models of discourse to study the information needs and goals of image captioning. Using an annotation protocol specifically devised for capturing image--caption coherence relations, we annotate 10, 000 instances from publicly-available image--caption pairs. We introduce a new task for learning inferences in imagery and text, coherence relation prediction, and show that these coherence annotations can be exploited to learn relation classifiers as an intermediary step, and also train coherence-aware, controllable image captioning models. The results show a dramatic improvement in the consistency and quality of the generated captions with respect to information needs specified via coherence relations.

الحساب واللغة الرؤية الحاسوبية وتمييز الأنماط