أوراق بحثية, رسائل ماجستير ودكتوراه منشورة من قبل Marc Tanti

On the Language-specificity of Multilingual BERT and the Impact of Fine-tuning

133 - Marc Tanti , Lonneke van der Plas , Claudia Borg 2021

Recent work has shown evidence that the knowledge acquired by multilingual BERT (mBERT) has two components: a language-specific and a language-neutral one. This paper analyses the relationship between them, in the context of fine-tuning on two tasks -- POS tagging and natural language inference -- which require the model to bring to bear different degrees of language-specific knowledge. Visualisations reveal that mBERT loses the ability to cluster representations by language after fine-tuning, a result that is supported by evidence from language identification experiments. However, further experiments on unlearning language-specific representations using gradient reversal and iterative adversarial learning are shown not to add further improvement to the language-independent component over and above the effect of fine-tuning. The results presented here suggest that the process of fine-tuning causes a reorganisation of the models limited representational capacity, enhancing language-independent representations at the expense of language-specific ones.

الحساب واللغة الحوسبة العصبية والتطورية

On Architectures for Including Visual Information in Neural Language Models for Image Description

281 - Marc Tanti , Albert Gatt , Kenneth P. Camilleri 2019

A neural language model can be conditioned into generating descriptions for images by providing visual information apart from the sentence prefix. This visual information can be included into the language model through different points of entry resul ting in different neural architectures. We identify four main architectures which we call init-inject, pre-inject, par-inject, and merge. We analyse these four architectures and conclude that the best performing one is init-inject, which is when the visual information is injected into the initial state of the recurrent neural network. We confirm this using both automatic evaluation measures and human annotation. We then analyse how much influence the images have on each architecture. This is done by measuring how different the output probabilities of a model are when a partial sentence is combined with a completely different image from the one it is meant to be combined with. We find that init-inject tends to quickly become less influenced by the image as more words are generated. A different architecture called merge, which is when the visual information is merged with the recurrent neural networks hidden state vector prior to output, loses visual influence much more slowly, suggesting that it would work better for generating longer sentences. We also observe that the merge architecture can have its recurrent neural network pre-trained in a text-only language model (transfer learning) rather than be initialised randomly as usual. This results in even better performance than the other architectures, provided that the source language model is not too good at language modelling or it will overspecialise and be less effective at image description generation. Our work opens up new avenues of research in neural architectures, explainable AI, and transfer learning.

الحوسبة العصبية والتطورية الحساب واللغة

Transfer learning from language models to image caption generators: Better models may not transfer better

394 - Marc Tanti , Albert Gatt , Kenneth P. Camilleri 2019

When designing a neural caption generator, a convolutional neural network can be used to extract image features. Is it possible to also use a neural language model to extract sentence prefix features? We answer this question by trying different ways to transfer the recurrent neural network and embedding layer from a neural language model to an image caption generator. We find that image caption generators with transferred parameters perform better than those trained from scratch, even when simply pre-training them on the text of the same captions dataset it will later be trained on. We also find that the best language models (in terms of perplexity) do not result in the best caption generators after transfer learning.

الحساب واللغة التعلم الآلي الحوسبة العصبية والتطورية

Quantifying the amount of visual information used by neural caption generators

336 - Marc Tanti , Albert Gatt , Kenneth P. Camilleri 2018

This paper addresses the sensitivity of neural image caption generators to their visual input. A sensitivity analysis and omission analysis based on image foils is reported, showing that the extent to which image captioning architectures retain and a re sensitive to visual information varies depending on the type of word being generated and the position in the caption as a whole. We motivate this work in the context of broader goals in the field to achieve more explainability in AI.

الحوسبة العصبية والتطورية الحساب واللغة

Pre-gen metrics: Predicting caption quality metrics without generating captions

79 - Marc Tanti , Albert Gatt , Adrian Muscat 2018

Image caption generation systems are typically evaluated against reference outputs. We show that it is possible to predict output quality without generating the captions, based on the probability assigned by the neural model to the reference captions . Such pre-gen metrics are strongly correlated to standard evaluation metrics.

الحوسبة العصبية والتطورية الحساب واللغة

Where to put the Image in an Image Caption Generator

104 - Marc Tanti 2017

When a recurrent neural network language model is used for caption generation, the image information can be fed to the neural network either by directly incorporating it in the RNN -- conditioning the language model by `injecting image features -- or in a layer following the RNN -- conditioning the language model by `merging image features. While both options are attested in the literature, there is as yet no systematic comparison between the two. In this paper we empirically show that it is not especially detrimental to performance whether one architecture is used or another. The merge architecture does have practical advantages, as conditioning by merging allows the RNNs hidden state vector to shrink in size by up to four times. Our results suggest that the visual and linguistic modalities for caption generation need not be jointly encoded by the RNN as that yields large, memory-intensive models with few tangible advantages in performance; rather, the multimodal integration should be delayed to a subsequent stage.

الحوسبة العصبية والتطورية الحساب واللغة الرؤية الحاسوبية وتمييز الأنماط

يمكنك البدء بجني المال وتحقيق ربح مادي من أبحاثك العلمية، المزيد