
Visually Grounded Concept Composition


Publication date: 2021
Research language: English
Created by Shamra Editor





We investigate ways to compose complex concepts in texts from primitive ones while grounding them in images. We propose the Concept and Relation Graph (CRG), which builds on constituency analysis and consists of recursively combined concepts with predicate functions. We also propose a concept composition neural network, called Composer, that leverages the CRG for visually grounded concept learning. Specifically, we learn the grounding of both primitive and all composed concepts by aligning them to images, and we show that learning to compose leads to more robust grounding results, measured in text-to-image matching accuracy. Notably, our model can capture grounded concepts forming at both the finer-grained sentence level and the coarser-grained intermediate level (or word level). Composer leads to pronounced improvement in matching accuracy when the evaluation data has significant compound divergence from the training data.
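As a rough illustration of the evaluation metric named above, text-to-image matching accuracy can be read as the fraction of texts whose most similar image is the one they are paired with. The sketch below is not the paper's implementation: the function names, the use of cosine similarity, and the toy two-dimensional embeddings are all assumptions made for the example.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors (0.0 if either is all zeros).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def matching_accuracy(text_vecs, image_vecs):
    # Assumes text i is paired with image i; a hit means image i scores highest for text i.
    hits = 0
    for i, text in enumerate(text_vecs):
        scores = [cosine(text, image) for image in image_vecs]
        best = max(range(len(scores)), key=scores.__getitem__)
        hits += int(best == i)
    return hits / len(text_vecs)

# Hypothetical toy embeddings, invented for illustration only.
texts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
images = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.6]]
accuracy = matching_accuracy(texts, images)
```

In a real setting the vectors would come from trained text and image encoders, and the candidate set would typically include distractor images rather than only the paired ones.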



References used
https://aclanthology.org/

Read More

In this paper, we define and evaluate a methodology for extracting history-dependent spatial questions from visual dialogues. We say that a question is history-dependent if it requires (parts of) its dialogue history to be interpreted. We argue that some kinds of visual questions define a context upon which a follow-up spatial question relies. We call the question that restricts the context the trigger, and the spatial question that requires the trigger question in order to be answered the zoomer. We automatically extract different trigger and zoomer pairs based on the visual property that the questions rely on (e.g. color, number). We manually annotate the automatically extracted trigger and zoomer pairs to verify which zoomers require their trigger. We implement a simple baseline architecture based on a state-of-the-art multimodal encoder. Our results reveal that there is much room for improvement in answering history-dependent questions.
While many NLP pipelines assume raw, clean texts, many texts we encounter in the wild, including a vast majority of legal documents, are not so clean, with many of them being visually structured documents (VSDs) such as PDFs. Conventional preprocessing tools for VSDs have mainly focused on word segmentation and coarse layout analysis, whereas fine-grained logical structure analysis of VSDs (such as identifying paragraph boundaries and their hierarchies) is underexplored. To that end, we propose to formulate the task as the prediction of "transition labels" between text fragments that map the fragments to a tree, and we develop a feature-based machine learning system that fuses visual, textual, and semantic cues. Our system is easily customizable to different types of VSDs, and it significantly outperforms baselines in identifying different structures in VSDs. For example, our system obtains a paragraph boundary detection F1 score of 0.953, which is significantly better than that of a popular PDF-to-text tool, which has an F1 score of 0.739.
In this paper, we study the problem of recognizing compositional attribute-object concepts within the zero-shot learning (ZSL) framework. We propose an episode-based cross-attention (EpiCA) network, which combines the merits of the cross-attention mechanism and an episode-based training strategy to recognize novel compositional concepts. First, EpiCA uses cross-attention to correlate concept-visual information and utilizes a gated pooling layer to build contextualized representations for both images and concepts. The updated representations are used for a more in-depth multi-modal relevance calculation for concept recognition. Second, a two-phase episode training strategy, especially the transductive phase, is adopted to utilize unlabeled test examples to alleviate the low-resource learning problem. Experiments on two widely used zero-shot compositional learning (ZSCL) benchmarks demonstrate the effectiveness of the model compared with recent approaches in both conventional and generalized ZSCL settings.
Human societies have suffered from worsening manifestations of intolerance and violence, creating an imbalance in the foundations, principles, and values that govern the relationship with the other, and leading to the other's exclusion intellectually, politically, religiously, and humanly. The absence of tolerance leads to the rule of a mentality of prohibition and criminalization. Conversely, the concept of tolerance has acquired many different meanings and has been reflected in a variety of images across different forms of human consciousness. Tolerance is no longer confined to the sectarian and religious sphere, but extends to the political, legal, social, and ethnic spheres. These matters together prompted philosophers to raise a number of questions relating to tolerance. Philosophy has been one of the fields of knowledge that has worked hardest to entrench tolerance in the human mind. Since there is a need today, as in many periods of human history, to breathe life into lofty human values and to enrich and disseminate them, it may be appropriate to scrutinize the concept of tolerance in philosophical terms, considering that philosophy is the testing ground for concepts and a vital arena for enriching them and giving them strength and impact in thought and behavior. Tolerance is, it is hoped, not only a virtue but an existential, social, cultural, and political necessity, needed to fortify our reality against the risks of dogmatism and bigotry that threaten our existence, our gains, and our aspirations.
