
Attention-Based Query Expansion Learning

Published by: Albert Gordo
Publication date: 2020
Research field: Informatics Engineering
Paper language: English





Query expansion is a technique widely used in image search consisting in combining highly ranked images from an original query into an expanded query that is then reissued, generally leading to increased recall and precision. An important aspect of query expansion is choosing an appropriate way to combine the images into a new query. Interestingly, despite the undeniable empirical success of query expansion, ad-hoc methods with different caveats have dominated the landscape, and not a lot of research has been done on learning how to do query expansion. In this paper we propose a more principled framework to query expansion, where one trains, in a discriminative manner, a model that learns how images should be aggregated to form the expanded query. Within this framework, we propose a model that leverages a self-attention mechanism to effectively learn how to transfer information between the different images before aggregating them. Our approach obtains higher accuracy than existing approaches on standard benchmarks. More importantly, our approach is the only one that consistently shows high accuracy under different regimes, overcoming caveats of existing methods.
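To make the aggregation step concrete, the following is a minimal, illustrative sketch (not the authors' released code): it contrasts classic average query expansion, which simply averages the query descriptor with its top-k retrieved neighbours, with a learned aggregator that first lets the descriptors exchange information through self-attention and then forms the expanded query as a learned weighted sum. All module and variable names (e.g. SelfAttentionQE, score) are assumptions made for this example.

```python
# Minimal sketch (not the authors' code): plain average query expansion (AQE)
# vs. a learned, self-attention-based aggregation of the query and its
# top-k neighbours. Module/variable names are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


def average_query_expansion(query, neighbors):
    """Classic AQE: mean of the query and its top-k neighbour descriptors."""
    expanded = torch.cat([query.unsqueeze(0), neighbors], dim=0).mean(dim=0)
    return F.normalize(expanded, dim=-1)


class SelfAttentionQE(nn.Module):
    """Learned aggregation: self-attention transfers information between the
    query and its neighbours, then a weighted sum forms the expanded query."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.score = nn.Linear(dim, 1)  # per-image aggregation weight

    def forward(self, query, neighbors):
        # stack query + neighbours into one sequence: (1, k+1, dim)
        seq = torch.cat([query.unsqueeze(0), neighbors], dim=0).unsqueeze(0)
        refined, _ = self.attn(seq, seq, seq)                 # information transfer
        weights = torch.softmax(self.score(refined), dim=1)   # (1, k+1, 1)
        expanded = (weights * refined).sum(dim=1).squeeze(0)
        return F.normalize(expanded, dim=-1)


if __name__ == "__main__":
    dim, k = 512, 10
    q = F.normalize(torch.randn(dim), dim=-1)
    nb = F.normalize(torch.randn(k, dim), dim=-1)
    print(average_query_expansion(q, nb).shape)   # torch.Size([512])
    print(SelfAttentionQE(dim)(q, nb).shape)      # torch.Size([512])
```

Both variants produce an expanded descriptor that is reissued as a new query; the learned version can be trained discriminatively end to end, which is the point the paper makes.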




Read also

138 - Junyan Wang, Yang Bai, Yang Long (2020)
Video summarization aims to select representative frames to retain high-level information, which is usually solved by predicting the segment-wise importance score via a softmax function. However, the softmax function suffers in retaining high-rank representations for complex visual or sequential information, which is known as the Softmax Bottleneck problem. In this paper, we propose a novel framework named Dual Mixture Attention (DMASum) model with Meta Learning for video summarization that tackles the softmax bottleneck problem, where the Mixture of Attention layer (MoA) effectively increases the model capacity by employing twice self-query attention that can capture second-order changes in addition to the initial query-key attention, and a novel Single Frame Meta Learning rule is then introduced to achieve better generalization to small datasets with limited training sources. Furthermore, DMASum significantly exploits both visual and sequential attention, connecting local key-frame and global attention in an accumulative way. We adopt the new evaluation protocol on two public datasets, SumMe and TVSum. Both qualitative and quantitative experiments manifest significant improvements over the state-of-the-art methods.
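As a rough reading of the mixture-of-attention idea (illustrative only, not the DMASum implementation), the sketch below mixes the softmax of the usual query-key attention with the softmax of a second, "self-query" attention, so that the combined distribution is no longer limited by a single softmax. The MixtureOfAttention name and the specific projections are assumptions made for this example.

```python
# Rough sketch (my reading of the abstract, not the DMASum code): mix two
# softmax attention distributions -- query-key and second-order self-query --
# with learned per-position mixture weights.
import torch
import torch.nn as nn


class MixtureOfAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.q2 = nn.Linear(dim, dim)      # second-order "self-query" map
        self.mix = nn.Linear(dim, 2)       # per-position mixture weights
        self.scale = dim ** -0.5

    def forward(self, x):                  # x: (T, dim) frame features
        q, k = self.q(x), self.k(x)
        qk = torch.softmax(q @ k.t() * self.scale, dim=-1)   # query-key attention
        qq = self.q2(q)
        qq = torch.softmax(qq @ q.t() * self.scale, dim=-1)  # self-query attention
        pi = torch.softmax(self.mix(x), dim=-1)              # (T, 2) mixture weights
        attn = pi[:, :1] * qk + pi[:, 1:] * qq               # mixture of two softmaxes
        return attn @ x                                       # attended frame features


if __name__ == "__main__":
    frames = torch.randn(8, 256)
    print(MixtureOfAttention(256)(frames).shape)   # torch.Size([8, 256])
```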
Early prediction of cerebral palsy is essential as it leads to early treatment and monitoring. Deep learning has shown promising results in biomedical engineering thanks to its capacity to model complicated data with its non-linear architecture. However, due to their complex structure, deep learning models are generally not interpretable by humans, making it difficult for clinicians to rely on their findings. In this paper, we propose a channel attention module for deep learning models to predict cerebral palsy from infants' body movements, which highlights the key features (i.e. body joints) the model identifies as important, thereby indicating why certain diagnostic results are found. To highlight the capacity of the deep network in modelling input features, we utilize raw joint positions instead of hand-crafted features. We validate our system with a real-world infant movement dataset. Our proposed channel attention module enables the visualization of the joints the network considers vital to this disease. Our system achieves 91.67% accuracy, outperforming other state-of-the-art deep learning methods.
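A squeeze-and-excitation style channel attention over body-joint channels is one plausible instantiation of the module described above (the paper's exact architecture may differ); the sketch below computes one weight per joint that can be inspected to see which joints the network relies on. JointChannelAttention and its hyperparameters are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation): channel
# attention over body-joint channels of raw joint-position sequences,
# returning both the reweighted input and the per-joint attention scores.
import torch
import torch.nn as nn


class JointChannelAttention(nn.Module):
    def __init__(self, num_joints, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(num_joints, num_joints // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(num_joints // reduction, num_joints),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (batch, num_joints, time, coords) raw joint positions
        squeezed = x.mean(dim=(2, 3))                 # global pooling per joint
        weights = self.gate(squeezed)                 # (batch, num_joints) in [0, 1]
        return x * weights[:, :, None, None], weights  # reweighted input + scores


if __name__ == "__main__":
    x = torch.randn(2, 16, 100, 2)                    # 16 joints, 100 frames, (x, y)
    out, w = JointChannelAttention(16)(x)
    print(out.shape, w.shape)                          # (2, 16, 100, 2) (2, 16)
```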
Red blood cells are highly deformable and present in various shapes. In blood cell disorders, only a subset of all cells is morphologically altered and relevant for the diagnosis. However, manual labeling of all cells is laborious, complicated and introduces inter-expert variability. We propose an attention-based multiple instance learning method to classify blood samples of patients suffering from blood cell disorders. Cells are detected using an R-CNN architecture. With the features extracted for each cell, a multiple instance learning method classifies patient samples into one of four blood cell disorders. The attention mechanism provides a measure of each cell's contribution to the overall classification and significantly improves the network's classification accuracy as well as its interpretability for the medical expert.
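The aggregation step can be sketched with a standard attention-based multiple instance learning pooling layer (in the spirit of the mechanism described, not the paper's exact code): per-cell features in a patient "bag" receive attention weights, their weighted sum is classified into one of four disorders, and the weights expose each cell's contribution. AttentionMIL and its dimensions are illustrative.

```python
# Minimal sketch (illustrative): attention-based MIL pooling over per-cell
# features, producing class logits and per-cell contribution weights.
import torch
import torch.nn as nn


class AttentionMIL(nn.Module):
    def __init__(self, feat_dim, hidden=128, num_classes=4):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1)
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, cells):                              # cells: (num_cells, feat_dim)
        weights = torch.softmax(self.attn(cells), dim=0)   # (num_cells, 1)
        bag = (weights * cells).sum(dim=0)                 # bag-level embedding
        return self.classifier(bag), weights.squeeze(-1)


if __name__ == "__main__":
    cell_features = torch.randn(57, 512)                   # e.g. features of detected cells
    logits, contrib = AttentionMIL(512)(cell_features)
    print(logits.shape, contrib.shape)                     # torch.Size([4]) torch.Size([57])
```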
After DETR was proposed, this novel transformer-based detection paradigm, which performs several cross-attentions between object queries and feature maps for predictions, subsequently gave rise to a series of transformer-based detection heads. These models iterate object queries after each cross-attention. However, they do not renew the query position, which encodes the object queries' positional information. Thus the model needs extra learning to figure out the newest regions that the query position should express and attend to. To fix this issue, we propose the Guided Query Position (GQPos) method, which iteratively embeds the latest location information of the object queries into the query position. Another problem of such transformer-based detection heads is the high complexity of performing attention on multi-scale feature maps, which hinders them from improving detection performance at all scales. Therefore we propose a novel fusion scheme named Similar Attention (SiA): besides fusing the feature maps, SiA also fuses the attention weight maps to accelerate the learning of the high-resolution attention weight map from the well-learned low-resolution one. Our experiments show that the proposed GQPos improves the performance of a series of models, including DETR, SMCA, YoloS, and HoiTransformer, and that SiA consistently improves the performance of multi-scale transformer-based detection heads such as DETR and HoiTransformer.
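One way to read the GQPos idea is sketched below (illustrative only, not the released implementation): after each decoder layer, the latest predicted box centers are re-encoded into the query position embedding, so the next cross-attention uses up-to-date positional information. The sinusoidal embedding, GQPosDecoderLayer, and the box head are assumptions made for this example.

```python
# Rough sketch (my reading of the abstract, not the released code): iteratively
# refresh the query position embedding from the latest predicted box centers.
import torch
import torch.nn as nn


def sine_embed(xy, dim):
    """Sinusoidal embedding of normalized (cx, cy) box centers -> (N, dim)."""
    freq = torch.arange(dim // 4, dtype=torch.float32)
    freq = 1.0 / (10000 ** (freq / (dim // 4)))
    angles = xy.unsqueeze(-1) * freq                      # (N, 2, dim/4)
    emb = torch.cat([angles.sin(), angles.cos()], dim=-1)  # (N, 2, dim/2)
    return emb.flatten(1)                                   # (N, dim)


class GQPosDecoderLayer(nn.Module):
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.box_head = nn.Linear(dim, 2)                  # predicts (cx, cy)

    def forward(self, queries, query_pos, memory):
        # cross-attention between (content + position) queries and image features
        out, _ = self.cross((queries + query_pos).unsqueeze(0),
                            memory.unsqueeze(0), memory.unsqueeze(0))
        queries = out.squeeze(0)
        centers = self.box_head(queries).sigmoid()         # latest locations
        query_pos = sine_embed(centers, queries.shape[-1])  # guided position update
        return queries, query_pos


if __name__ == "__main__":
    q = torch.randn(100, 256)
    pos = torch.randn(100, 256)
    mem = torch.randn(600, 256)                            # flattened feature map
    layer = GQPosDecoderLayer(256)
    for _ in range(3):                                     # iterate decoder layers
        q, pos = layer(q, pos, mem)
    print(q.shape, pos.shape)                              # (100, 256) (100, 256)
```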
Attention mechanisms are ubiquitous components in neural architectures applied to natural language processing. In addition to yielding gains in predictive accuracy, attention weights are often claimed to confer interpretability, purportedly useful both for providing insights to practitioners and for explaining why a model makes its decisions to stakeholders. We call the latter use of attention mechanisms into question by demonstrating a simple method for training models to produce deceptive attention masks. Our method diminishes the total weight assigned to designated impermissible tokens, even when the models can be shown to nevertheless rely on these features to drive predictions. Across multiple models and tasks, our approach manipulates attention weights while paying surprisingly little cost in accuracy. Through a human study, we show that our manipulated attention-based explanations deceive people into thinking that predictions from a model biased against gender minorities do not rely on gender. Consequently, our results cast doubt on attention's reliability as a tool for auditing algorithms in the context of fairness and accountability.
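The training trick can be sketched as an auxiliary penalty on the attention mass assigned to designated impermissible tokens, added to the ordinary task loss (a simplified reading of the method, not the authors' code); the function name and the log1p form of the penalty are illustrative choices.

```python
# Minimal sketch (illustrative): combine the task loss with a penalty that
# pushes attention mass away from designated "impermissible" tokens, so the
# attention-based explanation hides features the model may still rely on.
import torch


def deceptive_attention_loss(task_loss, attn_weights, impermissible_mask,
                             penalty=1.0):
    """attn_weights: (batch, seq_len) attention over input tokens.
    impermissible_mask: (batch, seq_len), 1 on tokens whose weight should shrink."""
    hidden_mass = (attn_weights * impermissible_mask).sum(dim=-1)  # per example
    return task_loss + penalty * torch.log1p(hidden_mass).mean()


if __name__ == "__main__":
    attn = torch.softmax(torch.randn(4, 12), dim=-1)
    mask = torch.zeros(4, 12)
    mask[:, 3] = 1.0                              # e.g. a gender-indicating token
    loss = deceptive_attention_loss(torch.tensor(0.7), attn, mask)
    print(loss.item())
```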
