Towards Task Understanding in Visual Settings

Added by: Sebastin Santy

Publication date: 2018

Fields: Informatics Engineering

Research language: English

Authors: Sebastin Santy, Wazeer Zulfikar, Rishabh Mehrotra

Information Retrieval; Computer Vision and Pattern Recognition

Abstract

We consider the problem of understanding real-world tasks depicted in visual images. While most existing image captioning methods excel at producing natural language descriptions of visual scenes involving human tasks, there is often a need to understand the exact task being undertaken rather than to describe the scene literally. We leverage insights from real-world task understanding systems and propose a framework, composed of convolutional neural networks and an external hierarchical task ontology, that produces task descriptions from input images. Detailed experiments highlight the efficacy of the extracted descriptions, which could potentially find their way into many applications, including image alt-text generation.
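The abstract does not spell out how the CNN outputs and the ontology are combined. The sketch below shows one plausible way under stated assumptions: a CNN has already produced confidence scores for visual labels, each label maps to leaf tasks in a small hand-written ontology, and scores are propagated up the hierarchy so that the most specific well-supported task is returned. The ontology contents, `LABEL_TO_TASKS`, `path_to_root`, `describe_task`, and the `min_score` threshold are all hypothetical illustrations, not the authors' method.

```python
from collections import defaultdict

# Hypothetical task ontology: each task maps to its parent task (None marks the root).
PARENT = {
    "home activities": None,
    "cooking": "home activities",
    "baking a cake": "cooking",
    "chopping vegetables": "cooking",
    "cleaning": "home activities",
    "washing dishes": "cleaning",
}

# Hypothetical mapping from visual labels (e.g. CNN classifier outputs) to leaf tasks.
LABEL_TO_TASKS = {
    "oven": ["baking a cake"],
    "mixing bowl": ["baking a cake"],
    "knife": ["chopping vegetables"],
    "sink": ["washing dishes"],
}


def path_to_root(task):
    """Return [task, parent, grandparent, ..., root]."""
    path = []
    while task is not None:
        path.append(task)
        task = PARENT[task]
    return path


def describe_task(label_scores, min_score=0.5):
    """Propagate label confidences up the ontology and return the root-to-node
    path of the most specific task whose accumulated score reaches min_score."""
    totals = defaultdict(float)
    for label, score in label_scores.items():
        for leaf in LABEL_TO_TASKS.get(label, []):
            for node in path_to_root(leaf):
                totals[node] += score  # each ancestor collects its descendants' evidence
    candidates = [task for task, total in totals.items() if total >= min_score]
    if not candidates:
        return None
    # Prefer deeper (more specific) tasks; break ties by accumulated score.
    best = max(candidates, key=lambda task: (len(path_to_root(task)), totals[task]))
    return list(reversed(path_to_root(best)))
```

With strong oven and mixing-bowl evidence, `describe_task({"oven": 0.4, "mixing bowl": 0.3, "knife": 0.2})` returns the full path down to "baking a cake"; with evidence split evenly between two cooking subtasks, `describe_task({"oven": 0.3, "knife": 0.3})` falls back to the broader `["home activities", "cooking"]`. Backing off to an ancestor when the evidence is ambiguous is one reason a hierarchy can be preferable to a flat label set.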

Related research

Visual Understanding and Narration: A Deeper Understanding and Explanation of Visual Scenes

Stephanie M. Lukin, Claire Bonial, 2019

We describe the task of Visual Understanding and Narration, in which a robot (or agent) generates text for the images it collects while navigating its environment by answering open-ended questions such as "What happens, or might have happened, here?"

Computation and Language; Computer Vision and Pattern Recognition

Understanding Visual Saliency in Mobile User Interfaces

Luis A. Leiva, Yunfei Xue, Avya Bansal, 2021

For graphical user interface (UI) design, it is important to understand what attracts visual attention. While previous work on saliency has focused on desktop and web-based UIs, mobile app UIs differ from these in several respects. We present findings from a controlled study with 30 participants and 193 mobile UIs. The results point to a role of expectations in guiding where users look. A strong bias toward the top-left corner of the display, text, and images was evident, while bottom-up features such as color or size had less effect on saliency. Classic, parameter-free saliency models showed a weak fit with the data, and data-driven models improved significantly when trained specifically on this dataset (e.g., NSS rose from 0.66 to 0.84). We also release the first annotated dataset for investigating visual saliency in mobile UIs.

Human-Computer Interaction; Computer Vision and Pattern Recognition
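The abstract above reports model fit in terms of NSS (Normalized Scanpath Saliency). A common definition z-scores the predicted saliency map (subtract its mean, divide by its standard deviation) and averages the normalized values at the human fixation locations, so 0 corresponds to chance level and higher is better. The sketch below implements that common definition in plain Python; the function name `nss` and the list-of-lists input format are assumptions, and the study's own evaluation pipeline (for example, resizing or blurring the maps) may differ.

```python
import statistics


def nss(saliency, fixations):
    """Normalized Scanpath Saliency (common definition).

    saliency:  2-D list of floats, indexed as saliency[row][col]
    fixations: list of (row, col) pixel positions fixated by viewers
    """
    values = [v for row in saliency for v in row]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation over all pixels
    if std == 0:
        raise ValueError("constant saliency map: NSS is undefined")
    # Average the z-scored saliency values at the fixated pixels.
    return statistics.fmean((saliency[r][c] - mean) / std for r, c in fixations)
```

For example, `nss([[0, 0], [0, 4]], [(1, 1)])` returns about 1.732: the map has mean 1 and standard deviation sqrt(3), so the single fixated pixel scores (4 - 1) / sqrt(3).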

Telepath: Understanding Users from a Human Vision Perspective in Large-Scale Recommender Systems

Yu Wang, Jixing Xu, Aohan Wu, 2017

Designing an e-commerce recommender system that serves hundreds of millions of active users is a daunting challenge. From a human vision perspective, there are two key factors that affect users' behaviors: items' attractiveness and their degree of match with users' interests. This paper proposes Telepath, a vision-based bionic recommender system model that understands users from this perspective. Telepath combines a convolutional neural network (CNN), a recurrent neural network (RNN) and deep neural networks (DNNs). Its CNN subnetwork simulates the human vision system to extract key visual signals of items' attractiveness and generate corresponding activations. Its RNN and DNN subnetworks simulate the cerebral cortex to understand users' interests based on the activations generated from browsed items. In practice, the Telepath model has been deployed in JD's recommender system and advertising system. For one of the major item recommendation blocks on the JD app, click-through rate (CTR), gross merchandise value (GMV) and orders increased by 1.59%, 8.16% and 8.71%, respectively. For several major ad publishers on JD's demand-side platform, CTR, GMV and return on investment increased by 6.58%, 61.72% and 65.57%, respectively, after the first launch, and by a further 2.95%, 41.75% and 41.37%, respectively, after the second launch.

Information Retrieval; Computer Vision and Pattern Recognition; Machine Learning

Large-Scale Visual Search with Binary Distributed Graph at Alibaba

Kang Zhao, Pan Pan, Yun Zheng, 2021

Graph-based approximate nearest neighbor search has attracted increasing attention due to its advantages in online search. Numerous methods for improving speed and recall have been proposed. However, few of them focus on the efficiency and scale of offline graph construction. For a deployed visual search system with several billion online images in total, building a billion-scale offline graph within hours is essential, yet this is almost unachievable with most existing methods. In this paper, we propose a novel algorithm called Binary Distributed Graph to solve this problem. Specifically, we combine binary codes with the graph structure to speed up both online and offline procedures, and we achieve performance comparable to real-valued approaches by recalling more binary candidates. Furthermore, graph construction is optimized into a fully distributed implementation, which significantly accelerates the offline process and removes the memory and disk limits of a single machine. Experimental comparisons on the Alibaba Commodity Data Set (more than three billion images) show that the proposed method outperforms the state of the art with respect to the online/offline trade-off.

Information Retrieval; Computer Vision and Pattern Recognition
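The online side of the idea above, searching a proximity graph with compact binary codes, can be illustrated with a generic best-first graph search that ranks nodes by Hamming distance. This is not the Binary Distributed Graph algorithm itself: graph construction, distribution across machines, and any re-ranking are omitted, and the names `hamming`, `graph_search`, `entry`, and `ef` are hypothetical. Keeping a candidate pool of size `ef` larger than `k` loosely mirrors the idea of recalling more binary candidates to compensate for coarse Hamming distances.

```python
import heapq


def hamming(a, b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def graph_search(codes, neighbors, query, entry, k=2, ef=4):
    """Best-first search over a proximity graph using Hamming distance.

    codes:     dict node -> int binary code
    neighbors: dict node -> list of adjacent nodes
    Returns up to k (distance, node) pairs, nearest first.
    """
    d0 = hamming(codes[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: nodes still to expand
    results = [(-d0, entry)]    # max-heap via negation: best ef nodes found so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # nearest unexpanded node is worse than every kept result
        for nb in neighbors[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = hamming(codes[nb], query)
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    return sorted((-nd, n) for nd, n in results)[:k]
```

On a toy chain graph with codes `{0: 0b0000, 1: 0b0001, 2: 0b0011, 3: 0b0111, 4: 0b1111}` and edges `0-1-2-3-4`, searching for `0b0111` from node 0 walks along the chain and returns node 3 at distance 0 as the nearest neighbor. Hamming distance reduces to an XOR and a bit count, which is what makes binary codes cheap to compare at scale.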

Task Classification Model for Visual Fixation, Exploration, and Search

Ayush Kumar, Anjul Tyagi, Michael Burch, 2019

Yarbus's claim that an observer's task can be decoded from eye movements has received mixed reactions. In this paper, we support the hypothesis that it is possible to decode the task. We conducted an exploratory analysis of the dataset by projecting features and data points into a scatter plot to visualize the nuanced properties of each task. Following this analysis, we eliminated highly correlated features before training SVM and AdaBoost classifiers to predict the tasks from the filtered eye-movement data. We achieve an accuracy of 95.4% on this task classification problem and hence support the hypothesis that the task can be classified from a user's eye-movement data.

Machine Learning; Computer Vision and Pattern Recognition; Machine Learning
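The filtering step described in the abstract above, removing highly correlated features before training the classifiers, can be sketched as a greedy Pearson-correlation filter. The feature names and values, the 0.9 threshold, and the keep-first-seen rule are assumptions for illustration; the abstract does not state which correlation measure or cutoff the authors used, and the SVM/AdaBoost training that would follow is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant feature; treated as uncorrelated here
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.9):
    """Greedily keep features in insertion order, skipping any feature whose
    absolute correlation with an already-kept feature exceeds the threshold.

    columns: dict mapping feature name -> list of per-sample values
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical eye-movement features: the second is the first rescaled to milliseconds.
features = {
    "fixation_duration_s": [0.1, 0.2, 0.3, 0.4],
    "fixation_duration_ms": [100, 200, 300, 400],
    "saccade_amplitude": [2, 1, 4, 3],
}
kept = drop_correlated(features)
```

Here `kept` is `["fixation_duration_s", "saccade_amplitude"]`: the millisecond column is perfectly correlated with the seconds column and is dropped, while saccade amplitude (correlation 0.6 with fixation duration) survives. Because the rule keeps whichever feature appears first, the column order decides which member of a correlated pair is retained.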
