Do you want to publish a course? Click here

Towards Task Understanding in Visual Settings

59   0   0.0 ( 0 )
 Added by Sebastin Santy
 Publication date 2018
and research's language is English




Ask ChatGPT about the research

We consider the problem of understanding real world tasks depicted in visual images. While most existing image captioning methods excel in producing natural language descriptions of visual scenes involving human tasks, there is often the need for an understanding of the exact task being undertaken rather than a literal description of the scene. We leverage insights from real world task understanding systems, and propose a framework composed of convolutional neural networks, and an external hierarchical task ontology to produce task descriptions from input images. Detailed experiments highlight the efficacy of the extracted descriptions, which could potentially find their way in many applications, including image alt text generation.



rate research

Read More

We describe the task of Visual Understanding and Narration, in which a robot (or agent) generates text for the images that it collects when navigating its environment, by answering open-ended questions, such as what happens, or might have happened, here?
For graphical user interface (UI) design, it is important to understand what attracts visual attention. While previous work on saliency has focused on desktop and web-based UIs, mobile app UIs differ from these in several respects. We present findings from a controlled study with 30 participants and 193 mobile UIs. The results speak to a role of expectations in guiding where users look at. Strong bias toward the top-left corner of the display, text, and images was evident, while bottom-up features such as color or size affected saliency less. Classic, parameter-free saliency models showed a weak fit with the data, and data-driven models improved significantly when trained specifically on this dataset (e.g., NSS rose from 0.66 to 0.84). We also release the first annotated dataset for investigating visual saliency in mobile UIs.
45 - Yu Wang , Jixing Xu , Aohan Wu 2017
Designing an e-commerce recommender system that serves hundreds of millions of active users is a daunting challenge. From a human vision perspective, therere two key factors that affect users behaviors: items attractiveness and their matching degree with users interests. This paper proposes Telepath, a vision-based bionic recommender system model, which understands users from such perspective. Telepath is a combination of a convolutional neural network (CNN), a recurrent neural network (RNN) and deep neural networks (DNNs). Its CNN subnetwork simulates the human vision system to extract key visual signals of items attractiveness and generate corresponding activations. Its RNN and DNN subnetworks simulate cerebral cortex to understand users interest based on the activations generated from browsed items. In practice, the Telepath model has been launched to JDs recommender system and advertising system. For one of the major item recommendation blocks on the JD app, click-through rate (CTR), gross merchandise value (GMV) and orders have increased 1.59%, 8.16% and 8.71% respectively. For several major ads publishers of JD demand-side platform, CTR, GMV and return on investment have increased 6.58%, 61.72% and 65.57% respectively by the first launch, and further increased 2.95%, 41.75% and 41.37% respectively by the second launch.
242 - Kang Zhao , Pan Pan , Yun Zheng 2021
Graph-based approximate nearest neighbor search has attracted more and more attentions due to its online search advantages. Numbers of methods studying the enhancement of speed and recall have been put forward. However, few of them focus on the efficiency and scale of offline graph-construction. For a deployed visual search system with several billions of online images in total, building a billion-scale offline graph in hours is essential, which is almost unachievable by most existing methods. In this paper, we propose a novel algorithm called Binary Distributed Graph to solve this problem. Specifically, we combine binary codes with graph structure to speedup online and offline procedures, and achieve comparable performance with the ones in real-value based scenarios by recalling more binary candidates. Furthermore, the graph-construction is optimized to completely distributed implementation, which significantly accelerates the offline process and gets rid of the limitation of memory and disk within a single machine. Experimental comparisons on Alibaba Commodity Data Set (more than three billion images) show that the proposed method outperforms the state-of-the-art with respect to the online/offline trade-off.
Yarbus claim to decode the observers task from eye movements has received mixed reactions. In this paper, we have supported the hypothesis that it is possible to decode the task. We conducted an exploratory analysis on the dataset by projecting features and data points into a scatter plot to visualize the nuance properties for each task. Following this analysis, we eliminated highly correlated features before training an SVM and Ada Boosting classifier to predict the tasks from this filtered eye movements data. We achieve an accuracy of 95.4% on this task classification problem and hence, support the hypothesis that task classification is possible from a users eye movement data.
comments
Fetching comments Fetching comments
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا