Research papers and master's and doctoral theses published by Yash Goyal

Static and Dynamic Measures of Active Music Listening as Indicators of Depression Risk

60 - Aayush Surana, Yash Goyal, Vinoo Alluri 2020

Music, an integral part of our lives, is not only a source of entertainment but also plays an important role in mental well-being by influencing moods, emotions, and other affective states. Music preferences and listening strategies have been shown to be associated with the psychological well-being of listeners, including internalized symptomatology and depression. However, to date no studies have examined time-varying music consumption, in terms of acoustic content, and its association with users' well-being. In the current study, we aim to uncover static and dynamic patterns in the active listening behavior of individuals that may be used as indicators of risk for depression. Mental well-being scores and listening histories of 541 Last.fm users were examined. Static and dynamic acoustic and emotion-related features were extracted from each user's listening history and correlated with their mental well-being scores. Results revealed that individuals with greater depression risk rely more heavily on music, with greater repetitiveness in their listening activity. Furthermore, the affinity of depressed individuals towards music that can be perceived as sad was found to be resistant to change over time. This study has important implications for future work on assessing mental illness risk by exploiting the digital footprints of users on online music streaming platforms.

Audio and Speech Processing; Information Retrieval; Multimedia

Tag2Risk: Harnessing Social Music Tags for Characterizing Depression Risk

57 - Aayush Surana, Yash Goyal, Manish Shrivastava 2020

Musical preferences have been considered a mirror of the self. In this age of Big Data, online music streaming services allow us to capture ecologically valid music listening behavior and provide a rich source of information for identifying several user-specific aspects. Studies have shown musical engagement to be an indirect representation of internal states, including internalized symptomatology and depression. The current study aims to uncover patterns and trends among individuals at risk for depression as it manifests in naturally occurring music listening behavior. Mental well-being scores, musical engagement measures, and listening histories of Last.fm users (N=541) were acquired. Social tags associated with each listener's most popular tracks were analyzed to identify the moods, emotions, and genres associated with the users. Results revealed that social tags prevalent among users at risk for depression were predominantly related to emotions depicting Sadness, associated with genre tags representing neo-psychedelic-, avant garde-, dream-pop. This study opens up avenues for an MIR-based approach to characterizing and predicting risk for depression, which can help in early detection and also provide a basis for designing music recommendations accordingly.

Information Retrieval; Multimedia; Computer Audio Systems

Question-Conditioned Counterfactual Image Generation for VQA

83 - Jingjing Pan, Yash Goyal, Stefan Lee 2019

While Visual Question Answering (VQA) models continue to push the state of the art forward, they largely remain black boxes, failing to provide insight into how or why an answer is generated. In this ongoing work, we propose addressing this shortcoming by learning to generate counterfactual images for a VQA model - i.e. given a question-image pair, we wish to generate a new image such that i) the VQA model outputs a different answer, ii) the new image is minimally different from the original, and iii) the new image is realistic. Our hope is that providing such counterfactual examples allows users to investigate and understand the VQA model's internal mechanisms.

Computer Vision and Pattern Recognition; Computation and Language

Explaining Classifiers with Causal Concept Effect (CaCE)

92 - Yash Goyal, Amir Feder, Uri Shalit 2019

How can we understand classification decisions made by deep neural networks? Many existing explainability methods rely solely on correlations and fail to account for confounding, which may result in potentially misleading explanations. To overcome this problem, we define the Causal Concept Effect (CaCE) as the causal effect of (the presence or absence of) a human-interpretable concept on a deep neural net's predictions. We show that the CaCE measure can avoid errors stemming from confounding. Estimating CaCE is difficult in situations where we cannot easily simulate the do-operator. To mitigate this problem, we use a generative model, specifically a Variational AutoEncoder (VAE), to measure VAE-CaCE. In an extensive experimental analysis, we show that the VAE-CaCE is able to estimate the true concept causal effect, compared to baselines for a number of datasets including high dimensional images.
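
The gap between a correlational estimate and an interventional one can be illustrated with a toy simulation. This is a minimal sketch, not the paper's estimator: the binary concept `c`, the confounder `z`, the classifier `f`, and all sampling probabilities are hypothetical. It also assumes the effect is measured as the difference in expected classifier output under do(C=1) and do(C=0), and it simulates the do-operator directly instead of approximating it with a VAE.

```python
import random

def f(c, z):
    """Hypothetical classifier: its output depends only on z, never on the concept c."""
    return 0.9 if z == 1 else 0.1

def sample_observational(rng):
    """Draw (c, z) where the confounder z makes the concept c more likely."""
    z = 1 if rng.random() < 0.5 else 0
    p_c = 0.9 if z == 1 else 0.1
    c = 1 if rng.random() < p_c else 0
    return c, z

def mean(xs):
    return sum(xs) / len(xs)

def correlational_effect(n, rng):
    """E[f | C=1] - E[f | C=0] from observational samples (confounded by z)."""
    data = [sample_observational(rng) for _ in range(n)]
    with_c = [f(c, z) for c, z in data if c == 1]
    without_c = [f(c, z) for c, z in data if c == 0]
    return mean(with_c) - mean(without_c)

def interventional_effect(n, rng):
    """E[f | do(C=1)] - E[f | do(C=0)]: set c by hand, leave z's distribution alone."""
    zs = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    return mean([f(1, z) for z in zs]) - mean([f(0, z) for z in zs])

rng = random.Random(0)
corr = correlational_effect(20000, rng)
cace = interventional_effect(20000, rng)
print(f"correlational: {corr:.2f}, interventional: {cace:.2f}")
```

In this toy setup the classifier never uses the concept, so the interventional effect is zero, while the correlational estimate is large because the concept co-occurs with the confounder.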

Machine Learning; Computer Vision and Pattern Recognition

Counterfactual Visual Explanations

89 - Yash Goyal, Ziyan Wu, Jan Ernst 2019

In this work, we develop a technique to produce counterfactual visual explanations. Given a query image $I$ for which a vision system predicts class $c$, a counterfactual visual explanation identifies how $I$ could change such that the system would output a different specified class $c'$. To do this, we select a distractor image $I'$ that the system predicts as class $c'$ and identify spatial regions in $I$ and $I'$ such that replacing the identified region in $I$ with the identified region in $I'$ would push the system towards classifying $I$ as $c'$. We apply our approach to multiple image classification datasets, generating qualitative results that showcase the interpretability and discriminativeness of our counterfactual explanations. To explore the effectiveness of our explanations in teaching humans, we present machine teaching experiments for the task of fine-grained bird classification. We find that users trained to distinguish bird species fare better when given access to counterfactual explanations in addition to training examples.
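
The region-replacement idea can be sketched as a brute-force search over single swaps. This is only an illustration under stated assumptions: the flat list of per-region feature vectors, the exhaustive search, and the scoring function `score_target` are hypothetical and are not taken from the paper.

```python
def best_single_swap(query_cells, distractor_cells, score_target):
    """Try every (query cell, distractor cell) replacement.

    Returns (score, query_index, distractor_index) for the swap that gives the
    edited query the highest score for the target class.
    """
    best = None
    for i in range(len(query_cells)):
        for j, d_cell in enumerate(distractor_cells):
            edited = list(query_cells)   # copy so the query itself is not modified
            edited[i] = d_cell           # replace one query region with a distractor region
            s = score_target(edited)
            if best is None or s > best[0]:
                best = (s, i, j)
    return best

def score_target(cells):
    """Hypothetical score for the target class: stripes are evidence for it, spots against it."""
    spots = sum(cell[0] for cell in cells)
    stripes = sum(cell[1] for cell in cells)
    return (stripes - spots) / len(cells)

# Each cell is a hypothetical [spots, stripes] feature vector for one spatial region.
query = [[1.0, 0.0], [1.0, 0.0], [0.2, 0.1], [0.0, 0.0]]
distractor = [[0.0, 0.0], [0.0, 1.0], [0.1, 0.2], [0.0, 0.0]]

score, q_idx, d_idx = best_single_swap(query, distractor, score_target)
print(q_idx, d_idx, score > score_target(query))
```

The chosen swap replaces a strongly spotted query region with the striped distractor region, which is the pair of regions a counterfactual explanation would highlight.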

Machine Learning; Artificial Intelligence; Computer Vision and Pattern Recognition

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

63 - Yash Goyal, Tejas Khot, Douglas Summers-Stay 2016

Problems at the intersection of vision and language are of significant importance both as challenging research questions and for the rich set of applications they enable. However, inherent structure in our world and bias in our language tend to be a simpler signal for learning than visual modalities, resulting in models that ignore visual information, leading to an inflated sense of their capability. We propose to counter these language priors for the task of Visual Question Answering (VQA) and make vision (the V in VQA) matter! Specifically, we balance the popular VQA dataset by collecting complementary images such that every question in our balanced dataset is associated with not just a single image, but rather a pair of similar images that result in two different answers to the question. Our dataset is by construction more balanced than the original VQA dataset and has approximately twice the number of image-question pairs. Our complete balanced dataset is publicly available at www.visualqa.org as part of the 2nd iteration of the Visual Question Answering Dataset and Challenge (VQA v2.0). We further benchmark a number of state-of-the-art VQA models on our balanced dataset. All models perform significantly worse on our balanced dataset, suggesting that these models have indeed learned to exploit language priors. This finding provides the first concrete empirical evidence for what seems to be a qualitative sense among practitioners. Finally, our data collection protocol for identifying complementary images enables us to develop a novel interpretable model which, in addition to providing an answer to the given (image, question) pair, also provides a counter-example-based explanation. Specifically, it identifies an image that is similar to the original image but that it believes has a different answer to the same question. This can help build users' trust in machines.

Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language

Towards Transparent AI Systems: Interpreting Visual Question Answering Models

245 - Yash Goyal, Akrit Mohapatra, Devi Parikh 2016

Deep neural networks have shown striking progress and obtained state-of-the-art results in many AI research fields in recent years. However, it is often unsatisfying not to know why they predict what they do. In this paper, we address the problem of interpreting Visual Question Answering (VQA) models. Specifically, we are interested in finding what part of the input (pixels in images or words in questions) the VQA model focuses on while answering the question. To tackle this problem, we use two visualization techniques -- guided backpropagation and occlusion -- to find important words in the question and important regions in the image. We then present qualitative and quantitative analyses of these importance maps. We found that even without explicit attention mechanisms, VQA models may sometimes be implicitly attending to relevant regions in the image, and often to appropriate words in the question.
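
Occlusion, one of the two techniques named above, can be sketched in a few lines: mask one patch of the input at a time and record how much the model's score drops. This is a minimal sketch under stated assumptions, not the paper's code: the image is a plain nested list, and `toy_model`, the patch size, and the fill value are hypothetical stand-ins for a real VQA model and its settings.

```python
def occlusion_importance(image, model, patch=1, fill=0.0):
    """Return a grid where each entry is the drop in model score when its patch is masked."""
    base = model(image)
    h, w = len(image), len(image[0])
    heat = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            rows = range(top, min(top + patch, h))
            cols = range(left, min(left + patch, w))
            occluded = [row[:] for row in image]  # copy so the input stays intact
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base - model(occluded)         # large drop means an important patch
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat

def toy_model(image):
    """Hypothetical scorer that only looks at the top-left 2x2 corner."""
    return sum(image[r][c] for r in range(2) for c in range(2))

image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_importance(image, toy_model, patch=2)
for row in heat:
    print(row)
```

Only the top-left patch receives a non-zero importance here, because it is the only region the toy scorer reads. The same loop applies to question words by masking one token at a time.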

Computer Vision and Pattern Recognition; Artificial Intelligence; Computation and Language

Yin and Yang: Balancing and Answering Binary Visual Questions

50 - Peng Zhang, Yash Goyal, Douglas Summers-Stay 2015

The complex compositional structure of language makes problems at the intersection of vision and language challenging. But language also provides a strong prior that can result in good superficial performance, without the underlying models truly understanding the visual content. This can hinder progress in pushing the state of the art in the computer vision aspects of multi-modal AI. In this paper, we address binary Visual Question Answering (VQA) on abstract scenes. We formulate this problem as visual verification of concepts inquired in the questions. Specifically, we convert the question to a tuple that concisely summarizes the visual concept to be detected in the image. If the concept can be found in the image, the answer to the question is yes, and otherwise no. Abstract scenes play two roles: (1) they allow us to focus on the high-level semantics of the VQA task as opposed to the low-level recognition problems, and perhaps more importantly, (2) they provide us the modality to balance the dataset such that language priors are controlled and the role of vision is essential. In particular, we collect fine-grained pairs of scenes for every question, such that the answer to the question is yes for one scene and no for the other for the exact same question. Indeed, language priors alone do not perform better than chance on our balanced dataset. Moreover, our proposed approach matches the performance of a state-of-the-art VQA approach on the unbalanced dataset, and outperforms it on the balanced dataset.
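
Why balancing pins a language-only predictor at chance can be checked with a small calculation. This is a hedged sketch: the questions, scene identifiers, and answers below are invented for illustration, and `question_only_accuracy` is a hypothetical helper that scores the best predictor able to see only the question text.

```python
from collections import Counter

def question_only_accuracy(examples):
    """Accuracy of the best predictor that sees only the question text.

    For each distinct question it answers with that question's most frequent
    label, which is the most any vision-blind model could get on this data.
    """
    by_question = {}
    for question, _scene, answer in examples:
        by_question.setdefault(question, Counter())[answer] += 1
    correct = sum(max(counts.values()) for counts in by_question.values())
    return correct / len(examples)

# Hypothetical unbalanced data: answers are skewed towards "yes".
unbalanced = [
    ("Is the dog sleeping?", "scene_1", "yes"),
    ("Is the dog sleeping?", "scene_2", "yes"),
    ("Is the dog sleeping?", "scene_3", "no"),
    ("Is the boy near the tree?", "scene_4", "yes"),
]

# Hypothetical balanced data: every question has one "yes" scene and one "no" scene.
balanced = [
    ("Is the dog sleeping?", "scene_1", "yes"),
    ("Is the dog sleeping?", "scene_1b", "no"),
    ("Is the boy near the tree?", "scene_4", "yes"),
    ("Is the boy near the tree?", "scene_4b", "no"),
]

acc_unbalanced = question_only_accuracy(unbalanced)
acc_balanced = question_only_accuracy(balanced)
print(acc_unbalanced, acc_balanced)
```

With one yes scene and one no scene per question, no answer is more frequent than the other for any question, so the question-only predictor cannot exceed 50% on the balanced set.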

Computation and Language; Computer Vision and Pattern Recognition; Machine Learning

CloudCV: Large Scale Distributed Computer Vision as a Cloud Service

66 - Harsh Agrawal, Clint Solomon Mathialagan, Yash Goyal 2015

We are witnessing a proliferation of massive visual data. Unfortunately, scaling existing computer vision algorithms to large datasets leaves researchers repeatedly solving the same algorithmic, logistical, and infrastructural problems. Our goal is to democratize computer vision; one should not have to be an expert in computer vision, big data, and distributed computing to have access to state-of-the-art distributed computer vision algorithms. We present CloudCV, a comprehensive system that provides access to state-of-the-art distributed computer vision algorithms as a cloud service through a Web Interface and APIs.

Computer Vision and Pattern Recognition; Distributed, Parallel, and Cluster Computing
