ترغب بنشر مسار تعليمي؟ اضغط هنا

Learning Interactions and Relationships between Movie Characters

74   0   0.0 ( 0 )
 نشر من قبل Anna Kukleva
 تاريخ النشر 2020
  مجال البحث الهندسة المعلوماتية
والبحث باللغة English




اسأل ChatGPT حول البحث

Interactions between people are often governed by their relationships. On the flip side, social relationships are built upon several interactions. Two strangers are more likely to greet and introduce themselves while becoming friends over time. We are fascinated by this interplay between interactions and relationships, and believe that it is an important aspect of understanding social situations. In this work, we propose neural models to learn and jointly predict interactions, relationships, and the pair of characters that are involved. We note that interactions are informed by a mixture of visual and dialog cues, and present a multimodal architecture to extract meaningful information from them. Localizing the pair of interacting characters in video is a time-consuming process, instead, we train our model to learn from clip-level weak labels. We evaluate our models on the MovieGraphs dataset and show the impact of modalities, use of longer temporal context for predicting relationships, and achieve encouraging performance using weak labels as compared with ground-truth labels. Code is online.



قيم البحث

اقرأ أيضاً

We tackle the problem of learning the geometry of multiple categories of deformable objects jointly. Recent work has shown that it is possible to learn a unified dense pose predictor for several categories of related objects. However, training such m odels requires to initialize inter-category correspondences by hand. This is suboptimal and the resulting models fail to maintain correct correspondences as individual categories are learned. In this paper, we show that improved correspondences can be learned automatically as a natural byproduct of learning category-specific dense pose predictors. To do this, we express correspondences between different categories and between images and categories using a unified embedding. Then, we use the latter to enforce two constraints: symmetric inter-category cycle consistency and a new asymmetric image-to-category cycle consistency. Without any manual annotations for the inter-category correspondences, we obtain state-of-the-art alignment results, outperforming dedicated methods for matching 3D shapes. Moreover, the new model is also better at the task of dense pose prediction than prior work.
66 - Danila Potapov 2015
While important advances were recently made towards temporally localizing and recognizing specific human actions or activities in videos, efficient detection and classification of long video chunks belonging to semantically defined categories such as pursuit or romance remains challenging.We introduce a new dataset, Action Movie Franchises, consisting of a collection of Hollywood action movie franchises. We define 11 non-exclusive semantic categories - called beat-categories - that are broad enough to cover most of the movie footage. The corresponding beat-events are annotated as groups of video shots, possibly overlapping.We propose an approach for localizing beat-events based on classifying shots into beat-categories and learning the temporal constraints between shots. We show that temporal constraints significantly improve the classification performance. We set up an evaluation protocol for beat-event localization as well as for shot classification, depending on whether movies from the same franchise are present or not in the training data.
Knowledge bases of entities and relations (either constructed manually or automatically) are behind many real world search engines, including those at Yahoo!, Microsoft, and Google. Those knowledge bases can be viewed as graphs with nodes representin g entities and edges representing (primary) relationships, and various studies have been conducted on how to leverage them to answer entity seeking queries. Meanwhile, in a complementary direction, analyses over the query logs have enabled researchers to identify entity pairs that are statistically correlated. Such entity relationships are then presented to search users through the related searches feature in modern search engines. However, entity relationships thus discovered can often be puzzling to the users because why the entities are connected is often indescribable. In this paper, we propose a novel problem called entity relationship explanation, which seeks to explain why a pair of entities are connected, and solve this challenging problem by integrating the above two complementary approaches, i.e., we leverage the knowledge base to explain the connections discovered between entity pairs. More specifically, we present REX, a system that takes a pair of entities in a given knowledge base as input and efficiently identifies a ranked list of relationship explanations. We formally define relationship explanations and analyze their desirable properties. Furthermore, we design and implement algorithms to efficiently enumerate and rank all relationship explanations based on multiple measures of interestingness. We perform extensive experiments over real web-scale data gathered from DBpedia and a commercial search engine, demonstrating the efficiency and scalability of REX. We also perform user studies to corroborate the effectiveness of explanations generated by REX.
Automatically writing stylized Chinese characters is an attractive yet challenging task due to its wide applicabilities. In this paper, we propose a novel framework named Style-Aware Variational Auto-Encoder (SA-VAE) to flexibly generate Chinese char acters. Specifically, we propose to capture the different characteristics of a Chinese character by disentangling the latent features into content-related and style-related components. Considering of the complex shapes and structures, we incorporate the structure information as prior knowledge into our framework to guide the generation. Our framework shows a powerful one-shot/low-shot generalization ability by inferring the style component given a character with unseen style. To the best of our knowledge, this is the first attempt to learn to write new-style Chinese characters by observing only one or a few examples. Extensive experiments demonstrate its effectiveness in generating different stylized Chinese characters by fusing the feature vectors corresponding to different contents and styles, which is of significant importance in real-world applications.
The goal of the IARAI competition traffic4cast was to predict the city-wide traffic status within a 15-minute time window, based on information from the previous hour. The traffic status was given as multi-channel images (one pixel roughly correspond s to 100x100 meters), where one channel indicated the traffic volume, another one the average speed of vehicles, and a third one their rough heading. As part of our work on the competition, we evaluated many different network architectures, analyzed the statistical properties of the given data in detail, and thought about how to transform the problem to be able to take additional spatio-temporal context-information into account, such as the street network, the positions of traffic lights, or the weather. This document summarizes our efforts that led to our best submission, and gives some insights about which other approaches we evaluated, and why they did not work as well as imagined.
التعليقات
جاري جلب التعليقات جاري جلب التعليقات
سجل دخول لتتمكن من متابعة معايير البحث التي قمت باختيارها
mircosoft-partner

هل ترغب بارسال اشعارات عن اخر التحديثات في شمرا-اكاديميا