No Arabic abstract
Learning effective representations of visual data that generalize to a variety of downstream tasks has been a long quest for computer vision. Most representation learning approaches rely solely on visual data such as images or videos. In this paper, we explore a novel approach, where we use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations. For this study, we collect a dataset of human interactions capturing body part movements and gaze in their daily lives. Our experiments show that our muscly-supervised representation that encodes interaction and attention cues outperforms a visual-only state-of-the-art method MoCo (He et al.,2020), on a variety of target tasks: scene classification (semantic), action recognition (temporal), depth estimation (geometric), dynamics prediction (physics) and walkable surface estimation (affordance). Our code and dataset are available at: https://github.com/ehsanik/muscleTorch.
We describe the pitfalls encountered in deducing from classical double radio source observables (luminosity, spectral index, redshift and linear size) the essential nature of how these objects evolve. We discuss the key role played by hotspots in governing the energy distribution of the lobes they feed, and subsequent spectral evolution. We present images obtained using the new 74 MHz receivers on the VLA and discuss constraints which these enforce on models of the backflow and ages in classical doubles.
We present a passive non-line-of-sight method that infers the number of people or activity of a person from the observation of a blank wall in an unknown room. Our technique analyzes complex imperceptible changes in indirect illumination in a video of the wall to reveal a signal that is correlated with motion in the hidden part of a scene. We use this signal to classify between zero, one, or two moving people, or the activity of a person in the hidden scene. We train two convolutional neural networks using data collected from 20 different scenes, and achieve an accuracy of $approx94%$ for both tasks in unseen test environments and real-time online settings. Unlike other passive non-line-of-sight methods, the technique does not rely on known occluders or controllable light sources, and generalizes to unknown rooms with no re-calibration. We analyze the generalization and robustness of our method with both real and synthetic data, and study the effect of the scene parameters on the signal quality.
We investigate the effects of multi-task learning using the recently introduced task of semantic tagging. We employ semantic tagging as an auxiliary task for three different NLP tasks: part-of-speech tagging, Universal Dependency parsing, and Natural Language Inference. We compare full neural network sharing, partial neural network sharing, and what we term the learning what to share setting where negative transfer between tasks is less likely. Our findings show considerable improvements for all tasks, particularly in the learning what to share setting, which shows consistent gains across all tasks.
We discuss the features of instabilities in binary systems, in particular, for asymmetric nuclear matter. We show its relevance for the interpretation of results obtained in experiments and in ab initio simulations of the reaction between $^{124}Sn+^{124}Sn$ at 50AMeV.}
Different from other multiple top-quark productions, triple top-quark production requires the presence of both flavor violating neutral interaction and flavor conserving neutral interaction. We describe the interaction of triple top-quarks and up-quark in terms of two dimension-6 operators; one can be induced by a new heavy vector resonance, the other by a scalar resonance. Combining same-sign top-quark pair production and four top-quark production, we explore the potential of the 13 TeV LHC on searching for the triple top-quark production.