No Arabic abstract
Intra-operative anticipation of instrument usage is a necessary component for context-aware assistance in surgery, e.g. for instrument preparation or semi-automation of robotic tasks. However, the sparsity of instrument occurrences in long videos poses a challenge. Current approaches are limited as they assume knowledge on the timing of future actions or require dense temporal segmentations during training and inference. We propose a novel learning task for anticipation of instrument usage in laparoscopic videos that overcomes these limitations. During training, only sparse instrument annotations are required and inference is done solely on image data. We train a probabilistic model to address the uncertainty associated with future events. Our approach outperforms several baselines and is competitive to a variant using richer annotations. We demonstrate the models ability to quantify task-relevant uncertainties. To the best of our knowledge, we are the first to propose a method for anticipating instruments in surgery.
Recent work in neural machine translation has demonstrated both the necessity and feasibility of using inter-sentential context -- context from sentences other than those currently being translated. However, while many current methods present model architectures that theoretically can use this extra context, it is often not clear how much they do actually utilize it at translation time. In this paper, we introduce a new metric, conditional cross-mutual information, to quantify the usage of context by these models. Using this metric, we measure how much document-level machine translation systems use particular varieties of context. We find that target context is referenced more than source context, and that conditioning on a longer context has a diminishing effect on results. We then introduce a new, simple training method, context-aware word dropout, to increase the usage of context by context-aware models. Experiments show that our method increases context usage and that this reflects on the translation quality according to metrics such as BLEU and COMET, as well as performance on anaphoric pronoun resolution and lexical cohesion contrastive datasets.
State-of-the-art navigation methods leverage a spatial memory to generalize to new environments, but their occupancy maps are limited to capturing the geometric structures directly observed by the agent. We propose occupancy anticipation, where the agent uses its egocentric RGB-D observations to infer the occupancy state beyond the visible regions. In doing so, the agent builds its spatial awareness more rapidly, which facilitates efficient exploration and navigation in 3D environments. By exploiting context in both the egocentric views and top-down maps our model successfully anticipates a broader map of the environment, with performance significantly better than strong baselines. Furthermore, when deployed for the sequential decision-making tasks of exploration and navigation, our model outperforms state-of-the-art methods on the Gibson and Matterport3D datasets. Our approach is the winning entry in the 2020 Habitat PointNav Challenge. Project page: http://vision.cs.utexas.edu/projects/occupancy_anticipation/
Face anti-spoofing (FAS) seeks to discriminate genuine faces from fake ones arising from any type of spoofing attack. Due to the wide varieties of attacks, it is implausible to obtain training data that spans all attack types. We propose to leverage physical cues to attain better generalization on unseen domains. As a specific demonstration, we use physically guided proxy cues such as depth, reflection, and material to complement our main anti-spoofing (a.k.a liveness detection) task, with the intuition that genuine faces across domains have consistent face-like geometry, minimal reflection, and skin material. We introduce a novel uncertainty-aware attention scheme that independently learns to weigh the relative contributions of the main and proxy tasks, preventing the over-confident issue with traditional attention modules. Further, we propose attribute-assisted hard negative mining to disentangle liveness-irrelevant features with liveness features during learning. We evaluate extensively on public benchmarks with intra-dataset and inter-dataset protocols. Our method achieves the superior performance especially in unseen domain generalization for FAS.
Anticipating human motion depends on two factors: the past motion and the persons intention. While the first factor has been extensively utilized to forecast short sequences of human motion, the second one remains elusive. In this work we approximate a persons intention via a symbolic representation, for example fine-grained action labels such as walking or sitting down. Forecasting a symbolic representation is much easier than forecasting the full body pose with its complex inter-dependencies. However, knowing the future actions makes forecasting human motion easier. We exploit this connection by first anticipating symbolic labels and then generate human motion, conditioned on the human motion input sequence as well as on the forecast labels. This allows the model to anticipate motion changes many steps ahead and adapt the poses accordingly. We achieve state-of-the-art results on short-term as well as on long-term human motion forecasting.
We study the large-time behavior of systems driven by radial potentials, which react to anticipated positions, ${mathbf x}^tau(t)={mathbf x}(t)+tau {mathbf v}(t)$, with anticipation increment $tau>0$. As a special case, such systems yield the celebrated Cucker-Smale model for alignment, coupled with pairwise interactions. Viewed from this perspective, such anticipated-driven systems are expected to emerge into flocking due to alignment of velocities, and spatial concentration due to confining potentials. We treat both the discrete dynamics and large crowd hydrodynamics, proving the decisive role of anticipation in driving such systems with attractive potentials into velocity alignment and spatial concentration. We also study the concentration effect near equilibrium for anticipated-based dynamics of pair of agents governed by attractive-repulsive potentials.