
SMASH: Physics-guided Reconstruction of Collisions from Videos

Added by Aron Monszpart
Publication date: 2016
Language: English





Collision sequences are commonly used in games and entertainment to add drama and excitement. Authoring even a two-body collision in the real world can be difficult, as the timing and object trajectories must be correctly synchronized. After tedious trial-and-error iterations, even when objects can be made to collide, the event is difficult to capture in 3D. In contrast, synthetically generating plausible collisions is difficult because it requires adjusting different collision parameters (e.g., object mass ratio, coefficient of restitution) as well as appropriate initial conditions. We present SMASH to read off appropriate collision parameters directly from raw input video recordings. Technically, we enable this by using the laws of rigid body collision to regularize the problem of lifting 2D trajectories to a physically valid 3D reconstruction of the collision. The reconstructed sequences can then be modified and combined to easily author novel and plausible collisions. We evaluate our system on a range of synthetic scenes and demonstrate the effectiveness of our method by accurately reconstructing several complex real-world collision events.
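As a concrete illustration of the physics that regularizes the reconstruction, the following minimal sketch (not the paper's solver; the function and its inputs are illustrative assumptions) reads off the two key collision parameters once pre- and post-collision 3D velocities are available: conservation of momentum along the collision normal yields the mass ratio, and the ratio of relative normal speeds yields the coefficient of restitution.

```python
import numpy as np

def collision_parameters(u1, u2, v1, v2, normal):
    """u1, u2: pre-collision velocities; v1, v2: post-collision velocities;
    normal: collision normal. All are 3-vectors (np.ndarray)."""
    n = normal / np.linalg.norm(normal)
    # Conservation of momentum: the impulses are equal and opposite,
    # m1*(v1 - u1) = -m2*(v2 - u2), so m1/m2 = -dp2/dp1 along the normal.
    dp1 = np.dot(v1 - u1, n)
    dp2 = np.dot(v2 - u2, n)
    mass_ratio = -dp2 / dp1
    # Coefficient of restitution: separation speed over approach speed.
    e = -np.dot(v1 - v2, n) / np.dot(u1 - u2, n)
    return mass_ratio, e

# Example: equal masses, elastic head-on collision along x.
m, e = collision_parameters(np.array([1.0, 0, 0]), np.array([-1.0, 0, 0]),
                            np.array([-1.0, 0, 0]), np.array([1.0, 0, 0]),
                            np.array([1.0, 0, 0]))
print(m, e)  # -> 1.0 1.0
```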


Related Research

A long-standing challenge in scene analysis is the recovery of scene arrangements under moderate to heavy occlusion, directly from monocular video. While the problem remains a subject of active research, concurrent advances have been made in the context of human pose reconstruction from monocular video, including image-space feature point detection and 3D pose recovery. These methods, however, start to fail under moderate to heavy occlusion as the problem becomes severely under-constrained. We approach the problem differently. We observe that people interact similarly in similar scenes. Hence, we exploit the correlation between scene object arrangement and motions performed in that scene in both directions: first, typical motions performed when interacting with objects inform us about possible object arrangements; and second, object arrangements, in turn, constrain the possible motions. We present iMapper, a data-driven method that focuses on identifying human-object interactions, and jointly reasons about objects and human movement over space-time to recover both a plausible scene arrangement and consistent human interactions. We first introduce the notion of characteristic interactions as regions in space-time when an informative human-object interaction happens. This is followed by a novel occlusion-aware matching procedure that searches and aligns such characteristic snapshots from an interaction database to best explain the input monocular video. Through extensive evaluations, both quantitative and qualitative, we demonstrate that iMapper significantly improves performance over both dedicated state-of-the-art scene analysis and 3D human pose recovery approaches, especially under medium to heavy occlusion.
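A rough sketch of the occlusion-aware matching idea: score each candidate interaction snapshot against the detected 2D joints using only the joints the detector marks visible, so occluded joints neither penalize nor reward a candidate. The function names, scoring, and data layout are assumptions, not iMapper's actual procedure.

```python
import numpy as np

def match_score(snapshot_2d, detected_2d, visibility):
    """snapshot_2d, detected_2d: (J, 2) arrays of 2D joint positions;
    visibility: (J,) boolean mask from the 2D detector."""
    vis = visibility.astype(float)
    err = np.linalg.norm(snapshot_2d - detected_2d, axis=1)
    # Average reprojection error over visible joints only.
    return np.sum(vis * err) / max(vis.sum(), 1.0)

def best_snapshot(database, detected_2d, visibility):
    """Return the index of the database snapshot that best explains
    the (possibly occluded) detections."""
    scores = [match_score(s, detected_2d, visibility) for s in database]
    return int(np.argmin(scores))
```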
Data-driven character animation based on motion capture can produce highly naturalistic behaviors and, when combined with physics simulation, can provide for natural procedural responses to physical perturbations, environmental changes, and morphological discrepancies. Motion capture remains the most popular source of motion data, but collecting mocap data typically requires heavily instrumented environments and actors. In this paper, we propose a method that enables physically simulated characters to learn skills from videos (SFV). Our approach, based on deep pose estimation and deep reinforcement learning, allows data-driven animation to leverage the abundance of publicly available video clips from the web, such as those from YouTube. This has the potential to enable fast and easy design of character controllers simply by querying for video recordings of the desired behavior. The resulting controllers are robust to perturbations, can be adapted to new settings, can perform basic object interactions, and can be retargeted to new morphologies via reinforcement learning. We further demonstrate that our method can predict potential human motions from still images, by forward simulation of learned controllers initialized from the observed pose. Our framework is able to learn a broad range of dynamic skills, including locomotion, acrobatics, and martial arts.
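The reinforcement learning stage in this line of work typically maximizes a pose-imitation reward that compares the simulated character against the pose estimated from video. Below is a DeepMimic-style sketch; the weights and error scales are assumptions rather than SFV's exact reward.

```python
import numpy as np

def imitation_reward(q_sim, q_ref, v_sim, v_ref, w_pose=0.65, w_vel=0.35):
    """q_*: flattened joint angles; v_*: joint velocities.
    Returns a reward in (0, 1] that peaks when the simulated character
    tracks the reference pose recovered from video."""
    pose_err = np.sum((q_sim - q_ref) ** 2)
    vel_err = np.sum((v_sim - v_ref) ** 2)
    # Exponentiated errors keep the reward bounded and smooth, which
    # stabilizes policy-gradient training.
    return w_pose * np.exp(-2.0 * pose_err) + w_vel * np.exp(-0.1 * vel_err)
```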
Although 360° cameras ease the capture of panoramic footage, it remains challenging to add realistic 360° audio that blends into the captured scene and is synchronized with the camera motion. We present a method for adding scene-aware spatial audio to 360° videos in typical indoor scenes, using only a conventional mono-channel microphone and a speaker. We observe that the late reverberation of a room's impulse response is usually diffuse spatially and directionally. Exploiting this fact, we propose a method that synthesizes the directional impulse response between any source and listening locations by combining a synthesized early reverberation part and a measured late reverberation tail. The early reverberation is simulated using a geometric acoustic simulation and then enhanced using a frequency modulation method to capture room resonances. The late reverberation is extracted from a recorded impulse response, with a carefully chosen time duration that separates the late reverberation from the early reverberation. In our validations, we show that our synthesized spatial audio closely matches recordings made with ambisonic microphones. Lastly, we demonstrate the strength of our method in several applications.
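The assembly step described above — a simulated early part crossfaded into a measured late tail at a chosen split time — can be sketched as follows; the function name, split time, and fade length are assumptions.

```python
import numpy as np

def hybrid_impulse_response(early_ir, late_ir, sr, split_s, fade_s=0.01):
    """early_ir: simulated early reverberation; late_ir: measured IR
    (same length, with split_s + fade_s inside both); sr: sample rate
    in Hz; split_s: early/late boundary in seconds."""
    n = len(early_ir)
    split, fade = int(split_s * sr), int(fade_s * sr)
    w = np.zeros(n)                      # blend weight of the measured tail
    w[split + fade:] = 1.0
    w[split:split + fade] = np.linspace(0.0, 1.0, fade)  # linear crossfade
    return (1.0 - w) * early_ir + w * late_ir

# The hybrid IR can then be convolved with the mono recording
# (e.g., wet = np.convolve(dry, ir)) to produce the spatialized signal.
```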
3D content creation is one of the most fundamental tasks in computer graphics, and many 3D modeling algorithms working from 2D images or curves have been developed over the past several decades. Designers can align conceptual images or sketch suggestive curves from front, side, and top views, and then use them as references to construct a 3D model automatically or manually. However, to the best of our knowledge, no studies have investigated 3D human body reconstruction in a similar manner. In this paper, we propose a deep-learning-based reconstruction of 3D human body shape from 2D orthographic views. A novel CNN-based regression network, with two branches corresponding to the frontal and lateral views respectively, is designed to estimate 3D human body shape from 2D mask images. We train our networks separately to decouple the feature descriptors that encode the body parameters from different views, and fuse them to estimate an accurate human body shape. In addition, to overcome the shortage of training data required for this purpose, we propose several data augmentation schemes for 3D human body shapes, which can be used to promote further research on this topic. Extensive experimental results demonstrate that visually realistic and accurate reconstructions can be achieved effectively using our algorithm. Requiring only binary mask images, our method can help users create their own digital avatars quickly, and also makes it easy to create digital human bodies for 3D games, virtual reality, and online fashion shopping.
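To make the two-branch design concrete, here is an illustrative PyTorch sketch; the layer sizes, concatenation-based fusion, and the number of shape parameters are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TwoViewShapeNet(nn.Module):
    """Each branch encodes one binary mask view; the fused features
    regress the body shape parameters."""
    def __init__(self, n_params=10):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.front, self.side = branch(), branch()
        self.head = nn.Linear(128, n_params)  # fuse 64+64 features, regress

    def forward(self, front_mask, side_mask):
        f = torch.cat([self.front(front_mask), self.side(side_mask)], dim=1)
        return self.head(f)

net = TwoViewShapeNet()
params = net(torch.zeros(1, 1, 128, 128), torch.zeros(1, 1, 128, 128))
```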
Physics-guided deep learning (PG-DL) via algorithm unrolling has received significant interest for improved image reconstruction, including MRI applications. These methods unroll an iterative optimization algorithm into a series of regularizer and data consistency units. The unrolled networks are typically trained end-to-end using a supervised approach. Current supervised PG-DL approaches use all of the available sub-sampled measurements in their data consistency units. In this study, we propose to improve the performance and robustness of supervised training by introducing randomness: we retrospectively select only a subset of all the available measurements for the data consistency units, so the network learns to fit the remaining held-out measurements. The process is repeated multiple times using different random masks during training for further enhancement. Results on knee MRI show that the proposed multi-mask supervised PG-DL enhances reconstruction performance compared to conventional supervised PG-DL approaches.
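The mask-selection step is straightforward to sketch: draw several random subsets of the acquired k-space locations to drive the data consistency units, holding out the remainder for the unrolled network to predict. The names and subset fraction below are assumptions.

```python
import numpy as np

def multi_masks(acquired_idx, n_masks=4, dc_fraction=0.6, seed=0):
    """acquired_idx: 1D array of sampled k-space locations.
    Returns n_masks (dc, held_out) pairs: 'dc' feeds the data
    consistency units; 'held_out' is what the network must fit."""
    rng = np.random.default_rng(seed)
    k = int(dc_fraction * len(acquired_idx))
    masks = []
    for _ in range(n_masks):
        dc = rng.choice(acquired_idx, size=k, replace=False)
        held_out = np.setdiff1d(acquired_idx, dc)
        masks.append((dc, held_out))
    return masks
```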
