No Arabic abstract
Many autonomous systems forecast aspects of the future in order to aid decision-making. For example, self-driving vehicles and robotic manipulation systems often forecast future object poses by first detecting and tracking objects. However, this detect-then-forecast pipeline is expensive to scale, as pose forecasting algorithms typically require labeled sequences of object poses, which are costly to obtain in 3D space. Can we scale performance without requiring additional labels? We hypothesize yes, and propose inverting the detect-then-forecast pipeline. Instead of detecting, tracking and then forecasting the objects, we propose to first forecast 3D sensor data (e.g., point clouds with $100$k points) and then detect/track objects on the predicted point cloud sequences to obtain future poses, i.e., a forecast-then-detect pipeline. This inversion makes it less expensive to scale pose forecasting, as the sensor data forecasting task requires no labels. Part of this works focus is on the challenging first step -- Sequential Pointcloud Forecasting (SPF), for which we also propose an effective approach, SPFNet. To compare our forecast-then-detect pipeline relative to the detect-then-forecast pipeline, we propose an evaluation procedure and two metrics. Through experiments on a robotic manipulation dataset and two driving datasets, we show that SPFNet is effective for the SPF task, our forecast-then-detect pipeline outperforms the detect-then-forecast approaches to which we compared, and that pose forecasting performance improves with the addition of unlabeled data.
Production forecasting is a key step to design the future development of a reservoir. A classical way to generate such forecasts consists in simulating future production for numerical models representative of the reservoir. However, identifying such models can be very challenging as they need to be constrained to all available data. In particular, they should reproduce past production data, which requires to solve a complex non-linear inverse problem. In this paper, we thus propose to investigate the potential of machine learning algorithms to predict the future production of a reservoir based on past production data without model calibration. We focus more specifically on robust online aggregation, a deterministic approach that provides a robust framework to make forecasts on a regular basis. This method does not rely on any specific assumption or need for stochastic modeling. Forecasts are first simulated for a set of base reservoir models representing the prior uncertainty, and then combined to predict production at the next time step. The weight associated to each forecast is related to its past performance. Three different algorithms are considered for weight computations: the exponentially weighted average algorithm, ridge regression and the Lasso regression. They are applied on a synthetic reservoir case study, the Brugge case, for sequential predictions. To estimate the potential of development scenarios, production forecasts are needed on long periods of time without intermediary data acquisition. An extension of the deterministic aggregation approach is thus proposed in this paper to provide such multi-step-ahead forecasts.
Smooth and seamless robot navigation while interacting with humans depends on predicting human movements. Forecasting such human dynamics often involves modeling human trajectories (global motion) or detailed body joint movements (local motion). Prior work typically tackled local and global human movements separately. In this paper, we propose a novel framework to tackle both tasks of human motion (or trajectory) and body skeleton pose forecasting in a unified end-to-end pipeline. To deal with this real-world problem, we consider incorporating both scene and social contexts, as critical clues for this prediction task, into our proposed framework. To this end, we first couple these two tasks by i) encoding their history using a shared Gated Recurrent Unit (GRU) encoder and ii) applying a metric as loss, which measures the source of errors in each task jointly as a single distance. Then, we incorporate the scene context by encoding a spatio-temporal representation of the video data. We also include social clues by generating a joint feature representation from motion and pose of all individuals from the scene using a social pooling layer. Finally, we use a GRU based decoder to forecast both motion and skeleton pose. We demonstrate that our proposed framework achieves a superior performance compared to several baselines on two social datasets.
Joint forecasting of human trajectory and pose dynamics is a fundamental building block of various applications ranging from robotics and autonomous driving to surveillance systems. Predicting body dynamics requires capturing subtle information embedded in the humans interactions with each other and with the objects present in the scene. In this paper, we propose a novel TRajectory and POse Dynamics (nicknamed TRiPOD) method based on graph attentional networks to model the human-human and human-object interactions both in the input space and the output space (decoded future output). The model is supplemented by a message passing interface over the graphs to fuse these different levels of interactions efficiently. Furthermore, to incorporate a real-world challenge, we propound to learn an indicator representing whether an estimated body joint is visible/invisible at each frame, e.g. due to occlusion or being outside the sensor field of view. Finally, we introduce a new benchmark for this joint task based on two challenging datasets (PoseTrack and 3DPW) and propose evaluation metrics to measure the effectiveness of predictions in the global space, even when there are invisible cases of joints. Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
Recurrent neural networks are widely used on time series data, yet such models often ignore the underlying physical structures in such sequences. A new class of physics-based methods related to Koopman theory has been introduced, offering an alternative for processing nonlinear dynamical systems. In this work, we propose a novel Consistent Koopman Autoencoder model which, unlike the majority of existing work, leverages the forward and backward dynamics. Key to our approach is a new analysis which explores the interplay between consistent dynamics and their associated Koopman operators. Our network is directly related to the derived analysis, and its computational requirements are comparable to other baselines. We evaluate our method on a wide range of high-dimensional and short-term dependent problems, and it achieves accurate estimates for significant prediction horizons, while also being robust to noise.
We propose a sequential optimizing betting strategy in the multi-dimensional bounded forecasting game in the framework of game-theoretic probability of Shafer and Vovk (2001). By studying the asymptotic behavior of its capital process, we prove a generalization of the strong law of large numbers, where the convergence rate of the sample mean vector depends on the growth rate of the quadratic variation process. The growth rate of the quadratic variation process may be slower than the number of rounds or may even be zero. We also introduce an information criterion for selecting efficient betting items. These results are then applied to multiple asset trading strategies in discrete-time and continuous-time games. In the case of continuous-time game we present a measure of the jaggedness of a vector-valued continuous process. Our results are examined by several numerical examples.