ﻻ يوجد ملخص باللغة العربية
An important aspect of video understanding is the ability to predict the evolution of its content in the future. This paper presents a future frame semantic segmentation technique for predicting semantic masks of the current and future frames in a time-lapsed video. We specifically focus on time-lapsed videos with large temporal displacement to highlight the models ability to capture large motions in time. We first introduce a unique semantic segmentation prediction dataset with over 120,000 time-lapsed sky-video frames and all corresponding semantic masks captured over a span of five years in North America region. The dataset has immense practical value for cloud cover analysis, which are treated as non-rigid objects of interest. %Here the model provides both semantic segmentation of cloud region and solar irradiance emitted from a region from the sky-videos. Next, our proposed recurrent network architecture departs from existing trend of using temporal convolutional networks (TCN) (or feed-forward networks), by explicitly learning an internal representations for the evolution of video content with time. Experimental evaluation shows an improvement of mean IoU over TCNs in the segmentation task by 10.8% for 10 mins (21% over 60 mins) ahead of time predictions. Further, our model simultaneously measures both the current and future solar irradiance from the same video frames with a normalized-MAE of 10.5% over two years. These results indicate that recurrent memory networks with attention mechanism are able to capture complex advective and diffused flow characteristic of dense fluids even with sparse temporal sampling and are more suitable for future frame prediction tasks for longer duration videos.
Weakly supervised instance segmentation reduces the cost of annotations required to train models. However, existing approaches which rely only on image-level class labels predominantly suffer from errors due to (a) partial segmentation of objects and
In deep CNN based models for semantic segmentation, high accuracy relies on rich spatial context (large receptive fields) and fine spatial details (high resolution), both of which incur high computational costs. In this paper, we propose a novel arch
The ability to predict and therefore to anticipate the future is an important attribute of intelligence. It is also of utmost importance in real-time systems, e.g. in robotics or autonomous driving, which depend on visual scene understanding for deci
Powered by the ImageNet dataset, unsupervised learning on large-scale data has made significant advances for classification tasks. There are two major challenges to allow such an attractive learning modality for segmentation tasks: i) a large-scale b
Semantic segmentation is a crucial task for robot navigation and safety. However, it requires huge amounts of pixelwise annotations to yield accurate results. While recent progress in computer vision algorithms has been heavily boosted by large groun