Unsupervised Temporal Video Segmentation as an Auxiliary Task for Predicting the Remaining Surgery Duration

51 0 0.0 ( 0 )

Download Cite

Added by Dominik Rivoir

Publication date 2020

fields Informatics Engineering

and research's language is English

Authors Dominik Rivoir - Sebastian Bodenstedt - Felix von Bechtolsheim

Computer Vision and Pattern Recognition

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Estimating the remaining surgery duration (RSD) during surgical procedures can be useful for OR planning and anesthesia dose estimation. With the recent success of deep learning-based methods in computer vision, several neural network approaches have been proposed for fully automatic RSD prediction based solely on visual data from the endoscopic camera. We investigate whether RSD prediction can be improved using unsupervised temporal video segmentation as an auxiliary learning task. As opposed to previous work, which presented supervised surgical phase recognition as auxiliary task, we avoid the need for manual annotations by proposing a similar but unsupervised learning objective which clusters video sequences into temporally coherent segments. In multiple experimental setups, results obtained by learning the auxiliary task are incorporated into a deep RSD model through feature extraction, pretraining or regularization. Further, we propose a novel loss function for RSD training which attempts to counteract unfavorable characteristics of the RSD ground truth. Using our unsupervised method as an auxiliary task for RSD training, we outperform other self-supervised methods and are comparable to the supervised state-of-the-art. Combined with the novel RSD loss, we slightly outperform the supervised approach.

rate research

Incorporating Temporal Prior from Motion Flow for Instrument Segmentation in Minimally Invasive Surgery Video

84 - Yueming Jin , Keyun Cheng , Qi Dou 2019

Automatic instrument segmentation in video is an essentially fundamental yet challenging problem for robot-assisted minimally invasive surgery. In this paper, we propose a novel framework to leverage instrument motion information, by incorporating a derived temporal prior to an attention pyramid network for accurate segmentation. Our inferred prior can provide reliable indication of the instrument location and shape, which is propagated from the previous frame to the current frame according to inter-frame motion flow. This prior is injected to the middle of an encoder-decoder segmentation network as an initialization of a pyramid of attention modules, to explicitly guide segmentation output from coarse to fine. In this way, the temporal dynamics and the attention network can effectively complement and benefit each other. As additional usage, our temporal prior enables semi-supervised learning with periodically unlabeled video frames, simply by reverse execution. We extensively validate our method on the public 2017 MICCAI EndoVis Robotic Instrument Segmentation Challenge dataset with three different tasks. Our method consistently exceeds the state-of-the-art results across all three tasks by a large margin. Our semi-supervised variant also demonstrates a promising potential for reducing annotation cost in the clinical practice.

Computer Vision and Pattern Recognition

Fourier Transform Approximation as an Auxiliary Task for Image Classification

327 - Chen Liu 2021

Image reconstruction is likely the most predominant auxiliary task for image classification, but we would like to think twice about this convention. In this paper, we investigated approximating the Fourier Transform of the input image as a potential alternative, in the hope that it may further boost the performances on the primary task or introduce novel constraints not well covered by image reconstruction. We experimented with five popular classification architectures on the CIFAR-10 dataset, and the empirical results indicated that our proposed auxiliary task generally improves the classification accuracy. More notably, the results showed that in certain cases our proposed auxiliary task may enhance the classifiers resistance to adversarial attacks generated using the fast gradient sign method.

Computer Vision and Pattern Recognition

Unsupervised Domain Adaptation for Video Semantic Segmentation

92 - Inkyu Shin , Kwanyong Park , Sanghyun Woo 2021

Unsupervised Domain Adaptation for semantic segmentation has gained immense popularity since it can transfer knowledge from simulation to real (Sim2Real) by largely cutting out the laborious per pixel labeling efforts at real. In this work, we present a new video extension of this task, namely Unsupervised Domain Adaptation for Video Semantic Segmentation. As it became easy to obtain large-scale video labels through simulation, we believe attempting to maximize Sim2Real knowledge transferability is one of the promising directions for resolving the fundamental data-hungry issue in the video. To tackle this new problem, we present a novel two-phase adaptation scheme. In the first step, we exhaustively distill source domain knowledge using supervised loss functions. Simultaneously, video adversarial training (VAT) is employed to align the features from source to target utilizing video context. In the second step, we apply video self-training (VST), focusing only on the target data. To construct robust pseudo labels, we exploit the temporal information in the video, which has been rarely explored in the previous image-based self-training approaches. We set strong baseline scores on VIPER to CityscapeVPS adaptation scenario. We show that our proposals significantly outperform previous image-based UDA methods both on image-level (mIoU) and video-level (VPQ) evaluation metrics.

Computer Vision and Pattern Recognition

Deep Spatio-Temporal Random Fields for Efficient Video Segmentation

92 - Siddhartha Chandra , Camille Couprie , Iasonas Kokkinos 2018

In this work we introduce a time- and memory-efficient method for structured prediction that couples neuron decisions across both space at time. We show that we are able to perform exact and efficient inference on a densely connected spatio-temporal graph by capitalizing on recent advances on deep Gaussian Conditional Random Fields (GCRFs). Our method, called VideoGCRF is (a) efficient, (b) has a unique global minimum, and (c) can be trained end-to-end alongside contemporary deep networks for video understanding. We experiment with multiple connectivity patterns in the temporal domain, and present empirical improvements over strong baselines on the tasks of both semantic and instance segmentation of videos.

Computer Vision and Pattern Recognition Machine Learning Machine Learning

Forecasting the Remaining Duration of an Ongoing Solar Flare

141 - Jeffrey W. Reep , Will T. Barnes 2021

The solar X-ray irradiance is significantly heightened during the course of a solar flare, which can cause radio blackouts due to ionization of the atoms in the ionosphere. As the duration of a solar flare is not related to the size of that flare, it is not directly clear how long those blackouts can persist. Using a random forest regression model trained on data taken from X-ray light curves, we have developed a direct forecasting method that predicts how long the event will remain above background levels. We test this on a large collection of flares observed with GOES-15, and show that it generally outperforms simple linear regression. This forecast is computationally light enough to be performed in real time, allowing for the prediction to be made during the course of a flare.

Solar and Stellar Astrophysics Space Physics