ﻻ يوجد ملخص باللغة العربية
Automatic instrument segmentation in video is an essentially fundamental yet challenging problem for robot-assisted minimally invasive surgery. In this paper, we propose a novel framework to leverage instrument motion information, by incorporating a derived temporal prior to an attention pyramid network for accurate segmentation. Our inferred prior can provide reliable indication of the instrument location and shape, which is propagated from the previous frame to the current frame according to inter-frame motion flow. This prior is injected to the middle of an encoder-decoder segmentation network as an initialization of a pyramid of attention modules, to explicitly guide segmentation output from coarse to fine. In this way, the temporal dynamics and the attention network can effectively complement and benefit each other. As additional usage, our temporal prior enables semi-supervised learning with periodically unlabeled video frames, simply by reverse execution. We extensively validate our method on the public 2017 MICCAI EndoVis Robotic Instrument Segmentation Challenge dataset with three different tasks. Our method consistently exceeds the state-of-the-art results across all three tasks by a large margin. Our semi-supervised variant also demonstrates a promising potential for reducing annotation cost in the clinical practice.
This paper reports a CPU-level real-time stereo matching method for surgical images (10 Hz on 640 * 480 image with a single core of i5-9400). The proposed method is built on the fast dense inverse searching algorithm, which estimates the disparity of
Estimating the remaining surgery duration (RSD) during surgical procedures can be useful for OR planning and anesthesia dose estimation. With the recent success of deep learning-based methods in computer vision, several neural network approaches have
Crowd segmentation is a fundamental task serving as the basis of crowded scene analysis, and it is highly desirable to obtain refined pixel-level segmentation maps. However, it remains a challenging problem, as existing approaches either require dens
We propose a new method for six-degree-of-freedom (6-DoF) autonomous camera movement for minimally invasive surgery, which, unlike previous methods, takes into account both the position and orientation information from structures in the surgical scen
Applying image processing algorithms independently to each video frame often leads to temporal inconsistency in the resulting video. To address this issue, we present a novel and general approach for blind video temporal consistency. Our method is on