Assessing YOLACT++ for real time and robust instance segmentation of medical instruments in endoscopic procedures

70 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Gilberto Ochoa-Ruiz

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Juan Carlos Angeles Ceron - Leonardo Chang - Gilberto Ochoa-Ruiz andn Sharib Ali

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي

قم بزيارة صفحتنا على فيسبوك

‎Shamra Academia - شمرا أكاديميا‎

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

Image-based tracking of laparoscopic instruments plays a fundamental role in computer and robotic-assisted surgeries by aiding surgeons and increasing patient safety. Computer vision contests, such as the Robust Medical Instrument Segmentation (ROBUST-MIS) Challenge, seek to encourage the development of robust models for such purposes, providing large, diverse, and annotated datasets. To date, most of the existing models for instance segmentation of medical instruments were based on two-stage detectors, which provide robust results but are nowhere near to the real-time (5 frames-per-second (fps)at most). However, in order for the method to be clinically applicable, real-time capability is utmost required along with high accuracy. In this paper, we propose the addition of attention mechanisms to the YOLACT architecture that allows real-time instance segmentation of instrument with improved accuracy on the ROBUST-MIS dataset. Our proposed approach achieves competitive performance compared to the winner ofthe 2019 ROBUST-MIS challenge in terms of robustness scores,obtaining 0.313 MI_DSC and 0.338 MI_NSD, while achieving real-time performance (37 fps)

قيم البحث

113 - Laurynas Miksys , Saumya Jetley , Michael Sapienza 2019

Instance segmentation is an important problem in computer vision, with applications in autonomous driving, drone navigation and robotic manipulation. However, most existing methods are not real-time, complicating their deployment in time-sensitive co ntexts. In this work, we extend an existing approach to real-time instance segmentation, called `Straight to Shapes (STS), which makes use of low-dimensional shape embedding spaces to directly regress to object shape masks. The STS model can run at 35 FPS on a high-end desktop, but its accuracy is significantly worse than that of offline state-of-the-art methods. We leverage recent advances in the design and training of deep instance segmentation models to improve the performance accuracy of the STS model whilst keeping its real-time capabilities intact. In particular, we find that parameter sharing, more aggressive data augmentation and the use of structured loss for shape mask prediction all provide a useful boost to the network performance. Our proposed approach, `Straight to Shapes++, achieves a remarkable 19.7 point improvement in mAP (at IOU of 0.5) over the original method as evaluated on the PASCAL VOC dataset, thus redefining the accuracy frontier at real-time speeds. Since the accuracy of instance segmentation is closely tied to that of object bounding box prediction, we also study the error profile of the latter and examine the failure modes of our method for future improvements.

الرؤية الحاسوبية وتمييز الأنماط الذكاء الاصطناعي التعلم الآلي

Multi-frame Feature Aggregation for Real-time Instrument Segmentation in Endoscopic Video

83 - Shan Lin , Fangbo Qin , Haonan Peng 2020

Deep learning-based methods have achieved promising results on surgical instrument segmentation. However, the high computation cost may limit the application of deep models to time-sensitive tasks such as online surgical video analysis for robotic-as sisted surgery. Moreover, current methods may still suffer from challenging conditions in surgical images such as various lighting conditions and the presence of blood. We propose a novel Multi-frame Feature Aggregation (MFFA) module to aggregate video frame features temporally and spatially in a recurrent mode. By distributing the computation load of deep feature extraction over sequential frames, we can use a lightweight encoder to reduce the computation costs at each time step. Moreover, public surgical videos usually are not labeled frame by frame, so we develop a method that can randomly synthesize a surgical frame sequence from a single labeled frame to assist network training. We demonstrate that our approach achieves superior performance to corresponding deeper segmentation models on two public surgery datasets.

الرؤية الحاسوبية وتمييز الأنماط

Explicit Shape Encoding for Real-Time Instance Segmentation

147 - Wenqiang Xu , Haiyang Wang , Fubo Qi 2019

In this paper, we propose a novel top-down instance segmentation framework based on explicit shape encoding, named textbf{ESE-Seg}. It largely reduces the computational consumption of the instance segmentation by explicitly decoding the multiple obje ct shapes with tensor operations, thus performs the instance segmentation at almost the same speed as the object detection. ESE-Seg is based on a novel shape signature Inner-center Radius (IR), Chebyshev polynomial fitting and the strong modern object detectors. ESE-Seg with YOLOv3 outperforms the Mask R-CNN on Pascal VOC 2012 at mAP$^[email protected] while 7 times faster.

الرؤية الحاسوبية وتمييز الأنماط

Trilateral Attention Network for Real-time Medical Image Segmentation

282 - Ghada Zamzmi , Vandana Sachdev , 2021

Accurate segmentation of medical images into anatomically meaningful regions is critical for the extraction of quantitative indices or biomarkers. The common pipeline for segmentation comprises regions of interest detection stage and segmentation sta ge, which are independent of each other and typically performed using separate deep learning networks. The performance of the segmentation stage highly relies on the extracted set of spatial features and the receptive fields. In this work, we propose an end-to-end network, called Trilateral Attention Network (TaNet), for real-time detection and segmentation in medical images. TaNet has a module for region localization, and three segmentation pathways: 1) handcrafted pathway with hand-designed convolutional kernels, 2) detail pathway with regular convolutional kernels, and 3) a global pathway to enlarge the receptive field. The first two pathways encode rich handcrafted and low-level features extracted by hand-designed and regular kernels while the global pathway encodes high-level context information. By jointly training the network for localization and segmentation using different sets of features, TaNet achieved superior performance, in terms of accuracy and speed, when evaluated on an echocardiography dataset for cardiac segmentation. The code and models will be made publicly available in TaNet Github page.

الرؤية الحاسوبية وتمييز الأنماط

Real-time Instance Segmentation with Discriminative Orientation Maps

153 - Wentao Du , Zhiyu Xiang , Shuya Chen 2021

Although instance segmentation has made considerable advancement over recent years, its still a challenge to design high accuracy algorithms with real-time performance. In this paper, we propose a real-time instance segmentation framework termed Orie nMask. Upon the one-stage object detector YOLOv3, a mask head is added to predict some discriminative orientation maps, which are explicitly defined as spatial offset vectors for both foreground and background pixels. Thanks to the discrimination ability of orientation maps, masks can be recovered without the need for extra foreground segmentation. All instances that match with the same anchor size share a common orientation map. This special sharing strategy reduces the amortized memory utilization for mask predictions but without loss of mask granularity. Given the surviving box predictions after NMS, instance masks can be concurrently constructed from the corresponding orientation maps with low complexity. Owing to the concise design for mask representation and its effective integration with the anchor-based object detector, our method is qualified under real-time conditions while maintaining competitive accuracy. Experiments on COCO benchmark show that OrienMask achieves 34.8 mask AP at the speed of 42.7 fps evaluated with a single RTX 2080 Ti. The code is available at https://github.com/duwt/OrienMask.

الرؤية الحاسوبية وتمييز الأنماط