Local block-wise self attention for normal organ segmentation


Abstract in English

We developed a new and computationally simple local block-wise self attention based normal structures segmentation approach applied to head and neck computed tomography (CT) images. Our method uses the insight that normal organs exhibit regularity in their spatial location and inter-relation within images, which can be leveraged to simplify the computations required to aggregate feature information. We accomplish this by using local self attention blocks that pass information between each other to derive the attention map. We show that adding additional attention layers increases the contextual field and captures focused attention from relevant structures. We developed our approach using U-net and compared it against multiple state-of-the-art self attention methods. All models were trained on 48 internal headneck CT scans and tested on 48 CT scans from the external public domain database of computational anatomy dataset. Our method achieved the highest Dice similarity coefficient segmentation accuracy of 0.85$pm$0.04, 0.86$pm$0.04 for left and right parotid glands, 0.79$pm$0.07 and 0.77$pm$0.05 for left and right submandibular glands, 0.93$pm$0.01 for mandible and 0.88$pm$0.02 for the brain stem with the lowest increase of 66.7% computing time per image and 0.15% increase in model parameters compared with standard U-net. The best state-of-the-art method called point-wise spatial attention, achieved textcolor{black}{comparable accuracy but with 516.7% increase in computing time and 8.14% increase in parameters compared with standard U-net.} Finally, we performed ablation tests and studied the impact of attention block size, overlap of the attention blocks, additional attention layers, and attention block placement on segmentation performance.

Download