Frequency Domain Convolutional Neural Network: Accelerated CNN for Large Diabetic Retinopathy Image Classification

135 0 0.0 ( 0 )

تحميل البحث استخدام كمرجع

نشر من قبل Zhiyuan Chen Dr

تاريخ النشر 2021

مجال البحث الهندسة المعلوماتية

والبحث باللغة English

تأليف Ee Fey Goh - ZhiYuan Chen - Wei Xiang Lim

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي

قم بزيارة صفحتنا على فيسبوك

اسأل ChatGPT حول البحث

الملخص بالعربية الملخص بالإنكليزية

The conventional spatial convolution layers in the Convolutional Neural Networks (CNNs) are computationally expensive at the point where the training time could take days unless the number of layers, the number of training images or the size of the training images are reduced. The image size of 256x256 pixels is commonly used for most of the applications of CNN, but this image size is too small for applications like Diabetic Retinopathy (DR) classification where the image details are important for accurate classification. This research proposed Frequency Domain Convolution (FDC) and Frequency Domain Pooling (FDP) layers which were built with RFFT, kernel initialization strategy, convolution artifact removal and Channel Independent Convolution (CIC) to replace the conventional convolution and pooling layers. The FDC and FDP layers are used to build a Frequency Domain Convolutional Neural Network (FDCNN) to accelerate the training of large images for DR classification. The Full FDC layer is an extension of the FDC layer to allow direct use in conventional CNNs, it is also used to modify the VGG16 architecture. FDCNN is shown to be at least 54.21% faster and 70.74% more memory efficient compared to an equivalent CNN architecture. The modified VGG16 architecture with Full FDC layer is reported to achieve a shorter training time and a higher accuracy at 95.63% compared to the original VGG16 architecture for DR classification.

قيم البحث

92 - Zhiguang Wang , Jianbo Yang 2017

We proposed a deep learning method for interpretable diabetic retinopathy (DR) detection. The visual-interpretable feature of the proposed method is achieved by adding the regression activation map (RAM) after the global averaging pooling layer of th e convolutional networks (CNN). With RAM, the proposed model can localize the discriminative regions of an retina image to show the specific region of interest in terms of its severity level. We believe this advantage of the proposed deep learning model is highly desired for DR detection because in practice, users are not only interested with high prediction performance, but also keen to understand the insights of DR detection and why the adopted learning model works. In the experiments conducted on a large scale of retina image dataset, we show that the proposed CNN model can achieve high performance on DR detection compared with the state-of-the-art while achieving the merits of providing the RAM to highlight the salient regions of the input image.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي الحوسبة العصبية والتطورية

SpecNet: Spectral Domain Convolutional Neural Network

127 - Bochen Guan , Jinnian Zhang , William A. Sethares 2019

The memory consumption of most Convolutional Neural Network (CNN) architectures grows rapidly with increasing depth of the network, which is a major constraint for efficient network training on modern GPUs with limited memory, embedded systems, and m obile devices. Several studies show that the feature maps (as generated after the convolutional layers) are the main bottleneck in this memory problem. Often, these feature maps mimic natural photographs in the sense that their energy is concentrated in the spectral domain. Although embedding CNN architectures in the spectral domain is widely exploited to accelerate the training process, we demonstrate that it is also possible to use the spectral domain to reduce the memory footprint, a method we call Spectral Domain Convolutional Neural Network (SpecNet) that performs both the convolution and the activation operations in the spectral domain. The performance of SpecNet is evaluated on three competitive object recognition benchmark tasks (CIFAR-10, SVHN, and ImageNet), and compared with several state-of-the-art implementations. Overall, SpecNet is able to reduce memory consumption by about 60% without significant loss of performance for all tested networks.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

Neural Networks with Manifold Learning for Diabetic Retinopathy Detection

93 - Arjun Raj Rajanna , Kamelia Aryafar , Rajeev Ramchandran 2016

Widespread outreach programs using remote retinal imaging have proven to decrease the risk from diabetic retinopathy, the leading cause of blindness in the US. However, this process still requires manual verification of image quality and grading of i mages for level of disease by a trained human grader and will continue to be limited by the lack of such scarce resources. Computer-aided diagnosis of retinal images have recently gained increasing attention in the machine learning community. In this paper, we introduce a set of neural networks for diabetic retinopathy classification of fundus retinal images. We evaluate the efficiency of the proposed classifiers in combination with preprocessing and augmentation steps on a sample dataset. Our experimental results show that neural networks in combination with preprocessing on the images can boost the classification accuracy on this dataset. Moreover the proposed models are scalable and can be used in large scale datasets for diabetic retinopathy detection. The models introduced in this paper can be used to facilitate the diagnosis and speed up the detection process.

الرؤية الحاسوبية وتمييز الأنماط

Transform-Invariant Convolutional Neural Networks for Image Classification and Search

148 - Xu Shen , Xinmei Tian , Anfeng He 2019

Convolutional neural networks (CNNs) have achieved state-of-the-art results on many visual recognition tasks. However, current CNN models still exhibit a poor ability to be invariant to spatial transformations of images. Intuitively, with sufficient layers and parameters, hierarchical combinations of convolution (matrix multiplication and non-linear activation) and pooling operations should be able to learn a robust mapping from transformed input images to transform-invariant representations. In this paper, we propose randomly transforming (rotation, scale, and translation) feature maps of CNNs during the training stage. This prevents complex dependencies of specific rotation, scale, and translation levels of training images in CNN models. Rather, each convolutional kernel learns to detect a feature that is generally helpful for producing the transform-invariant answer given the combinatorially large variety of transform levels of its input feature maps. In this way, we do not require any extra training supervision or modification to the optimization process and training images. We show that random transformation provides significant improvements of CNNs on many benchmark tasks, including small-scale image recognition, large-scale image recognition, and image retrieval. The code is available at https://github.com/jasonustc/caffe-multigpu/tree/TICNN.

الرؤية الحاسوبية وتمييز الأنماط التعلم الآلي معالجة الصور والفيديو

Lesion detection and Grading of Diabetic Retinopathy via Two-stages Deep Convolutional Neural Networks

148 - Yehui Yang , Tao Li , Wensi Li 2017

We propose an automatic diabetic retinopathy (DR) analysis algorithm based on two-stages deep convolutional neural networks (DCNN). Compared to existing DCNN-based DR detection methods, the proposed algorithm have the following advantages: (1) Our me thod can point out the location and type of lesions in the fundus images, as well as giving the severity grades of DR. Moreover, since retina lesions and DR severity appear with different scales in fundus images, the integration of both local and global networks learn more complete and specific features for DR analysis. (2) By introducing imbalanced weighting map, more attentions will be given to lesion patches for DR grading, which significantly improve the performance of the proposed algorithm. In this study, we label 12,206 lesion patches and re-annotate the DR grades of 23,595 fundus images from Kaggle competition dataset. Under the guidance of clinical ophthalmologists, the experimental results show that our local lesion detection net achieve comparable performance with trained human observers, and the proposed imbalanced weighted scheme also be proved to significantly improve the capability of our DCNN-based DR grading algorithm.

الرؤية الحاسوبية وتمييز الأنماط