
Inception-Based Network and Multi-Spectrogram Ensemble Applied For Predicting Respiratory Anomalies and Lung Diseases

Published by: Lam Pham
Publication date: 2020
Research field: Informatics Engineering
Language: English





This paper presents an inception-based deep neural network for detecting lung diseases from respiratory sound input. In the front-end feature extraction step, respiratory sound recordings collected from patients are first transformed into spectrograms, which represent both spectral and temporal information. In the back-end classification step, these spectrograms are fed into the proposed network to detect whether patients suffer from lung-related diseases. Our experiments, conducted on the ICBHI benchmark meta-dataset of respiratory sounds, achieve competitive ICBHI scores of 0.53/0.45 and 0.87/0.85 for respiratory anomaly and disease detection, respectively.
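For readers unfamiliar with the metric, the ICBHI score is usually defined as the average of sensitivity (the fraction of abnormal cycles given their correct label) and specificity (the fraction of normal cycles labelled normal). The sketch below assumes that definition and per-cycle string labels; the function name `icbhi_score` and the label names are illustrative and are not taken from the paper.

```python
def icbhi_score(y_true, y_pred, normal_label="normal"):
    """Return (sensitivity, specificity, score) for per-cycle labels.

    Assumes the common ICBHI definition: score = (sensitivity + specificity) / 2.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    normal_total = sum(1 for t in y_true if t == normal_label)
    abnormal_total = len(y_true) - normal_total
    if normal_total == 0 or abnormal_total == 0:
        raise ValueError("need at least one normal and one abnormal cycle")

    # A cycle counts as a hit only when the predicted label matches exactly.
    normal_hits = sum(
        1 for t, p in zip(y_true, y_pred) if t == normal_label and p == t
    )
    abnormal_hits = sum(
        1 for t, p in zip(y_true, y_pred) if t != normal_label and p == t
    )

    specificity = normal_hits / normal_total
    sensitivity = abnormal_hits / abnormal_total
    return sensitivity, specificity, (sensitivity + specificity) / 2


truth = ["normal", "normal", "crackle", "wheeze"]
pred = ["normal", "crackle", "crackle", "normal"]
print(icbhi_score(truth, pred))
```

In this toy example one of two normal cycles and one of two abnormal cycles are labelled correctly, so sensitivity, specificity, and the score are all 0.5.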


Read also

This paper presents and explores a robust deep learning framework for auscultation analysis, which aims to classify anomalies in respiratory cycles and detect disease from respiratory sound recordings. The framework begins with front-end feature extraction that transforms input sound into a spectrogram representation. A back-end deep learning network then classifies the spectrogram features into categories of respiratory anomaly cycles or diseases. Experiments conducted on the ICBHI benchmark dataset of respiratory sounds confirm three main contributions to respiratory-sound analysis. First, we carry out an extensive exploration of the effect of spectrogram type, spectral-time resolution, overlapped/non-overlapped windows, and data augmentation on final prediction accuracy. Second, this exploration leads us to propose a novel deep learning system, built on the proposed framework, which outperforms current state-of-the-art methods. Finally, we apply a Teacher-Student scheme to achieve a trade-off between model performance and model complexity, which additionally increases the potential of the proposed framework for building real-time applications.
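The front-end step and the overlapped versus non-overlapped window choice can be illustrated with a minimal short-time Fourier transform. This is only a sketch under assumed settings: the Hann window, the tiny window length, and the function name `stft_magnitude` are illustrative choices, not the configuration used in the paper, and a practical system would use an FFT library instead of the naive DFT shown here.

```python
import cmath
import math


def stft_magnitude(signal, win_len=8, hop=4):
    """Magnitude spectrogram as a list of frames, each with win_len // 2 + 1 bins.

    hop < win_len gives overlapped windows; hop == win_len gives non-overlapped ones.
    """
    # Periodic Hann window (an assumed choice, not specified by the source).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    frames = []
    for start in range(0, len(signal) - win_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(win_len)]
        spectrum = []
        for k in range(win_len // 2 + 1):
            # Naive DFT of the windowed frame at frequency bin k.
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                for n in range(win_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# A sinusoid that completes exactly two cycles per 8-sample window.
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
overlapped = stft_magnitude(tone, win_len=8, hop=4)
non_overlapped = stft_magnitude(tone, win_len=8, hop=8)
print(len(overlapped), len(non_overlapped))
```

Halving the hop roughly doubles the number of time frames for the same recording, which is the time-resolution side of the trade-off the abstract refers to; changing `win_len` trades frequency resolution against time resolution.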
Sercan O. Arik, Heewoo Jun, 2018
We propose the multi-head convolutional neural network (MCNN) architecture for waveform synthesis from spectrograms. Nonlinear interpolation in MCNN is employed with transposed convolution layers in parallel heads. MCNN achieves more than an order of magnitude higher compute intensity than commonly-used iterative algorithms like Griffin-Lim, yielding efficient utilization for modern multi-core processors, and very fast (more than 300x real-time) waveform synthesis. For training of MCNN, we use a large-scale speech recognition dataset and losses defined on waveforms that are related to perceptual audio quality. We demonstrate that MCNN constitutes a very promising approach for high-quality speech synthesis, without any iterative algorithms or autoregression in computations.
We previously established a large lung sound database, HF_Lung_V2 (Lung_V2). We trained convolutional-bidirectional gated recurrent unit (CNN-BiGRU) networks on Lung_V2 to detect inhalation, exhalation, continuous adventitious sound (CAS), and discontinuous adventitious sound at the recording level. However, CAS detection performed poorly for several reasons, one of which is the high diversity of CAS patterns. To help the original CNN-BiGRU model learn CAS patterns more effectively without adding much computing burden, we investigated three strategies that involve minimal modifications to the CNN layers: (1) making the CNN layers slightly deeper by using residual blocks, (2) making the CNN layers slightly wider by increasing the number of CNN kernels, and (3) separating the feature input into multiple paths (a model denoted Multi-path CNN-BiGRU). The performance of CAS segment and event detection was evaluated. CAS detection improved in all of the architecture-modified models. The F1 score for CAS event detection increased from 0.445 to 0.491-0.530, which was deemed significant. The Multi-path CNN-BiGRU model outperformed the other models, achieving the best result in five of the nine evaluation metrics. In addition, the Multi-path CNN-BiGRU model did not add computing burden (0.97-fold inference time) compared to the original CNN-BiGRU model. In conclusion, multi-path CNN layers can efficiently improve feature extraction and, in turn, CAS detection.
Attempts to develop speech enhancement algorithms with improved speech intelligibility for cochlear implant (CI) users have met with limited success. To improve speech enhancement methods for CI users, we propose to perform speech enhancement in a cochlear filter-bank feature space, a feature set specifically designed for CI users based on CI auditory stimuli. We leverage a convolutional neural network (CNN) to extract both stationary and non-stationary components of environmental acoustics and speech. We propose three CNN architectures: (1) a vanilla CNN that directly generates the enhanced signal; (2) a spectral-subtraction-style CNN (SS-CNN) that first predicts noise and then generates the enhanced signal by subtracting the noise from the noisy signal; (3) a Wiener-style CNN (Wiener-CNN) that generates an optimal mask for suppressing noise. An important problem with the proposed networks is that they introduce considerable delays, which limits their real-time application for CI users. To address this, this study also considers causal variations of these networks. Our experiments show that the proposed networks (in both causal and non-causal forms) achieve significant improvement over existing baseline systems. We also found that the causal Wiener-CNN outperforms the other networks and leads to the best overall envelope coefficient measure (ECM). The proposed algorithms represent a viable option for implementation on the CCi-MOBILE research platform as a pre-processor for CI users in naturalistic environments.
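The difference between the spectral-subtraction style and the Wiener style can be seen in the classical per-band operations that these names refer to. The sketch below shows only those textbook operations on power values, assuming the speech and noise powers are already known; in the abstract above the noise estimate and the mask are produced by CNNs, which this example does not model. Function and parameter names are illustrative.

```python
def spectral_subtraction(noisy_power, noise_power, floor=0.0):
    """Subtract estimated noise power from noisy power, band by band.

    Negative results are clipped to `floor` (an assumed, common safeguard).
    """
    return [max(y - n, floor) for y, n in zip(noisy_power, noise_power)]


def wiener_mask(speech_power, noise_power, eps=1e-12):
    """Classical Wiener gain per band: speech / (speech + noise), in [0, 1)."""
    return [s / (s + n + eps) for s, n in zip(speech_power, noise_power)]


noisy = [4.0, 1.0]   # hypothetical noisy-signal power in two bands
noise = [1.0, 2.0]   # hypothetical noise power in the same bands

speech_estimate = spectral_subtraction(noisy, noise)
mask = wiener_mask(speech_estimate, noise)
# Applying the mask scales each noisy band instead of subtracting from it.
enhanced = [g * y for g, y in zip(mask, noisy)]
print(speech_estimate, mask, enhanced)
```

Subtraction can overshoot when the noise estimate exceeds the noisy power, which is why the result is clipped; a multiplicative mask stays between 0 and 1 by construction.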
Sangwook Park, David K. Han, 2019
Simulators that generate observations based on theoretical models can be important tools for development, prediction, and assessment of signal processing algorithms. To design these simulators, painstaking effort is required to construct mathematical models according to their application. Complex models are sometimes necessary to represent a variety of real phenomena. In contrast, obtaining synthetic observations from generative models developed from real observations often requires much less effort. This paper proposes a generative model based on adversarial learning. Given that observations are typically signals composed of a linear combination of sinusoidal waves and random noise, sinusoidal wave generating networks are first designed based on an adversarial network. Audio waveform generation can then be performed using the proposed network. Several approaches to designing the objective function of the proposed network using adversarial learning are investigated experimentally. In addition, amphibian sound classification is performed using a convolutional neural network trained with real and synthetic sounds. Both qualitative and quantitative results show that the proposed generative model produces realistic signals and is very helpful for data augmentation and data analysis.
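The signal model mentioned above, a linear combination of sinusoidal waves plus random noise, can be written down directly. The sketch below is a plain hand-parameterised synthesiser, not the adversarial network from the abstract: it only illustrates the kind of signal such a network is meant to generate. The function name, the Gaussian noise assumption, and the parameter layout are illustrative.

```python
import math
import random


def synth_signal(components, n_samples, sample_rate, noise_std=0.0, seed=0):
    """Sum of sinusoids plus Gaussian noise.

    components: iterable of (amplitude, frequency_hz, phase_rad) tuples.
    """
    rng = random.Random(seed)  # seeded so repeated calls are reproducible
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        value = sum(
            amp * math.sin(2 * math.pi * freq * t + phase)
            for amp, freq, phase in components
        )
        samples.append(value + rng.gauss(0.0, noise_std))
    return samples


# Two hypothetical partials at 440 Hz and 880 Hz with mild noise.
audio = synth_signal(
    [(1.0, 440.0, 0.0), (0.5, 880.0, 0.0)],
    n_samples=16000,
    sample_rate=16000.0,
    noise_std=0.01,
)
print(len(audio))
```

Setting `noise_std=0.0` yields the clean sinusoidal mixture, which makes it easy to inspect how much of a generated signal is structure and how much is noise.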
