Abstract:
Lung sound auscultation is essential for monitoring respiratory health, especially in regions facing a shortage of skilled healthcare workers. Automated analysis of respiratory sounds has the potential to provide significant clinical support in such settings. However, lung sounds (LS) are frequently contaminated by noise sources such as heart sounds, background conversations, and movement artifacts, which make accurate interpretation challenging. Conventional denoising techniques often fail in real-world clinical recordings because the noise and the respiratory signal overlap spectrally. In addition, while respiratory sound classification has been widely studied in adults, its application in pediatric populations, particularly in children aged 6 years or younger, remains a complex and underexplored area. Developmental changes in pediatric lungs considerably alter the acoustic properties of respiratory sounds, necessitating classification approaches tailored to this age group. To address these challenges and advance automated lung sound analysis, this thesis is divided into two major components. In the first part, a specialized deep denoiser model (Uformer) is proposed for lung sound denoising. The proposed Uformer model consists of three modules: a Convolutional Neural Network (CNN) encoder module that extracts latent features, a Transformer encoder module that further enhances the encoding of distinctive LS features and captures intricate long-range dependencies, and a CNN decoder module that generates the denoised signals. The performance of the proposed Uformer model has been evaluated on lung sounds corrupted with different types of synthetic and real-world noise. The proposed model achieved an average output SNR of 16.51 dB when evaluated on LS signals at -12 dB SNR.
When evaluated with ambient noise, our end-to-end model achieves an average output SNR of 19.31 dB, nearly doubling the performance of the existing model while using fewer parameters. Based on the qualitative and quantitative findings of this study, the proposed denoising model is robust and generalizes well enough to assist with monitoring respiratory conditions. The second part of the thesis focuses on the classification of pediatric respiratory sounds. A multistage hybrid CNN-Transformer architecture is proposed for detecting respiratory diseases from both entire recordings and individual breath cycles using scalogram images of the signals. To fill the gap in pediatric classification, the SPRSound dataset, comprising recordings from children with an average age of 5.5 years, has been used for two-level classification tasks. The proposed classification model utilizes CNN-extracted respiratory sound
features from scalogram images, integrating an attention framework to improve predictive performance. The proposed framework achieved a score of 0.9039 in binary event classification and 0.8448 in multiclass event classification. At the record level, ternary classification yielded a score of 0.720, while multiclass record classification attained
0.571. In these two record-level classification tasks, the proposed method demonstrates performance gains of 3.81% and 5.94%, respectively, over the best existing model. The proposed approaches can significantly aid in the diagnosis of respiratory diseases and the prediction of their severity in both developing and underdeveloped nations.
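
The SNR figures quoted above can be read against the usual signal-to-noise ratio definition, 10·log10(P_signal / P_noise). The following is a minimal sketch of that definition only, not the evaluation code of the thesis: it assumes that "-12 dB" refers to the input SNR of the noisy mixture, it uses a synthetic tone and white noise as stand-ins for lung sounds and clinical noise, and the helper names (`power`, `snr_db`, `mix_at_snr`) are hypothetical.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(clean, estimate):
    """SNR in dB of `estimate`, measured against the reference `clean`."""
    residual = [e - c for c, e in zip(clean, estimate)]
    return 10.0 * math.log10(power(clean) / power(residual))


def mix_at_snr(clean, noise, target_db):
    """Scale `noise` so that clean + noise has the requested SNR in dB."""
    gain = math.sqrt(power(clean) / (power(noise) * 10.0 ** (target_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, noise)]


# Synthetic stand-ins (assumption, not real data): a 200 Hz tone and white noise.
rng = random.Random(0)
fs = 4000  # assumed sampling rate in Hz, chosen only for this illustration
clean = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]

# Build a mixture at -12 dB input SNR and confirm the measured value.
noisy = mix_at_snr(clean, noise, -12.0)
input_snr = snr_db(clean, noisy)
```

Passing a denoiser's output to `snr_db` in place of `noisy` would give the corresponding output SNR; the difference between the two values is the improvement attributable to denoising.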