Abstract:
Sound event detection (SED) in medical environments is crucial for extracting valuable information from diverse sound events such as coughing, sneezing, sniffling, speech, gasping, and snoring. These events carry vital information for diagnosis, monitoring, and prevention. By utilizing sound events, healthcare professionals can make informed decisions and provide optimal care. Due to the success of Transformer encoder architectures for sound event detection, they seem to be a prudent choice for detecting audio events in hospital settings. However, applying Transformers to medical audio event detection faces two significant challenges. Firstly, there is a severe scarcity of medical audio data, making it difficult to train Transformer models effectively. Secondly, SED models must be computationally efficient to be deployable in resource-limited medical environments. Unfortunately, Transformers have high computational complexity due to the attention mechanism they employ. To tackle these obstacles, this thesis introduces the Audio Spectrogram Fourier Network (ASFNet), a novel attention-free Transformer encoder specifically designed for sound event detection in medical environments. ASFNet replaces the attention operation with a simplified Fast Fourier Transform. By employing this technique, ASFNet surpasses other methods, achieving an average mean average precision (mAP) of 0.474 with a 16.76% relative improvement. ASFNet achieves this performance with fewer model parameters and a smaller model size, making it a highly efficient and effective solution for detecting medical audio events.
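To make the attention-free design concrete, the following is a minimal sketch of an FNet-style encoder block in which the self-attention sublayer is replaced by a Fourier transform over the token and feature axes. The layer names, dimensions, and dropout value are illustrative assumptions and do not reproduce the exact ASFNet architecture described in the thesis.

```python
# Sketch of an attention-free Transformer encoder block (FNet-style).
# Token mixing is done with a 2D FFT instead of learned attention weights.
import torch
import torch.nn as nn


class FourierMixingBlock(nn.Module):
    """Transformer encoder block with the attention sublayer replaced by a 2D FFT."""

    def __init__(self, dim: int, hidden_dim: int, dropout: float = 0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # FFT over the sequence and feature axes; keep only the real part.
        mixed = torch.fft.fft2(x).real
        x = self.norm1(x + mixed)
        return self.norm2(x + self.ffn(x))


# Example: a batch of 4 spectrogram-patch sequences, 100 tokens of width 256 (assumed sizes).
block = FourierMixingBlock(dim=256, hidden_dim=1024)
out = block(torch.randn(4, 100, 256))  # -> torch.Size([4, 100, 256])
```

Because the FFT has no learned parameters, such a block keeps only the feed-forward and normalization weights, which is consistent with the smaller model size reported above.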
Furthermore, speech privacy is a critical consideration in medical audio event detection. It is important to separate speech data from audio recordings to protect the privacy of patients when collecting the dataset. While audio source separation techniques can separate the speech signals of different speakers, we need to differentiate speech from other medical audio events of the same speaker. Therefore, a custom dataset was prepared and a Wave-U-Net model was trained to separate speech data from medical audio events during data acquisition. Wave-U-Net achieves an overall source-to-distortion ratio (SDR) of 11.829 dB, indicating near-perfect source separation.
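For reference, the SDR metric quoted above can be computed, in its simplest form, as the energy ratio between the reference source and the residual distortion. The snippet below is a simplified sketch of that definition and omits the projection steps of the full BSS-Eval formulation, which the thesis may use; the signal length and noise level are arbitrary.

```python
import numpy as np


def simple_sdr(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Source-to-distortion ratio in dB between a reference signal and its estimate.

    Simplified definition: 10 * log10(||reference||^2 / ||estimate - reference||^2).
    """
    distortion = estimate - reference
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(distortion ** 2) + 1e-12))


# Example with a synthetic reference and a slightly noisy estimate.
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)              # one second of audio at 16 kHz (assumed)
est = ref + 0.1 * rng.standard_normal(16000)  # estimate with 10% residual noise
print(f"SDR: {simple_sdr(ref, est):.2f} dB")  # roughly 20 dB
```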
Therefore, the combination of ASFNet and Wave-U-Net has the potential to play a significant role in developing speech-privacy-conscious and resource-efficient medical sound event detection or monitoring systems.