| dc.description.abstract |
Recent advances in deep learning have significantly enhanced medical image segmentation. As medical data becomes increasingly diverse and complex, the need for architectures that generalize across modalities and anatomical structures has become paramount. While CNNs, Transformers, and their hybrid architectures have addressed issues such as limited receptive fields and redundant feature representations, most models remain confined to the spatial domain, overlooking the rich structural cues of the frequency domain. Some recent studies have explored spectral information at the feature level; however, frequency-domain integration at the supervision level remains largely untapped. To address this gap, we propose Phi-SegNet, a CNN-based architecture that incorporates phase-aware cues at both the architectural and optimization levels. The network integrates Bi-Feature Mask Former (BFMF) modules, which blend neighboring encoder features to reduce semantic gaps, and Reverse Fourier Attention (RFA) blocks, which refine decoder outputs using phase-regularized embeddings. A dedicated phase-aware loss aligns these embeddings with structural priors, forming a closed feedback loop that emphasizes boundary precision. Evaluated on five public datasets spanning ultrasound, X-ray, histopathology, MRI, and colonoscopy, Phi-SegNet consistently achieves state-of-the-art performance, particularly excelling at fine-grained boundary segmentation tasks. Averaged across these five datasets, Phi-SegNet achieves a relative improvement of 1.54% ± 1.26% in IoU and 1.10% ± 0.69% in F1-score over the next best-performing model on each dataset. Additionally, under generalized training on a unified dataset comprising all five modalities, as well as in cross-dataset generalization scenarios involving unseen datasets from known domains, Phi-SegNet exhibits robust and superior performance, highlighting its adaptability and modality-agnostic design. These findings demonstrate the potential of leveraging spectral priors in both learning and supervision, offering a new direction toward generalized, universal, and anatomically precise segmentation frameworks. |
en_US |