AsyCMST: Asymmetric cross-modal spatio-temporal learning for multimodal ultrasound nodule recognition

Jul 1, 2026

HAN Hongcheng

Zhiqian Tian

Minghao Wang

Yutong Zhang

Dong Zhang

Qinbo Guo

Hui Guo

Jue Jiang

Shaoyi Du*

Corresponding author

Juan Wang*

Corresponding author



Abstract

Multimodal ultrasound combining B-mode ultrasound (BUS) and contrast-enhanced ultrasound (CEUS) has become a powerful tool for diagnosing superficial nodules in the thyroid and breast, leveraging the complementary strengths of BUS spatial structure and CEUS temporal hemodynamics. However, existing fusion methods typically treat both modalities symmetrically or focus solely on modality-specific features, overlooking the inherent asymmetric bidirectional guidance between BUS spatial context and CEUS perfusion dynamics. To address this limitation, we propose AsyCMST, an asymmetric cross-modal spatio-temporal network for multimodal ultrasound nodule diagnosis. First, we design a multi-task learning module to enhance modality-specific representations, where frame self-sorting distills canonical contrast perfusion patterns in CEUS, while nodule segmentation reinforces precise lesion localization in BUS. Second, we propose an asymmetric cross-modal spatio-temporal attention mechanism to enable clinically meaningful directional interaction: BUS spatial cues guide CEUS temporal modeling toward lesion-relevant regions, and CEUS hemodynamic evolution refines ambiguous structural patterns in BUS. This design effectively captures the asymmetric interdependency between structure and function. Experiments on thyroid and breast datasets demonstrate that AsyCMST significantly outperforms state-of-the-art video understanding and multimodal ultrasound fusion methods in accuracy, F1-score, AUC, and cross-dataset generalization. These results validate the effectiveness of knowledge-driven asymmetric fusion and highlight its potential to advance clinical adoption of multimodal ultrasound analysis.
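The bidirectional interaction described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the function names (`attend`, `asymmetric_fusion`), the absence of learned projections, the residual update, and the choice to run the two directions sequentially are all assumptions made for illustration. It shows only the general idea of CEUS frame tokens querying BUS patch tokens, followed by BUS patch tokens querying the BUS-guided CEUS tokens.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, context):
    """Scaled dot-product cross-attention with a residual connection.

    Each query token is updated with a weighted average of the context
    tokens. Learned query/key/value projections are omitted (assumption:
    a real model would include them).
    """
    dim = len(context[0])
    updated = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        mixed = [sum(w * c[j] for w, c in zip(weights, context)) for j in range(dim)]
        updated.append([qj + mj for qj, mj in zip(q, mixed)])
    return updated


def asymmetric_fusion(bus_patches, ceus_frames):
    """Hypothetical two-direction fusion; the ordering is an assumption."""
    # Direction 1: CEUS temporal tokens query BUS spatial tokens,
    # so spatial cues steer the temporal representation.
    ceus_guided = attend(ceus_frames, bus_patches)
    # Direction 2: BUS spatial tokens query the guided CEUS tokens,
    # so perfusion dynamics refine the structural representation.
    bus_refined = attend(bus_patches, ceus_guided)
    return bus_refined, ceus_guided


# Toy inputs: 4 BUS patch tokens and 6 CEUS frame tokens, 8 features each.
rng = random.Random(0)
bus_patches = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
ceus_frames = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(6)]
bus_refined, ceus_guided = asymmetric_fusion(bus_patches, ceus_frames)
print(len(bus_refined), len(ceus_guided))  # prints "4 6"
```

Each direction preserves the token count of the querying modality, so the outputs can feed modality-specific heads such as segmentation on the BUS side or frame sorting on the CEUS side.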

Type

Journal article

Publication

Medical Image Analysis, 112

License

CC-BY-4.0

Last updated on Jul 1, 2026


Authors

HAN Hongcheng (he/him)

PhD Candidate in Control Science and Engineering

Han Hongcheng (韩泓丞) received his B.Eng. degree from the School of Energy and Power Engineering, Xi’an Jiaotong University, Xi’an, China, in 2020. Since then, he has been pursuing a Ph.D. degree at the Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University. His research interests focus on intelligent transportation and medical image analysis, with a specialization in multimodal data fusion and image synthesis.

