Keypoint-Guided Medical Video Segmentation Model With Spatiotemporal Feature Fusion

Mar 16, 2026

Minghao Wang

Shaoyi Du*

Corresponding author

Huanhuan Huo

Juejiang

Dong Zhang

HAN Hongcheng

Shengdi Hou

Juan Wang*

Corresponding author



Abstract

Atrial fibrillation, characterized by high prevalence and poor prognosis, presents a significant global health burden. Accurate segmentation and measurement of left ventricular and left atrial appendage morphology and function are essential for reliable risk assessment. However, these tasks are hindered by ambiguous boundaries, complex cardiac motion, and sparse annotations. To address these challenges, we propose a Keypoint-Guided Medical Video Segmentation Model with Spatiotemporal Feature Fusion (KG-STS). First, we propose a shape-constrained point encoder that explicitly encodes boundary points to improve the representation of ambiguous boundaries. Next, we introduce a motion-aware alignment module that models cardiac motion by forming coherent motion information across frames. Building on these two modules, we develop a keypoint-guided spatiotemporal feature fusion module that integrates spatial boundary representations with temporal motion cues to enhance decoding features under sparse annotations, enabling temporally consistent segmentation and supporting morphological measurement. We evaluate the segmentation and measurement performance of our method on a self-constructed multi-view transesophageal echocardiography dataset and two publicly available transthoracic echocardiography datasets. The results demonstrate that KG-STS achieves superior temporal consistency in segmentation and higher accuracy in morphological measurements compared to competing methods.

Type

Journal article

Publication

IEEE Transactions on Medical Imaging, 45(6)

License

CC-BY-4.0

Last updated on Mar 16, 2026

IEEE TMI

Authors

HAN Hongcheng (he/him)

PhD Candidate in Control Science and Engineering

Han Hongcheng (韩泓丞) received the B.Eng. degree from the School of Energy and Power Engineering, Xi’an Jiaotong University, Xi’an, China, in 2020. Since then, he has been pursuing a Ph.D. degree at the Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University. His research interests include intelligent transportation and medical image analysis, with a focus on multimodal data fusion and image synthesis.

