Fully Automated Multi-heartbeat Echocardiography Video Segmentation and Motion Tracking

SPIE Medical Imaging 2022

Programming and experiments by Yida Chen '22

Abstract: Neural network-based video segmentation has proven effective at producing temporally coherent segmentation and motion tracking of heart substructures in echocardiography. However, prior methods confine analysis to half-heartbeat systolic clips spanning end-diastole (ED) to end-systole (ES), which requires those frames to be specified in advance and limits clinical applicability. Here we introduce CLAS-FV, a fully automated framework that extends this prior work to joint semantic segmentation and motion tracking in multi-beat echocardiograms. Our framework employs a modified R2+1D ResNet stem, which efficiently encodes spatiotemporal features, and leverages sliding windows for both training and test-time augmentation to accommodate the full cardiac cycle. First, through 10-fold cross-validation on the half-beat CAMUS dataset, we show that the R2+1D-based stem outperforms the prior 3D U-Net both in Dice overlap for all substructures and in the derived clinical indices of ED and ES ventricular volumes and ejection fraction (EF). Next, we use the large clinical EchoNet-Dynamic dataset to extend our framework to full multi-beat video segmentation. We obtain mean Dice overlap of 0.94/0.91 on the left-ventricular endocardium at the ED/ES phases and accurately infer EF (mean absolute error 5.3%) across 1,269 test patients. The presented multi-heartbeat video segmentation framework promises fast, coherent segmentation and motion tracking for rich phenotypic analysis of echocardiography.
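To make the pipeline concrete, here is a minimal Python/NumPy sketch (not the authors' released code) of the three ingredients the abstract names: sliding-window coverage of a multi-beat video, averaging of overlapping per-frame predictions as a simple test-time aggregation, and the standard Dice and EF definitions behind the reported numbers. The window length, stride, and helper names (sliding_windows, fuse_window_probs) are illustrative assumptions, not taken from the paper.

```python
# Minimal illustrative sketch, NOT the paper's implementation.
# clip_len/stride values and all helper names are assumptions.
import numpy as np

def sliding_windows(n_frames, clip_len=32, stride=16):
    """Overlapping (start, end) clip ranges covering every frame of a video."""
    last = max(n_frames - clip_len, 0)
    starts = sorted(set(range(0, last + 1, stride)) | {last})
    return [(s, min(s + clip_len, n_frames)) for s in starts]

def fuse_window_probs(window_probs, windows, n_frames):
    """Average per-frame foreground probabilities from overlapping windows,
    a simple test-time aggregation over the full cardiac cycle."""
    h, w = window_probs[0].shape[1:]
    prob = np.zeros((n_frames, h, w))
    count = np.zeros((n_frames, 1, 1))
    for p, (s, e) in zip(window_probs, windows):
        prob[s:e] += p
        count[s:e] += 1
    return prob / count  # averaged probabilities; threshold (e.g., 0.5) for masks

def dice_overlap(pred, target):
    """Dice coefficient between two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    denom = pred.sum() + target.sum()
    return 2.0 * np.logical_and(pred, target).sum() / denom if denom else 1.0

def ejection_fraction(edv, esv):
    """EF (%) from end-diastolic and end-systolic left-ventricular volumes."""
    return 100.0 * (edv - esv) / edv
```

For example, with an end-diastolic volume of 120 mL and an end-systolic volume of 50 mL, ejection_fraction(120, 50) returns roughly 58.3, per the standard formula EF = 100 × (EDV − ESV) / EDV.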

Paper (pdf, preprint)

Yida received an Honorable Mention in the Computing Research Association (CRA) Outstanding Undergraduate Researcher Awards, in recognition of his years-long and ongoing research contributions spanning medical image analysis (i.e., this work) and, separately, digital humanities with Drs. Ryan and Faden in mathematics and film/media studies, respectively.

Full-video, temporally coherent semantic segmentation of an example video from the EchoNet-Dynamic dataset.