Introduction

This paper proposes a recursive refinement network (RRN) for end-to-end unsupervised deformable image registration. The method is evaluated on the DirLab COPDGene dataset, which comprises 10 inhale/exhale lung CT image pairs of COPD patients with almost 300 landmarks on each image. The network achieves a state-of-the-art average target registration error (TRE) of 0.83 mm.
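
For readers unfamiliar with the metric, TRE is commonly computed as the mean Euclidean distance, in millimetres, between corresponding landmarks after registration. The sketch below assumes landmarks given in voxel coordinates together with the voxel spacing; the function name `mean_tre_mm` and its arguments are illustrative and are not taken from the authors' code.

```python
import math


def mean_tre_mm(predicted_vox, reference_vox, spacing_mm):
    """Mean Euclidean distance (mm) between paired landmarks in voxel coordinates."""
    distances = []
    for p, q in zip(predicted_vox, reference_vox):
        # Scale each voxel offset by the voxel spacing before taking the norm.
        diff_mm = [(a - b) * s for a, b, s in zip(p, q, spacing_mm)]
        distances.append(math.sqrt(sum(d * d for d in diff_mm)))
    return sum(distances) / len(distances)


# Two made-up landmark pairs on a grid with 1.0 x 1.0 x 2.5 mm voxels.
predicted = [(10, 10, 10), (20, 20, 20)]
reference = [(10, 10, 11), (23, 24, 20)]
print(mean_tre_mm(predicted, reference, (1.0, 1.0, 2.5)))
```

Scaling by the spacing matters for lung CT, where the slice thickness usually differs from the in-plane resolution.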

Methods

Multi-level features are extracted from the fixed and moving images through 3D convolutional layers.
The features are normalized to avoid feature vanishing at higher levels, since the intermediate deformation vector fields (DVFs) are not supervised.
The moving features are warped with the 2x upsampled DVF predicted at the previous level (no warping at the topmost level).
Local cost (correlation) volumes, i.e. the inner products of fixed and moving features within a small search radius, are computed in a memory-efficient way.
The DVF is first estimated at the topmost level from the fixed features and cost volumes; it is then refined, level by level, using the fixed features, cost volumes, context, and the previous DVF.
The final DVF is estimated by a 7-layer dilated convolutional network with a large receptive field.
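
The cost-volume step can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `local_cost_volume`, the `(C, D, H, W)` feature layout, the zero padding at the borders, and the default radius are all assumptions. It loops over candidate displacements one at a time, which is one simple way to avoid materializing every neighborhood at once; the memory-saving scheme in the paper may differ.

```python
import numpy as np


def local_cost_volume(fixed_feat, moving_feat, radius=1):
    """Correlate fixed and (warped) moving features within a local search window.

    Both inputs have shape (C, D, H, W). The output has shape
    ((2 * radius + 1) ** 3, D, H, W): one channel per candidate displacement.
    """
    _, d, h, w = fixed_feat.shape
    r = radius
    # Zero-pad the spatial axes so displaced lookups stay in bounds.
    padded = np.pad(moving_feat, ((0, 0), (r, r), (r, r), (r, r)))
    size = 2 * r + 1
    cost = np.empty((size ** 3, d, h, w), dtype=fixed_feat.dtype)
    k = 0
    for oz in range(size):
        for oy in range(size):
            for ox in range(size):
                # Moving features displaced by (oz - r, oy - r, ox - r) voxels.
                shifted = padded[:, oz:oz + d, oy:oy + h, ox:ox + w]
                # Inner product over the channel axis for this displacement.
                cost[k] = (fixed_feat * shifted).sum(axis=0)
                k += 1
    return cost


rng = np.random.default_rng(0)
feat = rng.random((4, 5, 6, 7))
cost = local_cost_volume(feat, feat, radius=1)
print(cost.shape)  # 27 displacement channels for radius 1
```

With radius 1 the zero-displacement channel sits at index 13, so correlating a feature map with itself puts the per-voxel squared feature norm there.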

Fig.1 - The global architecture of RRN

Fig.2 - The initial (a), intermediate (b) and final (c) DVF estimators. Panels (d) and (e) show the network architectures of the intermediate and final DVF estimators, respectively. Feature 1 denotes the features of the fixed image and feature 2 the features of the moving image.

Loss Function

Similarity metric: Normalized local (patch-wise) cross correlation
Regularization: Total variation
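
The two loss terms can be sketched in NumPy as below. This is an illustration under stated assumptions rather than the authors' loss: the window size, the `eps` stabilizer, the use of plain (unsquared) NCC, the anisotropic form of total variation, the `tv_weight` value, and all function names are hypothetical choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_ncc(fixed, warped, window=5, eps=1e-5):
    """Mean normalized cross-correlation over all window**3 patches of two 3D images."""
    shape = (window,) * 3
    axes = (-3, -2, -1)
    # Patch views of shape (D-w+1, H-w+1, W-w+1, w, w, w); fine for a small demo,
    # but the centered copies below are memory hungry on full CT volumes.
    f = sliding_window_view(fixed, shape)
    m = sliding_window_view(warped, shape)
    f_c = f - f.mean(axis=axes, keepdims=True)
    m_c = m - m.mean(axis=axes, keepdims=True)
    cov = (f_c * m_c).sum(axis=axes)
    var_f = (f_c ** 2).sum(axis=axes)
    var_m = (m_c ** 2).sum(axis=axes)
    return float((cov / np.sqrt(var_f * var_m + eps)).mean())


def total_variation(dvf):
    """Sum over spatial axes of the mean absolute finite difference of a (3, D, H, W) DVF."""
    return float(sum(np.abs(np.diff(dvf, axis=a)).mean() for a in (1, 2, 3)))


def registration_loss(fixed, warped, dvf, tv_weight=0.1):
    """Negative similarity plus weighted smoothness penalty (tv_weight is hypothetical)."""
    return -local_ncc(fixed, warped) + tv_weight * total_variation(dvf)


rng = np.random.default_rng(0)
img = rng.random((8, 8, 8))
print(registration_loss(img, img, np.zeros((3, 8, 8, 8))))
```

A perfectly aligned pair with a zero DVF gives a loss close to -1; misalignment raises the similarity term and a non-smooth DVF raises the regularization term.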

Results

Fig.3 - Example of RRN registration. From left to right: moving image, warped image, fixed image and deformation vector fields.

Fig.4 - Comparison with state-of-the-art classical registration methods.

Fig.5 - Ablation experiment using VoxelMorph as a benchmark learning-based method.

Conclusions

The lightweight Recursive Refinement Network (RRN) can handle large inhale-exhale deformations and outperforms the state-of-the-art pTV and VoxelMorph methods in terms of TRE on the DirLab dataset.

Remarks

The authors have generously made their code available on GitHub. With it, I was able to achieve a mean TRE of about 1.0 mm on the same dataset. On our local dataset, I observed that when a wider intensity range is chosen for the input images, the output deformation field does not follow the sliding motion of the lungs.

The network is called recursive, but in the default setting the flow estimators at the different levels are separate (they don't share weights), giving approx. 20 M parameters. However, the same flow estimator can optionally be shared between levels, which drops the number of parameters to approx. 7 M.
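
To illustrate why sharing reduces the count, here is a small parameter-counting sketch. The channel widths, kernel size, and number of levels are hypothetical values chosen only for illustration; they are not the RRN configuration and do not reproduce the 20 M and 7 M figures, which also include parts of the network that are never duplicated per level.

```python
def conv3d_params(c_in, c_out, kernel=3):
    """Weights plus biases of one 3D convolution layer."""
    return kernel ** 3 * c_in * c_out + c_out


def estimator_params(channels):
    """Parameters of a plain stack of 3D convolutions with the given channel widths."""
    return sum(conv3d_params(a, b) for a, b in zip(channels, channels[1:]))


# Hypothetical channel widths and level count, for illustration only.
channels = [64, 128, 96, 64, 32, 3]
levels = 3

per_estimator = estimator_params(channels)
separate = levels * per_estimator  # one estimator per level (no weight sharing)
shared = per_estimator             # a single estimator reused at every level

print(f"separate: {separate:,} parameters, shared: {shared:,} parameters")
```

Sharing requires every level to feed the estimator the same number of input channels, which is one reason it may be offered as an option rather than the default.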