Notes

Link to the code here

Highlights

The goal of the paper is to propose a lightweight segmentation network without compromising on the quality of the segmentation maps.
The authors propose TinyU-Net, a network with a UNet-like architecture based on a new CMRF module (Cascade Multi-Receptive Fields).

Introduction

Developping models which can run on limited resources is one of the challenges in the medical field to guarantee health equity. Those lightweight models usually try to work on a reduced number of parameters and computation complexity. For the segmentation task however, the existing methods produce results with lower quality than their non-lightweight counterparts due to a lower representation capacity.

There is therefore a need to develop lightweight models which also improve the segmentation quality.

Method

Figure 1: Details of the CMRF (left part of the figure) and the architecture of the TinyUNet (right part of the figure).

CMRF

Main idea: fuse information from multi-receptive fields with a lightweight cascading strategy

Reminders:

A pointwise convolution is a convolution that uses a 1x1 kernel.

A depthwise convolution applies a single convolutional filter for each input channel. Each channel is kept separate contrary to regular convolutions which mix the different channels. This leads to fewer parameters and number of operations.

Figure 2: Depth-wise convolution, from ¹

First step : the input with dimension \((C_{in}\times H\times W)\) is processed by a PWConv-BN-Act block (pointwise convolution + batch norm + activation)
- goal: extract feature information while regulating the number of output channels
Second step: separation of “odd” feature maps and “even” feature maps to apply two types of operations:
- linear operations: element-wise addition of the “odd” and “even” feature maps (inspired by mixup data augmentation) to have richer features
- cascade operations: the “even” feature maps are fed to \(N-1\) cascaded blocks of DWConv-BN (depthwise convolution + batch norm) which computes features from various receptive fields. Specifically, the deeper in the network the convolution block is, the larger the receptive field is.
Third step: the ouput of the addition and of the different DWConv-BN blocks are concatenated and processed through a PWConv-BN-Act block to fuse the information from multi-receptive fields while regulating the number of output channels

TinyU-Net

UNet-like architecture using the CMRF module:

encoder with four CMRF-Downsampling blocks
decoder with four Upsampling-concat-CMRF blocks
final PWConv to output C channels (where C is the number of segmentation label) in a light way

Implementation

GELU activation function (Gaussian Error Linear Unit)
The loss used is the sum of binary cross-entropy loss and dice loss.
Number of feature maps at the different stages: C1 = 64, C2 = 128, C3 = 256, and C4 = 512

Other training details:

Architecture implemented in PyTorch
Adam Optimizer, learning rate of 0.0001
300 epochs
Cosine annealing learning rate scheduler

Datasets

International Skin Imaging Collaboration (ISIC 2018)

3694 camera-acquired dermatologic images
binary segmentation of skin lesions

Novel Coronavirus Pneumonia (NCP)

750 CT slices from 150 COVID-19 patients
multi-label segmentation: background, lung field, ground-glass opacity, consolidation

Results

The author compare TinyU-Net with lightweight models (with a number of parameters inferior to 5M) and non-lightweight state-of-the-art models. They evaluate the models on the segmentation performance, number of parameters and computation complexity through FLOPs (floating point operations).

Main Segmentation results

Table 1: Results for ISIC2018 dataset

Table 2: Results for NCP dataset

On both datasets, TinyU-Net achieves the best or second-best mean IoU and mean Dice compared to the baselines (both lightweight and non-lightweight). TinyU-Net also has the lightest model in terms of number of parameters. It is only the third smallest in terms of FLOPs, behind UNeXt (see paper² and previous post³) and U-Lite⁴.
However, those two lightweight networks give sub-optimal segmentation performance.

TinyU-Net is not the only lightweight model to perform better than non-lightweight models: CMUNeXt ⁵ gives second best performance for mean IoU and mean Dice. An explanation given by the authors is that models with high computation complexity (like the non-lightweight models) might not have an advantage when working on a limited amount of data.

Figure 3: Comparative qualitative results on ISIC2018 (top two rows) and NCP (bottom two rows) datasets.

Visually, TinyU-Net gives satisfactory segmentation performance.

Ablation study

CMRF module

A first ablation study consists in replacing the blocks from two methods (lightweigh and non-lightweight) by a CMRF block. This increases the segmentation performances of these methods while reducing their cost.

Table 3: Ablation results (IoU (%)) for CMRF

Number of DWConv

A second ablation study finds the optimal number of cascade DWConv+BN blocks to be 8.

Table 4: Ablation results (mIoU (%)) for the number of DWConv-BN blocks on NCP dataset

Conclusion

Thanks to its lightweight CMRF module, the proposed TinyU-Net achieves competitive segmentation performance with only 0.48M parameters. The authors also show that their CMRF block can be adapted for other networks.

TinyU-Net: Lighter yet Better U-Net with Cascaded Multi-Receptive Fields