
Monday, July 12, 2021

U-Net: Convolutional Networks for Biomedical Image Segmentation

In this blog I will talk about one of the most famous networks for biomedical image segmentation: U-Net. U-Net takes its core idea from the Fully Convolutional Network (FCN). It modifies and extends the FCN architecture so that it works with very few training images and yields more precise segmentations.

The main idea of FCN is the use of upsampling layers to recover a full-resolution segmentation mask from the input image, combining the contracting network's output with the upsampling layers' output for a more precise result. Read more about the FCN network in this blog.
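
To make the upsample-and-combine idea concrete, here is a tiny PyTorch sketch (the shapes and layer sizes are purely illustrative and not taken from either paper). FCN adds the upsampled map to the encoder feature map, while U-Net concatenates the two; the concatenation variant is shown here.

```python
import torch
import torch.nn as nn

# Hypothetical feature maps: a fine one from the contracting path and a
# coarser one that has been downsampled further along the network.
encoder_feat = torch.randn(1, 64, 32, 32)
decoder_feat = torch.randn(1, 128, 16, 16)

# A 2x2 transposed convolution doubles the spatial size and halves the channels.
up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
upsampled = up(decoder_feat)                            # -> (1, 64, 32, 32)

# Combine coarse context ("what") with fine localization ("where").
combined = torch.cat([encoder_feat, upsampled], dim=1)  # -> (1, 128, 32, 32)
print(upsampled.shape, combined.shape)
```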

The important points of the U-Net architecture are:

  • In the upsampling part, a large number of feature channels is used, which allows the network to propagate context information to higher-resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting path, yielding a U-shaped architecture.

  • Only valid convolutions are used in the proposed network (no padding). The output segmentation map therefore contains only the pixels for which the full context is available in the input image. This allows seamless segmentation of arbitrarily large images via an overlap-tile strategy: to predict the pixels in the border region of an image, the missing context is extrapolated by mirroring the input. Convolutions with padding can instead be used when all input images have a fixed size.

    Overlap-tile strategy for seamless segmentation of arbitrarily large images. Prediction of the segmentation in the yellow area requires image data within the blue area as input.

  • Since very little training data is available in biomedical imaging, data augmentation by applying elastic deformations to the available training images is used. This allows the network to learn invariance to such deformations without having to see them in the annotated images. This is particularly important in biomedical segmentation, since deformation is the most common variation in tissue and realistic deformations can be simulated efficiently (a minimal implementation sketch follows after this list).

    In the image below, if we removed the grid it would be hard to tell the real image from the deformed one.

     

Elastic Augmentation

  • Another challenge in many cell segmentation tasks is the separation of touching objects of the same class. For this, the authors propose a weighted loss, where the separating background labels between touching cells get a large weight in the loss function.


a) Raw image of HeLa cells. b) Overlay with ground-truth segmentation. Different colors indicate different instances of HeLa cells. c) Generated segmentation mask (white: foreground, black: background). d) Map with a pixel-wise loss weight to force the network to learn the border pixels.
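
Coming back to the elastic deformations mentioned above: the paper only describes them at a high level (random displacement vectors on a coarse 3x3 grid, interpolated to per-pixel displacements), so the sketch below is just one common way to implement the idea with NumPy/SciPy. The alpha and sigma values are illustrative, not the ones used in the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha=34.0, sigma=4.0, rng=None):
    """Apply a random elastic deformation to a 2-D image.

    alpha scales the displacement field, sigma controls its smoothness
    (illustrative values; not the parameters from the paper).
    """
    rng = np.random.default_rng() if rng is None else rng
    shape = image.shape

    # Random per-pixel displacements, smoothed so the field is locally coherent.
    dx = gaussian_filter(rng.uniform(-1, 1, shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, shape), sigma) * alpha

    # Resample the image at the displaced coordinates.
    y, x = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]), indexing="ij")
    coords = np.array([y + dy, x + dx])
    return map_coordinates(image, coords, order=1, mode="reflect")
```

In practice the same displacement field has to be applied to the image and to its ground-truth mask (with nearest-neighbour interpolation for the mask) so that the two stay aligned.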
 

U-Net Architecture

 

Each blue box corresponds to a multi-channel feature map. White boxes represent copied feature maps. The arrows denote the different operations. The number of channels is denoted on top of each box. The x-y size is given at the lower-left edge of each box.

  • It consists of a contracting path (left side / encoder) and an expansive path (right side / decoder).
  • The contracting path follows the typical architecture of a convolutional network. It consists of repeated blocks of two 3x3 convolutions (unpadded), each followed by a ReLU, and a 2x2 max pooling operation with stride 2 for downsampling. At each downsampling step the number of feature channels is doubled.
  • Every step in the expansive path consists of an upsampling of the feature map by a 2x2 up-convolution that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. The cropping is necessary because border pixels are lost in every convolution.

     

    2x2 up-convolution

     
  • At the final layer, a 1x1 convolution is used to map each 64-component feature vector to the desired number of classes. In total, the network has 23 convolutional layers.
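
To tie these points together, below is a minimal PyTorch sketch of one contracting step, one expansive step with the crop-and-concatenate skip connection, and the final 1x1 convolution. It is a reduced, hypothetical illustration of the building blocks, not the full 23-layer network; the channel counts and the 572-pixel input follow the figure above.

```python
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two unpadded 3x3 convolutions, each followed by a ReLU
    # (each convolution shrinks the spatial size by 2 pixels).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3), nn.ReLU(inplace=True),
    )

def center_crop(feat, target_hw):
    # Crop the contracting-path feature map to match the upsampled map.
    _, _, h, w = feat.shape
    th, tw = target_hw
    top, left = (h - th) // 2, (w - tw) // 2
    return feat[:, :, top:top + th, left:left + tw]

class TinyUNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc = double_conv(1, 64)                        # contracting step
        self.pool = nn.MaxPool2d(2)                          # 2x2 max pool, stride 2
        self.bottleneck = double_conv(64, 128)               # channels double
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)   # 2x2 up-conv, channels halve
        self.dec = double_conv(128, 64)                      # after concatenation
        self.out = nn.Conv2d(64, n_classes, kernel_size=1)   # 1x1 conv to class scores

    def forward(self, x):
        e = self.enc(x)                            # 572 -> 568
        b = self.bottleneck(self.pool(e))          # 284 -> 280
        u = self.up(b)                             # 280 -> 560
        skip = center_crop(e, u.shape[-2:])        # crop 568 -> 560
        d = self.dec(torch.cat([skip, u], dim=1))  # 560 -> 556
        return self.out(d)

x = torch.randn(1, 1, 572, 572)
print(TinyUNet()(x).shape)   # torch.Size([1, 2, 556, 556])
```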

One may view the contraction and expansion paths as extracting the WHAT and recovering the WHERE information of the image.

 

Contraction and Expansion Path

 

As we move along the contracting path we lose WHERE information and gain WHAT information about the image, and the expansive path recovers the WHERE information by gradually upsampling.

Weighted Loss for Precise Boundaries

For segmentation one can use cross-entropy loss or Dice loss.
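
For reference, a minimal soft Dice loss for a binary mask can be written as below (my own illustration, not code from the paper); the rest of this section focuses on the weighted cross-entropy that the authors actually use.

```python
import torch

def soft_dice_loss(pred_logits, target, eps=1e-6):
    """Soft Dice loss for binary segmentation.

    pred_logits: raw network output, shape (N, 1, H, W)
    target:      binary ground-truth mask of the same shape (float 0/1)
    """
    prob = torch.sigmoid(pred_logits)
    intersection = (prob * target).sum(dim=(1, 2, 3))
    union = prob.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    dice = (2 * intersection + eps) / (union + eps)
    return 1 - dice.mean()
```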

As we can see in the HeLa cell images above, the cells are almost touching, and with a standard loss it is difficult to separate the individual cell instances in the prediction.

In the U-Net paper, the authors propose a weighted cross-entropy loss for precise prediction of the boundaries between touching cells.

A weight map for each ground-truth segmentation is computed to compensate for the different frequency of pixels from each class in the training data set, and to force the network to learn the small separation borders introduced between touching cells.

The separation border is computed using morphological operations. The weight map is then calculated as 

w(x) = wc(x) + w0 · exp( -(d1(x) + d2(x))² / (2σ²) )

where wc is the weight map to balance the class frequencies (foreground vs. background), d1(x) is the distance to the border of the nearest cell at position x, and d2(x) is the distance to the border of the second nearest cell. In the paper, w0 = 10 and σ ≈ 5 pixels were chosen.
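
The paper does not include code for this, but a common reading of the formula computes, for every pixel, the distances to the two nearest cells using distance transforms. The sketch below does that with scipy.ndimage.distance_transform_edt; the class-balancing weights wc are left as placeholder values.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def unet_weight_map(instance_mask, wc=(1.0, 1.0), w0=10.0, sigma=5.0):
    """w(x) = wc(x) + w0 * exp(-(d1(x) + d2(x))^2 / (2 * sigma^2)).

    instance_mask: 2-D integer array, 0 = background, 1..K = individual cells.
    wc: (background, foreground) class-balancing weights (placeholders here).
    """
    labels = [l for l in np.unique(instance_mask) if l != 0]

    # Class-balancing term wc(x).
    weight = np.where(instance_mask > 0, wc[1], wc[0]).astype(np.float64)

    if len(labels) >= 2:
        # Distance of every pixel to each individual cell.
        dists = np.stack([distance_transform_edt(instance_mask != l) for l in labels])
        dists.sort(axis=0)
        d1, d2 = dists[0], dists[1]   # nearest and second-nearest cell
        border = w0 * np.exp(-((d1 + d2) ** 2) / (2 * sigma ** 2))
        # Emphasize the thin background gaps that separate touching cells.
        weight += np.where(instance_mask == 0, border, 0.0)
    return weight
```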

 

Segmentation Mask
 

We can see the weight map in the above image.

Weighted Cross Entropy Loss

A pixel-wise soft-max over the final feature map is computed and combined with the cross-entropy loss. The cross-entropy is weighted at each position by the weight map w(x), which helps the network learn the separation boundaries between touching cells.
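
In PyTorch, this pixel-wise weighted cross-entropy can be written roughly as follows (a sketch that assumes the weight map computed above is supplied alongside each training image).

```python
import torch
import torch.nn.functional as F

def weighted_cross_entropy(logits, target, weight_map):
    """Pixel-wise softmax + cross-entropy, scaled per pixel by w(x).

    logits:     (N, C, H, W) raw scores from the final 1x1 convolution
    target:     (N, H, W) integer class labels
    weight_map: (N, H, W) per-pixel weights w(x)
    """
    # reduction="none" keeps one loss value per pixel so w(x) can scale it.
    per_pixel = F.cross_entropy(logits, target, reduction="none")
    return (weight_map * per_pixel).mean()
```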

Results

EM Segmentation Challenge

  • Achieved a new best score in terms of warping error, much better than the previous sliding-window CNN approach.

  • The network is fast: training takes about 10 hours and inference takes about 1 s per image on an NVidia Titan GPU (6 GB).

     

    EM Segmentation Challenge 


 

ISBI Cell Tracking Challenge 

On the PhC-U373 data set of the ISBI Cell Tracking Challenge, U-Net achieves 92% IoU, while the second best method achieves only 83%.

 

Sample Image and Result from the ISBI Cell Tracking Challenge
 

ISBI Cell Tracking Challenge

ISBI Cell Tracking Challenge: DIC HeLa

 

U-Net achieved 77.6% IoU on the DIC-HeLa data set, while the second best method achieved only 46%.

That's all for this blog, thanks for reading!

 

Reference

  • U-Net: Convolutional Networks for Biomedical Image Segmentation (Ronneberger, Fischer, Brox, 2015): https://arxiv.org/pdf/1505.04597.pdf

  • https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/