A Dataset of Laryngeal Endoscopic Images with Comparative Study on Convolution Neural Network Based Semantic Segmentation
Purpose: Automated segmentation of anatomical structures in medical image analysis is a key step in defining topology to enable or assist autonomous intervention robots. Recent methods based on deep convolutional neural networks (CNNs) have outperformed earlier heuristic methods. However, those methods were primarily evaluated in rigid, real-world environments. In this study, we evaluate existing segmentation methods for their use with soft tissue. Methods: The four CNN-based methods SegNet, UNet, ENet and ErfNet are trained with high supervision on a novel 7-class dataset of surgeries on the human larynx. The dataset contains 400 manually segmented images from two patients during laser incisions. The Intersection-over-Union (IoU) evaluation metric is used to measure the accuracy of each method. Stochastic inference is used to show the uncertainty of the individual models. Results: Our study shows that ErfNet is best suited for laryngeal soft tissue with a mean IoU of 48.6%. The highest efficiency is achieved by ENet with a mean inference time of 9.22 ms per image on an NVIDIA GeForce GTX 1080 Ti GPGPU. Conclusion: CNN-based methods for semantic segmentation are applicable to laryngeal soft tissue. The segmentation can be used for active constraints or autonomous control in robot-assisted laser surgery. Further improvements could be achieved by using a larger dataset or training the models in a self-supervised manner.
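For readers unfamiliar with the IoU metric named in the Methods, the following is a minimal sketch of how a mean IoU over classes can be computed from a predicted and a ground-truth label map. It is an illustration only: the function name `mean_iou`, the toy label maps, and the choice to skip classes that appear in neither map are assumptions, not the evaluation protocol used in the study.

```python
from typing import List, Optional, Tuple

LabelMap = List[List[int]]  # 2D grid of integer class labels


def mean_iou(pred: LabelMap, target: LabelMap,
             num_classes: int) -> Tuple[List[Optional[float]], float]:
    """Return per-class IoU values and their mean.

    Assumption: classes absent from both maps get IoU None and are
    excluded from the mean.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                # Pixel agrees: counts once in intersection and union.
                intersection[p] += 1
                union[p] += 1
            else:
                # Pixel disagrees: it belongs to the union of both classes.
                union[p] += 1
                union[t] += 1
    ious: List[Optional[float]] = [
        intersection[c] / union[c] if union[c] > 0 else None
        for c in range(num_classes)
    ]
    present = [v for v in ious if v is not None]
    mean = sum(present) / len(present) if present else float("nan")
    return ious, mean


# Toy 2x3 label maps with three classes (hypothetical data).
pred = [[0, 0, 1], [1, 2, 2]]
target = [[0, 1, 1], [1, 2, 0]]
per_class, miou = mean_iou(pred, target, num_classes=3)
print(per_class, miou)
```

In this toy case the per-class IoUs are 1/3, 2/3 and 1/2, giving a mean of 0.5. Whether the mean is taken per image or over the whole test set is a separate design choice that the abstract does not specify.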