Dual-stream Multiple Instance Learning Network for Whole Slide Image Classification with Self-supervised Contrastive Learning
Whole slide images (WSIs) have very high resolutions and usually lack localized annotations. WSI classification can be cast as a multiple instance learning (MIL) problem when only slide-level labels are available. We propose a MIL-based method for WSI classification and tumor detection in WSIs that does not require localized annotations. First, we propose a novel MIL aggregator that models the relations among instances in a dual-stream architecture with a trainable distance measurement. Second, since WSIs can produce large or unbalanced bags that hinder the training of MIL models, we propose to use self-supervised contrastive learning to extract good representations for MIL and to alleviate the prohibitive memory requirements of large bags. Third, we propose a pyramidal fusion mechanism for multiscale WSI features that further improves classification and localization accuracy. The classification accuracy of our model compares favorably to fully-supervised methods, with an accuracy gap of less than 2% on two representative WSI datasets, and outperforms all previous MIL-based methods. Benchmark results on standard MIL datasets further show the superior performance of our MIL aggregator over other MIL models on general MIL problems.
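The abstract does not give the aggregator's equations, so the following is only a minimal sketch of one way a dual-stream aggregator with a trainable distance could look. It assumes that the first stream scores each instance and max-pools to select a "critical" instance, that the second stream measures each instance's distance to the critical instance as an inner product of learned query vectors, and that the two stream scores are averaged. The names dual_stream_aggregate, w_inst, W_q, and w_bag are hypothetical, and the weights are passed in rather than trained.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in W]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dual_stream_aggregate(bag, w_inst, W_q, w_bag):
    """Score one bag of instance feature vectors (e.g. one per WSI patch).

    All weights are assumed to be trainable parameters learned elsewhere;
    here they are plain lists supplied by the caller.
    Returns (bag score, attention weights, index of the critical instance).
    """
    # Stream 1 (assumption): linear instance classifier + max-pooling
    # picks the highest-scoring, "critical" instance.
    inst_scores = [dot(w_inst, h) for h in bag]
    critical = max(range(len(bag)), key=lambda i: inst_scores[i])

    # Stream 2 (assumption): the trainable distance is the inner product
    # between learned query projections of each instance and the critical one.
    queries = [matvec(W_q, h) for h in bag]
    q_crit = queries[critical]
    attention = softmax([dot(q, q_crit) for q in queries])

    # Bag embedding: attention-weighted sum of the raw instance features
    # (a simplification; a learned value projection could be used instead).
    dim = len(bag[0])
    bag_embedding = [
        sum(a * h[d] for a, h in zip(attention, bag)) for d in range(dim)
    ]
    bag_score = dot(w_bag, bag_embedding)

    # Assumption: the final score averages the two streams.
    final_score = 0.5 * (inst_scores[critical] + bag_score)
    return final_score, attention, critical


if __name__ == "__main__":
    # Toy bag with two 2-dimensional instances and hand-picked weights.
    toy_bag = [[1.0, 0.0], [0.0, 1.0]]
    score, attn, idx = dual_stream_aggregate(
        toy_bag,
        w_inst=[1.0, 0.0],
        W_q=[[1.0, 0.0], [0.0, 1.0]],
        w_bag=[1.0, 1.0],
    )
    print(score, attn, idx)
```

Under these assumptions the attention weights double as a per-patch relevance map, which is how a slide-level model can also support localization. In practice the instance features would come from a pretrained encoder, such as the self-supervised contrastive one the abstract mentions, and the weights would be learned from slide-level labels.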