Self-supervised Document Clustering Based on BERT with Data Augment
Contrastive learning is a good way to pursue discriminative unsupervised learning, which can inherit advantages and experiences of well-studied deep models without complexly novel model designing. In this paper, we propose two learning method for document clustering, the one is a partial contrastive learning with unsupervised data augment, and the other is a self-supervised contrastive learning. Both methods achieve state-of-the-art results in clustering accuracy when compared to recently proposed unsupervised clustering approaches.
READ FULL TEXT