Self-omics: A Self-supervised Learning Framework for Multi-omics Cancer Data

by Sayed Hashim et al.

We have gained access to vast amounts of multi-omics data thanks to Next Generation Sequencing. However, this data is challenging to analyse: it is high-dimensional and largely unannotated. The lack of annotated data is a significant problem in machine learning, and Self-Supervised Learning (SSL) methods are typically used to deal with limited labelled data. However, there is a lack of studies that use SSL methods to exploit inter-omics relationships in unlabelled multi-omics data. In this work, we develop a novel and efficient pre-training paradigm that consists of various SSL components, including but not limited to contrastive alignment, data recovery from corrupted samples, and using one omics type to recover other omics types. Our pre-training paradigm improves performance on downstream tasks with limited labelled data. We show that our approach outperforms the state-of-the-art method in cancer type classification on the TCGA pan-cancer dataset in a semi-supervised setting. Moreover, we show that the encoders pre-trained using our approach can be used as powerful feature extractors even without fine-tuning. Our ablation study shows that the method is not overly dependent on any single pretext task component. The network architectures in our approach are designed to handle missing omics types and multiple datasets for pre-training and downstream training. Our pre-training paradigm can be extended to perform zero-shot classification of rare cancers.
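To make the contrastive alignment component concrete, the sketch below implements a symmetric InfoNCE-style loss in NumPy that pulls together embeddings of the same patient produced by two different omics encoders while pushing apart embeddings of different patients. This is an illustrative assumption about the general form of such a loss, not the authors' exact objective; the array shapes, the `temperature` value, and the toy embeddings are all hypothetical.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric contrastive (InfoNCE-style) loss aligning paired embeddings.

    z_a, z_b: (batch, dim) embeddings from two omics encoders;
    row i of each matrix corresponds to the same patient.
    """
    # L2-normalise so dot products are cosine similarities
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (batch, batch) similarity matrix
    # Cross-entropy with matched pairs (the diagonal) as targets, in both directions
    log_sm_rows = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    log_sm_cols = logits - np.log(np.exp(logits).sum(axis=0, keepdims=True))
    return -(np.diag(log_sm_rows).mean() + np.diag(log_sm_cols).mean()) / 2

# Toy check: well-aligned pairs should incur a lower loss than unrelated ones.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))  # near-identical views
random = info_nce(z, rng.normal(size=(8, 16)))              # unrelated views
assert aligned < random
```

The other pretext tasks mentioned above (recovering corrupted samples, reconstructing one omics type from another) would typically be trained jointly with such a loss as additional reconstruction objectives on the same encoders.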




