Robust Mixture-of-Expert Training for Convolutional Neural Networks

by Yihua Zhang, et al.

Sparsely-gated Mixture of Experts (MoE), an emerging deep model architecture, has shown great promise for enabling high-accuracy and ultra-efficient model inference. Despite the growing popularity of MoE, little work has investigated its potential to advance convolutional neural networks (CNNs), especially in terms of adversarial robustness. Since the lack of robustness has become one of the main hurdles for CNNs, in this paper we ask: How can a CNN-based MoE model be made adversarially robust? Can we robustly train it like an ordinary CNN model? Our pilot study shows that the conventional adversarial training (AT) mechanism (developed for vanilla CNNs) is no longer effective at robustifying an MoE-CNN. To better understand this phenomenon, we dissect the robustness of an MoE-CNN into two dimensions: robustness of routers (i.e., gating functions that select data-specific experts) and robustness of experts (i.e., the router-guided pathways defined by the subnetworks of the backbone CNN). Our analyses show that routers and experts are hard to adapt to each other under vanilla AT. Thus, we propose a new router-expert alternating Adversarial training framework for MoE, termed AdvMoE. The effectiveness of our proposal is demonstrated across 4 commonly used CNN model architectures over 4 benchmark datasets. We find that AdvMoE achieves 1 [...] improvement over the original dense CNN and enjoys the efficiency merit of sparsely-gated MoE, leading to more than 50 [...] are available at [...]
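
To make the router/expert split concrete, here is a minimal sketch of generic sparse top-k gating in plain Python. It is not the paper's MoE-CNN construction, where experts are pathways through subnetworks of a backbone CNN, and it does not implement AdvMoE. The names `route_top_k` and `moe_forward`, the toy experts, and the hand-written router are all hypothetical, chosen only for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of router scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_scores, k):
    """Return (expert_index, renormalized_weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, experts, router, k=1):
    """Evaluate only the selected experts and mix their outputs by gate weight."""
    selected = route_top_k(router(x), k)
    output = sum(w * experts[i](x) for i, w in selected)
    return output, [i for i, _ in selected]

# Hypothetical toy setup (an assumption for illustration): three scalar
# "experts" and a router that favors expert 0 for positive inputs and
# expert 2 for negative inputs.
experts = [lambda x: 2.0 * x, lambda x: x + 1.0, lambda x: -x]
router = lambda x: [x, 0.0, -x]

y_pos, used_pos = moe_forward(3.0, experts, router, k=1)
y_neg, used_neg = moe_forward(-3.0, experts, router, k=1)
```

With `k=1`, only one expert runs per input, which is where the inference savings of sparse gating come from. The sketch also shows why the two components are coupled: a small input perturbation that changes the router's scores can switch which expert is evaluated, so the router and the experts cannot be judged for robustness in isolation.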




Neural Architecture Dilation for Adversarial Robustness

With the tremendous advances in the architecture and scale of convolutio...

When Adversarial Training Meets Vision Transformers: Recipes from Training to Architecture

Vision Transformers (ViTs) have recently achieved competitive performanc...

Group-wise Inhibition based Feature Regularization for Robust Classification

The vanilla convolutional neural network (CNN) is vulnerable to images w...

Extreme Value Preserving Networks

Recent evidence shows that convolutional neural networks (CNNs) are bias...

Neural Networks with Recurrent Generative Feedback

Neural networks are vulnerable to input perturbations such as additive n...

Enhancing Intrinsic Adversarial Robustness via Feature Pyramid Decoder

Whereas adversarial training is employed as the main defence strategy ag...

Applying Convolutional Neural Networks for Stock Market Trends Identification

In this paper we apply a specific type ANNs - convolutional neural netwo...
