Revisiting single-gated Mixtures of Experts


Amelie Royer (Qualcomm Research),* Ilia Karmanov (Qualcomm Research), Andrii Skliar (Qualcomm AI Research), Babak Ehteshami Bejnordi (Qualcomm AI Reseach), Tijmen Blankevoort (Qualcomm)
The 33rd British Machine Vision Conference

Abstract

Mixture of Experts (MoE) are rising in popularity as a means to train extremely large-scale models, yet allowing for a reasonable computational cost at inference time. Recent state-of-the-art approaches usually assume a large number of experts, and require training all experts jointly, which often lead to training instabilities such as the router collapsing In contrast, in this work, we propose to revisit the simple single-gate MoE, which allows for more practical training. Key to our work are (i) a base model branch acting both as an early-exit and an ensembling regularization scheme, (ii) a simple and efficient asynchronous training pipeline without router collapse issues, and finally (iii) a per-sample clustering-based initialization. We show experimentally that the proposed model obtains efficiency-to-accuracy trade-offs comparable with other more complex MoE, and outperforms non-mixture baselines. This showcases the merits of even a simple single-gate MoE, and motivates further exploration in this area.

Video



Citation

@inproceedings{Royer_2022_BMVC,
author    = {Amelie Royer and Ilia Karmanov and Andrii Skliar and Babak Ehteshami Bejnordi and Tijmen Blankevoort},
title     = {Revisiting single-gated Mixtures of Experts},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {https://bmvc2022.mpi-inf.mpg.de/0736.pdf}
}


Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).

Imprint | Data Protection