🔗 Links 🔗
Mistral MoE Model Launch – https://twitter.com/MistralAI/status/1733150512395038967
Download the model here – https://huggingface.co/DiscoResearch/mixtral-7b-8expert (Hugging Face Transformer Implementation)
Llama-Mistral Implementation – https://twitter.com/bjoern_pl/status/1733288666057818535
Mistral MoE Benchmarks – https://twitter.com/jphme/status/1733412003505463334
Quick Summaries about MoE –
Sophia Yang’s what is Mixture of Experts – https://twitter.com/sophiamyang/status/1733505991600148892
Omar Sanseviero’s quick summary of MoE trade-offs –
Lots of confusion about MoEs out there.
IIUC:
– Faster inference as a fixed number of experts is activated per token (if sparse). E.g., if n=1, just the most appropriate expert is activated.
– High VRAM usage; all experts need to be loaded.
– Work well when you run them on many…
— Omar Sanseviero (@osanseviero), December 9, 2023
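The trade-off in the tweet above (few experts active per token, but all experts resident in memory) can be sketched in a few lines. This is a minimal, illustrative top-k gating example in NumPy, not Mixtral's actual implementation; all names and shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: Mixtral-style 8 experts with top-2 routing.
n_experts, d_model, k = 8, 16, 2
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route one token through only k of the n experts."""
    logits = x @ gate_w                    # gating scores, one per expert
    top = np.argsort(logits)[-k:]          # pick the k highest-scoring experts
    weights = np.exp(logits[top])          # softmax over just those k experts
    weights /= weights.sum()
    # Only k expert matmuls run per token -> faster inference (sparse),
    # but every expert's weights must still be loaded -> high VRAM usage.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
```

With k=1 this reduces to the "only the most appropriate expert is activated" case mentioned in the tweet.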
❤️ If you want to support the channel ❤️
Support here:
Patreon – https://www.patreon.com/1littlecoder/
Ko-Fi – https://ko-fi.com/1littlecoder
🧭 Follow me on 🧭
Twitter – https://twitter.com/1littlecoder
Linkedin – https://www.linkedin.com/in/amrrs/