🔗 Links 🔗
Mistral MoE Model Launch – https://twitter.com/MistralAI/status/1733150512395038967
Download the model here – https://huggingface.co/DiscoResearch/mixtral-7b-8expert (Hugging Face Transformer Implementation)
Llama-Mistral Implementation – https://twitter.com/bjoern_pl/status/1733288666057818535
Mistral MoE Benchmarks – https://twitter.com/jphme/status/1733412003505463334
Quick Summaries about MoE –
Sophia Yang’s what is Mixture of Experts – https://twitter.com/sophiamyang/status/1733505991600148892
Omar Sanseviero’s quick summary of MoE trade-offs –
Lots of confusion about MoEs out there.
IIUC:
– Faster inference as a fixed number of experts is activated per token (if sparse). E.g., if n=1, just the most appropriate expert is activated.
– High VRAM usage; all experts need to be loaded.
– Work well when you run them on many…
— Omar Sanseviero (@osanseviero), December 9, 2023
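The trade-off in the tweet above (few experts active per token, but all experts resident in memory) can be sketched in a few lines. This is a minimal, illustrative top-k gating example in NumPy, not Mixtral's actual implementation; all names and shapes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: Mixtral-style 8 experts with top-2 routing.
n_experts, d_model, k = 8, 16, 2
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.standard_normal((d_model, n_experts))

def moe_forward(x):
    """Route one token through only k of the n experts."""
    logits = x @ gate_w                    # gating scores, one per expert
    top = np.argsort(logits)[-k:]          # pick the k highest-scoring experts
    weights = np.exp(logits[top])          # softmax over just those k experts
    weights /= weights.sum()
    # Only k expert matmuls run per token -> faster inference (sparse),
    # but every expert's weights must still be loaded -> high VRAM usage.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
```

With k=1 this reduces to the "only the most appropriate expert is activated" case mentioned in the tweet.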
❤️ If you want to support the channel ❤️
Support here:
Patreon – https://www.patreon.com/1littlecoder/
Ko-Fi – https://ko-fi.com/1littlecoder
🧭 Follow me on 🧭
Twitter – https://twitter.com/1littlecoder
Linkedin – https://www.linkedin.com/in/amrrs/