* Sorry for the part where my face gets blurry
Download EdrawMind for free: https://bit.ly/46xIp8G and save up to 40% here: https://bit.ly/46nbZgl
Enjoy 🙂
Become a Patron 🔥 – https://patreon.com/MatthewBerman
Join the Discord 💬 – https://discord.gg/xxysSXBxFW
Follow me on Twitter 🧠 – https://twitter.com/matthewberman
Subscribe to my Substack 🗞️ – https://matthewberman.substack.com/
Media/Sponsorship Inquiries 📈 – https://bit.ly/44TC45V
Need AI Consulting? ✅ – https://forwardfuture.ai/
Use RunPod – https://bit.ly/3OtbnQx
Links:
https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
Looks like Mistral has a model that’s even better than Mixtral 8x7B, and they’re serving it to alpha users of their API.
Scoring 8.6 on MT-Bench, it’s frighteningly close to GPT-4, and beats all other models tested.
This is their ‘Medium’ size. ‘Large’ will likely beat GPT-4. pic.twitter.com/jaoXP8lyKl
— Matt Shumer (@mattshumer_) December 11, 2023
And here's a great MoE reading list via @sophiamyang:
– The Sparsely-Gated Mixture-of-Experts Layer (2017): https://t.co/bRUBJYKDQl
– GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (2020) https://t.co/6oWby0QlMX
– MegaBlocks: Efficient Sparse…
— Sebastian Raschka (@rasbt) December 11, 2023
Official post on Mixtral 8x7B: https://t.co/ce0ZjHhLVn
Official PR into vLLM shows the inference code: https://t.co/vJbmDG9RhG
New HuggingFace explainer on MoE, very nice: https://t.co/lTaNCONUeI
In naive decoding, performance of a bit above 70B (Llama 2), at inference speed… https://t.co/OMSTfYXVsE
— Andrej Karpathy (@karpathy) December 11, 2023
https://huggingface.co/blog/moe
https://pub.towardsai.net/gpt-4-8-models-in-one-the-secret-is-out-e3d16fd1eee0
https://mistral.ai/news/mixtral-of-experts/
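The MoE articles linked above describe sparsely-gated routing: a gating network scores all experts per token, and only the top-k experts (top-2 of 8 in Mixtral's case) actually run. A minimal toy sketch of that idea, using NumPy and treating each "expert" as a single linear map for brevity (the shapes and expert count here are illustrative, not Mixtral's real configuration):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class SparseMoELayer:
    """Toy sparsely-gated MoE layer: route each token to the top-k of n experts."""
    def __init__(self, dim, n_experts=8, top_k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.gate = rng.standard_normal((dim, n_experts)) * 0.02
        # each "expert" is just one linear map here; real models use full MLPs
        self.experts = rng.standard_normal((n_experts, dim, dim)) * 0.02
        self.top_k = top_k

    def __call__(self, x):
        # x: (tokens, dim)
        logits = x @ self.gate                                  # (tokens, n_experts)
        topk = np.argsort(logits, axis=-1)[:, -self.top_k:]     # top-k expert indices per token
        weights = softmax(np.take_along_axis(logits, topk, axis=-1), axis=-1)
        out = np.zeros_like(x)
        for t in range(x.shape[0]):            # each token only visits its top-k experts
            for slot in range(self.top_k):
                e = topk[t, slot]
                out[t] += weights[t, slot] * (x[t] @ self.experts[e])
        return out

moe = SparseMoELayer(dim=16)
y = moe(np.ones((4, 16)))
print(y.shape)  # (4, 16)
```

This is why inference can be fast relative to total parameter count, as the Karpathy quote above notes: with top-2 of 8 experts, each token touches only a quarter of the expert weights even though all of them must sit in memory.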
Chapters:
0:00 – About Mixtral 8x7B
9:00 – Installation Guide
13:06 – Mixtral Tests
#EdrawMind #EdrawMindAI #aipresentation #aimindmap