Mistral AI drops new ‘mixture of experts’ model with a torrent link

A crowd of tourists gathers around the Eiffel Tower in Paris, France, as it transforms into a giant mecha.
Credit: VentureBeat made with Midjourney

As Google unleashed a barrage of artificial intelligence announcements at its Cloud Next conference, Mistral AI decided to jump into action with the launch of its latest sparse mixture of experts (SMoE) model: Mixtral 8x22B.

However, unlike its competitors, the Paris-based startup, which raised Europe’s largest-ever seed round in June 2023 and has become a rising star in the AI domain, didn’t push the release with a demo video or blog post. Instead, it continued its unorthodox approach of simply dropping a torrent link on X for users to download and test the new model.

This marks the third major model release in the last couple of days, following the general availability of GPT-4 Turbo with vision and Gemini 1.5 Pro. Meta has also dropped hints of releasing Llama 3 next month.

Mistral’s big torrent file with a mixture of experts

While the capabilities of Mixtral 8x22B remain undisclosed, AI enthusiasts were quick to point out that this is a big release from Mistral – with the torrent leading to four files totaling 262GB – and might be difficult to run locally.


“When I bought my M1 Max Macbook I thought 32 GB would be overkill for what I do since I don’t work in art or design. I never thought my interest in AI would suddenly make that far from enough,” a Reddit user wrote when discussing the new model.

Soon after Mistral’s X post, Mixtral 8x22B also appeared on Hugging Face, where users can download it for further training and deployment. The listing notes that, because the model is pretrained, it does not have any moderation mechanisms. Together AI has also made the model available for users to try.
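
For readers who want to poke at the weights themselves, below is a minimal sketch of loading the checkpoint through the Hugging Face transformers library. The repository ID is an assumption (check the actual listing), and at 262GB the model realistically needs several high-memory GPUs or aggressive quantization rather than a laptop.

```python
# Sketch only: loading Mixtral 8x22B from Hugging Face with transformers.
# The repo id below is an assumption; verify it against the actual listing.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x22B-v0.1"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # shard across available GPUs (requires accelerate)
    torch_dtype="auto",  # load in the checkpoint's native precision
)

inputs = tokenizer("Mixtral 8x22B is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```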

With its sparse MoE approach, Mistral aims to give users a combination of different expert models, each specializing in a different category of tasks, to optimize for both performance and cost.

“At every layer, for every token, a router network chooses two of these groups (the ‘experts’) to process the token and combine their output additively. This technique increases the number of parameters of a model while controlling cost and latency, as the model only uses a fraction of the total set of parameters per token,” Mistral notes on its website.
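
In code, the routing Mistral describes looks roughly like the sketch below: a small linear router scores every expert for each token, only the top two experts actually run, and their outputs are summed using the router’s renormalized weights. The layer size, expert design, and class name here are illustrative assumptions, not Mistral’s actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Top2MoELayer(nn.Module):
    """Illustrative top-2 mixture-of-experts feed-forward layer (a sketch, not Mistral's code)."""

    def __init__(self, dim: int = 512, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # The router scores every expert for each token.
        self.router = nn.Linear(dim, n_experts, bias=False)
        # Each "expert" here is a plain feed-forward block; sizes are arbitrary.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
                for _ in range(n_experts)
            ]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (num_tokens, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)  # pick the top-2 experts per token
        weights = F.softmax(weights, dim=-1)                 # renormalize over the chosen two
        out = torch.zeros_like(x)
        for slot in range(self.k):  # only k of n_experts ever run for a given token
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out


# Example: route 16 token embeddings through the layer.
layer = Top2MoELayer()
print(layer(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```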

Previously, Mistral released Mixtral 8x7B using this approach. That model has 46.7B total parameters but uses only 12.9B parameters per token, so it processes input and generates output at the same speed and cost as a 12.9B model. For the new release, Reddit users estimated roughly 130B total parameters, with about 38B active when generating tokens if it routes each token to two experts at a time.
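
A quick back-of-envelope, using only the figures quoted above, shows how the active count stays small while the total grows: attention and embedding weights are shared for every token, while only two of the eight expert blocks run. The per-expert and shared sizes below are solved from those public numbers and are purely illustrative.

```python
# Back-of-envelope from the Mixtral 8x7B figures quoted above
# (46.7B total, 12.9B active, 2 of 8 experts per token). Purely illustrative.
n_experts, k_active = 8, 2
total_params = 46.7e9
active_params = 12.9e9

# total  = shared + n_experts * per_expert
# active = shared + k_active  * per_expert
per_expert = (total_params - active_params) / (n_experts - k_active)
shared = total_params - n_experts * per_expert

print(f"per-expert ~ {per_expert / 1e9:.1f}B, shared ~ {shared / 1e9:.1f}B")
# Applying the same two equations to the ~130B-total / ~38B-active Reddit
# estimate gives a comparable breakdown for Mixtral 8x22B.
```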

For now, Mixtral 8x22B’s actual performance across benchmarks remains to be seen. Users, however, expect it to build on the original Mixtral, which outperformed Meta’s Llama 2 70B and OpenAI’s GPT-3.5 across multiple benchmarks, including GSM-8K and MMLU, while offering faster inference.