Elon Musk’s xAI has introduced its first multimodal model. In addition to understanding text, it can process visual information in documents, diagrams, charts, screenshots and photographs. Grok-1.5 Vision, or Grok-1.5V, will be available soon to early testers and existing Grok users.
“Grok-1.5V is competitive with existing frontier multimodal models in a number of domains, ranging from multi-disciplinary reasoning to understanding documents, science diagrams, charts, screenshots, and photographs,” the company said in a blog post.
Today’s unveiling comes weeks after xAI revealed its updated chatbot model Grok-1.5.
The company highlights seven examples showcasing Grok-1.5V’s potential, including transforming a whiteboard sketch of a flowchart into Python code, generating a bedtime story from a kid’s drawing, explaining a meme, converting a table into CSV format and identifying whether a deck has rotten wood and needs replacing.

In tests against peers GPT-4V, Claude 3 Sonnet, Claude 3 Opus and Gemini Pro 1.5, xAI claims its multimodal model stands out. It’s especially proud that Grok-1.5V outperforms its competitors on RealWorldQA, a new benchmark it created to evaluate real-world spatial understanding.

To start, RealWorldQA consists of more than 700 images, each paired with a question and answer. The images range from anonymized shots taken from vehicles to other real-world samples. xAI is releasing RealWorldQA to the public under a Creative Commons license.
Musk’s AI company continues to make advancements as it works to keep up with OpenAI and other market leaders since its chatbot first hit the scene in November 2023. Grok-1.5V comes less than a month after xAI made its Grok AI open source. But its efforts haven’t been without controversy. Earlier this month, researchers revealed the Grok chatbot could instruct users on criminal activities.
Nevertheless, xAI presses on in pursuit of building “beneficial [artificial general intelligence]” capable of understanding the universe. The company says “significant” updates will be made to Grok AI’s multimodal understanding and generation capabilities in the coming months.