Google has developed a new vision-language multimodal model under its Gemma umbrella of lightweight open models. Named PaliGemma, it is designed for image captioning, visual question answering, and image retrieval. It joins the other Gemma variants, CodeGemma and RecurrentGemma, and is available starting today for developers to use in their projects.
Google announced PaliGemma at its developer conference. It stands out as the only model in the Gemma family designed to translate visual information into written language. It is also a small language model (SLM), meaning it runs efficiently without requiring extensive memory or processing power, making it suitable for resource-constrained devices such as smartphones, IoT devices, and personal computers.
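For developers who want a feel for what integration might look like, the sketch below shows how a captioning or visual question answering call could be made through the Hugging Face Transformers library. The checkpoint ID, the placeholder image URL, and the exact class names are assumptions for illustration rather than details confirmed in Google's announcement.

```python
# A minimal sketch of image captioning / visual question answering with PaliGemma,
# assuming the weights are published on Hugging Face under an ID such as
# "google/paligemma-3b-mix-224" and loaded through the Transformers library.
import requests
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma-3b-mix-224"  # illustrative checkpoint name
processor = AutoProcessor.from_pretrained(model_id)
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)

# Load any test image; the URL here is just a placeholder.
image_url = "https://example.com/photo.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)

# PaliGemma takes a short text prompt plus an image: "caption en" asks for an
# English caption, while a question like "what is the dog holding?" turns the
# same call into visual question answering.
prompt = "caption en"
inputs = processor(text=prompt, images=image, return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=30)
# Strip the echoed prompt tokens before decoding the generated text.
generated = output_ids[0][inputs["input_ids"].shape[-1]:]
print(processor.decode(generated, skip_special_tokens=True))
```

In this sketch, swapping the prompt from "caption en" to a natural-language question is all it takes to move from captioning to visual question answering.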
Developers may be drawn to the model because it opens up a host of new possibilities for their applications. PaliGemma could help app users generate content, power richer search capabilities, or help the visually impaired better understand the world around them. AI is usually delivered through the cloud via one or more large language models (LLMs), but to reduce latency (the time it takes from receiving an input to generating a response) developers may opt for SLMs. They may also turn to these models on devices where internet reliability is an issue.
Web and mobile apps are perhaps the more conventional use cases for PaliGemma, but it’s feasible that the model could be incorporated into wearables such as sunglasses that would compete against the Ray-Ban Meta Smart Glasses or in devices similar to the Rabbit r1 or Humane AI Pin. And let’s not forget about the robots that operate within our homes and offices. Because Gemma is built from the same research and technology behind Google Gemini, developers could be more comfortable adopting the technology in their work.
The release of PaliGemma isn't Google's only Gemma announcement today. The company has also revealed the largest version of Gemma yet, containing 27 billion parameters.