Meta’s new Make-a-Video signals the next generative AI evolution


This morning Meta CEO Mark Zuckerberg posted on his Facebook page to announce Make-A-Video, a new AI system that allows users to turn text prompts, like “a teddy bear painting a self-portrait,” into short, high-quality, one-of-a-kind video clips.

Sound like DALL-E? That’s the idea: According to a press release, Make-A-Video builds on AI image generation technology (including Meta’s Make-A-Scene work from earlier this year) by “adding a layer of unsupervised learning that allows the system to understand motion in the physical world and apply it to traditional text-to-image generation.”

“This is pretty amazing progress,” Zuckerberg wrote in his post. “It’s much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they’ll change over time.”

A year after DALL-E

It’s hard to believe that only about a year has passed since the original DALL-E was unveiled in January 2021. Meanwhile, 2022 has seemed to be the year of the text-to-image revolution, thanks to DALL-E 2, Midjourney, Stable Diffusion and other large generative models that allow users to create realistic images and art from natural-language text prompts.

Is Meta’s new Make-A-Video a sign that the next step of generative AI, text-to-video, is about to go mainstream? Given the sheer speed of text-to-image evolution this year — Midjourney even created controversy with an image that won an art competition at the Colorado State Fair — it certainly seems possible. A couple of weeks ago, video editing software company Runway released a promotional video teasing a new feature of its AI-powered web-based video editor that can edit video from written descriptions.

Meta’s Make-A-Video taps AI to bring your imagination to video.

And the demand for text-to-video generators at the level of today’s text-to-image options is high, thanks to the need for video content across all channels — from social media advertising and video blogs to explainer videos.

Meta, for its part, seems confident, according to its research paper introducing Make-A-Video: “In all aspects, spatial and temporal resolution, faithfulness to text, and quality, we present state-of-the-art results in text-to-video generation, as determined by both qualitative and quantitative measures.”