Here’s what’s new in Veo 3.1, Google’s latest AI video model

OpenAI’s new Sora app has been the center of attention in hyper-realistic AI video over the past few weeks. Sora makes it easy for users to generate short videos that look real enough to fool most people, including videos that bear a close resemblance to real people.
But before Sora arrived, it was Google that was raising fears about realistic AI video. With Veo 3, Google launched an AI model that not only produced realistic video, but also generated realistic audio synchronized with the action. Sound effects, ambient noise, and even dialogue could all be generated alongside the video itself, all from a simple prompt.
Veo 3.1
Now, Google is back with an upgrade to Veo, aptly named Veo 3.1, which the company announced in a blog post on Wednesday. It isn’t a redesign or a revolutionary new video model. Instead, Veo 3.1 builds on Veo 3, adding “richer audio” and “enhanced realism” that Google says produces “true-to-life” textures. The new model also supports new narrative-control tools, alongside upgrades to Flow, Google’s AI video editor. Flow users now have more granular editing controls and can add audio to existing features like “Ingredients to Video,” “Frames to Video,” and “Extend.”
What does this mean in practice? According to Google, Ingredients to Video with Veo 3.1 lets users add reference images, such as a specific person, outfit, or environment, to their scenes. Flow can then insert these elements into the finished product, as Google shows in the demo video below:
Building on that functionality, Flow now lets you add new elements to an existing scene. With “Insert,” you can tell Veo 3.1 to add new characters, details, lighting effects, and more to a clip. Google says the feature will also work in reverse, letting users remove anything they don’t like from a generation.
Google now also offers a new way to dictate how a scene is generated, called “First and Last Frame.” Users choose reference frames for the start and end of a scene, and Flow with Veo 3.1 then fills the gap, generating a scene that begins and ends on those frames.
There’s also now a way to create longer videos than previous iterations of Flow could generate. The new “Extend” feature lets you either continue the action of the current clip or jump to a new scene that follows it, though Google says the feature is best suited to generating a longer establishing shot. According to the company, Extend can create videos longer than a minute.
Veo 3.1 is available to users of the Gemini app as well as Vertex AI, provided you have a Google AI Pro subscription. Developers can access it through the Gemini API. Google says Ingredients to Video, First and Last Frame, and Extend are available in the Gemini API, but “Add Object” and “Remove Object” are not. “Extend” is also not yet available in the Vertex AI API.
Is this really a good thing?
Google sees all these advancements as a boon for creatives and creativity, but I’m very skeptical. I can see Veo 3.1 and Flow being useful for visualizing shots before filming or animating them (i.e., as a storyboarding tool), or even as a way for new and aspiring filmmakers to learn editing by seeing their ideas in a more realized form. Overall, though, I don’t think AI-generated content is the future, or at least not the future most of us want. Sure, there’s humor or novelty in some of these AI-generated videos, but I’d bet that most people who enjoy them do so ironically, or only on social media.
The idea of replacing human filmmakers and actors with AI generations seems absurd, especially when it puts us all at greater risk of misinformation. Is it really so important for companies like Google and OpenAI to make it easier to generate fully rendered, hyper-realistic scenes when those videos could so easily be used to fool the masses? Maybe these are the ramblings of someone who resists change, but I don’t think most of us want to see our favorite shows and movies, made with passion and emotion, replaced by realistic characters delivering tone-deaf, robotic performances.