
How to generate videos using Veo

Author: LoRA

The Gemini API gives you access to Veo 2, Google's most capable video generation model. Veo is designed to help you build next-generation AI applications that turn user prompts and images into high-quality video assets.

1. Introduction to Veo

Note: Veo is a paid feature and is not available in the free tier. For more information, see the Pricing page.

Veo is Google's most powerful video generation model to date. It can generate videos in a wide range of cinematic and visual styles, capturing the nuances of a prompt and rendering fine details consistently across frames.

Specification

Modalities: Text-to-video generation; image-to-video generation
Request latency: Minimum 11 seconds; maximum 6 minutes (during peak hours)
Variable length generation: 5-8 seconds
Resolution: 720p
Frame rate: 24 frames per second
Aspect ratio: 16:9 (landscape); 9:16 (portrait)
Input languages (text-to-video): English
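
As a quick worked example of the figures in the specification, the sketch below computes how many frames a clip contains at 24 frames per second. The function name `frame_count` is made up for illustration and is not part of any SDK.

```python
# Values taken from the specification above.
FRAME_RATE_FPS = 24
MIN_DURATION_S = 5
MAX_DURATION_S = 8


def frame_count(duration_seconds: int) -> int:
    """Return the number of frames in a clip of the given length."""
    if not MIN_DURATION_S <= duration_seconds <= MAX_DURATION_S:
        raise ValueError("duration must be between 5 and 8 seconds")
    return duration_seconds * FRAME_RATE_FPS


print(frame_count(5))  # 120
print(frame_count(8))  # 192
```

So a generated clip holds between 120 and 192 frames.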

Note: For more information on Veo usage limits, see the Models, Pricing, and Rate limits pages.

Videos created by Veo are watermarked with SynthID, our tool for watermarking and identifying AI-generated content. They also pass through safety filters and memorization checks that help reduce privacy, copyright, and bias risks.

Preparation

Before calling the Gemini API, make sure that you have installed your SDK of choice and that a Gemini API key is configured and ready to use.

To use Veo with Google Gen AI SDK, make sure you have one of the following versions installed:

1. Python v1.10.0 or later

2. TypeScript and JavaScript v0.8.0 or later

3. Go v1.0.0 or later
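
To check the Python requirement, a small sketch like the following can compare the installed SDK version against the minimum listed above. It assumes the SDK was installed from PyPI under the distribution name `google-genai`; the helper names `parse_version` and `meets_minimum` are hypothetical.

```python
from importlib import metadata
from itertools import takewhile

MIN_PYTHON_SDK = "1.10.0"  # minimum Python SDK version listed above


def parse_version(text: str) -> tuple:
    """Turn '1.10.0' into (1, 10, 0); suffixes such as 'rc1' are ignored."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_PYTHON_SDK) -> bool:
    """Return True when the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '1.10' equals '1.10.0'
    b += (0,) * (width - len(b))
    return a >= b


try:
    # Assumption: "google-genai" is the installed distribution name.
    installed = metadata.version("google-genai")
    print(installed, "is OK" if meets_minimum(installed) else "is too old")
except metadata.PackageNotFoundError:
    print("google-genai is not installed")
```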

2. Generate video

This section provides code examples for using text prompts and using images to generate videos.

1. Generate based on text

You can generate videos through Veo using the following code:

Python

import time

from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from GOOGLE_API_KEY

# Start the long-running video generation job
operation = client.models.generate_videos(
    model="veo-2.0-generate-001",
    prompt="Panning wide shot of a calico kitten sleeping in the sunshine",
    config=types.GenerateVideosConfig(
        person_generation="dont_allow",  # "dont_allow" or "allow_adult"
        aspect_ratio="16:9",  # "16:9" or "9:16"
    ),
)

# Poll until the operation has finished
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Download and save each generated video
for n, generated_video in enumerate(operation.response.generated_videos):
    client.files.download(file=generated_video.video)
    generated_video.video.save(f"video{n}.mp4")

JavaScript

import { GoogleGenAI } from "@google/genai";
import { createWriteStream } from "fs";
import { Readable } from "stream";
const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY" });
async function main() {
 let operation = await ai.models.generateVideos({
   model: "veo-2.0-generate-001",
   prompt: "Panning wide shot of a calico kitten sleeping in the sunshine",
   config: {
     personGeneration: "dont_allow",
     aspectRatio: "16:9",
   },
 });
 while (!operation.done) {
   await new Promise((resolve) => setTimeout(resolve, 10000));
   operation = await ai.operations.getVideosOperation({
     operation: operation,
   });
 }
 operation.response?.generatedVideos?.forEach(async (generatedVideo, n) => {
   const resp = await fetch(`${generatedVideo.video?.uri}&key=GOOGLE_API_KEY`); // append your API key
   const writer = createWriteStream(`video${n}.mp4`);
   Readable.fromWeb(resp.body).pipe(writer);
 });
}
main();

REST

# Use curl to send a POST request to the predictLongRunning endpoint.
# The request body includes the prompt for video generation.
# BASE_URL is assumed to be set beforehand, for example to
# https://generativelanguage.googleapis.com/v1beta
curl "${BASE_URL}/models/veo-2.0-generate-001:predictLongRunning?key=${GOOGLE_API_KEY}" \
  -H "Content-Type: application/json" \
  -X "POST" \
  -d '{
    "instances": [{
        "prompt": "Panning wide shot of a calico kitten sleeping in the sunshine"
      }
    ],
    "parameters": {
      "aspectRatio": "16:9",
      "personGeneration": "dont_allow"
    }
  }' | tee result.json | jq .name | sed 's/"//g' > op_name

This code takes about 2-3 minutes to run, though it may take longer if resources are constrained. When it finishes, you should see a video like the following:

A kitten sleeping in the sunshine.

If you see an error message instead of a video, resources are constrained and your request could not be completed. In this case, run the code again.
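
Re-running by hand can be automated. The following is a minimal sketch of a retry wrapper, not an SDK feature: `run_with_retries` is a hypothetical helper, `generate` stands for any function of yours that starts a generation and returns its result, and the attempt count and wait times are arbitrary assumptions.

```python
import time


def run_with_retries(generate, max_attempts=3, wait_seconds=30.0, sleep=time.sleep):
    """Call generate() until it succeeds or max_attempts is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return generate()
        except Exception as error:  # in real code, catch the SDK's specific error type
            last_error = error
            if attempt < max_attempts:
                sleep(wait_seconds * attempt)  # wait a little longer after each failure
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Example with a stand-in function that fails twice before succeeding.
attempts = []


def flaky_generate():
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("resources exhausted")
    return "video0.mp4"


print(run_with_retries(flaky_generate, sleep=lambda seconds: None))  # video0.mp4
```

The `sleep` parameter is injectable only so the example runs instantly; in normal use the default `time.sleep` applies.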

The generated video will be stored on the server for 2 days and will be removed afterwards. If you want to save a local copy of the generated video, you must run result() and save() within 2 days of generation.
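
To keep track of that window, a sketch like the one below computes the latest moment a video can still be downloaded. It assumes the 2-day period starts when the video is generated; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=2)  # storage period stated above


def download_deadline(created_at: datetime) -> datetime:
    """Return the moment after which the video may no longer be on the server."""
    return created_at + RETENTION


def is_still_available(created_at: datetime, now: datetime = None) -> bool:
    """Return True while the retention window is still open."""
    now = now or datetime.now(timezone.utc)
    return now < download_deadline(created_at)


created = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(download_deadline(created).isoformat())  # 2025-01-03T12:00:00+00:00
```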

2. Generate based on an image

You can also generate videos from images. The following code uses Imagen to generate an image, then uses that image as the starting frame of the generated video.

First, use Imagen to generate the image:

Python

# Reuses `client` and `types` from the text-to-video example above
prompt = "Panning wide shot of a calico kitten sleeping in the sunshine"

imagen = client.models.generate_images(
    model="imagen-3.0-generate-002",
    prompt=prompt,
    config=types.GenerateImagesConfig(
        aspect_ratio="16:9",
        number_of_images=1,
    ),
)

imagen.generated_images[0].image  # the image to pass to Veo

JavaScript

import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY" });
const response = await ai.models.generateImages({
 model: "imagen-3.0-generate-002",
 prompt: "Panning wide shot of a calico kitten sleeping in the sunshine",
 config: {
   numberOfImages: 1,
 },
});
// you'll pass response.generatedImages[0].image.imageBytes to Veo

Then, use the generated image as the first frame to generate the video:

Python

# Reuses `time`, `client`, `types`, `prompt`, and `imagen` from the examples above
operation = client.models.generate_videos(
    model="veo-2.0-generate-001",
    prompt=prompt,
    image=imagen.generated_images[0].image,
    config=types.GenerateVideosConfig(
        # person_generation only accepts "dont_allow" for image-to-video
        aspect_ratio="16:9",  # "16:9" or "9:16"
        number_of_videos=2,
    ),
)

# Wait for the videos to be generated
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

# Download and save each generated video
for n, video in enumerate(operation.response.generated_videos):
    fname = f"with_image_input{n}.mp4"
    print(fname)
    client.files.download(file=video.video)
    video.video.save(fname)

JavaScript

import { GoogleGenAI } from "@google/genai";
import { createWriteStream } from "fs";
import { Readable } from "stream";
const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY" });
async function main() {
 // get image bytes from Imagen, as shown above
 let operation = await ai.models.generateVideos({
   model: "veo-2.0-generate-001",
   prompt: "Panning wide shot of a calico kitten sleeping in the sunshine",
   image: {
     imageBytes: response.generatedImages[0].image.imageBytes, // response from Imagen
     mimeType: "image/png",
   },
   config: {
     aspectRatio: "16:9",
     numberOfVideos: 2,
   },
 });
 while (!operation.done) {
   await new Promise((resolve) => setTimeout(resolve, 10000));
   operation = await ai.operations.getVideosOperation({
     operation: operation,
   });
 }
 operation.response?.generatedVideos?.forEach(async (generatedVideo, n) => {
   const resp = await fetch(
     `${generatedVideo.video?.uri}&key=GOOGLE_API_KEY`, // append your API key
   );
   const writer = createWriteStream(`video${n}.mp4`);
   Readable.fromWeb(resp.body).pipe(writer);
 });
}
main();
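
The examples above poll in an open-ended loop. Since requests can take up to 6 minutes at peak times, it may be worth bounding the wait. The sketch below is one way to do that; `wait_until_done` is a hypothetical helper, and the 600-second timeout is an assumption chosen to sit above the documented maximum latency.

```python
import time


def wait_until_done(operation, refresh, poll_seconds=20.0, timeout_seconds=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until operation.done is true; raise TimeoutError if it takes too long.

    `refresh` is any callable that takes the operation and returns its latest state.
    """
    deadline = clock() + timeout_seconds
    while not operation.done:
        if clock() >= deadline:
            raise TimeoutError("video generation did not finish in time")
        sleep(poll_seconds)
        operation = refresh(operation)
    return operation


# Stand-in operation used only to demonstrate the helper without calling the API.
class FakeOperation:
    def __init__(self, polls_left):
        self.polls_left = polls_left

    @property
    def done(self):
        return self.polls_left <= 0


finished = wait_until_done(
    FakeOperation(3),
    refresh=lambda op: FakeOperation(op.polls_left - 1),
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
print(finished.done)  # True
```

With the Python client from the examples above, the call would presumably look like `operation = wait_until_done(operation, client.operations.get)`.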

3. Veo model parameters

(Naming conventions vary by programming language.)

1. prompt : The text prompt for the video. When present, the image parameter is optional.

2. image : The image to use as the first frame of the video. When present, the prompt parameter is optional.

3. negativePrompt : A text string describing anything you want to discourage the model from generating.

4. aspectRatio : Changes the aspect ratio of the generated video. Supported values are "16:9" and "9:16". The default is "16:9".

5. personGeneration : Controls whether the model may generate videos of people. The following values are supported:

Text-to-video generation:

"dont_allow" : Do not allow people or faces to be included.

"allow_adult" : Generate videos that include adults, but not children.

Image-to-video generation:

"dont_allow" : The default and only value for image-to-video generation.

6. numberOfVideos : The number of output videos requested, either 1 or 2.

7. durationSeconds : The length of each output video in seconds, between 5 and 8.

8. enhance_prompt : Enables or disables the prompt rewriter. Enabled by default.
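
The limits above can be checked before a request is sent. The following sketch validates the values locally and returns them as a plain dictionary; `build_video_config` is a hypothetical helper, not an SDK function, and the keys use the snake_case spelling seen in the Python examples.

```python
ASPECT_RATIOS = ("16:9", "9:16")


def build_video_config(*, image_to_video=False, aspect_ratio="16:9",
                       person_generation="dont_allow", number_of_videos=None,
                       duration_seconds=None, negative_prompt=None,
                       enhance_prompt=None):
    """Validate values against the documented limits and return them as a dict."""
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}")
    # Image-to-video only supports "dont_allow".
    allowed = ("dont_allow",) if image_to_video else ("dont_allow", "allow_adult")
    if person_generation not in allowed:
        raise ValueError(f"person_generation must be one of {allowed}")
    if number_of_videos is not None and number_of_videos not in (1, 2):
        raise ValueError("number_of_videos must be 1 or 2")
    if duration_seconds is not None and not 5 <= duration_seconds <= 8:
        raise ValueError("duration_seconds must be between 5 and 8")

    config = {"aspect_ratio": aspect_ratio, "person_generation": person_generation}
    optional = {
        "number_of_videos": number_of_videos,
        "duration_seconds": duration_seconds,
        "negative_prompt": negative_prompt,
        "enhance_prompt": enhance_prompt,
    }
    # Leave out anything the caller did not set so the service defaults apply.
    config.update({key: value for key, value in optional.items() if value is not None})
    return config


print(build_video_config(number_of_videos=2, duration_seconds=8))
```

Assuming the SDK's config class accepts these field names, the result could be passed on as `types.GenerateVideosConfig(**config)`.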

4. Things to try

To get the most out of Veo, include video-specific terminology in your prompts. Veo understands a wide range of terms related to:

1. Shot composition : Specify the framing and the number of subjects in the shot (for example, "single shot", "two shot", "over-the-shoulder shot").

2. Camera positioning and movement : Use terms such as "eye level", "high angle", "worm's-eye view", "dolly shot", "zoom shot", "pan shot", and "tracking shot" to control the position and movement of the camera.

3. Focus and lens effects : Use terms such as "shallow depth of field", "deep depth of field", "soft focus", "macro lens", and "wide-angle lens" to achieve specific visual effects.

4. Overall style and subject : Guide Veo's creative direction by specifying styles such as "science fiction", "romantic comedy", "action movie", or "animation". You can also describe the subjects and backgrounds you want, such as "cityscape", "nature", "vehicles", or "animals".

5. Veo prompt guide

This section of the Veo guide contains examples of videos you can create with Veo and shows how to modify prompts to produce different results.

1. Safety filters

Veo applies safety filters across Gemini to help ensure that generated videos and uploaded photos do not contain offensive content. Prompts that violate our terms and guidelines are blocked.

2. Prompt writing basics

Good prompts are descriptive and clear. To get the generated video as close as possible to what you want, start by identifying your core idea, then refine it by adding keywords and modifiers.

Your prompt should include the following elements:

1. Subject : The object, person, animal, or scenery that you want in the video.

2. Context : The background or setting in which the subject is placed.

3. Action : What the subject is doing (for example, walking, running, or turning their head).

4. Style : This can be general or very specific. Consider using specific film style keywords, such as horror film, film noir, or cartoon style.

5. Camera movement : [Optional] What the camera is doing, such as an aerial view, eye level, a top-down shot, or a low-angle shot.

6. Composition : [Optional] How the shot is framed, such as a wide shot, a close-up, or an extreme close-up.

7. Ambiance : [Optional] How color and light contribute to the scene, such as blue tones, night, or warm tones.
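
One way to keep these elements explicit is to hold them in a small structure and assemble the prompt from it. The sketch below is only an illustration: the `PromptElements` class, its field names, and the comma-separated ordering are assumptions, not a format that Veo requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PromptElements:
    subject: str
    context: str
    action: str
    style: str
    camera_motion: Optional[str] = None  # optional elements may be left out
    composition: Optional[str] = None
    ambiance: Optional[str] = None

    def to_prompt(self) -> str:
        """Join the elements into one comma-separated prompt string."""
        scene = " ".join([self.subject, self.action, self.context])
        parts = [self.composition, self.camera_motion, scene, self.style, self.ambiance]
        return ", ".join(part.strip() for part in parts if part)


elements = PromptElements(
    subject="melting icicles",
    context="on a frozen rock wall",
    action="dripping water",
    style="realistic",
    camera_motion="slow zoom",
    composition="close-up",
    ambiance="cool blue tones",
)
print(elements.to_prompt())
# close-up, slow zoom, melting icicles dripping water on a frozen rock wall, realistic, cool blue tones
```

Keeping the elements separate makes it easy to vary one of them, for example the ambiance, while holding the rest of the prompt fixed.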

More tips for writing prompts

The following tips can help you write prompts for generating videos:

1. Use descriptive language : Use adjectives and adverbs to paint a clear picture for Veo.

2. Provide context : Add background information where needed to help the model understand what you want.

3. Reference specific artistic styles : If you have a particular aesthetic in mind, reference specific artistic styles or art movements.

4. Use prompt engineering tools : Consider exploring prompt engineering tools or resources to help you refine your prompts and achieve the best results.

5. Enhance facial details in personal and group images : Specify facial details as the focus of the photo, for example by using the word portrait in the prompt.

3. Prompt and output examples

This section presents several prompts, highlighting how descriptive details can improve the result of each video.

Icicles

This video demonstrates how to use the elements from the prompt writing basics in your prompt.

Prompt: Close-up (composition) of melting icicles (subject) on a frozen rock wall (context), with cool tones (ambiance), zoomed in (camera movement) while maintaining close-up detail of the water drips (action).
Generated output: Dripping icicles on a blue background.

Man on the phone

These videos demonstrate how to revise your prompt with increasingly specific details so that Veo refines the output to your liking.

Prompt: Dolly shot showing a close-up of a desperate man in a green trench coat. He is making a call on a rotary wall phone lit by a green neon light. It looks like a movie scene.
Generated output: A man is on the phone.
Analysis: This is the first video generated from the prompt.

Prompt: A cinematic close-up follows a desperate man in a weathered green trench coat as he dials a rotary phone mounted on a rough brick wall, bathed in the eerie glow of a green neon sign. The camera pushes in, revealing the tension in his jaw and the desperation etched on his face as he struggles to make the call. The shallow depth of field focuses on his furrowed brow and the black rotary phone, blurring the background into a sea of neon colors and indistinct shadows, creating a sense of urgency and isolation.
Generated output: A man is on the phone.
Analysis: The more detailed the prompt, the more focused the video and the richer the environment.

Prompt: A video with smooth motion that dollies in on a desperate man in a green trench coat using a vintage rotary phone against a wall bathed in an eerie green neon glow. The camera starts at a medium distance and slowly moves closer to the man's face, revealing his frantic expression and the sweat on his brow as he urgently dials. The focus is on the man's hand, his fingers fumbling with the dial as he desperately tries to connect. The green neon light casts long shadows on the wall, adding to the tension. The scene is framed to emphasize the man's isolation and desperation, highlighting the stark contrast between the vivid glow of the neon and the man's grim determination.
Generated output: A man is on the phone.
Analysis: Adding more detail gives the subject a realistic expression and creates an intense, vivid scene.

Snow leopard

This example demonstrates the output that Veo might generate for a simple prompt.

Prompt: A cute creature with snow-leopard-like fur walking through a winter forest, 3D cartoon style render.
Generated output: The snow leopard is listless.

Running snow leopard

This prompt is more detailed and demonstrates generated output that may be closer to what you want in your video.

Prompt: Create a short 3D animated scene in a cheerful cartoon style. A cute creature with snow-leopard-like fur, large expressive eyes, and a round, friendly body bounds happily through a whimsical winter forest. The scene should feature rounded, snow-covered trees, gently falling snowflakes, and warm sunlight filtering through the branches. The creature's bouncy movements and wide smile should convey pure delight. Use bright, cheerful colors and playful animation to create an upbeat, heartwarming atmosphere.
Generated output: The snow leopard is running faster.

4. Examples by writing element

The following examples show how to refine your prompt using each basic element.

Subject

The following example shows how to specify a subject description.

Subject description: The description can include one subject, or multiple subjects and actions. Here, the subject is "white concrete apartment building".
Prompt: An architectural rendering of a white concrete apartment building with flowing organic shapes that blend seamlessly with lush greenery and futuristic elements.
Generated output: Placeholder.

Context

The following example shows how to specify a context.

Context: The background or setting in which the subject is placed is very important. Try placing your subject in a variety of contexts, such as a busy street or outer space.
Prompt: A satellite floating in outer space, with the moon and some stars in the background.
Generated output: A satellite floating in the Earth's atmosphere.

Action

This example shows how to specify an action.

Action: What the subject is doing, such as walking, running, or turning their head.
Prompt: A wide shot of a woman walking along the beach, looking content and relaxed as she gazes at the horizon at sunset.
Generated output: The sunset is absolutely beautiful.

Style

This example shows how to specify a style.

Style: You can add keywords to improve the quality of the generated output and bring it closer to the intended style, such as shallow depth of field, cinematic, minimalist, surreal, vintage, futuristic, or double exposure.
Prompt: Film noir style, a man and a woman walk down the street, mysterious, cinematic, black and white.
Generated output: The film noir style is very beautiful.

Camera movement

This example shows how to specify camera movement.

Camera movement: Options for camera movement include a first-person (POV) shot, an aerial view, a tracking drone view, or a tracking shot.
Prompt: A POV shot from a vintage car driving in the rain, Canada at night, cinematic.
Generated output: The sunset is so beautiful.

Composition

This example shows how to specify composition.

Composition: How the shot is framed (wide shot, close-up, low angle).
Prompt: An extreme close-up of an eye with a city reflected in it.
Generated output: The sunset is so beautiful.

Prompt: Create a video of a wide shot of a surfer walking along the beach with a surfboard, beautiful sunset, cinematic.
Generated output: The sunset is so beautiful.

Ambiance

This example shows how to specify ambiance.

Ambiance: Color palettes play a vital role in photography, influencing the mood and conveying the intended emotions. Try terms such as "muted warm orange tones", "natural light", "sunrise", or "sunset". For example, a warm golden palette can lend a romantic, atmospheric feel to a photo.
Prompt: A close-up of a girl holding an adorable golden retriever puppy in a sunny park.
Generated output: A little girl holding a puppy in her arms.

Prompt: A cinematic close-up of a sad woman riding a bus in the rain, cool blue tones, sad mood.
Generated output: A woman riding a bus looks sad.

5. Use reference images to generate videos

You can use Veo's image-to-video capability to bring images to life. You can use existing assets or try generating new ones with Imagen.

Prompt: Bunny holding a chocolate bar.
Generated output: (image)

Prompt: The bunny runs away.
Generated output: The bunny is running away.
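
When you start from an existing asset instead of an Imagen result, the image file has to be read and labeled with its MIME type first. The sketch below shows one way to prepare that data. The helper name `load_image_payload` is hypothetical, the key names mirror the `imageBytes` and `mimeType` fields in the JavaScript example earlier, and base64 encoding is an assumption that you should check against the client you use.

```python
import base64
import mimetypes
from pathlib import Path


def load_image_payload(path: str) -> dict:
    """Read an image file and return its contents with a guessed MIME type."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"{path} does not look like an image file")
    data = Path(path).read_bytes()
    return {
        # Assumption: the client expects base64 text rather than raw bytes.
        "imageBytes": base64.b64encode(data).decode("ascii"),
        "mimeType": mime_type,
    }


# Usage (the file name is a placeholder):
# payload = load_image_payload("bunny.png")
```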

6. Negative prompts

Negative prompts are a powerful tool for specifying elements that you do not want in your video. After the words "Negative prompt", describe what you want to discourage the model from generating. Follow the guidance below.

❌ Do not use instructive language or words such as "no" or "don't". For example, "no walls" or "don't show walls".

✅ Describe what you do not want to see. For example, "wall, frame" means that you do not want walls or frames in the video.

Prompt: Generate a short, stylized animation of a solitary large oak tree whose leaves sway vigorously in a strong wind. The tree should have a slightly exaggerated, whimsical form with dynamic, flowing branches. The leaves should show a variety of autumn colors, swirling and dancing in the wind. The animation should use a warm, inviting color palette.
Generated output: Tree, prompt only.

Prompt: Generate a short, stylized animation of a solitary large oak tree whose leaves sway vigorously in a strong wind. The tree should have a slightly exaggerated, whimsical form with dynamic, flowing branches. The leaves should show a variety of autumn colors, swirling and dancing in the wind. The animation should use a warm, inviting color palette.
With negative prompt: urban background, man-made structures, dark, stormy, or threatening atmosphere.
Generated output: Tree, with negative prompt.
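
The guidance above can be applied mechanically before a request is sent. The following sketch strips instructive lead-ins such as "no" or "without" from a list of terms and joins what remains; `build_negative_prompt` and the list of lead-in words are assumptions for illustration only.

```python
import re

# Lead-in words that turn a term into an instruction, e.g. "no walls".
_INSTRUCTIVE = re.compile(r"^(no|not|don't|do not|without|never)\s+", re.IGNORECASE)


def build_negative_prompt(terms):
    """Return a comma-separated negative prompt made of plain descriptions."""
    cleaned = []
    for term in terms:
        term = _INSTRUCTIVE.sub("", term.strip())
        already_listed = term.lower() in (item.lower() for item in cleaned)
        if term and not already_listed:
            cleaned.append(term)
    return ", ".join(cleaned)


print(build_negative_prompt(["no walls", "frame", "without darkness"]))
# walls, frame, darkness
```

The resulting string is what you would supply as the negativePrompt parameter.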

7. Aspect ratios

Veo video generation in Gemini supports the following two aspect ratios:

Aspect ratio: Widescreen or 16:9
Description: The most common aspect ratio for televisions, monitors, and mobile phone screens (landscape). Use this when you want to capture more of the background, as in scenic landscapes.

Aspect ratio: Portrait or 9:16
Description: Rotated widescreen. This aspect ratio has been popularized by short-form video applications such as YouTube Shorts. Use it for portraits or tall objects with a strong vertical orientation, such as buildings, trees, or waterfalls.
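
As a small illustration of the choice between the two ratios, the sketch below picks one based on whether the subject is wider or taller and computes the corresponding frame size. Both helper names are hypothetical, and the frame size assumes that "720p" refers to the shorter side of the frame.

```python
def choose_aspect_ratio(subject_width: float, subject_height: float) -> str:
    """Pick "9:16" for tall subjects and "16:9" (the default) otherwise."""
    return "9:16" if subject_height > subject_width else "16:9"


def frame_size(aspect_ratio: str, short_side: int = 720) -> tuple:
    """Return (width, height) in pixels for a supported aspect ratio."""
    if aspect_ratio not in ("16:9", "9:16"):
        raise ValueError('aspect_ratio must be "16:9" or "9:16"')
    long_side = short_side * 16 // 9
    if aspect_ratio == "16:9":
        return (long_side, short_side)
    return (short_side, long_side)


ratio = choose_aspect_ratio(subject_width=1, subject_height=3)  # e.g. a waterfall
print(ratio, frame_size(ratio))  # 9:16 (720, 1280)
```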

Widescreen

This prompt is an example of the widescreen 16:9 aspect ratio.

Prompt: Create a video with a tracking drone view of a man driving a red convertible in Palm Springs in the 1970s, with warm sunlight and long shadows.
Generated output: The waterfall is very beautiful.

Portrait

This prompt is an example of the portrait 9:16 aspect ratio.

Prompt: Create a video that highlights the smooth motion of a majestic Hawaiian waterfall in a lush rainforest. Focus on realistic water flow, detailed foliage, and natural lighting to convey tranquility. Capture the rushing water, the misty atmosphere, and the dappled sunlight filtering through the dense canopy. Use smooth, cinematic camera movements to showcase the waterfall and its surroundings. Aim for a peaceful, realistic tone that transports the viewer to the serene beauty of the Hawaiian rainforest.
Generated output: The waterfall is very beautiful.

To gain more experience generating AI videos, try the Veo Colab.