The Gemini API gives you access to Veo 2, Google's advanced video generation model. Veo is designed to help you build a new generation of AI applications that turn user prompts and images into high-quality video assets.
Note: Veo is a paid feature and does not run on the free tier. For more information, see the Pricing page.
Veo is Google's most capable video generation model to date. It generates videos in a wide range of cinematic and visual styles, capturing the nuance of a prompt and rendering intricate details consistently across frames.
Specification | Value |
---|---|
Modalities | Text-to-video generation; image-to-video generation |
Request latency | Minimum: 11 seconds; maximum: 6 minutes (during peak hours) |
Variable length generation | 5-8 seconds |
Resolution | 720p |
Frame rate | 24 frames per second |
Aspect ratio | 16:9 (landscape); 9:16 (portrait) |
Input languages (text-to-video) | English |
Note: For more information on Veo usage limits, see the Models, Pricing, and Rate limits pages.
Videos created by Veo are watermarked using SynthID, our tool for watermarking and identifying AI-generated content. They are also passed through safety filters and memorization checking processes that help mitigate privacy, copyright, and bias risks.
Before calling the Gemini API, make sure that you have installed your SDK of choice and have a Gemini API key configured and ready to use.
To use Veo with the Google Gen AI SDKs, make sure you have one of the following versions installed:
1. Python v1.10.0 or later
2. TypeScript and JavaScript v0.8.0 or later
3. Go v1.0.0 or later
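The minimum versions above can be checked before any request is made. The following is a minimal sketch for the Python SDK only; it assumes the SDK is distributed under the package name `google-genai`, and the helper names (`parse_version`, `meets_minimum`, `installed_sdk_version`) are hypothetical, not part of the SDK.

```python
from importlib import metadata

MIN_PYTHON_SDK = (1, 10, 0)  # minimum Python SDK version listed above


def parse_version(text):
    """Turn '1.10.0' into (1, 10, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        leading = ""
        for ch in piece:
            if not ch.isdigit():
                break
            leading += ch
        parts.append(int(leading) if leading else 0)
    while len(parts) < 3:  # pad '1.10' to (1, 10, 0)
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_PYTHON_SDK):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


def installed_sdk_version(package="google-genai"):
    """Return the installed version string, or None if the package is absent.

    The package name is an assumption; adjust it if your install differs.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

A typical use is to call `installed_sdk_version()` at startup and stop with a clear message when it returns `None` or when `meets_minimum(...)` is false.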
This section provides code examples for using text prompts and using images to generate videos.
You can generate videos through Veo using the following code:
```python
import time

from google import genai
from google.genai import types

client = genai.Client()  # read API key from GOOGLE_API_KEY

operation = client.models.generate_videos(
    model="veo-2.0-generate-001",
    prompt="Panning wide shot of a calico kitten sleeping in the sunshine",
    config=types.GenerateVideosConfig(
        person_generation="dont_allow",  # "dont_allow" or "allow_adult"
        aspect_ratio="16:9",  # "16:9" or "9:16"
    ),
)

# Poll the long-running operation until the videos are ready
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

for n, generated_video in enumerate(operation.response.generated_videos):
    client.files.download(file=generated_video.video)
    generated_video.video.save(f"video{n}.mp4")  # save the video
```
```javascript
import { GoogleGenAI } from "@google/genai";
import { createWriteStream } from "fs";
import { Readable } from "stream";

const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY" });

async function main() {
  let operation = await ai.models.generateVideos({
    model: "veo-2.0-generate-001",
    prompt: "Panning wide shot of a calico kitten sleeping in the sunshine",
    config: {
      personGeneration: "dont_allow",
      aspectRatio: "16:9",
    },
  });

  // Poll the long-running operation until the videos are ready
  while (!operation.done) {
    await new Promise((resolve) => setTimeout(resolve, 10000));
    operation = await ai.operations.getVideosOperation({
      operation: operation,
    });
  }

  operation.response?.generatedVideos?.forEach(async (generatedVideo, n) => {
    const resp = await fetch(`${generatedVideo.video?.uri}&key=GOOGLE_API_KEY`); // append your API key
    const writer = createWriteStream(`video${n}.mp4`);
    Readable.fromWeb(resp.body).pipe(writer);
  });
}

main();
```
```bash
# Use curl to send a POST request to the predictLongRunning endpoint.
# The request body includes the prompt for video generation.
# BASE_URL and GOOGLE_API_KEY must already be set in your shell.
curl "${BASE_URL}/models/veo-2.0-generate-001:predictLongRunning?key=${GOOGLE_API_KEY}" \
  -H "Content-Type: application/json" \
  -X "POST" \
  -d '{
    "instances": [{
        "prompt": "Panning wide shot of a calico kitten sleeping in the sunshine"
      }
    ],
    "parameters": {
      "aspectRatio": "16:9",
      "personGeneration": "dont_allow"
    }
  }' | tee result.json | jq .name | sed 's/"//g' > op_name
```
This code takes about 2-3 minutes to run, though it may take longer if resources are constrained. When the run completes, you should see the generated video.
If you see an error message instead of a video, resources are constrained and your request could not be completed. In that case, run the code again.
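Because generation can take several minutes and may fail when capacity is short, it can help to bound the polling loop and to automate the "run it again" step. The sketch below is one possible shape, not the SDK's own mechanism: `wait_until_done` and `run_with_retries` are hypothetical helpers, and the interval, timeout, and backoff values are illustrative assumptions.

```python
import time


def wait_until_done(refresh, is_done, state, interval=20.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until is_done(state) is true or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while not is_done(state):
        if clock() >= deadline:
            raise TimeoutError("operation did not finish in time")
        sleep(interval)
        state = refresh(state)  # fetch the latest status
    return state


def run_with_retries(task, attempts=3, backoff=30.0, sleep=time.sleep):
    """Call task() until it succeeds, waiting longer after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            if attempt < attempts:
                sleep(backoff * attempt)
    raise last_error


# With the Python example above, the polling loop could be written as:
#   operation = wait_until_done(
#       client.operations.get, lambda op: op.done, operation)
```

Passing `sleep` and `clock` as parameters keeps the helpers testable without real waiting; in production code the defaults are used.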
Generated videos are stored on the server for 2 days, after which they are removed. If you want to keep a local copy of a generated video, you must run result() and save() within 2 days of generation.
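If downloads happen in a separate job, the 2-day window can be tracked explicitly. This is a small sketch that assumes your application records the generation time itself; the function names are hypothetical, and the exact server-side cutoff may differ slightly from this calculation.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=2)  # server-side retention stated above


def download_deadline(generated_at):
    """Latest moment at which the video should still be downloadable."""
    return generated_at + RETENTION


def is_still_available(generated_at, now=None):
    """Return True while the retention window has not yet elapsed."""
    now = now or datetime.now(timezone.utc)
    return now < download_deadline(generated_at)
```

Using timezone-aware UTC timestamps avoids off-by-hours errors when the job that generates the video and the job that downloads it run in different regions.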
You can also generate videos from images. The following code generates an image with Imagen, then uses that image as the starting frame of the generated video.
First, use Imagen to generate the image:
```python
# Reuses `client` and `types` from the text-to-video example above.
prompt = "Panning wide shot of a calico kitten sleeping in the sunshine"

imagen = client.models.generate_images(
    model="imagen-3.0-generate-002",
    prompt=prompt,
    config=types.GenerateImagesConfig(
        aspect_ratio="16:9",
        number_of_images=1,
    ),
)

imagen.generated_images[0].image
```
```javascript
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY" });

const response = await ai.models.generateImages({
  model: "imagen-3.0-generate-002",
  prompt: "Panning wide shot of a calico kitten sleeping in the sunshine",
  config: {
    numberOfImages: 1,
  },
});

// you'll pass response.generatedImages[0].image.imageBytes to Veo
```
Then, use the generated image as the first frame to generate the video:
```python
operation = client.models.generate_videos(
    model="veo-2.0-generate-001",
    prompt=prompt,
    image=imagen.generated_images[0].image,
    config=types.GenerateVideosConfig(
        # person_generation only accepts "dont_allow" for image-to-video
        aspect_ratio="16:9",  # "16:9" or "9:16"
        number_of_videos=2,
    ),
)

# Wait for videos to generate
while not operation.done:
    time.sleep(20)
    operation = client.operations.get(operation)

for n, video in enumerate(operation.response.generated_videos):
    fname = f"with_image_input{n}.mp4"
    print(fname)
    client.files.download(file=video.video)
    video.video.save(fname)
```
```javascript
import { GoogleGenAI } from "@google/genai";
import { createWriteStream } from "fs";
import { Readable } from "stream";

const ai = new GoogleGenAI({ apiKey: "GOOGLE_API_KEY" });

async function main() {
  // get image bytes from Imagen, as shown above

  let operation = await ai.models.generateVideos({
    model: "veo-2.0-generate-001",
    prompt: "Panning wide shot of a calico kitten sleeping in the sunshine",
    image: {
      imageBytes: response.generatedImages[0].image.imageBytes, // response from Imagen
      mimeType: "image/png",
    },
    config: {
      aspectRatio: "16:9",
      numberOfVideos: 2,
    },
  });

  while (!operation.done) {
    await new Promise((resolve) => setTimeout(resolve, 10000));
    operation = await ai.operations.getVideosOperation({
      operation: operation,
    });
  }

  operation.response?.generatedVideos?.forEach(async (generatedVideo, n) => {
    const resp = await fetch(
      `${generatedVideo.video?.uri}&key=GOOGLE_API_KEY`, // append your API key
    );
    const writer = createWriteStream(`video${n}.mp4`);
    Readable.fromWeb(resp.body).pipe(writer);
  });
}

main();
```
(Naming conventions vary by programming language.)
1. prompt: The text prompt for the video. Optional when an image is provided.
2. image: The image to use as the first frame of the video. Optional when a prompt is provided.
3. negativePrompt: A text string describing anything you want to discourage the model from generating.
4. aspectRatio: Changes the aspect ratio of the generated video. Supported values are "16:9" and "9:16". The default is "16:9".
5. personGeneration: Controls whether the model may generate videos of people. The following values are supported:
   - Text-to-video generation:
     - "dont_allow": Don't allow people or faces to be included.
     - "allow_adult": Generate videos that include adults, but not children.
   - Image-to-video generation:
     - "dont_allow": The default and only value for image-to-video generation.
6. numberOfVideos: The number of output videos requested, either 1 or 2.
7. durationSeconds: The length of each output video in seconds, between 5 and 8.
8. enhance_prompt: Enables or disables the prompt rewriter. Enabled by default.
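The constraints in the list above can be checked locally before a paid request is sent. The following is a minimal sketch using a plain dictionary with the Python-style key names from the earlier examples; `validate_video_config` is a hypothetical helper, and the `duration_seconds` key is assumed to be the Python spelling of durationSeconds.

```python
ASPECT_RATIOS = {"16:9", "9:16"}
PERSON_GENERATION = {
    "text": {"dont_allow", "allow_adult"},  # text-to-video
    "image": {"dont_allow"},                # image-to-video
}


def validate_video_config(config, has_image=False):
    """Return a list of problems found; an empty list means the config looks valid."""
    problems = []

    ratio = config.get("aspect_ratio", "16:9")  # "16:9" is the documented default
    if ratio not in ASPECT_RATIOS:
        problems.append(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")

    mode = "image" if has_image else "text"
    person = config.get("person_generation")
    if person is not None and person not in PERSON_GENERATION[mode]:
        problems.append(f"person_generation {person!r} is not allowed for {mode}-to-video")

    count = config.get("number_of_videos")
    if count is not None and count not in (1, 2):
        problems.append("number_of_videos must be 1 or 2")

    duration = config.get("duration_seconds")
    if duration is not None and not 5 <= duration <= 8:
        problems.append("duration_seconds must be between 5 and 8")

    return problems
```

Returning a list of messages rather than raising on the first issue lets a caller report every problem in one pass.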
To get the most out of Veo, include video-specific terminology in your prompts. Veo understands a wide range of terms related to:
1. Shot composition: Specify the framing and the number of subjects in the shot (for example, "single shot", "two shot", "over-the-shoulder shot").
2. Camera positioning and movement: Control the camera's position and movement with terms such as "eye level", "high angle", "worm's-eye view", "dolly shot", "zoom shot", "pan shot", and "tracking shot".
3. Focus and lens effects: Use terms such as "shallow depth of field", "deep depth of field", "soft focus", "macro lens", and "wide-angle lens" to achieve specific visual effects.
4. Overall style and subject: Guide Veo's creative direction by specifying styles such as "sci-fi", "romantic comedy", "action movie", or "animation". You can also describe the subjects and backgrounds you want, such as "cityscape", "nature", "vehicles", or "animals".
This section of the Veo Guide contains examples of videos that you can create with Veo and explains how to modify the prompts to generate different results.
Veo applies safety filters across Gemini to help ensure that generated videos and uploaded photos do not contain offensive content. Prompts that violate our terms and guidelines are blocked.
Good prompts are descriptive and clear. To get the generated video as close as possible to what you want, start by identifying your core idea, then refine it by adding keywords and modifiers.
Your prompt should include the following elements:
1. Subject: The object, person, animal, or scenery that you want in your video.
2. Context: The background or environment in which the subject is placed.
3. Action: What the subject is doing (for example, walking, running, or turning their head).
4. Style: This can be general or very specific. Consider using specific film style keywords, such as horror film, film noir, or cartoon style.
5. Camera motion: [Optional] What the camera is doing, such as aerial view, eye level, top-down shot, or low-angle shot.
6. Composition: [Optional] How the shot is framed, such as wide shot, close-up, or extreme close-up.
7. Ambiance: [Optional] How color and light contribute to the scene, such as blue tones, night, or warm tones.
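The elements listed above can be assembled programmatically when prompts are built from structured input. The sketch below is one possible arrangement, not a format Veo requires: the `PromptElements` class, its field names, and the ordering of the pieces are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class PromptElements:
    subject: str             # what appears in the video
    context: str             # where the subject is placed
    action: str              # what the subject is doing
    style: str               # general or specific visual style
    camera_motion: str = ""  # optional
    composition: str = ""    # optional
    ambiance: str = ""       # optional


def build_prompt(e):
    """Join the elements into a single descriptive sentence."""
    opening = f"{e.composition} of {e.subject}" if e.composition else e.subject
    sentence = f"{opening} {e.action} {e.context}".strip()
    extras = [part for part in (e.style, e.camera_motion, e.ambiance) if part]
    text = ", ".join([sentence] + extras)
    return text[0].upper() + text[1:] + "."


example = PromptElements(
    subject="a woman",
    context="on the beach at sunset",
    action="walking",
    style="cinematic",
    composition="wide shot",
    ambiance="warm tones",
)
print(build_prompt(example))
# Wide shot of a woman walking on the beach at sunset, cinematic, warm tones.
```

Keeping the optional elements as empty strings by default means a prompt can start with only the four core elements and gain detail incrementally.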
The following tips can help you write prompts for generating videos:
1. Use descriptive language: Use adjectives and adverbs to paint a clear picture for Veo.
2. Provide context: Add background information where needed to help the model understand what you want.
3. Reference specific artistic styles: If you have a particular aesthetic in mind, reference specific art styles or art movements.
4. Use prompt engineering tools: Consider exploring prompt engineering tools or resources to help you refine your prompts and achieve the best results.
5. Enhance facial details in personal and group images: Specify facial details as the focus of the shot, for example by using the word portrait in the prompt.
This section presents several prompts, highlighting how descriptive details can improve the result of each video.
This video demonstrates how you can use the elements of prompt writing basics in your prompt.
Prompt | Generated output |
---|---|
Close-up shot (composition) of melting icicles (subject) on a frozen rock wall (context) with cool blue tones (ambiance), zoomed in (camera motion), maintaining close-up detail of the water drips (action). | ![]() |
These videos demonstrate how you can refine a prompt with increasingly specific details so that Veo produces output closer to what you want.
Prompt | Generated output | Analysis |
---|---|---|
Camera dolly shot, close-up of a desperate man in a green trench coat. He is making a call on a rotary-style wall phone lit by a green neon light. It looks like a movie scene. | ![]() | This is the first video generated from the prompt. |
A cinematic close-up follows a desperate man in a weathered green trench coat as he dials a rotary phone mounted on a gritty brick wall, bathed in the eerie glow of a green neon sign. The camera dollies in, revealing the tension in his jaw and the desperation etched on his face as he struggles to make the call. The shallow depth of field focuses on his furrowed brow and the black rotary phone, blurring the background into a sea of neon colors and indistinct shadows, creating a sense of urgency and isolation. | ![]() | A more detailed prompt results in a more focused video with a richer environment. |
A video with smooth motion that dollies in on a desperate man in a green trench coat, using a vintage rotary phone against a wall bathed in an eerie green neon glow. The camera starts from a medium distance and slowly moves closer to the man's face, revealing his frantic expression and the sweat on his brow as he urgently dials the phone. The focus is on the man's hands, his fingers fumbling with the dial as he desperately tries to connect. The green neon light casts long shadows on the wall, adding to the tense atmosphere. The scene is framed to emphasize the man's isolation and desperation, highlighting the stark contrast between the vibrant glow of the neon and the man's grim determination. | ![]() | Adding more detail gives the subject a realistic expression and creates an intense, vivid scene. |
This example demonstrates the output that Veo might generate for a simple prompt.
Prompt | Generated output |
---|---|
A cute creature with snow leopard-like fur walking through a winter forest, rendered in 3D cartoon style. | ![]() |
This prompt is more detailed and produces output that may be closer to what you want in your video.
Prompt | Generated output |
---|---|
Create a short 3D animated scene in a joyful cartoon style. A cute creature with snow leopard-like fur, large expressive eyes, and a friendly, rounded body prances happily through a whimsical winter forest. The scene should feature rounded, snow-covered trees, gently falling snowflakes, and warm sunlight filtering through the branches. The creature's bouncy movements and wide smile should convey pure delight. Use bright, cheerful colors and playful animation to create an upbeat, heartwarming atmosphere. | ![]() |
The following examples show how to refine your prompt by each basic element.
The following example shows how to specify a subject description.
Subject description | Prompt | Generated output |
---|---|---|
The description can include one subject, or multiple subjects and actions. Here, our subject is "white concrete apartment building". | An architectural rendering of a white concrete apartment building with flowing organic shapes, seamlessly blending with lush greenery and futuristic elements | ![]() |
The following example shows how to specify a context.
Context | Prompt | Generated output |
---|---|---|
The background or context in which the subject is placed is very important. Try placing your subject in a variety of contexts, such as a busy street or outer space. | A satellite floating through outer space with the moon and some stars in the background. | ![]() |
This example shows how to specify an action.
Action | Prompt | Generated output |
---|---|---|
What the subject is doing, such as walking, running, or turning their head. | A wide shot of a woman walking along the beach, looking content and relaxed as she gazes toward the horizon at sunset. | ![]() |
This example shows how to specify a style.
Style | Prompt | Generated output |
---|---|---|
You can add keywords to improve generation quality and steer the output closer to the intended style, such as shallow depth of field, movie still, minimalistic, surreal, vintage, futuristic, or double exposure. | Film noir style, a man and a woman walk on the street, mystery, cinematic, black and white. | ![]() |
This example shows how to specify camera movement.
Camera motion | Prompt | Generated output |
---|---|---|
Options for camera motion include POV shot, aerial view, tracking drone view, or tracking shot. | A POV shot from inside a vintage car driving in the rain, Canada at night, cinematic. | ![]() |
This example shows how to specify composition.
Composition | Prompt | Generated output |
---|---|---|
How the shot is framed (wide shot, close-up, low angle). | Extreme close-up of an eye with a city reflected in it. | ![]() |
| | Create a video of a wide shot of a surfer walking on a beach carrying a surfboard, beautiful sunset, cinematic. | ![]() |
This example shows how to specify ambiance.
Ambiance | Prompt | Generated output |
---|---|---|
Color palettes play a vital role in photography, influencing the mood and conveying the intended emotions. Try terms like "muted orange warm tones", "natural light", "sunrise", or "sunset". For example, a warm, golden palette can give a shot a romantic, atmospheric feel. | A close-up of a girl holding an adorable golden retriever puppy in a sunny park. | ![]() |
| | Cinematic close-up shot of a sad woman riding a bus in the rain, cool blue tones, sad mood. | ![]() |
You can bring images to life with Veo's image-to-video capability. You can use existing assets or try Imagen to generate something new.
Prompt | Generated output |
---|---|
A bunny holding a chocolate bar. | ![]() |
The bunny runs away. | ![]() |
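When starting from an existing asset rather than an Imagen result, the image has to be read from disk and paired with its MIME type. The sketch below mirrors the `imageBytes` / `mimeType` shape used in the JavaScript example earlier; `load_image_payload` is a hypothetical helper, and base64 encoding of the bytes is an assumption about what the request expects.

```python
import base64
import mimetypes
from pathlib import Path


def load_image_payload(path):
    """Read a local image and return it as {imageBytes, mimeType}."""
    file_path = Path(path)
    mime_type, _ = mimetypes.guess_type(file_path.name)  # based on the file extension
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"{file_path.name} does not look like an image file")
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return {"imageBytes": encoded, "mimeType": mime_type}
```

The MIME type is inferred from the file extension only, so a mislabeled file passes this check; inspect the file contents if the source is untrusted.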
Negative prompts are a powerful tool for specifying elements you don't want to appear in your video. After the phrase "Negative prompt", describe what you want to discourage the model from generating. Follow these tips:
❌ Don't use instructive language or words such as "no" or "don't". For example, "No walls" or "don't show walls".
✅ Do describe what you don't want to see. For example, "wall, frame" means that you don't want walls or frames to appear in the video.
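The two guidelines above lend themselves to a quick check before a request is sent. This is a rough sketch: the word list in `INSTRUCTIVE` is an illustrative assumption rather than an official list, and both function names are hypothetical.

```python
import re

# Words that read as instructions rather than descriptions (illustrative only)
INSTRUCTIVE = re.compile(r"\b(no|not|don't|do not|never|without)\b", re.IGNORECASE)


def instructive_words(negative_prompt):
    """Return any instructive words found, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in INSTRUCTIVE.finditer(negative_prompt)]


def to_negative_prompt(unwanted):
    """Join plain descriptions of unwanted elements into one negative prompt."""
    terms = [term.strip() for term in unwanted if term.strip()]
    return ", ".join(terms)


print(instructive_words("don't show walls"))   # ["don't"]
print(to_negative_prompt(["wall", "frame"]))   # wall, frame
```

A non-empty result from `instructive_words` is a signal to rephrase the negative prompt as a list of nouns or short descriptions.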
Prompt | Generated output |
---|---|
Generate a short, stylized animation of a large, solitary oak tree with leaves blowing vigorously in a strong wind. The tree should have a slightly exaggerated, whimsical form, with dynamic, flowing branches. The leaves should display a variety of autumn colors, swirling and dancing in the wind. The animation should use a warm, inviting color palette. | ![]() |
Generate a short, stylized animation of a large, solitary oak tree with leaves blowing vigorously in a strong wind. The tree should have a slightly exaggerated, whimsical form, with dynamic, flowing branches. The leaves should display a variety of autumn colors, swirling and dancing in the wind. The animation should use a warm, inviting color palette. With negative prompt - urban background, man-made structures, dark, stormy, or threatening atmosphere. | ![]() |
Gemini Veo video generation supports the following two aspect ratios:
Aspect ratio | Description |
---|---|
Widescreen or 16:9 | The most common aspect ratio for televisions, monitors, and mobile phone screens (landscape). Use this when you want to capture more of the background, as in scenic landscapes. |
Portrait or 9:16 | Rotated widescreen. This aspect ratio has been popularized by short-form video applications such as YouTube Shorts. Use it for portraits or tall subjects with a strong vertical orientation, such as buildings, trees, or waterfalls. |
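For image-to-video requests, the aspect ratio can be chosen from the dimensions of the source image. The sketch below treats landscape and square images as 16:9 and everything else as 9:16, which is an assumption rather than documented behavior; `pick_aspect_ratio` and `crop_box` are hypothetical helpers, and whether cropping beforehand is needed at all is not stated in this guide.

```python
def pick_aspect_ratio(width, height):
    """Choose the supported aspect ratio closest to the image orientation."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return "16:9" if width >= height else "9:16"


def crop_box(width, height):
    """Largest centered box with the chosen ratio, as (left, top, right, bottom)."""
    ratio_w, ratio_h = (int(n) for n in pick_aspect_ratio(width, height).split(":"))
    target_w = min(width, height * ratio_w // ratio_h)
    target_h = target_w * ratio_h // ratio_w
    left = (width - target_w) // 2
    top = (height - target_h) // 2
    return (left, top, left + target_w, top + target_h)


print(pick_aspect_ratio(1080, 1920))  # 9:16
print(crop_box(1000, 1000))           # (0, 219, 1000, 781)
```

The returned box uses the (left, top, right, bottom) convention common to image libraries, so it can be passed to a crop call in the imaging tool of your choice.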
This prompt is an example of a widescreen aspect ratio of 16:9.
Prompt | Generated output |
---|---|
Create a video with a tracking drone view of a man driving a red convertible in Palm Springs in the 1970s, with warm sunlight and long shadows. | ![]() |
This prompt is an example of the portrait aspect ratio of 9:16.
Prompt | Generated output |
---|---|
Create a video highlighting the smooth motion of a majestic Hawaiian waterfall within a lush rainforest. Focus on realistic water flow, detailed foliage, and natural lighting to convey tranquility. Capture the rushing water, the misty atmosphere, and the dappled sunlight filtering through the dense canopy. Use smooth, cinematic camera movements to showcase the waterfall and its surroundings. Aim for a peaceful, realistic tone that transports the viewer to the serene beauty of the Hawaiian rainforest. | ![]() |
To get more hands-on experience generating AI videos, try the Veo Colab.