The Gemini API supports image generation using Gemini 2.0 Flash Experimental and Imagen 3.
Before calling the Gemini API, make sure you have installed your SDK of choice and configured a Gemini API key.
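As a quick check that the setup works, here is a minimal sketch using the google-genai Python SDK (it assumes the key is exported as GEMINI_API_KEY; listing models is just an illustrative smoke test, not a required step):

Python

from google import genai

# The client reads the API key from the environment (GEMINI_API_KEY);
# it can also be passed explicitly via genai.Client(api_key="...").
client = genai.Client()

# Optional smoke test: list available models to confirm the key works.
for model in client.models.list():
    print(model.name)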
Gemini 2.0 Flash Experimental supports outputting text and inline images. This lets you use Gemini to edit images conversationally or to generate output with interleaved text and images (for example, a blog post with text and pictures in a single turn). All generated images include a SynthID watermark; images in Google AI Studio also carry a visible watermark.
Note: Be sure to include responseModalities: ["TEXT", "IMAGE"] in your generation configuration when using gemini-2.0-flash-exp-image-generation. Image-only output (without text) is not supported.
The following examples show how to generate text and image output using Gemini 2.0:
Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client()

contents = ('Hi, can you create a 3d rendered image of a pig '
            'with wings and a top hat flying over a happy '
            'futuristic scifi city with lots of greenery?')

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=contents,
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save('gemini-native-image.png')
        image.show()
JavaScript

import { GoogleGenAI, Modality } from "@google/genai";
import * as fs from "node:fs";

async function main() {
  const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

  const contents =
    "Hi, can you create a 3d rendered image of a pig " +
    "with wings and a top hat flying over a happy " +
    "futuristic scifi city with lots of greenery?";

  // Set responseModalities to include "Image" so the model can generate an image
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash-exp-image-generation",
    contents: contents,
    config: {
      responseModalities: [Modality.TEXT, Modality.IMAGE],
    },
  });

  for (const part of response.candidates[0].content.parts) {
    // Based on the part type, either show the text or save the image
    if (part.text) {
      console.log(part.text);
    } else if (part.inlineData) {
      const imageData = part.inlineData.data;
      const buffer = Buffer.from(imageData, "base64");
      fs.writeFileSync("gemini-native-image.png", buffer);
      console.log("Image saved as gemini-native-image.png");
    }
  }
}

main();
REST

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp-image-generation:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"}
      ]
    }],
    "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
  }' \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > gemini-native-image.png
AI-generated image: a fantasy flying pig
Depending on the prompt and context, Gemini generates content in different modes (text to image, text to interleaved images and text, image editing, and so on). Here are some examples:
1. Text to image
Example prompt: "Generate a picture of the Eiffel Tower with fireworks in the background."
2. Text to image(s) and text (interleaved)
Example prompt: "Generate an illustrated recipe for paella."
3. Image(s) and text to image(s) and text (interleaved)
Example prompt: (With an image of a furnished room) "What other colors of sofas would work in my space? Can you update the image?"
4. Image editing (text and image to image)
Example prompt: "Edit this image to make it look like a cartoon."
Example prompt: [image of a cat] + [image of a pillow] + "Create a cross stitch of my cat on this pillow."
5. Multi-turn image editing (chat); see the sketch after this list
Example prompt: [Upload an image of a blue car.] "Turn this car into a convertible." "Now change the color to yellow."
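For mode 5, the SDK's chat interface keeps earlier turns, including the uploaded image, in context across edits. A minimal sketch, assuming the google-genai Python SDK's chats API accepts mixed image-and-text messages; the file path and prompts are placeholders:

Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client()

# A chat session carries prior turns, so each edit builds on the last image.
chat = client.chats.create(
    model="gemini-2.0-flash-exp-image-generation",
    config=types.GenerateContentConfig(response_modalities=['TEXT', 'IMAGE']),
)

car = Image.open('/path/to/blue-car.png')  # placeholder path
chat.send_message([car, "Turn this car into a convertible."])
response = chat.send_message("Now change the color to yellow.")

# Save the image from the final turn, if one was returned.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).save('yellow-convertible.png')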
To perform image editing, provide an image as input. The following examples show how to upload a base64-encoded image. For multiple images and larger payloads, see the Image input section.
Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

image = Image.open('/path/to/image.png')

client = genai.Client()

text_input = ('Hi, this is a picture of me. '
              'Can you add a llama next to me?')

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=[text_input, image],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.show()
JavaScript

import { GoogleGenAI, Modality } from "@google/genai";
import * as fs from "node:fs";

async function main() {
  const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

  // Load the image from the local file system
  const imagePath = "path/to/image.png";
  const imageData = fs.readFileSync(imagePath);
  const base64Image = imageData.toString("base64");

  // Prepare the content parts
  const contents = [
    { text: "Can you add a llama next to the image?" },
    {
      inlineData: {
        mimeType: "image/png",
        data: base64Image,
      },
    },
  ];

  // Set responseModalities to include "Image" so the model can generate an image
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash-exp-image-generation",
    contents: contents,
    config: {
      responseModalities: [Modality.TEXT, Modality.IMAGE],
    },
  });

  for (const part of response.candidates[0].content.parts) {
    // Based on the part type, either show the text or save the image
    if (part.text) {
      console.log(part.text);
    } else if (part.inlineData) {
      const imageData = part.inlineData.data;
      const buffer = Buffer.from(imageData, "base64");
      fs.writeFileSync("gemini-native-image.png", buffer);
      console.log("Image saved as gemini-native-image.png");
    }
  }
}

main();
REST

IMG_PATH=/path/to/your/image1.jpeg

if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

IMG_BASE64=$(base64 "$B64FLAGS" "$IMG_PATH" 2>&1)

curl -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp-image-generation:generateContent?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d "{
    \"contents\": [{
      \"parts\": [
        {\"text\": \"Hi, this is a picture of me. Can you add a llama next to me?\"},
        {
          \"inline_data\": {
            \"mime_type\": \"image/jpeg\",
            \"data\": \"$IMG_BASE64\"
          }
        }
      ]
    }],
    \"generationConfig\": {\"responseModalities\": [\"TEXT\", \"IMAGE\"]}
  }" \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > gemini-edited-image.png
Image generation currently has the following limitations:
1. For best results, use one of the following languages: English, Spanish (Mexico), Japanese, Simplified Chinese, or Hindi.
2. Image generation does not support audio or video inputs.
3. Image generation may not always trigger (see the retry sketch after this list):
The model may output text only. Try asking for image output explicitly (for example, "generate an image", "provide images as you go along", "update the image").
The model may stop generating partway through. Try again or use a different prompt.
4. When generating text for an image, Gemini works best if you first generate the text and then ask for an image containing that text.
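Because the model may return text without an image (limitation 3), callers can check the response parts and retry with a more explicit prompt. A minimal sketch, assuming the google-genai Python SDK; the retry nudge and attempt count are illustrative choices, not part of the API:

Python

from google import genai
from google.genai import types

client = genai.Client()
config = types.GenerateContentConfig(response_modalities=['TEXT', 'IMAGE'])

def generate_with_image(prompt, attempts=3):
    # Retry until at least one part of the response carries image bytes.
    for _ in range(attempts):
        response = client.models.generate_content(
            model="gemini-2.0-flash-exp-image-generation",
            contents=prompt,
            config=config,
        )
        parts = response.candidates[0].content.parts
        if any(part.inline_data is not None for part in parts):
            return response
        prompt = prompt + " Generate an image."  # nudge toward image output
    raise RuntimeError("No image was produced after retries")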
Which model should you use to generate images? It depends on your use case.
Gemini 2.0 is best for generating contextually relevant images, blending text and images, incorporating world knowledge, and reasoning about images. You can use it to embed accurate, context-relevant visuals in long text sequences, and to edit images conversationally with natural language while maintaining context throughout the conversation.
If image quality is your top priority, Imagen 3 is the better choice. Imagen 3 excels at photorealism, artistic detail, and specific artistic styles (such as impressionism or anime). It is also well suited to specialized image-editing tasks such as updating product backgrounds, upscaling images, and infusing branding and style into visuals. You can use Imagen 3 to create logos or other branded product designs.
The Gemini API provides access to Imagen 3, Google's highest-quality text-to-image model, with a number of new and improved capabilities. Imagen 3 can do the following:
1. Generate images with better detail, richer lighting, and fewer distracting artifacts than previous models
2. Understand prompts written in natural language
3. Generate images in a wide range of formats and styles
4. Render text more effectively than previous models
Note: Imagen 3 is available on the paid tier only, and its images always include a SynthID watermark.
Python
from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client(api_key='GEMINI_API_KEY')

response = client.models.generate_images(
    model='imagen-3.0-generate-002',
    prompt='Robot holding a red skateboard',
    config=types.GenerateImagesConfig(
        number_of_images=4,
    )
)

for generated_image in response.generated_images:
    image = Image.open(BytesIO(generated_image.image.image_bytes))
    image.show()
JavaScript

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

async function main() {
  const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

  const response = await ai.models.generateImages({
    model: 'imagen-3.0-generate-002',
    prompt: 'Robot holding a red skateboard',
    config: {
      numberOfImages: 4,
    },
  });

  let idx = 1;
  for (const generatedImage of response.generatedImages) {
    let imgBytes = generatedImage.image.imageBytes;
    const buffer = Buffer.from(imgBytes, "base64");
    fs.writeFileSync(`imagen-${idx}.png`, buffer);
    idx++;
  }
}

main();
REST

curl -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/imagen-3.0-generate-002:predict?key=GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "instances": [
      {
        "prompt": "Robot holding a red skateboard"
      }
    ],
    "parameters": {
      "sampleCount": 4
    }
  }'
AI-generated image: two furry rabbits in a kitchen
Imagen currently supports English-only prompts, along with the following parameters (a configuration sketch follows the list):
(Naming conventions vary by programming language.)
1. numberOfImages: The number of images to generate, from 1 to 4 (inclusive). The default is 4.
2. aspectRatio: The aspect ratio of the generated image. Supported values are "1:1", "3:4", "4:3", "9:16", and "16:9". The default is "1:1".
3. personGeneration: Controls whether the model may generate images of people. The following values are supported:
"DONT_ALLOW": Block generation of images of people.
"ALLOW_ADULT": Generate images of adults, but not children. This is the default.