
How to generate images using the Gemini API

Author: LoRA

The Gemini API supports image generation using Gemini 2.0 Flash Experimental and Imagen 3.

Before calling the Gemini API, make sure you have installed your SDK of choice and have a Gemini API key configured and ready to use.

1. Use Gemini to generate images

Gemini 2.0 Flash Experimental supports outputting text and inline images. This lets you use Gemini to edit images conversationally, or generate output with interleaved text and images (for example, a blog post containing both text and images in a single turn). All generated images include a SynthID watermark, and images in Google AI Studio also include a visible watermark.

Note: Be sure to include responseModalities: ["TEXT", "IMAGE"] in your generation configuration when using gemini-2.0-flash-exp-image-generation to produce text and image output. Image-only output is not supported.

The following example shows how to generate text and image output using Gemini 2.0:

Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client()

contents = ('Hi, can you create a 3d rendered image of a pig '
            'with wings and a top hat flying over a happy '
            'futuristic scifi city with lots of greenery?')

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=contents,
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

# The response can contain both text parts and inline image parts.
for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.save('gemini-native-image.png')
        image.show()

JavaScript

import { GoogleGenAI, Modality } from "@google/genai";
import * as fs from "node:fs";

async function main() {
  const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

  const contents =
    "Hi, can you create a 3d rendered image of a pig " +
    "with wings and a top hat flying over a happy " +
    "futuristic scifi city with lots of greenery?";

  // Set responseModalities to include "Image" so the model can generate an image
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash-exp-image-generation",
    contents: contents,
    config: {
      responseModalities: [Modality.TEXT, Modality.IMAGE],
    },
  });

  for (const part of response.candidates[0].content.parts) {
    // Based on the part type, either show the text or save the image
    if (part.text) {
      console.log(part.text);
    } else if (part.inlineData) {
      const imageData = part.inlineData.data;
      const buffer = Buffer.from(imageData, "base64");
      fs.writeFileSync("gemini-native-image.png", buffer);
      console.log("Image saved as gemini-native-image.png");
    }
  }
}

main();

REST

curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp-image-generation:generateContent?key=$GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [
        {"text": "Hi, can you create a 3d rendered image of a pig with wings and a top hat flying over a happy futuristic scifi city with lots of greenery?"}
      ]
    }],
    "generationConfig":{"responseModalities":["TEXT","IMAGE"]}
  }' \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > gemini-native-image.png

[Image: fantasy flying pig generated by AI]

Depending on the prompt and context, Gemini will generate content in different modes (text to image, text to image and text, etc.). Here are some examples:

1. Text to image

Example prompt: "Generate an image of the Eiffel Tower with fireworks in the background."

2. Text to image(s) and text (interleaved)

Example prompt: "Generate an illustrated recipe for a paella."

3. Image(s) and text to image(s) and text (interleaved)

Example prompt: [Upload an image of a furnished room] "What other color sofas would work in my space? Can you update the image?"

4. Image editing (text and image to image)

Example prompt: "Edit this image to make it look like a cartoon."

Example prompt: [Image of a cat] + [Image of a pillow] + "Create a cross stitch of my cat on this pillow."

5. Multi-turn image editing (chat), as sketched after this list

Example prompt: [Upload an image of a blue car.] "Turn this car into a convertible." "Now change the color to yellow."
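A minimal sketch of multi-turn editing using the SDK's chat interface, reusing the model name and config from the examples above (the chat usage and file path here are illustrative assumptions, not official sample code):

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client()

# A chat keeps earlier turns (including generated images) in context.
chat = client.chats.create(
    model="gemini-2.0-flash-exp-image-generation",
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

car = Image.open('/path/to/blue-car.png')  # hypothetical input image
chat.send_message(["Turn this car into a convertible.", car])
response = chat.send_message("Now change the color to yellow.")

# Show the last edit in the sequence.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        Image.open(BytesIO(part.inline_data.data)).show()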

2. Use Gemini to edit images

To perform image editing, add an image as input. The following example demonstrates uploading a base64-encoded image. For multiple images and larger payloads, see the Image input section; a sketch using uploaded files also follows the REST example below.

Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

image = Image.open('/path/to/image.png')

client = genai.Client()

text_input = ('Hi, this is a picture of me. '
              'Can you add a llama next to me?')

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=[text_input, image],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)

for part in response.candidates[0].content.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = Image.open(BytesIO(part.inline_data.data))
        image.show()

JavaScript

import { GoogleGenAI, Modality } from "@google/genai";
import * as fs from "node:fs";

async function main() {
  const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

  // Load the image from the local file system
  const imagePath = "path/to/image.png";
  const imageData = fs.readFileSync(imagePath);
  const base64Image = imageData.toString("base64");

  // Prepare the content parts
  const contents = [
    { text: "Can you add a llama next to the image?" },
    {
      inlineData: {
        mimeType: "image/png",
        data: base64Image,
      },
    },
  ];

  // Set responseModalities to include "Image" so the model can generate an image
  const response = await ai.models.generateContent({
    model: "gemini-2.0-flash-exp-image-generation",
    contents: contents,
    config: {
      responseModalities: [Modality.TEXT, Modality.IMAGE],
    },
  });

  for (const part of response.candidates[0].content.parts) {
    // Based on the part type, either show the text or save the image
    if (part.text) {
      console.log(part.text);
    } else if (part.inlineData) {
      const imageData = part.inlineData.data;
      const buffer = Buffer.from(imageData, "base64");
      fs.writeFileSync("gemini-native-image.png", buffer);
      console.log("Image saved as gemini-native-image.png");
    }
  }
}

main();

REST

IMG_PATH=/path/to/your/image1.jpeg

if [[ "$(base64 --version 2>&1)" = *"FreeBSD"* ]]; then
  B64FLAGS="--input"
else
  B64FLAGS="-w0"
fi

IMG_BASE64=$(base64 "$B64FLAGS" "$IMG_PATH" 2>&1)

curl -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash-exp-image-generation:generateContent?key=$GEMINI_API_KEY" \
    -H 'Content-Type: application/json' \
    -d "{
      \"contents\": [{
        \"parts\":[
            {\"text\": \"Hi, this is a picture of me. Can you add a llama next to me?\"},
            {
              \"inline_data\": {
                \"mime_type\": \"image/jpeg\",
                \"data\": \"$IMG_BASE64\"
              }
            }
        ]
      }],
      \"generationConfig\": {\"responseModalities\": [\"TEXT\", \"IMAGE\"]}
    }" \
  | grep -o '"data": "[^"]*"' \
  | cut -d'"' -f4 \
  | base64 --decode > gemini-edited-image.png
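As noted above, inline base64 data suits small payloads; for multiple images or larger payloads, files can be uploaded once and referenced instead. A minimal Python sketch, assuming the same SDK's Files API (the upload call and file path are illustrative assumptions):

from google import genai
from google.genai import types

client = genai.Client()

# Upload once, then reference the file instead of inlining base64 data.
my_file = client.files.upload(file='/path/to/image.png')

response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents=[my_file, "Can you add a llama next to me?"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)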

Limitations

1. For best performance, use the following languages: English, Spanish (Mexico), Japanese, Simplified Chinese, Hindi.

2. Image generation does not support audio or video inputs.

3. Image generation may not always trigger:

The model may output text only. Try asking for image outputs explicitly (e.g. "generate an image", "provide images as you go along", "update the image").

The model may stop generating partway through. Try again or try a different prompt.

4. When generating text for an image, Gemini works best if you first generate the text and then ask for an image containing that text, as sketched after this list.
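A minimal two-step sketch of that last tip, reusing the client and config from the earlier Python examples (the menu prompt is an illustrative assumption):

from google import genai
from google.genai import types

client = genai.Client()
config = types.GenerateContentConfig(response_modalities=['TEXT', 'IMAGE'])

# Step 1: generate the text first (take the first text part of the reply).
text_reply = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents="Write a short three-item dessert menu for a cafe.",
    config=config,
)
menu = next(p.text for p in text_reply.candidates[0].content.parts if p.text)

# Step 2: ask for an image that contains that text.
response = client.models.generate_content(
    model="gemini-2.0-flash-exp-image-generation",
    contents="Generate a chalkboard image displaying this menu:\n" + menu,
    config=config,
)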

Select a model

Which model should you use to generate images? It depends on your use case.

Gemini 2.0 is best for generating contextually relevant images, blending text and images, incorporating world knowledge, and reasoning about images. You can use it to embed accurate, contextually relevant visuals in long text sequences, and to edit images conversationally using natural language while maintaining context throughout the conversation.

If image quality is your top priority, Imagen 3 is the better choice. Imagen 3 excels at photorealism, artistic detail, and specific artistic styles (such as impressionism or anime). Imagen 3 is also a good choice for specialized image-editing tasks such as updating product backgrounds, upscaling images, and infusing branding and style into visuals. You can use Imagen 3 to create logos or other branded product designs.

3. Use Imagen 3 to generate images

The Gemini API provides access to Imagen 3, Google's highest-quality text-to-image model, featuring a number of new and improved capabilities. Imagen 3 can do the following:

1. Generate images with greater detail, richer lighting, and fewer distracting artifacts than previous models

2. Understand prompts written in natural language

3. Generate images in a wide range of formats and styles

4. Render text more effectively than previous models

Note: Imagen 3 is available only on the paid tier and always includes a SynthID watermark.

Python

from google import genai
from google.genai import types
from PIL import Image
from io import BytesIO

client = genai.Client(api_key='GEMINI_API_KEY')

response = client.models.generate_images(
    model='imagen-3.0-generate-002',
    prompt='Robot holding a red skateboard',
    config=types.GenerateImagesConfig(
        number_of_images=4,
    )
)

for generated_image in response.generated_images:
    image = Image.open(BytesIO(generated_image.image.image_bytes))
    image.show()

JavaScript

import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";

async function main() {
  const ai = new GoogleGenAI({ apiKey: "GEMINI_API_KEY" });

  const response = await ai.models.generateImages({
    model: 'imagen-3.0-generate-002',
    prompt: 'Robot holding a red skateboard',
    config: {
      numberOfImages: 4,
    },
  });

  let idx = 1;
  for (const generatedImage of response.generatedImages) {
    let imgBytes = generatedImage.image.imageBytes;
    const buffer = Buffer.from(imgBytes, "base64");
    fs.writeFileSync(`imagen-${idx}.png`, buffer);
    idx++;
  }
}

main();

REST

curl -X POST \
    "https://generativelanguage.googleapis.com/v1beta/models/imagen-3.0-generate-002:predict?key=$GEMINI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "instances": [
          {
            "prompt": "Robot holding a red skateboard"
          }
        ],
        "parameters": {
          "sampleCount": 4
        }
      }'

[Image: AI-generated image of two furry rabbits in a kitchen]

Imagen Model Parameters

Imagen currently supports English-only prompts, plus the following parameters (a usage sketch follows this list):

(Naming conventions vary by programming language.)

1. numberOfImages: The number of images to generate, from 1 to 4 (inclusive). The default is 4.

2. aspectRatio: Changes the aspect ratio of the generated image. Supported values are "1:1", "3:4", "4:3", "9:16", and "16:9". The default is "1:1".

3. personGeneration: Controls whether the model may generate images of people. The following values are supported:

"DONT_ALLOW": Block generation of images of people.

"ALLOW_ADULT": Generate images of adults, but not children. This is the default.