POST /vendors/klingai/v1/kling-v3/image-to-video/generation
Image to Video Generation
curl --request POST \
  --url https://api.mulerouter.ai/vendors/klingai/v1/kling-v3/image-to-video/generation \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "mode": "pro",
  "duration": 5,
  "first_frame": "https://example.com/reference-image.jpg",
  "prompt": "The character slowly turns to face the camera",
  "sound": "off"
}
'
{
  "task_info": {
    "id": "8e1e315e-b50d-4334-a231-be7d19a372f4",
    "status": "pending",
    "created_at": "2026-02-14T00:00:00.000Z",
    "updated_at": "2026-02-14T00:00:00.000Z"
  }
}

Overview

Generate videos from reference images using the Kling V3.0 model. In addition to the text-to-video features, image-to-video supports:
  • First/last frame control: provide first_frame and last_frame for start-to-end frame interpolation
  • Element references: reference up to 3 subjects via elements with frontal and reference images

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
first_frame
string | null

First frame image (URL or Base64). Sets the opening frame of the generated video.

Important: When using Base64 encoding, do not add any prefixes such as data:image/png;base64,. Provide only the Base64-encoded string itself.

  • Supported image formats: .jpg, .jpeg, .png
  • Image file size must not exceed 10 MB
  • Width and height must each be at least 300px
  • Aspect ratio must be between 1:2.5 and 2.5:1
  • At least one of first_frame and last_frame must be provided.
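The Base64 rule above is easy to get wrong when the image string comes from a browser or another tool that emits data URIs. The following is a minimal Python sketch of preparing a value for first_frame or last_frame. The helper names strip_data_uri_prefix and encode_image are hypothetical, and the sketch assumes "10MB" means 10 × 1024 × 1024 bytes; it does not check pixel dimensions or aspect ratio, which would need an image library.

```python
import base64

ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png")
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def strip_data_uri_prefix(value: str) -> str:
    """Drop a leading 'data:<mime>;base64,' prefix if one is present."""
    if value.startswith("data:") and ";base64," in value:
        return value.split(";base64,", 1)[1]
    return value


def encode_image(raw: bytes, filename: str) -> str:
    """Return the bare Base64 string for image bytes after basic checks."""
    if not filename.lower().endswith(ALLOWED_EXTENSIONS):
        raise ValueError(f"unsupported image format: {filename}")
    if len(raw) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds 10 MB")
    return base64.b64encode(raw).decode("ascii")
```

The result of encode_image can be placed directly in the request body; a string that already carries a data URI prefix can be passed through strip_data_uri_prefix first.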
last_frame
string | null

Last frame image (URL or Base64). Sets the closing frame of the generated video.

  • At least one of first_frame and last_frame must be provided.
  • last_frame, dynamic_masks/static_mask, and camera_control are mutually exclusive.
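The two frame rules can be checked on the client before a request is sent. The sketch below is one possible reading: it treats last_frame, the mask pair (dynamic_masks or static_mask), and camera_control as three groups of which at most one may be set. The function name check_frames is hypothetical, and the mask and camera fields are named here only because the rule mentions them; their schemas are not described on this page.

```python
# Groups that, under this reading of the rule, cannot be combined.
EXCLUSIVE_GROUPS = {
    "last_frame": ("last_frame",),
    "dynamic_masks/static_mask": ("dynamic_masks", "static_mask"),
    "camera_control": ("camera_control",),
}


def check_frames(payload: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the checks passed."""
    errors = []
    if not payload.get("first_frame") and not payload.get("last_frame"):
        errors.append("at least one of first_frame and last_frame is required")
    active = [
        name
        for name, keys in EXCLUSIVE_GROUPS.items()
        if any(payload.get(key) is not None for key in keys)
    ]
    if len(active) > 1:
        errors.append("mutually exclusive options set together: " + ", ".join(active))
    return errors
```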
prompt
string

Positive text prompt. Cannot exceed 2500 characters.

Required when multi_shot is false or when shot_type is intelligence.

Maximum string length: 2500
negative_prompt
string

Negative text prompt. Cannot exceed 2500 characters.

Maximum string length: 2500
multi_shot
boolean
default:false

Whether to generate a multi-shot video.

  • true: Enable multi-shot mode. Set shot_type, then supply multi_prompt (customize) or prompt (intelligence).
  • false: Single-shot mode (default).
shot_type
enum<string>

Shot segmentation method. Required when multi_shot is true.

  • customize: Custom shots, requires multi_prompt.
  • intelligence: AI-generated shots, requires prompt.
Available options: customize, intelligence
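The interaction between prompt, multi_shot, and shot_type can be summarized as a small decision function. This is a sketch of the rules as stated in the field descriptions, not the service's own validation; check_prompt_fields is a hypothetical name.

```python
MAX_PROMPT_CHARS = 2500


def check_prompt_fields(payload: dict) -> list[str]:
    """Return violations of the prompt / multi_shot / shot_type rules."""
    errors = []
    prompt = payload.get("prompt")
    if prompt is not None and len(prompt) > MAX_PROMPT_CHARS:
        errors.append("prompt exceeds 2500 characters")

    if not payload.get("multi_shot", False):
        # Single-shot mode: prompt is required.
        if not prompt:
            errors.append("prompt is required when multi_shot is false")
        return errors

    shot_type = payload.get("shot_type")
    if shot_type not in ("customize", "intelligence"):
        errors.append("shot_type is required when multi_shot is true")
    elif shot_type == "intelligence" and not prompt:
        errors.append("prompt is required when shot_type is intelligence")
    elif shot_type == "customize" and not payload.get("multi_prompt"):
        errors.append("multi_prompt is required when shot_type is customize")
    return errors
```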
multi_prompt
object[] | null

Shot prompt list for multi-shot videos.

  • Max 6 shots, min 1 shot.
  • Each shot prompt max 512 characters.
  • Each shot duration must not exceed total duration and must be >= 1.
  • Sum of all shot durations must equal total task duration.

Required when multi_shot is true and shot_type is customize.

Required array length: 1 - 6 elements
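The shot constraints are simple arithmetic and can be verified before submission. In the sketch below, each shot is assumed to be an object with prompt and duration keys; those key names are an assumption for illustration, since the page does not spell out the item schema. The function name check_multi_prompt is likewise hypothetical.

```python
def check_multi_prompt(shots: list[dict], total_duration: int) -> list[str]:
    """Return violations of the multi_prompt rules for a given task duration."""
    errors = []
    if not 1 <= len(shots) <= 6:
        errors.append("multi_prompt must contain 1 to 6 shots")
    for index, shot in enumerate(shots, start=1):
        # "prompt" and "duration" are assumed key names.
        if len(shot.get("prompt", "")) > 512:
            errors.append(f"shot {index}: prompt exceeds 512 characters")
        duration = shot.get("duration", 0)
        if not 1 <= duration <= total_duration:
            errors.append(f"shot {index}: duration must be between 1 and {total_duration}")
    if sum(shot.get("duration", 0) for shot in shots) != total_duration:
        errors.append("shot durations must sum to the total task duration")
    return errors
```

For example, two shots of 2 and 3 seconds satisfy a 5-second task, while two shots of 2 seconds each do not.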
elements
object[] | null

Element definitions. Max 3 elements. Provide frontal and reference images. Use <<<element_1>>> in prompt to reference elements.

Required array length: 1 - 3 elements
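A prompt that references an element which was never supplied is an easy mistake to make. The sketch below scans a prompt for placeholders and reports any index outside the supplied element count. It assumes that placeholders follow the pattern <<<element_N>>> with N counting from 1 in the order of the elements array; the page shows only <<<element_1>>>, so the numbering scheme is an assumption, as is the helper name unresolved_element_refs.

```python
import re

# Assumed placeholder pattern, generalized from the documented <<<element_1>>>.
ELEMENT_REF = re.compile(r"<<<element_(\d+)>>>")


def unresolved_element_refs(prompt: str, element_count: int) -> list[int]:
    """Return placeholder indices in the prompt that have no matching element."""
    indices = {int(match) for match in ELEMENT_REF.findall(prompt)}
    return sorted(i for i in indices if not 1 <= i <= element_count)
```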
sound
enum<string>
default:off

Generate audio simultaneously when generating videos.

  • on: Enable audio generation
  • off: Disable audio generation (silent video)
Available options: on, off
mode
enum<string>
default:std

Video generation mode.

  • std: Standard mode (720P), cost-effective.
  • pro: Professional mode (1080P), higher-quality video output.

Available options: std, pro
duration
integer
default:5

Video length in seconds (3-15).
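Putting the body fields together, a request can be assembled with the Python standard library. The sketch below checks the enum and range constraints for mode, sound, and duration, then builds the request object without sending it. build_request is a hypothetical helper; the endpoint URL and headers are taken from the curl example on this page.

```python
import json
import urllib.request

ENDPOINT = (
    "https://api.mulerouter.ai/vendors/klingai/v1/"
    "kling-v3/image-to-video/generation"
)


def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Validate mode, sound, and duration, then build the POST request."""
    if payload.get("mode", "std") not in ("std", "pro"):
        raise ValueError("mode must be 'std' or 'pro'")
    if payload.get("sound", "off") not in ("on", "off"):
        raise ValueError("sound must be 'on' or 'off'")
    duration = payload.get("duration", 5)
    if not isinstance(duration, int) or not 3 <= duration <= 15:
        raise ValueError("duration must be an integer from 3 to 15")
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would submit the task; that call is left out here so the sketch stays free of network side effects.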

Response

202 - application/json

Accepted - Task created successfully

task_info
object
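A 202 response means the task was accepted, not that the video is ready, so a client will usually want to keep the task id and initial status. The sketch below extracts both from a response body shaped like the sample on this page; parse_task_info is a hypothetical helper, and how a task is later queried is not covered here.

```python
import json


def parse_task_info(body: str) -> dict:
    """Extract the task id and status from a 202 response body."""
    info = json.loads(body)["task_info"]
    return {"id": info["id"], "status": info["status"]}


sample = """
{
  "task_info": {
    "id": "8e1e315e-b50d-4334-a231-be7d19a372f4",
    "status": "pending",
    "created_at": "2026-02-14T00:00:00.000Z",
    "updated_at": "2026-02-14T00:00:00.000Z"
  }
}
"""
task = parse_task_info(sample)
```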