POST /vendors/alibaba/v1/wan2.6-t2v/generation
Create Generation Task
curl --request POST \
  --url https://api.mulerouter.ai/vendors/alibaba/v1/wan2.6-t2v/generation \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "prompt": "<string>",
  "negative_prompt": "<string>",
  "size": "1280*720",
  "duration": 5,
  "prompt_extend": true,
  "shot_type": "single",
  "audio": true,
  "audio_url": "<string>",
  "seed": 1073741823
}
'
{
  "task_info": {
    "id": "3c90c3cc-0d44-4b50-8888-8dd25736052a",
    "status": "pending",
    "created_at": "2023-11-07T05:31:56Z",
    "updated_at": "2023-11-07T05:31:56Z"
  }
}
This API supports Alibaba Tongyi Wanxiang (Wan2) video generation models. Please refer to Alibaba Cloud’s official documentation for more details.

Overview

Generate videos from text prompts using the wan2.6-t2v model with support for longer durations and multi-shot generation.

Key Features

  • Text-to-video generation with audio support
  • Multiple resolution options (720P/1080P)
  • 5s, 10s, or 15s duration
  • Single or multi-shot generation

Resolution Options

720P

  • 1280×720 (16:9)
  • 720×1280 (9:16)
  • 960×960 (1:1)
  • 1088×832 (4:3)
  • 832×1088 (3:4)

1080P

  • 1920×1080 (16:9)
  • 1080×1920 (9:16)
  • 1440×1440 (1:1)
  • 1632×1248 (4:3)
  • 1248×1632 (3:4)
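
The resolution lists above can be expressed as a lookup table. The sketch below is an illustration only: `SIZES` and `size_for` are hypothetical names, not part of the API. Note that the `size` field is written with an asterisk ("1280*720"), as in the request samples, rather than the × used in the lists.

```python
# Size strings documented for wan2.6-t2v, keyed by (tier, aspect ratio).
SIZES = {
    ("720P", "16:9"): "1280*720",
    ("720P", "9:16"): "720*1280",
    ("720P", "1:1"): "960*960",
    ("720P", "4:3"): "1088*832",
    ("720P", "3:4"): "832*1088",
    ("1080P", "16:9"): "1920*1080",
    ("1080P", "9:16"): "1080*1920",
    ("1080P", "1:1"): "1440*1440",
    ("1080P", "4:3"): "1632*1248",
    ("1080P", "3:4"): "1248*1632",
}


def size_for(tier: str, aspect: str) -> str:
    """Return the "width*height" string for a tier and aspect ratio.

    Hypothetical client-side helper; raises ValueError for any
    combination that is not in the documented lists.
    """
    try:
        return SIZES[(tier, aspect)]
    except KeyError:
        raise ValueError(f"unsupported combination: {tier} {aspect}") from None
```

For example, `size_for("1080P", "9:16")` yields the value to place in the `size` field for a vertical 1080P video.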

Example Requests

Basic Text-to-Video

{
  "prompt": "Miyazaki-style mule dancing",
  "size": "1280*720",
  "duration": 5
}

Multi-shot Video

{
  "prompt": "Waves crashing against rocks, water splashing, sunlight on the sea",
  "size": "1920*1080",
  "duration": 10,
  "prompt_extend": true,
  "shot_type": "multi",
  "audio": true
}

With Custom Audio

{
  "prompt": "A person walking through a city at night",
  "size": "1280*720",
  "duration": 15,
  "audio_url": "https://example.com/city_sounds.mp3"
}
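
Any of the request bodies above could be submitted from Python roughly as follows. This is a minimal sketch using only the standard library; the URL and headers are taken from the curl sample, while `build_request` and `submit` are hypothetical helper names and the token is a placeholder.

```python
import json
import urllib.request

API_URL = "https://api.mulerouter.ai/vendors/alibaba/v1/wan2.6-t2v/generation"


def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def submit(token: str, payload: dict) -> dict:
    """Send the request and return the decoded JSON response body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(build_request(token, payload)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `submit("<token>", {"prompt": "Miyazaki-style mule dancing", "size": "1280*720", "duration": 5})` performs the network call. Separating `build_request` from `submit` makes it possible to inspect the request before sending it.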

Parameters

shot_type

  • Default: single
  • Effect: Selects single-shot or multi-shot output
  • Only effective when: prompt_extend (prompt rewriting) is enabled; the value has no effect otherwise
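
The interaction between shot_type and prompt_extend can be checked on the client before a request is sent. The sketch below assumes the defaults listed in the Body section (prompt_extend: true, shot_type: single) and assumes that a null shot_type falls back to the default. `effective_shot_type` is a hypothetical helper, and raising an error is a client-side design choice; how the server handles this combination is not documented here.

```python
def effective_shot_type(payload: dict) -> str:
    """Return the shot type a request is expected to produce.

    Raises ValueError when "multi" is requested while prompt_extend is
    explicitly disabled, because shot_type would then have no effect.
    """
    # Assumption: a missing or null shot_type means the default, "single".
    shot_type = payload.get("shot_type") or "single"
    # Documented default for prompt_extend is true.
    prompt_extend = payload.get("prompt_extend", True)

    if shot_type not in ("single", "multi"):
        raise ValueError(f"unknown shot_type: {shot_type!r}")
    if shot_type == "multi" and not prompt_extend:
        raise ValueError("shot_type 'multi' requires prompt_extend to be enabled")
    return shot_type
```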

duration

  • Options: 5, 10, or 15 seconds
  • Note: Longer durations may increase processing time

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
prompt
string
required

Text description for the desired video content (max 2000 characters).

Maximum string length: 2000
negative_prompt
string

Negative prompt describing unwanted content (max 500 characters).

Maximum string length: 500
size
enum<string>
default:1280*720

Output resolution ("width*height"). Supported tiers:

  • 720P: 1280*720 (16:9), 720*1280 (9:16), 960*960 (1:1), 1088*832 (4:3), 832*1088 (3:4)
  • 1080P: 1920*1080 (16:9), 1080*1920 (9:16), 1440*1440 (1:1), 1632*1248 (4:3), 1248*1632 (3:4)
Available options:
1280*720,
720*1280,
960*960,
1088*832,
832*1088,
1920*1080,
1080*1920,
1440*1440,
1632*1248,
1248*1632
duration
enum<integer>

Video duration in seconds. Supported values: 5, 10, or 15.

Available options:
5,
10,
15
prompt_extend
boolean
default:true

Enable intelligent prompt rewriting (slightly longer latency, better detail).

shot_type
enum<string> | null
default:single

Specifies the shot type for video generation.

Only takes effect when prompt_extend is enabled.

  • single: Default value, outputs single-shot video
  • multi: Outputs multi-shot video
Available options:
single,
multi
audio
boolean | null
default:true

Enable automatic audio generation. Set to false to force a silent output.

audio_url
string<uri> | null

Custom audio file URL (wav/mp3, 3-30s, ≤15MB). Overrides the audio flag.

seed
integer

Random seed in the range [0, 2147483647].

Required range: 0 <= x <= 2147483647
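
The constraints listed for the body fields can be checked locally before a task is submitted. The following is a rough sketch, not an official validator: `validate_payload` is a hypothetical name, and it assumes that string length is measured in Unicode code points, which the server may count differently.

```python
ALLOWED_SIZES = {
    "1280*720", "720*1280", "960*960", "1088*832", "832*1088",
    "1920*1080", "1080*1920", "1440*1440", "1632*1248", "1248*1632",
}
ALLOWED_DURATIONS = {5, 10, 15}
MAX_SEED = 2147483647


def validate_payload(payload: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    errors = []

    prompt = payload.get("prompt")
    if not isinstance(prompt, str):
        errors.append("prompt is required and must be a string")
    elif len(prompt) > 2000:
        errors.append("prompt exceeds 2000 characters")

    negative = payload.get("negative_prompt")
    if isinstance(negative, str) and len(negative) > 500:
        errors.append("negative_prompt exceeds 500 characters")

    # size is optional; the documented default is 1280*720.
    if payload.get("size", "1280*720") not in ALLOWED_SIZES:
        errors.append("size is not one of the supported resolutions")

    if "duration" in payload and payload["duration"] not in ALLOWED_DURATIONS:
        errors.append("duration must be 5, 10, or 15")

    seed = payload.get("seed")
    if seed is not None:
        is_int = isinstance(seed, int) and not isinstance(seed, bool)
        if not is_int or not 0 <= seed <= MAX_SEED:
            errors.append("seed must be an integer in [0, 2147483647]")

    return errors
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.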

Response

202 - application/json

Accepted - Task created successfully

task_info
object
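
A 202 response body could be parsed along these lines. The sketch relies only on the four task_info fields visible in the sample response (id, status, created_at, updated_at); any further fields, and the set of possible status values, are not documented here. `TaskInfo` and `parse_task_info` are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaskInfo:
    id: str
    status: str
    created_at: datetime
    updated_at: datetime


def _parse_ts(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_task_info(body: str) -> TaskInfo:
    """Extract task_info from a JSON response body."""
    info = json.loads(body)["task_info"]
    return TaskInfo(
        id=info["id"],
        status=info["status"],
        created_at=_parse_ts(info["created_at"]),
        updated_at=_parse_ts(info["updated_at"]),
    )
```

The returned `id` is the value to keep for tracking the task later.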