POST /vendors/minimax/v1/speech-2.8-turbo/text-to-speech/generation
Create Speech Generation Task
curl --request POST \
  --url https://api.mulerouter.ai/vendors/minimax/v1/speech-2.8-turbo/text-to-speech/generation \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "prompt": "Hello, this is a test of speech synthesis.",
  "voice_setting": {
    "voice_id": "Wise_Woman",
    "speed": 1,
    "vol": 1,
    "pitch": 0
  },
  "output_format": "url"
}
'

Example response (202 Accepted):
{
  "task_info": {
    "id": "8e1e315e-b50d-4334-a231-be7d19a372f4",
    "status": "pending",
    "created_at": "2026-03-03T00:00:00Z",
    "updated_at": "2026-03-03T00:00:00Z"
  }
}

Overview

Generate speech from text using the MiniMax Speech 2.8 Turbo model. Speech 2.8 Turbo delivers fast, high-quality synthesis:
  • Fast generation — optimized for speed with Turbo-tier performance
  • Pause tags — insert precise pauses with <#x#> syntax (x = 0.01-99.99 seconds)
  • Interjection tags — add natural expressions like (laughs), (sighs), (coughs), (clears throat), (gasps), (sniffs), (groans), (yawns)
  • Voice settings — control speed, volume, pitch, and emotion
  • 40+ languages — extensive language support with language boost
  • Audio customization — configurable format (MP3/PCM/FLAC), sample rate, channel, and bitrate
  • Voice modification — fine-tune pitch, intensity, and timbre
  • Pronunciation dictionary — custom pronunciation replacements
  • Loudness normalization — professional audio level control
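
The pause and interjection tags listed above are plain text embedded in the prompt, so they can be assembled with ordinary string formatting. The following is a minimal sketch: the helper names `pause` and `interjection` are hypothetical, and formatting the duration with two decimal places is an assumption based on the documented 0.01-99.99 range.

```python
# Hypothetical helpers. Only the <#x#> syntax, the 0.01-99.99 range,
# and the interjection names come from the documentation.
INTERJECTIONS = {
    "laughs", "sighs", "coughs", "clears throat",
    "gasps", "sniffs", "groans", "yawns",
}


def pause(seconds: float) -> str:
    """Return a pause tag such as <#1.50#> for a duration in the documented range."""
    if not 0.01 <= seconds <= 99.99:
        raise ValueError("pause must be between 0.01 and 99.99 seconds")
    # Assumption: two decimal places match the precision implied by the range.
    return f"<#{seconds:.2f}#>"


def interjection(name: str) -> str:
    """Return an interjection tag such as (laughs), rejecting unlisted names."""
    if name not in INTERJECTIONS:
        raise ValueError(f"unsupported interjection: {name!r}")
    return f"({name})"


prompt = f"Welcome back. {pause(1.5)} {interjection('laughs')} Let's get started."
print(prompt)
```

Validating tags locally is a design choice for catching typos before a task is created; the service may apply its own, possibly different, validation.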

Supported Voice IDs

Speech 2.8 Turbo supports the same 223+ system voice IDs as Speech 2.8 HD, covering 20+ languages including Chinese (Mandarin), Chinese (Cantonese), English, Japanese, Korean, Spanish, Portuguese, French, Indonesian, German, Russian, Italian, Arabic, Turkish, Ukrainian, Dutch, Vietnamese, Thai, Polish, Romanian, Greek, Czech, Finnish, and Hindi. See the Speech 2.8 HD documentation for the complete voice ID list.

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
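
As an illustration of the header format, the sketch below builds the same request as the curl example using only the Python standard library. The function name `build_request` and the environment variable `MULEROUTER_API_KEY` are hypothetical. The request is constructed but not sent.

```python
import json
import os
import urllib.request

API_URL = (
    "https://api.mulerouter.ai/vendors/minimax/v1/"
    "speech-2.8-turbo/text-to-speech/generation"
)


def build_request(token: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the Bearer token and a JSON body."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical variable name; falls back to the placeholder from the curl example.
token = os.environ.get("MULEROUTER_API_KEY", "<token>")
request = build_request(
    token,
    {
        "prompt": "Hello, this is a test of speech synthesis.",
        "voice_setting": {"voice_id": "Wise_Woman", "speed": 1, "vol": 1, "pitch": 0},
        "output_format": "url",
    },
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Reading the token from the environment keeps it out of source control.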

Body

application/json
prompt
string
required

Text to convert to speech. Use <#x#> for pauses (x = 0.01-99.99 seconds). Supports interjection tags: (laughs), (sighs), (coughs), (clears throat), (gasps), (sniffs), (groans), (yawns).

Required string length: 1 - 10000
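
A client can check the length limit before submitting a task. This is a minimal sketch: `check_prompt` is a hypothetical helper, and it assumes the limit is counted in characters with pause and interjection tags included, which the documentation does not state.

```python
MAX_PROMPT_LENGTH = 10000  # upper bound from "Required string length: 1 - 10000"


def check_prompt(prompt: str) -> str:
    """Return the prompt unchanged, or raise if it falls outside the documented bounds."""
    # Assumption: length is measured in characters, tags included.
    if not 1 <= len(prompt) <= MAX_PROMPT_LENGTH:
        raise ValueError(
            f"prompt must be 1 to {MAX_PROMPT_LENGTH} characters, got {len(prompt)}"
        )
    return prompt


check_prompt("Hello, this is a test of speech synthesis.")
```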
voice_setting
object

Voice configuration settings. Optional; when omitted, the voice defaults to Wise_Woman.

audio_setting
object

Audio configuration settings.

language_boost
enum<string> | null

Enhance recognition of specified languages and dialects.

Available options:
Chinese,
Chinese,Yue (a single value: Cantonese),
English,
Arabic,
Russian,
Spanish,
French,
Portuguese,
German,
Turkish,
Dutch,
Ukrainian,
Vietnamese,
Indonesian,
Japanese,
Italian,
Korean,
Thai,
Polish,
Romanian,
Greek,
Czech,
Finnish,
Hindi,
Bulgarian,
Danish,
Hebrew,
Malay,
Slovak,
Swedish,
Croatian,
Hungarian,
Norwegian,
Slovenian,
Catalan,
Nynorsk,
Afrikaans,
auto
output_format
enum<string>
default: hex

Format in which the generated audio is returned (non-streaming only): url for a link to the audio, hex for hex-encoded audio data.

Available options:
url,
hex
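
With the `hex` option, the audio needs to be decoded before it can be written to a file. This is a minimal sketch assuming the audio arrives as a plain hexadecimal string. This page does not show where that string appears in the task result, so the helper `decode_hex_audio` takes the string directly.

```python
def decode_hex_audio(audio_hex: str) -> bytes:
    """Convert a hex-encoded audio string into raw bytes."""
    # bytes.fromhex raises ValueError on non-hex characters or odd length.
    return bytes.fromhex(audio_hex.strip())


# "494433" is the hex form of the ASCII text "ID3", used here only as sample input.
sample = decode_hex_audio("494433")
print(sample)

# Writing to disk would then be a plain binary write, for example:
# with open("speech.mp3", "wb") as f:
#     f.write(sample)
```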
pronunciation_dict
object

Custom pronunciation dictionary for text replacement.

normalization_setting
object

Loudness normalization settings for the audio.

voice_modify
object

Voice modification settings to adjust pitch, intensity, and timbre.

Response

202 - application/json

Accepted - Task created successfully

task_info
object
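
The `task_info` object in the 202 response can be parsed into a typed structure. The sketch below uses the example response from this page as input. The `TaskInfo` class and `parse_task_info` function are hypothetical names, and only the four fields shown in that example are assumed to exist.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaskInfo:
    id: str
    status: str
    created_at: datetime
    updated_at: datetime


def _parse_timestamp(value: str) -> datetime:
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_task_info(body: str) -> TaskInfo:
    """Extract task_info from a 202 response body."""
    info = json.loads(body)["task_info"]
    return TaskInfo(
        id=info["id"],
        status=info["status"],
        created_at=_parse_timestamp(info["created_at"]),
        updated_at=_parse_timestamp(info["updated_at"]),
    )


example_body = """
{
  "task_info": {
    "id": "8e1e315e-b50d-4334-a231-be7d19a372f4",
    "status": "pending",
    "created_at": "2026-03-03T00:00:00Z",
    "updated_at": "2026-03-03T00:00:00Z"
  }
}
"""
task = parse_task_info(example_body)
print(task.id, task.status)
```

A `pending` status indicates the task was accepted but not yet finished; how the finished audio is retrieved is not covered on this page.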