Overview
Qwen3.6 Max Preview is a preview release of the next-generation Qwen3.6 Max flagship model. It features a 256K context window with tiered pricing and cache-aware billing, delivering enhanced reasoning and code generation capabilities.
Key Features
- 256K context window — Process up to 256K tokens per request
- Cache-aware billing — Implicit cache saves 80% on repeated prefixes automatically; explicit cache (
cache_control) costs 125% to create but saves 90% on subsequent hits
- Tiered pricing — $1.30 input / $7.80 output under 128K, scaling for longer contexts
- OpenAI-compatible API — Drop-in replacement using chat completions format
Context Cache
Two caching modes reduce costs on repeated input prefixes:
| Explicit Cache | Implicit Cache |
|---|
| Setup | Add "cache_control": {"type": "ephemeral"} | Automatic, no config |
| Cache creation cost | 125% of input price | 100% of input price |
| Cache hit cost | 10% of input price | 20% of input price |
| Min tokens | 1024 | 256 |
| TTL | 5 min (resets on hit) | Not guaranteed |
Explicit cache supports up to 4 markers per request. Cache is created after the model responds and requires at least 1024 tokens per block.
Explicit Cache Usage
Add "cache_control": {"type": "ephemeral"} to a content block. The system searches backward up to 20 content blocks from each marker to find a matching cache.
First request — create the cache:
[{
"role": "system",
"content": [{
"type": "text",
"text": "Long system prompt (1024+ tokens)...",
"cache_control": {"type": "ephemeral"}
}]
}]
Cache block is created after the model responds. TTL starts at 5 minutes.
Second request — hit the cache and extend it:
[
{"role": "system", "content": "Same system prompt..."},
{"role": "assistant", "content": "Previous response..."},
{"role": "user", "content": [{
"type": "text",
"text": "Follow-up question...",
"cache_control": {"type": "ephemeral"}
}]}
]
If there are no more than 20 content blocks between the new marker and the cached prefix, the previous cache is hit (billed at 10%) and its TTL resets. A new, longer cache block is also created covering the full context.
Pricing
Per Million Tokens
0 < Tokens <= 128K
Input Price: $1.30
Output Price: $7.80
128K < Tokens <= 256K
Input Price: $2.00
Output Price: $12.00
Parameters
No input schema available.