Alibaba

Qwen3.6 Max Preview

"vendor": "alibaba", "model_name": "qwen3.6-max-preview"

Preview of the Qwen3.6 Max flagship model, with enhanced reasoning, a 256K context window, and tiered, cache-aware pricing.

README for Qwen3.6 Max Preview

Key Features

256K context window: process up to 256K tokens per request

Cache-aware billing: implicit cache automatically saves 80% on repeated prefixes (hits are billed at 20% of the input price); explicit cache (cache_control) costs 125% of the input price to create but saves 90% on subsequent hits (billed at 10%)

Tiered pricing: $1.30 input / $7.80 output for contexts under 128K, with rates scaling for longer contexts

OpenAI-compatible API: drop-in replacement using the chat completions format

Context Cache

Two caching modes reduce costs on repeated input prefixes:

| | Explicit Cache | Implicit Cache |
|---|---|---|
| Setup | Add "cache_control": {"type": "ephemeral"} | Automatic, no config |
| Cache creation cost | 125% of input price | 100% of input price |
| Cache hit cost | 10% of input price | 20% of input price |
| Min tokens | 1024 | 256 |
| TTL | 5 min (resets on hit) | Not guaranteed |
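To see when the 125% creation surcharge pays off, the multipliers listed above can be plugged into a small calculation. This is a minimal sketch, not an official pricing tool: the function name `prefix_cost` is hypothetical, costs are expressed relative to sending the prefix once at the normal input price, and it assumes the first request creates the cache and every later request hits it (which the implicit cache does not guarantee).

```python
# Billing multipliers taken from the comparison above,
# as fractions of the normal input price.
EXPLICIT = {"create": 1.25, "hit": 0.10}
IMPLICIT = {"create": 1.00, "hit": 0.20}


def prefix_cost(mode, n_requests, base_cost=1.0):
    """Relative cost of sending the same prefix n_requests times.

    Assumption: request 1 creates the cache, requests 2..n all hit it.
    base_cost is the uncached cost of the prefix (any unit you like).
    """
    if n_requests < 1:
        return 0.0
    return base_cost * (mode["create"] + mode["hit"] * (n_requests - 1))


if __name__ == "__main__":
    for n in range(1, 6):
        explicit = prefix_cost(EXPLICIT, n)
        implicit = prefix_cost(IMPLICIT, n)
        print(f"{n} requests: explicit={explicit:.2f} implicit={implicit:.2f}")
```

Under these assumptions the explicit cache becomes the cheaper option from the fourth request onward; for one to three requests the implicit cache costs less, provided it actually hits.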

Explicit cache supports up to 4 markers per request. Cache is created after the model responds and requires at least 1024 tokens per block.
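A client-side check can catch requests that exceed the marker limit before they are sent. The sketch below is illustrative only: `count_cache_markers` and `check_markers` are hypothetical helper names, and it assumes messages follow the chat completions shape used in this README. It does not check the 1024-token minimum, since that would require the model's tokenizer.

```python
MAX_MARKERS = 4  # from the text: up to 4 markers per request


def count_cache_markers(messages):
    """Count content blocks that carry a "cache_control" key."""
    count = 0
    for message in messages:
        content = message.get("content")
        # Plain-string content cannot carry a marker; only block lists can.
        if isinstance(content, list):
            count += sum(
                1
                for block in content
                if isinstance(block, dict) and "cache_control" in block
            )
    return count


def check_markers(messages):
    """Return the marker count, or raise if it exceeds the documented limit."""
    n = count_cache_markers(messages)
    if n > MAX_MARKERS:
        raise ValueError(f"{n} cache markers found; at most {MAX_MARKERS} are supported")
    return n
```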

Explicit Cache Usage

Add "cache_control": {"type": "ephemeral"} to a content block. The system searches backward up to 20 content blocks from each marker to find a matching cache.

First request — create the cache:

[
  {
    "role": "system",
    "content": [
      {
        "type": "text",
        "text": "Long system prompt (1024+ tokens)...",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }
]

Cache block is created after the model responds. TTL starts at 5 minutes.
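The TTL behaviour (5 minutes, reset on every hit) can be modelled with a few lines of code. This is a simplified sketch under stated assumptions, not the service's actual logic: `classify_requests` is a hypothetical name, every request is assumed to carry the same marked prefix, response latency is ignored (the cache is really created after the model responds), and a request arriving exactly at expiry is treated as a miss.

```python
TTL_SECONDS = 5 * 60  # from the text: TTL starts at 5 minutes


def classify_requests(timestamps):
    """Label each request time (seconds, ascending) as "create" or "hit".

    A hit resets the TTL; a miss creates a fresh cache block.
    """
    labels = []
    expires_at = None
    for t in timestamps:
        if expires_at is not None and t < expires_at:
            labels.append("hit")
        else:
            labels.append("create")
        # Both creation and a hit start a new 5-minute window.
        expires_at = t + TTL_SECONDS
    return labels
```

For example, requests at 0 s, 200 s, 450 s and 800 s would be labelled create, hit, hit, create: the third request still hits because the second one reset the window, while the fourth arrives after the window opened at 450 s has closed.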

Second request — hit the cache and extend it:

[
  {"role": "system", "content": "Same system prompt..."},
  {"role": "assistant", "content": "Previous response..."},
  {
    "role": "user",
    "content": [
      {
        "type": "text",
        "text": "Follow-up question...",
        "cache_control": {"type": "ephemeral"}
      }
    ]
  }
]

If there are no more than 20 content blocks between the new marker and the cached prefix, the previous cache is hit (billed at 10% of the input price) and its TTL resets. A new, longer cache block is also created, covering the full context.
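In a multi-turn conversation, the marker is typically moved to the newest message on each request, as in the second example above. The helper below is one possible way to do that and is an assumption, not part of the API: `with_cache_marker` is a hypothetical name, and it expects a non-empty message list whose last message has either string content or a non-empty list of content blocks.

```python
import copy


def with_cache_marker(messages):
    """Return a copy of messages with an ephemeral cache marker
    on the last content block of the final message."""
    marked = copy.deepcopy(messages)  # leave the caller's list untouched
    last = marked[-1]
    content = last["content"]
    # Plain-string content must become a block list to carry a marker.
    if isinstance(content, str):
        content = [{"type": "text", "text": content}]
    content[-1]["cache_control"] = {"type": "ephemeral"}
    last["content"] = content
    return marked


history = [
    {"role": "system", "content": "Same system prompt..."},
    {"role": "assistant", "content": "Previous response..."},
    {"role": "user", "content": "Follow-up question..."},
]
request_messages = with_cache_marker(history)
```

Keeping the stored history free of markers and adding one only when building each request keeps the marker count at one per request, well under the limit of 4.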
