I am so pumped that the new Grok 4.20 API is here! I've been testing 4.20 in the Grok app for weeks and it is crazy smart, and fast. I was shocked to learn that tokens will be priced like Grok 4.1!
What?
Also, there's no output limit on the API. What?!
For real though — xAI kept the exact same pricing as Grok 4.1, and that is the biggest cost advantage here:
- $2 per million input tokens
- $6 per million output tokens
- Cached input at just $0.20
You’re getting a full flagship model with a 2,000,000 token context window (same massive size as 4.1) at the old price. That’s twice the context window for a fraction of the price of Claude.
Plus, unlike Claude, there is no output token cap other than the context window itself.
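To put those rates in concrete terms, here's a quick back-of-the-envelope cost calculator. The per-million-token prices come straight from the bullets above; the token counts in the example are made-up numbers, not real usage data:

```python
# Rates per million tokens, from the Grok 4.20 pricing listed above.
INPUT_RATE = 2.00          # $ per 1M input tokens
OUTPUT_RATE = 6.00         # $ per 1M output tokens
CACHED_INPUT_RATE = 0.20   # $ per 1M cached input tokens

def estimate_cost(input_tokens, output_tokens, cached_input_tokens=0):
    """Estimated dollar cost for a batch of requests at the listed rates."""
    return (
        input_tokens / 1_000_000 * INPUT_RATE
        + output_tokens / 1_000_000 * OUTPUT_RATE
        + cached_input_tokens / 1_000_000 * CACHED_INPUT_RATE
    )

# Hypothetical month: 10M input, 2M output, 5M cached input tokens.
print(f"${estimate_cost(10_000_000, 2_000_000, 5_000_000):.2f}")  # → $33.00
```

Twenty dollars of input, twelve of output, one of cached input — $33 for a workload that would cost several times that on most flagship models.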
New abilities that make it worth switching:
- Lightning-fast inference (noticeably snappier than 4.1)
- Lowest hallucination rate xAI has ever shipped + super strict prompt adherence
- Improved agentic tool calling (web search, code execution, etc.)
- Dedicated reasoning and non-reasoning variants so you can tune for speed vs depth
But the real standout is the built-in Multi-Agent mode.
How the Multi-Agent API actually works:
There’s a dedicated model called grok-4.20-multi-agent-beta-0309 (aliases: grok-4.20-multi-agent or grok-4.20-multi-agent-latest).
Instead of one model answering, it spins up multiple specialized agents that collaborate in real time:
- Default (low/medium reasoning.effort) → 4 agents
- High effort (high/xhigh reasoning.effort) → up to 16 agents
The agents have different roles (research, logic, critique, synthesis) and literally debate each other internally before giving you the final output. This built-in collaboration is what crushes hallucinations and handles complex, multi-step tasks way better than a single model.
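The effort-to-agent-count mapping above can be sketched as a tiny helper. Note the hedge: the post only says "up to 16" agents at high effort, so returning exactly 16 is my assumption, not documented behavior:

```python
# Agent counts per reasoning effort for the multi-agent model, per the
# description above. Low/medium effort runs the default 4-agent pool;
# high/xhigh runs "up to 16" -- returning 16 here is an assumption.
def agent_count(effort: str) -> int:
    if effort in ("low", "medium"):
        return 4           # default pool of specialized agents
    if effort in ("high", "xhigh"):
        return 16          # upper bound of the high-effort pool
    raise ValueError(f"unknown reasoning effort: {effort!r}")
```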
You don’t need external frameworks like AutoGen or LangGraph — just switch the model name in your API call and (optionally) tweak the reasoning.effort parameter. It’s that simple.
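Here's roughly what that "just switch the model name" call looks like. This is a minimal sketch that assumes xAI's OpenAI-compatible chat-completions wire format: the model alias and the reasoning.effort knob come from the post, but the endpoint URL and the exact field name (`reasoning_effort`) are my assumptions — check the docs linked below before relying on them:

```python
import json

# Assumed OpenAI-compatible endpoint; verify against the xAI docs.
API_URL = "https://api.x.ai/v1/chat/completions"

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build the JSON body for a multi-agent call; only the model name changes."""
    return {
        "model": "grok-4.20-multi-agent",  # alias from the post
        "reasoning_effort": effort,         # low/medium -> 4 agents, high/xhigh -> up to 16
        "messages": [{"role": "user", "content": prompt}],
    }

body = build_request("Compare three caching strategies for a read-heavy API.")
print(json.dumps(body, indent=2))
# Send with your HTTP client of choice, e.g.:
#   requests.post(API_URL, json=body,
#                 headers={"Authorization": f"Bearer {XAI_API_KEY}"})
```

Dropping back to a single-model call is the same body with a different `model` string — no orchestration code to rip out.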
(Direct docs: Models and Pricing | xAI)
