Grok 4.20 API Just Dropped – Same Pricing as 4.1 + Native Multi-Agent Mode!

I am so pumped that the new Grok 4.20 API is here! I’ve been testing 4.20 in the Grok app for weeks, and it is crazy smart and fast. I was shocked to learn that tokens will be priced like Grok 4.1!

What?

Also, there’s no output limit on the API. What?!

For real though — xAI kept the exact same pricing as Grok 4.1, and that is the biggest cost advantage here:

  • $2 per million input tokens
  • $6 per million output tokens
  • Cached input at just $0.20 per million tokens

You’re getting a full flagship model with a 2,000,000-token context window (the same massive size as 4.1) at the old price. That’s twice Claude’s context window at a fraction of the price.

Plus, unlike Claude, there is no output token cap beyond the context window itself.
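To make the savings concrete, here’s a tiny cost estimator using the rates quoted above ($2/M input, $6/M output, $0.20/M cached input). All the names here are my own for illustration — this isn’t from any xAI SDK:

```python
# Per-million-token rates quoted in the post above (USD).
RATES = {
    "input": 2.00,         # uncached input tokens
    "output": 6.00,        # output tokens
    "cached_input": 0.20,  # cached input tokens
}

def estimate_cost(input_tokens, output_tokens, cached_tokens=0):
    """Return the estimated USD cost of one request at the quoted rates."""
    uncached = input_tokens - cached_tokens
    return (
        uncached * RATES["input"]
        + output_tokens * RATES["output"]
        + cached_tokens * RATES["cached_input"]
    ) / 1_000_000

# e.g. a 100k-token prompt (half of it cache hits) with a 10k-token reply:
print(f"${estimate_cost(100_000, 10_000, cached_tokens=50_000):.4f}")  # → $0.1700
```

Seventeen cents for a 100k-token request shows why the cached-input rate matters so much for long-context work.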

New abilities that make it worth switching:

  • Lightning-fast inference (noticeably snappier than 4.1)
  • Lowest hallucination rate xAI has ever shipped + super strict prompt adherence
  • Improved agentic tool calling (web search, code execution, etc.)
  • Dedicated reasoning and non-reasoning variants so you can tune for speed vs depth

But the real standout is the built-in Multi-Agent mode.

How the Multi-Agent API actually works:

There’s a dedicated model called grok-4.20-multi-agent-beta-0309 (aliases: grok-4.20-multi-agent or grok-4.20-multi-agent-latest).

Instead of one model answering, it spins up multiple specialized agents that collaborate in real time:

  • Default (low/medium reasoning.effort) → 4 agents
  • High effort (high/xhigh reasoning.effort) → up to 16 agents

The agents have different roles (research, logic, critique, synthesis) and literally debate each other internally before giving you the final output. This built-in collaboration is what crushes hallucinations and handles complex, multi-step tasks way better than a single model.

You don’t need external frameworks like AutoGen or LangGraph — just switch the model name in your API call and (optionally) tweak the reasoning.effort parameter. It’s that simple.
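For anyone who wants to see what that looks like, here’s a sketch of the request body, assuming xAI’s usual OpenAI-compatible chat completions format. The model alias comes from this post; the exact wire spelling of the effort field (I’m guessing `reasoning_effort`) is my assumption — check the docs before relying on it:

```python
import json

# Hypothetical request body for Multi-Agent mode. Only the model name
# changes vs. a normal call, plus the optional effort setting.
payload = {
    "model": "grok-4.20-multi-agent-latest",  # alias from the post
    "messages": [
        {"role": "user", "content": "Summarize the tradeoffs of RAG vs fine-tuning."}
    ],
    # low/medium -> 4 agents, high/xhigh -> up to 16 (per the post);
    # field name is my assumption, not confirmed against the API reference.
    "reasoning_effort": "high",
}

# You'd POST this to the chat completions endpoint with your API key;
# shown here only as the serialized request body:
print(json.dumps(payload, indent=2))
```

The nice part is that any existing OpenAI-compatible client should work unchanged — the multi-agent orchestration happens entirely server-side.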

(Direct docs: Models and Pricing | xAI)

How does Grok 4.1 or Grok 4.2 compare to Gemini 3 Flash or Gemini 3 Pro? Have you tested these?

I haven’t tested 4.20 against Gemini 3 in Pickaxe yet. But I did test Grok 4.1 Fast Reasoning against Gemini 3.0 Pro. Grok generally has far fewer hallucinations than Gemini, but Gemini 3 Pro is smarter than 4.1 at reasoning tasks. Grok 4.1 Fast Reasoning has the bigger 2M context window and is a lot cheaper, so I tend to use it as my go-to. But the biggest limitation with 4.1 is that Pickaxe limits the output tokens for some reason, making it unusable for longer projects.

I currently use Gemini for use cases where hallucinations are less of a factor, but I may switch those over to Grok to save money if 4.20 can compete.

There is a very narrow band for Gemini. If I really want intelligence and reasoning, I’ll go up to Claude Opus 4.6. If I need low cost, a big context window, and low hallucinations, I go with Grok 4.1 Fast Reasoning. Gemini is for when I want something in between those two.

One more thing about 4.20: it’s still in active development, so it is improving every week. In my experience, OpenAI models get worse over time as the horde of free users poisons the RL for the model. I’m not sure if that happens with Grok and Gemini.