TL;DR
- We now measure user-facing latency as the full window from send to the end of generation, so if your dashboards look slower, that is partly because this measurement window is broader than before.
- We’re actively improving Pickaxe-side latency, especially in message construction and RAG paths.
Intro
Hello!
I wanted to put together this post to walk through some of the work we’ve been doing to make Pickaxe faster. As many of you know, we’re putting together a big product improvement and launch this week. But at the same time, we’ve seen many folks struggling with slow speeds on their tools.
As a result, over the past few weeks, we’ve started to implement new tracking and fixes to make Pickaxe lightning fast. The changes we’ve made so far are just the very beginning of what we’re planning.
In the interest of transparency, I wanted to share some internal data on latency and build in public as we improve the numbers. Consider this post a baseline to benchmark future progress against. I expect to follow up in two weeks or so and show the results of the many improvements we’ll be making.
Measurement Changes
Some of you may have checked your Message Insights panel recently and seen something like this:
This is because we have adjusted the generation time metric: it now measures from the moment the user clicks send to the moment the message finishes generating. This change makes it look as if these times have gone up, but they haven’t. We’re now simply showing the entire window until the last token is printed.
We did this because many folks were unable to tell how long action generations were taking.
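To make the new measurement window concrete, here is a minimal sketch of the difference between time to first token and the full send-to-last-token window we now report. This is illustrative TypeScript, not Pickaxe’s actual instrumentation, and `streamCompletion` is a hypothetical streaming API:

```typescript
// Hypothetical streaming API, declared only so the sketch is self-contained.
declare function streamCompletion(prompt: string): AsyncIterable<string>;

async function measureLatency(prompt: string) {
  const sendTime = performance.now(); // the user clicks send

  let firstTokenTime: number | null = null;
  for await (const _token of streamCompletion(prompt)) {
    if (firstTokenTime === null) {
      firstTokenTime = performance.now(); // time to first token (TTFT)
    }
  }
  const lastTokenTime = performance.now(); // generation finished

  return {
    // The narrower view: how long until something appears on screen.
    ttftMs: (firstTokenTime ?? lastTokenTime) - sendTime,
    // What the panel now reports: the full window, send to last token.
    totalMs: lastTokenTime - sendTime,
  };
}
```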
Current Pickaxe Side Latency
Currently, Pickaxe accounts for 75% of the wait time before the user starts seeing a result.
On average, Pickaxe currently adds a little under 5 seconds of wait time. That’s down from 7 seconds last week, before we started making improvements. My goal is to get that number down to 2 seconds by the end of the month.
That lag mostly comes from our RAG system, and we’ll be making big improvements there in the near future to make it faster.
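For context on why retrieval adds wait time before the first token appears, here is a generic sketch of a typical RAG request path. This is not Pickaxe’s actual pipeline; every function name below is hypothetical:

```typescript
// Hypothetical service calls, declared only so the sketch is self-contained.
declare function embedQuery(question: string): Promise<number[]>;
declare function vectorSearch(vector: number[]): Promise<string[]>;
declare function streamModel(prompt: string): Promise<void>;

async function answerWithRag(question: string) {
  const queryVector = await embedQuery(question);    // embedding round trip
  const documents = await vectorSearch(queryVector); // vector store lookup
  const prompt = ["Context:", ...documents, `Question: ${question}`].join("\n");
  await streamModel(prompt); // the model's TTFT clock starts only here
}
```

Everything before the model call happens while the user is staring at an empty response, which is why the RAG path is where the biggest wins are.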
Current Model TTFT Latency
The model providers also slow things down. On average, they account for 25% of the current wait, about 2 seconds. The issue is not that they’re slow on average, but that the variability is very high. Some model providers respond in an average of 2 seconds, but still sometimes take 10 seconds when they’re overloaded. You can learn more here about provider latency and TTFT.
This sample chart shows how much higher the latency and variability are with Gemini 3 Pro compared to other models.
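To see why averages hide this problem, here’s a toy comparison of mean versus p95 TTFT. The sample numbers are made up for illustration:

```typescript
// Made-up TTFT samples (ms): mostly fast, with one overloaded spike.
const ttftSamplesMs = [1400, 1300, 1500, 1200, 1600, 1400, 1300, 1500, 10000];

const mean =
  ttftSamplesMs.reduce((sum, t) => sum + t, 0) / ttftSamplesMs.length;

const sorted = [...ttftSamplesMs].sort((a, b) => a - b);
const p95 = sorted[Math.ceil(0.95 * sorted.length) - 1];

console.log(`mean TTFT: ${(mean / 1000).toFixed(1)}s`); // ~2.4s
console.log(`p95 TTFT:  ${(p95 / 1000).toFixed(1)}s`);  // 10.0s
```

A provider with a roughly 2-second mean can still hand some users a 10-second wait, and that tail is what people actually feel.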
I hope this was a helpful dive into our work on latency. We’re making Pickaxe faster every day!