I’m building a Pickaxe called Ask Coach Kelly for baseball and softball players. Here’s the shape of it, and I’d love guidance on what this is realistically going to cost me to run.
How it works: A user enters their results from a series of physical performance tests. Coach Kelly responds with (1) where they rank compared to peers and elite athletes, and (2) the specific exercises and drills they should prioritize. It’s mostly a single-turn interaction (prompt in, coaching response out), though some users may follow up with clarifying questions.
Knowledge base: Roughly 25–30 documents, around 5 pages each. All text. This is where the ranking benchmarks and drill recommendations live.
Volume projections:
- Total addressable base: ~2,000 teams × 15 players = ~30,000 athletes
- Realistic paid conversion: ~10% = ~3,000 paid members
- Usage frequency: roughly 3 sessions per user per month
- So I’m budgeting for ~9,000 sessions/month at steady state, ramping up from a much smaller start with a few teams
- Each session is one substantial response, maybe a short follow-up exchange
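In case it helps to check my math, here is a minimal sketch of the arithmetic behind those projections. The variable names are just my own labels, and every input is one of my estimates above, not measured data.

```python
# Volume funnel from my projections; all inputs are rough estimates.
teams = 2_000
players_per_team = 15
paid_conversion = 0.10            # ~10% of athletes become paid members
sessions_per_user_per_month = 3

athletes = teams * players_per_team
paid_members = round(athletes * paid_conversion)
sessions_per_month = paid_members * sessions_per_user_per_month

print(f"athletes: {athletes:,}")
print(f"paid members: {paid_members:,}")
print(f"sessions/month: {sessions_per_month:,}")
```

If conversion or usage frequency turns out different, the session count scales linearly with either one.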
Business model: Bundled into an existing $20/month membership, not sold standalone.
Model: Open to recommendations. The documents carry the substantive content, so I don’t think I need a top-tier model — I’d rather optimize for cost as long as quality stays solid.
What I’m trying to figure out:
- At ~9,000 sessions/month with a 25–30 document knowledge base, what’s a realistic monthly cost range? A back-of-the-envelope estimate is fine; I just need to know if I’m looking at $50/month, $500/month, or $5,000/month.
- Which model would you recommend for this use case to keep costs in check without sacrificing coaching quality?
- Does the size of the knowledge base meaningfully drive per-session cost, or is it mostly the response length and model choice that matter?
- Do you offer volume pricing, a usage estimator, or any case studies from other high-volume builds I could reference?
- Anything I should design into the Pickaxe from the start to keep costs down at scale (response length caps, retrieval settings, etc.)?
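To put the three cost scenarios in context, here is a rough sketch of what each monthly total would mean per session and per paid member. It uses only the figures from this post (9,000 sessions, 3,000 members, the $20 membership); the function names are my own, and none of this reflects actual platform or model pricing.

```python
# What each hypothetical monthly bill implies, using my own projections.
SESSIONS_PER_MONTH = 9_000
PAID_MEMBERS = 3_000
MEMBERSHIP_PRICE = 20.00  # USD per member per month


def cost_per_session(monthly_cost: float) -> float:
    """Average spend per session at a given monthly bill."""
    return monthly_cost / SESSIONS_PER_MONTH


def share_of_membership(monthly_cost: float) -> float:
    """Fraction of each member's monthly fee consumed by the bill."""
    return (monthly_cost / PAID_MEMBERS) / MEMBERSHIP_PRICE


for monthly_cost in (50, 500, 5_000):
    print(
        f"${monthly_cost:>5}/month -> "
        f"${cost_per_session(monthly_cost):.4f} per session, "
        f"{share_of_membership(monthly_cost):.2%} of the membership fee"
    )
```

Knowing which of these per-session figures is closest to reality would tell me whether the feature fits comfortably inside the existing membership price.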
Appreciate any guidance you can offer. I want to go into this with eyes open before we roll it out broadly.
