I was paying real money for Gemini API calls and did not notice until the invoice arrived. Not a lot of money, but enough to look stupid. The chat backend on the Nexus AI Terminal hits Gemini as one of the rotation models. Every visitor session that ended up routed there was costing fractions of a cent, adding up to actual dollars. The fix was straightforward once I understood what was happening. It was also not just "use the free model."
How I was paying without realizing
The API key lived in a Cloudflare Worker secret called GEMINI_API_KEY. I had generated it in Google AI Studio years ago, when I first started messing with Gemini. At some point along the way I had clicked through the upgrade flow because I wanted to test image generation, which at the time required a billing-enabled project. After image gen moved to Replicate I never went back to turn billing off on the project. The key kept working. The project kept billing. The model in the Worker code was gemini-2.0-flash, which is technically available on the free tier, but a key from a billing-enabled project bypasses the free quota and goes straight to paid usage.
The number was small enough to not show up as alarming on the credit card. It was the kind of charge that says "Google Cloud, $1.42" and you assume it is some forgotten cron job. It was the chat backend. Every time someone typed at Nexus and got routed to NEXUS-6 (which is the Gemini slot in the model rotation), I paid for it.
The actual two part fix
Part one: change the model string in the Worker.
The Worker code calls generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent. The migration from gemini-2.0-flash to gemini-2.5-flash is a one character change in a string literal: the 0 becomes a 5. Same request body shape. Same response shape. node --check passed and the deploy went out in a few seconds via the Cloudflare GitHub Action.
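For reference, here is a rough sketch of what a call like that can look like. It is not the actual Nexus Worker code. The names GEMINI_MODEL, buildGeminiRequest, and callGemini are made up for illustration, and sending the key in the x-goog-api-key header is an assumption (the post does not say how the Worker passes it).

```javascript
// Illustrative sketch only. GEMINI_MODEL, buildGeminiRequest and callGemini
// are hypothetical names, not taken from the real Worker.
const GEMINI_MODEL = "gemini-2.5-flash"; // previously "gemini-2.0-flash"

// Build the URL and fetch options for a generateContent call.
function buildGeminiRequest(model, apiKey, prompt) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    encodeURIComponent(model) +
    ":generateContent";
  const init = {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Assumption: key sent as a header rather than a ?key= query parameter.
      "x-goog-api-key": apiKey,
    },
    body: JSON.stringify({
      contents: [{ role: "user", parts: [{ text: prompt }] }],
    }),
  };
  return { url, init };
}

// Send the prompt and return the first text part of the first candidate.
async function callGemini(env, prompt) {
  const { url, init } = buildGeminiRequest(GEMINI_MODEL, env.GEMINI_API_KEY, prompt);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error("Gemini returned HTTP " + res.status);
  }
  const data = await res.json();
  return data.candidates?.[0]?.content?.parts?.[0]?.text ?? "";
}
```

In a layout like this the model is a constant in the code, while the key arrives from the environment. They are changed in different places, which is why it is easy to update one and forget the other.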
Part two, and the actually important one: change the key.
I went into AI Studio, created a new project, did not enable billing, generated a fresh API key. Then I rotated it onto the Worker via wrangler secret put GEMINI_API_KEY from my laptop. The old key, the one that was on the billing enabled project, I revoked.
If I had only done part one and not part two, the Worker would still be using a key from the billing-enabled project, and every request to the free tier model would still have been billed. That is the trap. The model string controls quality and rate limits. The key controls whether you are billing or free.
What the free tier actually gives you
For Gemini 2.5 Flash on a non billing AI Studio project, current as of when this post went up:
- 10 requests per minute
- 250 requests per day
- 250,000 input tokens per minute
- No charge as long as the project does not have billing enabled
For a personal AI terminal that handles maybe 30 chat sessions on a good day, 250 requests per day is plenty. If a single visitor sends 40 messages, that uses up 16 percent of the daily budget. The rotation in Nexus also has Groq llama 3 and Hugging Face models, so Gemini takes maybe a quarter of total traffic. The free quota covers it.
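The budget math is easy to sanity check. This is a back-of-the-envelope sketch: the 250 per day limit, the 30 sessions, and the one-quarter Gemini share come from the numbers above, while the average of 10 messages per session is an assumed figure picked only for illustration.

```javascript
// Free tier daily request limit quoted above for Gemini 2.5 Flash.
const DAILY_REQUEST_LIMIT = 250;

// Percentage of the daily limit that a given number of requests uses.
function shareOfDailyBudget(requests, limit = DAILY_REQUEST_LIMIT) {
  return (requests * 100) / limit;
}

// Rough count of Gemini requests per day given the rotation share.
function estimateGeminiRequests(sessionsPerDay, messagesPerSession, geminiShare) {
  return sessionsPerDay * messagesPerSession * geminiShare;
}

// One heavy visitor sending 40 messages, all routed to Gemini.
console.log(shareOfDailyBudget(40)); // 16

// Assumption: 10 messages per session on average (not a measured number).
const typicalDay = estimateGeminiRequests(30, 10, 0.25);
console.log(typicalDay, shareOfDailyBudget(typicalDay)); // 75 30
```

Under that assumption a normal day uses well under half of the quota, which leaves room for the occasional heavy visitor.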
Quality difference, 2.0 to 2.5
I expected this swap to be invisible. It is mostly invisible. A few small differences that came out after a week of real use:
- 2.5 Flash is tighter. Fewer trailing "in summary" paragraphs at the end of answers. The replies feel like the model knows when to stop.
- 2.5 is slightly more cautious on edge content. The Nexus Unfiltered mode prompt did not change, but the model now hedges in a few places where 2.0 just answered. Not a deal breaker, but real.
- Coding answers are noticeably better. The Coder mode on Nexus now returns code that compiles more often without follow ups.
- Latency to first token feels the same, maybe a hair slower. The total response time is about the same because 2.5 writes shorter replies.
The biggest visible change is that 2.5 sometimes shows a brief "thinking" pause before the first character lands. The model is doing internal chain of thought before producing output. Other Gemini 2.5 wrappers expose this with a thinking budget setting. The default is fine for a chat use case, and the latency cost is small.
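For anyone who does want to tune it: as far as I know, the Gemini REST API takes the thinking budget under generationConfig.thinkingConfig.thinkingBudget in the request body, and on 2.5 Flash a budget of 0 turns thinking off. A minimal sketch of adding it to an existing request body follows. The helper name withThinkingBudget is hypothetical, and the field path is worth checking against the current API reference before relying on it.

```javascript
// Hypothetical helper: returns a copy of a generateContent request body
// with a thinking budget set, leaving the original object untouched.
function withThinkingBudget(body, thinkingBudget) {
  return {
    ...body,
    generationConfig: {
      // Keep any settings that were already there (temperature, etc.).
      ...(body.generationConfig ?? {}),
      // Assumed field path: generationConfig.thinkingConfig.thinkingBudget.
      thinkingConfig: { thinkingBudget },
    },
  };
}

const baseBody = {
  contents: [{ role: "user", parts: [{ text: "Explain DNS in two sentences." }] }],
  generationConfig: { temperature: 0.7 },
};

// Budget of 0 is intended to disable thinking for lower first-token latency.
const noThinking = withThinkingBudget(baseBody, 0);
console.log(JSON.stringify(noThinking.generationConfig));
```

For a chat terminal the default is a reasonable trade, so this is only worth reaching for if the pause before the first character becomes noticeable.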
Lessons that stick
Three things from this that I will not forget for a while.
Check the billing tab. If you have ever clicked "enable billing" on a Google Cloud project to test something, go look right now. The project keeps billing until you turn it off. The fact that you forgot is not a defense.
The model and the key are separate dials. "Use the free model" is half the story. The key tells the API which billing relationship to apply. A free model on a billing enabled key is still billed usage.
Rotation buys you a free tier escape hatch. Because Nexus already had Groq and Hugging Face in the rotation, the Gemini slot going from paid to free was a configuration change, not a feature loss. If Gemini quota runs out midday, the next request just hits Groq. The user does not notice. Building this kind of provider rotation is more annoying up front than picking one provider and committing, but it pays back the first time a provider has an outage or starts charging.
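The fallback idea can be sketched in a few lines. This is not the Nexus rotation code. It is a simplified variant that retries within a single request, and the provider objects with a name and an async ask(prompt) method are a hypothetical shape chosen for illustration.

```javascript
// Try each provider in order and return the first successful answer.
// Each provider is assumed to look like { name, ask(prompt) => Promise<string> }.
async function askWithFallback(providers, prompt) {
  const errors = [];
  for (const provider of providers) {
    try {
      const text = await provider.ask(prompt);
      return { provider: provider.name, text };
    } catch (err) {
      // Quota exhausted, outage, timeout: note it and move to the next one.
      errors.push(provider.name + ": " + err.message);
    }
  }
  throw new Error("All providers failed: " + errors.join("; "));
}

// Stand-in providers for demonstration; real ones would call the APIs.
const demoProviders = [
  { name: "gemini", ask: async () => { throw new Error("HTTP 429 quota exhausted"); } },
  { name: "groq", ask: async (prompt) => "echo: " + prompt },
];

askWithFallback(demoProviders, "hello").then((result) => {
  console.log(result.provider, result.text); // groq echo: hello
});
```

The design choice here is that a failed provider costs one extra round trip rather than a visible error, which is the property that makes a quota limit tolerable.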
The Worker change shipped to GitHub a few minutes before this post went up. The terminal at thyfwxit.com/nexus is using the new free model right now. The April through May Gemini line item should drop to zero next billing cycle.