The Nexus AI Terminal backend used to run on Render's free tier. It now runs on a Cloudflare Worker. Same endpoints. Same AI providers. The move cut my cold start latency to effectively zero and my warm call latency from around 1200ms to under 100ms. Same price (free on both, though Cloudflare's free tier is meaningfully more generous). Here is what the migration looked like and where it hurt.

The problem with Render free tier

Render is a good service. The free tier just has one fact you have to know up front: your web service goes to sleep after 15 minutes of inactivity. When the next request hits a sleeping service, Render spins it back up. That takes 30 seconds or longer.

For a personal portfolio with light traffic, this is brutal. A visitor lands on the page, types into the AI chat, and waits 30 seconds or more for the first reply. They assume the site is broken. They leave. The actual cold start time depends on the runtime; the Python FastAPI server I had running took around 35 to 50 seconds.

I tried the obvious workarounds. A scheduled keepalive ping every 14 minutes was the first attempt. It worked in the sense that the service stayed warm, but Cloudflare's Bot Fight Mode kept tripping on the keepalive: the request was synthetic, the User-Agent looked like curl, and the timing was perfectly periodic. After a week the keepalive started failing with exit code 22 from curl -f on a 403. I could have hardened the keepalive, but at that point I was working around a free tier limitation with another workaround. Time to fix the root cause.

The pitch for Cloudflare Workers

Cloudflare Workers do not sleep. They are not containers. The Worker code runs in V8 isolates that Cloudflare spins up on demand at every edge location. Cold start is measured in single digit milliseconds because there is effectively no "cold" state: the runtime is always warm at the edge. The same Worker code runs in every region. A visitor in Singapore hits an edge node in Singapore that already has the code loaded.

Free tier on Cloudflare Workers gives you 100,000 requests per day, 10ms CPU per request, and bindings to KV (key value storage) and R2 (object storage). For a project that handles light traffic and proxies most of the heavy work to upstream AI providers, this is over allocated by an order of magnitude.

The hard limit that matters: 10ms of CPU time per request. Note that this is CPU time, not wall clock time: waiting on a fetch() to Groq does not count. So an AI chat that spends 1500ms waiting for the LLM and 4ms parsing JSON costs you 4ms of CPU, well under budget. The constraint is real for compute heavy workloads but invisible for proxy shaped ones.

The migration

I rewrote the backend over a weekend. The old Python FastAPI server was around 800 lines. The Cloudflare Worker rewrite came out to around 1200 lines of JavaScript because some things were easier with async/await and some things took more boilerplate. V8 isolates do not include Node's standard library by default: unless you opt in to the nodejs_compat compatibility flag there is no node:crypto and no node:buffer, so you use the Web Crypto API and TextEncoder instead.

The biggest porting tasks:

  • JWT signing. With Python's PyJWT library this is one line. In a Worker you use crypto.subtle to do HMAC-SHA256 manually. Maybe 40 lines once you have helpers for base64url encoding and decoding.
  • State storage. The Python version used in memory dicts and a JSON file on disk. Workers have no disk. Everything went into Cloudflare KV, with the keys structured by domain: handle:<sub>, lb:<game>, banned_accounts, etc. KV reads are eventually consistent across regions, which matters more for some keys than others. For leaderboard reads I do not care if a Singapore edge sees a write from a US edge 30 seconds late.
  • Secrets. Render had a Variables tab. Cloudflare has Worker Secrets, set via the dashboard or wrangler secret put. Same idea, slightly nicer interface.
  • OAuth. The Google OAuth callback flow worked identically. The only change was the redirect URI in the Google Cloud Console, which now points at https://api.thyfwxit.com/auth/google-callback instead of the Render hostname.

What hurt

A few things were genuinely worse on Workers than on the Render Python service.

Debugging. On Render I had a real shell. I could SSH in, look at log files, run Python interactively. On Workers I have wrangler tail which streams logs, and the Cloudflare dashboard's Logs tab. The DX is fine, but it is not the same as a real shell. When something breaks at the edge, I am reading log lines, not poking around the runtime.

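Cold mental model. Treat every request as if the Worker starts from nothing and dies afterwards. An isolate may be reused for later requests, so module level variables can survive for a while, but nothing guarantees it, and isolates in different locations never share memory. The only state you can rely on between requests is what you persist to KV or R2. Python had module level caches that just worked. On a Worker, anything you want to share between requests has to round trip through KV or be re-fetched. For a chat backend this barely matters; for some other workloads it would.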
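
The KV round trip looks roughly like this. It is a sketch, not the backend's actual code: the helper names and the NEXUS_KV binding are hypothetical, and only the lb:<game> key layout comes from the migration described above.

```javascript
// Sketch: leaderboard state shared between requests through a KV binding.
// `kv` is any object with the Workers KV get/put methods; in a Worker it
// would be something like env.NEXUS_KV (binding name is an assumption).

async function getLeaderboard(kv, game) {
  // KV resolves to null for a missing key; type "json" parses the stored value.
  const stored = await kv.get(`lb:${game}`, { type: "json" });
  return stored ?? [];
}

async function recordScore(kv, game, handle, score, limit = 10) {
  const board = await getLeaderboard(kv, game);
  board.push({ handle, score });
  board.sort((a, b) => b.score - a.score); // highest score first
  const top = board.slice(0, limit);
  // Read-modify-write is not atomic on KV: concurrent writers can overwrite
  // each other, which is acceptable for a casual leaderboard.
  await kv.put(`lb:${game}`, JSON.stringify(top));
  return top;
}
```

Passing the binding in as a parameter keeps the helpers testable with a plain in-memory stand-in.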

10ms CPU limit. Once I had a regex heavy moderation check that hit the limit on long prompts. I had to break the regex into pieces and short circuit early. Annoying but solvable.
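
The shape of that fix, as a sketch: several small patterns instead of one giant regex, a cap on how much text gets scanned, and an exit on the first hit. The patterns and the cap below are placeholders, not the real moderation rules.

```javascript
// Sketch of a short-circuiting moderation check.
// BLOCKED_PATTERNS and MAX_SCAN_CHARS are placeholder assumptions.
const BLOCKED_PATTERNS = [
  /\bbuy followers\b/i,
  /\bfree crypto\b/i,
];
const MAX_SCAN_CHARS = 4000;

function isBlocked(prompt) {
  // Bound the work: only a prefix of a very long prompt is scanned.
  // Trade-off: text beyond the cap is not checked at all.
  const text = prompt.length > MAX_SCAN_CHARS ? prompt.slice(0, MAX_SCAN_CHARS) : prompt;
  // Array.prototype.some stops at the first pattern that matches.
  return BLOCKED_PATTERNS.some((pattern) => pattern.test(text));
}
```

The patterns deliberately omit the g flag, so test() carries no lastIndex state between calls.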

What got better

Almost everything else.

  • Cold starts: gone. First request to a region that has never seen the Worker still takes a few ms. There is no 30 second wake up.
  • Latency: dramatically better. Warm Render ping was about 1200ms because the service sat in a single region far from me. The Worker edge nearest me responds in 40 to 90ms.
  • Cost: still free, with more headroom. Render free tier was generous for a sleeping service. Workers free tier is generous for a service that is always warm.
  • Deployment: faster. wrangler deploy uploads the Worker in 3 seconds. Render took 90 seconds to build a Docker image and roll it out.
  • Routing. The Worker is bound to api.thyfwxit.com/* via a Cloudflare zone route. Same domain as the rest of the site. No cross origin cookie pain.

When you should not do this

Cloudflare Workers are not always the right answer. A few cases where Render or a container host is still better:

  • Long running tasks. Workers are request/response. If you have a job that runs for 60 seconds, you want a queue + worker pattern (which is doable on Workers via Durable Objects / Queues, but you have left "simple" territory).
  • Apps that need a real filesystem. Workers have no disk. If your app writes temp files, you are rewriting that part of your code.
  • Native dependencies. Python with NumPy / Pandas / a custom C extension is not coming to V8 isolates. Rewrite or stay on a container host.
  • Apps that need an always on websocket server with a lot of in memory state. Possible with Durable Objects, but again, not simple.

For a Nexus shaped workload (HTTP request in, fetch out to some AI provider, format the response), Workers are basically free latency.
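
A proxy shaped handler, sketched under assumptions: the upstream URL, the AI_API_KEY secret name, and the { prompt } / { reply } JSON shapes are all hypothetical placeholders rather than the real Nexus endpoints.

```javascript
// Sketch of a proxy-shaped Worker. UPSTREAM_URL, env.AI_API_KEY, and the
// request/response JSON shapes are hypothetical placeholders.
const UPSTREAM_URL = "https://ai-provider.example/v1/chat";

// Small helper for JSON responses.
function json(data, status = 200) {
  return new Response(JSON.stringify(data), {
    status,
    headers: { "content-type": "application/json" },
  });
}

const worker = {
  async fetch(request, env) {
    if (request.method !== "POST") return json({ error: "method not allowed" }, 405);

    let body;
    try {
      body = await request.json();
    } catch {
      return json({ error: "invalid JSON" }, 400);
    }
    const prompt = body?.prompt;
    if (typeof prompt !== "string" || prompt.length === 0) {
      return json({ error: "missing prompt" }, 400);
    }

    // Time spent awaiting the upstream is wall clock time, not CPU time.
    const upstream = await fetch(UPSTREAM_URL, {
      method: "POST",
      headers: {
        "content-type": "application/json",
        authorization: `Bearer ${env.AI_API_KEY}`,
      },
      body: JSON.stringify({ prompt }),
    });
    if (!upstream.ok) return json({ error: "upstream error" }, 502);

    // Format the provider's answer into the shape the frontend expects.
    const data = await upstream.json();
    return json({ reply: data.reply ?? "" });
  },
};

// In a Worker module, this object would be the default export:
// export default worker;
```

Nearly all of the handler's lifetime is spent inside the awaited fetch, which is why this kind of workload fits comfortably inside a small CPU budget.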

If you are doing this migration

One tip that saved me hours: use wrangler dev with --remote while you are migrating. It runs your Worker against the real Cloudflare edge with your real bindings, instead of a local emulator. Catches the V8 vs Node API differences immediately. The local emulator is faster to start but lies to you about the runtime.

Second tip: set up your KV bindings before you start writing code. Cloudflare KV namespaces have to exist before wrangler.toml can reference them, and the id in the toml is the namespace ID, not the name. Easy to get tripped up on.
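
For reference, the binding section looks roughly like this. The binding name NEXUS_KV and the id value are placeholders; use the namespace ID that Cloudflare reports when you create the namespace.

```toml
# wrangler.toml (fragment). NEXUS_KV and the id below are placeholders.
[[kv_namespaces]]
binding = "NEXUS_KV"       # how the Worker code sees it: env.NEXUS_KV
id = "<namespace-id>"      # the namespace ID, not the namespace name
```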

Third: do not migrate secrets manually. Use wrangler secret put for each one. It prompts for the value (or reads it from stdin), so the secret never has to appear in your shell history.

The Nexus terminal is at thyfwxit.com/nexus if you want to see the result. The backend runs as a single Cloudflare Worker at api.thyfwxit.com, with latency typically under 100ms.