I published a benchmark report comparing 15 HTTP server implementations across C, Rust, Java, Scheme, Go, Node.js, Deno, Bun, Gleam, Common Lisp, Python, Ruby, and Racket. Scheme (pico + io_uring) placed fifth overall at 222k req/s — behind C (h2o), Rust (axum), Java (Vert.x), and Bun, but ahead of Deno, Go, Node.js, and everything else.
These are micro-benchmarks. I want to be honest about what they show and what they don’t.
A trivial counter endpoint. Single-threaded. Ten seconds per concurrency level. wrk as the load generator. No disk, no database, no parsing, no serialization. The purest measure of “how fast can this runtime shuffle bytes through a socket.”
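To make “trivial counter endpoint” concrete, here is an illustrative stand-in in Python’s stdlib http.server — not the letloop implementation, just the shape of the workload: each request does nothing but bump an integer and write it back.

```python
# Illustrative stand-in for the benchmarked endpoint: increment a counter,
# return it as plain text. (Not the letloop code; names are hypothetical.)
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

counter = 0

class CounterHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global counter
        counter += 1
        body = str(counter).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging; the benchmark measures I/O, not printing.
        pass

server = HTTPServer(("127.0.0.1", 0), CounterHandler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"
print(urllib.request.urlopen(url).read())  # b'1'
print(urllib.request.urlopen(url).read())  # b'2'
server.shutdown()
```

Against a server like this, wrk simply hammers the one URL at a fixed concurrency for a fixed duration; there is nowhere for application-level cost to hide.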
Here’s how throughput scales with concurrent connections for four runtimes that represent four different async strategies:
req/s (thousands) vs concurrent connections

320 |               +
    |   +   +   +       +   +   +   +   +   ← Rust (axum + tokio)
    |
    |
220 |           +   +   +   +   +   +   +   ← Scheme (pico + io_uring)
    |       +
    |   +
    |
140 |       +   +   +   +   +   +   +   +   ← Node.js (stdlib)
    |   +
    |
    |
 48 |   +   +   +   +   +   +   +   +   +   ← FastAPI (uvicorn)
    +---+---+---+---+---+---+---+---+---+--→
        1   2   4   8  16  32  64 128 256
                              connections
Rust peaks early at 316k req/s (8 connections), then holds a gentle plateau — the hallmark of a mature runtime with a well-tuned scheduler. Scheme peaks later at 222k req/s (16 connections), also plateaus cleanly. Node.js sits flat around 140k from 4 connections onward. FastAPI never breaks 48k — the Python GIL is the ceiling, not uvicorn.
But the more interesting picture is latency under pressure:
p99 latency (ms) at 256 concurrent connections
Rust     ▓ 1.03
Scheme   ▓ 1.39
Node.js  ▓▓▓▓▓▓▓▓▓ 9.66
FastAPI  ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 130.03
At 256 connections, Scheme’s p99 is 1.39ms: thirty-five percent higher than Rust’s 1.03ms, but about one-seventh of Node.js’s 9.66ms. FastAPI’s p99 explodes to 130ms, nearly a hundred times worse than Scheme. This is where architecture starts to show through the numbers.
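The ratios quoted here follow directly from the measured p99 values; a quick sanity check:

```python
# Recomputing the latency ratios from the p99 numbers in the chart above.
p99 = {"rust": 1.03, "scheme": 1.39, "node": 9.66, "fastapi": 130.03}

scheme_vs_rust = (p99["scheme"] / p99["rust"] - 1) * 100   # % higher than Rust
node_vs_scheme = p99["node"] / p99["scheme"]               # Node.js vs Scheme
fastapi_vs_scheme = p99["fastapi"] / p99["scheme"]         # FastAPI vs Scheme

print(round(scheme_vs_rust))      # 35
print(round(node_vs_scheme, 1))   # 6.9
print(round(fastapi_vs_scheme))   # 94
```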
And what don’t they measure? Nothing about correctness. Nothing about how the code reads, how it changes, how it fails. Nothing about what happens when you add a database, a parser, a state machine, an authentication layer. Nothing about the six months after the benchmark, when the system has to evolve under production pressure and the original author is gone.
A micro-benchmark is a controlled experiment. It holds everything constant except the variable you’re measuring. That’s its virtue and its limit. The variable here is raw I/O throughput. Everything that makes a real system hard — concurrency bugs, resource leaks, error propagation, compositional complexity — is absent by design.
The benchmark exists to answer one narrow question: is the letloop async runtime fast enough that choosing Scheme doesn’t mean accepting a performance tax? The answer is yes. At 222k req/s with a p99 under 1.4ms at high concurrency, pico + io_uring is competitive with production-grade runtimes written in systems languages, ahead of Node.js, and dramatically ahead of Python.
That matters because the choice was never about speed. It was about something else.
Most async runtimes are colored. In Rust, you write async fn and enter a different world: different traits, different lifetimes, different debugging. In JavaScript, you write async/await and functions split into two kinds: the ones that return promises and the ones that don’t. In Python, the split is even worse: asyncio is essentially a parallel universe that doesn’t compose cleanly with synchronous code.
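The Python version of this split is easy to demonstrate. A minimal sketch (function names are hypothetical):

```python
# Function coloring in Python: an async-colored function cannot simply be
# called from sync code; crossing the boundary requires an event loop.
import asyncio

async def fetch_value():
    await asyncio.sleep(0)  # pretend this awaits socket I/O
    return 42

def sync_caller():
    # Calling the async function like an ordinary one does not run it:
    # it merely builds a coroutine object.
    coro = fetch_value()
    assert asyncio.iscoroutine(coro)
    coro.close()  # discard it cleanly to avoid a "never awaited" warning
    # To actually get the result from synchronous code, we must cross the
    # color boundary explicitly by spinning up an event loop:
    return asyncio.run(fetch_value())

print(sync_caller())  # 42
```

Every call site that needs the result has to know which color it lives in; that is precisely the split the text describes.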
letloop’s async is transparent. You write ordinary Scheme. The runtime handles scheduling via io_uring underneath. There is no function coloring, no await keyword, no split between “async code” and “normal code.” A procedure that reads from a socket and a procedure that computes a factorial use the same calling conventions, the same continuation model, the same debugging tools.
This is not a cosmetic difference. Function coloring has real costs:
It doubles the API surface — every library that does I/O needs sync and async variants, or forces callers into one world.
It makes composition harder — mixing sync and async code is where most concurrency bugs live in Python and JavaScript.
It raises the floor of understanding — a new developer can’t ignore the async/sync distinction; it’s load-bearing from day one.
Transparent async removes these costs. The entire system has one calling convention. Composition is ordinary function composition. The floor of understanding is lower. Not zero — io_uring itself is complex machinery — but the complexity stays below the API boundary where it belongs, instead of leaking into every function signature.
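The “doubled API surface” cost shows up concretely as sync/async twins of the same operation. A toy sketch, with hypothetical names:

```python
# The same operation shipped twice, once per "color". Names are hypothetical.
import asyncio

def fetch_user_sync(user_id):
    # Blocking variant for synchronous callers.
    return {"id": user_id, "via": "sync"}

async def fetch_user_async(user_id):
    # Awaitable variant for async callers: same logic, different color.
    await asyncio.sleep(0)
    return {"id": user_id, "via": "async"}

# Every caller must pick a side; neither variant can stand in for the other.
print(fetch_user_sync(1)["via"])                # sync
print(asyncio.run(fetch_user_async(1))["via"])  # async
```

Under transparent async there is only one fetch_user, because there is only one calling convention.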
Chez Scheme + pico HTTP parser + io_uring is a work in progress. The scheduler is single-threaded. The error-handling story is incomplete. These are real gaps, not aspirational roadmap items.
But the foundation is sound. The numbers confirm that the architecture doesn’t impose a performance ceiling. What matters now is everything the benchmark doesn’t measure: correctness, composability, comprehensibility. The things that make a system livable for the person who maintains it after the benchmarks are forgotten.
The full report with all 15 implementations is at hyper.dev/letloop/report.html.
🗒️ hyper.dev · 📫 amirouche.dev · hello@amirouche.dev · amirouche across platforms