The problem with centralised inference

When BeeBlast launched, all inference requests were routed to a single region. For customers in North America this was acceptable. For customers in Southeast Asia or Europe, median latency regularly exceeded 300ms — unacceptable for real-time agent workflows.

The obvious fix is to replicate inference capacity across regions. The hard part is doing so without fragmenting the memory and state that agents depend on.

Our edge architecture

We built a two-tier system: stateless inference nodes at the edge, and a global state layer that replicates agent memory and serves reads with single-digit-millisecond latency.
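At the routing layer, the core decision is simply which edge region should serve a given request. A minimal sketch of that choice, picking the region with the lowest measured round-trip time (the region names and RTTs below are illustrative, not our actual topology):

```python
# Illustrative only: route a request to the edge region with the
# lowest measured round-trip time from the client.
EDGE_RTTS_MS = {
    "us-east": 12.0,
    "eu-west": 28.0,
    "ap-southeast": 35.0,
}

def nearest_region(rtts: dict[str, float]) -> str:
    """Return the region key with the smallest measured RTT."""
    return min(rtts, key=rtts.get)

print(nearest_region(EDGE_RTTS_MS))
```

In production this decision is usually made by anycast or a GeoDNS layer rather than application code, but the selection criterion is the same.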

Edge nodes handle LLM inference and tool dispatch locally. State reads are served from the nearest replica; writes propagate asynchronously and converge across all regions in under 50ms. In practice, the vast majority of reads are served from local cache, making the replication lag invisible to the agent.
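The read-local / propagate-async pattern described above can be sketched as follows. This is a toy model under loud assumptions: the class and method names are hypothetical, each replica is an in-process dict standing in for a regional store, and `asyncio.sleep(0)` stands in for network transfer; a real implementation would sit on a replicated log with durability guarantees.

```python
import asyncio

class StateLayer:
    """Toy model of read-from-nearest-replica with async write fan-out."""

    def __init__(self, regions: list[str]):
        # One replica per region; a dict stands in for the regional store.
        self.replicas = {r: {} for r in regions}

    def read(self, region: str, key: str):
        # Reads always hit the local (nearest) replica.
        return self.replicas[region].get(key)

    async def write(self, origin: str, key: str, value):
        # Apply the write locally first, then fan out to the other
        # regions without waiting for their acknowledgements.
        self.replicas[origin][key] = value

        async def propagate(region: str):
            await asyncio.sleep(0)  # stands in for network transfer
            self.replicas[region][key] = value

        for r in self.replicas:
            if r != origin:
                asyncio.create_task(propagate(r))

async def demo():
    state = StateLayer(["us-east", "eu-west"])
    await state.write("us-east", "agent-42", {"step": 3})
    await asyncio.sleep(0.01)  # give propagation tasks time to land
    return state.read("eu-west", "agent-42")

print(asyncio.run(demo()))
```

Note the trade-off the sketch makes visible: a read in a remote region during the propagation window can return a stale value, which is exactly the lag the local-cache hit rate hides in practice.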

Results

After rolling out to 12 edge regions, global median latency dropped to 34ms. P99 latency, the figure that matters most for interactive workflows, fell from 1,200ms to 180ms.

We are expanding to four additional regions in Q2 2026, which we expect to bring P99 below 120ms for 99% of our customer base.