Rethinking attention with wave physics

We're building O(N log N) attention mechanisms that run on consumer GPUs. No quadratic bottleneck. No datacenter required.

21.8× faster at 32K context. 512× less memory than the KV cache. 256K context on a $2K GPU.

How Wave Propagation Replaces Attention

An interactive exploration of O(N log N) attention. Visualize wave kernels, compare GPU benchmarks, and understand why standard transformers hit a wall at 32K tokens.
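The core idea above can be sketched in a few lines: token mixing via a learned kernel applied in the frequency domain costs O(N log N), where attention's score matrix costs O(N²). This is a minimal illustrative sketch, not the project's actual kernel; the decaying kernel shape and dimensions are assumptions.

```python
import numpy as np

def fft_mix(x, kernel):
    # Circular convolution of a (N, d) sequence with a length-N kernel,
    # done in the frequency domain: O(N log N) instead of attention's O(N^2).
    n = x.shape[0]
    X = np.fft.rfft(x, n=n, axis=0)        # (n//2+1, d) spectrum of the sequence
    K = np.fft.rfft(kernel, n=n)           # (n//2+1,) spectrum of the kernel
    return np.fft.irfft(X * K[:, None], n=n, axis=0)

N, d = 1024, 64                            # assumed toy sizes
x = np.random.randn(N, d)
kernel = np.exp(-np.arange(N) / 128.0)     # illustrative decaying, wave-like kernel
y = fft_mix(x, kernel)                     # mixed sequence, same (N, d) shape
```

Because the kernel is fixed per head rather than recomputed per token pair, no N×N matrix is ever materialized.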

180 Parameters That Control Everything

We discovered that all attention routing in a 132M-parameter model is controlled by just 2,160 wave parameters. The remaining weights compress to INT8, and quality improves rather than degrades.
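To make the INT8 claim concrete, here is a minimal symmetric per-tensor quantization sketch. This is a generic scheme for illustration, not the project's quantizer; the weight shapes are assumptions.

```python
import numpy as np

def quantize_int8(w):
    # Symmetric per-tensor INT8: map max |w| to 127, round to integers.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover an approximation of the original float weights.
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)   # assumed toy weight matrix
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
err = np.abs(w - w_hat).max()                      # worst-case rounding error <= scale/2
```

Each tensor stores one float scale plus 1 byte per weight, a 4× reduction over FP32 before any further compression.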

Training on Consumer GPUs

Standard transformers OOM at 32K context on an RTX 3090. Wave Field runs 256K on the same card. We measured it across RTX 3090, 5090, H100, and Blackwell.
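A back-of-envelope calculation shows why 32K is a wall for quadratic attention. The precision and card are from the text; treating one head's score matrix in isolation is a simplifying assumption.

```python
# Why O(N^2) attention OOMs at 32K context on a 24 GiB RTX 3090 (assumed fp16 scores).
N = 32_768
bytes_fp16 = 2
score_matrix = N * N * bytes_fp16          # one head's attention score matrix
gib = score_matrix / 2**30
print(f"{gib} GiB per head per layer")     # 2.0 GiB
# Materializing score matrices for even a handful of heads at once exhausts
# 24 GiB, before counting weights, activations, and optimizer state.
# An O(N log N) mix never builds this matrix at all.
```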

Variable Heads Self-Organize

Small attention heads naturally learn long-range waves. Large heads learn local grammar. The architecture discovers its own specialization.

WaveEngine: 52KB of Pure C

A complete inference engine in 1,670 lines of C: custom FFT, wave kernels, and tokenizer. Runs on phones, laptops, and servers, with no Python required.
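The FFT at the engine's core is a classic algorithm. This is a minimal recursive radix-2 Cooley-Tukey sketch to show the idea, written in Python for readability; it is not WaveEngine's C implementation.

```python
import cmath

def fft(a):
    # Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of 2.
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2])                    # FFT of even-indexed samples
    odd = fft(a[1::2])                     # FFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor combines the two half-size transforms.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

The divide-and-conquer recursion is what yields O(N log N): each of the log N levels does O(N) work.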