How Wave Propagation Replaces Attention
An interactive exploration of O(N log N) attention. Visualize wave kernels, compare GPU benchmarks, and understand why standard transformers hit a wall at 32K tokens.
We're building O(N log N) attention mechanisms that run on consumer GPUs. No quadratic bottleneck. No datacenter required.
We discovered that all attention routing in a 132M-parameter model is controlled by just 2,160 wave parameters. The rest compresses to INT8 with improved quality.
Standard transformers OOM at 32K tokens on an RTX 3090. Wave Field runs 256K. We measured it across RTX 3090, 5090, H100, and Blackwell.
Small attention heads naturally learn long-range waves. Large heads learn local grammar. The architecture discovers its own specialization.
A complete inference engine in 1,670 lines of C. Custom FFT, wave kernels, tokenizer. Runs on phone, laptop, server — no Python required.