Every large language model today — GPT, Claude, LLaMA, Mistral — uses the same core mechanism: self-attention. Each token compares itself with every other token in the sequence. This is powerful, but it scales quadratically: O(N²).
Wave Field LLM replaces this with something fundamentally different. Tokens don't compare with each other at all. Instead, they deposit information onto a continuous field, and wave physics propagates that information. The result is O(N log N) complexity — and it changes what's computationally possible.
In standard attention, every token must attend to every other token. The number of operations grows with the square of the sequence length. Drag the slider to see how this scales:
At 2,048 tokens, the difference is modest. At 128K tokens, standard attention needs roughly 16 billion operations; Wave Field needs about 2.2 million. At 1M tokens, standard attention becomes impractical: the full N×N attention matrix would occupy roughly 4 TB in float32, far more than any GPU can hold. Wave Field runs fine.
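The scaling gap is easy to verify with back-of-envelope arithmetic, counting N² operations for standard attention against N log₂ N for the FFT route:

```python
import math

# Rough operation counts for attention over N tokens:
#   standard self-attention: N * N       (every token attends to every token)
#   wave-field / FFT route:  N * log2(N) (FFT-based convolution)
for n in [2_048, 131_072, 1_048_576]:
    quadratic = n * n
    nlogn = n * math.log2(n)
    print(f"{n:>9} tokens: {quadratic:.2e} vs {nlogn:.2e} ops "
          f"(ratio {quadratic / nlogn:,.0f}x)")
```

At 128K tokens the ratio is already several thousand to one, and it keeps widening with context length.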
Instead of direct token-to-token comparison, Wave Field attention works in three steps:

1. Emit: each token deposits its information onto a continuous field.
2. Propagate: wave physics spreads that information across the field via convolution.
3. Collect: each token reads the propagated field back at its own position.
The key operation — convolution — uses the Fast Fourier Transform, which is inherently O(N log N). There is no attention matrix. No softmax. No quadratic bottleneck.
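The propagation step can be sketched in a few lines of NumPy. The function name and shapes here are illustrative assumptions, not the model's actual code; the point is that a pointwise product in frequency space performs the convolution with no N×N matrix:

```python
import numpy as np

def wave_field_mix(x, kernel):
    """Propagate token features with a circular convolution via FFT.

    x      : (seq_len, dim) token features deposited on the field
    kernel : (seq_len,) wave kernel (one per head in the full model)
    Cost is O(N log N) in sequence length; no attention matrix, no softmax.
    """
    n = x.shape[0]
    Xf = np.fft.rfft(x, axis=0)                # field -> frequency domain
    Kf = np.fft.rfft(kernel)[:, None]          # kernel spectrum, broadcast over dim
    return np.fft.irfft(Xf * Kf, n=n, axis=0)  # pointwise product = convolution
```

By the convolution theorem, this is exactly equivalent to sliding the kernel across the sequence, but costs O(N log N) instead of O(N²).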
Each attention head has a learnable wave kernel, defined by just three parameters:
- ω (frequency): how fast the wave oscillates.
- α (damping): how far the signal travels before it dies out.
- φ (phase): offsets the wave between heads for diversity.
A head with low ω and low α produces a slow, far-reaching wave — useful for connecting tokens thousands of positions apart. A head with high ω and high α oscillates rapidly and decays quickly — useful for local grammar. The model learns the right frequencies during training.
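One natural parameterization of such a kernel is a damped oscillation, exp(−αt)·cos(ωt + φ). The exact form used by the model is not shown here, so treat this as an illustrative assumption:

```python
import numpy as np

def wave_kernel(omega, alpha, phi, length):
    """Damped oscillating kernel: exp(-alpha * t) * cos(omega * t + phi).

    omega : frequency (high -> rapid oscillation, local patterns)
    alpha : damping   (high -> fast decay, short reach)
    phi   : phase     (offsets the kernel between heads)
    """
    t = np.arange(length)
    return np.exp(-alpha * t) * np.cos(omega * t + phi)

# A low-omega, low-alpha head still has signal thousands of positions away;
# a high-omega, high-alpha head dies out within a few dozen tokens.
long_range = wave_kernel(omega=0.01, alpha=0.001, phi=0.0, length=4096)
local      = wave_kernel(omega=1.5,  alpha=0.5,   phi=0.0, length=4096)
```

With only three scalars per head, the model can learn anywhere on the spectrum from whole-document context to local grammar.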
A trained Wave Field model has 12 attention heads, each with different wave parameters. Click any head to see its kernel:
Notice how the heads self-organize: early heads (0-3) have low frequency for long-range connections, while later heads (9-11) have high frequency for local grammar. This emerges naturally during training — nobody assigns these roles.
These benchmarks are not theoretical projections. They were measured on real hardware, running the same 130M-parameter model in float32.
On an RTX 5090 — a consumer GPU that costs $2,000 — Wave Field processes 256K tokens while standard attention cannot even handle 32K. The throughput doesn't degrade: Wave Field maintains 157K tokens/sec from 32K through 256K context.
Standard attention stores a key and value vector for every token. This KV cache grows linearly with sequence length and dominates GPU memory at long contexts.
Wave Field has no KV cache. It stores a field — a compressed wave representation. At 1 million tokens:
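The KV-cache side of that comparison follows from simple arithmetic. The dimensions below are illustrative assumptions, not the 130M model's actual configuration, but the formula is the standard one for transformer KV caches:

```python
# Back-of-envelope KV-cache size for standard attention.
# Layer/head counts are illustrative assumptions, not the article's 130M model.
n_layers, n_heads, head_dim = 12, 12, 64
seq_len = 1_000_000
bytes_per_val = 4  # float32

# K and V each store (seq_len, n_heads * head_dim) values per layer.
kv_bytes = 2 * n_layers * seq_len * n_heads * head_dim * bytes_per_val
print(f"KV cache at {seq_len:,} tokens: {kv_bytes / 2**30:.1f} GiB")
```

Even at these modest dimensions, the cache runs to tens of gigabytes at million-token context, and it grows linearly from there with model width and depth.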
Because Wave Field runs at long context on consumer GPUs, it unlocks capabilities that are impossible with standard attention at the same hardware budget:
A cluster of 8× RTX 5090 GPUs — costing $16,000 total — can train a Wave Field model at 256K context. The same training on standard transformers would require 8× H100 GPUs costing $240,000, and would still be limited to shorter context due to the quadratic memory wall.
The architecture scales. We have verified it up to 512K tokens on a single H100.