
benchmarks — all results

2026-03-30 && benchmarks && data

By Badaramoni Avinash

Every number on this page comes from measured runs. No projections, no estimates. The model used is Wave Field V4 (1.49B parameters) unless otherwise noted.


GPU throughput comparison

Wave Field's O(N log N) attention means consumer GPUs can handle context lengths that would OOM a standard transformer. These are measured speeds at 128K context.

| GPU | Price | VRAM | Max Context | Speed @ 128K |
|-----|-------|------|-------------|--------------|
| RTX 4000 Ada | $1,000 | 20 GB | 128K | 35K tok/s |
| RTX PRO 4500 BW | $1,500 | 32 GB | 256K | 64K tok/s |
| RTX 3090 | $1,500 | 24 GB | 256K | 66K tok/s |
| RTX 5090 | $2,000 | 32 GB | 256K | 157K tok/s |
| H100 | $30,000 | 80 GB | 512K | 183K tok/s |
| Standard transformer | any | any | OOM at 32K | n/a |

Standard transformer OOMs at 32K on ALL consumer GPUs.
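The scaling gap behind these numbers can be sketched with back-of-envelope arithmetic. The snippet below compares the O(N²) cost of standard attention against an O(N log N) curve; the function and the unit costs are illustrative, not taken from the Wave Field implementation, and constant factors are ignored so only the ratios are meaningful.

```python
import math

def attention_cost(n_ctx):
    """Relative cost units for one attention pass over n_ctx tokens:
    standard attention scales as N^2, an O(N log N) scheme as N log N.
    Constant factors are dropped, so only ratios between the two matter."""
    return n_ctx ** 2, n_ctx * math.log2(n_ctx)

for n in (2_048, 32_768, 131_072):
    quad, nlogn = attention_cost(n)
    print(f"{n:>7} tokens: standard is {quad / nlogn:,.0f}x the N log N cost")
```

The ratio widens as context grows, which is why the gap between the two architectures is small at 2K and decisive at 128K.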


DCLM benchmark (130M model, 32B tokens)

Evaluated on the DCLM CORE suite — four standard tasks. The 130M Wave Field model was trained on 32 billion tokens.

| Task | Accuracy |
|------|----------|
| ARC Easy | 30.1% |
| HellaSwag | 27.6% |
| PIQA | 52.3% |
| WinoGrande | 49.1% |

DCLM CORE average: 46.8% (Wave Field 130M) vs. 26.5% (GPT-2 target, the DCLM baseline).

Wave Field 130M beats GPT-2's DCLM score by 1.77×.


throughput vs standard transformer at 32K

Head-to-head comparison at 32K context length, same hardware, same parameter count.

Throughput advantage: 21.8× faster than a standard transformer. Memory advantage: 5.3× less memory used.

| Metric | Standard Transformer | Wave Field |
|--------|----------------------|------------|
| Memory at 32K | 35.6 GB | 6.74 GB |
| Memory at 128K | OOM | 26.76 GB |

memory at long context

Attention state memory comparison. Standard transformer KV cache grows linearly with context length. Wave Field memory is fixed.

| Context Length | Standard KV Cache | Wave Field | Savings |
|----------------|-------------------|------------|---------|
| 2K | 37 MB | 72 KB | 513× |
| 32K | 600 MB | 72 KB | 8,333× |
| 128K | 2.4 GB | 72 KB | 33,333× |
| 1M | 19 GB | 72 KB | 263,889× |

Wave Field memory is constant regardless of context length.
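A rough sketch of why the one side grows and the other does not: the snippet below uses the textbook KV-cache formula (2 × layers × model dim × bytes per element, per token) with an illustrative FP16 config. The exact model behind the table above may differ, so the absolute sizes here won't reproduce its numbers; only the linear-vs-constant shape is the point.

```python
def kv_cache_bytes(n_ctx, n_layers, d_model, bytes_per_elem=2):
    """Textbook KV-cache size: keys and values cached for every layer
    at every position, i.e. 2 * layers * d_model * bytes per token."""
    return 2 * n_layers * d_model * bytes_per_elem * n_ctx

WAVE_STATE_BYTES = 72 * 1024  # the fixed 72 KB wave state from the table

# Illustrative FP16 config (24 layers, dim 2048); not the exact model
# behind the table, so sizes differ -- the scaling behavior does not.
for n_ctx in (2_048, 32_768, 131_072, 1_000_000):
    kv = kv_cache_bytes(n_ctx, n_layers=24, d_model=2048)
    print(f"{n_ctx:>9} tokens: KV cache {kv / 2**30:6.2f} GiB, "
          f"wave state still {WAVE_STATE_BYTES // 1024} KB")
```

Per-token KV cost is a constant, so total cache is proportional to context length; the wave state never appears in the loop's growing term at all.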


compression (selective INT8)

Wave parameters (180 total) stay in full FP32 precision. Everything else is quantized to INT8.

Original model: 529 MB (full float32). Selective INT8: 171 MB (3.1× compression).

| Metric | Before | After |
|--------|--------|-------|
| Model size | 529 MB | 171 MB |
| QA accuracy | 60% | 100% |
| Wave params precision | FP32 | FP32 |
| Everything else | FP32 | INT8 |

Quality IMPROVED after compression — 60% to 100% on QA evaluation.
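A minimal sketch of selective quantization, assuming symmetric per-tensor INT8 scaling and a hypothetical `wave.` name prefix for the parameters kept in FP32; the real checkpoint layout and quantization scheme are not given in the post.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8: scale so the max magnitude maps to 127."""
    scale = float(np.abs(w).max()) / 127.0 or 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

# Hypothetical parameter names for illustration only.
params = {
    "wave.freqs": np.random.randn(180).astype(np.float32),        # keep FP32
    "blocks.0.ffn.w": np.random.randn(2048, 2048).astype(np.float32),
}
compressed = {
    name: ("fp32", w) if name.startswith("wave.")
    else ("int8",) + quantize_int8(w)
    for name, w in params.items()
}

orig = sum(w.nbytes for w in params.values())
new = sum(e[1].nbytes + (4 if e[0] == "int8" else 0) for e in compressed.values())
print(f"{orig / new:.2f}x smaller")
```

With almost everything quantized, the toy ratio approaches 4× (FP32 to INT8); the real model's 3.1× is lower because some tensors and overhead stay at full precision.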


training details

Configuration and final evaluation metrics for the V4 model.

| Parameter | Value |
|-----------|-------|
| Model | V4 (1.49B params) |
| Architecture | 24 layers, 16 heads, dim 2048 |
| Training data | 6.8B tokens (ClimbMix) |
| Hardware | 8×H100 SXM |
| Training time | 11.5 hours |
| Final perplexity | 8.6 |
| Final accuracy | 84.7% |
| Throughput at 32K | 164K tok/s |
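The training-time and throughput rows happen to cross-check each other, which is a quick consistency test worth making explicit: 6.8B tokens over 11.5 hours works out to roughly the reported 164K tok/s.

```python
# Cross-check: total tokens / wall-clock time vs. reported throughput.
tokens = 6.8e9            # training tokens (ClimbMix)
hours = 11.5              # wall-clock time on 8xH100
tok_per_s = tokens / (hours * 3600)
print(f"{tok_per_s:,.0f} tok/s")  # ~164K tok/s, matching the table
```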