
benchmarks — all results

2026-03-30 && benchmarks && data

By Badaramoni Avinash

Every number on this page comes from measured runs. No projections, no estimates. The model used is Wave Field V4 (1.49B parameters) unless otherwise noted.


GPU throughput comparison

Wave Field's O(N log N) attention means consumer GPUs can handle context lengths that would OOM a standard transformer. These are measured speeds at 128K context.

| GPU | Price | VRAM | Max Context | Speed @ 128K |
|-----|-------|------|-------------|--------------|
| RTX 4000 Ada | $1,000 | 20 GB | 128K | 35K tok/s |
| RTX PRO 4500 BW | $1,500 | 32 GB | 256K | 64K tok/s |
| RTX 3090 | $1,500 | 24 GB | 256K | 66K tok/s |
| RTX 5090 | $2,000 | 32 GB | 256K | 157K tok/s |
| H100 | $30,000 | 80 GB | 512K | 183K tok/s |
| Standard transformer | any | any | OOM at 32K | n/a |

Standard transformer OOMs at 32K on ALL consumer GPUs.
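The scaling gap behind these numbers can be sketched with back-of-envelope arithmetic. The snippet below compares the O(N²) cost of standard attention against an O(N log N) curve; the function and the unit costs are illustrative, not taken from the Wave Field implementation, and constant factors are ignored so only the ratios are meaningful.

```python
import math

def attention_cost(n_ctx):
    """Relative cost units for one attention pass over n_ctx tokens:
    standard attention scales as N^2, an O(N log N) scheme as N log N.
    Constant factors are dropped, so only ratios between the two matter."""
    return n_ctx ** 2, n_ctx * math.log2(n_ctx)

for n in (2_048, 32_768, 131_072):
    quad, nlogn = attention_cost(n)
    print(f"{n:>7} tokens: standard is {quad / nlogn:,.0f}x the N log N cost")
```

The ratio widens as context grows, which is why the gap between the two architectures is small at 2K and decisive at 128K.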


DCLM benchmark (130M model, 32B tokens)

Evaluated on the DCLM CORE suite — four standard tasks. The 130M Wave Field model was trained on 32 billion tokens.

| Task | Accuracy |
|------|----------|
| ARC Easy | 30.1% |
| HellaSwag | 27.6% |
| PIQA | 52.3% |
| WinoGrande | 49.1% |

DCLM CORE average: 46.8% (Wave Field 130M) vs. 26.5% (GPT-2 target, the DCLM baseline).

Wave Field 130M beats GPT-2's DCLM score by 1.77×.


throughput vs standard transformer at 32K

Head-to-head comparison at 32K context length, same hardware, same parameter count.

Throughput advantage: 21.8× faster than a standard transformer. Memory advantage: 5.3× less memory used.

| Metric | Standard Transformer | Wave Field |
|--------|----------------------|------------|
| Memory at 32K | 35.6 GB | 6.74 GB |
| Memory at 128K | OOM | 26.76 GB |

memory at long context

Attention state memory comparison. Standard transformer KV cache grows linearly with context length. Wave Field memory is fixed.

| Context Length | Standard KV Cache | Wave Field | Savings |
|----------------|-------------------|------------|---------|
| 2K | 37 MB | 72 KB | 513× |
| 32K | 600 MB | 72 KB | 8,333× |
| 128K | 2.4 GB | 72 KB | 33,333× |
| 1M | 19 GB | 72 KB | 263,889× |

Wave Field memory is constant regardless of context length.
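A rough sketch of why the one side grows and the other does not: the snippet below uses the textbook KV-cache formula (2 × layers × model dim × bytes per element, per token) with an illustrative FP16 config. The exact model behind the table above may differ, so the absolute sizes here won't reproduce its numbers; only the linear-vs-constant shape is the point.

```python
def kv_cache_bytes(n_ctx, n_layers, d_model, bytes_per_elem=2):
    """Textbook KV-cache size: keys and values cached for every layer
    at every position, i.e. 2 * layers * d_model * bytes per token."""
    return 2 * n_layers * d_model * bytes_per_elem * n_ctx

WAVE_STATE_BYTES = 72 * 1024  # the fixed 72 KB wave state from the table

# Illustrative FP16 config (24 layers, dim 2048); not the exact model
# behind the table, so sizes differ -- the scaling behavior does not.
for n_ctx in (2_048, 32_768, 131_072, 1_000_000):
    kv = kv_cache_bytes(n_ctx, n_layers=24, d_model=2048)
    print(f"{n_ctx:>9} tokens: KV cache {kv / 2**30:6.2f} GiB, "
          f"wave state still {WAVE_STATE_BYTES // 1024} KB")
```

Per-token KV cost is a constant, so total cache is proportional to context length; the wave state never appears in the loop's growing term at all.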


compression (selective INT8)

Wave parameters (180 total) stay in full FP32 precision. Everything else is quantized to INT8.

Original model: 529 MB (full float32). Selective INT8: 171 MB (3.1× compression).

| Metric | Before | After |
|--------|--------|-------|
| Model size | 529 MB | 171 MB |
| QA accuracy | 60% | 100% |
| Wave params precision | FP32 | FP32 |
| Everything else | FP32 | INT8 |

Quality IMPROVED after compression — 60% to 100% on QA evaluation.
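A minimal sketch of selective quantization, assuming symmetric per-tensor INT8 scaling and a hypothetical `wave.` name prefix for the parameters kept in FP32; the real checkpoint layout and quantization scheme are not given in the post.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8: scale so the max magnitude maps to 127."""
    scale = float(np.abs(w).max()) / 127.0 or 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

# Hypothetical parameter names for illustration only.
params = {
    "wave.freqs": np.random.randn(180).astype(np.float32),        # keep FP32
    "blocks.0.ffn.w": np.random.randn(2048, 2048).astype(np.float32),
}
compressed = {
    name: ("fp32", w) if name.startswith("wave.")
    else ("int8",) + quantize_int8(w)
    for name, w in params.items()
}

orig = sum(w.nbytes for w in params.values())
new = sum(e[1].nbytes + (4 if e[0] == "int8" else 0) for e in compressed.values())
print(f"{orig / new:.2f}x smaller")
```

With almost everything quantized, the toy ratio approaches 4× (FP32 to INT8); the real model's 3.1× is lower because some tensors and overhead stay at full precision.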


training details

Configuration and final evaluation metrics for the V4 model.

| Parameter | Value |
|-----------|-------|
| Model | V4 (1.49B params) |
| Architecture | 24 layers, 16 heads, dim 2048 |
| Training data | 6.8B tokens (ClimbMix) |
| Hardware | 8×H100 SXM |
| Training time | 11.5 hours |
| Final perplexity | 8.6 |
| Final accuracy | 84.7% |
| Throughput at 32K | 164K tok/s |
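The training-time and throughput rows happen to cross-check each other, which is a quick consistency test worth making explicit: 6.8B tokens over 11.5 hours works out to roughly the reported 164K tok/s.

```python
# Cross-check: total tokens / wall-clock time vs. reported throughput.
tokens = 6.8e9            # training tokens (ClimbMix)
hours = 11.5              # wall-clock time on 8xH100
tok_per_s = tokens / (hours * 3600)
print(f"{tok_per_s:,.0f} tok/s")  # ~164K tok/s, matching the table
```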