Wave heads learn different frequencies; some behave more like long-range routing, others more like local structure. Whether variable head widths help is an implementation detail — figures and tables are not published here.
See GitHub for experiments.