Bandwidth Data

Notes about data:

GPU data was collected using Nemes's Vulkan test. For GPUs that can't run Vulkan, results come either from a deprecated OpenCL version of her test or from an OpenCL test written by Clamchowder. Treat the Vulkan figures as the most polished and accurate, because Clamchowder's OpenCL tests are still a work in progress, especially the bandwidth tests. Benchmarking GPUs is hard.
CPU tests use SSE, AVX, AVX512, or NEON/ASIMD assembly, whichever is supported.
For multithreaded bandwidth, CPUs were tested in two modes. In shared mode, all threads read the same array; in private mode, each thread gets its own array. Shared mode tends to overestimate main memory bandwidth, possibly because the memory controller broadcasts reads from the same location to multiple cores. Use private-mode results to estimate memory bandwidth. Cache bandwidth appears to be accurate in both modes.
Instruction bandwidth is measured using 8-byte NOPs, specifically 0F 1F 84 00 00 00 00 00, unless otherwise specified. On ARM and IBM POWER, the respective fixed-length NOP encodings are used.
Test sizes don't always line up between different tests. Where a test has no data point at a given size, the average of the two nearest data points is shown.
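The shared and private multithreaded modes in the notes differ only in which array each thread reads. The Python sketch below shows that difference. It is not the code used to collect this data, and `make_arrays` and `read_bandwidth_gbps` are hypothetical names. In CPython the GIL largely serializes the worker threads, so the returned figure only illustrates the bookkeeping (total bytes read by all threads divided by wall time), not real hardware bandwidth.

```python
import threading
import time


def make_arrays(num_threads: int, array_bytes: int, mode: str) -> list:
    """Return one buffer per thread. Shared mode hands every thread the same object."""
    if mode == "shared":
        shared = bytearray(array_bytes)
        return [shared] * num_threads
    if mode == "private":
        return [bytearray(array_bytes) for _ in range(num_threads)]
    raise ValueError(f"unknown mode: {mode!r}")


def read_bandwidth_gbps(num_threads: int, array_bytes: int, mode: str, passes: int = 4) -> float:
    """Illustrative only: aggregate bytes read by all threads per second, in GB/s."""
    arrays = make_arrays(num_threads, array_bytes, mode)
    start_line = threading.Barrier(num_threads + 1)  # workers plus the timing thread

    def worker(buf: bytearray) -> None:
        start_line.wait()
        for _ in range(passes):
            # count() has to scan every byte; the zero-filled buffer never matches
            buf.count(b"\x01")

    threads = [threading.Thread(target=worker, args=(a,)) for a in arrays]
    for t in threads:
        t.start()
    start_line.wait()
    t0 = time.perf_counter()
    for t in threads:
        t.join()
    elapsed = max(time.perf_counter() - t0, 1e-9)
    # Every thread reads its whole array, even when the arrays are one shared object
    total_bytes = num_threads * array_bytes * passes
    return total_bytes / elapsed / 1e9
```

Shared mode still credits each thread with a full pass over the array. If the hardware serves reads to the same lines once and delivers them to several cores, as the note speculates, the aggregate shared-mode figure can exceed what DRAM actually supplied.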
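Instruction bandwidth tests of this kind fill a buffer with NOPs, execute it, and divide the instruction bytes executed by elapsed cycles. The sketch below only builds the NOP block and does that arithmetic. Mapping the buffer executable, appending a return instruction, and reading a cycle counter are left out. The x86 encoding is the one given in the notes. The AArch64 (`D503201F`) and POWER (`ori 0,0,0`, `60000000`) NOPs are the standard 4-byte encodings, packed little-endian here, which is an assumption about the targets' byte order. `nop_block` and `bytes_per_cycle` are hypothetical helper names.

```python
import struct

# 8-byte x86 NOP from the notes: nop dword ptr [rax+rax*1+0x0]
X86_NOP8 = bytes.fromhex("0F1F840000000000")
# Fixed-length 4-byte NOPs, packed little-endian (assumed byte order)
AARCH64_NOP = struct.pack("<I", 0xD503201F)
POWER_NOP = struct.pack("<I", 0x60000000)  # ori 0,0,0

NOPS = {"x86": X86_NOP8, "aarch64": AARCH64_NOP, "power": POWER_NOP}


def nop_block(arch: str, size_bytes: int) -> bytes:
    """Return size_bytes of back-to-back NOPs for the given architecture."""
    nop = NOPS[arch]
    if size_bytes % len(nop):
        raise ValueError(f"{size_bytes} is not a multiple of the {len(nop)}-byte NOP")
    return nop * (size_bytes // len(nop))


def bytes_per_cycle(block_bytes: int, iterations: int, cycles: int) -> float:
    """Instruction bandwidth: total instruction bytes executed per core clock."""
    return block_bytes * iterations / cycles
```

With hypothetical numbers, a 4 KiB x86 block executed 1000 times in 256,000 cycles gives `bytes_per_cycle(4096, 1000, 256000) == 16.0`, or two 8-byte NOPs per clock.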
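The gap-filling rule in the last note can be sketched as follows. This sketch reads "two nearest data points" as the nearest measured sizes on either side of the missing one, and it leaves sizes outside a test's measured range empty. Both readings are assumptions, as is the `fill_gaps` name.

```python
from bisect import bisect_left
from typing import Dict, Iterable, Optional


def fill_gaps(all_sizes: Iterable[int], measured: Dict[int, float]) -> Dict[int, Optional[float]]:
    """Align one test's results to a common set of test sizes.

    Missing sizes get the average of the nearest measured points below and above;
    sizes outside the measured range stay None (assumed behavior).
    """
    known = sorted(measured)
    filled: Dict[int, Optional[float]] = {}
    for size in sorted(all_sizes):
        if size in measured:
            filled[size] = measured[size]
            continue
        i = bisect_left(known, size)  # index of the first measured size above this one
        if 0 < i < len(known):
            below, above = known[i - 1], known[i]
            filled[size] = (measured[below] + measured[above]) / 2
        else:
            filled[size] = None
    return filled
```

For example, a test measured only at 4, 16, and 64 would get its 8 and 32 entries filled from the neighbors on each side, while a 128 entry would stay empty.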

Bandwidth Data Graphing and Reference