SIMD

OpenMP v4.0 explicitly enables SIMD code generation. Earlier versions of OpenMP were concerned with using multiple cores to speed up processing; SIMD allows explicit generation of vector instructions. The current Zen 4 and P-core Xeon CPUs support AVX-512, which means that a single instruction can operate on vectors of 16 single precision numbers or 8 double precision numbers. Older Zen, most current Intel client, and E-core Intel server CPUs support AVX-2, which has vectors that are half the width of AVX-512 vectors. The current default (64-bit x86) wgrib2 build defaults to SSE, the original 64-bit x86 specification, which uses a vector of 4 single precision floats. The alignment restrictions for SSE are pretty severe, which makes the vector instructions difficult to use. So one should probably restrict SIMD to CPUs that support a variant of AVX, which allows non-aligned memory access.

What version of SIMD? SSE, AVX, AVX-2, AVX-512

This is not an easy question to answer. From the wiki:

AVX-512
    AMD: Zen 4 (2022)
    Intel Xeon: Skylake (2015)
    Intel client: Rocket Lake (2021) i11xxx
    The Intel Core replacements for Rocket Lake do not have support for AVX-512 because it is not supported by the E-cores in Alder Lake.

AVX-2
    AMD: Excavator (2015)
    Intel server: Haswell (2013)
    Intel client: Haswell (2013) note: Pentium and lower may not have AVX-2
    Intel client: Alder Lake (2021) Core, Pentium and Celeron have AVX-2

AVX
    AMD: Bulldozer (2011), Puma, Jaguar
    Intel: Sandy Bridge (2011)
    From the wiki: "Not all CPUs from the listed families support AVX. Generally, CPUs with the commercial denomination Core i3/i5/i7/i9 support them, whereas Pentium and Celeron CPUs before Tiger Lake do not."

There was an effort to make AVX-2 the default for Linux builds, as AVX-2 was introduced in 2013 by Intel and copied by AMD in 2015. However, that effort failed as people pointed out that some Intel CPUs didn't have support. Much anguish was directed towards Intel marketing.
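Since the supported SIMD level varies so much from machine to machine, one option is to ask the CPU at run time. The following is a minimal sketch in C, not something taken from wgrib2. It assumes GCC or Clang on x86, where the `__builtin_cpu_supports` built-in is available, and it falls back to "unknown" elsewhere. The function names `best_simd_level` and `floats_per_vector` are hypothetical and are used only for illustration.

```c
#include <string.h>

/* Hypothetical helper: report the widest SIMD level the running CPU supports.
 * Assumes GCC or Clang on x86; any other compiler or CPU yields "unknown". */
const char *best_simd_level(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();   /* set up the CPU feature data before querying */
    if (__builtin_cpu_supports("avx512f")) return "avx512";
    if (__builtin_cpu_supports("avx2"))    return "avx2";
    if (__builtin_cpu_supports("avx"))     return "avx";
    if (__builtin_cpu_supports("sse"))     return "sse";
    return "none";
#else
    return "unknown";
#endif
}

/* Hypothetical helper: single precision floats per vector register.
 * 512-bit registers hold 16 floats, 256-bit hold 8, 128-bit hold 4. */
int floats_per_vector(const char *level)
{
    if (strcmp(level, "avx512") == 0) return 16;
    if (strcmp(level, "avx2") == 0 || strcmp(level, "avx") == 0) return 8;
    if (strcmp(level, "sse") == 0) return 4;
    return 1;   /* no known vector unit: treat as scalar */
}
```

A caller could print `best_simd_level()` at startup to see what a given desktop, laptop or server reports before deciding which -march setting is worth building for.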
Will the future AVX-10 make a difference?

No. AVX-10 has three subsets: one that supports 512-bit registers, another for 256-bit registers, and a third for 128-bit registers. This is no different from the current AVX-512 vs AVX-2 problem. AVX-10 does bring some new capabilities; however, they are more AI related (reduced precision support).

In summary, there isn't a good universal SIMD configuration. My desktop at work is a 4 core Xeon with AVX-512. My desktop at home is newer but only has AVX-2. Laptops in my family have SSE (no AVX), AVX-2 and AVX-512. Servers that I use are either AVX-2 or AVX-512.

Wgrib2 and SIMD

Wgrib2 v3.1.3 adds support for OpenMP v4.0, and SIMD options were added to unpk_complex.c and Ens_processing.c. The conversion was to add OpenMP simd pragmas and not replace any existing threading pragmas.

The test ran:

    time wgrib2.v? -ens_processing x.v? 0 ensemble.grb

Configuration: AMD 5600G CPU (6 cores, 12 threads), NVMe PCIe 3, wgrib2 v3.1.3 beta 9/2023

    v0: default build, no SIMD optimizations
    v1: CPU optimizations (AVX-2), -march=native -mtune=native
    v2: CPU optimizations (AVX-2) + omp simd; one loop rewritten, ens_processing and unpk_complex have simd pragmas

    build                        real         user          sys
    v0                           0m4.490s     0m16.958s     0m0.442s
    v1                           0m4.384s     0m16.808s     0m0.458s
    v2                           0m4.256s     0m16.618s     0m0.385s
    v0 with OMP_NUM_THREADS=1    0m11.077s    0m10.914s     0m0.140s

The "native" optimizations give about a 2.5% speed improvement, and adding the simd pragmas brings the improvement to about 5% relative to the default build, with a huge sampling error.

Should thread parallelism be replaced by SIMD?

For short loops, yes. The overhead for setting up the threads is huge. More testing is needed to give an answer for other instances. For example, consider the following

    #pragma omp simd
    for (i = 0; i