SIMD-accelerated integer-to-string conversion https://lemire.me/blog/2026/05/18/simd-accelerated-integer-t...
Other speedy things:
On-Demand JSON: A Better Way to Parse Documents? https://lemire.me/en/publication/arxiv231217149/
Parsing Millions of URLs per Second https://lemire.me/en/publication/arxiv231110533/
Transcoding Unicode Characters with AVX-512 Instructions https://lemire.me/en/publication/arxiv221205098/
I don’t think it has much to do with the case of an algorithm that offers a faster solution to a problem that is rarely a bottleneck (not sure if that’s true in this case anyway).
And this algorithm has low constant costs and does not take dramatically more icache than the simple versions. There is no reason not to use it if your compile target supports AVX-512.
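For reference, a "simple version" of the kind being compared against might look like the scalar sketch below. This is not the code from the linked post; the function name `u32_to_decimal` and the buffer convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the linked post): writes the decimal digits
   of v into out, NUL-terminates, and returns the number of digits written.
   out must have room for at least 11 bytes (10 digits plus the NUL). */
static size_t u32_to_decimal(uint32_t v, char *out)
{
    assert(out != NULL);
    char tmp[10];                 /* UINT32_MAX has 10 decimal digits */
    size_t n = 0;
    do {
        /* One division per digit, least significant digit first. */
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (size_t i = 0; i < n; i++)
        out[i] = tmp[n - 1 - i];  /* reverse into the output buffer */
    out[n] = '\0';
    return n;
}
```

The loop carries a serial dependency through `v` and spends one division per digit, which is the kind of cost a vectorized conversion tries to avoid.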
That number looked familiar, and yep, it's the taxicab number. Coincidence? Neither of the two references seems to mention it.
Nonetheless, there are plenty of people who advocate the use of JSON, XML and similar formats, in which case I assume that number conversions can take a non-negligible time, which might be decreased by such fast algorithms.
MMX (Pentium MMX, 1997) sucked badly (because ease of implementation was prioritized over usefulness in its design), SSE (Pentium III, 1999) was much worse than the simultaneously launched Motorola AltiVec, and AVX (Sandy Bridge, 2011) was much worse than the simultaneously developed Larrabee New Instructions (despite the fact that Sandy Bridge was developed by the A-team, while Larrabee was developed by the C- or D-team, which had, however, hired competent consultants from outside Intel, experienced in programming games and graphics applications).
AVX-512 is for now better than any competing vector ISA, both in achievable energy efficiency and in achievable performance. Obviously, it is possible that some future AArch64 (Arm) or even RISC-V CPUs will change this by implementing wider registers and execution units and by adding any missing operations.
The SME ISA extension (Scalable Matrix Extension), which is available in the latest Apple CPUs and in the current 2026 generation of Arm C1 CPUs, has the potential to be more efficient than AVX-512, exploiting the fact that the current Intel AMX ISA is intended only for ML/AI and not also for general-purpose computing. Nonetheless, this may happen only in a rather distant future, because neither Apple nor Qualcomm nor Arm seems interested in making products suitable for the needs of technical and scientific computing, as Intel and AMD do. Because of that, in existing CPUs with SME the ratio of SME execution units to general-purpose CPU cores is low, resulting in low total throughput.
The latter could at least be solved with some community effort, although the relevant set of instructions is quite large. It's also not specific to AVX-512. Any comparable vector ISA faces the same challenge.
Initial work on this was started by an engineer at Intel. She was based in St Petersburg, so that work stalled in 2022. Here is the bugzilla item: https://bugs.kde.org/show_bug.cgi?id=383010. The other big issue is that we don't have enough people working on Valgrind who are experts with the virtual CPU. There are a couple of guys working on s390, and a little bit of work is being done reusing amd64 sse4 support on x86. I dabble a little bit on arm64.
If there are any AVX-512 experts who would like to help with this, it would be most welcome.
They dropped the idea of having AVX10 variants that don't support the full thing, and as of Nova Lake even the E cores will have it. Is there a significant risk it doesn't get into all products starting soon?
Literal man-millennia have been poured into writing software for both x86 and ARM, and nobody seems close to designing a competitive RISC-V chip.
By this metric, there's no need for video encoders faster than the movie's own FPS, either.
Next year, these AVX-512 supporting CPUs will be joined by AMD Zen 6 and Intel Nova Lake. Starting with Intel Nova Lake, all future Intel CPUs will support AVX-512.
About half a year ago, Intel announced that it will mandate the 512-bit vector width and full AVX-512 (a.k.a. AVX10) support in all future CPUs, starting with Nova Lake. This includes all E-cores that will succeed the current Skymont/Darkmont cores, which have been the roadblock to general AVX-512 adoption.
Obviously, they were forced to do this to align with AMD. Moreover, Intel has announced that they will coordinate the future ISA extensions with AMD and with the major customers, so that all future Intel and AMD CPUs will remain mostly ISA compatible, at least for the user applications.
Not long ago, a joint AMD-Intel whitepaper was published about the future "AI Compute Extensions for x86", which will be present in future AMD and Intel CPUs to accelerate AI inference. They extend the AVX-512 ISA and are similar to the Advanced Matrix Extensions currently supported by some Intel server CPUs, but the new extensions are more compatible with AVX-512.
This document demonstrates that at least for now Intel and AMD have understood that implementing a compatible ISA is their greatest moat against Arm and other competitors, so they should better coordinate their extensions instead of trying to pull in different directions.
For example, Intel stated this:
> Intel® Advanced Vector Extensions 10 (Intel® AVX10) introduces a modern vector Instruction Set Architecture (ISA) that will be supported across future Intel® processors.
They don't actually say “all”, and it is probably meant to apply to future microarchitectures anyway. Depending on various factors, Intel may end up designing new CPUs based on existing microarchitectures well into the 2030s.
Page 18:
> 3.1 INTEL® AVX10 INTRODUCTION
> ...
> This ISA will be supported on all future processors, including Performance cores (P-cores) and Efficient cores (E-cores).
As you see, now they actually say "all future".
The Intel Nova Lake desktop and laptop CPUs and the Diamond Rapids server CPUs will mark a jump in the Intel ISA, changing the CPUID CPU family number for the first time in a few decades and introducing not only AVX-512 across all cores, to match AMD, but also the APX ISA extension. APX adds features that remove some of the advantages of Arm AArch64, by increasing the number of general-purpose registers to 32 and by adding double-register load/store instructions.
I'm excited about Nova Lake as well. Maybe not so much for the EGPRs (maybe we should have made 4 of the new registers callee-saved?), but there are other goodies as well.
Only in the now-obsolete Intel server and workstation CPUs from the Skylake Server, Cascade Lake and Cooper Lake families was using AVX-512 a win when a great number of vector instructions were executed, but a loss when only a small number were executed.
This was caused by a really stupid Intel voltage and frequency controller, which could not react quickly enough to changes in power consumption. Because of this, it preemptively dropped the clock frequency before there was any need, for fear that the CPU would overheat if power consumption increased later and the controller could not reduce it fast enough before the temperature limit was reached. For the same reason, the clock frequency was kept low for a very long time after the last vector instruction was seen.
Now this is pretty much history, because few have kept those 7-to-10-year-old inefficient servers, and those who have kept them normally run workloads that either use a lot of vector instructions or do not use such instructions at all.
Nowadays the only reason for not using AVX-512 is when the software is intended to be widely used and one does not want to distribute 2 versions, one with AVX-512 for AMD or P-core Intel CPUs and one with AVX for E-core or hybrid Intel CPUs or very old AMD CPUs.
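One common alternative to shipping two binaries is to pick the code path at run time. The sketch below assumes GCC or Clang on x86, where `__builtin_cpu_init` and `__builtin_cpu_supports` are available; on any other compiler or architecture it falls back to the scalar path. The names `sum_scalar`, `sum_avx512` and `select_sum` are hypothetical, and `sum_avx512` is only a stand-in that forwards to the scalar code so the example compiles everywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback path. */
static uint64_t sum_scalar(const uint32_t *p, size_t n)
{
    assert(p != NULL || n == 0);
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}

/* Stand-in for a real AVX-512 kernel (assumption: a real one would live in
   a translation unit compiled with AVX-512 enabled). Here it just forwards
   to the scalar code so the sketch builds on any target. */
static uint64_t sum_avx512(const uint32_t *p, size_t n)
{
    return sum_scalar(p, n);
}

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Choose an implementation once, based on what the running CPU reports. */
static sum_fn select_sum(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f"))
        return sum_avx512;
#endif
    return sum_scalar;
}
```

A caller would typically invoke `select_sum()` once at startup and keep the returned function pointer, so the feature check stays out of the hot path.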
AVX-512 is being discontinued in newer Intel consumer CPUs, particularly with the Alder Lake series, where it has been completely disabled through BIOS updates.
AVX-512 had been discontinued in the CPU generations from Alder Lake through the Panther Lake, Wildcat Lake and Clearwater Forest CPUs introduced during the first half of 2026, but Intel has committed that all future Intel CPUs will implement the complete 512-bit variant of the AVX-512 (a.k.a. AVX10) ISA, starting with the Nova Lake desktop and laptop CPUs, to be launched by the end of this year.
Obviously, the competition from the AMD Zen 4, Zen 5 and Zen 6 CPUs, all of which implement AVX-512 and easily beat any Intel CPU in any workload that has been updated to use the AVX-512 ISA, has forced Intel to reconsider its previous decision.
All the other things that I do that can take noticeable CPU time (i.e. not time spent waiting on SSDs or other peripherals) can be accelerated by AVX-512. This includes things like computing file hashes, data compression and encryption algorithms, graphics/audio/video algorithms and also EDA/CAD applications.