ndza

Adds, muls on my superscalar processor

Preamble

I'm always curious about how long things take on machines, and wanted to learn more. So, I wrote some benchmarks to run on my machine. It's a Dell XPS 15 9510 laptop with an 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz, so it's not super new, but I use it on a daily basis. I'm currently on Ubuntu 24.04.2 LTS.

For this, I needed a micro-benchmarking library, so I chose Google Benchmark. It was pretty easy to set up as a CMake project, and I was quickly able to focus on writing some C++ code.

Starting small

So, I know that superscalar processors can execute multiple instructions concurrently. And they achieve this through multiple pipelines and execution units. I wanted to focus on additions and multiplications of integers.

Maybe I can check how long it takes to add 1 billion integers: ~56ms. What if I have some more additions in there?

Maybe I can check how long it takes to multiply 1 billion integers: ~629ms. What if I have some more multiplications in there?
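To put the two timings above in perspective, here is a small sketch that converts them into time per operation and operations per clock cycle. The helper names are made up for illustration, and the 2.5 GHz figure is just the base clock from the CPU name. The core was probably boosting above that during the run, so the real per-cycle numbers are likely lower.

```cpp
// Hypothetical helpers, not part of the benchmark source.

// Nanoseconds spent per operation, given a total time in milliseconds.
constexpr double NsPerOp(double total_ms, double ops) {
    return total_ms * 1e6 / ops;  // 1 ms = 1e6 ns
}

// Operations per clock cycle at an assumed clock rate in GHz
// (1 GHz = 1 cycle per nanosecond).
constexpr double OpsPerCycle(double total_ms, double ops, double ghz) {
    return ops / (total_ms * 1e6 * ghz);
}

// Adds: NsPerOp(56.0, 1e9)  is 0.056 ns, roughly 7.1 adds per cycle at 2.5 GHz.
// Muls: NsPerOp(629.0, 1e9) is 0.629 ns, roughly 0.64 muls per cycle at 2.5 GHz.
```

Several adds per cycle is more than a single dependent chain of additions could manage, so the compiler may have vectorized or otherwise transformed the add loop. That is only a guess until the generated instructions are checked.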

I then simply added more variables and wrote Add2, Add3..., Mul2, Mul3.... I also wrote AddMul that combines a few additions and multiplications.
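The actual benchmark source is not reproduced here, so the following is only a guess at the shape of these functions. The names Add1, Add2, Mul2 and AddMul follow the post, but the bodies, the constant multipliers and the unsigned types are assumptions chosen to keep the sketch small and free of overflow problems.

```cpp
#include <cstdint>

// Assumed reconstruction: bodies are illustrative, not the original code.
// Unsigned arithmetic is used so that wrap-around is well defined.

// One accumulator: each addition depends on the previous one.
std::uint64_t Add1(std::uint64_t n) {
    std::uint64_t a = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        a += i;
    }
    return a;
}

// Two independent accumulators: the two additions in an iteration do not
// depend on each other, so a superscalar core may execute them in parallel.
std::uint64_t Add2(std::uint64_t n) {
    std::uint64_t a = 0, b = 0;
    for (std::uint64_t i = 0; i < n; ++i) {
        a += i;
        b += i;
    }
    return a + b;
}

// Two independent multiplication chains. Odd constant factors (assumed)
// keep the products from collapsing to zero.
std::uint64_t Mul2(std::uint64_t n) {
    std::uint64_t a = 1, b = 1;
    for (std::uint64_t i = 0; i < n; ++i) {
        a *= 3;
        b *= 5;
    }
    return a + b;
}

// One addition chain and one multiplication chain side by side.
std::uint64_t AddMul(std::uint64_t n) {
    std::uint64_t a = 0, b = 1;
    for (std::uint64_t i = 0; i < n; ++i) {
        a += i;
        b *= 3;
    }
    return a + b;
}
```

Inside a Google Benchmark loop, the results would also need to go through something like benchmark::DoNotOptimize so the compiler cannot drop the work entirely.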

There's probably a nicer way of writing a single C++ templated function with template parameter packs for all this. But, I wanted to keep it simple at this point as I was exploring. I should also look at other operations so there's more to do here.
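One possible shape for that templated version, as a rough sketch rather than a finished design: each template argument is an operation type, and each operation gets its own independent accumulator. RunOps, Add and Mul are hypothetical names, the starting value of 1 and the `i | 1` operand are assumptions (they keep multiplication chains from getting stuck at zero), and the fold expression needs C++17.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical aliases for the two operations of interest.
using Add = std::plus<>;
using Mul = std::multiplies<>;

// Runs one independent accumulator per operation in the pack, n times each.
template <typename... Ops>
std::uint64_t RunOps(std::uint64_t n) {
    std::array<std::uint64_t, sizeof...(Ops)> acc;
    acc.fill(1);  // start at 1 so multiplication chains stay non-zero

    for (std::uint64_t i = 0; i < n; ++i) {
        std::size_t k = 0;
        // Fold over the comma operator: apply each Op to its own
        // accumulator, left to right. `i | 1` keeps the operand odd.
        ((acc[k] = Ops{}(acc[k], i | 1), ++k), ...);
    }

    // Combine the accumulators so every chain contributes to the result.
    std::uint64_t total = 0;
    for (std::uint64_t v : acc) {
        total += v;
    }
    return total;
}
```

With this, something like Add3 becomes RunOps<Add, Add, Add>(n) and a mixed case becomes RunOps<Add, Add, Mul>(n). Whether the generated code matches the hand-written versions would still need checking in Compiler Explorer.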

Interesting learnings

OK. So, all the benchmarks in this source file include an extra addition due to the incrementing i. This can be seen in the instructions here: (screenshot of Compiler Explorer)

The timing results are here

OK. A few learnings and more questions.

Hmm... Maybe this was a bit of a waste of time... but I do like the incremental process of writing benchmarks. And I hope this does not turn into an unhealthy addiction. :D

Updates