CPU Performance Demonstrations

A collection of microbenchmarks demonstrating low-level concepts and optimisations that affect performance on modern x86 CPUs.
Each demonstration examines a single concept in isolation to make learning easier.

Warning: CPU microarchitecture ahead!

How To Start

If you are new to the world of CPU architecture and microarchitecture, you may want to read Primer.md, which covers some basic concepts that are prerequisite knowledge for many of the demonstrations.

Some demonstrations inherently build on topics discussed in other demonstrations. You may want to try the demonstrations in this order to minimise confusion:

Enjoy!

Notes

Performance disclaimer

Naturally, the exact results of microbenchmarks depend significantly on your CPU's microarchitecture; demonstrating microarchitecture in a microarchitecture-agnostic manner is difficult. Some factors that may cause results to differ include:

The demonstrated feature is not implemented on all CPUs.
Particular instruction latencies are assumed.
A minimum amount of parallel execution capacity (execution ports) is assumed.
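
To see why assumed latencies and port counts matter, here is a deliberately simplified sketch of a lower bound on loop cost. It is not taken from the demonstrations: the function name, its parameters, and the model itself (one dependency chain per iteration, every micro-op able to run on every port) are assumptions made for illustration only.

```python
def min_cycles_per_iteration(chain_latency: int, uops: int, ports: int) -> float:
    """Rough lower bound on cycles per loop iteration (hypothetical model).

    chain_latency: total latency, in cycles, of the loop-carried dependency chain.
    uops:          number of micro-ops issued per iteration.
    ports:         number of execution ports able to run those micro-ops.
    """
    if ports <= 0:
        raise ValueError("ports must be positive")
    latency_bound = float(chain_latency)   # the chain cannot be overlapped with itself
    throughput_bound = uops / ports        # at best, every port is busy every cycle
    return max(latency_bound, throughput_bound)


if __name__ == "__main__":
    # Same loop body, two hypothetical CPUs that differ only in port count.
    print(min_cycles_per_iteration(chain_latency=1, uops=8, ports=4))
    print(min_cycles_per_iteration(chain_latency=1, uops=8, ports=2))
```

Under this model, a loop that is throughput-bound on one CPU can take twice as long on a CPU with half the ports, while a latency-bound loop is unaffected by port count. Real CPUs have further limits (front-end width, port restrictions per instruction) that this sketch ignores.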

The demonstrations were written and tested with Intel x86-64 CPUs from Skylake onwards in mind. I have tried my best to indicate in each demonstration broadly which CPUs are supported and what assumptions are made.

What's this "Skylake JCC alignment issue"?

In almost every demonstration's assembly code, you will see something like this:

.p2align 4      # Skylake JCC alignment issue (unimportant)
loop:
    ...

The .p2align 4 directive aligns the start of the loop to a 16-byte (2^4) memory address. This alignment ensures the loop's trailing jump instruction is placed so that it avoids a performance pessimisation on some Intel CPUs (see Intel's paper for details).
Please ignore this issue; it affects neither the correctness of the demonstrations nor the concepts they illustrate.
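
For the curious, the address arithmetic can be sketched as follows. This is an illustration rather than part of the demonstrations: the function names and example addresses are hypothetical, and the boundary condition (a jump that crosses or ends on a 32-byte boundary) is my reading of Intel's description, so check the paper before relying on it.

```python
def align_up(address: int, power: int) -> int:
    """Round address up to the next multiple of 2**power, as .p2align does."""
    alignment = 1 << power
    return (address + alignment - 1) & ~(alignment - 1)


def touches_32_byte_boundary(address: int, length: int) -> bool:
    """True if an instruction crosses or ends on a 32-byte boundary.

    address: address of the instruction's first byte.
    length:  instruction length in bytes (must be at least 1).
    """
    end = address + length                           # first byte after the instruction
    crosses = (address // 32) != ((end - 1) // 32)   # first and last byte in different blocks
    ends_on = end % 32 == 0                          # last byte is the final byte of a block
    return crosses or ends_on


if __name__ == "__main__":
    loop_start = align_up(0x1003, 4)   # hypothetical address before alignment
    print(hex(loop_start))
    # Hypothetical 2-byte jump placed 12 bytes into the aligned loop.
    print(touches_32_byte_boundary(loop_start + 12, 2))
```

Note that aligning the loop start does not by itself guarantee anything for arbitrarily long loops; whether the trailing jump is affected depends on where it ends up relative to the next 32-byte boundary.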

MC-DeltaT/cpu-performance-demos
