Tutorial¶

Installation¶

Download `nanobench.h` from the releases and make it available in your project. Create a .cpp file, e.g. nanobench.cpp, where the bulk of nanobench is compiled:

```cpp
#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>
```

Wherever you want to use nanobench's functionality, simply `#include <nanobench.h>`. All functionality resides within the namespace `ankerl::nanobench`.
Quick Start¶

Create nanobench.cpp:

```cpp
#define ANKERL_NANOBENCH_IMPLEMENT
#include <nanobench.h>
```

Compile with `g++ -O3 -I../include -c nanobench.cpp`. This compiles the bulk of nanobench, and took 3.37 seconds on my machine. It's done only once.

Create the actual benchmark code, in full_example.cpp:

```cpp
#include <nanobench.h>

#include <atomic>

int main() {
    int y = 0;
    std::atomic<int> x(0);

    ankerl::nanobench::Bench().run("compare_exchange_strong", [&] {
        x.compare_exchange_strong(y, 0);
    });
}
```
The most important entry point is `ankerl::nanobench::Bench`. It creates a benchmarking object, optionally configures it, and then runs the code to benchmark with `run()`.

Compile & link with `g++ -O3 -I../include nanobench.o full_example.cpp -o full_example`. This takes just 0.5 seconds on my machine.

Run `./full_example`, which gives an output like this:

```
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 5.63 | 177,595,338.98 | 0.0% | 3.00 | 17.98 | 0.167 | 1.00 | 0.1% | 0.00 | `compare_exchange_strong`
```
Which renders as:

| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark |
|------:|-----:|-----:|-------:|-------:|----:|-------:|------:|------:|:----------|
| 5.63 | 177,595,338.98 | 0.0% | 3.00 | 17.98 | 0.167 | 1.00 | 0.1% | 0.00 | `compare_exchange_strong` |
Which means that one `x.compare_exchange_strong(y, 0);` call takes 5.63ns on my machine, or ~178 million operations per second (the two columns are reciprocals: 1 / 5.63ns ≈ 177.6 million op/s). Runtime fluctuates by around 0.0%, so the results are very stable. Each call required 3 instructions, which took ~18 CPU cycles. There was a single branch per call, with only 0.1% mispredicted.
In the remaining examples, I’m using doctest as a unit test framework, which is like Catch2 - but compiles much faster. It pairs well with nanobench.
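Each of the doctest-based snippets below needs a test driver to link against. A minimal sketch (the include path mirrors the snippets and is an assumption about the project layout):

```cpp
// main.cpp: a minimal doctest driver for the snippets below.
// DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN lets doctest generate main() for us;
// compile this together with the nanobench.cpp created above.
#define DOCTEST_CONFIG_IMPLEMENT_WITH_MAIN
#include <thirdparty/doctest/doctest.h>
```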
Benchmarking¶
Something Fast¶
Let's benchmark how fast we can do `++x` for a `uint64_t`:
```cpp
#include <nanobench.h>
#include <thirdparty/doctest/doctest.h>

TEST_CASE("tutorial_fast_v1") {
    uint64_t x = 1;
    ankerl::nanobench::Bench().run("++x", [&]() {
        ++x;
    });
}
```
After 0.2ms we get this output:
```
| ns/op | op/s | err% | total | benchmark
|--------------------:|--------------------:|--------:|----------:|:----------
| - | - | - | - | :boom: `++x` (iterations overflow. Maybe your code got optimized away?)
```

No data there! We only get the :boom: iterations overflow warning. The compiler could optimize `++x` away because we never use the result. Thanks to `doNotOptimizeAway`, this is easy to fix:
```cpp
#include <nanobench.h>
#include <thirdparty/doctest/doctest.h>

TEST_CASE("tutorial_fast_v2") {
    uint64_t x = 1;
    ankerl::nanobench::Bench().run("++x", [&]() {
        ankerl::nanobench::doNotOptimizeAway(x += 1);
    });
}
```
This time the benchmark runs for 2.2ms and we actually get reasonable data:
```
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 0.31 | 3,192,444,232.50 | 0.0% | 1.00 | 1.00 | 0.998 | 0.00 | 0.0% | 0.00 | `++x`
```
It's a very stable result. In one run the op/s is 3,192 million/sec; the next time I execute it I get 3,168 million/sec. It always takes 1.00 instructions per operation on my machine, and can do this in ~1 cycle.
Something Slow¶
Let’s benchmark if sleeping for 100ms really takes 100ms.
```cpp
#include <nanobench.h>
#include <thirdparty/doctest/doctest.h>

#include <chrono>
#include <thread>

TEST_CASE("tutorial_slow_v1") {
    ankerl::nanobench::Bench().run("sleep 100ms, auto", [&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    });
}
```
After 1.1 seconds I get:

```
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:---------------------
| 100,125,753.00 | 9.99 | 0.0% | 51.00 | 7,714.00 | 0.007 | 11.00 | 90.9% | 1.10 | `sleep 100ms, auto`
```
So we actually take 100.125ms instead of 100ms. Next time I run it, I get 100.141ms. Also a very stable result. Interestingly, sleep takes 51 instructions but 7,714 cycles - so we only get 0.007 instructions per cycle. That's extremely low, but expected of `sleep`. It also required 11 branches, of which 90.9% were mispredicted on average.

If the 1.1 seconds of runtime is too much for you, you can manually configure the number of evaluations (epochs):
```cpp
#include <nanobench.h>
#include <thirdparty/doctest/doctest.h>

#include <chrono>
#include <thread>

TEST_CASE("tutorial_slow_v2") {
    ankerl::nanobench::Bench().epochs(3).run("sleep 100ms", [&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    });
}
```
```
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 100,099,096.00 | 9.99 | 0.0% | 51.00 | 7,182.00 | 0.007 | 11.00 | 90.9% | 0.30 | `sleep 100ms`
```

This time it took only 0.3 seconds, with only 3 evaluations instead of 11. The err% will be less meaningful, but since the benchmark is so stable it doesn't really matter.
Something Unstable¶
Let's create an extreme artificial test that's hard to benchmark, because its runtime fluctuates randomly: each iteration performs a random number (0-255) of rng calls:
```cpp
#include <nanobench.h>
#include <thirdparty/doctest/doctest.h>

#include <random>

TEST_CASE("tutorial_fluctuating_v1") {
    std::random_device dev;
    std::mt19937_64 rng(dev());
    ankerl::nanobench::Bench().run("random fluctuations", [&] {
        // each run, perform a random number of rng calls
        auto iterations = rng() & UINT64_C(0xff);
        for (uint64_t i = 0; i < iterations; ++i) {
            ankerl::nanobench::doNotOptimizeAway(rng());
        }
    });
}
```
After 2.3ms, I get this result:
```
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 334.12 | 2,992,911.53 | 6.3% | 3,486.44 | 1,068.67 | 3.262 | 287.86 | 0.7% | 0.00 | :wavy_dash: `random fluctuations` (Unstable with ~56.7 iters. Increase `minEpochIterations` to e.g. 567)
```

So on average each iteration takes about 334.12ns, but we get a warning that the results are unstable: the median percentage error is 6.3%, which is quite high.
Let’s use the suggestion and set the minimum number of iterations to 5000, and try again:
```cpp
#include <nanobench.h>
#include <thirdparty/doctest/doctest.h>

#include <random>

TEST_CASE("tutorial_fluctuating_v2") {
    std::random_device dev;
    std::mt19937_64 rng(dev());
    ankerl::nanobench::Bench().minEpochIterations(5000).run(
        "random fluctuations", [&] {
            // each run, perform a random number of rng calls
            auto iterations = rng() & UINT64_C(0xff);
            for (uint64_t i = 0; i < iterations; ++i) {
                ankerl::nanobench::doNotOptimizeAway(rng());
            }
        });
}
```
The fluctuations are now much smaller:
```
| ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmark
|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:----------
| 277.31 | 3,606,106.48 | 0.7% | 3,531.75 | 885.18 | 3.990 | 291.59 | 0.7% | 0.00 | `random fluctuations`
```
The results are more stable, with only 0.7% error.
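If raising the iteration count is inconvenient, raising the measurement time per epoch has a similar stabilizing effect. A sketch of the same benchmark using `minEpochTime()` (the 10ms value is my assumption; tune it to your workload):

```cpp
#include <nanobench.h>
#include <thirdparty/doctest/doctest.h>

#include <chrono>
#include <random>

TEST_CASE("tutorial_fluctuating_v3") {
    std::random_device dev;
    std::mt19937_64 rng(dev());

    // measure each epoch for at least 10ms, so the random per-iteration
    // fluctuations average out
    ankerl::nanobench::Bench()
        .minEpochTime(std::chrono::milliseconds(10))
        .run("random fluctuations", [&] {
            // each run, perform a random number of rng calls
            auto iterations = rng() & UINT64_C(0xff);
            for (uint64_t i = 0; i < iterations; ++i) {
                ankerl::nanobench::doNotOptimizeAway(rng());
            }
        });
}
```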
Comparing Results¶
I have implemented a comparison of multiple random number generators. Here several RNGs are compared to a baseline calculated from std::default_random_engine. I factored out the general benchmarking code so it’s easy to use for each of the random number generators:
```cpp
namespace {

// Benchmarks how fast we can get 64bit random values from Rng.
template <typename Rng>
void bench(ankerl::nanobench::Bench* bench, char const* name) {
    std::random_device dev;
    Rng rng(dev());
    bench->run(name, [&]() {
        auto r = std::uniform_int_distribution<uint64_t>{}(rng);
        ankerl::nanobench::doNotOptimizeAway(r);
    });
}

} // namespace

TEST_CASE("example_random_number_generators") {
    // perform a few warmup calls, and since the runtime is not always stable
    // for each generator, increase the number of epochs to get more accurate
    // numbers.
    ankerl::nanobench::Bench b;
    b.title("Random Number Generators")
        .unit("uint64_t")
        .warmup(100)
        .relative(true);
    b.performanceCounters(true);

    // sets the first one as the baseline
    bench<std::default_random_engine>(&b, "std::default_random_engine");
    bench<std::mt19937>(&b, "std::mt19937");
    bench<std::mt19937_64>(&b, "std::mt19937_64");
    bench<std::ranlux24_base>(&b, "std::ranlux24_base");
    bench<std::ranlux48_base>(&b, "std::ranlux48_base");
    bench<std::ranlux24>(&b, "std::ranlux24");
    bench<std::ranlux48>(&b, "std::ranlux48");
    bench<std::knuth_b>(&b, "std::knuth_b");
    bench<WyRng>(&b, "WyRng");
    bench<NasamRng>(&b, "NasamRng");
    bench<Sfc4>(&b, "Sfc4");
    bench<RomuTrio>(&b, "RomuTrio");
    bench<RomuDuo>(&b, "RomuDuo");
    bench<RomuDuoJr>(&b, "RomuDuoJr");
    bench<Orbit>(&b, "Orbit");
    bench<ankerl::nanobench::Rng>(&b, "ankerl::nanobench::Rng");
}
```
Runs for 60ms and prints this table:
```
| relative | ns/uint64_t | uint64_t/s | err% | ins/uint64_t | cyc/uint64_t | IPC | bra/uint64_t | miss% | total | Random Number Generators
|---------:|--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:-------------------------
| 100.0% | 35.87 | 27,881,924.28 | 2.3% | 127.80 | 114.61 | 1.115 | 9.77 | 3.7% | 0.00 | `std::default_random_engine`
| 490.3% | 7.32 | 136,699,693.21 | 0.6% | 89.55 | 23.49 | 3.812 | 9.51 | 0.1% | 0.00 | `std::mt19937`
| 1,767.4% | 2.03 | 492,786,582.33 | 0.6% | 24.38 | 6.48 | 3.761 | 1.26 | 0.6% | 0.00 | `std::mt19937_64`
| 85.2% | 42.08 | 23,764,853.03 | 0.7% | 157.07 | 134.62 | 1.167 | 19.51 | 7.6% | 0.00 | `std::ranlux24_base`
| 121.3% | 29.56 | 33,824,759.51 | 0.5% | 91.03 | 94.35 | 0.965 | 10.00 | 8.1% | 0.00 | `std::ranlux48_base`
| 17.4% | 205.67 | 4,862,080.59 | 1.2% | 709.83 | 657.10 | 1.080 | 101.79 | 16.1% | 0.00 | `std::ranlux24`
| 8.7% | 412.46 | 2,424,497.97 | 1.8% | 1,514.70 | 1,318.43 | 1.149 | 219.09 | 16.7% | 0.00 | `std::ranlux48`
| 59.2% | 60.60 | 16,502,276.18 | 1.9% | 253.77 | 193.39 | 1.312 | 24.93 | 1.5% | 0.00 | `std::knuth_b`
| 5,187.1% | 0.69 | 1,446,254,071.66 | 0.1% | 6.00 | 2.21 | 2.714 | 0.00 | 0.0% | 0.00 | `WyRng`
| 1,431.7% | 2.51 | 399,177,833.54 | 0.0% | 21.00 | 8.01 | 2.621 | 0.00 | 0.0% | 0.00 | `NasamRng`
| 2,629.9% | 1.36 | 733,279,957.30 | 0.1% | 13.00 | 4.36 | 2.982 | 0.00 | 0.0% | 0.00 | `Sfc4`
| 3,815.7% | 0.94 | 1,063,889,655.17 | 0.0% | 11.00 | 3.01 | 3.661 | 0.00 | 0.0% | 0.00 | `RomuTrio`
| 3,529.5% | 1.02 | 984,102,081.37 | 0.3% | 9.00 | 3.25 | 2.768 | 0.00 | 0.0% | 0.00 | `RomuDuo`
| 4,580.4% | 0.78 | 1,277,113,402.06 | 0.0% | 7.00 | 2.50 | 2.797 | 0.00 | 0.0% | 0.00 | `RomuDuoJr`
| 2,291.2% | 1.57 | 638,820,992.09 | 0.0% | 11.00 | 5.00 | 2.200 | 0.00 | 0.0% | 0.00 | `ankerl::nanobench::Rng`
```
It shows that `ankerl::nanobench::Rng` is one of the fastest RNGs, and has the least amount of fluctuation. It takes only 1.57ns to generate a random `uint64_t`, so ~638 million calls per second are possible. The leftmost column shows performance relative to `std::default_random_engine`.
Note
Here pure runtime performance is not necessarily the best benchmark. Especially the fastest RNGs can be inlined and use instruction-level parallelism to their advantage: they immediately return an old state, and while user code can already use that value, the next value is calculated in parallel. See the excellent paper at romu-random for details.
Asymptotic Complexity¶
It is possible to calculate asymptotic complexity (Big O) from multiple runs of a benchmark. Run the benchmark with different complexity N, and nanobench can calculate the best fitting curve.

The following example finds the asymptotic complexity of `std::set`'s `find()`:
```cpp
#include <nanobench.h>
#include <thirdparty/doctest/doctest.h>

#include <iostream>
#include <set>

TEST_CASE("tutorial_complexity_set_find") {
    // Create a single benchmark instance that is used in multiple benchmark
    // runs, with different settings for complexityN.
    ankerl::nanobench::Bench bench;

    // a RNG to generate input data
    ankerl::nanobench::Rng rng;

    std::set<uint64_t> set;

    // Running the benchmark multiple times, with different number of elements
    for (auto setSize :
         {10U, 20U, 50U, 100U, 200U, 500U, 1000U, 2000U, 5000U, 10000U}) {

        // fill up the set with random data
        while (set.size() < setSize) {
            set.insert(rng());
        }

        // Run the benchmark, provide setSize as the scaling variable.
        bench.complexityN(set.size()).run("std::set find", [&] {
            ankerl::nanobench::doNotOptimizeAway(set.find(rng()));
        });
    }

    // calculate BigO complexity best fit and print the results
    std::cout << bench.complexityBigO() << std::endl;
}
```
The loop runs the benchmark 10 times, with different set sizes from 10 to 10k.
Note
Each of the 10 benchmark runs automatically scales the number of iterations so results are still fast and accurate. In total the whole test takes about 90ms.
The `Bench` object holds the benchmark results of the 10 benchmark runs. Each benchmark is recorded with a different setting for `complexityN`.
After the benchmark has printed its results, we calculate & print the best-fit Big O for the most important complexity functions. `std::cout << bench.complexityBigO() << std::endl;` prints e.g. this markdown table:
```
| coefficient | err% | complexity
|--------------:|-------:|------------
| 6.66562e-09 | 29.1% | O(log n)
| 1.47588e-11 | 58.3% | O(n)
| 1.10742e-12 | 62.6% | O(n log n)
| 5.15683e-08 | 63.8% | O(1)
| 1.40387e-15 | 78.7% | O(n^2)
| 1.32792e-19 | 85.7% | O(n^3)
```
The table is sorted with the best fitting complexity function first. So \(\mathcal{O}(\log{}n)\) provides the best approximation for the complexity. Interestingly, the error for \(\mathcal{O}(n)\) is not much larger, which can be an indication that even though the red-black tree should theoretically have logarithmic complexity, in practice that is not perfectly the case.
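If none of the predefined curves describe your algorithm, you can also fit a custom one. A sketch continuing the example above, assuming the `complexityBigO(name, fittingCurve)` overload described in the `ankerl::nanobench::Bench` reference:

```cpp
// fit a custom curve against the recorded (complexityN, elapsed) points;
// the lambda maps n to the assumed shape, here sqrt(n)
auto sqrtFit = bench.complexityBigO("O(sqrt n)", [](double n) {
    return std::sqrt(n); // needs <cmath>
});
std::cout << sqrtFit << std::endl;
```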
Rendering Mustache-like Templates¶
Nanobench comes with a powerful Mustache-like template mechanism to process the benchmark results into all kinds of formats. You can find a full description of all possible tags at `ankerl::nanobench::render()`.
Several preconfigured formats exist in the namespace `ankerl::nanobench::templates`. Rendering these templates can be done with either `ankerl::nanobench::render()`, or directly with `ankerl::nanobench::Bench::render()`.
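You are not limited to the preconfigured templates: `render()` accepts any template string. A minimal sketch with a hand-written template, using tag names that also appear in the CSV template shown below:

```cpp
#include <nanobench.h>

#include <cstdint>
#include <iostream>

int main() {
    uint64_t x = 1;
    ankerl::nanobench::Bench bench;
    bench.run("++x", [&] { ankerl::nanobench::doNotOptimizeAway(++x); });

    // one output line per benchmark result: name and median elapsed seconds
    ankerl::nanobench::render(
        "{{#result}}{{name}}: {{median(elapsed)}}\n{{/result}}", bench,
        std::cout);
}
```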
The following example shows how to use the CSV - Comma-Separated Values template, while suppressing nanobench's default output.
```cpp
#include <nanobench.h>
#include <thirdparty/doctest/doctest.h>

#include <atomic>
#include <iostream>

TEST_CASE("tutorial_render_simple") {
    std::atomic<int> x(0);

    ankerl::nanobench::Bench()
        .output(nullptr)
        .run("std::vector",
             [&] {
                 ++x;
             })
        .render(ankerl::nanobench::templates::csv(), std::cout);
}
```
We call `Bench::output()` with `nullptr`, thus disabling the standard output. After the benchmark we directly chain `Bench::render()`, using the CSV template and writing the rendered output to `std::cout`. When running, we get just the CSV output on the console, which looks like this:
"title";"name";"unit";"batch";"elapsed";"error %";"instructions";"branches";"branch misses";"total"
"benchmark";"std::vector";"op";1;6.51982200647249e-09;8.26465858909014e-05;23.0034662045061;5;0.00116867939228672;0.000171959
Nanobench comes with a few preconfigured templates, residing in the namespace `ankerl::nanobench::templates`. To demonstrate what these templates can do, here is a simple example that benchmarks the two random number generators `std::mt19937_64` and `std::knuth_b`, and writes both the template and the rendered output to files:
```cpp
#include <nanobench.h>
#include <thirdparty/doctest/doctest.h>

#include <fstream>
#include <random>

namespace {

void gen(std::string const& typeName, char const* mustacheTemplate,
         ankerl::nanobench::Bench const& bench) {

    std::ofstream templateOut("mustache.template." + typeName);
    templateOut << mustacheTemplate;

    std::ofstream renderOut("mustache.render." + typeName);
    ankerl::nanobench::render(mustacheTemplate, bench, renderOut);
}

} // namespace

TEST_CASE("tutorial_mustache") {
    ankerl::nanobench::Bench bench;
    bench.title("Benchmarking std::mt19937_64 and std::knuth_b");

    std::mt19937_64 rng1;
    bench.run("std::mt19937_64", [&] {
        ankerl::nanobench::doNotOptimizeAway(rng1());
    });

    std::knuth_b rng2;
    bench.run("std::knuth_b", [&] {
        ankerl::nanobench::doNotOptimizeAway(rng2());
    });

    gen("json", ankerl::nanobench::templates::json(), bench);
    gen("html", ankerl::nanobench::templates::htmlBoxplot(), bench);
    gen("csv", ankerl::nanobench::templates::csv(), bench);
}
```
CSV - Comma-Separated Values¶
The function `ankerl::nanobench::templates::csv()` provides this template:

```
"title";"name";"unit";"batch";"elapsed";"error %";"instructions";"branches";"branch misses";"total"
{{#result}}"{{title}}";"{{name}}";"{{unit}}";{{batch}};{{median(elapsed)}};{{medianAbsolutePercentError(elapsed)}};{{median(instructions)}};{{median(branchinstructions)}};{{median(branchmisses)}};{{sumProduct(iterations, elapsed)}}
{{/result}}
```
This generates a compact CSV file, where entries are separated by a semicolon `;`. Run with the example above, I get this output:

```
"title";"name";"unit";"batch";"elapsed";"error %";"instructions";"branches";"branch misses";"total"
"Benchmarking std::mt19937_64 and std::knuth_b";"std::mt19937_64";"op";1;2.54441805225653e-08;0.0236579384033733;125.989678899083;16.7645714285714;0.564133016627078;0.000218811
"Benchmarking std::mt19937_64 and std::knuth_b";"std::knuth_b";"op";1;3.19013867488444e-08;0.00091350764819687;170.013008130081;28;0.0031104199066874;0.000217248
```
Rendered as a CSV table:

| title | name | unit | batch | elapsed | error % | instructions | branches | branch misses | total |
|---|---|---|---|---|---|---|---|---|---|
| Benchmarking std::mt19937_64 and std::knuth_b | std::mt19937_64 | op | 1 | 2.54441805225653e-08 | 0.0236579384033733 | 125.989678899083 | 16.7645714285714 | 0.564133016627078 | 0.000218811 |
| Benchmarking std::mt19937_64 and std::knuth_b | std::knuth_b | op | 1 | 3.19013867488444e-08 | 0.00091350764819687 | 170.013008130081 | 28 | 0.0031104199066874 | 0.000217248 |
Note that the CSV template doesn’t provide all the data that is available.
HTML Box Plots¶
With the template `ankerl::nanobench::templates::htmlBoxplot()` you get a plotly-based HTML output which generates a boxplot of the runtimes. The template is rather simple:
```html
<html>

<head>
    <script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
</head>

<body>
    <div id="myDiv"></div>
    <script>
        var data = [
            {{#result}}{
                name: '{{name}}',
                y: [{{#measurement}}{{elapsed}}{{^-last}}, {{/-last}}{{/measurement}}],
            },
            {{/result}}
        ];
        var title = '{{title}}';

        data = data.map(a => Object.assign(a, { boxpoints: 'all', pointpos: 0, type: 'box' }));
        var layout = { title: { text: title }, showlegend: false, yaxis: { title: 'time per unit', rangemode: 'tozero', autorange: true } };
        Plotly.newPlot('myDiv', data, layout, {responsive: true});
    </script>
</body>

</html>
```
This generates a nice interactive boxplot, which gives a good visual overview of the runtime performance of the evaluated benchmarks. Each epoch is visualized as a dot, and the boxplot itself shows median, percentiles, and outliers. You might want to increase the default number of epochs for an even better visualization result.
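For example, a sketch that raises the epoch count and writes the plot to a file (the file name and epoch count are arbitrary choices):

```cpp
#include <nanobench.h>

#include <fstream>
#include <random>

int main() {
    std::mt19937_64 rng(123);
    ankerl::nanobench::Bench bench;

    // more epochs mean more dots per box in the plot
    bench.epochs(100).run("std::mt19937_64", [&] {
        ankerl::nanobench::doNotOptimizeAway(rng());
    });

    // render the boxplot template into an HTML file
    std::ofstream out("boxplot.html");
    bench.render(ankerl::nanobench::templates::htmlBoxplot(), out);
}
```

Open the generated file in a browser to interact with the plot.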
JSON - JavaScript Object Notation¶
The `ankerl::nanobench::templates::json()` template gives everything: all data that is available, from all runs. The template is therefore quite complex:
```
{
"results": [
{{#result}} {
"title": "{{title}}",
"name": "{{name}}",
"unit": "{{unit}}",
"batch": {{batch}},
"complexityN": {{complexityN}},
"epochs": {{epochs}},
"clockResolution": {{clockResolution}},
"clockResolutionMultiple": {{clockResolutionMultiple}},
"maxEpochTime": {{maxEpochTime}},
"minEpochTime": {{minEpochTime}},
"minEpochIterations": {{minEpochIterations}},
"warmup": {{warmup}},
"relative": {{relative}},
"median(elapsed)": {{median(elapsed)}},
"medianAbsolutePercentError(elapsed)": {{medianAbsolutePercentError(elapsed)}},
"median(instructions)": {{median(instructions)}},
"medianAbsolutePercentError(instructions)": {{medianAbsolutePercentError(instructions)}},
"median(cpucycles)": {{median(cpucycles)}},
"median(contextswitches)": {{median(contextswitches)}},
"median(pagefaults)": {{median(pagefaults)}},
"median(branchinstructions)": {{median(branchinstructions)}},
"median(branchmisses)": {{median(branchmisses)}},
"totalTime": {{sumProduct(iterations, elapsed)}},
"measurements": [
{{#measurement}} {
"iterations": {{iterations}},
"elapsed": {{elapsed}},
"pagefaults": {{pagefaults}},
"cpucycles": {{cpucycles}},
"contextswitches": {{contextswitches}},
"instructions": {{instructions}},
"branchinstructions": {{branchinstructions}},
"branchmisses": {{branchmisses}}
}{{^-last}},{{/-last}}
{{/measurement}} ]
}{{^-last}},{{/-last}}
{{/result}} ]
}
```
This also gives the data from each separate epoch (see `ankerl::nanobench::Bench::epochs()`), not just the accumulated data as in the CSV template:
```
{
"results": [
{
"title": "Benchmarking std::mt19937_64 and std::knuth_b",
"name": "std::mt19937_64",
"unit": "op",
"batch": 1,
"complexityN": -1,
"epochs": 11,
"clockResolution": 1.8e-08,
"clockResolutionMultiple": 1000,
"maxEpochTime": 0.1,
"minEpochTime": 0,
"minEpochIterations": 1,
"warmup": 0,
"relative": 0,
"median(elapsed)": 2.54441805225653e-08,
"medianAbsolutePercentError(elapsed)": 0.0236579384033733,
"median(instructions)": 125.989678899083,
"medianAbsolutePercentError(instructions)": 0.035125448044942,
"median(cpucycles)": 81.3479809976247,
"median(contextswitches)": 0,
"median(pagefaults)": 0,
"median(branchinstructions)": 16.7645714285714,
"median(branchmisses)": 0.564133016627078,
"totalTime": 0.000218811,
"measurements": [
{
"iterations": 875,
"elapsed": 2.54708571428571e-08,
"pagefaults": 0,
"cpucycles": 81.472,
"contextswitches": 0,
"instructions": 125.885714285714,
"branchinstructions": 16.7645714285714,
"branchmisses": 0.574857142857143
},
{
"iterations": 809,
"elapsed": 2.58467243510507e-08,
"pagefaults": 0,
"cpucycles": 82.5290482076638,
"contextswitches": 0,
"instructions": 128.771322620519,
"branchinstructions": 17.0296662546354,
"branchmisses": 0.582200247218789
},
{
"iterations": 737,
"elapsed": 2.24097693351425e-08,
"pagefaults": 0,
"cpucycles": 71.6431478968792,
"contextswitches": 0,
"instructions": 118.374491180461,
"branchinstructions": 15.9470827679783,
"branchmisses": 0.417910447761194
},
{
"iterations": 872,
"elapsed": 2.53405963302752e-08,
"pagefaults": 0,
"cpucycles": 80.9896788990826,
"contextswitches": 0,
"instructions": 125.989678899083,
"branchinstructions": 16.7580275229358,
"branchmisses": 0.563073394495413
},
{
"iterations": 834,
"elapsed": 2.59256594724221e-08,
"pagefaults": 0,
"cpucycles": 82.7661870503597,
"contextswitches": 0,
"instructions": 127.635491606715,
"branchinstructions": 16.9352517985612,
"branchmisses": 0.575539568345324
},
{
"iterations": 772,
"elapsed": 2.25310880829016e-08,
"pagefaults": 0,
"cpucycles": 72.0129533678757,
"contextswitches": 0,
"instructions": 117.108808290155,
"branchinstructions": 15.8341968911917,
"branchmisses": 0.405440414507772
},
{
"iterations": 842,
"elapsed": 2.54441805225653e-08,
"pagefaults": 0,
"cpucycles": 81.3479809976247,
"contextswitches": 0,
"instructions": 127.266033254157,
"branchinstructions": 16.8859857482185,
"branchmisses": 0.564133016627078
},
{
"iterations": 792,
"elapsed": 2.20126262626263e-08,
"pagefaults": 0,
"cpucycles": 70.3623737373737,
"contextswitches": 0,
"instructions": 116.420454545455,
"branchinstructions": 15.7588383838384,
"branchmisses": 0.396464646464646
},
{
"iterations": 757,
"elapsed": 2.63870541611625e-08,
"pagefaults": 0,
"cpucycles": 84.332892998679,
"contextswitches": 0,
"instructions": 131.462351387054,
"branchinstructions": 17.334214002642,
"branchmisses": 0.618229854689564
},
{
"iterations": 850,
"elapsed": 2.23305882352941e-08,
"pagefaults": 0,
"cpucycles": 71.3505882352941,
"contextswitches": 0,
"instructions": 114.629411764706,
"branchinstructions": 15.5823529411765,
"branchmisses": 0.392941176470588
},
{
"iterations": 774,
"elapsed": 2.60607235142119e-08,
"pagefaults": 0,
"cpucycles": 83.1679586563308,
"contextswitches": 0,
"instructions": 130.576227390181,
"branchinstructions": 17.2635658914729,
"branchmisses": 0.590439276485788
}
]
},
{
"title": "Benchmarking std::mt19937_64 and std::knuth_b",
"name": "std::knuth_b",
"unit": "op",
"batch": 1,
"complexityN": -1,
"epochs": 11,
"clockResolution": 1.8e-08,
"clockResolutionMultiple": 1000,
"maxEpochTime": 0.1,
"minEpochTime": 0,
"minEpochIterations": 1,
"warmup": 0,
"relative": 0,
"median(elapsed)": 3.19013867488444e-08,
"medianAbsolutePercentError(elapsed)": 0.00091350764819687,
"median(instructions)": 170.013008130081,
"medianAbsolutePercentError(instructions)": 4.11992392254248e-06,
"median(cpucycles)": 101.973254086181,
"median(contextswitches)": 0,
"median(pagefaults)": 0,
"median(branchinstructions)": 28,
"median(branchmisses)": 0.0031104199066874,
"totalTime": 0.000217248,
"measurements": [
{
"iterations": 568,
"elapsed": 3.2137323943662e-08,
"pagefaults": 0,
"cpucycles": 102.55985915493,
"contextswitches": 0,
"instructions": 170.014084507042,
"branchinstructions": 28,
"branchmisses": 0.00528169014084507
},
{
"iterations": 576,
"elapsed": 3.19305555555556e-08,
"pagefaults": 0,
"cpucycles": 102.059027777778,
"contextswitches": 0,
"instructions": 170.013888888889,
"branchinstructions": 28,
"branchmisses": 0.00347222222222222
},
{
"iterations": 643,
"elapsed": 3.18973561430793e-08,
"pagefaults": 0,
"cpucycles": 101.973561430793,
"contextswitches": 0,
"instructions": 170.012441679627,
"branchinstructions": 28,
"branchmisses": 0.0031104199066874
},
{
"iterations": 591,
"elapsed": 3.1912013536379e-08,
"pagefaults": 0,
"cpucycles": 101.944162436548,
"contextswitches": 0,
"instructions": 170.013536379019,
"branchinstructions": 28,
"branchmisses": 0.00169204737732657
},
{
"iterations": 673,
"elapsed": 3.19049034175334e-08,
"pagefaults": 0,
"cpucycles": 101.973254086181,
"contextswitches": 0,
"instructions": 170.011887072808,
"branchinstructions": 28,
"branchmisses": 0.00297176820208024
},
{
"iterations": 649,
"elapsed": 3.19013867488444e-08,
"pagefaults": 0,
"cpucycles": 101.850539291217,
"contextswitches": 0,
"instructions": 170.012326656394,
"branchinstructions": 28,
"branchmisses": 0.00308166409861325
},
{
"iterations": 606,
"elapsed": 3.18547854785479e-08,
"pagefaults": 0,
"cpucycles": 101.83498349835,
"contextswitches": 0,
"instructions": 170.013201320132,
"branchinstructions": 28,
"branchmisses": 0.0033003300330033
},
{
"iterations": 650,
"elapsed": 3.18769230769231e-08,
"pagefaults": 0,
"cpucycles": 101.898461538462,
"contextswitches": 0,
"instructions": 170.012307692308,
"branchinstructions": 28,
"branchmisses": 0.00307692307692308
},
{
"iterations": 615,
"elapsed": 3.18520325203252e-08,
"pagefaults": 0,
"cpucycles": 101.858536585366,
"contextswitches": 0,
"instructions": 170.013008130081,
"branchinstructions": 28,
"branchmisses": 0.0032520325203252
},
{
"iterations": 579,
"elapsed": 3.18618307426598e-08,
"pagefaults": 0,
"cpucycles": 101.989637305699,
"contextswitches": 0,
"instructions": 170.013816925734,
"branchinstructions": 28,
"branchmisses": 0.00345423143350604
},
{
"iterations": 657,
"elapsed": 3.19558599695586e-08,
"pagefaults": 0,
"cpucycles": 102.229832572298,
"contextswitches": 0,
"instructions": 170.012176560122,
"branchinstructions": 28,
"branchmisses": 0.0030441400304414
}
]
}
]
}
```