kcbench(1)                     User Commands                     kcbench(1)

NAME
kcbench - Linux kernel compile benchmark, speed edition

SYNOPSIS
kcbench [options]

DESCRIPTION
Kcbench tries to compile a Linux kernel as quickly as possible, which
can be used to test a system's performance or stability.

Note: The number of compile jobs ('-j') that delivers the best result
depends on the machine being benched. See the section "ON THE DEFAULT
NUMBER OF JOBS" below for details.

To get comparable results from different machines you need to use the
exact same operating system on all of them. There are multiple reasons
for this recommendation, but one of the main ones is: the Linux version
this benchmark downloads and compiles depends on the operating system's
default compiler.

If you choose to ignore this recommendation, at least make sure to
hard-code the Linux version to compile ('-s 5.4'), as for example
compiling 5.7 will take longer than 5.4 or 4.19 and thus lead to
results one cannot compare. Also make sure the compilers used on the
systems you want to compare are from a similar generation, as for
example gcc10 will try harder to optimize the code than gcc8 or gcc9
and thus take more time for its work.

Kcbench is accompanied by kcbenchrate. Both are quite similar, but
work slightly differently:

• kcbench tries to build one kernel as fast as possible. This approach
is called a 'speed run'; it lets make start multiple compiler jobs in
parallel by using 'make -j #'. That way kcbench will use a lot of
CPU cores most of the time, except during those few phases where the
Linux kernel build process is single-threaded and thus utilizes just
one CPU core. That for example is the case when vmlinux is linked.

• kcbenchrate tries to keep all CPU cores busy constantly by starting
workers on all of them, each of which builds one kernel with just one
job ('make -j 1'). This approach is called a 'rate run'. It takes a
lot longer to generate a result than kcbench and needs a lot more
storage space, but it will utilize the machine and its processors
better.
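
The difference between the two approaches can be sketched in shell
terms. Note that 'build_one_kernel' below is a hypothetical stand-in
for the real 'make' invocation against a Linux source tree; only the
parallelism pattern is what matters here:

```shell
# Hypothetical stand-in for one kernel build; kcbench/kcbenchrate
# run 'make' against a Linux source tree instead.
build_one_kernel() { :; }

workers=$(nproc)    # number of CPUs available for parallel work

# speed run (kcbench): ONE build using many compile jobs at once;
# the real call resembles: make -j "$workers" vmlinux
build_one_kernel

# rate run (kcbenchrate): one single-job worker per CPU, all running
# in parallel; each worker resembles: make -j 1 vmlinux
for cpu in $(seq 1 "$workers"); do
  build_one_kernel "$cpu" &
done
wait
echo "all $workers workers finished"
```

In the speed run the parallelism lives inside one make invocation, so
single-threaded build phases leave cores idle; in the rate run the
parallelism lives between independent builds, so such phases of one
worker are covered by the others.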

Options
-b, --bypass
Omit the initial kernel compile that fills the caches; saves time,
but the first result might be slightly worse than the following ones.

-d, --detailedresults
Print more detailed results.

-h, --help
Show usage.

-i, --iterations int
Determines how many kernels kcbench will compile sequentially with
each of the jobs values ('-j'). Default: 2

-j, --jobs int(,int, int, ...)
Number of jobs to use when compiling a kernel ('make -j #').

This option can be given multiple times (-j 2 -j 4 -j 8) or 'int'
can be a list (-j "2 4 8"). The default depends on the number of
cores in the system and whether its processor uses SMT. Run
'--help' to query the default on the particular machine.

Important note: on machines with SMT, kcbench will do runs that do
not utilize all available CPU cores; this might look odd, but there
are reasons for this behaviour. See "ON THE DEFAULT NUMBER OF JOBS"
below for details.

-m, --modconfig
Instead of using a config generated with 'defconfig', use one built
by 'allmodconfig' and compile modules as well. This takes a lot
longer, which makes it more suitable for machines with a lot of
fast CPU cores.

-o, --outputdir dir
Compile Linux in the given path: passes 'O=dir/kcbench-worker/' to
make when calling it to compile a kernel. A temporary directory is
used if this option is not given.

-s, --src path|version
Look for sources in path, ~/.cache/kcbench/linux-version, or
/usr/share/kcbench/linux-version. If they are not found, try to
download version automatically unless '--no-download' was specified.

-v, --verbose
Increase verbosity level; this option can be given multiple times.

-V, --version
Output program version.

--cc exec
Use exec as target compiler.

--cross-compile arch
EXPERIMENTAL: Cross compile the Linux kernel. Cross compilers for
this task are packaged in some Linux distributions. There are also
pre-compiled compilers available on the internet, for example here:
https://mirrors.edge.kernel.org/pub/tools/crosstool/

Values of arch that kcbench/kcbenchrate understand: arm arm64
aarch64 riscv riscv64 powerpc powerpc64 x86_64

Building for archs not directly supported by kcbench/kcbenchrate
should work, too: just export ARCH= and CROSS_COMPILE= as you would
when cross compiling a Linux kernel normally. Do not use
'--cross-compile' in that case and keep in mind that
kcbench/kcbenchrate configure the compiled Linux kernel with the
make target 'defconfig' (or 'allmodconfig', if you specify '-m'),
which might be unusual for the arch in question, but might be good
enough for benchmarking purposes.

Be aware that there is a bigger risk of running into compile errors
(see below) when cross compiling.
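
For instance, building for an arch kcbench does not know itself could
look like the following environment fragment; 'mips' and the
'mips-linux-gnu-' toolchain prefix are examples only, use whatever
cross toolchain is actually installed:

```shell
# Environment for an arch without direct kcbench support; the
# toolchain prefix is an assumed example and must match an
# installed cross compiler.
export ARCH=mips
export CROSS_COMPILE=mips-linux-gnu-
kcbench -s 5.4    # note: no '--cross-compile' in this case
```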

--crosscomp-scheme scheme
On Linux distributions that are known to ship cross compilers,
kcbench/kcbenchrate will assume you want to use those. This
parameter allows you to specify one of the various naming schemes
in case this automatic detection fails, or if you want
kcbench/kcbenchrate to find the compilers using a 'generic' scheme
that should work with compilers from various sources; the latter is
the default on unknown distributions.

Valid values of scheme: debian fedora generic redhat ubuntu

--hostcc exec
Use exec as host compiler.

--infinite
Run endlessly to create system load.

--llvm Set LLVM=1 to use clang as the compiler and the LLVM utilities
as substitutes for GNU binutils.

--add-make-args string
Pass additional flags found in string to make when creating the
config or building the kernel. This option is meant for experts who
want to try unusual things, like specifying a special linker
(--add-make-args 'LD=ld.lld').

Use with caution!

--no-download
Never download Linux kernel sources from the web automatically.

--savefailedlogs path
Save logs of failed compile runs to path.

ON THE DEFAULT NUMBER OF JOBS
The optimal number of compile jobs ('-j') to get the best result
depends on the machine being benched. On most systems you will achieve
the best result if the number of jobs matches the number of CPU cores.
That for example is the case on this 4 core Intel processor without
SMT:

[cttest@localhost ~]$ bash kcbench -s 5.3 -n 1
Processor: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz [4 CPUs]
Cpufreq; Memory: Unknown; 15934 MByte RAM
Compiler used: gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
Linux compiled: 5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command: make vmlinux
Run 1 (-j 4): 250.03 seconds / 14.40 kernels/hour
Run 2 (-j 6): 255.88 seconds / 14.07 kernels/hour

The run with 6 jobs was slower here. Trying a setting like that by
default looks like a waste of time on this machine, but other machines
deliver their best result when they are oversubscribed a little.
That's for example the case with this 6 core/12 threads processor,
which achieved its best result with 15 jobs:

[cttest@localhost ~]$ bash kcbench -s 5.3 -n 1
Processor: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz [12 CPUs]
Cpufreq; Memory: Unknown; 15934 MByte RAM
Linux running: 5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
Compiler used: gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
Linux compiled: 5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command: make vmlinux
Run 1 (-j 12): 92.55 seconds / 38.90 kernels/hour
Run 2 (-j 15): 91.91 seconds / 39.17 kernels/hour
Run 3 (-j 6): 113.66 seconds / 31.67 kernels/hour
Run 4 (-j 9): 101.32 seconds / 35.53 kernels/hour

You'll notice attempts that tried to utilize only the real cores
(-j 6) or oversubscribe them a little (-j 9), which looked like a
waste of time here. But on some machines with SMT capable processors
those will deliver the best results, like on this AMD Threadripper
processor with 64 cores/128 threads:

[cttest@localhost ~]$ bash kcbench -s 5.3 -n 1
Processor: AMD Ryzen Threadripper 3990X 64-Core Processor [128 CPUs]
Cpufreq; Memory: Unknown; 15934 MByte RAM
Linux running: 5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
Compiler used: gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
Linux compiled: 5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command: make vmlinux
Run 1 (-j 128): 26.16 seconds / 137.61 kernels/hour
Run 2 (-j 136): 26.19 seconds / 137.46 kernels/hour
Run 3 (-j 64): 21.45 seconds / 167.83 kernels/hour
Run 4 (-j 72): 22.68 seconds / 158.73 kernels/hour

This is even more visible when compiling an allmodconfig configuration:

[cttest@localhost ~]$ bash kcbench -s 5.3 -n 1 -m
Processor: AMD Ryzen Threadripper 3990X 64-Core Processor [128 CPUs]
Cpufreq; Memory: Unknown; 63736 MByte RAM
Linux running: 5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
Compiler used: gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
Linux compiled: 5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command: make vmlinux
Run 1 (-j 128): 260.43 seconds / 13.82 kernels/hour
Run 2 (-j 136): 262.67 seconds / 13.71 kernels/hour
Run 3 (-j 64): 215.54 seconds / 16.70 kernels/hour
Run 4 (-j 72): 215.97 seconds / 16.67 kernels/hour

This can happen if the SMT implementation is weak or something else
(memory, storage, ...) becomes a bottleneck. A few tests on the above
machine indicated the memory interface was the limiting factor. An
AMD Epyc from the same processor generation did not show this effect
and delivered its best results when the number of jobs matched the
number of CPUs:

[cttest@localhost ~]$ bash kcbench -s 5.3 -n 1 -m
Processor: AMD EPYC 7742 64-Core Processor [256 CPUs]
Cpufreq; Memory: Unknown; 63736 MByte RAM
Linux running: 5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
Compiler used: gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
Linux compiled: 5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command: make vmlinux
Run 1 (-j 256): 128.24 seconds / 28.07 kernels/hour
Run 2 (-j 268): 128.87 seconds / 27.94 kernels/hour
Run 3 (-j 128): 141.83 seconds / 25.38 kernels/hour
Run 4 (-j 140): 137.46 seconds / 26.19 kernels/hour

This table will tell you how many jobs kcbench will use by default:

# Cores: Default # of jobs
# 1 CPU: 1 2
# 2 CPUs ( no SMT ): 2 3
# 2 CPUs (2 threads/core): 2 3 1
# 4 CPUs ( no SMT ): 4 6
# 4 CPUs (2 threads/core): 4 6 2
# 6 CPUs ( no SMT ): 6 9
# 6 CPUs (2 threads/core): 6 9 3
# 8 CPUs ( no SMT ): 8 11
# 8 CPUs (2 threads/core): 8 11 4 6
# 12 CPUs ( no SMT ): 12 16
# 12 CPUs (2 threads/core): 12 16 6 9
# 16 CPUs ( no SMT ): 16 20
# 16 CPUs (2 threads/core): 16 20 8 11
# 20 CPUs ( no SMT ): 20 25
# 20 CPUs (2 threads/core): 20 25 10 14
# 24 CPUs ( no SMT ): 24 29
# 24 CPUs (2 threads/core): 24 29 12 16
# 28 CPUs ( no SMT ): 28 34
# 28 CPUs (2 threads/core): 28 34 14 18
# 32 CPUs ( no SMT ): 32 38
# 32 CPUs (2 threads/core): 32 38 16 20
# 32 CPUs (4 threads/core): 32 38 8 11
# 48 CPUs ( no SMT ): 48 55
# 48 CPUs (2 threads/core): 48 55 24 29
# 48 CPUs (4 threads/core): 48 55 12 16
# 64 CPUs ( no SMT ): 64 72
# 64 CPUs (2 threads/core): 64 72 32 38
# 64 CPUs (4 threads/core): 64 72 16 20
# 128 CPUs ( no SMT ): 128 140
# 128 CPUs (2 threads/core): 128 140 64 72
# 128 CPUs (4 threads/core): 128 140 32 38
# 256 CPUs ( no SMT ): 256 272
# 256 CPUs (2 threads/core): 256 272 128 140
# 256 CPUs (4 threads/core): 256 272 64 72
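
To see which table row applies to a given machine, the CPU count and
the threads per core can be queried like this; a sketch using standard
Linux tools, where 'lscpu' comes from util-linux and may be absent on
minimal systems:

```shell
# logical CPUs -- the "# CPUs" column above
cpus=$(getconf _NPROCESSORS_ONLN)

# threads per core -- distinguishes the "no SMT" rows from the SMT rows
threads=$(LC_ALL=C lscpu 2>/dev/null \
  | awk -F: '/^Thread\(s\) per core/ { gsub(/ /, "", $2); print $2 }')

echo "CPUs: $cpus, threads/core: ${threads:-unknown}"
```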

COMPILE ERRORS
The compilation is unlikely to fail as long as you are using a mature
GCC version to natively compile the source of a current Linux kernel
for popular architectures like ARM, ARM64/Aarch64, or x86_64. In
other cases there is a bigger risk that compilation will fail due to
factors outside of kcbench/kcbenchrate's control. They nevertheless
try to catch a few common problems and warn about them, but they
cannot catch them all, as there are too many factors involved:

• Brand new compiler generations are sometimes stricter than their
predecessors and thus might fail to compile even the latest Linux
kernel version. You might need to use a pre-release version of the
next Linux kernel release to make it work, or simply need to wait
until the compiler or kernel developers solve the problem.

• Distributions enable different compiler features that might have an
impact on the kernel compilation. For example, gcc9 was capable of
compiling Linux 4.19 on many distributions, but started to fail on
Ubuntu 19.10 due to a feature that got enabled in its GCC. Try
compiling a newer Linux kernel version in this case.

• Cross compilation increases the risk of running into compile
problems in general, as there are many compilers and architectures
out there. That for example is why compiling the Linux kernel for an
unpopular architecture is more likely to fail due to bugs in the
compiler or the Linux kernel sources that nobody had noticed before
the compiler or kernel was released. This is even more likely to
happen if you start kcbench/kcbenchrate with '-m/--modconfig' to
build a more complex kernel.

GOOD TO KNOW
Running benchmarks is very tricky. Here are a few of the aspects you
should keep in mind when doing so:

• Do not compare results from two different archs (like ARM64 and
x86_64); kcbench/kcbenchrate compile different code in that case, as
they will compile a native kernel on each of those archs. This can
be avoided by cross compiling for a third arch that is not related
to any of the archs compared (say RISC-V when comparing ARM64 and
x86_64).

• Unless you want to bench compilers, do not compare results from
different compiler generations, as they will apply different
optimization techniques. For example, do not compare results from
GCC7 and GCC9, as the latter optimizes harder and thus will take
more time generating the code. That's also why the Linux version
compiled by default depends on the machine's compiler: you sometimes
can't compile older kernels with the latest compilers anyway, as new
compiler generations often uncover bugs in the Linux kernel source
that need to get fixed for compiling to succeed. For example, when
GCC10 was close to release it was incapable of compiling the then
latest Linux version 5.5 in an allmodconfig configuration due to a
bug in the Linux kernel sources.

• Compiling a Linux kernel scales very well and thus can utilize
processors quite well. But be aware that some parts of the Linux
compile process will only use one thread (and thus one CPU core),
for example when vmlinux is linked; the other cores will idle
meanwhile. The effect on the result will grow with the number of
CPU cores.

If you want to work against that, consider using '-m' to build an
allmodconfig configuration with modules; compiling a newer, more
complex Linux kernel version can also help. But the best way to
avoid this effect is running kcbenchrate.

• kcbench/kcbenchrate by default set CCACHE_DISABLE=1 when calling
'make' to avoid interference from ccache.

EXAMPLES
To let kcbench decide everything automatically, simply run:

$ kcbench

On a four core processor without SMT, kcbench by default will compile
2 kernels with 4 jobs and 2 with 6 jobs. You can specify a setting
like this manually:

$ kcbench -s 5.4 --iterations 3 --jobs 2 --jobs 4

This will compile Linux 5.4 first 3 times with 2 jobs and then 3 times
with 4 jobs.

UNDERSTANDING THE RESULTS
By default, the lines you are looking for look like this:

Run 1 (-j 4): 230.30 sec / 15.63 kernels/hour [P:389%, 24 maj. pagefaults]

Here it took 230.30 seconds to compile the Linux kernel image. At a
speed like this the machine can compile 15.63 kernels per hour
(60*60/230.30). The results from this 4 core machine also show that
the CPU usage (P) was 389 percent and that 24 major page faults
occurred during this run; the latter number should be small, as
processing them takes some time and thus slows down the build. This
information is omitted if fewer than 20 major page faults happen. For
details on how the CPU usage is calculated and major page faults are
detected, see the man page for GNU 'time', which kcbench/kcbenchrate
rely on for their measurements.
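
The kernels/hour figure is simply the number of seconds per hour
divided by the measured build time, which can be reproduced with awk:

```shell
# 3600 seconds per hour divided by the time one build took
awk 'BEGIN { printf "%.2f kernels/hour\n", 3600 / 230.30 }'
# prints: 15.63 kernels/hour
```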

When running with "-d|--detailedresults" you'll get more detailed
results:

Run 1 (-j 4): 230.30 sec / 15.63 kernels/hour [P:389%]
Elapsed Time(E): 2:30.10 (150.10 seconds)
Kernel time (S): 36.38 seconds
User time (U): 259.51 seconds
CPU usage (P): 197%
Major page faults (F): 0
Minor page faults (R): 9441809
Context switches involuntarily (c): 69031
Context switches voluntarily (w): 46955
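
The CPU usage figure (P) follows from the other detailed values: it is
kernel plus user time divided by elapsed time. With the detailed
numbers above:

```shell
# P = (S + U) / E: 36.38 + 259.51 CPU-seconds spent within 150.10
# wall-clock seconds means roughly two CPUs were busy on average
awk 'BEGIN { printf "%.0f%%\n", (36.38 + 259.51) / 150.10 * 100 }'
# prints: 197%
```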

FUTURE
• some math to detect the fastest setting and do one more run with it
before sanity checking the result and printing the best one,
including standard deviation.

SEE ALSO
kcbenchrate(1), time(1)

AUTHOR
Thorsten Leemhuis <linux [AT] leemhuis [DOT] info>


Version 0.9                                                      kcbench(1)