kcbench(1) User Commands kcbench(1)


NAME
kcbench - Linux kernel compile benchmark, speed edition

SYNOPSIS
kcbench [options]

DESCRIPTION
Kcbench tries to compile a Linux kernel really quickly to benchmark a
system or test its stability. It will measure the time it takes to
compile a kernel and print the result after each attempt.

Kcbench is accompanied by kcbenchrate. Both are quite similar, but
work slightly differently:

· kcbench tries to build one kernel really quickly. This approach is
also called a 'speed run' and lets make start multiple compiler jobs
in parallel by using 'make -j #'. That way kcbench most of the time
will use all the CPU cores, except during those few phases where the
Linux kernel build process is single-threaded and thus uses just one
CPU core. That for example is the case when vmlinux is linked.

· kcbenchrate launches a worker on every CPU that compiles one kernel
with just one job. This approach will utilize the machine and its
processors a little better than the speed approach used by kcbench,
as it should keep the CPU cores busy all the time, even during phases
where the Linux kernel build process uses just one CPU core. Note
that this approach takes a lot longer to generate results and needs
a lot more storage space.
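
A 'speed run' essentially boils down to timing one parallel build. The
following sketch only illustrates that idea; it is not kcbench's actual
code, and the build command is replaced by a placeholder:

```shell
# Rough sketch of a 'speed run': time one parallel build.  With real
# kernel sources you would run 'make -j "$jobs" vmlinux' where the
# placeholder 'true' stands.
jobs=$(nproc)       # kcbench derives its -j defaults from the CPU count
start=$(date +%s)
true                # placeholder for: make -j "$jobs" vmlinux
end=$(date +%s)
echo "build took $((end - start)) seconds with $jobs jobs"
```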

NOTE: The optimal number of compile jobs ('-j') that delivers the best
result depends on the machine being benched. See the section "ON THE
DEFAULT NUMBER OF JOBS" below for details.

IMPORTANT: To get comparable results from different machines you should
use the exact same operating system on all of them. There are multiple
reasons for this recommendation, but one of the main ones is: the
Linux source code automatically downloaded for compilation depends on
the compiler being used.

If you choose to ignore this recommendation, at least make sure to hard
code the Linux version to compile ('-s 5.4'), as for example compiling
5.7 will take longer than 5.4 or 4.19 and thus lead to results one
cannot compare. Also, make sure the compilers used on the systems you
want to compare are of a similar generation, as for example gcc10 will
try harder to optimize the code than gcc8 or gcc9 and thus take more
time.

Options
-b, --bypass
Omit the initial kernel compile that fills the caches; this saves
time, but the first result might be slightly lower than the
following ones.

-d, --detailedresults
Print more detailed results.

-h, --help
Show usage.

-i, --iterations int
Determines the number of kernels that kcbench will compile
sequentially with each of the given values of jobs ('-j'). Default: 2

-j, --jobs int(,int, int, ...)
Number of jobs to use when compiling a kernel ('make -j #').

This option can be given multiple times (-j 2 -j 4 -j 8) or
'int' can be a list (-j "2 4 8"). The default depends on the
number of cores in the system and whether its processor uses SMT.
Run '--help' to query the default on the particular machine.

Important note: on machines with SMT kcbench will do runs which
do not utilize all available CPU cores; this might look odd, but
there are reasons for it, explained in the section "ON THE
DEFAULT NUMBER OF JOBS" below.

-m, --modconfig
Instead of using a config generated with 'defconfig', use one
built by 'allmodconfig' and compile modules as well. This takes
a lot longer, which makes it more suitable for machines with
a lot of fast CPU cores.

-o, --outputdir dir
Compile Linux in the given path. Passes 'O=dir/kcbench-worker/' to
make when calling it to compile a kernel; a temporary directory is
used if this option is not given.

-s, --src path|version
Look for sources in path, ~/.cache/kcbench/linux-version or
/usr/share/kcbench/linux-version. If not found, try to download
version automatically unless '--no-download' was specified.

-v, --verbose
Increase verbosity level; this option can be given multiple times.

-V, --version
Output program version.

--cc exec
Use exec as target compiler.

--cross-compile arch
EXPERIMENTAL: Cross compile the Linux kernel. Cross compilers
for this task are packaged in some Linux distributions.
Pre-compiled compilers are also available on the internet,
for example here: https://mirrors.edge.kernel.org/pub/tools/crosstool/

Values of arch that kcbench/kcbenchrate understand: arm arm64
aarch64 riscv riscv64 powerpc powerpc64 x86_64

Building for archs not directly supported by kcbench/kcbenchrate
should work, too, if you export ARCH= and CROSS_COMPILE= just
like you would when cross compiling a Linux kernel manually. Do
not use '--cross-compile' in that case and keep in mind that
kcbench/kcbenchrate configure the compiled Linux kernel with the
make target 'defconfig' (or 'allmodconfig', if you specify
'-m'), which might be unusual for the arch in question, but
normally should be good enough for benchmarking.

Be aware that there is a bigger risk of running into compile errors
(see below) when cross compiling.
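
For an arch kcbench does not know directly, exporting the standard
kernel cross-build variables yourself might look like this; the target
arch and toolchain prefix below are illustrative assumptions, not
values kcbench prescribes:

```shell
# Illustrative only: pick the arch and toolchain prefix matching the
# cross compiler actually installed on your system.
export ARCH=mips
export CROSS_COMPILE=mips-linux-gnu-
echo "building for $ARCH with ${CROSS_COMPILE}gcc"
# then invoke kcbench WITHOUT '--cross-compile':
# kcbench -s 5.4
```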

--crosscomp-scheme scheme
On Linux distributions that are known to ship cross compilers,
kcbench/kcbenchrate will assume you want to use those. This
parameter allows you to select one of the various schemes in
case this automatic detection fails, or if you want
kcbench/kcbenchrate to find the compilers using the 'generic'
scheme, which should work with compilers from various sources
and is the default on unknown distributions.

Valid values of scheme: debian fedora generic redhat ubuntu

--hostcc exec
Use exec as host compiler.
--infinite
Run endlessly to create system load.

--llvm Set LLVM=1 to use clang as compiler and LLVM utilities as GNU
binutils substitutes.

--add-make-args string
Pass additional flags found in string to make when creating the
config or building the kernel. This option is meant for experts
who want to try unusual things, like specifying a special linker
(--add-make-args 'LD=ld.lld'). Use with caution!

--no-download
Never download Linux kernel sources from the web automatically.

--savefailedlogs path
Save logs of failed compile runs to path.

ON THE DEFAULT NUMBER OF JOBS
The optimal number of compile jobs (-j) to get the best result depends
on the machine being benched. On most systems you will achieve the
best result if the number of jobs matches the number of CPU cores.
That for example is the case on this 4 core Intel processor without
SMT:

[cttest@localhost ~]$ bash kcbench -s 5.3 -n 1
Processor: Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz [4 CPUs]
Cpufreq; Memory: Unknown; MByte RAM
Compiler used: gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
Linux compiled: 5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command: make vmlinux
Run 1 (-j 4): 250.03 seconds / 14.40 kernels/hour
Run 2 (-j 6): 255.88 seconds / 14.07 kernels/hour
The run with 6 jobs was slower here. Trying a setting like that by
default might look like a waste of time on this machine, but other
machines deliver their best result when they are oversubscribed a
little. That's for example the case with this 6 core/12 thread
processor, which achieved its best result with 15 jobs:

[cttest@localhost ~]$ bash kcbench -s 5.3 -n 1
Processor: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz [12 CPUs]
Cpufreq; Memory: Unknown; 15934 MByte RAM
Linux running: 5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
Compiler used: gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
Linux compiled: 5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command: make vmlinux
Run 1 (-j 12): 92.55 seconds / 38.90 kernels/hour
Run 2 (-j 15): 91.91 seconds / 39.17 kernels/hour
Run 3 (-j 6): 113.66 seconds / 31.67 kernels/hour
Run 4 (-j 9): 101.32 seconds / 35.53 kernels/hour
Here the attempts that tried to utilize only the real cores (-j 6) and
oversubscribe them a little (-j 9) looked like a waste of time. But on
some machines with SMT capable processors those will deliver the best
results, like on this AMD Threadripper processor with 64 cores/128
threads:

[cttest@localhost ~]$ bash kcbench -s 5.3 -n 1
Processor: AMD Ryzen Threadripper 3990X 64-Core Processor [128 CPUs]
Cpufreq; Memory: Unknown; 15934 MByte RAM
Linux running: 5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
Compiler used: gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
Linux compiled: 5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command: make vmlinux
Run 1 (-j 128): 26.16 seconds / 137.61 kernels/hour
Run 2 (-j 136): 26.19 seconds / 137.46 kernels/hour
Run 3 (-j 64): 21.45 seconds / 167.83 kernels/hour
Run 4 (-j 72): 22.68 seconds / 158.73 kernels/hour
This is even more visible when compiling an allmodconfig configuration:

[cttest@localhost ~]$ bash kcbench -s 5.3 -n 1 -m
Processor: AMD Ryzen Threadripper 3990X 64-Core Processor [128 CPUs]
Cpufreq; Memory: Unknown; 63736 MByte RAM
Linux running: 5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
Compiler used: gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
Linux compiled: 5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command: make vmlinux
Run 1 (-j 128): 260.43 seconds / 13.82 kernels/hour
Run 2 (-j 136): 262.67 seconds / 13.71 kernels/hour
Run 3 (-j 64): 215.54 seconds / 16.70 kernels/hour
Run 4 (-j 72): 215.97 seconds / 16.67 kernels/hour
This can happen if the SMT implementation is bad or something else is
the bottleneck. A few tests on the above machine indicated the memory
interface was the limiting factor. An AMD Epyc from the same processor
generation did not show this effect and delivered its best results when
the number of jobs matched the number of CPUs:

[cttest@localhost ~]$ bash kcbench -s 5.3 -n 1 -m
Processor: AMD EPYC 7742 64-Core Processor [256 CPUs]
Cpufreq; Memory: Unknown; 63736 MByte RAM
Linux running: 5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
Compiler used: gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
Linux compiled: 5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
Config; Environment: defconfig; CCACHE_DISABLE="1"
Build command: make vmlinux
Run 1 (-j 256): 128.24 seconds / 28.07 kernels/hour
Run 2 (-j 268): 128.87 seconds / 27.94 kernels/hour
Run 3 (-j 128): 141.83 seconds / 25.38 kernels/hour
Run 4 (-j 140): 137.46 seconds / 26.19 kernels/hour
This table will tell you how many jobs kcbench will use by default:

# Cores: Default # of jobs
# 1 CPU: 1 2
# 2 CPUs ( no SMT ): 2 3
# 2 CPUs (2 threads/core): 2 3 1
# 4 CPUs ( no SMT ): 4 6
# 4 CPUs (2 threads/core): 4 6 2
# 6 CPUs ( no SMT ): 6 9
# 6 CPUs (2 threads/core): 6 9 3
# 8 CPUs ( no SMT ): 8 11
# 8 CPUs (2 threads/core): 8 11 4 6
# 12 CPUs ( no SMT ): 12 16
# 12 CPUs (2 threads/core): 12 16 6 9
# 16 CPUs ( no SMT ): 16 20
# 16 CPUs (2 threads/core): 16 20 8 11
# 20 CPUs ( no SMT ): 20 25
# 20 CPUs (2 threads/core): 20 25 10 14
# 24 CPUs ( no SMT ): 24 29
# 24 CPUs (2 threads/core): 24 29 12 16
# 28 CPUs ( no SMT ): 28 34
# 28 CPUs (2 threads/core): 28 34 14 18
# 32 CPUs ( no SMT ): 32 38
# 32 CPUs (2 threads/core): 32 38 16 20
# 32 CPUs (4 threads/core): 32 38 8 11
# 48 CPUs ( no SMT ): 48 55
# 48 CPUs (2 threads/core): 48 55 24 29
# 48 CPUs (4 threads/core): 48 55 12 16
# 64 CPUs ( no SMT ): 64 72
# 64 CPUs (2 threads/core): 64 72 32 38
# 64 CPUs (4 threads/core): 64 72 16 20
# 128 CPUs ( no SMT ): 128 140
# 128 CPUs (2 threads/core): 128 140 64 72
# 128 CPUs (4 threads/core): 128 140 32 38
# 256 CPUs ( no SMT ): 256 272
# 256 CPUs (2 threads/core): 256 272 128 140
# 256 CPUs (4 threads/core): 256 272 64 72
COMPILE ERRORS
As long as you are using an established GCC version to natively compile
the source of a current Linux kernel for popular architectures like
ARM, ARM64/Aarch64, or x86_64, the compilation is unlikely to fail. In
other cases there is a bigger risk that compilation will fail due to
factors outside of kcbench/kcbenchrate's control. They nevertheless
try to catch a few common problems and warn, but they cannot catch
them all, as there are too many factors involved:

· Brand new compiler generations are sometimes stricter than their
predecessors and thus might fail to compile even the latest Linux
kernel version. You might need to use a pre-release version of the
next Linux kernel release to make it work, or simply need to wait
until the compiler or kernel developers solve the problem.

· Distributions enable different compiler features that might have an
impact on the kernel compilation. For example, gcc9 was capable of
compiling Linux 4.19 on many distributions, but started to fail on
Ubuntu 19.10 due to a feature that got enabled in its GCC. Compile a
newer Linux kernel version in this case.

· Cross compilation increases the risk of running into compile problems
in general, as there are many compilers and architectures out there.
That for example is why compiling the Linux kernel for an unpopular
architecture is more likely to fail due to bugs in the compiler or
the Linux kernel sources that nobody had noticed when the compiler
or kernel was released. This is even more likely to happen if
you start kcbench/kcbenchrate with '-m/--modconfig' to build a
more complex kernel.
THINGS TO KEEP IN MIND WHEN BENCHMARKING
Running benchmarks is very tricky. Here are a few of the aspects you
should keep in mind when doing so:

· kcbench/kcbenchrate by default set CCACHE_DISABLE=1 when calling
'make' to avoid interference from ccache.

· Do not compare results from two different archs (like ARM64 and
x86_64); kcbench/kcbenchrate compile different code in that case, as
they will compile a native kernel on each of those archs. This can
be avoided by cross compiling for a third arch that is not related to
any of the archs compared (say RISC-V when comparing ARM64 and
x86_64).

· Unless you want to bench compilers, do not compare results from
different compiler generations, as they will apply different
optimization techniques. For example, do not compare results from
GCC7 and GCC9, as the latter optimizes harder and thus will take more
time generating the code. That's also why the Linux version compiled
by default depends on the machine's compiler: you sometimes can't
compile older kernels with the latest compilers anyway, as new
compiler generations often uncover bugs in the Linux kernel source
that need to get fixed to make compilation succeed. For example,
when GCC10 was close to release it was incapable of compiling the
then latest Linux version 5.5 in an allmodconfig configuration due
to a bug in the Linux kernel sources.

· Compiling a Linux kernel scales very well and thus can utilize
processors quite well. But be aware that some parts of the Linux
compile process will only use one thread (and thus one CPU core), for
example when linking vmlinux; the other cores will idle meanwhile.
The effect on the result will grow with the number of CPU cores.

If you want to work against that, consider using '-m' to build an
allmodconfig configuration with modules; compiling a newer, more
complex Linux kernel version can also help. But the best way to avoid
this effect is by running kcbenchrate.
EXAMPLES
To let kcbench decide everything automatically, simply run:
$ kcbench

On a four core processor without SMT kcbench by default will compile 2
kernels with 4 jobs and 2 with 6 jobs.

$ kcbench -s 5.4 --iterations 3 --jobs 2 --jobs 4

This will compile Linux 5.4 first 3 times with 2 jobs and then as often
with 4 jobs.
INTERPRETING THE RESULTS
By default, the lines you are looking for look like this:

Run 1 (-j 4): 230.30 sec / 15.63 kernels/hour [P:389%, 24 maj. pagefaults]

Here it took 230.30 seconds to compile the Linux kernel image. With a
speed like this the machine can compile 15.63 kernels per hour
(3600/230.30). The results from this 4 core machine also show the CPU
usage (P) was 389 percent; 24 major page faults occurred during this
run – this number should be small, as processing them takes some time
and thus slows down the build. This information is omitted if fewer
than 20 major page faults happen. For details on how the CPU usage is
calculated and major page faults are detected, see the man page for GNU
'time', which kcbench/kcbenchrate rely on for their measurements.
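
The kernels/hour figure is plain arithmetic over the measured
wall-clock time, as this sketch shows:

```shell
# kernels/hour = seconds per hour / seconds per kernel build
awk 'BEGIN { printf "%.2f kernels/hour\n", 3600 / 230.30 }'
# prints: 15.63 kernels/hour
```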

When running with "-d|--detailedresults" you'll get more detailed
results:

Run 1 (-j 4): 230.30 sec / 15.63 kernels/hour [P:389%]
Elapsed Time(E): 2:30.10 (150.10 seconds)
Kernel time (S): 36.38 seconds
User time (U): 259.51 seconds
CPU usage (P): 197%
Major page faults (F): 0
Minor page faults (R): 9441809
Context switches involuntarily (c): 69031
Context switches voluntarily (w): 46955
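
The CPU usage line can be reproduced from the other values: GNU time in
essence computes P as (kernel time + user time) divided by elapsed
time. With the sample numbers above:

```shell
# P = (S + U) / E as a percentage; the fractional part is dropped,
# which for these numbers gives the 197% shown above.
awk 'BEGIN { printf "%d%%\n", (36.38 + 259.51) / 150.10 * 100 }'
# prints: 197%
```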

TODO
· Some math to detect the fastest setting and do one more run with it
before sanity checking the result and printing the best one.

SEE ALSO
kcbenchrate(1), time(1)

AUTHOR
Thorsten Leemhuis <linux [AT] leemhuis [DOT] info>



Version 0.9 kcbench(1)