kcbench(1)                       User Commands                      kcbench(1)

NAME

       kcbench - Linux kernel compile benchmark, speed edition

SYNOPSIS

       kcbench [options]

DESCRIPTION

       Kcbench tries to compile a Linux kernel really quickly, which can be
       used to test a system's performance or stability.

       Note: The number of compile jobs ('-j') that delivers the best result
       depends on the machine being benched. See the section "ON THE DEFAULT
       NUMBER OF JOBS" below for details.

       To get comparable results from different machines you need to use the
       exact same operating system on all of them. There are multiple reasons
       for this recommendation, but one of the main ones is that the Linux
       version this benchmark downloads and compiles depends on the operating
       system's default compiler.

       If you choose to ignore this recommendation, at least make sure to hard
       code the Linux version to compile ('-s 5.4'), as for example compiling
       5.7 will take longer than 5.4 or 4.19 and thus lead to results that
       cannot be compared. Also make sure the compilers used on the systems
       you want to compare are from a similar generation, as for example gcc10
       will try harder to optimize the code than gcc8 or gcc9 and thus take
       more time for its work.

       Kcbench is accompanied by kcbenchrate. Both are quite similar, but
       work slightly differently:

       • kcbench tries to build one kernel as fast as possible. This approach
         is called 'speed run' and lets make start multiple compiler jobs in
         parallel by using 'make -j #'. That way kcbench will use a lot of
         CPU cores most of the time, except during those few phases where the
         Linux kernel build process is single-threaded and thus utilizes just
         one CPU core. That is for example the case when vmlinux is linked.

       • kcbenchrate tries to keep all CPU cores busy constantly by starting
         workers on all of them, each of which builds one kernel with just one
         job ('make -j 1'). This approach is called 'rate run'. It takes a
         lot longer to generate a result than kcbench and needs a lot more
         storage space, but will utilize the machine and its processors
         better.

   Options
       -b, --bypass
              Omit the initial kernel compile to fill caches; saves time, but
              the first result might be slightly lower than the following
              ones.

       -d, --detailedresults
              Print more detailed results.

       -h, --help
              Show usage.

       -i, --iterations int
              Determines the number of kernels that kcbench will compile
              sequentially for each value of jobs ('-j'). Default: 2

       -j, --jobs int(,int, int, ...)
              Number of jobs to use when compiling a kernel ('make -j #').

              This option can be given multiple times (-j 2 -j 4 -j 8) or
              'int' can be a list (-j "2 4 8"). The default depends on the
              number of cores in the system and whether its processor uses
              SMT. Run '--help' to query the default on the particular
              machine.

              Important note: kcbench on machines with SMT will do runs which
              do not utilize all available CPU cores; this might look odd, but
              there are reasons for this behaviour. See "ON THE DEFAULT
              NUMBER OF JOBS" below for details.

       -m, --modconfig
              Instead of using a config generated with 'defconfig' use one
              built by 'allmodconfig' and compile modules as well. Takes a
              lot longer to compile, which makes it more suitable for machines
              with a lot of fast CPU cores.

       -o, --outputdir dir
              Use dir to compile Linux. Passes 'O=dir/kcbench-worker/' to
              make when calling it to compile a kernel; uses a temporary
              directory if not given.

       -s, --src path|version
              Look for sources in path, ~/.cache/kcbench/linux-version or
              /usr/share/kcbench/linux-version. If not found, try to download
              version automatically unless '--no-download' was specified.

       -v, --verbose
              Increase verbosity level; option can be given multiple times.

       -V, --version
              Output program version.

       --cc exec
              Use exec as target compiler.

       --cross-compile arch
              EXPERIMENTAL: Cross compile the Linux kernel. Cross compilers
              for this task are packaged in some Linux distributions. There
              are also pre-compiled compilers available on the internet, for
              example here:
              https://mirrors.edge.kernel.org/pub/tools/crosstool/

              Values of arch that kcbench/kcbenchrate understand: arm arm64
              aarch64 riscv riscv64 powerpc powerpc64 x86_64

              Building for archs not directly supported by kcbench/kcbenchrate
              should work, too: just export ARCH= and CROSS_COMPILE= like you
              would when normally cross compiling a Linux kernel. Do not use
              '--cross-compile' in that case and keep in mind that
              kcbench/kcbenchrate configure the compiled Linux kernel with the
              make target 'defconfig' (or 'allmodconfig', if you specify
              '-m'), which might be unusual for the arch in question, but
              might be good enough for benchmarking purposes.

              Be aware there is a bigger risk of running into compile errors
              (see "ON FAILED RUNS DUE TO COMPILATION ERRORS" below) when
              cross compiling.

       --crosscomp-scheme scheme
              On Linux distributions that are known to ship cross compilers
              kcbench/kcbenchrate will assume you want to use those. This
              parameter allows you to specify one of the various naming
              schemes in case this automatic detection fails or you want
              kcbench/kcbenchrate to find them using a 'generic' scheme that
              should work with compilers from various sources, which is the
              default on unknown distributions.

              Valid values of scheme: debian fedora generic redhat ubuntu

       --hostcc exec
              Use exec as host compiler.

       --infinite
              Run endlessly to create system load.

       --llvm Set LLVM=1 to use clang as compiler and LLVM utilities as GNU
              binutils substitute.

       --add-make-args string
              Pass additional flags found in string to make when creating the
              config or building the kernel. This option is meant for experts
              that want to try unusual things, like specifying a special
              linker (--add-make-args 'LD=ld.lld').

              Use with caution!

       --no-download
              Never download Linux kernel sources from the web automatically.

       --savefailedlogs path
              Save log of failed compile runs to path.

ON THE DEFAULT NUMBER OF JOBS

       The optimal number of compile jobs (-j) to get the best result depends
       on the machine being benched. On most systems you will achieve the
       best result if the number of jobs matches the number of CPU cores.
       That is for example the case on this 4 core Intel processor without
       SMT:

              [cttest@localhost ~]$ bash kcbench -s 5.3 -n 1
              Processor:            Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz [4 CPUs]
              Cpufreq; Memory:      Unknown; 15934 MByte RAM
              Compiler used:        gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
              Linux compiled:       5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
              Config; Environment:  defconfig; CCACHE_DISABLE="1"
              Build command:        make vmlinux
              Run 1 (-j 4):         250.03 seconds / 14.40 kernels/hour
              Run 2 (-j 6):         255.88 seconds / 14.07 kernels/hour

       The run with 6 jobs was slower here. Trying a setting like that by
       default looks like a waste of time on this machine, but other machines
       deliver the best result when they are oversubscribed a little. That's
       for example the case on this 6 core/12 threads processor, which
       achieved its best result with 15 jobs:

              [cttest@localhost ~]$ bash kcbench -s 5.3 -n 1
              Processor:            Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz [12 CPUs]
              Cpufreq; Memory:      Unknown; 15934 MByte RAM
              Linux running:        5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
              Compiler used:        gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
              Linux compiled:       5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
              Config; Environment:  defconfig; CCACHE_DISABLE="1"
              Build command:        make vmlinux
              Run 1 (-j 12):        92.55 seconds / 38.90 kernels/hour
              Run 2 (-j 15):        91.91 seconds / 39.17 kernels/hour
              Run 3 (-j 6):         113.66 seconds / 31.67 kernels/hour
              Run 4 (-j 9):         101.32 seconds / 35.53 kernels/hour
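
       When many such runs need to be compared, the 'Run N (-j X)' lines can
       be picked out of the output mechanically. The following is only a
       sketch in Python, not part of kcbench: the names RUN_LINE, parse_runs
       and fastest_run are made up for this illustration, and the pattern
       merely assumes the two line formats shown in this man page.

```python
import re

# Matches result lines such as
#   "Run 2 (-j 15):        91.91 seconds / 39.17 kernels/hour"
# and the shorter "... 230.30 sec / 15.63 kernels/hour [...]" variant.
# The pattern is an assumption derived from the sample output in this page.
RUN_LINE = re.compile(
    r"Run\s+(\d+)\s+\(-j\s+(\d+)\):\s+([\d.]+)\s+sec(?:onds)?"
    r"\s+/\s+([\d.]+)\s+kernels/hour"
)


def parse_runs(output):
    """Return a list of (jobs, seconds) tuples found in kcbench output."""
    return [(int(m.group(2)), float(m.group(3)))
            for m in RUN_LINE.finditer(output)]


def fastest_run(output):
    """Return the (jobs, seconds) tuple with the lowest build time."""
    return min(parse_runs(output), key=lambda run: run[1])


sample = """
Run 1 (-j 12):        92.55 seconds / 38.90 kernels/hour
Run 2 (-j 15):        91.91 seconds / 39.17 kernels/hour
Run 3 (-j 6):         113.66 seconds / 31.67 kernels/hour
Run 4 (-j 9):         101.32 seconds / 35.53 kernels/hour
"""

jobs, seconds = fastest_run(sample)
print(f"fastest: -j {jobs} ({seconds} seconds)")
```

       For the 6 core/12 threads sample this selects the run with 15 jobs.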

       You'll notice attempts that tried to utilize only the real cores (-j 6)
       and oversubscribe them a little (-j 9), which looked like a waste of
       time. But on some machines with SMT capable processors those will
       deliver the best results, like on this AMD Threadripper processor with
       64 cores/128 threads:

              $ kcbench
              [cttest@localhost ~]$ bash kcbench -s 5.3 -n 1
              Processor:            AMD Ryzen Threadripper 3990X 64-Core Processor [128 CPUs]
              Cpufreq; Memory:      Unknown; 15934 MByte RAM
              Linux running:        5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
              Compiler used:        gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
              Linux compiled:       5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
              Config; Environment:  defconfig; CCACHE_DISABLE="1"
              Build command:        make vmlinux
              Run 1 (-j 128):       26.16 seconds / 137.61 kernels/hour
              Run 2 (-j 136):       26.19 seconds / 137.46 kernels/hour
              Run 3 (-j 64):        21.45 seconds / 167.83 kernels/hour
              Run 4 (-j 72):        22.68 seconds / 158.73 kernels/hour

       This is even more visible when compiling an allmodconfig configuration:

              [cttest@localhost ~]$ bash kcbench -s 5.3 -n 1 -m
              Processor:            AMD Ryzen Threadripper 3990X 64-Core Processor [128 CPUs]
              Cpufreq; Memory:      Unknown; 63736 MByte RAM
              Linux running:        5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
              Compiler used:        gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
              Linux compiled:       5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
              Config; Environment:  defconfig; CCACHE_DISABLE="1"
              Build command:        make vmlinux
              Run 1 (-j 128):       260.43 seconds / 13.82 kernels/hour
              Run 2 (-j 136):       262.67 seconds / 13.71 kernels/hour
              Run 3 (-j 64):        215.54 seconds / 16.70 kernels/hour
              Run 4 (-j 72):        215.97 seconds / 16.67 kernels/hour

       This can happen if the SMT implementation is bad or something else
       (memory, storage, ...) becomes a bottleneck. A few tests on the above
       machine indicated the memory interface was the limiting factor. An AMD
       Epyc from the same processor generation did not show this effect and
       delivered its best results when the number of jobs matched the number
       of CPUs:

              [cttest@localhost ~]$ bash kcbench -s 5.3 -n 1 -m
              Processor:            AMD EPYC 7742 64-Core Processor [256 CPUs]
              Cpufreq; Memory:      Unknown; 63736 MByte RAM
              Linux running:        5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
              Compiler used:        gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
              Linux compiled:       5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
              Config; Environment:  defconfig; CCACHE_DISABLE="1"
              Build command:        make vmlinux
              Run 1 (-j 256):       128.24 seconds / 28.07 kernels/hour
              Run 2 (-j 268):       128.87 seconds / 27.94 kernels/hour
              Run 3 (-j 128):       141.83 seconds / 25.38 kernels/hour
              Run 4 (-j 140):       137.46 seconds / 26.19 kernels/hour

       This table will tell you how many jobs kcbench will use by default:

              #                             Cores: Default # of jobs
              #                             1 CPU: 1 2
              #           2 CPUs (    no SMT    ): 2 3
              #           2 CPUs (2 threads/core): 2 3 1
              #           4 CPUs (    no SMT    ): 4 6
              #           4 CPUs (2 threads/core): 4 6 2
              #           6 CPUs (    no SMT    ): 6 9
              #           6 CPUs (2 threads/core): 6 9 3
              #           8 CPUs (    no SMT    ): 8 11
              #           8 CPUs (2 threads/core): 8 11 4 6
              #          12 CPUs (    no SMT    ): 12 16
              #          12 CPUs (2 threads/core): 12 16 6 9
              #          16 CPUs (    no SMT    ): 16 20
              #          16 CPUs (2 threads/core): 16 20 8 11
              #          20 CPUs (    no SMT    ): 20 25
              #          20 CPUs (2 threads/core): 20 25 10 14
              #          24 CPUs (    no SMT    ): 24 29
              #          24 CPUs (2 threads/core): 24 29 12 16
              #          28 CPUs (    no SMT    ): 28 34
              #          28 CPUs (2 threads/core): 28 34 14 18
              #          32 CPUs (    no SMT    ): 32 38
              #          32 CPUs (2 threads/core): 32 38 16 20
              #          32 CPUs (4 threads/core): 32 38 8 11
              #          48 CPUs (    no SMT    ): 48 55
              #          48 CPUs (2 threads/core): 48 55 24 29
              #          48 CPUs (4 threads/core): 48 55 12 16
              #          64 CPUs (    no SMT    ): 64 72
              #          64 CPUs (2 threads/core): 64 72 32 38
              #          64 CPUs (4 threads/core): 64 72 16 20
              #         128 CPUs (    no SMT    ): 128 140
              #         128 CPUs (2 threads/core): 128 140 64 72
              #         128 CPUs (4 threads/core): 128 140 32 38
              #         256 CPUs (    no SMT    ): 256 272
              #         256 CPUs (2 threads/core): 256 272 128 140
              #         256 CPUs (4 threads/core): 256 272 64 72

ON FAILED RUNS DUE TO COMPILATION ERRORS

       The compilation is unlikely to fail as long as you are using a settled
       GCC version to natively compile the source of a current Linux kernel
       for popular architectures like ARM, ARM64/Aarch64, or x86_64. In other
       cases there is a bigger risk that compilation will fail due to factors
       outside of what kcbench/kcbenchrate control. They nevertheless try to
       catch a few common problems and warn, but they cannot catch them all,
       as there are too many factors involved:

       • Brand new compiler generations are sometimes stricter than their
         predecessors and thus might fail to compile even the latest Linux
         kernel version. You might need to use a pre-release version of the
         next Linux kernel release to make it work or simply need to wait
         until the compiler or kernel developers solve the problem.

       • Distributions enable different compiler features that might have an
         impact on the kernel compilation. For example, gcc9 was capable of
         compiling Linux 4.19 on many distributions, but started to fail on
         Ubuntu 19.10 due to a feature that got enabled in its GCC. Try
         compiling a newer Linux kernel version in this case.

       • Cross compilation increases the risk of running into compile problems
         in general, as there are many compilers and architectures out there.
         That is for example why compiling the Linux kernel for an unpopular
         architecture is more likely to fail due to bugs in the compiler or
         the Linux kernel sources that nobody had noticed when the compiler or
         kernel was released. This is even more likely to happen if you start
         kcbench/kcbenchrate with '-m/--modconfig' to build a more complex
         kernel.

HINTS

       Running benchmarks is very tricky. Here are a few of the aspects you
       should keep in mind when doing so:

       • Do not compare results from two different archs (like ARM64 and
         x86_64); kcbench/kcbenchrate compile different code in that case, as
         they will compile a native kernel on each of those archs. This can
         be avoided by cross compiling for a third arch that is not related to
         any of the archs compared (say RISC-V when comparing ARM64 and
         x86_64).

       • Unless you want to bench compilers, do not compare results from
         different compiler generations, as they will apply different
         optimization techniques. For example, do not compare results from
         GCC7 and GCC9, as the latter optimizes harder and thus will take more
         time generating the code. That's also why the Linux version compiled
         by default depends on the machine's compiler: you sometimes can't
         compile older kernels with the latest compilers anyway, as new
         compiler generations often uncover bugs in the Linux kernel source
         that need to get fixed for compiling to succeed. For example, when
         GCC10 was close to release it was incapable of compiling the then
         latest Linux version 5.5 in an allmodconfig configuration due to a
         bug in the Linux kernel sources.

       • Compiling a Linux kernel scales very well and thus can utilize
         processors quite well. But be aware that some parts of the Linux
         compile process will only use one thread (and thus one CPU core), for
         example when linking vmlinux; the other cores will idle meanwhile.
         The effect on the result will grow with the number of CPU cores.

         If you want to work against that, consider using '-m' to build an
         allmodconfig configuration with modules; compiling a newer, more
         complex Linux kernel version can also help. But the best way to
         avoid this effect is by running kcbenchrate.

       • kcbench/kcbenchrate by default set CCACHE_DISABLE=1 when calling
         'make' to avoid interference from ccache.
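
       The hint about single-threaded build phases can be illustrated with
       Amdahl's law, which bounds the speedup of a job that is only partly
       parallel. This is a rough sketch in Python, not a model of the actual
       kernel build: the serial share of 2% is a purely hypothetical value
       chosen for illustration, and amdahl_speedup is a name made up here.

```python
def amdahl_speedup(serial_fraction, cores):
    """Upper bound on speedup when serial_fraction of the work is serial."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)


# Hypothetical value: assume 2% of the build (for example linking) can
# only use a single thread. The real share depends on kernel and config.
SERIAL_FRACTION = 0.02

for cores in (4, 16, 64):
    speedup = amdahl_speedup(SERIAL_FRACTION, cores)
    efficiency = speedup / cores
    print(f"{cores:3d} cores: speedup {speedup:5.2f}, "
          f"efficiency {efficiency:.0%}")
```

       Under this assumption the efficiency drops from roughly 94% with 4
       cores to roughly 44% with 64 cores, which is the growing effect the
       hint describes.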

EXAMPLES

       To let kcbench decide everything automatically simply run:
              $ kcbench

       On a four core processor without SMT kcbench by default will compile 2
       kernels with 4 jobs and 2 with 6 jobs. You can also specify settings
       like these manually:

              $ kcbench -s 5.4 --iterations 3 --jobs 2 --jobs 4

       This will compile Linux 5.4 first 3 times with 2 jobs and then as often
       with 4 jobs.
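
       When kcbench is started from a script, the same command line can be
       assembled programmatically. The following Python sketch only uses
       options documented above; build_kcbench_argv is a hypothetical helper
       name, and actually running the result assumes kcbench is installed and
       found in PATH.

```python
import shlex


def build_kcbench_argv(version, iterations, jobs):
    """Build an argument list for kcbench from documented options only."""
    argv = ["kcbench", "-s", version, "--iterations", str(iterations)]
    for count in jobs:
        # '--jobs' may be given multiple times, once per job count.
        argv += ["--jobs", str(count)]
    return argv


argv = build_kcbench_argv("5.4", 3, [2, 4])
print(shlex.join(argv))

# To actually start the benchmark (assumes kcbench is installed):
#   import subprocess
#   subprocess.run(argv, check=True)
```

       Passing the arguments as a list avoids shell quoting problems;
       shlex.join is only used here to show the equivalent shell command.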

RESULTS

       By default, the lines you are looking for look like this:

              Run 1 (-j 4): 230.30 sec / 15.63 kernels/hour [P:389%, 24 maj. pagefaults]

       Here it took 230.30 seconds to compile the Linux kernel image. With a
       speed like this the machine can compile 15.63 kernels per hour
       (60*60/230.30). The results from this 4 core machine also show the CPU
       usage (P) was 389 percent; 24 major page faults occurred during this
       run. This number should be small, as processing them takes some time
       and thus slows down the build. This information is omitted if fewer
       than 20 major page faults happen. For details on how the CPU usage is
       calculated and major page faults are detected see the man page for GNU
       'time', which kcbench/kcbenchrate rely on for their measurements.
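
       The conversion from build time to kernels per hour is simple enough
       to reproduce when post-processing results. A minimal Python sketch of
       the formula given above (the function name kernels_per_hour is made up
       for this illustration):

```python
def kernels_per_hour(seconds):
    """Number of kernels that could be built in one hour at this speed."""
    return 60 * 60 / seconds


# The build time from the sample result line above.
print(f"{kernels_per_hour(230.30):.2f} kernels/hour")
```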

       When running with "-d|--detailedresults" you'll get more detailed
       results:

              Run 1 (-j 4): 230.30 sec / 15.63 kernels/hour [P:389%]
              Elapsed Time(E): 2:30.10 (150.10 seconds)
              Kernel time (S): 36.38 seconds
              User time (U): 259.51 seconds
              CPU usage (P): 197%
              Major page faults (F): 0
              Minor page faults (R): 9441809
              Context switches involuntarily (c): 69031
              Context switches voluntarily (w): 46955
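
       GNU time documents its CPU usage value as user time plus kernel time,
       divided by elapsed time. The following Python sketch applies that
       relation to the detailed values listed above; cpu_usage_percent is a
       hypothetical name, and truncating to a whole percent is an assumption
       about the rounding.

```python
def cpu_usage_percent(user_seconds, kernel_seconds, elapsed_seconds):
    """CPU usage as (user + kernel time) / elapsed time, in whole percent."""
    # Truncating to an integer is an assumption about how the value is
    # rounded; the relation itself follows the GNU time documentation.
    return int(100 * (user_seconds + kernel_seconds) / elapsed_seconds)


# User time (U), kernel time (S) and elapsed time (E) from the listing.
print(cpu_usage_percent(259.51, 36.38, 150.10))
```

       With the U, S and E values from the listing this yields 197, matching
       the 'CPU usage (P)' line of the detailed output.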

MISSING FEATURES

       • Some math to detect the fastest setting and do one more run with it
         before sanity checking the result and printing the best one,
         including standard deviation.

SEE ALSO

       kcbenchrate(1), time(1)

AUTHOR

       Thorsten Leemhuis <linux [AT] leemhuis [DOT] info>

Version 0.9                                                         kcbench(1)