kcbench(1)                       User Commands                      kcbench(1)

NAME

       kcbench - Linux kernel compile benchmark, speed edition

SYNOPSIS

       kcbench [options]

DESCRIPTION

       Kcbench tries to compile a Linux kernel really quickly to benchmark a
       system or test its stability.  It measures the time it takes to compile
       a kernel and prints the result after each attempt.

       Kcbench is accompanied by kcbenchrate.  Both are quite similar, but
       work slightly differently:

       · kcbench tries to build one kernel really quickly.  This approach is
         also called a 'speed run' and lets make start multiple compiler jobs
         in parallel by using 'make -j #'.  That way kcbench will use all the
         CPU cores most of the time, except during those few phases where the
         Linux kernel build process is single-threaded and thus uses just one
         CPU core.  That is the case, for example, when vmlinux is linked.

       · kcbenchrate launches a worker on every CPU that compiles one kernel
         with just one job.  This approach will utilize the machine and its
         processors a little better than the speed approach used by kcbench,
         as it should keep the CPU cores busy all the time, even during phases
         where the Linux kernel build process uses just one CPU core.  Note
         that this approach takes a lot longer to generate a result and needs
         a lot more storage space.

       NOTE: The optimal number of compile jobs ('-j') that delivers the best
       result depends on the machine being benched.  See the section "ON THE
       DEFAULT NUMBER OF JOBS" below for details.

       IMPORTANT: To get comparable results from different machines you should
       use the exact same operating system on all of them.  There are multiple
       reasons for this recommendation, but one of the main reasons is: the
       Linux source code automatically downloaded to compile depends on the
       compiler being used.

       If you choose to ignore this recommendation, at least make sure to hard
       code the Linux version to compile ('-s 5.4'), as for example compiling
       5.7 will take longer than 5.4 or 4.19 and thus lead to results one
       cannot compare.  Also, make sure the compilers used on the systems you
       want to compare are from a similar generation, as for example gcc10
       will try harder to optimize the code than gcc8 or gcc9 and thus take
       more time.

   Options
       -b, --bypass
              Omit the initial kernel compile to fill caches; saves time, but
              the first result might be slightly lower than the following
              ones.

       -d, --detailedresults
              Print more detailed results.

       -h, --help
              Show usage.

       -i, --iterations int
              Determines the number of kernels that kcbench will compile
              sequentially for each value of jobs ('-j').  Default: 2

       -j, --jobs int(,int, int, ...)
              Number of jobs to use when compiling a kernel ('make -j #').

              This option can be given multiple times (-j 2 -j 4 -j 8) or
              'int' can be a list (-j "2 4 8").  The default depends on the
              number of cores in the system and whether its processor uses
              SMT.  Run '--help' to query the default on the particular
              machine.

              Important note: kcbench on machines with SMT will do runs which
              do not utilize all available CPU cores; this might look odd, but
              there are reasons for it explained in the section "ON THE
              DEFAULT NUMBER OF JOBS" below.

       -m, --modconfig
              Instead of using a config generated with 'defconfig' use one
              built by 'allmodconfig' and compile modules as well.  Takes a
              lot longer to compile, which makes it more suitable for machines
              with a lot of fast CPU cores.

       -o, --outputdir dir
              Use dir to compile Linux.  Passes 'O=dir/kcbench-worker/' to
              make when calling it to compile a kernel; uses a temporary
              directory if not given.

       -s, --src path|version
              Look for sources in path, ~/.cache/kcbench/linux-version or
              /usr/share/kcbench/linux-version.  If not found, try to download
              version automatically unless '--no-download' was specified.

       -v, --verbose
              Increase verbosity level; option can be given multiple times.

       -V, --version
              Output program version.

       --cc exec
              Use exec as target compiler.

       --cross-compile arch
              EXPERIMENTAL: Cross compile the Linux kernel.  Cross compilers
              for this task are packaged in some Linux distributions.  There
              are also pre-compiled compilers available on the internet, for
              example here:
              https://mirrors.edge.kernel.org/pub/tools/crosstool/

              Values of arch that kcbench/kcbenchrate understand: arm arm64
              aarch64 riscv riscv64 powerpc powerpc64 x86_64

              Building for archs not directly supported by kcbench/kcbenchrate
              should work, too, if you export ARCH= and CROSS_COMPILE= just
              like you would when normally cross compiling a Linux kernel.  Do
              not use '--cross-compile' in that case and keep in mind that
              kcbench/kcbenchrate configure the compiled Linux kernel with the
              make target 'defconfig' (or 'allmodconfig', if you specify
              '-m'), which might be unusual for the arch in question, but
              normally should be good enough for benchmarking.

              Be aware there is a bigger risk of running into compile errors
              (see below) when cross compiling.

       --crosscomp-scheme scheme
              On Linux distributions that are known to ship cross compilers
              kcbench/kcbenchrate will assume you want to use those.  This
              parameter allows you to select one of the various schemes in
              case this automatic detection fails, or when you want
              kcbench/kcbenchrate to find them using a 'generic' scheme that
              should work with compilers from various sources, which is the
              default on unknown distributions.

              Valid values of scheme: debian fedora generic redhat ubuntu

       --hostcc exec
              Use exec as host compiler.

       --infinite
              Run endlessly to create system load.

       --llvm Set LLVM=1 to use clang as compiler and LLVM utilities as GNU
              binutils substitute.

       --add-make-args string
              Pass additional flags found in string to make when creating the
              config or building the kernel.  This option is meant for experts
              that want to try unusual things, like specifying a special
              linker (--add-make-args 'LD=ld.lld').  Use with caution!

       --no-download
              Never download Linux kernel sources from the web automatically.

       --savefailedlogs path
              Save log of failed compile runs to path.

ON THE DEFAULT NUMBER OF JOBS

       The optimal number of compile jobs (-j) to get the best result depends
       on the machine being benched.  On most systems you will achieve the
       best result if the number of jobs matches the number of CPU cores.
       That is the case, for example, on this 4 core Intel processor without
       SMT:

              [cttest@localhost ~]$ bash kcbench -s 5.3 -n 1
              Processor:            Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz [4 CPUs]
              Cpufreq; Memory:      Unknown; MByte RAM
              Compiler used:        gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
              Linux compiled:       5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
              Config; Environment:  defconfig; CCACHE_DISABLE="1"
              Build command:        make vmlinux
              Run 1 (-j 4):         250.03 seconds / 14.40 kernels/hour
              Run 2 (-j 6):         255.88 seconds / 14.07 kernels/hour

       The run with 6 jobs was slower here.  Trying a setting like that by
       default might look like a waste of time on this machine, but other
       machines deliver the best result when they are oversubscribed a little.
       That is the case, for example, on this 6 core/12 threads processor,
       which achieved its best result with 15 jobs:

              [cttest@localhost ~]$ bash kcbench -s 5.3 -n 1
              Processor:            Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz [12 CPUs]
              Cpufreq; Memory:      Unknown; 15934 MByte RAM
              Linux running:        5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
              Compiler used:        gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
              Linux compiled:       5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
              Config; Environment:  defconfig; CCACHE_DISABLE="1"
              Build command:        make vmlinux
              Run 1 (-j 12):        92.55 seconds / 38.90 kernels/hour
              Run 2 (-j 15):        91.91 seconds / 39.17 kernels/hour
              Run 3 (-j 6):         113.66 seconds / 31.67 kernels/hour
              Run 4 (-j 9):         101.32 seconds / 35.53 kernels/hour

       Here the attempts that tried to utilize only the real cores (-j 6) and
       oversubscribe them a little (-j 9) looked like a waste of time.  But on
       some machines with SMT capable processors those will deliver the best
       results, like on this AMD Threadripper processor with 64 cores/128
       threads:

              [cttest@localhost ~]$ bash kcbench -s 5.3 -n 1
              Processor:            AMD Ryzen Threadripper 3990X 64-Core Processor [128 CPUs]
              Cpufreq; Memory:      Unknown; 15934 MByte RAM
              Linux running:        5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
              Compiler used:        gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
              Linux compiled:       5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
              Config; Environment:  defconfig; CCACHE_DISABLE="1"
              Build command:        make vmlinux
              Run 1 (-j 128):       26.16 seconds / 137.61 kernels/hour
              Run 2 (-j 136):       26.19 seconds / 137.46 kernels/hour
              Run 3 (-j 64):        21.45 seconds / 167.83 kernels/hour
              Run 4 (-j 72):        22.68 seconds / 158.73 kernels/hour

       This is even more visible when compiling an allmodconfig configuration:

              [cttest@localhost ~]$ bash kcbench -s 5.3 -n 1 -m
              Processor:            AMD Ryzen Threadripper 3990X 64-Core Processor [128 CPUs]
              Cpufreq; Memory:      Unknown; 63736 MByte RAM
              Linux running:        5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
              Compiler used:        gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
              Linux compiled:       5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
              Config; Environment:  defconfig; CCACHE_DISABLE="1"
              Build command:        make vmlinux
              Run 1 (-j 128):       260.43 seconds / 13.82 kernels/hour
              Run 2 (-j 136):       262.67 seconds / 13.71 kernels/hour
              Run 3 (-j 64):        215.54 seconds / 16.70 kernels/hour
              Run 4 (-j 72):        215.97 seconds / 16.67 kernels/hour

       This can happen if the SMT implementation is bad or something else is
       the bottleneck.  A few tests on the above machine indicated the memory
       interface was the limiting factor.  An AMD Epyc from the same processor
       generation did not show this effect and delivered its best results when
       the number of jobs matched the number of CPUs:

              [cttest@localhost ~]$ bash kcbench -s 5.3 -n 1 -m
              Processor:            AMD EPYC 7742 64-Core Processor [256 CPUs]
              Cpufreq; Memory:      Unknown; 63736 MByte RAM
              Linux running:        5.6.0-0.rc2.git0.1.vanilla.knurd.2.fc31.x86_64
              Compiler used:        gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
              Linux compiled:       5.3.0 [/home/cttest/.cache/kcbench/linux-5.3/]
              Config; Environment:  defconfig; CCACHE_DISABLE="1"
              Build command:        make vmlinux
              Run 1 (-j 256):       128.24 seconds / 28.07 kernels/hour
              Run 2 (-j 268):       128.87 seconds / 27.94 kernels/hour
              Run 3 (-j 128):       141.83 seconds / 25.38 kernels/hour
              Run 4 (-j 140):       137.46 seconds / 26.19 kernels/hour

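       Comparing runs like the ones above by hand gets tedious.  The
       following is a minimal Python sketch, not part of kcbench, that picks
       the fastest setting from captured output.  The names RUN_LINE,
       parse_runs and fastest_jobs are hypothetical, and the regular
       expression assumes result lines shaped like the samples in this page.

```python
import re

# Matches result lines such as
#   "Run 1 (-j 256):       128.24 seconds / 28.07 kernels/hour"
# and the shorter "sec" spelling used in the RESULTS section.
# Assumption: the line layout is the one shown in this man page.
RUN_LINE = re.compile(
    r"Run\s+(\d+)\s+\(-j\s+(\d+)\):\s+([\d.]+)\s+sec(?:onds)?"
    r"\s+/\s+([\d.]+)\s+kernels/hour"
)


def parse_runs(output):
    """Return a list of (jobs, seconds) tuples, one per result line found."""
    runs = []
    for line in output.splitlines():
        match = RUN_LINE.search(line)
        if match:
            runs.append((int(match.group(2)), float(match.group(3))))
    return runs


def fastest_jobs(output):
    """Return the (jobs, seconds) pair with the shortest build time."""
    return min(parse_runs(output), key=lambda run: run[1])


sample = """\
Run 1 (-j 256):       128.24 seconds / 28.07 kernels/hour
Run 2 (-j 268):       128.87 seconds / 27.94 kernels/hour
Run 3 (-j 128):       141.83 seconds / 25.38 kernels/hour
Run 4 (-j 140):       137.46 seconds / 26.19 kernels/hour
"""
print(fastest_jobs(sample))  # (256, 128.24)
```

       For the EPYC output above this selects the run with 256 jobs.
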
       This table will tell you how many jobs kcbench will use by default:

               #                             Cores: Default # of jobs
               #                             1 CPU: 1 2
               #           2 CPUs (    no SMT    ): 2 3
               #           2 CPUs (2 threads/core): 2 3 1
               #           4 CPUs (    no SMT    ): 4 6
               #           4 CPUs (2 threads/core): 4 6 2
               #           6 CPUs (    no SMT    ): 6 9
               #           6 CPUs (2 threads/core): 6 9 3
               #           8 CPUs (    no SMT    ): 8 11
               #           8 CPUs (2 threads/core): 8 11 4 6
               #          12 CPUs (    no SMT    ): 12 16
               #          12 CPUs (2 threads/core): 12 16 6 9
               #          16 CPUs (    no SMT    ): 16 20
               #          16 CPUs (2 threads/core): 16 20 8 11
               #          20 CPUs (    no SMT    ): 20 25
               #          20 CPUs (2 threads/core): 20 25 10 14
               #          24 CPUs (    no SMT    ): 24 29
               #          24 CPUs (2 threads/core): 24 29 12 16
               #          28 CPUs (    no SMT    ): 28 34
               #          28 CPUs (2 threads/core): 28 34 14 18
               #          32 CPUs (    no SMT    ): 32 38
               #          32 CPUs (2 threads/core): 32 38 16 20
               #          32 CPUs (4 threads/core): 32 38 8 11
               #          48 CPUs (    no SMT    ): 48 55
               #          48 CPUs (2 threads/core): 48 55 24 29
               #          48 CPUs (4 threads/core): 48 55 12 16
               #          64 CPUs (    no SMT    ): 64 72
               #          64 CPUs (2 threads/core): 64 72 32 38
               #          64 CPUs (4 threads/core): 64 72 16 20
               #         128 CPUs (    no SMT    ): 128 140
               #         128 CPUs (2 threads/core): 128 140 64 72
               #         128 CPUs (4 threads/core): 128 140 32 38
               #         256 CPUs (    no SMT    ): 256 272
               #         256 CPUs (2 threads/core): 256 272 128 140
               #         256 CPUs (4 threads/core): 256 272 64 72
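
       The rows of the table can be reproduced with a small lookup.  The
       Python sketch below is only an illustration: the names OVERSUBSCRIBED
       and default_jobs are hypothetical, the values are copied from the
       table, and the rule for the SMT rows is inferred from those rows
       rather than taken from kcbench itself.  Run '--help' for the
       authoritative default on a given machine.

```python
# Oversubscribed job counts copied from the table above, keyed by the base
# job count.  Counts that the table does not list are deliberately absent.
OVERSUBSCRIBED = {
    1: 2, 2: 3, 4: 6, 6: 9, 8: 11, 10: 14, 12: 16, 14: 18, 16: 20,
    20: 25, 24: 29, 28: 34, 32: 38, 48: 55, 64: 72, 128: 140, 256: 272,
}


def default_jobs(cpus, threads_per_core=1):
    """Look up the default '-j' values for one row of the table.

    Assumption: on SMT machines the core count is added as a third value,
    and its oversubscribed value as a fourth once there are at least four
    cores.  This matches every row of the table but is inferred from it.
    """
    jobs = [cpus, OVERSUBSCRIBED[cpus]]
    if threads_per_core > 1:
        cores = cpus // threads_per_core
        jobs.append(cores)
        if cores >= 4:
            jobs.append(OVERSUBSCRIBED[cores])
    return jobs


print(default_jobs(4))       # [4, 6]
print(default_jobs(8, 2))    # [8, 11, 4, 6]
print(default_jobs(32, 4))   # [32, 38, 8, 11]
```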

ON FAILED RUNS DUE TO COMPILATION ERRORS

       As long as you are using a settled GCC version to natively compile the
       source of a current Linux kernel for popular architectures like ARM,
       ARM64/Aarch64, or x86_64 the compilation is unlikely to fail.  For
       other cases there is a bigger risk that compilation will fail due to
       factors outside of what kcbench/kcbenchrate control.  They nevertheless
       try to catch a few common problems and warn, but they cannot catch
       them all, as there are too many factors involved:

       · Brand new compiler generations are sometimes stricter than their
         predecessors and thus might fail to compile even the latest Linux
         kernel version.  You might need to use a pre-release version of the
         next Linux kernel release to make it work or simply need to wait
         until the compiler or kernel developers solve the problem.

       · Distributions enable different compiler features that might have an
         impact on the kernel compilation.  For example gcc9 was capable of
         compiling Linux 4.19 on many distributions, but started to fail on
         Ubuntu 19.10 due to a feature that got enabled in its GCC.  Compile a
         newer Linux kernel version in this case.

       · Cross compilation increases the risk of running into compile problems
         in general, as there are many compilers and architectures out there.
         That is why, for example, compiling the Linux kernel for an unpopular
         architecture is more likely to fail due to bugs in the compiler or
         the Linux kernel sources that nobody had noticed when the compiler or
         kernel was released.  This is even more likely to happen if you start
         kcbench/kcbenchrate with '-m/--modconfig' to build a more complex
         kernel.

HINTS

       Running benchmarks is very tricky.  Here are a few of the aspects you
       should keep in mind when doing so:

       · kcbench/kcbenchrate by default set CCACHE_DISABLE=1 when calling
         'make' to avoid interference from ccache.

       · Do not compare results from two different archs (like ARM64 and
         x86_64); kcbench/kcbenchrate compile different code in that case, as
         they will compile a native kernel on each of those archs.  This can
         be avoided by cross compiling for a third arch that is not related to
         any of the archs compared (say RISC-V when comparing ARM64 and
         x86_64).

       · Unless you want to bench compilers, do not compare results from
         different compiler generations, as they will apply different
         optimization techniques.  For example, do not compare results from
         GCC7 and GCC9, as the latter optimizes harder and thus will take more
         time generating the code.  That's also why the Linux version compiled
         by default depends on the machine's compiler: you sometimes can't
         compile older kernels with the latest compilers anyway, as new
         compiler generations often uncover bugs in the Linux kernel source
         that need to get fixed for compiling to succeed.  For example, when
         GCC10 was close to release it was incapable of compiling the then
         latest Linux version 5.5 in an allmodconfig configuration due to a
         bug in the Linux kernel sources.

       · Compiling a Linux kernel scales very well and thus can utilize
         processors quite well.  But be aware that some parts of the Linux
         compile process will only use one thread (and thus one CPU core), for
         example when linking vmlinux; the other cores will idle meanwhile.
         The effect on the result will grow with the number of CPU cores.

       If you want to work against that, consider using '-m' to build an
       allmodconfig configuration with modules; compiling a newer, more
       complex Linux kernel version can also help.  But the best way to avoid
       this effect is by running kcbenchrate.

EXAMPLES

       To let kcbench decide everything automatically simply run:
              $ kcbench

       On a four core processor without SMT kcbench by default will compile 2
       kernels with 4 jobs and 2 with 6 jobs.

              $ kcbench -s 5.4 --iterations 3 --jobs 2 --jobs 4

       This will compile Linux 5.4 first 3 times with 2 jobs and then as often
       with 4 jobs.
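
       The order of the measured runs described above can be sketched in a
       few lines of Python.  This is an illustration only: run_plan is a
       hypothetical helper, not kcbench code, and it does not model the
       initial cache-filling compile that '-b' skips.

```python
def run_plan(jobs, iterations=2):
    """Return (jobs, iteration) pairs in the order described above.

    Every value of jobs is used 'iterations' times in a row before the
    next value is tried.  The default of 2 iterations is the documented
    default of '-i'.
    """
    return [(j, i) for j in jobs for i in range(1, iterations + 1)]


# kcbench -s 5.4 --iterations 3 --jobs 2 --jobs 4
print(run_plan([2, 4], iterations=3))
# [(2, 1), (2, 2), (2, 3), (4, 1), (4, 2), (4, 3)]

# Defaults on a four core processor without SMT: 4 and 6 jobs, 2 iterations
print(run_plan([4, 6]))
# [(4, 1), (4, 2), (6, 1), (6, 2)]
```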

RESULTS

       By default, the lines you are looking for look like this:

              Run 1 (-j 4): 230.30 sec / 15.63 kernels/hour [P:389%, 24 maj. pagefaults]

       Here it took 230.30 seconds to compile the Linux kernel image.  At this
       speed the machine can compile 15.63 kernels per hour (3600/230.30).
       The results from this 4 core machine also show the CPU usage (P) was
       389 percent; 24 major page faults occurred during this run.  This
       number should be small, as processing them takes some time and thus
       slows down the build.  This information is omitted if fewer than 20
       major page faults happen.  For details on how the CPU usage is
       calculated and major page faults are detected see the man page for GNU
       'time', which kcbench/kcbenchrate rely on for their measurements.

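       The kernels/hour figure is plain arithmetic on the build time.  As a
       small Python sketch (kernels_per_hour is a hypothetical helper name,
       and rounding to two decimals is an assumption based on the sample
       output):

```python
def kernels_per_hour(seconds):
    """Convert one build time in seconds into kernels per hour."""
    return round(3600 / seconds, 2)


print(kernels_per_hour(230.30))  # 15.63
print(kernels_per_hour(250.03))  # 14.4
```
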
       When running with "-d|--detailedresults" you'll get more detailed
       results:

              Run 1 (-j 4): 230.30 sec / 15.63 kernels/hour [P:389%]
              Elapsed Time(E): 2:30.10 (150.10 seconds)
              Kernel time (S): 36.38 seconds
              User time (U): 259.51 seconds
              CPU usage (P): 197%
              Major page faults (F): 0
              Minor page faults (R): 9441809
              Context switches involuntarily (c): 69031
              Context switches voluntarily (w): 46955
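
       GNU 'time' documents the CPU usage as (user time + kernel time) /
       elapsed time.  The Python sketch below applies that to the E, S and U
       values from the detailed listing above.  The helper names are
       hypothetical, and truncating to a whole percent is an assumption.

```python
def elapsed_to_seconds(elapsed):
    """Convert an elapsed time like '2:30.10' or '1:02:03' into seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_usage_percent(user, kernel, elapsed):
    """CPU usage as (user + kernel) / elapsed, in whole percent."""
    return int((user + kernel) * 100 / elapsed)


elapsed = elapsed_to_seconds("2:30.10")
print(cpu_usage_percent(259.51, 36.38, elapsed))  # 197
```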

MISSING FEATURES

       · Some math to detect the fastest setting and do one more run with it
         before sanity checking the result and printing the best one.

SEE ALSO

       kcbenchrate(1), time(1)

AUTHOR

       Thorsten Leemhuis <linux [AT] leemhuis [DOT] info>

Version 0.9                                                         kcbench(1)