VDSO(7)                Linux Programmer's Manual               VDSO(7)

NAME
vdso - overview of the virtual ELF dynamic shared object

SYNOPSIS
#include <sys/auxv.h>

void *vdso = (void *) getauxval(AT_SYSINFO_EHDR);

DESCRIPTION
The "vDSO" (virtual dynamic shared object) is a small shared library
that the kernel automatically maps into the address space of all
user-space applications. Applications usually do not need to concern
themselves with these details, as the vDSO is most commonly called by
the C library. You can therefore write code as usual with standard
functions, and the C library will take care of using any functionality
that is available via the vDSO.

Why does the vDSO exist at all? There are some system calls the kernel
provides that user-space code ends up using frequently, to the point
that such calls can dominate overall performance. This is due both to
the frequency of the calls and to the context-switch overhead that
results from exiting user space and entering the kernel.

The rest of this documentation is geared toward the curious and/or C
library writers rather than general developers. If you're trying to
call the vDSO in your own application rather than using the C library,
you're most likely doing it wrong.

Example background
Making system calls can be slow. On x86 32-bit systems, you can
trigger a software interrupt (int $0x80) to tell the kernel you wish to
make a system call. However, this instruction is expensive: it goes
through the full interrupt-handling paths in the processor's microcode
as well as in the kernel. Newer processors have faster (but backward
incompatible) instructions to initiate system calls. Rather than
requiring the C library to figure out at run time whether this
functionality is available, the C library can use functions provided by
the kernel in the vDSO.

Note that the terminology can be confusing. On x86 systems, the vDSO
function used to determine the preferred method of making a system call
is named "__kernel_vsyscall", but on x86-64, the term "vsyscall" also
refers to an obsolete way to ask the kernel what time it is or what CPU
the caller is on.

One frequently used system call is gettimeofday(2). This system call
is called both directly by user-space applications and indirectly by
the C library. Think timestamps or timing loops or polling: all of
these frequently need to know what time it is right now. This
information is also not secret: any application in any privilege mode
(root or any unprivileged user) will get the same answer. Thus the
kernel arranges for the information required to answer this question to
be placed in memory the process can access. Now a call to
gettimeofday(2) changes from a system call to a normal function call
and a few memory accesses.
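
As an illustration of the timing-loop case above, the following sketch
reads CLOCK_MONOTONIC through the ordinary C library interface
clock_gettime(2) and estimates the average cost of one call. The
function names monotonic_ns() and average_call_cost_ns() are made up for
this example. Whether the calls are actually served by the vDSO depends
on the architecture, kernel, and C library in use.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Return the current CLOCK_MONOTONIC time in nanoseconds, or -1 on error. */
int64_t monotonic_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return (int64_t) ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Call monotonic_ns() `calls` times and return the average elapsed
   time per call in nanoseconds, or -1.0 if nothing could be measured. */
double average_call_cost_ns(unsigned long calls)
{
    volatile int64_t sink = 0;  /* keeps the loop from being optimized out */
    int64_t start, end;

    start = monotonic_ns();
    for (unsigned long i = 0; i < calls; i++)
        sink = monotonic_ns();
    end = monotonic_ns();
    (void) sink;

    if (calls == 0 || start < 0 || end < 0)
        return -1.0;
    return (double) (end - start) / (double) calls;
}
```

Calling average_call_cost_ns() with a large count, for example one
million, from main() gives a rough per-call figure. The code calls only
the standard C library function, which is the recommended way to
benefit from the vDSO.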

Finding the vDSO
The base address of the vDSO (if one exists) is passed by the kernel to
each program in the initial auxiliary vector (see getauxval(3)), via
the AT_SYSINFO_EHDR tag.
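
A minimal sketch of this lookup follows. It fetches the base address
with getauxval(3) and checks that the memory there starts with the ELF
magic number. The helper names vdso_base() and looks_like_elf() are
hypothetical and exist only for this example.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Return the vDSO base address, or NULL if the kernel provided none. */
const void *vdso_base(void)
{
    return (const void *) (uintptr_t) getauxval(AT_SYSINFO_EHDR);
}

/* Return 1 if `base` points at something that starts with the ELF
   magic bytes ("\177ELF"), and 0 otherwise. */
int looks_like_elf(const void *base)
{
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

A caller would typically test the result of vdso_base() against NULL
first, since getauxval(3) returns 0 when the tag is absent.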

You must not assume the vDSO is mapped at any particular location in
the user's memory map. The base address will usually be randomized at
run time every time a new process image is created (at execve(2) time).
This is done for security reasons, to prevent "return-to-libc" attacks.
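
To observe where the vDSO landed in a given process, one option is to
scan /proc/self/maps (see proc(5)) for the mapping labeled "[vdso]".
The sketch below does this; the function name find_vdso_mapping() is
hypothetical, and the code assumes that /proc is mounted. Running the
program several times will usually show a different start address on
each run.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find the "[vdso]" line in /proc/self/maps.
   Returns 0 and fills in *start and *end on success, -1 otherwise. */
int find_vdso_mapping(uintptr_t *start, uintptr_t *end)
{
    char line[512];
    int result = -1;
    FILE *fp = fopen("/proc/self/maps", "r");

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof(line), fp) != NULL) {
        uintptr_t lo, hi;

        if (strstr(line, "[vdso]") == NULL)
            continue;
        /* Each line begins with "start-end" in hexadecimal. */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &lo, &hi) == 2) {
            *start = lo;
            *end = hi;
            result = 0;
        }
        break;
    }
    fclose(fp);
    return result;
}
```

This is meant for inspection only. Code that needs the address should
use the AT_SYSINFO_EHDR value rather than parsing the memory map.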

For some architectures, there is also an AT_SYSINFO tag. This is used
only for locating the vsyscall entry point and is frequently omitted or
set to 0 (meaning it's not available). This tag is a throwback to the
initial vDSO work (see History below) and its use should be avoided.

File format
Since the vDSO is a fully formed ELF image, you can do symbol lookups
on it. This allows new symbols to be added with newer kernel releases,
and allows the C library to detect available functionality at run time
when running under different kernel versions. Oftentimes the C library
will do detection with the first call and then cache the result for
subsequent calls.

All symbols are also versioned (using the GNU version format). This
allows the kernel to update a function's signature, that is, to change
the arguments that the function accepts or its return value, without
breaking backward compatibility. Thus, when looking up a symbol in the
vDSO, you must always include the version to match the ABI you expect.

Typically the vDSO follows the naming convention of prefixing all
symbols with "__vdso_" or "__kernel_" so as to distinguish them from
other standard symbols. For example, the "gettimeofday" function is
named "__vdso_gettimeofday".

You use the standard C calling conventions when calling any of these
functions. There is no need to worry about unusual register or stack
behavior.
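
The lookup described in this section can be sketched roughly as
follows. This is a simplified illustration under several assumptions,
not a complete implementation: it walks the dynamic symbol table
directly, relies on a DT_HASH table with 32-bit entries being present
(some vDSO images provide only DT_GNU_HASH, and a few ABIs use wider
entries), and, most importantly, ignores symbol versions, which real
code must check. The names vdso_sym_unversioned() and
vdso_clock_gettime_fn are hypothetical. The kernel source tree ships a
more complete parser among its vDSO examples.

```c
#define _GNU_SOURCE
#include <elf.h>
#include <link.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>
#include <time.h>

/* Assumed prototype; it matches clock_gettime(2) on 64-bit ABIs. */
typedef int (*vdso_clock_gettime_fn)(clockid_t, struct timespec *);

/* Look up a function in the vDSO by name, IGNORING symbol versions.
   Returns its address, or NULL if it cannot be found this way. */
void *vdso_sym_unversioned(const char *name)
{
    uintptr_t base = (uintptr_t) getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return NULL;                    /* no vDSO in this process */

    const ElfW(Ehdr) *ehdr = (const ElfW(Ehdr) *) base;
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *) (base + ehdr->e_phoff);
    const ElfW(Dyn) *dyn = NULL;
    uintptr_t bias = 0;
    int have_load = 0;

    /* The first PT_LOAD yields the load bias; PT_DYNAMIC locates the
       dynamic table. */
    for (unsigned int i = 0; i < ehdr->e_phnum; i++) {
        if (phdr[i].p_type == PT_LOAD && !have_load) {
            bias = base + phdr[i].p_offset - phdr[i].p_vaddr;
            have_load = 1;
        } else if (phdr[i].p_type == PT_DYNAMIC) {
            dyn = (const ElfW(Dyn) *) (base + phdr[i].p_offset);
        }
    }
    if (!have_load || dyn == NULL)
        return NULL;

    const ElfW(Sym) *symtab = NULL;
    const char *strtab = NULL;
    const uint32_t *hash = NULL;        /* assumes 32-bit DT_HASH words */

    /* Pointers in the vDSO's dynamic table are not relocated, so the
       load bias has to be added by hand. */
    for (; dyn->d_tag != DT_NULL; dyn++) {
        uintptr_t addr = bias + dyn->d_un.d_ptr;

        if (dyn->d_tag == DT_SYMTAB)
            symtab = (const ElfW(Sym) *) addr;
        else if (dyn->d_tag == DT_STRTAB)
            strtab = (const char *) addr;
        else if (dyn->d_tag == DT_HASH)
            hash = (const uint32_t *) addr;
    }
    if (symtab == NULL || strtab == NULL || hash == NULL)
        return NULL;

    /* hash[1] (nchain) equals the number of dynamic symbols. */
    for (uint32_t i = 0; i < hash[1]; i++) {
        const ElfW(Sym) *sym = &symtab[i];
        /* The ELF32_ST_* and ELF64_ST_* macros are identical. */
        unsigned char type = ELF32_ST_TYPE(sym->st_info);
        unsigned char bind = ELF32_ST_BIND(sym->st_info);

        if (sym->st_shndx == SHN_UNDEF || type != STT_FUNC)
            continue;
        if (bind != STB_GLOBAL && bind != STB_WEAK)
            continue;
        if (strcmp(strtab + sym->st_name, name) == 0)
            return (void *) (bias + sym->st_value);
    }
    return NULL;
}
```

Because the functions follow the standard C calling conventions, the
returned address can be cast to a function pointer and called
directly, for example ((vdso_clock_gettime_fn) p)(CLOCK_MONOTONIC, &ts)
where p is the result of looking up "__vdso_clock_gettime". The symbol
name to look up differs between architectures, as listed in the tables
later in this page.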

NOTES
Source
When you compile the kernel, it will automatically compile and link the
vDSO code for you. You will frequently find it under the
architecture-specific directory:

    find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'

vDSO names
The name of the vDSO varies across architectures. It will often show
up in things like glibc's ldd(1) output. The exact name should not
matter to any code, so do not hardcode it.

user ABI   vDSO name
─────────────────────────────
aarch64    linux-vdso.so.1
arm        linux-vdso.so.1
ia64       linux-gate.so.1
mips       linux-vdso.so.1
ppc/32     linux-vdso32.so.1
ppc/64     linux-vdso64.so.1
riscv      linux-vdso.so.1
s390       linux-vdso32.so.1
s390x      linux-vdso64.so.1
sh         linux-gate.so.1
i386       linux-gate.so.1
x86-64     linux-vdso.so.1
x86/x32    linux-vdso.so.1

strace(1), seccomp(2), and the vDSO
When tracing system calls with strace(1), symbols (system calls) that
are exported by the vDSO will not appear in the trace output. Those
system calls will likewise not be visible to seccomp(2) filters.
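
The difference can be made visible with a small experiment, sketched
below. One helper reads the clock through the C library, which usually
goes through the vDSO; the other forces a real system call with
syscall(2). The helper names are hypothetical, and the sketch assumes
an ABI where SYS_clock_gettime is defined and where the kernel's
timespec layout matches struct timespec, as is the case on 64-bit
ABIs such as x86-64.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Read CLOCK_MONOTONIC through the C library. On most systems this is
   served by the vDSO and does not show up in strace(1) output. */
int time_via_libc(struct timespec *ts)
{
    return clock_gettime(CLOCK_MONOTONIC, ts);
}

/* Read the same clock with an explicit system call. This enters the
   kernel, so strace(1) and seccomp(2) filters do see it. */
int time_via_syscall(struct timespec *ts)
{
    return (int) syscall(SYS_clock_gettime, CLOCK_MONOTONIC, ts);
}
```

Running a program that calls both helpers under strace(1) would be
expected to show a clock_gettime entry only for the second one.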

ARCHITECTURE-SPECIFIC NOTES
The subsections below provide architecture-specific notes on the vDSO.

Note that the vDSO that is used is based on the ABI of your user-space
code and not the ABI of the kernel. Thus, for example, when you run an
i386 32-bit ELF binary, you'll get the same vDSO regardless of whether
you run it under an i386 32-bit kernel or under an x86-64 64-bit
kernel. Therefore, the name of the user-space ABI should be used to
determine which of the sections below is relevant.

ARM functions
The table below lists the symbols exported by the vDSO.

symbol                 version
────────────────────────────────────────────────────────────
__vdso_gettimeofday    LINUX_2.6 (exported since Linux 4.1)
__vdso_clock_gettime   LINUX_2.6 (exported since Linux 4.1)

Additionally, the ARM port has a code page full of utility functions.
Since it's just a raw page of code, there is no ELF information for
doing symbol lookups or versioning. It does provide support for
different versions though.

For information on this code page, it's best to refer to the kernel
documentation as it's extremely detailed and covers everything you need
to know: Documentation/arm/kernel_user_helpers.txt.

aarch64 functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────
__kernel_rt_sigreturn    LINUX_2.6.39
__kernel_gettimeofday    LINUX_2.6.39
__kernel_clock_gettime   LINUX_2.6.39
__kernel_clock_getres    LINUX_2.6.39

bfin (Blackfin) functions (port removed in Linux 4.17)
As this CPU lacks a memory management unit (MMU), it doesn't set up a
vDSO in the normal sense. Instead, it maps at boot time a few raw
functions into a fixed location in memory. User-space applications
then call directly into that region. There is no provision for
backward compatibility beyond sniffing raw opcodes, but as this is an
embedded CPU, it can get away with things; some of the object formats
it runs aren't even ELF based (they're bFLT/FLAT).

For information on this code page, it's best to refer to the public
documentation:
http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code

mips functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────────────────────────────
__kernel_gettimeofday    LINUX_2.6 (exported since Linux 4.4)
__kernel_clock_gettime   LINUX_2.6 (exported since Linux 4.4)

ia64 (Itanium) functions
The table below lists the symbols exported by the vDSO.

symbol                       version
───────────────────────────────────────
__kernel_sigtramp            LINUX_2.5
__kernel_syscall_via_break   LINUX_2.5
__kernel_syscall_via_epc     LINUX_2.5

The Itanium port is somewhat tricky. In addition to the vDSO above, it
also has "light-weight system calls" (also known as "fast syscalls" or
"fsys"). You can invoke these via the __kernel_syscall_via_epc vDSO
helper. The system calls listed here have the same semantics as if you
called them directly via syscall(2), so refer to the relevant
documentation for each. The table below lists the functions available
via this mechanism.

function
────────────────
clock_gettime
getcpu
getpid
getppid
gettimeofday
set_tid_address

parisc (hppa) functions
The parisc port has a code page with utility functions called a gateway
page. Rather than use the normal ELF auxiliary vector approach, it
passes the address of the page to the process via the SR2 register.
The permissions on the page are such that merely executing those
addresses automatically executes with kernel privileges and not in user
space. This is done to match the way HP-UX works.

Since it's just a raw page of code, there is no ELF information for
doing symbol lookups or versioning. Simply call into the appropriate
offset via the branch instruction, for example:

    ble <offset>(%sr2, %r0)

offset   function
────────────────────────────────────────────
00b0     lws_entry (CAS operations)
00e0     set_thread_pointer (used by glibc)
0100     linux_gateway_entry (syscall)

ppc/32 functions
The table below lists the symbols exported by the vDSO. The functions
marked with a * are available only when the kernel is a PowerPC64
(64-bit) kernel.

symbol                     version
────────────────────────────────────────
__kernel_clock_getres      LINUX_2.6.15
__kernel_clock_gettime     LINUX_2.6.15
__kernel_clock_gettime64   LINUX_5.11
__kernel_datapage_offset   LINUX_2.6.15
__kernel_get_syscall_map   LINUX_2.6.15
__kernel_get_tbfreq        LINUX_2.6.15
__kernel_getcpu *          LINUX_2.6.15
__kernel_gettimeofday      LINUX_2.6.15
__kernel_sigtramp_rt32     LINUX_2.6.15
__kernel_sigtramp32        LINUX_2.6.15
__kernel_sync_dicache      LINUX_2.6.15
__kernel_sync_dicache_p5   LINUX_2.6.15

In kernel versions before Linux 5.6, the CLOCK_REALTIME_COARSE and
CLOCK_MONOTONIC_COARSE clocks are not supported by the
__kernel_clock_getres and __kernel_clock_gettime interfaces; the kernel
falls back to the real system call.

ppc/64 functions
The table below lists the symbols exported by the vDSO.

symbol                     version
────────────────────────────────────────
__kernel_clock_getres      LINUX_2.6.15
__kernel_clock_gettime     LINUX_2.6.15
__kernel_datapage_offset   LINUX_2.6.15
__kernel_get_syscall_map   LINUX_2.6.15
__kernel_get_tbfreq        LINUX_2.6.15
__kernel_getcpu            LINUX_2.6.15
__kernel_gettimeofday      LINUX_2.6.15
__kernel_sigtramp_rt64     LINUX_2.6.15
__kernel_sync_dicache      LINUX_2.6.15
__kernel_sync_dicache_p5   LINUX_2.6.15

In kernel versions before Linux 4.16, the CLOCK_REALTIME_COARSE and
CLOCK_MONOTONIC_COARSE clocks are not supported by the
__kernel_clock_getres and __kernel_clock_gettime interfaces; the kernel
falls back to the real system call.

riscv functions
The table below lists the symbols exported by the vDSO.

symbol                   version
────────────────────────────────────
__kernel_rt_sigreturn    LINUX_4.15
__kernel_gettimeofday    LINUX_4.15
__kernel_clock_gettime   LINUX_4.15
__kernel_clock_getres    LINUX_4.15
__kernel_getcpu          LINUX_4.15
__kernel_flush_icache    LINUX_4.15

s390 functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────
__kernel_clock_getres    LINUX_2.6.29
__kernel_clock_gettime   LINUX_2.6.29
__kernel_gettimeofday    LINUX_2.6.29

s390x functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────
__kernel_clock_getres    LINUX_2.6.29
__kernel_clock_gettime   LINUX_2.6.29
__kernel_gettimeofday    LINUX_2.6.29

sh (SuperH) functions
The table below lists the symbols exported by the vDSO.

symbol                  version
──────────────────────────────────
__kernel_rt_sigreturn   LINUX_2.6
__kernel_sigreturn      LINUX_2.6
__kernel_vsyscall       LINUX_2.6

i386 functions
The table below lists the symbols exported by the vDSO.

symbol                  version
──────────────────────────────────────────────────────────────
__kernel_sigreturn      LINUX_2.5
__kernel_rt_sigreturn   LINUX_2.5
__kernel_vsyscall       LINUX_2.5
__vdso_clock_gettime    LINUX_2.6 (exported since Linux 3.15)
__vdso_gettimeofday     LINUX_2.6 (exported since Linux 3.15)
__vdso_time             LINUX_2.6 (exported since Linux 3.15)

x86-64 functions
The table below lists the symbols exported by the vDSO. All of these
symbols are also available without the "__vdso_" prefix, but you should
ignore those and stick to the names below.

symbol                 version
─────────────────────────────────
__vdso_clock_gettime   LINUX_2.6
__vdso_getcpu          LINUX_2.6
__vdso_gettimeofday    LINUX_2.6
__vdso_time            LINUX_2.6

x86/x32 functions
The table below lists the symbols exported by the vDSO.

symbol                 version
─────────────────────────────────
__vdso_clock_gettime   LINUX_2.6
__vdso_getcpu          LINUX_2.6
__vdso_gettimeofday    LINUX_2.6
__vdso_time            LINUX_2.6

History
The vDSO was originally just a single function: the vsyscall. In older
kernels, you might see that name in a process's memory map rather than
"vdso". Over time, people realized that this mechanism was a great way
to pass more functionality to user space, so it was reconceived as a
vDSO in the current format.

SEE ALSO
syscalls(2), getauxval(3), proc(5)

The documents, examples, and source code in the Linux source code tree:

    Documentation/ABI/stable/vdso
    Documentation/ia64/fsys.txt
    Documentation/vDSO/* (includes examples of using the vDSO)

    find arch/ -iname '*vdso*' -o -iname '*gate*'

COLOPHON
This page is part of release 5.13 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page, can be found at
https://www.kernel.org/doc/man-pages/.

Linux                          2021-08-27                       VDSO(7)