VDSO(7)                    Linux Programmer's Manual                   VDSO(7)

NAME
vdso - overview of the virtual ELF dynamic shared object

SYNOPSIS
#include <sys/auxv.h>

void *vdso = (void *) getauxval(AT_SYSINFO_EHDR);

DESCRIPTION
The "vDSO" (virtual dynamic shared object) is a small shared library
that the kernel automatically maps into the address space of all
user-space applications. Applications usually do not need to concern
themselves with these details, as the vDSO is most commonly called by
the C library. This way you can code in the normal way using standard
functions, and the C library will take care of using any functionality
that is available via the vDSO.

Why does the vDSO exist at all? There are some system calls the kernel
provides that user-space code ends up using frequently, to the point
that such calls can dominate overall performance. This is due both to
the frequency of the call and to the context-switch overhead that
results from exiting user space and entering the kernel.

The rest of this documentation is geared toward the curious and/or C
library writers rather than general developers. If you're trying to
call the vDSO in your own application rather than using the C library,
you're most likely doing it wrong.

Example background
Making system calls can be slow. On x86 32-bit systems, you can
trigger a software interrupt (int $0x80) to tell the kernel you wish to
make a system call. However, this instruction is expensive: it goes
through the full interrupt-handling paths in the processor's microcode
as well as in the kernel. Newer processors have faster (but backward
incompatible) instructions to initiate system calls. Rather than
require the C library to figure out if this functionality is available
at run time, the C library can use functions provided by the kernel in
the vDSO.

Note that the terminology can be confusing. On x86 systems, the vDSO
function used to determine the preferred method of making a system call
is named "__kernel_vsyscall", but on x86-64, the term "vsyscall" also
refers to an obsolete way to ask the kernel what time it is or what CPU
the caller is on.

One frequently used system call is gettimeofday(2). This system call
is called both directly by user-space applications and indirectly by
the C library. Think timestamps or timing loops or polling: all of
these frequently need to know what time it is right now. This
information is also not secret: any application in any privilege mode
(root or any unprivileged user) will get the same answer. Thus the
kernel arranges for the information required to answer this question to
be placed in memory the process can access. Now a call to
gettimeofday(2) changes from a system call to a normal function call
and a few memory accesses.
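
The difference between the two paths can be made concrete with a small
sketch. The helper names now_via_libc() and now_via_syscall() are
hypothetical. The first goes through the C library, which on many
configurations is expected (though not guaranteed) to use the vDSO; the
second uses syscall(2), which always enters the kernel. The sketch
assumes an ABI where struct timeval matches what the raw system call
expects (for example x86-64).

```c
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/time.h>
#include <unistd.h>

/* Ask the C library for the time.  Whether this is served by the vDSO
 * depends on the C library, the architecture, and the clock source. */
static int now_via_libc(struct timeval *tv)
{
    return gettimeofday(tv, NULL);
}

/* Ask the kernel directly.  syscall(2) bypasses the vDSO and always
 * performs a real system call. */
static int now_via_syscall(struct timeval *tv)
{
    return (int) syscall(SYS_gettimeofday, tv, NULL);
}
```

Timing a tight loop around each helper is one way to observe the
overhead discussed above; the result will vary by machine and kernel
configuration.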

Finding the vDSO
The base address of the vDSO (if one exists) is passed by the kernel to
each program in the initial auxiliary vector (see getauxval(3)), via
the AT_SYSINFO_EHDR tag.
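
A minimal sketch of that lookup follows. The function name vdso_base()
is hypothetical; the sketch only checks that the address returned by
getauxval(3) starts with the ELF magic bytes and treats anything else as
"no vDSO".

```c
#include <elf.h>
#include <link.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Return the vDSO base address, or NULL if the kernel supplied none or
 * the mapping does not begin with an ELF header. */
static const void *vdso_base(void)
{
    /* getauxval() returns 0 when the tag is absent from the vector. */
    uintptr_t addr = (uintptr_t) getauxval(AT_SYSINFO_EHDR);
    if (addr == 0)
        return NULL;

    const ElfW(Ehdr) *ehdr = (const ElfW(Ehdr) *) addr;
    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
        return NULL;

    return ehdr;
}
```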

You must not assume the vDSO is mapped at any particular location in
the user's memory map. The base address will usually be randomized at
run time every time a new process image is created (at execve(2) time).
This is done for security reasons, to make "return-to-libc" attacks
harder.
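
One way to see the randomization is to print the start of the mapping
in several runs of the same program. The sketch below, with the
hypothetical helper vdso_start_from_maps(), assumes that proc(5) is
mounted and that the kernel labels the mapping "[vdso]" in
/proc/self/maps.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Scan /proc/self/maps for the mapping labeled "[vdso]" and return its
 * start address, or 0 if there is none or the file cannot be read. */
static uintptr_t vdso_start_from_maps(void)
{
    FILE *fp = fopen("/proc/self/maps", "r");
    if (fp == NULL)
        return 0;

    char line[512];
    uintptr_t start = 0;
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strstr(line, "[vdso]") == NULL)
            continue;
        uintmax_t lo;
        /* Each line begins with "start-end" in hexadecimal. */
        if (sscanf(line, "%jx-", &lo) == 1)
            start = (uintptr_t) lo;
        break;
    }
    fclose(fp);
    return start;
}
```

On a typical system the value would be expected to equal
getauxval(AT_SYSINFO_EHDR) and to change from one execution to the next.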

For some architectures, there is also an AT_SYSINFO tag. This is used
only for locating the vsyscall entry point and is frequently omitted or
set to 0 (meaning it's not available). This tag is a throwback to the
initial vDSO work (see History below) and its use should be avoided.

File format
Since the vDSO is a fully formed ELF image, you can do symbol lookups
on it. This allows new symbols to be added with newer kernel releases,
and allows the C library to detect available functionality at run time
when running under different kernel versions. Oftentimes the C library
will do detection with the first call and then cache the result for
subsequent calls.

All symbols are also versioned (using the GNU version format). This
allows the kernel to update a function signature, that is, the
arguments the function accepts as well as its return value, without
breaking backward compatibility. Thus, when looking up a symbol in the
vDSO, you must always include the version to match the ABI you expect.
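
The lookup described above can be sketched as follows. This is an
illustration under stated assumptions, not the method any particular C
library uses: vdso_lookup() and version_matches() are hypothetical
names, the image is assumed to carry a DT_HASH table with 32-bit
entries (not true on every architecture or kernel), the symbol table is
scanned linearly instead of hashed, and symbol type and binding are not
checked.

```c
#include <elf.h>
#include <link.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Return 1 if version index idx is defined in verdef under name. */
static int version_matches(const ElfW(Verdef) *verdef, const char *strtab,
                           unsigned idx, const char *name)
{
    for (const ElfW(Verdef) *d = verdef;;) {
        if (!(d->vd_flags & VER_FLG_BASE) && (d->vd_ndx & 0x7fffu) == idx) {
            const ElfW(Verdaux) *aux =
                (const ElfW(Verdaux) *) ((const char *) d + d->vd_aux);
            return strcmp(strtab + aux->vda_name, name) == 0;
        }
        if (d->vd_next == 0)
            return 0;
        d = (const ElfW(Verdef) *) ((const char *) d + d->vd_next);
    }
}

/* Look up name at version in the vDSO; NULL if anything is missing. */
static void *vdso_lookup(const char *version, const char *name)
{
    uintptr_t base = (uintptr_t) getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return NULL;

    const ElfW(Ehdr) *ehdr = (const ElfW(Ehdr) *) base;
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *) (base + ehdr->e_phoff);
    const ElfW(Dyn) *dyn = NULL;
    uintptr_t bias = 0;
    int have_load = 0;

    for (unsigned i = 0; i < ehdr->e_phnum; i++) {
        if (phdr[i].p_type == PT_LOAD && !have_load) {
            /* Difference between link-time and run-time addresses. */
            bias = base + phdr[i].p_offset - phdr[i].p_vaddr;
            have_load = 1;
        } else if (phdr[i].p_type == PT_DYNAMIC) {
            dyn = (const ElfW(Dyn) *) (base + phdr[i].p_offset);
        }
    }
    if (!have_load || dyn == NULL)
        return NULL;

    const ElfW(Sym) *symtab = NULL;
    const char *strtab = NULL;
    const ElfW(Word) *hash = NULL;
    const ElfW(Versym) *versym = NULL;
    const ElfW(Verdef) *verdef = NULL;

    /* Dynamic entries hold link-time addresses; add the bias. */
    for (; dyn->d_tag != DT_NULL; dyn++) {
        const void *p = (const void *) (bias + dyn->d_un.d_ptr);
        switch (dyn->d_tag) {
        case DT_SYMTAB: symtab = p; break;
        case DT_STRTAB: strtab = p; break;
        case DT_HASH:   hash = p;   break;
        case DT_VERSYM: versym = p; break;
        case DT_VERDEF: verdef = p; break;
        }
    }
    if (!symtab || !strtab || !hash || !versym || !verdef)
        return NULL;

    /* hash[1] is nchain, the number of symbol table entries. */
    for (ElfW(Word) i = 0; i < hash[1]; i++) {
        const ElfW(Sym) *sym = &symtab[i];
        if (sym->st_shndx == SHN_UNDEF || sym->st_name == 0)
            continue;
        if (strcmp(strtab + sym->st_name, name) != 0)
            continue;
        if (!version_matches(verdef, strtab, versym[i] & 0x7fffu, version))
            continue;
        return (void *) (bias + sym->st_value);
    }
    return NULL;
}
```

A caller on x86-64 might try vdso_lookup("LINUX_2.6",
"__vdso_clock_gettime") and fall back to the C library when the result
is NULL.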

Typically the vDSO follows the naming convention of prefixing all
symbols with "__vdso_" or "__kernel_" so as to distinguish them from
other standard symbols. For example, the "gettimeofday" function is
named "__vdso_gettimeofday".

You use the standard C calling conventions when calling any of these
functions. No need to worry about weird register or stack behavior.

NOTES
Source
When you compile the kernel, it will automatically compile and link the
vDSO code for you. You will frequently find it under the
architecture-specific directory:

    find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'

vDSO names
The name of the vDSO varies across architectures. It will often show
up in things like glibc's ldd(1) output. The exact name should not
matter to any code, so do not hardcode it.
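
Code that wants to display the name can read it from the image rather
than hardcode it. The sketch below, with the hypothetical helper
vdso_soname(), returns the DT_SONAME string recorded in the vDSO's
dynamic section; it assumes the image has such an entry, which is where
tools such as ldd(1) presumably obtain the name.

```c
#include <elf.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>

/* Return the DT_SONAME string of the vDSO, or NULL if unavailable. */
static const char *vdso_soname(void)
{
    uintptr_t base = (uintptr_t) getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return NULL;

    const ElfW(Ehdr) *ehdr = (const ElfW(Ehdr) *) base;
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *) (base + ehdr->e_phoff);
    const ElfW(Dyn) *dyn = NULL;
    uintptr_t bias = 0;
    int have_load = 0;

    for (unsigned i = 0; i < ehdr->e_phnum; i++) {
        if (phdr[i].p_type == PT_LOAD && !have_load) {
            /* Difference between link-time and run-time addresses. */
            bias = base + phdr[i].p_offset - phdr[i].p_vaddr;
            have_load = 1;
        } else if (phdr[i].p_type == PT_DYNAMIC) {
            dyn = (const ElfW(Dyn) *) (base + phdr[i].p_offset);
        }
    }
    if (!have_load || dyn == NULL)
        return NULL;

    const char *strtab = NULL;
    const ElfW(Dyn) *soname = NULL;
    for (; dyn->d_tag != DT_NULL; dyn++) {
        if (dyn->d_tag == DT_STRTAB)
            strtab = (const char *) (bias + dyn->d_un.d_ptr);
        else if (dyn->d_tag == DT_SONAME)
            soname = dyn;
    }
    if (strtab == NULL || soname == NULL)
        return NULL;

    /* DT_SONAME holds an offset into the dynamic string table. */
    return strtab + soname->d_un.d_val;
}
```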

user ABI   vDSO name
─────────────────────────────
aarch64    linux-vdso.so.1
arm        linux-vdso.so.1
ia64       linux-gate.so.1
mips       linux-vdso.so.1
ppc/32     linux-vdso32.so.1
ppc/64     linux-vdso64.so.1
riscv      linux-vdso.so.1
s390       linux-vdso32.so.1
s390x      linux-vdso64.so.1
sh         linux-gate.so.1
i386       linux-gate.so.1
x86-64     linux-vdso.so.1
x86/x32    linux-vdso.so.1

strace(1), seccomp(2), and the vDSO
When tracing system calls with strace(1), symbols (system calls) that
are exported by the vDSO will not appear in the trace output. Those
system calls will likewise not be visible to seccomp(2) filters.

ARCHITECTURE-SPECIFIC NOTES
The subsections below provide architecture-specific notes on the vDSO.

Note that the vDSO that is used is based on the ABI of your user-space
code and not the ABI of the kernel. Thus, for example, when you run an
i386 32-bit ELF binary, you'll get the same vDSO regardless of whether
you run it under an i386 32-bit kernel or under an x86-64 64-bit
kernel. Therefore, the name of the user-space ABI should be used to
determine which of the sections below is relevant.
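
This can be checked from inside a process: the ELF class recorded in the
vDSO header should match the class of the program itself, whatever
kernel is running. The helper names below are hypothetical, and the
pointer-size test is only a rough stand-in for "the ABI this code was
compiled for".

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Return the EI_CLASS byte of the vDSO (ELFCLASS32 or ELFCLASS64), or
 * ELFCLASSNONE if there is no usable vDSO. */
static int vdso_elf_class(void)
{
    const unsigned char *ident =
        (const unsigned char *) (uintptr_t) getauxval(AT_SYSINFO_EHDR);
    if (ident == NULL || memcmp(ident, ELFMAG, SELFMAG) != 0)
        return ELFCLASSNONE;
    return ident[EI_CLASS];
}

/* ELF class implied by the pointer size of this program. */
static int own_elf_class(void)
{
    return sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32;
}
```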

ARM functions
The table below lists the symbols exported by the vDSO.

symbol                 version
────────────────────────────────────────────────────────────
__vdso_gettimeofday    LINUX_2.6 (exported since Linux 4.1)
__vdso_clock_gettime   LINUX_2.6 (exported since Linux 4.1)

Additionally, the ARM port has a code page full of utility functions.
Since it's just a raw page of code, there is no ELF information for
doing symbol lookups or versioning. It does provide support for
different versions though.

For information on this code page, it's best to refer to the kernel
documentation as it's extremely detailed and covers everything you need
to know: Documentation/arm/kernel_user_helpers.txt.

aarch64 functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────
__kernel_rt_sigreturn    LINUX_2.6.39
__kernel_gettimeofday    LINUX_2.6.39
__kernel_clock_gettime   LINUX_2.6.39
__kernel_clock_getres    LINUX_2.6.39

bfin (Blackfin) functions (port removed in Linux 4.17)
As this CPU lacks a memory management unit (MMU), it doesn't set up a
vDSO in the normal sense. Instead, it maps at boot time a few raw
functions into a fixed location in memory. User-space applications
then call directly into that region. There is no provision for
backward compatibility beyond sniffing raw opcodes, but as this is an
embedded CPU, it can get away with things; some of the object formats
it runs aren't even ELF based (they're bFLT/FLAT).

For information on this code page, it's best to refer to the public
documentation:
http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code

mips functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────────────────────────────
__kernel_gettimeofday    LINUX_2.6 (exported since Linux 4.4)
__kernel_clock_gettime   LINUX_2.6 (exported since Linux 4.4)

ia64 (Itanium) functions
The table below lists the symbols exported by the vDSO.

symbol                       version
───────────────────────────────────────
__kernel_sigtramp            LINUX_2.5
__kernel_syscall_via_break   LINUX_2.5
__kernel_syscall_via_epc     LINUX_2.5

The Itanium port is somewhat tricky. In addition to the vDSO above, it
also has "light-weight system calls" (also known as "fast syscalls" or
"fsys"). You can invoke these via the __kernel_syscall_via_epc vDSO
helper. The system calls listed here have the same semantics as if you
called them directly via syscall(2), so refer to the relevant
documentation for each. The table below lists the functions available
via this mechanism.

function
────────────────
clock_gettime
getcpu
getpid
getppid
gettimeofday
set_tid_address

parisc (hppa) functions
The parisc port has a code page with utility functions called a gateway
page. Rather than use the normal ELF auxiliary vector approach, it
passes the address of the page to the process via the SR2 register.
The permissions on the page are such that merely executing those
addresses automatically executes with kernel privileges and not in user
space. This is done to match the way HP-UX works.

Since it's just a raw page of code, there is no ELF information for
doing symbol lookups or versioning. Simply call into the appropriate
offset via the branch instruction, for example:

    ble <offset>(%sr2, %r0)

offset   function
────────────────────────────────────────────
00b0     lws_entry (CAS operations)
00e0     set_thread_pointer (used by glibc)
0100     linux_gateway_entry (syscall)

ppc/32 functions
The table below lists the symbols exported by the vDSO. The functions
marked with a * are available only when the kernel is a PowerPC64
(64-bit) kernel.

symbol                     version
────────────────────────────────────────
__kernel_clock_getres      LINUX_2.6.15
__kernel_clock_gettime     LINUX_2.6.15
__kernel_datapage_offset   LINUX_2.6.15
__kernel_get_syscall_map   LINUX_2.6.15
__kernel_get_tbfreq        LINUX_2.6.15
__kernel_getcpu *          LINUX_2.6.15
__kernel_gettimeofday      LINUX_2.6.15
__kernel_sigtramp_rt32     LINUX_2.6.15
__kernel_sigtramp32        LINUX_2.6.15
__kernel_sync_dicache      LINUX_2.6.15
__kernel_sync_dicache_p5   LINUX_2.6.15

The CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE clocks are not
supported by the __kernel_clock_getres and __kernel_clock_gettime
interfaces; the kernel falls back to the real system call.

ppc/64 functions
The table below lists the symbols exported by the vDSO.

symbol                     version
────────────────────────────────────────
__kernel_clock_getres      LINUX_2.6.15
__kernel_clock_gettime     LINUX_2.6.15
__kernel_datapage_offset   LINUX_2.6.15
__kernel_get_syscall_map   LINUX_2.6.15
__kernel_get_tbfreq        LINUX_2.6.15
__kernel_getcpu            LINUX_2.6.15
__kernel_gettimeofday      LINUX_2.6.15
__kernel_sigtramp_rt64     LINUX_2.6.15
__kernel_sync_dicache      LINUX_2.6.15
__kernel_sync_dicache_p5   LINUX_2.6.15

The CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE clocks are not
supported by the __kernel_clock_getres and __kernel_clock_gettime
interfaces; the kernel falls back to the real system call.

riscv functions
The table below lists the symbols exported by the vDSO.

symbol                   version
────────────────────────────────────
__kernel_rt_sigreturn    LINUX_4.15
__kernel_gettimeofday    LINUX_4.15
__kernel_clock_gettime   LINUX_4.15
__kernel_clock_getres    LINUX_4.15
__kernel_getcpu          LINUX_4.15
__kernel_flush_icache    LINUX_4.15

s390 functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────
__kernel_clock_getres    LINUX_2.6.29
__kernel_clock_gettime   LINUX_2.6.29
__kernel_gettimeofday    LINUX_2.6.29

s390x functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────
__kernel_clock_getres    LINUX_2.6.29
__kernel_clock_gettime   LINUX_2.6.29
__kernel_gettimeofday    LINUX_2.6.29

sh (SuperH) functions
The table below lists the symbols exported by the vDSO.

symbol                  version
──────────────────────────────────
__kernel_rt_sigreturn   LINUX_2.6
__kernel_sigreturn      LINUX_2.6
__kernel_vsyscall       LINUX_2.6

i386 functions
The table below lists the symbols exported by the vDSO.

symbol                  version
──────────────────────────────────────────────────────────────
__kernel_sigreturn      LINUX_2.5
__kernel_rt_sigreturn   LINUX_2.5
__kernel_vsyscall       LINUX_2.5
__vdso_clock_gettime    LINUX_2.6 (exported since Linux 3.15)
__vdso_gettimeofday     LINUX_2.6 (exported since Linux 3.15)
__vdso_time             LINUX_2.6 (exported since Linux 3.15)

x86-64 functions
The table below lists the symbols exported by the vDSO. All of these
symbols are also available without the "__vdso_" prefix, but you should
ignore those and stick to the names below.

symbol                 version
─────────────────────────────────
__vdso_clock_gettime   LINUX_2.6
__vdso_getcpu          LINUX_2.6
__vdso_gettimeofday    LINUX_2.6
__vdso_time            LINUX_2.6

x86/x32 functions
The table below lists the symbols exported by the vDSO.

symbol                 version
─────────────────────────────────
__vdso_clock_gettime   LINUX_2.6
__vdso_getcpu          LINUX_2.6
__vdso_gettimeofday    LINUX_2.6
__vdso_time            LINUX_2.6

History
The vDSO was originally just a single function: the vsyscall. In older
kernels, you might see that name in a process's memory map rather than
"vdso". Over time, people realized that this mechanism was a great way
to pass more functionality to user space, so it was reconceived as a
vDSO in the current format.

SEE ALSO
syscalls(2), getauxval(3), proc(5)

The documents, examples, and source code in the Linux source code tree:

    Documentation/ABI/stable/vdso
    Documentation/ia64/fsys.txt
    Documentation/vDSO/* (includes examples of using the vDSO)

    find arch/ -iname '*vdso*' -o -iname '*gate*'

COLOPHON
This page is part of release 5.07 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page can be found at
https://www.kernel.org/doc/man-pages/.

Linux                             2019-08-02                           VDSO(7)