VDSO(7)                  Linux Programmer's Manual                  VDSO(7)

NAME
vdso - overview of the virtual ELF dynamic shared object

SYNOPSIS
#include <stdint.h>
#include <sys/auxv.h>

void *vdso = (void *) (uintptr_t) getauxval(AT_SYSINFO_EHDR);

DESCRIPTION
The "vDSO" (virtual dynamic shared object) is a small shared library
that the kernel automatically maps into the address space of all
user-space applications. Applications usually do not need to concern
themselves with these details, as the vDSO is most commonly called by
the C library. This way you can code in the normal way using standard
functions, and the C library will take care of using any functionality
that is available via the vDSO.

Why does the vDSO exist at all? There are some system calls the kernel
provides that user-space code ends up using frequently, to the point
that such calls can dominate overall performance. This is due both to
the frequency of the call and to the context-switch overhead that
results from exiting user space and entering the kernel.

The rest of this documentation is geared toward the curious and/or C
library writers rather than general developers. If you're trying to
call the vDSO in your own application rather than using the C library,
you're most likely doing it wrong.

Example background
Making system calls can be slow. On x86 32-bit systems, you can
trigger a software interrupt (int $0x80) to tell the kernel you wish to
make a system call. However, this instruction is expensive: it goes
through the full interrupt-handling paths in the processor's microcode
as well as in the kernel. Newer processors have faster (but backward
incompatible) instructions to initiate system calls. Rather than
require the C library to figure out if this functionality is available
at run time, the C library can use functions provided by the kernel in
the vDSO.

Note that the terminology can be confusing. On x86 systems, the vDSO
function used to determine the preferred method of making a system call
is named "__kernel_vsyscall", but on x86-64, the term "vsyscall" also
refers to an obsolete way to ask the kernel what time it is or what CPU
the caller is on.

One frequently used system call is gettimeofday(2). This system call
is called both directly by user-space applications and indirectly by
the C library. Think timestamps, timing loops, or polling: all of
these frequently need to know what time it is right now. This
information is also not secret: any application in any privilege mode
(root or any unprivileged user) will get the same answer. Thus the
kernel arranges for the information required to answer this question to
be placed in memory the process can access. Now a call to
gettimeofday(2) changes from a system call to a normal function call
and a few memory accesses.

Finding the vDSO
The base address of the vDSO (if one exists) is passed by the kernel to
each program in the initial auxiliary vector (see getauxval(3)), via
the AT_SYSINFO_EHDR tag.
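
A minimal sketch of that lookup follows. The helper names vdso_base
and vdso_looks_like_elf are hypothetical; the only interfaces assumed
are getauxval(3) and the ELF magic constants from <elf.h>.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Return the vDSO base address, or NULL if the kernel supplied none. */
static void *vdso_base(void)
{
    /* getauxval() returns 0 when the tag is absent. */
    return (void *) (uintptr_t) getauxval(AT_SYSINFO_EHDR);
}

/* Return 1 if the mapping at the base address starts with the ELF magic. */
static int vdso_looks_like_elf(void)
{
    const void *base = vdso_base();

    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```

A caller should treat a NULL result as "no vDSO" and use the ordinary
C library functions instead.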

You must not assume the vDSO is mapped at any particular location in
the user's memory map. The base address will usually be randomized at
run time every time a new process image is created (at execve(2) time).
This is done for security reasons, to make attacks such as
"return-to-libc" harder.
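
One way to observe the randomization is to read the process's own
memory map (see proc(5)) and print where the mapping labelled "[vdso]"
starts. The sketch below assumes that /proc is mounted and that the
kernel labels the mapping "[vdso]"; the helper name vdso_mapping_start
is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Find the mapping labelled "[vdso]" in /proc/self/maps.
 * Returns 1 and stores its start address on success, 0 otherwise.
 */
static int vdso_mapping_start(unsigned long *start)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    int found = 0;

    if (f == NULL)
        return 0;                       /* /proc not available */

    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;

        if (strstr(line, "[vdso]") != NULL &&
            sscanf(line, "%lx-%lx", &lo, &hi) == 2) {
            *start = lo;
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Printing the address from several runs of the same program should show
different values whenever address-space randomization is enabled.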

For some architectures, there is also an AT_SYSINFO tag. This is used
only for locating the vsyscall entry point and is frequently omitted or
set to 0 (meaning it's not available). This tag is a throwback to the
initial vDSO work (see History below) and its use should be avoided.

File format
Since the vDSO is a fully formed ELF image, you can do symbol lookups
on it. This allows new symbols to be added with newer kernel releases,
and allows the C library to detect available functionality at run time
when running under different kernel versions. Often the C library will
do detection with the first call and then cache the result for
subsequent calls.
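
The sketch below shows what such a symbol lookup can look like. It is
an illustration under stated assumptions, not the C library's actual
implementation: it assumes the vDSO carries a classic DT_HASH table
with 32-bit entries (some architectures and kernel builds differ), and
it deliberately skips symbol version matching, which real code must
not do. The function name vdso_lookup is hypothetical.

```c
#include <elf.h>
#include <link.h>       /* ElfW() selects the 32- or 64-bit ELF types */
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Look up a function symbol by name in the vDSO; NULL if not found. */
static void *vdso_lookup(const char *name)
{
    uintptr_t base = (uintptr_t) getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return NULL;                            /* no vDSO mapped */

    const ElfW(Ehdr) *ehdr = (const ElfW(Ehdr) *) base;
    if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG) != 0)
        return NULL;

    /* Find the load bias and the dynamic section. */
    const ElfW(Phdr) *phdr = (const ElfW(Phdr) *) (base + ehdr->e_phoff);
    const ElfW(Dyn) *dyn = NULL;
    uintptr_t bias = 0;
    int have_load = 0;

    for (unsigned i = 0; i < ehdr->e_phnum; i++) {
        if (phdr[i].p_type == PT_LOAD && !have_load) {
            bias = base + phdr[i].p_offset - phdr[i].p_vaddr;
            have_load = 1;
        } else if (phdr[i].p_type == PT_DYNAMIC) {
            dyn = (const ElfW(Dyn) *) (base + phdr[i].p_offset);
        }
    }
    if (!have_load || dyn == NULL)
        return NULL;

    /* Collect the tables needed to walk the dynamic symbols. */
    const ElfW(Sym) *symtab = NULL;
    const char *strtab = NULL;
    const uint32_t *hash = NULL;    /* assumption: 32-bit DT_HASH entries */

    for (; dyn->d_tag != DT_NULL; dyn++) {
        const void *p = (const void *) (bias + dyn->d_un.d_ptr);

        if (dyn->d_tag == DT_SYMTAB)
            symtab = p;
        else if (dyn->d_tag == DT_STRTAB)
            strtab = p;
        else if (dyn->d_tag == DT_HASH)
            hash = p;
    }
    if (symtab == NULL || strtab == NULL || hash == NULL)
        return NULL;

    /* hash[1] is nchain, which equals the number of symbol entries. */
    uint32_t nsyms = hash[1];
    for (uint32_t i = 0; i < nsyms; i++) {
        const ElfW(Sym) *sym = &symtab[i];

        if (sym->st_shndx == SHN_UNDEF)
            continue;
        /* The st_info encoding is the same for ELF32 and ELF64. */
        if (ELF64_ST_TYPE(sym->st_info) != STT_FUNC)
            continue;
        if (strcmp(strtab + sym->st_name, name) == 0)
            return (void *) (bias + sym->st_value);
    }
    return NULL;    /* not found; versions were NOT checked */
}
```

A linear scan is used here for brevity. A caller would typically
perform the lookup once and cache the returned pointer, falling back
to the regular C library function when the result is NULL.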

All symbols are also versioned (using the GNU version format). This
allows the kernel to update the function signature without breaking
backward compatibility. This means changing the arguments that the
function accepts as well as the return value. Thus, when looking up a
symbol in the vDSO, you must always include the version to match the
ABI you expect.

Typically the vDSO follows the naming convention of prefixing all
symbols with "__vdso_" or "__kernel_" so as to distinguish them from
other standard symbols. For example, the "gettimeofday" function is
named "__vdso_gettimeofday".

You use the standard C calling conventions when calling any of these
functions. No need to worry about weird register or stack behavior.

NOTES
Source
When you compile the kernel, it will automatically compile and link the
vDSO code for you. You will frequently find it under the
architecture-specific directory:

    find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'

vDSO names
The name of the vDSO varies across architectures. It will often show
up in things like glibc's ldd(1) output. The exact name should not
matter to any code, so do not hardcode it.

user ABI   vDSO name
─────────────────────────────
aarch64    linux-vdso.so.1
arm        linux-vdso.so.1
ia64       linux-gate.so.1
mips       linux-vdso.so.1
ppc/32     linux-vdso32.so.1
ppc/64     linux-vdso64.so.1
s390       linux-vdso32.so.1
s390x      linux-vdso64.so.1
sh         linux-gate.so.1
i386       linux-gate.so.1
x86-64     linux-vdso.so.1
x86/x32    linux-vdso.so.1
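
Code that wants to see which name the dynamic loader reports can
identify the vDSO by address rather than by a hardcoded string. The
sketch below assumes a C library whose dl_iterate_phdr(3) includes the
vDSO among the loaded objects (glibc does; others may not). The names
vdso_query, match_vdso, and find_vdso_object are hypothetical.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/auxv.h>

struct vdso_query {
    uintptr_t base;         /* value of AT_SYSINFO_EHDR */
    char name[256];         /* name reported by the loader, if found */
    int found;
};

/* Callback: stop at the object whose PT_LOAD segment holds the base. */
static int match_vdso(struct dl_phdr_info *info, size_t size, void *data)
{
    struct vdso_query *q = data;

    (void) size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        uintptr_t start = info->dlpi_addr + ph->p_vaddr;

        if (ph->p_type == PT_LOAD &&
            q->base >= start && q->base < start + ph->p_memsz) {
            snprintf(q->name, sizeof q->name, "%s",
                     info->dlpi_name != NULL ? info->dlpi_name : "");
            q->found = 1;
            return 1;       /* non-zero return ends the iteration */
        }
    }
    return 0;
}

/* Returns 1 and fills q->name if the loader lists the vDSO. */
static int find_vdso_object(struct vdso_query *q)
{
    q->base = (uintptr_t) getauxval(AT_SYSINFO_EHDR);
    q->found = 0;
    q->name[0] = '\0';
    if (q->base != 0)
        dl_iterate_phdr(match_vdso, q);
    return q->found;
}
```

The name is useful for diagnostics only. Program logic should depend
on the address from the auxiliary vector, not on the string.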

strace(1), seccomp(2), and the vDSO
When tracing system calls with strace(1), symbols (system calls) that
are exported by the vDSO will not appear in the trace output. Those
system calls will likewise not be visible to seccomp(2) filters.

ARCHITECTURE-SPECIFIC NOTES
The subsections below provide architecture-specific notes on the vDSO.

Note that the vDSO that is used is based on the ABI of your user-space
code and not the ABI of the kernel. Thus, for example, when you run an
i386 32-bit ELF binary, you'll get the same vDSO regardless of whether
you run it under an i386 32-bit kernel or under an x86-64 64-bit
kernel. Therefore, the name of the user-space ABI should be used to
determine which of the sections below is relevant.
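
A small check can make this concrete: the ELF class recorded in the
vDSO header should match the pointer width of the running program,
whatever kernel is underneath. The helper names vdso_elf_class and
own_elf_class below are hypothetical.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* ELFCLASS32 or ELFCLASS64 for the mapped vDSO; ELFCLASSNONE if absent. */
static int vdso_elf_class(void)
{
    const unsigned char *ident =
        (const unsigned char *) (uintptr_t) getauxval(AT_SYSINFO_EHDR);

    if (ident == NULL || memcmp(ident, ELFMAG, SELFMAG) != 0)
        return ELFCLASSNONE;
    return ident[EI_CLASS];
}

/* The ELF class implied by this program's own pointer width. */
static int own_elf_class(void)
{
    return sizeof(void *) == 8 ? ELFCLASS64 : ELFCLASS32;
}
```

A 32-bit binary run under a 64-bit kernel would therefore be expected
to report ELFCLASS32 from both functions.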

ARM functions
The table below lists the symbols exported by the vDSO.

symbol                 version
────────────────────────────────────────────────────────────
__vdso_gettimeofday    LINUX_2.6 (exported since Linux 4.1)
__vdso_clock_gettime   LINUX_2.6 (exported since Linux 4.1)

Additionally, the ARM port has a code page full of utility functions.
Since it's just a raw page of code, there is no ELF information for
doing symbol lookups or versioning. It does, however, provide support
for different versions.

For information on this code page, it's best to refer to the kernel
documentation as it's extremely detailed and covers everything you need
to know: Documentation/arm/kernel_user_helpers.txt.

aarch64 functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────
__kernel_rt_sigreturn    LINUX_2.6.39
__kernel_gettimeofday    LINUX_2.6.39
__kernel_clock_gettime   LINUX_2.6.39
__kernel_clock_getres    LINUX_2.6.39

bfin (Blackfin) functions
As this CPU lacks a memory management unit (MMU), it doesn't set up a
vDSO in the normal sense. Instead, it maps at boot time a few raw
functions into a fixed location in memory. User-space applications
then call directly into that region. There is no provision for
backward compatibility beyond sniffing raw opcodes, but as this is an
embedded CPU, it can get away with things: some of the object formats
it runs aren't even ELF based (they're bFLT/FLAT).

For information on this code page, it's best to refer to the public
documentation:
http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code

mips functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────────────────────────────
__kernel_gettimeofday    LINUX_2.6 (exported since Linux 4.4)
__kernel_clock_gettime   LINUX_2.6 (exported since Linux 4.4)

ia64 (Itanium) functions
The table below lists the symbols exported by the vDSO.

symbol                       version
───────────────────────────────────────
__kernel_sigtramp            LINUX_2.5
__kernel_syscall_via_break   LINUX_2.5
__kernel_syscall_via_epc     LINUX_2.5

The Itanium port is somewhat tricky. In addition to the vDSO above, it
also has "light-weight system calls" (also known as "fast syscalls" or
"fsys"). You can invoke these via the __kernel_syscall_via_epc vDSO
helper. The system calls listed here have the same semantics as if you
called them directly via syscall(2), so refer to the relevant
documentation for each. The table below lists the functions available
via this mechanism.

function
────────────────
clock_gettime
getcpu
getpid
getppid
gettimeofday
set_tid_address

parisc (hppa) functions
The parisc port has a code page full of utility functions called a
gateway page. Rather than use the normal ELF auxiliary vector
approach, it passes the address of the page to the process via the SR2
register. The permissions on the page are such that merely executing
those addresses automatically executes with kernel privileges and not
in user space. This is done to match the way HP-UX works.

Since it's just a raw page of code, there is no ELF information for
doing symbol lookups or versioning. Simply call into the appropriate
offset via the branch instruction, for example:

    ble <offset>(%sr2, %r0)

offset   function
───────────────────────────────────────
00b0     lws_entry
00e0     set_thread_pointer
0100     linux_gateway_entry (syscall)
0268     syscall_nosys
0274     tracesys
0324     tracesys_next
0368     tracesys_exit
03a0     tracesys_sigexit
03b8     lws_start
03dc     lws_exit_nosys
03e0     lws_exit
03e4     lws_compare_and_swap64
03e8     lws_compare_and_swap
0404     cas_wouldblock
0410     cas_action

ppc/32 functions
The table below lists the symbols exported by the vDSO. The functions
marked with a * are available only when the kernel is a PowerPC64
(64-bit) kernel.

symbol                     version
────────────────────────────────────────
__kernel_clock_getres      LINUX_2.6.15
__kernel_clock_gettime     LINUX_2.6.15
__kernel_datapage_offset   LINUX_2.6.15
__kernel_get_syscall_map   LINUX_2.6.15
__kernel_get_tbfreq        LINUX_2.6.15
__kernel_getcpu *          LINUX_2.6.15
__kernel_gettimeofday      LINUX_2.6.15
__kernel_sigtramp_rt32     LINUX_2.6.15
__kernel_sigtramp32        LINUX_2.6.15
__kernel_sync_dicache      LINUX_2.6.15
__kernel_sync_dicache_p5   LINUX_2.6.15

The CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE clocks are not
supported by the __kernel_clock_getres and __kernel_clock_gettime
interfaces; the kernel falls back to the real system call.

ppc/64 functions
The table below lists the symbols exported by the vDSO.

symbol                     version
────────────────────────────────────────
__kernel_clock_getres      LINUX_2.6.15
__kernel_clock_gettime     LINUX_2.6.15
__kernel_datapage_offset   LINUX_2.6.15
__kernel_get_syscall_map   LINUX_2.6.15
__kernel_get_tbfreq        LINUX_2.6.15
__kernel_getcpu            LINUX_2.6.15
__kernel_gettimeofday      LINUX_2.6.15
__kernel_sigtramp_rt64     LINUX_2.6.15
__kernel_sync_dicache      LINUX_2.6.15
__kernel_sync_dicache_p5   LINUX_2.6.15

The CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE clocks are not
supported by the __kernel_clock_getres and __kernel_clock_gettime
interfaces; the kernel falls back to the real system call.

s390 functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────
__kernel_clock_getres    LINUX_2.6.29
__kernel_clock_gettime   LINUX_2.6.29
__kernel_gettimeofday    LINUX_2.6.29

s390x functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────
__kernel_clock_getres    LINUX_2.6.29
__kernel_clock_gettime   LINUX_2.6.29
__kernel_gettimeofday    LINUX_2.6.29

sh (SuperH) functions
The table below lists the symbols exported by the vDSO.

symbol                  version
──────────────────────────────────
__kernel_rt_sigreturn   LINUX_2.6
__kernel_sigreturn      LINUX_2.6
__kernel_vsyscall       LINUX_2.6

i386 functions
The table below lists the symbols exported by the vDSO.

symbol                  version
──────────────────────────────────────────────────────────────
__kernel_sigreturn      LINUX_2.5
__kernel_rt_sigreturn   LINUX_2.5
__kernel_vsyscall       LINUX_2.5
__vdso_clock_gettime    LINUX_2.6 (exported since Linux 3.15)
__vdso_gettimeofday     LINUX_2.6 (exported since Linux 3.15)
__vdso_time             LINUX_2.6 (exported since Linux 3.15)

x86-64 functions
The table below lists the symbols exported by the vDSO. All of these
symbols are also available without the "__vdso_" prefix, but you should
ignore those and stick to the names below.

symbol                 version
─────────────────────────────────
__vdso_clock_gettime   LINUX_2.6
__vdso_getcpu          LINUX_2.6
__vdso_gettimeofday    LINUX_2.6
__vdso_time            LINUX_2.6

x86/x32 functions
The table below lists the symbols exported by the vDSO.

symbol                 version
─────────────────────────────────
__vdso_clock_gettime   LINUX_2.6
__vdso_getcpu          LINUX_2.6
__vdso_gettimeofday    LINUX_2.6
__vdso_time            LINUX_2.6

History
The vDSO was originally just a single function: the vsyscall. In older
kernels, you might see that name in a process's memory map rather than
"vdso". Over time, people realized that this mechanism was a great way
to pass more functionality to user space, so it was reconceived as a
vDSO in the current format.

SEE ALSO
syscalls(2), getauxval(3), proc(5)

The documents, examples, and source code in the Linux source code tree:

Documentation/ABI/stable/vdso
Documentation/ia64/fsys.txt
Documentation/vDSO/* (includes examples of using the vDSO)

    find arch/ -iname '*vdso*' -o -iname '*gate*'

COLOPHON
This page is part of release 4.16 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page can be found at
https://www.kernel.org/doc/man-pages/.

Linux                           2018-04-30                           VDSO(7)