VDSO(7)                Linux Programmer's Manual                VDSO(7)

NAME
vdso - overview of the virtual ELF dynamic shared object

SYNOPSIS
#include <sys/auxv.h>

void *vdso = (void *) getauxval(AT_SYSINFO_EHDR);

DESCRIPTION
The "vDSO" (virtual dynamic shared object) is a small shared library
that the kernel automatically maps into the address space of all
user-space applications. Applications usually do not need to concern
themselves with these details, as the vDSO is most commonly called by
the C library. This way you can code in the normal way using standard
functions, and the C library will take care of using any functionality
that is available via the vDSO.

Why does the vDSO exist at all? There are some system calls the kernel
provides that user-space code ends up using frequently, to the point
that such calls can dominate overall performance. This is due both to
the frequency of the call and to the overhead that results from
exiting user space and entering the kernel.

The rest of this documentation is geared toward the curious and/or C
library writers rather than general developers. If you're trying to
call the vDSO in your own application rather than using the C library,
you're most likely doing it wrong.

Example background
Making system calls can be slow. On 32-bit x86 systems, you can
trigger a software interrupt (int $0x80) to tell the kernel you wish to
make a system call. However, this instruction is expensive: it goes
through the full interrupt-handling paths in the processor's microcode
as well as in the kernel. Newer processors have faster (but
backward-incompatible) instructions to initiate system calls. Rather
than require the C library to figure out whether this functionality is
available at run time, the C library can use functions provided by the
kernel in the vDSO.
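
As a rough illustration of the legacy path described above, the sketch
below issues getpid(2) through int $0x80 when built for i386 and falls
back to syscall(2) on every other ABI. The name getpid_legacy() is made
up for this example; ordinary programs should simply call getpid().

```c
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: return the caller's PID without going through
 * the C library's getpid() wrapper. */
long getpid_legacy(void)
{
#if defined(__i386__)
    long ret;

    /* i386 convention: system call number in %eax, result in %eax. */
    __asm__ volatile ("int $0x80"
                      : "=a" (ret)
                      : "a" (SYS_getpid)
                      : "memory");
    return ret;
#else
    /* Other ABIs: let the C library pick the entry instruction. */
    return syscall(SYS_getpid);
#endif
}
```

The result should equal what getpid() returns. The inline assembly is
only compiled on i386, which is the one case where int $0x80 is the
historical entry point discussed here.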

Note that the terminology can be confusing. On x86 systems, the vDSO
function used to determine the preferred method of making a system call
is named "__kernel_vsyscall", but on x86-64, the term "vsyscall" also
refers to an obsolete way to ask the kernel what time it is or what CPU
the caller is on.

One frequently used system call is gettimeofday(2). This system call
is called both directly by user-space applications and indirectly by
the C library. Think of timestamps, timing loops, or polling: all of
these frequently need to know what time it is right now. This
information is also not secret: any application in any privilege mode
(root or any unprivileged user) will get the same answer. Thus the
kernel arranges for the information required to answer this question to
be placed in memory the process can access. Now a call to
gettimeofday(2) changes from a system call to a normal function call
and a few memory accesses.

Finding the vDSO
The base address of the vDSO (if one exists) is passed by the kernel to
each program in the initial auxiliary vector (see getauxval(3)), via
the AT_SYSINFO_EHDR tag.
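
A minimal sketch of that lookup follows. The helper name
find_vdso_base() is invented for this example; the check against the
ELF magic bytes is an extra sanity test, not something the kernel
requires.

```c
#include <elf.h>        /* ELFMAG, SELFMAG */
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval(), AT_SYSINFO_EHDR */

/* Return the vDSO base address, or NULL if the kernel supplied no
 * AT_SYSINFO_EHDR entry or the mapping does not start with the ELF
 * magic number. */
const void *find_vdso_base(void)
{
    unsigned long base = getauxval(AT_SYSINFO_EHDR);

    if (base == 0)
        return NULL;            /* tag absent: no vDSO */
    if (memcmp((const void *) base, ELFMAG, SELFMAG) != 0)
        return NULL;            /* not an ELF image */
    return (const void *) base;
}
```

getauxval() returns 0 when the requested tag is not present, which is
why a zero result is treated as "no vDSO" rather than as an address.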

You must not assume the vDSO is mapped at any particular location in
the user's memory map. The base address will usually be randomized at
run time every time a new process image is created (at execve(2) time).
This is done for security reasons, to make "return-to-libc" style
attacks harder.
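
One way to observe the randomization is to read the process's own
memory map (see proc(5)) and print the start of the region labelled
"[vdso]" on successive runs. The sketch below assumes /proc is mounted
and that the kernel labels the mapping "[vdso]"; vdso_start_from_maps()
is a name made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Start address of the "[vdso]" mapping listed in /proc/self/maps,
 * or 0 if the file cannot be read or no such mapping is listed. */
unsigned long vdso_start_from_maps(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    char line[512];
    unsigned long start = 0, found = 0;

    if (maps == NULL)
        return 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        /* Each line begins with "start-end" in hexadecimal. */
        if (strstr(line, "[vdso]") != NULL &&
            sscanf(line, "%lx-", &start) == 1) {
            found = start;
            break;
        }
    }
    fclose(maps);
    return found;
}
```

Within one process the value is stable and would normally be expected
to match getauxval(AT_SYSINFO_EHDR); across separate executions it will
usually differ when address randomization is enabled.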

For some architectures, there is also an AT_SYSINFO tag. This is used
only for locating the vsyscall entry point and is frequently omitted or
set to 0 (meaning it's not available). This tag is a throwback to the
initial vDSO work (see History below) and its use should be avoided.

File format
Since the vDSO is a fully formed ELF image, you can do symbol lookups
on it. This allows new symbols to be added with newer kernel releases,
and allows the C library to detect available functionality at run time
when running under different kernel versions. Often the C library will
do detection with the first call and then cache the result for
subsequent calls.

All symbols are also versioned (using the GNU version format). This
allows the kernel to update the function signature without breaking
backward compatibility; that is, it can change the arguments that the
function accepts as well as the return value. Thus, when looking up a
symbol in the vDSO, you must always include the version to match the
ABI you expect.
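
A versioned lookup can be sketched by walking the vDSO's dynamic
section by hand. The code below is a simplified illustration, not the
kernel's reference parser (see the examples under Documentation/vDSO in
the kernel source tree). It assumes the image carries a DT_HASH table
with 32-bit entries, which holds on many but not all ABIs and kernel
builds, and it returns NULL if any required table is missing. The name
vdso_sym() is made up for this example.

```c
#include <link.h>       /* ElfW(), pulls in <elf.h> */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Address of `name` at version `version` in the vDSO, else NULL. */
void *vdso_sym(const char *name, const char *version)
{
    uintptr_t base = getauxval(AT_SYSINFO_EHDR);
    if (base == 0)
        return NULL;

    const ElfW(Ehdr) *eh = (const void *) base;
    const ElfW(Phdr) *ph = (const void *) (base + eh->e_phoff);
    const ElfW(Dyn) *dyn = NULL;
    uintptr_t bias = 0;
    int have_load = 0;

    /* The first PT_LOAD gives the load bias; PT_DYNAMIC the tables. */
    for (int i = 0; i < eh->e_phnum; i++) {
        if (ph[i].p_type == PT_LOAD && !have_load) {
            bias = base + ph[i].p_offset - ph[i].p_vaddr;
            have_load = 1;
        } else if (ph[i].p_type == PT_DYNAMIC) {
            dyn = (const void *) (base + ph[i].p_offset);
        }
    }
    if (!have_load || dyn == NULL)
        return NULL;

    const ElfW(Sym) *symtab = NULL;
    const char *strtab = NULL;
    const ElfW(Word) *hash = NULL;      /* assumption: 32-bit entries */
    const ElfW(Versym) *versym = NULL;
    const ElfW(Verdef) *verdef = NULL;

    for (; dyn->d_tag != DT_NULL; dyn++) {
        const void *p = (const void *) (bias + dyn->d_un.d_ptr);
        switch (dyn->d_tag) {
        case DT_SYMTAB: symtab = p; break;
        case DT_STRTAB: strtab = p; break;
        case DT_HASH:   hash = p;   break;
        case DT_VERSYM: versym = p; break;
        case DT_VERDEF: verdef = p; break;
        }
    }
    if (!symtab || !strtab || !hash || !versym || !verdef)
        return NULL;

    /* hash[1] (nchain) equals the number of symbol table entries. */
    for (ElfW(Word) i = 0; i < hash[1]; i++) {
        const ElfW(Sym) *s = &symtab[i];
        if (s->st_shndx == SHN_UNDEF ||
            strcmp(strtab + s->st_name, name) != 0)
            continue;

        /* Find the version definition this symbol is bound to. */
        unsigned want = versym[i] & 0x7fff;
        const ElfW(Verdef) *d = verdef;
        for (;;) {
            if (!(d->vd_flags & VER_FLG_BASE) &&
                (d->vd_ndx & 0x7fffu) == want) {
                const ElfW(Verdaux) *aux =
                    (const void *) ((const char *) d + d->vd_aux);
                if (strcmp(strtab + aux->vda_name, version) == 0)
                    return (void *) (bias + s->st_value);
                break;
            }
            if (d->vd_next == 0)
                break;
            d = (const void *) ((const char *) d + d->vd_next);
        }
    }
    return NULL;
}
```

On x86-64, for instance, vdso_sym("__vdso_clock_gettime", "LINUX_2.6")
would be the call matching the table for that architecture; asking for
the same name with a different version string yields NULL.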

Typically the vDSO follows the naming convention of prefixing all
symbols with "__vdso_" or "__kernel_" so as to distinguish them from
other standard symbols. For example, the "gettimeofday" function is
named "__vdso_gettimeofday".

You use the standard C calling conventions when calling any of these
functions. No need to worry about weird register or stack behavior.

NOTES
Source
When you compile the kernel, it will automatically compile and link the
vDSO code for you. You will frequently find it under the
architecture-specific directory:

    find arch/$ARCH/ -name '*vdso*.so*' -o -name '*gate*.so*'

vDSO names
The name of the vDSO varies across architectures. It will often show
up in things like glibc's ldd(1) output. The exact name should not
matter to any code, so do not hardcode it.

user ABI   vDSO name
─────────────────────────────
aarch64    linux-vdso.so.1
arm        linux-vdso.so.1
ia64       linux-gate.so.1
mips       linux-vdso.so.1
ppc/32     linux-vdso32.so.1
ppc/64     linux-vdso64.so.1
s390       linux-vdso32.so.1
s390x      linux-vdso64.so.1
sh         linux-gate.so.1
i386       linux-gate.so.1
x86-64     linux-vdso.so.1
x86/x32    linux-vdso.so.1
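
If a program does want to report the name, it can ask the dynamic
loader which loaded object contains the AT_SYSINFO_EHDR address instead
of hardcoding a string from the table above. The sketch below uses
dl_iterate_phdr(), a GNU/BSD extension; vdso_loaded_name() and the
vdso_query structure are names made up for this example, and some C
libraries report an empty name for the vDSO.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* for dl_iterate_phdr() */
#endif
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/auxv.h>

struct vdso_query {
    uintptr_t base;     /* value of AT_SYSINFO_EHDR */
    char *name;         /* caller's output buffer */
    size_t size;        /* size of that buffer */
    int found;
};

/* Callback: does this object have a PT_LOAD segment covering base? */
static int match_vdso(struct dl_phdr_info *info, size_t size, void *data)
{
    struct vdso_query *q = data;
    (void) size;

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        uintptr_t start = info->dlpi_addr + ph->p_vaddr;

        if (ph->p_type == PT_LOAD && q->base >= start &&
            q->base < start + ph->p_memsz) {
            snprintf(q->name, q->size, "%s",
                     info->dlpi_name ? info->dlpi_name : "");
            q->found = 1;
            return 1;   /* a non-zero return stops the iteration */
        }
    }
    return 0;
}

/* Copy the loader's name for the vDSO into `name`. Returns 1 if the
 * loader lists an object containing the vDSO base, 0 otherwise. */
int vdso_loaded_name(char *name, size_t size)
{
    struct vdso_query q = { getauxval(AT_SYSINFO_EHDR), name, size, 0 };

    if (size == 0)
        return 0;
    name[0] = '\0';
    if (q.base == 0)
        return 0;
    dl_iterate_phdr(match_vdso, &q);
    return q.found;
}
```

Identifying the object by address keeps the code independent of the
per-architecture names listed above.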

strace(1) and the vDSO
When tracing system calls with strace(1), symbols (system calls) that
are exported by the vDSO will not appear in the trace output, because
calls served by the vDSO never enter the kernel.

ARCHITECTURE-SPECIFIC NOTES
The subsections below provide architecture-specific notes on the vDSO.

Note that the vDSO that is used is based on the ABI of your user-space
code and not the ABI of the kernel. Thus, for example, when you run an
i386 32-bit ELF binary, you'll get the same vDSO regardless of whether
you run it under an i386 32-bit kernel or under an x86-64 64-bit
kernel. Therefore, the name of the user-space ABI should be used to
determine which of the sections below is relevant.
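
A quick way to confirm this from inside a process is to inspect the ELF
class recorded in the vDSO's own header. The sketch below is
illustrative only; vdso_elf_class_bits() is a name made up for this
example.

```c
#include <elf.h>        /* ELFMAG, EI_CLASS, ELFCLASS32, ELFCLASS64 */
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>

/* 32 or 64 according to the ELF class of the mapped vDSO, or 0 if
 * there is no vDSO or its header is not recognized. */
int vdso_elf_class_bits(void)
{
    const unsigned char *ident =
        (const unsigned char *) getauxval(AT_SYSINFO_EHDR);

    if (ident == NULL || memcmp(ident, ELFMAG, SELFMAG) != 0)
        return 0;
    switch (ident[EI_CLASS]) {
    case ELFCLASS32: return 32;
    case ELFCLASS64: return 64;
    default:         return 0;
    }
}
```

A 32-bit binary would be expected to see 32 here even when the kernel
underneath is 64-bit, since the class follows the user-space ABI.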

ARM functions
The table below lists the symbols exported by the vDSO.

symbol                 version
────────────────────────────────────────────────────────────
__vdso_gettimeofday    LINUX_2.6 (exported since Linux 4.1)
__vdso_clock_gettime   LINUX_2.6 (exported since Linux 4.1)

Additionally, the ARM port has a code page full of utility functions.
Since it's just a raw page of code, there is no ELF information for
doing symbol lookups or versioning. It does provide support for
different versions though.

For information on this code page, it's best to refer to the kernel
documentation as it's extremely detailed and covers everything you need
to know: Documentation/arm/kernel_user_helpers.txt.

aarch64 functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────
__kernel_rt_sigreturn    LINUX_2.6.39
__kernel_gettimeofday    LINUX_2.6.39
__kernel_clock_gettime   LINUX_2.6.39
__kernel_clock_getres    LINUX_2.6.39

bfin (Blackfin) functions
As this CPU lacks a memory management unit (MMU), it doesn't set up a
vDSO in the normal sense. Instead, it maps at boot time a few raw
functions into a fixed location in memory. User-space applications
then call directly into that region. There is no provision for
backward compatibility beyond sniffing raw opcodes, but as this is an
embedded CPU, it can get away with things; some of the object formats
it runs aren't even ELF based (they're bFLT/FLAT).

For information on this code page, it's best to refer to the public
documentation:
http://docs.blackfin.uclinux.org/doku.php?id=linux-kernel:fixed-code

mips functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────────────────────────────
__kernel_gettimeofday    LINUX_2.6 (exported since Linux 4.4)
__kernel_clock_gettime   LINUX_2.6 (exported since Linux 4.4)

ia64 (Itanium) functions
The table below lists the symbols exported by the vDSO.

symbol                       version
───────────────────────────────────────
__kernel_sigtramp            LINUX_2.5
__kernel_syscall_via_break   LINUX_2.5
__kernel_syscall_via_epc     LINUX_2.5

The Itanium port is somewhat tricky. In addition to the vDSO above, it
also has "light-weight system calls" (also known as "fast syscalls" or
"fsys"). You can invoke these via the __kernel_syscall_via_epc vDSO
helper. The system calls listed here have the same semantics as if you
called them directly via syscall(2), so refer to the relevant
documentation for each. The table below lists the functions available
via this mechanism.

function
────────────────
clock_gettime
getcpu
getpid
getppid
gettimeofday
set_tid_address

parisc (hppa) functions
The parisc port has a code page full of utility functions called a
gateway page. Rather than use the normal ELF auxiliary vector
approach, it passes the address of the page to the process via the SR2
register. The permissions on the page are such that merely executing
those addresses automatically executes with kernel privileges and not
in user space. This is done to match the way HP-UX works.

Since it's just a raw page of code, there is no ELF information for
doing symbol lookups or versioning. Simply call into the appropriate
offset via the branch instruction, for example:

    ble <offset>(%sr2, %r0)

offset   function
───────────────────────────────────────
00b0     lws_entry
00e0     set_thread_pointer
0100     linux_gateway_entry (syscall)
0268     syscall_nosys
0274     tracesys
0324     tracesys_next
0368     tracesys_exit
03a0     tracesys_sigexit
03b8     lws_start
03dc     lws_exit_nosys
03e0     lws_exit
03e4     lws_compare_and_swap64
03e8     lws_compare_and_swap
0404     cas_wouldblock
0410     cas_action

ppc/32 functions
The table below lists the symbols exported by the vDSO. The functions
marked with a * are available only when the kernel is a PowerPC64
(64-bit) kernel.

symbol                     version
────────────────────────────────────────
__kernel_clock_getres      LINUX_2.6.15
__kernel_clock_gettime     LINUX_2.6.15
__kernel_datapage_offset   LINUX_2.6.15
__kernel_get_syscall_map   LINUX_2.6.15
__kernel_get_tbfreq        LINUX_2.6.15
__kernel_getcpu *          LINUX_2.6.15
__kernel_gettimeofday      LINUX_2.6.15
__kernel_sigtramp_rt32     LINUX_2.6.15
__kernel_sigtramp32        LINUX_2.6.15
__kernel_sync_dicache      LINUX_2.6.15
__kernel_sync_dicache_p5   LINUX_2.6.15

The CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE clocks are not
supported by the __kernel_clock_getres and __kernel_clock_gettime
interfaces; the kernel falls back to the real system call.

ppc/64 functions
The table below lists the symbols exported by the vDSO.

symbol                     version
────────────────────────────────────────
__kernel_clock_getres      LINUX_2.6.15
__kernel_clock_gettime     LINUX_2.6.15
__kernel_datapage_offset   LINUX_2.6.15
__kernel_get_syscall_map   LINUX_2.6.15
__kernel_get_tbfreq        LINUX_2.6.15
__kernel_getcpu            LINUX_2.6.15
__kernel_gettimeofday      LINUX_2.6.15
__kernel_sigtramp_rt64     LINUX_2.6.15
__kernel_sync_dicache      LINUX_2.6.15
__kernel_sync_dicache_p5   LINUX_2.6.15

The CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE clocks are not
supported by the __kernel_clock_getres and __kernel_clock_gettime
interfaces; the kernel falls back to the real system call.

s390 functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────
__kernel_clock_getres    LINUX_2.6.29
__kernel_clock_gettime   LINUX_2.6.29
__kernel_gettimeofday    LINUX_2.6.29

s390x functions
The table below lists the symbols exported by the vDSO.

symbol                   version
──────────────────────────────────────
__kernel_clock_getres    LINUX_2.6.29
__kernel_clock_gettime   LINUX_2.6.29
__kernel_gettimeofday    LINUX_2.6.29

sh (SuperH) functions
The table below lists the symbols exported by the vDSO.

symbol                  version
──────────────────────────────────
__kernel_rt_sigreturn   LINUX_2.6
__kernel_sigreturn      LINUX_2.6
__kernel_vsyscall       LINUX_2.6

i386 functions
The table below lists the symbols exported by the vDSO.

symbol                  version
──────────────────────────────────────────────────────────────
__kernel_sigreturn      LINUX_2.5
__kernel_rt_sigreturn   LINUX_2.5
__kernel_vsyscall       LINUX_2.5
__vdso_clock_gettime    LINUX_2.6 (exported since Linux 3.15)
__vdso_gettimeofday     LINUX_2.6 (exported since Linux 3.15)
__vdso_time             LINUX_2.6 (exported since Linux 3.15)

x86-64 functions
The table below lists the symbols exported by the vDSO. All of these
symbols are also available without the "__vdso_" prefix, but you should
ignore those and stick to the names below.

symbol                 version
─────────────────────────────────
__vdso_clock_gettime   LINUX_2.6
__vdso_getcpu          LINUX_2.6
__vdso_gettimeofday    LINUX_2.6
__vdso_time            LINUX_2.6
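
Application code normally reaches these four functions through the
ordinary C library entry points rather than by symbol name. The sketch
below simply calls the corresponding wrappers; the vdso_sample
structure and sample_via_libc() are names made up for this example, and
whether a given wrapper really goes through the vDSO depends on the C
library and its version.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* for sched_getcpu() */
#endif
#include <sched.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

struct vdso_sample {
    struct timespec mono;   /* from clock_gettime() */
    struct timeval  wall;   /* from gettimeofday() */
    time_t          secs;   /* from time() */
    int             cpu;    /* from sched_getcpu(), -1 on failure */
};

/* Fill `out` using only standard wrappers. Returns 0 on success and
 * -1 if one of the time functions fails. */
int sample_via_libc(struct vdso_sample *out)
{
    if (clock_gettime(CLOCK_MONOTONIC, &out->mono) != 0)
        return -1;
    if (gettimeofday(&out->wall, NULL) != 0)
        return -1;
    out->secs = time(NULL);
    out->cpu = sched_getcpu();
    return out->secs == (time_t) -1 ? -1 : 0;
}
```

Running such a program under strace(1) is one way to check which of
the calls still enter the kernel on a given system.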

x86/x32 functions
The table below lists the symbols exported by the vDSO.

symbol                 version
─────────────────────────────────
__vdso_clock_gettime   LINUX_2.6
__vdso_getcpu          LINUX_2.6
__vdso_gettimeofday    LINUX_2.6
__vdso_time            LINUX_2.6

History
The vDSO was originally just a single function: the vsyscall. In older
kernels, you might see that name in a process's memory map rather than
"vdso". Over time, people realized that this mechanism was a great way
to pass more functionality to user space, so it was reconceived as a
vDSO in the current format.

SEE ALSO
syscalls(2), getauxval(3), proc(5)

The documents, examples, and source code in the Linux source code tree:

    Documentation/ABI/stable/vdso
    Documentation/ia64/fsys.txt
    Documentation/vDSO/* (includes examples of using the vDSO)

    find arch/ -iname '*vdso*' -o -iname '*gate*'

COLOPHON
This page is part of release 4.15 of the Linux man-pages project. A
description of the project, information about reporting bugs, and the
latest version of this page, can be found at
https://www.kernel.org/doc/man-pages/.

Linux                         2017-09-15                        VDSO(7)