xen-tscmode(7)                       Xen                       xen-tscmode(7)

xen-tscmode - Xen TSC (time stamp counter) and timekeeping discussion

As of Xen 4.0, a new config option called tsc_mode may be specified for
each domain. The default for tsc_mode handles the vast majority of
hardware and software environments. This document is intended for Xen
users and administrators who may need to select a non-default
tsc_mode.

Proper selection of tsc_mode depends on an understanding not only of
the guest operating system (OS), but also of every application that
will ever run on this guest OS. This is because tsc_mode applies
equally to both the OS and ALL apps that are running on this domain,
now or in the future.

Key questions to be answered for the OS and/or each application are:

• Does the OS/app use the rdtsc instruction at all? (We will explain
  below how to determine this.)

• At what frequency is the rdtsc instruction executed by either the
  OS or any running apps? If the sum exceeds about 10,000 rdtsc
  instructions per second per processor, we call this a
  "high-TSC-frequency" OS/app/environment. (This is relatively rare,
  and developers of OSes and apps that are high-TSC-frequency are
  usually aware of it.)

• If the OS/app does use rdtsc, will it behave incorrectly if "time
  goes backwards" or if the frequency of the TSC suddenly changes?
  If so, we call this a "TSC-sensitive" app or OS; otherwise it is
  "TSC-resilient".

This last is the US$64,000 question as it may be very difficult (or,
for legacy apps, even impossible) to predict all possible failure
cases. As a result, unless proven otherwise, any app that uses rdtsc
must be assumed to be TSC-sensitive and, as we will see, this is the
default starting in Xen 4.0.

Xen's new tsc_mode parameter determines the circumstances under which
the rdtsc family of instructions is executed "natively" vs emulated.
Roughly speaking, native means rdtsc is fast but TSC-sensitive apps
may, under unpredictable circumstances, run incorrectly; emulated means
there is some performance degradation (unobservable in most cases), but
TSC-sensitive apps will always run correctly. Prior to Xen 4.0, all
rdtsc instructions were native: "fast but potentially incorrect."
Starting at Xen 4.0, the default is that all rdtsc instructions are
"correct but potentially slow". The tsc_mode parameter in 4.0 provides
an intelligent default but allows system administrators to adjust how
rdtsc instructions are executed for individual domains.

The non-default choices for tsc_mode are:

• tsc_mode=1 (always emulate).

  All rdtsc instructions are emulated; this is the best choice when
  TSC-sensitive apps are running and it is necessary to understand
  worst-case performance degradation for a specific hardware
  environment.

• tsc_mode=2 (never emulate).

  This is the same as prior to Xen 4.0 and is the best choice if it
  is certain that all apps running in this VM are TSC-resilient and
  highest performance is required.

• tsc_mode=3 (PVRDTSCP).

  This mode has been removed.

If tsc_mode is left unspecified (or set to tsc_mode=0), a hybrid
algorithm is used to ensure correctness while providing the best
performance possible given:

• the requirement of correctness,

• the underlying hardware, and

• whether or not the VM has been saved/restored/migrated.

The rest of this document explains this in more detail.

To determine the frequency of rdtsc instructions that are emulated, an
"xl" command can be used by a privileged user of domain0. The command:

    # xl debug-key s; xl dmesg | tail

provides information about TSC usage in each domain where TSC emulation
is currently enabled.
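Once a count of emulated rdtsc instructions is available for a domain,
it can be turned into a rate and compared with the rule of thumb of
about 10,000 rdtsc instructions per second per processor. The sketch
below assumes two readings of a cumulative count taken a known number
of seconds apart. The function names, parameters, and sample numbers
are hypothetical; nothing here parses real xl output.

```python
# Illustrative only: the counts are assumed to be read by hand from two
# successive samples; this does not parse xl output.
HIGH_TSC_THRESHOLD = 10_000  # rdtsc per second per processor (rule of thumb)


def rdtsc_rate_per_cpu(count_before, count_after, seconds, vcpus):
    """Return emulated rdtsc instructions per second per virtual processor."""
    if seconds <= 0 or vcpus <= 0:
        raise ValueError("seconds and vcpus must be positive")
    return (count_after - count_before) / seconds / vcpus


def is_high_tsc_frequency(rate_per_cpu):
    """True if the rate exceeds the ~10,000/s/processor rule of thumb."""
    return rate_per_cpu > HIGH_TSC_THRESHOLD


# Hypothetical readings: 600,000 emulated rdtsc in 10 seconds on 2 vcpus.
rate = rdtsc_rate_per_cpu(1_000_000, 1_600_000, seconds=10, vcpus=2)
print(rate, is_high_tsc_frequency(rate))
```

A domain that lands well below the threshold is unlikely to notice the
cost of emulation; one well above it deserves a closer look.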

To understand tsc_mode completely, some background on TSC is required:

The x86 "timestamp counter", or TSC, is a 64-bit register on each
processor that increases monotonically. Historically, TSC incremented
every processor cycle, but on recent processors, it increases at a
constant rate even if the processor changes frequency (for example, to
reduce processor power usage). TSC is known by x86 programmers as the
fastest, highest-precision measurement of the passage of time, so it is
often used as a foundation for performance monitoring. And since it is
guaranteed to be monotonically increasing and, at 64 bits, is
guaranteed not to wrap around within 10 years, it is sometimes used as
a random number or a unique sequence identifier, such as to stamp
transactions so they can be replayed in a specific order.
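The wraparound claim is easy to check with a little arithmetic. The
sketch below assumes a counter that starts at zero and ticks at a
constant rate; the frequencies chosen are only examples.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_wrap(tsc_hz, bits=64):
    """Years for a counter of the given width, starting at zero, to overflow."""
    return (2 ** bits) / tsc_hz / SECONDS_PER_YEAR


# Example tick rates; any constant rate can be substituted.
for ghz in (1, 3, 5):
    print(f"{ghz} GHz: about {years_until_wrap(ghz * 1e9):.0f} years")
```

Even at 5 GHz, a 64-bit counter that starts from zero takes more than a
century to wrap, so the 10-year figure is conservative.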

On most older SMP and early multi-core machines, TSC was not
synchronized between processors. Thus if an application read the TSC
on one processor, was then moved by the OS to another processor, and
read TSC again, it might appear that "time went backwards". This loss
of monotonicity resulted in many obscure application bugs when
TSC-sensitive apps were ported from a uniprocessor to an SMP
environment; as a result, many applications -- especially in the
Windows world -- removed their dependency on TSC and replaced their
timestamp needs with OS-specific functions, losing both performance and
precision. On some more recent generations of multi-core machines,
especially multi-socket multi-core machines, the TSC was synchronized,
but if one processor were to enter certain low-power states, its TSC
would stop, destroying the synchrony and again causing obscure bugs.
This reinforced decisions to avoid use of TSC altogether. On the most
recent generations of multi-core machines, however, synchronization is
provided across all processors in all power states, even on
multi-socket machines, and the processors provide a flag that indicates
that TSC is synchronized and "invariant". Thus TSC is once again useful
for applications, and even newer operating systems are using and
depending upon TSC for critical timekeeping tasks when running on these
recent machines.

We will refer to hardware that ensures TSC is both synchronized and
invariant as "TSC-safe" and any hardware on which TSC is not (or may
not remain) synchronized as "TSC-unsafe".

As a result of TSC's sordid history, two classes of applications use
TSC: old applications designed for single processors, and the most
recent enterprise applications which require high-frequency,
high-precision timestamping.

We will refer to apps that might break if running on a TSC-unsafe
machine as "TSC-sensitive", and to apps that either don't use TSC or
use it in a way for which monotonicity and frequency invariance are
unimportant as "TSC-resilient".

The emergence of virtualization once again complicates the usage of
TSC. When features such as save/restore or live migration are
employed, a guest OS and all its currently running applications may be
invisibly transported to an entirely different physical machine. While
TSC may be "safe" on one machine, it is essentially impossible to
precisely synchronize TSC across a data center or even a pool of
machines. As a result, when run in a virtualized environment, rare and
obscure "time going backwards" problems might once again occur for
those TSC-sensitive applications. Worse, if a guest OS moves from, for
example, a 3GHz machine to a 1.5GHz machine, attempts by an OS/app to
measure time intervals with TSC may without notice be incorrect by a
factor of two.
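The factor-of-two error can be made concrete. The sketch below assumes
an app that calibrated the TSC rate once on the 3GHz machine and keeps
using that figure after the guest moves to the 1.5GHz machine with a
native (non-emulated) TSC. All names and values are illustrative.

```python
def elapsed_seconds(tsc_start, tsc_end, assumed_hz):
    """Convert a TSC delta to seconds using the rate the app calibrated."""
    return (tsc_end - tsc_start) / assumed_hz


CALIBRATED_HZ = 3.0e9   # rate the app measured on the original 3GHz host
ACTUAL_HZ = 1.5e9       # native TSC rate on the 1.5GHz host after the move

true_interval = 2.0                      # seconds of real time that pass
ticks = int(true_interval * ACTUAL_HZ)   # ticks the native TSC advances
reported = elapsed_seconds(0, ticks, CALIBRATED_HZ)
print(reported)                          # 1.0, half the true interval
```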

The rdtsc (read timestamp counter) instruction is used to read the TSC
register. The rdtscp instruction is a variant of rdtsc on recent
processors. We refer to these together as the rdtsc family of
instructions, or just "rdtsc". Instructions in the rdtsc family are
non-privileged, but privileged software may set a control bit (for
example, CR4.TSD) to cause all rdtsc family instructions executed by
non-privileged code to trap. This trap can be detected by Xen, which
can then transparently "emulate" the results of the rdtsc instruction
and return control to the code following the rdtsc instruction.

To provide a "safe" TSC, i.e. to ensure both TSC monotonicity and a
fixed rate, Xen provides rdtsc emulation whenever necessary or when
explicitly specified by a per-VM configuration option. TSC emulation
is relatively slow -- roughly 15-20 times slower than the rdtsc
instruction when executed natively. However, except when an OS or
application uses the rdtsc instruction at a high frequency (e.g. more
than about 10,000 times per second per processor), this performance
degradation is not noticeable (i.e. <0.3%). Moreover, TSC emulation is
nearly always faster than OS-provided alternatives (e.g. Linux's
gettimeofday). For environments where it is certain that all apps are
TSC-resilient (i.e. "TSC-safeness" is not necessary) and highest
performance is a requirement, TSC emulation may be entirely disabled
(tsc_mode==2).
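A back-of-the-envelope calculation shows where the 0.3% figure comes
from. The per-instruction cost of 300 ns used below is an assumption,
back-derived from the numbers above (0.3% of a second spread over
10,000 instructions); it is not a measurement.

```python
def emulation_overhead(rdtsc_per_sec, emulated_cost_ns):
    """Fraction of each CPU-second spent emulating rdtsc."""
    return rdtsc_per_sec * emulated_cost_ns * 1e-9


ASSUMED_COST_NS = 300  # assumption derived from the 0.3% figure, not measured

for rate in (100, 10_000, 1_000_000):
    overhead = emulation_overhead(rate, ASSUMED_COST_NS)
    print(f"{rate:>9} rdtsc/s per processor -> {overhead:.3%}")
```

Under that assumption the cost grows linearly with the rdtsc rate,
which is why only high-TSC-frequency workloads are likely to notice it.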

The default mode (tsc_mode==0) checks TSC-safeness of the underlying
hardware on which the virtual machine is launched. If it is TSC-safe,
rdtsc will execute at hardware speed; if it is not, rdtsc will be
emulated. Once a virtual machine is saved/restored or migrated,
however, there are two possibilities: TSC remains native IF the source
physical machine and target physical machine have the same TSC
frequency (or, for HVM/PVH guests, if TSC scaling support is
available); else TSC is emulated. Note that, though emulated, the
"apparent" TSC frequency will be the TSC frequency of the initial
physical machine, even after migration.
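The decision described above can be written down as a small model.
This is a sketch of the prose only, not Xen's implementation: the
function and parameter names are hypothetical, and it assumes that a
TSC which is already emulated stays emulated after migration.

```python
def native_at_launch(host_tsc_safe):
    """tsc_mode==0 at launch: native only on TSC-safe hardware."""
    return host_tsc_safe


def native_after_migration(native_before, same_tsc_frequency,
                           hvm_or_pvh=False, tsc_scaling=False):
    """tsc_mode==0 after save/restore or migration, per the text above."""
    if not native_before:
        # Assumption: an emulated TSC is never switched back to native.
        return False
    # Native survives if the frequencies match, or if an HVM/PVH guest
    # can rely on hardware TSC scaling to hide the difference.
    return same_tsc_frequency or (hvm_or_pvh and tsc_scaling)


# A PV guest launched on TSC-safe hardware, moved to a host with a
# different TSC frequency, falls back to emulation.
started_native = native_at_launch(host_tsc_safe=True)
print(native_after_migration(started_native, same_tsc_frequency=False))
```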

Finally, tsc_mode==1 always enables TSC emulation, regardless of the
underlying physical hardware. The "apparent" TSC frequency will be the
TSC frequency of the initial physical machine, even after migration.
This mode is useful to measure any performance degradation that might
be encountered by a tsc_mode==0 domain after migration occurs or when
it is running on TSC-unsafe hardware.

Note that while Xen ensures that an emulated TSC is "safe" across
migration, it does not ensure that it continues to tick at the same
rate during the actual migration. As an oversimplified example, if TSC
is ticking once per second in a guest, and the guest is saved when the
TSC is 1000, then restored 30 seconds later, TSC is only guaranteed to
be greater than or equal to 1001, not precisely 1030. This has some OS
implications as will be seen in the next section.
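The oversimplified example can be played out with a toy model. The
class below is hypothetical and is not how Xen implements emulation: it
only keeps the two properties named above (fixed apparent rate, never
going backwards) and, on restore, resumes at the smallest value the
guarantee allows.

```python
class EmulatedTsc:
    """Toy emulated TSC: fixed apparent rate, never decreasing."""

    def __init__(self, apparent_hz, host_time_s=0.0, value=0):
        self.apparent_hz = apparent_hz
        self._base_value = value        # guest TSC at the reference instant
        self._base_time = host_time_s   # host time at the reference instant
        self._last = value              # largest value handed out so far

    def read(self, host_time_s):
        elapsed = host_time_s - self._base_time
        value = self._base_value + int(elapsed * self.apparent_hz)
        # Never return less than a value that was already returned.
        self._last = max(self._last, value)
        return self._last

    def save(self, host_time_s):
        """Return the guest TSC value recorded at save time."""
        return self.read(host_time_s)

    @classmethod
    def restore(cls, apparent_hz, saved_value, host_time_s):
        # Resume just past the saved value; the time spent saved is
        # not counted in this model.
        return cls(apparent_hz, host_time_s, saved_value + 1)


tsc = EmulatedTsc(apparent_hz=1)              # one tick per second
saved = tsc.save(host_time_s=1000.0)          # guest TSC is 1000
restored = EmulatedTsc.restore(1, saved, host_time_s=1030.0)
print(saved, restored.read(1030.0))           # 1000 1001
```

In this model the 30 seconds spent saved simply disappear from the
guest's TSC, which is the gap between TSC and wallclock time that the
text warns about.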

Related to TSC emulation, the "TSC Invariant" bit is architecturally
defined in a cpuid bit on the most recent x86 processors. If set, TSC
invariance ensures that the TSC is "safe"; that is, it will increment
at a constant rate regardless of power events, will be synchronized
across all processors, and was properly initialized to zero on all
processors at boot-time by system hardware/BIOS. As long as system
software never writes to TSC, TSC will be safe and continuously
incremented at a fixed rate and thus can be used as a system
"clocksource".

This bit is used by some OSes, and specifically by Linux starting with
approximately version 2.6.30, to select TSC as a system clocksource.
Once selected, TSC remains the Linux system clocksource unless manually
overridden. In a virtualized environment, since it is not possible to
synchronize TSC across all the machines in a pool or data center, a
migration may "break" TSC as a usable clocksource; while time will not
go backwards, it may not track wallclock time well enough to avoid
certain time-sensitive consequences. As a result, Xen can only expose
the TSC Invariant bit to a guest OS if it is certain that the domain
will never migrate. As of Xen 4.0, the "no_migrate=1" VM configuration
option may be specified to disable migration. If no_migrate is
selected and the VM is running on a physical machine with "TSC
Invariant", Linux 2.6.30+ will safely use TSC as the system
clocksource. However, attempts to migrate this domain or, once it has
been saved, to restore it will fail.

There is another cpuid-related complication: The x86 cpuid instruction
is non-privileged. HVM domains are configured to always trap this
instruction to Xen, where Xen can "filter" the result. In a PV OS, all
cpuid instructions have been replaced by a paravirtualized equivalent
of the cpuid instruction ("pvcpuid"), which also traps to Xen. But apps
in a PV guest that use a cpuid instruction execute it directly, without
a trap to Xen. As a result, an app may directly examine the physical
TSC Invariant cpuid bit and make decisions based on that bit.

Intel VMX TSC scaling and AMD SVM TSC ratio allow the guest TSC read by
the guest rdtsc and rdtscp instructions to increase at a frequency
different from the host TSC frequency.

If an HVM container in default TSC mode (tsc_mode=0) is created on a
host that provides constant TSC, its guest TSC frequency will be the
same as the host. If it is later migrated to another host that provides
constant TSC and supports Intel VMX TSC scaling/AMD SVM TSC ratio, its
guest TSC frequency will be the same before and after migration.

For the HVM container above in default TSC mode (tsc_mode=0), if both
hosts support rdtscp, both guest rdtsc and rdtscp instructions will be
executed natively before and after migration.
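The idea behind hardware scaling can be sketched as a fixed-point
multiplication of the host TSC. The 32-bit fractional width and the
function names below are assumptions made for illustration; the real
register formats are defined by the hardware vendors and are not
reproduced here.

```python
FRAC_BITS = 32  # assumed width of the fractional part, for illustration


def scaling_ratio(guest_hz, host_hz):
    """Fixed-point multiplier that maps host TSC ticks to guest TSC ticks."""
    return (guest_hz << FRAC_BITS) // host_hz


def guest_tsc(host_tsc, ratio, offset=0):
    """Scale a host TSC reading to the guest's apparent frequency."""
    return ((host_tsc * ratio) >> FRAC_BITS) + offset


# Guest created on a 3GHz host, now running on a 1.5GHz host.
ratio = scaling_ratio(guest_hz=3_000_000_000, host_hz=1_500_000_000)

# One second of host ticks appears to the guest as one second of ticks
# at its original 3GHz rate.
print(guest_tsc(1_500_000_000, ratio))
```

Because the multiplication happens in hardware on every read, the guest
keeps its original apparent frequency without trapping to Xen.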

Dan Magenheimer <dan.magenheimer@oracle.com>

4.16.3                         2022-12-19                      xen-tscmode(7)