xen-tscmode(7)                        Xen                       xen-tscmode(7)

NAME

       xen-tscmode - Xen TSC (time stamp counter) and timekeeping discussion

OVERVIEW

       As of Xen 4.0, a new config option called tsc_mode may be specified for
       each domain.  The default for tsc_mode handles the vast majority of
       hardware and software environments.  This document is intended for Xen
       users and administrators who may need to select a non-default
       tsc_mode.
       
       Proper selection of tsc_mode depends on an understanding not only of
       the guest operating system (OS), but also of the application set that
       will ever run on this guest OS.  This is because tsc_mode applies
       equally to both the OS and ALL apps that are running on this domain,
       now or in the future.
       
       Key questions to be answered for the OS and/or each application are:

       •   Does the OS/app use the rdtsc instruction at all?  (We will explain
           below how to determine this.)

       •   At what frequency is the rdtsc instruction executed by either the
           OS or any running apps?  If the sum exceeds about 10,000 rdtsc
           instructions per second per processor, we call this a
           "high-TSC-frequency" OS/app/environment.  (This is relatively rare,
           and developers of OSes and apps that are high-TSC-frequency are
           usually aware of it.)

       •   If the OS/app does use rdtsc, will it behave incorrectly if "time
           goes backwards" or if the frequency of the TSC suddenly changes?
           If so, we call this a "TSC-sensitive" app or OS; otherwise it is
           "TSC-resilient".
       
       This last is the US$64,000 question as it may be very difficult (or,
       for legacy apps, even impossible) to predict all possible failure
       cases.  As a result, unless proven otherwise, any app that uses rdtsc
       must be assumed to be TSC-sensitive and, as we will see, this is the
       default starting in Xen 4.0.
       
       Xen's new tsc_mode parameter determines the circumstances under which
       the family of rdtsc instructions is executed "natively" vs emulated.
       Roughly speaking, native means rdtsc is fast but TSC-sensitive apps
       may, under unpredictable circumstances, run incorrectly; emulated means
       there is some performance degradation (unobservable in most cases), but
       TSC-sensitive apps will always run correctly.  Prior to Xen 4.0, all
       rdtsc instructions were native: "fast but potentially incorrect."
       Starting at Xen 4.0, the default is that all rdtsc instructions are
       "correct but potentially slow".  The tsc_mode parameter in 4.0 provides
       an intelligent default but allows system administrators to adjust how
       rdtsc instructions are executed for different domains.
       
       The non-default choices for tsc_mode are:

       tsc_mode=1 (always emulate).

           All rdtsc instructions are emulated; this is the best choice when
           TSC-sensitive apps are running and it is necessary to understand
           worst-case performance degradation for a specific hardware
           environment.

       tsc_mode=2 (never emulate).

           This is the same as prior to Xen 4.0 and is the best choice if it
           is certain that all apps running in this VM are TSC-resilient and
           highest performance is required.

       tsc_mode=3 (PVRDTSCP).

           This mode has been removed.
       
       If tsc_mode is left unspecified (or set to tsc_mode=0), a hybrid
       algorithm is utilized to ensure correctness while providing the best
       performance possible given:

       •   the requirement of correctness,

       •   the underlying hardware, and

       •   whether or not the VM has been saved/restored/migrated.

       The rest of this document explains this in more detail.
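       
       As a rough illustration of the choices above, the following Python
       sketch maps the numeric modes described in this document to a guest
       configuration line.  The helper and table names are hypothetical, and
       the symbolic mode names are an assumption based on xl.cfg(5); check
       them against the installed version before relying on them.

```python
# Hypothetical helper; not part of Xen or xl.
# Numeric modes come from this document; the symbolic names are an
# assumption based on xl.cfg(5) and should be verified locally.
TSC_MODES = {
    0: ("default", "hybrid: native when safe, otherwise emulated"),
    1: ("always_emulate", "all rdtsc instructions are emulated"),
    2: ("native", "never emulate; only for TSC-resilient workloads"),
}


def tsc_mode_config_line(mode: int) -> str:
    """Return a guest config line for the given numeric tsc_mode."""
    if mode == 3:
        raise ValueError("tsc_mode=3 (PVRDTSCP) has been removed")
    if mode not in TSC_MODES:
        raise ValueError(f"unknown tsc_mode: {mode}")
    name, _description = TSC_MODES[mode]
    return f'tsc_mode="{name}"'


print(tsc_mode_config_line(1))
```

       Rejecting mode 3 explicitly, rather than treating it as unknown, keeps
       the error message in line with the note above that the mode has been
       removed.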

DETERMINING RDTSC FREQUENCY

       To determine the frequency of rdtsc instructions that are emulated, an
       "xl" command can be used by a privileged user of domain0.  The command:

           # xl debug-key s; xl dmesg | tail

       provides information about TSC usage in each domain where TSC emulation
       is currently enabled.
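       
       Turning such observations into a rate is simple arithmetic.  The sketch
       below assumes two cumulative counts of emulated rdtsc instructions
       have been read off by hand some seconds apart; it does not parse the
       command's output, whose format is not described here.  The function
       names and sample values are made up for illustration; the threshold is
       the "about 10,000 per second per processor" figure from the OVERVIEW.

```python
# Threshold quoted in this document: about 10,000 rdtsc instructions
# per second per processor.
HIGH_TSC_THRESHOLD = 10_000


def rdtsc_rate_per_cpu(count_start, count_end, seconds, cpus):
    """Average emulated rdtsc executions per second per processor."""
    if seconds <= 0 or cpus <= 0:
        raise ValueError("seconds and cpus must be positive")
    return (count_end - count_start) / seconds / cpus


def is_high_tsc_frequency(rate):
    """True if the rate exceeds the document's rule-of-thumb threshold."""
    return rate > HIGH_TSC_THRESHOLD


# Made-up sample: two readings taken 10 seconds apart on a 4-vCPU domain.
rate = rdtsc_rate_per_cpu(1_000_000, 1_600_000, seconds=10, cpus=4)
print(rate, is_high_tsc_frequency(rate))  # 15000.0 True
```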

TSC HISTORY

       To understand tsc_mode completely, some background on TSC is required:

       The x86 "timestamp counter", or TSC, is a 64-bit register on each
       processor that increases monotonically.  Historically, TSC incremented
       every processor cycle, but on recent processors, it increases at a
       constant rate even if the processor changes frequency (for example, to
       reduce processor power usage).  TSC is known by x86 programmers as the
       fastest, highest-precision measurement of the passage of time so it is
       often used as a foundation for performance monitoring.  And since it is
       guaranteed to be monotonically increasing and, at 64 bits, is
       guaranteed not to wrap around within 10 years, it is sometimes used as
       a random number or a unique sequence identifier, such as to stamp
       transactions so they can be replayed in a specific order.
       
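       The wrap-around claim can be checked with a little arithmetic.  The
       sketch below assumes a counter that starts at zero and ticks at a
       constant rate; the 3 GHz figure is only an example value.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_until_wrap(tsc_hz: float, bits: int = 64) -> float:
    """Years before a counter of the given width wraps at tsc_hz."""
    return (2 ** bits) / tsc_hz / SECONDS_PER_YEAR


# A 64-bit counter ticking at 3 GHz lasts far longer than 10 years.
print(round(years_until_wrap(3e9)))  # 195
```
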
       On most older SMP and early multi-core machines, TSC was not
       synchronized between processors.  Thus if an application were to read
       the TSC on one processor, be moved by the OS to another processor, and
       then read TSC again, it might appear that "time went backwards".  This
       loss of monotonicity resulted in many obscure application bugs when
       TSC-sensitive apps were ported from a uniprocessor to an SMP
       environment; as a result, many applications -- especially in the
       Windows world -- removed their dependency on TSC and replaced their
       timestamp needs with OS-specific functions, losing both performance
       and precision.  On some more recent generations of multi-core
       machines, especially multi-socket multi-core machines, the TSC was
       synchronized but if one processor were to enter certain low-power
       states, its TSC would stop, destroying the synchrony and again causing
       obscure bugs.  This reinforced decisions to avoid use of TSC
       altogether.  On the most recent generations of multi-core machines,
       however, synchronization is provided across all processors in all
       power states, even on multi-socket machines, and a flag indicates that
       TSC is synchronized and "invariant".  Thus TSC is once again useful
       for applications, and even newer operating systems are using and
       depending upon TSC for critical timekeeping tasks when running on
       these recent machines.
       
       We will refer to hardware that ensures TSC is both synchronized and
       invariant as "TSC-safe" and any hardware on which TSC is not (or may
       not remain) synchronized as "TSC-unsafe".

       As a result of TSC's sordid history, two classes of applications use
       TSC: old applications designed for single processors, and the most
       recent enterprise applications which require high-frequency
       high-precision timestamping.

       We will refer to apps that might break if running on a TSC-unsafe
       machine as "TSC-sensitive", and to apps that don't use TSC, or that use
       TSC in a way that makes monotonicity and frequency invariance
       unimportant, as "TSC-resilient".
       
       The emergence of virtualization once again complicates the usage of
       TSC.  When features such as save/restore or live migration are
       employed, a guest OS and all its currently running applications may be
       invisibly transported to an entirely different physical machine.  While
       TSC may be "safe" on one machine, it is essentially impossible to
       precisely synchronize TSC across a data center or even a pool of
       machines.  As a result, when run in a virtualized environment, rare and
       obscure "time going backwards" problems might once again occur for
       those TSC-sensitive applications.  Worse, if a guest OS moves from, for
       example, a 3GHz machine to a 1.5GHz machine, attempts by an OS/app to
       measure time intervals with TSC may without notice be incorrect by a
       factor of two.
       
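       The factor-of-two error follows directly from how an interval is
       computed from raw ticks.  The sketch below is a worked example using
       the 3GHz and 1.5GHz figures above; the function and constant names are
       made up, and it assumes the app calibrated the TSC frequency once,
       before migration, and never re-checked it.

```python
def elapsed_seconds(tsc_start, tsc_end, assumed_hz):
    """Interval an app would compute from two raw TSC readings."""
    return (tsc_end - tsc_start) / assumed_hz


CALIBRATED_HZ = 3.0e9  # frequency the app measured before migration
ACTUAL_HZ = 1.5e9      # native TSC frequency of the new host

# One real second on the new host advances the TSC by ACTUAL_HZ ticks,
# but the app still divides by the stale calibration value.
ticks = int(ACTUAL_HZ * 1.0)
measured = elapsed_seconds(0, ticks, CALIBRATED_HZ)
print(measured)  # 0.5
```
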
       The rdtsc (read timestamp counter) instruction is used to read the TSC
       register.  The rdtscp instruction is a variant of rdtsc on recent
       processors.  We refer to these together as the rdtsc family of
       instructions, or just "rdtsc".  Instructions in the rdtsc family are
       non-privileged, but privileged software may set a control register bit
       (CR4.TSD) to cause all rdtsc family instructions to trap.  This trap
       can be detected by Xen, which can then transparently "emulate" the
       results of the rdtsc instruction and return control to the code
       following the rdtsc instruction.
       
       To provide a "safe" TSC, i.e. to ensure both TSC monotonicity and a
       fixed rate, Xen provides rdtsc emulation whenever necessary or when
       explicitly specified by a per-VM configuration option.  TSC emulation
       is relatively slow -- roughly 15-20 times slower than the rdtsc
       instruction when executed natively.  However, except when an OS or
       application uses the rdtsc instruction at a high frequency (e.g. more
       than about 10,000 times per second per processor), this performance
       degradation is not noticeable (i.e. <0.3%).  Moreover, TSC emulation is
       nearly always faster than OS-provided alternatives (e.g. Linux's
       gettimeofday).  For environments where it is certain that all apps are
       TSC-resilient (i.e. "TSC-safeness" is not necessary) and highest
       performance is a requirement, TSC emulation may be entirely disabled
       (tsc_mode==2).
       
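       The overhead figure is a product of rate and per-instruction cost.
       The sketch below is a back-of-envelope calculation only: this document
       does not state the cost of one emulated rdtsc, so the 300 ns value is
       an assumption derived from the "<0.3% at about 10,000 per second"
       figure above (0.003 s / 10,000 = 300 ns), and real costs vary by
       hardware.

```python
def emulation_overhead(rdtsc_per_sec, cost_ns):
    """Fraction of each processor-second spent emulating rdtsc."""
    return rdtsc_per_sec * cost_ns * 1e-9


# Assumption: upper bound implied by "<0.3% at 10,000 per second".
ASSUMED_COST_NS = 300

for rate in (1_000, 10_000, 1_000_000):
    print(rate, f"{emulation_overhead(rate, ASSUMED_COST_NS):.2%}")
```

       Under this assumption the overhead grows linearly with the rdtsc rate,
       which is why only high-TSC-frequency workloads are likely to notice it.
       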
       The default mode (tsc_mode==0) checks TSC-safeness of the underlying
       hardware on which the virtual machine is launched.  If it is TSC-safe,
       rdtsc will execute at hardware speed; if it is not, rdtsc will be
       emulated.  Once a virtual machine is saved/restored or migrated,
       however, there are two possibilities: TSC remains native IF the source
       physical machine and target physical machine have the same TSC
       frequency (or, for HVM/PVH guests, if TSC scaling support is
       available); else TSC is emulated.  Note that, though emulated, the
       "apparent" TSC frequency will be the TSC frequency of the initial
       physical machine, even after migration.
       
       Finally, tsc_mode==1 always enables TSC emulation, regardless of the
       underlying physical hardware.  The "apparent" TSC frequency will be the
       TSC frequency of the initial physical machine, even after migration.
       This mode is useful to measure any performance degradation that might
       be encountered by a tsc_mode==0 domain after migration occurs.  (The
       same applied to a tsc_mode==3 domain running on TSC-unsafe hardware,
       but that mode has been removed.)
       
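       The rules for the three remaining modes can be summarized as a small
       decision function.  This is a model of the prose in this document, not
       Xen's implementation; the function and parameter names are
       hypothetical, and it assumes the TSC-safeness check applies to
       whichever host the domain is currently running on.

```python
def rdtsc_is_emulated(tsc_mode, host_tsc_safe, migrated=False,
                      same_tsc_freq=True, hvm_or_pvh=False,
                      tsc_scaling=False):
    """Return True if rdtsc would be emulated under the rules above."""
    if tsc_mode == 1:  # always emulate
        return True
    if tsc_mode == 2:  # never emulate
        return False
    if tsc_mode != 0:
        raise ValueError("unsupported tsc_mode")
    # Default mode: TSC-unsafe hardware always means emulation.
    if not host_tsc_safe:
        return True
    if not migrated:
        return False
    # After save/restore/migration, rdtsc stays native only if the TSC
    # frequency matches or hardware TSC scaling can compensate
    # (HVM/PVH guests only).
    return not (same_tsc_freq or (hvm_or_pvh and tsc_scaling))


print(rdtsc_is_emulated(0, host_tsc_safe=True))
print(rdtsc_is_emulated(0, host_tsc_safe=True, migrated=True,
                        same_tsc_freq=False))
```
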
       Note that while Xen ensures that an emulated TSC is "safe" across
       migration, it does not ensure that it continues to tick at the same
       rate during the actual migration.  As an oversimplified example, if TSC
       is ticking once per second in a guest, and the guest is saved when the
       TSC is 1000, then restored 30 seconds later, TSC is only guaranteed to
       be greater than or equal to 1001, not precisely 1030.  This has some OS
       implications as will be seen in the next section.
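       
       The guarantee in this example is monotonicity, not rate.  The sketch
       below illustrates one simple way such a guarantee can be provided: a
       wrapper that never returns a value at or below the previous one.  It
       is an illustrative model with a made-up class name, not a description
       of Xen's emulation code.

```python
class MonotonicTsc:
    """Illustrative model: never return a value <= the previous one."""

    def __init__(self):
        self.last = None

    def read(self, raw):
        # If the underlying source is behind, step just past the last
        # value handed out instead of letting time go backwards.
        if self.last is not None and raw <= self.last:
            raw = self.last + 1
        self.last = raw
        return raw


tsc = MonotonicTsc()
print(tsc.read(1000))  # 1000: value at save time
print(tsc.read(990))   # 1001: raw source is behind after restore
print(tsc.read(1030))  # 1030: raw source has caught up
```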

TSC INVARIANT BIT and NO_MIGRATE

       Related to TSC emulation, the "TSC Invariant" bit is architecturally
       defined in a cpuid bit on the most recent x86 processors.  If set, TSC
       invariance ensures that the TSC is "safe", that is, it will increment
       at a constant rate regardless of power events, will be synchronized
       across all processors, and was properly initialized to zero on all
       processors at boot-time by system hardware/BIOS.  As long as system
       software never writes to TSC, TSC will be safe and continuously
       incremented at a fixed rate and thus can be used as a system
       "clocksource".
       
       This bit is used by some OSes, and specifically by Linux starting with
       version 2.6.30(?), to select TSC as a system clocksource.  Once
       selected, TSC remains the Linux system clocksource unless manually
       overridden.  In a virtualized environment, since it is not possible to
       synchronize TSC across all the machines in a pool or data center, a
       migration may "break" TSC as a usable clocksource; while time will not
       go backwards, it may not track wallclock time well enough to avoid
       certain time-sensitive consequences.  As a result, Xen can only expose
       the TSC Invariant bit to a guest OS if it is certain that the domain
       will never migrate.  As of Xen 4.0, the "no_migrate=1" VM configuration
       option may be specified to disable migration.  If no_migrate is
       selected and the VM is running on a physical machine with "TSC
       Invariant", Linux 2.6.30+ will safely use TSC as the system
       clocksource.  However, attempts to migrate or, once saved, restore this
       domain will fail.
       
       There is another cpuid-related complication: The x86 cpuid instruction
       is non-privileged.  HVM domains are configured to always trap this
       instruction to Xen, where Xen can "filter" the result.  In a PV OS, all
       cpuid instructions have been replaced by a paravirtualized equivalent
       of the cpuid instruction ("pvcpuid") and also trap to Xen.  But apps in
       a PV guest that use a cpuid instruction execute it directly, without a
       trap to Xen.  As a result, an app may directly examine the physical TSC
       Invariant cpuid bit and make decisions based on that bit.
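       
       For reference, the bit an application would examine is bit 8 of EDX
       in cpuid leaf 0x80000007.  Python cannot execute cpuid itself, so the
       sketch below only decodes an EDX value obtained by other means; the
       function and constant names are made up for illustration.

```python
INVARIANT_TSC_LEAF = 0x80000007  # cpuid leaf holding the bit
INVARIANT_TSC_EDX_BIT = 8        # EDX bit reporting an invariant TSC


def has_invariant_tsc(edx: int) -> bool:
    """Decode the TSC Invariant bit from a leaf 0x80000007 EDX value."""
    return bool((edx >> INVARIANT_TSC_EDX_BIT) & 1)


print(has_invariant_tsc(0x100))  # True
print(has_invariant_tsc(0x0))    # False
```

       As the paragraph above notes, an app in a PV guest that runs cpuid
       directly sees the physical value of this bit, not a filtered one.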

HARDWARE TSC SCALING

       Intel VMX TSC scaling and AMD SVM TSC ratio allow the guest TSC read by
       guest rdtsc/rdtscp to increase at a different frequency than the host
       TSC.

       If an HVM container in default TSC mode (tsc_mode=0) is created on a
       host that provides constant TSC, its guest TSC frequency will be the
       same as the host's.  If it is later migrated to another host that
       provides constant TSC and supports Intel VMX TSC scaling/AMD SVM TSC
       ratio, its guest TSC frequency will be the same before and after
       migration.

       For the above HVM container in default TSC mode (tsc_mode=0), if the
       above hosts support rdtscp, both guest rdtsc and rdtscp instructions
       will be executed natively before and after migration.
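       
       Conceptually, scaling multiplies the host TSC by the ratio of guest
       to host frequency.  The sketch below models this with a fixed-point
       ratio; the 48-bit fractional width and the function names are
       assumptions chosen for illustration, and the actual encodings differ
       between Intel and AMD hardware.

```python
FRAC_BITS = 48  # assumption: fractional bits of the fixed-point ratio


def tsc_ratio(guest_khz, host_khz):
    """Fixed-point ratio of guest to host TSC frequency."""
    return (guest_khz << FRAC_BITS) // host_khz


def scale_tsc(host_tsc, ratio):
    """Guest-visible TSC value for a given host TSC value."""
    return (host_tsc * ratio) >> FRAC_BITS


# Guest created on a 3 GHz host, later running on a 1.5 GHz host.
ratio = tsc_ratio(guest_khz=3_000_000, host_khz=1_500_000)
print(scale_tsc(1_000, ratio))  # 2000
```

       With this ratio the guest still observes a 3 GHz TSC after migration,
       so rdtsc can stay native without changing the apparent frequency.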

AUTHORS

       Dan Magenheimer <dan.magenheimer@oracle.com>



4.16.1                            2022-07-12                    xen-tscmode(7)