1CBQ(8) Linux CBQ(8)
2
3
4
6 CBQ - Class Based Queueing
7
9 tc qdisc ... dev dev ( parent classid | root) [ handle major: ] cbq [
10 allot bytes ] avpkt bytes bandwidth rate [ cell bytes ] [ ewma log ] [
11 mpu bytes ]
12
13 tc class ... dev dev parent major:[minor] [ classid major:minor ] cbq
14 allot bytes [ bandwidth rate ] [ rate rate ] prio priority [ weight
15 weight ] [ minburst packets ] [ maxburst packets ] [ ewma log ] [ cell
16 bytes ] avpkt bytes [ mpu bytes ] [ bounded isolated ] [ split handle &
17 defmap defmap ] [ estimator interval timeconstant ]
18
19
21 Class Based Queueing is a classful qdisc that implements a rich
22 linksharing hierarchy of classes. It contains shaping elements as well
23 as prioritizing capabilities. Shaping is performed using link idle time
24 calculations based on the timing of dequeue events and underlying link
25 bandwidth.
26
27
29 When shaping a 10mbit/s connection to 1mbit/s, the link will be idle
30 90% of the time. If it isn't, it needs to be throttled so that it IS
31 idle 90% of the time.
32
33 During operations, the effective idletime is measured using an exponen‐
34 tial weighted moving average (EWMA), which considers recent packets to
35 be exponentially more important than past ones. The Unix loadaverage is
36 calculated in the same way.
37
38 The calculated idle time is subtracted from the EWMA measured one, the
39 resulting number is called 'avgidle'. A perfectly loaded link has an
40 avgidle of zero: packets arrive exactly at the calculated interval.
41
42 An overloaded link has a negative avgidle and if it gets too negative,
43 CBQ throttles and is then 'overlimit'.
44
45 Conversely, an idle link might amass a huge avgidle, which would then
46 allow infinite bandwidths after a few hours of silence. To prevent
47 this, avgidle is capped at maxidle.
48
49 If overlimit, in theory, the CBQ could throttle itself for exactly the
50 amount of time that was calculated to pass between packets, and then
51 pass one packet, and throttle again. Due to timer resolution con‐
52 straints, this may not be feasible, see the minburst parameter below.
53
54
56 Within the one CBQ instance many classes may exist. Each of these
57 classes contains another qdisc, by default tc-pfifo(8).
58
59 When enqueueing a packet, CBQ starts at the root and uses various meth‐
60 ods to determine which class should receive the data.
61
62 In the absence of uncommon configuration options, the process is rather
63 easy. At each node we look for an instruction, and then go to the
64 class the instruction refers us to. If the class found is a barren
65 leaf-node (without children), we enqueue the packet there. If it is not
66 yet a leaf node, we do the whole thing over again starting from that
67 node.
68
69 The following actions are performed, in order at each node we visit,
70 until one sends us to another node, or terminates the process.
71
72 (i) Consult filters attached to the class. If sent to a leafnode, we
73 are done. Otherwise, restart.
74
75 (ii) Consult the defmap for the priority assigned to this packet,
76 which depends on the TOS bits. Check if the referral is leaf‐
77 less, otherwise restart.
78
79 (iii) Ask the defmap for instructions for the 'best effort' priority.
80 Check the answer for leafness, otherwise restart.
81
82 (iv) If none of the above returned with an instruction, enqueue at
83 this node.
84
85 This algorithm makes sure that a packet always ends up somewhere, even
86 while you are busy building your configuration.
87
88 For more details, see tc-cbq-details(8).
89
90
92 When dequeuing for sending to the network device, CBQ decides which of
93 its classes will be allowed to send. It does so with a Weighted Round
94 Robin process in which each class with packets gets a chance to send in
95 turn. The WRR process starts by asking the highest priority classes
96 (lowest numerically - highest semantically) for packets, and will con‐
97 tinue to do so until they have no more data to offer, in which case the
98 process repeats for lower priorities.
99
100 Classes by default borrow bandwidth from their siblings. A class can be
101 prevented from doing so by declaring it 'bounded'. A class can also
102 indicate its unwillingness to lend out bandwidth by being 'isolated'.
103
104
106 The root of a CBQ qdisc class tree has the following parameters:
107
108
109 parent major:minor | root
110 This mandatory parameter determines the place of the CBQ
111 instance, either at the root of an interface or within an exist‐
112 ing class.
113
114 handle major:
115 Like all other qdiscs, the CBQ can be assigned a handle. Should
116 consist only of a major number, followed by a colon. Optional,
117 but very useful if classes will be generated within this qdisc.
118
119 allot bytes
120 This allotment is the 'chunkiness' of link sharing and is used
121 for determining packet transmission time tables. The qdisc allot
122 differs slightly from the class allot discussed below. Optional.
123 Defaults to a reasonable value, related to avpkt.
124
125 avpkt bytes
126 The average size of a packet is needed for calculating maxidle,
127 and is also used for making sure 'allot' has a safe value.
128 Mandatory.
129
130 bandwidth rate
131 To determine the idle time, CBQ must know the bandwidth of your
132 underlying physical interface, or parent qdisc. This is a vital
133 parameter, more about it later. Mandatory.
134
135 cell The cell size determines he granularity of packet transmission
136 time calculations. Has a sensible default.
137
138 mpu A zero sized packet may still take time to transmit. This value
139 is the lower cap for packet transmission time calculations -
140 packets smaller than this value are still deemed to have this
141 size. Defaults to zero.
142
143 ewma log
144 When CBQ needs to measure the average idle time, it does so
145 using an Exponentially Weighted Moving Average which smooths out
146 measurements into a moving average. The EWMA LOG determines how
147 much smoothing occurs. Lower values imply greater sensitivity.
148 Must be between 0 and 31. Defaults to 5.
149
150 A CBQ qdisc does not shape out of its own accord. It only needs to know
151 certain parameters about the underlying link. Actual shaping is done in
152 classes.
153
154
156 Classes have a host of parameters to configure their operation.
157
158
159 parent major:minor
160 Place of this class within the hierarchy. If attached directly
161 to a qdisc and not to another class, minor can be omitted.
162 Mandatory.
163
164 classid major:minor
165 Like qdiscs, classes can be named. The major number must be
166 equal to the major number of the qdisc to which it belongs.
167 Optional, but needed if this class is going to have children.
168
169 weight weight
170 When dequeuing to the interface, classes are tried for traffic
171 in a round-robin fashion. Classes with a higher configured qdisc
172 will generally have more traffic to offer during each round, so
173 it makes sense to allow it to dequeue more traffic. All weights
174 under a class are normalized, so only the ratios matter.
175 Defaults to the configured rate, unless the priority of this
176 class is maximal, in which case it is set to 1.
177
178 allot bytes
179 Allot specifies how many bytes a qdisc can dequeue during each
180 round of the process. This parameter is weighted using the
181 renormalized class weight described above. Silently capped at a
182 minimum of 3/2 avpkt. Mandatory.
183
184
185 prio priority
186 In the round-robin process, classes with the lowest priority
187 field are tried for packets first. Mandatory.
188
189
190 avpkt See the QDISC section.
191
192
193 rate rate
194 Maximum rate this class and all its children combined can send
195 at. Mandatory.
196
197
198 bandwidth rate
199 This is different from the bandwidth specified when creating a
200 CBQ disc! Only used to determine maxidle and offtime, which are
201 only calculated when specifying maxburst or minburst. Mandatory
202 if specifying maxburst or minburst.
203
204
205 maxburst
206 This number of packets is used to calculate maxidle so that when
207 avgidle is at maxidle, this number of average packets can be
208 burst before avgidle drops to 0. Set it higher to be more toler‐
209 ant of bursts. You can't set maxidle directly, only via this
210 parameter.
211
212
213 minburst
214 As mentioned before, CBQ needs to throttle in case of overlimit.
215 The ideal solution is to do so for exactly the calculated idle
216 time, and pass 1 packet. However, Unix kernels generally have a
217 hard time scheduling events shorter than 10ms, so it is better
218 to throttle for a longer period, and then pass minburst packets
219 in one go, and then sleep minburst times longer.
220
221 The time to wait is called the offtime. Higher values of min‐
222 burst lead to more accurate shaping in the long term, but to
223 bigger bursts at millisecond timescales. Optional.
224
225
226 minidle
227 If avgidle is below 0, we are overlimits and need to wait until
228 avgidle will be big enough to send one packet. To prevent a sud‐
229 den burst from shutting down the link for a prolonged period of
230 time, avgidle is reset to minidle if it gets too low.
231
232 Minidle is specified in negative microseconds, so 10 means that
233 avgidle is capped at -10us. Optional.
234
235
236 bounded
237 Signifies that this class will not borrow bandwidth from its
238 siblings.
239
240 isolated
241 Means that this class will not borrow bandwidth to its siblings
242
243
244 split major:minor & defmap bitmap[/bitmap]
245 If consulting filters attached to a class did not give a ver‐
246 dict, CBQ can also classify based on the packet's priority.
247 There are 16 priorities available, numbered from 0 to 15.
248
249 The defmap specifies which priorities this class wants to
250 receive, specified as a bitmap. The Least Significant Bit corre‐
251 sponds to priority zero. The split parameter tells CBQ at which
252 class the decision must be made, which should be a (grand)parent
253 of the class you are adding.
254
255 As an example, 'tc class add ... classid 10:1 cbq .. split 10:0
256 defmap c0' configures class 10:0 to send packets with priorities
257 6 and 7 to 10:1.
258
259 The complimentary configuration would then be: 'tc class add ...
260 classid 10:2 cbq ... split 10:0 defmap 3f' Which would send all
261 packets 0, 1, 2, 3, 4 and 5 to 10:1.
262
263 estimator interval timeconstant
264 CBQ can measure how much bandwidth each class is using, which tc
265 filters can use to classify packets with. In order to determine
266 the bandwidth it uses a very simple estimator that measures once
267 every interval microseconds how much traffic has passed. This
268 again is a EWMA, for which the time constant can be specified,
269 also in microseconds. The time constant corresponds to the slug‐
270 gishness of the measurement or, conversely, to the sensitivity
271 of the average to short bursts. Higher values mean less sensi‐
272 tivity.
273
274
276 The actual bandwidth of the underlying link may not be known, for exam‐
277 ple in the case of PPoE or PPTP connections which in fact may send over
278 a pipe, instead of over a physical device. CBQ is quite resilient to
279 major errors in the configured bandwidth, probably a the cost of
280 coarser shaping.
281
282 Default kernels rely on coarse timing information for making decisions.
283 These may make shaping precise in the long term, but inaccurate on sec‐
284 ond long scales.
285
286 See tc-cbq-details(8) for hints on how to improve this.
287
288
290 o Sally Floyd and Van Jacobson, "Link-sharing and Resource Manage‐
291 ment Models for Packet Networks", IEEE/ACM Transactions on Net‐
292 working, Vol.3, No.4, 1995
293
294
295 o Sally Floyd, "Notes on CBQ and Guaranteed Service", 1995
296
297
298 o Sally Floyd, "Notes on Class-Based Queueing: Setting Parame‐
299 ters", 1996
300
301
302 o Sally Floyd and Michael Speer, "Experimental Results for Class-
303 Based Queueing", 1998, not published.
304
305
306
307
309 tc(8)
310
311
313 Alexey N. Kuznetsov, <kuznet@ms2.inr.ac.ru>. This manpage maintained by
314 bert hubert <ahu@ds9a.nl>
315
316
317
318iproute2 16 December 2001 CBQ(8)