RRD_PDPCALC(1)                      rrdtool                     RRD_PDPCALC(1)

NAME

       PDP calculation explanation - PDP inner calculation logic with an
       example by Tianpeng Xia

DESCRIPTION

       This article explains, in a detailed yet easy-to-understand way and
       with an example, how PDPs (primary data points) are calculated.

Refreshing some basics about PDP

   Fundamental knowledge
       If you have not yet read the tutorials or man pages, either on the
       official site or by other authors, I strongly encourage you to do so.
       As said in the description, this article only explains how a PDP is
       calculated, not what a PDP is.  Please read the following materials to
       get a basic understanding of PDPs:

       <http://rrdtool.vandenbogaerdt.nl/process.php> - By Alex van den
       Bogaerdt. This article explains PDPs in a detailed and clear way.
       However, its "Normalize interval" section does not describe the
       "normalization process" the way the official version does (I confirmed
       the official behavior with @oetiker himself). The flaw is easy to see
       in the bar charts and is discussed in the "Calculation logics" section.

       <https://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html> - This one is on
       the official site. It is the manual page for "rrdcreate", and its "The
       HEARTBEAT and the STEP" section reveals what is under the hood with
       regard to PDP calculation.

       The text graph by Don Baarda gives a vivid explanation of how UNKNOWN
       data are produced and how the heartbeat value influences sampling.
       Unfortunately, it does not give a clear method by which PDPs are
       calculated.

       <https://oss.oetiker.ch/rrdtool/tut/rrdtutorial.en.html> - Another
       detailed official tutorial, by Alex van den Bogaerdt. Similarly, it
       only provides examples in which the data arrive evenly, exactly on the
       step boundaries.

       If you don't like doing experiments or don't care much about the inner
       mechanics, you can stop here and turn to more practical topics such as
       graph exports or the command manuals. But if you are the sort of
       person, like me, who cares about the calculation logic, please read on.

Calculation logics

       Here begins the core part of this article. This section presents two
       versions of the calculation method, one by Alex van den Bogaerdt and
       the other by @oetiker.

       To keep the explanation ASCII-friendly, I will explain both versions
       with the chart below instead of a real image.

         |
         |    (v1)
         | _______                        (v4)  (v5)
         | |     |           (v3)        ____________
         | |     |        ______________|     ||   |
         | |     |        |            ||     ||   |
         | |     |        |            ||     ||   |
         | |     |   (v2) |            ||     ||   |
         | |     |________|            ||     ||   |
        --------------------------------------------->
         0 1     3        7            17     20   21

       The X axis shows time slots (each second is one slot) and the Y axis
       shows the value.

       Let's make everything a little clearer:

       - The step is 5

       - Each PDP gets updated only if a value arrives at or after the last
       slot of the PDP; for instance, the last slot of the PDP from 16 to 20
       is 20

       - The heartbeat is 20, so the 10-second gap between the updates at 7
       and 17 is within the heartbeat and the samples during the entire 7-17
       period are not discarded

       - At second 3, the first value comes in as v1, and so on

       - Second 0 is the origin, and it does not count as a sample

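       The rule about when a PDP gets updated can be sketched in Python. This
       is a minimal illustration of the rule as stated in the list above, not
       rrdtool's actual code; the function names pdp_range and completed_pdps
       are made up for this example.

```python
STEP = 5  # the step used throughout this article


def pdp_range(slot, step=STEP):
    """Return (first_slot, last_slot) of the PDP that contains `slot`."""
    last = ((slot + step - 1) // step) * step
    return last - step + 1, last


def completed_pdps(prev_update, new_update, step=STEP):
    """Return the last slots of the PDPs completed by an update.

    A PDP is completed when a value arrives at or after its last slot.
    PDPs ending at or before `prev_update` were completed earlier.
    """
    first_end = pdp_range(prev_update + 1, step)[1]
    return list(range(first_end, new_update + 1, step))


print(pdp_range(17))           # PDP that contains slot 17
print(completed_pdps(7, 17))   # PDPs completed by the update at slot 17
```

       With the chart above, completed_pdps(7, 17) returns [10, 15]: the
       update at second 17 completes PDP(6-10) and PDP(11-15), but not
       PDP(16-20), whose last slot is 20.
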
   Bogaerdt version
       As can be seen on this page,
       <http://rrdtool.vandenbogaerdt.nl/process.php>, after all the primary
       data are transformed to rates (except for GAUGE, of course), they have
       to go through a normalization process if they are not distributed
       exactly according to the step, or "on well-defined boundaries in
       time", in the words of the author.

       What does that mean? Basically, if the known (as opposed to unknown)
       data cover at least 50% of the slots in a period, then a PDP is
       calculated from them.

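       The 50% rule can be written down as a small Python check. This is only
       a sketch of the rule as stated above; the function name pdp_is_known
       and the use of None for an unknown slot are assumptions made for this
       illustration.

```python
def pdp_is_known(slot_values, min_known_fraction=0.5):
    """Return True if enough slots of one period hold known values.

    `slot_values` has one entry per slot; None marks an unknown slot.
    """
    known = sum(1 for value in slot_values if value is not None)
    return known / len(slot_values) >= min_known_fraction


# One period with step 5: three known slots out of five is 60%.
print(pdp_is_known([None, None, 4, 4, 4]))
# Two known slots out of five is 40%, which is below the threshold.
print(pdp_is_known([None, None, None, 4, 4]))
```
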
       This version seems to go well until we reach the bar chart part.

       According to the ASCII bar chart, we have the following results:

       From second 1 on, the PDP of each period (1-5, 6-10, ...) is computed
       by averaging all the values within it.

       So:

       - the PDP from 1 to 5 is (v1*3+v2*2)/5

       - the PDP from 6 to 10 is (v2*2+v3*3)/5

       - the PDP from 11 to 15 is (v3*5)/5, since all the values in slots 11,
       12, 13, 14 and 15 are the same, which is v3

       - ...

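       The per-period averaging above can be reproduced with a short Python
       sketch. It counts how many slots of each period are covered by each
       value, assuming (as the chart suggests) that a value arriving at
       second t covers the slots after the previous arrival up to and
       including t. The names ARRIVALS, slot_labels and per_pdp_weights are
       invented for this illustration.

```python
STEP = 5
# Arrival second of each value, read off the chart.
ARRIVALS = [(3, "v1"), (7, "v2"), (17, "v3"), (20, "v4"), (21, "v5")]


def slot_labels(arrivals):
    """Map each slot to the name of the value that covers it."""
    labels = {}
    prev = 0  # second 0 is the origin and carries no sample
    for t, name in arrivals:
        for slot in range(prev + 1, t + 1):
            labels[slot] = name
        prev = t
    return labels


def per_pdp_weights(arrivals, step=STEP):
    """Return {last_slot_of_pdp: {value_name: slot_count}}, one PDP at a time."""
    labels = slot_labels(arrivals)
    last_complete = (arrivals[-1][0] // step) * step
    weights = {}
    for end in range(step, last_complete + 1, step):
        counts = {}
        for slot in range(end - step + 1, end + 1):
            counts[labels[slot]] = counts.get(labels[slot], 0) + 1
        weights[end] = counts
    return weights


for end, counts in per_pdp_weights(ARRIVALS).items():
    terms = "+".join(f"{name}*{n}" for name, n in counts.items())
    print(f"PDP({end - STEP + 1}-{end}) = ({terms})/{STEP}")
```

       Each period is averaged on its own here, so the divisor is always the
       step.
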
   The official version (also @oetiker's version)
       Using the same chart, this version suggests the following:

       - the PDP from 1 to 5 is (v1*3+v2*2)/5

       - the PDPs from 6 to 10 and 11 to 15 are the SAME, which is
       (v2*2+v3*8)/10

       - ...

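       The grouping behind these results can be sketched in Python as
       follows. This is a simplified model of the behavior described in this
       article, not rrdtool's source code: all PDPs completed by the same
       update share one average taken over their combined slots. The names
       ARRIVALS and grouped_pdp_weights are invented for this illustration.

```python
STEP = 5
# Arrival second of each value, read off the chart.
ARRIVALS = [(3, "v1"), (7, "v2"), (17, "v3"), (20, "v4"), (21, "v5")]


def grouped_pdp_weights(arrivals, step=STEP):
    """Return {last_slot_of_pdp: (slot_counts, divisor)}.

    PDPs completed by the same update get the same counts and divisor.
    """
    labels = {}
    result = {}
    prev = 0            # second 0 is the origin and carries no sample
    pending_start = 1   # first slot that is not yet part of a finished PDP
    for t, name in arrivals:
        for slot in range(prev + 1, t + 1):
            labels[slot] = name
        last_complete = (t // step) * step
        if last_complete >= pending_start:
            span = range(pending_start, last_complete + 1)
            counts = {}
            for slot in span:
                counts[labels[slot]] = counts.get(labels[slot], 0) + 1
            for end in range(pending_start + step - 1, last_complete + 1, step):
                result[end] = (counts, len(span))
            pending_start = last_complete + 1
        prev = t
    return result


for end, (counts, divisor) in grouped_pdp_weights(ARRIVALS).items():
    terms = "+".join(f"{name}*{n}" for name, n in counts.items())
    print(f"PDP({end - STEP + 1}-{end}) = ({terms})/{divisor}")
```

       The update at second 17 completes two PDPs at once, so both are
       averaged over ten slots.
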
   A comparison and some explanation
       The two versions above disagree on the PDPs from 6 to 10 and from 11
       to 15.

       Why is that?

       Because the two versions differ in how they treat an update that
       arrives more than one step after the previous one.

       Let's discuss this in more detail using the above bar chart.

       Bogaerdt's version

       PDPs are always computed individually no matter how values arrive.

       For example, the value at slot 17 comes after the last slot of
       PDP(11-15), and the value immediately before it is at slot 7. All the
       slots after 7 up to 17 are assigned v3. Since each PDP is computed
       individually, PDP(6-10) is (v2*2+v3*3)/5 while PDP(11-15) is
       (v3*5)/5.

       The official version

       PDPs are always computed in terms of the steps which the next update
       spans, be it 1 step, 2 steps or n steps; in other words, PDPs may be
       computed together.

       For example, the update at slot 17 spans PDP(6-10) and PDP(11-15)
       because the previous value is at slot 7, which lies within PDP(6-10),
       and slot 17 comes after 15, the last slot of PDP(11-15). PDP(1-5) and
       PDP(16-20) are not included: the update at slot 7 has already
       triggered the calculation of PDP(1-5), and the update at slot 17 comes
       before 20, the last slot of PDP(16-20).

       That's the reason why PDP(6-10) and PDP(11-15) have the same value,
       (v2*2+v3*8)/10.

An example

       If you are still confused, don't worry, an example is here to help you.

       Let's get our hands dirty with some commands:

        rrdtool create target.rrd --start 1000000000 --step 5 DS:mem:GAUGE:20:0:100 RRA:AVERAGE:0.5:1:10
        rrdtool update target.rrd 1000000003:8 1000000006:1 1000000017:6 \
        1000000020:7 1000000021:7 1000000022:4 \
        1000000023:3 1000000036:1 1000000037:2 \
        1000000038:3 1000000039:3 1000000042:5
        rrdtool fetch target.rrd AVERAGE --start 1000000000 --end 1000000045

       The code above contains three commands: create, update and fetch.
       First we create a new RRD file, then we feed in some data, and
       finally we fetch all the PDPs from the RRD.

   Focus on single steps
       To provide a detailed explanation, the calculation of each PDP is
       worked through here.

       Below is the output of the commands above:

        1000000005: 5.2000000000e+00
        1000000010: 5.5000000000e+00
        1000000015: 5.5000000000e+00
        1000000020: 6.6000000000e+00
        1000000025: 1.7333333333e+00
        1000000030: 1.7333333333e+00
        1000000035: 1.7333333333e+00
        1000000040: 2.8000000000e+00
        1000000045: nan
        1000000050: nan

       NOTE: 1000000005 means the PDP from 1000000001 to 1000000005, and so
       on. For concision and readability, we use only the last two digits, so
       05 denotes 1000000005. We chose GAUGE as the data source type because
       the original values are then treated as rates and no additional
       transformation is needed; see
       <http://rrdtool.vandenbogaerdt.nl/process.php> for details.

       05: 5.2 = (8*3+1*2)/5

       10: 5.5 = (1*1+6*9)/10

       15: the same as the previous one, because the update at 17 spans both
       PDPs

       20: 6.6 = (6*2+7*3)/5

       25: 1.73333 = (7+4+3+1*12)/15

       ...

       45: nan, as the last value is at 42, which does not trigger the
       calculation for PDP(41-45)

       50: nan; why this unknown PDP is shown is explained in this article:
       <https://oss.oetiker.ch/rrdtool/tut/rrdtutorial.en.html>
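       For readers who want to check the arithmetic without installing
       rrdtool, here is a small Python sketch that replays the update values
       from the commands above. It is a simplified model of the official
       behavior as described in this article, not rrdtool's implementation:
       it assumes a GAUGE data source, ignores the heartbeat and the min/max
       limits, and only records PDPs that have been completed. The function
       name simulate_pdps is invented for this illustration.

```python
START = 1000000000
STEP = 5
UPDATES = [
    (1000000003, 8), (1000000006, 1), (1000000017, 6),
    (1000000020, 7), (1000000021, 7), (1000000022, 4),
    (1000000023, 3), (1000000036, 1), (1000000037, 2),
    (1000000038, 3), (1000000039, 3), (1000000042, 5),
]


def simulate_pdps(start, step, updates):
    """Return {pdp_end_time: average} for every completed PDP.

    A value arriving at time t covers each second after the previous
    update up to and including t. PDPs completed by the same update
    share the average over all of their seconds.
    """
    slot_values = {}
    pdps = {}
    prev = start
    pending_start = start + 1  # first second not yet in a finished PDP
    for t, value in updates:
        for slot in range(prev + 1, t + 1):
            slot_values[slot] = value
        last_complete = start + ((t - start) // step) * step
        if last_complete >= pending_start:
            span = range(pending_start, last_complete + 1)
            average = sum(slot_values[s] for s in span) / len(span)
            for end in range(pending_start + step - 1, last_complete + 1, step):
                pdps[end] = average
            pending_start = last_complete + 1
        prev = t
    return pdps


for end, average in simulate_pdps(START, STEP, UPDATES).items():
    print(f"{end}: {average:.10e}")
```

       The sketch prints one line for each PDP from 1000000005 through
       1000000040. The two nan rows are not reproduced, because no update
       completes those PDPs.
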

SUMMARY

       All that said, I hope you now have a clear understanding of the inner
       calculation "magic" for PDPs.

   Other References
       ·   A great PowerShell script for generating ASCII bar charts:
           <https://gallery.technet.microsoft.com/scriptcenter/Sample-Script-to-Generate-59c80d4c>

       ·   <https://stackoverflow.com/questions/18924450/rrd-wrong-values>

1.7.1                             2019-02-04                    RRD_PDPCALC(1)