math::changepoint(n)           Tcl Math Library           math::changepoint(n)

______________________________________________________________________________

NAME

       math::changepoint - Change point detection methods

SYNOPSIS

       package require Tcl 8.6

       package require TclOO

       package require math::statistics

       package require math::changepoint ?0.1?

       ::math::changepoint::cusum-detect data ?args?

       ::math::changepoint::cusum-online ?args?

       $cusumObj examine value

       $cusumObj reset

       ::math::changepoint::binary-segmentation data ?args?

______________________________________________________________________________

DESCRIPTION

       The math::changepoint package implements a number of well-known methods
       to determine whether a series of data contains a shift in the mean.
       Note that these methods only indicate that a shift in the mean is
       probable. Because the data to be analysed are stochastic, false
       positives are possible. The CUSUM method is implemented in both an
       "offline" and an "online" version, so that it can be used either for a
       complete data series or for detecting changes in data that come in one
       by one. The implementation is based mostly on these websites:

       https://www.itl.nist.gov/div898/handbook/pmc/section3/pmc323.htm

       https://en.wikipedia.org/wiki/CUSUM

       Basically, the deviation of the data from a given target value is
       accumulated, and when the total deviation becomes too large, a change
       point is reported.

       A second method, binary segmentation, is implemented only as an
       "offline" method, as it needs to examine the data series as a whole.
       The variant contained here uses the following ideas:

       •      The segments into which the data series may be separated should
              not be too short, otherwise the ultimate result could be
              segments only one data point long. So a minimum length is used.

       •      To make the segmentation worthwhile there should be a minimum
              gain in reducing the cost function (the sum of the squared
              deviations from the mean for each segment).

       This may not agree with the descriptions of the method found in
       various publications, but it is simple to understand and intuitive.
       One publication that provides more information on the method in
       general is "Selective review of offline change point detection
       methods" by Truong et al., https://arxiv.org/abs/1801.00718.

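       The two ideas behind the binary segmentation variant (a minimum segment
       length and a minimum gain in the cost function) can be illustrated
       with a single split step. The following is a minimal sketch in plain
       Tcl, not the package's code: the procedure names sumSquares and
       bestSplit and the parameter minGain are hypothetical, and how the
       package derives its gain threshold from -threshold is not shown here.

```tcl
# Sum of squared deviations from the mean of a list of numbers
# (the cost function mentioned in the description).
proc sumSquares {values} {
    set n [llength $values]
    if {$n == 0} { return 0.0 }
    set mean [expr {[tcl::mathop::+ {*}$values] / double($n)}]
    set cost 0.0
    foreach v $values {
        set cost [expr {$cost + ($v - $mean)**2}]
    }
    return $cost
}

# Hypothetical helper: find the split that reduces the total cost the most.
# Both segments must hold at least minLength points and the reduction must
# exceed minGain. Returns the index of the first point of the right-hand
# segment, or -1 if no admissible split is found.
proc bestSplit {data minLength minGain} {
    set n        [llength $data]
    set whole    [sumSquares $data]
    set best     -1
    set bestGain $minGain
    for {set i $minLength} {$i <= $n - $minLength} {incr i} {
        set left  [lrange $data 0 [expr {$i - 1}]]
        set right [lrange $data $i end]
        set gain  [expr {$whole - [sumSquares $left] - [sumSquares $right]}]
        if {$gain > $bestGain} {
            set best     $i
            set bestGain $gain
        }
    }
    return $best
}

set data {1.0 1.1 0.9 1.0 1.2 5.0 5.1 4.9 5.2 5.0}
puts [bestSplit $data 3 1.0]   ;# prints 5
```

       Applying such a step recursively to the left-hand and right-hand
       segments until no admissible split remains yields the full list of
       change points.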

PROCEDURES

       The package defines the following public procedures:

       ::math::changepoint::cusum-detect data ?args?
              Examine a given data series and return the location of the first
              change (if any).

              list data
                     Series of data to be examined

              list args
                     Optional list of key-value pairs:

                     -target value
                            The target (or mean) for the time series

                     -tolerance value
                            The tolerated standard deviation

                     -kfactor value
                            The factor by which to multiply the standard
                            deviation (defaults to 0.5, typically between 0.5
                            and 1.0)

                     -hfactor value
                            The factor determining the limits between which
                            the "cusum" statistic is accepted (typically
                            3.0-5.0, default 4.0)

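       How the four options interact may be easier to see in code. The
       following is a minimal sketch of a two-sided tabular CUSUM along the
       lines of the NIST handbook page cited in the description, written in
       plain Tcl. It is an illustration under assumptions, not the package's
       implementation: the procedure name cusumFirstChange is hypothetical,
       and returning a zero-based index, or -1 when no change is found, is a
       convention chosen for this sketch only.

```tcl
# Hypothetical helper: two-sided tabular CUSUM over a complete data series.
# The slack k and the decision limit h are both multiples of the tolerated
# standard deviation, mirroring the -kfactor and -hfactor options.
proc cusumFirstChange {data target tolerance {kfactor 0.5} {hfactor 4.0}} {
    set k [expr {$kfactor * $tolerance}]   ;# deviations below k are ignored
    set h [expr {$hfactor * $tolerance}]   ;# limit for the cumulative sums
    set upper 0.0                          ;# accumulates upward deviations
    set lower 0.0                          ;# accumulates downward deviations
    set index 0
    foreach x $data {
        set upper [expr {max(0.0, $upper + ($x - $target) - $k)}]
        set lower [expr {max(0.0, $lower + ($target - $x) - $k)}]
        if {$upper > $h || $lower > $h} {
            return $index
        }
        incr index
    }
    return -1
}

set data {0.1 -0.2 0.0 0.3 -0.1 2.0 2.2 1.9 2.1 2.0}
puts [cusumFirstChange $data 0.0 1.0]   ;# prints 7
```

       In this sketch the shift starts at index 5, but the cumulative sum
       only exceeds the limit at index 7: a larger -hfactor delays detection
       and lowers the chance of a false positive.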
       ::math::changepoint::cusum-online ?args?
              Class to examine data passed in against expected properties. At
              least the keywords -target and -tolerance must be given.

              list args
                     List of key-value pairs:

                     -target value
                            The target (or mean) for the time series

                     -tolerance value
                            The tolerated standard deviation

                     -kfactor value
                            The factor by which to multiply the standard
                            deviation (defaults to 0.5, typically between 0.5
                            and 1.0)

                     -hfactor value
                            The factor determining the limits between which
                            the "cusum" statistic is accepted (typically
                            3.0-5.0, default 4.0)

       $cusumObj examine value
              Pass a value to the cusum-online object and examine it. If, with
              this new value, the cumulative sum remains within the bounds,
              zero (0) is returned, otherwise one (1) is returned.

              double value
                     The new value

       $cusumObj reset
              Reset the cumulative sum, so that the examination can start
              afresh.

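       A short usage sketch of the online variant may help. It assumes that
       the class is instantiated with the standard TclOO "new" method, which
       the synopsis does not spell out, and that tcllib is installed. The
       data values and the variable name monitor are made up for
       illustration.

```tcl
package require math::changepoint

# Assumption: cusum-online is a TclOO class, so "new" creates an object.
# -target and -tolerance are the two mandatory options.
set monitor [::math::changepoint::cusum-online new -target 0.0 -tolerance 1.0]

# Feed the values one by one, as they would arrive from a measurement.
foreach value {0.1 -0.2 0.0 2.0 2.2 1.9 2.1} {
    if {[$monitor examine $value]} {
        puts "Cumulative sum out of bounds at value $value"
        # Start afresh so that a later change can be detected as well.
        $monitor reset
    }
}

$monitor destroy
```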
       ::math::changepoint::binary-segmentation data ?args?
              Apply the binary segmentation method recursively to find change
              points. Returns a list of indices of potential change points.

              list data
                     Data to be examined

              list args
                     Optional key-value pairs:

                     -minlength number
                            Minimum number of points in each segment (default:
                            5)

                     -threshold value
                            Factor applied to the standard deviation; the
                            product serves as the threshold for accepting a
                            reduction in the cost function as an improvement
                            (default: 1.0)

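       A call might look as follows. This is a usage sketch with made-up
       data, assuming tcllib is installed. The indices that are printed
       depend on the package's own conventions, so no expected output is
       given here.

```tcl
package require math::changepoint

# Two plateaus of six points each: around 1.0 and around 5.0.
set data {1.0 1.1 0.9 1.0 1.2 1.1 5.0 5.1 4.9 5.2 5.0 4.8}

# Allow segments of at least three points; keep the default threshold.
set changes [::math::changepoint::binary-segmentation $data \
                 -minlength 3 -threshold 1.0]

foreach index $changes {
    puts "Potential change point at index $index"
}
```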

KEYWORDS

       control, statistics


CATEGORY

       Mathematics

COPYRIGHT

       Copyright (c) 2020 by Arjen Markus



tcllib                                0.1                 math::changepoint(n)