GMTREGRESS(1)                         GMT                        GMTREGRESS(1)

NAME

       gmtregress - Linear regression of 1-D data sets

SYNOPSIS

       gmtregress [ table ] [ -Amin/max/inc ] [ -Clevel ] [ -Ex|y|o|r ]
       [ -Fflags ] [ -N1|2|r|w ] [ -S[r] ] [ -Tmin/max/inc | -Tn ]
       [ -W[w][x][y][r] ] [ -V[level] ] [ -aflags ] [ -bbinary ] [ -dnodata ]
       [ -eregexp ] [ -ggaps ] [ -hheaders ] [ -iflags ] [ -oflags ]

       Note: No space is allowed between the option flag and the associated
       arguments.

DESCRIPTION

       gmtregress reads one or more data tables [or stdin] and determines the
       best linear regression model y = a + b * x for each segment using the
       chosen parameters. The user may specify which data and model
       components should be reported (-F). By default, the model will be
       evaluated at the input points, but alternatively you can specify an
       equidistant range over which to evaluate the model, or turn off
       evaluation completely (-T). Instead of determining the best fit we can
       perform a scan of all possible regression lines (for a range of slope
       angles; see -A) and examine how the chosen misfit measure varies with
       slope. This is particularly useful when analyzing data with many
       outliers. Note: If you actually need to work with log10 of x or y you
       can accomplish that transformation during read by using the -i option.

REQUIRED ARGUMENTS

       None

OPTIONAL ARGUMENTS

       table  One or more ASCII (or binary, see -bi[ncols][type]) data table
              file(s) holding a number of data columns. If no tables are given
              then we read from standard input. The first two columns are
              expected to contain the required x and y data. Depending on
              your -W and -E settings we may expect an additional 1-3 columns
              with error estimates of one or both of the data coordinates, and
              even their correlation.

       -Amin/max/inc
              Instead of determining a best-fit regression we explore the full
              range of regressions. Examine all possible regression lines
              with slope angles between min and max, using steps of inc
              degrees [-90/+90/1]. For each slope the optimum intercept is
              determined based on your regression type (-E) and misfit norm
              (-N) settings. For each segment we report the four columns
              angle, E (the misfit), slope, intercept, for the range of
              specified angles. The best model parameters within this range
              are written into the segment header and reported in verbose mode
              (-V).
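              The scan can be illustrated with a minimal Python sketch. It
              covers only the special case -Ey -N2 and is not GMT's
              implementation: the function name scan_slopes and the sample
              data are hypothetical, and the intercept is taken as the mean
              of y - slope * x, which is the least-squares optimum for
              vertical misfits. The rows hold the same four columns: angle,
              E, slope, intercept.

```python
import math
import statistics


def scan_slopes(x, y, amin, amax, inc):
    """Scan slope angles (degrees) and return (angle, E, slope, intercept) rows.

    Assumes vertical misfits (-Ey) and the mean-squared norm (-N2).
    Angles of exactly +/-90 degrees are not handled (infinite slope).
    """
    rows = []
    n_steps = int(round((amax - amin) / inc))
    for k in range(n_steps + 1):
        angle = amin + k * inc
        slope = math.tan(math.radians(angle))
        # Optimum intercept for this fixed slope under -Ey -N2.
        intercept = statistics.fmean([yi - slope * xi for xi, yi in zip(x, y)])
        # Misfit E: mean of the squared vertical residuals.
        E = statistics.fmean(
            [(yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)]
        )
        rows.append((angle, E, slope, intercept))
    return rows


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 4.0]  # made-up data lying exactly on y = 1 + x
rows = scan_slopes(x, y, 0.0, 80.0, 5.0)
best = min(rows, key=lambda row: row[1])  # row with the smallest misfit
```

              Here best holds the 45-degree row, i.e., slope and intercept
              both close to 1, mirroring the best model that -A reports in the
              segment header.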

       -Clevel
              Set the confidence level (in %) to use for the optional
              calculation of confidence bands on the regression [95]. This is
              only used if -F includes the output column c.

       -Ex|y|o|r
              Type of linear regression, i.e., select the type of misfit we
              should calculate. Choose from x (regress x on y; i.e., the
              misfit is measured horizontally from data point to regression
              line), y (regress y on x; i.e., the misfit is measured
              vertically [Default]), o (orthogonal regression; i.e., the
              misfit is measured from data point orthogonally to the nearest
              point on the line), or r (Reduced Major Axis regression; i.e.,
              the misfit is the product of both vertical and horizontal
              misfits) [y].
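              The four misfit types can be compared for a single point with
              the Python sketch below. It is an illustration of the geometry
              described above, not GMT's code: the function name misfits is
              hypothetical, and taking the absolute value of the product for
              r is an assumption (the text only says "product").

```python
import math


def misfits(x, y, intercept, slope):
    """Misfit of the point (x, y) relative to the line y = intercept + slope * x."""
    ey = y - (intercept + slope * x)          # vertical distance (-Ey)
    ex = x - (y - intercept) / slope          # horizontal distance (-Ex); slope != 0
    eo = ey / math.sqrt(1.0 + slope * slope)  # orthogonal distance (-Eo)
    er = abs(ex * ey)                         # product of the two (-Er), assumed unsigned
    return {"x": ex, "y": ey, "o": eo, "r": er}


# Point (2, 5) against the line y = 1 + x (made-up numbers).
m = misfits(2.0, 5.0, 1.0, 1.0)
```

              For this point the vertical misfit is 2, the horizontal misfit
              is -2, the orthogonal misfit is sqrt(2), and the product is 4.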

       -Fflags
              Append a combination of the columns you wish returned; the
              output order will match the order specified. Choose from x
              (observed x), y (observed y), m (model prediction), r (residual
              = data minus model), c (symmetrical confidence interval on the
              regression; see -C for specifying the level), z (standardized
              residuals or so-called z-scores) and w (outlier weights 0 or 1;
              for -Nw these are the Reweighted Least Squares weights)
              [xymrczw]. As an alternative to evaluating the model, just give
              -Fp and we instead write a single record with the model
              parameters npoints xmean ymean angle misfit slope intercept
              sigma_slope sigma_intercept.

       -N1|2|r|w
              Selects the norm to use for the misfit calculation. Choose
              among 1 (L-1 measure; the mean of the absolute residuals), 2
              (Least-squares; the mean of the squared residuals), r (LMS; the
              least median of the squared residuals), or w (RLS; Reweighted
              Least Squares: the mean of the squared residuals after outliers
              identified via LMS have been removed) [Default is 2].
              Traditional regression uses L-2 while L-1 and in particular LMS
              are more robust in how they handle outliers. As alluded to, RLS
              implies an initial LMS regression which is then used to identify
              outliers in the data, assign these a zero weight, and then redo
              the regression using an L-2 norm.
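              To make the difference between the norms concrete, the Python
              sketch below evaluates each measure for one fixed set of
              residuals. It is a simplified illustration, not GMT's
              implementation: the function name misfit is hypothetical, and
              the outlier rule used for w (a robust scale of 1.4826 times the
              square root of the median squared residual, with a cutoff of
              2.5) is an assumption borrowed from common robust-regression
              practice. A real RLS run would also refit the line after
              removing the outliers, which this sketch does not do.

```python
import math
import statistics


def misfit(residuals, norm="2", cutoff=2.5):
    """Evaluate one misfit measure for a fixed list of residuals."""
    squares = [r * r for r in residuals]
    if norm == "1":  # L-1: mean of the absolute residuals
        return statistics.fmean([abs(r) for r in residuals])
    if norm == "2":  # L-2: mean of the squared residuals
        return statistics.fmean(squares)
    if norm == "r":  # LMS: median of the squared residuals
        return statistics.median(squares)
    if norm == "w":  # RLS-like: drop outliers flagged via the LMS scale
        scale = 1.4826 * math.sqrt(statistics.median(squares))  # assumed rule
        kept = [s for r, s in zip(residuals, squares) if abs(r) <= cutoff * scale]
        return statistics.fmean(kept)
    raise ValueError("norm must be one of '1', '2', 'r', 'w'")


residuals = [0.1, -0.2, 0.1, 0.0, 5.0]  # made-up values with one gross outlier
l1 = misfit(residuals, "1")
l2 = misfit(residuals, "2")
lms = misfit(residuals, "r")
rls = misfit(residuals, "w")
```

              The single outlier dominates the L-2 value (about 5.01) and
              inflates the L-1 value (1.08), while the LMS value (0.01) and
              the outlier-free mean (0.015) barely notice it.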

       -S[r]  Restricts which records will be output. By default all data
              records will be output in the format specified by -F. Use -S to
              exclude data points identified as outliers by the regression.
              Alternatively, use -Sr to reverse this and only output the
              outlier records.

       -Tmin/max/inc | -Tn
              Evaluate the best-fit regression model at the equidistant points
              implied by the arguments. If -Tn is given instead we will reset
              min and max to the extreme x-values for each segment and
              determine inc so that there are exactly n output values for each
              segment. To skip the model evaluation entirely, simply provide
              -T0.
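              The arithmetic behind -Tn can be sketched in Python as follows.
              This is an illustration only: the function name
              equidistant_points is hypothetical, and the sketch assumes n is
              at least 2 so that the increment is (max - min) / (n - 1).

```python
def equidistant_points(x, n):
    """Return n equally spaced values spanning the extreme x-values."""
    if n < 2:
        raise ValueError("this sketch needs n >= 2")
    xmin, xmax = min(x), max(x)
    inc = (xmax - xmin) / (n - 1)  # n points span n - 1 intervals
    return [xmin + k * inc for k in range(n)]


x = [3.0, 0.0, 10.0]  # made-up x-values of one segment
points = equidistant_points(x, 5)
# Evaluate a made-up model y = 1 + 2 * x at those points.
model = [1.0 + 2.0 * p for p in points]
```

              With these values points is 0, 2.5, 5, 7.5, 10, and the model
              is evaluated there instead of at the input x-values.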

       -W[w][x][y][r]
              Specifies weighted regression and which weights will be
              provided. Append x if giving 1-sigma uncertainties in the
              x-observations, y if giving 1-sigma uncertainties in y, and r if
              giving correlations between x and y observations, in the order
              these columns appear in the input (after the two required and
              leading x, y columns). Giving both x and y (and optionally r)
              implies an orthogonal regression; otherwise giving x requires
              -Ex and y requires -Ey. We convert uncertainties in x and y to
              regression weights via the relationship weight = 1/sigma. Use
              -Ww if we should interpret the input columns to have precomputed
              weights instead. Note: residuals with respect to the regression
              line will be scaled by the given weights. Most norms will then
              square this weighted residual (-N1 is the only exception).
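              The effect of the weights on the misfit can be sketched in
              Python for the -Wy case. The function name weighted_misfit is
              hypothetical, and averaging over the number of points is an
              assumption; only the conversion weight = 1/sigma and the scaling
              of residuals by the weights come from the text above.

```python
def weighted_misfit(y, model, sigma_y, norm="2"):
    """Mean misfit of weighted vertical residuals for norm '1' or '2'."""
    weights = [1.0 / s for s in sigma_y]  # weight = 1/sigma
    scaled = [w * (yi - mi) for w, yi, mi in zip(weights, y, model)]
    if norm == "1":
        # -N1 uses the absolute weighted residual (no squaring).
        return sum(abs(e) for e in scaled) / len(scaled)
    # Other norms square the weighted residual; shown here for -N2.
    return sum(e * e for e in scaled) / len(scaled)


# Made-up observations, model predictions, and 1-sigma uncertainties.
y = [2.0, 4.0]
model = [1.0, 2.0]
sigma_y = [0.5, 2.0]
e2 = weighted_misfit(y, model, sigma_y, "2")
e1 = weighted_misfit(y, model, sigma_y, "1")
```

              The second point has the larger raw residual (2 versus 1), but
              its larger uncertainty reduces its weighted residual to 1, while
              the first point's grows to 2.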

       -V[level] (more ...)
              Select verbosity level [c].

       -acol=name[...] (more ...)
              Set aspatial column associations col=name.

       -bi[ncols][t] (more ...)
              Select native binary input.

       -bo[ncols][type] (more ...)
              Select native binary output. [Default is same as input].

       -d[i|o]nodata (more ...)
              Replace input columns that equal nodata with NaN and do the
              reverse on output.

       -e[~]"pattern" | -e[~]/regexp/[i] (more ...)
              Only accept data records that match the given pattern.

       -g[a]x|y|d|X|Y|D|[col]z[+|-]gap[u] (more ...)
              Determine data gaps and line breaks.

       -h[i|o][n][+c][+d][+rremark][+ttitle] (more ...)
              Skip or produce header record(s).

       -icols[+l][+sscale][+ooffset][,...] (more ...)
              Select input columns and transformations (0 is first column).

       -ocols[,...] (more ...)
              Select output columns (0 is first column).

       -^ or just -
              Print a short message about the syntax of the command, then
              exits (NOTE: on Windows just use -).

       -+ or just +
              Print an extensive usage (help) message, including the
              explanation of any module-specific option (but not the GMT
              common options), then exits.

       -? or no arguments
              Print a complete usage (help) message, including the explanation
              of all options, then exits.

ASCII FORMAT PRECISION

       The ASCII output formats of numerical data are controlled by parameters
       in your gmt.conf file. Longitude and latitude are formatted according
       to FORMAT_GEO_OUT, absolute time is under the control of
       FORMAT_DATE_OUT and FORMAT_CLOCK_OUT, whereas general floating point
       values are formatted according to FORMAT_FLOAT_OUT. Be aware that the
       format in effect can lead to loss of precision in ASCII output, which
       can lead to various problems downstream. If you find the output is not
       written with enough precision, consider switching to binary output (-bo
       if available) or specify more decimals using the FORMAT_FLOAT_OUT
       setting.

EXAMPLES

       To do a standard least-squares regression on the x-y data in points.txt
       and return x, y, and model prediction with 99% confidence intervals,
       try

              gmt regress points.txt -Fxymc -C99 > points_regressed.txt

       To just get the slope for the above regression, try

              slope=`gmt regress points.txt -Fp -o5`
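       The choice of -o5 follows from the zero-based column numbering of the
       -Fp record. The Python sketch below spells this out; the list name
       FP_FIELDS and the sample record are hypothetical, and the numbers in
       the record are made up rather than taken from a real run.

```python
# Field order of the single -Fp record, as listed under -F.
FP_FIELDS = [
    "npoints", "xmean", "ymean", "angle", "misfit",
    "slope", "intercept", "sigma_slope", "sigma_intercept",
]

record = "4 1.5 2.5 45 0 1 1 0 0"  # made-up record for illustration
values = [float(token) for token in record.split()]
params = dict(zip(FP_FIELDS, values))

slope_column = FP_FIELDS.index("slope")  # zero-based, as used by -o
slope = values[slope_column]
```

       Counting from zero, slope is column 5 and intercept is column 6, so
       -o6 would return the intercept instead.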

       To do a reweighted least-squares regression on the data rough.txt and
       return x, y, model prediction and the RLS weights, try

              gmt regress rough.txt -Nw -Fxymw > points_regressed.txt

       To do an orthogonal least-squares regression on the data crazy.txt but
       first take the log10 of both x and y, then return x, y, model
       prediction and the normalized residuals (z-scores), try

              gmt regress crazy.txt -Eo -Fxymz -i0-1l > points_regressed.txt

       To examine how the orthogonal LMS misfits vary with angle between 0 and
       90 in steps of 0.2 degrees for the data in points.txt, try

              gmt regress points.txt -A0/90/0.2 -Eo -Nr > points_analysis.txt

REFERENCES

       Draper, N. R., and H. Smith, 1998, Applied regression analysis, 3rd
       ed., 736 pp., John Wiley and Sons, New York.

       Rousseeuw, P. J., and A. M. Leroy, 1987, Robust regression and outlier
       detection, 329 pp., John Wiley and Sons, New York.

       York, D., N. M. Evensen, M. L. Martinez, and J. De Basabe Delgado,
       2004, Unified equations for the slope, intercept, and standard errors
       of the best straight line, Am. J. Phys., 72(3), 367-375.

SEE ALSO

       gmt, trend1d, trend2d

COPYRIGHT

       2019, P. Wessel, W. H. F. Smith, R. Scharroo, J. Luis, and F. Wobbe

5.4.5                            Feb 24, 2019                    GMTREGRESS(1)