math::statistics(n)            Tcl Math Library            math::statistics(n)

______________________________________________________________________________

NAME

8       math::statistics - Basic statistical functions and procedures
9

SYNOPSIS

11       package require Tcl  8
12
13       package require math::statistics  0.5
14
15       ::math::statistics::mean data
16
17       ::math::statistics::min data
18
19       ::math::statistics::max data
20
21       ::math::statistics::number data
22
23       ::math::statistics::stdev data
24
25       ::math::statistics::var data
26
27       ::math::statistics::pstdev data
28
29       ::math::statistics::pvar data
30
31       ::math::statistics::median data
32
33       ::math::statistics::basic-stats data
34
35       ::math::statistics::histogram limits values
36
37       ::math::statistics::corr data1 data2
38
39       ::math::statistics::interval-mean-stdev data confidence
40
41       ::math::statistics::t-test-mean data est_mean est_stdev confidence
42
43       ::math::statistics::test-normal data confidence
44
45       ::math::statistics::lillieforsFit data
46
47       ::math::statistics::quantiles data confidence
48
49       ::math::statistics::quantiles limits counts confidence
50
51       ::math::statistics::autocorr data
52
53       ::math::statistics::crosscorr data1 data2
54
55       ::math::statistics::mean-histogram-limits mean stdev number
56
57       ::math::statistics::minmax-histogram-limits min max number
58
59       ::math::statistics::linear-model xdata ydata intercept
60
61       ::math::statistics::linear-residuals xdata ydata intercept
62
63       ::math::statistics::test-2x2 n11 n21 n12 n22
64
65       ::math::statistics::print-2x2 n11 n21 n12 n22
66
67       ::math::statistics::control-xbar data ?nsamples?
68
69       ::math::statistics::control-Rchart data ?nsamples?
70
71       ::math::statistics::test-xbar control data
72
73       ::math::statistics::test-Rchart control data
74
75       ::math::statistics::tstat dof ?alpha?
76
77       ::math::statistics::mv-wls wt1 weights_and_values
78
79       ::math::statistics::mv-ols values
80
81       ::math::statistics::pdf-normal mean stdev value
82
83       ::math::statistics::pdf-exponential mean value
84
85       ::math::statistics::pdf-uniform xmin xmax value
86
87       ::math::statistics::pdf-gamma alpha beta value
88
89       ::math::statistics::pdf-poisson mu k
90
91       ::math::statistics::pdf-chisquare df value
92
93       ::math::statistics::pdf-student-t df value
94
95       ::math::statistics::pdf-beta a b value
96
97       ::math::statistics::cdf-normal mean stdev value
98
99       ::math::statistics::cdf-exponential mean value
100
101       ::math::statistics::cdf-uniform xmin xmax value
102
103       ::math::statistics::cdf-students-t degrees value
104
105       ::math::statistics::cdf-gamma alpha beta value
106
107       ::math::statistics::cdf-poisson mu k
108
109       ::math::statistics::cdf-beta a b value
110
111       ::math::statistics::random-normal mean stdev number
112
113       ::math::statistics::random-exponential mean number
114
115       ::math::statistics::random-uniform xmin xmax number
116
117       ::math::statistics::random-gamma alpha beta number
118
119       ::math::statistics::random-chisquare df number
120
121       ::math::statistics::random-student-t df number
122
123       ::math::statistics::random-beta a b number
124
125       ::math::statistics::histogram-uniform xmin xmax limits number
126
127       ::math::statistics::incompleteGamma x p ?tol?
128
129       ::math::statistics::incompleteBeta a b x ?tol?
130
131       ::math::statistics::filter varname data expression
132
133       ::math::statistics::map varname data expression
134
135       ::math::statistics::samplescount varname list expression
136
137       ::math::statistics::subdivide
138
139       ::math::statistics::plot-scale canvas xmin xmax ymin ymax
140
141       ::math::statistics::plot-xydata canvas xdata ydata tag
142
143       ::math::statistics::plot-xyline canvas xdata ydata tag
144
145       ::math::statistics::plot-tdata canvas tdata tag
146
147       ::math::statistics::plot-tline canvas tdata tag
148
149       ::math::statistics::plot-histogram canvas counts limits tag
150
151_________________________________________________________________
152

DESCRIPTION

154       The  math::statistics  package  contains  functions  and procedures for
155       basic statistical data analysis, such as:
156
157       ·      Descriptive  statistical  parameters  (mean,  minimum,  maximum,
158              standard deviation)
159
160       ·      Estimates  of  the  distribution  in  the form of histograms and
161              quantiles
162
163       ·      Basic testing of hypotheses
164
       ·      Probability density and cumulative distribution functions
166
       It is meant to help in developing data analysis applications or in doing
       ad hoc data analysis. It is not in itself a full application, nor is it
       intended to rival full (non-)commercial statistical packages.
170
       The purpose of this document is to describe the implemented procedures
       and to provide some examples of their usage. As there is ample
       literature on the algorithms involved, we refer to relevant textbooks
       for more explanation.

       The package contains a fairly large number of public procedures. They
       fall into four sets: general procedures, procedures that deal with
       specific statistical distributions, list procedures to select or
       transform data, and simple plotting procedures (these require Tk).

       Note: The data to be analyzed are always contained in a simple list.
       Missing values are represented as empty list elements.
181
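       Example: a minimal sketch of the list convention described above. The
       data values are made up; the third element is a missing value.

```tcl
package require math::statistics

# Hypothetical measurements; the third element is missing (empty)
set data {1.0 2.0 {} 4.0}

# Only the non-missing elements are counted and averaged
set count [::math::statistics::number $data]
set mean  [::math::statistics::mean $data]

puts "Number of valid data: $count"
puts "Mean of valid data:   $mean"
```

       Writing the missing value as {} keeps the list positions of the other
       elements intact, which matters when two lists are paired up later.
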

GENERAL PROCEDURES

183       The general statistical procedures are:
184
185       ::math::statistics::mean data
186              Determine the mean value of the given list of data.
187
188              list data
189                     - List of data
190
191
192       ::math::statistics::min data
193              Determine the minimum value of the given list of data.
194
195              list data
196                     - List of data
197
198
199       ::math::statistics::max data
200              Determine the maximum value of the given list of data.
201
202              list data
203                     - List of data
204
205
206       ::math::statistics::number data
207              Determine the number of non-missing data in the given list
208
209              list data
210                     - List of data
211
212
213       ::math::statistics::stdev data
214              Determine the sample standard deviation of the data in the given
215              list
216
217              list data
218                     - List of data
219
220
221       ::math::statistics::var data
222              Determine the sample variance of the data in the given list
223
224              list data
225                     - List of data
226
227
228       ::math::statistics::pstdev data
229              Determine the population standard deviation of the data  in  the
230              given list
231
232              list data
233                     - List of data
234
235
236       ::math::statistics::pvar data
237              Determine the population variance of the data in the given list
238
239              list data
240                     - List of data
241
242
243       ::math::statistics::median data
244              Determine  the  median  of the data in the given list (Note that
245              this requires sorting the data, which may be a costly operation)
246
247              list data
248                     - List of data
249
250
251       ::math::statistics::basic-stats data
252              Determine a list of all the descriptive parameters: mean,  mini‐
253              mum,  maximum, number of data, sample standard deviation, sample
254              variance, population standard deviation and population variance.
255
              (This routine is called whenever any or all of the basic
              statistical parameters are required. Hence all calculations are
              done and the relevant values are returned.)
259
260              list data
261                     - List of data
262
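       Example: a minimal sketch that unpacks the list returned by
       basic-stats in the order given above. The data and the variable names
       are made up for the illustration.

```tcl
package require math::statistics

# Made-up sample
set data {2 4 4 4 5 5 7 9}

# Unpack the result in the documented order; the foreach/break idiom
# works on any Tcl 8.x
foreach {mean min max number stdev var pstdev pvar} \
        [::math::statistics::basic-stats $data] {break}

puts "mean = $mean, min = $min, max = $max, n = $number"
puts "sample:     stdev = $stdev, var = $var"
puts "population: stdev = $pstdev, var = $pvar"
```

       The sample variance divides the sum of squared deviations by n-1, the
       population variance by n, so the two differ noticeably for a small
       sample such as this one.
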
263
264       ::math::statistics::histogram limits values
265              Determine histogram information for  the  given  list  of  data.
266              Returns a list consisting of the number of values that fall into
267              each interval.  (The first interval consists of all values lower
268              than  the  first limit, the last interval consists of all values
269              greater than the last limit.  There is one  more  interval  than
270              there are limits.)
271
272              list limits
273                     -  List  of  upper  limits  (in  ascending order) for the
274                     intervals of the histogram.
275
276              list values
277                     - List of data
278
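       Example: a minimal sketch of a histogram with three limits and
       therefore four intervals. The data are made up, and none of the values
       coincides with a limit, so the sketch does not depend on how such ties
       are assigned.

```tcl
package require math::statistics

# Three upper limits define four intervals:
# below 1, 1 to 2, 2 to 3, and above 3
set limits {1.0 2.0 3.0}
set values {0.5 1.5 1.7 2.5 3.5 9.0}

set counts [::math::statistics::histogram $limits $values]
puts "Counts per interval: $counts"
```

       With these data the four intervals should hold one, two, one and two
       values respectively.
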
279
280       ::math::statistics::corr data1 data2
281              Determine the correlation coefficient between two sets of data.
282
283              list data1
284                     - First list of data
285
286              list data2
287                     - Second list of data
288
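       Example: a minimal sketch with two made-up series that are exactly
       proportional, so the correlation coefficient should be 1 up to
       rounding.

```tcl
package require math::statistics

# Made-up series: the second is exactly twice the first
set data1 {1.0 2.0 3.0 4.0 5.0}
set data2 {2.0 4.0 6.0 8.0 10.0}

set r [::math::statistics::corr $data1 $data2]
puts "Correlation coefficient: $r"
```
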
289
290       ::math::statistics::interval-mean-stdev data confidence
291              Return the interval containing the mean value and one containing
292              the  standard  deviation  with  a  certain  level  of confidence
293              (assuming a normal distribution)
294
295              list data
296                     - List of raw data values (small sample)
297
298              float confidence
299                     - Confidence level (0.95 or 0.99 for instance)
300
301
302       ::math::statistics::t-test-mean data est_mean est_stdev confidence
303              Test whether the mean value of a sample is  in  accordance  with
304              the estimated normal distribution with a certain level of confi‐
305              dence.  Returns 1 if the test succeeds  or  0  if  the  mean  is
306              unlikely to fit the given distribution.
307
308              list data
309                     - List of raw data values (small sample)
310
311              float est_mean
312                     - Estimated mean of the distribution
313
314              float est_stdev
315                     - Estimated stdev of the distribution
316
317              float confidence
318                     - Confidence level (0.95 or 0.99 for instance)
319
320
321       ::math::statistics::test-normal data confidence
322              Test  whether the given data follow a normal distribution with a
323              certain level of confidence.  Returns 1 if the data are normally
324              distributed  within  the  level of confidence, returns 0 if not.
325              The underlying test is the Lilliefors test.
326
327              list data
328                     - List of raw data values
329
330              float confidence
331                     - Confidence level (one of 0.80, 0.90, 0.95 or 0.99)
332
333
334       ::math::statistics::lillieforsFit data
335              Returns the goodness of fit to a normal  distribution  according
336              to  Lilliefors.  The higher the number, the more likely the data
337              are indeed normally distributed. The test requires at least five
338              data points.
339
340              list data
341                     - List of raw data values
342
343
344       ::math::statistics::quantiles data confidence
345              Return the quantiles for a given set of data
346
347
348              list data
349                     - List of raw data values
350
351
352              float confidence
353                     - Confidence level (0.95 or 0.99 for instance)
354
355
356
357       ::math::statistics::quantiles limits counts confidence
358              Return the quantiles based on histogram information (alternative
359              to the call with two arguments)
360
361              list limits
362                     - List of upper limits from histogram
363
364              list counts
                     - List of counts for each interval in histogram
366
367              float confidence
368                     -  Confidence level (0.95 or 0.99 for instance)
369
370
371       ::math::statistics::autocorr data
              Return the autocorrelation function as a list of values,
              assuming the samples are equidistant. The list holds about half
              as many values as there are raw data.

              The correlation is determined in such a way that the first value
              is always 1 and all others are equal to or smaller than 1. The
              number of values involved will diminish as the "time" (the index
              in the list of returned values) increases.
380
381              list data
382                     -  Raw  data for which the autocorrelation must be deter‐
383                     mined
384
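       Example: a minimal sketch that prints the autocorrelation function of
       a made-up periodic series, treating the index in the returned list as
       the lag.

```tcl
package require math::statistics

# Made-up periodic series of 12 equidistant samples
set data {1.0 2.0 3.0 4.0 3.0 2.0 1.0 2.0 3.0 4.0 3.0 2.0}

set acf [::math::statistics::autocorr $data]

# The index in the returned list is the lag
set lag 0
foreach value $acf {
    puts [format "lag %2d: %8.4f" $lag $value]
    incr lag
}
```
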
385
386       ::math::statistics::crosscorr data1 data2
              Return the cross-correlation function as a list of values,
              assuming the samples are equidistant. The list holds about half
              as many values as there are raw data.
390
391              The correlation is determined in such a way that the values  can
392              never  exceed 1 in magnitude. The number of values involved will
393              diminish as the "time" (the index in the list of  returned  val‐
394              ues) increases.
395
396              list data1
397                     - First list of data
398
399              list data2
400                     - Second list of data
401
402
403       ::math::statistics::mean-histogram-limits mean stdev number
              Determine reasonable limits based on mean and standard deviation
              for a histogram.

              Convenience function - the result is suitable for the histogram
              function.
407
408              float mean
409                     - Mean of the data
410
411              float stdev
412                     - Standard deviation
413
414              int number
415                     - Number of limits to generate (defaults to 8)
416
417
418       ::math::statistics::minmax-histogram-limits min max number
419              Determine reasonable limits based on a minimum and maximum for a
420              histogram
421
422              Convenience function - the result is suitable for the  histogram
423              function.
424
425              float min
426                     - Expected minimum
427
428              float max
429                     - Expected maximum
430
431              int number
432                     - Number of limits to generate (defaults to 8)
433
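       Example: a minimal sketch that generates limits from an expected range
       and passes them on to the histogram procedure. The data and the choice
       of five limits are made up for the illustration.

```tcl
package require math::statistics

# Made-up data between 0 and 10
set values {0.4 1.1 2.7 3.3 4.8 5.2 6.9 7.1 8.6 9.9}

# Ask for five limits spanning the expected range of the data
set limits [::math::statistics::minmax-histogram-limits 0.0 10.0 5]

# One more interval than there are limits
set counts [::math::statistics::histogram $limits $values]

puts "Limits: $limits"
puts "Counts: $counts"
```
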
434
435       ::math::statistics::linear-model xdata ydata intercept
436              Determine  the  coefficients for a linear regression between two
437              series of data (the model: Y = A  +  B*X).  Returns  a  list  of
438              parameters describing the fit
439
440              list xdata
441                     - List of independent data
442
443              list ydata
444                     - List of dependent data to be fitted
445
446              boolean intercept
447                     - (Optional) compute the intercept (1, default) or fit to
448                     a line through the origin (0)
449
450                     The result consists of the following list:
451
452                     ·      (Estimate of) Intercept A
453
454                     ·      (Estimate of) Slope B
455
456                     ·      Standard deviation of Y relative to fit
457
458                     ·      Correlation coefficient R2
459
460                     ·      Number of degrees of freedom df
461
462                     ·      Standard error of the intercept A
463
464                     ·      Significance level of A
465
466                     ·      Standard error of the slope B
467
468                     ·      Significance level of B
469
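       Example: a minimal sketch of a straight-line fit with an intercept,
       followed by the residuals for the same data. The data are made up and
       scatter slightly around Y = 1 + 2*X; only the first two elements of
       the result list are used.

```tcl
package require math::statistics

# Made-up data scattered around the line y = 1 + 2*x
set xdata {1.0 2.0 3.0 4.0 5.0}
set ydata {3.1 4.9 7.2 8.8 11.0}

# Fit with an intercept (the third argument is spelled out for clarity)
set fit [::math::statistics::linear-model $xdata $ydata 1]

# The first two elements are the estimates of A and B
set A [lindex $fit 0]
set B [lindex $fit 1]
puts [format "Y = %.3f + %.3f*X" $A $B]

# Residuals for the same data and the same model
set residuals [::math::statistics::linear-residuals $xdata $ydata 1]
puts "Residuals: $residuals"
```

       The remaining elements of the result (standard deviation, R2, degrees
       of freedom, standard errors and significance levels) can be picked out
       with lindex in the same way.
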
470
471       ::math::statistics::linear-residuals xdata ydata intercept
              Determine the difference between the actual data and the values
              predicted by the linear model.
474
475              Returns  a  list  of the differences between the actual data and
476              the predicted values.
477
478              list xdata
479                     - List of independent data
480
481              list ydata
482                     - List of dependent data to be fitted
483
484              boolean intercept
485                     - (Optional) compute the intercept (1, default) or fit to
486                     a line through the origin (0)
487
488
489       ::math::statistics::test-2x2 n11 n21 n12 n22
              Determine if two sets of samples, each from a binomial
              distribution, differ significantly or not (implying a different
              parameter).

              Returns the "chi-square" value, which can be used to determine
              the significance.
496
497              int n11
498                     - Number of outcomes with the first value from the  first
499                     sample.
500
501              int n21
502                     - Number of outcomes with the first value from the second
503                     sample.
504
505              int n12
506                     - Number of outcomes with the second value from the first
507                     sample.
508
509              int n22
510                     -  Number of outcomes with the second value from the sec‐
511                     ond sample.
512
513
514       ::math::statistics::print-2x2 n11 n21 n12 n22
              Determine if two sets of samples, each from a binomial
              distribution, differ significantly or not (implying a different
              parameter).
518
519              Returns a short report, useful in an interactive session.
520
521              int n11
522                     - Number of outcomes with the first value from the  first
523                     sample.
524
525              int n21
526                     - Number of outcomes with the first value from the second
527                     sample.
528
529              int n12
530                     - Number of outcomes with the second value from the first
531                     sample.
532
533              int n22
534                     -  Number of outcomes with the second value from the sec‐
535                     ond sample.
536
537
538       ::math::statistics::control-xbar data ?nsamples?
539              Determine the control limits for an xbar chart.  The  number  of
540              data in each subsample defaults to 4. At least 20 subsamples are
541              required.
542
543              Returns the mean, the lower limit, the upper limit and the  num‐
544              ber of data per subsample.
545
546              list data
547                     - List of observed data
548
549              int nsamples
550                     - Number of data per subsample
551
552
553       ::math::statistics::control-Rchart data ?nsamples?
554              Determine  the control limits for an R chart. The number of data
555              in each subsample (nsamples) defaults to 4. At least 20  subsam‐
556              ples are required.
557
558              Returns the mean range, the lower limit, the upper limit and the
559              number of data per subsample.
560
561              list data
562                     - List of observed data
563
564              int nsamples
565                     - Number of data per subsample
566
567
568       ::math::statistics::test-xbar control data
569              Determine if the data exceed the control  limits  for  the  xbar
570              chart.
571
572              Returns a list of subsamples (their indices) that indeed violate
573              the limits.
574
575              list control
576                     - Control limits as returned by the "control-xbar" proce‐
577                     dure
578
579              list data
580                     - List of observed data
581
582
583       ::math::statistics::test-Rchart control data
584              Determine if the data exceed the control limits for the R chart.
585
586              Returns a list of subsamples (their indices) that indeed violate
587              the limits.
588
589              list control
590                     - Control limits as returned by the "control-Rchart" pro‐
591                     cedure
592
593              list data
594                     - List of observed data
595
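       Example: a minimal sketch of the xbar chart workflow. Control limits
       are derived from made-up in-control data (20 subsamples of 4 values),
       after which new observations are checked against them. The generated
       data and the shifted subsample are assumptions of the illustration.

```tcl
package require math::statistics

# Made-up in-control data: 20 subsamples of 4 values each
set data {}
for {set i 0} {$i < 80} {incr i} {
    lappend data [expr {10.0 + 0.5*sin($i)}]
}

# Control limits: mean, lower limit, upper limit, data per subsample
set control [::math::statistics::control-xbar $data 4]
puts "Control limits: $control"

# New observations: the second subsample is shifted far upwards
set newdata {10.1 9.8 10.2 9.9  15.0 15.2 14.9 15.1}
set outside [::math::statistics::test-xbar $control $newdata]
puts "Subsamples outside the limits: $outside"
```

       The R chart procedures (control-Rchart and test-Rchart) are used in
       the same two-step fashion.
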
596

MULTIVARIATE LINEAR REGRESSION

598       Besides  the  linear regression with a single independent variable, the
599       statistics package provides two procedures  for  doing  ordinary  least
600       squares  (OLS)  and weighted least squares (WLS) linear regression with
601       several variables. They were written by Eric Kemp-Benedict.
602
603       In addition to these two, it provides a procedure (tstat) for calculat‐
604       ing the value of the t-statistic for the specified number of degrees of
605       freedom that is required to demonstrate a given level of significance.
606
607       Note: These procedures depend on the math::linearalgebra package.
608
609       Description of the procedures
610
611       ::math::statistics::tstat dof ?alpha?
612              Returns the value of the t-distribution t* satisfying
613
614                  P(t*)  =  1 - alpha/2
615                  P(-t*) =  alpha/2
616
617              for the number of degrees of freedom dof.
618
              Given a sample of normally-distributed data x, with an estimate
              xbar for the mean and sbar for the standard deviation, the
              confidence interval at level 1 - alpha for the estimate of the
              mean can be calculated as
623
624                    ( xbar - t* sbar , xbar + t* sbar)
625
626              The  return  values  from  this  procedure can be compared to an
627              estimated t-statistic to determine whether the  estimated  value
628              of a parameter is significantly different from zero at the given
629              confidence level.
630
631              int dof
632                     Number of degrees of freedom
633
              float alpha
                     Significance level, that is 1 minus the confidence level.
                     Defaults to 0.05.
636
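       Example: a minimal sketch that compares a t-statistic with the
       critical value returned by tstat. The value 2.9 is a hypothetical
       statistic, standing in for an estimate divided by its standard error.

```tcl
package require math::statistics

# Critical value for 10 degrees of freedom at the default alpha of 0.05
set tcrit [::math::statistics::tstat 10]

# Hypothetical t-statistic obtained elsewhere, e.g. estimate / standard error
set tvalue 2.9

if {abs($tvalue) > $tcrit} {
    puts "|t| = $tvalue exceeds $tcrit: significantly different from zero"
} else {
    puts "|t| = $tvalue does not exceed $tcrit: not significant"
}
```
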
637
638       ::math::statistics::mv-wls wt1 weights_and_values
639              Carries out a weighted least squares linear regression  for  the
640              data points provided, with weights assigned to each point.
641
642              The linear model is of the form
643
                  y = b0 + b1 * x1 + b2 * x2 + ... + bN * xN + error
645
646              and each point satisfies
647
648                  yi = b0 + b1 * xi1 + b2 * xi2 + ... + bN * xiN + Residual_i
649
650
651              The procedure returns a list with the following elements:
652
653              ·      The r-squared statistic
654
655              ·      The adjusted r-squared statistic
656
657              ·      A  list containing the estimated coefficients b1, ... bN,
658                     b0 (The constant b0 comes last in the list.)
659
660              ·      A list containing the standard errors of the coefficients
661
662              ·      A list containing the 95% confidence bounds of the  coef‐
663                     ficients, with each set of bounds returned as a list with
664                     two values

       Arguments:

              list weights_and_values
                     A list consisting of: the weight for the first
                     observation, the data for the first observation (as a
                     sublist), the weight for the second observation, the data
                     for the second observation (as a sublist) and so on. The
                     sublists of data are organised as lists of the value of
                     the dependent variable y followed by the independent
                     variables x1, x2 to xN.
674
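       Example: a minimal sketch of how the alternating list of weights and
       observations might be built. It assumes that mv-wls accepts this list
       as its single argument; the observations are made up and unit weights
       are used, so the result can be compared with mv-ols.

```tcl
package require math::statistics

# Made-up observations, each a sublist {y x1 x2}
set observations {
    { 1.0  0.0  1.0}
    { 2.9  1.0  0.5}
    { 5.2  2.0  2.0}
    { 6.8  3.0  1.0}
    { 9.1  4.0  3.0}
    {11.2  5.0  0.0}
}

# Interleave a weight before each observation (assumed calling convention);
# unit weights are used here, so the fit should agree with mv-ols
set weighted {}
foreach obs $observations {
    lappend weighted 1.0 $obs
}

set wls [::math::statistics::mv-wls $weighted]
set ols [::math::statistics::mv-ols $observations]

puts "WLS R-squared: [lindex $wls 0]"
puts "OLS R-squared: [lindex $ols 0]"
puts "WLS coefficients (b1, b2, b0): [lindex $wls 2]"
```

       Replacing the unit weights by, for instance, the inverse variance of
       each observation gives less reliable points a smaller influence on
       the fit.
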
675
676       ::math::statistics::mv-ols values
677              Carries  out an ordinary least squares linear regression for the
678              data points provided.
679
              This procedure simply calls ::math::statistics::mv-wls with the
              weights set to 1.0, and returns the same information.
682
683       Example of the use:
684
       # Store the value of the unicode value for the "+/-" character
       set pm "\u00B1"

       # Provide some data
       set data {{  -.67  14.18  60.03 -7.5  }
                 { 36.97  15.52  34.24 14.61 }
                 {-29.57  21.85  83.36 -7.   }
                 {-16.9   11.79  51.67 -6.56 }
                 { 14.09  16.24  36.97 -12.84}
                 { 31.52  20.93  45.99 -25.4 }
                 { 24.05  20.69  50.27  17.27}
                 { 22.23  16.91  45.07  -4.3 }
                 { 40.79  20.49  38.92  -.73 }
                 {-10.35  17.24  58.77  18.78}}

       # Call the ols routine
       set results [::math::statistics::mv-ols $data]

       # Pretty-print the results
       puts "R-squared: [lindex $results 0]"
       puts "Adj R-squared: [lindex $results 1]"
       puts "Coefficients $pm s.e. -- \[95% confidence interval\]:"
       foreach val [lindex $results 2] se [lindex $results 3] bounds [lindex $results 4] {
           set lb [lindex $bounds 0]
           set ub [lindex $bounds 1]
           puts "   $val $pm $se -- \[$lb to $ub\]"
       }
712
713

STATISTICAL DISTRIBUTIONS

715       In  the  literature  a large number of probability distributions can be
716       found. The statistics package supports:
717
718       ·      The normal or Gaussian distribution
719
720       ·      The uniform distribution - equal probability for all data within
721              a given interval
722
723       ·      The  exponential  distribution  -  useful as a model for certain
724              extreme-value distributions.
725
726       ·      The gamma distribution - based on the incomplete Gamma integral
727
728       ·      The chi-square distribution
729
       ·      Student's t distribution
731
732       ·      The Poisson distribution
733
       ·      PM (still to be done) - binomial, F.
735
736       In principle for each distribution one has procedures for:
737
738       ·      The probability density (pdf-*)
739
       ·      The cumulative probability (cdf-*)
741
742       ·      Quantiles for the given distribution (quantiles-*)
743
744       ·      Histograms for the given distribution (histogram-*)
745
746       ·      List of random values with the given distribution (random-*)
747
748       The following procedures have been implemented:
749
750       ::math::statistics::pdf-normal mean stdev value
751              Return the probability of a given value for a  normal  distribu‐
752              tion with given mean and standard deviation.
753
754              float mean
755                     - Mean value of the distribution
756
757              float stdev
758                     - Standard deviation of the distribution
759
760              float value
761                     - Value for which the probability is required
762
763
764       ::math::statistics::pdf-exponential mean value
765              Return  the probability of a given value for an exponential dis‐
766              tribution with given mean.
767
768              float mean
769                     - Mean value of the distribution
770
771              float value
772                     - Value for which the probability is required
773
774
775       ::math::statistics::pdf-uniform xmin xmax value
776              Return the probability of a given value for a uniform  distribu‐
777              tion with given extremes.
778
779              float xmin
780                     - Minimum value of the distribution
781
              float xmax
783                     - Maximum value of the distribution
784
785              float value
786                     - Value for which the probability is required
787
788
789       ::math::statistics::pdf-gamma alpha beta value
790              Return the probability of a given value for a Gamma distribution
791              with given shape and rate parameters
792
793              float alpha
794                     - Shape parameter
795
796              float beta
797                     - Rate parameter
798
799              float value
800                     - Value for which the probability is required
801
802
803       ::math::statistics::pdf-poisson mu k
804              Return the probability of a given number of occurrences  in  the
805              same  interval  (k)  for  a Poisson distribution with given mean
806              (mu)
807
808              float mu
809                     - Mean number of occurrences
810
              int k  - Number of occurrences
812
813
814       ::math::statistics::pdf-chisquare df value
815              Return the probability of a given value for a chi square distri‐
816              bution with given degrees of freedom
817
818              float df
819                     - Degrees of freedom
820
821              float value
822                     - Value for which the probability is required
823
824
825       ::math::statistics::pdf-student-t df value
826              Return  the  probability of a given value for a Student's t dis‐
827              tribution with given degrees of freedom
828
829              float df
830                     - Degrees of freedom
831
832              float value
833                     - Value for which the probability is required
834
835
836       ::math::statistics::pdf-beta a b value
837              Return the probability of a given value for a Beta  distribution
838              with given shape parameters
839
840              float a
841                     - First shape parameter
842
843              float b
                     - Second shape parameter
845
846              float value
847                     - Value for which the probability is required
848
849
850       ::math::statistics::cdf-normal mean stdev value
851              Return  the cumulative probability of a given value for a normal
852              distribution with given mean and standard deviation, that is the
853              probability for values up to the given one.
854
855              float mean
856                     - Mean value of the distribution
857
858              float stdev
859                     - Standard deviation of the distribution
860
861              float value
862                     - Value for which the probability is required
863
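       Example: a minimal sketch that evaluates the density and the
       cumulative probability of the standard normal distribution, and uses
       the difference of two cumulative probabilities to obtain the
       probability of an interval. The choice of mean 0 and standard
       deviation 1 is an assumption of the illustration.

```tcl
package require math::statistics

# Standard normal distribution: mean 0, standard deviation 1
set mean  0.0
set stdev 1.0

# Density and cumulative probability at the mean
set density [::math::statistics::pdf-normal $mean $stdev 0.0]
set cumul   [::math::statistics::cdf-normal $mean $stdev 0.0]
puts "pdf at 0: $density"
puts "cdf at 0: $cumul"

# Probability of a value between -1 and +1 standard deviations
set p [expr {[::math::statistics::cdf-normal $mean $stdev  1.0] -
             [::math::statistics::cdf-normal $mean $stdev -1.0]}]
puts "P(-1 < x < 1): $p"
```

       The other cdf-* procedures can be combined in the same way to obtain
       the probability of an interval.
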
864
865       ::math::statistics::cdf-exponential mean value
866              Return  the cumulative probability of a given value for an expo‐
867              nential distribution with given mean.
868
869              float mean
870                     - Mean value of the distribution
871
872              float value
873                     - Value for which the probability is required
874
875
876       ::math::statistics::cdf-uniform xmin xmax value
877              Return the cumulative probability of a given value for a uniform
878              distribution with given extremes.
879
880              float xmin
881                     - Minimum value of the distribution
882
              float xmax
884                     - Maximum value of the distribution
885
886              float value
887                     - Value for which the probability is required
888
889
890       ::math::statistics::cdf-students-t degrees value
891              Return the cumulative probability of a given value for a Student's
892              t distribution with the given number of degrees of freedom.
893
894              int degrees
895                     - Number of degrees of freedom
896
897              float value
898                     - Value for which the probability is required
899
900
901       ::math::statistics::cdf-gamma alpha beta value
902              Return the cumulative probability of a given value for  a  Gamma
903              distribution with given shape and rate parameters
904
905              float alpha
906                     - Shape parameter
907
908              float beta
909                     - Rate parameter
910
911              float value
912                     - Value for which the cumulative probability is required
913
914
915       ::math::statistics::cdf-poisson mu k
916              Return  the  cumulative  probability of a given number of occur‐
917              rences in the same interval (k) for a Poisson distribution  with
918              given mean (mu)
919
920              float mu
921                     - Mean number of occurrences
922
923              int k  - Number of occurrences
924
925
926       ::math::statistics::cdf-beta a b value
927              Return  the  cumulative  probability of a given value for a Beta
928              distribution with given shape parameters
929
930              float a
931                     - First shape parameter
932
933              float b
934                     - Second shape parameter
935
936              float value
937                     - Value for which the probability is required
938
939
940       ::math::statistics::random-normal mean stdev number
941              Return a list of "number" random values satisfying a normal dis‐
942              tribution with given mean and standard deviation.
943
944              float mean
945                     - Mean value of the distribution
946
947              float stdev
948                     - Standard deviation of the distribution
949
950              int number
951                     - Number of values to be returned
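
              A minimal sketch of how the generated values might be inspected:
              draw a sample and summarize it with the mean and stdev commands
              from this package. The parameters (mean 5, standard deviation 2,
              1000 values) are arbitrary, and the estimates will differ from
              run to run.

```tcl
package require math::statistics

# Draw 1000 values from a normal distribution (illustrative parameters)
set sample [::math::statistics::random-normal 5.0 2.0 1000]

# Summarize the sample; the estimates vary from run to run
set n [llength $sample]
set m [::math::statistics::mean  $sample]
set s [::math::statistics::stdev $sample]
puts [format "n = %d  mean = %.3f  stdev = %.3f" $n $m $s]
```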
952
953
954       ::math::statistics::random-exponential mean number
955              Return  a  list of "number" random values satisfying an exponen‐
956              tial distribution with given mean.
957
958              float mean
959                     - Mean value of the distribution
960
961              int number
962                     - Number of values to be returned
963
964
965       ::math::statistics::random-uniform xmin xmax number
966              Return a list of "number" random  values  satisfying  a  uniform
967              distribution with given extremes.
968
969              float xmin
970                     - Minimum value of the distribution
971
972              float xmax
973                     - Maximum value of the distribution
974
975              int number
976                     - Number of values to be returned
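
              The following sketch draws a uniform sample and uses the min and
              max commands of this package to confirm that the values stay
              within the requested extremes. The extremes (2 and 7) and the
              sample size are assumptions made for the example.

```tcl
package require math::statistics

set xmin 2.0
set xmax 7.0

# 500 random values between xmin and xmax
set sample [::math::statistics::random-uniform $xmin $xmax 500]

# The smallest and largest values should lie inside the extremes
set lo [::math::statistics::min $sample]
set hi [::math::statistics::max $sample]
puts [format "smallest = %.3f  largest = %.3f" $lo $hi]
```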
977
978
979       ::math::statistics::random-gamma alpha beta number
980              Return  a list of "number" random values satisfying a Gamma dis‐
981              tribution with given shape and rate parameters
982
983              float alpha
984                     - Shape parameter
985
986              float beta
987                     - Rate parameter
988
989              int number
990                     - Number of values to be returned
991
992
993       ::math::statistics::random-chisquare df number
994              Return a list of "number" random values satisfying a chi  square
995              distribution with given degrees of freedom
996
997              float df
998                     - Degrees of freedom
999
1000              int number
1001                     - Number of values to be returned
1002
1003
1004       ::math::statistics::random-student-t df number
1005              Return a list of "number" random values satisfying a Student's t
1006              distribution with given degrees of freedom
1007
1008              float df
1009                     - Degrees of freedom
1010
1011              int number
1012                     - Number of values to be returned
1013
1014
1015       ::math::statistics::random-beta a b number
1016              Return a list of "number" random values satisfying a  Beta  dis‐
1017              tribution with given shape parameters
1018
1019              float a
1020                     - First shape parameter
1021
1022              float b
1023                     - Second shape parameter
1024
1025              int number
1026                     - Number of values to be returned
1027
1028
1029       ::math::statistics::histogram-uniform xmin xmax limits number
1030              Return the expected histogram for a uniform distribution.
1031
1032              float xmin
1033                     - Minimum value of the distribution
1034
1035              float xmax
1036                     - Maximum value of the distribution
1037
1038              list limits
1039                     - Upper limits for the buckets in the histogram
1040
1041              int number
1042                     - Total number of "observations" in the histogram
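
              One possible use of the expected histogram is a side-by-side
              comparison with the observed counts of a random sample. The
              sketch below assumes five equally wide buckets on the interval
              0 to 10 and 1000 observations; these values are illustrative
              only.

```tcl
package require math::statistics

set xmin   0.0
set xmax   10.0
set number 1000
set limits {2.0 4.0 6.0 8.0 10.0}

# Expected counts per bucket for a uniform distribution on [xmin, xmax]
set expected [::math::statistics::histogram-uniform $xmin $xmax $limits $number]

# Observed counts for a random sample of the same size
set sample   [::math::statistics::random-uniform $xmin $xmax $number]
set observed [::math::statistics::histogram $limits $sample]

# Print the two lists next to each other
foreach e $expected o $observed {
   puts "expected: $e\tobserved: $o"
}
```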
1043
1044
1045       ::math::statistics::incompleteGamma x p ?tol?
1046              Evaluate the incomplete Gamma integral
1047
1048                                  1       / x               p-1
1049                    P(p,x) =  --------   |   dt exp(-t) * t
1050                              Gamma(p)  / 0
1051
1052
1053              float x
1054                     - Value of x (limit of the integral)
1055
1056              float p
1057                     - Value of p in the integrand
1058
1059              float tol
1060                     - Required tolerance (default: 1.0e-9)
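
              For p = 1 the integrand reduces to exp(-t) and Gamma(1) = 1, so
              the integral above simplifies to P(1,x) = 1 - exp(-x). The sketch
              below uses this special case as a check; the value x = 2.5 is an
              arbitrary choice.

```tcl
package require math::statistics

set x 2.5

# Incomplete Gamma integral with p = 1 and the default tolerance
set result [::math::statistics::incompleteGamma $x 1.0]

# Closed form for p = 1
set expected [expr {1.0 - exp(-$x)}]

puts [format "incompleteGamma: %.8f  closed form: %.8f" $result $expected]
```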
1061
1062
1063       ::math::statistics::incompleteBeta a b x ?tol?
1064              Evaluate the incomplete Beta integral
1065
1066              float a
1067                     - First shape parameter
1068
1069              float b
1070                     - Second shape parameter
1071
1072              float x
1073                     - Value of x (limit of the integral)
1074
1075              float tol
1076                     - Required tolerance (default: 1.0e-9)
1077
1078
1079       TO DO: more function descriptions to be added
1080

DATA MANIPULATION

1082       The data manipulation procedures act on lists or lists of lists:
1083
1084       ::math::statistics::filter varname data expression
1085              Return  a  list  consisting  of  the  data for which the logical
1086              expression is true (this command works analogously to  the  com‐
1087              mand foreach).
1088
1089              string varname
1090                     - Name of the variable used in the expression
1091
1092              list data
1093                     - List of data
1094
1095              string expression
1096                     - Logical expression using the variable name
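
              A short sketch of filter, assuming a small list of integers and a
              variable named x (both chosen for illustration): only the
              elements for which the expression holds are kept.

```tcl
package require math::statistics

set data {1 2 3 4 5}

# Keep the elements that are larger than 2
set big [::math::statistics::filter x $data {$x > 2}]
puts $big
```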
1097
1098
1099       ::math::statistics::map varname data expression
1100              Return  a  list  consisting of the data that are transformed via
1101              the expression.
1102
1103              string varname
1104                     - Name of the variable used in the expression
1105
1106              list data
1107                     - List of data
1108
1109              string expression
1110                     - Expression to be used to transform (map) the data
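
              As an illustration, the sketch below squares every element of a
              small list. The variable name x and the data are assumptions
              made for the example.

```tcl
package require math::statistics

set data {1 2 3}

# Replace every element by its square
set squares [::math::statistics::map x $data {$x*$x}]
puts $squares
```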
1111
1112
1113       ::math::statistics::samplescount varname list expression
1114              Return a list consisting of the counts of all data in  the  sub‐
1115              lists of the "list" argument for which the expression is true.
1116
1117              string varname
1118                     - Name of the variable used in the expression
1119
1120              list list
1121                     - List of sublists, each containing the data
1122
1123              string expression
1124                     -  Logical  expression  to  test  the  data  (defaults to
1125                     "true").
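
              The sketch below counts, per sublist, how many values exceed 4.
              The three sublists and the threshold are made up for the
              example.

```tcl
package require math::statistics

# Three samples of different sizes
set samples {{1 5 9} {2 3} {7 8 6 4}}

# Number of values larger than 4 in each sample
set counts [::math::statistics::samplescount x $samples {$x > 4}]
puts $counts
```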
1126
1127
1128       ::math::statistics::subdivide
1129              Routine PM - not implemented yet
1130

PLOT PROCEDURES

1132       The following simple plotting procedures are available:
1133
1134       ::math::statistics::plot-scale canvas xmin xmax ymin ymax
1135              Set the scale for a plot in the given canvas. All plot  routines
1136              expect  this  function to be called first. There is no automatic
1137              scaling provided.
1138
1139              widget canvas
1140                     - Canvas widget to use
1141
1142              float xmin
1143                     - Minimum x value
1144
1145              float xmax
1146                     - Maximum x value
1147
1148              float ymin
1149                     - Minimum y value
1150
1151              float ymax
1152                     - Maximum y value
1153
1154
1155       ::math::statistics::plot-xydata canvas xdata ydata tag
1156              Create a simple XY plot in the given canvas - the data are shown
1157              as  a  collection of dots. The tag can be used to manipulate the
1158              appearance.
1159
1160              widget canvas
1161                     - Canvas widget to use
1162
1163              list xdata
1164                     - Series of independent data
1165
1166              list ydata
1167                     - Series of dependent data
1168
1169              string tag
1170                     - Tag to give to the plotted data (defaults to xyplot)
1171
1172
1173       ::math::statistics::plot-xyline canvas xdata ydata tag
1174              Create a simple XY plot in the given canvas - the data are shown
1175              as a line through the data points. The tag can be used to manip‐
1176              ulate the appearance.
1177
1178              widget canvas
1179                     - Canvas widget to use
1180
1181              list xdata
1182                     - Series of independent data
1183
1184              list ydata
1185                     - Series of dependent data
1186
1187              string tag
1188                     - Tag to give to the plotted data (defaults to xyplot)
1189
1190
1191       ::math::statistics::plot-tdata canvas tdata tag
1192              Create a simple XY plot in the given canvas - the data are shown
1193              as  a  collection of dots. The horizontal coordinate is equal to
1194              the index. The tag can be used  to  manipulate  the  appearance.
1195              This  type of presentation is suitable for autocorrelation func‐
1196              tions for instance or for inspecting the  time-dependent  behav‐
1197              iour.
1198
1199              widget canvas
1200                     - Canvas widget to use
1201
1202              list tdata
1203                     - Series of dependent data
1204
1205              string tag
1206                     - Tag to give to the plotted data (defaults to xyplot)
1207
1208
1209       ::math::statistics::plot-tline canvas tdata tag
1210              Create a simple XY plot in the given canvas - the data are shown
1211              as a line. See plot-tdata for an explanation.
1212
1213              widget canvas
1214                     - Canvas widget to use
1215
1216              list tdata
1217                     - Series of dependent data
1218
1219              string tag
1220                     - Tag to give to the plotted data (defaults to xyplot)
1221
1222
1223       ::math::statistics::plot-histogram canvas counts limits tag
1224              Create a simple histogram in the given canvas
1225
1226              widget canvas
1227                     - Canvas widget to use
1228
1229              list counts
1230                     - Series of bucket counts
1231
1232              list limits
1233                     - Series of upper limits for the buckets
1234
1235              string tag
1236                     - Tag to give to the plotted data (defaults to xyplot)
1237
1238

THINGS TO DO

1240       The following procedures are yet to be implemented:
1241
1242       ·      F-test-stdev
1243
1244       ·      interval-mean-stdev
1245
1246       ·      histogram-normal
1247
1248       ·      histogram-exponential
1249
1250       ·      test-histogram
1251
1252       ·      test-corr
1253
1254       ·      quantiles-*
1255
1256       ·      fourier-coeffs
1257
1258       ·      fourier-residuals
1259
1260       ·      onepar-function-fit
1261
1262       ·      onepar-function-residuals
1263
1264       ·      plot-linear-model
1265
1266       ·      subdivide
1267

EXAMPLES

1269       The code below is a small example of how you can examine a set of data:
1270
       # Simple example:
       # - Generate data (as a cheap way of getting some)
       # - Perform statistical analysis to describe the data
       #
       package require math::statistics

       #
       # Two auxiliary procs
       #
       proc pause {time} {
          # vwait resolves the variable name at global scope, so use ::wait throughout
          set ::wait 0
          after [expr {int($time*1000)}] {set ::wait 1}
          vwait ::wait
       }

       proc print-histogram {counts limits} {
          # The last count has no upper limit: it holds the values above the last limit
          foreach count $counts limit $limits {
             if { $limit != {} } {
                puts [format "<%12.4g\t%d" $limit $count]
                set prev_limit $limit
             } else {
                puts [format ">%12.4g\t%d" $prev_limit $count]
             }
          }
       }

       #
       # Our source of arbitrary data
       #
       proc generateData { data1 data2 } {
          upvar 1 $data1 _data1
          upvar 1 $data2 _data2

          set d1 0.0
          set d2 0.0
          for { set i 0 } { $i < 100 } { incr i } {
             set d1 [expr {10.0-2.0*cos(2.0*3.1415926*$i/24.0)+3.5*rand()}]
             set d2 [expr {0.7*$d2+0.3*$d1+0.7*rand()}]
             lappend _data1 $d1
             lappend _data2 $d2
          }
          return {}
       }

       #
       # The analysis session
       #
       package require Tk
       catch { console show }   ;# the console command is not available on all platforms
       canvas .plot1
       canvas .plot2
       pack   .plot1 .plot2 -fill both -side top

       generateData data1 data2

       puts "Basic statistics:"
       set b1 [::math::statistics::basic-stats $data1]
       set b2 [::math::statistics::basic-stats $data2]
       foreach label {mean min max number stdev var} v1 $b1 v2 $b2 {
          puts "$label\t$v1\t$v2"
       }
       puts "Plot the data as function of \"time\" and against each other"
       ::math::statistics::plot-scale .plot1  0 100  0 20
       ::math::statistics::plot-scale .plot2  0 20   0 20
       ::math::statistics::plot-tline .plot1 $data1
       ::math::statistics::plot-tline .plot1 $data2
       ::math::statistics::plot-xydata .plot2 $data1 $data2

       puts "Correlation coefficient:"
       puts [::math::statistics::corr $data1 $data2]

       pause 2
       puts "Plot histograms"
       ::math::statistics::plot-scale .plot2  0 20 0 100
       set limits         [::math::statistics::minmax-histogram-limits 7 16]
       set histogram_data [::math::statistics::histogram $limits $data1]
       ::math::statistics::plot-histogram .plot2 $histogram_data $limits

       puts "First series:"
       print-histogram $histogram_data $limits

       pause 2
       set limits         [::math::statistics::minmax-histogram-limits 0 15 10]
       set histogram_data [::math::statistics::histogram $limits $data2]
       ::math::statistics::plot-histogram .plot2 $histogram_data $limits d2

       puts "Second series:"
       print-histogram $histogram_data $limits

       # map expects the variable name (x) as its first argument
       puts "Autocorrelation function:"
       set  autoc [::math::statistics::autocorr $data1]
       puts [::math::statistics::map x $autoc {[format "%.2f" $x]}]
       puts "Cross-correlation function:"
       set  crossc [::math::statistics::crosscorr $data1 $data2]
       puts [::math::statistics::map x $crossc {[format "%.2f" $x]}]

       ::math::statistics::plot-scale .plot1  0 100 -1  4
       ::math::statistics::plot-tline .plot1  $autoc "autoc"
       ::math::statistics::plot-tline .plot1  $crossc "crossc"

       puts "Quantiles: 0.1, 0.2, 0.5, 0.8, 0.9"
       puts "First:  [::math::statistics::quantiles $data1 {0.1 0.2 0.5 0.8 0.9}]"
       puts "Second: [::math::statistics::quantiles $data2 {0.1 0.2 0.5 0.8 0.9}]"
1375
1376       If you run this example, then the following should be clear:
1377
1378       ·      There is a strong correlation between the two time series, as
1379              displayed by the raw data and especially by the correlation
1380              functions.
1381
1382       ·      Both time series show a significant periodic component.
1383
1384       ·      The histograms are not very useful in identifying the nature  of
1385              the time series - they do not show the periodic nature.
1386

BUGS, IDEAS, FEEDBACK

1388       This document, and the package it describes, will undoubtedly contain
1389       bugs and other problems. Please report such in the category
1390       math :: statistics of the Tcllib SF Trackers
1391       [http://sourceforge.net/tracker/?group_id=12883]. Please also report any
1392       ideas for enhancements you may have for either package and/or documentation.
1393

KEYWORDS

1395       data analysis, mathematics, statistics
1396
1397
1398
1399math                                  0.5                  math::statistics(n)