1math::statistics(n) Tcl Math Library math::statistics(n)
2
3
4
5______________________________________________________________________________
6
8 math::statistics - Basic statistical functions and procedures
9
11 package require Tcl 8
12
13 package require math::statistics 0.5
14
15 ::math::statistics::mean data
16
17 ::math::statistics::min data
18
19 ::math::statistics::max data
20
21 ::math::statistics::number data
22
23 ::math::statistics::stdev data
24
25 ::math::statistics::var data
26
27 ::math::statistics::pstdev data
28
29 ::math::statistics::pvar data
30
31 ::math::statistics::median data
32
33 ::math::statistics::basic-stats data
34
35 ::math::statistics::histogram limits values
36
37 ::math::statistics::corr data1 data2
38
39 ::math::statistics::interval-mean-stdev data confidence
40
41 ::math::statistics::t-test-mean data est_mean est_stdev confidence
42
43 ::math::statistics::test-normal data confidence
44
45 ::math::statistics::lillieforsFit data
46
47 ::math::statistics::quantiles data confidence
48
49 ::math::statistics::quantiles limits counts confidence
50
51 ::math::statistics::autocorr data
52
53 ::math::statistics::crosscorr data1 data2
54
55 ::math::statistics::mean-histogram-limits mean stdev number
56
57 ::math::statistics::minmax-histogram-limits min max number
58
59 ::math::statistics::linear-model xdata ydata intercept
60
61 ::math::statistics::linear-residuals xdata ydata intercept
62
63 ::math::statistics::test-2x2 n11 n21 n12 n22
64
65 ::math::statistics::print-2x2 n11 n21 n12 n22
66
67 ::math::statistics::control-xbar data ?nsamples?
68
69 ::math::statistics::control-Rchart data ?nsamples?
70
71 ::math::statistics::test-xbar control data
72
73 ::math::statistics::test-Rchart control data
74
75 ::math::statistics::tstat dof ?alpha?
76
77 ::math::statistics::mv-wls wt1 weights_and_values
78
79 ::math::statistics::mv-ols values
80
81 ::math::statistics::pdf-normal mean stdev value
82
83 ::math::statistics::pdf-exponential mean value
84
85 ::math::statistics::pdf-uniform xmin xmax value
86
87 ::math::statistics::pdf-gamma alpha beta value
88
89 ::math::statistics::pdf-poisson mu k
90
91 ::math::statistics::pdf-chisquare df value
92
93 ::math::statistics::pdf-student-t df value
94
95 ::math::statistics::pdf-beta a b value
96
97 ::math::statistics::cdf-normal mean stdev value
98
99 ::math::statistics::cdf-exponential mean value
100
101 ::math::statistics::cdf-uniform xmin xmax value
102
103 ::math::statistics::cdf-students-t degrees value
104
105 ::math::statistics::cdf-gamma alpha beta value
106
107 ::math::statistics::cdf-poisson mu k
108
109 ::math::statistics::cdf-beta a b value
110
111 ::math::statistics::random-normal mean stdev number
112
113 ::math::statistics::random-exponential mean number
114
115 ::math::statistics::random-uniform xmin xmax number
116
117 ::math::statistics::random-gamma alpha beta number
118
119 ::math::statistics::random-chisquare df number
120
121 ::math::statistics::random-student-t df number
122
123 ::math::statistics::random-beta a b number
124
125 ::math::statistics::histogram-uniform xmin xmax limits number
126
127 ::math::statistics::incompleteGamma x p ?tol?
128
129 ::math::statistics::incompleteBeta a b x ?tol?
130
131 ::math::statistics::filter varname data expression
132
133 ::math::statistics::map varname data expression
134
135 ::math::statistics::samplescount varname list expression
136
137 ::math::statistics::subdivide
138
139 ::math::statistics::plot-scale canvas xmin xmax ymin ymax
140
141 ::math::statistics::plot-xydata canvas xdata ydata tag
142
143 ::math::statistics::plot-xyline canvas xdata ydata tag
144
145 ::math::statistics::plot-tdata canvas tdata tag
146
147 ::math::statistics::plot-tline canvas tdata tag
148
149 ::math::statistics::plot-histogram canvas counts limits tag
150
151_________________________________________________________________
152
154 The math::statistics package contains functions and procedures for
155 basic statistical data analysis, such as:
156
157 · Descriptive statistical parameters (mean, minimum, maximum,
158 standard deviation)
159
160 · Estimates of the distribution in the form of histograms and
161 quantiles
162
163 · Basic testing of hypotheses
164
· Probability density and cumulative distribution functions
166
It is meant to help in developing data analysis applications or doing
ad hoc data analysis. It is not in itself a full application, nor is it
intended to rival full (non-)commercial statistical packages.
170
The purpose of this document is to describe the implemented procedures
and provide some examples of their usage. As there is ample literature
on the algorithms involved, we refer to relevant textbooks for more
explanation. The package contains a fairly large number of public
procedures. They fall into four sets: general procedures, procedures
that deal with specific statistical distributions, list procedures to
select or transform data, and simple plotting procedures (these require
Tk). Note: The data to be analyzed are always contained in a simple
list. Missing values are represented as empty list elements.
181
183 The general statistical procedures are:
184
185 ::math::statistics::mean data
186 Determine the mean value of the given list of data.
187
188 list data
189 - List of data
190
191
192 ::math::statistics::min data
193 Determine the minimum value of the given list of data.
194
195 list data
196 - List of data
197
198
199 ::math::statistics::max data
200 Determine the maximum value of the given list of data.
201
202 list data
203 - List of data
204
205
206 ::math::statistics::number data
207 Determine the number of non-missing data in the given list
208
209 list data
210 - List of data
211
212
213 ::math::statistics::stdev data
214 Determine the sample standard deviation of the data in the given
215 list
216
217 list data
218 - List of data
219
220
221 ::math::statistics::var data
222 Determine the sample variance of the data in the given list
223
224 list data
225 - List of data
226
227
228 ::math::statistics::pstdev data
229 Determine the population standard deviation of the data in the
230 given list
231
232 list data
233 - List of data
234
235
236 ::math::statistics::pvar data
237 Determine the population variance of the data in the given list
238
239 list data
240 - List of data
241
242
243 ::math::statistics::median data
244 Determine the median of the data in the given list (Note that
245 this requires sorting the data, which may be a costly operation)
246
247 list data
248 - List of data
249
250
251 ::math::statistics::basic-stats data
252 Determine a list of all the descriptive parameters: mean, mini‐
253 mum, maximum, number of data, sample standard deviation, sample
254 variance, population standard deviation and population variance.
255
(This routine is called whenever any or all of the basic
statistical parameters are required. Hence all calculations are
done and the relevant values are returned.)
259
260 list data
261 - List of data
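
As a brief sketch of the descriptive procedures above (the sample values are made up for illustration, and lassign assumes Tcl 8.5 or later), note that an empty list element is treated as a missing value and is skipped:

```tcl
package require math::statistics

# Hypothetical sample with one missing value (the empty element)
set data {2.0 4.0 {} 6.0 8.0}

set mean   [::math::statistics::mean   $data]
set min    [::math::statistics::min    $data]
set max    [::math::statistics::max    $data]
set number [::math::statistics::number $data]
set var    [::math::statistics::var    $data]
set pvar   [::math::statistics::pvar   $data]

puts "mean=$mean min=$min max=$max n=$number var=$var pvar=$pvar"

# basic-stats returns all parameters at once, in the documented order
lassign [::math::statistics::basic-stats $data] \
    bMean bMin bMax bNumber bStdev bVar bPstdev bPvar
puts "basic-stats: mean=$bMean n=$bNumber stdev=$bStdev"
```

The sample variance divides by the number of non-missing data minus one, the population variance by the number of non-missing data itself, so the two differ noticeably for small samples.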
262
263
264 ::math::statistics::histogram limits values
265 Determine histogram information for the given list of data.
266 Returns a list consisting of the number of values that fall into
267 each interval. (The first interval consists of all values lower
268 than the first limit, the last interval consists of all values
269 greater than the last limit. There is one more interval than
270 there are limits.)
271
272 list limits
273 - List of upper limits (in ascending order) for the
274 intervals of the histogram.
275
276 list values
277 - List of data
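
A minimal sketch of a histogram call follows. The limits and values are invented for the example, and none of the values coincides with a limit, so the result does not depend on how boundary values are assigned:

```tcl
package require math::statistics

# Two upper limits give three intervals:
#   below 2.0, between 2.0 and 4.0, above 4.0
set limits {2.0 4.0}
set values {1.0 1.5 3.0 5.0 6.0 7.0}

set counts [::math::statistics::histogram $limits $values]
puts "counts per interval: $counts"
```

The returned list has one more element than the list of limits, because the last interval collects everything above the final limit.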
278
279
280 ::math::statistics::corr data1 data2
281 Determine the correlation coefficient between two sets of data.
282
283 list data1
284 - First list of data
285
286 list data2
287 - Second list of data
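
For illustration only, the following sketch computes the correlation coefficient for two made-up series that are exactly linearly related, once with a positive and once with a negative slope:

```tcl
package require math::statistics

set x     {1.0 2.0 3.0 4.0 5.0}
set yUp   {2.0 4.0 6.0 8.0 10.0}   ;# y = 2x
set yDown {10.0 8.0 6.0 4.0 2.0}   ;# y = 12 - 2x

set rUp   [::math::statistics::corr $x $yUp]
set rDown [::math::statistics::corr $x $yDown]

puts "r (increasing) = $rUp"
puts "r (decreasing) = $rDown"
```

Up to rounding, the first coefficient should be 1 and the second -1; real data will give values in between.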
288
289
290 ::math::statistics::interval-mean-stdev data confidence
291 Return the interval containing the mean value and one containing
292 the standard deviation with a certain level of confidence
293 (assuming a normal distribution)
294
295 list data
296 - List of raw data values (small sample)
297
298 float confidence
299 - Confidence level (0.95 or 0.99 for instance)
300
301
302 ::math::statistics::t-test-mean data est_mean est_stdev confidence
303 Test whether the mean value of a sample is in accordance with
304 the estimated normal distribution with a certain level of confi‐
305 dence. Returns 1 if the test succeeds or 0 if the mean is
306 unlikely to fit the given distribution.
307
308 list data
309 - List of raw data values (small sample)
310
311 float est_mean
312 - Estimated mean of the distribution
313
314 float est_stdev
315 - Estimated stdev of the distribution
316
317 float confidence
318 - Confidence level (0.95 or 0.99 for instance)
319
320
321 ::math::statistics::test-normal data confidence
322 Test whether the given data follow a normal distribution with a
323 certain level of confidence. Returns 1 if the data are normally
324 distributed within the level of confidence, returns 0 if not.
325 The underlying test is the Lilliefors test.
326
327 list data
328 - List of raw data values
329
330 float confidence
331 - Confidence level (one of 0.80, 0.90, 0.95 or 0.99)
332
333
334 ::math::statistics::lillieforsFit data
335 Returns the goodness of fit to a normal distribution according
336 to Lilliefors. The higher the number, the more likely the data
337 are indeed normally distributed. The test requires at least five
338 data points.
339
340 list data
341 - List of raw data values
342
343
344 ::math::statistics::quantiles data confidence
345 Return the quantiles for a given set of data
346
347
348 list data
349 - List of raw data values
350
351
352 float confidence
353 - Confidence level (0.95 or 0.99 for instance)
354
355
356
357 ::math::statistics::quantiles limits counts confidence
358 Return the quantiles based on histogram information (alternative
359 to the call with two arguments)
360
361 list limits
362 - List of upper limits from histogram
363
364 list counts
- List of counts for each interval in histogram
366
367 float confidence
368 - Confidence level (0.95 or 0.99 for instance)
369
370
371 ::math::statistics::autocorr data
Return the autocorrelation function as a list of values (assuming
equidistance between samples). The list contains about half as
many values as there are raw data.
375
376 The correlation is determined in such a way that the first value
377 is always 1 and all others are equal to or smaller than 1. The
378 number of values involved will diminish as the "time" (the index
379 in the list of returned values) increases
380
381 list data
382 - Raw data for which the autocorrelation must be deter‐
383 mined
384
385
386 ::math::statistics::crosscorr data1 data2
Return the cross-correlation function as a list of values
(assuming equidistance between samples). The list contains about
half as many values as there are raw data.
390
391 The correlation is determined in such a way that the values can
392 never exceed 1 in magnitude. The number of values involved will
393 diminish as the "time" (the index in the list of returned val‐
394 ues) increases.
395
396 list data1
397 - First list of data
398
399 list data2
400 - Second list of data
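
The sketch below applies autocorr and crosscorr to a synthetic sine series. The series, its length and the lag of three samples are arbitrary choices for the example:

```tcl
package require math::statistics

# Synthetic, equidistant "time series"
set data {}
set shifted {}
for {set i 0} {$i < 20} {incr i} {
    lappend data    [expr {sin(0.5 * $i)}]
    lappend shifted [expr {sin(0.5 * ($i + 3))}]
}

set acf [::math::statistics::autocorr $data]
set ccf [::math::statistics::crosscorr $data $shifted]

puts "autocorrelation at lag 0: [lindex $acf 0]"
puts "number of lags returned:  [llength $acf]"
puts "cross-correlation values: $ccf"
```

The first autocorrelation value should be 1, as stated above. Later values are based on fewer and fewer data pairs and are therefore less reliable.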
401
402
403 ::math::statistics::mean-histogram-limits mean stdev number
Determine reasonable limits based on mean and standard deviation
for a histogram.

Convenience function - the result is suitable for the histogram
function.
407
408 float mean
409 - Mean of the data
410
411 float stdev
412 - Standard deviation
413
414 int number
415 - Number of limits to generate (defaults to 8)
416
417
418 ::math::statistics::minmax-histogram-limits min max number
419 Determine reasonable limits based on a minimum and maximum for a
420 histogram
421
422 Convenience function - the result is suitable for the histogram
423 function.
424
425 float min
426 - Expected minimum
427
428 float max
429 - Expected maximum
430
431 int number
432 - Number of limits to generate (defaults to 8)
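
As a rough sketch, the generated limits can be passed straight to the histogram procedure. The range 0 to 10, the five limits and the data values are assumptions made for this example:

```tcl
package require math::statistics

# Generate five limits for data expected to lie between 0 and 10
set limits [::math::statistics::minmax-histogram-limits 0.0 10.0 5]

# Hypothetical observations
set values {0.5 1.2 3.3 4.8 5.1 7.7 9.9}

set counts [::math::statistics::histogram $limits $values]

puts "limits: $limits"
puts "counts: $counts"
```

Because histogram returns one more count than there are limits, the two lists printed here differ in length by one.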
433
434
435 ::math::statistics::linear-model xdata ydata intercept
436 Determine the coefficients for a linear regression between two
437 series of data (the model: Y = A + B*X). Returns a list of
438 parameters describing the fit
439
440 list xdata
441 - List of independent data
442
443 list ydata
444 - List of dependent data to be fitted
445
446 boolean intercept
447 - (Optional) compute the intercept (1, default) or fit to
448 a line through the origin (0)
449
450 The result consists of the following list:
451
452 · (Estimate of) Intercept A
453
454 · (Estimate of) Slope B
455
456 · Standard deviation of Y relative to fit
457
458 · Correlation coefficient R2
459
460 · Number of degrees of freedom df
461
462 · Standard error of the intercept A
463
464 · Significance level of A
465
466 · Standard error of the slope B
467
468 · Significance level of B
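
A small sketch of a regression with intercept is shown below. The data are invented and scatter around the line Y = 2*X; only the first two elements of the result (intercept and slope) are used here:

```tcl
package require math::statistics

set xdata {1.0 2.0 3.0 4.0 5.0}
set ydata {2.1 3.9 6.2 7.8 10.1}

# The intercept argument is left out, so the intercept is estimated as well
set fit [::math::statistics::linear-model $xdata $ydata]
set A [lindex $fit 0]
set B [lindex $fit 1]
puts [format "Y = %.3f + %.3f * X" $A $B]

# Differences between the observations and the fitted line
set residuals [::math::statistics::linear-residuals $xdata $ydata]
puts "residuals: $residuals"
```

For a least-squares fit with intercept the residuals sum to zero up to rounding, which is a quick sanity check on the result.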
469
470
471 ::math::statistics::linear-residuals xdata ydata intercept
472 Determine the difference between actual data and predicted from
473 the linear model.
474
475 Returns a list of the differences between the actual data and
476 the predicted values.
477
478 list xdata
479 - List of independent data
480
481 list ydata
482 - List of dependent data to be fitted
483
484 boolean intercept
485 - (Optional) compute the intercept (1, default) or fit to
486 a line through the origin (0)
487
488
489 ::math::statistics::test-2x2 n11 n21 n12 n22
Determine if two sets of samples, each from a binomial distribution,
differ significantly or not (implying a different parameter).

Returns the "chi-square" value, which can be used to determine
the significance.
496
497 int n11
498 - Number of outcomes with the first value from the first
499 sample.
500
501 int n21
502 - Number of outcomes with the first value from the second
503 sample.
504
505 int n12
506 - Number of outcomes with the second value from the first
507 sample.
508
509 int n22
510 - Number of outcomes with the second value from the sec‐
511 ond sample.
512
513
514 ::math::statistics::print-2x2 n11 n21 n12 n22
Determine if two sets of samples, each from a binomial distribution,
differ significantly or not (implying a different parameter).
518
519 Returns a short report, useful in an interactive session.
520
521 int n11
522 - Number of outcomes with the first value from the first
523 sample.
524
525 int n21
526 - Number of outcomes with the first value from the second
527 sample.
528
529 int n12
530 - Number of outcomes with the second value from the first
531 sample.
532
533 int n22
534 - Number of outcomes with the second value from the sec‐
535 ond sample.
536
537
538 ::math::statistics::control-xbar data ?nsamples?
539 Determine the control limits for an xbar chart. The number of
540 data in each subsample defaults to 4. At least 20 subsamples are
541 required.
542
543 Returns the mean, the lower limit, the upper limit and the num‐
544 ber of data per subsample.
545
546 list data
547 - List of observed data
548
549 int nsamples
550 - Number of data per subsample
551
552
553 ::math::statistics::control-Rchart data ?nsamples?
554 Determine the control limits for an R chart. The number of data
555 in each subsample (nsamples) defaults to 4. At least 20 subsam‐
556 ples are required.
557
558 Returns the mean range, the lower limit, the upper limit and the
559 number of data per subsample.
560
561 list data
562 - List of observed data
563
564 int nsamples
565 - Number of data per subsample
566
567
568 ::math::statistics::test-xbar control data
569 Determine if the data exceed the control limits for the xbar
570 chart.
571
572 Returns a list of subsamples (their indices) that indeed violate
573 the limits.
574
575 list control
576 - Control limits as returned by the "control-xbar" proce‐
577 dure
578
579 list data
580 - List of observed data
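
The following sketch derives control limits from a synthetic series and then checks the same series against them. The 80 values (20 subsamples of the default size 4) are generated purely for illustration, and lassign assumes Tcl 8.5 or later:

```tcl
package require math::statistics

# 80 synthetic observations: 20 subsamples of 4 values each
set data {}
for {set i 0} {$i < 80} {incr i} {
    lappend data [expr {10.0 + sin($i)}]
}

# Control limits for the xbar chart (subsample size defaults to 4)
set control [::math::statistics::control-xbar $data]
lassign $control cMean cLower cUpper cSize
puts "mean=$cMean lower=$cLower upper=$cUpper subsample size=$cSize"

# Indices of subsamples whose mean falls outside the limits
set outliers [::math::statistics::test-xbar $control $data]
puts "subsamples outside the limits: $outliers"
```

In practice the limits would be derived from a reference period and later observations tested against them; the R chart procedures are called in the same way.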
581
582
583 ::math::statistics::test-Rchart control data
584 Determine if the data exceed the control limits for the R chart.
585
586 Returns a list of subsamples (their indices) that indeed violate
587 the limits.
588
589 list control
590 - Control limits as returned by the "control-Rchart" pro‐
591 cedure
592
593 list data
594 - List of observed data
595
596
598 Besides the linear regression with a single independent variable, the
599 statistics package provides two procedures for doing ordinary least
600 squares (OLS) and weighted least squares (WLS) linear regression with
601 several variables. They were written by Eric Kemp-Benedict.
602
603 In addition to these two, it provides a procedure (tstat) for calculat‐
604 ing the value of the t-statistic for the specified number of degrees of
605 freedom that is required to demonstrate a given level of significance.
606
607 Note: These procedures depend on the math::linearalgebra package.
608
609 Description of the procedures
610
611 ::math::statistics::tstat dof ?alpha?
612 Returns the value of the t-distribution t* satisfying
613
614 P(t*) = 1 - alpha/2
615 P(-t*) = alpha/2
616
617 for the number of degrees of freedom dof.
618
619 Given a sample of normally-distributed data x, with an estimate
620 xbar for the mean and sbar for the standard deviation, the alpha
621 confidence interval for the estimate of the mean can be calcu‐
622 lated as
623
624 ( xbar - t* sbar , xbar + t* sbar)
625
626 The return values from this procedure can be compared to an
627 estimated t-statistic to determine whether the estimated value
628 of a parameter is significantly different from zero at the given
629 confidence level.
630
631 int dof
632 Number of degrees of freedom
633
634 float alpha
Significance level (1 minus the confidence level) of the
t-distribution. Defaults to 0.05.
636
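
As an illustration of the comparison described above, the sketch below obtains the critical value for 10 degrees of freedom and compares it with a t-statistic. The coefficient estimate and its standard error are hypothetical numbers:

```tcl
package require math::statistics

# Two-sided critical value for 10 degrees of freedom, alpha = 0.05
set tcrit [::math::statistics::tstat 10 0.05]

# Hypothetical coefficient estimate and standard error
set estimate 1.8
set se       0.6
set tvalue   [expr {abs($estimate / $se)}]

set significant [expr {$tvalue > $tcrit}]
puts "t* = $tcrit, t = $tvalue, significant: $significant"
```

Standard tables list a critical value of about 2.23 for this case, so an estimated t-statistic of 3 counts as significantly different from zero.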
637
638 ::math::statistics::mv-wls wt1 weights_and_values
639 Carries out a weighted least squares linear regression for the
640 data points provided, with weights assigned to each point.
641
642 The linear model is of the form
643
644 y = b0 + b1 * x1 + b2 * x2 ... + bN * xN + error
645
646 and each point satisfies
647
648 yi = b0 + b1 * xi1 + b2 * xi2 + ... + bN * xiN + Residual_i
649
650
651 The procedure returns a list with the following elements:
652
653 · The r-squared statistic
654
655 · The adjusted r-squared statistic
656
657 · A list containing the estimated coefficients b1, ... bN,
658 b0 (The constant b0 comes last in the list.)
659
660 · A list containing the standard errors of the coefficients
661
662 · A list containing the 95% confidence bounds of the coef‐
663 ficients, with each set of bounds returned as a list with
664 two values
665 Arguments:
666
667 list weights_and_values
A list consisting of: the weight for the first observation,
the data for the first observation (as a sublist), the weight
for the second observation, the data for the second observation
(as a sublist) and so on. The sublists of data are organised as
lists of the value of the dependent variable y and the
independent variables x1, x2 to xN.
674
675
676 ::math::statistics::mv-ols values
677 Carries out an ordinary least squares linear regression for the
678 data points provided.
679
This procedure simply calls ::math::statistics::mv-wls with the
weights set to 1.0, and returns the same information.
682
683 Example of the use:
684
    package require math::statistics

    # Store the Unicode "+/-" character
    set pm "\u00B1"

    # Provide some data: each sublist holds y, x1, x2 and x3
    set data {{  -.67 14.18 60.03  -7.5 }
              { 36.97 15.52 34.24  14.61}
              {-29.57 21.85 83.36  -7.  }
              {-16.9  11.79 51.67  -6.56}
              { 14.09 16.24 36.97 -12.84}
              { 31.52 20.93 45.99 -25.4 }
              { 24.05 20.69 50.27  17.27}
              { 22.23 16.91 45.07  -4.3 }
              { 40.79 20.49 38.92   -.73}
              {-10.35 17.24 58.77  18.78}}

    # Call the ols routine
    set results [::math::statistics::mv-ols $data]

    # Pretty-print the results
    puts "R-squared: [lindex $results 0]"
    puts "Adj R-squared: [lindex $results 1]"
    puts "Coefficients $pm s.e. -- \[95% confidence interval\]:"
    foreach val [lindex $results 2] se [lindex $results 3] bounds [lindex $results 4] {
        set lb [lindex $bounds 0]
        set ub [lindex $bounds 1]
        puts "   $val $pm $se -- \[$lb to $ub\]"
    }
712
713
715 In the literature a large number of probability distributions can be
716 found. The statistics package supports:
717
718 · The normal or Gaussian distribution
719
720 · The uniform distribution - equal probability for all data within
721 a given interval
722
723 · The exponential distribution - useful as a model for certain
724 extreme-value distributions.
725
726 · The gamma distribution - based on the incomplete Gamma integral
727
728 · The chi-square distribution
729
730 · The student's T distribution
731
732 · The Poisson distribution
733
· PM (still to be added) - binomial, F.
735
736 In principle for each distribution one has procedures for:
737
738 · The probability density (pdf-*)
739
· The cumulative distribution (cdf-*)
741
742 · Quantiles for the given distribution (quantiles-*)
743
744 · Histograms for the given distribution (histogram-*)
745
746 · List of random values with the given distribution (random-*)
747
748 The following procedures have been implemented:
749
750 ::math::statistics::pdf-normal mean stdev value
751 Return the probability of a given value for a normal distribu‐
752 tion with given mean and standard deviation.
753
754 float mean
755 - Mean value of the distribution
756
757 float stdev
758 - Standard deviation of the distribution
759
760 float value
761 - Value for which the probability is required
762
763
764 ::math::statistics::pdf-exponential mean value
765 Return the probability of a given value for an exponential dis‐
766 tribution with given mean.
767
768 float mean
769 - Mean value of the distribution
770
771 float value
772 - Value for which the probability is required
773
774
775 ::math::statistics::pdf-uniform xmin xmax value
776 Return the probability of a given value for a uniform distribu‐
777 tion with given extremes.
778
779 float xmin
780 - Minimum value of the distribution
781
float xmax
783 - Maximum value of the distribution
784
785 float value
786 - Value for which the probability is required
787
788
789 ::math::statistics::pdf-gamma alpha beta value
790 Return the probability of a given value for a Gamma distribution
791 with given shape and rate parameters
792
793 float alpha
794 - Shape parameter
795
796 float beta
797 - Rate parameter
798
799 float value
800 - Value for which the probability is required
801
802
803 ::math::statistics::pdf-poisson mu k
804 Return the probability of a given number of occurrences in the
805 same interval (k) for a Poisson distribution with given mean
806 (mu)
807
808 float mu
809 - Mean number of occurrences
810
int k - Number of occurrences
812
813
814 ::math::statistics::pdf-chisquare df value
815 Return the probability of a given value for a chi square distri‐
816 bution with given degrees of freedom
817
818 float df
819 - Degrees of freedom
820
821 float value
822 - Value for which the probability is required
823
824
825 ::math::statistics::pdf-student-t df value
826 Return the probability of a given value for a Student's t dis‐
827 tribution with given degrees of freedom
828
829 float df
830 - Degrees of freedom
831
832 float value
833 - Value for which the probability is required
834
835
836 ::math::statistics::pdf-beta a b value
837 Return the probability of a given value for a Beta distribution
838 with given shape parameters
839
840 float a
841 - First shape parameter
842
843 float b
- Second shape parameter
845
846 float value
847 - Value for which the probability is required
848
849
850 ::math::statistics::cdf-normal mean stdev value
851 Return the cumulative probability of a given value for a normal
852 distribution with given mean and standard deviation, that is the
853 probability for values up to the given one.
854
855 float mean
856 - Mean value of the distribution
857
858 float stdev
859 - Standard deviation of the distribution
860
861 float value
862 - Value for which the probability is required
863
864
865 ::math::statistics::cdf-exponential mean value
866 Return the cumulative probability of a given value for an expo‐
867 nential distribution with given mean.
868
869 float mean
870 - Mean value of the distribution
871
872 float value
873 - Value for which the probability is required
874
875
876 ::math::statistics::cdf-uniform xmin xmax value
877 Return the cumulative probability of a given value for a uniform
878 distribution with given extremes.
879
880 float xmin
881 - Minimum value of the distribution
882
float xmax
884 - Maximum value of the distribution
885
886 float value
887 - Value for which the probability is required
888
889
890 ::math::statistics::cdf-students-t degrees value
Return the cumulative probability of a given value for a Student's
t distribution with given number of degrees of freedom.
893
894 int degrees
895 - Number of degrees of freedom
896
897 float value
898 - Value for which the probability is required
899
900
901 ::math::statistics::cdf-gamma alpha beta value
902 Return the cumulative probability of a given value for a Gamma
903 distribution with given shape and rate parameters
904
905 float alpha
906 - Shape parameter
907
908 float beta
909 - Rate parameter
910
911 float value
912 - Value for which the cumulative probability is required
913
914
915 ::math::statistics::cdf-poisson mu k
916 Return the cumulative probability of a given number of occur‐
917 rences in the same interval (k) for a Poisson distribution with
918 given mean (mu)
919
920 float mu
921 - Mean number of occurrences
922
int k - Number of occurrences
924
925
926 ::math::statistics::cdf-beta a b value
927 Return the cumulative probability of a given value for a Beta
928 distribution with given shape parameters
929
930 float a
931 - First shape parameter
932
933 float b
- Second shape parameter
935
936 float value
937 - Value for which the probability is required
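
A few of the density and cumulative procedures are sketched below for cases whose values are easy to verify by hand. The parameter values are chosen only for the example:

```tcl
package require math::statistics

# Standard normal distribution: density at the mean, and the
# cumulative probability up to 1.96 standard deviations
set pdfN [::math::statistics::pdf-normal 0.0 1.0 0.0]
set cdfN [::math::statistics::cdf-normal 0.0 1.0 1.96]

# Uniform distribution on the interval from 0 to 10
set pdfU [::math::statistics::pdf-uniform 0.0 10.0 5.0]
set cdfU [::math::statistics::cdf-uniform 0.0 10.0 2.5]

# Exponential distribution with mean 2, evaluated at the mean
set cdfE [::math::statistics::cdf-exponential 2.0 2.0]

puts "pdf-normal:      $pdfN"
puts "cdf-normal:      $cdfN"
puts "pdf-uniform:     $pdfU"
puts "cdf-uniform:     $cdfU"
puts "cdf-exponential: $cdfE"
```

The expected values are roughly 0.399 and 0.975 for the normal distribution, 0.1 and 0.25 for the uniform distribution, and 1 - exp(-1), about 0.632, for the exponential distribution.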
938
939
940 ::math::statistics::random-normal mean stdev number
941 Return a list of "number" random values satisfying a normal dis‐
942 tribution with given mean and standard deviation.
943
944 float mean
945 - Mean value of the distribution
946
947 float stdev
948 - Standard deviation of the distribution
949
950 int number
951 - Number of values to be returned
952
953
954 ::math::statistics::random-exponential mean number
955 Return a list of "number" random values satisfying an exponen‐
956 tial distribution with given mean.
957
958 float mean
959 - Mean value of the distribution
960
961 int number
962 - Number of values to be returned
963
964
965 ::math::statistics::random-uniform xmin xmax number
966 Return a list of "number" random values satisfying a uniform
967 distribution with given extremes.
968
969 float xmin
970 - Minimum value of the distribution
971
972 float xmax
973 - Maximum value of the distribution
974
975 int number
976 - Number of values to be returned
977
978
979 ::math::statistics::random-gamma alpha beta number
980 Return a list of "number" random values satisfying a Gamma dis‐
981 tribution with given shape and rate parameters
982
983 float alpha
984 - Shape parameter
985
986 float beta
987 - Rate parameter
988
989 int number
990 - Number of values to be returned
991
992
993 ::math::statistics::random-chisquare df number
994 Return a list of "number" random values satisfying a chi square
995 distribution with given degrees of freedom
996
997 float df
998 - Degrees of freedom
999
1000 int number
1001 - Number of values to be returned
1002
1003
1004 ::math::statistics::random-student-t df number
1005 Return a list of "number" random values satisfying a Student's t
1006 distribution with given degrees of freedom
1007
1008 float df
1009 - Degrees of freedom
1010
1011 int number
1012 - Number of values to be returned
1013
1014
1015 ::math::statistics::random-beta a b number
1016 Return a list of "number" random values satisfying a Beta dis‐
1017 tribution with given shape parameters
1018
1019 float a
1020 - First shape parameter
1021
1022 float b
1023 - Second shape parameter
1024
1025 int number
1026 - Number of values to be returned
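
The random-* procedures all return a plain list, so their output can be fed directly into the general procedures. The following sketch uses arbitrary parameters; the printed numbers differ from run to run:

```tcl
package require math::statistics

set normal  [::math::statistics::random-normal 0.0 1.0 100]
set uniform [::math::statistics::random-uniform 5.0 10.0 50]
set expon   [::math::statistics::random-exponential 2.0 20]

puts "normal:      n=[llength $normal] mean=[::math::statistics::mean $normal]"
puts "uniform:     min=[::math::statistics::min $uniform] max=[::math::statistics::max $uniform]"
puts "exponential: mean=[::math::statistics::mean $expon]"
```

With so few samples the sample mean can deviate noticeably from the distribution mean; larger lists give closer agreement.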
1027
1028
1029 ::math::statistics::histogram-uniform xmin xmax limits number
1030 Return the expected histogram for a uniform distribution.
1031
1032 float xmin
1033 - Minimum value of the distribution
1034
1035 float xmax
1036 - Maximum value of the distribution
1037
1038 list limits
1039 - Upper limits for the buckets in the histogram
1040
1041 int number
1042 - Total number of "observations" in the histogram
1043
1044
1045 ::math::statistics::incompleteGamma x p ?tol?
1046 Evaluate the incomplete Gamma integral
1047
P(p,x) = 1/Gamma(p) * integral from 0 to x of exp(-t) * t^(p-1) dt
1051
1052
1053 float x
1054 - Value of x (limit of the integral)
1055
1056 float p
1057 - Value of p in the integrand
1058
1059 float tol
1060 - Required tolerance (default: 1.0e-9)
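
For p = 1 the integral above reduces to 1 - exp(-x), which gives a simple check. This sketch evaluates that special case; the choice x = 1 is arbitrary:

```tcl
package require math::statistics

set x 1.0
set p 1.0

# Note the argument order: x first, then p
set value    [::math::statistics::incompleteGamma $x $p]
set expected [expr {1.0 - exp(-$x)}]

puts "incompleteGamma($x, $p) = $value"
puts "1 - exp(-x)             = $expected"
```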
1061
1062
1063 ::math::statistics::incompleteBeta a b x ?tol?
1064 Evaluate the incomplete Beta integral
1065
1066 float a
1067 - First shape parameter
1068
1069 float b
1070 - Second shape parameter
1071
1072 float x
1073 - Value of x (limit of the integral)
1074
1075 float tol
1076 - Required tolerance (default: 1.0e-9)
1077
1078
1079 TO DO: more function descriptions to be added
1080
1082 The data manipulation procedures act on lists or lists of lists:
1083
1084 ::math::statistics::filter varname data expression
1085 Return a list consisting of the data for which the logical
1086 expression is true (this command works analogously to the com‐
1087 mand foreach).
1088
1089 string varname
1090 - Name of the variable used in the expression
1091
1092 list data
1093 - List of data
1094
1095 string expression
1096 - Logical expression using the variable name
1097
1098
1099 ::math::statistics::map varname data expression
1100 Return a list consisting of the data that are transformed via
1101 the expression.
1102
1103 string varname
1104 - Name of the variable used in the expression
1105
1106 list data
1107 - List of data
1108
1109 string expression
1110 - Expression to be used to transform (map) the data
1111
1112
1113 ::math::statistics::samplescount varname list expression
1114 Return a list consisting of the counts of all data in the sub‐
1115 lists of the "list" argument for which the expression is true.
1116
1117 string varname
1118 - Name of the variable used in the expression
1119
list list
1121 - List of sublists, each containing the data
1122
1123 string expression
1124 - Logical expression to test the data (defaults to
1125 "true").
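
The three list procedures can be sketched as follows. The variable name x and the data are chosen freely for the example; the expression is written in braces so that it is evaluated for each element rather than once at call time:

```tcl
package require math::statistics

set data {1 2 3 4 5}

# Keep only the elements for which the expression is true
set large [::math::statistics::filter x $data {$x > 2}]

# Transform every element
set squares [::math::statistics::map x $data {$x * $x}]

# Count the matching elements in each sublist
set samples {{1 2 3} {4 5} {6}}
set counts [::math::statistics::samplescount x $samples {$x > 2}]

puts "filter:       $large"
puts "map:          $squares"
puts "samplescount: $counts"
```

A typical use is to remove outliers with filter before passing the remaining data to procedures such as mean or histogram.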
1126
1127
1128 ::math::statistics::subdivide
1129 Routine PM - not implemented yet
1130
1132 The following simple plotting procedures are available:
1133
1134 ::math::statistics::plot-scale canvas xmin xmax ymin ymax
1135 Set the scale for a plot in the given canvas. All plot routines
1136 expect this function to be called first. There is no automatic
1137 scaling provided.
1138
1139 widget canvas
1140 - Canvas widget to use
1141
1142 float xmin
1143 - Minimum x value
1144
1145 float xmax
1146 - Maximum x value
1147
1148 float ymin
1149 - Minimum y value
1150
1151 float ymax
1152 - Maximum y value
1153
1154
1155 ::math::statistics::plot-xydata canvas xdata ydata tag
1156 Create a simple XY plot in the given canvas - the data are shown
1157 as a collection of dots. The tag can be used to manipulate the
1158 appearance.
1159
1160 widget canvas
1161 - Canvas widget to use
1162
list xdata
- Series of independent data

list ydata
- Series of dependent data
1168
1169 string tag
1170 - Tag to give to the plotted data (defaults to xyplot)
1171
1172
1173 ::math::statistics::plot-xyline canvas xdata ydata tag
1174 Create a simple XY plot in the given canvas - the data are shown
1175 as a line through the data points. The tag can be used to manip‐
1176 ulate the appearance.
1177
1178 widget canvas
1179 - Canvas widget to use
1180
1181 list xdata
1182 - Series of independent data
1183
1184 list ydata
1185 - Series of dependent data
1186
1187 string tag
1188 - Tag to give to the plotted data (defaults to xyplot)
1189
1190
1191 ::math::statistics::plot-tdata canvas tdata tag
 Create a simple XY plot in the given canvas - the data are shown
 as a collection of dots. The horizontal coordinate is equal to
 the index. The tag can be used to manipulate the appearance.
 This type of presentation is suitable, for instance, for
 autocorrelation functions or for inspecting time-dependent
 behaviour.
1198
1199 widget canvas
1200 - Canvas widget to use
1201
1202 list tdata
1203 - Series of dependent data
1204
1205 string tag
1206 - Tag to give to the plotted data (defaults to xyplot)
1207
1208
1209 ::math::statistics::plot-tline canvas tdata tag
1210 Create a simple XY plot in the given canvas - the data are shown
1211 as a line. See plot-tdata for an explanation.
1212
1213 widget canvas
1214 - Canvas widget to use
1215
1216 list tdata
1217 - Series of dependent data
1218
1219 string tag
1220 - Tag to give to the plotted data (defaults to xyplot)
1221
1222
1223 ::math::statistics::plot-histogram canvas counts limits tag
 Create a simple histogram in the given canvas.
1225
1226 widget canvas
1227 - Canvas widget to use
1228
1229 list counts
1230 - Series of bucket counts
1231
1232 list limits
1233 - Series of upper limits for the buckets
1234
1235 string tag
1236 - Tag to give to the plotted data (defaults to xyplot)
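
 The required call order (plot-scale first, then the plot routine) can
 be sketched as follows. The widget name .c, the canvas size and the
 sample values are assumptions made for this illustration only:

```tcl
package require Tk
package require math::statistics

# The canvas size is an arbitrary choice for this sketch
canvas .c -width 300 -height 200
pack .c

set values {1.2 2.5 2.7 3.1 3.3 3.8 4.4 4.9}
set limits {2.0 3.0 4.0}

# One count per bucket, plus one for the values above the last limit
set counts [::math::statistics::histogram $limits $values]

# Set the scale first: the plot routines do no scaling of their own
::math::statistics::plot-scale .c 0 5 0 5
::math::statistics::plot-histogram .c $counts $limits
```

 Since no tag is given, the items are created with the default tag
 xyplot, so something like ".c itemconfigure xyplot ..." could be used
 afterwards to change their appearance.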
1237
1238
1240 The following procedures are yet to be implemented:
1241
1242 · F-test-stdev
1243
1244 · interval-mean-stdev
1245
1246 · histogram-normal
1247
1248 · histogram-exponential
1249
1250 · test-histogram
1251
1252 · test-corr
1253
1254 · quantiles-*
1255
1256 · fourier-coeffs
1257
1258 · fourier-residuals
1259
1260 · onepar-function-fit
1261
1262 · onepar-function-residuals
1263
1264 · plot-linear-model
1265
1266 · subdivide
1267
1269 The code below is a small example of how you can examine a set of data:
1270
    # Simple example:
    # - Generate data (as a cheap way of getting some)
    # - Perform statistical analysis to describe the data
    #
    package require math::statistics

    #
    # Two auxiliary procs
    #
    proc pause {time} {
        # vwait always refers to a global variable, so use ::wait throughout
        set ::wait 0
        after [expr {$time*1000}] {set ::wait 1}
        vwait ::wait
    }

    proc print-histogram {counts limits} {
        foreach count $counts limit $limits {
            if { $limit != {} } {
                puts [format "<%12.4g\t%d" $limit $count]
                set prev_limit $limit
            } else {
                # The last count has no upper limit of its own
                puts [format ">%12.4g\t%d" $prev_limit $count]
            }
        }
    }

    #
    # Our source of arbitrary data
    #
    proc generateData { data1 data2 } {
        upvar 1 $data1 _data1
        upvar 1 $data2 _data2

        set d1 0.0
        set d2 0.0
        for { set i 0 } { $i < 100 } { incr i } {
            set d1 [expr {10.0-2.0*cos(2.0*3.1415926*$i/24.0)+3.5*rand()}]
            set d2 [expr {0.7*$d2+0.3*$d1+0.7*rand()}]
            lappend _data1 $d1
            lappend _data2 $d2
        }
        return {}
    }

    #
    # The analysis session
    #
    package require Tk
    # The console command is not available on every platform
    catch {console show}
    canvas .plot1
    canvas .plot2
    pack .plot1 .plot2 -fill both -side top

    generateData data1 data2

    puts "Basic statistics:"
    set b1 [::math::statistics::basic-stats $data1]
    set b2 [::math::statistics::basic-stats $data2]
    foreach label {mean min max number stdev var} v1 $b1 v2 $b2 {
        puts "$label\t$v1\t$v2"
    }
    puts "Plot the data as function of \"time\" and against each other"
    ::math::statistics::plot-scale .plot1 0 100 0 20
    ::math::statistics::plot-scale .plot2 0 20 0 20
    ::math::statistics::plot-tline .plot1 $data1
    ::math::statistics::plot-tline .plot1 $data2
    ::math::statistics::plot-xydata .plot2 $data1 $data2

    puts "Correlation coefficient:"
    puts [::math::statistics::corr $data1 $data2]

    pause 2
    puts "Plot histograms"
    ::math::statistics::plot-scale .plot2 0 20 0 100
    set limits [::math::statistics::minmax-histogram-limits 7 16]
    set histogram_data [::math::statistics::histogram $limits $data1]
    ::math::statistics::plot-histogram .plot2 $histogram_data $limits

    puts "First series:"
    print-histogram $histogram_data $limits

    pause 2
    set limits [::math::statistics::minmax-histogram-limits 0 15 10]
    set histogram_data [::math::statistics::histogram $limits $data2]
    ::math::statistics::plot-histogram .plot2 $histogram_data $limits d2

    puts "Second series:"
    print-histogram $histogram_data $limits

    # map takes the variable name (here x) as its first argument
    puts "Autocorrelation function:"
    set autoc [::math::statistics::autocorr $data1]
    puts [::math::statistics::map x $autoc {[format "%.2f" $x]}]
    puts "Cross-correlation function:"
    set crossc [::math::statistics::crosscorr $data1 $data2]
    puts [::math::statistics::map x $crossc {[format "%.2f" $x]}]

    ::math::statistics::plot-scale .plot1 0 100 -1 4
    ::math::statistics::plot-tline .plot1 $autoc "autoc"
    ::math::statistics::plot-tline .plot1 $crossc "crossc"

    puts "Quantiles: 0.1, 0.2, 0.5, 0.8, 0.9"
    puts "First: [::math::statistics::quantiles $data1 {0.1 0.2 0.5 0.8 0.9}]"
    puts "Second: [::math::statistics::quantiles $data2 {0.1 0.2 0.5 0.8 0.9}]"
1374
1375
1376 If you run this example, then the following should be clear:
1377
 · There is a strong correlation between the two time series, as
 displayed by the raw data and especially by the correlation
 functions.

 · Both time series show a significant periodic component.

 · The histograms are not very useful in identifying the nature of
 the time series - they do not show the periodic nature.
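
 To see why a periodic component shows up in a correlation function but
 not in a histogram, the following sketch may help. It does not use the
 package: lagCorrelation is a hypothetical helper written for this
 illustration, and the series is a noise-free cosine with the same
 period of 24 samples as the one in generateData.

```tcl
# Hypothetical helper, not part of math::statistics:
# Pearson correlation between a series and itself shifted by "lag" samples
proc lagCorrelation {data lag} {
    set n [expr {[llength $data] - $lag}]
    set sx 0.0; set sy 0.0; set sxx 0.0; set syy 0.0; set sxy 0.0
    for {set i 0} {$i < $n} {incr i} {
        set x [lindex $data $i]
        set y [lindex $data [expr {$i + $lag}]]
        set sx  [expr {$sx + $x}]
        set sy  [expr {$sy + $y}]
        set sxx [expr {$sxx + $x*$x}]
        set syy [expr {$syy + $y*$y}]
        set sxy [expr {$sxy + $x*$y}]
    }
    set cov [expr {$sxy - $sx*$sy/$n}]
    set vx  [expr {$sxx - $sx*$sx/$n}]
    set vy  [expr {$syy - $sy*$sy/$n}]
    return [expr {$cov / sqrt($vx*$vy)}]
}

# A pure cosine with a period of 24 samples
set series {}
for {set i 0} {$i < 100} {incr i} {
    lappend series [expr {cos(2.0*3.1415926*$i/24.0)}]
}

set half [lagCorrelation $series 12]
set full [lagCorrelation $series 24]
puts [format "lag 12: %.2f" $half]
puts [format "lag 24: %.2f" $full]
```

 Half a period apart the values mirror each other, a full period apart
 they repeat, so the correlation swings between strongly negative and
 strongly positive. A histogram discards the ordering of the values and
 therefore cannot show this.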
1386
 This document, and the package it describes, will undoubtedly contain
 bugs and other problems. Please report such in the category math ::
 statistics of the Tcllib SF Trackers
 [http://sourceforge.net/tracker/?group_id=12883]. Please also report
 any ideas for enhancements you may have for either package and/or
 documentation.
1393
1395 data analysis, mathematics, statistics
1396
1397
1398
1399math 0.5 math::statistics(n)