PARALLELCPU(1)        User Contributed Perl Documentation       PARALLELCPU(1)

NAME

       PDL::ParallelCPU - Parallel processor multi-threading support in PDL

DESCRIPTION

       PDL can split numerical processing across multiple parallel processor
       threads (pthreads), controlled by the set_autopthread_targ and
       set_autopthread_size functions.  On multi-core and/or multi-processor
       machines this can improve processing performance, in most cases by
       2-4X or more.

       As of 2.059, "online_cpus" in PDL::Core is used to set the number of
       threads used if "PDL_AUTOPTHREAD_TARG" is not set.

SYNOPSIS

         use PDL;

         # Set a target of 4 parallel pthreads, with a lower limit of
         #  5 Meg-elements for splitting processing into parallel pthreads.
         set_autopthread_targ(4);
         set_autopthread_size(5);

         $x = zeroes(5000,5000); # Create a 25-million-element array

         $y = $x + 5; # Processing will be split up into multiple pthreads

         # Get the actual number of pthreads for the last
         #  processing operation.
         $actualPthreads = get_autopthread_actual();

         # Or compare these to see CPU usage (first one only 1 pthread, second one 10)
         # in the PDL shell:
         $x = ones(10,1000,10000); set_autopthread_targ(1); $y = sin($x)*cos($x); p get_autopthread_actual;
         $x = ones(10,1000,10000); set_autopthread_targ(10); $y = sin($x)*cos($x); p get_autopthread_actual;

Terminology

       To reduce the confusion that existed in PDL before 2.075, this document
       uses pthreading to refer to processor multi-threading, which is the use
       of multiple processor threads to split up numerical processing into
       parallel operations.

Functions that control PDL pthreads

       This is a brief listing and description of the PDL pthreading
       functions. See the PDL::Core docs for detailed information.

       set_autopthread_targ
            Set the target number of processor threads (pthreads) for
            multi-threaded processing. Setting the target to 0 means that no
            pthreading will occur.

            See PDL::Core for details.

       set_autopthread_size
            Set the minimum size (in Meg-elements, i.e. units of 2**20
            elements) that the largest PDL involved in a function must have
            for auto-pthreading to be performed. For small PDLs, it probably
            isn't worth starting multiple pthreads, so this function defines
            the threshold below which auto-pthreading won't be attempted.

            See PDL::Core for details.

       get_autopthread_actual
            Get the actual number of pthreads executed for the last PDL
            processing function.

            See PDL::get_autopthread_actual for details.
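
       To make the units of set_autopthread_size concrete: the threshold is
       counted in Meg-elements (2**20 elements), so a size of 5 corresponds
       to 5 * 2**20 = 5,242,880 elements. The pure-Perl sketch below only
       models that arithmetic. The helper name meets_threshold is
       hypothetical (it is not part of PDL), and treating a PDL of exactly
       the threshold size as large enough is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL): does an array with the given
# dimensions reach a threshold expressed in Meg-elements (2**20 elements)?
# Assumption: "at least as large as the threshold" counts as big enough.
sub meets_threshold {
    my ($size_meg, @dims) = @_;
    my $nelem = 1;
    $nelem *= $_ for @dims;            # total number of elements
    return $nelem >= $size_meg * 2**20 ? 1 : 0;
}

# 5000 x 5000 = 25,000,000 elements, above 5 * 2**20 = 5,242,880
print meets_threshold(5, 5000, 5000), "\n";   # prints 1
# 100 x 100 = 10,000 elements, below the same threshold
print meets_threshold(5, 100, 100), "\n";     # prints 0
# A threshold of 0 lets even tiny arrays qualify
print meets_threshold(0, 20, 4, 3), "\n";     # prints 1
```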

Global Control of PDL pthreading using Environment Variables

       PDL pthreading can be turned on globally, without modifying existing
       code, by setting the environment variables PDL_AUTOPTHREAD_TARG and
       PDL_AUTOPTHREAD_SIZE before running a PDL script.  These environment
       variables are checked when PDL starts up, and set_autopthread_targ and
       set_autopthread_size are called with their values.

       For example, if the environment variable PDL_AUTOPTHREAD_TARG is set
       to 3 and PDL_AUTOPTHREAD_SIZE is set to 10, then any PDL script will
       run as if the following lines were at the top of the file:

        set_autopthread_targ(3);
        set_autopthread_size(10);

How It Works

       The auto-pthreading process works by analyzing broadcast array
       dimensions in PDL operations (those above the operation's "signature"
       dimensions) and splitting up processing according to those and the
       desired number of pthreads (i.e. the pthread target, or pthread_targ).
       The offsets, increments, and dimension sizes (in case the whole
       dimension does not divide neatly by the number of pthreads) that PDL
       uses to step through the data in memory are modified for each pthread,
       so each one sees a different set of data when performing processing.

       Example

        $x = sequence(20,4,3); # Small 3-D array, size 20,4,3

        # Set up auto-pthreading:
        set_autopthread_targ(2); # Target of 2 pthreads
        set_autopthread_size(0); # Zero so that the small PDLs in this example will be pthreaded

        # This will be split up into 2 pthreads
        $c = maximum($x);

       For the above example, the maximum function has a signature of "(a(n);
       [o]c())", which means that the first dimension of $x (size 20) is a
       core dimension of the maximum function. The other dimensions of $x
       (size 4,3) are broadcast dimensions (i.e. they will be broadcasted
       over in the maximum function).

       The auto-pthreading algorithm examines the broadcasted dims of size
       (4,3) and picks the size-4 dimension, since it is evenly divisible by
       the autopthread_targ of 2. The processing of the maximum function is
       then split into two pthreads on the size-4 dimension, with dim indexes
       0,2 processed by one pthread and dim indexes 1,3 processed by the
       other pthread.
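
       The index split described above can be modelled in a few lines of
       plain Perl. This is only an illustration of which indexes go to which
       pthread in the example. PDL itself achieves the split by adjusting
       offsets and increments, and the helper name assign_indexes is
       hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL): deal the indexes 0 .. $dim_size-1
# of one broadcast dimension out to $n_pthreads workers in round-robin
# order. This mirrors the 0,2 / 1,3 split described above; it is a model,
# not PDL's actual offset/increment bookkeeping.
sub assign_indexes {
    my ($dim_size, $n_pthreads) = @_;
    my @per_thread = map { [] } 1 .. $n_pthreads;
    push @{ $per_thread[ $_ % $n_pthreads ] }, $_ for 0 .. $dim_size - 1;
    return @per_thread;
}

my @split = assign_indexes(4, 2);
for my $t (0 .. $#split) {
    print "pthread $t: @{ $split[$t] }\n";
}
# prints:
#   pthread 0: 0 2
#   pthread 1: 1 3
```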

Limitations

   Must have POSIX Threads Enabled
       Auto-pthreading only works if your PDL installation was compiled with
       POSIX threads enabled. This is normally the case if you are running on
       Linux, MacOS X, other Unix variants, or Windows.

   Non-Threadsafe Code
       Not all the libraries that PDL interfaces to are thread-safe, i.e.
       they aren't written to operate in a multi-threaded environment without
       crashing or causing side effects. Examples in the PDL core are the fft
       function and the pnmout functions.

       To operate properly with these types of functions, the PPCode flag
       NoPthread has been introduced to mark a function as not being
       pthread-safe. See the PDL::PP docs for details.

   Size of PDL Dimensions and pthread Target
       As of PDL 2.058, the broadcasted dimension sizes do not need to divide
       exactly by the pthread target, although if one does, it will be used.

       If no dimension is as large as the pthread target, the number of
       pthreads will be the size of the largest broadcasted dimension.

       In order to minimise idle CPUs on the last iteration at the end of the
       broadcasted dimension, the algorithm that picks the dimension to
       pthread on aims for the largest remainder when dividing the sizes of
       the broadcasted dimensions by the pthread target. For example, if a
       PDL has broadcasted dimension sizes of (9,6,2) and the pthread target
       is 4, the algorithm will pick dimension 1 (size 6), as that will leave
       a remainder of 2 (leaving 2 pthreads idle at the end), in preference
       to the one with size 9, which would leave a remainder of 1 (leaving 3
       idle).
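
       The selection rules in this section can be sketched in plain Perl.
       This is one reading of the prose above, not PDL's implementation: the
       helper name pick_pthread_dim is hypothetical, and two details are
       assumptions, namely that only dimensions at least as large as the
       target compete on remainder, and that the first dimension wins ties.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDL): choose which broadcast dimension
# to pthread over, following the rules described above.
# Returns (dimension index, number of pthreads).
sub pick_pthread_dim {
    my ($targ, @dims) = @_;
    return (-1, 0) if $targ < 1 || !@dims;   # target 0: no pthreading

    # Assumption: candidates are dimensions at least as large as the target.
    my @big = grep { $dims[$_] >= $targ } 0 .. $#dims;

    # No dimension is large enough: use the largest, one pthread per index.
    if (!@big) {
        my $best = 0;
        for my $i (1 .. $#dims) { $best = $i if $dims[$i] > $dims[$best]; }
        return ($best, $dims[$best]);
    }

    # An exactly divisible dimension is used if there is one.
    for my $i (@big) { return ($i, $targ) if $dims[$i] % $targ == 0; }

    # Otherwise take the largest remainder (assumption: first wins ties).
    my $best = $big[0];
    for my $i (@big) {
        $best = $i if $dims[$i] % $targ > $dims[$best] % $targ;
    }
    return ($best, $targ);
}

my ($dim, $n) = pick_pthread_dim(4, 9, 6, 2);
print "dim $dim, $n pthreads\n";   # prints: dim 1, 4 pthreads
```

       With a target of 2 and broadcasted dims (4,3), the same helper returns
       dimension 0 with 2 pthreads, which matches the maximum example in How
       It Works.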

   Speed improvement might be less than you expect.
       If you have an 8-core machine and call set_autopthread_targ with 8 to
       generate 8 parallel pthreads, you probably won't get an 8X improvement
       in speed, due to memory bandwidth issues. Even though you have 8
       separate CPUs crunching away on data, you will have (for most common
       machine architectures) common RAM that now becomes your bottleneck. For
       simple calculations (e.g. simple additions) you can run into a
       performance limit at about 4 pthreads. For more CPU-bound calculations
       the limit will be higher.

       Copyright 2011 John Cerney. You can distribute and/or modify this
       document under the same terms as the current Perl license.

       See: http://dev.perl.org/licenses/

perl v5.34.0                      2022-02-28                    PARALLELCPU(1)