PARALLELCPU(1)        User Contributed Perl Documentation        PARALLELCPU(1)

PDL::ParallelCPU - Parallel processor multi-threading support in PDL

PDL has support for splitting up numerical processing between multiple
parallel processor threads (or pthreads) using the set_autopthread_targ
and set_autopthread_size functions. This can improve processing
performance (by a factor of 2-4 or more in most cases) by taking
advantage of multi-core and/or multi-processor machines.

As of PDL 2.059, "online_cpus" in PDL::Core is used to set the number of
threads used if "PDL_AUTOPTHREAD_TARG" is not set.

    use PDL;

    # Set target of 4 parallel pthreads to create, with a lower limit of
    # 5Meg elements for splitting processing into parallel pthreads.
    set_autopthread_targ(4);
    set_autopthread_size(5);

    $x = zeroes(5000,5000); # Create 25Meg element array

    $y = $x + 5; # Processing will be split up into multiple pthreads

    # Get the actual number of pthreads for the last
    # processing operation.
    $actualPthreads = get_autopthread_actual();

    # Or compare these to see CPU usage (the first uses only 1 pthread,
    # the second 10) in the PDL shell:
    $x = ones(10,1000,10000); set_autopthread_targ(1); $y = sin($x)*cos($x); p get_autopthread_actual;
    $x = ones(10,1000,10000); set_autopthread_targ(10); $y = sin($x)*cos($x); p get_autopthread_actual;

To reduce the confusion that existed in PDL before 2.075, this document
uses "pthreading" to refer to processor multi-threading, which is the use
of multiple processor threads to split up numerical processing into
parallel operations.

This is a brief listing and description of the PDL pthreading
functions; see the PDL::Core docs for detailed information.

set_autopthread_targ
    Set the target number of processor threads (pthreads) for
    multi-threaded processing. Setting the target to 0 means that no
    pthreading will occur.

    See PDL::Core for details.

set_autopthread_size
    Set the minimum size (in Meg-elements, i.e. units of 2**20 elements)
    of the largest PDL involved in a function for auto-pthreading to be
    performed. For small PDLs, it probably isn't worth starting
    multiple pthreads, so this function defines a threshold below
    which auto-pthreading won't be attempted.

    See PDL::Core for details.

get_autopthread_actual
    Get the actual number of pthreads executed for the last PDL
    processing function.

    See PDL::get_autopthread_actual for details.
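
The three functions above can be used together roughly as follows. This is
a minimal sketch, not taken from the PDL distribution: it assumes PDL is
installed with POSIX threads enabled, and that the getter counterparts
get_autopthread_targ and get_autopthread_size (documented in PDL::Core) are
exported by "use PDL". The array sizes are arbitrary illustrative choices.

```perl
use strict;
use warnings;
use PDL;

set_autopthread_targ(4);   # ask for up to 4 pthreads
set_autopthread_size(1);   # threshold: 1 Meg-element (2**20 elements)

printf "target=%d size=%d\n", get_autopthread_targ(), get_autopthread_size();

# 10,000 elements: far below the threshold, so no pthreading is expected.
my $small      = zeroes(100, 100);
my $small_plus = $small + 1;
print "small: ", get_autopthread_actual(), " pthread(s)\n";

# 2048*2048 = 4 Meg-elements: well above the threshold.
my $big      = zeroes(2048, 2048);
my $big_plus = $big + 1;
print "big:   ", get_autopthread_actual(), " pthread(s)\n";
```

The values reported by get_autopthread_actual depend on how PDL was built
and on the machine, so no particular output is claimed here.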

PDL pthreading can be turned on globally, without modifying existing
code, by setting the environment variables PDL_AUTOPTHREAD_TARG and
PDL_AUTOPTHREAD_SIZE before running a PDL script. These environment
variables are checked when PDL starts up, and the
set_autopthread_targ and set_autopthread_size functions are called with
the variables' values.

For example, if the environment variable PDL_AUTOPTHREAD_TARG is set to 3
and PDL_AUTOPTHREAD_SIZE is set to 10, then any PDL script will run as
if the following lines were at the top of the file:

    set_autopthread_targ(3);
    set_autopthread_size(10);
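
One way to check that the environment variables take effect is to set them
for a child perl process and have that process report its settings. The
sketch below assumes PDL is installed for the same perl that runs the
script, and that get_autopthread_targ and get_autopthread_size are exported
by "use PDL"; the values 3 and 10 mirror the example above.

```perl
use strict;
use warnings;

# Set the variables in this process's environment; child processes inherit them.
local $ENV{PDL_AUTOPTHREAD_TARG} = 3;
local $ENV{PDL_AUTOPTHREAD_SIZE} = 10;

# Start a fresh perl (the same interpreter binary, via $^X) that loads PDL
# and prints the target and size it picked up at startup.
my @cmd = ($^X, '-MPDL', '-e',
    'print join(",", get_autopthread_targ(), get_autopthread_size()), "\n"');

# List-form system() avoids any shell quoting issues.
system(@cmd) == 0 or warn "child perl failed: $?\n";
```

The list form of system is used so the one-liner is passed to the child
unchanged on any platform.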

The auto-pthreading process works by analyzing broadcast array
dimensions in PDL operations (those above the operation's "signature"
dimensions) and splitting up processing according to those and the
desired number of pthreads (i.e. the pthread target or pthread_targ).
The offsets, increments, and dimension sizes (in case the whole
dimension does not divide neatly by the number of pthreads) that PDL
uses to step through the data in memory are modified for each pthread so
that each one sees a different set of data when performing processing.

Example

    $x = sequence(20,4,3); # Small 3-D array, size 20,4,3

    # Set up auto-pthreading:
    set_autopthread_targ(2); # Target of 2 pthreads
    set_autopthread_size(0); # Zero so that the small PDLs in this example will be pthreaded

    # This will be split up into 2 pthreads
    $c = maximum($x);

For the above example, the maximum function has a signature of
"(a(n); [o]c())", which means that the first dimension of $x (size 20) is
a core dimension of the maximum function. The other dimensions of $x
(size 4,3) are broadcast dimensions (i.e. they will be broadcast over in
the maximum function).

The auto-pthreading algorithm examines the broadcasted dims of size
(4,3) and picks the size-4 dimension, since it is evenly divisible by the
autopthread_targ of 2. The processing of the maximum function is then
split into two pthreads on the size-4 dimension, with dim indexes 0,2
processed by one pthread and dim indexes 1,3 processed by the other
pthread.
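
The split between core and broadcast dimensions in this example can be
checked directly. The following is a small sketch that assumes PDL is
installed with POSIX threads enabled. It prints the dimensions before and
after the call, plus the pthread count PDL reports, which depends on the
build and is therefore not asserted.

```perl
use strict;
use warnings;
use PDL;

set_autopthread_targ(2);   # target of 2 pthreads
set_autopthread_size(0);   # no minimum size, so even this small PDL qualifies

my $x = sequence(20, 4, 3);
my $c = maximum($x);       # signature (a(n); [o]c()): dim 0 is the core dimension

print "input dims:  ", join(",", $x->dims), "\n";   # 20,4,3
print "output dims: ", join(",", $c->dims), "\n";   # 4,3 (the broadcast dims)
print "pthreads used: ", get_autopthread_actual(), "\n";
```

The core dimension (size 20) is consumed by maximum, so only the broadcast
dimensions (4,3) remain in the result; these are the dimensions the
auto-pthreading algorithm chooses between.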

Must have POSIX Threads Enabled
    Auto-pthreading only works if your PDL installation was compiled with
    POSIX threads enabled. This is normally the case if you are running on
    Windows, Linux, MacOS X, or other unix variants.

Non-Threadsafe Code
    Not all the libraries that PDL interfaces to are thread-safe, i.e. they
    aren't written to operate in a multi-threaded environment without
    crashing or causing side effects. Some examples in the PDL core are the
    fft function and the pnmout functions.

    To operate properly with these types of functions, the PPCode flag
    NoPthread has been introduced to mark a function as not being
    pthread-safe. See the PDL::PP docs for details.

Size of PDL Dimensions and pthread Target
    As of PDL 2.058, the broadcasted dimension sizes do not need to divide
    exactly by the pthread target, although if one does, it will be used.

    If no dimension is as large as the pthread target, the number of
    pthreads will be the size of the largest broadcasted dimension.

    In order to minimise idle CPUs on the last iteration at the end of the
    broadcasted dimension, the algorithm that picks the dimension to
    pthread on aims for the largest remainder when dividing the sizes of
    the broadcasted dimensions by the pthread target. For example, if a
    PDL has broadcasted dimension sizes of (9,6,2) and the pthread
    target is 4, the algorithm will pick dimension 1 (size 6), as
    that leaves a remainder of 2 (leaving 2 CPUs idle at the end), in
    preference to dimension 0 (size 9), which leaves a remainder of 1
    (leaving 3 idle).
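
The selection rules described above can be sketched in plain Perl. This is
an illustration of the description only, not PDL's actual implementation.
The helper name pick_pthread_dim is hypothetical, as are the decisions to
consider only dimensions at least as large as the target when comparing
remainders and to break ties in favour of the earlier dimension.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (dimension index, number of pthreads) for the
# given pthread target and list of broadcast dimension sizes.
sub pick_pthread_dim {
    my ($targ, @sizes) = @_;
    return (-1, 0) if $targ < 1 or !@sizes;

    # Candidates are dimensions at least as large as the target (assumption).
    my @big = grep { $sizes[$_] >= $targ } 0 .. $#sizes;

    if (!@big) {
        # No dimension is as large as the target: use the largest one,
        # with one pthread per element of that dimension.
        my ($best) = sort { $sizes[$b] <=> $sizes[$a] || $a <=> $b } 0 .. $#sizes;
        return ($best, $sizes[$best]);
    }

    # A dimension that divides exactly by the target is used if present.
    for my $i (@big) {
        return ($i, $targ) if $sizes[$i] % $targ == 0;
    }

    # Otherwise prefer the largest remainder (fewest idle pthreads on the
    # last pass); ties go to the earlier dimension in this sketch.
    my ($best) = sort {
        $sizes[$b] % $targ <=> $sizes[$a] % $targ || $a <=> $b
    } @big;
    return ($best, $targ);
}

my ($dim, $n) = pick_pthread_dim(4, 9, 6, 2);
print "dim $dim, $n pthreads\n";    # dim 1, 4 pthreads
```

With sizes (9,6,2) and a target of 4, neither 9 nor 6 divides exactly, so
the sketch compares remainders (1 and 2) and picks dimension 1, matching the
example in the text.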

Speed improvement might be less than you expect.
    If you have an 8-core machine and call set_autopthread_targ with 8 to
    generate 8 parallel pthreads, you probably won't get an 8X improvement
    in speed, due to memory bandwidth issues. Even though you have 8
    separate CPUs crunching away on data, you will have (for most common
    machine architectures) common RAM that now becomes your bottleneck. For
    simple calculations (e.g. simple additions) you can run into a
    performance limit at about 4 pthreads. For more CPU-bound calculations
    the limit will be higher.
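
A rough way to see where the limit lies on a given machine is to time the
same operations at several pthread targets. The sketch below assumes PDL is
installed with POSIX threads enabled and uses Time::HiRes from the Perl
core. The array size and the list of targets are arbitrary, and
single-shot wall-clock timings like these are noisy, so treat the numbers
as indicative only.

```perl
use strict;
use warnings;
use PDL;
use Time::HiRes qw(time);

set_autopthread_size(0);              # pthread regardless of PDL size
my $x = ones(10, 1000, 1000);         # 10 million elements (illustrative)

for my $targ (1, 2, 4, 8) {
    set_autopthread_targ($targ);

    my $t0    = time;
    my $sum   = $x + 5;               # memory-bound: one add per element
    my $t_add = time - $t0;

    $t0 = time;
    my $trig   = sin($x) * cos($x);   # more CPU work per element
    my $t_trig = time - $t0;

    printf "targ=%d  add: %.3fs  sin*cos: %.3fs\n", $targ, $t_add, $t_trig;
}
```

If the text above holds for your hardware, the addition timings should stop
improving at a lower target than the sin*cos timings.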

Copyright 2011 John Cerney. You can distribute and/or modify this
document under the same terms as the current Perl license.

See: http://dev.perl.org/licenses/

perl v5.38.0                      2023-07-21                      PARALLELCPU(1)