PARALLELCPU(1)        User Contributed Perl Documentation       PARALLELCPU(1)

NAME

       PDL::ParallelCPU - Parallel Processor MultiThreading Support in PDL
       (Experimental)

DESCRIPTION

       PDL has support (currently experimental) for splitting up numerical
       processing between multiple parallel processor threads (or pthreads)
       using the set_autopthread_targ and set_autopthread_size functions.
       By taking advantage of multi-core and/or multi-processor machines,
       this can improve processing performance by a factor of 2 to 4 or more
       in most cases.

SYNOPSIS

         use PDL;

         # Set a target of 4 parallel pthreads, with a lower limit of
         # 5 Meg-elements (5 * 2**20 elements) for splitting processing
         # into parallel pthreads.
         set_autopthread_targ(4);
         set_autopthread_size(5);

         my $x = zeroes(5000,5000); # Create a 25-million-element array

         my $y = $x + 5; # Processing will be split up into multiple pthreads

         # Get the actual number of pthreads used for the last
         # processing operation.
         my $actualPthreads = get_autopthread_actual();

Terminology

       The term threading can be confusing in PDL, because it can refer
       either to PDL threading, as defined in the PDL::Threading docs, or to
       processor multi-threading.

       To reduce confusion with the existing PDL threading terminology, this
       document uses pthreading to refer to processor multi-threading, which
       is the use of multiple processor threads to split up numerical
       processing into parallel operations.

Functions that control PDL PThreads

       This is a brief listing and description of the PDL pthreading
       functions; see the PDL::Core docs for detailed information.

       set_autopthread_targ
            Set the target number of processor threads (pthreads) for
            multi-threaded processing. Setting the target to 0 means that no
            pthreading will occur.

            See PDL::Core for details.

       set_autopthread_size
            Set the minimum size (in Meg-elements, i.e. units of 2**20
            elements) of the largest PDL involved in a function for which
            auto-pthreading will be performed. For small PDLs it probably
            isn't worth starting multiple pthreads, so this function sets a
            threshold below which auto-pthreading won't be attempted.

            See PDL::Core for details.

       get_autopthread_actual
            Get the actual number of pthreads executed for the last PDL
            processing function.

            See PDL::Core for details.
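
       Because the size threshold is given in Meg-elements, a PDL qualifies
       for auto-pthreading only when its element count reaches the
       set_autopthread_size value times 2**20. The pure-Perl sketch below
       shows only that arithmetic. exceeds_pthread_size is a hypothetical
       helper, not a PDL function, and counting the exact boundary value as
       "large enough" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDL: returns 1 if a PDL with the given
# dimensions is large enough to be auto-pthreaded under a
# set_autopthread_size setting of $size_meg Meg-elements (1 Meg = 2**20).
# ASSUMPTION: a PDL exactly at the threshold counts as large enough.
sub exceeds_pthread_size {
    my ($size_meg, @dims) = @_;
    my $nelem = 1;
    $nelem *= $_ for @dims;    # total number of elements in the PDL
    return $nelem >= $size_meg * 2**20 ? 1 : 0;
}

# The 5000x5000 array from the SYNOPSIS has 25,000,000 elements, above the
# 5 * 2**20 = 5,242,880 element threshold; a 1000x1000 array is below it.
printf "5000x5000 with size 5: %s\n",
    exceeds_pthread_size(5, 5000, 5000) ? "pthreaded" : "not pthreaded";
printf "1000x1000 with size 5: %s\n",
    exceeds_pthread_size(5, 1000, 1000) ? "pthreaded" : "not pthreaded";
```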

Global Control of PDL PThreading using Environment Variables

       PDL pthreading can be turned on globally, without modifying existing
       code, by setting the environment variables PDL_AUTOPTHREAD_TARG and
       PDL_AUTOPTHREAD_SIZE before running a PDL script. These environment
       variables are checked when PDL starts up, and set_autopthread_targ
       and set_autopthread_size are then called with the environment
       variables' values.

       For example, if the environment variable PDL_AUTOPTHREAD_TARG is set
       to 3 and PDL_AUTOPTHREAD_SIZE is set to 10, then any PDL script will
       run as if the following lines were at the top of the file:

        set_autopthread_targ(3);
        set_autopthread_size(10);
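
       Since the variables are read when PDL starts up, a Perl driver program
       can also enable pthreading for a single run of a script by setting
       them only in the child's environment. This is a minimal sketch under
       that assumption: run_with_autopthread and my_pdl_script.pl are
       hypothetical names, and the only mechanism used is ordinary
       environment inheritance through Perl's %ENV and system.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDL: runs a command (for example a PDL
# script) with PDL_AUTOPTHREAD_TARG and PDL_AUTOPTHREAD_SIZE set for that
# child process only. local() restores the caller's %ENV afterwards.
sub run_with_autopthread {
    my ($targ, $size, @cmd) = @_;
    local $ENV{PDL_AUTOPTHREAD_TARG} = $targ;
    local $ENV{PDL_AUTOPTHREAD_SIZE} = $size;
    return system(@cmd);    # the child inherits the modified environment
}

# Usage (my_pdl_script.pl is a hypothetical script name):
#   run_with_autopthread(3, 10, $^X, 'my_pdl_script.pl');
```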

How It Works

       The auto-pthreading process works by analyzing the threaded array
       dimensions in PDL operations and splitting up processing based on
       the thread dimension sizes and the desired number of pthreads (i.e.
       the pthread target, or pthread_targ). The offsets and increments that
       PDL uses to step through the data in memory are modified for each
       pthread, so each one sees a different set of data when performing
       processing.

       Example

        my $x = sequence(20,4,3); # Small 3-D array, size 20,4,3

        # Set up auto-pthreading:
        set_autopthread_targ(2); # Target of 2 pthreads
        set_autopthread_size(0); # Zero, so the small PDLs in this example will be pthreaded

        # This will be split up into 2 pthreads
        my $c = maximum($x);

       For the above example, the maximum function has a signature of
       "(a(n); [o]c())", which means that the first dimension of $x (size
       20) is a core dimension of the maximum function. The other dimensions
       of $x (size 4,3) are threaded dimensions (i.e. they will be threaded
       over in the maximum function).

       The auto-pthreading algorithm examines the threaded dims of size
       (4,3) and picks the size-4 dimension, since it is evenly divisible by
       the pthread target of 2. The processing of the maximum function is
       then split into two pthreads on the size-4 dimension, with dim
       indexes 0,2 processed by one pthread and dim indexes 1,3 processed by
       the other pthread.
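
       The index assignment in the example (one pthread takes indexes 0 and
       2, the other takes 1 and 3) corresponds to giving each pthread a
       starting offset equal to its pthread number and an increment equal to
       the number of pthreads. The pure-Perl sketch below reproduces only
       that assignment on plain index lists. split_dim_indexes is a
       hypothetical name; PDL's core code achieves the split by adjusting
       memory offsets and increments, not by building lists.

```perl
use strict;
use warnings;

# Hypothetical illustration, not PDL's core code: split the indexes of one
# threaded dimension among $npthreads pthreads. Pthread $p starts at offset
# $p and steps by $npthreads, giving the interleaved assignment described
# in the example above.
sub split_dim_indexes {
    my ($dim_size, $npthreads) = @_;
    my @assigned;
    for my $p (0 .. $npthreads - 1) {
        my @idx;
        for (my $i = $p; $i < $dim_size; $i += $npthreads) {
            push @idx, $i;
        }
        push @assigned, \@idx;    # index list handled by pthread $p
    }
    return @assigned;
}

my @parts = split_dim_indexes(4, 2);    # the size-4 dim, target of 2
print "pthread $_: @{ $parts[$_] }\n" for 0 .. $#parts;
# prints:
#   pthread 0: 0 2
#   pthread 1: 1 3
```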

Limitations

   Must have POSIX Threads Enabled
       Auto-pthreading only works if your PDL installation was compiled with
       POSIX threads enabled. This is normally the case if you are running on
       Linux or other Unix variants.

   Non-Threadsafe Code
       Not all the libraries that PDL interfaces to are thread-safe, i.e.
       they aren't written to operate in a multi-threaded environment without
       crashing or causing side effects. Examples in the PDL core are the fft
       function and the pnmout functions.

       To operate properly with these types of functions, the PDL::PP flag
       NoPthread has been introduced to mark a function as not being
       pthread-safe. See the PDL::PP docs for details.

   Size of PDL Dimensions and PThread Target
       Due to the way a PDL is split up for operation using multiple
       pthreads, the size of a dimension must be evenly divisible by the
       pthread target. For example, if a PDL has threaded dimension sizes of
       (4,3,3) and the pthread target has been set to 2, then the first
       threaded dimension (size 4) will be picked to be split up between two
       pthreads, each processing 2 of its 4 index values. However, if the
       threaded dimension sizes are (3,3,3) and the pthread target is still
       2, then pthreading won't occur, because no threaded dimension is
       divisible by 2.

       The algorithm that picks the actual number of pthreads has some smarts
       (but could probably be improved) to adjust down from the pthread
       target to get a number of pthreads that evenly divides one of the
       threaded dimensions. For example, if a PDL has threaded dimension
       sizes of (9,2,2) and the pthread target is 4, the algorithm will see
       that no dimension is divisible by 4, then adjust the target down to
       3, resulting in the first threaded dimension (size 9) being split up
       into 3 pthreads.

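       The selection rule described in the two paragraphs above can be
       written as a short loop. This pure-Perl sketch is an assumption about
       the rule's shape based only on the examples given here, not a
       transcription of PDL's C code; pick_pthreads is a hypothetical name,
       and when several dimensions qualify it simply takes the first one.

```perl
use strict;
use warnings;

# Hypothetical sketch, not PDL's actual implementation: starting from the
# pthread target, step the pthread count down until some threaded dimension
# is evenly divisible by it. Returns (number_of_pthreads, dim_index); a
# count of 1 with an undefined dim_index means no pthreading.
sub pick_pthreads {
    my ($targ, @thread_dims) = @_;
    for (my $n = $targ; $n > 1; $n--) {
        for my $i (0 .. $#thread_dims) {
            my $size = $thread_dims[$i];
            return ($n, $i) if $size >= $n && $size % $n == 0;
        }
    }
    return (1, undef);
}

# The three cases discussed in the text: [target, threaded dim sizes...]
for my $case ([2, 4, 3, 3], [2, 3, 3, 3], [4, 9, 2, 2]) {
    my ($targ, @dims) = @$case;
    my ($n, $dim) = pick_pthreads($targ, @dims);
    printf "dims (%s), target %d: %s\n", join(',', @dims), $targ,
        defined $dim ? "$n pthreads on threaded dim $dim" : "no pthreading";
}
```

       With the three cases from the text, the sketch reports 2 pthreads on
       threaded dim 0, no pthreading, and 3 pthreads on threaded dim 0,
       respectively.
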
   Speed improvement might be less than you expect.
       If you have an 8-core machine and call set_autopthread_targ with 8 to
       generate 8 parallel pthreads, you probably won't get an 8X
       improvement in speed, due to memory bandwidth issues. Even though you
       have 8 separate CPUs crunching away on data, you will have (for most
       common machine architectures) common RAM that now becomes your
       bottleneck. For simple calculations (e.g. simple additions) you can
       run into a performance limit at about 4 pthreads. For more complex
       calculations the limit will be higher.

COPYRIGHT

       Copyright 2011 John Cerney. You can distribute and/or modify this
       document under the same terms as the current Perl license.

       See: http://dev.perl.org/licenses/

perl v5.30.0                      2019-09-05                    PARALLELCPU(1)