md_docs_job-safety(3)               dirsrv               md_docs_job-safety(3)

NAME

       md_docs_job-safety - Nunc Stans Job Safety

       Nunc Stans 0.2.0 comes with many improvements for job safety. Most
       consumers of this framework will not notice the difference if they are
       using it correctly, but in other cases you may encounter new error
       conditions.

       Jobs now flow through a set of states during their lifetime.

   States
       · WAITING: This represents a job that is idle, and not owned by a
         worker or event thread. Any thread can alter this job.

       · NEEDS_DELETE: This represents a job that is marked for deletion. It
         cannot be accessed again!

       · DELETED: This represents a job that is deleted. In theory, you can
         never access a job in this state.

       · NEEDS_ARM: This is a job that is about to be placed into the event or
         work queue for arming, but has not yet been queued.

       · ARMED: This is a job that is currently in the event queue or work
         queue waiting to be executed.

       · RUNNING: This is a job that is in the process of executing its
         callback right now.
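
       The states listed above can be written down as a plain C enum. This is
       only an illustrative model: the real state type is internal to
       nunc-stans and opaque to callers, and every name below (job_state_t,
       job_state_name, job_state_is_deleting) is hypothetical.

```c
#include <assert.h>

/* Hypothetical model of the six job states; not the library's own type. */
typedef enum {
    JOB_WAITING = 0,
    JOB_NEEDS_DELETE,
    JOB_DELETED,
    JOB_NEEDS_ARM,
    JOB_ARMED,
    JOB_RUNNING,
    JOB_STATE_COUNT /* number of states, not a state itself */
} job_state_t;

/* Human-readable name of a state, e.g. for debug logging. */
static const char *job_state_name(job_state_t s)
{
    static const char *const names[JOB_STATE_COUNT] = {
        "WAITING", "NEEDS_DELETE", "DELETED",
        "NEEDS_ARM", "ARMED", "RUNNING"
    };
    assert((int)s >= 0 && (int)s < JOB_STATE_COUNT);
    return names[s];
}

/* Nonzero once the job has been handed over for deletion. */
static int job_state_is_deleting(job_state_t s)
{
    return s == JOB_NEEDS_DELETE || s == JOB_DELETED;
}
```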

   Diagram
   WAITING
       All jobs start in the WAITING state. From here, a job can make one of
       two transitions: it is sent to ns_job_done and marked as NEEDS_DELETE,
       or it is sent to ns_job_rearm and marked as NEEDS_ARM. A job that is
       WAITING can be safely modified with ns_job_set_* and read with
       ns_job_get_* from any thread.
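
       The two transitions out of WAITING can be sketched with a small
       stand-alone model. It does not call nunc-stans; model_job_t and the
       model_* functions are hypothetical stand-ins for a job and for
       ns_job_rearm, ns_job_done and ns_job_set_*, and only the states needed
       for this section are modelled.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { JOB_WAITING, JOB_NEEDS_ARM, JOB_NEEDS_DELETE } job_state_t;

/* Hypothetical stand-in for a job: a state plus the user data pointer. */
typedef struct {
    job_state_t state;
    void *data;
} model_job_t;

/* Models ns_job_rearm on an idle job: WAITING -> NEEDS_ARM. */
static int model_job_rearm(model_job_t *job)
{
    assert(job != NULL);
    if (job->state != JOB_WAITING)
        return -1;
    job->state = JOB_NEEDS_ARM;
    return 0;
}

/* Models ns_job_done: WAITING or NEEDS_ARM -> NEEDS_DELETE. */
static int model_job_done(model_job_t *job)
{
    assert(job != NULL);
    if (job->state == JOB_NEEDS_DELETE)
        return -1;
    job->state = JOB_NEEDS_DELETE;
    return 0;
}

/* Models ns_job_set_*: in this sketch, only allowed while WAITING. */
static int model_job_set_data(model_job_t *job, void *data)
{
    assert(job != NULL);
    if (job->state != JOB_WAITING)
        return -1;
    job->data = data;
    return 0;
}
```

       In this model a set succeeds on a WAITING job, fails once the job has
       been rearmed, and a later done still moves the job to NEEDS_DELETE.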

   NEEDS_ARM
       Once a job is in the NEEDS_ARM state, it cannot be altered with
       ns_job_set_*. It can still be read with ns_job_get_*. It can be sent
       to ns_job_done (which moves it to NEEDS_DELETE), but generally this
       only happens from within the job callback, with code like the
       following.

       void callback(ns_job_t *job) {
           /* Request re-arming: the job is marked NEEDS_ARM. */
           ns_job_rearm(job);
           /* 'done' overrides the rearm: the job moves to NEEDS_DELETE. */
           ns_job_done(job);
       }


       NEEDS_ARM in most cases will quickly move to the next state, ARMED.

   ARMED
       In the ARMED state, the job has been successfully queued into the
       event or work queue. The job can be read with ns_job_get_*, but it
       cannot be altered with ns_job_set_*. If a job could be altered while
       queued, this could change the intent of what the job should do
       (set_data, set_cb, set_done_cb, etc.).

       A job that is ARMED and queued can NOT be removed from the queue, or
       stopped from running. This is a point of no return!
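
       The get/set rules described so far can be summarised as a small
       permission check. This is a sketch of the documented behaviour, not
       the library's implementation: model_status_t is a stand-in for NSPR's
       PRStatus, and model_can_get, model_can_set and model_set_data are
       hypothetical names. The special case of the done callback accessing a
       DELETED job is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    JOB_WAITING, JOB_NEEDS_DELETE, JOB_DELETED,
    JOB_NEEDS_ARM, JOB_ARMED, JOB_RUNNING
} job_state_t;

/* Stand-in for NSPR's PRStatus (PR_FAILURE / PR_SUCCESS). */
typedef enum { MODEL_FAILURE = -1, MODEL_SUCCESS = 0 } model_status_t;

typedef struct {
    job_state_t state;
    void *data;
} model_job_t;

/* Reads are refused only once the job is on its way to deletion. */
static bool model_can_get(job_state_t s)
{
    return s != JOB_NEEDS_DELETE && s != JOB_DELETED;
}

/* Writes: any thread while WAITING, only the callback thread while RUNNING. */
static bool model_can_set(job_state_t s, bool in_callback)
{
    if (s == JOB_WAITING)
        return true;
    if (s == JOB_RUNNING)
        return in_callback;
    return false;
}

/* Models ns_job_set_*: leaves the job untouched when the state forbids it. */
static model_status_t model_set_data(model_job_t *job, void *data,
                                     bool in_callback)
{
    assert(job != NULL);
    if (!model_can_set(job->state, in_callback))
        return MODEL_FAILURE;
    job->data = data;
    return MODEL_SUCCESS;
}
```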

   RUNNING
       In the RUNNING state, the job is in the process of executing the
       callback that the job contains. While RUNNING, the thread that is
       executing the callback may call ns_job_done, ns_job_rearm,
       ns_job_get_* and ns_job_set_* upon the job. Note that if the callback
       calls both ns_job_done and ns_job_rearm, the job is deleted even
       though rearm was also called, because 'done' is the 'stronger' action.

       While RUNNING, other threads (i.e. not the worker thread executing the
       callback) may only call ns_job_get_* upon the job. Due to the design
       of the underlying synchronisation, this call blocks until the callback
       has finished executing, so for all intents and purposes, by the time
       the external thread is able to call ns_job_get_*, the job will have
       moved to NEEDS_DELETE, NEEDS_ARM or WAITING.
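
       The rule that 'done' wins over 'rearm' can be modelled as follows.
       This sketch records the requests made during the callback and resolves
       them afterwards; that mechanism is an assumption chosen for clarity
       and only reproduces the outcome described above, not how nunc-stans
       implements it. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    JOB_WAITING, JOB_NEEDS_DELETE, JOB_NEEDS_ARM, JOB_ARMED, JOB_RUNNING
} job_state_t;

typedef struct model_job {
    job_state_t state;
    bool done_requested;
    bool rearm_requested;
    void (*callback)(struct model_job *);
} model_job_t;

/* Stand-ins for ns_job_done / ns_job_rearm called from inside a callback. */
static void model_job_done(model_job_t *j)  { j->done_requested = true; }
static void model_job_rearm(model_job_t *j) { j->rearm_requested = true; }

/* Runs an ARMED job and settles its next state once the callback returns. */
static void model_run(model_job_t *j)
{
    assert(j != NULL && j->state == JOB_ARMED);
    j->state = JOB_RUNNING;
    j->done_requested = false;
    j->rearm_requested = false;
    j->callback(j);
    if (j->done_requested)
        j->state = JOB_NEEDS_DELETE;   /* 'done' is the stronger action */
    else if (j->rearm_requested)
        j->state = JOB_NEEDS_ARM;
    else
        j->state = JOB_WAITING;
}

/* Example callbacks covering the three possible outcomes. */
static void cb_rearm_then_done(model_job_t *j) { model_job_rearm(j); model_job_done(j); }
static void cb_rearm_only(model_job_t *j)      { model_job_rearm(j); }
static void cb_nothing(model_job_t *j)         { (void)j; }
```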

   NEEDS_DELETE
       Calling ns_job_done marks the job as NEEDS_DELETE. The deletion
       actually occurs at 'some later point'. Once a job is set to
       NEEDS_DELETE, you may not call any of the ns_job_get_* or ns_job_set_*
       functions on the job.

   DELETED
       This state only exists on the job briefly. It means we are in the
       process of deleting the job internally. We execute the ns_job_done_cb
       at this point, so that the user may clean up and free any data as
       required. Only the ns_job_done_cb thread may access the job at this
       point.
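
       A done callback is the natural place to release whatever the job
       owns. The following stand-alone model illustrates the order of events
       during deletion; it does not use the nunc-stans API, and model_job_t,
       model_job_new, model_job_delete and free_job_data are hypothetical
       names introduced for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { JOB_WAITING, JOB_NEEDS_DELETE, JOB_DELETED } job_state_t;

typedef struct model_job model_job_t;
typedef void (*model_done_cb_t)(model_job_t *);

struct model_job {
    job_state_t state;
    void *data;               /* owned by the job, released by done_cb */
    model_done_cb_t done_cb;
};

static int freed_count = 0;   /* counts done-callback runs, for the demo */

/* Example done callback: frees the data that belongs to the job. */
static void free_job_data(model_job_t *job)
{
    free(job->data);
    job->data = NULL;
    freed_count++;
}

/* Creates a WAITING job that owns data_size bytes of zeroed data. */
static model_job_t *model_job_new(size_t data_size, model_done_cb_t done_cb)
{
    model_job_t *job = malloc(sizeof(*job));
    if (job == NULL)
        return NULL;
    job->data = calloc(1, data_size);
    if (job->data == NULL) {
        free(job);
        return NULL;
    }
    job->state = JOB_WAITING;
    job->done_cb = done_cb;
    return job;
}

/* Models internal deletion: NEEDS_DELETE -> DELETED, done_cb, then free. */
static void model_job_delete(model_job_t *job)
{
    assert(job != NULL && job->state == JOB_NEEDS_DELETE);
    job->state = JOB_DELETED;
    if (job->done_cb != NULL)
        job->done_cb(job);
    free(job);
}
```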

   Putting it all together
       This state machine encourages certain types of work flows with jobs.
       This is because the current states are opaque to the caller, and are
       enforced inside of nunc-stans. The most obvious side effect of a state
       machine violation is an ASSERT failure when built with -DDEBUG, or a
       PR_FAILURE return from get()/set(). This encourages certain practices:

       · Only a single thread should access a given job. This prevents races
         and sync issues.

       · Data and variables should exist in a single job. Avoid shared (heap)
         memory locations!

       · Changing jobs should only happen from within the callback, as you can
         guarantee a consistent state without needing to spin/block on
         ns_job_set_*.

       · You may not need mutexes on your data or thread locals, as the job
         provides the correct CPU synchronisation guarantees. Consider that
         each job takes a 'root' data node, and all other allocated variables
         are referenced from there only by the single thread. You can then
         dispose of mutexes, as the job will guarantee the synchronisation of
         this data.

       · Jobs work well if stack variables are used inside the callback
         functions, rather than heap.
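
       The 'root' data node idea from the list above might look like the
       following. The struct and function are hypothetical: conn_ctx_t stands
       for whatever a job carries as its data, and handle_chunk stands for
       the body of a job callback after it has fetched that data with
       ns_job_get_*. Because only the thread running the callback touches the
       node, the sketch needs no mutex.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-job root node: all job-owned data hangs off this. */
typedef struct {
    char buf[64];             /* bytes buffered so far */
    size_t len;               /* number of valid bytes in buf */
    unsigned long bytes_seen; /* total bytes offered to this job */
} conn_ctx_t;

/* Stand-in for a callback body: appends as much of chunk as fits in buf.
 * Returns the number of bytes actually stored. */
static size_t handle_chunk(conn_ctx_t *ctx, const char *chunk, size_t n)
{
    assert(ctx != NULL && ctx->len <= sizeof(ctx->buf));
    size_t room = sizeof(ctx->buf) - ctx->len; /* stack locals only */
    size_t take = n < room ? n : room;
    memcpy(ctx->buf + ctx->len, chunk, take);
    ctx->len += take;
    ctx->bytes_seen += n;
    return take;
}
```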

       Some work flows that don't work well here:

       · Having threads alter in-flight jobs. This causes race conditions and
         inconsistencies.

       · Sharing heap data via pointers in jobs. This means you need a mutex
         on the data, which causes a serialisation point. Why bother with
         thread pools if you are just going to serialise on some data points
         anyway?

       · Modifying jobs and what they handle. Don't do it! Just call
         ns_job_done on the job, and create a new one that matches what you
         want to do.

       · Map reduce: Nunc-Stans doesn't provide a good way to aggregate data
         on the return, i.e. reduce. You may need to provide a queue or some
         other method to reduce if you are interested in this.
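
       If a reduce step is needed, one possible shape for the queue mentioned
       in the last item is a small mutex-protected result buffer that job
       callbacks push into and a single consumer drains. This is an
       assumption about how such a queue could be built, not something
       nunc-stans provides; result_queue_t and its functions are hypothetical.
       The mutex is a serialisation point, so it is held only for the push
       and the drain.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define RESULT_QUEUE_CAP 128

/* Hypothetical fixed-size queue of partial results. */
typedef struct {
    pthread_mutex_t lock;
    long items[RESULT_QUEUE_CAP];
    size_t count;
} result_queue_t;

#define RESULT_QUEUE_INIT { PTHREAD_MUTEX_INITIALIZER, {0}, 0 }

/* "Map" side: a job callback pushes one partial result.
 * Returns 0 on success, -1 if the queue is full. */
static int result_queue_push(result_queue_t *q, long value)
{
    int rc = -1;
    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    if (q->count < RESULT_QUEUE_CAP) {
        q->items[q->count++] = value;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* "Reduce" side: drains the queue and returns the sum of its items. */
static long result_queue_reduce_sum(result_queue_t *q)
{
    long total = 0;
    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    for (size_t i = 0; i < q->count; i++)
        total += q->items[i];
    q->count = 0;
    pthread_mutex_unlock(&q->lock);
    return total;
}
```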

   Examples
       Inside of the nunc-stans project, the tests/cmocka/stress_test.c code
       is a good example of a socket server and socket client using nunc-stans
       that adheres to these principles.



Version 1.3.8.4                 Thu Mar 14 2019          md_docs_job-safety(3)