md_docs_job-safety(3)            dirsrv            md_docs_job-safety(3)

md_docs_job-safety - Nunc Stans Job Safety

Nunc Stans 0.2.0 comes with many improvements for job safety. Most
consumers of this framework will not notice the difference if they are
using it 'correctly', but in other cases you may now find that you hit
error conditions.

Jobs now move through a defined set of states during their lifetime.

States
· WAITING: This represents a job that is idle and not owned by a
  worker or event thread. Any thread can alter this job.

· NEEDS_DELETE: This represents a job that is marked for deletion. It
  must not be accessed again!

· DELETED: This represents a job that has been deleted. A job in this
  state should never be accessible.

· NEEDS_ARM: This is a job that is about to be placed into the event or
  work queue for arming, but has not yet been queued.

· ARMED: This is a job that is currently in the event queue or work
  queue, waiting to be executed.

· RUNNING: This is a job that is executing its callback right now.

Diagram
WAITING
All jobs start in the WAITING state. From here, a job has two possible
transitions: it can be passed to ns_job_done and marked NEEDS_DELETE,
or it can be passed to ns_job_rearm and marked NEEDS_ARM. A job that is
WAITING can be safely modified with ns_job_set_* and read with
ns_job_get_* from any thread.

NEEDS_ARM
Once a job is in the NEEDS_ARM state, it cannot be altered with
ns_job_set_*, but it can still be read with ns_job_get_*. It can be
passed to ns_job_done (which moves it to NEEDS_DELETE), but generally
this only happens from within the job callback, with code like the
following:

#include <nunc-stans.h>

static void callback(ns_job_t *job)
{
    ns_job_rearm(job); /* job now NEEDS_ARM */
    ns_job_done(job);  /* 'done' wins: NEEDS_ARM -> NEEDS_DELETE */
}


In most cases, a NEEDS_ARM job quickly moves on to the next state,
ARMED.

ARMED
In the ARMED state, the job has been successfully queued into the event
or work queue. An ARMED job can be read with ns_job_get_*, but it
cannot be altered with ns_job_set_*. If a queued job could be altered
(for example with set_data, set_cb or set_done_cb), the change could
contradict the intent of what the job was queued to do.

A job that is ARMED and queued can NOT be removed from the queue or
stopped from running. This is a point of no return!

RUNNING
In the RUNNING state, the job is executing its callback. While RUNNING,
the thread executing the callback may call ns_job_done, ns_job_rearm,
ns_job_get_* and ns_job_set_* on the job. Note that if the callback
calls both ns_job_done and ns_job_rearm, 'done' is the stronger action:
the job is deleted even though rearm was also called.

While RUNNING, other threads (i.e., any thread other than the worker
executing the callback) may only call ns_job_get_* on the job. Due to
the design of the underlying synchronisation, such a call blocks until
the callback has finished executing, so for all intents and purposes,
by the time the external thread's ns_job_get_* call proceeds, the job
will have moved to NEEDS_DELETE, NEEDS_ARM or WAITING.

NEEDS_DELETE
Calling ns_job_done marks the job as NEEDS_DELETE. The deletion itself
happens at 'some later point'. Once a job is NEEDS_DELETE, you may not
call any of the ns_job_get_* or ns_job_set_* functions on it.

DELETED
This state exists on the job only briefly, while the job is being
deleted internally. At this point the ns_job_done_cb is executed, so
that the user can clean up and free any data as required. Only the
thread running ns_job_done_cb may access the job at this point.

Putting it all together
This state machine encourages certain workflows with jobs, because the
current state is opaque to the caller and is enforced inside
nunc-stans. The most obvious side effect of a state machine violation
is an ASSERT failure when built with -DDEBUG, or PR_FAILURE returned
from the get()/set() functions. This encourages the following
practices:

· Each job should only be accessed by a single thread. This prevents
  races and synchronisation issues.

· Data and variables should belong to a single job. Avoid shared (heap)
  memory locations!

· Jobs should only be changed from within the callback, where you can
  guarantee a consistent state without needing to spin or block on
  ns_job_set_*.

· You may not need mutexes or thread locals for your data, as the job
  provides the correct CPU synchronisation guarantees. Consider giving
  each job a single 'root' data node, from which all other allocated
  variables are referenced, and which is only accessed by the single
  thread running the job. You can then dispense with mutexes, as the
  job guarantees the synchronisation of this data.

· Jobs work well if the callback functions use stack variables rather
  than heap allocations.

Some workflows that do not fit this model well:

· Having threads alter in-flight jobs. This causes race conditions and
  inconsistencies.

· Sharing heap data via pointers in jobs. This requires a mutex on the
  data, which creates a serialisation point: why bother with thread
  pools if you are just going to serialise on some shared data anyway?

· Modifying jobs and what they handle. Don't do it! Just call
  ns_job_done on the job, and create a new one that matches what you
  want to do.

· Map reduce: nunc-stans doesn't provide a good way to aggregate data
  on return (i.e., the reduce step). If you need this, you may have to
  provide a queue or some other reduction mechanism yourself.

Examples
Inside the nunc-stans project, the tests/cmocka/stress_test.c code is a
good example of a socket server and socket client that use nunc-stans
while adhering to these principles.


Version 1.3.8.4               Thu Mar 14 2019        md_docs_job-safety(3)