STORAGE.CONF(5)          InterNetNews Documentation          STORAGE.CONF(5)

NAME
storage.conf - Configuration file for storage manager

DESCRIPTION
The file <pathetc>/storage.conf contains the rules used to assign
articles to different storage methods. These rules determine where
incoming articles will be stored.

The storage manager is a unified interface between INN and a variety of
different storage methods, allowing the news administrator to choose
between different storage methods with different trade-offs (or even
use several at the same time for different newsgroups, or articles of
different sizes). The rest of INN need not care what type of storage
method was used for a given article; the storage manager will figure
this out automatically when that article is retrieved via the storage
API. Note that you may also want to see the options provided in
inn.conf(5) regarding article storage.

The storage.conf file consists of a series of storage method entries.
Blank lines and lines beginning with a number sign ("#") are ignored.
The maximum number of characters in each line is 255. The order of
entries in this file is important; see below.

Each entry specifies a storage method and a set of rules. Articles
which match all of the rules of a storage method entry will be stored
using that storage method; if an article matches multiple storage
method entries, the first one will be used. Each entry is formatted as
follows:

    method <methodname> {
        class: <storage_class>
        newsgroups: <wildmat>
        size: <minsize>[,<maxsize>]
        expires: <mintime>[,<maxtime>]
        options: <options>
        exactmatch: <bool>
    }

If a value includes spaces or tabs, it must be enclosed in double
quotes (""). If a number sign ("#") or a double quote is meant to be
included verbatim in a value, it should be escaped with "\".

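The quoting rules above can be sketched as a small helper that renders
a value for an entry. This is only an illustration: quote_value is a
hypothetical name, not part of INN, and the sketch assumes that a
backslash itself needs no special treatment, which the text above does
not state.

```python
def quote_value(value: str) -> str:
    """Render a value for a storage.conf entry (hypothetical helper)."""
    # Escape a literal double quote or number sign with a backslash.
    escaped = value.replace('"', '\\"').replace('#', '\\#')
    # Values containing spaces or tabs must be enclosed in double quotes.
    if ' ' in value or '\t' in value:
        return '"' + escaped + '"'
    return escaped
```

A value without spaces, tabs, "#" or double quotes is returned
unchanged.
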
<methodname> is the name of a storage method to use for articles which
match the rules of this entry. The currently available storage methods
are:

    cnfs
    timecaf
    timehash
    tradspool
    trash

See the "STORAGE METHODS" section below for more details.

The meanings of the keys in each storage method entry are as follows:

class: <storage_class>
    An identifier for this storage method entry. <storage_class>
    should be a number between 0 and 255. It should be unique across
    all of the entries in this file. It is mainly used for specifying
    expiration times by storage class as described in expire.ctl(5);
    "timehash" and "timecaf" will also set the top-level directory in
    which articles accepted by this storage class are stored. The
    assignment of a particular number to a storage class is arbitrary
    but permanent (since it is used in storage tokens). Storage
    classes can, for instance, be numbered sequentially in
    storage.conf.

newsgroups: <wildmat>
    What newsgroups are stored using this storage method. <wildmat> is
    a uwildmat pattern which is matched against the newsgroups an
    article is posted to. If storeonxref in inn.conf is true, this
    pattern will be matched against the newsgroup names in the Xref
    header field body; otherwise, it will be matched against the
    newsgroup names in the Newsgroups header field body (see
    inn.conf(5) for discussion of the differences between these
    possibilities). Poison wildmat expressions (expressions starting
    with "@") are allowed and can be used to exclude certain group
    patterns: articles crossposted to poisoned newsgroups will not be
    stored using this storage method. The <wildmat> pattern is matched
    in order.

    There is no default newsgroups pattern; if an entry should match
    all newsgroups, use an explicit "newsgroups: *".

size: <minsize>[,<maxsize>]
    A range of article sizes (in bytes) which should be stored using
    this storage method. If <maxsize> is 0 or not given, the maximum
    article size is limited only by maxartsize in inn.conf. The
    size: field is optional and may be omitted entirely if you want
    articles of any size to be stored in this storage method (if, of
    course, these articles fulfill all the other requirements of this
    storage method entry). By default, <minsize> is set to 0.

expires: <mintime>[,<maxtime>]
    A range of article expiration times which should be stored using
    this storage method. Be careful; this is less useful than it may
    appear at first. This is based only on the Expires: header of the
    article, not on any local expiration policies or anything in
    expire.ctl! If <mintime> is non-zero, then this entry will not
    match any article without an Expires: header. This key is
    therefore only really useful for assigning articles with requested
    longer expire times to a separate storage method. Articles only
    match if the time until expiration (that is to say, the amount of
    time into the future that the Expires: header of the article
    requests that it remain around) falls in the interval specified by
    <mintime> and <maxtime>.

    The format of these parameters is "0d0h0m0s" (days, hours, minutes,
    and seconds into the future). If <maxtime> is "0s" or is not
    specified, there is no upper bound on expire times falling into
    this entry (note that this key has no effect on when the article
    will actually be expired, but only on whether or not the article
    will be stored using this storage method). This field is also
    optional and may be omitted entirely if you do not want to store
    articles according to their Expires: header, if any.

    A <mintime> value greater than "0s" implies that this storage
    method won't match any article without an Expires: header.

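As an illustration of the expires: rules described above, the following
sketch converts a "0d0h0m0s" value to seconds and tests whether an
article falls in the range. It is not INN's implementation. The names
parse_duration and expires_matches are hypothetical, and the sketch
assumes that individual components may be omitted (as in "7d"), that
the bounds are inclusive, and that an article without an Expires:
header matches whenever <mintime> is zero.

```python
import re
from typing import Optional

# Days, hours, minutes and seconds, each optional in this sketch.
_DURATION = re.compile(r"^(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")

def parse_duration(text: str) -> int:
    """Convert a value such as "1d2h3m4s" to a number of seconds."""
    match = _DURATION.match(text)
    if not text or not match:
        raise ValueError("not a 0d0h0m0s value: %r" % text)
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds

def expires_matches(until_expiry: Optional[int], mintime: str,
                    maxtime: str = "0s") -> bool:
    """until_expiry is the number of seconds until the date requested by
    the Expires: header, or None when the article has no such header."""
    lower = parse_duration(mintime)
    upper = parse_duration(maxtime)
    if until_expiry is None:
        # A non-zero <mintime> never matches an article without Expires:.
        return lower == 0
    if until_expiry < lower:
        return False
    # An upper bound of zero means "no upper bound".
    return upper == 0 or until_expiry <= upper
```

With "expires: 7d", an article asking to be kept for ten days matches,
whereas an article asking for one hour, or one without an Expires:
header, does not.
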
options: <options>
    This key is for passing special options to storage methods that
    require them (currently only "cnfs"). See the "STORAGE METHODS"
    section below for a description of its use.

exactmatch: <bool>
    If this key is set to true, an article matches this entry only if
    all the newsgroups in its Newsgroups: header match the newsgroups
    pattern. (Normally, any non-zero number of matching newsgroups is
    sufficient, provided no newsgroup matches a poison wildmat as
    described above.) This is a boolean value; "true", "yes" and "on"
    are usable to enable this key. The case of these values is not
    significant. The default is false.

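The difference made by exactmatch can be sketched as follows. This is a
simplified illustration, not INN code: newsgroups_match is a
hypothetical name, Python's fnmatch stands in for uwildmat, only a
single pattern is handled, and poison expressions are ignored.

```python
from fnmatch import fnmatchcase

def newsgroups_match(groups, pattern, exactmatch=False):
    """Return True if the article's newsgroups satisfy the pattern."""
    hits = [fnmatchcase(group, pattern) for group in groups]
    if not hits:
        return False
    # exactmatch requires every newsgroup to match; otherwise one is enough.
    return all(hits) if exactmatch else any(hits)
```

An article crossposted to alt.test and comp.misc matches "alt.*" by
default, but no longer matches once exactmatch is set to true.
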
If an article matches all of the constraints of an entry, it is stored
via that storage method and is associated with that <storage_class>.
This file is scanned in order and the first matching entry is used to
store the article.

If an article does not match any entry, either by being posted to a
newsgroup which does not match any of the <wildmat> patterns or by
being outside the size and expires ranges of all entries whose
newsgroups pattern it does match, the article is not stored and is
rejected by innd. When this happens, the error message:

    cant store article: no matching entry in storage.conf

is logged to syslog. If you want to silently drop articles matching
certain newsgroup patterns or size or expires ranges, assign them to
the "trash" storage method rather than having them not match any
storage method entry.

STORAGE METHODS
Currently, there are five storage methods available. Each method has
its pros and cons; you can choose any mixture of them as is suitable
for your environment. Note that each method has an attribute
EXPENSIVESTAT which indicates whether checking the existence of an
article is expensive or not. This is used to run expireover(8).

cnfs
    The "cnfs" storage method stores articles in large cyclic buffers
    (CNFS stands for Cyclic News File System). Articles are stored in
    CNFS buffers in arrival order, and when the buffer fills, it wraps
    around to the beginning and stores new articles over the top of the
    oldest articles in the buffer. The expire time of articles stored
    in CNFS buffers is therefore entirely determined by how long it
    takes the buffer to wrap around, which depends on how quickly data
    is being stored in it. (This method is therefore said to have
    self-expire functionality. It also means that when an article is
    cancelled, the space it occupied is not reused until the cycbuff
    rolls over and starts being overwritten again.) EXPENSIVESTAT is
    false for this method.

    CNFS has its own configuration file, cycbuff.conf, which describes
    some subtleties to the basic description given above. Storage
    method entries for the "cnfs" storage method must have an options:
    field specifying the metacycbuff into which articles matching that
    entry should be stored; see cycbuff.conf(5) for details on
    metacycbuffs.

    Advantages: By far the fastest of all storage methods (except for
    "trash"), since it eliminates the overhead of dealing with a file
    system and creating new files. Unlike all other storage methods,
    it does not require manual article expiration. With CNFS, the
    server will never throttle itself due to a full spool disk, and
    groups are restricted to just the buffer files given so that they
    can never use more than the amount of disk space allocated to them.

    Disadvantages: Article retention times are more difficult to
    control because old articles are overwritten automatically.
    Attacks on Usenet, such as flooding or massive amounts of spam, can
    result in wanted articles expiring much faster than intended (with
    no warning).

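Since CNFS retention depends only on the buffer size and the rate at
which data arrives, it can be estimated with simple arithmetic. The
sketch below is a rough illustration with hypothetical numbers; the
function name cnfs_retention_days is an assumption, and the estimate
ignores any per-article or per-buffer overhead inside the cycbuff.

```python
def cnfs_retention_days(buffer_bytes: int, bytes_per_day: float) -> float:
    """Estimate how many days pass before a cyclic buffer wraps around."""
    if bytes_per_day <= 0:
        raise ValueError("bytes_per_day must be positive")
    return buffer_bytes / bytes_per_day

GIB = 2 ** 30
# Hypothetical figures: a 100 GiB buffer receiving 5 GiB per day.
normal = cnfs_retention_days(100 * GIB, 5 * GIB)
# The same buffer during a flood of 50 GiB per day.
flooded = cnfs_retention_days(100 * GIB, 50 * GIB)
```

With these figures, retention drops from 20 days to 2 days during the
flood, which is the disadvantage described above.
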
timecaf
    This method stores multiple articles in one file, whose name is
    based on the article's arrival time and the storage class. The
    file name will be:

        <patharticles>/timecaf-nn/bb/aacc.CF

    where "nn" is the hexadecimal value of <storage_class>, "bb" and
    "aacc" are the hexadecimal components of the arrival time, and "CF"
    is a hardcoded extension. (The arrival time, in seconds since the
    epoch, is converted to hexadecimal and interpreted as 0xaabbccdd,
    with "aa", "bb", and "cc" used to build the path.) This method
    does not have self-expire functionality (meaning expire has to run
    periodically to delete old articles, as well as cancelled articles
    if immediatecancel is not set to true in inn.conf). EXPENSIVESTAT
    is false for this method.

    Advantages: It is roughly four times faster than "timehash" for
    article writes, since much of the file system overhead is bypassed,
    while still retaining the same fine control over article retention
    time.

    Disadvantages: Using this method means giving up all but the most
    careful manual fiddling with the article spool; in this respect,
    it resembles "cnfs". As one of the newer and least widely used
    storage types, "timecaf" has not been as thoroughly tested as the
    other methods.

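The timecaf file name layout described above can be sketched as
follows. This is an illustration only: timecaf_path is a hypothetical
name, the spool directory in the test is made up, and the sketch
assumes lowercase hexadecimal with each component zero-padded to two
digits.

```python
def timecaf_path(patharticles: str, storage_class: int, arrival: int) -> str:
    """Build <patharticles>/timecaf-nn/bb/aacc.CF from an arrival time."""
    # Interpret the arrival time (seconds since the epoch) as 0xaabbccdd.
    aa = (arrival >> 24) & 0xFF
    bb = (arrival >> 16) & 0xFF
    cc = (arrival >> 8) & 0xFF
    # The low-order byte "dd" is not used in the path.
    return "%s/timecaf-%02x/%02x/%02x%02x.CF" % (
        patharticles, storage_class, bb, aa, cc)
```

Because "dd" is not part of the name, all the articles of a storage
class that arrive within the same 256-second window share one file.
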
timehash
    This method is very similar to "timecaf" except that each article
    is stored in a separate file. The name of the file for a given
    article will be:

        <patharticles>/time-nn/bb/cc/yyyy-aadd

    where "nn" is the hexadecimal value of <storage_class>, "yyyy" is a
    hexadecimal sequence number, and "bb", "cc", and "aadd" are
    components of the arrival time in hexadecimal (the arrival time is
    interpreted as documented above under "timecaf"). This method does
    not have self-expire functionality. Cancelled articles are removed
    immediately. EXPENSIVESTAT is true for this method.

    Advantages: Heavy traffic groups do not cause bottlenecks, and a
    fine control of article retention time is still possible.

    Disadvantages: The ability to easily find all articles in a given
    newsgroup and manually fiddle with the article spool is lost, and
    INN still suffers from speed degradation due to file system
    overhead (creating and deleting individual files is a slow
    operation).

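The timehash file name can be computed in the same way. Again, this is
only a sketch under assumptions: timehash_path is a hypothetical name,
the hexadecimal digits are assumed to be lowercase and zero-padded, and
the sequence number is assumed to fit in four hexadecimal digits.

```python
def timehash_path(patharticles: str, storage_class: int,
                  arrival: int, seqnum: int) -> str:
    """Build <patharticles>/time-nn/bb/cc/yyyy-aadd for one article."""
    # Interpret the arrival time (seconds since the epoch) as 0xaabbccdd.
    aa = (arrival >> 24) & 0xFF
    bb = (arrival >> 16) & 0xFF
    cc = (arrival >> 8) & 0xFF
    dd = arrival & 0xFF
    # Assumption: the sequence number is kept to 16 bits ("yyyy").
    return "%s/time-%02x/%02x/%02x/%04x-%02x%02x" % (
        patharticles, storage_class, bb, cc, seqnum & 0xFFFF, aa, dd)
```

Unlike with "timecaf", every article gets its own file, so two articles
arriving in the same second differ only by their sequence number.
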
tradspool
    Traditional spool, or "tradspool", is the traditional news article
    storage format. Each article is stored in an individual text file
    named:

        <patharticles>/news/group/name/nnnnn

    where "news/group/name" is the name of the newsgroup to which the
    article was posted with each period changed to a slash, and "nnnnn"
    is the sequence number of the article in that newsgroup. For
    crossposted articles, the article is linked into each newsgroup to
    which it is crossposted (using either hard or symbolic links).
    This is the way versions of INN prior to 2.0 stored all articles,
    as well as being the article storage format used by C News and
    earlier news systems. This method does not have self-expire
    functionality. Cancelled articles are removed immediately.
    EXPENSIVESTAT is true for this method.

    Advantages: It is widely used and well-understood; it can read
    article spools written by older versions of INN and it is
    compatible with all third-party INN add-ons. This storage
    mechanism provides easy and direct access to the articles stored on
    the server and makes writing programs that fiddle with the news
    spool very easy, and gives fine control over article retention
    times.

    Disadvantages: It takes a very fast file system and I/O system to
    keep up with current Usenet traffic volumes due to file system
    overhead. Groups with heavy traffic tend to create a bottleneck
    because of inefficiencies in storing large numbers of article files
    in a single directory. It requires a nightly expire program to
    delete old articles out of the news spool, a process that can slow
    down the server for several hours or more.

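The tradspool naming rule described above is simple enough to sketch
directly. The function name tradspool_path and the spool directory used
in the test are hypothetical.

```python
def tradspool_path(patharticles: str, newsgroup: str, number: int) -> str:
    """Build <patharticles>/news/group/name/nnnnn for one article."""
    # Each period in the newsgroup name becomes a directory separator.
    return "%s/%s/%d" % (patharticles, newsgroup.replace(".", "/"), number)

def crosspost_paths(patharticles, locations):
    """Paths of a crossposted article; locations is a list of
    (newsgroup, article number) pairs, one per newsgroup."""
    return [tradspool_path(patharticles, group, number)
            for group, number in locations]
```

A crossposted article has one such path per newsgroup, all of them
linked to the same article.
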
trash
    This method silently discards all articles stored in it. Its only
    real uses are for testing and for silently discarding articles
    matching a particular storage method entry (for whatever reason).
    Articles stored in this method take up no disk space and can never
    be retrieved, so this method has self-expire functionality of a
    sort. EXPENSIVESTAT is false for this method.

EXAMPLE
The following sample storage.conf file would store all articles posted
to alt.binaries.* in the "BINARIES" CNFS metacycbuff, all articles over
roughly 50 KB in any other hierarchy in the "LARGE" CNFS metacycbuff,
all other articles in alt.* in one timehash class, and all other
articles in any newsgroups in a second timehash class, except for the
internal.* hierarchy which is stored in traditional spool format.

    method tradspool {
        class: 1
        newsgroups: internal.*
    }
    method cnfs {
        class: 2
        newsgroups: alt.binaries.*
        options: BINARIES
    }
    method cnfs {
        class: 3
        newsgroups: *
        size: 50000
        options: LARGE
    }
    method timehash {
        class: 4
        newsgroups: alt.*
    }
    method timehash {
        class: 5
        newsgroups: *
    }

Notice that the last storage method entry will catch everything. This
is a good habit to get into; make sure that you have at least one
catch-all entry just in case something you did not expect falls through
the cracks. Notice also that the special rule for the internal.*
hierarchy is first, so it will catch even articles crossposted to
alt.binaries.* or over 50 KB in size.

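To see how the first-match rule plays out on the sample file above, the
following sketch walks the five entries in file order. It is a
simplified model rather than INN's matching code: select_entry and
SAMPLE_ENTRIES are hypothetical names, Python's fnmatch stands in for
uwildmat, and only the newsgroups pattern and the minimum size are
considered.

```python
from fnmatch import fnmatchcase

# The sample entries in file order:
# (method, storage class, newsgroups pattern, minimum size in bytes).
SAMPLE_ENTRIES = [
    ("tradspool", 1, "internal.*", 0),
    ("cnfs", 2, "alt.binaries.*", 0),
    ("cnfs", 3, "*", 50000),
    ("timehash", 4, "alt.*", 0),
    ("timehash", 5, "*", 0),
]

def select_entry(groups, size, entries=SAMPLE_ENTRIES):
    """Return (method, class) of the first matching entry, or None."""
    for method, storage_class, pattern, minsize in entries:
        size_ok = size >= minsize
        group_ok = any(fnmatchcase(group, pattern) for group in groups)
        if size_ok and group_ok:
            return method, storage_class
    # No entry matched: innd would reject the article.
    return None
```

For instance, select_entry(["internal.test", "alt.binaries.pictures"],
80000) selects the tradspool entry because it comes first. Without the
final catch-all entry, a small article posted to comp.lang.c would
match nothing and be rejected.
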
As for poison wildmat expressions, if you have for instance an article
crossposted to misc.foo and misc.bar, the pattern:

    misc.*,!misc.bar

will match that article whereas the pattern:

    misc.*,@misc.bar

will not match that article. An article posted only to misc.bar will
fail to match either pattern.

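The difference between "!" and "@" can be checked with a small model of
the matching. This is a sketch, not the uwildmat implementation:
classify_group and article_matches are hypothetical names, Python's
fnmatch stands in for the real pattern syntax, and the sketch relies on
the last matching sub-pattern deciding the result for each newsgroup.

```python
from fnmatch import fnmatchcase

def classify_group(group, wildmat):
    """Return "match", "no" or "poison" for a single newsgroup."""
    result = "no"
    # Sub-patterns are tried in order; the last one that matches wins.
    for sub in wildmat.split(","):
        if sub.startswith("@"):
            if fnmatchcase(group, sub[1:]):
                result = "poison"
        elif sub.startswith("!"):
            if fnmatchcase(group, sub[1:]):
                result = "no"
        elif fnmatchcase(group, sub):
            result = "match"
    return result

def article_matches(groups, wildmat):
    """True if some newsgroup matches and none is poisoned."""
    results = [classify_group(group, wildmat) for group in groups]
    return "poison" not in results and "match" in results
```

With "!", misc.bar is merely not counted as a match, so the crosspost
still matches through misc.foo; with "@", the presence of misc.bar
excludes the whole article.
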
Usually, high-volume groups and groups whose articles do not need to be
kept around very long (binaries groups, *.jobs*, news.lists.filters,
etc.) are stored in CNFS buffers. Use the other methods (or CNFS
buffers again) for everything else. However, it is often most
convenient to keep special hierarchies in "tradspool", such as local
hierarchies, hierarchies that should never expire, or hierarchies whose
spool you need to go through manually.

HISTORY
Written by Katsuhiro Kondou <kondou@nec.co.jp> for InterNetNews.
Rewritten into POD by Julien Elie.

SEE ALSO
cycbuff.conf(5), expire.ctl(5), expireover(8), inn.conf(5), innd(8),
libinn_uwildmat(3).

INN 2.6.5                       2022-02-18                   STORAGE.CONF(5)