STORAGE.CONF(5)           InterNetNews Documentation           STORAGE.CONF(5)

NAME

       storage.conf - Configuration file for storage manager

DESCRIPTION

       The file <pathetc>/storage.conf contains the rules to be used in
       assigning articles to different storage methods.  These rules determine
       where incoming articles will be stored.

       The storage manager is a unified interface between INN and a variety of
       different storage methods, allowing the news administrator to choose
       between different storage methods with different trade-offs (or even
       use several at the same time for different newsgroups, or articles of
       different sizes).  The rest of INN need not care what type of storage
       method was used for a given article; the storage manager will figure
       this out automatically when that article is retrieved via the storage
       API.  Note that you may also want to see the options provided in
       inn.conf(5) regarding article storage.

       The storage.conf file consists of a series of storage method entries.
       Blank lines and lines beginning with a number sign ("#") are ignored.
       The maximum number of characters in each line is 255.  The order of
       entries in this file is important; see below.

       Each entry specifies a storage method and a set of rules.  Articles
       which match all of the rules of a storage method entry will be stored
       using that storage method; if an article matches multiple storage
       method entries, the first one will be used.  Each entry is formatted as
       follows:

           method <methodname> {
               class: <storage_class>
               newsgroups: <wildmat>
               size: <minsize>[,<maxsize>]
               expires: <mintime>[,<maxtime>]
               options: <options>
               exactmatch: <bool>
           }

       If spaces or tabs are included in a value, that value must be enclosed
       in double quotes ("").  If either a number sign ("#") or a double quote
       is meant to be included verbatim in a value, it should be escaped with
       "\".

       <methodname> is the name of a storage method to use for articles which
       match the rules of this entry.  The currently available storage methods
       are:

           cnfs
           timecaf
           timehash
           tradspool
           trash

       See the "STORAGE METHODS" section below for more details.

       The meanings of the keys in each storage method entry are as follows:

       class: <storage_class>
           An identifier for this storage method entry.  <storage_class>
           should be a number between 0 and 255.  It should be unique across
           all of the entries in this file.  It is mainly used for specifying
           expiration times by storage class as described in expire.ctl(5);
           "timehash" and "timecaf" will also set the top-level directory in
           which articles accepted by this storage class are stored.  The
           assignment of a particular number to a storage class is arbitrary
           but permanent (since it is used in storage tokens).  Storage
           classes can, for instance, be numbered sequentially in
           storage.conf.

       newsgroups: <wildmat>
           What newsgroups are stored using this storage method.  <wildmat> is
           a uwildmat pattern which is matched against the newsgroups an
           article is posted to.  If storeonxref in inn.conf is true, this
           pattern will be matched against the newsgroup names in the Xref
           header field body; otherwise, it will be matched against the
           newsgroup names in the Newsgroups header field body (see
           inn.conf(5) for discussion of the differences between these
           possibilities).  Poison wildmat expressions (expressions starting
           with "@") are allowed and can be used to exclude certain group
           patterns: articles crossposted to poisoned newsgroups will not be
           stored using this storage method.  The <wildmat> pattern is matched
           in order.

           There is no default newsgroups pattern; if an entry should match
           all newsgroups, use an explicit "newsgroups: *".

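           As an illustrative sketch (the class number and group names are
           hypothetical, not taken from a real configuration), the following
           entry stores articles posted to misc.* but, because of the poison
           pattern, skips any article that is also crossposted to misc.test:

               method tradspool {
                   class: 1
                   newsgroups: misc.*,@misc.test
               }
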
       size: <minsize>[,<maxsize>]
           A range of article sizes (in bytes) which should be stored using
           this storage method.  If <maxsize> is 0 or not given, the upper
           size of articles is limited only by maxartsize in inn.conf.  The
           size: field is optional and may be omitted entirely if you want
           articles of any size to be stored in this storage method (if, of
           course, these articles fulfill all the other requirements of this
           storage method entry).  By default, <minsize> is set to 0.

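           For instance, assuming a hypothetical 4096-byte threshold and
           arbitrary class numbers, two entries along these lines would
           separate small articles from all others.  Because entries are
           scanned in order, the second entry only receives articles that
           did not match the first:

               method timehash {
                   class: 1
                   newsgroups: *
                   size: 0,4096
               }
               method timecaf {
                   class: 2
                   newsgroups: *
               }
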
       expires: <mintime>[,<maxtime>]
           A range of article expiration times which should be stored using
           this storage method.  Be careful; this is less useful than it may
           appear at first.  This is based only on the Expires header field of
           the article, not on any local expiration policies or anything in
           expire.ctl!  If <mintime> is non-zero, then this entry will not
           match any article without an Expires header field.  This key is
           therefore only really useful for assigning articles with requested
           longer expire times to a separate storage method.  Articles only
           match if the time until expiration (that is to say, the amount of
           time into the future that the Expires header field of the article
           requests that it remain around) falls in the interval specified by
           <mintime> and <maxtime>.

           The format of these parameters is "0d0h0m0s" (days, hours, minutes,
           and seconds into the future).  If <maxtime> is "0s" or is not
           specified, there is no upper bound on expire times falling into
           this entry (note that this key has no effect on when the article
           will actually be expired, but only on whether or not the article
           will be stored using this storage method).  This field is also
           optional and may be omitted entirely if you do not want to store
           articles according to their Expires header field, if any.

           A <mintime> value greater than "0s" implies that this storage
           method won't match any article without an Expires header field.

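           A minimal sketch, with a hypothetical class number and a
           hypothetical 30-day threshold, of an entry that only matches
           articles whose Expires header field requests at least 30 more
           days of retention:

               method tradspool {
                   class: 3
                   newsgroups: *
                   expires: 30d0h0m0s
               }

           Articles without an Expires header field never match such an
           entry, so a later entry has to handle them.
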
       options: <options>
           This key is for passing special options to storage methods that
           require them (currently only "cnfs").  See the "STORAGE METHODS"
           section below for a description of its use.

       exactmatch: <bool>
           If this key is set to true, all the newsgroups in the Newsgroups
           header field body of incoming articles will be examined to see if
           they match newsgroups patterns.  (Normally, any non-zero number of
           matching newsgroups is sufficient, provided no newsgroup matches a
           poison wildmat as described above.)  This is a boolean value;
           "true", "yes" and "on" are usable to enable this key.  The case of
           these values is not significant.  The default is false.

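           As a sketch with hypothetical group names and class number, the
           entry below should match an article posted only to local.test
           and local.misc, but not one crossposted to local.test and
           misc.test, since misc.test does not match the pattern:

               method tradspool {
                   class: 4
                   newsgroups: local.*
                   exactmatch: true
               }
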
       If an article matches all of the constraints of an entry, it is stored
       via that storage method and is associated with that <storage_class>.
       This file is scanned in order and the first matching entry is used to
       store the article.

       If an article does not match any entry, either by being posted to a
       newsgroup which does not match any of the <wildmat> patterns or by
       being outside the size and expires ranges of all entries whose
       newsgroups pattern it does match, the article is not stored and is
       rejected by innd.  When this happens, the error message:

           cant store article: no matching entry in storage.conf

       is logged to syslog.  If you want to silently drop articles matching
       certain newsgroup patterns or size or expires ranges, assign them to
       the "trash" storage method rather than having them not match any
       storage method entry.

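       As a sketch of this approach (the class number, group pattern, and
       size threshold are all hypothetical), an entry like the following,
       placed before any other entry that could match the same articles,
       would silently discard articles over roughly 1 MB posted to
       alt.binaries.*:

           method trash {
               class: 0
               newsgroups: alt.binaries.*
               size: 1000000
           }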

STORAGE METHODS

       Currently, there are five storage methods available.  Each method has
       its pros and cons; you can choose any mixture of them as is suitable
       for your environment.  Note that each method has an attribute
       EXPENSIVESTAT which indicates whether checking the existence of an
       article is expensive or not.  This attribute is used when running
       expireover(8).

       cnfs
           The "cnfs" storage method stores articles in large cyclic buffers
           (CNFS stands for Cyclic News File System).  Articles are stored in
           CNFS buffers in arrival order, and when the buffer fills, it wraps
           around to the beginning and stores new articles over the top of the
           oldest articles in the buffer.  The expire time of articles stored
           in CNFS buffers is therefore entirely determined by how long it
           takes the buffer to wrap around, which depends on how quickly data
           is being stored in it.  (This method is therefore said to have
           self-expire functionality.  It also means that when an article is
           cancelled, the cycbuff does not go back and reuse its space until
           the buffer rolls over and the whole cycbuff starts being reused.)
           EXPENSIVESTAT is false for this method.

           CNFS has its own configuration file, cycbuff.conf, which covers
           some subtleties beyond the basic description given above.  Storage
           method entries for the "cnfs" storage method must have an options:
           field specifying the metacycbuff into which articles matching that
           entry should be stored; see cycbuff.conf(5) for details on
           metacycbuffs.

           Advantages: By far the fastest of all storage methods (except for
           "trash"), since it eliminates the overhead of dealing with a file
           system and creating new files.  Unlike all other storage methods,
           it does not require manual article expiration.  With CNFS, the
           server will never throttle itself due to a full spool disk, and
           groups are restricted to just the buffer files given so that they
           can never use more than the amount of disk space allocated to them.

           Disadvantages: Article retention times are more difficult to
           control because old articles are overwritten automatically.
           Attacks on Usenet, such as flooding or massive amounts of spam, can
           result in wanted articles expiring much faster than intended (with
           no warning).

       timecaf
           This method stores multiple articles in one file, whose name is
           based on the article's arrival time and the storage class.  The
           file name will be:

               <patharticles>/timecaf-nn/bb/aacc.CF

           where "nn" is the hexadecimal value of <storage_class>, "bb" and
           "aacc" are the hexadecimal components of the arrival time, and "CF"
           is a hardcoded extension.  (The arrival time, in seconds since the
           epoch, is converted to hexadecimal and interpreted as 0xaabbccdd,
           with "aa", "bb", and "cc" used to build the path.)  This method
           does not have self-expire functionality (meaning expire has to run
           periodically to delete old articles, as well as cancelled articles
           if immediatecancel is not set to true in inn.conf).  EXPENSIVESTAT
           is false for this method.

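           As a worked illustration (the arrival time and storage class
           are arbitrary, and two-digit zero padding of each component is
           an assumption), an article arriving at time 1657411200, which is
           0x62ca1680 in hexadecimal, gives "aa" = 62, "bb" = ca, "cc" = 16,
           and "dd" = 80.  With storage class 4, the file name would be:

               <patharticles>/timecaf-04/ca/6216.CF

           Since "dd" is not part of the name, all articles of that class
           arriving within the same 256-second window share one file.
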
           Advantages: It is roughly four times faster than "timehash" for
           article writes, since much of the file system overhead is bypassed,
           while still retaining the same fine control over article retention
           time.

           Disadvantages: Using this method means giving up all but the most
           careful manual fiddling with the article spool; in this respect,
           it resembles "cnfs".  As one of the newer and least widely used
           storage types, "timecaf" has not been as thoroughly tested as the
           other methods.

       timehash
           This method is very similar to "timecaf" except that each article
           is stored in a separate file.  The name of the file for a given
           article will be:

               <patharticles>/time-nn/bb/cc/yyyy-aadd

           where "nn" is the hexadecimal value of <storage_class>, "yyyy" is a
           hexadecimal sequence number, and "bb", "cc", and "aadd" are
           components of the arrival time in hexadecimal (the arrival time is
           interpreted as documented above under "timecaf").  This method does
           not have self-expire functionality.  Cancelled articles are removed
           immediately.  EXPENSIVESTAT is true for this method.

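           As a worked illustration (all values are arbitrary, and two-digit
           zero padding of "nn" is an assumption), an article of storage
           class 4 arriving at time 0x62ca1680 (so "aa" = 62, "bb" = ca,
           "cc" = 16, and "dd" = 80) with sequence number 0001 would be
           stored in:

               <patharticles>/time-04/ca/16/0001-6280
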
           Advantages: Heavy traffic groups do not cause bottlenecks, and a
           fine control of article retention time is still possible.

           Disadvantages: The ability to easily find all articles in a given
           newsgroup and manually fiddle with the article spool is lost, and
           INN still suffers from speed degradation due to file system
           overhead (creating and deleting individual files is a slow
           operation).

       tradspool
           Traditional spool, or "tradspool", is the traditional news article
           storage format.  Each article is stored in an individual text file
           named:

               <patharticles>/news/group/name/nnnnn

           where "news/group/name" is the name of the newsgroup to which the
           article was posted with each period changed to a slash, and "nnnnn"
           is the sequence number of the article in that newsgroup.  For
           crossposted articles, the article is linked into each newsgroup to
           which it is crossposted (using either hard or symbolic links).
           This is the way versions of INN prior to 2.0 stored all articles,
           as well as being the article storage format used by C News and
           earlier news systems.  This method does not have self-expire
           functionality.  Cancelled articles are removed immediately.
           EXPENSIVESTAT is true for this method.

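           For example, assuming a hypothetical article numbered 1234 in
           comp.lang.c, the file would be:

               <patharticles>/comp/lang/c/1234
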
           Advantages: It is widely used and well-understood; it can read
           article spools written by older versions of INN and it is
           compatible with all third-party INN add-ons.  This storage
           mechanism provides easy and direct access to the articles stored on
           the server and makes writing programs that fiddle with the news
           spool very easy, and gives fine control over article retention
           times.

           Disadvantages: It takes a very fast file system and I/O system to
           keep up with current Usenet traffic volumes due to file system
           overhead.  Groups with heavy traffic tend to create a bottleneck
           because of inefficiencies in storing large numbers of article files
           in a single directory.  It requires a nightly expire program to
           delete old articles out of the news spool, a process that can slow
           down the server for several hours or more.

       trash
           This method silently discards all articles stored in it.  Its only
           real uses are for testing and for silently discarding articles
           matching a particular storage method entry (for whatever reason).
           Articles stored in this method take up no disk space and can never
           be retrieved, so this method has self-expire functionality of a
           sort.  EXPENSIVESTAT is false for this method.

EXAMPLES

       The following sample storage.conf file would store all articles posted
       to alt.binaries.* in the "BINARIES" CNFS metacycbuff, all articles over
       roughly 50 KB in any other hierarchy in the "LARGE" CNFS metacycbuff,
       all other articles in alt.* in one timehash class, and all other
       articles in any newsgroups in a second timehash class, except for the
       internal.* hierarchy which is stored in traditional spool format.

           method tradspool {
               class: 1
               newsgroups: internal.*
           }
           method cnfs {
               class: 2
               newsgroups: alt.binaries.*
               options: BINARIES
           }
           method cnfs {
               class: 3
               newsgroups: *
               size: 50000
               options: LARGE
           }
           method timehash {
               class: 4
               newsgroups: alt.*
           }
           method timehash {
               class: 5
               newsgroups: *
           }

       Notice that the last storage method entry will catch everything.  This
       is a good habit to get into; make sure that you have at least one
       catch-all entry just in case something you did not expect falls through
       the cracks.  Notice also that the special rule for the internal.*
       hierarchy is first, so it will catch even articles crossposted to
       alt.binaries.* or over 50 KB in size.

       As for poison wildmat expressions, if you have, for instance, an
       article crossposted to misc.foo and misc.bar, the pattern:

           misc.*,!misc.bar

       will match that article whereas the pattern:

           misc.*,@misc.bar

       will not match that article.  An article posted only to misc.bar will
       fail to match either pattern.

       Usually, high-volume groups and groups whose articles do not need to be
       kept around very long (binaries groups, *.jobs*, news.lists.filters,
       etc.) are stored in CNFS buffers.  Use the other methods (or CNFS
       buffers again) for everything else.  However, it is often most
       convenient to keep special hierarchies in "tradspool", such as local
       hierarchies, hierarchies that should never expire, and hierarchies
       whose spool you need to go through manually.

HISTORY

       Written by Katsuhiro Kondou <kondou@nec.co.jp> for InterNetNews.
       Rewritten into POD by Julien Elie.

SEE ALSO

       cycbuff.conf(5), expire.ctl(5), expireover(8), inn.conf(5), innd(8),
       libinn_uwildmat(3).

INN 2.7.0                         2022-07-10                   STORAGE.CONF(5)