PULLNEWS(1)               InterNetNews Documentation               PULLNEWS(1)

NAME
pullnews - Pull news from multiple news servers and feed it to another
news server

SYNOPSIS
pullnews [-BhnOqRx] [-a hashfeed] [-b fraction] [-c config] [-C width]
[-d level] [-f fraction] [-F fakehop] [-g groups] [-G newsgroups]
[-H headers] [-k checkpt] [-l logfile] [-m header_pats] [-M num]
[-N timeout] [-p port] [-P hop_limit] [-Q level] [-r file]
[-s to-server[:port]] [-S max-run] [-t retries] [-T connect-pause]
[-w num] [-z article-pause] [-Z group-pause] [from-server ...]

REQUIREMENTS
The "Net::NNTP" module must be installed. This module is available as
part of the libnet distribution and comes with recent versions of Perl.
For older versions of Perl, you can download it from
<http://www.cpan.org/>.

DESCRIPTION
pullnews reads a config file named pullnews.marks and connects to the
upstream servers listed there as a reader client. This file is looked
for in pathdb when pullnews is run as the user set by runasuser in
inn.conf (by default the "news" user); otherwise, it is looked for in
the running user's home directory.

By default, pullnews connects to all servers listed in the
configuration file, but you can limit pullnews to specific servers by
listing them on the command line as a whitespace-separated list of
server names (the from-server arguments). For each server it connects
to, it pulls over articles and feeds them to the destination server
via the IHAVE or POST commands. This means that the system pullnews is
run on must have feeding access to the destination news server.

pullnews is designed for very small sites that do not want to bother
setting up traditional peering, and is not meant for handling large
feeds.

OPTIONS
-a hashfeed
This option is a deterministic way to control the flow of articles and
to split a feed. The hashfeed parameter must be in the form "value/mod"
or "start-end/mod". The Message-ID of each article is hashed using MD5,
which results in a 128-bit hash. The lowest 32 bits are then taken by
default as the hashfeed value (which is an integer). If the hashfeed
value modulo "mod", plus one, equals "value" or is between "start" and
"end" (inclusive), pullnews will feed the article. All these numbers
must be integers.

For instance:

    pullnews -a 1/2      Feeds about 50% of all articles.
    pullnews -a 2/2      Feeds the other 50% of all articles.

Another example:

    pullnews -a 1-3/10   Feeds about 30% of all articles.
    pullnews -a 4-5/10   Feeds about 20% of all articles.
    pullnews -a 6-10/10  Feeds about 50% of all articles.

You can use an extended syntax of the form "value/mod:offset" or
"start-end/mod:offset" (an underscore "_" is also recognized in place
of the colon ":"). As MD5 generates a 128-bit return value, it is
possible to specify the byte offset at which the 32-bit integer used by
hashfeed starts. The default value for "offset" is ":0", and thirteen
overlapping values from ":0" to ":12" can be used. Only four of them
are totally independent: ":0", ":4", ":8" and ":12".

This makes a second level of deterministic distribution possible. For
instance, if pullnews feeds "1/2", that share can be split further with
"1-3/9:4". Up to four levels of deterministic distribution can be used.

The algorithm is compatible with the one used by Diablo 5.1 and up.
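
The selection rule above can be sketched in Python as follows. This is an illustration under assumptions, not the code pullnews runs: how the 32-bit value is cut out of the digest (the four bytes ending "offset" bytes before the end of the MD5 digest, read as a big-endian unsigned integer) is an ASSUMPTION about what "lowest 32 bits" and "byte offset" mean, and the helper names are hypothetical. The modulo-plus-one and range test follows the description directly.

```python
import hashlib
import re

# "value/mod" or "start-end/mod", optionally followed by ":offset" or "_offset".
_SPEC = re.compile(r"(\d+)(?:-(\d+))?/(\d+)(?:[:_](\d+))?")


def parse_hashfeed(spec):
    """Return (start, end, mod, offset) for a -a parameter."""
    m = _SPEC.fullmatch(spec)
    if m is None:
        raise ValueError("bad hashfeed parameter: %r" % spec)
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else start  # "value/mod": start == end
    mod = int(m.group(3))
    offset = int(m.group(4)) if m.group(4) else 0
    if mod == 0 or not 0 <= offset <= 12:
        raise ValueError("bad hashfeed parameter: %r" % spec)
    return start, end, mod, offset


def hashfeed_value(message_id, offset=0):
    """ASSUMPTION: 4 digest bytes, counted back from the low-order end,
    read as a big-endian unsigned 32-bit integer."""
    digest = hashlib.md5(message_id.encode("utf-8")).digest()  # 16 bytes
    return int.from_bytes(digest[12 - offset:16 - offset], "big")


def should_feed(message_id, spec):
    """Feed when (hash modulo mod) + 1 lies in [start, end]."""
    start, end, mod, offset = parse_hashfeed(spec)
    bucket = hashfeed_value(message_id, offset) % mod + 1
    return start <= bucket <= end
```

Because every article lands in exactly one bucket from 1 to mod, "-a 1/2" and "-a 2/2" select complementary halves, and "1-3/10", "4-5/10" and "6-10/10" partition the feed.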

-b fraction
Backtrack on server numbering reset. Specify the proportion (0.0 to
1.0) of a group's articles to pull when the server's article number is
less than our high-water mark for that group. When fraction is 1.0,
pull all the articles on a renumbered server. The default is to do
nothing.

-B Feed is header-only, that is to say pullnews feeds only the header
of each article, followed by one blank line. It adds the Bytes header
field if the article does not already have one, and keeps the body
only if the article is a control article.
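
Here is a minimal Python sketch of the header-only transformation described for -B. It rests on three assumptions that the text does not state, and each one is also marked in the code. The article is a str with "\n" line endings and a blank line between header and body. The Bytes value is the size in octets of the full original article. An article counts as a control article when it has a Control header field. The function name is hypothetical.

```python
def header_only(article):
    """Return the -B form of an article: header plus one blank line."""
    head, _, body = article.partition("\n\n")
    lines = head.split("\n")
    names = {line.split(":", 1)[0].strip().lower()
             for line in lines
             if ":" in line and not line[:1].isspace()}  # skip folded lines
    if "bytes" not in names:
        # ASSUMPTION: Bytes holds the octet count of the full original article.
        lines.append("Bytes: %d" % len(article.encode("utf-8")))
    out = "\n".join(lines) + "\n\n"
    if "control" in names:
        # ASSUMPTION: a Control header field marks a control article,
        # whose body is kept.
        out += body
    return out
```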

-c config
Normally, the config file is stored in pullnews.marks in pathdb when
pullnews is run as the news user, or otherwise in the running user's
home directory. If -c is given, config will be used as the config file
instead. This is useful if you're running pullnews as a system user on
an automated basis out of cron, or as an individual user, rather than
as the news user.

See "CONFIG FILE" below for the format of this file.
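
The config file lookup rule can be written as a small Python function. It is only a sketch. pathdb is whatever your inn.conf sets, so the "/usr/local/news/db" default below is a HYPOTHETICAL placeholder. The function and parameter names are also invented for illustration.

```python
import getpass
import os


def marks_path(config=None, user=None, runasuser="news",
               pathdb="/usr/local/news/db", home=None):
    """Return the config file pullnews would read, per the rules above.

    pathdb's default is a HYPOTHETICAL placeholder for the inn.conf value.
    """
    if config is not None:            # -c overrides everything
        return config
    if user is None:
        user = getpass.getuser()
    if user == runasuser:             # running as the news user: pathdb
        return os.path.join(pathdb, "pullnews.marks")
    if home is None:
        home = os.path.expanduser("~")
    return os.path.join(home, "pullnews.marks")
```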

-C width
Use width characters per line for the progress table. The default
value is 50.

-d level
Set the debugging level to the integer level; more debugging output is
logged as this value increases. The default value is 0.

-f fraction
Set the proportion of articles to get from each group to fraction,
which should be in the range 0.0 to 1.0 (1.0 being the default).

-F fakehop
Prepend fakehop as a host to the Path header field body of fed
articles.

-g groups
Specify a collection of groups to get. groups is a list of newsgroups
separated by commas (only commas, no spaces). Each group must be
defined in the config file, and only the remote hosts that carry those
groups will be contacted. Note that this is a simple list of groups,
not a wildmat expression, and wildcards are not supported.
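
The following Python sketch makes the -g rules concrete by choosing which configured servers to contact. The in-memory config shape (a mapping from host to the set of groups listed under it) and the function name are assumptions made for illustration. Matching is plain string equality because -g does not accept wildmat patterns. The text does not say what pullnews does with a group that is missing from the config, so the sketch raises an error, which is an ASSUMPTION.

```python
def servers_for_groups(config, groups_arg):
    """Return {host: [groups to fetch]} for a -g argument.

    config: hypothetical mapping of host -> set of configured group names.
    groups_arg: comma-separated list such as "rec.humor.funny,comp.std.lisp".
    """
    wanted = groups_arg.split(",")             # commas only, no spaces
    known = set().union(*config.values()) if config else set()
    missing = [g for g in wanted if g not in known]
    if missing:
        # ASSUMPTION: an unconfigured group is treated as an error.
        raise ValueError("groups not in config file: %s" % ", ".join(missing))
    selected = {}
    for host, groups in config.items():
        fetch = [g for g in wanted if g in groups]  # exact match, no wildcards
        if fetch:                                    # contact carrying hosts only
            selected[host] = fetch
    return selected
```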

-G newsgroups
Add the comma-separated list of groups newsgroups to each server in the
configuration file (see also -g and -w).

-h Print a usage message and exit.

-H headers
Remove these named header fields (colon-separated list) from fed
articles.
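
Removing a header field also means dropping its folded continuation lines. The Python sketch below shows one way to apply a -H list such as "Xref:NNTP-Posting-Host" to an article. It assumes "\n" line endings and case-insensitive field names (as in RFC 5322). It is an illustration, not pullnews's implementation, and the function name is hypothetical.

```python
def remove_header_fields(article, headers_arg):
    """Drop the colon-separated header fields named in headers_arg."""
    drop = {n.strip().lower() for n in headers_arg.split(":") if n.strip()}
    head, sep, body = article.partition("\n\n")
    kept = []
    skipping = False
    for line in head.split("\n"):
        if line[:1] in (" ", "\t"):
            # Continuation line: it belongs to the previous field.
            if not skipping:
                kept.append(line)
            continue
        name = line.split(":", 1)[0].strip().lower()
        skipping = name in drop
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + sep + body
```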

-k checkpt
Checkpoint (save) the config file every checkpt articles (the default
is 0, that is to say at the end of the session).

-l logfile
Log progress/stats to logfile (the default is "stdout").

-m header_pats
Feed an article based on header field body matching. The argument is a
number of whitespace-separated tuples, each tuple being a header field
name and a regular expression separated by a colon. For instance:

    -m "Hdr1:regexp1 !Hdr2:regexp2 #Hdr3:regexp3 !#Hdr4:regexp4"

specifies that the article will be passed only if the "Hdr1" header
field body matches "regexp1" and the "Hdr2" header field body does not
match "regexp2". Besides, if the "Hdr3" header field body matches
"regexp3", that header field is removed; and if the "Hdr4" header field
body does not match "regexp4", that header field is removed.
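
The -m semantics can be sketched in Python as shown below. Several details are ASSUMPTIONS, labeled here and in the comments. Header field names are matched case-insensitively. An absent header field counts as an empty body. Each tuple is split at its first colon. A regular expression "matches" if it matches anywhere in the field body: Python's re.search is close to Perl's =~ but not identical for every construct. Returning None for a rejected article is a choice of this sketch, and the function name is hypothetical.

```python
import re


def apply_header_pats(header_pats, headers):
    """Apply -m tuples to headers (dict of field name -> body).

    Returns the possibly reduced headers, or None if the article
    must not be fed.
    """
    # ASSUMPTION: field names are looked up case-insensitively.
    lookup = {name.lower(): name for name in headers}
    result = dict(headers)
    for tup in header_pats.split():
        negate = tup.startswith("!")
        if negate:
            tup = tup[1:]
        remove = tup.startswith("#")
        if remove:
            tup = tup[1:]
        name, _, regexp = tup.partition(":")
        real = lookup.get(name.lower())
        body = headers[real] if real is not None else ""  # ASSUMPTION: absent -> ""
        hit = re.search(regexp, body) is not None
        if remove:
            # "#" removes the field on a match, "!#" on a non-match.
            if hit != negate and real in result:
                del result[real]
        elif hit == negate:
            # A plain tuple requires a match; a "!" tuple requires no match.
            return None
    return result
```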

-M num
Specify the maximum number of articles (per group) to process. The
default is to process all new articles. See also -f.

-n Do nothing but read articles: pullnews does not feed articles
downstream, writes no rnews file, and does not update the config file.

-N timeout
Specify the timeout length, as timeout seconds, when establishing an
NNTP connection.

-O Use an optimized mode: pullnews checks whether the article already
exists on the downstream server before downloading it. This may help
with huge articles or a slow link to upstream hosts.

-p port
Connect to the destination news server on a port other than the
default of 119. This option does not change the port used to connect
to the source news servers.

-P hop_limit
Restrict feeding an article based on the number of hops it has already
made. Count the hops in the Path header field body (hop_count), and
feed the article only when hop_limit is "+num" and hop_count is more
than num, or when hop_limit is "-num" and hop_count is less than num.
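
Here is a Python sketch of the -P test. The text above does not spell out how pullnews counts hops. This sketch ASSUMES the hop count is the number of "!"-separated entries in the Path header field body. Both comparisons are strict, so an article with exactly num hops is never fed. The function name is hypothetical.

```python
def passes_hop_limit(path_body, hop_limit):
    """Return True if an article with this Path body may be fed under -P."""
    # ASSUMPTION: one hop per "!"-separated Path entry.
    hop_count = len([e for e in path_body.strip().split("!") if e])
    sign, digits = hop_limit[:1], hop_limit[1:]
    if sign not in ("+", "-") or not digits.isdigit():
        raise ValueError("hop_limit must be +num or -num: %r" % hop_limit)
    num = int(digits)
    if sign == "+":
        return hop_count > num   # "+num": more than num hops
    return hop_count < num       # "-num": fewer than num hops
```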

-q Print out less status information while running.

-Q level
Set the quietness level ("-Q 2" is equivalent to "-q"). The higher this
value, the less gets logged. The default is 0.

-r file
Rather than feeding the downloaded articles to a destination server,
create a batch file that can later be fed to a server using rnews. See
rnews(1) for more information about the batch file format.

-R Be a reader (use the MODE READER and POST commands) to the
downstream server. The default is to use the IHAVE command.

-s to-server[:port]
Normally, pullnews feeds the articles it retrieves to the news server
running on localhost. To connect to a different host, specify a server
with the -s flag. You can also specify the port with this same flag or
use -p.

-S max-run
Specify the maximum time max-run in seconds for pullnews to run.

-t retries
The maximum number (retries) of attempts to connect to a server (see
also -T). The default is 0.

-T connect-pause
Pause connect-pause seconds between connection retries (see also -t).
The default is 1.
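
The interplay of -t and -T can be sketched as a retry loop in Python. The text does not say exactly what the default of 0 means. This sketch ASSUMES that retries counts additional attempts after the first one, so 0 means a single attempt. connect is a hypothetical callable that stands in for opening the NNTP connection. sleep can be injected so that the loop can be tested without waiting.

```python
import time


def connect_with_retries(connect, retries=0, connect_pause=1, sleep=time.sleep):
    """Call connect() up to 1 + retries times, pausing between attempts.

    ASSUMPTION: retries counts extra attempts after the first one.
    """
    attempt = 0
    while True:
        try:
            return connect()
        except OSError:
            if attempt >= retries:
                raise                   # out of retries: give up
            attempt += 1
            sleep(connect_pause)        # -T: pause before the next attempt
```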

-w num
Set each group's high-water mark (last received article number) to
num. If num is negative, calculate Current+num instead (i.e. get the
last num articles). Therefore, a num of 0 will re-get all articles on
the server, whereas a num of "-0" will get no old articles, setting the
water mark to Current (the most recent article on the server).
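
Because "0" and "-0" mean different things for -w, the argument has to be examined as a string before it is converted to a number. Here is a Python sketch. current stands for the most recent article number on the server (Current above), and the function name is hypothetical. The sketch does not clamp the result, because the text does not say what happens when -num exceeds Current.

```python
def high_water_from_w(arg, current):
    """Compute a group's new high-water mark from a -w argument string."""
    value = int(arg)                      # int("-0") is 0, so keep the string
    if arg.strip().startswith("-"):
        # Negative, including "-0": count back from Current.
        return current + value
    # Zero or positive: use as-is ("0" re-gets every article).
    return value
```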

-x If the -x flag is used, an Xref header field is added to any article
that lacks one. This can be useful, for instance, if articles are fed
to a news server which has xrefslave set in inn.conf.

-z article-pause
Sleep article-pause seconds between articles. The default is 0.

-Z group-pause
Sleep group-pause seconds between groups. The default is 0.

CONFIG FILE
The config file for pullnews is divided into blocks, one block for each
remote server to connect to. A block begins with the host line (which
must have no leading whitespace) and contains just the hostname of the
remote server, optionally followed by authentication details (username
and password for that server). Note that authentication details can
also be provided for the downstream server: a host line can be added
for it in the configuration file, with no newsgroup to fetch.

Following the host line should be one or more newsgroup lines, which
start with whitespace followed by the name of a newsgroup to retrieve.
Only one newsgroup should be listed on each line.

pullnews will update the config file to include the time the group was
last checked and the highest numbered article successfully retrieved
and transferred to the destination server. It uses this data to avoid
doing duplicate work the next time it runs.

The full syntax is:

<host> [<username> <password>]
        <group> [<time> <high>]
        <group> [<time> <high>]

where the <host> line must not have leading whitespace and the <group>
lines must.

A typical configuration file would be:

# Format group date high
data.pa.vix.com
        rec.bicycles.racing 908086612 783
        rec.humor.funny 908086613 18
        comp.programming.threads
nnrp.vix.com pull sekret
        comp.std.lisp

Note that an earlier run of pullnews has filled in details about the
last articles downloaded from the two rec.* groups. The two comp.*
groups were just added by the user and have not yet been checked.

The nnrp.vix.com server requires authentication, and pullnews will use
the username "pull" and the password "sekret".

FILES
pathbin/pullnews
The Perl script itself used to pull news from upstream servers and
feed it to another news server.

pathdb/pullnews.marks or ~/pullnews.marks
The default config file. It is stored in pullnews.marks in pathdb when
pullnews is run as the news user, or otherwise in the running user's
home directory.

HISTORY
pullnews was written by James Brister for INN. The documentation was
rewritten in POD by Russ Allbery <eagle@eyrie.org>.

Geraint A. Edwards greatly improved pullnews, adding 16 new recognized
flags, fixing some bugs, and integrating the backupfeed contrib script
by Kai Henningsen, which added 6 more flags.

SEE ALSO
incoming.conf(5), rnews(1).

INN 2.7.0                         2022-07-10                    PULLNEWS(1)