NETSTIFF(1)                        netstiff                        NETSTIFF(1)

NAME
       netstiff - powerful and easy tool to check for Web and FTP updates

SYNOPSIS
       netstiff [options] [command]

DESCRIPTION
       Netstiff (formerly known as webdiff) is a powerful and easy-to-use tool
       which checks for Web page and/or FTP site updates.

       For the Web, updates are recognized using several test criteria (diff,
       html, size, date, md5sum, regexp). The FTP update checker can only
       diff directory listings and files and compare the size and date of
       files.

       If no command is given, netstiff checks the specified URIs for updates
       and then prints the changes. If no configuration file exists, the
       configurator is launched instead.

       Netstiff exits after all configured URIs have been checked. Any
       warnings and errors that occur are written to the log file
       (~/.netstiff/lastlog) and to stderr. Use it with cron if you want to
       check for updates regularly.


       You can only pass one command to netstiff. It has to be the last
       argument in the argument list.

       Commands may be shortened down to one character (e.g. c instead of
       configure). Leading dashes are ignored.

       If you start netstiff without a command, the full command is used.
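
       The abbreviation rule can be pictured as a prefix match against the
       command names. The following Python sketch is only an illustration:
       resolve_command is a hypothetical helper, and netstiff itself may
       resolve abbreviations differently.

```python
# Command names as listed in this manual page.
COMMANDS = ["configure", "diff", "full", "help", "reset", "update", "version"]


def resolve_command(arg=None):
    """Map a possibly shortened command argument to its full name.

    Assumption: abbreviations are matched as prefixes of the command names.
    """
    if arg is None:
        return "full"                 # no command given: full is used
    word = arg.lstrip("-")            # leading dashes are ignored
    matches = [name for name in COMMANDS if name.startswith(word)]
    if len(matches) != 1:
        raise ValueError(f"unknown or ambiguous command: {arg!r}")
    return matches[0]


print(resolve_command("c"))       # configure
print(resolve_command("--conf"))  # configure
print(resolve_command())          # full
```

       All seven command names start with a different letter, which is why
       a single character is enough to identify each of them.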

       configure
              Use this command to start the configurator, the interactive
              configuration tool of netstiff. Of course, you may also edit
              the configuration file in ~/.netstiff/config by hand. Using
              the configurator is recommended if you are a new netstiff
              user, because it explains the possible test methods, validates
              your regexps, etc. Nevertheless, the configuration file format
              is very simple. See section CONFIGURATION FILE.
              The configurator will not initialize the netstiff cache for
              added URIs, i.e. it will not download anything. To do so, you
              have to run netstiff update first. This is a feature.
              If the config file does not exist, the configuration tool is
              started automatically.

       diff   Use this command to see the differences between two versions
              of saved content (Web pages or metadata). See diff(1).

              The version after the last reset (or the initial version) and
              the version of the last update are compared.

       full   Use this command if you simply want netstiff to check for
              updates and print the diff.

              full is a simple replacement for the following sequence:
                   netstiff update > /dev/null
                   netstiff diff
                   netstiff reset

       help   Use this command to get usage information about netstiff. To
              be honest, this manual page together with the configurator is
              better documentation.

       reset  Use this command after you have reviewed all differences with
              the diff command (see above), so that diff will not show you
              the same changes again and again.

       update Use this command if you want netstiff to fetch the data from
              the specified URIs and show you only those, one per line, that
              have changed since your last update.

       version
              This command displays the version number and copyright.


       You may pass the following options.

       --no-stderr, -S
              Use this option to suppress warning and error messages on
              stderr. The messages can then only be seen in the log file.

       --workdir DIR, -W DIR
              Use this option to specify another working directory. The
              working directory is the directory where netstiff reads the
              configuration file, stores the downloaded data and writes its
              logs. It defaults to ~/.netstiff. See also section BUGS.


       There is no special handling of status codes other than 200. In
       practice, netstiff will neither follow redirections nor notice any
       4xx or 5xx error code. The resulting error pages are treated as
       ordinary Web pages, and no message is logged. Please check on your
       own.
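
       Since netstiff does not look at the status code, you may want to
       check it yourself before trusting a reported change. The following
       Python sketch is one possible way to do that and is not part of
       netstiff: classify_status and fetch_status are hypothetical helpers.
       fetch_status sends a HEAD request with the standard http.client
       module, which does not follow redirections, and assumes netstiff's
       default timeout of 20 seconds. Note that some servers answer HEAD
       differently from GET.

```python
import http.client
from urllib.parse import urlsplit


def classify_status(code):
    """Sort an HTTP status code into the cases netstiff does not tell apart."""
    if code == 200:
        return "ok"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 600:
        return "error"
    return "other"


def fetch_status(url, timeout=20):
    """Return the status code for url without following redirections."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()
```

       A URI for which classify_status(fetch_status(uri)) is not "ok" most
       likely shows a redirect notice or an error page, so a diff of it says
       little about the real content.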


       You want to add a new URI that netstiff should check for updates.
              netstiff conf
       The configurator is not described here. I know of some weaknesses in
       its usability, but you can get along with it.

       Once you are back at your shell prompt, let netstiff retrieve an
       initial version of the Web page you specified.
              netstiff update
       After some weeks in the sun you want to see if something has changed.
       So you let netstiff check for updates.
              netstiff
       It prints a URI! Let's see the changes!
              netstiff diff
       Oh, it is so much that it does not fit on a screen!
              netstiff d | pager
       Now you are satisfied because you have read all the changes. So you
       finally do
              netstiff reset
       and netstiff forgets about the changes.

CONFIGURATION FILE
       There is no need to manually edit the configuration file
       WORKDIR/config (usually ~/.netstiff/config), because netstiff
       configure should do the job. But sometimes it is easier to edit a
       simple file than to browse through menus, or you are writing another
       application that changes netstiff settings. So it is useful to
       describe the file format here.

   RULES
       · Whitespace at the beginning and end of each line is ignored.

       · Empty lines are ignored.

       · A line beginning with # is regarded as a comment.

       · A line beginning with + is regarded as an option. The + is followed
         by the option name, some whitespace and the option value.

       · A line beginning with neither # nor + is regarded as a URI. URIs
         without a scheme (https://, http://, ftp://) are treated as HTTP
         URIs.

       · The configurator interprets a comment right above a URI as the
         title of the URI.

       · Options always apply to the first URI above them. Options with no
         URI line above them are global options and affect every URI that
         does not override them.
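
       For an application that reads the file itself, the rules above could
       be implemented roughly as in the following Python sketch. It is not
       netstiff's own parser: parse_config and effective_options are
       hypothetical names, option values are kept as plain strings, a URI
       without scheme is assumed to be completed by prepending http://, and
       an empty line between a comment and a URI is assumed to detach the
       title.

```python
SCHEMES = ("https://", "http://", "ftp://")


def parse_config(text):
    """Return (global_options, entries) for a netstiff-style config text.

    Each entry is a dict with the keys "uri", "title" and "options".
    """
    global_options = {}
    entries = []
    title = None                      # comment seen on the line right above
    for raw in text.splitlines():
        line = raw.strip()            # surrounding whitespace is ignored
        if not line:                  # empty lines are ignored
            title = None
            continue
        if line.startswith("#"):      # comment, maybe the title of a URI
            title = line[1:].strip()
            continue
        if line.startswith("+"):      # option: name, whitespace, value
            fields = line[1:].split(None, 1)
            if fields:
                # Options belong to the first URI above, else they are global.
                target = entries[-1]["options"] if entries else global_options
                target[fields[0]] = fields[1] if len(fields) > 1 else ""
            title = None
            continue
        # Anything else is a URI; without a scheme it counts as HTTP.
        uri = line if line.startswith(SCHEMES) else "http://" + line
        entries.append({"uri": uri, "title": title, "options": {}})
        title = None
    return global_options, entries


def effective_options(global_options, entry):
    """Merge global options with the overrides of one URI entry."""
    return {**global_options, **entry["options"]}
```

       Keeping the global options separate from the per-URI options makes
       it possible to write the file back without copying every global
       setting into every URI.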

   CONFIGURATION OPTIONS
       The following options are generally available:

       test   sets the test method (or test criterion).
              See section TEST METHODS for a description. Defaults to diff.

       timeout
              sets the timeout (in seconds) for TCP connections.
              Defaults to 20.

       The following options only affect HTTP URIs:

       client sets the user-agent string.
              Some web sites check the HTTP header field User-Agent and
              display different content for different agents. By setting
              this field you can pretend to use Mozilla Firefox, for example.
              Because many log analyzer tools for webmasters display
              statistics about that field, you may spread the word about
              netstiff by setting this option to the truth: netstiff. ;-)
              Example: + client Mozilla/5.0 (X11; U; Linux i686; en-US;
              rv:1.8.1.12) Gecko/20080208 Galeon/2.0.4
              This option is not set by default.

       lang   sets the accepted languages.
              Internationalized web sites offer their content in different
              languages and may check the HTTP header field Accept-Language.
              It contains a list of languages (and sometimes extra
              information like associated countries) sorted by priority. The
              best way to get a good value is to copy and paste it from the
              preferences of your web browser.
              Example: de,en;q=0.9
              This option is not set by default.

       proxy  sets the HTTP proxy host and port. Must be in the form
              host:port. Will fail if no port is given.

       range  sets the range (in bytes) to get from a server.
              Use this option if you are only interested in the changes
              within a small region of a big file on an HTTP server. Examples
              are 12000-12500 or 13000- (till the end).
              The Range feature is not supported by all web servers or for
              every kind of content. This means that some web servers send
              the whole content instead of only the given range.
              This option is not set by default.

       referer
              sets the referrer.
              Some web sites check the HTTP header field Referer and refuse
              to display the desired content if it is not set appropriately.
              When you click on a link in an ordinary web browser, the
              referrer is set to the URI of the page where you clicked on
              the link. By setting this option to a URI, you can pretend to
              have clicked on a link on the web page at that URI. Please do
              not use this option to `advertise' your own homepage
              (so-called referer spamming).
              This option is not set by default.
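
       To illustrate the range option: on the wire, a byte range is
       requested with the HTTP Range header, and a server that honours it
       answers with status 206 (Partial Content) instead of 200. The
       following Python sketch shows that mapping. range_header and
       range_was_honoured are hypothetical helpers, and whether netstiff
       validates the value in the same way is an assumption.

```python
def range_header(value):
    """Build the HTTP Range header for a value like 12000-12500 or 13000-."""
    start, dash, end = value.strip().partition("-")
    if not dash or not start.isdigit() or (end and not end.isdigit()):
        raise ValueError(f"not a byte range: {value!r}")
    if end and int(end) < int(start):
        raise ValueError(f"range ends before it starts: {value!r}")
    return {"Range": f"bytes={start}-{end}"}


def range_was_honoured(status):
    """True if the server sent only the requested part (206 Partial Content)."""
    return status == 206


print(range_header("12000-12500"))  # {'Range': 'bytes=12000-12500'}
print(range_header("13000-"))       # {'Range': 'bytes=13000-'}
```

       A reply with status 200 to such a request means the server ignored
       the range and sent the whole content, which is the case described
       above.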

       The following options only affect the test method html:

       htmlcmd
              sets the command that is used to produce non-HTML
              human-readable output. The command is run on a temporary file.
              After many experiments I got my best results using + htmlcmd
              lynx -nolist -dump. Other dumpers had features, like justified
              text or well-formatted tables, that turned out to be
              disadvantages when looking at the diffs.
              This option is not set by default. If you use the html test
              method without it, a very simple mechanism hides the HTML
              tags. That can give good results, but it is unlikely, so
              leaving this option unset is not recommended.

       The following options only affect the test methods diff and html:

       start, end
              Motivation: Many modern or CMS-generated web pages have a
              dynamic and a static part. For example, at the beginning of a
              web page there is always a randomly chosen quotation the
              author liked. At the end there is a calendar showing the
              current date, a weather forecast for the next days, and some
              other useless stuff. The information you want to monitor for
              changes (the static part) is situated between those dynamic
              parts. It is very often possible to find textual anchors that
              indicate the start or the end of the static part.
              Using these options you can set regular expressions that match
              those anchors. For example, if the last entry of the
              navigation bar is Imprint and the static part comes right
              after it, set + start /Imprint/. The end option works
              analogously.
              Note that the regular expressions act on the unprocessed input
              (e.g. HTML source code), even when using the html test method.
              These options are not set by default.
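
       The effect of start and end can be pictured with the following
       Python sketch. It only illustrates the idea: cut_region is a
       hypothetical helper, it uses Python regular expressions rather than
       Ruby ones, the patterns are passed without the enclosing slashes,
       and both the exclusion of the anchors from the result and the
       behaviour when an anchor does not match are assumptions.

```python
import re


def cut_region(source, start=None, end=None):
    """Return the part of source between the start and end anchors.

    Assumptions: the anchors themselves are excluded, and an anchor that
    does not match leaves that side of the text uncut.
    """
    begin = 0
    if start is not None:
        found = re.search(start, source)
        if found:
            begin = found.end()
    stop = len(source)
    if end is not None:
        # Look for the end anchor only after the start anchor.
        found = re.compile(end).search(source, begin)
        if found:
            stop = found.start()
    return source[begin:stop]


page = "<nav>Home Imprint</nav><p>News</p><footer>Generating page took 2s</footer>"
print(cut_region(page, r"Imprint", r"Generating page took"))
# </nav><p>News</p><footer>
```

       Only the returned region would then be compared between two
       versions, so changes in the dynamic parts no longer show up.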

       The following options only affect FTP URIs:

       passive
              is a boolean option (value true or false, case-insensitive).
              Passive mode (PASV) is disabled on FTP connections if and only
              if this option is set to false.
              Defaults to true.

   EXAMPLE
       # this is my netstiff config file
       + test html
       + htmlcmd lynx -nolist -dump
       + client netstiff
       + lang de_DE
       + timeout 6

       # local usage statistics
       http://localhost/stats.php
       + start /Statistics/
       + end /Generating page took/

       # sbeyer's homepage
       http://pkqs.net/~sbeyer/

       # buggy scripts test
       http://localhost/buggyscripts/test.cgi
       + test /Internal Server Error/

       # muetze's funny videos
       ftp://foo:duff23@muetze.localnet/funnyvideos/
       + passive false

TEST METHODS
       The following test methods can be used:

       date   On HTTP URIs, this method downloads the HTTP header to check
              when the file was last modified. For this to work, the server
              has to send the Last-Modified header field. This check can
              become useless on some dynamic web sites.
              On FTP URIs, this method requests the last modification date
              of the file on the FTP site.

       diff   This method downloads the HTTP content, FTP file or FTP
              directory listing and saves the last two versions. Later you
              can use netstiff diff to take a look at a diff of these
              versions.

       html   This method acts like diff, but assumes HTML input and
              preprocesses it to make it more human-readable.
              See also the description of the htmlcmd option in section
              CONFIGURATION FILE / CONFIGURATION OPTIONS.
              This method is not available on FTP URIs.

       md5sum This method downloads the HTTP header to check if the MD5 sum
              has changed. The server has to send the Content-MD5 header
              field for this method to work.
              Use this method on big binary files on HTTP sites and only if
              the server supports it. (netstiff will tell you.)
              This method is not available on FTP URIs.

       size   On HTTP URIs, this method downloads the HTTP header to check
              if the file size has changed. This requires the server to send
              the Content-Length header field.
              On FTP URIs, this method requests the size of the file on the
              FTP site to check if it has changed.

       /regexp/
              This method downloads the HTTP content and checks whether the
              given regular expression matches. The URI is printed (when
              using update) if and only if this match status has changed.
              This method is not available on FTP URIs.
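
       The /regexp/ method reports a change of the match status, not the
       match itself. The following Python sketch illustrates that logic
       under some assumptions: check_regexp is a hypothetical helper, it
       uses Python regular expressions instead of Ruby ones, and the status
       remembered from the previous update is modelled as a plain boolean.

```python
import re


def check_regexp(pattern, body, previous_match):
    """Return (report, matched) for one update of a /regexp/ test.

    report is True iff the match status differs from the previous update;
    matched is the status to remember for the next update.
    """
    matched = re.search(pattern, body) is not None
    return matched != previous_match, matched


# The page starts to show an error: the URI is reported once.
print(check_regexp(r"Internal Server Error", "500 Internal Server Error", False))
# (True, True)
# The error is still there at the next update: nothing is reported.
print(check_regexp(r"Internal Server Error", "500 Internal Server Error", True))
# (False, True)
```

       With a test like /Internal Server Error/ from the example
       configuration, the URI is therefore listed both when the error first
       appears and when it disappears again.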


       The number of errors is returned, so exit code 0 means success.
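
       A calling program can use the exit code directly as an error count.
       The following Python sketch shows one way to do so. count_errors and
       report are hypothetical helpers, the command to run is passed in by
       the caller, and the log file path assumes the default working
       directory. On POSIX systems an exit status is limited to the range
       0 to 255, so a very large count cannot be represented exactly.

```python
import subprocess
import sys


def count_errors(argv):
    """Run argv (e.g. ["netstiff", "update"]) and return its exit status."""
    return subprocess.run(argv).returncode


def report(errors, stream=sys.stderr):
    """Print a hint if there were errors; return True on success."""
    if errors:
        print(f"{errors} error(s), see ~/.netstiff/lastlog", file=stream)
    return errors == 0
```

       For example, report(count_errors(["netstiff", "-S", "update"]))
       would run an update with messages on stderr suppressed and point to
       the log file if anything went wrong.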

BUGS
       The regular expression handling uses the eval function of Ruby. This
       means that you can do non-regex-related things by using special
       strings as `regular expressions'. This is a big security issue when
       using netstiff as a backend for e.g. Web applications. So do NOT do
       that, and NEVER run netstiff on foreign, unchecked configurations (-W
       can be dangerous).

       Feel free to send feedback, bug reports, etc.


       © 2004, 2007-2008 Stephan Beyer <s-beyer@gmx.net>, GNU GPL



sbeyer                             20080331                        NETSTIFF(1)