NETSTIFF(1)                        netstiff                        NETSTIFF(1)

NAME

       netstiff - powerful and easy tool to check for Web and FTP updates

SYNOPSIS

       netstiff [options] [command]

DESCRIPTION

       Netstiff (formerly known as webdiff) is a powerful and easy-to-use tool
       that checks Web pages and FTP sites for updates.

       For the Web, updates are recognized using one of several test criteria
       (diff, html, size, date, md5sum, regexp).  The FTP update checker can
       only diff directory listings and files, and compare the size and date
       of files.

       Without a command, netstiff checks the specified URIs for updates and
       then prints the changes.  If no configuration file exists, the
       configurator is launched instead.

       Netstiff exits after all configured URIs have been checked.  Warnings
       and errors that occur are written to the log file
       (~/.netstiff/lastlog) and to stderr.  Use it with cron if you want to
       check for updates regularly.

COMMANDS

       You can pass only one command to netstiff.  It has to be the last
       argument in the argument list.

       Commands may be shortened down to one character (e.g. c instead of
       configure).  Leading dashes are ignored.

       If you start netstiff without a command, the full command is used.

       configure
              Use this command to start the configurator, the interactive
              configuration tool of netstiff.  You may also edit the
              configuration file ~/.netstiff/config by hand.  Using the
              configurator is recommended if you are a new netstiff user,
              because it explains the possible test methods, validates your
              regexps, etc.  Nevertheless, the configuration file format is
              very simple.  See section CONFIGURATION FILE.
              The configurator does not initialize the netstiff cache for
              added URIs, i.e. it does not download anything.  To do so, you
              have to run netstiff update first.  This is a feature.
              If the config file does not exist, the configurator is started
              automatically.

       diff   Use this command to see the differences between two versions of
              saved content (Web pages or meta data).  See diff(1).

              The version after the last reset (or the initial version) is
              compared with the version of the last update.

       full   Use this command if you simply want netstiff to check for
              updates and print the diff.

              full is a simple replacement for the following sequence:
              netstiff update > /dev/null
              netstiff diff
              netstiff reset

       help   Use this command to get usage information about netstiff.  To
              be honest, this manual page in conjunction with the configurator
              is better documentation.

       reset  Use this command after you have reviewed all differences with
              the diff command (see above), so that diff does not show you the
              same changes again and again.

       update Use this command if you want netstiff to fetch the data from the
              specified URIs and show you only those, one per line, that have
              changed since your last update.

       version
              This command displays the version number and copyright.

OPTIONS

       You may pass the following options.

       --no-stderr, -S
              Use this option to suppress warning and error messages on
              stderr.  The messages can then only be seen in the log file.

       --workdir DIR, -W DIR
              Use this option to specify another working directory.  The
              working directory is the directory where netstiff reads the
              configuration file, stores the downloaded data and writes its
              logs.  It defaults to ~/.netstiff.  See also section BUGS.

RESTRICTIONS

       Status codes other than 200 are not handled specially.  In practice,
       netstiff neither follows redirections nor notices any 4xx or 5xx
       error code.  The resulting error pages are treated as ordinary Web
       pages, and no message is logged.  Please check on your own.

USAGE EXAMPLE

       You want to add a new URI that netstiff should check for updates.
               netstiff conf
       The configurator is not described here.  I know of some weaknesses in
       usability, but you can get along with it.

       When you see your shell prompt again, netstiff should retrieve an
       initial version of the Web page you specified.
               netstiff update
       After some weeks in the sun you want to see if something has changed.
       So you let netstiff check for updates.
               netstiff
       It is printing a URI!  Let's see the changes!
               netstiff diff
       Oh, it is so much that it does not fit on a screen!
               netstiff d | pager
       Now you are satisfied because you have read all the changes.  So you
       finally do
               netstiff reset
       and netstiff forgets about the changes.

CONFIGURATION FILE

       There is no need to edit the configuration file WORKDIR/config
       (usually ~/.netstiff/config) manually, because netstiff configure
       should do the job.  But sometimes it is easier to edit a simple file
       than to browse through menus, or you are writing another application
       that changes netstiff settings.  So it is useful to describe the file
       format here.

   RULES
        · Whitespace at the beginning and end of each line is ignored.

        · Empty lines are ignored.

        · A line beginning with # is regarded as a comment.

        · A line beginning with + is regarded as an option.  The + is followed
          by the option name, some whitespace and the option value.

        · A line beginning with neither # nor + is regarded as a URI.  URIs
          without a scheme (https://, http://, ftp://) are treated as HTTP
          URIs.

        · The configurator interprets a comment directly above a URI as the
          title of the URI.

        · Options always apply to the nearest URI above them.  Options with
          no URI line above them are global options and affect every URI
          that does not override them.
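
       The rules above can be illustrated with a short Python sketch.  It is
       not netstiff's own parser (netstiff itself uses Ruby, see section
       BUGS).  The names parse_config and effective_options are hypothetical,
       and two details are assumptions: a comment counts as a title only if
       no option line stands between it and the URI, and a URI without a
       scheme is detected by the absence of "://".

```python
def parse_config(text):
    """Return (global_options, entries) parsed from netstiff-style config text."""
    global_options = {}
    entries = []   # one dict per URI: {"uri": ..., "title": ..., "options": {...}}
    title = None   # most recent comment, a candidate title for the next URI
    for raw_line in text.splitlines():
        line = raw_line.strip()        # leading and trailing whitespace is ignored
        if not line:
            continue                   # empty lines are ignored
        if line.startswith("#"):
            title = line[1:].strip()   # remember the comment as a possible title
            continue
        if line.startswith("+"):
            parts = line[1:].split(None, 1)   # option name, then the rest as value
            if parts:
                value = parts[1] if len(parts) > 1 else ""
                # Options belong to the nearest URI above, else they are global.
                target = entries[-1]["options"] if entries else global_options
                target[parts[0]] = value
            title = None               # assumption: an option line ends a title
            continue
        # Anything else is a URI; assume HTTP when no scheme is given.
        uri = line if "://" in line else "http://" + line
        entries.append({"uri": uri, "title": title, "options": {}})
        title = None
    return global_options, entries


def effective_options(global_options, entry):
    """Per-URI options override global options of the same name."""
    return {**global_options, **entry["options"]}


sample = """\
# global settings
+ test    html
+ timeout 6

# local usage statistics
localhost/stats.php
  + start /Statistics/
"""
global_options, entries = parse_config(sample)
merged = effective_options(global_options, entries[0])
```

       The second function shows the override rule: per-URI options take
       precedence over global options with the same name.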

   CONFIGURATION OPTIONS
       The following options are generally available:

       test   sets the test method (or test criterion).
              See section TEST METHODS for a description.  Defaults to diff.

       timeout
              sets the timeout (in seconds) for TCP connections.
              Defaults to 20.

       The following options only affect HTTP URIs:

       client sets the user-agent string.
              Some web sites check the HTTP header field User-Agent and
              display different content for different agents.  By setting
              this field you can pretend to use Mozilla Firefox, for example.
              Because many log analyzer tools for webmasters display
              statistics about that field, you may spread the word about
              netstiff by setting this option to the truth: netstiff. ;-)
              Example: + client Mozilla/5.0 (X11; U; Linux i686; en-US;
              rv:1.8.1.12) Gecko/20080208 Galeon/2.0.4
              This option is not set by default.

       lang   sets the accepted languages.
              Internationalized web sites offer their content in different
              languages and may check the HTTP header field Accept-Language.
              It contains a list of languages (and sometimes extra information
              like associated countries) sorted by priority.  The best way to
              get a good value is to copy and paste it from the preferences of
              your web browser.
              Example: de,en;q=0.9
              This option is not set by default.

       proxy  sets the HTTP proxy host and port.  Must be in the form
              host:port.  Will fail if no port is given.

       range  sets the range (in bytes) to get from a server.
              Use this option if you are only interested in the changes within
              a small region of a big file on an HTTP server.  Examples are
              12000-12500 or 13000- (until the end).
              The Range feature is not supported by all web servers or for
              all content.  Some web servers therefore send the whole content
              instead of only the given range.
              This option is not set by default.

       referer
              sets the referrer.
              Some web sites check the HTTP header field Referer and refuse to
              display the desired content if it is not set appropriately.
              When you click on a link in an ordinary web browser, the
              referrer is set to the URI of the page where you clicked on the
              link.  By setting this option to a URI, you can pretend to have
              clicked on a link on the web page of this URI.  Please do not
              use this option to `advertise' your own homepage (so-called
              referer spamming).
              This option is not set by default.

       The following options only affect the test method html:

       htmlcmd
              sets the command that is used to produce non-HTML human-readable
              output.  The command is run on a temporary file.
              After many experiments I got my best results using + htmlcmd
              lynx -nolist -dump.  Other dumpers had features, like justified
              text or well-formatted tables, that turned out to be
              disadvantages when looking at the diffs.
              This option is not set by default.  If you use the html test
              method without it, a very simple mechanism hides the HTML tags.
              That can give good results, but it is unlikely, so leaving
              this option unset is not recommended.

       The following options only affect the test methods diff and html:

       start, end
              Motivation: Many modern or CMS-generated web pages have a
              dynamic and a static part.  For example, at the beginning of a
              web page there is always a randomly chosen quotation the author
              liked.  At the end there is a calendar showing the current date,
              a weather forecast for the next days, and some other useless
              stuff.  The information you want to monitor for changes (the
              static part) is situated between those dynamic parts.  It is
              very often possible to find textual anchors that indicate the
              start or the end of the static part.
              Using these options you can set regular expressions that match
              those anchors.  For example, if the last entry of the navigation
              bar is Imprint and the static part comes after it, set + start
              /Imprint/.  The end option works analogously.
              Note that the regular expressions act on the unprocessed input
              (e.g. HTML source code), even when using the html test method.
              These options are not set by default.
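
       How start and end anchors can narrow the monitored region is sketched
       below in Python.  This is an illustration, not netstiff's code: the
       function name trim_to_static_part is hypothetical, Python's regular
       expression syntax differs in details from Ruby's, and whether the
       matched anchor text itself is kept is an assumption (here the region
       begins at the start match and stops just before the end match).

```python
import re


def trim_to_static_part(content, start=None, end=None):
    """Return the part of content between the start and end anchors."""
    begin = 0
    if start is not None:
        match = re.search(start, content)
        if match:
            begin = match.start()      # assumption: keep the start anchor itself
    stop = len(content)
    if end is not None:
        # Look for the end anchor only after the start of the static part.
        match = re.compile(end).search(content, begin)
        if match:
            stop = match.start()       # assumption: drop the end anchor
    return content[begin:stop]


page_monday = ("<nav>Home | Imprint</nav><p>News: release 1.2</p>"
               "<div>Generating page took 0.3s</div>")
page_tuesday = ("<nav>Home | Imprint</nav><p>News: release 1.2</p>"
                "<div>Generating page took 0.9s</div>")

static_monday = trim_to_static_part(page_monday, r"Imprint", r"Generating page took")
static_tuesday = trim_to_static_part(page_tuesday, r"Imprint", r"Generating page took")
```

       The two sample pages differ only in the dynamic footer, so their
       trimmed parts are identical and a diff of them would be empty.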

       The following options only affect FTP URIs:

       passive
              is a boolean option (value true or false, case-insensitive).
              Passive mode (PASV) is disabled on FTP connections if and only
              if this option is set to false.
              Defaults to true.

   EXAMPLE
       # this is my netstiff config file
       + test      html
       + htmlcmd   lynx -nolist -dump
       + client    netstiff
       + lang      de_DE
       + timeout   6

       # local usage statistics
       http://localhost/stats.php
         + start   /Statistics/
         + end     /Generating page took/

       # sbeyer's homepage
       http://pkqs.net/~sbeyer/

       # buggy scripts test
       http://localhost/buggyscripts/test.cgi
         + test /Internal Server Error/

       # muetze's funny videos
       ftp://foo:duff23@muetze.localnet/funnyvideos/
         + passive false

TEST METHODS

       The following test methods can be used:

       date   On HTTP URIs, this method downloads the HTTP header to check
              when the file was last modified.  For this to work, the server
              must send the Last-Modified header field.  This method can be
              useless for some dynamic web sites.
              On FTP URIs, this method requests the last modification date of
              the file from the FTP site.

       diff   This method downloads the HTTP content, FTP file or FTP
              directory listing and saves the last two versions.  Later you
              can use netstiff diff to look at a diff of these versions.

       html   This method acts like diff, but assumes HTML input and
              preprocesses it to make it more human-readable.
              See also the description of the htmlcmd option in section
              CONFIGURATION FILE / CONFIGURATION OPTIONS.
              This method is not available on FTP URIs.

       md5sum This method downloads the HTTP header to check if the MD5 sum
              has changed.  The server must send the Content-MD5 header field
              for this method to work.
              Use this method on big binary files on HTTP sites and only if
              the server supports it.  (netstiff will tell you.)
              This method is not available on FTP URIs.

       size   On HTTP URIs, this method downloads the HTTP header to check if
              the file size has changed.  This requires the server to send
              the Content-Length header field.
              On FTP URIs, this method requests the size of the file from the
              FTP site to check if it has changed.

       /regexp/
              This method downloads the HTTP content and checks whether the
              given regular expression matches.  The URI is printed (when
              using update) if and only if this match status has changed.
              This method is not available on FTP URIs.
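
       The "match status has changed" rule of the /regexp/ method can be
       sketched in Python as follows.  This is an illustration under
       assumptions, not netstiff's implementation: the function name
       match_status_changed is hypothetical, and Python's re module stands
       in for Ruby's regular expressions.

```python
import re


def match_status_changed(pattern, previous_content, current_content):
    """True if the pattern matched one version of the content but not the other."""
    regex = re.compile(pattern)
    matched_before = regex.search(previous_content) is not None
    matched_now = regex.search(current_content) is not None
    return matched_before != matched_now


# A page that starts failing flips the status, so the URI would be reported.
went_bad = match_status_changed(
    r"Internal Server Error", "<p>All fine</p>", "<h1>500 Internal Server Error</h1>"
)
# Content that changes without affecting the match is not reported.
still_fine = match_status_changed(
    r"Internal Server Error", "<p>All fine</p>", "<p>Still fine</p>"
)
```

       Note that a page which keeps matching (or keeps not matching) is never
       reported, no matter how much the rest of its content changes.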

RETURN VALUE

       The number of errors is returned as the exit code, so exit code 0
       means success.
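
       A calling script can use this exit code directly.  The Python sketch
       below is only an illustration: the function name check is
       hypothetical, and a stand-in child process that exits with code 2 is
       used instead of netstiff so that the example runs anywhere.

```python
import subprocess
import sys


def check(argv):
    """Run a command and interpret its exit code as an error count."""
    errors = subprocess.run(argv, check=False).returncode
    message = "ok" if errors == 0 else f"{errors} error(s), see the log file"
    return errors, message


# Stand-in for a netstiff run that hit two errors.  With netstiff installed
# and on PATH one would call, for example:
#     check(["netstiff", "--no-stderr", "update"])
status = check([sys.executable, "-c", "raise SystemExit(2)"])
```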

BUGS

       Regular expressions are processed with Ruby's eval function.  This
       means that specially crafted strings passed as `regular expressions'
       can do things unrelated to regexes.  This is a big security issue
       when using netstiff as a backend, e.g. for Web applications.  So do
       NOT do that, and NEVER run netstiff on foreign, unchecked
       configurations (-W can be dangerous).
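
       A front end that accepts /regexp/ values from untrusted users could
       check them before they ever reach a netstiff configuration.  The
       Python sketch below shows the general idea of compiling the pattern
       as data instead of evaluating it as code.  It is an assumption-laden
       illustration, not a fix for netstiff itself: the function name
       compile_test_regexp is hypothetical, and Python accepts a somewhat
       different regular expression syntax than Ruby.

```python
import re


def compile_test_regexp(value):
    """Compile a '/pattern/' value as a regular expression, never as code."""
    if len(value) < 2 or not (value.startswith("/") and value.endswith("/")):
        raise ValueError("expected a value of the form /regexp/")
    # re.compile only parses the pattern; it raises re.error if it is invalid.
    return re.compile(value[1:-1])


regex = compile_test_regexp("/Internal Server Error/")
found = regex.search("<h1>500 Internal Server Error</h1>") is not None

try:
    compile_test_regexp("system('echo not a regexp')")
    rejected = False
except ValueError:
    rejected = True
```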

       Feel free to send feedback, bug reports, etc.

       © 2004, 2007-2008 Stephan Beyer <s-beyer@gmx.net>, GNU GPL

sbeyer                             20080331                        NETSTIFF(1)