.\" -*- mode: troff; coding: utf-8 -*-
.\" Automatically generated by Pod::Man 5.01 (Pod::Simple 3.43)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" \*(C` and \*(C' are quotes in nroff, nothing in troff, for use with C<>.
.ie n \{\
. ds C` ""
. ds C' ""
'br\}
.el\{\
. ds C`
. ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el .ds Aq '
.\"
.\" If the F register is >0, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD. Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{\
. if \nF \{\
. de IX
. tm Index:\\$1\t\\n%\t"\\$2"
..
. if !\nF==2 \{\
. nr % 0
. nr F 2
. \}
. \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "IGNORE.LIST 5"
.TH IGNORE.LIST 5 2023-01-21 "perl v5.36.0" "User Contributed Perl Documentation"
.\" For nroff, turn off justification. Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH NAME
ignore.list \- websec URL monitoring configuration
.SH DESCRIPTION
.IX Header "DESCRIPTION"
.SS "IGNORE KEYWORDS"
.IX Subsection "IGNORE KEYWORDS"
When determining which parts of a web page have changed, you may
want to skip paragraphs that contain certain predefined words. For
example, pages like InfoWorld, PC Magazine and PC Week often contain the
current date/time regardless of whether there is new or changed content. In
such cases, you can use IGNORE KEYWORDS to skip the paragraphs that
contain date/time information.
.PP
Ignore keywords are stored in a file called "ignore.list" in the same
directory as websec. Like the URL list, the ignore keywords are partitioned
into different sections. Each section has a user-defined name. An example is
shown below:
.PP
.Vb 6
\& [General]
\& all rights reserved
\& an error occurred
\& click here
\& comments
\& copyright
\&
\& [Date_Time]
\& January\es+\ed{1,2}
\& February\es+\ed{1,2}
\& March\es+\ed{1,2}
\& April\es+\ed{1,2}
\& May\es+\ed{1,2}
.Ve
.PP
In the example above, there are two sections: "General" and "Date_Time".
You can use them in the URL list as follows:
.PP
.Vb 1
\& Ignore = General
.Ve
.PP
You can also use multiple sections at once:
.PP
.Vb 1
\& Ignore = General,Date_Time
.Ve
.PP
If you use certain ignore keywords regularly, you might want to add them to
a defaults section in the URL list.
.PP
Ignore keywords can contain regular expressions. For example, the ignore
keyword "January\es+\ed{1,2}" tells websec to look for the string "January",
followed by one or more whitespace characters, followed by one or two
digits.
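.PP
As a minimal sketch of further patterns in the same style, and assuming
keywords are treated as Perl-style regular expressions that may match
anywhere within a paragraph, a section could also cover abbreviated month
names and clock times. The section name "My_Dates" and its patterns are
hypothetical illustrations, not part of the distribution:
.PP
.Vb 4
\& [My_Dates]
\& Jan(uary)?\es+\ed{1,2}
\& \ed{1,2}:\ed{2}
\& \ed{4}\-\ed{2}\-\ed{2}
.Ve
.PP
Here "(uary)?" makes the rest of the month name optional, "\ed{1,2}:\ed{2}"
is meant to match times such as "9:05" or "14:30", and the last pattern is
meant to match ISO-style dates such as "2023\-01\-21".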
.PP
Two sections of ignore keywords are supplied in this distribution. "General"
contains some general ignore keywords which you may want to use. "Date_Time"
contains date/time detectors coded using regular expressions. Feel free to
add your own!
.SS "IGNORE URLS"
.IX Subsection "IGNORE URLS"
Most advertisements in webpages are of the following form:
.PP
.Vb 4
\&
\&
\& Click here for free beer!
\&
.Ve
.PP
Using ignore URLs, such advertisements can be skipped when running webdiff.
.PP
Ignore URLs are also stored in "ignore.list". Each entry contains all or
part of the URL referred to by the tag which you want to ignore. An example
is shown below:
.PP
.Vb 2
\& [Adverts]
\& page.url.com/advert/cgi\-bin/
.Ve
.PP
Use the "Adverts" section in the URL list as follows:
.PP
.Vb 1
\& IgnoreURL = Adverts
.Ve
.PP
You can also use multiple sections at once:
.PP
.Vb 1
\& IgnoreURL = Adverts1,Adverts2
.Ve
.PP
If you use certain ignore URLs regularly, you might want to add them
to a defaults section in the URL list.
.PP
Like ignore keywords, ignore URLs can contain regular expressions.
.PP
An "Adverts" section is supplied in this distribution. Feel free to add your
own!
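.PP
Because entries may be regular expressions, an unescaped "." presumably
matches any character rather than only a literal dot. The following is a
sketch of a user-defined section under that assumption. The section name
"MyAdverts" and the host "ads.example.com" are hypothetical:
.PP
.Vb 3
\& [MyAdverts]
\& ads\e.example\e.com/
\& /banners?/
.Ve
.PP
The first entry escapes each dot so that only the literal host name is
intended to match. The second is meant to match both "/banner/" and
"/banners/" path components. Such a section would be enabled from the URL
list with "IgnoreURL = MyAdverts".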
.SH "SEE ALSO"
.IX Header "SEE ALSO"
\&\fBurl.list\fR\|(5)
.SH AUTHOR
.IX Header "AUTHOR"
Baruch Even is maintaining this program.