ZEBRASRV(8) Commands ZEBRASRV(8)

NAME
       zebrasrv - Zebra Server

SYNOPSIS
       zebrasrv [-install] [-installa] [-remove] [-a file] [-v level]
                [-l file] [-u uid] [-c config] [-f vconfig] [-C fname]
                [-t minutes] [-k kilobytes] [-d daemon] [-w dir] [-p pidfile]
                [-ziDST1] [listener-spec...]

DESCRIPTION
       Zebra is a high-performance, general-purpose structured text indexing
       and retrieval engine. It reads structured records in a variety of input
       formats (e.g. email, XML, MARC) and allows access to them through exact
       boolean search expressions and relevance-ranked free-text queries.

       zebrasrv is the Z39.50 and SRU frontend server for the Zebra search
       engine and indexer.

       On Unix you can run the zebrasrv server from the command line and put
       it in the background. It may also operate under the inet daemon. On
       WIN32 you can run the server as a console application or as a WIN32
       Service.

OPTIONS
       The options for zebrasrv are the same as those for YAZ' yaz-ztest.
       Option -c specifies a Zebra configuration file; if omitted, zebra.cfg
       is read.

       -a file
           Specify a file for dumping PDUs (for diagnostic purposes). The
           special name - (dash) sends output to stderr.

       -S
           Don't fork or make threads on connection requests. This is good for
           debugging, but not recommended for real operation: Although the
           server is asynchronous and non-blocking, it can be nice to keep a
           software malfunction (okay then, a crash) from affecting all
           current users. The server can only accept a single connection in
           this mode.

       -1
           Like -S but after one session the server exits. This mode is for
           debugging only.

       -T
           Operate the server in threaded mode. The server creates a thread
           for each connection rather than forking a process. Only available
           on UNIX systems that offer POSIX threads.

       -s
           Use the SR protocol (obsolete).

       -z
           Use the Z39.50 protocol (default). This option and -s complement
           each other. You can use both multiple times on the same command
           line, between listener-specifications (see below). This way, you
           can set up the server to listen for connections in both protocols
           concurrently, on different local ports.

       -l file
           Specify an output file for the diagnostic messages. The default is
           to write this information to stderr.

       -c config-file
           Read configuration information from config-file. The default
           configuration is ./zebra.cfg.

       -f vconfig
           This specifies an XML file that describes one or more YAZ frontend
           virtual servers. See the section YAZ SERVER VIRTUAL HOSTS for
           details.

       -C fname
           Sets SSL certificate file name for server (PEM).

       -v level
           The log level. Use a comma-separated list of members of the set
           {fatal,debug,warn,log,malloc,all,none}.

       -u uid
           Set user ID. Sets the real UID of the server process to that of the
           given user. It's useful if you aren't comfortable with having the
           server run as root, but you need to start it as such to bind a
           privileged port.

       -w working-directory
           The server changes to this working directory before listening for
           incoming connections. This option is useful when the server is
           operating from the inetd daemon (see -i).

       -p pidfile
           Specifies that the server should write its process ID to the file
           given by pidfile. A typical location would be
           /var/run/zebrasrv.pid.

       -i
           Use this to make the server run from the inetd server (UNIX
           only). Make sure you use the logfile option -l in conjunction with
           this mode and specify the -l option before any other options.

       -D
           Use this to make the server put itself in the background and run as
           a daemon. If neither -i nor -D is given, the server starts in the
           foreground.

       -install
           Use this to install the server as an NT service (Windows NT/2000/XP
           only). Control the server by going to the Services in the Control
           Panel.

       -installa
           Use this to install and activate the server as an NT service
           (Windows NT/2000/XP only). Control the server by going to the
           Services in the Control Panel.

       -remove
           Use this to remove the server from the NT services (Windows
           NT/2000/XP only).

       -t minutes
           Idle session timeout, in minutes. Default is 60 minutes.

       -k size
           Maximum record size/message size, in kilobytes. Default is 1024 KB
           (1 MB).

       -d daemon
           Set name of daemon to be used in hosts access file. See
           hosts_access(5) and tcpd(8).

       A listener specification (listener-spec) consists of an optional
       transport mode followed by a colon (:) followed by a listener address.
       The transport mode is either a file system socket (unix), an SSL TCP/IP
       socket (ssl), or a plain TCP/IP socket (tcp, the default).

       For TCP, an address has the form

           hostname | IP-number [: portnumber]

       The port number defaults to 210 (standard Z39.50 port) for privileged
       users (root), and 9999 for normal users. The special hostname "@" is
       mapped to the address INADDR_ANY, which causes the server to listen on
       any local interface.

       The default behavior for zebrasrv - if started as non-privileged user -
       is to establish a single TCP/IP listener, for the Z39.50 protocol, on
       port 9999.

           zebrasrv @
           zebrasrv tcp:some.server.name.org:1234
           zebrasrv ssl:@:3000
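
The listener-spec rules above can be sketched as a small parser. This is an illustration only, not the code zebrasrv or YAZ uses: the function name `parse_listener_spec`, the `Listener` tuple and the `privileged` flag are hypothetical, and IPv6 literals are not handled.

```python
from typing import NamedTuple, Optional


class Listener(NamedTuple):
    transport: str        # "tcp", "ssl" or "unix"
    address: str          # hostname, IP number, "@", or socket path
    port: Optional[int]   # None for unix sockets


def parse_listener_spec(spec: str, privileged: bool = False) -> Listener:
    """Split a listener-spec into transport, address and port (hypothetical helper)."""
    transport = "tcp"  # plain TCP/IP is the documented default
    for mode in ("tcp", "ssl", "unix"):
        if spec.startswith(mode + ":"):
            transport, spec = mode, spec[len(mode) + 1:]
            break
    if transport == "unix":
        # The remainder is a file system path; there is no port.
        return Listener(transport, spec, None)
    # Documented defaults: 210 for privileged users, 9999 otherwise.
    port = 210 if privileged else 9999
    host, sep, port_text = spec.partition(":")
    if sep:
        port = int(port_text)
    return Listener(transport, host, port)


if __name__ == "__main__":
    for example in ("@", "tcp:some.server.name.org:1234", "ssl:@:3000"):
        print(example, "->", parse_listener_spec(example))
```

The `privileged` flag stands in for "started as root"; a real server would decide this from its effective user ID.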

       To start the server listening on the registered port for Z39.50, or on
       a filesystem socket, and to drop root privileges once the ports are
       bound, execute the server like this from a root shell:

           zebrasrv -u daemon @
           zebrasrv -u daemon tcp:@:210
           zebrasrv -u daemon unix:/some/file/system/socket

       Here daemon is an existing user account, and the unix socket
       /some/file/system/socket is readable and writable for the daemon
       account.

   Z39.50 Initialization
       During initialization, the server will negotiate to version 3 of the
       Z39.50 protocol, and the option bits for Search, Present, Scan,
       NamedResultSets, and concurrentOperations will be set, if requested by
       the client. The maximum PDU size is negotiated down to a maximum of 1
       MB by default.

   Z39.50 Search
       The supported query types are 1 and 101. All operators are currently
       supported with the restriction that only proximity units of type "word"
       are supported for the proximity operator. Queries can be arbitrarily
       complex. Named result sets are supported, and result sets can be used
       as operands without limitations. Searches may span multiple databases.

       The server has full support for piggy-backed retrieval (see also the
       following section).

   Z39.50 Present
       The present facility is supported in a standard fashion. The requested
       record syntax is matched against the ones supported by the profile of
       each record retrieved. If no record syntax is given, SUTRS is the
       default. The requested element set name, again, is matched against any
       provided by the relevant record profiles.

   Z39.50 Scan
       The attribute combinations provided with the termListAndStartPoint are
       processed in the same way as operands in a query (see above).
       Currently, only the term and the globalOccurrences are returned with
       the termInfo structure.

   Z39.50 Sort
       Z39.50 specifies three different types of sort criteria. Of these,
       Zebra supports the attribute specification type, in which case the use
       attribute specifies the "Sort register". Sort registers are created for
       those fields that are of type "sort" in the default.idx file. The
       corresponding character mapping file in default.idx specifies the
       ordinal of each character used in the actual sort.

       Z39.50 allows the client to specify sorting on one or more input result
       sets and one output result set. Zebra supports sorting on one result
       set only, which may or may not be the same as the output result set.

   Z39.50 Close
       If a Close PDU is received, the server will respond with a Close PDU
       with reason=FINISHED, no matter which protocol version was negotiated
       during initialization. If the protocol version is 3 or more, the server
       will generate a Close PDU under certain circumstances, including a
       session timeout (60 minutes by default), and certain kinds of protocol
       errors. Once a Close PDU has been sent, the protocol association is
       considered broken, and the transport connection will be closed
       immediately upon receipt of further data, or following a short timeout.

   Z39.50 Explain
       Zebra maintains a "classic" Z39.50 Explain[1] database on the side.
       This database is called IR-Explain-1 and can be searched using the
       attribute set exp-1.

       The records in the explain database are of type grs.sgml. The root
       element for the Explain grs.sgml records is explain, thus explain.abs
       is used for indexing.

           Note
           Zebra must be able to locate explain.abs in order to index the
           Explain records properly. Zebra will work without it but the
           information will not be searchable.

       In addition to Z39.50, Zebra supports the more recent and web-friendly
       IR protocol SRU[2]. SRU can be carried over SOAP or a REST-like
       protocol that uses HTTP GET or POST to request search responses. The
       request itself is made of parameters such as query, startRecord,
       maximumRecords and recordSchema; the response is an XML document
       containing hit-count, result-set records, diagnostics, etc. SRU can be
       thought of as a re-casting of Z39.50 semantics in web-friendly terms;
       or as a standardisation of the ad-hoc query parameters used by search
       engines such as Google and AltaVista; or as a superset of A9's
       OpenSearch (which it predates).
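
To make the shape of such a response concrete, the following sketch pulls the hit count and record positions out of an SRU 1.1 searchRetrieve response using only the Python standard library. The sample document is hand-written for illustration and was not captured from a Zebra server; the function name `summarize` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by SRU 1.1 response documents.
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}

# Hand-written sample response (an assumption, not real server output).
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>42</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>dc</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData><title>Example</title></srw:recordData>
      <srw:recordPosition>1</srw:recordPosition>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>
"""


def summarize(xml_text):
    """Return (hit count, list of record positions) from a response."""
    root = ET.fromstring(xml_text)
    hits = int(root.findtext("srw:numberOfRecords", namespaces=SRW_NS))
    positions = [
        int(record.findtext("srw:recordPosition", namespaces=SRW_NS))
        for record in root.findall("srw:records/srw:record", SRW_NS)
    ]
    return hits, positions


if __name__ == "__main__":
    print(summarize(SAMPLE_RESPONSE))
```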

       Zebra supports Z39.50, SRU GET, SRU POST and SRU SOAP (SRW) on the same
       port, recognising which protocol is used by each incoming request and
       handling it accordingly. This is achieved through the use of Deep
       Magic; civilians are warned not to stand too close.

   Running zebrasrv as an SRU Server
       Because Zebra supports all protocols on one port, it would seem to
       follow that the SRU server is run in the same way as the Z39.50 server,
       as described above. This is true, but only in an uninterestingly
       vacuous way: a Zebra server run in this manner will indeed recognise
       and accept SRU requests; but since it doesn't know how to handle the
       CQL queries that these protocols use, all it can do is send failure
       responses.

           Note
           It is possible to cheat, by having SRU search Zebra with a PQF
           query instead of CQL, using the x-pquery parameter instead of
           query. This is a non-standard extension of SRU, and a very naughty
           thing to do, but it does give you a way to see Zebra serving SRU
           "right out of the box". If you start your favourite Zebra server
           in the usual way, on port 9999, then you can send your web browser
           to:

               http://localhost:9999/Default?version=1.1
                 &operation=searchRetrieve
                 &x-pquery=mineral
                 &startRecord=1
                 &maximumRecords=1

           This will display the XML-formatted SRU response that includes the
           first record in the result-set found by the query mineral. (For
           clarity, the SRU URL is shown here broken across lines, but the
           lines should be joined together to make a single-line URL for the
           browser to submit.)
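
Joining the displayed lines into one URL can be done by hand or, as a small sketch, with a few lines of Python. The helper name `join_url` is hypothetical; it simply strips the layout whitespace and concatenates the pieces.

```python
# The URL exactly as displayed above, broken across lines for readability.
URL_LINES = """
http://localhost:9999/Default?version=1.1
  &operation=searchRetrieve
  &x-pquery=mineral
  &startRecord=1
  &maximumRecords=1
"""


def join_url(text):
    """Remove per-line whitespace and join the pieces into a single URL."""
    return "".join(line.strip() for line in text.splitlines())


if __name__ == "__main__":
    print(join_url(URL_LINES))
```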

       In order to turn on Zebra's support for CQL queries, it's necessary to
       have the YAZ generic front-end (which Zebra uses) translate them into
       the Z39.50 Type-1 query format that is used internally. And to do this,
       the generic front-end's own configuration file must be used. See the
       section called “YAZ SERVER VIRTUAL HOSTS”; the salient point for SRU
       support is that zebrasrv must be started with the -f frontendConfigFile
       option rather than the -c zebraConfigFile option, and that the
       front-end configuration file must include both a reference to the Zebra
       configuration file and the CQL-to-PQF translator configuration file.

       A minimal front-end configuration file that does this would read as
       follows:

           <yazgfs>
             <server>
               <config>zebra.cfg</config>
               <cql2rpn>../../tab/pqf.properties</cql2rpn>
             </server>
           </yazgfs>

       The <config> element contains the name of the Zebra configuration file
       that was previously specified by the -c command-line argument, and the
       <cql2rpn> element contains the name of the CQL properties file
       specifying how various CQL indexes, relations, etc. are translated into
       Type-1 queries.
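
If the front-end configuration is generated rather than written by hand, the same minimal file can be produced with a short script. This is a sketch under the assumption that only the two elements described above are needed; the function name `minimal_frontend_config` is hypothetical.

```python
import xml.etree.ElementTree as ET


def minimal_frontend_config(zebra_cfg, cql_properties):
    """Build a minimal yazgfs document with one server element."""
    root = ET.Element("yazgfs")
    server = ET.SubElement(root, "server")
    # Zebra configuration file, as otherwise given with -c.
    ET.SubElement(server, "config").text = zebra_cfg
    # CQL properties file used for the CQL to Type-1 translation.
    ET.SubElement(server, "cql2rpn").text = cql_properties
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(minimal_frontend_config("zebra.cfg", "../../tab/pqf.properties"))
```

The output is a single line of XML; the resulting file is then passed to zebrasrv with the -f option.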

       A Zebra server running with such a configuration can then be queried
       using proper, conformant SRU URLs with CQL queries:

           http://localhost:9999/Default?version=1.1
             &operation=searchRetrieve
             &query=title=utah and description=epicent*
             &startRecord=1
             &maximumRecords=1
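
The CQL query above is shown unencoded for readability; spaces and characters such as = inside the query value need percent-encoding before the URL is sent. A minimal sketch using Python's standard library follows. The function name `sru_search_url` and its defaults are assumptions made for the example.

```python
from urllib.parse import quote, urlencode


def sru_search_url(base, cql, start=1, maximum=1):
    """Build a searchRetrieve URL with a percent-encoded CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    # quote_via=quote encodes spaces as %20 instead of "+".
    return base + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(sru_search_url("http://localhost:9999/Default",
                         "title=utah and description=epicent*"))
```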

       Zebra running as an SRU server supports SRU version 1.1, including CQL
       version 1.1. In particular, it provides support for the following
       elements of the protocol.

   SRU Search and Retrieval
       Zebra supports the searchRetrieve operation.

       One of the great strengths of SRU is that it mandates a standard query
       language, CQL, and that all conforming implementations can therefore be
       trusted to correctly interpret the same queries. It is with some shame,
       then, that we admit that Zebra also supports an additional query
       language, our own Prefix Query Format (PQF[3]). A PQF query is
       submitted by using the extension parameter x-pquery, in which case the
       query parameter must be omitted, which makes the request not valid SRU.
       Please feel free to use this facility within your own applications; but
       be aware that it is not only non-standard SRU but not even
       syntactically valid, since it omits the mandatory query parameter.

   SRU Scan
       Zebra supports the scan operation. Scanning using CQL syntax is the
       default, where the standard scanClause parameter is used.

       In addition, a mutant form of SRU scan is supported, using the
       non-standard x-pScanClause parameter in place of the standard
       scanClause to scan on a PQF query clause.

   SRU Explain
       Zebra supports explain.

       The ZeeRex record explaining a database may be requested either with a
       fully fledged SRU request (with operation=explain and version-number
       specified) or with a simple HTTP GET at the server's basename. The
       ZeeRex record returned in response is the one embedded in the YAZ
       Frontend Server configuration file that is described in the section
       called “YAZ SERVER VIRTUAL HOSTS”.

       Unfortunately, the data found in the CQL-to-PQF text file must be added
       by hand to the explain section of the YAZ Frontend Server configuration
       file in order to provide a suitable explain record. Too bad, but this
       is all extremely new alpha stuff, and a lot of work has yet to be done.

       There is no linkage whatsoever between the Z39.50 explain model and the
       SRU explain response (well, at least none is implemented in Zebra).
       Zebra does not provide a means of obtaining the ZeeRex record using
       Z39.50.

   Other SRU operations
       In the Z39.50 protocol, Initialization, Present, Sort and Close are
       separate operations. In SRU, however, these operations do not exist.

       ·   SRU has no explicit initialization handshake phase, but commences
           immediately with searching, scanning and explain operations.

       ·   Neither does SRU have a close operation, since the protocol is
           stateless and each request is self-contained. (It is true that
           multiple SRU request/response pairs may be implemented as multiple
           HTTP request/response pairs over a single persistent TCP/IP
           connection; but the closure of that connection is not a
           protocol-level operation.)

       ·   Retrieval in SRU is part of the searchRetrieve operation, in which
           a search is submitted and the response includes a subset of the
           records in the result set. There is no direct analogue of Z39.50's
           Present operation, which requests records from an established
           result set. In SRU, this is achieved by sending a subsequent
           searchRetrieve request with the query cql.resultSetId=id where id
           is the identifier of the previously generated result-set.

       ·   Sorting in SRU is done within the searchRetrieve operation - in
           v1.1, by an explicit sort parameter, but the forthcoming v1.2 or
           v2.0 will most likely use an extension of the query language, CQL
           sorting[4].

       It can be seen, then, that while Zebra operating as an SRU server does
       not provide the same set of operations as when operating as a Z39.50
       server, it does provide equivalent functionality.

       Surf into http://localhost:9999 to get an explain response, or use

           http://localhost:9999/?version=1.1&operation=explain

       See number of hits for a query

           http://localhost:9999/?version=1.1&operation=searchRetrieve
             &query=text=(plant%20and%20soil)

       Fetch records 5-6 in Dublin Core format

           http://localhost:9999/?version=1.1&operation=searchRetrieve
             &query=text=(plant%20and%20soil)
             &startRecord=5&maximumRecords=2&recordSchema=dc
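
In SRU, startRecord is the 1-based position of the first record wanted and maximumRecords caps how many records are returned, so startRecord=5 with maximumRecords=2 covers positions 5 and 6; a three-record window starting at 5 would need maximumRecords=3. The arithmetic can be sketched as follows (the function name `record_window` is hypothetical):

```python
def record_window(start_record, maximum_records, hits):
    """Return the 1-based result-set positions a response may contain."""
    if start_record < 1 or maximum_records < 0:
        raise ValueError("startRecord is 1-based; maximumRecords is non-negative")
    # The window never extends past the last hit in the result set.
    last = min(start_record + maximum_records - 1, hits)
    return list(range(start_record, last + 1))


if __name__ == "__main__":
    print(record_window(5, 2, hits=100))
    print(record_window(5, 3, hits=100))
```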

       Even search using PQF queries using the extended naughty parameter
       x-pquery

           http://localhost:9999/?version=1.1&operation=searchRetrieve
             &x-pquery=@attr%201=text%20@and%20plant%20soil

       Or scan indexes using the extended extremely naughty parameter
       x-pScanClause

           http://localhost:9999/?version=1.1&operation=scan
             &x-pScanClause=@attr%201=text%20something

       Don't do this in production code! But it's a great fast debugging aid.

YAZ SERVER VIRTUAL HOSTS
       The virtual hosts mechanism allows a YAZ frontend server to support
       multiple backends. A backend is selected on the basis of the TCP/IP
       binding (port+listening address) and/or the virtual host.

       A backend can be configured to execute in a particular working
       directory. Or the YAZ frontend may perform CQL[5] to RPN conversion,
       thus allowing traditional Z39.50 backends to be offered as an SRU[2]
       service. SRU Explain information for a particular backend may also be
       specified.

       For the HTTP protocol, the virtual host is specified in the Host
       header. For the Z39.50 protocol, the virtual host is specified in the
       OtherInfo of the Initialize Request, with OID
       1.2.840.10003.10.1000.81.1.

           Note
           Not all Z39.50 clients allow the VHOST information to be set. For
           those, the selection of the backend must rely on the TCP/IP
           information alone (port and address).

       The YAZ frontend server uses XML to describe the backend
       configurations. Command-line option -f specifies the filename of the
       XML configuration.

       The configuration uses the root element yazgfs. This element includes a
       list of listen elements, followed by one or more server elements.

       The listen element describes a listener (transport end point), such as
       a TCP/IP, Unix file socket or SSL server. Content for a listener:

       CDATA (required)
           The CDATA for the listen element holds the listener string, such as
           tcp:@:210, tcp:server1:2100, etc.

       attribute id (optional)
           Identifier for this listener. This may be referred to from server
           sections.

           Note
           We expect more information to be added for the listen section in a
           future version, such as CERT file for SSL servers.

       The server element describes a server and the parameters for this
       server type. Content for a server:

       attribute id (optional)
           Identifier for this server. Currently not used for anything, but it
           might be for logging purposes.

       attribute listenref (optional)
           Specifies the listener for this server. If this attribute is not
           given, the server is accessible from all listeners. In order for
           the server to be used for real, however, the virtual host must
           match (if specified in the configuration).

       element config (optional)
           Specifies the server configuration. This is equivalent to the
           config specified using command line option -c.

       element directory (optional)
           Specifies a working directory for this backend server. If
           specified, the YAZ frontend changes current working directory to
           this directory whenever a backend of this type is started (backend
           handler bend_start), stopped (backend handler bend_stop) and
           initialized (bend_init).

       element host (optional)
           Specifies the virtual host for this server. If this is specified, a
           client must specify this host string in order to use this backend.

       element cql2rpn (optional)
           Specifies a filename that includes CQL[5] to RPN conversion for
           this backend server. See the CQL[5] section in the YAZ manual. If
           given, the backend server will only "see" a Type-1/RPN query.

       element explain (optional)
           Specifies SRU[2] ZeeRex content for this server - copied verbatim
           to the client. As things are now, some of the Explain content seems
           redundant because host information, etc. is also stored elsewhere.

       The format of the Explain record is described in detail, with
       examples, on the ZeeRex[6] web site.

       The XML below configures a server that accepts connections on two
       listeners: TCP/IP port 9900 and a local UNIX file socket. We name the
       TCP/IP listener public and the other listener internal.

           <yazgfs>
             <listen id="public">tcp:@:9900</listen>
             <listen id="internal">unix:/var/tmp/socket</listen>
             <server id="server1">
               <host>server1.mydomain</host>
               <directory>/var/www/s1</directory>
               <config>config.cfg</config>
             </server>
             <server id="server2">
               <host>server2.mydomain</host>
               <directory>/var/www/s2</directory>
               <config>config.cfg</config>
               <cql2rpn>../etc/pqf.properties</cql2rpn>
               <explain xmlns="http://explain.z3950.org/dtd/2.0/">
                 <serverInfo>
                   <host>server2.mydomain</host>
                   <port>9900</port>
                   <database>a</database>
                 </serverInfo>
               </explain>
             </server>
             <server id="server3" listenref="internal">
               <directory>/var/www/s3</directory>
               <config>config.cfg</config>
             </server>
           </yazgfs>

       There are three configured backend servers. The first two servers,
       "server1" and "server2", can be reached by both listener addresses,
       since no listenref attribute is specified. In order to distinguish
       between the two, a virtual host has been specified for each server in
       the host elements.

       For "server2", CQL[5] to RPN conversion is supported and explain
       information has been added (a short one here to keep the example
       small).

       The third server, "server3", can only be reached via listener
       "internal".
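
The selection rules described above can be modelled in a few lines. This sketch follows the prose only (listenref must match when present, host must match when present) and assumes the first matching server wins; it is not the YAZ implementation, and the function name `select_backend` is hypothetical. The explain block is left out of the sample configuration to keep it short.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example configuration (explain section omitted).
CONFIG = """
<yazgfs>
  <listen id="public">tcp:@:9900</listen>
  <listen id="internal">unix:/var/tmp/socket</listen>
  <server id="server1">
    <host>server1.mydomain</host>
  </server>
  <server id="server2">
    <host>server2.mydomain</host>
  </server>
  <server id="server3" listenref="internal"/>
</yazgfs>
"""


def select_backend(config_xml, listener_id, vhost=None):
    """Return the id of the first server matching listener and virtual host."""
    root = ET.fromstring(config_xml)
    for server in root.findall("server"):
        listenref = server.get("listenref")
        if listenref is not None and listenref != listener_id:
            continue  # bound to a different listener
        host = server.findtext("host")
        if host is not None and host != vhost:
            continue  # virtual host required but not matched
        return server.get("id")
    return None


if __name__ == "__main__":
    print(select_backend(CONFIG, "public", "server2.mydomain"))
    print(select_backend(CONFIG, "internal"))
```

A client that connects to the public listener without a virtual host matches none of the three servers in this model, which is consistent with the note that such clients must be distinguished by TCP/IP information alone.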

SEE ALSO
       zebraidx(1)

AUTHORS
       Index Data

NOTES
        1. Z39.50 Explain
           http://www.loc.gov/z3950/agency/markup/07.html

        2. SRU
           http://www.loc.gov/standards/sru/

        3. PQF
           http://www.indexdata.com/yaz/doc/tools.html#PQF

        4. CQL sorting
           http://zing.z3950.org/cql/sorting.html

        5. CQL
           http://www.loc.gov/standards/sru/cql/

        6. ZeeRex
           http://explain.z3950.org/

zebra 2.1.4 08/01/2018 ZEBRASRV(8)