EXTRACT(1)                  General Commands Manual                 EXTRACT(1)

NAME

       extract - determine meta-information about a file

SYNOPSIS

       extract [ -abdfghLnrsvV ] [ -B language ] [ -H hash-algorithm ] [ -l
       library ] [ -p type ] [ -x type ] file ...

DESCRIPTION

       This manual page documents version 0.5.17 of the extract command.

       extract tests each file specified in the argument list in an attempt to
       infer meta-information from it. Each file is subjected to the
       meta-data extraction libraries from libextractor.

       libextractor classifies meta-information (also referred to as keywords)
       into types. A list of all types can be obtained with the -L option.

OPTIONS

       -a      Do not remove any duplicates, even if the keywords match
               exactly and have the same type (for example, because the same
               keyword was found by different extractor libraries).

       -b      Display the output in BibTeX format. This implies the -d option.

       -B LANG Use the generic plaintext extractor for the language with the
               2-letter language code LANG. Supported languages are DA
               (Danish), DE (German), EN (English), ES (Spanish), FI (Finnish),
               FR (French), GA (Gaelic), IT (Italian), NO (Norwegian) and SV
               (Swedish).

       -d      Remove duplicates only if the types match exactly. By default,
               duplicates are removed if the types match or if one of the
               types is unknown (in this case, the duplicate of unknown type
               is removed).

       -f      Add the filename(s) (without directory) to the list of keywords.

       -g      Use grep-friendly output (all keywords on a single line for
               each file). Use the verbose option to print the filename
               first, followed by the keywords. Use the verbose option twice
               to also display the keyword types. This option will not print
               keyword types or non-textual metadata.

       -h      Print a brief summary of the options.

       -H ALGORITHM
               Use the ALGORITHM to compute a hash of each file (possible
               algorithms are sha1 and md5).

       -L      Print a list of all known keyword types.

       -n      Do not use the default set of extractors (typically all
               standard extractors, currently mp3, ogg, jpg, gif, png, tiff,
               real, html, pdf and mime-types); use only the extractors
               specified with the -l option.

       -r      Remove all duplicates disregarding differences in the keyword
               type.

       -s      Split keywords at delimiters (space, comma, colon, etc.) and
               list the split keywords as being of unknown type. This can also
               be done by loading the split-library. Using this option
               guarantees that the splitting is performed after all other
               libraries have been run. It is always performed before
               duplicate elimination.

       -v      Print the version number and exit.

       -V      Be verbose.

       -B      Run the printable extractor (costly, generic extractor for
               binaries).

       -l libraries
               Use the specified libraries to extract keywords. The general
               format of libraries is [[-]LIBRARYNAME[:[-]LIBRARYNAME]*]
               where LIBRARYNAME is a libextractor compatible library,
               typically of the form libextractor_jpeg.so. A minus before the
               library name indicates that this library should be run after
               all the libraries that were specified so far. If the minus is
               missing, the library is run before all previously specified
               libraries.

       -p type Print only the keywords matching the specified type. By
               default, all keywords that are found and not removed as
               duplicates are printed.

       -x type Exclude keywords of the specified type from the output. By
               default, all keywords that are found and not removed as
               duplicates are printed.
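
       The duplicate handling selected by -a, -d, -r and the default can be
       sketched in Python. This is only an illustration of the rules as worded
       in the option descriptions, not libextractor's code: the function name,
       the mode names, the (type, keyword) tuple layout, the preserved input
       order and the choice to keep the first match under -r are all
       assumptions.

```python
from typing import List, Tuple

Keyword = Tuple[str, str]  # (keyword type, keyword text); this layout is an assumption
UNKNOWN = "unknown"

# Hypothetical mode names standing in for: -a, -d, no option, -r
MODES = ("keep-all", "exact-type", "default", "ignore-type")


def remove_duplicates(items: List[Keyword], mode: str = "default") -> List[Keyword]:
    """Model the duplicate rules described for -a, -d, -r and the default."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "keep-all":  # -a: nothing is removed
        return list(items)

    kept: List[Keyword] = []
    for ktype, text in items:
        same_text = [k for k in kept if k[1] == text]
        if mode == "ignore-type":
            # -r: the type is disregarded (assumed: the first occurrence is kept)
            if same_text:
                continue
        elif (ktype, text) in kept:
            # -d and default: same type and same text is always a duplicate
            continue
        elif mode == "default":
            if ktype == UNKNOWN and same_text:
                # the copy of unknown type is the one that is removed
                continue
            if ktype != UNKNOWN:
                # a typed keyword displaces an earlier copy of unknown type
                kept = [k for k in kept if k != (UNKNOWN, text)]
        kept.append((ktype, text))
    return kept
```

       In this model, a keyword reported once with type "comment" and once
       with type "unknown" survives only as "comment" by default, while the
       "exact-type" mode (standing in for -d) keeps both entries.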

SEE ALSO

       libextractor(3) - description of the libextractor library

EXAMPLES

       $ extract test/test.jpg
       comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
       mimetype - image/jpeg

       $ extract -Vf -x comment test/test.jpg
       Keywords for file test/test.jpg:
       mimetype - image/jpeg
       filename - test.jpg

       $ extract -p comment test/test.jpg
       comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1

       $ extract -nV -l libextractor_png.so -p comment test/test.jpg test/test.png
       Keywords for file test/test.jpg:
       Keywords for file test/test.png:
       comment - Testing keyword extraction
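
       The ordering rule that OPTIONS gives for the -l argument can be
       sketched in Python. This is a minimal model of the rule as worded
       there, not libextractor's loader: the function name run_order, the
       empty starting list (as if -n were given), the handling of empty
       segments, and the library name libextractor_foo.so are assumptions made
       for illustration.

```python
from typing import Iterable, List


def run_order(spec: str, initial: Iterable[str] = ()) -> List[str]:
    """Return the order in which libraries would run for an -l style spec."""
    order = list(initial)
    for entry in spec.split(":"):
        if not entry:
            continue  # skip empty segments (assumed behaviour)
        if entry.startswith("-"):
            # leading minus: run after all libraries specified so far
            order.append(entry[1:])
        else:
            # no minus: run before all previously specified libraries
            order.insert(0, entry)
    return order


if __name__ == "__main__":
    # libextractor_foo.so is a hypothetical name used only for this example.
    print(run_order("libextractor_jpeg.so:-libextractor_png.so:libextractor_foo.so"))
```

       Under these assumptions the example spec yields the order
       libextractor_foo.so, libextractor_jpeg.so, libextractor_png.so.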

LEGAL NOTICE

       libextractor and the extract tool are released under the GPL.
       libextractor is a GNU project.

BUGS

       A couple of file-formats (on the order of 10^3) are not recognized...

AUTHORS

       extract was originally written by Christian Grothoff
       <christian@grothoff.org> and Vidyut Samanta <vids@cs.ucla.edu>. Use
       <libextractor@gnu.org> to contact the current maintainer(s).

AVAILABILITY

       You can obtain the original author's latest version from
       http://gnunet.org/libextractor/

libextractor 0.5.17              Dec 29, 2006                       EXTRACT(1)