EXTRACT(1)                General Commands Manual                EXTRACT(1)

NAME
       extract - determine meta-information about a file

SYNOPSIS
       extract [ -abdfghLnrsvV ] [ -B language ] [ -H hash-algorithm ]
               [ -l library ] [ -p type ] [ -x type ] file ...

DESCRIPTION
       This manual page documents version 0.5.17 of the extract command.

       extract tests each file specified in the argument list in an attempt
       to infer meta-information from it. Each file is subjected to the
       meta-data extraction libraries from libextractor.

       libextractor classifies meta-information (also referred to as
       keywords) into types. A list of all types can be obtained with the
       -L option.

OPTIONS
       -a     Do not remove any duplicates, even if the keywords match
              exactly and have the same type (e.g. because the same keyword
              was found by different extractor libraries).

       -b     Display the output in BibTeX format. This implies the -d
              option.

       -B LANG
              Use the generic plaintext extractor for the language with the
              2-letter language code LANG. Supported languages are DA
              (Danish), DE (German), EN (English), ES (Spanish), FI
              (Finnish), FR (French), GA (Gaelic), IT (Italian), NO
              (Norwegian) and SV (Swedish).

       -d     Remove duplicates only if the types match exactly. By default,
              duplicates are removed if the types match or if one of the
              types is unknown (in this case, the duplicate of unknown type
              is removed).

       -f     Add the filename(s) (without directory) to the list of
              keywords.

       -g     Use grep-friendly output (all keywords on a single line for
              each file). Use the verbose option once to print the filename
              first, followed by the keywords, and twice to also display
              the keyword types. By default, this option prints neither
              keyword types nor non-textual metadata.

       -h     Print a brief summary of the options.

       -H ALGORITHM
              Use ALGORITHM to compute a hash of each file (possible
              algorithms are sha1 and md5).

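As a rough illustration of what a per-file hash involves, the sketch below computes an MD5 or SHA-1 digest in Python with the standard hashlib module. The helper name hash_stream and the chunk size are assumptions made for this example. How extract itself labels or formats the resulting value is not described here, so the hex string returned below may not match its output byte for byte.

```python
import hashlib

# Algorithms named in the -H description above.
SUPPORTED_ALGORITHMS = ("sha1", "md5")


def hash_stream(stream, algorithm, chunk_size=65536):
    """Return the hex digest of a binary stream (hypothetical helper)."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError("unsupported algorithm: %s" % algorithm)
    digest = hashlib.new(algorithm)
    # Read in chunks so that large files need not fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

For a file on disk, open it in binary mode and pass the file object, for example: with open(path, "rb") as f: hash_stream(f, "sha1").
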
       -L     Print a list of all known keyword types.

       -n     Do not use the default set of extractors (typically all
              standard extractors, currently mp3, ogg, jpg, gif, png, tiff,
              real, html, pdf and mime-types); use only the extractors
              specified with the -l option.

       -r     Remove all duplicates disregarding differences in the keyword
              type.

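The four duplicate-handling behaviours (the default, -a, -d and -r) can be compared side by side with a small Python model. This is only a sketch of the rules as worded on this page, not libextractor's actual code. The function name remove_duplicates, the mode strings, and the choice to keep the first occurrence under -r are assumptions.

```python
UNKNOWN = "unknown"


def remove_duplicates(keywords, mode="default"):
    """Model duplicate removal for a list of (type, keyword) pairs.

    mode is "default", "a", "d" or "r", mirroring the option letters.
    """
    if mode not in ("default", "a", "d", "r"):
        raise ValueError("unknown mode: %s" % mode)
    if mode == "a":
        # -a: keep everything, even exact duplicates.
        return list(keywords)
    kept = []
    for ktype, value in keywords:
        # Types already kept for this exact keyword string.
        types_seen = [t for t, v in kept if v == value]
        if mode == "r":
            # -r: the type is ignored (assumption: first occurrence wins).
            if types_seen:
                continue
        elif mode == "d":
            # -d: drop only when keyword and type both match.
            if ktype in types_seen:
                continue
        else:
            # Default: drop exact matches, and let a typed keyword
            # win over a copy of unknown type.
            if ktype in types_seen:
                continue
            if ktype == UNKNOWN and types_seen:
                continue
            if UNKNOWN in types_seen:
                kept = [(t, v) for t, v in kept
                        if not (v == value and t == UNKNOWN)]
        kept.append((ktype, value))
    return kept
```

Given the pairs (unknown, test.jpg), (filename, test.jpg), (filename, test.jpg) and (comment, test.jpg), this model keeps all four under -a, three under -d, two by default (filename and comment), and one under -r.
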
       -s     Split keywords at delimiters (space, comma, colon, etc.) and
              list the resulting keywords as being of unknown type. This
              can also be done by loading the split-library. Using this
              option guarantees that the splitting is performed after all
              other libraries have been run. It is always performed before
              duplicate elimination.

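A minimal Python sketch of the splitting step follows. The exact delimiter set is not listed on this page ("etc."), so the character class used here is an assumption, as are the function name split_keywords and the decision to keep the original keywords alongside the split ones.

```python
import re

# Assumed delimiter set; the page names space, comma and colon only.
DELIMITERS = re.compile(r"[ ,:;]+")


def split_keywords(keywords):
    """Append split parts of each (type, keyword) pair as unknown type."""
    result = list(keywords)
    for _ktype, value in keywords:
        parts = [part for part in DELIMITERS.split(value) if part]
        # A keyword without any delimiter yields nothing new.
        if len(parts) > 1:
            result.extend(("unknown", part) for part in parts)
    return result
```

Because splitting runs before duplicate elimination, any split part that repeats an existing keyword is then subject to the duplicate rules selected with -a, -d or -r.
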
       -v     Print the version number and exit.

       -V     Be verbose.

       -B     Run the printable extractor (a costly, generic extractor for
              binaries).

       -l libraries
              Use the specified libraries to extract keywords. The general
              format of libraries is [[-]LIBRARYNAME[:[-]LIBRARYNAME]*],
              where LIBRARYNAME is a libextractor-compatible library,
              typically of the form libextractor_jpeg.so. A minus before
              the library name indicates that this library should be run
              after all the libraries that were specified so far. If the
              minus is missing, the library is run before all previously
              specified libraries.

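The ordering rule for -l is easy to misread, so here is a small Python model of it. It is a sketch based only on the wording above: the function name plan_library_order is an assumption, libextractor_example.so is a made-up library name, and starting from an empty list corresponds to combining -l with -n.

```python
def plan_library_order(spec, already_loaded=()):
    """Return the run order implied by a -l style library spec.

    spec has the form [-]NAME[:[-]NAME]*. A leading minus appends the
    library (run last so far); no minus prepends it (run first so far).
    """
    order = list(already_loaded)
    for entry in spec.split(":"):
        if not entry:
            continue  # tolerate empty segments such as "a::b"
        if entry.startswith("-"):
            order.append(entry[1:])
        else:
            order.insert(0, entry)
    return order
```

For the spec libextractor_png.so:-libextractor_example.so:libextractor_jpeg.so the model yields the order jpeg, png, example: the last entry without a minus ends up running first.
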
       -p type
              Print only the keywords matching the specified type. By
              default, all keywords that are found and not removed as
              duplicates are printed.

       -x type
              Exclude keywords of the specified type from the output. By
              default, all keywords that are found and not removed as
              duplicates are printed.

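The effect of -p and -x on the final keyword list can be modelled as a simple filter. The Python sketch below assumes keywords are (type, keyword) pairs. The name select_keywords and its parameters are hypothetical, and accepting several types at once is an assumption rather than documented behaviour.

```python
def select_keywords(keywords, print_types=None, exclude_types=None):
    """Filter (type, keyword) pairs the way -p and -x are described."""
    wanted = set(print_types or ())
    excluded = set(exclude_types or ())
    selected = []
    for ktype, value in keywords:
        # -p: when given, only matching types pass.
        if wanted and ktype not in wanted:
            continue
        # -x: matching types are always dropped.
        if ktype in excluded:
            continue
        selected.append((ktype, value))
    return selected
```

With neither argument set, every keyword that survived duplicate removal is returned unchanged, which matches the stated default.
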
SEE ALSO
       libextractor(3) - description of the libextractor library

EXAMPLES
       $ extract test/test.jpg
       comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
       mimetype - image/jpeg

       $ extract -Vf -x comment test/test.jpg
       Keywords for file test/test.jpg:
       mimetype - image/jpeg
       filename - test.jpg

       $ extract -p comment test/test.jpg
       comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1

       $ extract -nV -l libextractor_png.so -p comment test/test.jpg test/test.png
       Keywords for file test/test.jpg:
       Keywords for file test/test.png:
       comment - Testing keyword extraction

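Scripts that consume this output may want it in structured form. The Python sketch below parses text shaped like the examples above, with "type - keyword" lines optionally grouped under "Keywords for file NAME:" headers. It assumes the output looks exactly like those samples; the function name parse_extract_output is hypothetical, and keywords printed before any header are filed under the key None.

```python
HEADER_PREFIX = "Keywords for file "


def parse_extract_output(text):
    """Map each file name to its list of (type, keyword) pairs."""
    results = {}
    current = None  # no header seen yet (non-verbose output)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(HEADER_PREFIX) and line.endswith(":"):
            current = line[len(HEADER_PREFIX):-1]
            results.setdefault(current, [])
            continue
        # Split at the first " - " only; the keyword may contain more.
        ktype, separator, value = line.partition(" - ")
        if separator:
            results.setdefault(current, []).append((ktype, value))
    return results
```

A file that produced a header but no keywords, such as test/test.jpg in the last example, maps to an empty list.
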
LEGAL NOTICE
       libextractor and the extract tool are released under the GPL.
       libextractor is a GNU project.

BUGS
       A couple of file-formats (on the order of 10^3) are not recognized...

AUTHORS
       extract was originally written by Christian Grothoff
       <christian@grothoff.org> and Vidyut Samanta <vids@cs.ucla.edu>. Use
       <libextractor@gnu.org> to contact the current maintainer(s).

AVAILABILITY
       You can obtain the original author's latest version from
       http://gnunet.org/libextractor/

libextractor 0.5.17              Dec 29, 2006                    EXTRACT(1)