LANGIDENT(1)          User Contributed Perl Documentation         LANGIDENT(1)

NAME
    langident - identifies the language files are written in

SYNOPSIS
    langident [OPTIONS] file1 [file2 ...]

DESCRIPTION
    Identifies the language each input file is written in, using the Perl
    module Lingua::Identify.

  OPTIONS
    -a
        Show all results (not just the most probable language).

    -c
        Show confidence level for most probable language (it will be the
        first value right after the most probable language).

    -d
        Debug (development only).

    -E ENCODING
        Select an input encoding. Defaults to UTF-8.

            # use ISO-8859-1 (latin1)
            langident -E ISO-8859-1 file

    -e METHODS
        Select the method(s) to use. There are three ways of doing this:

            # use a single method
            langident -e ngrams3 file

            # use several methods (separate them with commas)
            langident -e prefixes3,suffixes3 file

            # use several methods and assign a different weight to each
            langident -e smallwords=2,prefixes=1,ngrams3=1.3 file

        The available methods are: smallwords, prefixes1, prefixes2,
        prefixes3, prefixes4, suffixes1, suffixes2, suffixes3, suffixes4,
        ngrams1, ngrams2, ngrams3 and ngrams4.

    -h
        Display help message and exit.

    -l
        List all available languages and exit.

    -m NUMBER
        Set the maximum number of results (languages) to display: shows
        the N most probable languages, in descending order of
        probability.

        Overrides the -a switch.

    -o LANGUAGES
        Only work with the specified languages.

            # choose between Portuguese and English only
            langident -o pt,en *

    -p
        Also show percentages.

    -s SIZE
        Maximum size to examine.

    -v
        Show version and exit.

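The METHODS argument of the -e switch is a comma-separated list of
method names, each optionally followed by =WEIGHT. The Python sketch
below shows one way such a specification could be parsed and applied.
It is an illustration only, not the code langident or Lingua::Identify
uses: the function names parse_methods and combine, the weighted-average
scheme and the sample scores are all assumptions made for this example.
A method given without a weight is treated as weight 1, in line with the
page's description of "ngrams2=2,ngrams1" as giving ngrams2 double
importance.

```python
from typing import Dict, List, Tuple


def parse_methods(spec: str) -> Dict[str, float]:
    """Parse a -e style spec such as 'ngrams2=2,ngrams1' into {method: weight}."""
    weights: Dict[str, float] = {}
    for item in spec.split(","):
        name, sep, value = item.strip().partition("=")
        if not name:
            raise ValueError(f"empty method name in {spec!r}")
        # Assumption: a method listed without '=WEIGHT' counts with weight 1.
        weights[name] = float(value) if sep else 1.0
    return weights


def combine(
    scores: Dict[str, Dict[str, float]], weights: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Weighted average of per-method language scores, best language first.

    `scores` maps method -> {language: score}. This is a hypothetical
    combination scheme, not necessarily the one Lingua::Identify applies.
    """
    total_weight = sum(weights.values())
    combined: Dict[str, float] = {}
    for method, weight in weights.items():
        for lang, score in scores.get(method, {}).items():
            combined[lang] = combined.get(lang, 0.0) + weight * score / total_weight
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores for two methods and two languages.
example_scores = {
    "ngrams2": {"en": 0.7, "pt": 0.3},
    "ngrams1": {"en": 0.4, "pt": 0.6},
}
ranked = combine(example_scores, parse_methods("ngrams2=2,ngrams1"))
print(ranked[:1])  # keep only the best result, as a "-m 1" style cut-off would
```

Slicing the ranked list mirrors the -m switch, which limits the output
to the N most probable languages.
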
EXAMPLES
    Use methods ngrams2 and ngrams1, giving ngrams2 twice the weight of
    ngrams1 (-e switch); the output includes the three most probable
    languages (-m switch) with their percentages (-p switch) and the
    confidence level (-c switch) of the first result.

        $ langident -e ngrams2=2,ngrams1 -c -p -m 3 README
        README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505
        $

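A script that consumes this output has to split each line into its
fields. The Python sketch below does that for the layout produced by
the -c and -p switches together, as far as it can be inferred from the
sample line: file name, a colon, the top language, its confidence, its
percentage, then language/percentage pairs. The names Identification
and parse_line are hypothetical, and the parser assumes file names
without whitespace. The numbers in the sample are consistent with the
confidence being the top percentage divided by the sum of the top two
percentages, times 100; this page does not state that formula, so treat
it as an observation about this one line.

```python
from typing import List, NamedTuple, Tuple


class Identification(NamedTuple):
    filename: str
    confidence: float
    languages: List[Tuple[str, float]]  # (language code, percentage), best first


def parse_line(line: str) -> Identification:
    """Parse one output line of 'langident -c -p ...' (layout assumed from the sample)."""
    head, *rest = line.split()
    # rpartition splits on the last ':', so a ':' inside the file name is tolerated.
    filename, _, first_lang = head.rpartition(":")
    # With -c, the first value after the top language is its confidence.
    confidence = float(rest[0])
    # What remains alternates language code and percentage.
    tokens = [first_lang] + rest[1:]
    languages = [(lang, float(pct)) for lang, pct in zip(tokens[0::2], tokens[1::2])]
    return Identification(filename, confidence, languages)


sample = (
    "README:en 65.7209505939491 7.8971987481393 "
    "ga 4.11905889385895 tr 4.08487011400505"
)
result = parse_line(sample)
top, second = result.languages[0][1], result.languages[1][1]
print(result.filename, result.languages[0][0], result.confidence)
# Ratio of the top percentage to the top two; compare with the confidence above.
print(100 * top / (top + second))
```
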
TO DO
    •   Add a switch to ignore HTML tags (and maybe other formats too)

SEE ALSO
    Lingua::Identify(3), Text::ExtractWords(3), Text::Ngram(3),
    Text::Affixes(3).

    A linguist and/or a shrink.

    The latest CVS version of "Lingua::Identify" (which includes langident)
    can be obtained at
    http://natura.di.uminho.pt/natura/viewcvs.cgi/Lingua/Identify/

    ISO 639 Language Codes, at http://www.w3.org/WAI/ER/IG/ert/iso639.htm

AUTHOR
    Jose Alves de Castro, <cog@cpan.org>

COPYRIGHT AND LICENSE
    Copyright 2004 by Jose Alves de Castro

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

perl v5.34.0                      2022-01-21                      LANGIDENT(1)