LANGIDENT(1)          User Contributed Perl Documentation         LANGIDENT(1)

NAME

       langident - identifies the language files are written in

SYNOPSIS

         langident [OPTIONS] file1 [file2 ...]

DESCRIPTION

       Identifies the language each file is written in, using the Perl module
       Lingua::Identify.

   OPTIONS
   -a
       Show all results (not just the most probable language).

   -c
       Show the confidence level for the most probable language (it is the
       first value right after the most probable language).

   -d
       Debug (development only).

   -E ENCODING
       Select an input encoding. Defaults to UTF-8.

         # use ISO-8859-1 (latin1)
         langident -E ISO-8859-1 file
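
       When the encoding of a file is not known in advance, one possible
       approach is to test whether it decodes cleanly as UTF-8 and to fall
       back to another encoding otherwise. The sketch below is not part of
       langident: is_utf8 and pick_encoding are hypothetical helper names,
       the check relies on iconv(1) being installed, and treating every
       non-UTF-8 file as ISO-8859-1 is only an assumption.

```shell
#!/bin/sh
# Succeeds when the file decodes cleanly as UTF-8 (relies on iconv(1)).
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Assumption: any file that is not valid UTF-8 is treated as ISO-8859-1.
pick_encoding() {
    if is_utf8 "$1"; then
        echo UTF-8
    else
        echo ISO-8859-1
    fi
}

# Demo files: "cafe" with an accented e, once in Latin-1 (octal 351)
# and once in UTF-8 (octal 303 251).
tmp=$(mktemp -d)
printf 'caf\351\n' > "$tmp/latin1.txt"
printf 'caf\303\251\n' > "$tmp/utf8.txt"

pick_encoding "$tmp/latin1.txt"
pick_encoding "$tmp/utf8.txt"

rm -rf "$tmp"
```

       With langident installed, the result could then be passed on, for
       example: langident -E "$(pick_encoding file)" file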

   -e METHODS
       Select the method(s) to use. There are three ways of doing this:

         # use a single method
         langident -e ngrams3 file

         # use several methods (separate them with commas)
         langident -e prefixes3,suffixes3 file

         # use several methods and assign a different weight to each
         langident -e smallwords=2,prefixes=1,ngrams3=1.3 file

       The available methods are: smallwords, prefixes1, prefixes2,
       prefixes3, prefixes4, suffixes1, suffixes2, suffixes3, suffixes4,
       ngrams1, ngrams2, ngrams3 and ngrams4.
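
       In scripts, the weighted form of the METHODS argument can be
       assembled from separate method and weight values. The following is
       a minimal sketch; methods_arg is a hypothetical helper, not
       something langident provides, and it does not check that the names
       are among the available methods.

```shell
#!/bin/sh
# Hypothetical helper (not part of langident): join METHOD WEIGHT pairs
# into the comma-separated METHOD=WEIGHT form used with the -e switch.
methods_arg() {
    out=
    while [ "$#" -ge 2 ]; do
        # Prepend a comma only when something has already been collected.
        out="${out:+$out,}$1=$2"
        shift 2
    done
    printf '%s\n' "$out"
}

weights=$(methods_arg smallwords 2 ngrams3 1.3)
echo "$weights"

# With langident installed, the result can be passed straight to -e:
#   langident -e "$weights" file
```

       For the arguments shown, the helper prints smallwords=2,ngrams3=1.3.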

   -h
       Display help message and exit.

   -l
       List all available languages and exit.

   -m NUMBER
       Set the maximum number of results (languages) to display (shows the
       N most probable languages, in descending order of probability).

       Overrides the -a switch.

   -o LANGUAGES
       Only work with the specified languages.

         # identify between Portuguese and English only
         langident -o pt,en *

   -p
       Also show percentages.

   -s SIZE
       Maximum size to examine.

   -v
       Show version and exit.

EXAMPLES

       Use methods ngrams2 and ngrams1, giving ngrams2 twice the weight
       (-e switch); the output includes the three most probable languages
       (-m switch) with their percentages (-p switch) and the confidence
       level (-c switch) of the first result.

         $ langident -e ngrams2=2,ngrams1 -c -p -m 3 README
         README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505
         $
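
       Output like the line above can be split into fields for further
       processing. The sketch below assumes the layout FILE:LANG CONFIDENCE
       PERCENTAGE followed by LANG PERCENTAGE pairs, which is inferred from
       this single sample only (the first number equals 100 * p1 / (p1 + p2)
       for the top two percentages, which fits it being the confidence
       value). parse_langident is a hypothetical name, and file names that
       contain spaces or colons are not handled.

```shell
#!/bin/sh
# Sample line copied from the example above (output of -c -p -m 3).
line='README:en 65.7209505939491 7.8971987481393 ga 4.11905889385895 tr 4.08487011400505'

# Assumed layout: FILE:LANG CONFIDENCE PCT [LANG PCT]...
parse_langident() {
    printf '%s\n' "$1" | awk '{
        split($1, head, ":")          # head[1] = file name, head[2] = top language
        printf "file=%s confidence=%.2f\n", head[1], $2
        printf "%s %.2f\n", head[2], $3
        for (i = 4; i < NF; i += 2)   # remaining fields are LANG PCT pairs
            printf "%s %.2f\n", $i, $(i + 1)
    }'
}

parse_langident "$line"
```

       For the sample line this prints the file name with the confidence
       value, followed by one line per language with its percentage rounded
       to two decimals.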

TO DO

       •     Add a switch to ignore HTML tags (and maybe other formats too)

SEE ALSO

       Lingua::Identify(3), Text::ExtractWords(3), Text::Ngram(3),
       Text::Affixes(3).

       A linguist and/or a shrink.

       The latest CVS version of "Lingua::Identify" (which includes langident)
       can be obtained at
       http://natura.di.uminho.pt/natura/viewcvs.cgi/Lingua/Identify/

       ISO 639 Language Codes, at http://www.w3.org/WAI/ER/IG/ert/iso639.htm

AUTHOR

       Jose Alves de Castro, <cog@cpan.org>

       Copyright 2004 by Jose Alves de Castro

       This library is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself.



perl v5.34.0                      2021-07-22                      LANGIDENT(1)