.\" Automatically generated by Pod::Man 2.27 (Pod::Simple 3.28)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{
.    if \nF \{
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "Text::Language::Guess 3"
.TH Text::Language::Guess 3 "2005-11-20" "perl v5.16.3" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Text::Language::Guess \- Trained module to guess a document's language
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&    use Text::Language::Guess;
\&
\&    my $guesser = Text::Language::Guess\->new();
\&    my $lang = $guesser\->language_guess("bill.txt");
\&
\&        # prints \*(Aqen\*(Aq
\&    print "Best fit: $lang\en";
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
Text::Language::Guess guesses a document's language. Its implementation
is simple: using \f(CW\*(C`Text::ExtractWords\*(C'\fR and \f(CW\*(C`Lingua::StopWords\*(C'\fR from
\&\s-1CPAN,\s0 it counts how many of the known stopwords the document
contains for each language supported by \f(CW\*(C`Lingua::StopWords\*(C'\fR.
.PP
Each word in the document that is recognized as a stopword
of a particular language scores one point for that language.
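.PP
The scoring rule can be illustrated with a short, self-contained sketch in plain Perl. It is not the module's actual code: the two heavily truncated stopword lists, the regex tokenizer and the helper \f(CW\*(C`score_text\*(C'\fR are hypothetical stand-ins for \f(CW\*(C`Lingua::StopWords\*(C'\fR and \f(CW\*(C`Text::ExtractWords\*(C'\fR. The sketch also assumes that every occurrence of a stopword counts and that words are compared in lowercase.
.PP
.Vb 24
\&    use strict;
\&    use warnings;
\&
\&    # Hypothetical, heavily truncated stopword lists; the module
\&    # gets its real lists from Lingua::StopWords.
\&    my %stopwords = (
\&        en => { map { $_ => 1 } qw(the is and of not but are) },
\&        de => { map { $_ => 1 } qw(der die das und ist nicht aber) },
\&    );
\&
\&    # One point per word occurrence that is a stopword of a language.
\&    sub score_text {
\&        my ($text) = @_;
\&        my %score = map { $_ => 0 } keys %stopwords;
\&        for my $word (map { lc($_) } $text =~ /(\ew+)/g) {
\&            for my $lang (keys %stopwords) {
\&                $score{$lang}++ if $stopwords{$lang}{$word};
\&            }
\&        }
\&        return \e%score;
\&    }
\&
\&    my $scores = score_text("The cat is not on the mat");
\&        # $scores is { en => 4, de => 0 }
.Ve
.PP
Keeping each stopword list in a hash makes every lookup constant-time, so the work grows with the number of words times the number of languages.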
.PP
The \f(CW\*(C`language_guess()\*(C'\fR method takes the name of a text file
and returns the two-letter code of the language the document is most
likely written in.
.PP
Supported languages:
.IP "\(bu" 4
English (en)
.IP "\(bu" 4
French (fr)
.IP "\(bu" 4
Spanish (es)
.IP "\(bu" 4
Portuguese (pt)
.IP "\(bu" 4
Italian (it)
.IP "\(bu" 4
German (de)
.IP "\(bu" 4
Dutch (nl)
.IP "\(bu" 4
Swedish (sv)
.IP "\(bu" 4
Norwegian (no)
.IP "\(bu" 4
Danish (da)
.SS "Methods"
.IX Subsection "Methods"
.ie n .IP """new()""" 4
.el .IP "\f(CWnew()\fR" 4
.IX Item "new()"
Initializes the guesser with all stopwords available for
all supported languages.
If \f(CW\*(C`new\*(C'\fR has been called before, subsequent
calls reuse the same precomputed stopword map instead of
collecting all stopwords again (as long as the
number of languages stays the same; see the next
paragraph).
.Sp
You can limit the set of languages searched by passing
the \f(CW\*(C`languages\*(C'\fR parameter an array ref of the wanted
language codes:
.Sp
.Vb 2
\&        # Only guess between English and German
\&    $guesser = Text::Language::Guess\->new(languages => [\*(Aqen\*(Aq, \*(Aqde\*(Aq]);
.Ve
.ie n .IP """language_guess($textfile)""" 4
.el .IP "\f(CWlanguage_guess($textfile)\fR" 4
.IX Item "language_guess($textfile)"
Reads in a text file, extracts all words, scores them 
using the stopword maps and returns a single two-letter
string indicating the language the document is most likely
written in.
.ie n .IP """language_guess_string($string)""" 4
.el .IP "\f(CWlanguage_guess_string($string)\fR" 4
.IX Item "language_guess_string($string)"
Just like \f(CW\*(C`language_guess\*(C'\fR, but takes a string instead of a file name.
.ie n .IP """scores($textfile)""" 4
.el .IP "\f(CWscores($textfile)\fR" 4
.IX Item "scores($textfile)"
Like \f(CW\*(C`language_guess($textfile)\*(C'\fR, but returns
a reference to a hash mapping language codes (e.g. 'en')
to numeric scores. The entry with the highest score
is the most likely language.
.ie n .IP """scores_string($string)""" 4
.el .IP "\f(CWscores_string($string)\fR" 4
.IX Item "scores_string($string)"
Like \f(CW\*(C`scores\*(C'\fR, but takes a string instead of a file name.
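.PP
Since the entry with the highest score is the most likely language, picking the best fit from the hash returned by \f(CW\*(C`scores\*(C'\fR or \f(CW\*(C`scores_string\*(C'\fR can be sketched in a few lines. This is a hypothetical illustration that assumes \f(CW\*(C`language_guess\*(C'\fR simply takes the maximum. It reuses the scores printed by the \f(CW\*(C`scores_string\*(C'\fR example in \s-1EXAMPLES\s0, and the alphabetical tie-breaking is an assumption, since this page does not say how ties are resolved.
.PP
.Vb 13
\&    use strict;
\&    use warnings;
\&
\&    # Scores as printed by the scores_string() example in EXAMPLES
\&    my $scores = { pt => 1, en => 6, fr => 1, nl => 1 };
\&
\&    # Highest score first; ties broken alphabetically (an assumption,
\&    # this page does not say how the module resolves ties).
\&    my ($best) = sort {
\&        $scores\->{$b} <=> $scores\->{$a} || $a cmp $b
\&    } keys %$scores;
\&
\&    print "Best fit: $best\en";    # prints "Best fit: en"
.Ve
.PP
Sorting every key is more work than finding a single maximum, but with at most ten supported languages the cost is negligible.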
.SH "EXAMPLES"
.IX Header "EXAMPLES"
.Vb 1
\&    use Text::Language::Guess;
\&
\&        # Guess language in a string instead of a file
\&    my $guesser = Text::Language::Guess\->new();
\&    my $lang = $guesser\->language_guess_string("Make love not war");
\&        # \*(Aqen\*(Aq
\&
\&
\&        # Limit number of languages to choose from
\&    $guesser = Text::Language::Guess\->new(languages => [\*(Aqda\*(Aq, \*(Aqnl\*(Aq]);
\&    $lang = $guesser\->language_guess_string(
\&                   "Which is closer to English, danish or dutch?");
\&        # \*(Aqnl\*(Aq
\&
\&
\&        # Show different scores
\&    $guesser = Text::Language::Guess\->new();
\&    my $scores = $guesser\->scores_string(
\&        "This text is English, but other languages are scoring as well");
\&    use Data::Dumper;
\&    print Dumper($scores);
\&
\&        # $VAR1 = {
\&        #   \*(Aqpt\*(Aq => 1,
\&        #   \*(Aqen\*(Aq => 6,
\&        #   \*(Aqfr\*(Aq => 1,
\&        #   \*(Aqnl\*(Aq => 1
\&        # };
.Ve
.SH "LEGALESE"
.IX Header "LEGALESE"
Copyright 2005 by Mike Schilli, all rights reserved.
This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.
.SH "AUTHOR"
.IX Header "AUTHOR"
2005, Mike Schilli <cpan@perlmeister.com>