.\" Automatically generated by Pod::Man 2.27 (Pod::Simple 3.28)
.\"
.\" Standard preamble:
.\" ========================================================================
.de Sp \" Vertical space (when we can't use .PP)
.if t .sp .5v
.if n .sp
..
.de Vb \" Begin verbatim text
.ft CW
.nf
.ne \\$1
..
.de Ve \" End verbatim text
.ft R
.fi
..
.\" Set up some character translations and predefined strings.  \*(-- will
.\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left
.\" double quote, and \*(R" will give a right double quote.  \*(C+ will
.\" give a nicer C++.  Capital omega is used to do unbreakable dashes and
.\" therefore won't be available.  \*(C` and \*(C' expand to `' in nroff,
.\" nothing in troff, for use with C<>.
.tr \(*W-
.ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p'
.ie n \{\
.    ds -- \(*W-
.    ds PI pi
.    if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch
.    if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\"  diablo 12 pitch
.    ds L" ""
.    ds R" ""
.    ds C` ""
.    ds C' ""
'br\}
.el\{\
.    ds -- \|\(em\|
.    ds PI \(*p
.    ds L" ``
.    ds R" ''
.    ds C`
.    ds C'
'br\}
.\"
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g .ds Aq \(aq
.el       .ds Aq '
.\"
.\" If the F register is turned on, we'll generate index entries on stderr for
.\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index
.\" entries marked with X<> in POD.  Of course, you'll have to process the
.\" output yourself in some meaningful fashion.
.\"
.\" Avoid warning from groff about undefined register 'F'.
.de IX
..
.nr rF 0
.if \n(.g .if rF .nr rF 1
.if (\n(rF:(\n(.g==0)) \{
.    if \nF \{
.        de IX
.        tm Index:\\$1\t\\n%\t"\\$2"
..
.        if !\nF==2 \{
.            nr % 0
.            nr F 2
.        \}
.    \}
.\}
.rr rF
.\" ========================================================================
.\"
.IX Title "TermExtract 3"
.TH TermExtract 3 "2008-03-10" "perl v5.16.3" "User Contributed Perl Documentation"
.\" For nroff, turn off justification.  Always turn off hyphenation; it makes
.\" way too many mistakes in technical documents.
.if n .ad l
.nh
.SH "NAME"
Text::TermExtract \- Extract terms from text
.SH "SYNOPSIS"
.IX Header "SYNOPSIS"
.Vb 1
\&    use Text::TermExtract;
\&
\&    my $text = "Hey, hey, how\*(Aqs it going? Wanna go to Wendy\*(Aqs " .
\&               "tonight? Wendy\*(Aqs has great sandwiches.";
\&
\&    my $ext = Text::TermExtract\->new();
\&
\&    for my $word ( $ext\->terms_extract( $text, { max => 3 }) ) {
\&        print "$word\en";
\&    }
\&
\&    # "sandwiches"
\&    # "tonight"
\&    # "wendy"
.Ve
.SH "DESCRIPTION"
.IX Header "DESCRIPTION"
Text::TermExtract takes a simple approach to extracting the most interesting
terms from documents of arbitrary length.
.PP
There are more scientific methods of term extraction, like Yahoo's online
term extraction \s-1API \s0(but you can't have it locally) and the Lingua::YaTeA 
module on \s-1CPAN \s0(which is so poorly documented that I couldn't figure out
how to use it).
.PP
So I wrote Text::TermExtract, which first tries to
guess the language a text is written in, kicks out the language\-
specific stopwords, weighs the rest with a hand-crafted formula and
returns a list of (hopefully) interesting words.
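.PP
As a rough illustration of that pipeline (not the module's actual code), the
sketch below drops stopwords and ranks the remaining words. The stopword list
and the \f(CW\*(C`rank_terms\*(C'\fR helper are hypothetical, plain word frequency
stands in for the hand-crafted weighting formula, and language guessing is
left out entirely:
.PP
.Vb 2
\&    use strict;
\&    use warnings;
\&
\&    # Hypothetical English stopword list; the real module selects a list
\&    # based on the language it guesses for the text.
\&    my %stop = map { $_ => 1 }
\&        qw(a the to it is has how go hey s wanna great);
\&
\&    # Return up to $max non\-stopwords, most frequent first.
\&    sub rank_terms {
\&        my ($text, $max) = @_;
\&        my %count;
\&        for my $word ( map { lc($_) } $text =~ /(\ew+)/g ) {
\&            next if $stop{$word};
\&            $count{$word}++;
\&        }
\&        # Sort by descending frequency, then alphabetically for ties.
\&        my @terms = sort { $count{$b} <=> $count{$a} or $a cmp $b }
\&                    keys %count;
\&        splice @terms, $max if @terms > $max;
\&        return @terms;
\&    }
\&
\&    my $text = "Wanna go to Wendy\*(Aqs tonight? Wendy\*(Aqs has great sandwiches.";
\&    print "$_\en" for rank_terms( $text, 2 );
\&
\&    # wendy
\&    # sandwiches
.Ve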
.PP
This is a very crude approach to term extraction. If you have a better
method and want to include it in Text::TermExtract, drop me an email;
I'm interested.
.SS "\s-1METHODS\s0"
.IX Subsection "METHODS"
.IP "\fInew()\fR" 4
.IX Item "new()"
Constructor.
.ie n .IP "terms_extract( $text, $opts )" 4
.el .IP "terms_extract( \f(CW$text\fR, \f(CW$opts\fR )" 4
.IX Item "terms_extract( $text, $opts )"
Goes through the text string in \f(CW$text\fR, extracts the keywords and returns
them as a list.
.Sp
To limit the number of words returned, use the \f(CW\*(C`max\*(C'\fR option:
.Sp
.Vb 1
\&    $ext\->terms_extract( $text, { max => 10 } );
.Ve
.ie n .IP "exclude( $array_ref )" 4
.el .IP "exclude( \f(CW$array_ref\fR )" 4
.IX Item "exclude( $array_ref )"
Add a list of words to exclude. The words listed in the array passed
in as a reference will never be used as keywords.
.Sp
.Vb 1
\&    $ext\->exclude( [\*(Aqmoe\*(Aq, \*(Aqjoe\*(Aq] );
.Ve
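.Sp
A short sketch combining both methods, using only the calls documented on
this page. It assumes that exclusions are registered before
\f(CW\*(C`terms_extract\*(C'\fR is called and that excluded words are given in
lowercase, as the lowercase output in the synopsis suggests; neither detail
is confirmed here:
.Sp
.Vb 1
\&    use Text::TermExtract;
\&
\&    my $ext = Text::TermExtract\->new();
\&
\&    # Keep the restaurant name out of the results (assumed lowercase match).
\&    $ext\->exclude( [\*(Aqwendy\*(Aq] );
\&
\&    my $text = "Wanna go to Wendy\*(Aqs tonight? Wendy\*(Aqs has great sandwiches.";
\&
\&    for my $word ( $ext\->terms_extract( $text, { max => 2 } ) ) {
\&        print "$word\en";
\&    }
.Ve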
.SH "LEGALESE"
.IX Header "LEGALESE"
Copyright 2008 by Mike Schilli, all rights reserved.
This program is free software, you can redistribute it and/or
modify it under the same terms as Perl itself.
.SH "AUTHOR"
.IX Header "AUTHOR"
2008, Mike Schilli <cpan@perlmeister.com>