java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
    extended by org.apache.lucene.analysis.TokenStream
      extended by org.apache.lucene.analysis.TokenFilter
        extended by org.apache.solr.analysis.BufferedTokenStream
          extended by org.apache.solr.analysis.CommonGramsFilter
public class CommonGramsFilter
extends BufferedTokenStream
Construct bigrams for frequently occurring terms while indexing. Single terms
are still indexed too, with bigrams overlaid on them. This is achieved through
the use of Token.setPositionIncrement(int): each bigram is given a position
increment of 0, so it occupies the same position as its first term. Bigrams
have a token type of "gram".
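"Overlaid" comes down to position arithmetic: a token with increment 0 shares the position of the token before it. The sketch below is a minimal, Lucene-free model of that arithmetic; PositionSketch is a hypothetical name, and it assumes a running position that starts at -1 so the first token (increment 1) lands at position 0.

```java
// Hypothetical helper: turns per-token position increments into absolute positions.
final class PositionSketch {
    static int[] positions(int[] increments) {
        int[] result = new int[increments.length];
        // Start before the first slot so an increment of 1 yields position 0.
        int position = -1;
        for (int i = 0; i < increments.length; i++) {
            // An increment of 0 leaves the position unchanged, stacking this
            // token on top of the previous one.
            position += increments[i];
            result[i] = position;
        }
        return result;
    }
}
```

For a stream such as "the", "the_quick", "quick", "brown", "fox" with increments {1, 0, 1, 1, 1} (the underscore-joined bigram text is only illustrative), this yields positions {0, 0, 1, 2, 3}: the bigram sits at the same position as "the", so phrase positions of the unigrams are undisturbed.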
Nested Class Summary |
---|
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource |
---|
AttributeSource.AttributeFactory, AttributeSource.State |
Field Summary |
---|
Fields inherited from class org.apache.lucene.analysis.TokenFilter |
---|
input |
Constructor Summary | |
---|---|
CommonGramsFilter(TokenStream input, Set commonWords) | Construct a token stream filtering the given input using a Set of common words to create bigrams. |
CommonGramsFilter(TokenStream input, Set commonWords, boolean ignoreCase) | Construct a token stream filtering the given input using a Set of common words to create bigrams; matching is case-sensitive if ignoreCase is false (unless commonWords is a CharArraySet). |
CommonGramsFilter(TokenStream input, String[] commonWords) | Construct a token stream filtering the given input using an array of common words to create bigrams. |
CommonGramsFilter(TokenStream input, String[] commonWords, boolean ignoreCase) | Construct a token stream filtering the given input using an array of common words to create bigrams; matching is case-sensitive if ignoreCase is false. |
Method Summary | |
---|---|
void | init() |
static CharArraySet | makeCommonSet(String[] commonWords): Build a CharArraySet from an array of common words, appropriate for passing into the CommonGramsFilter constructor. |
static CharArraySet | makeCommonSet(String[] commonWords, boolean ignoreCase): Build a CharArraySet from an array of common words, appropriate for passing into the CommonGramsFilter constructor; case-sensitive if ignoreCase is false. |
Token | process(Token token): Inserts bigrams for common words into a token stream. |
void | reset() |
Methods inherited from class org.apache.solr.analysis.BufferedTokenStream |
---|
next, output, peek, pushBack, read, write |
Methods inherited from class org.apache.lucene.analysis.TokenFilter |
---|
close, end |
Methods inherited from class org.apache.lucene.analysis.TokenStream |
---|
getOnlyUseNewAPI, incrementToken, next, setOnlyUseNewAPI |
Methods inherited from class org.apache.lucene.util.AttributeSource |
---|
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString |
Methods inherited from class java.lang.Object |
---|
clone, finalize, getClass, notify, notifyAll, wait, wait, wait |
Constructor Detail |
---|
public CommonGramsFilter(TokenStream input, Set commonWords)
Parameters:
input - TokenStream input in filter chain
commonWords - The set of common words.

public CommonGramsFilter(TokenStream input, Set commonWords, boolean ignoreCase)
If commonWords is an instance of CharArraySet (true if makeCommonSet() was
used to construct the set), it will be used directly and ignoreCase will be
ignored, since CharArraySet directly controls case sensitivity.
If commonWords is not an instance of CharArraySet, a new CharArraySet will be
constructed and ignoreCase will be used to specify the case sensitivity of
that set.
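The reuse-or-copy rule above can be modelled without Lucene. In this sketch WordSet and CommonWordsRule are hypothetical names: WordSet only stands in for CharArraySet (a set that carries its own case policy and applies it on both insertion and lookup) and is not its real implementation; lower-casing with Locale.ROOT is also an assumption of the sketch.

```java
import java.util.AbstractSet;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for CharArraySet: a word set with its own case policy.
final class WordSet extends AbstractSet<String> {
    private final Set<String> words = new HashSet<>();
    private final boolean ignoreCase;

    WordSet(Collection<?> source, boolean ignoreCase) {
        this.ignoreCase = ignoreCase;
        for (Object w : source) {
            words.add(normalize(w.toString()));
        }
    }

    // The same normalization is applied when storing and when looking up.
    private String normalize(String w) {
        return ignoreCase ? w.toLowerCase(Locale.ROOT) : w;
    }

    @Override
    public boolean contains(Object o) {
        return o != null && words.contains(normalize(o.toString()));
    }

    @Override
    public Iterator<String> iterator() {
        return words.iterator();
    }

    @Override
    public int size() {
        return words.size();
    }
}

// Hypothetical helper mirroring the documented constructor rule.
final class CommonWordsRule {
    static WordSet resolve(Set<?> commonWords, boolean ignoreCase) {
        if (commonWords instanceof WordSet) {
            // Already prepared: reuse it as-is; its own case policy wins and
            // the ignoreCase argument is ignored.
            return (WordSet) commonWords;
        }
        // Any other Set is copied into a new WordSet that honors ignoreCase.
        return new WordSet(commonWords, ignoreCase);
    }
}
```

The design point is that case sensitivity belongs to the set, not to the filter: once a prepared set exists, the filter cannot override its policy, which is why ignoreCase only matters for plain Sets.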
Parameters:
input - TokenStream input in filter chain.
commonWords - The set of common words.
ignoreCase - Ignore case when constructing bigrams for common words.

public CommonGramsFilter(TokenStream input, String[] commonWords)
Parameters:
input - TokenStream in filter chain
commonWords - words to be used in constructing bigrams

public CommonGramsFilter(TokenStream input, String[] commonWords, boolean ignoreCase)
Parameters:
input - TokenStream in filter chain
commonWords - words to be used in constructing bigrams
ignoreCase - Ignore case when constructing bigrams for common words.

Method Detail |
---|
public void init()
public static final CharArraySet makeCommonSet(String[] commonWords)
Equivalent to makeCommonSet(String[], boolean) with false passed as ignoreCase.
public static final CharArraySet makeCommonSet(String[] commonWords, boolean ignoreCase)
Parameters:
commonWords - the array of common words
ignoreCase - If true, all words are lower-cased first.
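The two makeCommonSet overloads follow a common pattern: a convenience form that delegates with a default, and a full form that normalizes case before storing. The sketch below assumes nothing about CharArraySet internals; CommonSetFactory is a hypothetical name, and it returns a plain java.util.Set, so unlike a case-insensitive CharArraySet its lookups do not fold case and a caller must lower-case terms itself when ignoreCase was true.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical factory mirroring the two documented makeCommonSet overloads.
final class CommonSetFactory {
    // Single-argument form: same as passing false for ignoreCase.
    static Set<String> makeCommonSet(String[] commonWords) {
        return makeCommonSet(commonWords, false);
    }

    // If ignoreCase is true, every word is lower-cased before it is stored.
    static Set<String> makeCommonSet(String[] commonWords, boolean ignoreCase) {
        Set<String> set = new HashSet<>();
        for (String word : commonWords) {
            set.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
        }
        // Read-only, so a single instance can be built once and shared.
        return Collections.unmodifiableSet(set);
    }
}
```

Building the set once and passing the same instance to every filter avoids rebuilding it each time a token stream is created.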
public Token process(Token token) throws IOException
Specified by:
process in class BufferedTokenStream
Throws:
IOException
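process(Token) works on top of the look-ahead helpers (peek, write) that BufferedTokenStream provides, as listed among the inherited methods. The following Lucene-free model is a sketch under stated assumptions: a bigram is queued whenever either of two adjacent terms is common, the unigram is emitted before its bigram, and "_" joins the two words. LookAheadGrams and its methods are hypothetical analogues, not the real classes, and for brevity it tracks only term text, not position increments or token types.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical model of a buffered look-ahead filter that inserts common-word bigrams.
final class LookAheadGrams {
    private final Iterator<String> input;
    private final Set<String> commonWords;
    private final Deque<String> lookahead = new ArrayDeque<>(); // peeked, not yet consumed
    private final Deque<String> pending = new ArrayDeque<>();   // written, awaiting output

    LookAheadGrams(Iterator<String> input, Set<String> commonWords) {
        this.input = input;
        this.commonWords = commonWords;
    }

    // Consume the next token, preferring anything already peeked.
    private String read() {
        if (!lookahead.isEmpty()) {
            return lookahead.pollFirst();
        }
        return input.hasNext() ? input.next() : null;
    }

    // Look at the next token without consuming it (null at end of input).
    private String peek() {
        if (lookahead.isEmpty() && input.hasNext()) {
            lookahead.addLast(input.next());
        }
        return lookahead.peekFirst();
    }

    // Analogue of process(token): queue a bigram if needed, return the unigram.
    private String process(String token) {
        String next = peek();
        if (next != null && (commonWords.contains(token) || commonWords.contains(next))) {
            pending.addLast(token + "_" + next);
        }
        return token;
    }

    // Drain queued output first, otherwise process the next input token.
    String next() {
        if (!pending.isEmpty()) {
            return pending.pollFirst();
        }
        String token = read();
        return token == null ? null : process(token);
    }

    static List<String> run(List<String> terms, Set<String> commonWords) {
        LookAheadGrams filter = new LookAheadGrams(terms.iterator(), commonWords);
        List<String> out = new ArrayList<>();
        for (String t = filter.next(); t != null; t = filter.next()) {
            out.add(t);
        }
        return out;
    }
}
```

With "the" as the only common word, "the quick brown fox" comes out as the, the_quick, quick, brown, fox. The last token never needs a bigram of its own, because any pair ending in it was already formed when the previous token was processed.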
public void reset() throws IOException
Overrides:
reset in class BufferedTokenStream
Throws:
IOException