org.apache.solr.analysis
Class BufferedTokenStream
java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.TokenFilter
        org.apache.solr.analysis.BufferedTokenStream
- Direct Known Subclasses:
- CommonGramsFilter, CommonGramsQueryFilter, RemoveDuplicatesTokenFilter
public abstract class BufferedTokenStream
- extends TokenFilter
Handles input and output buffering of a TokenStream, so that subclasses can look ahead in the input and emit extra tokens.
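// Each example below is a separate source file and needs these imports:
import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.solr.analysis.BufferedTokenStream;

// Example of a class implementing the rule "A" "B" => "Q" "B"
class MyTokenStream extends BufferedTokenStream {
  public MyTokenStream(TokenStream input) {super(input);}
  protected Token process(Token t) throws IOException {
    if ("A".equals(t.termText())) {
      Token t2 = read();                  // consume the following token, if any
      if (t2 != null && "B".equals(t2.termText())) t.setTermText("Q");
      if (t2 != null) pushBack(t2);       // return it to the input so it is processed next
    }
    return t;
  }
}

// Example of a class implementing "A" "B" => "A" "A" "B"
class MyTokenStream extends BufferedTokenStream {
  public MyTokenStream(TokenStream input) {super(input);}
  protected Token process(Token t) throws IOException {
    Token next = peek(1);                 // may be null at end of stream
    if ("A".equals(t.termText()) && next != null && "B".equals(next.termText()))
      write((Token) t.clone());           // the extra "A" must be a clone (see NOTE)
    return t;
  }
}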
NOTE: BufferedTokenStream does not clone() any Tokens; cloning is the
responsibility of the implementing subclass. In the "A" "B" => "A" "A" "B"
example above, the subclass must clone the additional "A" it creates.
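To illustrate why the NOTE matters, here is a minimal, self-contained sketch that does not use Lucene. It assumes a hypothetical mutable token class, CloneDemoToken, whose copy() plays the role of Token.clone(), and a hypothetical CloneNoteDemo that puts "A" into an output list twice. If the same instance is written twice, a later stage that edits one entry silently edits both.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable token standing in for org.apache.lucene.analysis.Token.
final class CloneDemoToken {
    final StringBuilder text;
    CloneDemoToken(String s) { text = new StringBuilder(s); }
    // Plays the role of Token.clone(): an independent copy of the term text.
    CloneDemoToken copy() { return new CloneDemoToken(text.toString()); }
    @Override public String toString() { return text.toString(); }
}

class CloneNoteDemo {
    // Emits "A" twice, either as one shared instance or as a copy, then lets a
    // later stage rename the first emitted token to "Z".
    static String emitTwiceThenRename(boolean cloneSecond) {
        CloneDemoToken a = new CloneDemoToken("A");
        List<CloneDemoToken> out = new ArrayList<>();
        out.add(a);
        out.add(cloneSecond ? a.copy() : a);
        StringBuilder first = out.get(0).text;
        first.replace(0, first.length(), "Z");
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(emitTwiceThenRename(false)); // [Z, Z]  both entries are the same object
        System.out.println(emitTwiceThenRename(true));  // [Z, A]
    }
}
```

With a shared instance both positions change; with a copy only the first does, which is the behavior the "A" "B" => "A" "A" "B" example needs.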
Method Summary
| Modifier and Type | Method and Description |
|---|---|
| Token | next() |
| protected Iterable<Token> | output(): Provides direct Iterator access to the buffered output stream. |
| protected Token | peek(int n): Peek n tokens ahead in the buffered input stream, without modifying the stream. |
| protected abstract Token | process(Token t): Process a token. |
| protected void | pushBack(Token t): Push a token back into the buffered input stream, such that it will be returned by a future call to read(). |
| protected Token | read(): Read a token from the buffered input stream. |
| void | reset() |
| protected void | write(Token t): Write a token to the buffered output stream. |
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
Constructor Detail
BufferedTokenStream
public BufferedTokenStream(TokenStream input)
Method Detail
process
protected abstract Token process(Token t)
                          throws IOException
- Process a token. Subclasses may read more tokens from the input stream,
write more tokens to the output stream, or simply return the next token
to be output. Subclasses may return null to drop the token.
If a subclass writes tokens to the output stream and also returns a
non-null Token, the returned Token is placed at the head of the output
stream, ahead of the written tokens.
- Throws:
IOException
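The ordering rules of process() can be sketched with a small, self-contained model that runs without Lucene. This is not Solr's implementation: ProcessModel, ProcessDemo and drain() are hypothetical names, and Strings stand in for Tokens (they are immutable, so no clone is needed here). The model's next() emits buffered output first, drops a token when process() returns null, and emits a returned token ahead of anything process() wrote.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical model of the process()/next() contract; not Solr's source.
abstract class ProcessModel {
    private final Iterator<String> input;
    private final LinkedList<String> outQueue = new LinkedList<>();

    ProcessModel(Iterator<String> input) { this.input = input; }

    protected abstract String process(String t);

    protected void write(String t) { outQueue.addLast(t); }

    // Drain buffered output first; otherwise read and process one input token.
    // A non-null result is emitted before anything process() wrote.
    final String next() {
        while (true) {
            if (!outQueue.isEmpty()) return outQueue.removeFirst();
            if (!input.hasNext()) return null;           // end of stream
            String out = process(input.next());
            if (out != null) return out;
            // null: token dropped; loop to emit anything process() wrote
        }
    }

    final List<String> drain() {
        List<String> all = new ArrayList<>();
        for (String t = next(); t != null; t = next()) all.add(t);
        return all;
    }
}

// Hypothetical rules: drop "STOP", split "x-y" into "x" "y", follow each "A" with another "A".
class ProcessDemo extends ProcessModel {
    ProcessDemo(Iterator<String> input) { super(input); }

    @Override protected String process(String t) {
        if (t.equals("STOP")) return null;               // drop the token
        if (t.contains("-")) {
            for (String part : t.split("-")) write(part);
            return null;                                 // only the written parts are emitted
        }
        if (t.equals("A")) write("A");                   // emitted after the returned "A"
        return t;
    }

    public static void main(String[] args) {
        ProcessDemo s = new ProcessDemo(Arrays.asList("A", "STOP", "wi-fi", "B").iterator());
        System.out.println(s.drain());                   // [A, A, wi, fi, B]
    }
}
```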
next
public final Token next()
throws IOException
- Overrides:
next
in class TokenStream
- Throws:
IOException
read
protected Token read()
throws IOException
- Read a token from the buffered input stream.
- Returns:
- null at end of stream (EOS)
- Throws:
IOException
pushBack
protected void pushBack(Token t)
- Push a token back into the buffered input stream, such that it will
be returned by a future call to read().
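The read()/pushBack() pair gives one-token lookahead: consume the next token, inspect it, then push it back so it is processed normally. The sketch below models this without Lucene, using the class description's "A" "B" => "Q" "B" rule on Strings. PushBackModel, QRule and drain() are hypothetical names, and the model omits write() for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical model of read()/pushBack(); not Solr's source.
abstract class PushBackModel {
    private final Iterator<String> input;
    private final LinkedList<String> inQueue = new LinkedList<>();

    PushBackModel(Iterator<String> input) { this.input = input; }

    protected abstract String process(String t);

    // Pushed-back tokens are returned before fresh input; null at end of stream.
    protected String read() {
        if (!inQueue.isEmpty()) return inQueue.removeFirst();
        return input.hasNext() ? input.next() : null;
    }

    protected void pushBack(String t) { inQueue.addFirst(t); }

    final List<String> drain() {
        List<String> all = new ArrayList<>();
        for (String t = read(); t != null; t = read()) {
            String out = process(t);
            if (out != null) all.add(out);
        }
        return all;
    }
}

// The "A" "B" => "Q" "B" rule, expressed on Strings.
class QRule extends PushBackModel {
    QRule(Iterator<String> input) { super(input); }

    @Override protected String process(String t) {
        if (!t.equals("A")) return t;
        String t2 = read();              // look ahead by consuming one token
        if (t2 == null) return t;        // "A" was the last token
        pushBack(t2);                    // give it back so it is processed next
        return t2.equals("B") ? "Q" : t;
    }

    public static void main(String[] args) {
        QRule s = new QRule(Arrays.asList("A", "B", "A", "C", "A").iterator());
        System.out.println(s.drain());   // [Q, B, A, C, A]
    }
}
```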
peek
protected Token peek(int n)
throws IOException
- Peek n tokens ahead in the buffered input stream, without modifying
the stream.
- Parameters:
n - the number of tokens ahead to peek; 1-based, so 0 is invalid
- Returns:
- the Token n positions ahead in the input stream. This is the same
instance that read() will later return, so any modifications made to it
take effect if and when it is read from the stream.
- Throws:
IOException
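The two documented properties of peek(), 1-based indexing and a returned Token that is the very instance read() later yields, can be checked with a small Lucene-free model. PeekToken, PeekModel and PeekDemo are hypothetical; throwing IllegalArgumentException for n < 1 and returning null when fewer than n tokens remain are choices of this sketch, since the documentation only says that 0 is invalid.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;

// Hypothetical mutable token standing in for a Lucene Token.
final class PeekToken {
    final StringBuilder text;
    PeekToken(String s) { text = new StringBuilder(s); }
    @Override public String toString() { return text.toString(); }
}

// Hypothetical model of read()/peek(); not Solr's source.
class PeekModel {
    private final Iterator<PeekToken> input;
    private final LinkedList<PeekToken> inQueue = new LinkedList<>();

    PeekModel(Iterator<PeekToken> input) { this.input = input; }

    PeekToken read() {
        if (!inQueue.isEmpty()) return inQueue.removeFirst();
        return input.hasNext() ? input.next() : null;
    }

    // 1-based lookahead: buffers up to n tokens without consuming any.
    PeekToken peek(int n) {
        if (n < 1) throw new IllegalArgumentException("n is 1-based; 0 is invalid");
        while (inQueue.size() < n) {
            if (!input.hasNext()) return null;   // fewer than n tokens remain (model's choice)
            inQueue.addLast(input.next());
        }
        return inQueue.get(n - 1);               // same instance read() returns later
    }
}

class PeekDemo {
    static String run() {
        PeekModel s = new PeekModel(Arrays.asList(
                new PeekToken("A"), new PeekToken("B"), new PeekToken("C")).iterator());
        s.peek(2).text.append('!');              // edit "B" while it is still buffered
        StringBuilder out = new StringBuilder();
        for (PeekToken t = s.read(); t != null; t = s.read()) out.append(t).append(' ');
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run());               // A B! C
    }
}
```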
write
protected void write(Token t)
- Write a token to the buffered output stream.
output
protected Iterable<Token> output()
- Provides direct Iterator access to the buffered output stream.
Modifying any token returned by this Iterator affects the resulting stream.
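Because output() exposes the live buffer rather than a copy, a subclass can both scan pending tokens and edit them in place. The Lucene-free sketch below assumes a hypothetical filter, SplitDedup, that splits "x/y/x" into parts and, instead of writing a repeated part again, bumps a count on the token already buffered. CountedToken, OutputModel and drain() are also hypothetical. In this model the output buffer is emptied before the next input token is processed, so output() only holds tokens written during the current process() call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical mutable token: a term plus a repeat count.
final class CountedToken {
    final String term;
    int count = 1;
    CountedToken(String term) { this.term = term; }
    @Override public String toString() { return count == 1 ? term : term + "*" + count; }
}

// Hypothetical model of write()/output(); not Solr's source.
abstract class OutputModel {
    private final Iterator<CountedToken> input;
    private final LinkedList<CountedToken> outQueue = new LinkedList<>();

    OutputModel(Iterator<CountedToken> input) { this.input = input; }

    protected abstract CountedToken process(CountedToken t);

    protected void write(CountedToken t) { outQueue.addLast(t); }

    // A live view of the pending output, not a copy.
    protected Iterable<CountedToken> output() { return outQueue; }

    final List<String> drain() {
        List<String> all = new ArrayList<>();
        while (true) {
            if (!outQueue.isEmpty()) { all.add(outQueue.removeFirst().toString()); continue; }
            if (!input.hasNext()) return all;
            CountedToken out = process(input.next());
            if (out != null) all.add(out.toString());
        }
    }
}

class SplitDedup extends OutputModel {
    SplitDedup(Iterator<CountedToken> input) { super(input); }

    @Override protected CountedToken process(CountedToken t) {
        for (String part : t.term.split("/")) {
            CountedToken match = null;
            for (CountedToken pending : output()) {          // scan the buffered output
                if (pending.term.equals(part)) { match = pending; break; }
            }
            if (match != null) match.count++;                // in-place edit shows up in the stream
            else write(new CountedToken(part));
        }
        return null;                                         // emit only what was written
    }

    public static void main(String[] args) {
        SplitDedup s = new SplitDedup(Arrays.asList(
                new CountedToken("a/b/a"), new CountedToken("c")).iterator());
        System.out.println(s.drain());                       // [a*2, b, c]
    }
}
```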
reset
public void reset()
throws IOException
- Overrides:
reset
in class TokenFilter
- Throws:
IOException
Copyright © 2010 Apache Software Foundation. All Rights Reserved.