java.lang.Object
  org.apache.solr.analysis.BaseTokenizerFactory
    org.apache.solr.analysis.PatternTokenizerFactory
public class PatternTokenizerFactory
This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".
group=-1 (the default) is equivalent to "split". In this case, the tokens will be equivalent to the output of String.split(java.lang.String), with empty tokens removed.
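The split behaviour described above can be approximated with plain java.util.regex. This is only a sketch of the semantics, not the factory's actual implementation: the class name SplitSketch and the helper splitTokens are hypothetical, and the sample pattern and input are chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SplitSketch {
    // Hypothetical helper: mimics group=-1 by splitting on the pattern
    // and dropping any zero-length pieces.
    static List<String> splitTokens(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(regex).split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Leading and doubled commas would produce empty strings; they are skipped.
        System.out.println(splitTokens(",", ",a,,b,c,")); // prints [a, b, c]
    }
}
```

Note that Pattern.split already discards trailing empty strings; the explicit isEmpty check also removes leading and interior ones.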
Using group >= 0 selects the matching group as the token. For example, if you have:
    pattern = \'([^\']+)\'
    group = 0
    input = aaa 'bbb' 'ccc'

the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
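The group selection in this example can be reproduced with java.util.regex.Matcher. The following is a minimal sketch under the assumption that each match contributes the text of the chosen group as one token. GroupSketch and groupTokens are hypothetical names and do not belong to the Solr API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSketch {
    // Hypothetical helper: for each match, emit the requested group as a token.
    // Zero-length or unmatched groups are skipped.
    static List<String> groupTokens(String regex, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            String token = m.group(group);
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same pattern as in the text; backslashes are doubled for the Java string literal.
        String regex = "\\'([^\\']+)\\'";
        String input = "aaa 'bbb' 'ccc'";
        System.out.println(groupTokens(regex, 0, input)); // prints ['bbb', 'ccc']
        System.out.println(groupTokens(regex, 1, input)); // prints [bbb, ccc]
    }
}
```

Group 0 is the whole match, so the quote marks are kept; group 1 is the parenthesised part only.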
NOTE: This Tokenizer does not output tokens that are of zero length.
See Also: PatternTokenizer
Field Summary | |
---|---|
protected int | group |
static String | GROUP |
protected Pattern | pattern |
static String | PATTERN |
Fields inherited from class org.apache.solr.analysis.BaseTokenizerFactory |
---|
args, log |
Constructor Summary | |
---|---|
PatternTokenizerFactory() |
Method Summary | |
---|---|
Tokenizer | create(Reader in): Split the input using the configured pattern. |
static List<Token> | group(Matcher matcher, String input, int group): Deprecated. |
void | init(Map<String,String> args): Require a configured pattern. |
static List<Token> | split(Matcher matcher, String input): Deprecated. |
Methods inherited from class org.apache.solr.analysis.BaseTokenizerFactory |
---|
getArgs |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String PATTERN
public static final String GROUP
protected Pattern pattern
protected int group
Constructor Detail |
---|
public PatternTokenizerFactory()
Method Detail |
---|
public void init(Map<String,String> args)
Specified by: init in interface TokenizerFactory
Overrides: init in class BaseTokenizerFactory
public Tokenizer create(Reader in)
@Deprecated public static List<Token> split(Matcher matcher, String input)
@Deprecated public static List<Token> group(Matcher matcher, String input, int group)