Ferret::Analysis::StandardTokenizer

Summary

StandardTokenizer is an advanced tokenizer that handles most words correctly and also recognizes email addresses, web addresses, phone numbers and similar items as single tokens.

Example

"Dave's résumé, at http://www.davebalmain.com/ 1234"
  => ["Dave's", "résumé", "at", "http://www.davebalmain.com", "1234"]
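To make the example concrete, here is a rough, regex-based approximation in plain Ruby. It is not Ferret's implementation: the real tokenizer is written in C with its own grammar. The pattern constants and the `tokenize` helper are hypothetical. The sketch only handles URLs (dropping a trailing "/", as in the example), email addresses, words with internal apostrophes, and digit groups such as phone numbers. Its downcasing uses Ruby's Unicode-aware `String#downcase`, not the C locale.

```ruby
# Hypothetical approximation of StandardTokenizer's behaviour; not Ferret's grammar.

# scheme://host[/path]; a trailing "/" is left out of the token.
URL_RE    = %r{[a-z][a-z0-9+.-]*://[\w-]+(?:\.[\w-]+)*(?:/[\w./%?=&~+-]*[\w%=&~+-])?}i
# local-part@domain.tld
EMAIL_RE  = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/
# Letters (including accented ones), optionally joined by apostrophes: "Dave's".
WORD_RE   = /\p{L}+(?:'\p{L}+)*/
# Digit groups joined by ".", "," or "-": "1234", "555-1234".
NUMBER_RE = /\d+(?:[.,-]\d+)*/

# At each position the alternatives are tried in this order, so URLs and
# email addresses win over plain words.
TOKEN_RE = Regexp.union(URL_RE, EMAIL_RE, WORD_RE, NUMBER_RE)

def tokenize(text, lower: true)
  tokens = text.scan(TOKEN_RE) # characters matching no pattern are skipped
  # Ruby's downcase is Unicode-aware but ignores the process locale.
  lower ? tokens.map(&:downcase) : tokens
end

p tokenize("Dave's résumé, at http://www.davebalmain.com/ 1234", lower: false)
p tokenize("Mail dave@example.com or call 555-1234")
```

Called with `lower: false`, the first call yields the same token list as the example above. Trying the URL and email patterns before the word pattern is what keeps "http://www.davebalmain.com" from being split into "http", "www" and so on.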

Public Class Methods

new(lower = true) → tokenizer

Creates a new StandardTokenizer that optionally downcases tokens. Downcasing is done according to the current locale.

lower

Set to false if you don't want tokens to be downcased.
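Downcasing depends on the locale because some languages lowercase letters differently. Ferret does this in C using the LC_CTYPE locale. The Ruby sketch below is only an analogy: Ruby's `String#downcase` (Ruby 2.4 or later) ignores the process locale but accepts an explicit `:turkic` option. The `downcase_tokens` helper and its `mode:` keyword are hypothetical.

```ruby
# Hypothetical helper illustrating why downcasing is locale-sensitive.
# Ruby's String#downcase ignores the process locale, but an explicit
# :turkic option changes how "I" is lowercased.
def downcase_tokens(tokens, lower: true, mode: nil)
  return tokens unless lower                   # lower = false: keep tokens as-is
  return tokens.map(&:downcase) if mode.nil?   # default Unicode case mapping
  tokens.map { |t| t.downcase(mode) }          # e.g. mode = :turkic
end

p downcase_tokens(["TITLE", "Résumé"])
p downcase_tokens(["TITLE"], mode: :turkic)
p downcase_tokens(["TITLE"], lower: false)
```

Under Turkic rules a capital "I" lowercases to a dotless "ı", so the same token produces different index terms depending on the locale.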

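static VALUE
frb_standard_tokenizer_init(VALUE self, VALUE rstr)
{
#ifndef POSH_OS_WIN32
    /* Initialise the character-type locale from the environment once, so
     * that multibyte decoding and downcasing follow the user's locale
     * (skipped on Windows). */
    if (!frb_locale) frb_locale = setlocale(LC_CTYPE, "");
#endif
    /* Wrap a new multibyte-aware standard tokenizer in the Ruby object,
     * with rstr as its input. */
    return get_wrapped_ts(self, rstr, mb_standard_tokenizer_new());
}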

Generated with the Darkfish Rdoc Generator 2.