class Twitter::Regex

A collection of regular expressions for parsing Tweet text. The regular expression list is frozen at load time to ensure immutability. These regular expressions are used throughout the Twitter classes. Special care has been taken to ensure these regular expressions work with Tweets in all languages.

Constants

CTRL_CHARS
DOMAIN_VALID_CHARS
HASHTAG
HASHTAG_ALPHA

A hashtag must contain at least one Unicode letter or mark; it may also contain numbers, underscores, and select special characters.
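The "at least one letter or mark" rule can be illustrated with a small sketch. The pattern below is a simplified stand-in written for this example, not the library's actual HASHTAG_ALPHA or HASHTAG definition: the constant name is hypothetical, and the "select special characters" are left out because this page does not list them.

```ruby
# Hypothetical, simplified pattern (not the library's real HASHTAG regex).
# It accepts "#" followed by letters, marks, decimal digits, or underscores,
# and requires at least one letter (\p{L}) or mark (\p{M}) somewhere in the tag.
HASHTAG_SKETCH = /\A#[\p{L}\p{M}\p{Nd}_]*[\p{L}\p{M}][\p{L}\p{M}\p{Nd}_]*\z/

HASHTAG_SKETCH.match?("#ruby")          # => true
HASHTAG_SKETCH.match?("#2024")          # => false (digits only)
HASHTAG_SKETCH.match?("#2024\u5E74")    # => true  (U+5E74 is a letter)
HASHTAG_SKETCH.match?("#_")             # => false (no letter or mark)
```

Using Unicode property classes rather than a-z is what lets a pattern like this accept tags written in non-Latin scripts.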

HASHTAG_ALPHANUMERIC
HASHTAG_BOUNDARY
INVALID_CHARACTERS

Characters not allowed in Tweets.

LATIN_ACCENTS

Latin accented characters. Excludes 0xd7 (the multiplication sign, confusable with “x”) and 0xf7 (the division sign) from the range.
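As a rough illustration of the exclusion described above, the sketch below builds a character class over only the U+00C0 to U+00FF block, skipping the two signs. The constant name is hypothetical and the range is an assumption made for this example; the library's LATIN_ACCENTS may cover more code points than shown here.

```ruby
# Hypothetical sketch: accented letters in U+00C0..U+00FF, with
# U+00D7 (multiplication sign) and U+00F7 (division sign) left out.
LATIN1_ACCENTS_SKETCH = /[\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]/

LATIN1_ACCENTS_SKETCH.match?("\u00E9")  # => true  (e with acute accent)
LATIN1_ACCENTS_SKETCH.match?("\u00D7")  # => false (multiplication sign)
LATIN1_ACCENTS_SKETCH.match?("\u00F7")  # => false (division sign)
LATIN1_ACCENTS_SKETCH.match?("x")       # => false (plain ASCII letter)
```

Splitting the range into three pieces is the simplest way to drop single code points from the middle of a block.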

PUNCTUATION_CHARS
RTL_CHARACTERS
SPACE_CHARS
TLDS
UNICODE_SPACES

Whitespace is more than %20: U+3000, for example, is the full-width space used with Kanji. This constant provides a shorthand for accessing both the list of characters and a pattern suitable for use with String#split.

Taken from: ActiveSupport::Multibyte::Handlers::UTF8Handler::UNICODE_WHITESPACE
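The idea of keeping both a character list and a split pattern can be sketched as follows. The constant names and the three code points are assumptions chosen for illustration; the real UNICODE_SPACES list is longer and is not reproduced on this page.

```ruby
# Hypothetical subset of whitespace code points, for illustration only.
SPACES_SKETCH = [
  0x0020, # ordinary space
  0x00A0, # no-break space
  0x3000  # ideographic (full-width) space
].map { |codepoint| [codepoint].pack('U') }.freeze

# A character class built from the list, usable with String#split.
SPACE_PATTERN_SKETCH = /[#{SPACES_SKETCH.join}]/

"a\u3000b c".split(SPACE_PATTERN_SKETCH)  # => ["a", "b", "c"]
```

Deriving the pattern from the list keeps the two in sync: adding a code point to the array automatically extends what the pattern splits on.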

Public Class Methods

[](key)

Returns the regular expression for a given key. If the key is not a known symbol, nil is returned.

# File lib/twitter-text/regex.rb, line 328
def self.[](key)
  REGEXEN[key]
end
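The lookup behaviour can be tried without the gem using a stand-in module. This is a minimal sketch: the module name, keys, and patterns below are hypothetical and are not the library's actual table. Only the frozen-hash lookup and the nil result for unknown keys follow the description above.

```ruby
# Hypothetical stand-in for the library's lookup table.
module RegexTableSketch
  # Freezing the hash means entries cannot be added or replaced later.
  REGEXEN = {
    digits: /\d+/,
    word: /\w+/
  }.freeze

  # Same shape as the method documented above: a plain hash lookup,
  # which yields nil for any key that is not present.
  def self.[](key)
    REGEXEN[key]
  end
end

RegexTableSketch[:digits]             # => /\d+/
RegexTableSketch[:missing]            # => nil
"abc 123"[RegexTableSketch[:digits]]  # => "123"
```

Because an unknown key returns nil rather than raising, callers that pass the result straight to a matching method should check for nil first.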