module Babosa::UTF8::Proxy

A UTF-8 proxy for Babosa can be any object that responds to the methods in this module. Babosa provides the following proxies: ActiveSupportProxy, DumbProxy, JavaProxy, and UnicodeProxy.
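
As a rough illustration of what "responds to the methods in this module" can mean, here is a minimal sketch of a custom proxy built only on Ruby's core String methods. The module name `CoreStringProxy` is hypothetical and not part of Babosa, and its `tidy_bytes` is a simplification that substitutes U+FFFD instead of guessing a legacy encoding.

```ruby
# Hypothetical proxy, not shipped with Babosa. It relies on Ruby's core
# String methods, which handle Unicode case mapping in Ruby 2.4 and later.
module CoreStringProxy
  extend self

  # Unicode-aware downcasing via String#downcase.
  def downcase(string)
    string.downcase
  end

  # Unicode-aware upcasing via String#upcase.
  def upcase(string)
    string.upcase
  end

  # NFC normalization via String#unicode_normalize.
  def normalize_utf8(string)
    string.unicode_normalize(:nfc)
  end

  # Simplification: replace each invalid byte sequence with U+FFFD
  # rather than reinterpreting it as CP-1252 or ISO-8859-1.
  def tidy_bytes(string)
    string.scrub
  end
end

puts CoreStringProxy.upcase("\u00E9cole")              # ÉCOLE
puts CoreStringProxy.normalize_utf8("e\u0301").length  # 1
```

Any object exposing these four methods would have the same shape. Whether a given Babosa version accepts such an object as its proxy should be checked against the library itself.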

Constants

CP1252

Public Instance Methods

downcase(string)

This is a stub for a method that should return a Unicode-aware downcased version of the given string.

# File lib/babosa/utf8/proxy.rb, line 49
def downcase(string)
  raise NotImplementedError
end

normalize_utf8(string)

This is a stub for a method that should return the Unicode NFC normalization of the given string.

# File lib/babosa/utf8/proxy.rb, line 61
def normalize_utf8(string)
  raise NotImplementedError
end
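
To make the NFC requirement concrete, the sketch below uses Ruby's built-in `String#unicode_normalize` (an assumption about how an implementation might satisfy the stub, not a statement of what any Babosa proxy does) to show a decomposed sequence collapsing into a single precomposed code point.

```ruby
# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points
# that render as a single character.
decomposed = "e\u0301"

# NFC composes the pair into U+00E9 LATIN SMALL LETTER E WITH ACUTE.
composed = decomposed.unicode_normalize(:nfc)

puts decomposed.codepoints.inspect  # [101, 769]
puts composed.codepoints.inspect    # [233]
puts decomposed == composed         # false
```

The two strings look identical when printed but compare as different, which is why normalizing before further processing matters.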

tidy_bytes(string)

Attempts to replace invalid UTF-8 bytes with valid ones. This method naively assumes that any invalid UTF-8 bytes are either Windows CP-1252 or ISO-8859-1. In practice this is a reasonable assumption, but it will not always hold.

# File lib/babosa/utf8/proxy.rb, line 70
def tidy_bytes(string)
  string.scrub do |bad|
    tidy_byte(*bad.bytes).flatten.compact.pack('C*').unpack('U*').pack('U*')
  end
end
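
The following sketch shows the mechanism the method builds on: `String#scrub` with a block yields each invalid byte sequence and splices in whatever the block returns. For brevity, the block here treats every bad byte as ISO-8859-1 only; it does not reproduce the CP1252 lookup used above.

```ruby
# 0xE9 is "é" in ISO-8859-1, but on its own it is not valid UTF-8.
input = "caf\xE9 au lait"
puts input.valid_encoding?  # false

seen = []
cleaned = input.scrub do |bad|
  seen << bad.bytes         # record what scrub handed to the block
  # ISO-8859-1 byte values equal their Unicode code points, so packing
  # the bytes with "U*" yields their UTF-8 encoding.
  bad.bytes.pack("U*")
end

puts seen.inspect             # [[233]]
puts cleaned                  # café au lait
puts cleaned.valid_encoding?  # true
```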

upcase(string)

This is a stub for a method that should return a Unicode-aware upcased version of the given string.

# File lib/babosa/utf8/proxy.rb, line 55
def upcase(string)
  raise NotImplementedError
end

Private Instance Methods

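tidy_byte(byte)

Maps a single byte to the bytes of its UTF-8 encoding. Bytes below 160 are looked up in CP1252, bytes from 160 to 191 are prefixed with 194, and bytes from 192 upward become 195 followed by the byte minus 64.
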
# File lib/babosa/utf8/proxy.rb, line 120
def tidy_byte(byte)
  byte < 160 ? CP1252[byte] : byte < 192 ? [194, byte] : [195, byte - 64]
end
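
The arithmetic in the ternary can be checked with a small standalone sketch. `CP1252_SAMPLE`, `tidy_byte_sketch`, and `to_utf8` are hypothetical names introduced for illustration, and the table holds only three entries chosen as examples; it is not the library's CP1252 constant.

```ruby
# Illustrative subset of a CP-1252 to UTF-8 byte table (hypothetical).
CP1252_SAMPLE = {
  128 => [226, 130, 172], # U+20AC EURO SIGN
  147 => [226, 128, 156], # U+201C LEFT DOUBLE QUOTATION MARK
  148 => [226, 128, 157]  # U+201D RIGHT DOUBLE QUOTATION MARK
}.freeze

# Same branching as the one-line ternary, written out.
def tidy_byte_sketch(byte)
  if byte < 160
    CP1252_SAMPLE[byte]   # table lookup; nil when the byte is not listed
  elsif byte < 192
    [194, byte]           # U+00A0..U+00BF encode as 0xC2 followed by the byte
  else
    [195, byte - 64]      # U+00C0..U+00FF encode as 0xC3 followed by byte - 64
  end
end

# Helper that turns the resulting byte list into a UTF-8 string.
def to_utf8(byte)
  Array(tidy_byte_sketch(byte)).pack("C*").force_encoding("UTF-8")
end

puts to_utf8(0x80)  # €
puts to_utf8(0xA9)  # ©
puts to_utf8(0xE9)  # é
```

The two arithmetic branches work because ISO-8859-1 bytes 160 to 255 share their values with code points U+00A0 to U+00FF, all of which fit in a two-byte UTF-8 sequence.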