module Corefines::String::ForceUTF8

@!method #force_utf8

Returns a copy of _str_ with the encoding changed to UTF-8 and all invalid
byte sequences replaced with the Unicode Replacement Character (U+FFFD).

If _str_ responds to +#scrub!+ (Ruby >= 2.1), that method is used to
replace invalid bytes. Otherwise a simple custom implementation is used,
which may not return the same result as +#scrub!+.

@return [String] a valid UTF-8 string.
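The copy semantics described above can be illustrated without loading the gem. The following is a minimal sketch, assuming Ruby >= 2.1 so that +String#scrub+ is available; +force_utf8_sketch+ is a hypothetical stand-in written for this example, not the refinement itself.

```ruby
# Hypothetical stand-in for String#force_utf8, not the gem's own code.
def force_utf8_sketch(str)
  # dup first so the caller's string keeps its bytes and encoding tag.
  str.dup.force_encoding(Encoding::UTF_8).scrub
end

raw   = "caf\xE9 ok".b            # Latin-1 bytes tagged as ASCII-8BIT
clean = force_utf8_sketch(raw)

puts clean.encoding               # UTF-8
puts clean.valid_encoding?        # true
puts raw.equal?(clean)            # false: a new string was returned
```

The original string keeps its binary encoding tag; only the returned copy is retagged and scrubbed.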

@!method #force_utf8!

Changes the encoding to UTF-8, replaces all invalid byte sequences with
the Unicode Replacement Character (U+FFFD) and returns self.
This is the same as {#force_utf8}, except that it modifies the receiver in place.

@return (see #force_utf8)
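The in-place behavior can be sketched the same way. This example again assumes Ruby >= 2.1; +force_utf8_bang_sketch+ is a hypothetical helper used only for illustration, not the gem's method.

```ruby
# Hypothetical stand-in for String#force_utf8!, not the gem's own code.
def force_utf8_bang_sketch(str)
  str.force_encoding(Encoding::UTF_8) # retag in place, bytes unchanged
  str.scrub!                          # replace invalid sequences in place
  str                                 # hand back the receiver itself
end

buf    = "ok\xFF".b                   # .b returns a fresh, unfrozen copy
result = force_utf8_bang_sketch(buf)

puts result.equal?(buf)               # true: same object
puts buf.valid_encoding?              # true
```

Because the receiver is mutated, this variant raises on frozen strings, which is why the example starts from an unfrozen copy.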

Public Instance Methods

force_utf8()
# File lib/corefines/string.rb, line 208
def force_utf8
  dup.force_utf8!
end
force_utf8!()
# File lib/corefines/string.rb, line 212
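def force_utf8!
  # Retag the bytes as UTF-8; no transcoding happens here.
  str = force_encoding(Encoding::UTF_8)

  if str.respond_to? :scrub!
    # Ruby >= 2.1: let the built-in scrubber replace invalid sequences.
    str.scrub!
  else
    # Fallback for older Rubies: rebuild the string character by character.
    result = ''.force_encoding('BINARY')
    invalid = false

    str.chars.each do |c|
      if c.valid_encoding?
        result << c
        invalid = false
      elsif !invalid
        # Emit a single U+FFFD for a whole run of consecutive invalid bytes.
        result << "\uFFFD"
        invalid = true
      end
    end

    replace result.force_encoding(Encoding::UTF_8)
  end
end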
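
One way the fallback can differ from +#scrub!+ is that it collapses a run of consecutive invalid bytes into a single U+FFFD, whereas +#scrub!+ may emit more than one replacement character for the same run. The sketch below isolates that loop in a hypothetical standalone function, +scrub_fallback_sketch+, which is an illustration and not part of the gem; it assumes Ruby >= 2.3 for +String.new(encoding:)+.

```ruby
# Hypothetical standalone version of the fallback loop, for illustration only.
def scrub_fallback_sketch(str)
  result  = String.new(encoding: Encoding::UTF_8)
  invalid = false

  # Work on a retagged copy so the argument is left untouched.
  str.dup.force_encoding(Encoding::UTF_8).each_char do |c|
    if c.valid_encoding?
      result << c
      invalid = false
    elsif !invalid
      result << "\uFFFD"   # one marker per run of invalid bytes
      invalid = true
    end
  end

  result
end

out = scrub_fallback_sketch("a\xFF\xFEb".b)
puts out.length   # 3: "a", U+FFFD, "b"
```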