class HTML::Pipeline::SanitizationFilter

HTML filter with sanitization routines and whitelists. This module defines what HTML is allowed in user-provided content and repairs malformed markup such as unbalanced tags.

See the Sanitize docs for more information on the underlying library:

github.com/rgrove/sanitize/#readme

Context options:

:whitelist      - The sanitizer whitelist configuration to use. This
                  can be one of the options constants defined in this
                  class or a custom sanitize options hash.
:anchor_schemes - The URL schemes to allow in <a href> attributes. The
                  default set is provided in the ANCHOR_SCHEMES
                  constant in this class. If passed, this overrides any
                  schemes specified in the whitelist configuration.

This filter does not write additional information to the context.
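
As a rough illustration of the two options above, the sketch below builds a context hash. The configuration keys (:elements, :attributes, :protocols) follow Sanitize's config format; the name CUSTOM_CONFIG and the specific elements and schemes are assumptions chosen for the example, not values taken from this class.

```ruby
# Hypothetical custom Sanitize configuration; the element and scheme
# lists are illustrative only.
CUSTOM_CONFIG = {
  elements:   %w[a p strong em],
  attributes: { 'a' => ['href'] },
  protocols:  { 'a' => { 'href' => %w[http https] } }
}.freeze

# Context passed to the filter. :anchor_schemes, when present, takes
# precedence over the 'a' => 'href' schemes inside :whitelist.
context = {
  whitelist:      CUSTOM_CONFIG,
  anchor_schemes: %w[https mailto]
}

# With the html-pipeline and sanitize gems installed, the context would be
# handed to the filter roughly like this (not executed here):
#   HTML::Pipeline::SanitizationFilter.new(html, context).call
p context.keys
```

Omitting :anchor_schemes leaves whatever schemes the chosen configuration already declares.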

Constants

ANCHOR_SCHEMES

These schemes are the only ones allowed in <a href> attributes by default.

FULL

Strip all HTML tags from the document.

LIMITED

A more limited sanitization whitelist. This includes all attributes, protocols, and transformers from WHITELIST but with a more locked-down set of allowed elements.
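
One plausible way to derive such a configuration is to merge a narrower element list over the main one. This is a sketch under that assumption; BASE and RESTRICTED are hypothetical stand-ins, and their contents are not the real WHITELIST or LIMITED values.

```ruby
# Hypothetical stand-in for the main whitelist; the real WHITELIST constant
# contains many more entries.
BASE = {
  elements:     %w[h1 h2 p a img table tr td strong em code],
  attributes:   { 'a' => ['href'], 'img' => ['src'] },
  protocols:    { 'a' => { 'href' => %w[http https] } },
  transformers: []
}.freeze

# Hash#merge returns a new hash: only :elements is replaced, while
# :attributes, :protocols and :transformers carry over from BASE.
RESTRICTED = BASE.merge(elements: %w[p a strong em code]).freeze

p RESTRICTED[:elements]
p RESTRICTED[:attributes].equal?(BASE[:attributes])
```

Because the merge is shallow, the two hashes share the same :attributes and :protocols objects, so neither should be mutated in place.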

LISTS
LIST_ITEM
TABLE
TABLE_ITEMS

List of table child elements. These must be contained by a <table> element or they are not allowed through. Without this restriction, they could be used to break out of places where tables contain formatted user content (such as pull request review comments).
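
The containment rule can be pictured as an ancestor check. The sketch below models it with a plain Struct rather than a real DOM; Node, TABLE_CHILDREN, and keep_table_child? are hypothetical names introduced for illustration, and the filter's actual transformer may work differently.

```ruby
# Minimal stand-in for a DOM node: a tag name plus a link to its parent.
Node = Struct.new(:name, :parent) do
  # Names of all enclosing elements, nearest first.
  def ancestor_names
    names = []
    node = parent
    while node
      names << node.name
      node = node.parent
    end
    names
  end
end

# Illustrative list; the real TABLE_ITEMS constant may differ.
TABLE_CHILDREN = %w[tr td th].freeze

# A table child is kept only if some ancestor is a <table>.
def keep_table_child?(node)
  return true unless TABLE_CHILDREN.include?(node.name)
  node.ancestor_names.include?('table')
end

table  = Node.new('table', nil)
row    = Node.new('tr', table)
cell   = Node.new('td', row)
orphan = Node.new('td', Node.new('div', nil))

p keep_table_child?(cell)    # <td> nested inside a <table>
p keep_table_child?(orphan)  # <td> with no <table> ancestor
```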

TABLE_SECTIONS
WHITELIST

The main sanitization whitelist. Only these elements and attributes are allowed through by default.

Public Instance Methods

call()

Sanitize markup using the Sanitize library.

# File lib/html/pipeline/sanitization_filter.rb, line 118
def call
  Sanitize.clean_node!(doc, whitelist)
end

whitelist()

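The whitelist to use when sanitizing. This can be passed to the filter in the context hash and defaults to the WHITELIST constant above. If :anchor_schemes is set in the context, it replaces the allowed <a href> schemes in a copy of the whitelist.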

# File lib/html/pipeline/sanitization_filter.rb, line 124
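def whitelist
  whitelist = context[:whitelist] || WHITELIST
  anchor_schemes = context[:anchor_schemes]
  # No override requested: use the configuration as-is.
  return whitelist unless anchor_schemes
  # Copy the top-level and :protocols hashes so the shared constant
  # is never mutated.
  whitelist = whitelist.dup
  whitelist[:protocols] = (whitelist[:protocols] || {}).dup
  # Hash#merge returns a new hash with 'href' replaced by the override.
  whitelist[:protocols]['a'] = (whitelist[:protocols]['a'] || {}).merge('href' => anchor_schemes)
  whitelist
end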
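
To see the override behaviour in isolation, the same logic can be run against plain hashes. This is a sketch only: effective_whitelist and DEFAULT are hypothetical names, the method takes the context and default as arguments so it runs without the gem, and the schemes shown are made up for the example.

```ruby
# Stand-in for the method above, taking the context and default explicitly
# so it can run without the html-pipeline gem.
def effective_whitelist(context, default)
  whitelist = context[:whitelist] || default
  schemes = context[:anchor_schemes]
  return whitelist unless schemes

  whitelist = whitelist.dup
  whitelist[:protocols] = (whitelist[:protocols] || {}).dup
  whitelist[:protocols]['a'] =
    (whitelist[:protocols]['a'] || {}).merge('href' => schemes)
  whitelist
end

# Hypothetical default configuration, frozen to show it is never mutated.
DEFAULT = {
  elements:  %w[a img],
  protocols: {
    'a'   => { 'href' => %w[http https mailto].freeze }.freeze,
    'img' => { 'src' => %w[http https].freeze }.freeze
  }.freeze
}.freeze

result = effective_whitelist({ anchor_schemes: %w[https] }, DEFAULT)

p result[:protocols]['a']['href']    # schemes after the override
p DEFAULT[:protocols]['a']['href']   # default is left unchanged
p result[:protocols]['img'].equal?(DEFAULT[:protocols]['img'])
```

Only the 'a' entry is rebuilt; other protocol entries are shared with the default, which is why the copies are made before anything is assigned.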