Used for parsing a document in kramdown format.
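If you want to extend the functionality of the parser, you need to do the following: create a new subclass, add the needed parser methods and register them with define_parser, and add the names of these parsers to the @block_parsers or @span_parsers variable.
Here is a small example of an extended parser class that parses ERB-style tags as raw text when they are used as span-level elements (an equivalent block-level parser should probably also be written to handle the block case):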
    require 'kramdown/parser/kramdown'

    class Kramdown::Parser::ERBKramdown < Kramdown::Parser::Kramdown

      def initialize(source, options)
        super
        @span_parsers.unshift(:erb_tags)
      end

      ERB_TAGS_START = /<%.*?%>/

      def parse_erb_tags
        @src.pos += @src.matched_size
        @tree.children << Element.new(:raw, @src.matched)
      end
      define_parser(:erb_tags, ERB_TAGS_START, '<%')

    end
The new parser can be used like this:
    require 'kramdown/document'
    # require the file with the above parser class

    Kramdown::Document.new(input_text, :input => 'ERBKramdown').to_html
    # Struct class holding all the needed data for one block/span-level parser method.
    Data = Struct.new(:name, :start_re, :span_start, :method)
    # Regexp for matching indentation (one tab or four spaces)
    INDENT = /^(?:\t| {4})/
    # Regexp for matching the optional space (zero or up to three spaces)
    OPT_SPACE = / {0,3}/
    TABLE_SEP_LINE = /^([+|: -]*?-[+|: -]*?)[ \t]*\n/
    TABLE_HSEP_ALIGN = /[ ]?(:?)-+(:?)[ ]?/
    TABLE_FSEP_LINE = /^[+|: =]*?=[+|: =]*?[ \t]*\n/
    TABLE_ROW_LINE = /^(.*?)[ \t]*\n/
    TABLE_PIPE_CHECK = /(?:\||.*?[^\\\n]\|)/
    TABLE_LINE = /#{TABLE_PIPE_CHECK}.*?\n/
    TABLE_START = /^#{OPT_SPACE}(?=\S)#{TABLE_LINE}/
    FOOTNOTE_DEFINITION_START = /^#{OPT_SPACE}\[\^(#{ALD_ID_NAME})\]:\s*?(.*?\n#{CODEBLOCK_MATCH})/
    FOOTNOTE_MARKER_START = /\[\^(#{ALD_ID_NAME})\]/
    CODEBLOCK_START = INDENT
    CODEBLOCK_MATCH = /(?:#{BLANK_LINE}?(?:#{INDENT}[ \t]*\S.*\n)+(?:(?!#{BLANK_LINE} {0,3}\S|#{IAL_BLOCK_START}|#{EOB_MARKER}|^#{OPT_SPACE}#{LAZY_END_HTML_STOP}|^#{OPT_SPACE}#{LAZY_END_HTML_START})^[ \t]*\S.*\n)*)*/
    FENCED_CODEBLOCK_START = /^~{3,}/
    FENCED_CODEBLOCK_MATCH = /^(~{3,})\s*?\n(.*?)^\1~*\s*?\n/m
    TYPOGRAPHIC_SYMS = [['---', :mdash], ['--', :ndash], ['...', :hellip], ['\\<<', '<<'], ['\\>>', '>>'], ['<< ', :laquo_space], [' >>', :raquo_space], ['<<', :laquo], ['>>', :raquo]]
    TYPOGRAPHIC_SYMS_SUBST = Hash[*TYPOGRAPHIC_SYMS.flatten]
    TYPOGRAPHIC_SYMS_RE = /#{TYPOGRAPHIC_SYMS.map {|k,v| Regexp.escape(k)}.join('|')}/
    IAL_CLASS_ATTR = 'class'
    ALD_ID_CHARS = /[\w-]/
    ALD_ANY_CHARS = /\\\}|[^\}]/
    ALD_ID_NAME = /\w#{ALD_ID_CHARS}*/
    ALD_TYPE_KEY_VALUE_PAIR = /(#{ALD_ID_NAME})=("|')((?:\\\}|\\\2|[^\}\2])*?)\2/
    ALD_TYPE_CLASS_NAME = /\.(#{ALD_ID_NAME})/
    ALD_TYPE_ID_NAME = /#(\w[\w:-]*)/
    ALD_TYPE_REF = /(#{ALD_ID_NAME})/
    ALD_TYPE_ANY = /(?:\A|\s)(?:#{ALD_TYPE_KEY_VALUE_PAIR}|#{ALD_TYPE_ID_NAME}|#{ALD_TYPE_CLASS_NAME}|#{ALD_TYPE_REF})(?=\s|\Z)/
    ALD_START = /^#{OPT_SPACE}\{:(#{ALD_ID_NAME}):(#{ALD_ANY_CHARS}+)\}\s*?\n/
    EXT_STOP_STR = "\\{:/(%s)?\\}"
    EXT_START_STR = "\\{::(\\w+)(?:\\s(#{ALD_ANY_CHARS}*?)|)(\\/)?\\}"
    EXT_BLOCK_START = /^#{OPT_SPACE}(?:#{EXT_START_STR}|#{EXT_STOP_STR % ALD_ID_NAME})\s*?\n/
    EXT_BLOCK_STOP_STR = "^#{OPT_SPACE}#{EXT_STOP_STR}\s*?\n"
    IAL_BLOCK = /\{:(?!:|\/)(#{ALD_ANY_CHARS}+)\}\s*?\n/
    IAL_BLOCK_START = /^#{OPT_SPACE}#{IAL_BLOCK}/
    BLOCK_EXTENSIONS_START = /^#{OPT_SPACE}\{:/
    EXT_SPAN_START = /#{EXT_START_STR}|#{EXT_STOP_STR % ALD_ID_NAME}/
    IAL_SPAN_START = /\{:(#{ALD_ANY_CHARS}+)\}/
    SPAN_EXTENSIONS_START = /\{:/
    EMPHASIS_START = /(?:\*\*?|__?)/
    LIST_ITEM_IAL = /^\s*(?:\{:(?!(?:#{ALD_ID_NAME})?:|\/)(#{ALD_ANY_CHARS}+)\})\s*/
    LIST_ITEM_IAL_CHECK = /^#{LIST_ITEM_IAL}?\s*\n/
    LIST_START_UL = /^(#{OPT_SPACE}[+*-])([\t| ].*?\n)/
    LIST_START_OL = /^(#{OPT_SPACE}\d+\.)([\t| ].*?\n)/
    LIST_START = /#{LIST_START_UL}|#{LIST_START_OL}/
    DEFINITION_LIST_START = /^(#{OPT_SPACE}:)([\t| ].*?\n)/
    CODESPAN_DELIMITER = /`+/
    HEADER_ID = /(?:[ \t]\{#(\w[\w-]*)\})?/
    SETEXT_HEADER_START = /^(#{OPT_SPACE}[^ \t].*?)#{HEADER_ID}[ \t]*?\n(-|=)+\s*?\n/
    ATX_HEADER_START = /^\#{1,6}/
    ATX_HEADER_MATCH = /^(\#{1,6})(.+?)\s*?#*#{HEADER_ID}\s*?\n/
    ABBREV_DEFINITION_START = /^#{OPT_SPACE}\*\[(.+?)\]:(.*?)\n/
    BLANK_LINE = /(?:^\s*\n)+/
    LINK_DEFINITION_START = /^#{OPT_SPACE}\[([^\n\]]+)\]:[ \t]*(?:<(.*?)>|([^'"\n]*?\S[^'"\n]*?))[ \t]*?(?:\n?[ \t]*?(["'])(.+?)\4[ \t]*?)?\n/
    LINK_BRACKET_STOP_RE = /(\])|!?\[/
    LINK_PAREN_STOP_RE = /(\()|(\))|\s(?=['"])/
    LINK_INLINE_ID_RE = /\s*?\[([^\]]+)?\]/
    LINK_INLINE_TITLE_RE = /\s*?(["'])(.+?)\1\s*?\)/m
    LINK_START = /!?\[(?=[^^])/
    HR_START = /^#{OPT_SPACE}(\*|-|_)[ \t]*\1[ \t]*\1(\1|[ \t])*\n/
    BLOCK_BOUNDARY = /#{BLANK_LINE}|#{EOB_MARKER}|#{IAL_BLOCK_START}|\Z/
    EOB_MARKER = /^\^\s*?\n/
    # Mapping of markdown attribute value to content model, i.e. :raw when "0",
    # :default when "1" (use the default content model for the HTML element),
    # :span when "span", :block when "block"; for everything else nil is returned.
    HTML_MARKDOWN_ATTR_MAP = {"0" => :raw, "1" => :default, "span" => :span, "block" => :block}
    TRAILING_WHITESPACE = /[ \t]*\n/
    HTML_BLOCK_START = /^#{OPT_SPACE}<(#{REXML::Parsers::BaseParser::UNAME_STR}|\?|!--|\/)/
    HTML_SPAN_START = /<(#{REXML::Parsers::BaseParser::UNAME_STR}|\?|!--|\/)/
    SQ_PUNCT = '[!"#\$\%\'()*+,\-.\/:;<=>?\@\[\\\\\]\^_`{|}~]'
    SQ_CLOSE = %![^\ \\\\\t\r\n\\[{(-]!
    SQ_RULES = [
      [/("|')(?=#{SQ_PUNCT}\B)/, [:rquote1]],
      # Special case for double sets of quotes, e.g.:
      #   <p>He said, "'Quoted' words in a larger quote."</p>
      [/(\s?)"'(?=\w)/, [1, :ldquo, :lsquo]],
      [/(\s?)'"(?=\w)/, [1, :lsquo, :ldquo]],
      # Special case for decade abbreviations (the '80s):
      [/(\s?)'(?=\d\ds)/, [1, :rsquo]],
      # Get most opening single/double quotes:
      [/(\s)('|")(?=\w)/, [1, :lquote2]],
      # Single/double closing quotes:
      [/(#{SQ_CLOSE})('|")/, [1, :rquote2]],
      # Special case for e.g. "<i>Custer</i>'s Last Stand."
      [/("|')(\s|s\b|$)/, [:rquote1, 2]],
      # Any remaining single quotes should be opening ones:
      [/(.?)'/m, [1, :lsquo]],
      [/(.?)"/m, [1, :ldquo]],
    ]
    SQ_SUBSTS = {
      [:rquote1, '"'] => :rdquo,
      [:rquote1, "'"] => :rsquo,
      [:rquote2, '"'] => :rdquo,
      [:rquote2, "'"] => :rsquo,
      [:lquote1, '"'] => :ldquo,
      [:lquote1, "'"] => :lsquo,
      [:lquote2, '"'] => :ldquo,
      [:lquote2, "'"] => :lsquo,
    }
    SMART_QUOTES_RE = /[^\\]?["']/
    BLOCKQUOTE_START = /^#{OPT_SPACE}> ?/
    BLOCK_MATH_START = /^#{OPT_SPACE}(\\)?\$\$(.*?)\$\$(\s*?\n)?/m
    INLINE_MATH_START = /\$\$(.*?)\$\$/
    LINE_BREAK = /( |\\\\)(?=\n)/
    # Alternative definitions; the one that applies depends on the Ruby version.
    ACHARS = '\w\x80-\xFF'
    ACHARS = '\w'
    ACHARS = '[[:alnum:]]'
    AUTOLINK_START_STR = "<((mailto|https?|ftps?):.+?|[-.#{ACHARS}]+@[-#{ACHARS}]+(?:\.[-#{ACHARS}]+)*\.[a-z]+)>"
    # Alternative definitions; the one that applies depends on the Ruby version.
    AUTOLINK_START = /#{AUTOLINK_START_STR}/u
    AUTOLINK_START = /#{AUTOLINK_START_STR}/
    LAZY_END_HTML_SPAN_ELEMENTS = HTML_SPAN_ELEMENTS + %w{script}
    LAZY_END_HTML_START = /<(?>(?!(?:#{LAZY_END_HTML_SPAN_ELEMENTS.join('|')})\b)#{REXML::Parsers::BaseParser::UNAME_STR})\s*(?>\s+#{REXML::Parsers::BaseParser::UNAME_STR}\s*=\s*(["']).*?\1)*\s*\/?>/m
    LAZY_END_HTML_STOP = /<\/(?!(?:#{LAZY_END_HTML_SPAN_ELEMENTS.join('|')})\b)#{REXML::Parsers::BaseParser::UNAME_STR}\s*>/m
    LAZY_END = /#{BLANK_LINE}|#{IAL_BLOCK_START}|#{EOB_MARKER}|^#{OPT_SPACE}#{LAZY_END_HTML_STOP}|^#{OPT_SPACE}#{LAZY_END_HTML_START}|\Z/
    PARAGRAPH_START = /^#{OPT_SPACE}[^ \t].*?\n/
    PARAGRAPH_MATCH = /^.*?\n/
    PARAGRAPH_END = /#{LAZY_END}|#{DEFINITION_LIST_START}/
    ESCAPED_CHARS = /\\([\\.*_+`<>()\[\]{}#!:|"'\$=-])/
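The following standalone sketch shows how TYPOGRAPHIC_SYMS, TYPOGRAPHIC_SYMS_SUBST and TYPOGRAPHIC_SYMS_RE work together. It copies the three constants from the listing and applies them with String#gsub. The output format is a hypothetical stand-in chosen only for illustration: symbols are printed as "[name]" placeholders and string values are emitted literally. The real parser builds elements instead of strings.

```ruby
# Copied from the constants listing. Order matters: the regexp alternation
# tries '---' before '--' and '<< ' before '<<'.
TYPOGRAPHIC_SYMS = [['---', :mdash], ['--', :ndash], ['...', :hellip],
                    ['\\<<', '<<'], ['\\>>', '>>'],
                    ['<< ', :laquo_space], [' >>', :raquo_space],
                    ['<<', :laquo], ['>>', :raquo]]
TYPOGRAPHIC_SYMS_SUBST = Hash[*TYPOGRAPHIC_SYMS.flatten]
TYPOGRAPHIC_SYMS_RE = /#{TYPOGRAPHIC_SYMS.map { |k, _v| Regexp.escape(k) }.join('|')}/

# Hypothetical helper: symbol values become "[name]" placeholders, while
# string values (used for escaped guillemets such as '\\<<') are literal.
def demo_typography(text)
  text.gsub(TYPOGRAPHIC_SYMS_RE) do |match|
    value = TYPOGRAPHIC_SYMS_SUBST[match]
    value.is_a?(Symbol) ? "[#{value}]" : value
  end
end

puts demo_typography('Wait--what... <<quoted>>')
# prints "Wait[ndash]what[hellip] [laquo]quoted[raquo]"
```

Because the alternation picks the first alternative that matches at a given position, the longer symbols must come first in TYPOGRAPHIC_SYMS.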
Add a parser method to the registry. The method name is automatically derived from the name or can explicitly be set by using the meth_name parameter.
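The registry idea can be illustrated with a small standalone sketch. It uses a hypothetical MiniParser class, not kramdown's actual code. The class stores one struct per parser, shaped like the Data struct above. It derives the method name as parse_<name> unless meth_name is given, which matches the parse_erb_tags naming in the ERB example. It then dispatches on each parser's start regexp using Ruby's StringScanner.

```ruby
require 'strscan'

# Hypothetical, simplified stand-in for a parser registry; not kramdown's API.
class MiniParser
  # Same fields as the Data struct in the constants listing.
  ParserData = Struct.new(:name, :start_re, :span_start, :method)

  def self.parsers
    @parsers ||= {}
  end

  # Register a parser; the method name defaults to "parse_<name>".
  def self.define_parser(name, start_re, span_start = nil, meth_name = "parse_#{name}")
    raise ArgumentError, "parser #{name} already defined" if parsers.key?(name)
    parsers[name] = ParserData.new(name, start_re, span_start, meth_name)
  end

  attr_reader :output

  def initialize(source, order)
    @src = StringScanner.new(source)
    @order = order # parser names to try, in priority order
    @output = []
  end

  # Try each listed parser at the current position; otherwise consume one
  # character as plain text.
  def parse
    until @src.eos?
      handled = @order.any? do |name|
        data = self.class.parsers.fetch(name)
        @src.check(data.start_re) && (send(data.method); true)
      end
      @output << [:text, @src.getch] unless handled
    end
    self
  end

  def parse_erb_tags
    tag = @src.matched # text matched by the start regexp in #parse
    @src.pos += @src.matched_size
    @output << [:raw, tag]
  end
  define_parser(:erb_tags, /<%.*?%>/, '<%')
end

p MiniParser.new('a<%= x %>b', [:erb_tags]).parse.output
# => [[:text, "a"], [:raw, "<%= x %>"], [:text, "b"]]
```

The order of the names passed in (compare @span_parsers.unshift in the ERB example) decides which parser wins when several start regexps could match at the same position.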
This helper method adds the appropriate attributes to the element el (of type :a or :img) and appends the element itself to @tree.
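The following is a minimal sketch of what such a helper might do. It assumes a hypothetical LinkElement struct with type, attr and children fields, and a plain array standing in for @tree's children. The attribute names follow HTML conventions: href for :a, src and alt for :img, and an optional title. These names are assumptions about the helper, not its verified code.

```ruby
# Hypothetical element type; kramdown's real Element class has more features.
LinkElement = Struct.new(:type, :attr, :children)

# Sketch: set link or image attributes on el, then append it to tree_children.
def add_link_sketch(tree_children, el, href, title, alt_text = nil)
  if el.type == :a
    el.attr['href'] = href
  else # :img
    el.attr['src'] = href
    el.attr['alt'] = alt_text
    el.children.clear # the image text lives in the alt attribute instead
  end
  el.attr['title'] = title if title
  tree_children << el
end

tree = []
add_link_sketch(tree, LinkElement.new(:a, {}, ['kramdown']), 'https://example.com', nil)
add_link_sketch(tree, LinkElement.new(:img, {}, []), 'logo.png', 'Logo', 'The logo')
```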
Parse the generic extension at the current point. The parameter type can be either :block or :span, depending on whether we parse a block or a span extension tag.
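Extension tags are recognized by the EXT_* regexps in the constants listing. The standalone snippet below copies ALD_ID_CHARS, ALD_ANY_CHARS, ALD_ID_NAME, EXT_STOP_STR, EXT_START_STR and EXT_SPAN_START verbatim. It shows which capture groups are filled by a start tag, a self-closing tag and a stop tag. The describe_ext_tag helper is hypothetical, and the tag name "comment" is only an example input.

```ruby
# Copied from the constants listing.
ALD_ID_CHARS = /[\w-]/
ALD_ANY_CHARS = /\\\}|[^\}]/
ALD_ID_NAME = /\w#{ALD_ID_CHARS}*/
EXT_STOP_STR = "\\{:/(%s)?\\}"
EXT_START_STR = "\\{::(\\w+)(?:\\s(#{ALD_ANY_CHARS}*?)|)(\\/)?\\}"
EXT_SPAN_START = /#{EXT_START_STR}|#{EXT_STOP_STR % ALD_ID_NAME}/

# Capture groups: 1 = extension name, 2 = options string,
# 3 = "/" for a self-closing tag, 4 = name given in a stop tag.
def describe_ext_tag(str)
  m = EXT_SPAN_START.match(str)
  return nil unless m
  if m[1]
    { kind: m[3] ? :self_closing : :start, name: m[1], options: m[2] }
  else
    { kind: :stop, name: m[4] }
  end
end

p describe_ext_tag('{::comment}')
p describe_ext_tag('{:/comment}')
```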
Used for parsing the first line of a list item or a definition, i.e. the line with list item marker or the definition marker.
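The first line of a list item can be split with the LIST_START_UL and LIST_START_OL regexps from the constants listing. Group 1 holds the marker together with its optional leading spaces, and group 2 holds the rest of the line. The snippet copies those regexps verbatim and only returns the captures. How the parser then derives the content indentation is not shown, and the split_list_line helper is hypothetical.

```ruby
# Copied from the constants listing.
OPT_SPACE = / {0,3}/
LIST_START_UL = /^(#{OPT_SPACE}[+*-])([\t| ].*?\n)/
LIST_START_OL = /^(#{OPT_SPACE}\d+\.)([\t| ].*?\n)/
LIST_START = /#{LIST_START_UL}|#{LIST_START_OL}/

# Returns [list_type, marker, rest_of_line] or nil if the line is no list item.
def split_list_line(line)
  if (m = LIST_START_UL.match(line))
    [:ul, m[1], m[2]]
  elsif (m = LIST_START_OL.match(line))
    [:ol, m[1], m[2]]
  end
end

p split_list_line("  12. second\n") # => [:ol, "  12.", " second\n"]
```

A marker that is not followed by a tab or space (for example "*no space") is not treated as a list item.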
Parse the link at the current scanner position. This method is used to parse normal links as well as image links.
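Links and images share the LINK_START regexp, copied below from the constants listing. An optional ! marks an image. The lookahead (?=[^^]) rejects a [ that is directly followed by ^, so a footnote marker such as [^1] (compare FOOTNOTE_MARKER_START) is not taken as a link. The link_kind helper is hypothetical.

```ruby
# Copied from the constants listing.
LINK_START = /!?\[(?=[^^])/

# Classify what LINK_START finds at the very start of str.
def link_kind(str)
  m = LINK_START.match(str)
  return nil unless m && m.begin(0) == 0
  m[0].start_with?('!') ? :image : :link
end

p link_kind('![alt](img.png)') # => :image
```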
Create a new block-level element, taking care of applying a preceding block IAL if it exists. This method should always be used for creating a block-level element!
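Routing every block element through one factory ensures that a pending block IAL is consumed exactly once. The sketch below shows this pattern under stated assumptions. The IAL is assumed to be already parsed into a hash and held in an instance variable. The names BlockElement, BlockBuilder and @block_ial are illustrative, not kramdown's code.

```ruby
BlockElement = Struct.new(:type, :value, :options)

class BlockBuilder
  attr_writer :block_ial

  def initialize
    @block_ial = nil # IAL parsed from a preceding "{: ...}" line, if any
  end

  # Create a block-level element and attach, then clear, any pending IAL,
  # so the IAL applies only to the element created right after it.
  def new_block_el(type, value = nil, options = {})
    el = BlockElement.new(type, value, options.dup)
    if @block_ial
      el.options[:ial] = @block_ial
      @block_ial = nil
    end
    el
  end
end
```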
Parse all span-level elements in the source string of @src into el.
If the parameter stop_re (a regexp) is given, parsing stops immediately when the regexp matches, provided that either no block is given or the given block returns true.
The parameter parsers can be used to specify which span-level parser methods are used.
The parameter text_type specifies the type which should be used for created text nodes.
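The stop_re contract can be illustrated with a simplified, hypothetical helper built on StringScanner. It collects text until stop_re matches at the current position and, if a block is given, the block also returns true. Unlike the real method, it does not dispatch to span parsers or create text nodes of text_type. It also leaves the scanner positioned at the stop match, which is a choice made only for this sketch.

```ruby
require 'strscan'

# Returns [collected_text, stopped?]. Stops when stop_re matches at the
# current position and either no block is given or the block returns true.
def scan_text_until(src, stop_re = nil)
  text = +''
  until src.eos?
    if stop_re && src.check(stop_re) && (!block_given? || yield(src.matched))
      return [text, true]
    end
    text << src.getch
  end
  [text, false]
end

s = StringScanner.new('a*b**c')
# Only stop at a double asterisk; a single one is kept as text.
result = scan_text_until(s, /\*+/) { |m| m.size == 2 }
p result # => ["a*b", true]
```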
Reset the current parsing environment. The parameter env can be used to set initial values for one or more environment variables.
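A hypothetical sketch of such a reset: all environment variables are set back to defaults, and any keys passed in env override them. The keys shown (src, tree, text_type) and the rejection of unknown keys are illustrative assumptions, not a documented list.

```ruby
class EnvParser
  # Hypothetical environment variables and their default values.
  DEFAULT_ENV = { src: nil, tree: nil, text_type: :text }.freeze

  attr_reader :src, :tree, :text_type

  # Restore every variable to its default, then apply the given overrides.
  def reset_env(env = {})
    unknown = env.keys - DEFAULT_ENV.keys
    raise ArgumentError, "unknown env keys: #{unknown.join(', ')}" unless unknown.empty?
    DEFAULT_ENV.merge(env).each { |key, value| instance_variable_set("@#{key}", value) }
  end
end
```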
Update the given attributes hash attr with the information from the inline attribute list ial and all referenced ALDs.
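The following is a hypothetical sketch of that merge. It assumes an IAL is a hash whose String keys are attributes and whose :refs entry lists ALD names, and that ALDs are stored in a name-to-hash table of the same shape. Following IAL_CLASS_ATTR from the constants listing, class values are appended rather than replaced. Referenced ALDs are applied first, so the IAL's own attributes take precedence. These rules, and the cycle guard, are assumptions for illustration.

```ruby
IAL_CLASS_ATTR = 'class'

# attr: attribute hash to update; ial: parsed IAL; alds: name => ALD hash.
def merge_ial(attr, ial, alds, seen = [])
  Array(ial[:refs]).each do |ref|
    next if seen.include?(ref) || !alds.key?(ref) # skip cyclic or unknown refs
    merge_ial(attr, alds[ref], alds, seen + [ref])
  end
  ial.each do |key, value|
    next unless key.is_a?(String) # non-String keys such as :refs are not attributes
    if key == IAL_CLASS_ATTR
      attr[key] = [attr[key], value].compact.join(' ')
    else
      attr[key] = value
    end
  end
  attr
end

alds = { 'base' => { 'class' => 'box', 'title' => 'Base' } }
p merge_ial({}, { :refs => ['base'], 'class' => 'wide', 'title' => 'Own' }, alds)
# => {"class"=>"box wide", "title"=>"Own"}
```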
Update the tree by parsing all :raw_text elements with the span-level parser (resets the environment) and by updating the attributes from the IALs.