Return a word hash without extra punctuation or short symbols, containing only stemmed words.
# File lib/classifier-reborn/extensions/hasher.rb, line 28
def clean_word_hash(str)
  word_hash_for_words str.gsub(/[^\w\s]/,"").split
end
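Removes common punctuation symbols, returning a new string. Each listed symbol is replaced by a space, while apostrophes and hyphens are deleted outright, so runs of spaces can remain in the result. E.g. (whitespace collapsed for display),
without_punctuation("Hello (greeting's), with {braces} < >...?") => "Hello greetings with braces "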
# File lib/classifier-reborn/extensions/hasher.rb, line 15
def without_punctuation(str)
  str.tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ").tr("'\-", "")
end
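The two `tr` calls can be tried in isolation. The snippet below is a minimal sketch that copies the method body into a top-level method so it runs without the gem. The sample strings and the whitespace normalization at the end are illustrative additions, not part of the library.

```ruby
# Standalone copy of the method above, defined at top level for experimentation.
def without_punctuation(str)
  str.tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ").tr("'\-", "")
end

cleaned = without_punctuation("Hello (greeting's), with {braces} < >...?")

# Every replaced symbol leaves a space behind, so the raw result contains
# runs of spaces; split/join collapses them for display.
puts cleaned.split.join(" ")   # prints: Hello greetings with braces

# Hyphens and apostrophes are deleted rather than replaced by a space.
puts without_punctuation("well-known")   # prints: wellknown
```

Because the second `tr` has an empty replacement string, it deletes the matched characters, which is why "greeting's" becomes "greetings" rather than "greeting s".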
Return a Hash of strings => ints. Each word in the string is stemmed and interned, and maps to its frequency in the document.
# File lib/classifier-reborn/extensions/hasher.rb, line 21
def word_hash(str)
  word_hash = clean_word_hash(str)
  symbol_hash = word_hash_for_symbols(str.gsub(/[\w]/," ").split)
  return clean_word_hash(str).merge(symbol_hash)
end
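The way `word_hash` separates word tokens from punctuation tokens can be sketched without the gem. This is a simplified illustration, not the library's implementation: `count_tokens` and `sketch_word_hash` are hypothetical names, and `count_tokens` stands in for `word_hash_for_words` and `word_hash_for_symbols`, whose bodies are not shown here. It only interns and counts tokens and does no stemming.

```ruby
# Hypothetical stand-in for word_hash_for_words / word_hash_for_symbols:
# interns each token and counts it. No stemming or filtering is done here.
def count_tokens(tokens)
  tokens.each_with_object(Hash.new(0)) { |token, counts| counts[token.to_sym] += 1 }
end

# Mirrors the structure of word_hash using the same two regular expressions.
def sketch_word_hash(str)
  words   = str.gsub(/[^\w\s]/, "").split   # strip punctuation, keep word tokens
  symbols = str.gsub(/[\w]/, " ").split     # blank out word characters, keep punctuation
  count_tokens(words).merge(count_tokens(symbols))
end

counts = sketch_word_hash("Ruby is fun! Ruby is fast!")
puts counts[:Ruby]   # prints: 2
puts counts[:"!"]    # prints: 2
```

The two `gsub` calls are complementary: the first discards everything that is neither a word character nor whitespace, and the second blanks out the word characters so that only punctuation survives the `split`.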