These are extensions to the String class to provide convenience methods for the Classifier package.
Author: Lucas Carlson (lucas@rufy.com)
Copyright: Copyright (c) 2005 Lucas Carlson
License: LGPL
Returns a word hash that contains only stemmed words, with extra punctuation and short symbols left out.
  # File lib/classifier/extensions/word_hash.rb, line 24
  def clean_word_hash
    word_hash_for_words gsub(/[^\w\s]/,"").split
  end
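The tokenization step of clean_word_hash can be tried on its own. The sketch below is only an illustration: `clean_tokens` is a hypothetical helper, not part of the Classifier package, and it leaves out any stemming or filtering done downstream by word_hash_for_words, whose source is not shown here.

```ruby
# Hypothetical helper (not part of Classifier): reproduces only the
# tokenization that clean_word_hash passes on to word_hash_for_words.
def clean_tokens(text)
  # Delete every character that is neither a word character nor
  # whitespace, then split on runs of whitespace.
  text.gsub(/[^\w\s]/, "").split
end

tokens = clean_tokens("Hello, world! It's 2005.")
p tokens
# => ["Hello", "world", "Its", "2005"]
```

Note that the apostrophe is deleted rather than replaced, so "It's" becomes the single token "Its".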
  # File lib/classifier/lsi/summary.rb, line 10
  def paragraph_summary( count=1, separator=" [...] " )
    perform_lsi split_paragraphs, count, separator
  end

  # File lib/classifier/lsi/summary.rb, line 18
  def split_paragraphs
    split /(\n\n|\r\r|\r\n\r\n)/ # TODO: make this less primitive
  end

  # File lib/classifier/lsi/summary.rb, line 14
  def split_sentences
    split /(\.|\!|\?)/ # TODO: make this less primitive
  end

  # File lib/classifier/lsi/summary.rb, line 6
  def summary( count=10, separator=" [...] " )
    perform_lsi split_sentences, count, separator
  end
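Both splitting patterns wrap the delimiter in a capture group, so String#split returns the delimiters as elements of the result alongside the text between them. A short illustration using the same regular expressions as split_sentences and split_paragraphs (the variable names are for demonstration only):

```ruby
# Same patterns as split_sentences and split_paragraphs.
sentence_break  = /(\.|\!|\?)/
paragraph_break = /(\n\n|\r\r|\r\n\r\n)/

sentences = "Ruby is fun. Is it fast? Yes!".split(sentence_break)
p sentences
# => ["Ruby is fun", ".", " Is it fast", "?", " Yes", "!"]

paragraphs = "First paragraph.\n\nSecond paragraph.".split(paragraph_break)
p paragraphs
# => ["First paragraph.", "\n\n", "Second paragraph."]
```

The punctuation marks and blank-line separators therefore appear as separate entries in the list that summary and paragraph_summary hand to perform_lsi.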
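Removes common punctuation symbols, returning a new string. Each symbol is replaced by a single space, except apostrophes and hyphens, which are deleted, so runs of punctuation leave runs of spaces. E.g.,
  "Hello (greeting's), with {braces} < >...?".without_punctuation
  => "Hello  greetings   with  braces         "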
  # File lib/classifier/extensions/word_hash.rb, line 13
  def without_punctuation
    tr( ',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " " ) .tr( "'\-", "")
  end
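The two tr calls behave differently: the first maps each listed symbol to a space, while the second, given an empty replacement string, deletes apostrophes and hyphens. The sketch below repeats the same calls in a standalone function; `strip_punctuation` is a hypothetical name, used here only to avoid reopening String.

```ruby
# Hypothetical standalone version of without_punctuation.
def strip_punctuation(text)
  # Step 1: every listed symbol becomes exactly one space.
  spaced = text.tr(',?.!;:"@#$%^&*()_=+[]{}\|<>/`~', " ")
  # Step 2: an empty replacement makes tr delete apostrophes and hyphens.
  spaced.tr("'\-", "")
end

result = strip_punctuation("well-known (mostly), isn't it?")
p result
# => "wellknown  mostly   isnt it "

# Optional clean-up, not done by the original method:
p result.squeeze(" ").strip
# => "wellknown mostly isnt it"
```

Because symbols are replaced one for one, the output keeps the same length as the input minus the deleted apostrophes and hyphens.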
Returns a Hash mapping words to integer counts. Each word in the string is stemmed and interned, and the resulting key maps to that word's frequency in the document.
  # File lib/classifier/extensions/word_hash.rb, line 19
  def word_hash
    word_hash_for_words(gsub(/[^\w\s]/,"").split + gsub(/[\w]/," ").split)
  end
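Unlike clean_word_hash, word_hash passes two token lists to word_hash_for_words: the words with punctuation removed, and the punctuation runs that remain once word characters are blanked out. The sketch below shows that tokenization with a plain frequency count. It is an assumption-laden stand-in: `simple_word_counts` is a hypothetical name, and the stemming and any filtering performed by word_hash_for_words are omitted because that method is not shown in this section.

```ruby
# Hypothetical stand-in for word_hash: same tokenization, but it only
# interns and counts tokens. No stemming or filtering is applied.
def simple_word_counts(text)
  words   = text.gsub(/[^\w\s]/, "").split # word tokens, punctuation removed
  symbols = text.gsub(/[\w]/, " ").split   # leftover runs of punctuation
  counts  = Hash.new(0)
  (words + symbols).each { |token| counts[token.intern] += 1 }
  counts
end

counts = simple_word_counts("Wow! Really? Wow!")
p counts[:Wow]         # => 2
p counts[:Really]      # => 1
p counts["!".intern]   # => 2
p counts["?".intern]   # => 1
```

Here the punctuation marks end up as keys of their own, which is the visible difference from the clean_word_hash tokenization.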