ClassifierReborn::Bayes

Public Class Methods

new(*categories)

The class can be created with one or more categories, each of which will be initialized and given a training method. E.g.,

b = ClassifierReborn::Bayes.new 'Interesting', 'Uninteresting', 'Spam'
# File lib/classifier-reborn/bayes.rb, line 12
def initialize(*categories)
  @categories = Hash.new
  categories.each { |category| @categories[category.prepare_category_name] = Hash.new }
  @total_words = 0
  @category_counts = Hash.new(0)
end

Public Instance Methods

add_category(category)

Allows you to add categories to the classifier. For example:

b.add_category "Not spam"

WARNING: Adding a category to an already trained classifier leaves that category undertrained, and it will tend to match more text than the well-trained, more selective categories. In short, define all your categories when you create the classifier.

# File lib/classifier-reborn/bayes.rb, line 122
def add_category(category)
  @categories[category.prepare_category_name] = Hash.new
end
Also aliased as: append_category
append_category(category)
Alias for: add_category
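
To see why a freshly added category tends to win, here is a minimal standalone sketch of the per-word term used in the classifications source listed on this page. The helper name word_term and the sample counts are hypothetical; the sketch only mirrors the `s/total.to_f` expression and the 0.1 fallback for unseen words.

```ruby
# Hypothetical helper mirroring the per-word term in Bayes#classifications:
# unseen words fall back to a pseudo-count of 0.1.
def word_term(category_words, word)
  total = category_words.values.inject(0) { |sum, n| sum + n }
  s = category_words.key?(word) ? category_words[word] : 0.1
  Math.log(s / total.to_f)
end

trained   = { "cheap" => 3, "pills" => 1 } # made-up counts for illustration
untrained = {}                             # a category that was just added

puts word_term(trained, "cheap")   # log(3/4), a small negative number
puts word_term(untrained, "cheap") # log(0.1 / 0.0), which is Infinity
```

With an empty word table the total is zero, so every word contributes an infinite term and the untrained category outranks all trained ones under this formula.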
classifications(text)

Returns the score of the provided text in each category. E.g.,

b.classifications "I hate bad words and you"
=>  {"Uninteresting"=>-12.6997928013932, "Interesting"=>-18.4206807439524}

The largest of these scores (the one closest to 0) is the one picked out by classify.

# File lib/classifier-reborn/bayes.rb, line 63
def classifications(text)
  score = Hash.new
  training_count = @category_counts.values.inject { |x,y| x+y }.to_f
  @categories.each do |category, category_words|
    score[category.to_s] = 0
    total = category_words.values.inject(0) {|sum, element| sum+element}
    Hasher.word_hash(text).each do |word, count|
      s = category_words.has_key?(word) ? category_words[word] : 0.1
      score[category.to_s] += Math.log(s/total.to_f)
    end
    # now add prior probability for the category
    s = @category_counts.has_key?(category) ? @category_counts[category] : 0.1
    score[category.to_s] += Math.log(s / training_count)
  end
  return score
end
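
The scoring can be tried without the library. The following is a rough standalone sketch, not the library's API: log_scores is a hypothetical function, the counts are made up, and `words` stands in for the keys of Hasher.word_hash(text), whose tokenization is not shown on this page.

```ruby
# Standalone sketch of the scoring in Bayes#classifications.
# categories:      { name => { word => count } }
# category_counts: { name => number of training documents }
# words:           distinct words of the text being scored (assumed input)
def log_scores(categories, category_counts, words)
  training_count = category_counts.values.sum.to_f
  categories.each_with_object({}) do |(category, category_words), scores|
    total = category_words.values.sum.to_f
    # Likelihood: one log term per distinct word, 0.1 for unseen words.
    likelihood = words.sum(0.0) do |word|
      Math.log(category_words.fetch(word, 0.1) / total)
    end
    # Prior: share of training documents that went to this category.
    prior = Math.log(category_counts.fetch(category, 0.1) / training_count)
    scores[category.to_s] = likelihood + prior
  end
end

categories = {
  Interesting:   { "ruby" => 4, "code" => 2 },
  Uninteresting: { "sale" => 5, "code" => 1 }
}
counts = { Interesting: 2, Uninteresting: 2 }

scores = log_scores(categories, counts, %w[ruby code])
puts scores.inspect # "Interesting" gets the score closer to 0
```

Working in log space turns the product of per-word probabilities into a sum, which avoids floating-point underflow on long texts.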
classify(text)

Returns the classification of the provided text, which is one of the categories given in the initializer. E.g.,

b.classify "I hate bad words and you"
=>  'Uninteresting'
# File lib/classifier-reborn/bayes.rb, line 84
def classify(text)
  (classifications(text).sort_by { |a| -a[1] })[0][0]
end
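
The selection step is just "take the entry with the highest score". A minimal sketch using the scores from the classifications example; best_category is a hypothetical helper, and max_by is used here in place of the sort in the listed source.

```ruby
# Hypothetical helper: pick the category whose score is highest,
# that is, the least negative log score.
def best_category(scores)
  scores.max_by { |_category, score| score }.first
end

scores = {
  "Uninteresting" => -12.6997928013932,
  "Interesting"   => -18.4206807439524
}
puts best_category(scores) # prints "Uninteresting"
```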
method_missing(name, *args)

Provides training and untraining methods for the categories specified in Bayes#new. For example:

b = ClassifierReborn::Bayes.new 'This', 'That', 'the_other'
b.train_this "This text"
b.train_that "That text"
b.untrain_that "That text"
b.train_the_other "The other text"
# File lib/classifier-reborn/bayes.rb, line 95
def method_missing(name, *args)
  category = name.to_s.gsub(/(un)?train_([\w]+)/, '\2').prepare_category_name
  if @categories.has_key? category
    args.each { |text| eval("#{$1}train(category, text)") }
  elsif name.to_s =~ /(un)?train_([\w]+)/
    raise StandardError, "No such category: #{category}"
  else
    super  #raise StandardError, "No such method: #{name}"
  end
end
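
How a dynamic method name is taken apart can be illustrated in isolation. The sketch below reuses the pattern from the source above; parse_training_call is a hypothetical helper and is not part of the library.

```ruby
# Hypothetical helper: split a dynamic method name into the action and the
# raw category part, using the same pattern as Bayes#method_missing.
def parse_training_call(name)
  match = name.to_s.match(/(un)?train_([\w]+)/)
  return nil unless match

  action = match[1] ? :untrain : :train
  [action, match[2]]
end

p parse_training_call(:train_this)        # [:train, "this"]
p parse_training_call(:untrain_the_other) # [:untrain, "the_other"]
p parse_training_call(:classify)          # nil
```

A pair like this could be dispatched with public_send(action, category, text), which avoids building a string for eval. That is a design alternative, not what the listed source does.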
train(category, text)

Provides a general training method for all categories specified in Bayes#new. For example:

b = ClassifierReborn::Bayes.new 'This', 'That', 'the_other'
b.train :this, "This text"
b.train "that", "That text"
b.train "The other", "The other text"
# File lib/classifier-reborn/bayes.rb, line 25
def train(category, text)
  category = category.prepare_category_name
  @category_counts[category] += 1
  Hasher.word_hash(text).each do |word, count|
    @categories[category][word] ||= 0
    @categories[category][word] += count
    @total_words += count
  end
end
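
The bookkeeping done by train can be sketched without the library. Everything below is illustrative: TinyTrainer, normalize_category and naive_word_hash are hypothetical names. The normalization rule is an assumption inferred from the example above (where :this, "that" and "The other" all resolve to existing categories), and the whitespace tokenizer is only a stand-in for Hasher.word_hash.

```ruby
# Assumption: names are normalized roughly like this (underscores to spaces,
# capitalized, symbolized). The real prepare_category_name is not shown here.
def normalize_category(name)
  name.to_s.tr("_", " ").capitalize.to_sym
end

# Assumption: a naive tokenizer standing in for Hasher.word_hash.
def naive_word_hash(text)
  text.downcase.split.each_with_object(Hash.new(0)) { |w, h| h[w] += 1 }
end

class TinyTrainer
  attr_reader :categories, :category_counts, :total_words

  def initialize(*names)
    @categories = {}
    names.each { |name| @categories[normalize_category(name)] = {} }
    @category_counts = Hash.new(0) # documents seen per category
    @total_words = 0               # words seen across all categories
  end

  def train(category, text)
    category = normalize_category(category)
    @category_counts[category] += 1
    naive_word_hash(text).each do |word, count|
      @categories[category][word] = @categories[category].fetch(word, 0) + count
      @total_words += count
    end
  end
end

t = TinyTrainer.new 'This', 'That', 'the_other'
t.train :this, "This text"
t.train "The other", "The other text"
p t.categories[:"The other"] # one count each for "the", "other", "text"
p t.total_words              # 5
```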
untrain(category, text)

Provides an untraining method for all categories specified in Bayes#new. Be very careful with this method.

For example:

b = ClassifierReborn::Bayes.new 'This', 'That', 'the_other'
b.train :this, "This text"
b.untrain :this, "This text"
# File lib/classifier-reborn/bayes.rb, line 42
def untrain(category, text)
  category = category.prepare_category_name
  @category_counts[category] -= 1
  Hasher.word_hash(text).each do |word, count|
    if @total_words >= 0
      orig = @categories[category][word] || 0
      @categories[category][word] ||= 0
      @categories[category][word] -= count
      if @categories[category][word] <= 0
        @categories[category].delete(word)
        count = orig
      end
      @total_words -= count
    end
  end
end
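
The word-count handling in untrain can be examined on a plain hash. This is a standalone sketch of the inner loop only; untrain_words is a hypothetical helper and the counts are made up.

```ruby
# Hypothetical standalone version of the inner loop of Bayes#untrain.
# Mutates category_words and returns the updated total word count.
def untrain_words(category_words, word_counts, total_words)
  word_counts.each do |word, count|
    next unless total_words >= 0

    orig = category_words[word] || 0
    remaining = orig - count
    if remaining <= 0
      category_words.delete(word)
      count = orig # subtract only what was actually stored
    else
      category_words[word] = remaining
    end
    total_words -= count
  end
  total_words
end

words = { "text" => 2, "this" => 1 }
total = untrain_words(words, { "text" => 1, "never_seen" => 3 }, 3)
p words # "text" drops to 1, "this" is untouched
p total # 2
```

Word counts are clamped, so removing a word that was never stored costs nothing. The listed source, however, decrements the category's document count unconditionally, so untraining text that was never trained still shifts that category's prior. That is one reason to use the method with care.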
