
Ai4r::Classifiers::NaiveBayes

= Introduction

This is a general-purpose implementation of a Naive Bayes classifier, without any
specialisation (e.g. for text classification).
Probabilities P(a_i | v_j) are estimated using m-estimates, hence the m parameter,
which can be set with set_parameters before calling build.
The estimation looks like this:

(n_c + mp) / (n + m)

The variables are:
* n = the number of training examples for which v = v_j
* n_c = the number of examples for which v = v_j and a = a_i
* p = the a priori estimate for P(a_i | v_j)
* m = the equivalent sample size
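The m-estimate above can be sketched as a small Ruby method. This is a standalone illustration, not the library's implementation: the method name m_estimate is hypothetical, and the prior p is assumed to be uniform (1/k for an attribute with k possible values), which is a common choice but is not stated in this documentation.

```ruby
# Hypothetical helper illustrating the m-estimate (n_c + m*p) / (n + m).
#   n_c: training examples with class v_j AND attribute value a_i
#   n:   training examples with class v_j
#   p:   prior estimate for P(a_i | v_j) (assumed uniform below)
#   m:   equivalent sample size
def m_estimate(n_c, n, p, m)
  (n_c + m * p) / (n + m).to_f
end

# Example: 5 examples of class v_j, 2 of which have value a_i.
# The attribute is assumed to have 3 possible values, so p = 1/3.
p_prior = 1.0 / 3
puts m_estimate(2, 5, p_prior, 0) # plain relative frequency 2/5
puts m_estimate(2, 5, p_prior, 3) # pulled towards the prior
puts m_estimate(0, 5, p_prior, 3) # unseen value still gets a non-zero estimate
```

With m = 0 the estimate reduces to the plain relative frequency n_c / n, which is 0 for any value never seen with class v_j; a positive m pulls the estimate towards p, so such values keep a small non-zero probability.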

The conditional probabilities are stored in a nested array named @pcp, indexed as:
@pcp[attributes][values][classes]
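To picture the [attribute][value][class] layout, the sketch below builds a nested array of counts from a toy training set, mapping classes and attribute values to integer indices (roughly what the constructor comments describe for @klass_index, @values and @pcc). It is an illustration under assumptions: the toy rows and the names klass_index, value_index and counts are hypothetical, and the library may fill its own arrays differently.

```ruby
# Toy training rows (hypothetical): attribute values first, class label last.
rows = [
  ["Red",    "Sports", "Yes"],
  ["Red",    "SUV",    "No"],
  ["Yellow", "Sports", "Yes"],
  ["Yellow", "SUV",    "No"]
]
attr_count = rows.first.length - 1

# Class label => integer index.
klass_index = rows.map(&:last).uniq.each_with_index.to_h

# For each attribute, value => integer index.
value_index = (0...attr_count).map do |a|
  rows.map { |r| r[a] }.uniq.each_with_index.to_h
end

# counts[a][v][k] = number of rows with value v for attribute a and class k.
counts = (0...attr_count).map do |a|
  Array.new(value_index[a].size) { Array.new(klass_index.size, 0) }
end
rows.each do |r|
  k = klass_index[r.last]
  attr_count.times { |a| counts[a][value_index[a][r[a]]][k] += 1 }
end

p counts
```

Applying the m-estimate to each cell, with n taken as the number of training rows of class k, turns such a count array into conditional probabilities with the same [attribute][value][class] indexing.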

This kind of estimator is useful when the training data set is relatively small.
If the data set is large enough, set m to 0, which is also the default value.

For further details on Bayes' theorem and the Naive Bayes classifier, see:
http://en.wikipedia.org/wiki/Naive_Bayesian_classification
http://en.wikipedia.org/wiki/Bayes%27_theorem

= Parameters

* :m => Optional. Defaults to 0. It may be set to a value greater than 0 when
  the training data set is relatively small.

= How to use it

  require 'ai4r'
  include Ai4r::Classifiers
  include Ai4r::Data

  data = DataSet.new.load_csv_with_labels "bayes_data.csv"
  b = NaiveBayes.new.
    set_parameters({:m=>3}).
    build data
  b.eval(["Red", "SUV", "Domestic"])

Public Class Methods

new()
# File lib/ai4r/classifiers/naive_bayes.rb, line 62
def initialize
  @m = 0
  @class_counts = []
  @class_prob = [] # stores the probability of the classes
  @pcc = [] # stores the number of instances divided into attribute/value/class
  @pcp = [] # stores the conditional probabilities of the values of an attribute
  @klass_index = {} # hash for quick lookup of all the classes used and their indices
  @values = {} # hashmap for quick lookup of all the values
end

Public Instance Methods

build(data)

Counts the values of the attribute instances and calculates the probability of each class and the conditional probabilities. The parameter data must be an instance of DataSet. Returns self, so calls can be chained.

# File lib/ai4r/classifiers/naive_bayes.rb, line 103
def build(data)
  raise "Error instance must be passed" unless data.is_a?(DataSet)
  raise "Data should not be empty" if data.data_items.length == 0

  initialize_domain_data(data)
  initialize_klass_index
  initialize_pc
  calculate_probabilities

  return self
end
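
Besides the conditional probabilities, build computes a probability for each class (stored in @class_prob). The sketch below assumes this prior is the relative frequency of each class in the training data, which is the usual estimate; the method name class_priors and the labels are hypothetical.

```ruby
# Hypothetical sketch: estimate each class prior P(v_j) as the fraction
# of training examples labelled with that class.
def class_priors(labels)
  total = labels.length.to_f
  counts = labels.each_with_object(Hash.new(0)) { |label, h| h[label] += 1 }
  counts.transform_values { |count| count / total }
end

labels = ["Yes", "No", "No", "Yes", "No"]
p class_priors(labels)
```
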
eval(data)

Evaluates new data and returns its predicted class, e.g.:

b.eval(["Red", "SUV", "Domestic"])
  => 'No'
# File lib/ai4r/classifiers/naive_bayes.rb, line 76
def eval(data)
  prob = @class_prob.map {|cp| cp}
  prob = calculate_class_probabilities_for_entry(data, prob)
  index_to_klass(prob.index(prob.max))
end
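
eval returns the class with the highest score (prob.index(prob.max)). Below is a minimal sketch of the usual naive Bayes decision rule, assuming each class's score is its prior P(v_j) multiplied by the product of P(a_i | v_j) over the entry's attribute values. It uses plain hashes instead of the library's internal arrays; the method name naive_bayes_predict and all probabilities are made up, and every value in the entry is assumed to have been seen during training.

```ruby
# Hypothetical sketch of the naive Bayes decision rule:
#   predicted class = argmax over v_j of P(v_j) * product_i P(a_i | v_j)
# priors: { class => P(class) }
# cond:   cond[attribute_index][value][class] = P(value | class)
def naive_bayes_predict(entry, priors, cond)
  scores = priors.map do |klass, prior|
    score = prior
    entry.each_with_index { |value, a| score *= cond[a][value][klass] }
    [klass, score]
  end
  scores.max_by { |_klass, score| score }.first
end

# Made-up probabilities for two attributes (colour, type).
priors = { "Yes" => 0.5, "No" => 0.5 }
cond = [
  { "Red" => { "Yes" => 0.6, "No" => 0.4 }, "Yellow" => { "Yes" => 0.4, "No" => 0.6 } },
  { "Sports" => { "Yes" => 0.8, "No" => 0.2 }, "SUV" => { "Yes" => 0.2, "No" => 0.8 } }
]
puts naive_bayes_predict(["Red", "SUV"], priors, cond)
```

Multiplying many small probabilities can underflow when there are many attributes; summing logarithms instead is a common alternative that selects the same class.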
get_probability_map(data)

Calculates the class probabilities for the entry data, which must be an array with the same number of attributes as the training data (i.e. without the class column). Returns a hash containing every class as a key: {Class_1 => probability_1, Class_2 => probability_2, ...}. Each probability is a Float <= 1, e.g.:

b.get_probability_map(["Red", "SUV", "Domestic"])
  => {"Yes"=>0.4166666666666667, "No"=>0.5833333333333334}
# File lib/ai4r/classifiers/naive_bayes.rb, line 91
def get_probability_map(data)
  prob = @class_prob.map {|cp| cp}
  prob = calculate_class_probabilities_for_entry(data, prob)
  prob = normalize_class_probability prob
  probability_map = {}
  prob.each_with_index { |p, i| probability_map[index_to_klass(i)] = p }
  return probability_map
end
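
Unlike eval, get_probability_map passes the class scores through normalize_class_probability. The sketch below assumes this normalization divides each score by the sum of all scores (consistent with the example values above, which add up to 1) and leaves all-zero scores unchanged; the method name normalize_scores and the input scores are hypothetical.

```ruby
# Hypothetical sketch: turn unnormalized class scores into a probability map
# by dividing each score by their total. All-zero scores are returned as-is.
def normalize_scores(scores)
  total = scores.values.sum
  return scores if total.zero?
  scores.transform_values { |s| s / total }
end

scores = { "Yes" => 0.06, "No" => 0.16 }
p normalize_scores(scores)
```
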


Generated with the Darkfish Rdoc Generator 2.