Ai4r::Classifiers::ID3

Introduction

This is an implementation of the ID3 algorithm (Quinlan). Given a set of preclassified examples, it builds a decision tree by top-down induction, choosing at each node the attribute with the highest information gain, a measure based on entropy.
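The split criterion can be sketched in plain Ruby. This is a minimal illustration, not the ai4r internals: entropy and information_gain are hypothetical helper names, and the items are the example data used below, with the class in the last column. The sketch computes the gain of each attribute and picks the highest, which is the attribute ID3 would place at the root.

```ruby
# Hypothetical helpers illustrating ID3's split criterion (not part of ai4r).
LABELS = ['city', 'age_range', 'gender', 'marketing_target']
ITEMS = [
  ['New York', '<30',     'M', 'Y'], ['Chicago',  '<30',     'M', 'Y'],
  ['Chicago',  '<30',     'F', 'Y'], ['New York', '<30',     'M', 'Y'],
  ['New York', '<30',     'M', 'Y'], ['Chicago',  '[30-50)', 'M', 'Y'],
  ['New York', '[30-50)', 'F', 'N'], ['Chicago',  '[30-50)', 'F', 'Y'],
  ['New York', '[30-50)', 'F', 'N'], ['Chicago',  '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'], ['New York', '[50-80]', 'M', 'N'],
  ['Chicago',  '[50-80]', 'M', 'N'], ['New York', '[50-80]', 'F', 'N'],
  ['Chicago',  '>80',     'F', 'Y']
]

# Shannon entropy, in bits, of a list of class values.
def entropy(classes)
  total = classes.size.to_f
  classes.tally.values.sum do |count|
    prob = count / total
    -prob * Math.log2(prob)
  end
end

# Entropy of the whole set minus the size-weighted entropy of the
# subsets produced by splitting on the attribute at attr_index.
def information_gain(items, attr_index)
  remainder = items.group_by { |item| item[attr_index] }.values.sum do |subset|
    (subset.size.to_f / items.size) * entropy(subset.map(&:last))
  end
  entropy(items.map(&:last)) - remainder
end

# Gain for every attribute except the class column.
gains = LABELS[0..-2].each_with_index.map { |label, i| [label, information_gain(ITEMS, i)] }.to_h
best = gains.max_by { |_, gain| gain }.first
gains.each { |label, gain| puts format('%-10s %.3f', label, gain) }
puts "split on: #{best}"
```

With this data, age_range wins with a gain of about 0.73 bits, which matches the root of the rules returned by get_rules below.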

How to use it

require 'ai4r'

DATA_LABELS = ['city', 'age_range', 'gender', 'marketing_target']

DATA_ITEMS = [
  ['New York', '<30',     'M', 'Y'],
  ['Chicago',  '<30',     'M', 'Y'],
  ['Chicago',  '<30',     'F', 'Y'],
  ['New York', '<30',     'M', 'Y'],
  ['New York', '<30',     'M', 'Y'],
  ['Chicago',  '[30-50)', 'M', 'Y'],
  ['New York', '[30-50)', 'F', 'N'],
  ['Chicago',  '[30-50)', 'F', 'Y'],
  ['New York', '[30-50)', 'F', 'N'],
  ['Chicago',  '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'],
  ['New York', '[50-80]', 'M', 'N'],
  ['Chicago',  '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'],
  ['Chicago',  '>80',     'F', 'Y']
]

data_set = Ai4r::Data::DataSet.new(:data_items => DATA_ITEMS, :data_labels => DATA_LABELS)
id3 = Ai4r::Classifiers::ID3.new.build(data_set)

id3.get_rules
  # => if age_range=='<30' then marketing_target='Y'
  #    elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
  #    elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
  #    elsif age_range=='[50-80]' then marketing_target='N'
  #    elsif age_range=='>80' then marketing_target='Y'
  #    else raise 'There was not enough information during training to do a proper induction for this data element' end

id3.eval(['New York', '<30', 'M'])
  # =>  'Y'

A better way to load the data

In real life you will use many more training examples, with more attributes. Consider moving your data to an external CSV (comma-separated values) file.

data_file = "#{File.dirname(__FILE__)}/data_set.csv"
data_set = Ai4r::Data::DataSet.new.load_csv_with_labels(data_file)
id3 = Ai4r::Classifiers::ID3.new.build(data_set)
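As its name suggests, load_csv_with_labels presumably reads the first row of the file as the data labels; as with build, the last column is treated as the class. A sketch that writes a few rows of the example data in that layout using only Ruby's standard csv library (the temporary file location is for illustration only):

```ruby
require 'csv'
require 'tmpdir'

labels = ['city', 'age_range', 'gender', 'marketing_target']
items = [
  ['New York', '<30',     'M', 'Y'],
  ['Chicago',  '[30-50)', 'F', 'Y'],
  ['New York', '[50-80]', 'F', 'N']
]

# Header row first (assumed to become the data labels), then one row per
# example, with the class in the last column.
data_file = File.join(Dir.tmpdir, 'data_set.csv')
CSV.open(data_file, 'w') do |csv|
  csv << labels
  items.each { |item| csv << item }
end

puts File.read(data_file)
```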

A nice tip for data evaluation

id3 = Ai4r::Classifiers::ID3.new.build(data_set)

age_range = '<30'
city = 'New York'
gender = 'M'
marketing_target = nil
eval id3.get_rules
puts marketing_target
  # prints Y
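This tip works because eval runs the rules in the caller's binding: every attribute the rules may test (city, age_range, gender) must already be a local variable, and marketing_target must exist before the eval, since a local first created inside eval is not visible afterwards. A self-contained sketch that wraps this in a method; the rules string is copied from the get_rules output shown above, and classify is a hypothetical name:

```ruby
# Rules in the format returned by get_rules (copied from the example above).
RULES = <<~'RUBY'
  if age_range=='<30' then marketing_target='Y'
  elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
  elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
  elsif age_range=='[50-80]' then marketing_target='N'
  elsif age_range=='>80' then marketing_target='Y'
  else raise 'There was not enough information during training to do a proper induction for this data element' end
RUBY

# Hypothetical helper: the keyword arguments are local variables, so the
# evaluated rules can read them and assign the pre-declared marketing_target.
def classify(rules, city:, age_range:, gender:)
  marketing_target = nil
  eval(rules)
  marketing_target
end

puts classify(RULES, city: 'Chicago', age_range: '[30-50)', gender: 'F')  # prints Y
```

An attribute value never seen during training falls through to the else branch and raises a RuntimeError.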

More about ID3 and decision trees

About the project

Author

Sergio Fierens

License

MPL 1.1

Url

ai4r.rubyforge.org/

Constants

LOG2

Attributes

data_set[R]

Public Instance Methods

build(data_set)

Builds (trains) the ID3 classifier from the given DataSet instance, which must not be empty, and returns the classifier itself. The last attribute of each item is taken as the item's class.

# File lib/ai4r/classifiers/id3.rb, line 99
def build(data_set)
  data_set.check_not_empty
  @data_set = data_set
  preprocess_data(@data_set.data_items)
  return self
end
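build delegates the actual tree construction to preprocess_data, whose source is not shown here. As an illustration only, a recursive ID3 induction over plain arrays might look like the sketch below. The nested [attribute_index, { value => subtree }] tree layout and the names induce and classify are assumptions, not ai4r's own node classes.

```ruby
# Illustrative ID3 induction; names and tree layout are hypothetical,
# not ai4r internals. Each item is an array whose last element is the class.

# Shannon entropy, in bits, of a list of class values.
def entropy(classes)
  total = classes.size.to_f
  classes.tally.values.sum { |count| -(count / total) * Math.log2(count / total) }
end

# Information gain of splitting items on the attribute at index.
def gain(items, index)
  remainder = items.group_by { |item| item[index] }.values.sum do |subset|
    (subset.size.to_f / items.size) * entropy(subset.map(&:last))
  end
  entropy(items.map(&:last)) - remainder
end

# Returns a class value (leaf) or [attribute_index, { value => subtree }].
def induce(items, attr_indexes)
  classes = items.map(&:last)
  return classes.first if classes.uniq.size == 1
  # No attributes left to split on: fall back to the majority class.
  return classes.tally.max_by { |_, count| count }.first if attr_indexes.empty?

  best = attr_indexes.max_by { |i| gain(items, i) }
  branches = items.group_by { |item| item[best] }.transform_values do |subset|
    induce(subset, attr_indexes - [best])
  end
  [best, branches]
end

# Follows branches down to a leaf; raises KeyError for a value unseen in training.
def classify(tree, data)
  tree = tree[1].fetch(data[tree[0]]) while tree.is_a?(Array)
  tree
end

items = [
  ['New York', '<30',     'M', 'Y'], ['Chicago',  '<30',     'M', 'Y'],
  ['Chicago',  '<30',     'F', 'Y'], ['New York', '<30',     'M', 'Y'],
  ['New York', '<30',     'M', 'Y'], ['Chicago',  '[30-50)', 'M', 'Y'],
  ['New York', '[30-50)', 'F', 'N'], ['Chicago',  '[30-50)', 'F', 'Y'],
  ['New York', '[30-50)', 'F', 'N'], ['Chicago',  '[50-80]', 'M', 'N'],
  ['New York', '[50-80]', 'F', 'N'], ['New York', '[50-80]', 'M', 'N'],
  ['Chicago',  '[50-80]', 'M', 'N'], ['New York', '[50-80]', 'F', 'N'],
  ['Chicago',  '>80',     'F', 'Y']
]

tree = induce(items, [0, 1, 2])
puts classify(tree, ['New York', '<30', 'M'])  # prints Y
```

Each recursive call removes the chosen attribute, so the recursion always terminates; the majority-class fallback only matters when identical attribute values carry different classes.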

eval(data)

Evaluates new data, predicting its category. It returns nil if the classifier has not been built. For example:

id3.eval(['New York', '<30', 'F'])  # => 'Y'

# File lib/ai4r/classifiers/id3.rb, line 109
def eval(data)
  @tree.value(data) if @tree
end

get_rules()

Returns the generated rules as a string of Ruby code. For example:

id3.get_rules
  # => if age_range=='<30' then marketing_target='Y'
  #    elsif age_range=='[30-50)' and city=='Chicago' then marketing_target='Y'
  #    elsif age_range=='[30-50)' and city=='New York' then marketing_target='N'
  #    elsif age_range=='[50-80]' then marketing_target='N'
  #    elsif age_range=='>80' then marketing_target='Y'
  #    else raise 'There was not enough information during training to do a proper induction for this data element' end

This is a convenient way to inspect the induction results, and also to execute them:

age_range = '<30'
city = 'New York'
gender = 'M'
marketing_target = nil
eval id3.get_rules
puts marketing_target
  # prints Y

# File lib/ai4r/classifiers/id3.rb, line 130
def get_rules
  #return "Empty ID3 tree" if !@tree
  rules = @tree.get_rules
  rules = rules.collect do |rule|
      "#{rule[0..-2].join(' and ')} then #{rule.last}"
  end
  return "if #{rules.join("\nelsif ")}\nelse raise 'There was not enough information during training to do a proper induction for this data element' end"
end
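As the code shows, each rule returned by @tree.get_rules is an array of condition strings followed by the class assignment; get_rules joins the conditions with "and" and chains the rules with "elsif". A sketch of producing such arrays by walking every root-to-leaf path of a tree. The nested [attribute_index, { value => subtree }] layout, the hand-written tree, and rule_paths are hypothetical, not ai4r's node classes.

```ruby
LABELS = ['city', 'age_range', 'gender', 'marketing_target']

# Hand-written tree (hypothetical layout) matching the example rules.
TREE = [1, {
  '<30'     => 'Y',
  '[30-50)' => [0, { 'Chicago' => 'Y', 'New York' => 'N' }],
  '[50-80]' => 'N',
  '>80'     => 'Y'
}]

# Collects one rule per leaf: the conditions along the path, then the assignment.
def rule_paths(tree, labels, conditions = [])
  return [conditions + ["#{labels.last}='#{tree}'"]] unless tree.is_a?(Array)

  index, branches = tree
  branches.flat_map do |value, subtree|
    rule_paths(subtree, labels, conditions + ["#{labels[index]}=='#{value}'"])
  end
end

# Same string assembly as the get_rules method above.
rules = rule_paths(TREE, LABELS).map { |rule| "#{rule[0..-2].join(' and ')} then #{rule.last}" }
code = "if #{rules.join("\nelsif ")}\nelse raise 'There was not enough information during training to do a proper induction for this data element' end"
puts code
```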

Generated with the Darkfish Rdoc Generator 2.