Parent

Files

RightAws::EmrInterface

RightAWS::EmrInterface -- RightScale Amazon Elastic Map Reduce interface

The RightAws::EmrInterface class provides a complete interface to Amazon Elastic Map Reduce service.

For explanations of the semantics of each call, please refer to Amazon's documentation at aws.amazon.com/documentation/elasticmapreduce/

Create an interface handle:

emr = RightAws::EmrInterface.new(aws_access_key_id, aws_secret_access_key)

Create a job flow:

emr.run_job_flow(
  :name => 'job flow 1',
  :master_instance_type => 'm1.large',
  :slave_instance_type => 'm1.large',
  :instance_count => 5,
  :log_uri => 's3n://bucket/path/to/logs',
  :steps => [{
    :name => 'step 1',
    :jar => 's3n://bucket/path/to/code.jar',
    :main_class => 'com.foobar.emr.Step1',
    :args => ['arg', 'arg'],
  }]) #=> "j-9K18HM82Q0AE7"

Describe a job flow:

emr.describe_job_flows('j-9K18HM82Q0AE7') #=> {...}

Terminate a job flow:

emr.terminate_job_flows('j-9K18HM82Q0AE7') #=> true

Public Class Methods

bench_service() click to toggle source
# File lib/emr/right_emr_interface.rb, line 76
def self.bench_service
  @@bench.service
end
bench_xml() click to toggle source
# File lib/emr/right_emr_interface.rb, line 73
def self.bench_xml
  @@bench.xml
end
new(aws_access_key_id=nil, aws_secret_access_key=nil, params={}) click to toggle source

Create a new handle to a EMR service.

All handles share the same per process or per thread HTTP connection to EMR. Each handle is for a specific account. The params have the following options:

  • :endpoint_url a fully qualified url to Amazon API endpoint (this overwrites: :server, :port, :service, :protocol). Example: 'elasticmapreduce.amazonaws.com'

  • :server: EMR service host, default: DEFAULT_HOST

  • :port: EMR service port, default: DEFAULT_PORT

  • :protocol: 'http' or 'https', default: DEFAULT_PROTOCOL

  • :logger: for log messages, default: RAILS_DEFAULT_LOGGER else STDOUT

emr = RightAws::EmrInterface.new('xxxxxxxxxxxxxxxxxxxxx','xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
  {:logger => Logger.new('/tmp/x.log')}) #=> #<RightAws::EmrInterface::0xb7b3c30c>
# File lib/emr/right_emr_interface.rb, line 97
def initialize(aws_access_key_id=nil, aws_secret_access_key=nil, params={})
  init({ :name                => 'EMR',
         :default_host        => ENV['EMR_URL'] ? URI.parse(ENV['EMR_URL']).host   : DEFAULT_HOST,
         :default_port        => ENV['EMR_URL'] ? URI.parse(ENV['EMR_URL']).port   : DEFAULT_PORT,
         :default_service     => ENV['EMR_URL'] ? URI.parse(ENV['EMR_URL']).path   : DEFAULT_PATH,
         :default_protocol    => ENV['EMR_URL'] ? URI.parse(ENV['EMR_URL']).scheme : DEFAULT_PROTOCOL,
         :default_api_version => ENV['EMR_API_VERSION'] || API_VERSION },
       aws_access_key_id    || ENV['AWS_ACCESS_KEY_ID'] ,
       aws_secret_access_key|| ENV['AWS_SECRET_ACCESS_KEY'],
       params)
end

Public Instance Methods

add_instance_groups(job_flow_id, *instance_groups) click to toggle source

Adds instance groups to a running job flow.

Instance group configuration options are the same as the ones accepted by run_job_flow.

Only task instance groups may be added at runtime. Instance groups cannot be added to job flows that have only a master instance (i.e. 1 instance in total).

emr.add_instance_groups('j-2QE0KHA1LP4GS', {
  :bid_price => '0.1',
  :instance_count => '2',
  :instance_role => 'TASK',
  :instance_type => 'm1.small',
  :market => 'SPOT',
  :name => 'core group',
}) #=> true
# File lib/emr/right_emr_interface.rb, line 460
def add_instance_groups(job_flow_id, *instance_groups)
  request_hash = amazonize_instance_groups(instance_groups, 'InstanceGroups')
  request_hash['JobFlowId'] = job_flow_id
  link = generate_request("AddInstanceGroups", request_hash)
  request_info(link, AddInstanceGroupsParser.new(:logger => @logger))
rescue
  on_exception
end
add_job_flow_steps(job_flow_id, *steps) click to toggle source

Adds steps to a running job flow.

A maximum of 256 steps are allowed in a job flow. Steps can only be added to job flows that are starting, bootstrapping, running or waiting.

Step configuration options are the same as the ones accepted by run_job_flow.

emr.add_job_flow_steps('j-2QE0KHA1LP4GS', {
  :name => 'step 1',
  :jar => 's3n://bucket/path/to/code.jar',
  :main_class => 'com.foobar.emr.Step1',
  :args => ['arg', 'arg'],
  :properties => {
    'property' => 'value',
  },
  :action_on_failure => 'TERMINATE_JOB_FLOW',
}) #=> true
# File lib/emr/right_emr_interface.rb, line 429
def add_job_flow_steps(job_flow_id, *steps)
  request_hash = amazonize_steps(steps)
  request_hash['JobFlowId'] = job_flow_id
  link = generate_request("AddJobFlowSteps", request_hash)
  request_info(link, RightHttp2xxParser.new(:logger => @logger))
rescue
  on_exception
end
describe_job_flows(*job_flow_ids_and_options) click to toggle source

Returns a list of job flows that match all of supplied parameters.

Without parameters, returns job flows started in the last two weeks or running job flows started in the last two months.

Regardless of parameters, only jobs started in the last two months are returned.

# default list:
emr.describe_job_flows #=> [
  {:keep_job_flow_alive_when_no_steps=>false,
    :log_uri=>"s3n://bucket/path/to/logs",
    :master_instance_type=>"m1.small",
    :availability_zone=>"us-east-1d",
    :last_state_change_reason=>"Steps completed",
    :termination_protected=>false,
    :master_instance_id=>"i-1fe51278",
    :instance_count=>1,
    :ready_date_time=>"2011-08-31T18:58:58Z",
    :bootstrap_actions=>[],
    :master_public_dns_name=>"ec2-184-78-29-127.compute-1.amazonaws.com",
    :instance_groups=>
     [{:instance_request_count=>1,
       :last_state_change_reason=>"Job flow terminated",
       :instance_role=>"MASTER",
       :ready_date_time=>"2011-08-31T18:58:56Z",
       :instance_running_count=>0,
       :start_date_time=>"2011-08-31T18:58:19Z",
       :market=>"ON_DEMAND",
       :creation_date_time=>"2011-08-31T18:55:36Z",
       :name=>"master",
       :instance_group_id=>"ig-1D91GQR7A9H2K",
       :state=>"ENDED",
       :instance_type=>"m1.small",
       :end_date_time=>"2011-08-31T19:01:09Z"}],
    :start_date_time=>"2011-08-31T18:58:58Z",
    :steps=>
     [{:jar=>"s3n://bucket/path/to/code.jar",
       :main_class=>"com.foobar.emr.Step1",
       :start_date_time=>"2011-08-31T18:58:58Z",
       :properties=>{},
       :args=>[],
       :creation_date_time=>"2011-08-31T18:55:36Z",
       :action_on_failure=>"TERMINATE_JOB_FLOW",
       :name=>"step 1",
       :state=>"COMPLETED",
       :end_date_time=>"2011-08-31T19:00:34Z"}],
    :normalized_instance_hours=>1,
    :ami_version=>"1.0",
    :creation_date_time=>"2011-08-31T18:55:36Z",
    :name=>"jobflow 1",
    :hadoop_version=>"0.18",
    :job_flow_id=>"j-9K18HM82Q0AE7",
    :state=>"COMPLETED",
    :end_date_time=>"2011-08-31T19:01:09Z"}]

# describe specific job flows:
emr.describe_job_flows('j-9K18HM82Q0AE7', 'j-2QE0KHA1LP4GS') #=> [...]

# specify parameters:
emr.describe_job_flows(
  :created_after => Time.now - 86400,
  :created_before => Time.now - 3600,
  :job_flow_ids => ['j-9K18HM82Q0AE7', 'j-2QE0KHA1LP4GS'],
  :job_flow_states => ['RUNNING']
) #=> [...]

# combined job flow list and parameters syntax:
emr.describe_job_flows('j-9K18HM82Q0AE7', 'j-2QE0KHA1LP4GS',
  :job_flow_states => ['RUNNING']
) #=> [...]
# File lib/emr/right_emr_interface.rb, line 327
def describe_job_flows(*job_flow_ids_and_options)
  job_flow_ids, options = AwsUtils::split_items_and_params(job_flow_ids_and_options)
  # merge job flow ids passed in as arguments and in options
  unless job_flow_ids.empty?
    # do not modify passed in options
    options = options.dup
    if job_flow_ids_in_options = options[:job_flow_ids]
      # allow the same ids to be passed in either location;
      # remove duplicates
      options[:job_flow_ids] = (job_flow_ids_in_options + job_flow_ids).uniq
    else
      options[:job_flow_ids] = job_flow_ids
    end
  end
  request_hash = {}
  unless (job_flow_ids = options[:job_flow_ids]).right_blank?
    request_hash.update(amazonize_list("JobFlowIds.member", job_flow_ids))
  end
  unless (job_flow_states = options[:job_flow_states]).right_blank?
    request_hash = amazonize_list("JobFlowStates.member", job_flow_states)
  end
  request_hash['CreatedAfter'] = AwsUtils::utc_iso8601(options[:created_after]) unless options[:created_after].right_blank?
  request_hash['CreatedBefore'] = AwsUtils::utc_iso8601(options[:created_before]) unless options[:created_before].right_blank?
  link = generate_request("DescribeJobFlows", request_hash)
  request_cache_or_info(:describe_job_flows, link,  DescribeJobFlowsParser, @@bench, nil)
rescue
  on_exception
end
modify_instance_groups(*args) click to toggle source

Modifies instance groups.

The only modifiable parameter is instance count.

An instance group may only be modified when the job flow is running or waiting. Additionally, hadoop 0.20 is required to resize job flows.

# general syntax
emr.modify_instance_groups(
  {:instance_group_id => 'ig-P2OPM2L9ZQ4P', :instance_count => 5},
  {:instance_group_id => 'ig-J82ML0M94A7E', :instance_count => 1}
) #=> true

# shortcut syntax
emr.modify_instance_groups('ig-P2OPM2L9ZQ4P', 5) #=> true

Shortcut syntax supports modifying only one instance group at a time.

# File lib/emr/right_emr_interface.rb, line 492
def modify_instance_groups(*args)
  unless args.first.is_a?(Hash)
    if args.length != 2
      raise ArgumentError, "Must be given two arguments if arguments are not hashes"
    end
    args = [{:instance_group_id => args.first, :instance_count => args.last}]
  end
  request_hash = amazonize_list_with_key_mapping('InstanceGroups.member', MODIFY_INSTANCE_GROUP_KEY_MAPPINGS, args)
  link = generate_request("ModifyInstanceGroups", request_hash)
  request_info(link, RightHttp2xxParser.new(:logger => @logger))
rescue
  on_exception
end
run_job_flow(options={}) click to toggle source

Creates and starts running a new job flow.

The job flow will run the steps specified and terminate (unless keep alive option is set).

A maximum of 256 steps are allowed in a job flow.

At least the name, instance types, instance count and one step must be specified.

# simple usage:
emr.run_job_flow(
  :name => 'job flow 1',
  :master_instance_type => 'm1.large',
  :slave_instance_type => 'm1.large',
  :instance_count => 5,
  :log_uri => 's3n://bucket/path/to/logs',
  :steps => [{
    :name => 'step 1',
    :jar => 's3n://bucket/path/to/code.jar',
    :main_class => 'com.foobar.emr.Step1',
    :args => ['arg', 'arg'],
  }]) #=> "j-9K18HM82Q0AE7"

# advanced usage:
emr.run_job_flow(
  :name => 'job flow 1',
  :ec2_key_name => 'gsg-keypair',
  :hadoop_version => '0.20',
  :instance_groups => [{
    :bid_price => '0.1',
    :instance_count => '1',
    :instance_role => 'MASTER',
    :instance_type => 'm1.small',
    :market => 'SPOT',
    :name => 'master group',
  }, {
    :bid_price => '0.1',
    :instance_count => '2',
    :instance_role => 'CORE',
    :instance_type => 'm1.small',
    :market => 'SPOT',
    :name => 'core group',
  }, {
    :bid_price => '0.1',
    :instance_count => '2',
    :instance_role => 'TASK',
    :instance_type => 'm1.small',
    :market => 'SPOT',
    :name => 'task group',
  }],
  :keep_job_flow_alive_when_no_steps => true,
  :availability_zone => 'us-east-1a',
  :termination_protected => true,
  :log_uri => 's3n://bucket/path/to/logs',
  :steps => [{
    :name => 'step 1',
    :jar => 's3n://bucket/path/to/code.jar',
    :main_class => 'com.foobar.emr.Step1',
    :args => ['arg', 'arg'],
    :properties => {
      'property' => 'value',
    },
    :action_on_failure => 'TERMINATE_JOB_FLOW',
  }],
  :additional_info => '',
  :bootstrap_actions => [{
    :name => 'bootstrap action 1',
    :path => 's3n://bucket/path/to/bootstrap',
    :args => ['hello', 'world'],
  }],
) #=> "j-9K18HM82Q0AE7"
# File lib/emr/right_emr_interface.rb, line 244
def run_job_flow(options={})
  request_hash = amazonize_run_job_flow(options)
  request_hash.update(amazonize_bootstrap_actions(options[:bootstrap_actions]))
  request_hash.update(amazonize_instance_groups(options[:instance_groups]))
  request_hash.update(amazonize_steps(options[:steps]))
  link = generate_request("RunJobFlow", request_hash)
  request_info(link, RunJobFlowParser.new(:logger => @logger))
rescue
  on_exception
end
set_termination_protection(*job_flow_ids_and_options) click to toggle source

Locks a job flow so the EC2 instances in the cluster cannot be terminated by user intervention, an API call, or in the event of a job flow error. Cluster will still terminate upon successful completion of the job flow.

emr.set_termination_protection(
  'j-9K18HM82Q0AE7', 'j-2QE0KHA1LP4GS', :termination_protected => true
) #=> true

Protection can be enabled using the shortcut syntax:

emr.set_termination_protection('j-9K18HM82Q0AE7') #=> true
# File lib/emr/right_emr_interface.rb, line 380
def set_termination_protection(*job_flow_ids_and_options)
  job_flow_ids, options = AwsUtils::split_items_and_params(job_flow_ids_and_options)
  request_hash = amazonize_list('JobFlowIds.member', job_flow_ids)
  request_hash['TerminationProtected'] = case value = options[:termination_protected]
  when true
    'true'
  when false
    'false'
  when nil
    # if :termination_protected => nil was given, then unprotect;
    # if no :termination_protected option was given, protect
    if options.has_key?(:termination_protected)
      'false'
    else
      'true'
    end
  else
    # pass value through
    value
  end
  link = generate_request("SetTerminationProtection", request_hash)
  request_info(link, RightHttp2xxParser.new(:logger => @logger))
rescue
  on_exception
end
terminate_job_flows(*job_flow_ids) click to toggle source

Terminates specified job flows.

emr.terminate_job_flows('j-9K18HM82Q0AE7') #=> true
# File lib/emr/right_emr_interface.rb, line 360
def terminate_job_flows(*job_flow_ids)
  link = generate_request("TerminateJobFlows", amazonize_list('JobFlowIds.member', job_flow_ids))
  request_info(link, RightHttp2xxParser.new(:logger => @logger))
rescue
  on_exception
end

[Validate]

Generated with the Darkfish Rdoc Generator 2.