


MultiCluster Overview


This section describes the Platform LSF MultiCluster product ("MultiCluster") and its features and benefits.



Benefits of MultiCluster

Within an organization, sites may have separate, independently managed LSF clusters. When you have more than one cluster, it is desirable to allow the clusters to cooperate so that you gain the benefits of global load sharing.

MultiCluster enables a large organization to form multiple cooperating clusters of computers so that load sharing happens not only within clusters, but also among them.



Two MultiCluster Models

There are two different ways to share resources between clusters using MultiCluster. These models can be combined; for example, cluster1 can forward jobs to cluster2 using the job forwarding model while cluster2 borrows resources from cluster3 using the resource leasing model.

Job forwarding model

In this model, the cluster that is starving for resources sends jobs over to the cluster that has resources to spare. To work together, two clusters must set up compatible send-jobs and receive-jobs queues.

With this model, MultiCluster jobs are scheduled in two phases: the submission cluster selects a suitable remote receive-jobs queue and forwards the job to it; then the execution cluster selects a suitable host and dispatches the job to it. This method automatically favors local hosts; a MultiCluster send-jobs queue always attempts to find a suitable local host before considering a receive-jobs queue in another cluster.
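
For illustration only (the queue, host, and cluster names are placeholders rather than part of the test procedures later in this section), a send-jobs queue that keeps local hosts eligible might look like the following. Because local hosts are always considered first, jobs in this queue are forwarded to cluster2 only when no suitable local host is available:

    Begin Queue
    QUEUE_NAME  = hybridq
    SNDJOBS_TO  = receiveq@cluster2
    HOSTS       = all
    DESCRIPTION = Runs jobs on local hosts when possible, otherwise forwards them to receiveq in cluster2
    End Queue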

Resource leasing model

In this model, the cluster that is starving for resources takes resources away from the cluster that has resources to spare. To work together, the provider cluster must "export" resources to the consumer, and the consumer cluster must configure a queue to use those resources.

In this model, each cluster schedules work on a single system image, which includes both borrowed hosts and local hosts.
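
As a sketch of that single system image (host and cluster names are illustrative; the full lease setup is shown in Testing the Resource Leasing Model below), a consumer queue might mix local hosts with hosts borrowed from a provider:

    Begin Queue
    QUEUE_NAME  = mixedq
    HOSTS       = hostA all@cluster2
    DESCRIPTION = Schedules jobs on the local host hostA and on any hosts borrowed from cluster2
    End Queue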

Choosing a model

Consider your own goals and priorities when choosing the best resource-sharing model for your site. In general, the job forwarding model preserves each cluster's scheduling autonomy, because the execution cluster still chooses the host for each forwarded job, while the resource leasing model gives the consumer cluster direct scheduling control over the hosts it borrows.

Resizable jobs

Resizable jobs are not supported across MultiCluster clusters. In the job forwarding model, only bresize release is supported, and only from the execution cluster, as in the sketch below.
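
For example (a hedged sketch; the job ID, host name, and slot count are illustrative), releasing one slot on hostE from the execution cluster might look like this:

    % bresize release "1*hostE" 899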



Testing the Resource Leasing Model

The following instructions explain how to configure the lease model on two clusters. Cluster2 will be the resource provider; it will export hosts to cluster1.

  1. In the provider cluster, edit the LSF_TOP/conf/lsbatch/cluster_name/configdir/lsb.resources file to specify the hosts to be exported.

    For example, in cluster2, one job slot each on hostE and hostF will be exported and can be used by cluster1:

    Begin HostExport
    PER_HOST     = hostE hostF
    SLOTS        = 1
    DISTRIBUTION = ([cluster1, 100])
    End HostExport
    
  2. Reconfigure the cluster:
    % badmin reconfig
    
  3. Use the bclusters command to make sure the cluster is configured correctly.

    For example, in cluster2:

    % bclusters
    ...
    [Resource Lease Information]
    REMOTE_CLUSTER  RESOURCE_FLOW   STATUS
    cluster1        EXPORT          conn
    
  4. In the consumer cluster, edit the LSF_TOP/conf/lsbatch/cluster_name/configdir/lsb.queues file and add a queue that will use the hosts borrowed from the provider cluster as if they were local resources. For example, in cluster1:
    Begin Queue
    QUEUE_NAME  = ssimodel
    HOSTS       = all@cluster2
    DESCRIPTION = Jobs in this queue will use cluster2 hosts
    End Queue
    
  5. Reconfigure the cluster:
    % badmin reconfig
    
  6. Use the bclusters command to make sure the queue is configured correctly.

    For example, in cluster1:

    % bclusters
    ...
    [Resource Lease Information]
    REMOTE_CLUSTER  RESOURCE_FLOW   STATUS
    cluster2        IMPORT          conn
    
  7. Submit a job to the queue in cluster1. The job should run on a host borrowed from cluster2.

Example

Submit a job to the queue named ssimodel in cluster1:

% bsub -q ssimodel -R "type==any" sleep 500
Job <204> is submitted to queue <ssimodel>.

% bjobs
JOBID   USER    STAT  QUEUE     FROM_HOST   EXEC_HOST       JOB_NAME    SUBMIT_TIME
204     user1   RUN   ssimodel  hostA       hostE@cluster2  sleep 500   Nov 13 12:15

% bhosts
HOST_NAME        STATUS   JL/U   MAX  NJOBS   RUN  SSUSP  USUSP   RSV
hostE@cluster2   ok       -      1    1       1    0      0       0
hostA            ok       -      -    0       0    0      0       0

You can also view this job from cluster2, where it has a different job ID:

% bjobs
JOBID   USER    STAT  QUEUE              FROM_HOST       EXEC_HOST  JOB_NAME    SUBMIT_TIME
854     user1   RUN   ssimodel@cluster1  hostA@cluster1  hostE      sleep 500   Nov 13 12:15



Testing the Job Forwarding Model

The following instructions explain how to configure the job forwarding model on two clusters. Cluster2 will be the execution cluster; it will run jobs for cluster1.

  1. In the submission cluster, edit the LSF_TOP/conf/lsbatch/cluster_name/configdir/lsb.queues file and add a queue to send jobs to the execution cluster.

    For example, configure a queue called sendq in cluster1 that will send all jobs to execute in cluster2:

    Begin Queue
    QUEUE_NAME  = sendq
    SNDJOBS_TO  = receiveq@cluster2
    HOSTS       = none
    DESCRIPTION = Jobs submitted to this queue will be run in cluster2
    End Queue
    

    HOSTS = none specifies that this queue cannot place jobs on any local hosts.

  2. Reconfigure the cluster:
    % badmin reconfig
    
  3. In the execution cluster, edit the LSF_TOP/conf/lsbatch/cluster_name/configdir/lsb.queues file and add a queue to receive jobs sent from the submission cluster.

    For example, configure a queue called receiveq in cluster2 that will receive jobs from cluster1:

    Begin Queue
    QUEUE_NAME   = receiveq
    PRIORITY     = 40
    RCVJOBS_FROM = cluster1
    End Queue
    
  4. Reconfigure the cluster:
    % badmin reconfig
    
  5. Use the bclusters command to make sure the queues are configured correctly.

    For example, in cluster1:

    % bclusters
    LOCAL_QUEUE  JOB_FLOW  REMOTE    CLUSTER   STATUS
    sendq        send      receiveq  cluster2  ok
    

    For example, in cluster2:

    % bclusters
    LOCAL_QUEUE  JOB_FLOW  REMOTE    CLUSTER   STATUS
    receiveq     recv      -         cluster1  ok
    
  6. Submit a job to make sure the queues are configured correctly.

Example

Submit a job to cluster1 that will run in cluster2:

% bsub -q sendq -R "type==any" sleep 500
Job <103> is submitted to queue <sendq>.

% bjobs
JOBID   USER    STAT  QUEUE   FROM_HOST   EXEC_HOST       JOB_NAME    SUBMIT_TIME
103     user1   RUN   sendq   hostA       hostE@cluster2  sleep 500   Nov 13 11:44

You can also view this job from cluster2, where it has a new job ID:

% bjobs
JOBID   USER    STAT  QUEUE     FROM_HOST       EXEC_HOST  JOB_NAME    SUBMIT_TIME
899     user1   RUN   receiveq  hostA@cluster1  hostE      sleep 500   Nov 13 11:44





      Date Modified: March 13, 2009

Copyright © 1994-2009 Platform Computing Corporation. All rights reserved.