- About External Scheduler Plugin
- Writing an External Scheduler Plugin
- Building the External Scheduler Plugin
- Enabling and Using the External Scheduler Plugin
- Debugging the External Scheduling Plugin
About External Scheduler Plugin
The default scheduler plugin modules provided by LSF may not satisfy all the particular scheduling policies you need. You can use the LSF scheduler plugin API to customize existing scheduling policies or implement new ones that can operate with existing LSF scheduler plugin modules.
- Certain scheduling policies can be implemented based on the specific requirements of your site.
- Customized policies can be incorporated with other LSF features to provide seamless behavior. Your custom scheduling policy can influence, modify, or override LSF scheduling decisions.
- Your plugin can take advantage of the load and host information already maintained by LSF.
- The scheduler plugin architecture is fully external and modular; new scheduling policies can be prototyped and deployed without having to change the compiled code of LSF.
Sample plugin code
Sample code for an example external scheduler plugin, together with information about writing, building, and configuring your own custom scheduler plugin, is located in:
LSF_TOP/7/misc/examples/external_plugin/
Writing an External Scheduler Plugin
Scheduling policies can be applied in two phases of a scheduling cycle: the match phase and the allocation phase.
Match/sort phase
In the match phase, the scheduler prepares candidate hosts for jobs. All jobs with the same resource requirements share the same candidate hosts. At this phase, the plugin decides which hosts remain eligible for further consideration. A host that is not eligible for the job is removed from the candidate host list, and the plugin associates a pending reason with the removed host; the bjobs command displays this reason. Finally, the plugin can decide which candidate hosts should be considered first.
The plugin in this phase provides two functions:
- Filtering candidate hosts
- Ordering candidate hosts
Input and output of match phase
The input and output of this phase are candHostGroupList and PendingReasonTable. Candidate hosts are divided into several groups (candHostGroup), and a job can only use hosts from one candHostGroup in the candHostGroupList.
The plugin filters the candHostGroups in candHostGroupList, removes the ineligible hosts from the group, and sets the pending reason in the PendingReasonTable.
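The filtering step described above can be pictured with a small, self-contained sketch. Everything in it is a hypothetical stand-in invented for illustration: the CandHostGroup and PendingReasonTable structs, the filter_group() helper, and the reason ID value are not the declarations found in sched_api.h. A real plugin would work through the scheduler API (for example lsb_cand_removehost() and lsb_reason_set()), whose exact signatures should be taken from the man pages.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HOSTS 8
#define REASON_EXCLUDED 20002   /* hypothetical user-defined pending reason ID */

/* Hypothetical stand-in for one candidate host group. */
typedef struct {
    const char *hosts[MAX_HOSTS];
    size_t      count;
} CandHostGroup;

/* Hypothetical stand-in for the pending reason table:
 * one (host, reason) pair per removed host. */
typedef struct {
    const char *hosts[MAX_HOSTS];
    int         reasons[MAX_HOSTS];
    size_t      count;
} PendingReasonTable;

/* Remove every host named `excluded` from the group and record a
 * pending reason for it. Returns the number of hosts left. */
size_t filter_group(CandHostGroup *group, const char *excluded,
                    PendingReasonTable *table)
{
    size_t kept = 0;

    assert(group != NULL && excluded != NULL && table != NULL);
    for (size_t i = 0; i < group->count; i++) {
        const char *host = group->hosts[i];
        if (strcmp(host, excluded) != 0) {
            group->hosts[kept++] = host;          /* eligible: compact in place */
        } else if (table->count < MAX_HOSTS) {
            table->hosts[table->count] = host;    /* ineligible: record reason */
            table->reasons[table->count] = REASON_EXCLUDED;
            table->count++;
        }
    }
    group->count = kept;
    return kept;
}
```

The sketch keeps the two outputs of the phase separate: the group shrinks to the eligible hosts, while the table accumulates one reason per removed host.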
Plugin Invocation
Each plugin matches and sorts hosts according to a particular kind of resource requirement: it decides which hosts qualify, and which should be considered first, for that requirement. The scheduler organizes the Match() and Sort() functions into a handler for each resource requirement.
After the handler is created, the plugin only needs to register it with the scheduler framework. The scheduler framework is then responsible for calling each handler to match and sort hosts for its specific resource requirement.
When the plugin registers the handler, a resource criteria type is associated with the handler. The Criteria Type indicates which kind of resource requirement the handler is handling.
Handler functions
In addition to Match() and Sort(), there are two other handler functions:
- New(): Gets the user-specific resource requirement string, parses it, creates the handler-specific data, and attaches the data to the related resource requirement.
- Free(): Frees the handler-specific data when it is no longer needed.
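One way to picture a handler is as a record of four function pointers that is registered under a criteria type and later looked up by the framework. The sketch below is an illustration under stated assumptions only: the Handler struct, register_handler(), find_handler(), and the example_* functions are hypothetical names with invented signatures, and they do not reproduce the types or the lsb_resreq_registerhandler() call declared in sched_api.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler layout: one slot per handler function. */
typedef struct {
    void *(*new_fn)(const char *resreq);                        /* New()   */
    void  (*free_fn)(void *data);                               /* Free()  */
    int   (*match_fn)(void *data, const char *host);            /* 1 = eligible */
    int   (*sort_fn)(void *data, const char *a, const char *b); /* <0: a first  */
} Handler;

#define MAX_HANDLERS 4

/* Hypothetical registry standing in for the scheduler framework. */
static struct { int type; const Handler *handler; } registry[MAX_HANDLERS];
static int n_registered = 0;

/* Associate a handler with a criteria type. Returns 0 on success. */
int register_handler(int criteria_type, const Handler *handler)
{
    if (handler == NULL || n_registered == MAX_HANDLERS)
        return -1;
    registry[n_registered].type = criteria_type;
    registry[n_registered].handler = handler;
    n_registered++;
    return 0;
}

/* Look up the handler registered for a criteria type, or NULL. */
const Handler *find_handler(int criteria_type)
{
    for (int i = 0; i < n_registered; i++)
        if (registry[i].type == criteria_type)
            return registry[i].handler;
    return NULL;
}

/* Example handler: the handler-specific data is a copy of the string. */
static void *example_new(const char *resreq)
{
    char *copy;

    assert(resreq != NULL);
    copy = malloc(strlen(resreq) + 1);
    if (copy != NULL)
        strcpy(copy, resreq);
    return copy;
}

static void example_free(void *data) { free(data); }

/* Rejects the host whose name equals the stored string. */
static int example_match(void *data, const char *host)
{
    return strcmp((const char *)data, host) != 0;
}

/* Orders hosts alphabetically; the data is not needed here. */
static int example_sort(void *data, const char *a, const char *b)
{
    (void)data;
    return strcmp(a, b);
}

const Handler example_handler = {
    example_new, example_free, example_match, example_sort
};
```

In this sketch the plugin only builds example_handler and registers it; calling the four functions at the right time is left to whoever owns the registry, which mirrors the division of responsibility described above.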
See sched_api.h for details.
Implementing match phase
See sch.mod.matchexample.c for details.
Define the resource criteria type, the handler-specific data, and a user-specific pending reason.
The criteria type indicates the kind of resource requirement the handler is handling. Usually, an external plugin handler only handles an external resource requirement (a string), which is specified through the bsub command using the -extsched option.
In order to use -extsched, you must set LSF_ENABLE_EXTSCHEDULER=y in lsf.conf.
The New() function parses the external resource requirement string and stores the parsed result in the handler-specific data.
Handler-specific data is a container used to store any data needed by the handler.
If the plugin needs to set a user-specific pending reason, a pending reason ID must be defined. See lsb_reason_set() in sched_api.h for more information.
Implement the handler functions: New(), Free(), Match(), and Sort().
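- New(): Parse the external resource requirement string and store the result in the handler-specific data.
- Free(): Free everything held in the handler-specific data.
- Match(): Filter the candidate hosts (the handler-specific data is passed in).
- Sort(): Order the candidate hosts (the handler-specific data is passed in).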
Implement sched_init(). This function is the plugin initialization function, which is called when the plugin is loaded.
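As an illustration of the parsing work that New() and Free() perform, the following sketch turns a string of the form KEY=value, such as the EXAMPLE_MATCH_OPTIONS=goedel string passed with -extsched, into handler-specific data. The HandlerData struct and the handler_data_new() and handler_data_free() functions are hypothetical names chosen for this sketch; they are not taken from sch.mod.matchexample.c, and the real New() handler has whatever signature sched_api.h defines.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler-specific data: holds the parsed option value. */
typedef struct {
    char *option;
} HandlerData;

/* Parse "<key>=<value>" and return newly allocated handler data,
 * or NULL if the string does not match or memory runs out. */
HandlerData *handler_data_new(const char *extsched, const char *key)
{
    size_t key_len;
    const char *value;
    HandlerData *data;

    assert(key != NULL);          /* the key is a programmer-supplied constant */
    if (extsched == NULL)
        return NULL;

    key_len = strlen(key);
    if (strncmp(extsched, key, key_len) != 0 || extsched[key_len] != '=')
        return NULL;              /* not addressed to this handler */

    value = extsched + key_len + 1;
    if (*value == '\0')
        return NULL;              /* empty value is rejected */

    data = malloc(sizeof *data);
    if (data == NULL)
        return NULL;
    data->option = malloc(strlen(value) + 1);
    if (data->option == NULL) {
        free(data);
        return NULL;
    }
    strcpy(data->option, value);
    return data;
}

/* Release everything held in the handler-specific data. */
void handler_data_free(HandlerData *data)
{
    if (data != NULL) {
        free(data->option);
        free(data);
    }
}
```

Returning NULL for a string with a different key lets a handler ignore requirements meant for another plugin; whether the real framework expects that convention is an assumption of this sketch.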
Allocation phase
In the allocation phase, the scheduler makes allocation decisions for each job. It assigns host slots, memory, and other resources to the job. It also checks whether the allocation satisfies all constraints defined in the configuration, such as queue slot limits and job deadlines.
Your plugin at this phase can modify allocation decisions made by another LSF module.
- An external plugin is only allowed to change the host slot distribution: it can reduce or increase the slot usage on a host, or add more hosts to the allocation. Modifying the usage of other resources is not currently supported.
- An external plugin is not allowed to remove a host from an allocation.
- An external plugin cannot change a reservation in an allocation.
Input and output of allocation phase
job: The current job for which the allocation is being made.
candHostGroupList: See "Input and output of match phase".
pendingReasonTable: See "Input and output of match phase".
alloc: The LSF allocation decision is passed in; the plugin modifies it to make its own allocation decision on top of it.
Invocation
In the allocation phase, the plugin must provide a callback function, AllocatorFn(), which adjusts allocation decisions made by LSF. This function must be registered with the scheduler framework, which calls it after LSF makes a decision for the job.
In addition to AllocatorFn(), the plugin may also need to provide a New() function in the handler for the user-specific resource criteria, if there are any. If there is no such user-specific resource requirement, AllocatorFn() is applied to all jobs.
Implementing allocation phase
See sch.mod.allocexample.c for details.
Optional. Define a criteria type for external resource requirements.
Optional. Implement the New() function in the handler for the resource criteria type.
Implement the callback AllocatorFn():
- Check whether the allocation has the type SCH_MOD_DECISION_DISPATCH (lsb_alloc_type()). If not, return.
- Optional. Get the external message and decide whether to continue (lsb_job_getextresreq()).
- Get the current slot distribution in the allocation and the availability information for all candidate hosts (lsb_alloc_gethostslot()).
- Modify the allocation (lsb_alloc_modify()).
Apply lsb_alloc_modify() in small increments rather than in one large change, because lsb_alloc_modify() may return FALSE when the change conflicts with other scheduling policies, such as user slot limits on a host. In sch.mod.allocexample.c, slots are adjusted in small steps.
Implement sched_init(). This function is the plugin initialization function, which is called when the plugin is loaded.
- Optional. Create a handler for resource requirement processing, and register it to the scheduler framework (lsb_resreq_registerhandler()).
- Register the allocation callback AllocatorFn() (lsb_alloc_registerallocator()).
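The earlier advice to adjust slots in small steps inside AllocatorFn() can be sketched as follows. This is a simulation under stated assumptions, not the code from sch.mod.allocexample.c: the HostAlloc struct, try_modify(), and adjust_gradually() are hypothetical, and try_modify() merely imitates a modify call that is rejected when it conflicts with another policy (modeled here as a per-host slot limit).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of one host in an allocation. */
typedef struct {
    int slots;   /* slots currently allocated on the host */
    int limit;   /* stand-in for a conflicting policy, e.g. a slot limit */
} HostAlloc;

/* Hypothetical stand-in for a modify call: the whole change is
 * rejected, and nothing is applied, if it would break the limit. */
static bool try_modify(HostAlloc *alloc, int delta)
{
    int proposed = alloc->slots + delta;

    if (proposed < 0 || proposed > alloc->limit)
        return false;
    alloc->slots = proposed;
    return true;
}

/* Move toward `target` one slot at a time and stop at the first
 * rejected step. Returns the slot count actually reached. */
int adjust_gradually(HostAlloc *alloc, int target)
{
    assert(alloc != NULL);
    while (alloc->slots != target) {
        int step = (target > alloc->slots) ? 1 : -1;
        if (!try_modify(alloc, step))
            break;               /* keep what was achieved so far */
    }
    return alloc->slots;
}
```

With a limit of 4 and a starting point of 1 slot, a single request for 5 more slots would be rejected outright, whereas the stepwise loop settles at 4; that difference is the reason for preferring small steps.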
Building the External Scheduler Plugin
Set INCDIR and LIBDIR in the makefile to point to the appropriate directories for the LSF include files and libraries.
Create a Make.def for the platform on which you want to build the plugin. The Make.def should be located in the LSF_MISC directory, at the same level as Make.misc.
The Make.def templates for each platform are in the config directory. For example, if you want to run the examples on Solaris 2.6, use the following command to create Make.def:
ln -s config/Make.def.sparc-sol2 Make.def
You can also change the file, if necessary.
Run make in the current directory.
Enabling and Using the External Scheduler Plugin
Use sch.mod.matchexample.c as an example.
- Copy schmod_matchexample.so to LSF_LIBDIR (defined in lsf.conf).
- Configure the plugin in lsb.modules; add the following line after all LSF modules:
schmod_matchexample () ()
- Restart mbatchd:
badmin mbdrestart
- Use bsub to submit a job. If an external message is needed, use the -extsched option. For example:
bsub -n 2 -extsched "EXAMPLE_MATCH_OPTIONS=goedel" -R "type==any" sleep 1000
In order to use -extsched, you must set LSF_ENABLE_EXTSCHEDULER=y in lsf.conf.
- Use bjobs to look at the external message and the customized pending reason.
--------------------------------------------------------------------------
./bjobs -lp

Job <224>, User <yhu>, Project <default>, Status <PEND>, Queue <short>, Job Pri
                     ority <500>, Command <sleep 1000>
Thu Nov 29 15:08:05: Submitted from host <goedel> with hold, CWD <$HOME/LSF4_1/
                     utopia/lsbatch/cmd>, Requested Resources <type==any>;

 PENDING REASONS:
 Load information unavailable: pauli, varley, peano, bongo;
 Closed by LSF administrator: curie, togni;
 Customized pending reason number 20002: goedel;
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 SCHEDULING PARAMETERS:
           r15s   r1m  r15m   ut      pg    io   ls    it    tmp    swp    mem
 loadSched   -     -     -     -       -     -    -     -     -      -      -
 loadStop    -     -     -     -       -     -    -     -     -      -      -

           total_jobs  mbd_size
 loadSched     -          -
 loadStop      -          -

 EXTERNAL MESSAGES:
 MSG_ID FROM       POST_TIME      MESSAGE                        ATTACHMENT
 0       -             -              -                              -
 1      yhu        Nov 29 15:08   EXAMPLE_MATCH_OPTIONS=goedel       N
                                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
---------------------------------------------------------------------------
Scheduler API Reference Summary
See the following API man pages for details:
- AllocatorFn.3
- RsrcReqHandler_FreeFn.3
- RsrcReqHandler_MatchFn.3
- RsrcReqHandler_NewFn.3
- RsrcReqHandler_SortFn.3
- _RsrcReqHandlerType.3
- candHost.3
- candHostGroup.3
- hostSlot.3
- lsb_alloc_gethostslot.3
- lsb_alloc_modify.3
- lsb_alloc_registerallocator.3
- lsb_alloc_type.3
- lsb_cand_getavailslot.3
- lsb_cand_getnextgroup.3
- lsb_cand_removehost.3
- lsb_job_getaskedslot.3
- lsb_job_getextresreq.3
- lsb_job_getrsrcreqobject.3
- lsb_reason_set.3
- lsb_resreq_getextresreq.3
- lsb_resreq_registerhandler.3
- lsb_resreq_setobject.3
Debugging the External Scheduling Plugin
- The mbschd log file (mbschd.log.goedel in this example, where goedel is the host name) shows which plugins were loaded successfully. If loading fails, the error message is also logged.
- Use a debugger, such as gdb or dbx, to debug plugins. Attach it to mbschd and set breakpoints in the plugin functions.
Date Modified: March 13, 2009
Platform Computing: www.platform.com
Platform Support: support@platform.com
Platform Information Development: doc@platform.com
Copyright © 1994-2009 Platform Computing Corporation. All rights reserved.