Concepts and Structures
The ESA hardware influences both the function and structure of the TPF system. In a sense, the TPF system extends the function provided by the hardware.
IBM Enterprise Systems Architecture/390 (ESA/390) is the next evolutionary step in the IBM System/360, IBM System/370, IBM System/370-XA, and IBM Enterprise Systems Architecture/370 (ESA/370) lines. Concepts from all of these systems that apply in particular to the TPF system are described in this configuration section.
The term I-stream engine rather than the term CPU is used to emphasize that a single central processing unit (CPU) is only one component of a set of hardware comprising an ESA configuration. An I-stream engine interprets one sequential stream of instructions at a time and, within the context of IBM ESA/390, implies IBM ESA/390 instructions.
Evolutions in IBM large processor architecture have emphasized the use of multiple I-stream engines that share main storage and at least one channel subsystem to manage I/O. A channel subsystem can be characterized as a set of access paths to I/O devices. This is all packaged together and takes the place of what, in the past, was frequently but inaccurately called the CPU. Figure 7 emphasizes that main storage is shared among the CPUs (I-stream engines) and a channel subsystem.
The formal ESA architectural term for the structure shown in Figure 7 is configuration, denoted in this publication as ESA configuration. An ESA configuration implies one or more I-stream engines, shared main storage, and at least one channel subsystem, without regard for the devices that can be attached to the channel subsystem. This means that devices can be attached to several channel subsystems where each channel subsystem belongs to a different ESA configuration. So, a device can access different main storages over access paths unique to an ESA configuration. A single main storage is the principal attribute of an ESA configuration.
Figure 7. Logical Structure of an ESA Configuration with Two CPUs
Main storage is viewed as a long horizontal string of bits. The string of bits is subdivided into units of 8 bits, called bytes. Each byte location in storage is identified by a unique integer starting with zero (0), called an address. Addresses are either 24-bit or 31-bit integer values.
Three basic types of addresses are recognized for addressing main storage: absolute, real, and virtual.
Each segment table points to numerous page tables and each page table points to numerous pages. A page (4096 bytes) is the smallest unit of main storage for managing blocks of virtual storage.
A virtual address, in effect, is a code that tells the TPF system how to look up the absolute address in system tables. For example, a program processes a branch instruction. This branch instruction has an address as its target. This virtual address (the target address in the branch instruction) is presented to the decoding process and the absolute address is returned. When a virtual address is used by a CPU to access main storage, it is first converted by dynamic address translation (DAT) to a real address and then by prefixing, to an absolute address.
An address space is a range of virtual addresses. The addresses are usually contiguous, but they need not be. A page is 4096 bytes and is the minimum size of an address space. A program of fewer than 4096 bytes fits into a single page. All the addresses used in a program are set up assuming the program is loaded into main storage starting at location 0. In reality, it isn't, but this assumption makes decoding the virtual address somewhat easier. If the program size increases beyond 4096 bytes (the size of a single page), another page is allocated. It doesn't matter whether the newly allocated page is adjacent to the first page or a hundred pages away from it. The TPF system decodes the addresses in the same way.
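To make the decoding concrete, the following C sketch models a two-level table walk. It assumes the classic ESA/390 split of a 31-bit virtual address into an 11-bit segment index, an 8-bit page index, and a 12-bit byte index; the structure layouts and names are illustrative, not the actual hardware table formats.

    #include <stdint.h>

    /* Illustrative two-level dynamic address translation: segment table
     * entry -> page table entry -> 4 KB page frame. Invalid entries
     * model translation exceptions. */
    typedef struct {
        uint32_t frame;      /* real (pre-prefixing) address of the frame */
        int      valid;      /* 0 = page-translation exception            */
    } pte_t;

    typedef struct {
        pte_t   *page_table; /* 256 page table entries per segment        */
        int      valid;      /* 0 = segment-translation exception         */
    } ste_t;

    /* Returns the translated address, or -1 on a translation exception. */
    int64_t translate(const ste_t *segment_table, uint32_t vaddr)
    {
        uint32_t sx = (vaddr >> 20) & 0x7FF;  /* segment index (11 bits) */
        uint32_t px = (vaddr >> 12) & 0xFF;   /* page index    (8 bits)  */
        uint32_t bx = vaddr & 0xFFF;          /* byte index    (12 bits) */

        if (!segment_table[sx].valid)
            return -1;
        pte_t *pte = &segment_table[sx].page_table[px];
        if (!pte->valid)
            return -1;
        return (int64_t)pte->frame + bx;      /* frame origin + offset   */
    }

Because any 4 KB frame can back any page, a program larger than one page need not occupy adjacent frames, as noted above; the decoding works the same way regardless of where each page is placed.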
The translation of addresses is controlled by a bit in the program status word (PSW), the dynamic address translation (DAT) mode bit (DAT-mode bit). When the bit is set to 1 (assuming DAT hardware has been installed on the processor), the translation of virtual addresses to absolute addresses proceeds automatically. For programs running with dynamic address translation enabled, address references are automatically interpreted as virtual addresses during processing of the program.
The TPF system makes use of two types of address spaces provided by the ESA architecture: primary virtual address space and home virtual address space. The main architectural difference between these address spaces is the use of different segment tables for address translation.
Use of these virtual address spaces permits the TPF system to support up to 2 gigabytes (GB) of main storage and to provide a level of Entry protection for application programs. In the TPF system, primary virtual address space is called ECB virtual memory (EVM), and is the only view of storage available to an Entry; home virtual address space is called system virtual memory (SVM), and there is one SVM for each I-stream engine in an ESA configuration.
A special use of the home virtual address space occurs on system IPLs and is called the IPL Virtual Memory (IVM). The IVM and SVM are similar except that page and segment tables, which are defined in the IVM, are not accessible in the SVM.
Virtual addressing is frequently associated with demand paging of both programs and data. In a demand paging system, a page is loaded into main storage when its information is referred to; such pages are said to be paged in. When an address refers to a page not currently loaded in main storage (called a paging exception), pages can be paged out to secondary storage (usually modules) to permit other pages to be paged in. One effect of paging is that main storage appears larger than it physically is.
Although other IBM operating systems use the dynamic address translation (DAT) facility in conjunction with demand paging, the TPF system, in fact, has not implemented demand paging. This is because all programs in the TPF system are assumed to use only a trivial amount of I-stream engine service (see the processing assumptions in TPF Processing Assumption and Performance). The overhead for paging out is greater than simply allowing a TPF program to complete.
Within an ESA configuration, most of the main storage is shared among all the I-stream engines. An address reference used in any I-stream engine normally resolves to the same absolute address. However, each I-stream engine is given a unique area of main storage, addressed as locations 0 through 4095. This area is private storage and is called page 0. It is accessed using a prefix register. The prefix register of each I-stream engine is loaded with an absolute address, from the full range of main storage addresses, to mark the beginning of the private 4 KB block. The hardware translates any storage reference in the range of 0-4095 to the absolute address identified in the prefix register. To ensure the uniqueness of page 0 storage, the absolute addresses loaded in the prefix register of each of the I-stream engines in an ESA configuration must be unique. Within the TPF system, the prefix registers are loaded with address values that point to the high end of main storage (see Figure 8).
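A minimal C model of prefixing follows. It assumes the prefix value is 4 KB aligned and shows the symmetric mapping the hardware applies: references to 0-4095 go to the prefix block, references to the prefix block map back to absolute 0-4095, and everything else passes through unchanged.

    #include <stdint.h>

    /* Illustrative prefixing: translate a real address to an absolute
     * address, given the I-stream engine's prefix register value. */
    uint32_t prefix_translate(uint32_t real, uint32_t prefix)
    {
        if ((real & ~0xFFFu) == 0)            /* a page 0 reference...    */
            return prefix | (real & 0xFFFu);  /* ...goes to prefix block  */
        if ((real & ~0xFFFu) == prefix)       /* the prefix block itself  */
            return real & 0xFFFu;             /* ...maps back to 0-4095   */
        return real;                          /* all others unchanged     */
    }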
Page 0 contains data critical to system operation. When a program running on a particular I-stream engine is interrupted, the current program status word (PSW) is saved in page 0 for the I-stream engine. Page 0 also contains indirect references to main storage. This permits the further creation of private blocks of storage for an I-stream engine. For example, within the TPF system, an I-stream engine ID (a number) is held in each page 0 of an ESA configuration. This ID is frequently used as an index into a system table, held outside of page 0, to locate a value unique to the identified I-stream engine. When such a technique is employed to find I-stream engine unique values or tables, the phrase via a page 0 reference is used. Page 0 is used to permit identical code, held in shared main storage, to be simultaneously executed in multiple I-stream engines within an ESA configuration.
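The "via a page 0 reference" technique can be sketched as follows; the field and table names are hypothetical, not actual TPF control block fields.

    #include <stdint.h>

    /* Each engine's private page 0 holds its I-stream ID; the ID indexes
     * a table in shared main storage to find a per-engine value, so the
     * same code works identically on every I-stream engine. */
    #define MAX_ISTREAMS 8

    struct page_zero {
        uint32_t istream_id;                  /* unique per I-stream engine */
        /* ... other low-storage fields, such as old and new PSWs ...      */
    };

    uint32_t per_engine_table[MAX_ISTREAMS];  /* held outside of page 0     */

    uint32_t my_engine_value(const struct page_zero *p0)
    {
        return per_engine_table[p0->istream_id];  /* via a page 0 reference */
    }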
The TPF system makes use of three kinds of protection to guard main storage from destruction or misuse.
A storage key is associated with the main storage where a program resides. The storage key consists of a 4-bit access-control code followed by 3 bits further describing the access permitted. The fetch-protection bit governs whether storing alone is monitored or whether both storing and fetching are controlled. The reference bit indicates fetching and storing references. The change bit is set whenever a byte has been stored into. These storage keys are not part of addressable storage.
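A simplified C model of these checks follows, assuming the standard key rule that an access key of 0, or a key matching the 4-bit access-control code, permits access; the field and function names are illustrative.

    #include <stdbool.h>
    #include <stdint.h>

    /* Simplified key-controlled protection: stores always require a key
     * match (or key 0); fetches are checked only when the fetch-protection
     * bit is on. The reference and change bits record the access. */
    typedef struct {
        uint8_t acc;          /* 4-bit access-control code */
        bool    fetch_prot;   /* fetch-protection bit      */
        bool    reference;    /* set on fetch or store     */
        bool    change;       /* set on store              */
    } storage_key_t;

    bool access_permitted(storage_key_t *key, uint8_t access_key, bool is_store)
    {
        bool match = (access_key == 0) || (access_key == key->acc);

        if (is_store && !match)
            return false;                 /* protected against storing  */
        if (!is_store && key->fetch_prot && !match)
            return false;                 /* protected against fetching */

        key->reference = true;            /* record the reference       */
        if (is_store)
            key->change = true;           /* record the modification    */
        return true;
    }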
In addition to these kinds of protection provided by the ESA architecture, TPF also provides macro authorization software and address space isolation for Entries (ECBs).
Arbitration logic, incorporated in an ESA configuration, handles the moments when more than one I-stream engine references the same main storage location during the same cycle. Only one I-stream engine gains access to the storage location, while others wait. Contention is minimized through multiple paths to main storage and by private I-stream engine storage buffers. Hardware and software synchronization resolves other main storage conflicts among I-stream engines and channel engines; for example, protection against overlaying data in I/O buffers.
On occasion within an ESA configuration, programs executing simultaneously in two or more I-stream engines attempt to execute or change the same area in shared main storage. A common technique for controlling access to a critical region of code is to set a bit, called a lock indicator, indicating the critical region of code is in use and that modifications of shared storage are being done. For example, a bit can be used to indicate that a certain data area is to be exclusively modified by only one I-stream engine at a time. Setting the bit is done prior to entering the critical region of code. Other I-streams ready to modify the same data area check the bit prior to entering the critical region. If they find the bit set, they wait until the bit is free. Unfortunately, testing and setting of the bit can take more than one machine cycle. An I-stream engine that has tested a bit and found the area it controls available can be interrupted before it can set the bit for its own use. When the interrupted I-stream engine returns from the interruption, it (perhaps erroneously) can regard the critical region as available, set the bit, and continue into the critical region. If this happens, more than one I-stream engine could use the critical region, causing severe damage.
The solution to this problem is quite important for operating systems in general. Normally, in the TPF system, the controlled data area is a system table shared among all the I-stream engines. Proper serialization of modifications to shared data is critical to the correct operation of the system.
An example showing the need for exclusive control of a shared system table is given in Processor Lock. For the moment, there is the additional problem of just synchronizing the bit setting among several I-stream engines.
In a multiple I-stream environment, a problem can result if more than one instruction is used to test and then set a lock indicator.
Let's look at a situation that illustrates the problem: a program executing on two I-stream engines at the same time must modify a shared system table without corrupting it. The program follows; note that some liberty has been taken with the instruction formats to avoid introducing unnecessary coding detail.
    BUSY:  TM(0)       (Test main storage lock bit for 0)
           BZ OFF      (Branch if off)
           B BUSY      (Wait for lock bit to become 0)
    OFF:   OI(1)       (Set lock bit in main storage to 1)
           'critical'  (Critical region to modify the shared table)
           NI(0)       (Reset the lock bit to 0)
           'exit'
Table 4 shows the relationship of instruction execution within each I-stream engine to time and the setting of the lock indicator at each step. Remember that each instruction requires I-stream engine cycles to gain access to shared main storage under the arbitration previously described. For this timing sequence, assume that the program is executed on two I-streams separated by one cycle (one tick of the I-stream engine clock). The granularity of the test under mask/branch/OR-immediate instruction sequence in Table 4 allows the lock indicator to be defeated.
Time | I-stream A | Lock Indicator (after execution) | I-stream B
---|---|---|---
t(1) | TM(0) | 0 | "delayed" by arbitration
t(2) | BZ OFF | 0 | TM(0)
t(3) | OI(1) | 1 | BZ OFF
t(4) | enter "critical" | 1 | OI(1)
t(5) | --- | 1 | enter "critical"
This problem can be solved by using one of the following instructions: test and set (TS), compare and swap (CS), or compare double and swap (CDS).
In essence, all of these instructions permit a field to be reliably interrogated and modified in a multiple I-stream engine environment. Test and set operates on a bit, compare and swap operates on a 32-bit field, and compare double and swap operates on a 64-bit field. Their commonality is that they serialize prior to operating.
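In C11 terms (an analogy for illustration, not TPF code), a compare and swap update of a 32-bit shared field looks like this:

    #include <stdatomic.h>
    #include <stdint.h>

    /* Interlocked update of a shared 32-bit field: refetch and retry
     * until no other I-stream engine changed the field between the
     * fetch and the swap. */
    atomic_uint_least32_t shared_field;

    void interlocked_increment(void)
    {
        uint_least32_t old = atomic_load(&shared_field);
        while (!atomic_compare_exchange_weak(&shared_field, &old, old + 1))
            ;   /* on failure, 'old' is refreshed with the current value */
    }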
Serialization is the process of prioritizing requests that are made at exactly the same instant, causing the requests to occur one after the other. This ensures that main storage is not going to be changed by two different I-stream engines at the same time.
For example, if programs running in multiple I-stream engines simultaneously issue a test and set instruction to check an indicator that is 0, only one I-stream engine is informed that the bit is 0 when the instruction started. All other I-stream engines are shown a 1. Furthermore, when all the I-stream engines finish the execution of the Test and Set instruction, the bit is set to 1 in main storage. Unless you intend to modify some critical system code, learning the details of these instructions is unnecessary. The general idea presented here is necessary to understand some of the system locking procedures, control blocks, tables, and macros.
These instructions, test and set, compare and swap, and compare double and swap, are sometimes called interlocking instructions because they can result in a coordinated delay of several I-stream engines.
The TPF system favors the use of the test and set (TS) instruction because frequently, lock indicators are bits that are set to control access to critical regions of system code. The test and set (TS) instruction requires fewer registers than the compare and swap (CS) instruction and requires less execution time because only a single byte needs to be set.
Statistically speaking, in most cases a shared table is needed by only one I-stream engine at any point in time. When an attempt is made to access a shared table that is locked, the program in the other I-stream engine generally enters a loop that repeatedly tests the indicator for the right to access the shared table. Such a loop is called a spin lock. This software synchronization takes longer than the time the hardware takes to synchronize the bit setting. However, critical regions of TPF system code that update a shared table are usually only a few instructions. Very little time is wasted with spinning, because spinning is seldom invoked and, when it is invoked, it lasts for only a few instructions.
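A spin lock of the kind just described can be sketched in C11, again as an analogy to the TS-based lock indicator rather than actual TPF code:

    #include <stdatomic.h>

    /* Spin lock on an atomic test-and-set: the test and the set are one
     * indivisible, serialized operation, so the defeat shown in Table 4
     * cannot occur. */
    static atomic_flag lock_indicator = ATOMIC_FLAG_INIT;

    void acquire(void)
    {
        while (atomic_flag_test_and_set(&lock_indicator))
            ;   /* spin: another I-stream engine holds the lock */
    }

    void release(void)
    {
        atomic_flag_clear(&lock_indicator);  /* reset the lock bit to 0 */
    }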
An I-stream engine processes instructions one at a time. The processing of one instruction precedes the processing of the following instruction in the order in which the instructions appear in storage. This is called the conceptual sequence. Moreover, interruptions can take place between and within instructions.
During actual operation, instructions are broken down into smaller units. Their processing consists of a series of discrete steps. Depending on the instruction, operands can be fetched and processed in a piecemeal fashion, and some delay can occur between the fetching of operands and the storing of results. Within a given I-stream engine, access to shared main storage may not be in the sequence implied by the conceptual sequence. This is related to instruction prefetching and the way the ESA hardware overlaps storage references under the control of special private buffers, called caches. A serialization operation consists of completing all conceptually previous shared main storage accesses by an I-stream engine, as observed by other I-stream engines and by channel programs, before proceeding with the conceptually subsequent main storage accesses. All interruptions and the execution of certain instructions cause a serialization of CPU operations.
The results of a conceptual sequence of code can remain in an I-stream engine's caches for some time, so there can be some delay in placing results in shared main storage. The delay has no fixed time limit and does not affect the sequence in which results are placed in storage; that is, the conceptual sequence is the sequence observed by other I-stream engines and the channel subsystem. Store instructions to shared main storage are completed as a result of a serialization operation and before an I-stream engine enters the stopped state.
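The effect of a serialization operation can be approximated with C11 release/acquire ordering. This is an analogy in modern memory-model terminology, not the ESA definition:

    #include <stdatomic.h>

    int shared_data;          /* a field in shared main storage          */
    atomic_int ready;

    void producing_engine(void)
    {
        shared_data = 42;     /* may sit in a private cache for a while  */
        /* the release store completes all prior stores, as observed by
         * other engines, before 'ready' is seen as set                  */
        atomic_store_explicit(&ready, 1, memory_order_release);
    }

    void consuming_engine(void)
    {
        while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
            ;                 /* wait for the serialization point        */
        /* shared_data is now guaranteed to be visible as 42             */
    }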
In a tightly coupled multiprocessing environment, operating system design makes sure that deadlock does not occur between I-stream engines. For instance, two or more I-stream engines may depend upon each other to issue a serialization operation to force an update of shared main storage. Fortunately, these details are handled by the software of the TPF system.
Keep in mind that the test and set instruction allows only one process at a time to enter the critical region. The test and set instruction itself is executed outside of the critical region, possibly in multiple I-stream engines. A serialization operation is under the control of a single I-stream engine; the interlocking instructions, in contrast, coordinate two or more I-stream engines.
A channel subsystem manages the flow of data and I/O commands to an appropriate control unit which, in turn, controls I/O devices. The ESA architecture distinguishes between commands and instructions. Command refers to an I/O operation performed by a channel subsystem and instruction implies a non-I/O operation performed by an I-stream engine, with the exception of those instructions used to communicate with the channel subsystem itself. For example, a start subchannel (SSCH) instruction is used by an I-stream engine to pass a channel program, which is a sequence of channel command words, to the channel subsystem. Although a channel subsystem is itself a multiprocessing complex, most of these details can be ignored in this publication without distorting too much of the TPF system structure.
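For orientation, a format-0 channel command word can be pictured as the following 8-byte structure. This is a simplified sketch; consult the architecture for the exact bit assignments.

    #include <stdint.h>

    /* Simplified format-0 channel command word (CCW). A channel program
     * is a sequence of these, passed to the channel subsystem with a
     * start subchannel (SSCH) instruction. */
    struct ccw0 {
        uint8_t  cmd;        /* command code (read, write, control, ...) */
        uint8_t  addr[3];    /* 24-bit data address                      */
        uint8_t  flags;      /* e.g. chain to the next CCW               */
        uint8_t  reserved;
        uint16_t count;      /* byte count for the data transfer         */
    };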
The TPF system uses an external lock facility (XLF) to maintain data integrity for shared modules in a loosely coupled complex.
An XLF must be connected to and shared by all ESA configurations in the loosely coupled complex. There are several types of XLFs:
The limited lock facility (LLF) is a hardware feature required for module control units (CUs) shared among multiple CPCs in a loosely coupled complex. The hardware feature of an XLF includes storage in the control unit (CU).
CFLF is the TPF support for the multi-path lock facility (MPLF). CFLF and MPLF are companion features to the 3990 Multi-Path Record Cache hardware feature. The hardware feature of an XLF includes storage in the control unit (CU).
Coupling facility (CF) record lock support provides the option of using one or more CFs as XLFs.
XLFs control (serialize) access to records in the database in a loosely coupled complex. A TPF system protocol, built on these facilities, prevents other ESA configurations from accessing a record until the lock identifier is removed from the lock table.
When a module I/O request is serviced by the TPF system, an I-stream engine sends the identifier of the requested record to the XLF. A lock table in the storage of the XLF holds the identifier of all the TPF records currently being modified (and therefore, held) by any one of the ESA configurations in the loosely coupled complex. If a lock identifier is held in the table, the XLF does not permit another request to place the same lock identifier in the table. Access to such a locked record by other ESA configurations is blocked until the lock identifier is removed from the table. The data used as a lock identifier differs based on the type of XLF being used. Additional detail of the XLF is contained in Data Organization, which describes the concept of record holding.
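Conceptually, the lock table behaves like the following sketch; the table size, identifier type, and function names are illustrative, and the real facility resides in CU or CF storage rather than in main storage.

    #include <stdbool.h>
    #include <stdint.h>

    /* Conceptual model of the XLF lock table: a lock request succeeds
     * only if no ESA configuration already holds the record identifier. */
    #define TABLE_SIZE 1024

    static uint64_t lock_table[TABLE_SIZE];   /* 0 = free slot */

    bool xlf_lock(uint64_t record_id)
    {
        int free_slot = -1;
        for (int i = 0; i < TABLE_SIZE; i++) {
            if (lock_table[i] == record_id)
                return false;                 /* already held: block access */
            if (lock_table[i] == 0 && free_slot < 0)
                free_slot = i;
        }
        if (free_slot < 0)
            return false;                     /* table full                 */
        lock_table[free_slot] = record_id;    /* record is now held         */
        return true;
    }

    void xlf_unlock(uint64_t record_id)
    {
        for (int i = 0; i < TABLE_SIZE; i++)
            if (lock_table[i] == record_id)
                lock_table[i] = 0;            /* other CPCs may now lock it */
    }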
The interrupt mechanism is the means for coordinating multiprogramming between an I-stream engine and the engines of a channel subsystem. An interrupt is a hardware-enforced transfer of control within an I-stream engine. An interruption usually takes place after an instruction is finished and before interpretation of the next instruction is started. The logic built into the ESA architecture is sufficient to preserve the information necessary to return to the interrupted point of departure. Further, interrupts of the same kind are generally inhibited by the TPF system, at least long enough to preserve the state of the I-stream engine and to save control information and data. Ultimately, return is made to the interrupted code without loss of data. Classes of interrupts inhibited in an I-stream engine do not prevent interrupt-generating signals from being set in the device controllers and devices. These signals are essentially stacked within the channel subsystem, which presents the signals to any I-stream engine that is willing to accept the interruption.
A program status word (PSW) includes the instruction address and other information used to control instruction sequencing and to determine the state of the I-stream engine. A PSW also includes the bits used to inhibit or permit interrupts. In addition to the current PSW, which is the PSW in control of an I-stream engine, there are PSWs associated with each class of interrupts. There are six classes of interrupts possible: restart, external, supervisor call (SVC), program, machine check, and input/output (I/O).
Each class of interrupts is assigned an old and a new PSW. The old and new PSWs are held in page 0 for the I-stream engine.
When an interrupt occurs, the current PSW is stored into the old PSW for the class of interrupt, and the new PSW for the class of interrupt is loaded into the current PSW.
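The swap can be modeled directly; the structures here are illustrative, not the actual low-storage layout.

    #include <stdint.h>

    /* PSW swap at interruption time: current -> old PSW for the class,
     * new PSW for the class -> current. */
    enum irq_class { RESTART, EXTERNAL, SVC, PROGRAM, MCHK, IO, N_CLASSES };

    typedef struct {
        uint64_t state;       /* masks, DAT mode, problem/supervisor bit */
        uint32_t instr_addr;  /* next instruction address                */
    } psw_t;

    struct psw_area {         /* resides in the engine's page 0          */
        psw_t old_psw[N_CLASSES];
        psw_t new_psw[N_CLASSES];  /* each points at a TPF handler       */
    };

    psw_t current_psw;

    void take_interrupt(struct psw_area *p0, enum irq_class c)
    {
        p0->old_psw[c] = current_psw;     /* save the return point (NSI) */
        current_psw    = p0->new_psw[c];  /* enter the class's handler   */
    }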
Consider the example of processing two concurrent interrupts occurring in an I-stream engine: one an input/output (I/O) interrupt and the other an external (EXT) interrupt. Assume the interrupted program is at location NSI-1, where NSI means the next sequential instruction.
The interrupt processing accomplished by the ESA hardware is reviewed here to emphasize the sequential processing done on the PSWs when concurrent interrupt-forcing signals are presented to an I-stream engine.
The time sequence for the processing that takes place is given in Figure 9, where the arrows show data movement.
Figure 9. Concurrent Interrupts
The concurrent I/O interrupt is stacked by the hardware because I/O interrupts are disabled in the external new PSW when loaded, preventing the I/O interrupt from being honored at this time.
Clearly, the external interrupt receives higher priority than the I/O interrupt. The TPF interrupt processing code depends upon the hardware for stacking unprocessed interrupts, while the TPF system must ensure that data is not lost. This is done through judicious setting of the mask bits. A disabled (masked off) interruption condition is retained in the hardware, and when software processing of the interruption event has completed, re-enablement occurs to permit any further interruption to take place. A load PSW instruction may load the NSI (next sequential instruction) of the interrupted program as well as accomplish interruption enablement at the same time.
Interrupt handling emphasizes the importance that the design of the system places on meeting the demands of the current program as quickly as possible.
When an application program is interrupted, the very same program regains control immediately after the TPF system has serviced the interrupt. This is always true, except when a supervisor call (SVC) instruction is issued that specifically requests the relinquishing of control. In other IBM operating systems, the dispatching mechanism is normally called after an interrupt occurs; therefore the interrupted program may not get control back immediately after the interrupt is processed. But this is not so in the TPF system where the interrupted program usually receives control again after the interrupt. This may seem like a small detail, but it represents a fundamental difference between the TPF system and other operating systems.
Another difference between the TPF system and other IBM operating systems is the way in which an I-stream accepts a non-module I/O interrupt. Any I-stream engine in an ESA configuration is capable of accepting an I/O interrupt. This means that an I-stream engine accepting an I/O interrupt is not necessarily the same I-stream engine that issued the I/O operation causing the interrupt. However, in the TPF system, a non-module I/O interrupt is accepted only by the same I-stream engine that started the I/O operation. Furthermore, in the TPF system, not all I-stream engines are permitted to start non-module-related I/O operations. Applications running on any I-stream engine can issue the TPF macros related to I/O requests. However, a TPF I/O macro request from any I-stream engine can be moved, if necessary, to an I-stream engine that services I/O. Move, in this case, means that the I/O request is ultimately referred to by the registers and PSWs of the servicing I-stream engine. The process of moving work among I-stream engines is described in Action on the Cross List (Switching I-Stream Engines).
The distinction between problem state and supervisor state gives the TPF system the ability to control the execution of certain instructions that are critical to the operation of the system. These states are controlled by a bit in the PSW. Code that runs in supervisor state is permitted to execute privileged instructions. Within the TPF system, the privileged set system mask (SSM) instruction is important for controlling the system state.
Entering supervisor state and executing certain macros requires special authorization for a program in the TPF system. Each program has privilege class characteristics associated with it before processing. If the program is not authorized (that is, is not privileged) to process a particular class of macro and it tries to process one, an error is reported and the program is ended.
This prevents an unauthorized program from issuing macros that are intended solely for system control. Without the authorization facility, the TPF system would be vulnerable to corruption.
This discussion further illustrates the TPF system's emphasis on meeting the demands of the current program as quickly as possible. At this point, it is necessary to distinguish between a hardware interrupt and a software interrupt. Program, restart, I/O, machine check, and external interrupts are classified as hardware interrupts. An SVC interrupt is classified as a software interrupt.
When a hardware or software interrupt arrives while the TPF system is in problem state, as part of the hardware reaction to the interrupt, the system state changes to supervisor state, the current processing environment is saved (PSW swapping), and control is transferred to the designated TPF interrupt handler. The TPF interrupt handlers run with interrupts disabled to prevent the TPF system from falling into an infinite loop that could occur by processing subsequent interrupts. The disabling of interrupts does not degrade the TPF system because the interrupt handlers are deliberately designed as only short sequences of code.
For a software interrupt, the interrupt handler (that is, the macro decoder) identifies the action to be taken as a result of the interrupt, re-enables interrupts, and transfers control to a system program (a macro service routine) to perform the action. The system program, still in supervisor state, processes the action related to the interrupt, puts the system in problem state, and returns control to the program that was interrupted. When another software interrupt arrives, the process repeats itself.
For a hardware interrupt, the interrupt handler identifies the action to be taken as a result of the interrupt, queues the remaining processing for subsequent processing, re-enables interrupts, puts the TPF system into problem state, and returns control to the program that was interrupted. When another hardware interrupt occurs, the process repeats itself.
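Putting the two paths side by side, as a hypothetical sketch in which all names are illustrative rather than TPF internals:

    /* Hypothetical stubs standing in for system facilities. */
    typedef void (*service_t)(void);
    service_t decode_macro(int svc_code);
    void enable_interrupts(void);
    void set_problem_state(void);
    void queue_remaining_work(void);
    void return_to_interrupted_program(void);

    /* Software interrupt: decode the macro request, re-enable, run the
     * service routine in supervisor state, then return in problem state. */
    void svc_interrupt_handler(int svc_code)
    {
        service_t service = decode_macro(svc_code);
        enable_interrupts();
        service();
        set_problem_state();
        return_to_interrupted_program();  /* same program regains control */
    }

    /* Hardware interrupt: identify the work, defer the long part, then
     * return to the interrupted program immediately. */
    void hardware_interrupt_handler(void)
    {
        queue_remaining_work();
        enable_interrupts();
        set_problem_state();
        return_to_interrupted_program();
    }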
The various system programs that run in supervisor state mask interrupts according to the type of processing they perform. When hardware interrupts occur during their processing, the interrupts are stacked and are processed once interrupts are unmasked.
The privileged instructions set system mask (SSM) and load PSW (LPSW) are used by the TPF system programs to change the system state.
The supervisor call (SVC) interrupt represents a deliberate request for a system service by an application program. The TPF macros that request system services, for example, are processed as SVC instructions.
SVC interrupts cannot be disabled; however, the only way such an interrupt occurs is through the processing of an SVC instruction. The TPF system maintains control by effectively processing only one SVC for each I-stream engine at a time.
To repeat, although there are important exceptions in the TPF system, the same program that issues the SVC (which causes an SVC interrupt) regains control immediately after the TPF system has serviced the request. This is a fundamental difference between the TPF system and most other operating systems.
The acceptance of an interrupt within an I-stream engine is controlled by a PSW and related control registers. Interrupts and PSWs are essential for controlling I/O operations between the channel subsystem engines and an I-stream engine and for providing system services to applications.
Choosing the right terminology is, to some extent, a packaging phenomenon of a system design. For example, several central processing complexes (CPCs), all sharing modules, are supported by the TPF system. At some TPF installations, this represents a central processing site. However, as different forms of local and wide area communication interconnections become universal, the term central must be used with caution. Some TPF installations, for instance, do not exist entirely in the same building. We still use the term central processing complex (CPC) to denote an ESA configuration attached through a channel subsystem to a set of devices or other ESA configurations. Figure 10 shows interconnected loosely coupled complexes where each CPC is an ESA configuration that can include multiple I-stream engines and a set of private devices such as tapes and modules. CPC is used to emphasize attachments that, architecturally, are external to an ESA configuration.
CPCs that share the module configuration as in Figure 11 are connected through channel-to-channel (CTC) support.
Figure 10. Interconnected Loosely Coupled Complexes
Figure 11. Loosely Coupled Complex
Both loosely coupled and tightly coupled multiprocessing require mechanisms for interprocessor communication to coordinate the work distributed among a set of cooperating processors in a variety of arrangements.
Essentially, there are several forms of interprocessor communication.
Interprocessor communication is associated with a mechanism for sending messages among ESA configurations in a loosely coupled complex, where the content of the message does not necessarily imply that a lock is involved.
Using the TPF Application Requester (TPFAR) feature, a request for data retrieval from an IBM DATABASE 2 (DB2) database can be sent from a TPF system to an IBM MVS or IBM VM system.