HLA Language Reference and User Manual
1 Overview
HLA, the High Level Assembler, is a vast improvement over traditional assembly languages. With HLA, programmers can learn assembly language, and write assembly code, faster than ever before. John Levine, the comp.compilers moderator, made the case for assemblers like HLA when describing PL/360, a machine-specific language:
On 1999/07/11 19:36:51, the moderator wrote:
"There's no reason
that assemblers have to have awful syntax. About 30 years ago I used Niklaus Wirth's PL360, which was
basically a S/360 assembler with Algol syntax and a little syntactic sugar
like while loops that turned into the obvious branches. It really was an assembler, e.g., you
had to write out your expressions with explicit assignments of values to registers,
but it was nice. Wirth used it to
write Algol W, a small fast Algol subset, which was a predecessor to
Pascal. ... -John"
PL/360, and the variants that followed, such as PL/M, PL/M-86, and PL/68K, were true "mid-level languages" that let you work down at the machine level while using more modern control structures (loosely based on those of the PL/I language). Although many people refer to C as a "medium-level language," C is truly high level when compared with languages like PL/*. The PL/* languages were very popular with those who needed the power of assembly language in the early days of the microcomputer revolution. While it’s stretching the point to say that PL/M is "really an assembler," the basic idea is sound: there really is no reason that assemblers have to have an awful syntax.
HLA bridges the gap between
very low level languages and very high level languages.
Unlike the PL/* languages,
HLA really is an assembly language. You can do just about anything with HLA that you can do with
a traditional assembler like MASM, TASM, NASM, or Gas.
If you want to write low-level assembly code using x86 machine
instructions, HLA does not get in
your way; if you want to use
compares and conditional branches rather than structured control statements,
you can. On the other hand, if you
prefer to use more readable high-level control structures, HLA allows this, as
well. HLA lets you work at the
level you are most comfortable with and at the level that is most appropriate
for the task at hand.
Beyond supplying a "non-awful" syntax, HLA has one other important feature: it’s extensible. HLA provides special features that let you add new statements to the language, so if HLA is not "high level" (or "low level") enough for your tastes, you can extend it. A later section of this document describes in detail exactly how to do this.
In addition to the HLA language itself, the HLA system provides one other very important component: the HLA Standard Library. This is a collection of hundreds of functions that you can use to write assembly language programs as quickly and easily as you would write C programs.
Ultimately, the best way to view HLA is as a hybrid language. It combines the best features of assembly languages (access to all the low-level machine instructions and machine facilities) and high-level languages (abstract data types, abstract control structures, and so on). Some long-time HLA users use HLA as a high-level language; others use it strictly as a low-level assembler. The good news is that the choice is entirely yours: you can use HLA in whatever capacity you desire.
2 What is a "High Level Assembler"?
The name "High Level Assembler" and its abbreviation "HLA" are certainly not new[1]. Nor is the concept of a high level assembler. In his 1992 text "Assemblers and Loaders" (Ellis Horwood, ISBN 0-13-052564-2), David Salomon uses these terms to describe various assembly languages dating back to 1966. Furthermore, both IBM and Motorola have assembler products with very similar names (e.g., IBM’s HLAsm, though it’s somewhat debatable whether HLAsm is truly a high level assembler).
Salomon offers the following definition of a High Level Assembler (or HLA):
A high-level assembler
language (HLA) is a programming language where each instruction is translated
into a few machine instructions.
The translator is somewhat more complex than an assembler, but much
simpler than a compiler. Such a
language should not have features like the if, for, and case control structures, complex arithmetic, logical expressions,
and multi-dimensional arrays. It
should consist of simple instructions, closely resembling traditional assembler
instructions, and of a few simple data types.
Since Salomon describes a
couple of high level assemblers that exceed this definition, he offers a second
definition for high level assemblers that is a bit higher-level:
A high-level assembler
language (HLA) is a language that combines most of the features of higher-level
languages (easy to use control structures, variables, scope, data types, block
structure) with one important feature of assembler languages namely, machine
dependence.
Neither definition is particularly useful for describing HLA/86 and other HLAs like Terse, MASM, and TASM. Of course, the term "High Level Assembler" is very nebulous and offers a fair amount of latitude. Almost any macro assembler could pass as an HLA on the basis that a macro-instruction expands into a few machine instructions.
David Salomon describes several
different high level assemblers in his text. The examples he describes are PL/360, NEAT/3, PL516, and
BABBAGE.
PL/360 and PL516 are products
that conform to the second definition above. They allow simple arithmetic expressions and assignment
statements, the use of high level control structures (if, for, while, etc.), high level data declarations, and block
structure (among other things).
These languages expose the underlying machine’s registers and allow the
use of machine instructions using a "functional" syntax.
The NEAT/3 language is a much lower-level language. It is basically an assembly language for the NCR Century computers that provides COBOL-style data declarations. Most of its "instructions" translate one-for-one into Century machine instructions. It does, however, automatically insert code to convert data from one format to another if the data types of an instruction’s operands are incompatible.
The BABBAGE assembly language
is an expression-based assembly language (very similar to Terse). It allows simplified high level control
structures like if and while. The
interesting thing about this assembler is that it was the only assembler for
the GEC4000 family of computers.
In addition to the HLAs that Salomon describes, several other high level assemblers have been created over the years. Intel designed PL/M and PL/M-86 for its 8080 and 8086 CPU families; these were obvious adaptations of the PL/360-style HLA to Intel’s CPUs. PL/68 was also available for the Motorola 680x0 family, and SL/65 was a similar adaptation of PL/360 for the 6502 family.
At one point there was a
product named "High Level Assembler" for the Atari ST system (68K
based). Jim Neil has also created
an expression-based high level assembler (similar in principle to Babbage) for
Intel’s x86 family. MASM and TASM
(for the x86) also fall into the category of a high level assembler due to
their inclusion of high level control structures and logical expressions.
So where does HLA/86 fit into these definitions? In truth, HLA/86 falls somewhere between the two. The following paragraphs therefore define the term "High Level Assembler" as it should apply to HLA/86 and similar high level assemblers.
The first definition above is overly restrictive. It implies that any language that exceeds these limits is a high level language, not a high level assembly language or a traditional assembly language. Under this definition, many traditional assemblers would have to be classified as high level languages (beyond even high level assemblers). For example, most modern assemblers provide a mechanism for declaring multi-dimensional arrays, even though you still have to use some sequence of instructions to index into such arrays. At the same time, the definition elevates many traditional assemblers to the status of an HLA even though we wouldn’t normally think of them as high level assemblers. Most macro assemblers, for instance, let you create instructions that translate into a few machine instructions. Macro facilities, however, are something we expect of a modern assembly language; their presence doesn’t make the language a "high level" assembly language in most people’s minds.
The second definition David
Salomon provides hits the other extreme.
Arguably, languages like C could be called HLAs under this definition
(yes, there are some machine dependent features in C, though probably not
enough to satisfy David Salomon’s original intent).
High level assemblers like Terse, MASM, TASM, and HLA/86 fall somewhere between these extremes. Therefore, this document will define a high level assembler as follows:
A "high level
assembly language" (HLAL) is a language that provides a set of statements
or instructions that practically map one-to-one to machine instructions of the
underlying architecture. The HLAL
exposes the underlying machine architecture including access to machine
registers, flags, memory, I/O, and addressing modes. Any operation that is possible with a traditional assembler
should be possible within the HLAL.
In addition to providing access to the underlying architecture, the HLAL
must provide some abstractions that are not normally found in traditional
assemblers and that are typically found in traditional high level
languages; this could include
structured control statements (e.g., if, for, and while), high level data types and data structuring
facilities, extensive compile-time language facilities, run-time expression
evaluation, and standard library support.
A "High Level Assembler" is a translator that converts a high
level assembly language to machine code.
There is a very important
difference between this definition and the ones that David Salomon
provides. Specifically, a
high-level assembly language must provide access to the underlying machine
architecture. Within the HLAL you
must be able to specify any (reasonable) machine instruction that is available
on the CPU. The HLAL may provide
other statements that do not directly map to machine instructions (e.g., an if statement), but it must, at least, provide a set
of statements that practically
map one-to-one with the machine instructions. The “practically” modifier appears here for two reasons. First, some assembly source statements may map to two or more different, but equivalent, machine instructions. A good example is the x86 "mov reg, reg" instruction. It can be encoded with either of two different (though equivalent) opcodes, depending on the setting of the direction bit in the opcode. Most assemblers map the source statement to only one of these opcodes, so there is not truly a one-to-one mapping: some machine encodings are never produced by any source statement. Second, the HLAL may disallow special "protected mode instructions" if the language is intended only for user-mode programming (as is the case for HLA/86).
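To make the "mov reg, reg" case concrete, the short Python sketch below computes both legal encodings of a 32-bit register-to-register move. It is an illustration only, not part of HLA. It assumes the standard x86 encodings. Opcode 0x89 is MOV r/m32, r32 (direction bit clear) and opcode 0x8B is MOV r32, r/m32 (direction bit set). Each is followed by a ModR/M byte whose mod field is 11 (register operand). The function names are hypothetical. Operands use Intel order (destination first); in HLA’s functional syntax the same move is written mov( ebx, eax );

```python
# 32-bit register numbers as used in the ModR/M "reg" and "r/m" fields.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}
NAMES = {num: name for name, num in REG32.items()}


def modrm(mod, reg, rm):
    """Pack the 2-bit mod, 3-bit reg, and 3-bit r/m fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


def encode_mov_r32_r32(dest, src):
    """Return both encodings of 'mov dest, src' (Intel operand order)."""
    d, s = REG32[dest], REG32[src]
    # 0x89 = MOV r/m32, r32 (direction bit 0): source in 'reg', destination in 'r/m'.
    form_89 = bytes([0x89, modrm(0b11, s, d)])
    # 0x8B = MOV r32, r/m32 (direction bit 1): destination in 'reg', source in 'r/m'.
    form_8b = bytes([0x8B, modrm(0b11, d, s)])
    return [form_89, form_8b]


def decode_mov_r32_r32(code):
    """Map either encoding back to the single source statement it represents."""
    opcode, m = code[0], code[1]
    if m >> 6 != 0b11:
        raise ValueError("sketch only handles register-to-register forms")
    reg, rm = (m >> 3) & 0b111, m & 0b111
    if opcode == 0x89:
        dest, src = rm, reg
    elif opcode == 0x8B:
        dest, src = reg, rm
    else:
        raise ValueError(f"not a MOV r32, r32 opcode: {opcode:#04x}")
    return f"mov {NAMES[dest]}, {NAMES[src]}"


for code in encode_mov_r32_r32("eax", "ebx"):
    print(code.hex(" "), "->", decode_mov_r32_r32(code))
```

Running it prints `89 d8 -> mov eax, ebx` and `8b c3 -> mov eax, ebx`. Two different byte sequences represent one source statement. An assembler that always emits one of them leaves the other encoding without a source form of its own, which is why the definition says "practically" one-to-one. The sketch makes no claim about which form any particular assembler prefers.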
In addition to supporting the underlying
machine architecture (which almost any traditional assembler will do), the HLAL
must also provide support for some features normally found in a high level
language. The definition does not require that an HLAL support all the features listed above, nor is it restricted to just those features, but an HLAL must support some of the features traditionally found in a high level language. The number and type of features the HLAL supports determine how "high level" the assembly language is. As with HLLs, there can be "low-level" HLALs, "medium-level" HLALs, "high-level" HLALs, and even "very high-level" HLALs.
NEAT/3, for example, would be a low-level HLAL since it provides
higher-level data types, conversions, and not much else.
MASM and TASM are probably best
considered medium-to-high-level HLALs since they provide high level data
structuring facilities, structured control statements, high level procedure
definitions and invocations, a limited block structure, powerful compile-time
language (macro) facilities, standard library support (e.g., the UCR Standard
Library and many other available library modules), and other high level
language features. In actual use,
the programmer is expected to normally use standard machine instructions and
rise up to the high level statements only as necessary.
The Terse language is a good example of a medium level HLAL. It uses an expression syntax but otherwise maps statements fairly closely to their assembly counterparts. It does provide some higher-level data structuring capabilities, though these are inherited from the underlying assembler(s) on which Terse is based.
PL/360 and PL516 are definitely
high-level HLALs because they fully support simplified arithmetic expressions,
control structures, high-level data types, and other features. These languages provide access to the
underlying architecture, but the emphasis is on using these languages as high level languages and dropping down to machine instructions only as necessary.
HLA/86 probably falls in the
high-level-to-very-high-level range because it provides high level data types
and data structuring abilities, high level and very high level control
structures, extensive parameter passing facilities (more than most high level
languages), a very extensive compile time language, a very extensive standard
library, built-in parsing facilities for language extension, and many other
features. As a general rule,
HLA/86 has a larger feature set than the other HLALs described above, but there
are a couple of design goals that limit the "high-levelness" of
HLA/86: (1) with one exception,
HLA never emits any code behind the programmer’s back that modifies registers
or flags (the one exception is object method invocation, and this is well
documented), and (2) HLA doesn’t support arithmetic expressions (it does
support a limited form of logical/boolean expressions). One interesting aspect of HLA/86 is
that it is extensible. Using features
built into the language, you can extend HLA/86’s syntax by adding new
statements and other features.
This feature gives you the ability to make HLA/86 as high level as you
desire (though it may take some effort to achieve certain language
features). The bottom line is
this: in some ways, HLA/86 is lower level than languages like PL/360 and PL516;
in other ways, it’s higher level than these HLALs. However, as the definition requires, almost anything you can
do with a traditional assembler is possible in HLA/86.
3 What is an "Assembler"?
Because high level assemblers are clearly different from traditional assemblers, one might ask two questions. Is a high level assembly language truly an assembly language? Can translators for high level assembly languages properly be called assemblers? Unfortunately, there is a considerable range of opinions as to exactly what constitutes an "assembler" versus other translators. This document will not attempt to get involved in this debate. Instead, this section provides a set of definitions that are useful for describing assemblers at various levels of abstraction.
Pure Assembler:
A "pure assembler" is a
program that processes an assembly language source file and translates the
source code using a direct mapping from source code instructions to individual
machine instructions (each source instruction is mapped to exactly one machine
instruction). The assembler only
provides machine-primitive data types like bytes, words, double words,
etc. A pure assembler does not
provide macro facilities. A pure
assembler always produces machine code as output.
Traditional Assembler:
A "traditional assembler" is
a pure assembler plus macro facilities.
The assembler may provide some "built-in macros" and instruction synonyms, but in general, the built-in statements should still map to individual machine instructions (note that the programmer may extend this by writing macros). The assembler does not support run-time arithmetic or boolean expressions. A traditional assembler may also provide some simple data typing facilities, such as the ability to rename primitive data types as something else (e.g., byte->char). A traditional assembler always emits machine code as output.
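As a rough illustration of what separates a traditional assembler from a pure one, the following Python sketch performs the one step a macro facility adds. A single source-level macro invocation expands into several ordinary instructions. Every other statement passes through unchanged and would still map to a single machine instruction. The macro name `swap`, its push/pop body, and the plain-text instruction format are hypothetical choices for this example, not the syntax of any particular assembler.

```python
# Hypothetical macro table: each macro expands to a list of ordinary
# instructions; "{0}", "{1}", ... are replaced by the invocation's arguments.
MACROS = {
    # Exchange two registers through the stack (x86 also has XCHG for this).
    "swap": ["push {0}", "push {1}", "pop {0}", "pop {1}"],
}


def expand(source_lines):
    """Expand macro invocations; pass ordinary instructions through unchanged."""
    output = []
    for line in source_lines:
        mnemonic, _, operands = line.strip().partition(" ")
        args = [a.strip() for a in operands.split(",")] if operands else []
        if mnemonic in MACROS:
            output.extend(template.format(*args) for template in MACROS[mnemonic])
        else:
            output.append(line.strip())
    return output


program = ["mov eax, 1", "mov ebx, 2", "swap eax, ebx", "ret"]
for instruction in expand(program):
    print(instruction)
```

The four-line source becomes seven instructions, yet the source still contains no run-time expressions or control structures. That is the boundary between a "traditional" and a "high level" assembler in these definitions.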
High Level Assembler:
A high-level assembler is a macro
assembler plus some additional high-level language-like facilities, such as
high-level control constructs or high-level-like procedure calls. If a
programmer elects to ignore these additional facilities, they still have all
the capabilities of a macro assembler at their disposal.
4 Is HLA a True Assembly Language?
Some people are confused by HLA. On the one hand, it looks like a high level language, employing syntax similar to that of Pascal and C/C++. On the other hand, it supports the machine instructions found in a typical assembly language. Many people accuse HLA of being a compiler rather than an assembler. What’s the truth?
The truth is, assembly
languages have evolved, just as high-level languages have evolved, and we can
no longer use a definition for an assembler that made sense in the 1950s when
describing modern assemblers such as MASM, TASM, and HLA. Today, the best
definition we can use is that an assembler is a compiler for an assembly
language. An assembler accepts a source file written in some sort of assembly
language and produces an object file as its output.
The real question, then, is not whether HLA is an assembler, but whether the HLA language is an assembly language. Some people argue that any compiler that includes any sort of statement that compiles into more than one machine instruction cannot be called an “assembler.” However, such an argument immediately eliminates macro assemblers. Eliminating macro assemblers is unsatisfactory because almost every modern assembler provides, at the very least, some simple macro facilities. An “IF” statement might be a macro that you include in your source file (generally supplied by the assembler’s author, as is the case, for example, with FASM), or it might be built into the assembler itself. Which of these applies is really a matter of implementation. To the end user of the assembler, the “IF” statement is part of the language either way. The fact that assemblers such as MASM, TASM, and HLA provide these high-level-like control structures in assembly language does not imply that the languages these products implement are not assembly languages.
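Whichever way it is implemented, an “IF” statement ultimately becomes the same compare-and-branch sequence a programmer could write by hand. The Python sketch below shows one plausible lowering of a simple IF with a single relational test. It compares the operands, then jumps past the body when the condition is false, so each operator maps to the conditional jump for its negation. The function name, the caller-supplied label, the text format, and the use of signed jumps are all assumptions for illustration. This is not necessarily how HLA, MASM, or FASM generate code.

```python
# Jump past the body when the condition is FALSE, so each relational
# operator maps to the conditional jump for its negation (signed compares).
SKIP_IF_FALSE = {
    "==": "jne", "!=": "je",
    "<": "jge", "<=": "jg",
    ">": "jle", ">=": "jl",
}


def lower_if(left, op, right, body, label):
    """Lower 'if (left op right) then body endif' into CMP/Jcc plus a label."""
    return [
        f"cmp {left}, {right}",          # sets the flags from left - right
        f"{SKIP_IF_FALSE[op]} {label}",  # skip the body if the test fails
        *body,                           # body runs only when the test holds
        f"{label}:",
    ]


for line in lower_if("eax", "<", "ecx", ["mov ebx, 1"], "SkipIf1"):
    print(line)
```

If the operands were treated as unsigned, the table would use jae, ja, jbe, and jb instead. The label is passed in rather than generated so that each IF in a program can be given a unique name by the caller.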
Some people argue that “high-level assemblers” such as MASM, TASM, and HLA are no more assemblers than C/C++ compilers that support in-line assembly. However, this argument actually strengthens the case for calling a product like HLA an “assembler.” After all, we continue to call C/C++ a high-level language even though it provides support for machine instructions. By the same logic, there is no reason we cannot call products like MASM, TASM, and HLA “assemblers” even though they provide a modicum of support for high-level-like control structures. Ultimately, it is the focus of a language that defines what type of language it is. C/C++’s focus is on writing high-level language programs, with a few machine instructions thrown in now and then when the high-level language doesn’t quite handle everything. High-level assemblers such as HLA, MASM, and TASM are focused on writing assembly language modules. They have some high-level control structures thrown in to simplify some tasks; in the case of HLA, the high-level control structures exist as a bridge between HLLs and assembly during the learning process. Still, the focus is mainly on writing assembly language code.
Some people feel that if you learn HLA (or some other high level assembler), then you’re not really learning "assembly language." This is utter nonsense. If you thoroughly learn HLA, you’ll know assembly language programming inside and out. Switching from HLA to a different assembler would be no different from, say, switching from GNU’s Gas assembler to MASM (or vice versa). One might bemoan the features lost in such a transition, since going from HLA to some other assembler typically means giving up features rather than gaining any.
Still, there is a pervasive argument that high level control structures like IF/WHILE/FOR/etc. don’t belong in a true assembler. Well, HLA, MASM, and TASM users can elect to ignore these statements (as many old-time MASM programmers do; with HLA you can even disable these statements). As long as the rest of the assembler supports a language that allows one to write "pure" assembly language code, why would anyone question the validity of the title "assembly language" for the code? (Unless, of course, they have an ax to grind.) Some people are diametrically opposed to allowing any language that contains IF/WHILE/FOR/etc. statements to be called assembly language. For them, that’s why we call these things "high level assembly languages": the name acknowledges that they are a little more powerful than traditional assembly languages.
The bottom line is this: if you
learn HLA, you will learn assembly language programming. As long as you
understand how to write the low-level code (within HLA) and don’t rely
exclusively on the high-level control statements in your programs, no one can
truthfully question your assembly language programming knowledge.
5 HLA Design Goals
HLA was originally conceived as a tool to teach assembly language programming. In early 1996 I decided to do a Windows version of my electronic text “The Art of Assembly Language Programming” (AoA). I then attempted to develop a new version of the “UCR Standard Library for 80x86 Programmers” (a mainstay of AoA). That effort led me to conclude that MASM just wasn’t powerful enough to make learning assembly language really easy. I decided to develop an assembler powerful enough to provide the tools for a good standard library and to satisfy some other requirements. Therefore, HLA has two important goals. First, provide a system that is powerful enough to develop code and macros that make learning assembly language easier. Second, and at the same time, provide a system that is easy for beginners to learn.
The principal goal of HLA was to leverage students’ existing programming knowledge. For example, a good Pascal programmer can get their first C/C++ program operational in a few minutes. All they’ve got to do is note the similarities between the two programming languages, make the appropriate syntactical changes, and they’re up and running. Take that same Pascal programmer and expect them to learn LISP or Prolog the same way, and you won’t meet with the same success. LISP and Prolog are completely different; they use a different “programming paradigm,” so the student has to “start over from scratch” when learning these languages.
Although assembly language is an imperative language (like Pascal and
C/C++), there is a considerable “paradigm shift” when moving from one of these
high level languages to assembly.
In HLA, I wanted to create a language with high level control structures
and declarations that made it possible for someone familiar with an imperative
language like Pascal or C/C++ to get their first HLA program running in a matter
of minutes (or, at worst, a matter of hours). Of course, to achieve this goal, I needed to add high-level
data declarations and high-level control constructs to the HLA language.
The astute reader will quickly
point out that high level control structures are not assembly language and
letting the students use these types of statements is not really teaching them
assembly language. This is quite true. The purpose of an assembly language course is to teach students “assembly language programming,” so HLA would clearly fail if it provided only these high level control structures (as the PL/M language does).
Fortunately, this is not the case.
HLA supports all standard assembly language instructions including CMP
and Jcc instructions, so you can still write “pure” assembly language programs
without using those high level language control structures. However, it does take time to learn the
several hundred different machine instructions. Traditionally, it’s taken my students (using only MASM)
about five weeks before they could really write any meaningful programs in
assembly language (you have to cover things like numeric representation, basic
CPU architecture, addressing modes, data types, and introduce the instruction set
before any real programs can be written).
HLA lets students write meaningful programs within about a week of its introduction. For example, the first assignment I give in a typical quarter is an “addition table” program. It computes the addition table of the two vectors 0..15 and 0..15 (an “outer product” that uses addition in place of multiplication) and prints the table, nicely formatted. Students achieve this by using statements they already know (like IF and WHILE) with the injection of just a few assembly language concepts (registers, and the MOV and ADD instructions) plus an introduction to the HLA Standard Library.
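For readers who want to see exactly what that first assignment computes, here is a short Python sketch of the addition table. It is deliberately not HLA and not the actual assignment solution. The entry in row i, column j is i + j, for i and j in 0..15. The layout (a header row, a label on each row, and four-character columns) is an assumed interpretation of "nicely formatted."

```python
def addition_table(n=16):
    """Return the n x n addition table: row i, column j holds i + j."""
    return [[i + j for j in range(n)] for i in range(n)]


def format_table(table):
    """Render the table with a column-header row and a label on each row."""
    width = len(table[0])
    header = "   |" + "".join(f"{j:4d}" for j in range(width))
    lines = [header, "-" * len(header)]
    for i, row in enumerate(table):
        lines.append(f"{i:2d} |" + "".join(f"{value:4d}" for value in row))
    return "\n".join(lines)


print(format_table(addition_table()))
```

In the HLA version described above, students build the same table with statements like WHILE, a couple of registers for the row and column values, MOV and ADD, and HLA Standard Library output calls.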
Over the next several weeks, these students write more and more complex programs as they are introduced to new assembly language and HLA concepts (e.g., data representation, basic architecture, addressing modes, data types, and additional instructions). At about the sixth week, I begin “weaning” these students off the high level language statements and force them to use the low level machine instructions. It turns out that they learn how to simulate an IF statement at roughly the same point in the quarter as they did when they used only MASM. The big difference is that, by then, they’ve written a lot more code exercising other concepts in machine organization and assembly language programming. In my limited experience with classroom testing, I’ve found that students spend less time on the class, cover more material, and retain the knowledge better (by the time of the final exam) than they did when I used only MASM.
The general goal of reducing the learning curve for students is achieved in several ways.
(1) As noted above, HLA allows a gradual transition from high
level languages into pure assembly language. My favorite analogy here is the Nicoderm CQ smoking
cessation system (“gradual steps are better.”). Like the Nicoderm system, HLA lets students learn assembly
language in gradual steps rather than throwing them into the water and shouting
“sink or swim!”
(2) In addition to letting the students employ high level language
statements in their assembly language programs, HLA contains several other
familiar concepts and syntactical items that ease the transition from high
level language programming to assembly language. For example, HLA uses the familiar (to C/C++ programmers) “/*”
and “*/” comment delimiters (as well as the “//” comment delimiter). Statements generally end with a
semicolon (just as in high level languages). Machine instructions use a functional notation rather than
“mnemonic-operand” notation. Constant, type, and variable declarations should
look very familiar to Pascal programmers.
HLA’s standard library should look comfortable to anyone who has used
the C/C++ standard library.
In addition to syntactical
similarities, well-written HLA programs share a similar programming style with
modern high level languages. So a
student who has learned how to write readable Pascal, C/C++, or Java programs
will be able to write readable HLA programs with almost no additional study. Contrast this with the style guide I’ve written for (MASM) assembly language programmers, which differs considerably from high level language style and takes a while to master.
Another factor many people
don’t consider is the evaluation of a programming project. At UCR we are given about 1.5-2 hours
per student per quarter of reader (student grader) time to grade projects. Experienced readers who can grade (or
want to grade) assembly language projects are few and far between. Most readers get “stuck” with grading the assembly class rather than volunteering for the job. The fact that most student assembly language projects have a
horrible programming style and are hard to read only exacerbates this
situation. HLA helps solve this
problem. Since good HLA
programming style is very similar to good C/C++ style, UCR’s readers have a
much easier time reading the projects and evaluating their programming
style. Also, since the students
have (presumably) learned good programming style in the prerequisite course(s),
they tend to write easier to read HLA programs than MASM programs. This lets me assign more projects
without fear of exceeding my reader budget each quarter.
HLA’s advantages are easily summed up by a complaint I once had from a student. She said, “HLA drives me nuts. It’s so similar to C++ that I often get confused and try out something that would work in C++, only to have the HLA compiler reject it.” I agreed with this student that this was a bit of a problem, but I also asked, “What about all the times you’ve tried something from C++ and it HAS worked?” She thought about it for a moment and walked away agreeing with my assessment of her complaint. Had this student been learning assembly the traditional way, she wouldn’t have bothered to try anything. She would have had to spend extra time learning how to achieve what she wanted by reading an assembly text, or she would have missed out on the opportunity to actually learn something new. HLA’s similarity to C++ encouraged her to try something out on her own.
The experiments weren’t always successful, but in those cases where they
were, she benefited greatly from this.
This anecdote, more than any other, sums up what my goals with HLA were
and describes the success I believe I have achieved with it.
6 How to Learn Assembly Programming Using HLA
Of course, a compiler without a language reference manual and tutorial is useless. This document provides a reference to the HLA programming language. It is not, however, an appropriate tutorial for beginners; it’s more suitable for those who already know assembly language programming and wish to learn HLA’s syntax. A better text for
beginners is "The Art of Assembly Language Programming/32-bit
Edition." This provides a
complete college level textbook that teaches assembly language programming from
the ground up using HLA. You can
find a copy of "AoA" on Webster at http://webster.cs.ucr.edu. Webster also contains the latest
version of HLA as well as tons of HLA sample source code. That’s the first place you should go for
information on learning HLA.
7 Legal Notice
The HLA v1.xx implementation is
a prototype intended to test language design and
implementation features. I
(Randall Hyde) have placed this code and language design in the public domain
so others may benefit from this work.
However, keep in mind that, as a prototype, HLA is not up to
contemporary commercial standards for software quality. It is your responsibility to evaluate whether HLA is suitable for whatever purpose you intend to use it for.
At any given time there are
several known and unknown defects in this software. Some may be corrected in later releases of HLA v1.x; others may never be corrected in the v1.x series. I (Randall Hyde) do not warrant or guarantee this software
in any way. In particular, you
cannot expect corrections of any given defect in the system. Obviously, I try to fix known problems
(if possible), but I refuse to be held legally responsible for such defects in
the software.
Note that defects will come in
three general varieties: defects that cause the compiler to fail or generate
bad code, defects in support code (e.g., the HLA Standard Library or other
example code), and defects in the documentation accompanying this product. No
guarantee applies to anything in HLA, especially in these three areas.
The purpose of developing a
prototype implementation of the HLA language was to try out language design and
implementation ideas. The
prototype phase of HLA development is rapidly coming to an end and an
"official" HLA language design will be forthcoming. HLA v2.0 will implement this new
language. The only guarantee I
make about compatibility between HLA v1.x and HLA v2.0 is that there will be some incompatibilities. The exact nature and magnitude of those
incompatibilities is unknown at this point, but it is safe to assume that no
HLA v1.x program will compile under HLA v2.0 without at least some minor source
code changes. So please don’t get
the idea that any investment you make in HLA source code will be protected in
v2.0 (note: after the release of v2.0 this is a relatively safe assumption to
make, though there will still be no guarantees). The changes in the source language between HLA v1.25 and HLA
v1.26, between HLA v1.80 and v1.82, and between HLA v1.101 and v1.102 are but a
small harbinger of the changes that will occur between v1.x and v2.0.
The HLA Standard Library may
also undergo changes over time.
For example, the HLA v1.x API has significant differences from HLA
stdlib v4.x and later. So expect this to happen and plan accordingly if you
intend to port your HLA code to v2.0 eventually.
Because HLA is constantly
changing (typical of a prototype), it is very difficult to keep the
documentation in phase with the language.
You can expect this documentation (and all HLA documentation) to contain
omissions (e.g., of new features that have yet to be documented), discussion of
features removed from HLA, and incorrect descriptions of HLA features. Every attempt will be made to keep the
documentation in phase with the software, but like so many free software
projects, lack of time and motivation prevents perfection[2].
This software is not fit for
use in mission-critical or life-support software systems. This software is principally intended
for evaluation and educational (i.e., learning assembly language) purposes
only. It has been successfully
used to develop commercial applications (including nuclear reactor control
consoles) and it has been successfully used in educational environments, but
again, you are personally responsible for determining the fitness of this
software and documentation for your particular application and you must take
responsibility for that choice.
HLA’s current design makes use
of other software tools that I (Randall Hyde) did not write. These tools
include the FASM assembler, the MASM assembler, the Microsoft Linker, the
Microsoft Librarian, the Pelles C linker, the Pelles C librarian, Borland’s
Turbo Assembler, Borland’s Turbo Linker, Borland’s Turbo Librarian, and the
Free Software Foundation’s ld and as programs. Because some of these tools are
commercial products and are covered by various license agreements, not all of
these tools come with the HLA distribution. For example, if you want to use the Microsoft or Borland tools,
you’ll have to obtain copies of them from some other source. Note that using
HLA does not require the Microsoft or Borland tools; HLA is simply compatible
with these tools if you already own them and would prefer to use them. HLA does ship with all the tools you
need to effectively use HLA; the use of these non-free tools is optional. Licenses
for all the products shipped with HLA are included in the package and you may
view the licenses any time by specifying the “-license” command-line option
when running the HLA program.
8
Installing HLA Under Windows
Easy Installation:
Run the hlasetup.exe program provided
for the Windows distribution. Tell it to put the system in the C:\HLA
subdirectory (the default install location). 99% of all unsuccessful
installations under Windows occur because people try to install the code manually.
Run the hlasetup program. It
will save you a lot of grief.
Manual Installation:
HLA can operate in one of
several modes. In the standard mode it converts an HLA source file directly
into an object file like most assemblers. In other modes, it has the ability to
translate HLA source code into another source form that is compatible with
several other assemblers, such as MASM, TASM, FASM, and Gas. A separate assembler, such
as MASM, can compile that low-level intermediate code to produce an object code
file. Strictly speaking, this step
(converting to a low-level assembler format and assembling via
MASM, TASM, FASM, or Gas) is not necessary, but there are times when it’s
advantageous to work in this manner. Finally, you must link the object code
output from the assembler using a linker program. Typically you will link the object code produced by one or
more HLA source files with the HLA Standard Library (hlalib.lib) and, possibly,
several operating system specific library files (e.g., kernel32.lib under
Win32). Most of this activity
takes place transparently whenever you ask HLA to compile your HLA source
file(s). However, for the whole process
to run smoothly, you must have installed HLA and all the support files
correctly. This section will
discuss how to set up HLA on your system.
First, you will need an HLA
distribution for your particular Operating System. These instructions describe installation under Windows; see
the appropriate sections in this manual if you’re using Linux, FreeBSD, or Mac
OSX. The latest version of
HLA is always available on Webster at http://webster.cs.ucr.edu. You should go there and download the
latest version if you do not already possess it.
The HLA.ZIP package contains
the HLA compiler, the HLA Standard Library, and a set of include files for the
HLA Standard Library. It also
includes copies of the FASM assembler, the Pelles C librarian and linker, and
some other tools. These tools will let you produce executable files under
Windows. In theory, everything you
need to run HLA (using the internal object code generation module or using FASM
as an external back-end assembler for HLA) is provided in the ZIP file.
If you want to use MASM as a
back-end assembler to HLA, then you should grab a copy of the MASM assembler
and linker (assuming you don’t already own these). The easiest way to get all
the MASM files you need is to download the "MASM32" package from
http://www.pdq.com.au/home/hutch/masm.htm or any of the other places on the net
where you can find the MASM32 package.
Once you unzip this file, it’s easy to install the MASM32 package using
the install program it supplies.
Here are the steps I went
through to install MASM32 on my system (skip these steps if you’re not
interested in using MASM as a back-end to HLA; generally, this is an advanced
facility, so beginners should skip to the next step described in this section):
•
I downloaded
masm32v6.zip from the URL above (later versions are probably okay too, although
there is a slight chance that the installation will be different).
•
I double-clicked on
the masm32v6.zip file (which runs WinZip on my system).
•
I chose to extract
"install.exe". I told
WinZip to extract this file to C:\.
•
I double-clicked on
the "install.exe" icon and selected the "C:" drive in the
window that popped up. Then I hit
the install button and waited while MASM32 extracted all the pertinent files. This produced a directory called
"MASM32". MASM32 is a
powerful assembly language development subsystem in its own right; but it uses the traditional MASM syntax
rather than the HLA syntax. So
we’ll use MASM32 mainly for the assembler, linker, and library files. MASM32 also includes a simple
editor/IDE and several other tools that may be useful to an HLA
programmer. Feel free to check
this software out and see if it is useful to you. For now, note that the executable files you will ultimately
need are ML.EXE, ML.ERR, LINK.EXE, and a couple of DLLs. You can find them in the MASM32\BIN
subdirectory. Leave them there for
the time being. The MASM32\LIB
directory also contains many Win32 library files you will need. Again, leave them alone for the time
being.
Here are the steps I went through to install
HLA.ZIP on my system:
•
If you haven’t
already done so, download the HLA executables file from Webster at
http://webster.cs.ucr.edu. On
Webster you can download several different ZIP files associated with HLA from
the HLA download page. The
"Executables" is the only one you’ll absolutely need; however, you’ll probably want to grab
the documentation and examples files as well. If you’re curious, or you want some more example code, you
can download the source listings to the HLA Standard Library. If you’re really curious (or masochistic), you can download the HLA
compiler source listings too (this is not for casual browsing!).
•
I downloaded the
HLA.zip file while writing this (v1.102).
There are generally two versions available on Webster – the “frozen” HLA
v1.99 version and the latest version (v1.102 as I write this, but it’s probably
much higher as you’re reading this).
If you’re learning assembly language using the published edition of “The
Art of Assembly Language” you might want to consider grabbing v1.99 (the
“frozen” version) as the library code and examples match those in “The Art of
Assembly Language”. Later versions of HLA have subtle language and library
differences that may create problems for beginning users. If you’re brave, or you already have
assembly language experience and you want to play around with HLA, you can grab
the latest and greatest version and work from there. I chose to download this
file (HLA.ZIP) to my "C:\" root directory.
•
After downloading
HLA.zip to my C: drive, I double-clicked on the icon to run WinZip. I selected "Extract" and told
WinZip to extract all the files to my C:\ directory. This created an "HLA" subdirectory in my root on
C: with two subdirectories (include and lib) and two EXE files (HLA.EXE and
HLAPARSE.EXE). The HLA program is a
"shell" program that runs the HLA compiler (HLAPARSE.EXE), MASM
(ML.EXE), the linker (LINK.EXE), and other programs. You can think of HLA.EXE as the "HLA Compiler".
•
Next, I set some
environment variables. Warning: 99% of all broken installs happen because the
following environment variables are not set up properly. Be
sure to enter these commands exactly as specified:
path=c:\hla;c:\masm32\bin;%path%
set lib=c:\masm32\lib;c:\hla\hlalib;%lib%
set hlalib=c:\hla\hlalib\hlalib.lib
set hlainc=c:\hla\include
•
Type “set” by itself
on a command line (and hit Enter) and verify that the above environment
variables are set properly in the list that Windows produces.
•
HLA is a Win32
Console Window program. To run HLA
you must open up a console
Window. Under Windows 2000 and XP,
Microsoft has hidden this away in
Start->Programs->Accessories->Command Prompt. You might find it in another
location. You can also start the
command prompt processor by selecting Start->Run and entering
"cmd".
•
At this point, HLA
should be properly installed and ready to run. Try typing "HLA -?" at the command line prompt and
verify that you get the HLA help message.
If not, go back and figure out what you’ve done wrong up to this point
(it doesn’t hurt to start over from the beginning if you’re lost).
•
Thus far, you’ve
verified that HLA.EXE is operational.
•
Next, let’s verify
the correct operation of the linker.
Type "polink /?" and verify that the linker program runs. You can ignore the help screen that
appears. You don’t need to know
about this stuff.
•
Now it’s time to try
your hand at writing an honest to goodness HLA program and verify that the
whole system is working. Here’s
the canonical "Hello World" program written in HLA. Enter it into a text editor and save it
using the filename "HW.HLA":
program HelloWorld;
#include( "stdlib.hhf" )
begin HelloWorld;
    stdout.put( "Hello, World of Assembly Language", nl );
end HelloWorld;
•
WARNING: if you
are a notepad.exe user, note that in certain Windows modes notepad will always
append “.txt” to the end of the filename you specify. This will cause HLA to fail if you attempt to compile
such source files (because HLA expects the filename to end with “.hla”, not
“.txt”). Be aware of this problem.
•
Make sure you’re in
the same directory containing the HW.HLA file and type the following command at
the "C:>" prompt:
"HLA -v HW". The
"-v" option tells HLA to produce VERBOSE output during
compilation. This is helpful for
determining what went wrong if the system fails somewhere along the line. This command should produce output similar to
the following (the details change between versions, so don’t expect it to match exactly):
HLA (High Level Assembler)
Use '-license' to see licensing information.
Version Version 1.102 build 19257 (prototype)
Win32 COFF output
OBJ output using internal FASM back-end
-test active
HLA Lib Path: g:\hla\hlalib\hlalib.lib
HLA include path: g:\hla\hlalibsrc\working\hlainc
HLA temp path:
Linker Lib Path: g:\hla\hlalib;C:\Program Files\Microsoft Visual Studio\VC98\mfc\lib;C:\Program Files\Microsoft Visual Studio\VC98\lib;g:\hla\hlalib
Files:
1: t.hla
Compiling 't.hla' to 't.obj'
using command line:
[hlaparse -WIN32 -level=high -v -sf -ccoff -test "t.hla"]
----------------------
HLA (High Level Assembler) Parser
use '-license' to view license information
Version Version 1.102 build 19256 (prototype)
-t active
File: t.hla
Output Path: ""
hlainc Path: "g:\hla\hlalibsrc\working\hlainc"
Compiler running under Windows OS
Back-end assembler: FASM
Language Level: high
Compiling "t.hla" to "t.obj"
Compilation complete, 9 lines, 0.271 seconds, 33 lines/second
Using flat assembler version C1.66
3 passes, 604 bytes.
----------------------
Linking via [polink @"t.link._.link"]
POLINK: warning: /SECTION:.bss ignored; section is missing.
•
If you get all of
this output, you’re in business.
Note the “POLINK: warning:” message. This warning is due to a
defect in the POLINK linker. You can ignore it. It has no impact on the use of
HLA or the generation of a correct executable file.
Manually installing HLA is a
somewhat involved process.
Fortunately, the hlasetup.exe program automates almost everything so
that you don’t have to worry about changing registry settings and things like
that. If you’re a first-time HLA user, you definitely want to use this method
to install HLA. Manual installation is really intended for upgrades as new
versions of HLA appear. You do not have to change the environment variables to
install a new version of HLA, simply extract the executable files over the top
of your existing installation and everything will work fine.
The two most common problems
people have running HLA involve the location of the Win32 library files and the
choice of linker. During the
linking phase, HLA (well, polink.exe actually) requires the kernel32.lib,
user32.lib, and gdi32.lib library files.
These must be present in the pathname(s) specified by the LIB
environment variable. If, during the
linker phase, HLA complains about missing object modules, make sure that the
LIB path specifies the directory containing these files. If you’re an MS VC++ user, installation
of VC++ should have set up the LIB
path for you. If not, then locate
these files (they are part of the MASM32 distribution) and copy them to the
HLA\HLALIB directory.
Another common problem with
running HLA is the use of the wrong link.exe program. Microsoft has distributed several different versions of
link.exe; in particular, there are
16-bit linkers and 32-bit linkers.
You must use a 32-bit segmented linker with HLA. If you get complaints about "stack
size exceeded" or other errors during the linker phase, this is a good
indication that you’re using a 16-bit version of the linker. Obtain and use a 32-bit version and
things will work. Don’t forget
that the 32-bit linker must appear in the execution path (specified by the PATH
environment variable) before the 16-bit linker. Better yet, unless you have a good reason to do otherwise,
stick with the polink.exe linker program provided with the HLA download.
For more information, please
see the sections on HLA Internal Operation and Customizing HLA.
9
Installing HLA Under Linux, FreeBSD, or Mac OSX
HLA is not a stand-alone
program. Under Linux, FreeBSD, and
Mac OSX it is a compiler that typically translates HLA source code into a lower-level
assembly language that Gas or FASM (Linux only) then assembles. Finally, you must link the object code
output using a linker program such as the GNU ld linker. Typically you will link the object code
produced by one or more HLA source files with the HLA Standard Library (hlalib.a). Most of this activity takes place
transparently whenever you ask HLA to compile your HLA source file(s). However, for the whole process to run
smoothly, you must have installed HLA and all the support files correctly. This section will discuss how to
set up HLA on your system.
First, you will need an HLA
distribution for Linux, FreeBSD, or Mac OSX (hereafter referred to as
*NIX). Please see Webster or the
previous section if you’re attempting to install HLA on a different OS such as
Windows. The latest version
of HLA is always available on Webster at http://webster.cs.ucr.edu. You should go there and download the
latest version if you do not already possess it.
Under *NIX, HLA can operate in
one of three modes: it can directly produce object files (.o files) that you
can link with ld; it can produce a low-level assembly language output file that
you can assemble using the Free Software Foundation’s Gas assembler, or it can
produce a low-level assembly language output file that you can assemble using
FASM (Linux only). The HLA package
contains the HLA compiler, FASM (Linux version only), the HLA Standard Library,
and a set of include files for the HLA Standard Library. If you write an HLA program and want Gas to
process it, you’ll need to make sure you have a reasonable version of Gas
available (Gas is available on most *NIX distributions, so this shouldn’t be a
problem). Note that the HLA Gas
output can only be assembled on Linux and FreeBSD by Gas v2.10 or later (so you
will need the 2.10 or later binutils distribution). Note that (apparently) HLA
does not work with 64-bit versions of Gas, so make sure you’re using a 32-bit
version of Gas with HLA or use the object code output feature. Note that under Mac OSX, HLA emits
output specifically for the Macintosh version of Gas.
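If you are unsure whether your Gas meets the v2.10 requirement above, a small BASH helper can compare version numbers. This is a sketch, not part of the HLA distribution: version_at_least and gas_version are hypothetical names made up for this example, and gas_version assumes GNU binutils, whose "as --version" output ends its first line with the version number. It does not check whether Gas is a 32-bit or 64-bit build.

```bash
#!/usr/bin/env bash
# version_at_least HAVE NEED: succeed (status 0) when HAVE >= NEED.
# Fields are compared numerically, so 2.9 < 2.10 (a plain string compare
# would get this wrong). Non-digit suffixes such as "-1ubuntu" are ignored.
version_at_least() {
    local IFS=.
    local -a have need
    have=($1)
    need=($2)
    local i h n
    for ((i = 0; i < ${#have[@]} || i < ${#need[@]}; i++)); do
        h=${have[i]:-0}; h=${h%%[!0-9]*}; h=${h:-0}
        n=${need[i]:-0}; n=${n%%[!0-9]*}; n=${n:-0}
        if ((10#$h != 10#$n)); then
            ((10#$h > 10#$n))   # status of this test becomes the result
            return
        fi
    done
    return 0                    # all fields equal
}

# gas_version: print the last field of the first line of "as --version".
# Assumption: GNU binutils output such as "GNU assembler (GNU Binutils) 2.38".
gas_version() {
    as --version 2>/dev/null | head -n 1 | awk '{ print $NF }'
}

# Example: warn if the installed Gas is older than 2.10.
#   version_at_least "$(gas_version)" 2.10 || echo "Gas 2.10 or later required" >&2
```

If "as" is missing, gas_version prints nothing and the comparison fails, which is the safe outcome.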
Here are the steps I went through
to install HLA on my Linux system:
•
First, if you haven’t
already done so, download the HLA executables file from Webster at
http://webster.cs.ucr.edu. On
Webster you can download several different ZIP files associated with HLA from
the HLA download page. The
"Linux Executables", “FreeBSD executables”, or “Mac OSX executables”
is the only one you’ll absolutely need;
however, you’ll probably want to grab the documentation and examples
files as well. If you’re curious,
or you want some more example code, you can download the source listings to the
HLA Standard Library. If you’re really curious (or masochistic), you can download the HLA
compiler source listings too (this is not for casual browsing!).
•
I always use the BASH
shell when working under *NIX. Feel free to use whatever shell you’re most
comfortable with, but the commands I list in the following directions assume
you’re using BASH and will not work with other shells. If you’re an advanced
*NIX user, you can translate the commands as necessary. If you’re not an
advanced *NIX user, then run BASH from the command line (by typing “bash”) and
stick with the BASH shell. Mac
users note: the “terminal” application (which provides the command-line shell)
does not run bash by default. You must explicitly run “bash” from the
command line before proceeding.
•
I downloaded the
linux.tar.gz file (for Linux), freebsd.tar.gz (for FreeBSD), and mac.tar.gz file
(for Mac OSX) for HLA v1.102 while writing this. Most likely, there is a much later version available as
you’re reading this. Be sure to
get the latest version. I created
a “/usr/hla” directory and then I downloaded this file to my "/usr/hla"
directory; you can put the
file wherever you like, though this documentation assumes that all HLA files
wind up in the "/usr/hla/..." directory tree. Note: the xxxx.tar.gz file unpacks into ./hla (an hla subdirectory of the current directory). So you should cd
(change directory) into /usr before untarring the file. Warning: It’s
a really good idea to first install HLA in the “/usr/hla” subdirectory, even if you
don’t want to leave it there. Get HLA working first, then move it to a
different directory node if you don’t want it at “/usr/hla”. The only reason for a beginner to
install HLA some place else is if they don’t have root access on the system and
can’t create the “/usr/hla” subdirectory (in which case I’d recommend having
the system administrator install it for you). Advanced *NIX users should be
able to translate the following commands and put HLA wherever they like, but
“/usr/hla” really is the best place.
•
After downloading
linux.tar.gz/freebsd.tar.gz/mac.tar.gz, I executed the following shell command: "gzip -d linux.tar.gz"
(for FreeBSD: “gzip -d freebsd.tar.gz”, for Mac: “gzip -d mac.tar.gz”). Once decompression
was complete, I extracted the individual files using the command "tar xvf xxxx.tar" (xxxx=linux, freebsd, or mac). This extracted several
executable files (e.g., "hla" and "hlaparse") along
with two subdirectories (include and hlalib). The HLA program is a "shell" program that
runs the HLA compiler (hlaparse), gas (as), FASM (fasm, Linux only), the linker
(ld), and other programs. You can
think of hla as the "HLA Compiler". It would be a real good idea, at this point, to set the
permissions on "hla" and "hlaparse" so that everyone can
read and execute them. You should
also set read and execute permissions on the two subdirectories and read
permissions on all the files within the directories (if this isn’t the default
state). Do a "man chmod"
from the Linux command-line if you don’t know how to change permissions. Note
that you will need to be doing this as root (see “man su”) if you’re installing
HLA at “/usr/hla”.
•
If you prefer a more
“Unix-like” environment, you could copy the hla and hlaparse (and other
executable) files to the “/usr/bin” or “/usr/local/bin” subdirectory. This
step, however, is optional.
•
Next (logged in as a
plain user rather than root or the super-user), I edited the
".profile" file in my home directory ("/home/rhyde" in my
particular case; this will probably be different for you). I found the line that defined the
"PATH" variable; it originally looked like this on my system:
PATH=$DBROOT/bin:$DBROOT/pgm:$PATH
I edited this line to add the path to the HLA directory, producing the
following:
PATH=$DBROOT/bin:$DBROOT/pgm:/usr/hla:$PATH
Without this modification, *NIX will probably not find HLA when you attempt to
execute it unless you type a full path (e.g., "/usr/hla/hla") when
running the program. Since this is
a pain, you’ll definitely want to add "/usr/hla" to your path. Of
course, if you’ve chosen to copy hla and hlaparse to the “/usr/bin” or
“/usr/local/bin” directory, chances are pretty good you won’t have to change
the path as it already contains these directories. Note that “.profile” is a BASH-related file; you may need to
edit a different file if you’re using a different shell.
•
Next, I added the
following four lines to ".profile" (note that Linux filenames
beginning with a period don’t normally show up in directory listings unless you
supply the "-a" option to ls):
hlalib=/usr/hla/hlalib/hlalib.a
export hlalib
hlainc=/usr/hla/include
export hlainc
These four lines define (and export) environment variables that HLA needs
during compilation. Without these
environment variables, HLA will probably complain about not being able to find
include files, or the linker (ld) will complain about strange undefined symbols
when you attempt to compile your programs. Note that this step is optional if
you leave the library and include files installed in the /usr/hla directory
subtree. Note: these are BASH
commands and may need to be
changed if you’re using a different shell.
Optionally, you can add the following two lines to the .bashrc file (but make
sure you’ve created the /tmp directory if you do this):
hlatemp=/tmp
export hlatemp
After saving the ".profile" file, you can apply the
changes to your current shell session by using the command:
source .profile
Note: this discussion only applies to users who run the BASH shell. If you are using a different shell
(like the C-Shell or the Korn Shell), then the directions for setting the path
and environment variables differ slightly. Please see the documentation for your particular shell if
you don’t know how to do this.
•
At this point, HLA
should be properly installed and ready to run. Try typing "hla -?" at the command line prompt and
verify that you get the HLA help message.
If not, go back and figure out what you’ve done wrong up to this point
(it doesn’t hurt to start over from the beginning if you’re lost).
•
Now it’s time to try
your hand at writing an honest to goodness HLA program and verify that the
whole system is working. Here’s
the canonical "Hello World" program written in HLA. Enter it into a text editor and save it
using the filename "hw.hla":
program HelloWorld;
#include( "stdlib.hhf" )
begin HelloWorld;
    stdout.put( "Hello, World of Assembly Language", nl );
end HelloWorld;
•
Make sure you’re in
the same directory containing the "hw.hla" file and type the
following command at the prompt:
"hla -v hw". The
"-v" option tells HLA to produce VERBOSE output during compilation. This is helpful for determining what
went wrong if the system fails somewhere along the line. This command should produce output like
the following:
HLA (High Level Assembler) Parser
Copyright 2001, by Randall Hyde, all rights reserved.
Version Version 1.32 build 4895 (prototype)
-t active
File: t.hla
Compiling "t.hla" to "t.asm"
HLA (High Level Assembler)
Copyright 1999, by Randall Hyde, all rights reserved.
Version Version 1.32 build 4895 (prototype)
ELF output
Using GAS assembler
GAS output
-test active
Files:
1: t.hla
Compiling 't.hla' to 't.asm'
using command line
[hlaparse -v -sg -test "t.hla"]
Assembling "t.asm" via [as -o t.o "t.asm"]
Linking via [ld -o "t" "t.o" "/usr/hla/hlalib/hlalib.a"]
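The unpacking, permission, and environment steps in the list above can be collected into two BASH functions. This is a minimal sketch under stated assumptions rather than an official installer: install_hla_archive and check_hla_env are hypothetical names, the archive is assumed to unpack into an hla subdirectory (as described above), and /usr/hla is assumed to be the default location when hlalib and hlainc are unset.

```bash
#!/usr/bin/env bash
# install_hla_archive ARCHIVE [PARENT]  (hypothetical helper)
#   ARCHIVE: path to linux.tar.gz, freebsd.tar.gz, or mac.tar.gz
#   PARENT:  directory that will receive ./hla (default /usr; needs root)
# Assumption: the archive unpacks into an "hla" subdirectory.
install_hla_archive() {
    local archive="$1"
    local parent="${2:-/usr}"

    # Same effect as "gzip -d" followed by "tar xvf", but streamed through
    # a pipe so the original .tar.gz is left in place. pipefail (set only
    # inside this subshell) makes a gzip failure fail the whole step.
    ( set -o pipefail; gzip -dc "$archive" | tar -xf - -C "$parent" ) || return 1

    # Let everyone read all files and enter all directories ("X" adds
    # execute permission only to directories and already-executable files).
    chmod -R a+rX "$parent/hla" || return 1

    # Let everyone run the compiler driver and the parser.
    chmod a+rx "$parent/hla/hla" "$parent/hla/hlaparse"
}

# check_hla_env  (hypothetical helper)
# Reports problems with the .profile settings; returns 0 when the library,
# the include directory, and the hla program itself can all be found.
# Assumption: /usr/hla defaults apply when hlalib/hlainc are unset.
check_hla_env() {
    local lib="${hlalib:-/usr/hla/hlalib/hlalib.a}"
    local inc="${hlainc:-/usr/hla/include}"
    local status=0

    if [ ! -f "$lib" ]; then
        echo "HLA Standard Library not found: $lib" >&2
        status=1
    fi
    if [ ! -d "$inc" ]; then
        echo "HLA include directory not found: $inc" >&2
        status=1
    fi
    if ! command -v hla >/dev/null 2>&1; then
        echo "hla is not in PATH (add /usr/hla to PATH in .profile)" >&2
        status=1
    fi
    return "$status"
}
```

For example, as root you might run install_hla_archive linux.tar.gz /usr, and then, as a plain user after editing .profile, run source ~/.profile followed by check_hla_env. Because the .tar.gz file is left in place, it is easy to start over if something goes wrong.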
Versions of HLA may appear for
other Operating Systems (beyond Windows, Linux, FreeBSD, and MacOSX) as
well. Check out Webster to see if
any progress has been made in this direction. Note one unusual feature of HLA: carefully written (console)
applications will compile and run on all supported operating systems without
change. This is unheard of for
assembly language! So if you are
using multiple operating systems supported by HLA, you’ll probably want to
download files for all supported OSes.
For more information, please
see the sections on HLA Internal Operation and Customizing HLA.
10
Using HLA with the RadASM Integrated Development Environment
Please see the separate
RadASM/HLA user’s manual for details concerning the use of the RadASM
integrated development environment. The hlasetup.exe program automatically
installs RadASM on your system (and properly sets up the associated INI files).
When upgrading to newer versions of RadASM, be sure to save the radasm\hla
folder and the hla.ini files before copying the new version of RadASM over the
old version. Again, for more information concerning RadASM, see the separate
RadASM user’s manual.
11
Using HLA with the HIDE Integrated Development Environment
Sevag has written a nice
HLA-specific integrated development environment for HLA called HIDE (HLA IDE).
This one is a bit easier to install, set up, and use than RadASM (at the cost
of being a little less flexible). HIDE is great for beginners who want to get
up and running with a minimal amount of fuss. You can find HIDE at the HIDE
home page:
http://www.geocities.com/kahlinor/HIDE.html
12
HLA Internal Operation
To effectively use HLA, it helps
to understand how HLA translates HLA source files into executable machine code.
This information is particularly useful if you install HLA incorrectly and you
cannot successfully compile a simple demo program. Beyond that, this
information can also help you take advantage of more advanced HLA and OS
features.
As noted earlier in this
document, HLA is not a single
application; the HLA system is a collection of programs that work together to
translate your HLA source files into executable files. This is not unusual;
most compilers and assemblers provide only part of the conversion from source
to executable (e.g., you still have to run a linker with most compilers and
assemblers to produce an executable).
The HLA system offers a rich
set of different configurations that allow you to mix and match components to
efficiently process your assembly language applications. First of all, HLA is
relatively portable. The
compiler itself is written with Flex, Bison, C/C++, along with some
platform-independent assembly language code (written in HLA, of course). This
makes it fairly easy to move the compiler from one operating system to another.
Currently, HLA is supported under Windows, Linux, FreeBSD, and Mac OSX. Plans include porting HLA to QNX and
Solaris at some point in the future. Even within a single operating system, HLA
offers multiple configurations that you can employ, based on your needs and
desires. This section describes some of the possible configurations you
might create.
The compilation of a typical
HLA source file using a command line such as “hla hw” goes through three or
four major phases:
•
The HLA.EXE (Windows) or hla (other OSes) program
processes command-line parameters and acts as a “traffic cop” directing the
execution of the remaining components of the HLA system.
•
The HLAPARSE.EXE
(Windows) or hlaparse (other OSes) program is responsible for translating the
HLA source file into either an object file or into the syntax of some other
assembler. Usually, the HLAPARSE program is run automatically by some other
program such as HLA.EXE/hla or HIDE (the HLA Integrated Development
Environment). You would not normally run HLAPARSE directly from a command-line
(though it is certainly possible to do this if you are so inclined).
•
If you elect to have
HLA produce an assembly language output file rather than an object module, then
the next step towards producing an executable file is to run the associated
assembler on the source output that HLA produced. This step isn’t strictly
necessary because HLA can produce an object file directly without using some
external assembler, but there are some (rather esoteric) reasons why you might
want to go through some other assembler rather than having HLA directly produce
the object file. Generally, the HLA.EXE (hla) program will automatically run
the assembler for you.
•
The last step is to
run a linker to combine the object module the previous steps created with the
HLA Standard Library and any other necessary object modules for the project.
The output of the linkage step is an executable file (assuming, of course,
there were no errors in the compilation of your program). Generally, the
HLA.EXE (hla) program will automatically run the linker for you.
There is a fifth, optional,
step that can also take place under Windows. If you are creating an application
that makes use of compiled resources, then just before the linking
stage the HLA.EXE program can run a resource compiler to
translate those resources into a compiled resource file (.res) that the linker can link
into your final executable.
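The phases above can also be run by hand, which helps when you need to see exactly where a broken installation fails. The following BASH sketch mirrors the command lines that "hla -v" printed in the Linux example earlier (Gas back end); it is an illustration, not the hla program's actual implementation. The hlaparse option -sg is copied from that log and may differ in other HLA versions, and hla_build, run_phase, and DRY_RUN are hypothetical names invented for this example.

```bash
#!/usr/bin/env bash
# Illustration only: the Linux/Gas phases that "hla -v" reports, run by hand.
# hla_build plays the role of phase 1 (the hla "traffic cop" program).
# Flags are copied from the verbose log and may differ between HLA versions.

# run_phase CMD ARGS...: echo the command, then run it unless DRY_RUN=1.
run_phase() {
    echo "$*"
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# hla_build NAME: NAME.hla -> NAME.asm -> NAME.o -> executable NAME
hla_build() {
    local name="${1%.hla}"
    local lib="${hlalib:-/usr/hla/hlalib/hlalib.a}"

    # Phase 2: hlaparse translates HLA source into Gas source (NAME.asm).
    run_phase hlaparse -sg "$name.hla" || return 1
    # Phase 3: Gas assembles NAME.asm into the object module NAME.o.
    run_phase as -o "$name.o" "$name.asm" || return 1
    # Phase 4: ld links the object module with the HLA Standard Library.
    run_phase ld -o "$name" "$name.o" "$lib"
}
```

With DRY_RUN=1 set, hla_build hw prints the three command lines without running them; without it, each phase runs in turn and the build stops at the first phase that fails.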
As it turns out, the HLA system
can employ a wide variety of linkers, librarians, assemblers, and other tools
based on the underlying operating system. Here is the list of tools that HLA
has been qualified with:
Under Windows:
•
Microsoft’s MASM
assembler
•
Microsoft’s linker
•
Microsoft’s resource
compiler
•
Microsoft’s LIB
library manager
•
The Flat Assembler
(FASM)
•
Pelles C POLINK
linker
•
Pelles C POLIB
library manager
•
Pelles C PORC
resource compiler
•
Borland’s Turbo
Assembler
Note: you can use Borland’s
TLINK and TLIB utilities with HLA, but you will have to manually run these
applications; the HLA system will not automatically execute them.
Under Linux and FreeBSD:
•
The Free Software
Foundation’s (FSF) Gas assembler (as)
•
FSF’s linker (ld)
•
The Flat Assembler
(FASM, Linux only)
Under Mac OSX:
•
The Free Software
Foundation’s (FSF) Gas assembler (as, special Macintosh version).
•
FSF’s GNU linker (ld)
A couple of obvious questions
that might come up: “Why provide all these options? Why not simply pick a single configuration and go with
that?” Well, as it turns out,
there are advantages and disadvantages to each configuration and allowing
multiple configurations affords you the most flexibility when writing code.
12.1
Standard Configurations Under Windows
The “standard” HLA
configuration under Windows consists of HLA.EXE, HLAPARSE.EXE, PORC.EXE, and
POLINK.EXE. This standard
configuration generates object files directly, compiles any resource files
using the Pelles C PORC.EXE resource compiler, and links the object modules
together using the Pelles C linker (POLINK). The Pelles C tools were chosen for
the standard configuration under Windows because they are freely distributable
(unlike the Microsoft tools). For those who care about such things,
HLAPARSE.EXE produces object modules directly using components built from the
Flat Assembler (FASM), so you get the same output code that FASM itself
produces (generally, the code quality is a little better than that of MASM or
TASM).
So why would anyone want to
have HLA produce assembly language output to be run through a different
assembler (much like GCC does)?
For common applications, there is no need to do this. However, in some
specialized situations having this facility is quite useful. For example,
rather than using an internal version of FASM as HLA’s back-end native code
generator, you may elect to have HLA generate FASM source code to be processed
by the FASM assembler. There are three reasons for doing this:
•
You want to see how
HLA would translate the HLA program (in HLA syntax) to a lower-level assembly
language (in FASM syntax); this is great, for example, for seeing how macros
expand or how HLA processes high-level control constructs.
•
Because the internal
version of FASM that HLA uses has to coexist in memory with HLA, the amount of
memory allocated to this internal FASM is much less than is available to a
standalone version of FASM. Therefore, it is possible that some very large projects will not compile using the internal
version of FASM but will compile if you produce a FASM source file and run the
external version of FASM on it.
•
If there is a defect
in the internal version of FASM that prevents HLA from directly generating an
object code file, you can probably produce a source file and successfully
compile your program using the external version of FASM (the internal version
was a rewrite of the FASM assembler, so the fact that it contains a defect does
not suggest that the external version contains the same defect).
Another configuration is to
have HLA produce a MASM compatible output file and use Microsoft’s MASM to
translate that output source file into an object file. There are several reasons why you might want to use MASM:
•
MASM can inject
symbolic debugging information (usable by OllyDbg or Visual Studio’s debugger)
into the object file, making it easier to debug HLA applications.
•
FASM (internal or
external) may have some code generation defect that you can’t work around.
•
FASM’s output might
not be completely compatible with some other object module tool you’re using.
•
You want to take HLA
output and merge it with some MASM projects you’ve got.
Although MASM is not a freely
distributable program (and, therefore, is not included in the HLA download),
you may download a copy for free from the Microsoft Web site or obtain a copy
as part of the MASM32 package.
One last assembler choice under
Windows is Borland’s Turbo Assembler (TASM). There is one main reason why you
would want to use TASM to process HLA output – you want to link HLA output with
a Borland Delphi project. Delphi is very particular about the object files it
will link against. Effectively, you can only use TASM-generated output files
when linking with Delphi code. Therefore, if you want to link your HLA modules
into a Delphi application, you’ll need to use the TASM output mode. Like MASM,
TASM is not a freely distributable product and cannot be included as part of
the HLA download. However, Borland will provide a free copy as part of their
free C++ download on their website (registration required).
Under Windows, you may use
either the freely distributable Pelles C linker (POLINK) or the Microsoft
linker to process the object code output from the HLA system. Polink is
provided with the HLA download (subject, of course, to the Pelles C license
agreement). Microsoft’s linker is a commercial product (and as such, it is not
included as part of the HLA download), but it is available as a free download
from Microsoft’s web site and as part of the MASM32 package. HLA will use
either linker as the final stage in producing an executable. The Microsoft
linker has been around longer and has, arguably, fewer bugs than Polink, but
the choice is yours. Another possible linker option is the Borland Turbo
linker (TLINK). Just note that HLA.EXE will not automatically run TLINK; you
will have to run it manually after producing an OMF object file with HLA. Also
note that only MASM and TASM are capable of producing OMF files. FASM and HLA’s
internal code generator do not generate OMF object code files, so you cannot
use TLINK with their output.
To produce libraries, you may
optionally employ a librarian such as Microsoft’s LIB.EXE, the Pelles C
POLIB.EXE, or Borland’s Turbo Librarian (TLIB.EXE). The HLA.EXE program does
not automatically run these programs; you will have to run them manually to
create a .LIB file from your object files. Please see the documentation for
these products for details on their use. The HLA download includes the
POLIB.EXE program and the HLA standard library source code includes a make file
option that will use any of these three librarians to produce the HLA
hlalib.lib library file.
Note that it is possible to mix
and match modules in the HLA system, within certain reasonable limitations. For
example, you could use the FASM assembler and the Microsoft linker, the TASM
assembler and the POLINK linker, or even the MASM assembler and the TLINK linker.
In general, FASM output works fine with the Microsoft linker and librarian or
the Pelles C linker and librarian, MASM output works best with Microsoft’s
linker and librarian, and Turbo Assembler output works best with the Borland
tools or the Microsoft tools.
Under Windows, the default
configuration is to generate an MSCOFF object file directly and use the POLINK
linker to process the resulting object file(s). See the section on “Customizing
HLA” for details on changing the default configuration.
12.2
Standard Configurations Under Linux, FreeBSD, and Mac OSX
Under *NIX (Linux, FreeBSD, and
Mac OSX), HLA supports fewer configurations than under Windows but this is
primarily because the main tools available for Linux are all freely
distributable and there is no need to support commercial tools. There are three
different ways to generate object code files and only one linker and one
librarian option available under Linux. There is no resource compiler (that HLA
would automatically use).
When it uses an external assembler, HLA can generate object files
in one of two different ways under *NIX:
•
The hlaparse program
can generate a FASM-compatible source file that can be further processed by the
Linux version of the FASM assembler to produce an ELF file (Linux only).
•
The hlaparse program
can generate a Gas-compatible source file (using the .intel_syntax mode) that
the FSF Gas assembler can convert to an ELF file.
Under *NIX you don’t get a
choice of linkers. Everyone uses the FSF/GNU ld (load) program as the standard
system linker. The HLA package also uses ld. In a similar vein, your only librarian choice is the FSF/GNU
ar (archive) program. These tools work great and they’re freely distributable,
so they’re the perfect back ends to the HLA system.
The HLA download for Linux
includes the FASM assembler but it does not include Gas (as), ld, or ar. These
are standard GNU tools that ship with nearly every version of Linux, so there
is no need to duplicate that code in the HLA package. Note that you must be
using a 32-bit version of Gas, version 2.10 or later (64-bit versions may not
work automatically with HLA). The
FreeBSD version of HLA uses only the GNU assembler (again, v2.10 or later). The
Mac OSX version of HLA uses the Macintosh version of GNU’s Gas assembler and
loader.
Under Linux, the default HLA
configuration generates a Gas compatible assembly language file and then runs
Gas to produce an ELF object code file. If you would prefer to produce FASM output and use the FASM
assembler, or generate ELF output directly using the internal version of FASM,
then see the section on “Customizing HLA” for more details. Under FreeBSD, HLA produces a Gas
output file and then runs Gas to process it; no FASM option is available under
FreeBSD. Likewise, under Mac OSX HLA produces a Mac-Gas compatible assembly
language file that is processed by the Macintosh version of Gas to produce a
Mach-O object file.
12.3
Non-Standard Configurations
It is possible, though
uncommon, to use HLA in ways that aren’t 100% compatible with the underlying
operating system. For example, under Windows you can use HLA to produce a
Gas-compatible assembly language source file. Likewise, you can use HLA under
Linux to produce a MASM or TASM compatible assembly language source file.
However, note that when HLA produces a Gas file, it includes certain start-up
code that is only appropriate for Linux; this is true even if you do this under
Windows. Similarly, producing a MASM or TASM source file includes start-up code
that is only appropriate for Windows, even if the file is produced under
Linux. So even if it were possible
to run these products under the “wrong” operating system (e.g., MASM under
Linux), the resulting object files would not be in a format acceptable to the
OS and the code emitted by the HLA compiler wouldn’t run properly.
Nevertheless, if you just want to view the assembly language file that HLA
produces, it doesn’t really matter what operating system you’re running under,
so you may as well pick whichever output format you are most comfortable with.
Note that as of HLA v1.102, it
is possible to specify the target OS using the HLA command-line options
“-win32”, “-linux”, “-freebsd”, and “-macos”. However, these are “source output” only options. That is, if
you specify “-win32” under a *NIX OS, you’ll get a source file (FASM, MASM, or
TASM) that can be compiled by an appropriate assembler, but you must compile
that source file under the target OS (Windows in this case).
13
Using the HLA Command-Line Compiler
Once you’ve installed HLA and
verified that it is operational, you can run the HLA compiler. The HLA compiler consists of two
executables: hla(.exe)[3],
which is a shell that processes command line arguments, compiles
".hla" files to ".asm" files (or directly to .obj files),
assembles the ".asm" files by calling an assembler (except in object
output mode), and links the resulting files together using a linker
program; the second executable is
hlaparse(.exe) which compiles a single ".hla" file to an assembly
language file or an object code file.
Generally, you would only run hla(.exe). The hla(.exe) program automatically runs the hlaparse(.exe)
and assembler/linker programs. The
hla(.exe) command uses the following syntax:
hla optional_command_line_parameters Filename_list
The filenames list consists of
one or more unambiguous filenames having the extension: ".hla",
".asm" or ".obj"/".o"[4]. HLA will first run the hlaparse(.exe)
program on all files with the HLA extension (producing files with the same
basename and an ASM extension).
Then HLA runs the assembler on all files with the ".asm"
extension (including the files produced by hlaparse). Finally, HLA runs the linker to combine all the object files
together (including the ".obj"/".o" files the assembler
produces). The ultimate result,
assuming there were no errors along the way, is an executable file (with an EXE
extension under Windows, with no extension under Linux).
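The routing rules in the preceding paragraph can be modeled in a few lines of Python. This is a minimal sketch, not HLA’s implementation: `plan_build` and its `windows` flag are hypothetical names, and the sketch follows the source-output path described here (in object-output mode, hlaparse writes the object file directly and the separate assembler step disappears).

```python
from pathlib import PurePath

def plan_build(filenames, windows=True):
    """Hypothetical model: return (hlaparse inputs, assembler inputs,
    linker inputs, executable name) for an hla command line."""
    parse, assemble, link = [], [], []
    for name in filenames:
        path = PurePath(name)
        ext = path.suffix.lower()
        if ext == ".hla":
            parse.append(name)
            # hlaparse produces <basename>.asm, which the assembler then processes.
            assemble.append(str(path.with_suffix(".asm")))
        elif ext == ".asm":
            assemble.append(name)
        elif ext in (".obj", ".o"):
            link.append(name)
        else:
            raise ValueError(f"unsupported extension: {name}")
    obj_ext = ".obj" if windows else ".o"
    # Every assembled file contributes one object file to the link step.
    link = [str(PurePath(a).with_suffix(obj_ext)) for a in assemble] + link
    # The executable takes the basename of the first file on the command line.
    base = PurePath(filenames[0]).stem
    exe = base + ".exe" if windows else base
    return parse, assemble, link, exe
```

For `hla main.hla util.asm lib.obj` under Windows, the sketch predicts that hlaparse sees only main.hla, the assembler sees main.asm and util.asm, and the linker combines main.obj, util.obj, and lib.obj into main.exe.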
HLA supports the following
command line parameters:
Usage: hla options filename(s)
HLA (High Level Assembler - FASM back end, POLINK linker)
Version 1.102 build 19257 (prototype)
Generic Options:
   -license          Display license information.
   -@                Do not generate linker response file.
   -@@               Force generation of a new linker response file.
   -dxx              Define VAL symbol xx to have type BOOLEAN and value TRUE.
   -dxx=str          Define VAL symbol xx to have type STRING and value str.
   -sym              Dump symbol table after compile.
   -test             Send diagnostic info to stdout rather than stderr.
   -v                Verbose compile (also sends output to stdout).
   -?                Display this help message.
Language Control:
   -level=h          High-level assembly language.
   -level=m          Medium-level assembly language.
   -level=l          Low-level assembly language.
   -level=v          Machine-level assembly language (very low level).
Source Output Control:
   -sourcemode       Compile to source instructions (rather than hex opcodes).
   -s                Compile to .ASM files only (using default ASM syntax).
   -sh               Compile to pseudo-HLA source file (implies sourcemode).
   -sm               Compile to MASM source files only.
   -sf               Compile to FASM source files only.
   -sn               Compile to NASM source files only.
   -st               Compile to TASM source files only.
   -sg               Compile to GAS source files only.
   -sx               Compile to GAS source files for Mac OSX only.
   -code1st          Emit machine instructions before data in code segment.
HLAPARSE Compiler/Back-end Assembler Output Control:
   -c                Compile and assemble to object file only.
   -cf               Compile and assemble to object file only (using FASM).
   -cn               Compile and assemble to object file only (using NASM).
   -cm               Compile/assemble to object using MASM (Windows only).
   -ct               Compile/assemble to object using TASM (Windows only).
   -cg               Compile/assemble to object using GAS (Linux/FreeBSD only).
   -cx               Compile/assemble to object using GAS (Mac only).
   -co               Compile/assemble to object using internal FASM back-end (Win32).
   -o:omf            Produce OMF files (for Windows).
   -o:coff           Produce win32 COFF files (for Windows).
   -o:elf            Produce ELF files (for Linux or FreeBSD).
   -o:macho          Produce Mach-O files (for Mac OSX).
   -axxxxx           Pass xxxxx as command line parameter to assembler.
Executable Output Control:
   -xf               Compile/assemble/link to executable (using FASM).
   -xn               Compile/assemble/link to executable (using NASM).
   -xm               Compile/assemble/link to executable using MASM (Windows only).
   -xt               Compile/assemble/link to executable using TASM (Windows only).
   -xg               Compile/assemble/link to executable using GAS (Linux/FreeBSD only).
   -xx               Compile/assemble/link to executable using GAS (Mac only).
   -xo               Compile/assemble/link to executable using internal FASM back-end (Windows only).
   -win32            Generate code for Win32 OS.
   -linux            Generate code for Linux OS.
   -freebsd          Generate code for FreeBSD OS.
   -macos            Generate code for Mac OSX.
Linker Control:
   -lxxxxx           Pass xxxxx as command line parameter to linker.
   -e:name           Executable output filename (appends ".exe" under Windows).
   -x:name           Executable output filename (does not append ".exe").
   -m                Create a map file during link.
   -w                Compile as windows app (default is console app).
   -polink           Force use of Pelles C linker/resource compiler.
   -mslink           Force use of Microsoft linker/resource compiler.
Temporary file control and assembly control:
   -p:path           Use <path> as the working directory for temporary files
                     (overrides hlatmp environment variable).
   -r:name           <name> is a text file containing cmd line options.
   -obj:path         Use <path> as the directory to hold the object files.
   -i:path           Include path (used to override HLAINC environment variable).
   -lib:path         Library path (used to override HLALIB environment variable).
HLA Environment Variables:
   hlalib=<path>        Sets path to hlalib.lib file
                        (e.g., c:\hla\hlalib\hlalib.lib)
   hlainc=<path>        Sets path to HLA include subdirectory
                        (e.g., c:\hla\include)
   hlatmp=<path>        Sets path to directory to hold temp files (optional)
   hlaasmopt=<options>  Passes the specified command-line options on to the
                        underlying assembler.
   hlalinkopt=<options> Passes the specified command-line options on to the
                        underlying linker.
   hla=<asm>            Sets default assembler behavior
      <asm>:
         hla    - uses internal version of FASM
         ohla   - uses internal version of FASM
         fhla   - uses FASM as the back-end assembler
         nhla   - uses NASM as the back-end assembler
      Windows Only:
         mhla   - uses MASM as the back-end assembler
         thla   - uses TASM as the back-end assembler
      Linux Only:
         ghla   - uses GAS as the back-end assembler
   hlalink=<lnkr>       Sets default linker behavior
      <lnkr>:
      Windows Only:
         mslink - use Microsoft's link.exe linker
         polink - use the Pelles C polink.exe linker
      Linux Only:
         ld     - use the FSF/GNU ld linker
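The `hla=<asm>` settings above amount to a small lookup table with OS restrictions. The following Python sketch models that table; `default_back_end` and `BACK_ENDS` are hypothetical names, and the sketch assumes values are matched case-insensitively, consistent with how HLA treats its command-line options.

```python
# Hypothetical table built from the hla=<asm> values listed above:
# value -> (back-end assembler, OS restriction or None).
BACK_ENDS = {
    "hla": ("internal FASM", None),
    "ohla": ("internal FASM", None),
    "fhla": ("FASM", None),
    "nhla": ("NASM", None),
    "mhla": ("MASM", "windows"),
    "thla": ("TASM", "windows"),
    "ghla": ("GAS", "linux"),
}

def default_back_end(env, os_name):
    """Resolve the hla environment variable to a back-end assembler name."""
    value = env.get("hla")
    if value is None:
        return None  # no override; HLA falls back to its built-in default
    assembler, only_on = BACK_ENDS[value.lower()]
    if only_on is not None and only_on != os_name:
        raise ValueError(f"{value} is only available under {only_on}")
    return assembler
```

With `hla=fhla` set under Linux, the sketch selects the external FASM assembler; with `hla=ghla` under Windows, it rejects the setting because that value is listed as Linux only.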
Note that HLA ignores case when
processing command line parameters (unlike typical Linux programs). Hence, "-s" is equivalent to
"-S" (for example) when specifying a command line parameter.
-license
This command displays license
information for the entire HLA system. Although the HLA source code written by
Randall Hyde is all public domain, certain components of the HLA system,
including the back-end assembler, the linker, and the resource compiler, come
from other sources. The “-license” command-line parameter lists license
information about these other products.
-@
-@@
Under Windows, HLA will produce
a "linker response file" that it supplies to the Microsoft LINK.EXE
(or POLINK.EXE) program during the link phase. This linker response file contains necessary segment
declarations and other vital linker information. By default, HLA uses any existing ".LINK" file
whenever you run the compiler; it will create a new “xxx.link” file only if one
does not already exist. The
"-@" option tells HLA not to create a new ".LINK" file,
even if one does not already exist.
The “-@@” option tells HLA to always create a “.link” file, even if one
already exists.
If you specify multiple
".HLA" filenames on the command line, HLA only generates a single
".LINK" file using the name of the first ".HLA" file it
encounters. *NIX’s ld program does
not require this linker response file, so the *NIX versions of HLA do not
produce this file.
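The rules for the linker response file can be summarized as a small decision function. This Python sketch is a hypothetical model (the names `should_write_link_file` and `link_file_name` are illustrative); the text does not say which option wins if both -@ and -@@ appear, so the sketch simply lets -@ take precedence.

```python
def should_write_link_file(exists, flags, windows=True):
    """Model of whether HLA writes a new .link response file."""
    if not windows:
        return False      # *NIX ld does not use a response file
    if "-@" in flags:
        return False      # -@: never create one
    if "-@@" in flags:
        return True       # -@@: always regenerate it
    return not exists     # default: create one only if it is missing

def link_file_name(filenames):
    """The .link file takes the name of the first .hla file on the command line."""
    for name in filenames:
        if name.lower().endswith(".hla"):
            return name[:-4] + ".link"
    return None
```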
-d:XXXXX{=YYYYY}
The -dXXXXX option tells HLA to define the symbol XXXXX as a boolean VAL constant and initialize it with
the value TRUE. Generally you use
such symbols to control the emission of code during assembly using statements
like "#if( @defined( XXXXX
)) ..."
The -dXXXX=YYYY option tells HLA to define the symbol XXXX as a string VAL constant and give it the initial
value “YYYY”.
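The two forms of the define option differ only in whether an “=” follows the symbol name. The Python sketch below models that distinction; `parse_define` is a hypothetical helper, not part of HLA. It follows the -dXXXXX spelling used in the option text, and it treats a trailing “=” with nothing after it as an empty string value, which is an assumption.

```python
def parse_define(option):
    """Model of -dXXXXX and -dXXXXX=YYYYY: return (symbol name, value)."""
    if option[:2].lower() != "-d":  # HLA ignores case in option letters
        raise ValueError("not a -d option")
    name, sep, text = option[2:].partition("=")
    if not name:
        raise ValueError("missing symbol name")
    # No '=' gives a BOOLEAN VAL constant set to TRUE; otherwise a STRING.
    return (name, True) if not sep else (name, text)
```

A program compiled with -dDebug could then test the symbol with `#if( @defined( Debug ))`, as described above.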
-sym
The -sym option dumps the symbol table after compiling each file with an HLA
extension. This option is
primarily intended for testing and debugging the HLA compiler; however, this information can be useful
to the HLA programmer on occasion.
-test
The -test option is intended
for hlaparse testing and debugging purposes only. It causes the compiler to send all error messages to the
standard output device rather than the standard error device. This allows the test code to redirect
all errors to a text file for comparison against other files.
-v
The -v option (verbose) causes
HLA to print additional information during compile to show the progress of the
compilation. Note that when HLA uses MASM, compilation isn’t completely quiet
even if you do not specify the -v option: due to a bug in MASM, it still writes
data to the standard error device in quiet (non-verbose) mode.
-?
The -? option causes HLA to dump
the list of command line options and immediately quit without further work.
Note that the command line
options this document describes are for HLA v1.87 and later only. Earlier versions of HLA used a
different command line set. See
the documentation for the specific version you’re using if you have questions.
-level=h
-level=m
-level=l
-level=v
The -level options enable or
disable certain HLA language features. These command-line options are intended
for use in programming courses where the instructor needs to batch compile
dozens or even hundreds of student projects at one time and the instructor
would like a convenient way to ensure that the students aren’t using high-level
control constructs that are inappropriate for that point in the course (e.g.,
towards the end of a course, most instructors don’t allow the use of various
high-level control constructs; some instructors may never allow them). The
“-level” command-line options will “turn off” various statements in the HLA
language so that the HLA compiler will report an error if the student attempts to use them in a
source file.
The default, “-level=h” (high)
enables the entire HLA language.
The “-level=m” (medium level)
disables high-level language control constructs, such as “if”, “while”, and
“for” but still allows the use of high-level-like procedure calls in the HLA
language. Medium-level assembly language also allows the use of exceptions
using HLA’s try..except..endtry and raise statements.
The “-level=l” (low-level assembly) disables all high-level
control constructs other than the exception-handling statements and disables
high-level-like procedure calls in HLA. This option also disables automatic
stack frame generation and clean up in HLA procedures (that is, the programmer
will be responsible for writing that code themselves).
The “-level=v” (very low-level
assembly) option disables all high-level control constructs including exception
handling. Only machine instructions (and user written macros) are legal in the
source file. No high-level control constructs or high-level procedure calls are
allowed.
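The four levels form a strict progression, which this Python sketch records as a table of permitted construct categories. The category names (`hll_control`, `hll_calls`, `exceptions`, `auto_frames`) are hypothetical labels. The text only says that -level=l disables automatic stack frames, so the sketch assumes -level=m keeps them. Machine instructions and macros are legal at every level and are not modeled.

```python
# Hypothetical summary of the -level rules described above.
LEVEL_FEATURES = {
    "h": {"hll_control", "hll_calls", "exceptions", "auto_frames"},
    "m": {"hll_calls", "exceptions", "auto_frames"},
    "l": {"exceptions"},
    "v": set(),
}

def allowed(level, feature):
    """Return True if a construct category is legal at the given -level."""
    return feature in LEVEL_FEATURES[level.lower()]
```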
-sourcemode
When emitting source code to be
processed by a back-end assembler, the HLA compiler normally compiles all
machine instructions to their binary opcodes and emits those opcodes as “DB”
statements with the hexadecimal encoding of the instructions. As a general rule, this is faster and
far more efficient than emitting machine instructions that the back-end
assembler has to process. In a
couple of situations, however, it’s better to have HLA emit actual
human-readable machine instructions. For example, if you want to read the source
output produced by the HLA compiler, you’d probably prefer to have the output
in human-readable source code form (with mnemonic instruction strings rather
than hexadecimal opcodes). Another situation where source code is preferable is
when you plan on running the output through a symbolic debugger (such as the
GNU GDB debugger) that displays the source code during debugging. The “-sourcemode” command-line option
instructs HLA to emit mnemonic machine instruction strings rather than
hexadecimal opcodes to the output file. Note that this option is not like the
“-s” option – it does not stop the compilation after producing a source file;
this option simply tells the compiler the format of the assembly language
source file it produces.
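The difference between the two output styles is easiest to see side by side. The Python sketch below uses standard 32-bit x86 encodings for a few instructions and prints them either as “db” statements or as mnemonics. The exact layout of the “db” lines HLA emits (hex suffix, separators) is an assumption; only the idea of opcode bytes versus mnemonic text comes from the description above.

```python
# Each entry pairs a mnemonic with its 32-bit x86 machine encoding.
INSTRUCTIONS = [
    ("push ebp", bytes([0x55])),
    ("mov ebp, esp", bytes([0x89, 0xE5])),
    ("mov eax, 1", bytes([0xB8, 0x01, 0x00, 0x00, 0x00])),
    ("ret", bytes([0xC3])),
]

def hex_byte(b):
    """Format a byte as a MASM/FASM-style hex literal (e.g. 0c3h)."""
    text = f"{b:02x}h"
    # Hex literals must start with a digit, so prefix 0 before a letter.
    return "0" + text if text[0].isalpha() else text

def emit(instructions, sourcemode=False):
    """Return one output line per instruction, in the chosen style."""
    if sourcemode:
        return [mnemonic for mnemonic, _ in instructions]
    return ["db " + ",".join(hex_byte(b) for b in code)
            for _, code in instructions]
```

Without -sourcemode, `mov eax, 1` becomes the opaque line `db 0b8h,01h,00h,00h,00h`; with it, the output keeps the mnemonic, which is what a reader or a source-level debugger needs.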
-s
The -s option tells the HLA
program to run only the hlaparse compiler to produce an assembly language
source file; HLA will not run an
assembler or linker. As a result,
HLA ignores any ".asm" or ".obj" filenames you supply on
the command line. This option is
useful if you wish to view the output of an HLA compilation without producing
any actual object code. If you
specify this option with a version of HLA that would normally produce an object
file output, HLA will emit a FASM-compatible source file instead.
-st
The -st option tells HLA to
produce TASM-compatible assembly and stop after
compilation. Note that the
source file you produce will contain Windows-specific code, even if you produce
the TASM-compatible source file under Linux.
-sm
The -sm option tells HLA to
produce MASM-compatible assembly and stop after
compilation. Note that the source file you produce will contain
Windows-specific code, even if you produce the MASM-compatible source file
under Linux.
-sn
The -sn option tells HLA to
produce NASM-compatible assembly and stop after
compilation. The output source
file is compatible with NASM v2.02 or later.
-sg
The -sg option tells HLA to
produce Gas-compatible assembly and stop after compilation.
This option is used to produce Linux and FreeBSD style Gas code. Note that HLA
v1.102 and later produces AT&T syntax Gas assembly language source code.
-sx
The -sx option tells HLA to
produce Gas-compatible assembly for the Mac OSX version of
Gas and stop after compilation.
-sf
The -sf option tells HLA to
produce FASM-compatible assembly and stop after
compilation. Unlike the other
source output forms, FASM output is specific to the underlying OS on which you
ran HLA. That is, using the -sf option under Windows produces Windows-specific
output, using the -sf option under Linux produces a Linux-compatible file.
Unless you set a different target OS, this command will probably produce
inconsistent results under FreeBSD or Mac OSX.
-sh
The -sh option tells HLA to
produce a somewhat HLA-compatible assembly source
file. It may seem redundant to
compile an HLA source file to another HLA source file, but this option is
actually quite useful. The resulting source file has all macros expanded and
shows how the compiler translates HLL-like control structures into pure machine
code. Therefore, this option is useful for examining HLA’s output.
-code1st
This option instructs HLA to
emit the code to the output source file before any data. Note that this option
may not work with certain assemblers or under certain operating systems. It is
really intended for analysis and debugging purposes, not for normal day-to-day
use. In other words, you shouldn’t use this option unless you really know what
you’re doing.
-c
The -c option tells HLA to run
the hlaparse compiler and the (default) assembler, producing
".obj"/".o" files.
HLA will process all filenames on the command line that have
".hla" or ".asm" extension, but it will ignore any
filenames with ".obj" extensions. If you compile an HLA unit without compiling an HLA program
at the same time, you will need to use this option or the linker will complain
about not finding the main program.
You may specify the ".obj"/".o" file format using
the COFF or OMF command line options (MASM only, TASM always
produces OMF files, Gas always produces ELF files, and FASM always produces
COFF or ELF files).
One common use of this option
is to compile HLA units to OBJ files.
Since HLA units do not contain a main program, you cannot compile an HLA
unit directly to an executable. To
compile an HLA unit separately (i.e., without compiling an HLA main program
during the same HLA.EXE invocation) you must specify the “-c” option or the
compilation will generate an error when it attempts to link the program.
A second reason for using the
“-c” option is because you want to explicitly run the linker yourself and
supply linker command line options that are different from those that HLA
automatically provides.
-co
Compiles an HLA source file
directly to an object file using the internal version of FASM (overriding the
default assembler). Otherwise, the actions are identical to -c. This option is
available only under Windows.
-cf
Compiles an HLA source file to
a FASM source file and then compiles this source file to an object file using an
external version of FASM (overriding the default assembler). Otherwise, the
actions are identical to -c.
-cm
Compiles an HLA source file to
a MASM source file and then compiles this source file to an object file using
MASM (overriding the default assembler). Otherwise, the actions are identical
to -c. Note that this option is available only under Windows.
-cn
Compiles an HLA source file to
a NASM source file and then compiles this source file to an object file using
NASM (overriding the default assembler). Otherwise, the actions are identical
to -c. Note that this option is currently available only under Windows, though
plans are to make this option available for the other OSes, too.
-ct
Compiles an HLA source file to
a TASM source file and then compiles this source file to an object file using
TASM (overriding the default assembler). Otherwise, the actions are identical
to -c. Note that this option is available only under Windows.
-cg
Compiles an HLA source file to
a Gas source file and then compiles this source file to an object file using Gas
(overriding the default assembler). Otherwise, the actions are identical to -c.
Note that this option is available only under non-Windows operating systems.
-cx
Compiles an HLA source file to
a Gas source file and then compiles this source file to an object file using the
Macintosh version of Gas (overriding the default assembler). Otherwise, the
actions are identical to -c. Note that this option is available only under
non-Windows operating systems.
-o:omf
The -o:omf option tells the
underlying assembler (MASM or TASM) to produce an Object Module Format (OMF)
OBJ file. This option is generally
applicable only to MASM since TASM always produces OMF files. This option is not legal when using the
Gas or FASM assemblers.
-o:coff
The -o:coff option instructs
the assembler to generate a COFF OBJ file. This option is the default for MASM/FASM and may not be
available for other assemblers. This option is only available under Windows.
-o:elf
The -o:elf option instructs the
assembler to generate an ELF .o file.
This option is the default for Gas under FreeBSD and Linux and may not
be available for other assemblers/OSes.
-o:macho
The -o:macho option instructs
the assembler to generate a Mach-O file.
This option is the default for Gas under Mac OSX and may not be available
for other assemblers/OSes.
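Taken together with the format notes under -c, the -o: options describe which object formats each assembler can emit. This Python sketch encodes those rules as a lookup table. `FORMATS`, `is_legal`, and the `GAS-MAC` key (standing for the Mac OSX version of Gas) are hypothetical names chosen for illustration.

```python
# Hypothetical table distilled from the -c and -o: option descriptions.
FORMATS = {
    "MASM": {"coff", "omf"},   # COFF by default; OMF on request
    "TASM": {"omf"},           # TASM always produces OMF
    "FASM": {"coff", "elf"},   # COFF under Windows, ELF under Linux
    "GAS": {"elf"},            # Linux and FreeBSD Gas
    "GAS-MAC": {"macho"},      # Mac OSX version of Gas
}

def is_legal(assembler, fmt):
    """Return True if -o:<fmt> is usable with the given assembler."""
    return fmt.lower() in FORMATS[assembler]
```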
-a<option>
The -aXXXXX option lets you pass assembler-specific command
line options to the assembler during the assembler phase. This option is ignored if you use one
of the -s options. One common form of this command often used with the MASM assembler is “-aFi -aFz” that tells
MASM to generate debugging information in the object file (for use with the
OllyDbg debugger program).
-xo
Compiles an HLA source file
directly to an object file using the internal version of FASM (overriding the
default assembler). Links the result to produce an executable file. Under
Windows, this option uses POLINK as the default linker unless overridden. This command is valid only
under Windows.
-xf
Compiles an HLA source file to
a FASM source file and then compiles this source file to an object file using an
external version of FASM (overriding the default assembler). Then links the
result to produce an executable file. Under Windows, this option uses POLINK as
the default linker unless overridden. Under Linux, this command option uses the
GNU LD linker. This option is available only under Windows and Linux.
-xm
Compiles an HLA source file to
a MASM source file and then compiles this source file to an object file using
MASM (overriding the default assembler). Links the resulting object file to
produce an executable (Microsoft’s linker is the default linker). Note that
this option is available only under Windows. Under Linux, this command-line
option is identical to “-sm”.
-xn
Compiles an HLA source file to
a NASM source file and then compiles this source file to an object file using
NASM v2.02 (overriding the default assembler). Links the resulting object file
to produce an executable (Microsoft’s linker is the default linker).
-xt
Compiles an HLA source file to
a TASM source file and then compiles this source file to an object file using
TASM (overriding the default assembler). Then links the result to produce an
executable (Microsoft’s linker is the default linker it uses). Under Linux,
this option only produces a source file; it will not run TASM or the linker to
produce an executable (that is, this command is equal to “-st” under *NIX).
-xg
Compiles an HLA source file to
a Gas source file and then compiles and links this source file to an executable
file using Gas (overriding the default assembler). Note that this option is
available only under Linux and FreeBSD operating systems.
-xx
Compiles an HLA source file to
a Gas source file, compiles that source file to an object file using the
Macintosh version of Gas (overriding the default assembler), and then runs the
Mac version of ld to produce an executable. Note that this option is available only under Mac OS X.
-win32
Compiles an HLA source file to
an assembly language source file that can be processed and run under Windows.
-linux
Compiles an HLA source file to
an assembly language source file that can be processed and run under Linux.
-freebsd
Compiles an HLA source file to
an assembly language source file that can be processed and run under FreeBSD.
-macos
Compiles an HLA source file to
an assembly language source file that can be processed and run under Mac OS X.
-lXXXXX
The -lXXXXX option passes the text XXXXX on to the linker as a command-line option. One
common command to pass to the Microsoft linker is “-lDEBUG -lDEBUGTYPE:COFF”,
which tells the linker to generate debugging information in the executable file (for
use with the OllyDbg debugger program).
-e:name
By default, HLA creates an
executable filename from the basename of the first filename on the command
line, using the extension ".exe" (Windows) or no
extension (Linux). You can use the -e:name option to specify a different executable file name
(which will include an “.exe” suffix under Windows).
-x:name
By default, HLA creates an
executable filename using the extension ".exe" (Windows). This option
lets you specify the full filename, including the extension (i.e., “.exe” is
not automatically appended to the name). This option is useful mainly under
Windows. Under *NIX, it behaves exactly like the “-e” option.
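The following Python sketch models how the output name could be derived from the -e:name and -x:name rules above. It is a hypothetical illustration, not HLA's implementation: the function name executable_name is invented, it assumes the -e:name argument is given without an extension, and it assumes -x:name wins if both options appear (the text does not say which takes precedence).

```python
import ntpath
import posixpath
from typing import Optional


def executable_name(first_source: str, windows: bool,
                    e_name: Optional[str] = None,
                    x_name: Optional[str] = None) -> str:
    """Hypothetical model of HLA's executable-name rules (not HLA's code)."""
    if x_name is not None:
        # -x:name: use the name exactly as given; nothing is appended.
        # Under *NIX, -e:name never adds a suffix either, so -x and -e agree.
        return x_name
    if e_name is not None:
        # -e:name: assumed here to be given without an extension.
        base = e_name
    else:
        # Default: basename of the first file on the command line,
        # with its extension (e.g. ".hla") removed.
        paths = ntpath if windows else posixpath
        base = paths.splitext(paths.basename(first_source))[0]
    # Windows executables get ".exe"; *NIX executables get no extension.
    return base + ".exe" if windows else base
```

For example, executable_name("prog.hla", windows=True, e_name="demo") returns "demo.exe", whereas passing x_name="demo.com" instead returns "demo.com" unchanged.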
-m
The -m option tells the
Microsoft linker or POLINK to produce a map file during the link phase. This is equivalent to the
"-lmap" option. The
Linux version of HLA ignores this option.
-w
The -w option informs HLA that
you are compiling a standard Windows (GUI) application rather than a console
application. By default, HLA
assumes that you are compiling an executable that will run from the command
window. If you want to write a
full Windows application, you will need to supply this option to tell HLA not
to link the code for console operation.
Obviously, this option doesn’t apply to Linux systems.
The “-w” option tells HLA to
invoke the linker using the command line option
-subsystem:windows
rather than the default
-subsystem:console
This provides a convenient
mechanism for those who wish to create win32 GUI applications. Most likely, however, if you wish to
create GUI applications, you will run the linker explicitly yourself (as this
document will explain), so you’ll probably not use the “-w” option very
frequently. It’s great for some
short GUI demos, but larger GUI programs will probably not use this
option. This option is only active
if HLA compiles the program to an executable. If you compile the program to an OBJ or ASM file, HLA
ignores this option.
If you want to develop Win32
GUI apps, take a look at Randy Hyde’s book “Windows Programming in Assembly”.
This book provides the linker commands and makefiles for generating such
applications (as well as describing how you actually write such code).
-polink
Under Windows, this forces the
use of the polink linker.
-mslink
Under Windows, this forces the
use of the Microsoft linker.
-p:path
During compilation, HLA
produces several temporary files (that it doesn’t delete, because they may be of
interest to the HLA user). These
files have a habit of cluttering up the current work directory. If you prefer, you can tell HLA to
place these files in a temporary directory so they don’t clutter up your
working directory. One way to
accomplish this is by using the "-p:dirpath" command line option. For example, the option
"-p:c:\hla\tmp" tells HLA to put all temporary files (for the current
assembly) into the "c:\hla\tmp" subdirectory (which must exist). Note that you can also set the
temporary directory using the "hlatemp" environment
variable. The "-p:dirpath" option will override the environment
variable (if it exists). See the description of the hlatemp environment
variable for more details.
Warning: as of FASM v1.51, the use of this option with FHLA is not
recommended because of the way FASM handles include files. FASM’s author is aware
of this issue and will probably do something about it at some point. Until
then, you should always CD into the directory where your HLA projects lie and leave
all the temporary files in that same folder when running FHLA.EXE. This
restriction will be relaxed in a future version of FHLA/FASM.
-obj:path
During compilation, HLA
normally writes all object files to the current working directory. Some programmers have requested a way
to specify a different directory for the .OBJ (.o under Linux) files that HLA
produces. This is now accomplished
using the "-obj:dirpath"
command line option. The dirpath item has to be the path to a valid directory. HLA places all object files produced by the compiler and/or
resource compiler in this directory.
Note that, unlike the -p option, there is no environment variable that
lets you permanently set this path. You must specify the path on a compilation
by compilation basis (use a makefile if you get tired of typing the path in on
each compilation).
-r:filename
The “-r:filename” option lets
you specify a response file containing a sequence of HLA command-line
parameters. The file specified after this option must contain a sequence of
HLA command-line parameters, one per line, which HLA processes exactly as though
they were specified on the command line. E.g.,
sampleFile.resp:
-cf
sampleFile.hla
The following command treats
each of the above lines as separate HLA command-line parameters:
hla -r:sampleFile.resp
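A response file is simply spliced into the argument list. The Python sketch below is a hypothetical model of that expansion (the function expand_response_files is invented for illustration). It assumes that blank lines are skipped, that leading and trailing whitespace on each line is ignored, and that response files do not nest; the description above specifies none of these details.

```python
from pathlib import Path
from typing import Callable, List


def read_file(path: str) -> str:
    """Default reader: return the whole text of a response file."""
    return Path(path).read_text()


def expand_response_files(argv: List[str],
                          read_text: Callable[[str], str] = read_file) -> List[str]:
    """Replace each -r:filename argument with that file's lines (hypothetical model)."""
    expanded: List[str] = []
    for arg in argv:
        if arg.startswith("-r:"):
            text = read_text(arg[len("-r:"):])
            # One command-line parameter per line; blank lines are skipped (assumption).
            expanded.extend(line.strip() for line in text.splitlines() if line.strip())
        else:
            expanded.append(arg)
    return expanded
```

With the sampleFile.resp shown above, expand_response_files(["-r:sampleFile.resp"]) yields ["-cf", "sampleFile.hla"], the same parameter list you would get by typing "hla -cf sampleFile.hla".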
-i:path
This overrides the value of the
HLAINC environment variable and tells HLA where it can find the include/header
files for the HLA standard library.
-lib:path
This overrides the value of the
HLALIB environment variable and tells HLA where it can find the
hlalib.lib/hlalib.a library archive file.
14
The HLAPARSE Command Line
The HLAPARSE program was
written with the expectation that it would always be invoked by some other
program (e.g., HLA/HLA.EXE). If
you really know what you are doing, you can invoke HLAPARSE from the command line manually. However, other
than testing or debugging the HLA system, there really is no need to manually
invoke HLAPARSE. All the functionality available from the HLAPARSE command line
is also available from the HLA command line. Further, HLA does sanity checks on
the command line parameters, fills in optional values, fetches environment
variables, and so on. HLAPARSE expects a program like HLA to provide a correct
set of command-line parameters; it does not do all the checks on the parameters
and if they are incorrect, it may silently fail. As such, it’s not intended to
be used by end users as a matter of course.
There is one case where you
might be interested in invoking HLAPARSE by some means other than the HLA
program: when writing a replacement for HLA. A good example of this is the
HIDE (HLA Integrated Development Environment) system. If you’re interested in
writing a program that invokes HLAPARSE directly, then you’ll want to learn
about the HLAPARSE command-line parameter set. However, this manual is not the
place to discuss such things, as they are considered internal to the HLA
system. You can view the possible command-line parameters by typing
"HLAPARSE -?" at the command line. However, to truly understand their
semantics, you’ll want to open up the hlaparse.bsn source file and study the
code at the very end of the file (warning, it’s about 100,000 lines long, so
you’ll need a good editor to open and look at this code). I (Randy Hyde) will
be more than happy to answer any questions via email or via some forum concerning
these parameters; they are not, however, something I want to document and give
people the impression that they are available for everyday use.
15
Customizing HLA
Through the use of environment
variables and program names, you can create a customized version of HLA that
suits your particular needs. The following subsections describe different ways
you can optimize HLA for your personal use.
15.1
Changing the Location of HLA
To simplify installation and
reduce installation problems, this manual suggests that you install HLA under
Windows in the C:\hla subdirectory and install HLA under *NIX in the /usr/hla
subdirectory. If you would prefer
to put the HLA system somewhere else, it’s easy to do as long as you tell the
system what you’re doing. This is typically accomplished by setting up a couple
of environment variables.
First and foremost, to be able
to run the HLA compiler and associated tools, the hla.exe/hla, hlaparse.exe/hlaparse,
back-end assembler (if applicable), and linker all have to be in directories in
the execution path. You may either
move the HLA executables to some existing directory in the OS’ execution path,
or you can tell the OS to include the directory containing these files in the
execution path (the standard HLA installation instructions, for example, opt
for this latter case). Note that
simply specifying the full path to the command-line interpreter for HLA.EXE
(hla under *NIX) may not be sufficient because the HLA.EXE (hla) program runs
HLAPARSE, the back-end assembler (if applicable), and the linker, so all of
these other programs must be in the execution path. Therefore, if you’re going
to move any of the HLA executables, make sure they are all present in a
directory that appears in the PATH environment variable.
Under *NIX, for example, it’s
not uncommon for someone to put executables in the /usr/bin or /usr/local/bin
directories. These directories are almost always in the execution path, so placing all
the HLA executables in one of these directories under *NIX would spare you
having to add the /usr/hla subdirectory to your execution path.
Under Windows, there is no
special directory where everyone dumps their little executable files (like
/usr/local/bin under *NIX). You could find an existing directory that’s in the
execution path and dump the HLA executables in there, however it’s almost
always a better idea to simply change the path environment variable so that it
includes the HLA directory that contains the executables. If you’ve installed HLA via the HLA
installation program, the install program automatically sets this up for you.
However, if you want to move HLA to a different directory in the future, you
will need to remove the old path to HLA from your PATH environment variable and
add the path to the new HLA executables to the PATH.
Changing the execution path
isn’t your only concern if you decide to move HLA around. The HLA compiler will
also need to know where it can find the HLA include files and the hlalib.lib/hlalib.a
standard library files. Under Windows, the linker might also want to know where
the hlalib.lib file can be found.
If you haven’t told it otherwise,
HLA under *NIX assumes that the include subdirectory and the hlalib
subdirectory can be found in the /usr/hla subdirectory. Under Windows, HLA will
first look in the same directory containing the HLA executables and, failing to
find the include and hlalib directories there, it will then look in the C:\HLA
subdirectory. If you’ve moved the HLA include and hlalib directories somewhere
else, then you will need to set up environment variables to tell HLA where it
can find these directories (technically, you could specify the paths to these
directories on the HLA command-line, but that’s so painful that you would never
consider it for anything other than a temporary solution). The “hlainc” and
“hlalib” environment variables serve this purpose.
Windows:
set hlainc=path_to_include_directory
*NIX (using BASH shell
interpreter):
hlainc=path_to_include_directory
export hlainc
Under Windows you can use the set command to set the hlainc environment variable to
the path of the HLA include subdirectory. For example, if
you’re using Windows and you’ve moved the HLA include files to the
C:\tools\hla\hlainc subdirectory, you could use the following command to tell
HLA where it can find the include files:
set hlainc=c:\tools\hla\hlainc
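Putting the rules of this section together with the -i:path command-line option, the search for the include directory can be sketched in Python as follows. This is a hypothetical model, not HLA's source: find_include_dir and its exists callback are invented for illustration, and it assumes the last -i:path on the command line wins.

```python
import ntpath
from typing import Callable, Mapping, Optional, Sequence


def find_include_dir(argv: Sequence[str], env: Mapping[str, str], windows: bool,
                     exe_dir: str, exists: Callable[[str], bool]) -> Optional[str]:
    """Hypothetical model of where HLA looks for its include directory."""
    # 1. An -i:path option on the command line overrides everything else.
    for arg in reversed(argv):
        if arg.startswith("-i:"):
            return arg[len("-i:"):]
    # 2. Otherwise, the hlainc environment variable, if it is set.
    if env.get("hlainc"):
        return env["hlainc"]
    # 3. Otherwise, the built-in defaults described above.
    if windows:
        # Windows: the executables' own directory first, then C:\HLA.
        candidates = [ntpath.join(exe_dir, "include"), "C:\\HLA\\include"]
    else:
        candidates = ["/usr/hla/include"]
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None  # not found in any default location
```

In a real script you might call it as find_include_dir(sys.argv[1:], os.environ, os.name == "nt", exe_dir, os.path.isdir); passing the existence test as a parameter simply keeps the sketch testable without touching the file system.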
The hlalib environment variable
specifies the complete path to
the hlalib.lib file. Unlike the hlainc environment variable, this is not the
path to the directory containing the library file, but the full path to the
file itself. The reason this is a
path to the library file rather than a path to the subdirectory containing the
file is very simple: it’s possible to have two or more library modules (in the
same directory) and you might want to choose the most appropriate one for the
job at hand. For example, you might have a debugging version of the library, an
OMF version of the library, and a standard version of the library all in one
directory. In any case, suppose
the hlalib.a file (archive file) under *NIX is located at
/usr/home/rhyde/hla/hlalib/hlalib.a; you could tell *NIX about this using a
BASH command like the following:
hlalib=/usr/home/rhyde/hla/hlalib/hlalib.a
export hlalib
(export is a bash command that
makes the environment variable available to programs started from that shell, such as HLA.)
Perhaps the most common reason
for wanting to move HLA to a different directory is because you’re using a *NIX
system on which you do not have root/administrative access (and, therefore,
cannot create a /usr/hla subdirectory, much less add files to it). If you cannot convince the system
administrator to install HLA at “/usr/hla” for all users, you can always install
it in your home directory and set the path and environment variables
accordingly. For example, on my Linux system running the BASH shell
interpreter, I’ve been able to install HLA in my home directory as follows:
# Download linux.tar.gz to my home directory (/usr/home/rhyde).
# Decompress linux.tar.gz to linux.tar:
gzip -d linux.tar.gz
# Unpack the files in linux.tar to the 'hla' subdirectory:
tar xvf linux.tar
# Set up the HLA environment variables:
hlalib=/usr/home/rhyde/hla/hlalib/hlalib.a
export hlalib
hlainc=/usr/home/rhyde/hla/include
export hlainc
# Set the execution path:
PATH=/usr/home/rhyde/hla:$PATH
export PATH
At this point, you should be
able to use HLA from your home directory.
15.2
Setting Auxiliary Paths
When assembling HLA source
files using a back-end assembler such as MASM, FASM (external to HLA), Gas, or
TASM, HLA emits a couple of intermediate files for use by these back-end
assemblers and the linkers. Specifically, the compilation process produces a
“.asm” file for the assembler and (under Windows) a “.link” file for the
linker. Some HLA users feel that these auxiliary files clutter up their project
directory and would prefer not to see them. Fortunately, there are a couple of
different ways to tell HLA to put these files in some other location besides
the current project directory.
The first way to do this (which
isn’t really the subject of this section) is to use the “-p:<path>”
command-line option to provide a temporary path for HLA to use. The advantage
to using this command-line parameter is that you can set a different temporary
path for each compilation. The disadvantage to this approach is that it can be
a real pain to constantly set the path (if you’re typing command lines
manually).
A more comprehensive solution
is to define the hlatmp
environment variable. When HLA runs, it checks this environment variable and,
if defined, uses its value to determine the path to the directory where HLA
will store all temporary files. This spares you from having to place an explicit
path on each command line. For example, the following command line will tell
HLA to use the C:\temp (under Windows) subdirectory to hold all temporary
files:
set hlatmp=c:\temp
Do take care when using the hlatmp environment variable. If you compile multiple
source files with the same name (presumably from different directories), then
the intermediate files they produce (which share the same names and directory) may overwrite one another. In other words, don’t
use the hlatmp environment
variable to specify a temporary path when doing several compilations in a batch
operation. Use an explicit “-p:<path>” command-line option in those cases
(presumably in a make or batch file).
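The precedence among the temporary-path settings can be modeled with a few lines of Python. This is a hypothetical sketch (temp_directory is an invented name), assuming that the last -p:path on the command line wins and that, with neither setting present, HLA falls back to the current working directory, where these files normally appear.

```python
from typing import Mapping, Sequence


def temp_directory(argv: Sequence[str], env: Mapping[str, str], cwd: str) -> str:
    """Hypothetical model: where HLA writes its .asm/.link intermediate files."""
    # An explicit -p:path option overrides the environment variable.
    for arg in reversed(argv):
        if arg.startswith("-p:"):
            return arg[len("-p:"):]
    # Next, the hlatmp environment variable, if it is set.
    if env.get("hlatmp"):
        return env["hlatmp"]
    # Otherwise the files land in the current working directory.
    return cwd
```

The batch-compilation advice follows from this model: when hlatmp is the only setting, every compilation resolves to the same directory, so two source files with the same name write their intermediate files to the same place.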
15.3
Setting the Default Back-End Assembler
By default, the “HLA.EXE”
(Windows only) program uses an internal version of FASM to directly produce an
object file from the translation of the input HLA source file. For reasons
explained earlier, you might want to override this default selection and use
one of the back-end assemblers that HLA supports (MASM, TASM, or FASM under
Windows, or FASM or GAS under Linux).
There are three ways to do this: via command-line parameters, by
changing the name of the HLA.EXE (Windows) or hla (Linux) programs, or by the
“hla” environment variable.
As described earlier, the -co,
-cf, -cm, -ct, and -cg options let you specify which assembly language syntax and
back-end assembler HLA will use to produce an object code file. The default
under Windows is “-co” which uses the internal version of FASM to directly produce
an object code file without using an intermediate assembly language file. The
default under Linux is “-cg” which uses the Gas assembler as the back-end
assembler. The other options all produce an intermediate assembly language
source file and use the associated assembler (if possible under the current
operating system) to translate that assembly language source file into an
object code file. Note that under FreeBSD and Mac OS X, you must use a Gas
variant. Output for FASM (or any other assembler) is not supported on these OSes.
If you would like to change the
default so you don’t have to specify a “-cX” command-line option all the time, you can rename
the “HLA.EXE” (Windows) or “hla” (*NIX) program name to one of the following:
Windows Program Name | Linux Program Name | Description |
hla.exe | hla | Uses the “hla” environment variable to determine which compilation method to use. |
mhla.exe | mhla | Produces a MASM-compatible “.asm” intermediate file and (under Windows only) uses MASM to assemble that file into a “.OBJ” file. |
fhla.exe | fhla | Produces a FASM-compatible “.asm” intermediate file and uses FASM to assemble that file into a “.obj” (Windows) or “.o” (Linux) object-code file. |
thla.exe | thla | Produces a TASM-compatible “.asm” intermediate file and (under Windows only) uses TASM to assemble that file into a “.OBJ” file. |
ghla.exe | ghla | Produces a GAS-compatible “.asm” intermediate file and (under Linux only) uses GAS to assemble that file into a “.o” object-code file. |
ohla.exe | ohla | Directly produces a “.obj” (Windows) object-code file using the internal version of FASM. Does not produce an intermediate “.asm” source file. |
If you commonly switch between
the different assembly language source formats for some reason, you can make
copies of the “HLA.EXE” (Windows) or “hla” (*NIX) program files so you don’t
have to constantly rename them. Note that the language level options also allow
you to change HLA’s behavior by renaming the HLA executable file. However, you
may only use this technique to choose the language level or choose the back-end assembler, not both. The
language level option is really for beginning students in a formal assembly
language programming class; and those students would almost never care about
what back-end assembler HLA is using. Conversely, choosing a back-end assembler
via the program name is something that an advanced HLA user would do and such
users would rarely operate HLA at any language level other than “high”.
Therefore, there really isn’t the need to make it possible to support both sets
of options via a program name change. If you really need to operate at a
different language level and exercise control over the back-end assembler on
the same project, use command-line options or environment variables.
A third option you can use to
control HLA’s back-end assembler is to use the “hla” environment variable. When
you run one of the programs listed in the previous table, the program first determines
whether its name is “HLA.EXE” (Windows) or “hla” (Linux); if so, then the
program checks for the “hla” environment variable to determine how the program
should behave. The possible environment variable settings are given in the
following table:
hla Environment Variable Setting | Same as Running This Version of HLA (Windows) | Same as Running This Version of HLA (Linux) | Notes |
hla=mhla | mhla.exe | mhla | Under Linux, this produces source code output only, as MASM doesn’t run under Linux. |
hla=thla | thla.exe | thla | Under Linux, this produces source code output only, as TASM doesn’t run under Linux. |
hla=ghla | ghla.exe | ghla | Under Windows, this produces source code output only, as HLA won’t run Gas under Windows. |
hla=fhla | fhla.exe | fhla | Produces a FASM-compatible assembly source file and runs FASM to process it under Windows or Linux. |
hla=ohla | ohla.exe | ohla | Directly produces an object file from the HLA source file under Windows or Linux using the internal version of FASM. |
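The name-then-environment lookup described above can be sketched in Python. This is a hypothetical model, not HLA's code: select_backend and its descriptive strings are invented, command-line -cX options are not modeled, and it assumes that an unrecognized hla setting falls back to the platform default.

```python
import ntpath
from typing import Mapping

# Program names from the tables above and the back end each one selects.
BACKENDS = {
    "mhla": "MASM",
    "thla": "TASM",
    "ghla": "Gas",
    "fhla": "FASM (external)",
    "ohla": "FASM (internal)",
}


def select_backend(program_path: str, env: Mapping[str, str], default: str) -> str:
    """Hypothetical model of how HLA picks its back end from its own name."""
    # ntpath.basename accepts both "\\" and "/" as separators.
    name = ntpath.basename(program_path).lower()
    if name.endswith(".exe"):
        name = name[:-len(".exe")]
    if name in BACKENDS:
        # A renamed copy (mhla, fhla, ...) ignores the "hla" variable.
        return BACKENDS[name]
    if name == "hla":
        # Only a program named hla/HLA.EXE consults the "hla" variable.
        setting = env.get("hla", "").lower()
        if setting in BACKENDS:
            return BACKENDS[setting]
    # Platform default (internal FASM under Windows, Gas under Linux).
    return default
```

Note that a renamed copy such as mhla.exe keeps selecting MASM even when the hla variable says otherwise, matching the rule that only a program named HLA.EXE or hla checks that variable.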
Warning: In HLA v1.98..v1.101 object code output using the
internal version of FASM was the default under Linux. Unfortunately, it turns
out that FASM is not completely compatible with all the versions of the GNU
linker out there and some Linux distributions would fail when compiling HLA
using the internal version of FASM. For this reason, the internal version of
FASM was removed from the Linux version of HLA. You can still produce a FASM
source file from HLA and compile it with an external version of FASM for Linux,
but be advised that the code generation may create problems for certain
distributions.
Under FreeBSD and Mac OSX, only
Gas output is supported (at least, when compiling to object code). If you
select a different assembler you can produce an assembly language source output
file, but there is no way to natively compile that source code to an object
file or executable under FreeBSD or Mac OSX.
In theory, it should be
possible to compile an HLA program to a Windows-based MASM, TASM, or FASM file
and compile that program using WINE under Linux (to produce an executable).
However, this theory has not been tested. If you want to try this, don’t forget
to supply the HLA “-win32” command-line option (under Linux) to tell HLA to
produce a Windows-compatible assembly file. Of course, the resulting executable
is a Windows executable, not a Linux executable, so you will have to run it
under Windows (or maybe WINE). In theory, all standard HLA programs should run
just fine under WINE, but this has not been tested at all.
15.4
Setting the Default Linker
The HLA system currently
employs one of three different linkers to combine the output object and library
files into an executable:
• Microsoft’s linker (link.exe) – The default
when compiling HLA code to MASM or TASM source under Windows.
• The Pelles C linker (polink.exe) – The
default when compiling HLA code to .obj files under Windows using the internal
or external versions of FASM.
• The FSF/GNU ld linker – This is the default,
under non-Windows operating systems, when compiling HLA code to .o files using
FASM (external, Linux only) or when compiling HLA code to Gas-syntax .asm
files.
Note that polink and
Microsoft’s linker are usable only under Windows and the FSF/GNU ld linker is
usable only under non-Windows operating systems. The following table more clearly shows the default and
optional linker possibilities:
Assembler | Default linker under Windows | Linker choices under Windows | Linker for Linux, FreeBSD, MacOS, etc. |
MASM | link.exe[5] | link.exe, polink.exe | n/a |
TASM | link.exe | link.exe, polink.exe | n/a |
FASM (internal version) | polink.exe | link.exe, polink.exe | ld |
FASM (external version) | polink.exe | link.exe, polink.exe | ld |
Gas | n/a | n/a | ld |
Under *NIX operating systems (Linux, FreeBSD, or
Mac OS X) the only choice is the GNU/FSF ld linker. If you attempt to force the
use of one of these other linkers, HLA will issue a warning and use ld instead.
Under Windows, the HLA system (directly) supports either the Microsoft linker
or the Pelles C linker. HLA
defaults to polink for FASM because the HLA/FASM/POLINK combination consists of
freely distributable software and many HLA users prefer to use free software.
For MASM and TASM, HLA defaults to using the Microsoft linker as it is a bit
more complementary to these tools.
Note that HLA does not
provide direct support for the
Borland TLINK program, but there is nothing stopping you from using TLINK to
process OMF files produced by running HLA output through either MASM or TASM
(HLA does not support OMF output via FASM, sorry). This is quite useful, for
example, when combining HLA-produced output with other Borland tools that depend
on the use of TLINK (e.g., Borland C++ or Delphi). However, as HLA does not
directly support TLINK, this document will not discuss that option further.
Because Windows supports
two different linkers (each as the default for certain assemblers), the natural
questions arise: "How do I change the linker choice from the
default?" and "How do I change the default linker choice?" As
usual for HLA, there are a couple ways to achieve this goal.
The first way to change
the default linker choice is via a command-line parameter. Specifically, under
Windows you can use either the "-mslink" or "-polink"
command-line parameter to force the use of that linker for the current
compilation operation. This will not affect the linker choice for future
compilations.
Of course, selecting a
different assembler on the command-line also selects the default linker for
that assembler. Because there is a potential conflict when specifying both
types of command-line parameters, plus the fact that HLA gives precedence to
the last command-line parameter when there is any ambiguity, if you specify
both an assembler and a linker on the command line, you should always specify
the linker option last, e.g.,
hla -xm -polink t.hla // Compile using MASM and polink
hla -xf -mslink u.hla // Compile using (external) FASM and Microsoft’s linker
The last way to specify
the default linker to use, which operates in a global fashion, is via the hlalink environment variable. You can set the hlalink
environment variable to one of the following three values: mslink, polink, or
ld, e.g.,
set hlalink=mslink
Note that "mslink"
and "polink" are valid only under Windows and "ld" is only
valid under *NIX operating systems (which makes setting the hlalink environment variable under *NIX rather pointless,
to be honest). If you have set the hlalink environment variable to a value that is valid for
the operating system you’re using, then this overrides all the default cases
and only a command-line parameter to specify a different linker will take
precedence. That is, if you use one of the auxiliary HLA names such as MHLA, FHLA,
or THLA, HLA will continue to use the linker you’ve specified by the hlalink environment variable. Likewise, even if you temporarily choose a different
assembler with a "-xm", "-xf", "-xt", or
"-xo" command-line parameter, HLA will still continue to use the
default linker you’ve specified by the hlalink environment variable. The only way to override
this is to delete the environment variable (i.e., by the command-line "set
hlalink=") or by using one of the "-mslink",
"-polink", or "-ld" command-line parameters.
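The precedence rules above (a command-line linker option, then hlalink, then the assembler's default) can be summarized as a short Python function. It is a hypothetical sketch rather than HLA's implementation: select_linker is an invented name, the assembler is passed in already chosen, and the warning HLA issues when you request a Windows linker under *NIX is not modeled.

```python
from typing import Mapping, Sequence

# Default linker for each assembler under Windows (from the table above).
WINDOWS_DEFAULT_LINKER = {"MASM": "mslink", "TASM": "mslink", "FASM": "polink"}
# Command-line options that force a particular Windows linker.
WINDOWS_LINKER_OPTIONS = {"-mslink": "mslink", "-polink": "polink"}


def select_linker(argv: Sequence[str], env: Mapping[str, str],
                  assembler: str, windows: bool) -> str:
    """Hypothetical model of HLA's linker precedence rules."""
    if not windows:
        # ld is the only linker HLA uses under Linux, FreeBSD, and Mac OS X.
        return "ld"
    choice = WINDOWS_DEFAULT_LINKER[assembler]  # lowest precedence
    if env.get("hlalink") in ("mslink", "polink"):
        choice = env["hlalink"]  # a valid hlalink setting overrides the default
    for arg in argv:
        if arg in WINDOWS_LINKER_OPTIONS:
            choice = WINDOWS_LINKER_OPTIONS[arg]  # the last linker option wins
    return choice
```

Because the loop keeps the last linker option it sees, "hla -xm -polink t.hla" ends up with polink even though MASM's default is the Microsoft linker.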
Why would someone want to
change the default linker? Well, defects in the linkers themselves could be an
issue. For example, a known issue in POLINK is that it emits a (harmless)
warning whenever linking HLA-compiled programs. If this warning message annoys
you (and it does annoy some people), you can switch the default linker to
Microsoft and avoid this message. Because the two linkers produce slightly
different code in the output executable files, you may want to select one
linker or the other to guarantee identical executable files from build to build. For example, the
HLA test suite program (which compares executable files) always forces the use
of one linker or the other to guarantee proper comparisons. For some other
reason, you might want to set the default linker to POLINK, even when using
TASM or MASM as the HLA back-end assembler. Whatever the reason, it’s nice to have the choice and be
able to configure the system however you please.
16
HLA Language Elements
Starting with this section we
begin discussing the HLA source language.
HLA source files must contain only seven-bit ASCII characters. These are text files with each source
line record containing a carriage return/line feed (Windows) or just a line
feed (Linux) termination sequence (HLA is actually happy with either sequence,
so text files are portable between OSes without change). White space consists of spaces, tabs,
and newline sequences. Generally,
HLA does not appreciate other control characters in the file and may generate
an error if they appear in the source file.
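If you want to pre-screen files before handing them to HLA, a small Python check along these lines flags the characters HLA may reject. It is a hypothetical, deliberately conservative sketch: check_hla_source and source_lines are invented names, and it treats every control character other than tab, carriage return, and line feed (including DEL) as a problem, which may be stricter than HLA itself.

```python
from typing import List

ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return


def check_hla_source(data: bytes) -> List[str]:
    """Report bytes that fall outside the rules above (hypothetical checker)."""
    problems: List[str] = []
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            problems.append(f"offset {offset}: non-ASCII byte 0x{byte:02X}")
        elif (byte < 0x20 and byte not in ALLOWED_CONTROLS) or byte == 0x7F:
            problems.append(f"offset {offset}: control character 0x{byte:02X}")
    return problems


def source_lines(data: bytes) -> List[str]:
    """Split a checked file into lines; CR/LF and LF endings give the same result."""
    return data.decode("ascii").splitlines()
```

Since str.splitlines treats "\r\n" and "\n" alike, a file saved with Windows line endings and the same file saved with Linux line endings produce identical line lists, mirroring HLA's tolerance of both.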
16.1
Comments
HLA uses "//" to lead
off single line comments. It uses
"/*" to begin an indefinite length comment and it uses "*/"
to end an indefinite length comment.
C/C++, Java, and Delphi users will be quite comfortable with this
notation.
16.2
Special Symbols
The following characters are
HLA lexical elements and have special meaning to HLA:
* / + -
( ) [
] { }
< > :
; , .
= ? & | ^ ! @
The following character pairs
are HLA lexical elements and also have special meaning to HLA:
&& || <=
>= <> != == := .. <<
>> ## #( )# #{ }#
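Because the character pairs are single lexical elements, a scanner has to try the two-character form before the one-character form (the usual "longest match" rule). The Python sketch below illustrates that rule. It is not HLA's lexer, and for brevity it handles only identifiers, unsigned integers, and the symbols listed above (no strings, real numbers, or comments). In this sketch "<=" becomes one token, while "< =" (with a space) becomes two.

```python
from typing import List

# One-character special symbols from the first list above.
SINGLE_SYMBOLS = set("*/+-()[]{}<>:;,.=?&|^!@")
# Character pairs from the second list; these must be matched first.
PAIR_SYMBOLS = {"&&", "||", "<=", ">=", "<>", "!=", "==", ":=", "..",
                "<<", ">>", "##", "#(", ")#", "#{", "}#"}


def tokenize(text: str) -> List[str]:
    """Split text into words and special symbols using longest match."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1  # white space only separates tokens
        elif text[i:i + 2] in PAIR_SYMBOLS:
            tokens.append(text[i:i + 2])  # the two-character symbol wins
            i += 2
        elif ch in SINGLE_SYMBOLS:
            tokens.append(ch)
            i += 1
        elif ch.isalnum() or ch == "_":
            start = i
            while i < len(text) and (text[i].isalnum() or text[i] == "_"):
                i += 1
            tokens.append(text[start:i])  # identifier or unsigned integer
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens
```

For instance, tokenize("x:=y<=z") produces ["x", ":=", "y", "<=", "z"], and tokenize("0..9") produces ["0", "..", "9"].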
16.3
Reserved Words
Here are the HLA reserved
words. You may not use any of
these reserved words as HLA identifiers except as noted below (with respect to
the #id and #rw operators). HLA
reserved words are case insensitive.
That is, "MOV" and "mov" (as well as any permutation
with respect to case) all represent the HLA "mov" reserved word.
#append |
#asm |
#closeread |
#closewrite |
#else |
#elseif |
#emit |
#endasm |
#endfor |
#endif |
#endmacro |
#endmatch |
#endregex |
#endstring |
#endtext |
#endwhile |
#error |
#for |
#id |
#if |
#include |
#includeonce |
#keyword |
#macro |
#match |
#openread |
#openwrite |
#print |
#regex |
#return |
#rw |
#string |
#system |
#terminator |
#text |
#while |
#write |
@a |
@abs |
@abstract |
@ae |
@align |
@alignstack |
@arb |
@arity |
@at |
@b |
@baseptype |
@basereg |
@basetype |
@be |
@boolean |
@bound |
@byte |
@c |
@cdecl |
@ceil |
@char |
@class |
@cos |
@cset |
@curdir |
@curlex |
@curobject |
@curoffset |
@date |
@debughla |
@defined |
@delete |
@dim |
@display |
@dword |
@e |
@elements |
@elementsize |
@enter |
@enumsize |
@env |
@eos |
@eval |
@exactlynchar |
@exactlyncset |
@exactlynichar |
@exactlyntomchar |
@exactlyntomcset |
@exactlyntomichar |
@exceptions |
@exp |
@external |
@extract |
@fast |
@filename |
@firstnchar |
@firstncset |
@firstnichar |
@floor |
@forward |
@fpureg |
@frame |
@g |
@ge |
@global |
@here |
@index |
@insert |
@int128 |
@int16 |
@int32 |
@int64 |
@int8 |
@into |
@isalpha |
@isalphanum |
@isclass |
@isconst |
@isdigit |
@IsExternal |
@isfreg |
@islower |
@ismem |
@isreg |
@isreg16 |
@isreg32 |
@isreg8 |
@isspace |
@istype |
@isupper |
@isxdigit |
@l |
@lastobject |
@le |
@leave |
@length |
@lex |
@linenumber |
@localoffset |
@localsyms |
@log |
@log10 |
@lowercase |
@lword |
@match |
@match2 |
@matchchar |
@matchcset |
@matchichar |
@matchid |
@matchintconst |
@matchistr |
@matchiword |
@matchnumericconst |
@matchrealconst |
@matchstr |
@matchstrconst |
@matchtoistr |
@matchtostr |
@matchword |
@max |
@min |
@mmxreg |
@na |
@nae |
@name |
@nb |
@nbe |
@nc |
@ne |
@ng |
@nge |
@nl |
@nle |
@no |
@noalignstack |
@nodisplay |
@noenter |
@noframe |
@noleave |
@norlesschar |
@norlesscset |
@norlessichar |
@normorechar |
@normorecset |
@normoreichar |
@nostackalign |
@nostorage |
@np |
@ns |
@ntomchar |
@ntomcset |
@ntomichar |
@nz |
@o |
@odd |
@offset |
@onechar |
@onecset |
@oneichar |
@oneormorechar |
@oneormorecset |
@oneormoreichar |
@oneormorews |
@optstrings |
@p |
@parmoffset |
@parms |
@pascal |
@pclass |
@pe |
@peekchar |
@peekcset |
@peekichar |
@peekistr |
@peekstr |
@peekws |
@po |
@pointer |
@pos |
@ptype |
@qword |
@random |
@randomize |
@read |
@real128 |
@real32 |
@real64 |
@real80 |
@reg |
@reg16 |
@reg32 |
@reg8 |
@regex |
@returns |
@rindex |
@s |
@section |
@sin |
@size |
@sort |
@sqrt |
@stackalign |
@staticname |
@stdcall |
@strbrk |
@string |
@strset |
@strspan |
@substr |
@system |
@tab |
@tan |
@tbyte |
@text |
@time |
@tokenize |
@tostring |
@trace |
@trim |
@type |
@typename |
@uns128 |
@uns16 |
@uns32 |
@uns64 |
@uns8 |
@uppercase |
@uptochar |
@uptocset |
@uptoichar |
@uptoistr |
@uptostr |
@use |
@volatile |
@wchar |
@word |
@ws |
@wsoreos |
@wstheneos |
@wstring |
@xmmreg |
@z |
@zeroormorechar |
@zeroormorecset |
@zeroormoreichar |
@zeroormorews |
@zerooronechar |
@zerooronecset |
@zerooroneichar |
@zstring |
aaa |
aad |
aam |
aas |
abstract |
adc |
add |
addpd |
addps |
addsd |
addss |
addsubpd |
addsubps |
ah |
al |
align |
and |
andnpd |
andnps |
andpd |
andps |
anyexception |
arpl |
ax |
begin |
bh |
bl |
boolean |
bound |
bp |
break |
breakif |
bsf |
bsr |
bswap |
bt |
btc |
btr |
bts |
bx |
byte |
call |
case |
cbw |
cdq |
ch |
char |
cl |
class |
clc |
cld |
clflush |
cli |
clts |
cmc |
cmova |
cmovae |
cmovb |
cmovbe |
cmovc |
cmove |
cmovg |
cmovge |
cmovl |
cmovle |
cmovna |
cmovnae |
cmovnb |
cmovnbe |
cmovnc |
cmovne |
cmovng |
cmovnge |
cmovnl |
cmovnle |
cmovno |
cmovnp |
cmovns |
cmovnz |
cmovo |
cmovp |
cmovpe |
cmovpo |
cmovs |
cmovz |
cmp |
cmpeqpd |
cmpeqps |
cmpeqsd |
cmpeqss |
cmplepd |
cmpleps |
cmplesd |
cmpless |
cmpltpd |
cmpltps |
cmpltsd |
cmpltss |
cmpneqpd |
cmpneqps |
cmpneqsd |
cmpneqss |
cmpnlepd |
cmpnleps |
cmpnlesd |
cmpnless |
cmpnltpd |
cmpnltps |
cmpnltsd |
cmpnltss |
cmpordpd |
cmpordps |
cmpordsd |
cmpordss |
cmppd |
cmpps |
cmpsb |
cmpsd |
cmpss |
cmpsw |
cmpunordpd |
cmpunordps |
cmpunordsd |
cmpunordss |
cmpxchg |
cmpxchg8b |
comisd |
comiss |
const |
continue |
continueif |
cpuid |
cr0 |
cr1 |
cr2 |
cr3 |
cr4 |
cr5 |
cr6 |
cr7 |
cseg |
cset |
cvtdq2pd |
cvtdq2pq |
cvtdq2ps |
cvtpd2dq |
cvtpd2pi |
cvtpd2ps |
cvtpi2pd |
cvtpi2ps |
cvtpi2ss |
cvtps2dq |
cvtps2pd |
cvtps2pi |
cvtsd2si |
cvtsd2ss |
cvtsi2sd |
cvtsi2ss |
cvtss2sd |
cvtss2si |
cvttpd2dq |
cvttpd2pi |
cvttps2dq |
cvttps2pi |
cvttsd2si |
cvttss2si |
cwd |
cwde |
cx |
daa |
das |
dec |
default |
dh |
di |
div |
divpd |
divps |
divsd |
divss |
dl |
do |
downto |
dr0 |
dr1 |
dr2 |
dr3 |
dr4 |
dr5 |
dr6 |
dr7 |
dseg |
dup |
dword |
dx |
dx:ax |
eax |
ebp |
ebx |
ecx |
edi |
edx |
edx:eax |
else |
elseif |
emms |
end |
endclass |
endconst |
endfor |
endif |
endlabel |
endreadonly |
endrecord |
endstatic |
endstorage |
endswitch |
endtry |
endtype |
endunion |
endval |
endvar |
endwhile |
enter |
enum |
eseg |
esi |
esp |
exception |
exit |
exitif |
external |
f2xm1 |
fabs |
fadd |
faddp |
fbld |
fbstp |
fchs |
fclex |
fcmova |
fcmovae |
fcmovb |
fcmovbe |
fcmove |
fcmovna |
fcmovnae |
fcmovnb |
fcmovnbe |
fcmovne |
fcmovnu |
fcmovu |
fcom |
fcomi |
fcomip |
fcomp |
fcompp |
fcos |
fdecstp |
fdiv |
fdivp |
fdivr |
fdivrp |
felse |
ffree |
fiadd |
ficom |
ficomp |
fidiv |
fidivr |
fild |
fimul |
fincstp |
finit |
fist |
fistp |
fisttp |
fisub |
fisubr |
fld |
fld1 |
fldcw |
fldenv |
fldl2e |
fldl2t |
fldlg2 |
fldln2 |
fldpi |
fldz |
fmul |
fmulp |
fnclex |
fninit |
fnop |
fnsave |
fnstcw |
fnstenv |
fnstsw |
for |
foreach |
forever |
forward |
fpatan |
fprem |
fprem1 |
fptan |
frndint |
frstor |
fsave |
fscale |
fseg |
fsin |
fsincos |
fsqrt |
fst |
fstcw |
fstenv |
fstp |
fstsw |
fsub |
fsubp |
fsubr |
fsubrp |
ftst |
fucom |
fucomi |
fucomip |
fucomp |
fucompp |
fwait |
fxam |
fxch |
fxrstor |
fxsave |
fxtract |
fyl2x |
fyl2xp1 |
gseg |
haddpd |
haddps |
hlt |
hsubpd |
hsubps |
idiv |
if |
imod |
imul |
in |
inc |
inherits |
insb |
insd |
insw |
int |
int128 |
int16 |
int32 |
int64 |
int8 |
intmul |
into |
invd |
invlpg |
iret |
iretd |
iterator |
ja |
jae |
jb |
jbe |
jc |
jcxz |
je |
jecxz |
jf |
jg |
jge |
jl |
jle |
jmp |
jna |
jnae |
jnb |
jnbe |
jnc |
jne |
jng |
jnge |
jnl |
jnle |
jno |
jnp |
jns |
jnz |
jo |
jp |
jpe |
jpo |
js |
jt |
jz |
label |
lahf |
lar |
lazy |
lddqu |
ldmxcsr |
lds |
lea |
leave |
les |
lfence |
lfs |
lgdt |
lgs |
lidt |
lldt |
lmsw |
lock.adc |
lock.add |
lock.and |
lock.btc |
lock.btr |
lock.bts |
lock.cmpxchg |
lock.dec |
lock.inc |
lock.neg |
lock.not |
lock.or |
lock.sbb |
lock.sub |
lock.xadd |
lock.xchg |
lock.xor |
lodsb |
lodsd |
lodsw |
loop |
loope |
loopne |
loopnz |
loopz |
lsl |
lss |
ltreg |
lword |
maskmovdqu |
maskmovq |
maxpd |
maxps |
maxsd |
maxss |
method |
mfence |
minpd |
minps |
minsd |
minss |
mm0 |
mm1 |
mm2 |
mm3 |
mm4 |
mm5 |
mm6 |
mm7 |
mod |
monitor |
mov |
movapd |
movaps |
movd |
movddup |
movdq2q |
movdqa |
movdqu |
movhlps |
movhpd |
movhps |
movlhps |
movlpd |
movlps |
movmskpd |
movmskps |
movntdq |
movnti |
movntpd |
movntps |
movntq |
movq |
movq2dq |
movsb |
movsd |
movshdup |
movsldup |
movss |
movsw |
movsx |
movupd |
movups |
movzx |
mul |
mulpd |
mulps |
mulsd |
mulss |
mwait |
name |
namespace |
neg |
nop |
not |
null |
or |
orpd |
orps |
out |
outsb |
outsd |
outsw |
override |
overrides |
packssdw |
packsswb |
packuswb |
paddb |
paddd |
paddq |
paddsb |
paddsw |
paddusb |
paddusw |
paddw |
pand |
pandn |
pause |
pavgb |
pavgw |
pcmpeqb |
pcmpeqd |
pcmpeqw |
pcmpgtb |
pcmpgtd |
pcmpgtw |
pextrw |
pinsrw |
pmaddwd |
pmaxsw |
pmaxub |
pminsw |
pminub |
pmovmskb |
pmulhuw |
pmulhw |
pmullw |
pmuludq |
pointer |
pop |
popa |
popad |
popf |
popfd |
por |
prefetchnta |
prefetcht0 |
prefetcht1 |
prefetcht2 |
procedure |
program |
psadbw |
pshufd |
pshufhw |
pshuflw |
pshufw |
pslld |
pslldq |
psllq |
psllw |
psrad |
psraw |
psrld |
psrldq |
psrlq |
psrlw |
psubb |
psubd |
psubq |
psubsb |
psubsw |
psubusb |
psubusw |
psubw |
punpckhbw |
punpckhdq |
punpckhqdq |
punpckhwd |
punpcklbw |
punpckldq |
punpcklqdq |
punpcklwd |
push |
pusha |
pushad |
pushd |
pushf |
pushfd |
pushw |
pxor |
qword |
raise |
rcl |
rcpps |
rcpss |
rcr |
rdmsr |
rdpmc |
rdtsc |
readonly |
real128 |
real32 |
real64 |
real80 |
record |
regex |
rep.insb |
rep.insd |
rep.insw |
rep.movsb |
rep.movsd |
rep.movsw |
rep.outsb |
rep.outsd |
rep.outsw |
rep.stosb |
rep.stosd |
rep.stosw |
repe.cmpsb |
repe.cmpsd |
repe.cmpsw |
repe.scasb |
repe.scasd |
repe.scasw |
repeat |
repne.cmpsb |
repne.cmpsd |
repne.cmpsw |
repne.scasb |
repne.scasd |
repne.scasw |
repnz.cmpsb |
repnz.cmpsd |
repnz.cmpsw |
repnz.scasb |
repnz.scasd |
repnz.scasw |
repz.cmpsb |
repz.cmpsd |
repz.cmpsw |
repz.scasb |
repz.scasd |
repz.scasw |
result |
ret |
returns |
rol |
ror |
rsm |
rsqrtps |
rsqrtss |
sahf |
sal |
sar |
sbb |
scasb |
scasd |
scasw |
segment |
seta |
setae |
setb |
setbe |
setc |
sete |
setg |
setge |
setl |
setle |
setna |
setnae |
setnb |
setnbe |
setnc |
setne |
setng |
setnge |
setnl |
setnle |
setno |
setnp |
setns |
setnz |
seto |
setp |
setpe |
setpo |
sets |
setz |
sfence |
sgdt |
shl |
shld |
shr |
shrd |
shufpd |
shufps |
si |
sidt |
sldt |
smsw |
sp |
sqrtpd |
sqrtps |
sqrtsd |
sqrtss |
sseg |
st0 |
st1 |
st2 |
st3 |
st4 |
st5 |
st6 |
st7 |
static |
stc |
std |
sti |
stmxcsr |
storage |
stosb |
stosd |
stosw |
streg |
string |
sub |
subpd |
subps |
subsd |
subss |
switch |
sysenter |
sysexit |
tbyte |
test |
text |
then |
this |
thunk |
to |
try |
type |
ucomisd |
ucomiss |
ud2 |
union |
unit |
unpckhpd |
unpckhps |
unpcklpd |
unpcklps |
unprotected |
uns128 |
uns16 |
uns32 |
uns64 |
uns8 |
until |
val |
valres |
var |
verr |
verw |
vmt |
wait |
wbinvd |
wchar |
welse |
while |
word |
wrmsr |
wstring |
xadd |
xchg |
xlat |
xmm0 |
xmm1 |
xmm2 |
xmm3 |
xmm4 |
xmm5 |
xmm6 |
xmm7 |
xor |
Note that "@debughla"
is also a reserved compiler symbol.
However, it is intended only for internal (HLA) debugging purposes.
When the compiler encounters
this symbol, it immediately halts with an assertion failure. You should never put this
symbol in your source code unless you’re debugging HLA itself and want to stop
the compiler immediately after it compiles some statement.
Because the set of HLA reserved
words is changing frequently, a special feature was added to HLA to allow a
programmer to "disable" HLA reserved words. This may allow an older program that uses new HLA reserved
words as identifiers to continue working with only minor modifications to the
HLA source code. The ability to
disable certain HLA reserved words also allows you to create macros that
override certain machine instructions.
All HLA reserved words take two
forms: the standard, mutable, form
(appearing in the table above) and a special immutable form that consists of a
tilde character (’~’) followed by the reserved word. For example, ’mov’ is the mutable form of the move
instruction while ’~mov’ is the immutable form. By default, the immutable and
mutable forms are equivalent when you begin an assembly. However, you can use the #ID
compile-time statement to convert the mutable form to an identifier, and you can
use the #RW compile-time statement to turn it back into a reserved word. Regardless of the state of the mutable
form, the immutable form always behaves like the reserved word as far as HLA is
concerned. Here’s an example of
the #ID and #RW statements:
#id( mov ) // From this point forward, mov is an identifier, not a reserved word.
mov:
~mov( i, eax ); // Must use ~mov while mov is an identifier!
cmp( eax, 0 );
jne mov;
#rw( mov ) // Okay, now mov is a reserved word again.
mov( 0, eax );
Note that you can use the #id
facility to disable certain instructions. For example, by default HLA handles
almost all (32-bit flat model) instructions up through the Pentium IV. If you
want to write code for an earlier processor, you may want to disable
instructions available only on later processors to help avoid their use. You
can do this by placing the offending instructions in #id statements.
16.4 External Symbols and Assembler Reserved Words
HLA produces an assembly
language file during compilation and invokes an assembler such as MASM to
complete the compilation process.
HLA automatically translates normal identifiers you declare in your
program to benign identifiers in the assembly language program. However, HLA does not translate
EXTERNAL symbols, but preserves these names in the assembly language file it
produces. Therefore, you must take
care not to use external names that conflict with the underlying assembler’s
set of reserved words or that assembler will generate an error when it attempts
to process HLA’s output. For a list of assembler reserved words, please see the
documentation for the assembler you are using.
Also note that the HLA compiler
uses a special suffix to denote internal symbols that it produces. This suffix is “__hla_” (two leading
underscores). You should avoid any symbols in your program that end with this
suffix to prevent conflicts with HLA-generated symbols.
16.5 HLA Identifiers
HLA identifiers must begin with
an alphabetic character or an underscore.
After the first character, the identifier may contain alphanumeric and
underscore symbols. There is no
technical limit on identifier length in HLA, but you should avoid external
symbols greater than about 32 characters in length since the assembler and
linkers that process HLA identifiers may not be able to handle such symbols.
HLA identifiers are always case neutral. This means that
identifiers are case sensitive insofar as you must always spell an identifier
exactly the same (with respect to alphabetic case). However, you are not allowed to declare two identifiers
whose only difference is alphabetic case.
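To make the two halves of the case-neutral rule concrete, here is a small Python model. It is a sketch only: `is_valid_identifier` and the `SymbolTable` class are hypothetical names, not part of HLA, and the model checks just the character rules and case handling described above (it ignores reserved words and length limits).

```python
import re

# Hypothetical model of the HLA identifier rules described above (not an HLA API).
# First character: ASCII letter or underscore; then letters, digits, or underscores.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_identifier(name: str) -> bool:
    """Return True if `name` obeys HLA's identifier character rules."""
    return _IDENTIFIER.fullmatch(name) is not None


class SymbolTable:
    """Models case neutrality: uniqueness ignores case, but spelling must match."""

    def __init__(self) -> None:
        self._declared = {}  # lowercased name -> spelling used in the declaration

    def declare(self, name: str) -> None:
        if not is_valid_identifier(name):
            raise ValueError(f"illegal identifier: {name!r}")
        key = name.lower()
        if key in self._declared:
            # Rejects exact duplicates and names that differ only in case.
            raise ValueError(f"{name!r} conflicts with {self._declared[key]!r}")
        self._declared[key] = name

    def lookup(self, name: str) -> str:
        declared = self._declared.get(name.lower())
        if declared is None:
            raise KeyError(f"undefined identifier: {name!r}")
        if declared != name:
            # Same symbol, different capitalization: HLA reports this as an error.
            raise KeyError(f"{name!r} must be spelled {declared!r}")
        return declared
```

Declaring `Count` and later `count` raises an error in `declare`, while referring to a declared `Count` as `COUNT` raises an error in `lookup`.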
Although technically legal in
your program, do not use identifiers that begin and end with a single
underscore. HLA reserves such
identifiers for use by the compiler and the HLA standard library. If you declare such identifiers in your
program, the possibility exists that you may interfere with HLA’s or the HLA Standard Library’s use
of such a symbol. As noted in the previous section, you should also avoid all
identifiers that end with the character
sequence “__hla_” as the HLA compiler produces symbols for its own internal use
with this suffix.
By convention, HLA programmers
use symbols beginning with two underscores to represent private fields in a class. So you should avoid such identifiers
except when defining such private fields in your own classes.
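The naming conventions in the last two paragraphs can be checked mechanically. The following Python function is a hypothetical lint helper (not an HLA or Standard Library feature) that flags the patterns HLA reserves or that conventionally mean something else; the `is_class_field` flag is an assumption standing in for "this name declares a private field of a class".

```python
def naming_warnings(name: str, is_class_field: bool = False) -> list:
    """Hypothetical lint: return warnings for identifiers HLA advises against."""
    warnings = []
    single_leading = name.startswith("_") and not name.startswith("__")
    single_trailing = name.endswith("_") and not name.endswith("__")
    if single_leading and single_trailing:
        warnings.append("_name_ form is reserved for HLA and the Standard Library")
    if name.endswith("__hla_"):
        warnings.append("suffix __hla_ is used by compiler-generated symbols")
    if name.startswith("__") and not is_class_field:
        warnings.append("a leading __ conventionally marks a private class field")
    return warnings


print(naming_warnings("count"))                        # []
print(naming_warnings("__size", is_class_field=True))  # []
```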
To avoid the possibility of an
infinite loop in the compiler, you should avoid standard identifiers as formal
macro parameter names. Should you inadvertently supply an actual argument that
has the same name as the formal parameter of a macro, HLA will enter an endless
text expansion loop: it substitutes the actual argument for the formal
parameter, and then repeats the process over and over again because the two names
are the same. A good convention to
follow is to begin all user-written macro parameter names with two
underscores (the HLA Standard Library instead uses a single leading and trailing
underscore, because such identifiers are reserved for use by the compiler and
the standard library).
One last thing to note is that
many high-level languages (like C) will often use a single underscore in front
of an identifier to denote the external form of that identifier. So you should
avoid identifiers that begin with a single underscore if you’re going to be
linking your HLA code with code written in an HLL (unless, of course, you’re
accessing that same symbol in the HLL).
16.6 External Identifiers
HLA lets you explicitly provide
a string for external identifiers.
External identifiers are not limited to the format for HLA
identifiers. HLA allows any string
constant to be used for an external identifier. It is your responsibility to use only those characters that
are legal in the assembler that processes HLA’s intermediate ASM file. Note that this feature lets you use
symbols that are not legal in HLA but are legal in external code (e.g., Win32
APIs use the ’@’ character in identifiers and some non-HLA code may use HLA
reserved words as identifiers).
See the discussion of the @EXTERNAL option for more details.
16.7 Data Types in HLA
16.7.1 Native (Primitive) Data Types in HLA
HLA provides the following
basic primitive types:
Boolean: One byte; zero represents false, one represents true.
Enum: One byte; user-defined IDs whose values range from 0 to 255.
Uns8: Unsigned integer values in the range 0..255.
Uns16: Unsigned integer values in the range 0..65535.
Uns32: Unsigned integer values in the range 0..4,294,967,295.
Uns64: Unsigned 64-bit integer.
Uns128: Unsigned 128-bit integer.
Byte: Generic eight-bit value.
Word: Generic 16-bit value.
DWord: Generic 32-bit value.
QWord: Generic 64-bit value.
TByte: Generic 80-bit value.
LWord: Generic 128-bit value.
Int8: Signed integer values in the range -128..+127.
Int16: Signed integer values in the range -32768..+32767.
Int32: Signed integer values in the range -2,147,483,648..+2,147,483,647.
Int64: Signed 64-bit integer values.
Int128: Signed 128-bit integer values.
Char: Character values.
WChar: Unicode character values.
Real32: 32-bit floating point values.
Real64: 64-bit floating point values.
Real80: 80-bit floating point values.
Real128: 128-bit floating point values (for SSE/2 instructions).
String: Dynamic length string constants (run-time implementation: four-byte pointer).
ZString: Zero-terminated dynamic length strings (run-time implementation: four-byte pointer).
Unicode: Unicode strings.
CSet: A set of up to 128 different ASCII characters (16-byte bitmap).
Text: Similar to string, but text constants expand in-place (like #define in C/C++).
Thunk: A set of machine instructions to execute.
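The integer ranges in the list above follow from each type's bit width. This Python sketch recomputes them, assuming two's-complement representation for the signed types; it is a check on the list, not part of HLA.

```python
def int_range(bits: int, signed: bool) -> tuple:
    """(min, max) for an integer of `bits` bits; signed types use two's complement."""
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


for name, bits, signed in [("uns8", 8, False), ("uns16", 16, False), ("uns32", 32, False),
                           ("int8", 8, True), ("int16", 16, True), ("int32", 32, True)]:
    low, high = int_range(bits, signed)
    print(f"{name}: {low:,}..{high:,}")
```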
Often, it is convenient to
discuss the types above in various groups. This document will often use the following terms:
Ordinal: boolean, enum, uns8, uns16, uns32, byte, word, dword, int8, int16, int32, char.
Unsigned: uns8, uns16, uns32, byte, word, dword.
Signed: int8, int16, int32, byte, word, dword.
Number: uns8, uns16, uns32, int8, int16, int32, byte, word, dword.
Numeric: uns8, uns16, uns32, int8, int16, int32, byte, word, dword, real32, real64, real80.
16.7.2 Enumerated Data Types
HLA provides the ability to
associate a list of identifiers with a user-defined type. Such types are known
as enumerated data types (because HLA enumerates, or numbers, each of the
identifiers in the list to give them a unique value). The syntax for an
enumerated type declaration (in an HLA type
section, see the description a little later) takes the following form:
typename : enum{ list_of_identifiers };
Here is a typical example:
type
color_t :enum{ red, green, blue, magenta, yellow,
cyan, black, white };
Internally, HLA treats
enumerated types as though they were unsigned integer values (though enum types are not directly compatible with the
unsigned types). HLA associates the value zero with the first identifier in the
enum list and then attaches sequentially increasing values to the following
identifiers in the list. For example, HLA will associate the following values
with the color_t symbolic constants:
red 0
green 1
blue 2
magenta 3
yellow 4
cyan 5
black 6
white 7
Because each enumerated
constant in a given enum list is unique, you may compare these values, use
them in computations, etc. Also
note that, because of the way HLA assigns internal values to these constant
names, you may compare objects in an enumerated list for less than and greater
than in addition to equal or not equal.
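Because HLA numbers enum constants from zero in declaration order, the mapping works much like Python's `enumerate`. The hypothetical helper below models only that numbering (not storage size or scope rules):

```python
def enum_values(names):
    """Hypothetical model: number enum constants 0, 1, 2, ... in declaration order."""
    if len(set(names)) != len(names):
        raise ValueError("duplicate constant in enum list")
    return {name: value for value, name in enumerate(names)}


color_t = enum_values(["red", "green", "blue", "magenta",
                       "yellow", "cyan", "black", "white"])
print(color_t["blue"])                    # 2
print(color_t["red"] < color_t["white"])  # True: order follows declaration order
```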
Note that HLA uses zero as the
internal representation for the first symbol of every enum
list. HLA only guarantees that the values it associates with an enum type are unique within that single type; it does not
make this guarantee across different enumerated types (in fact, because every list starts at zero,
different enum types are guaranteed to reuse the same values). In the
following example, HLA uses the value zero for the internal representation of both
const0 and c0.
Likewise, HLA uses the value one for both const1 and c1, and
so on.
type
enumType1 :enum{ const0, const1, const2 };
enumType2 :enum{ c0, c1, c2 };
Note that the enumerated
constants you specify are not "private" to that particular type. That
is, the constant names you supply in an enumerated data type list must be
unique within the current scope (see the definition of identifier scope
elsewhere in this document).
Therefore, the following is not legal:
type
enumType1 :enum{ et1, et2, et3, et4 };
enumType2 :enum{ et2, et2a, et2b, et2c }; // et2 is a duplicate symbol!
The problem here is that both
type lists attempt to define the same symbol: et2. HLA reports an error when you attempt
this.
One way to view the enumerated
constant list is to think of it as a list of constants in an HLA const section (see the description of declaration
sections a little later in this document), e.g.,
const
red : color_t := 0;
green : color_t := 1;
blue : color_t := 2;
magenta : color_t := 3;
yellow : color_t := 4;
cyan : color_t := 5;
black : color_t := 6;
white : color_t := 7;
By default, HLA uses eight-bit
values to represent enumerated data types. This means that you can represent up
to 256 different symbols using an enumerated data type. This should prove
sufficient for most applications. HLA provides a special "compile-time
variable" that lets you change the size of an enumerated type from one to
two or four bytes. In theory, all you’ve got to do is assign the value two or
four to this variable and HLA will automatically resize the storage for
enumerated types to handle longer lists of objects. In practice, however, this
feature has never been tested so it’s questionable if it works well. If you
need enumerated lists with more than 256 items, you might consider using HLA
const definitions rather than an enum list, just to be on the safe side.
Fortunately, the need for such an enum
list is exceedingly remote.
16.7.3 HLA Type Compatibility
HLA is unusual among assembly
languages insofar as it does some serious type checking on its operands. While
the type checking isn’t quite as "strong" as some high level
languages, HLA clearly does a lot more type checking than other assemblers,
even those that purport to do type checking on operands (e.g., MASM). The use
of strong type checking can help you locate logical errors in your code that
would otherwise go unnoticed (except via a laborious and time-consuming testing/debug
session).
The downside to strong type
checking is that experienced assembly programmers may become somewhat annoyed
with HLA’s reports that they are doing something wrong when, in fact, the
programmer knows exactly what they are doing. There are two solutions to this
problem: use type coercion (described a little bit later) or use the
"untyped" types that reduce type checking to simply ensuring that the
sizes of the operands match. However, before discussing how to override HLA’s
type checking system, it’s probably a good idea to first describe how HLA uses
data types.
Fundamentally, HLA divides up
the data types into classes based on the size of their underlying
representation. Unless you explicitly override a type with a type coercion
operation, attempting to mix object sizes in a memory or register operand will
produce an error (in constant expressions, HLA is a bit more forgiving; it will
automatically promote between certain types and adjust the type of the result
accordingly). With most of HLA’s data types, it’s pretty obvious what the size
of the underlying representation is, because most HLA type names incorporate
the size (in bits) in the type’s name. For example, the uns16 data type is a 16-bit (two-byte) type.
Nevertheless, this rule isn’t true for all data types, so it’s a good idea to
begin this discussion by looking at the underlying sizes of each of the HLA
types.
8 bits: boolean, byte, char, enum, int8, uns8
16 bits: int16, uns16, wchar, word
32 bits: dword, int32, pointer types, real32, string, zstring, unicode, uns32
64 bits: int64, qword, real64, uns64
80 bits: real80, tbyte
128 bits: cset, int128, lword, uns128, real128
The byte, word, dword, qword,
tbyte, and lword types are somewhat special. These are known as untyped data
types. They are directly
compatible with any scalar and ordinal data type that is the same size as the
type in question. For example, a byte object is directly compatible with any
object of type boolean,
byte, char, enum, int8, or uns8. No special coercion is necessary when assigning a
byte value to an object that has one of these other
types; likewise, no special coercion operation is necessary when assigning a
value of one of these other types to a byte
object.
Note that cset, real32, real64, real80, and real128 objects are
not ordinal types. Therefore, you cannot directly mix these types with lword, dword, qword, or tbyte objects without an explicit type coercion operation. Also keep in mind
that composite data types (see the next section) are not directly compatible
with bytes, words, dwords, qwords, tbytes, and lwords, even if the composite
data type has the same number of bytes (the only exception is the pointer data
type, which is compatible with the dword type).
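The size-based rules above can be summarized as a small predicate. This Python sketch is a simplified, hypothetical model, not HLA's actual type checker: it treats any two distinct typed types as incompatible (ignoring coercions and HLA's more forgiving constant-expression rules), and it leaves out the string types.

```python
# Hypothetical, simplified model of HLA's size-based operand compatibility.
# Sizes (in bits) come from the table above; string types are left out.
SIZE_BITS = {
    "boolean": 8, "byte": 8, "char": 8, "enum": 8, "int8": 8, "uns8": 8,
    "int16": 16, "uns16": 16, "wchar": 16, "word": 16,
    "dword": 32, "int32": 32, "uns32": 32, "real32": 32, "pointer": 32,
    "int64": 64, "qword": 64, "real64": 64, "uns64": 64,
    "real80": 80, "tbyte": 80,
    "cset": 128, "int128": 128, "lword": 128, "uns128": 128, "real128": 128,
}
UNTYPED = {"byte", "word", "dword", "qword", "tbyte", "lword"}
# Non-ordinal types (reals, cset) and the composite pointer type.
NOT_SCALAR_ORDINAL = {"cset", "real32", "real64", "real80", "real128", "pointer"}


def compatible(a: str, b: str) -> bool:
    """Can an operand of type `a` be mixed with one of type `b` without coercion?"""
    if a == b:
        return True
    if {a, b} == {"pointer", "dword"}:
        return True  # the one composite type that is compatible with an untyped type
    if a in UNTYPED or b in UNTYPED:
        other = b if a in UNTYPED else a
        # Untyped types match any scalar/ordinal type of the same size.
        return other not in NOT_SCALAR_ORDINAL and SIZE_BITS[a] == SIZE_BITS[b]
    return False  # simplification: distinct typed types need an explicit coercion
```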
16.7.4 Composite Data Types
In addition to the primitive
types above, HLA supports arrays, records (structures), unions, classes, and pointers of the primitive
types (except for text objects).
16.7.4.1 Array Data Types
HLA allows you to create an
array data type by specifying the number of array elements after a type
name. Consider the following HLA
type declaration that defines intArray to be an array of int32 objects:
type intArray : int32[ 16 ];
The "[ 16 ]"
component tells HLA that this type has 16 four-byte integers. HLA arrays use a zero-based index, so
the first element is always element zero.
The index of the last element, in this example, is 15 (a total of 16
elements with indices 0..15).
HLA also supports multidimensional arrays. You can specify a multidimensional array by providing a comma-separated list
of dimension sizes inside the square brackets, e.g.,
type intArray4x4 : int32[ 4, 4 ];
type intArray2x2x4 : int32[ 2,2,4 ];
The mechanism for accessing
array elements differs depending upon whether you are accessing compile-time
array constants or run-time array variables. A complete discussion of this will appear in later sections.
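Pending that later discussion, the arithmetic for a run-time element's byte offset can be sketched as below. This Python helper is hypothetical and assumes HLA lays out multidimensional arrays in row-major order (last subscript varying fastest); check the later sections on array access before relying on that.

```python
from math import prod


def array_size(dims, elem_size):
    """Total bytes for an array with the given dimension sizes."""
    return prod(dims) * elem_size


def element_offset(dims, index, elem_size):
    """Byte offset of element `index`, assuming row-major order and zero-based subscripts."""
    if len(index) != len(dims):
        raise ValueError("need exactly one subscript per dimension")
    flat = 0
    for bound, i in zip(dims, index):
        if not 0 <= i < bound:
            raise IndexError(f"subscript {i} is outside 0..{bound - 1}")
        flat = flat * bound + i  # row-major: later subscripts vary fastest
    return flat * elem_size


print(element_offset([16], [15], 4))      # 60: last element of intArray
print(element_offset([4, 4], [1, 2], 4))  # 24: intArray4x4[1, 2]
print(array_size([2, 2, 4], 4))           # 64: intArray2x2x4
```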
16.7.4.2 Union Data Types
HLA implements the (untagged)
union type using the UNION..ENDUNION reserved words. The following HLA type declaration demonstrates a union
declaration:
type allInts: union
i8: int8;
i16: int16;
i32: int32;
endunion;
All fields in a union have the
same starting address in memory.
The size of a union object is the size of the largest field in the
union. The fields of a union may have
any type that is legal in a variable declaration section (see the discussion of
the VAR section for more details).
Given a union object, say
"i" of type "allInts", you access the fields of the union
using the familiar dot-notation.
The following 80x86 mov instructions demonstrate how to access each of
the fields of the "i" variable:
mov( i.i8, al );
mov( i.i16, ax );
mov( i.i32, eax );
Unions also support a special
field type known as an anonymous record (see the next section for a
description of records). The
syntax for an anonymous record in a union is the following:
type
unionWrecord: union
u1Field: byte;
u2Field: word;
u3Field: dword;
record
u4Field: byte[2];
u5Field: word[3];
endrecord;
u6Field: byte;
endunion;
Fields appearing within the
anonymous record do not necessarily start at offset zero in the data
structure. In the example above, u4Field starts at offset zero while u5Field immediately follows it two bytes later. The fields in the union outside the
anonymous record all start at offset zero. If the size of the anonymous record is larger than any other
field in the union, then the record’s size determines the size of the
union. This is true for the
example above, so the union’s size is eight bytes since the anonymous record
consumes eight bytes (two bytes for u4Field plus six bytes for u5Field).
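The sizing rule for unions, including anonymous records, reduces to a maximum over member sizes. The Python sketch below is a hypothetical model that assumes packed fields (no ALIGN directives) and encodes an anonymous record as a tuple of its field sizes:

```python
from itertools import accumulate


def union_size(members):
    """Size of a union: the size of its largest member.

    Each member is a size in bytes, or a tuple of field sizes standing for an
    anonymous record whose fields follow one another (packed)."""
    return max(sum(m) if isinstance(m, tuple) else m for m in members)


def anonymous_record_offsets(field_sizes):
    """Offsets of the fields of an anonymous record inside a union."""
    return [0, *accumulate(field_sizes)][:-1]


# unionWrecord: byte, word, dword, record{ byte[2]; word[3] }, byte
members = [1, 2, 4, (2 * 1, 3 * 2), 1]
print(union_size(members))               # 8
print(anonymous_record_offsets((2, 6)))  # [0, 2]
```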
16.7.4.3 Record Data Types
HLA’s records allow programmers
to create data types whose fields can be different types. The following HLA type declaration
defines a simple record with four fields:
type
Planet: record
x: int32;
y: int32;
z: int32;
density: real64;
endrecord;
Objects of type Planet will
consume 20 bytes of storage at run-time.
The fields of a record may be
of any legal HLA data type including other composite data types. Like unions, anything that is legal in
a VAR section is a legal field of a record. Also like unions, you use the dot-notation to access fields
of a record object.
In addition to the VAR types,
you may also declare anonymous unions within a record. An anonymous union is a union
declaration without a fieldname associated with the union, e.g.,
type DemoAU: record
x: real32;
union
u1:int32;
r1:real32;
endunion;
y:real32;
endrecord;
In this example, x, u1, r1, and
y are all fields of DemoAU. To
access the fields of a variable D of type DemoAU, you would use the following
names: D.x, D.u1, D.r1, and D.y.
Note that D.u1 and D.r1 share the same memory locations at run-time,
while D.x and D.y have unique addresses associated with them.
Record types may inherit fields from other record types.
Consider the following two HLA type declarations:
type
Pt2D: record
x: int32;
y: int32;
endrecord;
Pt3D: record inherits( Pt2D )
z: int32;
endrecord;
In this example, Pt3D inherits
all the fields from the Pt2D type.
The "inherits" keyword tells HLA to copy all the fields from
the specified record (Pt2D in this example) to the beginning of the current
record declaration (Pt3D in this example). Therefore, the declaration of Pt3D above is equivalent to:
Pt3D: record
x: int32;
y: int32;
z: int32;
endrecord;
In some special situations you
may want to override a field from a previous field declaration. For example, consider the following
record declarations:
BaseRecord: record
a: uns32;
b: uns32;
endrecord;
DerivedRecord: record inherits( BaseRecord )
b: boolean; // New definition for b!
c: char;
endrecord;
Normally, HLA will report a
"duplicate" symbol error when attempting to compile the declaration
for "DerivedRecord" since the "b" field is already defined
via the "inherits( BaseRecord )" option. However, in certain cases it’s quite possible that the
programmer wishes to make the original field inaccessible in the derived record
by redeclaring the same name with a different type. That
is, perhaps the programmer intends to actually create the following record:
DerivedRecord: record
a: uns32; // Derived from BaseRecord
b: uns32; // Derived from BaseRecord, but inaccessible here.
b: boolean; // New definition for b!
c: char;
endrecord;
HLA allows a programmer to explicitly
override the definition of a particular field by using the OVERRIDES keyword
before the field they wish to override.
So while the previous declarations for DerivedRecord produce errors, the
following is acceptable to HLA:
BaseRecord: record
a: uns32;
b: uns32;
endrecord;
DerivedRecord: record inherits( BaseRecord )
overrides b: boolean; // New definition for b!
c: char;
endrecord;
Normally, HLA aligns each field
on the next available byte offset in a record. If you wish to align fields within a record on some other
boundary, you may use the ALIGN directive to achieve this. Consider the following record
declaration as an example:
type
AlignedRecord: record
b:boolean; // Offset 0
c:char; // Offset 1
align(4);
d:dword; // Offset 4
e:byte; // Offset 8
w:word; // Offset 9
f:byte; // Offset 11
endrecord;
Note that field
"d" is aligned at a four-byte offset while "w" is not
aligned. We can correct this
problem by inserting another ALIGN directive into this record:
type
AlignedRecord2: record
b:boolean; // Offset 0
c:char; // Offset 1
align(4);
d:dword; // Offset 4
e:byte; // Offset 8
align(2);
w:word; // Offset 10
f:byte; // Offset 12
endrecord;
Be aware of the fact that the
ALIGN directive in a RECORD only aligns fields in memory if the record object
itself is aligned on an appropriate boundary. For example, if an object of type AlignedRecord2 appears in
memory at an odd address, then the "d" and "w" fields will
also be misaligned (that is, they will appear at odd addresses in memory). Therefore, you must ensure appropriate
alignment of any record variable whose fields you’re assuming are aligned.
Note that the AlignedRecord2
type consumes 13 bytes. This means
that if you create an array of AlignedRecord2 objects, every other element will
be aligned on an odd address and three out of four elements will not be
double-word aligned (so the "d" field will not be aligned on a four-byte
boundary in memory). If you are
expecting fields in a record to be aligned on a certain byte boundary, then the
size of the record must be an even multiple of that alignment factor if you
have arrays of the record. This
means that you must pad the record with extra bytes at the end to ensure proper
alignment. For the AlignedRecord2
example, we need to pad the record with three bytes so that the size is an even
multiple of four bytes. This is
easily achieved by using an ALIGN directive as the last declaration in the
record:
type
    AlignedRecord2:
        record
            b:boolean;  // Offset 0
            c:char;     // Offset 1
            align(4);
            d:dword;    // Offset 4
            e:byte;     // Offset 8
            align(2);
            w:word;     // Offset 10
            f:byte;     // Offset 12
            align(4);   // Ensures we're padded to a multiple of four bytes.
        endrecord;
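The offsets in the comments above follow a simple rule. Each field starts at the running offset, and ALIGN(n) rounds that offset up to the next multiple of n. The Python sketch below is a model of that rule, not HLA code. The tuple encoding of the record and the names `layout` and `round_up` are hypothetical. The sketch also checks the array-element claim from the previous paragraph.

```python
def round_up(offset: int, n: int) -> int:
    """Round offset up to the next multiple of n (n should be a power of two)."""
    return (offset + n - 1) // n * n

def layout(items):
    """Model a packed HLA record with ALIGN directives.

    items holds ("field", name, size) or ("align", n) tuples (a hypothetical
    encoding). Returns ({name: offset}, record_size)."""
    offsets, offset = {}, 0
    for item in items:
        if item[0] == "align":
            offset = round_up(offset, item[1])
        else:
            _, name, size = item
            offsets[name] = offset
            offset += size
    return offsets, offset

# AlignedRecord2, including the trailing align(4) that pads it to 16 bytes.
ALIGNED_RECORD2 = [
    ("field", "b", 1), ("field", "c", 1), ("align", 4),
    ("field", "d", 4), ("field", "e", 1), ("align", 2),
    ("field", "w", 2), ("field", "f", 1), ("align", 4),
]

offsets, size = layout(ALIGNED_RECORD2)
print(offsets, size)  # {'b': 0, 'c': 1, 'd': 4, 'e': 8, 'w': 10, 'f': 12} 16

# Without the final align(4) the record is 13 bytes, so array element k starts
# at 13*k and its "d" field is dword-aligned only when k is a multiple of four.
unpadded_size = layout(ALIGNED_RECORD2[:-1])[1]
dword_aligned = [(unpadded_size * k + offsets["d"]) % 4 == 0 for k in range(4)]
print(dword_aligned)  # [True, False, False, False]
```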
Note that you should only use
values that are integral powers of two in the ALIGN directive.
If you want to ensure that all
fields are appropriately aligned on some boundary within the record, but you
don’t want to have to manually insert ALIGN directives throughout the record,
HLA provides a second alignment option to solve your problem. Consider the following syntax:
type
alignedRecord3 : record[4]
<< Set of fields >>
endrecord;
The "[4]" immediately following the RECORD reserved word tells HLA to start all fields in the record at offsets that are multiples of four, regardless of the object's size (and the size of the objects preceding the field). HLA allows any integer expression that produces a value in the range 1..4096 inside these brackets. If you specify the value one (which is the default), then all fields are packed (aligned on a byte boundary). For values greater than one, HLA will align each field of the record on the specified boundary. For arrays, HLA will align the field on a boundary that is a multiple of the array element's size. The maximum boundary HLA will round any field to is a multiple of 4096 bytes.
Note that if you set the
record alignment using this syntactical form, any ALIGN directive you supply in
the record may not produce the desired results. When HLA sees an ALIGN directive in a record that is using
field alignment, HLA will first align the current offset to the value specified
by ALIGN and then align the next field’s offset to the global record align
value.
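The following Python sketch is a rough model of the rule just described. It is not HLA code, and the encoding and the name `layout_fixed` are hypothetical. It covers scalar fields only and ignores the special handling of arrays. An ALIGN item first rounds the running offset, and then the record-wide alignment rounds it again. The example shows how an align(2) inside a record[4] has no visible effect on the next field.

```python
def round_up(offset: int, n: int) -> int:
    return (offset + n - 1) // n * n

def layout_fixed(items, record_align: int):
    """Model of record[record_align] for scalar fields.

    An ("align", n) item rounds the running offset up to n first; each
    ("field", name, size) item is then placed at the next multiple of
    record_align."""
    offsets, offset = {}, 0
    for item in items:
        if item[0] == "align":
            offset = round_up(offset, item[1])
        else:
            _, name, size = item
            offset = round_up(offset, record_align)
            offsets[name] = offset
            offset += size
    return offsets

# align(2) moves the offset from 1 to 2, but the record[4] rule then moves it to 4.
print(layout_fixed([("field", "a", 1), ("align", 2), ("field", "b", 1)], 4))
# {'a': 0, 'b': 4}
# align(8) moves the offset from 1 to 8, which is already a multiple of 4.
print(layout_fixed([("field", "a", 1), ("align", 8), ("field", "b", 1)], 4))
# {'a': 0, 'b': 8}
```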
Nested record declarations may
specify a different alignment value than the enclosing record, e.g.,
type
alignedRecord4 : record[4]
a:byte;
b:byte;
c:record[8]
d:byte;
e:byte;
endrecord;
f:byte;
g:byte;
endrecord;
In this example, HLA aligns fields a, b, f, and g on dword boundaries; it aligns d and e (within c) on eight-byte boundaries. Note that the alignment of the fields in the nested record is true only within that nested record. That is, if c turns out to be aligned on some boundary other than an eight-byte boundary, then d and e will not actually be on eight-byte boundaries; they will, however, be on eight-byte boundaries relative to the start of c.
In addition to letting you
specify a fixed alignment value, HLA also lets you specify a minimum and
maximum alignment value for a record.
The syntax for this is the following:
type
recordname : record[maximum : minimum]
<< fields >>
endrecord;
Whenever you specify a maximum
and minimum value as above, HLA will align all fields on a boundary that is at
least the minimum alignment value.
However, if the object’s size is greater than the minimum value but less
than or equal to the maximum value, then HLA will align that particular field
on a boundary that is a multiple of the object’s size. If the object’s size is greater than
the maximum size, then HLA will align the object on a boundary that is a
multiple of the maximum size. As
an example, consider the following record:
type
    r:  record[ 4:1 ]
            a:byte;     // offset 0
            b:word;     // offset 2
            c:byte;     // offset 4
            d:dword[2]; // offset 8
            e:byte;     // offset 16
            f:byte;     // offset 17
            g:qword;    // offset 20
        endrecord;
Note that HLA aligns g on a dword boundary (not qword, which would be
offset 24) since the maximum alignment size is four. Note that since the minimum size is one, HLA allows the f field to be aligned on an odd boundary (since it’s
a byte).
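The minimum/maximum rules above can be modeled in a few lines of Python. This is a sketch, not HLA code, and the names `field_alignment` and `layout_min_max` are hypothetical. It assumes power-of-two object sizes. It also aligns the dword[2] field by its four-byte element size while still advancing the offset by the array's full eight bytes. The model reproduces the offsets shown in the comments of the r record.

```python
def round_up(offset: int, n: int) -> int:
    return (offset + n - 1) // n * n

def field_alignment(size: int, maximum: int, minimum: int) -> int:
    """Boundary for one field of a record[maximum:minimum] record."""
    if size > maximum:
        return maximum        # large objects are capped at the maximum
    if size > minimum:
        return size           # mid-sized objects align on their own size
    return minimum            # small objects get at least the minimum

def layout_min_max(fields, maximum: int, minimum: int):
    """fields holds (name, alignment_size, total_size) tuples."""
    offsets, offset = {}, 0
    for name, align_size, total_size in fields:
        offset = round_up(offset, field_alignment(align_size, maximum, minimum))
        offsets[name] = offset
        offset += total_size
    return offsets

# The r record above; d is dword[2], so it aligns by its 4-byte element size
# but occupies 8 bytes.
R_FIELDS = [("a", 1, 1), ("b", 2, 2), ("c", 1, 1), ("d", 4, 8),
            ("e", 1, 1), ("f", 1, 1), ("g", 8, 8)]
print(layout_min_max(R_FIELDS, 4, 1))
# {'a': 0, 'b': 2, 'c': 4, 'd': 8, 'e': 16, 'f': 17, 'g': 20}
```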
If an array, record, or union field appears within a record, then HLA uses the size of an array element or the largest field of the record or union to determine the alignment size. That is, HLA will align the field within the outermost record on a boundary that is compatible with the size of the largest element of the nested array, union, or record.
HLA's sophisticated record alignment facilities let you specify record field alignments that match those used by most major high-level language compilers. This lets you easily access data types used in those HLLs without resorting to inserting lots of ALIGN directives inside the record.
Note that there is a big difference in semantics between the global record alignment option (above) and the similar syntax in the STATIC, READONLY, and STORAGE declaration sections (which is why their syntax is different). Consider the following:
static(4)
    v1: byte;
    v2: dword;
Unlike the record alignment option, this example only aligns the first variable in the STATIC section, not all the variables in that section (i.e., v2 will not be aligned on a dword boundary in the example above). Keep this difference in mind when using this alignment option.
When declaring record variables in a VAR, STATIC, READONLY, STORAGE, or SEGMENT declaration section, HLA associates the offset zero with the first field of a record. Each additional field in the record is assigned an offset corresponding to the sum of the sizes of all the prior fields. So in a record consisting of three int32 fields named "x", "y", and "z", "x" would have the offset zero, "y" would have the offset four, and "z" would have the offset eight.
If you would like to specify a
different starting offset, you can use the following syntax for a record
declaration:
Pt3D: record := 4;
    x: int32;
    y: int32;
    z: int32;
endrecord;
The signed integer constant
expression specified after the assignment operator (":=") specifies
the starting offset of the first field in the record. In this example x, y, and z will have the offsets 4, 8, and
12, respectively. Note that this value can be negative, if required.
Warning: setting the starting offset in this manner does
not add padding bytes to the record.
This record is still a 12-byte object. If you declare variables using a record declared in this
fashion, you may run into problems because the field offsets do not match the
actual offsets in memory. This
option is intended primarily for mapping records to pre-existing data
structures in memory. Only really
advanced assembly language programmers should use this option.
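The effect of a starting offset can be seen in a small Python model. This is a sketch, not HLA code, and `layout_with_start` is a hypothetical name. The field offsets shift, but the record's size is still just the sum of the field sizes.

```python
def layout_with_start(fields, start: int):
    """fields holds (name, size) pairs. Offsets begin at start; no padding is
    added, so the record's size is still the sum of the field sizes."""
    offsets, offset = {}, start
    for name, size in fields:
        offsets[name] = offset
        offset += size
    return offsets, sum(size for _, size in fields)

offsets, size = layout_with_start([("x", 4), ("y", 4), ("z", 4)], start=4)
print(offsets, size)  # {'x': 4, 'y': 8, 'z': 12} 12
# z occupies offsets 12..15, past the end of the 12-byte object.
```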
16.7.4.4 Pointer Types
HLA allows you to declare a
pointer to some other type using syntax like the following:
pointer to base_type
The following example
demonstrates how to create a pointer to a 32-bit integer within the type
declaration section:
type pi32: pointer to int32;
HLA pointers are always 32-bit
(near32) pointers.
HLA also allows you to define
pointers to existing procedures using syntax like the following:
procedure someProc( parameter_list );
    << procedure options, followed by @external, @forward, or procedure body >>
    .
    .
    .
type
    p : pointer to procedure someProc;
The p procedure pointer
"inherits" all the parameters and other procedure options associated
with the original procedure. This
is really just shorthand for the following:
procedure someProc( parameter_list );
    << procedure options, followed by @external, @forward, or procedure body >>
    .
    .
    .
type
    p : procedure ( Same_Parameters_as_someProc );
        << same options as someProc >>
The former version, however,
is easier to maintain since you don’t have to keep the parameter lists and
procedure options in sync.
Note that HLA provides the reserved word null (or NULL; reserved words are case insensitive) to represent the nil pointer. HLA replaces NULL with the value zero. The NULL pointer is compatible with any pointer type (including strings, which are pointers).
Warning: the “pointer to procedure xyz;” facility may be deprecated and removed from HLA
v2.0. Try to avoid using this syntax in new programs.
16.7.4.5 Thunks
A "thunk" is an eight-byte variable that contains a pointer to a piece of code to execute and an execution environment pointer (i.e., a pointer to an activation record). The code associated with a thunk is, essentially, a small procedure that (generally) uses the activation record of the surrounding code rather than creating its own activation record. HLA uses thunks to implement the iterator "yield" statement as well as pass by name and pass by lazy evaluation parameters. In addition to these uses of thunks, HLA allows you to declare your own thunk objects and use them for any purpose you desire. Declaring a thunk variable is easy; just use a declaration like the following in a VAR or STATIC section:
thunkVar: thunk;
This declaration reserves eight
bytes of storage. The first dword
holds the address of the code to execute, the second dword holds a pointer to
the activation record to load into EBP when the thunk executes.
Of course, like almost any
pointer variable, declaring a thunk variable is the easy part; the hard part is making sure the thunk
variable is initialized before attempting to call the thunk. While you could manually load the address
of some code and the frame pointer value into a thunk variable, HLA provides a
better syntax for initializing thunks with small code fragments: the
"thunk" statement. The
"thunk" statement uses the following syntax:
thunk thunkVar := #{ sequence_of_statements }#;
Consider the following example:
program ThunkDemo;
#include( "stdio.hhf" );

procedure proc1;
var
    i:       int32;
    p1Thunk: thunk;

    procedure proc2( t:thunk );
    var
        i:int32;
    begin proc2;

        mov( 25, i );
        t();
        stdout.put( "Inside proc2, i=", i, nl );

    end proc2;

begin proc1;

    thunk p1Thunk := #{ mov( 0, i ); }#;
    mov( 1, i );
    proc2( p1Thunk );
    stdout.put( "i=", i, nl );

end proc1;

begin ThunkDemo;

    proc1();

end ThunkDemo;
In this example, proc1 has two local variables, i and p1Thunk.
The THUNK statement initializes the p1Thunk variable with the address of some code that moves
a zero into the i
variable. The THUNK statement also
initializes p1Thunk
with a pointer to the current activation record (that is, a pointer to proc1’s activation record). Then proc1
calls proc2 passing p1Thunk as a parameter.
The proc2 routine has its own local variable named i. Of course, this is a different variable than the i in proc1. Proc2 begins by setting its variable i to the value 25. Then proc2 invokes the thunk (passed to it as a parameter). This thunk sets the variable i to zero. However, since the thunk uses the activation record that was current when the THUNK statement executed (proc1's), this statement sets proc1's i variable to zero rather than proc2's i variable. This program produces the following output:
Inside proc2, i=25
i=0
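To make the mechanism concrete, here is a Python model of ThunkDemo. It is a sketch, not HLA. A thunk is modeled as a pair: a callable stands in for the code address, and a dictionary stands in for the activation record that the THUNK statement captured. The names `Frame`, `Thunk`, and `run_demo` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Frame = Dict[str, int]  # stand-in for an activation record (hypothetical)

@dataclass
class Thunk:
    code: Callable[[Frame], None]  # models the first dword: address of the code
    frame: Frame                   # models the second dword: the saved EBP value

    def __call__(self) -> None:
        # Run the code against the captured frame, not the caller's frame.
        self.code(self.frame)

def proc2(t: Thunk, out: List[str]) -> None:
    frame: Frame = {"i": 25}       # proc2's own local i
    t()                            # the thunk writes proc1's i, not this one
    out.append(f"Inside proc2, i={frame['i']}")

def proc1(out: List[str]) -> None:
    frame: Frame = {}
    # thunk p1Thunk := #{ mov( 0, i ); }#;  captures proc1's frame
    p1_thunk = Thunk(lambda f: f.update(i=0), frame)
    frame["i"] = 1                 # mov( 1, i );
    proc2(p1_thunk, out)
    out.append(f"i={frame['i']}")

def run_demo() -> List[str]:
    out: List[str] = []
    proc1(out)
    return out

print("\n".join(run_demo()))
# Inside proc2, i=25
# i=0
```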
Although you probably won’t use
thunks that often, they are quite nice for deferred execution. This is especially useful in AI
(Artificial Intelligence) programs.
16.7.4.6 Class Types
Classes and object-oriented programming are the subject of a later section of this document. See Class Data Types (16.13) for more details.
16.7.4.7 Regular Expression Types
The HLA compile-time language
supports a special data type known as a “compiled regular expression”. Please
see the section on regular expression macros for more details on this data
type.
16.8 Literal Constants
Literal constants are those
language elements that we normally think of as non-symbolic constant
objects. HLA supports a wide
variety of literal constants. The
following sections describe those constants.
16.8.1 Numeric Constants
HLA lets you specify several
different types of numeric constants.
16.8.1.1 Decimal Constants
The first and last characters
of a decimal integer constant must be decimal digits (0..9). Interior positions may contain decimal
digits and underscores. The
purpose of the underscore is to provide a better presentation for large decimal
values (i.e., use the underscore in place of a comma in large values). Example: 1_234_265.
Note: Technically, HLA does not allow negative literal integer constants. However, you can use the unary “-” operator to negate a value, so you'd never notice this omission (e.g., -123 is legal; it consists of the unary negation operator followed by a positive decimal literal constant). Therefore, HLA always returns type unsXX for all literal decimal constants. Also note that HLA always uses a minimum size of uns32 for literal decimal constants. If you absolutely, positively, want a literal constant to be treated as some other type, use one of the compile-time type coercion functions to change the type (e.g., uns8(1), word(2), or int16(3)).
Generally, the type that HLA uses for the object is irrelevant since HLA
will automatically promote a value to a larger or smaller type as appropriate.
Here are the ranges of the various HLA unsigned data types:
uns8: 0..255
uns16: 0..65,535
uns32: 0..4,294,967,295
uns64: 0..18,446,744,073,709,551,615
uns128: 0..340,282,366,920,938,463,463,374,607,431,768,211,455
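The decimal-literal rules and the ranges above can be summarized in a small Python model. This is a sketch, not HLA's scanner, and `decimal_literal` is a hypothetical name. It assumes that HLA picks the smallest of uns32, uns64, and uns128 that holds the value, which is consistent with the stated uns32 minimum. Each listed upper bound is 2^bits - 1.

```python
import re

# First and last characters are digits; interior characters are digits or "_".
DECIMAL = re.compile(r"[0-9](?:[0-9_]*[0-9])?")

# Assumption: HLA picks the smallest of these types that can hold the value.
UNSIGNED_TYPES = [("uns32", 32), ("uns64", 64), ("uns128", 128)]

def decimal_literal(text: str):
    """Return (value, type_name) for a decimal literal such as 1_234_265."""
    if not DECIMAL.fullmatch(text):
        raise ValueError(f"not a decimal literal: {text!r}")
    value = int(text.replace("_", ""))
    for name, bits in UNSIGNED_TYPES:
        if value <= 2**bits - 1:       # the upper bound listed above
            return value, name
    raise OverflowError(f"{text} does not fit in uns128")

print(decimal_literal("1_234_265"))      # (1234265, 'uns32')
print(decimal_literal("4_294_967_296"))  # (4294967296, 'uns64')
```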
16.8.1.2 Hexadecimal Constants
Hexadecimal literal constants
must begin with a dollar sign (“$”) followed by a hexadecimal digit and must
end with a hexadecimal digit (0..9, A..F, or a..f). Interior positions may contain hexadecimal digits or
underscores. Hexadecimal constants
are easiest to read if each group of four digits (starting from the least
significant digit) is separated from the others by an underscore. E.g., $1A_2F34_5438.
If the constant fits in 32 bits, HLA always returns the dword type for a hexadecimal constant. For larger values, HLA will automatically use the qword or lword type, as appropriate. If you would like the hexadecimal value to have a different type, use one of the HLA compile-time type coercion functions to change the type (e.g., byte($12) or qword($54)).
Here are the same unsigned ranges expressed in hexadecimal:
uns8: 0..$FF
uns16: 0..$FFFF
uns32: 0..$FFFF_FFFF
uns64: 0..$FFFF_FFFF_FFFF_FFFF
uns128: 0..$FFFF_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF_FFFF
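The following Python sketch models the hexadecimal syntax and typing rules above: a leading "$", digits at both ends, and underscores only in interior positions. It is a model, not HLA's scanner, and `hex_literal` is a hypothetical name.

```python
import re

HEX = re.compile(r"\$[0-9A-Fa-f](?:[0-9A-Fa-f_]*[0-9A-Fa-f])?")

def hex_literal(text: str):
    """Return (value, type_name) for a hex literal such as $1A_2F34_5438."""
    if not HEX.fullmatch(text):
        raise ValueError(f"not a hexadecimal literal: {text!r}")
    value = int(text[1:].replace("_", ""), 16)
    for name, bits in (("dword", 32), ("qword", 64), ("lword", 128)):
        if value < 2**bits:
            return value, name
    raise OverflowError(f"{text} does not fit in 128 bits")

print(hex_literal("$FFFF_FFFF"))         # (4294967295, 'dword')
print(hex_literal("$1A_2F34_5438")[1])   # qword
```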
16.8.1.3 Binary Constants
Binary literal constants begin
with a percent sign (“%”) followed by at least one binary digit (0/1) and they
must end with a binary digit.
Interior positions may contain binary digits or underscore characters. Binary constants are easiest to read if
each group of four digits (starting from the least significant digit) is
separated from the others by an underscore. E.g., %10_1111_1010.
Like hexadecimal constants, HLA always associates the type dword with a "raw" binary constant; it will use the qword or lword type if the value requires more than 32 bits or more than 64 bits (respectively). If you want HLA to use a different type, use one of the compile-time type coercion functions to achieve this.
Obviously, binary constants may
have as many binary digits as there are bits in the underlying type. This document will not attempt to write
out the maximum binary literal constant!
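The binary rules above mirror the hexadecimal ones. Here is a Python model of them (a sketch, not HLA's scanner; `binary_literal` is a hypothetical name):

```python
import re

BINARY = re.compile(r"%[01](?:[01_]*[01])?")

def binary_literal(text: str):
    """Return (value, type_name) for a binary literal such as %10_1111_1010."""
    if not BINARY.fullmatch(text):
        raise ValueError(f"not a binary literal: {text!r}")
    value = int(text[1:].replace("_", ""), 2)
    for name, bits in (("dword", 32), ("qword", 64), ("lword", 128)):
        if value < 2**bits:
            return value, name
    raise OverflowError(f"{text} does not fit in 128 bits")

print(binary_literal("%10_1111_1010"))  # (762, 'dword')
```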
16.8.1.4 Numeric Set Constants
HLA provides a special numeric
constant form that lets you specify a numeric value by the bit positions
containing ones. This corresponds
to a powerset of integer values in the range 0..31. These constants take the following form:
@{ comma_separated_list_of_digits }
The comma_separated_list_of_digits can be empty (signifying no set bits, i.e., the value zero), a single bit position, or a set of bit positions separated by commas. Each bit position is an integer in the range 0..31.
Here are some examples:
@{}
@{8}
@{1,2,8,24}
The corresponding numeric
constant is given the type dword and is assigned the value that has ones in all the specified bit
positions. For example,
"@{8}" is equal to 256 since this value has a single set bit in bit
position eight. Note that
"@{0}" equals one, not zero (because the value one has a single set
bit in position zero).
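The value of a numeric set constant is just the bitwise OR of 1 shifted left by each listed position. The following Python function models this (`numeric_set` is a hypothetical name):

```python
def numeric_set(bit_positions) -> int:
    """Value of @{ ... }: a dword with a one in each listed bit position."""
    value = 0
    for bit in bit_positions:
        if not 0 <= bit <= 31:
            raise ValueError(f"bit position {bit} is outside 0..31")
        value |= 1 << bit
    return value

print(numeric_set([]))             # 0
print(numeric_set([8]))            # 256
print(numeric_set([0]))            # 1
print(numeric_set([1, 2, 8, 24]))  # 16777478
```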
16.8.1.5 Real (Floating Point) Constants
Floating point (real) literal constants always begin with a decimal digit (never just a decimal point). A string of one or more decimal digits may be optionally followed by a decimal point and zero or more decimal digits (the fractional part). After the optional fractional part, a floating point number may be followed by “e” or “E”, an optional sign (“+” or “-”), and a string of one or more decimal digits (the exponent part). Underscores may appear between two adjacent digits in the floating point number; their presence is intended to substitute for commas found in real-world decimal numbers.
Examples:
1.2
2.345e-2
0.5
1.2e4
2.3e+5
1_234_567.0
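The following regular expression is a compact Python model of the syntax just described, with an optional exponent sign as in 1.2e4. It is a sketch, not HLA's actual scanner. It assumes that a bare digit string with no "." and no exponent is an integer constant rather than a real one. `is_real_literal` is a hypothetical name.

```python
import re

DIGITS = r"[0-9](?:_?[0-9])*"   # an underscore may only sit between two digits
REAL = re.compile(rf"{DIGITS}(?:\.(?:{DIGITS})?)?(?:[eE][+-]?{DIGITS})?")

def is_real_literal(text: str) -> bool:
    """Check text against the real-constant syntax described above.

    Assumption: a bare digit string (no '.' and no exponent) is an integer
    constant, so it is not accepted as a real literal here."""
    if not REAL.fullmatch(text):
        return False
    return "." in text or "e" in text or "E" in text

examples = ["1.2", "2.345e-2", "0.5", "1.2e4", "2.3e+5", "1_234_567.0"]
print(all(is_real_literal(s) for s in examples))  # True
print(is_real_literal(".5"))                       # False: must start with a digit
```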
Literal real constants are always 80 bits and have the default type real80. If you wish to specify real32 or real64 literal constants, then use the real32 or real64 compile-time coercion functions to convert the values, e.g., real32( 3.14159 ). During compile time, it’s rare that
you’d want to use one of the smaller types since they are less accurate at
representing floating point values (although this might be precisely why you
decide to use the smaller real type, so the accuracy matches the computations
you’re doing at run-time).
The range of real32 constants is approximately 10^±38 with 6-1/2 digits of precision; the range of real64 values is approximately 10^±308 with approximately 14-1/2 digits of precision; and the range of real80 constants is approximately 10^±4932 with about 18 digits of precision.
16.8.2 Boolean Constants
Boolean constants consist of
the two predefined identifiers true and false. Note that your program may redefine
these identifiers, but doing so is incredibly bad programming style. Since these are actual identifiers in
the symbol table (and not reserved words), you must spell these identifiers in
all lower case or HLA will complain (unlike reserved words that are case
insensitive).
Internally, HLA represents true
with one and false with zero. In
fact, HLA’s boolean operations only look at bit #0 of the boolean value (and
always clear the other bits). HLA
compile-time statements that expect a boolean expression do not use zero/not
zero like C/C++ and a few other languages. Such expressions must have a boolean type with the values
true/false; you cannot supply an integer expression and rely on zero/not zero
evaluation as in C/C++ or BASIC.
16.8.3 Character Constants
Character literals generally consist of a single (graphic) character surrounded by apostrophes. To represent the apostrophe character, use four apostrophes, e.g., ''''.
Another way to specify a character constant is by typing the “#” symbol followed by a numeric literal constant (decimal, hexadecimal, or binary). Examples:
#13, #$D, #%1101.
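The following Python function models both character-literal forms: quoted characters, including the four-apostrophe case, and the “#” numeric form. It is a sketch, not HLA's scanner. `char_constant` is a hypothetical name, and the 8-bit limit reflects the assumption that char is a one-byte type.

```python
import re

NUMERIC_CHAR = re.compile(r"#(\$[0-9A-Fa-f_]+|%[01_]+|[0-9_]+)")

def char_constant(text: str) -> str:
    """Return the character denoted by an HLA-style character literal (a model)."""
    if len(text) >= 3 and text[0] == text[-1] == "'":
        inner = text[1:-1]
        if inner == "''":              # '''' denotes the apostrophe itself
            return "'"
        if len(inner) == 1 and inner != "'":
            return inner
        raise ValueError(f"malformed character literal: {text!r}")
    m = NUMERIC_CHAR.fullmatch(text)
    if not m:
        raise ValueError(f"not a character literal: {text!r}")
    digits = m.group(1).replace("_", "")
    if digits.startswith("$"):
        code = int(digits[1:], 16)
    elif digits.startswith("%"):
        code = int(digits[1:], 2)
    else:
        code = int(digits)
    if code > 0xFF:                    # assumption: char is an 8-bit type
        raise ValueError(f"character code out of range: {text!r}")
    return chr(code)

print([char_constant(s) for s in ("#13", "#$D", "#%1101")])  # ['\r', '\r', '\r']
```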
16.8.4 Unicode Character Constants
Unicode character constants are
16-bit values. HLA provides limited
support for Unicode literal constants.
HLA supports the UTF/7 code point (character set), which is just the standard seven-bit ASCII character set and nine high-order zero bits. To specify a 16-bit literal Unicode constant, simply prefix a standard ASCII literal constant with a 'u' or 'U', e.g.,
u'A' - UTF/7 character constant for 'A'
Note that UTF/7 constants are simply the ASCII character codes zero extended to 16 bits.
HLA provides a second syntax for Unicode character constants that lets you enter values whose character codes are outside the range $20..$7E. You can specify a Unicode character constant by its numeric value using the 'u#nnnn' constant form. This form lets you specify a 16-bit value following the '#' in either binary, decimal, or hexadecimal form, e.g.,
u#1233
u#$60F
u#%100100101001
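Here is a Python model of the two Unicode character forms above. It is a sketch, and `unicode_char` is a hypothetical name. The quoted form accepts printable ASCII ($20..$7E). To keep the model short, it does not handle the doubled-apostrophe case.

```python
import re

# The quoted form covers printable ASCII ($20..$7E), excluding the apostrophe.
QUOTED = re.compile(r"[uU]'([\x20-\x26\x28-\x7E])'")
NUMERIC = re.compile(r"[uU]#(\$[0-9A-Fa-f_]+|%[01_]+|[0-9_]+)")

def unicode_char(text: str) -> int:
    """Return the 16-bit code of u'c' or u#nnnn (a model of the forms above)."""
    m = QUOTED.fullmatch(text)
    if m:
        return ord(m.group(1))          # ASCII code, zero-extended to 16 bits
    m = NUMERIC.fullmatch(text)
    if not m:
        raise ValueError(f"not a Unicode character literal: {text!r}")
    digits = m.group(1).replace("_", "")
    if digits.startswith("$"):
        value = int(digits[1:], 16)
    elif digits.startswith("%"):
        value = int(digits[1:], 2)
    else:
        value = int(digits)
    if value > 0xFFFF:
        raise ValueError(f"value does not fit in 16 bits: {text!r}")
    return value

print(hex(unicode_char("u'A'")))    # 0x41
print(hex(unicode_char("u#$60F")))  # 0x60f
```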
16.8.5 String Constants
String literal constants consist of a sequence of (graphic) characters surrounded by quotes. To embed a quote within a string, insert a pair of quotes into the string, e.g., "He said ""This"" to me."
If two string literal constants are adjacent in a source file (with nothing but whitespace between them), then HLA will concatenate the two strings and present them to the parser as a single string. Furthermore, if a character constant is adjacent to a string, HLA will concatenate the character and string to form a single string object. This is useful, for example, when you need to embed control characters into a string, e.g.,
"This is the first line" #$d #$a "This is the second line" #$d #$a
HLA treats the above as a single string with a Windows newline sequence (CR/LF) at the end of each of the two lines of text.
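The concatenation rules above can be modeled with a tiny Python tokenizer. This is a sketch, not HLA's lexer, and `concat_literals` and `char_code` are hypothetical names. A quoted string treats "" as an embedded quote. A # constant becomes one character. All adjacent pieces are joined into a single string.

```python
import re

# A quoted string ("" inside it is an embedded quote), a #-character constant
# in decimal, $hex, or %binary, or whitespace between the pieces.
TOKEN = re.compile(r'"((?:[^"]|"")*)"|#(\$[0-9A-Fa-f_]+|%[01_]+|[0-9_]+)|\s+')

def char_code(num: str) -> int:
    num = num.replace("_", "")
    if num.startswith("$"):
        return int(num[1:], 16)
    if num.startswith("%"):
        return int(num[1:], 2)
    return int(num, 10)

def concat_literals(src: str) -> str:
    """Join adjacent string and #-character literals into one string."""
    out, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise ValueError(f"unexpected text at {pos}: {src[pos:]!r}")
        if m.group(1) is not None:
            out.append(m.group(1).replace('""', '"'))
        elif m.group(2) is not None:
            out.append(chr(char_code(m.group(2))))
        pos = m.end()
    return "".join(out)

print(repr(concat_literals('"line one" #$d #$a "line two"')))
# 'line one\r\nline two'
```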
16.8.6 Unicode String Constants
HLA lets you specify Unicode string literals by prefixing a standard string constant with a 'u' or a 'U'. Such string constants use the UTF/7 character set (that is, the standard ASCII character set) but reserve 16 bits for each character in the string. Note that the high-order nine bits of each character in the string will contain zero.
At the time of writing, the HLA Standard Library provides no support for Unicode strings, though support for Unicode string functions should appear shortly (note that Windows programmers can call the Unicode string functions that are part of the Windows API).
16.8.7 Character Set Constants
A character set literal constant consists of several comma-delimited character set expressions within a pair of braces. The character set expressions can either be individual character values or a pair of character values separated by an ellipsis (“..”). If an individual character expression appears within the character set, then that character is a member of the set; if a pair of character expressions, separated by an ellipsis, appears within a character set literal, then all characters between the first such expression and the second expression (inclusive) are members of the set. As a convenience, if a string constant appears between the braces, HLA will take the union of all the characters in that string and add those characters to the character set.
Examples:
{'a','b','c'}          // a, b, and c.
{'a'..'c'}             // a, b, and c.
{'A'..'Z','a'..'z'}    // Alphabetic characters.
{"cset"}               // The character set 'c', 'e', 's', and 't'.
{' ',#$d,#$a,#$9}      // Whitespace (space, return, linefeed, tab).
HLA character sets are
currently limited to holding characters from the 128-character ASCII character
set. In the future, HLA may
support an xcset type (supporting 256 elements) or even wcset (wide
character sets), but that support does not currently exist.
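The following Python function models how a character set literal is built from its items: single characters, inclusive ranges, and strings. It also enforces the 128-character ASCII limit. It is a sketch, not HLA code. `cset` is a hypothetical name, and a (first, last) tuple stands in for first..last.

```python
def cset(*items) -> frozenset:
    """Model of an HLA character-set literal.

    Each item is a one-character str (a single member), a (first, last) tuple
    standing in for first..last, or a longer str whose characters are all added."""
    members = set()
    for item in items:
        if isinstance(item, tuple):
            first, last = item
            members.update(chr(c) for c in range(ord(first), ord(last) + 1))
        else:
            members.update(item)        # every character of the string
    if any(ord(ch) > 127 for ch in members):
        raise ValueError("HLA character sets hold only the 128 ASCII characters")
    return frozenset(members)

print(sorted(cset("cset")))                     # ['c', 'e', 's', 't']
print(cset(("a", "c")) == cset("a", "b", "c"))  # True
```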
16.8.8 Structured Constants
16.8.8.1 Array Constants
Note: see Array Data Types (16.7.4.1) for more details about HLA array types.
HLA lets you specify an array literal constant by enclosing a set of values within a pair of square brackets. Since array elements must be homogeneous, all elements in an array literal constant must be the same type or conformable to the same type.
Examples:
[ 1, 2, 3, 4, 9, 17 ]
[ 'a', 'A', 'b', 'B' ]
[ "hello", "world" ]
Note that each item in the
list of values can actually be a constant expression, not a simple literal
value.
HLA array constants are
always one dimensional. This,
however, is not a limitation because if you attempt to use array constants in a
constant expression, the only thing that HLA checks is the total number of
elements. Therefore, an array
constant with eight integers can be assigned to any of the following arrays:
const
    a8:     int32[8]     := [1,2,3,4,5,6,7,8];
    a2x4:   int32[2,4]   := [1,2,3,4,5,6,7,8];
    a2x2x2: int32[2,2,2] := [1,2,3,4,5,6,7,8];
Although HLA doesn’t
support the notation of a multi-dimensional array constant, HLA does allow you
to include an array constant as one of the elements in an array constant. If an array constant appears as a list
item within some other array constant, then HLA expands the interior constant
in place, lengthening the list of items in the enclosing list. E.g., the following three array constants
are equivalent:
[ [1,2,3,4], [5,6,7,8] ]
[ [ [1,2], [3,4] ], [ [5,6], [7,8] ] ]
[1,2,3,4,5,6,7,8]
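The in-place expansion described above is ordinary list flattening. Here is a Python model of it (a sketch, not HLA; `flatten` is a hypothetical name). The last line mimics a symbolic array constant appearing inside a literal one.

```python
def flatten(constant: list) -> list:
    """Expand nested array constants in place (a model of HLA's flattening)."""
    out = []
    for item in constant:
        if isinstance(item, list):
            out.extend(flatten(item))   # splice the inner list's items in place
        else:
            out.append(item)
    return out

a8 = flatten([1, 2, 3, 4, 5, 6, 7, 8])
a2x4 = flatten([[1, 2, 3, 4], [5, 6, 7, 8]])
a2x2x2 = flatten([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(a8 == a2x4 == a2x2x2)  # True

# Only the element count matters when assigning to a shape such as int32[2,4].
print(len(a2x4) == 2 * 4)    # True

nine = flatten([a8, 9])      # like [ a8, 9 ]: values 1 through 9
```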
Although the three array
constants are identical, as far as HLA is concerned, you might want to use
these three different forms to suggest the shape of the array in an actual
declaration, e.g.,
const
    a8:     int32[8]     := [1,2,3,4,5,6,7,8];
    a2x4:   int32[2,4]   := [ [1,2,3,4], [5,6,7,8] ];
    a2x2x2: int32[2,2,2] := [[[1,2], [3,4] ], [[5,6], [7,8]]];
Also note that symbolic array constants, not just literal array constants, may appear in a literal array constant. For example, the following literal array constant creates a nine-element array holding the values one through nine at indexes zero through eight:
const Nine: int32[ 9 ] := [ a8, 9 ];
This assumes, of course, that "a8" was previously declared as above. Since HLA "flattens" all array constants, you could have substituted a2x4 or a2x2x2 for a8 in the example above and obtained identical results.
As a convenience to those
building array constants using the HLA compile-time language, an HLA array
constant will allow an extra comma at the end of the list of array elements,
e.g.,
const
    a8: int32[8] := [1,2,3,4,5,6,7,8, ]; // Note extra comma after '8'
Note that this does not
create an "empty" element in the array. The array (in this example)
still has eight elements. Allowing the extra comma at the end of the list
allows you to generate the list programmatically (using the HLA compile-time
language) without requiring a special case for the last item in the list (which
would normally need to be handled specially because there is no comma after the
last item when using "clean" syntax). For example, consider the
following definition of a8:
a8: int32[8] :=
    [
        #for( i := 1 to 8 )
            i,
        #endfor
    ];
This array definition is
exactly equivalent to the previous one (including the extra comma at the end).
Prior to the addition of this feature in HLA, you’d have to use a kludge like
the following to handle the last element:
a8: int32[8] :=
    [
        #for( i := 1 to 7 )
            i,
        #endfor
        8 // Handle last element specially
    ];
Though allowing an extra comma at the end of the list is aesthetically unappealing, kludges like the #for loop immediately above are an even worse offense.
You may also create an
array constant using the HLA DUP operator.
This operator uses the following syntax:
expression DUP [expression_to_replicate]
Where expression is an integer expression and expression_to_replicate is some expression, possibly an array constant. HLA generates an array constant by repeating the values in the expression_to_replicate the number of times specified by the expression.
(Note: this does not create an array with expression elements unless the expression_to_replicate contains only a single value; it creates an array whose element count
is expression times the
number of items in the expression_to_replicate).
Examples:
10 dup [1]    -- equivalent to [1,1,1,1,1,1,1,1,1,1]
5 dup [1,2]   -- equivalent to [1,2,1,2,1,2,1,2,1,2]
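In Python terms, DUP behaves like list repetition. The element count is the count times the number of replicated items. This is a sketch, and `dup` is a hypothetical name. It assumes that nested array constants inside the replicated expression are flattened first, as with other HLA array constants.

```python
def flatten(items: list) -> list:
    out = []
    for item in items:
        if isinstance(item, list):
            out.extend(flatten(item))
        else:
            out.append(item)
    return out

def dup(count: int, replicate: list) -> list:
    """Model of `count DUP [replicate]`: count copies of the replicated items."""
    return flatten(replicate) * count

print(dup(10, [1]))              # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(dup(5, [1, 2]))            # [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
print(len(dup(3, [[1, 2], 3])))  # 9 = 3 times the 3 items being replicated
```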
Please note that HLA does
not allow class constants, so class objects may not appear in array
constants. Also, HLA does not
allow generic pointer constants, only certain types of pointer constants are
legal. See the discussion of
pointer constants for more details.
16.8.8.2 Record Constants
Note: see Record Data Types (16.7.4.3) for details about HLA Records.
HLA supports record
constants using a syntax very similar to array constants. You enclose a comma-separated list of
values for each field in a pair of square brackets. To further differentiate array and record constants, the
name of the record type and a colon must precede the opening square bracket,
e.g.,
Planet:[ 1, 12, 34, 1.96 ]
HLA associates the items
in the list with the fields as they appear in the original record
declaration. In this example, the
values 1, 12, 34, and 1.96 are
associated with fields x, y, z,
and density, respectively. Of
course, the types of the individual constants must match (or be conformable to)
the types of the individual fields.
Note that you may not
create a record constant for a particular record type if that record includes
data types that cannot have compile-time constants associated with them. For example, if a field of a record is
a class object, you cannot create a record constant for that type since you
cannot create class constants.
16.8.8.3 Union Constants
Note: see Union Data Types (16.7.4.2) for more details about HLA's UNION types.
Union constants allow you to initialize static union data structures in memory as well as initialize union fields of other data structures (including anonymous union fields in records). There are some important differences between HLA compile-time union constants and HLA run-time unions (as well as between HLA unions and unions in other languages). Therefore, it's a good idea to begin the discussion of HLA's union constants with a description of these differences.
There are a couple of
different reasons people use unions in a program. The original reason was to share a sequence of memory
locations between various fields whose access is mutually exclusive. When using a union in this manner, one never reads the data from a field unless they've previously written data to that field and there are no intervening writes to other fields between that previous write and the current read. The HLA compile-time language fully (and only) supports this use of union objects.
A second reason people use
unions (especially in high level languages) is to provide aliases to a given
memory location; particularly,
aliases whose data types are different.
In this mode, a programmer might write a value to one field and then
read that data using a different field (in order to access that data’s bit
representation as a different type).
HLA does not support this type of access to union constants. The
reason is quite simple: internally, HLA uses a special "variant" data
type to represent all possible constant types. Whenever you create a union constant, HLA lets you provide a
value for a single data field.
From that point forward, HLA effectively treats the union constant as a
scalar object whose type is the same as the field you’ve initialized; access to the other fields through the
union constant is no longer possible.
Therefore, you cannot use HLA compile-time constants to do type
coercion; nor is there any need to
since HLA provides a set of type coercion operators like @byte, @word, @dword, @int8, etc.
As noted above, the main purpose for providing HLA union constants is to
allow you to initialize static union variables; since you can only store one value into a memory location at
a time, union constants only need to be able to represent a single field of the
union at one time (of course, at
run-time you may access any field of the static union object you’ve
created; but at compile-time you
may only access the single field associated with a union constant).
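One way to picture this behavior is as a tagged value. The Python sketch below models the semantics described above. It does not model HLA's internal "variant" type. `UnionConstant` and its `assign`/`read` methods are hypothetical names.

```python
class UnionConstant:
    """Model of a compile-time union constant: it holds one value, tagged with
    the field that was last assigned, and rejects reads of any other field."""

    def __init__(self, fields):
        self.fields = frozenset(fields)
        self.active = None               # name of the field holding the value
        self.value = None

    def assign(self, field, value):
        if field not in self.fields:
            raise AttributeError(f"no field named {field!r}")
        self.active, self.value = field, value

    def read(self, field):
        if field != self.active:
            raise TypeError(f"field {field!r} does not hold the current value")
        return self.value

u = UnionConstant(["b", "w", "d", "q"])
u.assign("w", 0x1234)        # like ?u.w := $1234;
print(hex(u.read("w")))      # 0x1234
try:
    u.read("b")              # no reinterpretation of the bits is possible
except TypeError as err:
    print(err)
```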
An HLA literal union
constant takes the following form:
typename.fieldname:[ constant_expression ]
The typename component above must be the name of a previously
declared HLA union data type (i.e., a union type you’ve created in the type section).
The fieldname component must be the name of a field within that union type. The constant_expression component must be a constant value (expression) whose type is the same as, or is automatically coercible to, the type of the fieldname field.
Here is a complete example:
type
u:union
b:byte;
w:word;
d:dword;
q:qword;
endunion;
static
uVar :u := u.w:[$1234];
The declaration for uVar initializes the first two bytes of this object in
memory with the value $1234. Note
that uVar is actually eight
bytes long; HLA automatically
zeros any unused bytes when initializing a static memory object with a union
constant.
Note that you may place a
literal union constant in records, arrays, and other composite data
structures. The following is a
simple example of a record constant that has a union as one of its fields:
type
r :record
b:byte;
uf:u;
d:dword;
endrecord;
static
sr :r := r:[0, u.d:[$1234_5678], 12345];
In this example, HLA
initializes the sr
variable with the byte value zero, followed by a dword containing $1234_5678 and a dword containing zero (to pad out the remainder of the
union field), followed by a dword containing 12345.
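The resulting memory images can be reproduced with Python's struct module. The sketch assumes x86 little-endian byte order and the default packed record layout (alignment one). The helper `union_bytes` is a hypothetical name. The union occupies eight bytes because its largest field is a qword.

```python
import struct

UNION_SIZE = 8  # u's largest field is a qword

def union_bytes(fmt: str, value: int) -> bytes:
    """Pack one union field little-endian and zero-fill the rest of the union."""
    data = struct.pack("<" + fmt, value)
    return data + bytes(UNION_SIZE - len(data))

# static uVar :u := u.w:[$1234];
u_var = union_bytes("H", 0x1234)
# static sr :r := r:[0, u.d:[$1234_5678], 12345];  (byte, union, dword)
sr = struct.pack("<B", 0) + union_bytes("I", 0x1234_5678) + struct.pack("<I", 12345)

print(u_var.hex(" "))  # 34 12 00 00 00 00 00 00
print(len(sr))         # 13
```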
You can also create
records that have anonymous unions in them and then initialize a record object
with a literal constant. Consider
the following type declaration with an anonymous union:
type
rau :record
b:byte;
union
c:char;
d:dword;
endunion;
w:word;
endrecord;
Since anonymous unions
within a record do not have a type associated with them, you cannot use the
standard literal union constant syntax to initialize the anonymous union field
(that syntax requires a type name).
Instead, HLA offers you two choices when creating a literal record
constant with an anonymous union field.
The first alternative is to use the reserved word union in place of a
typename when creating a literal union constant, e.g.,
static
srau :rau := rau:[ 1, union.d:[$12345], $5678 ];
The second alternative
is a shortcut notation. HLA allows
you to simply specify a value that is compatible with the first field of the
anonymous union and HLA will assign that value to the first field and ignore
any other fields in the union, e.g.,
static
srau2 :rau := rau:[ 1, 'c', $5678 ];
This is slightly
dangerous since HLA relaxes type checking a bit here, but when creating tables
of record constants, this is very convenient if you generally provide values
for only a single field of the anonymous union; just make sure that the commonly used field appears first
and you’re in business.
Although HLA allows
anonymous records within a union, there is no syntactically acceptable way to
differentiate anonymous record fields from other fields in the union; therefore, HLA does not allow you to
create union constants if the union type contains an anonymous record. The easy workaround is to create a
named record field and specify the name of the record field when creating a
union constant, e.g.,
type
r :record
c:char;
d:dword;
endrecord;
u :union
b:byte;
x:r;
w:word;
endunion;
static
y :u := u.x:[ r:[ 'a', 5]];
You may declare a union
constant and then assign data to the specific fields as you would a record
constant. The following example
provides some samples of this:
type
u_t :union
b:byte;
x:r;
w:word;
endunion;
val
u :u_t;
.
.
.
?u.b := 0;
.
.
.
?u.w := $1234;
The two assignments above
are roughly equivalent to the following:
?u := u_t.b:[0];
and
?u := u_t.w:[$1234];
However, to use the
straight assignment (the former example) you must first declare the value u as a u_t union.
To access a value of a
union constant, you use the familiar "dot notation" from records and
other languages, e.g.,
?x := u.b;
.
.
.
?y := u.w & $FF00;
Note, however, that you
may only access the last field of the union into which you’ve stored some
value. If you store data into one
field and attempt to read the data from some other field of the union, HLA will
report an error. Remember, you
don’t use union constants as a sneaky way to coerce one value’s type to another
(use the coercion functions for that purpose).
16.8.8.4 Pointer Constants
Note: see Pointer Types (16.7.4.4) for more details about HLA pointer types.
HLA allows a very limited form
of a pointer constant. If you
place an ampersand ("&") in front of a static object’s name
(i.e., the name of a static variable, readonly variable, uninitialized
variable, segment variable, procedure, method, or iterator), HLA will compute
the run-time offset of that variable.
Pointer constants may not be used in arbitrary constant expressions. You may only use pointer constants in
expressions used to initialize static or readonly variables or as constant
expressions in 80x86 instructions.
The following example demonstrates how pointer constants can be used:
program pointerConstDemo;
static
t:int32;
pt: pointer to int32 := &t;
begin pointerConstDemo;
mov( &t, eax );
end pointerConstDemo;
Also note that HLA allows the
use of the reserved word NULL anywhere a pointer constant is legal. HLA substitutes the value zero for
NULL.
You may also supply a numeric
constant offset to a pointer constant using the index operator (“[]”). For example, “&t[4]” is a
pointer constant that references four bytes beyond the address of t.
16.8.8.5 Regular Expression Constants
HLA uses compile-time
“regex”-typed variables to hold compiled versions of regular expressions. There
is no literal form of a regular expression constant. The only way to generate a
regular expression constant is in a VAL, CONST, or “?” declaration by assigning
the “value” of a #regex macro declaration to a symbol, e.g.,
#regex someRegexMacro;
<<regex macro body>>
#endregex
const
compiledRegex :regex := someRegexMacro;
See the section on regular
expressions for more details.
16.9 Constant Expressions in HLA
HLA provides a rich expression
evaluator to process assembly-time expressions. HLA supports the following operators (sorted by decreasing
precedence):
! (unary not), - (unary negation)
*, div, mod, /, <<, >>
+, -
=, ==, <>, !=, <=, >=, <, >
&, |, ^, in
The following subsections
describe each of these operators in detail.
16.9.1 Type Checking and Type Promotion
Many dyadic (two-operand)
operators expect the types of their operands to be the same. Prior to actually performing such an
operation, HLA evaluates the types of the operands and attempts to make them
compatible. HLA uses a type
algebra to determine if two (different) types are compatible; if they are not, HLA will report a type
mismatch error during assembly. If
the types are compatible, HLA will make them identical via type promotion. The
type algebra describes how HLA promotes one type to another in order to make
the two types compatible.
Usually, you can state a type
algebra easily enough by providing "algebraic" type equations. For example, in high level languages
one could use a statement like "r = r + i" to suggest that the type
of the resulting sum is real when the left operand is real and the right
operand is integer (around the "+" operator). Unfortunately, HLA supports so many
different data types and operators that any attempt to describe the type
algebra in this fashion will produce so many equations that it would be difficult
for the reader to absorb it all.
Therefore, this document will rely on an informal English description of
the type algebra to explain how HLA operates.
First of all, if two operands
have the same basic type, but are different sizes, HLA promotes the smaller
object to the same size as the larger object. Basic types include the following sets: {uns8, uns16, uns32,
uns64, uns128}, {int8, int16,
int32, int64, int128}, {byte,
word, dword, qword, lword}, and
{real32, real64, real80}[7]. So if any two operands appear from one
of these sets, then both operands are promoted to the larger of the two types.
If an unsigned and a signed
operand appear around an operator, HLA produces a signed result. If the unsigned operand is smaller than
the signed operand, HLA assigns both operands the signed type prior to the operation. If the unsigned and signed operands are
the same size (or the unsigned operand is larger), HLA will first check the
H.O. bit of the unsigned operand.
If it is set, then HLA promotes the unsigned operand to the next larger
signed type (e.g., uns16
becomes int32). If the resulting signed type is larger
than the other operand’s type, it gets promoted as well. This scheme fails if you’ve got an uns128 value whose H.O. bit is set. In that case, HLA promotes both
operands to int128 and
will produce incorrect results (since the uns128 value just went negative when it’s really
positive). Therefore, you should
attempt to limit unsigned values to 127 bits if you’re going to be mixing
signed and unsigned operations in the same expression.
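The signed/unsigned promotion rule above is easier to follow as a decision procedure. The following is a small Python model (not HLA code) of that rule as described here; the function name and the size tables are hypothetical. It assumes that an unsigned operand whose H.O. bit is clear simply takes the signed type of its own size, and it shows that uns128 has nowhere left to go, which is the failure case just mentioned.

```python
# Python model of HLA's signed/unsigned promotion rule (not HLA code).
SIGNED = {8: "int8", 16: "int16", 32: "int32", 64: "int64", 128: "int128"}
NEXT_SIZE = {8: 16, 16: 32, 32: 64, 64: 128, 128: 128}  # uns128 cannot grow

def promote_mixed(uns_size, uns_value, int_size):
    """Return the signed type used for (unsN op intM), per the rule above."""
    if uns_size < int_size:
        # Smaller unsigned operand: both operands take the signed type.
        return SIGNED[int_size]
    size = uns_size
    if (uns_value >> (uns_size - 1)) & 1:
        # H.O. bit set: move up to the next larger signed type.
        size = NEXT_SIZE[uns_size]
    # The result is the larger of the promoted type and the signed operand.
    return SIGNED[max(size, int_size)]

print(promote_mixed(16, 0x8000, 16))   # uns16 with H.O. bit set vs int16
```

With `uns_size == 128` and the H.O. bit set the model still answers int128, which is exactly where the real value would be misread as negative.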
Any mixture of hexadecimal
types (byte, word, dword, qword, or lword) and an unsigned type produces an
unsigned type; the size of the
resulting unsigned type will be the larger of the two types. Likewise, any mixture of hexadecimal
types and signed integer types will produce a signed integer whose size is the
larger of the two types. This
"strengthening" of the type (hexadecimal types are "weaker"
than signed or unsigned types) may seem counter-intuitive to a die-hard
assembly programmer; however,
making the result type hexadecimal rather than signed/unsigned can create
problems if the result has the H.O. bit set since information about whether the
result is signed or unsigned would be lost at that point.
Mixing unsigned values and a real32 value will produce a real32 result or an error. HLA produces an error if the unsigned value requires more
than 24 bits to represent exactly (24 bits is the most integer precision the real32
format can hold exactly). Note that in addition to
promoting the unsigned type to real32, HLA will also convert the unsigned value to a real32 value. Promoting a type is not the same thing as
converting a value: promoting uns8 to uns16 simply changes the type designation of
the uns8 object, and HLA doesn’t have to touch the actual value at all since it
keeps all values in an internal 128-bit format. The binary representations of unsigned and real32 values, however, are completely different, so HLA must convert the
value as well. If you really want to convert a value that requires more than 24 bits of
precision to a real32
object (with truncation), first convert the unsigned operand to real64 or real80 and then convert that larger operand to real32 using the real32(expr) compile-time function. Since unsigned values are, well, unsigned and real32 objects are signed, the conversion process always
produces a non-negative value.
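Why 24 bits is the limit can be seen by round-tripping integers through IEEE single precision, the format real32 uses. The Python sketch below is an illustration, not HLA code. It uses the standard `struct` module to pack into 4-byte floats. `fits_real32` is a hypothetical name that models the check described above as "magnitude below 2**24", which is an assumption about how HLA phrases the test.

```python
import struct

def to_real32(value):
    """Round-trip an integer through IEEE single precision (real32)."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]

def fits_real32(value):
    """Model of the check above: the integer must fit in 24 bits."""
    return abs(value) < (1 << 24)

print(to_real32(2**24 - 1))   # still exact
print(to_real32(2**24 + 1))   # rounds: the low bit is lost
```

The second call shows the precision loss HLA reports as an error rather than silently accepting.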
Mixing signed and real32 values in an expression produces a real32 result.
Like unsigned operands, signed operands are limited to 24 bits of precision
or HLA will report an error.
Technically, you should get one more bit of precision from signed
operands (since the real32 format maintains its sign apart from the mantissa),
but HLA still limits you to 24 bits during this conversion. If the signed integer value is
negative, so will be the real32 result.
If you mix hexadecimal and real32 types, HLA treats the hexadecimal type as an
unsigned value of the same size.
See the discussion of unsigned and real32 values earlier for the details.
If you mix an unsigned, signed,
or hexadecimal type with a real64 type, the result is an error (if HLA cannot exactly represent the value
in real64 format) or a real64 result.
The conversion is very similar to the real32 conversion discussed above except you get 52 bits
of integer precision rather than only 24 bits.
If you mix an unsigned, signed,
or hexadecimal type with a real80 type, the result is an error (if HLA cannot exactly represent the value
in real80 format) or a real80 result.
The conversion is very similar to the real32 conversion discussed above except you get 64 bits
of integer precision rather than only 24 bits. Note that conversion of integer objects of 64 bits or less will
always proceed without error; 128-bit
values are the only ones that will get you into trouble.
If you mix a pair of different
sized real values in the same expression, HLA will promote (and convert) the
smaller real value to the same size as the larger real value.
The only non-numeric promotions
that take place in an expression are between characters and strings. If a
character and a string both appear in an expression, HLA will promote the
character to a string of length one[8].
16.9.2 Type Coercion in HLA
HLA will report a type mismatch
error if objects of incompatible types appear within an expression. Note that you may use the type-coercion
compile-time functions to convert between types that HLA does not automatically
support in an expression (see the discussion later in this document). You can also use the HLA type coercion
operator to attach a specific type to a constant expression. The type coercion
operator takes the following form:
typename(constexpr)
The typename component must be a valid, declared type
identifier (including any of the built-in types or appropriate user-defined
types). The constexpr component
can be any constant expression that is reasonably compatible with the specified
type. Reasonably compatible means that the types are the same size or one of
the primitive types. Examples:
int8( 'a' )
real32( constExpression+2)
boolean( int8Val )
One important thing to
remember is that type coercion is a bitwise operation. No conversion is done
when coercing one type to another using this type coercion operation.
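Because coercion only relabels bits, an out-of-range unsigned value reads back as a different signed number. The following is a hedged Python model of an integer-to-integer coercion such as int8(...), not HLA code. `coerce_bits` is a hypothetical helper. It assumes the target type keeps only its own width of bits and reads them as two's complement when signed.

```python
def coerce_bits(value, bits, signed):
    """Reinterpret the low `bits` bits of value; no numeric conversion."""
    value &= (1 << bits) - 1             # keep only the target width
    if signed and (value >> (bits - 1)):
        value -= 1 << bits               # read the pattern as two's complement
    return value

print(coerce_bits(ord("a"), 8, True))    # 'a' as int8
print(coerce_bits(200, 8, True))         # uns8 200 viewed as int8
```

A numeric conversion of 200 would be an error or 200; the bitwise view gives a negative number instead.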
16.9.3 !expr
This is the logical NOT operation. The expression must be either boolean or a number. For boolean values, not computes the standard logical not operation. Numerically, HLA inverts only the L.O.
bit of boolean values and clears the remaining bits of the boolean value. Therefore, the result is always zero or
one when NOTing a boolean value
(even if the boolean object errantly contained other set bits prior to the
"!" operation).
Remember, the "!" operator only looks at the L.O. bit; if the value was originally non-zero
but the L.O. bit was clear[9],
then "!" produces true.
This is not a zero/not-zero operation.
For numbers, not computes the bitwise not operation on the bits of
the number, that is, it inverts all the bits. The exact semantics of this operation depend upon the
original data type of the value you’re inverting. The result of applying the "!" operator
to an integer may not always be intuitive because HLA always maintains 128 bits of
precision, regardless of the underlying data type. Therefore, the following paragraphs explain this operator’s semantics
on a type-by-type basis.
uns8: Bits 8..127 of an Uns8 object are always zero. Therefore, when you apply the "!" operator to an
Uns8 value, the result can no longer be an Uns8 object since bits 8..127 will now contain
ones. Zeroing out the H.O. bits is
not wise, because you could be assigning the result of this expression to a
larger data type and you may very well expect those bits to be set. Therefore, HLA converts the type of
"!u8expr" to type byte (which does allow the H.O. bits to contain non-zero values). If you assign an object of type byte to a larger object (e.g., type word), HLA will copy over the H.O. bits from the byte object to the larger object. Example:
val
u8 :uns8 := 1;
b8 := !u8;        // produces $FFF..FFFE but registers as byte $FE.
w16 :word := b8;  // produces $FF..FFFE but registers as word $FFFE.
Note: If you really want to chop the value off at eight bits, you
can use the compile-time byte function to achieve this, e.g.,
val
u8 :uns8 := 1;
b8 := byte(!u8);  // produces $FE.
w16 :word := b8;  // produces $00FE.
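The two HLA fragments above differ only in whether the 128-bit result is chopped. A minimal Python model (not HLA code) makes the arithmetic visible. It assumes "!" is a plain bitwise NOT over 128 bits and that byte()/word() keep only the low 8/16 bits. `hla_not` and `chop` are hypothetical names.

```python
MASK128 = (1 << 128) - 1

def hla_not(value):
    """Bitwise NOT over HLA's 128-bit internal representation."""
    return ~value & MASK128

def chop(value, bits):
    """Model of byte()/word(): keep only the low `bits` bits."""
    return value & ((1 << bits) - 1)

b8 = hla_not(1)           # $FF..FFFE: every bit set except bit 0
w16 = chop(b8, 16)        # the word view of that value: $FFFE
chopped = chop(b8, 8)     # byte(!u8): $FE
print(hex(w16), hex(chopped), hex(chop(chopped, 16)))
```

Chopping first is what makes the later word assignment yield $00FE instead of $FFFE.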
uns16: The semantics are similar to uns8 except, of course, applying "!" to an uns16 value produces a word value rather than a byte value.
Again, the "!" operator will set bits 16..127 to one in the
final result. If you want to
ensure that the final result contains no set bits beyond bit #15, use the
compile-time word
function to strip the value down to 16 bits (just like the byte function in the example above).
uns32: The semantics are similar to uns8 except, of course, applying "!" to an uns32 value produces a dword value rather than a byte value.
Again, the "!" operator will set bits 32..127 to one in the
final result. If you want to
ensure that the final result contains no set bits beyond bit #31, use the
compile-time dword
function to strip the value down to 32 bits (just like the byte function in the example above).
uns64: The semantics are similar to uns8 except, of course, applying "!" to an uns64 value produces a qword value rather than a byte value.
Again, the "!" operator will set bits 64..127 to one in the
final result. If you want to
ensure that the final result contains no set bits beyond bit #63, use the
compile-time qword
function to strip the value down to 64 bits (just like the byte function in the example above).
uns128: Applying the "!" operator to an uns128 object simply inverts all the bits. There are no funny semantics here.
Resulting expression type is set to lword.
int8: Same semantics as
byte (see explanation
below).
int16: Same semantics as word (see explanation below).
int32: Same semantics as dword (see explanation below).
int64: Same semantics as qword (see explanation below).
int128: Applying the "!" operator
to an int128 object simply
inverts all the bits. There are no
funny semantics here. Resulting expression type is set to lword.
byte: Bits 8..127 of a byte (int8) value must all be zeros or all ones. The "!" operator enforces this. If any of the H.O. bits are non-zero,
the "!" operator sets them all to zero in the result; if all of the H.O. bits are zero, the
"!" operator sets the H.O. bits to ones in the result. Of course, this operator inverts bits
0..7 in the original value and returns this inverted result. Note that the type of the new value is
always byte (even if the
original subexpression was int8).
word: Bits 16..127 of a word (int16) value must all be zeros or all ones. The "!" operator enforces this. If any of the H.O. bits are non-zero,
the "!" operator sets them all to zero in the result; if all of the H.O. bits are zero, the
"!" operator sets the H.O. bits to ones in the result. Of course, this operator inverts bits
0..15 in the original value and returns this inverted result. Note that the type of the new value is
always word (even if the
original subexpression was int16).
dword: Bits 32..127 of a dword (int32) value must all be zeros or all ones. The "!" operator enforces this. If any of the H.O. bits are non-zero,
the "!" operator sets them all to zero in the result; if all of the H.O. bits are zero, the
"!" operator sets the H.O. bits to ones in the result. Of course, this operator inverts bits
0..31 in the original value and returns this inverted result. Note that the type of the new value is
always dword (even if the
original subexpression was int32).
qword: Bits 64..127 of a qword (int64) value must all be zeros or all ones. The "!" operator enforces this. If any of the H.O. bits are non-zero,
the "!" operator sets them all to zero in the result; if all of the H.O. bits are zero, the
"!" operator sets the H.O. bits to ones in the result. Of course, this operator inverts bits
0..63 in the original value and returns this inverted result. Note that the type of the new value is
always qword (even if the
original subexpression was int64).
lword: Applying the "!"
operator to an lword
object simply inverts all the bits.
There are no funny semantics here.
No other types are legal with
the "!" operator. HLA
will report a type conflict error if you attempt to apply this operator to some
other type.
If the operand is one of the
integer types (signed, unsigned, hexadecimal), then HLA will set the type of
the result to the smallest type within that class (signed, unsigned, or
hexadecimal) that can hold the result (not including sign extension bits for
negative numbers or zero extension bits for non-negative values).
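The byte/word/dword/qword rule (normalize the H.O. bits, invert the low bits) can be modeled in a few lines. This Python sketch is not HLA code. `hex_not` is a hypothetical name, and `bits` is assumed to be 8, 16, 32, or 64.

```python
MASK128 = (1 << 128) - 1

def hex_not(value, bits):
    """Model of "!" on byte/word/dword/qword values (bits = 8/16/32/64)."""
    low_mask = (1 << bits) - 1
    high = value & MASK128 & ~low_mask   # bits `bits`..127
    low = ~value & low_mask              # invert bits 0..bits-1
    # Any set H.O. bit -> all H.O. bits cleared; all clear -> all set.
    new_high = 0 if high else (MASK128 & ~low_mask)
    return new_high | low

print(hex(hex_not(0x01, 8)))             # ones in bits 8..127, then $FE
```

Applying `hex_not` twice returns the original byte, which is the point of forcing the H.O. bits to all zeros or all ones.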
16.9.4 - expr (unary negation operator)
The expression must either be a numeric value or a
character set. For numeric values,
“-” negates the value. For
character sets, the “-” operator computes the complement of the character set
(that is, it returns all the characters not found in the set).
Again, the exact semantics
depend upon the type of the expression you’re negating. The following paragraphs explain
exactly what this operator does to its expression. For all integer values (unsXX, intXX, byte, word, dword,
qword, and lword), the negation operator always does a full 128-bit negation of
the supplied operand. The
difference between these different data types is how HLA sets the resulting
type of the expressions (as explained in the paragraphs below).
uns8: If the original value was in
the range 128..255, then the resulting type is int16, otherwise the resulting type is int8.
Since uns8
values are always positive, the negated result is always negative, hence the
result type is always a signed integer type.
uns16: If the original value was in the
range 32768..65535, then the resulting type is int32, otherwise the resulting type is int16.
Since uns16
values are always positive, the negated result is always negative, hence the
result type is always a signed integer type.
uns32: If the original value was in the
range $8000_0000..$FFFF_FFFF, then the resulting type is int64, otherwise the resulting type is int32.
Since uns32
values are always positive, the negated result is always negative, hence the
result type is always a signed integer type.
uns64: If the original value was in the
range $8000_0000_0000_0000..$FFFF_FFFF_FFFF_FFFF, then the resulting type is int128, otherwise the resulting type is int64.
Since uns64
values are always positive, the negated result is always negative, hence the
result type is always a signed integer type.
uns128: The result type is always set to int128. Note
that there is no check for overflow.
Effectively, HLA treats uns128 operands as though they were int128 operands with respect to negation. So really large positive uns128 values (those with the H.O. bit set) are read as negative int128 values and become relatively small positive values after the
negation. If you need to mix and match 128-bit values in an expression, you
should attempt to limit your unsigned values to 127 bits.
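The per-type rules for negating unsigned values follow one pattern: double the width when the H.O. bit was set, except for uns128. The following is a hedged Python model of those paragraphs (not HLA code). `negate_unsigned` is hypothetical, and it returns the negated value together with the type name the paragraphs above describe.

```python
def negate_unsigned(value, bits):
    """Model of "-" on an unsNN value; returns (result, HLA type name)."""
    if bits == 128:
        # No overflow check: the bits are read as int128, then negated.
        as_int128 = value - (1 << 128) if value >> 127 else value
        return -as_int128, "int128"
    # H.O. bit set -> the next larger signed type; otherwise same width.
    result_bits = 2 * bits if value >= 1 << (bits - 1) else bits
    return -value, f"int{result_bits}"

print(negate_unsigned(200, 8))
print(negate_unsigned((1 << 128) - 1, 128))   # the uns128 surprise
```

The last call shows the uns128 case: the largest unsigned value negates to 1.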
byte, int8,
word, int16,
dword, int32,
qword, int64,
lword,
int128: Negates the expression (full 128
bits) and assigns the original expression type to the result.
real32: Negates the real32 value and returns a real32 result.
real64: Negates the real64 value and returns a real64 result.
real80: Negates the real80 value and returns a real80 result.
cset: Computes the set
complement (returns cset
type). The set complement is all
the items that were not
previously in the set. Since HLA
uses a bitmap representation for character sets, the complement of a character
set is the same thing as inverting all the bits in the bitmap.
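A minimal Python model of the bitmap view of a character set (not HLA code) is shown below. It assumes one bit per character code 0..127 in a 16-byte (128-bit) set, as the text describes. `cset`, `complement`, and `members` are hypothetical helpers, and `cset` expects 7-bit ASCII characters.

```python
CSET_MASK = (1 << 128) - 1   # 16 bytes: one bit per character code 0..127

def cset(chars):
    """Build the bitmap: bit n is set when chr(n) is in the set."""
    bits = 0
    for ch in chars:
        bits |= 1 << ord(ch)
    return bits

def complement(s):
    """Unary "-" on a cset: invert every bit of the bitmap."""
    return ~s & CSET_MASK

def members(s):
    return "".join(chr(n) for n in range(128) if (s >> n) & 1)

print(len(members(complement(cset("abc")))))   # 128 - 3 characters remain
```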
If the operand is one of the
integer types (signed, unsigned, hexadecimal), then HLA will set the type of
the result to the smallest type within that class (signed, unsigned, or
hexadecimal) that can hold the result (not including sign extension bits for
negative numbers or zero extension bits for non-negative values).
16.9.5 expr1 * expr2
For numeric operands, the “*” operator produces their
product. For character set operands, the “*” operator
produces the intersection of the two sets. The exact result depends upon the types of the two operands
to the "*" operator. To
begin with, HLA attempts to make the types of the two operands identical if
they are not already identical.
HLA achieves this via type promotion (see the discussion earlier).
If the operands are unsigned or
hexadecimal operands, HLA will compute their unsigned product. If the operands are signed, HLA computes their signed product. If the operands are real, HLA computes
their real product. If the
operands are integer (signed or unsigned) and less than (or equal to) 64 bits,
HLA computes their exact result.
If the operands are greater than 64 bits and their product would require
more than 128 bits, HLA quietly overflows without error. Note that HLA always performs a 128-bit
multiplication, regardless of the operands’ sizes; however, objects that require 64 bits or less of precision
will always produce a product that is 128 bits or less. HLA automatically extends the size of
the result to the next greater size if the product will not fit into an integer
that is the same size as the operands.
HLA will actually choose the smallest possible size for the product
(e.g., if the result only requires 16 bits of precision, the resulting type
will be uns16, int16, or word). The
resulting type is always unsigned if the operands were unsigned, signed if the
operands were signed, and hexadecimal if the operands were hexadecimal.
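The sizing rule for integer products can be modeled as "compute the full product, drop anything above bit 127, then pick the smallest type in the operands' class". This Python sketch is illustrative only, not HLA code. The helper names are hypothetical, and only the unsigned and signed classes are shown.

```python
MASK128 = (1 << 128) - 1
SIZES = (8, 16, 32, 64, 128)

def smallest_unsigned(value):
    """Smallest unsNN type that can hold a non-negative value."""
    for bits in SIZES:
        if value < 1 << bits:
            return f"uns{bits}"
    raise OverflowError("needs more than 128 bits")

def smallest_signed(value):
    """Smallest intNN type that can hold a signed value."""
    for bits in SIZES:
        if -(1 << (bits - 1)) <= value < 1 << (bits - 1):
            return f"int{bits}"
    raise OverflowError("needs more than 128 bits")

def multiply_unsigned(a, b):
    product = (a * b) & MASK128          # quiet overflow past bit 127
    return product, smallest_unsigned(product)

print(multiply_unsigned(200, 200))       # two uns8 operands, uns16 product
```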
If the operands are real
operands, HLA computes their product and always produces a real80 result. If you want to produce a smaller result
via the ’*’ operator, use the real32 or real64
compile-time function to produce the smaller result, e.g., "real32( r32const * r32const2
)". Note that all real arithmetic inside
HLA is always performed using the FPU, hence the results are always real80.
Other than trying to simulate the actual products a running program
would produce, there is no real reason to coerce the product to a smaller
value.
If the operands are character
set operands, the ’*’ operator computes the intersection of the two sets. Since HLA uses a bitmap representation
for character sets, this operator does a bitwise logical AND of the two 16-byte
operands (this operation is roughly equivalent to applying the
"&" operator to two lword operands).
If the operand is one of the
integer types (signed, unsigned, hexadecimal), then HLA will set the type of
the result to the smallest type within that class (signed, unsigned, or
hexadecimal) that can hold the result (not including sign extension bits for
negative numbers or zero extension bits for non-negative values).
16.9.6 expr1 div expr2
The two expressions must be integer (signed,
unsigned, or hexadecimal) numbers.
Supplying any other data type as an operand will produce an error. The div operator divides the first expression by the
second and produces the truncated quotient result.
If the operands are unsigned,
HLA will do a full 128/128 bit division and the resulting type will be unsigned
(HLA sets the type to the smallest unsigned type that will completely hold the
result). If the operands are
signed, HLA will do a full 128/128 bit signed division and the resulting type
will be the smallest intXX type
that can hold the result. If the
operands are hexadecimal values, HLA will do a full 128/128 bit unsigned
division and set the resulting type to the smallest hex type that can hold the
result.
Note that the div operator does not allow real operands. Use the "/" operator for real
division.
HLA will set the type of the
result to the smallest type within its class (signed, unsigned, or hexadecimal)
that can hold the result (not including sign extension bits for negative
numbers or zero extension bits for non-negative values).
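"Truncated quotient" matters for signed operands, because Python's own `//` rounds toward negative infinity instead. The sketch below is a hedged Python model (not HLA code). It assumes truncation means rounding toward zero, as the x86 IDIV instruction does. `hla_div` is a hypothetical name.

```python
def hla_div(a, b):
    """Model of signed `div`: quotient truncated toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q

print(hla_div(-7, 2), -7 // 2)   # truncation vs. Python's floor division
```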
16.9.7 expr1 mod expr2
The two expressions must be integer (signed,
unsigned, or hexadecimal) numbers.
The mod
operator divides the first expression by the second and produces their
remainder (this value is always non-negative).
If the operands are unsigned,
HLA will do a full 128/128 bit division and return the remainder. The resulting
type will be unsigned (HLA sets the type to the smallest unsigned type that
will completely hold the result).
If the operands are signed, HLA
will do a full 128/128 bit signed division and return the remainder. The resulting type will be the smallest
intXX type that can hold
the result.
If the operands are hexadecimal
values, HLA will do a full 128/128 bit unsigned division and set the resulting
type to the smallest hex type that can hold the result.
Note that the mod operator does not allow real operands. You’ll have to define real modulus and
write the expression yourself if you need the remainder from a real division.
HLA will set the type of the
result to the smallest type within its class (signed, unsigned, or hexadecimal)
that can hold the result (not including sign extension bits for negative
numbers or zero extension bits for non-negative values).
16.9.8 expr1 / expr2
The two expressions must be numeric. The ’/’ operator divides the first expression by the
second and produces their (real80) quotient result.
If the operands are integers
(unsigned, signed, or hexadecimal) or the operands are real32 or real64, HLA first converts them to real80 before doing the division operation. The expression result is always real80.
16.9.9 expr1 << expr2
The two expressions must be integer (signed, unsigned,
or hexadecimal) numbers. The
second operand must be a small (32-bit or less) non-negative value in the range
0..128. The << operator shifts the first expression to the left
the number of bits specified by the second expression. Note that the result may require more
bits to hold than the original type of the left operand. Any bits shifted out of bit position
127 are lost.
HLA will set the type of the result to the smallest type
within the left operand’s class (signed, unsigned, or hexadecimal) that can
hold the result (not including sign extension bits for negative numbers or zero
extension bits for non-negative values). Note that the ’<<’ operator can yield a smaller
type (specifically, an eight-bit type) if it shifts all the bits off the H.O.
end of the number; generally,
though, this operation produces larger result types than the left operand.
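The left shift can be modeled as an ordinary shift followed by discarding everything above bit 127. This Python sketch is not HLA code. `shl` is a hypothetical name, and the range check mirrors the 0..128 limit stated above.

```python
MASK128 = (1 << 128) - 1

def shl(value, count):
    """Model of "<<": shift left, dropping bits shifted past bit 127."""
    if not 0 <= count <= 128:
        raise ValueError("shift count must be in 0..128")
    return (value << count) & MASK128

print(hex(shl(0xFF, 124)))   # only four of the eight bits survive
```

A shift that pushes every set bit past bit 127 leaves zero, which is how "<<" can end up with an eight-bit result type.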
16.9.10 expr1 >> expr2
The two expressions must be integer (signed, unsigned,
or hexadecimal) numbers. The
second operand must be a small (32-bit or less) non-negative value in the range
0..128. The >> operator shifts the first expression to the right
the number of bits specified by the second expression. Any bits shifted out of the L.O. bit
are lost. Note that this shift is
a logical shift right, not an arithmetic shift right (this is true even if the left operand
is an INTxx value). Therefore,
this operation always shifts a zero into bit position 127.
A right shift may produce a
smaller type than that of the left operand. HLA will always set the type of the result value to the
minimum type size that has the same base class as the left operand.
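Because ">>" is a logical shift over all 128 bits, a negative value does not stay negative. The following Python model (not HLA code) shows this. It assumes negative values are held as 128-bit two's-complement patterns. `shr` is a hypothetical name.

```python
MASK128 = (1 << 128) - 1

def shr(value, count):
    """Model of ">>": logical shift of the 128-bit pattern (zero fill)."""
    if not 0 <= count <= 128:
        raise ValueError("shift count must be in 0..128")
    return (value & MASK128) >> count

print(hex(shr(-2, 1)))   # a zero enters bit 127, so the result is positive
```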
16.9.11 expr1 + expr2
If the two expressions are numeric, the “+” operator
produces their sum.
If the two expressions are strings or characters,
the “+” operator produces a new string by concatenating the right expression to
the end of the left expression.
If the two operands are character sets, the “+”
operator produces their union.
If the operands are integer
values (signed, unsigned, or hexadecimal), then HLA adds them together. Any overflow out of bit #127 (unsigned
or hexadecimal) or bit #126 (signed) is quietly lost. HLA sets the type of the result to the smallest type size
that will hold the sum; the type
class (signed, unsigned, hexadecimal) will be the same as the operands. Note that it is possible for the type
size to grow or shrink depending on the values of the operands (e.g., adding a
positive and negative number could reduce the type size, adding two positive or
two negative numbers may expand the result type’s size).
When adding two real values (or
a real and an integer value), HLA always produces a real80 result.
Since HLA uses a bitmap to
represent character sets, taking the union of two character sets is the same as
doing a bitwise logical OR of all 16 bytes in the character set.
16.9.12 expr1 - expr2
If the two expressions are numeric, the “-” operator
produces their difference.
If the two expressions are
character sets, the “-” operator produces their set difference (that is, all
the characters in expr1 that
are not also in expr2).
If the operands are integer
values (signed, unsigned, or hexadecimal), then HLA subtracts the right operand
from the left operand. Any
overflow out of bit #127 (unsigned or hexadecimal) or bit #126 (signed) is
quietly lost. HLA sets the type of
the result to the smallest type size that will hold their difference; the type class (signed, unsigned,
hexadecimal) will be the same as the operands. Note that it is possible for the type size to grow or shrink
depending on the values of the operands (e.g., subtracting two negative or
non-negative numbers could reduce the type size, subtracting a negative value
from a non-negative value may expand the result type’s size).
When subtracting two real
values (or a real and an integer value), HLA always produces a real80 result.
Since HLA uses a bitmap to
represent character sets, taking the set difference of two character sets is the same as
doing a bitwise logical AND of the left operand with the inverse of the right
operand.
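The three binary character-set operators reduce to single bitwise operations on the 128-bit bitmap. This Python sketch is a model, not HLA code. `cset` and the operator helpers are hypothetical names, and `cset` assumes 7-bit ASCII input.

```python
def cset(chars):
    """Bitmap with bit n set for each character code n (0..127)."""
    bits = 0
    for ch in chars:
        bits |= 1 << ord(ch)
    return bits

def union(a, b):          # "+" : bitwise OR
    return a | b

def intersection(a, b):   # "*" : bitwise AND
    return a & b

def difference(a, b):     # "-" : left AND (NOT right)
    return a & ~b

print(difference(cset("abc"), cset("b")) == cset("ac"))
```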
16.9.13 Comparisons
(=, ==, <>, !=, <, <=, >, and >=)
expr1 = expr2
expr1 == expr2
expr1 <> expr2
expr1 != expr2
expr1 < expr2
expr1 <= expr2
expr1 > expr2
expr1 >= expr2
(note: “!=” and “<>” operators are
identical. “=” and “==” operators
are identical.)
The two expressions must be
compatible (described earlier).
These operators compare the two operands and return true or false
depending upon the result of the comparison.
You may use the "="
and "<>" operators to compare two pointer constants (e.g.,
"&abc" or "&ptrVar"). The other operators do not allow pointer constant operands.
All the above operators allow
you to compare boolean values, enumerated values (types must match), integer (signed, unsigned, hexadecimal)
values, character values, string values, real values, and character set values.
When comparing boolean values,
note that false < true.
One character set is less than
another if it is a proper subset of the other. A character set is less than or equal to another set if it
is a subset of that second set. Likewise,
one character set is greater than, or greater than or equal to, another set if
it is a proper superset, or a superset, respectively.
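The subset-based ordering can be sketched in Python using the same 128-bit bitmap view of a character set. This is a model of the rules just stated, not HLA code. The cset, cset_le, cset_lt, cset_ge, and cset_gt helpers are hypothetical names.

```python
def cset(chars: str) -> int:
    """Build a 128-bit membership bitmap (bit n set for character code n)."""
    bits = 0
    for ch in chars:
        bits |= 1 << ord(ch)
    return bits


def cset_le(a: int, b: int) -> bool:  # a <= b: a is a subset of b
    return (a & ~b) == 0


def cset_lt(a: int, b: int) -> bool:  # a < b: a is a proper subset of b
    return cset_le(a, b) and a != b


def cset_ge(a: int, b: int) -> bool:  # a >= b: a is a superset of b
    return cset_le(b, a)


def cset_gt(a: int, b: int) -> bool:  # a > b: a is a proper superset of b
    return cset_lt(b, a)


x, y = cset("ab"), cset("abc")
print(cset_lt(x, y), cset_lt(y, y), cset_le(y, y))  # prints True False True
a, b = cset("a"), cset("b")
print(cset_lt(a, b), cset_gt(a, b), a == b)         # prints False False False
```

Under these definitions, subset ordering is only a partial order. Two sets such as {'a'} and {'b'} are unequal, yet neither is less than nor greater than the other.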
As with any programming
language, you should take care when comparing two real values (especially for
equality or inequality) as minor precision drifts can cause the comparison to
fail.
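The drift is easy to demonstrate with Python's 64-bit floats. HLA works with real80 values, so the exact cases that drift will differ, but the hazard is the same. The relative tolerance of 1e-9 below is an arbitrary illustrative choice, not an HLA setting.

```python
import math

total = 0.1 + 0.2              # binary rounding: not exactly 0.3
print(total == 0.3)            # prints False
# Compare within a tolerance instead of testing for exact equality.
print(math.isclose(total, 0.3, rel_tol=1e-9))  # prints True
```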
16.9.14
expr1 & expr2
(note: "&&"
and "&" mean different things to HLA. See the section on high level language control structures
for details on the "&&" operator.)
The operands must both be boolean or they must
both be numbers. With boolean
operands, the AND operator produces the logical AND of the two operands (boolean
result). With numeric operands,
the AND operator produces the bitwise logical AND of the operands.
If the operands are one of the
integer types (signed, unsigned, hexadecimal), then HLA will set the type of
the result to the smallest type within that class (signed, unsigned, or
hexadecimal) that can hold the result.
16.9.15
expr1 in expr2
The first expression must be a
character value. The second
expression must be a character set.
The in
operator returns true if the
character is a member of the specified character set; it returns false otherwise.
16.9.16
expr1 | expr2
(note: "||" and "|" mean
different things to HLA. See the
section on high level language control structures for details on the
"||" operator.)
The operands must both be
boolean or they must both be numbers.
With boolean operands, the OR operator produces the logical OR of the two
operands (boolean result). With
numeric operands, the OR operator produces the bitwise OR of the operands.
If the operands are one of the
integer types (signed, unsigned, hexadecimal), then HLA will set the type of
the result to the smallest type within that class (signed, unsigned, or
hexadecimal) that can hold the result.
16.9.17
expr1 ^ expr2
The operands must both be boolean or they must
both be numbers. With boolean
operands, the XOR operator
produces the logical exclusive-or of the two operands (boolean result). With numeric operands, the XOR operator produces the bitwise exclusive-or of the
operands.
If the operands are one of the
integer types (signed, unsigned, hexadecimal), then HLA will set the type of
the result to the smallest type within that class (signed, unsigned, or
hexadecimal) that can hold the result.
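The three operators (&, |, and ^) share the same two-way behavior: logical on booleans, bitwise on numbers, with the numeric result resized to the smallest type that holds it. The Python sketch below models that behavior for the unsigned and hexadecimal class only. It is not HLA code, and logic_op and smallest_unsigned_size are hypothetical helpers.

```python
import operator

SIZES_IN_BITS = (8, 16, 32, 64, 128)


def smallest_unsigned_size(value: int) -> int:
    """Bit size of the smallest unsigned/hex type that holds `value`."""
    for bits in SIZES_IN_BITS:
        if 0 <= value < (1 << bits):
            return bits
    raise ValueError("value does not fit in 128 bits")


def logic_op(op, left, right):
    """Apply operator.and_, operator.or_, or operator.xor: logical on two
    booleans, bitwise on two non-negative integers."""
    if isinstance(left, bool) != isinstance(right, bool):
        raise TypeError("operands must both be boolean or both be numbers")
    return op(left, right)


print(logic_op(operator.and_, True, False))                             # prints False
print(hex(logic_op(operator.and_, 0xFF00, 0x0FF0)))                     # prints 0xf00
print(smallest_unsigned_size(logic_op(operator.and_, 0xFF00, 0x00FF)))  # prints 8
print(smallest_unsigned_size(logic_op(operator.or_, 0x0F, 0xF00)))      # prints 16
```

Two 16-bit operands can produce an 8-bit result (the AND yields zero), and two operands that each fit in 16 bits can still need 16 bits after an OR.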
16.9.18
( expr )
You may override the precedence of any operator(s) using
parentheses in HLA constant expressions.
16.9.19
[ comma_separated_list_of_expressions ]
This produces an array
expression. The type of the
expression is an array type whose base element is the type of one of the
expressions in the list. If there
are two or more constant types in the array expression, HLA promotes the type
of the array expression following the rules for mixed-mode arithmetic (see the
rules earlier in this document).
16.9.20
record_type_name : [ comma_separated_list_of_field_expressions ]
This produces a record
expression. The expressions
appearing within the brackets must match the respective fields of the specified
record type. See the discussion
earlier in this chapter.
16.9.21
identifier
An identifier is a legal
component of a constant expression if the identifier’s classification is CONST
or VAL (that is, the identifier was declared in a constant or value section of
the program). The expression
evaluator substitutes the current declared value and type of the symbol within
the expression. Constant
expressions allow the following types:
Boolean, enumerated types, Uns8, Uns16, Uns32, Uns64, Uns128, Byte, Word, DWord, QWord,
LWord, Int8, Int16, Int32, Int64, Int128, Char, Real32, Real64, Real80, String, and Cset.
You may also specify arrays
whose element base type is one of the above types (or a record or union subject
to the following restriction).
Likewise, you can specify record or union constants if all of their
respective fields are one of the above primitive types or a value array,
record, or union constant.
HLA allows array, record, and
union constants. If you specify
the name of an array, for example, HLA works with all the values of that
array. Likewise, HLA can copy all
the values of a record or union with a single statement.
HLA allows literal Unicode
character and string constants (e.g., u’a’ and u"unicode") or
identifiers that are of wchar or wstring type in an expression, but no other terms are
allowed in such an expression (as this is being written).
16.9.22
identifier1.identifier2 {...}
Selects a field from a record or union constant. Identifier1 must be a record or union object defined in a
const or val section. Identifier2 (and any following dot-identifiers) must be a
field of the record or union. HLA
replaces this object with the value of the specified field.
Examples:
recval.fieldval
recval.subrecval.fieldval
Don’t forget that with union
constants, you may only access the last field into which you’ve actually stored
data (see the section on union constants for more details).
16.9.23
identifier [ index_list ]
Identifier must be an array constant defined in either a const
or val section. Index_list is a list of constant expressions separated by
commas. The index list selects a
specified element of the “identifier” array. HLA reports an error if you supply more indices than the
array has dimensions. HLA returns
an array slice if you specify fewer indices than the array has dimensions (for
example, if an array is declared as “a:uns8[4,4]” and you specify “a[2]” in a
constant expression, HLA returns the third row of the array (a[2,0]..a[2,3]) as
the value of this term).
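The slicing rule can be sketched in Python for an array constant stored as a flat, row-major list. This is a model, not HLA code. Row-major storage is an assumption of the sketch, and index_constant is a hypothetical helper.

```python
from math import prod


def index_constant(flat, dims, indices):
    """Index a row-major array constant stored in `flat` with shape `dims`.
    Full indexing returns one element; fewer indices return a slice."""
    if len(indices) > len(dims):
        raise IndexError("more indices than the array has dimensions")
    offset, stride = 0, prod(dims)
    for index, size in zip(indices, dims):
        if not 0 <= index < size:
            raise IndexError("index out of range")
        stride //= size          # elements spanned by one step of this index
        offset += index * stride
    if len(indices) == len(dims):
        return flat[offset]
    return flat[offset:offset + stride]


a = list(range(16))                       # stands in for a:uns8[4,4]
print(index_constant(a, [4, 4], [2]))     # prints [8, 9, 10, 11]
print(index_constant(a, [4, 4], [2, 3]))  # prints 11
```

The one-index call returns the third row (a[2,0]..a[2,3]), matching the example in the text.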
Examples:
arrayval[0]
aval[1,4,0]
16.10 Program Structure
An HLA program uses the following general syntax:
program identifier ;
declarations
begin identifier;
statements
end identifier;
The three identifiers above
must all match. The declaration
section (declarations) consists of label, type, const, val, var, static,
uninitialized, readonly, segment, procedure, and macro definitions (all
described later). Any number of
these sections may appear and they may appear in any order; more than one of each section may
appear in the declaration section.
Example:
program TestPgm;
type
integer: int16;
const
i0 : integer := 0;
var
i:integer;
begin TestPgm;
mov( i0, i );
end TestPgm;
If you wish to write a library
module that contains only procedures and no main program, you would use an HLA
unit. Units have a syntax that is
nearly identical to programs; there just isn’t a begin associated with the
unit, e.g.,
unit TestPgm;
    procedure LibraryRoutine;
    begin LibraryRoutine;
        << etc. >>
    end LibraryRoutine;
end TestPgm;
16.10.1 Statement Labels
A statement label is an
identifier that appears within a code section (i.e., the body of your main
program or a procedure or iterator) followed by a colon. The identifier is
given the type “label” and the value associated with the label is the current
value of the location counter.
Statement labels can be used as the target of a JMP or CALL instruction.
You may also take their address with the “&” (address-of) operator.
Within an operand field of a
jump or conditional jump instruction, you may also use the “@here” reserved
word to denote the current location counter value (by current, this means the
address of the start of the current instruction). For example,
jmp @here;
creates an infinite loop, and
call @here[5];
transfers control to the
subroutine immediately beyond the call instruction (which is five bytes long).
16.10.2 Procedure Declarations
Procedure declarations are
nearly identical to program declarations with two major differences: procedures
are declared using the “procedure” reserved word and procedures may have
parameters. The general syntax is:
procedure identifier ( optional_parameter_list ); procedure_options
declarations
begin identifier;
statements
end identifier;
Note that you may declare
procedures inside other procedures in a fashion analogous to most
block-structured languages (e.g., Pascal).
The optional parameter list
consists of a list of var-type declarations taking the form:
optional_access_keyword identifier1 : identifier2 optional_in_reg
optional_access_keyword, if present, must be val, var, valres, result, name, or lazy,
and defines the parameter passing mechanism (pass by value, pass by reference,
pass by value/result [or value/returned], pass by result, pass by name, or pass
by lazy evaluation, respectively).
The default is pass by value (val) if an access keyword is not present. For pass by value parameters, HLA allocates the specified
number of bytes according to the size of that object in the activation
record. For pass by reference,
pass by value/result, and pass by result, HLA allocates four bytes to hold a
pointer to the object. For pass by
name and pass by lazy evaluation, HLA allocates eight bytes to hold a pointer
to the associated thunk and a pointer to the thunk’s execution environment (see
the sections on parameters and thunks for more details).
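A small Python tally of the activation-record space each mechanism needs, following the sizes just listed. It is a sketch, not HLA code. param_bytes is a hypothetical helper, and the sketch does not model any padding HLA may add for value parameters smaller than a dword.

```python
# Bytes each parameter occupies in the activation record, per the rules
# above. Assumption: no padding is modeled for small value parameters.
POINTER_BYTES = 4   # var, valres, result: one 32-bit pointer
THUNK_BYTES = 8     # name, lazy: thunk pointer + environment pointer


def param_bytes(mechanism: str, object_size: int) -> int:
    if mechanism == "val":
        return object_size
    if mechanism in ("var", "valres", "result"):
        return POINTER_BYTES
    if mechanism in ("name", "lazy"):
        return THUNK_BYTES
    raise ValueError(f"unknown passing mechanism: {mechanism}")


# A hypothetical procedure q( a:int32; var b:int32; lazy c:int32 ):
params = [("val", 4), ("var", 4), ("lazy", 4)]
print(sum(param_bytes(m, size) for m, size in params))  # prints 16
```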
The optional_in_reg clause, if present, corresponds to the phrase
"in reg" where reg is one of the 80x86’s general purpose 8-, 16-, or
32-bit registers. You must take
care when passing parameters through the registers as the parameter names
become aliases for registers and this can create confusion when reading the
code later (especially if, within a procedure with a register parameter, you
call another procedure that uses that same register as a parameter).
HLA also allows a special
parameter of the form:
var identifier : var
This creates an untyped reference parameter. You may specify any
memory variable as the corresponding actual parameter and HLA will compute the
address of that object and pass it on to the procedure without further type
checking. Within the procedure,
the parameter is given the DWORD type.
The procedure_options component above is a list of keywords that specify
how HLA emits code for the procedure.
There are several different procedure options available: @noframe, @frame, @nodisplay, @display, @noalignstack, @alignstack, @pascal, @stdcall, @cdecl, @align( int_const ), @use reg32, @leave, @noleave, @enter, @noenter, and @returns( "text" ).
Option |
Description |
@noframe, @frame |
By default, HLA emits code at the beginning of the procedure to construct a stack frame. The @noframe option disables this action. The @frame option tells HLA to emit this code for a particular procedure if stack frame generation is off by default. HLA also uses these two special identifiers as compile-time variables to set the default frame generation for all procedures. Setting @frame to true (or @noframe to false) turns on frame generation by default; setting @frame to false (or @noframe to true) turns off frame generation. |
@nodisplay, @display |
By default, HLA emits code at the beginning of the procedure to construct a display within the frame. The @nodisplay option disables this action. The @display option tells HLA to emit code to generate a display for a particular procedure if display generation is off by default. Note that HLA does not emit code to construct the display if @noframe is in effect, though it will assume that the programmer will construct this display themselves. HLA also uses these two special identifiers as compile-time variables to set the default display generation for all procedures. Setting @display to true (or @nodisplay to false) turns on display generation by default; setting @display to false (or @nodisplay to true) turns off display generation. |
@noalignstack, @alignstack |
By default (assuming frame generation is active), HLA will emit an instruction to align ESP on a four-byte boundary after allocating local variables. Win32, Linux, and other 32-bit OSes require the stack to be dword-aligned (hence this option). If you know the stack will be dword-aligned, you can eliminate this extra instruction by specifying the @noalignstack option. Conversely, you can force the generation of this instruction by specifying the @alignstack procedure option. HLA also uses these two special identifiers as compile-time variables to set the default stack alignment for all procedures. Setting @alignstack to true (or @noalignstack to false) turns on stack alignment code generation by default; setting @alignstack to false (or @noalignstack to true) turns off stack alignment code generation. |
@pascal, @cdecl, @stdcall |
These options give you the ability to specify the parameter passing mechanism for the procedure. By default, HLA uses the @pascal calling sequence. This calling sequence pushes the parameters on the stack in left-to-right order (i.e., in the order they appear in the parameter list). The @cdecl procedure option tells HLA to pass the parameters from right to left, so that the first parameter appears at the lowest address in memory, and makes it the caller’s responsibility to remove the parameters from the stack. The @stdcall procedure option is a hybrid of the @pascal and @cdecl calling conventions. It pushes the parameters in right-to-left order (just like @cdecl), but @stdcall procedures automatically remove their parameter data from the stack (just like @pascal). Win32 API calls use the @stdcall calling convention. |
@align( int_constant ) |
The @align( int_const ) procedure option aligns the procedure on a 1, 2, 4, 8, or 16 byte boundary. Specify the boundary you desire as the parameter to this option. The default is @align(1), which is unaligned. HLA also uses this special identifier as a compile-time variable to set the default procedure alignment. Setting @align := 1 turns off procedure alignment, while supplying some other value (which must be a power of two) sets the default procedure alignment to the specified number of bytes. |
@use reg32 |
When passing parameters, HLA can sometimes generate better code if it has a 32-bit general purpose register for use as a scratchpad register. By default, HLA never modifies the value of a register behind your back; so it will often generate less than optimal code when passing certain parameters on the stack. By using the @use procedure option, you can specify one of the following 32-bit registers for use by HLA: eax, ebx, ecx, edx, esi, or edi. By providing one of these registers, HLA may be able to generate significantly better code when passing certain parameters. |
@returns( "text" ) |
This option specifies the compile-time return value whenever a function name appears as an instruction operand. For example, suppose you are writing a function that returns its result in EAX. You should probably specify a "returns" value of "EAX" so you can compose that procedure just like any other HLA machine instruction (see the example below and the section on machine instructions for more details). |
@leave, @noleave |
These two options control the code generation for the standard exit sequence. If you specify the @leave option, then HLA emits the x86 LEAVE instruction to clean up the activation record before the procedure returns. If you specify the @noleave option, then HLA emits the primitive instructions to achieve this, i.e., mov( ebp, esp ); pop( ebp );. The manual sequence is faster on some architectures, but the LEAVE instruction is always shorter. Note that @noleave occurs by default if you’ve specified @noframe. By default, HLA assumes @noleave, but you may change the default by using these special identifiers as compile-time variables to set the default LEAVE generation for all procedures. Setting @leave to true (or @noleave to false) turns on LEAVE generation by default; setting @leave to false (or @noleave to true) turns off the use of the LEAVE instruction. |
@enter, @noenter |
These two options control the code generation for a procedure’s standard entry sequence. If you specify the @enter option, then HLA emits the x86 ENTER instruction to create the activation record. If you specify the @noenter option, then HLA emits the primitive instructions to achieve this. The manual sequence is always faster, but the ENTER instruction is usually shorter. Note that @noenter occurs by default if you’ve specified @noframe. By default, HLA assumes @noenter, but you may change the default by using these special identifiers as compile-time variables to set the default ENTER generation for all procedures. Setting @enter to true (or @noenter to false) turns on ENTER generation by default; setting @enter to false (or @noenter to true) turns off the use of the ENTER instruction. |
The following example demonstrates how the @returns option works:
program returnsDemo;
#include( "stdio.hhf" );
    procedure eax0; @returns( "eax" );
    begin eax0;
        mov( 0, eax );
    end eax0;
begin returnsDemo;
    mov( eax0(), ebx );
    stdout.put( "ebx=", ebx, nl );
end returnsDemo;
To help those who insist on
constructing the activation record themselves, HLA declares two local constants
within each procedure: _vars_ and _parms_. The _vars_ symbol is an integer constant that specifies the
number of bytes of local variables declared in the procedure. This constant is useful when allocating storage for your
local variables. The _parms_ constant specifies the number of bytes of
parameters. You would normally
supply this constant as the parameter to a ret() instruction to automatically
clean up the procedure’s parameters when it returns.
If you do not specify @nodisplay, then HLA defines a run-time variable named _display_ that is an array of pointers to activation
records. For more details on the _display_ variable, see the section on lexical scope.
You can also declare @external procedures (procedures defined in other
HLA units or written in languages other than HLA) using the following syntaxes:
procedure externProc1 (optional parameters) ; @returns( "text" ); @external;
procedure externProc2 (optional parameters) ; @returns( "text" ); @external( "external_name" );
As with normal procedure
declarations, the parameter list and @returns clause are optional.
The first form is generally
used for HLA-written functions.
HLA will use the procedure’s name (externProc1 in this case) as the external
name.
The second form lets you refer
to the procedure by one name (externProc2 in this case) within your HLA program and by a different name
("external_name" in this example) in the externally generated
code. This second form has two
main uses: (1) If you choose an external procedure name that just happens to be
a back-end assembler reserved word, the program may compile correctly but fail
to assemble; changing the external
name to something else solves this problem. (2) When calling procedures written in external languages, you
may need to specify characters that are not legal in HLA identifiers. For example, Win32 API calls often use
names like "WriteFile@24" containing illegal (in HLA) identifier
symbols. The string operand to the
@external option lets you specify any name you choose. Of course, it is
your responsibility to ensure that you use identifiers that are
compatible with the linker and back-end assembler; HLA doesn’t check these
names.
By default, HLA does the following:
• Creates a display for every procedure.
• Emits code to construct the stack frame for each procedure.
• Emits code to align ESP on a four-byte boundary upon procedure entry.
• Assumes that it cannot modify any register values when passing (non-register) parameters.
• Leaves the first instruction of the procedure unaligned.
These options are the most general
and "safest" for beginning assembly language programmers. However, the code HLA generates for
this general case may not be as compact or as fast as is possible in a specific
case. For example, few procedures
will actually need a display data structure built upon procedure
activation. Therefore, the code
that HLA emits to build the display can reduce the efficiency of the
program. Advanced programmers, of
course, can use procedure options like "@nodisplay" to tell HLA to
skip the generation of this code.
However, if a program contains many procedures and none of them need a
display, continually adding the "@nodisplay" option can get really
old. Therefore, HLA allows you to
treat these directives as "pseudo-compile-time-variables" to control
the default code generation. E.g.,
? @display := true;        // Turns on default display generation.
? @display := false;       // Turns off default display generation.
? @nodisplay := true;      // Turns off default display generation.
? @nodisplay := false;     // Turns on default display generation.
? @frame := true;          // Turns on default frame generation.
? @frame := false;         // Turns off default frame generation.
? @noframe := true;        // Turns off default frame generation.
? @noframe := false;       // Turns on default frame generation.
? @alignstack := true;     // Turns on default stack alignment code generation.
? @alignstack := false;    // Turns off default stack alignment code generation.
? @noalignstack := true;   // Turns off default stack alignment code generation.
? @noalignstack := false;  // Turns on default stack alignment code generation.
? @enter := true;          // Turns on default ENTER code generation.
? @enter := false;         // Turns off default ENTER code generation.
? @noenter := true;        // Turns off default ENTER code generation.
? @noenter := false;       // Turns on default ENTER code generation.
? @leave := true;          // Turns on default LEAVE code generation.
? @leave := false;         // Turns off default LEAVE code generation.
? @noleave := true;        // Turns off default LEAVE code generation.
? @noleave := false;       // Turns on default LEAVE code generation.
? @align := 1;             // Turns off procedure alignment (align on byte boundary).
? @align := int_expr;      // Sets alignment; must be a power of two.
These directives may appear
anywhere in the source file. They
set the internal HLA default values and all procedure declarations following
one of these assignments (up to the next, corresponding assignment) use the
specified code generation option(s).
Note that you can override these defaults by using the corresponding
procedure options mentioned earlier.
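The power-of-two requirement on @align, and what aligning a procedure to n bytes means for its start address, can be sketched in Python. This models only the address arithmetic, not the code HLA emits. It assumes the start address is padded forward to the next multiple of n, and the names is_valid_alignment and align_up are hypothetical.

```python
def is_valid_alignment(n: int) -> bool:
    """True if n is a positive power of two (1, 2, 4, 8, 16, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def align_up(address: int, n: int) -> int:
    """Round `address` up to the next multiple of the power of two `n`."""
    if not is_valid_alignment(n):
        raise ValueError("alignment must be a power of two")
    return (address + n - 1) & ~(n - 1)


print(hex(align_up(0x401003, 16)))  # prints 0x401010
print(hex(align_up(0x401003, 1)))   # prints 0x401003 (@align := 1: no padding)
```

The n & (n - 1) test works because a power of two has exactly one bit set, so clearing its lowest set bit leaves zero.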
16.10.2.1 Disabling HLA’s Automatic Code Generation for Procedures
Before jumping in and describing
how to use the high level HLA features for procedures, the best place to start
is with a discussion of how to disable these features and write "plain old
fashioned" assembly language code.
This discussion is important because procedures are the one place where
HLA automatically generates a lot of code for you and many assembly language
programmers prefer to control their own destinies; they don’t want the compiler to generate any excess code for
them. So disabling HLA’s automatic
code generation capabilities is a good place to start.
By default, HLA automatically
emits code at the beginning of each procedure to do five things: (1) Preserve
the pointer to the previous activation record (EBP); (2) build a display in the
current activation record; (3) allocate storage for local variables; (4) load
EBP with the base address of the current activation record; (5) adjust the
stack pointer (downwards) so that it points at a dword-aligned address.
When you return from a
procedure, by default HLA will deallocate the local storage and return,
removing any parameters from the stack.
To understand the code that HLA
emits, consider the following simple procedure:
procedure p( j:int32 );
var
i:int32;
begin p;
end p;
Here is a dump of the symbol
table that HLA creates for procedure p:
p <0,proc>:Procedure type (ID=?1_p)
--------------------------------
_vars_     <1,cons>:uns32, (4 bytes) =4
i          <1,var >:int32, (4 bytes, ofs:-12)
_parms_    <1,cons>:uns32, (4 bytes) =4
_display_  <1,var >:dword, (8 bytes, ofs:-4)
j          <1,valp>:int32, (4 bytes, ofs:8)
p          <1,proc>:
------------------------------------
The important thing to note
here is that local variable "i" is at offset -12 and HLA automatically
created an eight-byte local variable named "_display_" which is at offset -4.
HLA emits the following code
for the procedure above (annotations in italics are not emitted by HLA, this
output is subject to changes in HLA code generation algorithms [actually, this
code example is quite old and newer versions of HLA do emit different code, but
it is similar enough for this discussion]):
?1_p proc near32
        push  ebp              ;Dynamic link (pointer to previous activation record)
        pushd [ebp-04]         ;Display for lex level 0
        lea   ebp,[esp+04]     ;Get frame ptr (point EBP at current activation record)
        pushd ebp              ;Ptr to this proc's A.R. (part of display construction)
        sub   esp, 4           ;Local storage.
        and   esp, 0fffffffch  ;dword-align stack
; Exit point for the procedure:
?x?1_p:
        mov   esp, ebp         ;Deallocate local variables.
        pop   ebp              ;Restore pointer to previous activation record.
        ret   4                ;Return, popping parameters from the stack.
?1_p endp
Building the display data
structure is not very common in standard assembly language programs. This is only necessary if you are using
nested procedures and those nested procedures need to access non-local
variables. Since this is a rare
situation, many programmers will immediately want to tell HLA to stop emitting
the code to generate the display.
This is easily accomplished by adding the "@nodisplay" procedure option to the procedure
declaration. Adding this option to
procedure p produces the following:
procedure p( j:int32 );
@nodisplay;
var
i:int32;
begin p;
end p;
Compiling this procedure produces the
following symbol table dump:
p <0,proc>:Procedure type (ID=?1_p)
--------------------------------
_vars_     <1,cons>:uns32, (4 bytes) =4
i          <1,var >:int32, (4 bytes, ofs:-4)
_parms_    <1,cons>:uns32, (4 bytes) =4
j          <1,valp>:int32, (4 bytes, ofs:8)
p          <1,proc>:
------------------------------------
Note that the _display_ variable is gone and the local variable i is now at offset -4. Here is the code that HLA emits for this new version of the
procedure:
?1_p proc near32
        push  ebp              ;Save ptr to previous activation record.
        mov   ebp, esp         ;Point EBP at current activation record.
        sub   esp, 4           ;Local storage.
        and   esp, 0fffffffch  ;Align stack on dword boundary.
; Exit point for the procedure:
?x?1_p:
        mov   esp, ebp         ;Deallocate local variables.
        pop   ebp              ;Restore pointer to previous activation record.
        ret   4                ;Return, and remove parameters from stack.
?1_p endp
As you can see, this code is
smaller and a bit less complex. Unlike the code that built the display, it is
fairly common for an assembly language programmer to construct an activation
record in a manner similar to this.
Indeed, about the only instruction out of the ordinary above is the
"AND" instruction that dword-aligns the stack (OS calls require the
stack to be dword-aligned, and the system performance is much better if the
stack is dword aligned).
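A single mov esp, ebp in the exit code undoes both the sub and the alignment and. To see why, here is a toy Python simulation of a caller pushing j and calling p, followed by the @nodisplay entry and exit sequence listed above. It is a model with arbitrary made-up addresses (the starting ESP, EBP, and return address), not HLA output. The Stack class and call_p function are hypothetical.

```python
MASK32 = 0xFFFFFFFF


class Stack:
    """A toy 32-bit stack: ESP, EBP, and a dict standing in for memory."""

    def __init__(self, esp: int, ebp: int):
        self.esp, self.ebp, self.mem = esp, ebp, {}

    def push(self, value: int) -> None:
        self.esp = (self.esp - 4) & MASK32
        self.mem[self.esp] = value

    def pop(self) -> int:
        value = self.mem[self.esp]
        self.esp = (self.esp + 4) & MASK32
        return value


def call_p(s: Stack, j: int, return_address: int = 0x401000):
    """Run a caller's push/call plus p's @nodisplay entry and exit code.
    Returns (value found at [ebp+8], ESP while the body runs)."""
    s.push(j)                      # caller: push j
    s.push(return_address)         # call: push the return address
    s.push(s.ebp)                  # push ebp
    s.ebp = s.esp                  # mov ebp, esp
    s.esp = (s.esp - 4) & MASK32   # sub esp, 4          (room for i)
    s.esp &= 0xFFFFFFFC            # and esp, 0fffffffch (dword-align)
    body_esp = s.esp
    seen_j = s.mem[s.ebp + 8]      # body: j lives at [ebp+8]
    s.esp = s.ebp                  # mov esp, ebp (drops locals and padding)
    s.ebp = s.pop()                # pop ebp
    s.pop()                        # ret 4: pop the return address...
    s.esp = (s.esp + 4) & MASK32   # ...then discard 4 bytes of parameters
    return seen_j, body_esp


s = Stack(esp=0x0012FF7E, ebp=0x0012FFA0)   # deliberately misaligned ESP
seen_j, body_esp = call_p(s, 42)
print(seen_j, hex(body_esp))                # prints 42 0x12ff6c
print(hex(s.esp), hex(s.ebp))               # prints 0x12ff7e 0x12ffa0
```

Even when ESP starts misaligned, the body runs with a dword-aligned ESP, and both registers return to their original values.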
This code is still relatively
inefficient if you don’t pass parameters on the stack and you don’t use
automatic (non-static, local) variables.
Many assembly language programmers pass their few parameters in machine
registers and also maintain local values in the registers. If this is the case, then the code
above is pure overhead. You can
inform HLA that you wish to take full responsibility for the entry and exit
code by using the "@noframe" procedure option.
Consider the following version of p:
procedure p( j:int32 );
@nodisplay; @noframe;
var
i:int32;
begin p;
end p;
(this produces the same symbol
table dump as the previous example).
HLA emits the following code
for this version of p:
?1_p proc near32
?1_p endp
Whoa! There’s nothing there!
But this is exactly what the advanced assembly language programmer
wants. With both the @nodisplay and @noframe options, HLA does not emit any extra code for
you. You would have to write this
code yourself.
By the way, you can specify the @noframe option without specifying the @nodisplay option.
HLA still generates no extra code, but it will assume that you are
allocating storage for the display in the code you write. That is, there will be an eight-byte _display_ variable created and i will have an offset of -12 in the activation
record. It will be your
responsibility to deal with this.
Although this situation is possible, it’s doubtful this combination will
be used much at all.
Note a major difference between
the versions of p without @noframe and the version with it: if @noframe is not present, HLA automatically emits code to
return from the procedure. This
code executes if control falls through to the "end p;" statement at
the end of the procedure.
Therefore, if you specify the @noframe option, you must ensure that the last statement in
the procedure is a RET() instruction or some other instruction that causes an
unconditional transfer of control.
If you do not do this, then control will fall through to the beginning
of the next procedure in memory, probably with disastrous results.
The RET() instruction presents a special problem. It is dangerous to use this instruction
to return from a procedure that does not have the @noframe option.
Remember, HLA has emitted code that pushes a lot of data onto the
stack. If you return from such a
procedure without first removing this data from the stack, your program will
probably crash. The correct way to
return from a procedure without the @noframe option is to jump to the bottom of the procedure
and run off the end of it. Rather
than require you to explicitly put a label into your program and jump to this
label, HLA provides the "exit procname;" instruction. HLA compiles the EXIT instruction into a JMP that transfers
control to the clean-up code HLA emits at the bottom of the procedure. Consider the following modification of p and the resulting assembly code produced:
procedure p( j:int32 );
@nodisplay;
var
i:int32;
begin p;
exit p;
nop();
end p;
; MASM output:
?2_p proc near32
        push  ebp
        mov   ebp, esp
        sub   esp, 4           ;Local storage.
        and   esp, 0fffffffch
        jmp   ?x?2_p           ;p
        nop
?x?2_p:
        mov   esp, ebp
        pop   ebp
        ret   4
?2_p endp
As you can see, HLA
automatically emits a label to the assembly output file ("?x?2_p" in this instance) at the bottom of the
procedure where the clean-up code starts.
HLA translates the "exit p;" instruction into a jmp to this
label.
If you look back at the code
emitted for the version of p
with the @noframe
option, you’ll note that HLA did not emit a label at the bottom of the
procedure. Therefore, HLA cannot
generate a jump to this nonexistent label, so you cannot use the exit statement
in a procedure with the @noframe option (HLA will generate an error if you attempt this).
Of course, HLA will not stop you from putting a RET() instruction into a
procedure without the @noframe
option (some people who know exactly what they are doing might actually want to
do this). Keep in mind, if you
decide to do this, that you must deallocate the local variables (that’s what
the "mov esp, ebp" instruction is doing), you need to restore EBP
(via the "pop ebp" instruction above), and you need to deallocate any
parameters pushed on the stack (the "ret 4" handles this in the
example above). The following code
demonstrates this:
procedure p( j:int32 );
    @nodisplay;
var
    i:int32;
begin p;
    if( j = 0 ) then
        // Deallocate locals.
        mov( ebp, esp );
        // Restore old EBP
        pop( ebp );
        // Return and pop parameters
        ret( 4 );
    endif;
    nop();
end p;
; MASM output
?1_p proc near32
        push  ebp
        mov   ebp, esp
        sub   esp, 4           ;Local storage.
        and   esp, 0fffffffch
        cmp   dword ptr [ebp+8], 0
        jne   ?2_false
        mov   esp, ebp
        pop   ebp
        ret   4
?2_false:
        nop
?x?1_p:
        mov   esp, ebp
        pop   ebp
        ret   4
?1_p endp
If "real" assembly
language programmers would generally specify both the @noframe and @nodisplay options, why not make them the default case (and
use "@frame" and "@display" options to specify the
generation of the activation record and display)? Well, keep in mind that HLA was originally designed as a
tool to teach assembly language programming to beginning students. Those students have considerable
difficulty comprehending concepts like activation records and displays. Having HLA generate the stack frame
code and display generation code automatically saves the instructor from having
to teach (and explain) this code.
Even if the student never uses a display, it doesn’t make the program
incorrect to go ahead and generate it.
The only real cost is a little extra memory and a little extra execution
time. This is not a problem for
beginning students who haven’t yet learned to write efficient code. Therefore, HLA was optimized for the beginner at the expense of the advanced programmer. It is also worth pointing out that the behavior of the EXIT statement depends upon displays if you attempt
to exit from a nested procedure, yet another reason for HLA’s default behavior.
If you are absolutely certain
that your stack pointer is aligned on a four-byte boundary upon entry into a
procedure, you can tell HLA to skip emitting the AND( $FFFF_FFFC, ESP );
instruction by specifying the @noalignstack procedure option. Note that specifying @noframe also specifies @noalignstack.
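To make the effect of that instruction concrete, here is a small C model of the arithmetic. C is used only because it can be compiled and checked; align_esp and esp_is_aligned are illustrative names, not anything HLA defines. ANDing with $FFFF_FFFC clears the two low-order bits, which rounds ESP down to a multiple of four. Rounding down is the safe direction: the stack grows toward lower addresses, so at worst up to three unused bytes are skipped below the current top of stack.

```c
#include <assert.h>
#include <stdint.h>

/* Models the prologue instruction AND( $FFFF_FFFC, ESP ): clearing
   bits 0 and 1 rounds a 32-bit stack pointer down to the nearest
   multiple of four. */
uint32_t align_esp(uint32_t esp)
{
    return esp & 0xFFFFFFFCu;
}

/* Nonzero when ESP is already dword-aligned, that is, when the AND
   would leave it unchanged (the case in which @noalignstack is safe). */
int esp_is_aligned(uint32_t esp)
{
    return (esp & 3u) == 0;
}
```

For example, an incoming ESP of $0012_FF7E becomes $0012_FF7C, while an already aligned ESP passes through unchanged.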
16.10.3
Procedure Calls and Parameters in HLA
HLA’s high level support
consists of three main features: HLL-like declarations, the HLL statements (IF,
WHILE, etc.), and HLA’s support for procedure calls and parameter passing. This section discusses the syntax for
procedure declarations and how HLA generates code to automatically pass
parameters to a procedure.
The syntax for HLA procedure
declarations was touched on earlier;
however, it’s probably a good idea to review the syntax as well as
describe some options that previous sections ignored. There are several procedure declaration forms; the following
examples demonstrate them all[10]:
// Standard procedure declaration:
procedure procname (opt_parms); proc_options
begin procname;
    << procedure body >>
end procname;

// External procedure declarations:
procedure extname (opt_parms); proc_options @external;
procedure extname (opt_parms); proc_options @external( "name" );

// Forward procedure declarations:
procedure fwdname (opt_parms); proc_options @forward;
Opt_parms indicates that the parameter list is optional; the parentheses are omitted if the procedure has no parameters.
Proc_options is any combination (zero or more) of the following
procedure options (see the discussion earlier for these options):
@noframe;
@nodisplay;
@noalignstack;
@pascal;
@cdecl;
@stdcall;
@align( expression );
@returns( "string" );
The @external reserved word tells HLA that the specified
procedure does not appear in the current compilation, but is present in a
different source file that will be compiled separately. Note that the presence of an external
declaration doesn’t require that the procedure appear in a separate source
file. If the actual procedure
appears in the same compilation unit as the external declaration, HLA treats
the external declaration as a forward declaration (see the next paragraph for
details on forward declarations).
External procedure declarations were discussed earlier; see the
appropriate section(s) for additional details.
The @forward declaration syntax is necessary because HLA
requires all procedure symbols to be declared before they are used. In a few rare cases (where mutual
recursion occurs between two or more procedures), it may be impossible to write
your code such that every procedure is declared before its first call. More commonly, sorting your
procedures to ensure that all procedures are written before their first call
may force an artificial organization on the source file, making it harder to
read. The forward procedure
declaration handles this situation for you. It lets you create a procedure prototype that describes how
the procedure is to be called without actually specifying the procedure
body. Later on in the source file,
the full procedure declaration must appear.
Note: an external declaration
also serves as a forward declaration.
So if you have an external definition at the beginning of your program
(perhaps it appears in an include file), you do not need to provide a forward
declaration as well.
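C programmers will recognize the same idea in function prototypes. The following C sketch is an analogy, not HLA code (is_even and is_odd are hypothetical names); it shows the mutual-recursion case described above, where one of the two routines must be declared before its body appears.

```c
#include <assert.h>

/* Prototype: describes how is_odd is called before its body appears,
   much as an HLA @forward declaration does. */
int is_odd(unsigned n);

/* is_even and is_odd call each other, so no ordering of the two
   bodies avoids a call to a not-yet-defined routine. */
int is_even(unsigned n)
{
    return n == 0 ? 1 : is_odd(n - 1);
}

int is_odd(unsigned n)
{
    return n == 0 ? 0 : is_even(n - 1);
}
```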
16.10.4
Calling HLA Procedures
There are two standard ways to
call an HLA procedure: use the call instruction or simply specify the name of
the procedure as an HLA statement.
Both mechanisms have their plusses and minuses.
Calling an HLA procedure with the call instruction is exceedingly easy. Simply use either of the following
syntaxes:
call( procName );
call procName;
Either form compiles into an
80x86 call instruction that calls the specified procedure. The difference between the two is that
the first form (with the parentheses) returns the procedure’s
"returns" value, so this form can appear as an operand to another
instruction. The second form above
always returns the empty string, so it is not suitable as an operand of another
instruction. Also note that the second form requires a statement or procedure label; you may not use memory
addressing modes in this form. On the other hand, the second form is the only form that lets you "call"
a statement label (as opposed to a procedure label), which is occasionally useful.
If you use the call statement to call a procedure, then you are
responsible for passing any parameters to that procedure. In particular, if the parameters are
passed on the stack, you are responsible for pushing those parameters (in the
correct order) onto the stack before the call. This is a lot more work than letting HLA push the parameters
for you, but in certain cases you can write more efficient code by pushing the
parameters yourself.
The second way to call an HLA
procedure is to simply specify the procedure name and a list of actual
parameters (if needed) for the call.
This method has the advantage of being easy and convenient at the
expense of a possible slight loss in efficiency and flexibility. This calling method should also prove
familiar to most HLL programmers.
As an example, consider the following HLA program:
program parameterDemo;
#include( "stdio.hhf" );

procedure PrtAplusB( a:int32; b:int32 ); @nodisplay;
begin PrtAplusB;

    mov( a, eax );
    add( b, eax );
    stdout.put( "a+b=", (type int32 eax ), nl );

end PrtAplusB;

static
    v1:int32 := 25;
    v2:int32 := 5;

begin parameterDemo;

    PrtAplusB( 1, 2 );
    PrtAplusB( -7, 12 );
    PrtAplusB( v1, v2 );

    mov( -77, eax );
    mov( 55, ebx );
    PrtAplusB( eax, ebx );

end parameterDemo;
This program produces the
following output:
a+b=3
a+b=5
a+b=30
a+b=-22
As you can see, calling PrtAplusB in HLA is very similar to calling procedures (and
passing parameters) in a high level language like C/C++ or Pascal. There are, however, some key
differences between an HLA call and an HLL procedure call. The next section covers those
differences in greater detail. The
important thing to note here is that if you choose to call a procedure using
the HLL syntax (that is, the second method above), you will have to pass the
parameters in the parameter list and let HLA push the parameters for you. If you want to take complete control
over the parameter passing code, you should use the call instruction.
16.10.5
Parameter Passing in HLA, Value Parameters
The previous section probably
gave you the impression that passing parameters to a procedure in HLA is nearly
identical to passing those same parameters to a procedure in a high level
language. The truth is, the examples
in the previous section were rigged. There are actually many restrictions on how you can pass
parameters to an HLA procedure.
This section discusses the parameter passing mechanism in detail.
The most important restriction
on actual parameters in a call to an HLA procedure is that HLA only allows
memory variables, registers, constants, and certain other special items as
parameters. In particular, you cannot specify an arithmetic expression that
requires computation at run-time (although a constant expression, computable at
compile time, is okay). The bottom
line is this: if you need to pass the value of an expression to a procedure,
you must compute that value prior to calling the procedure and pass the result
of the computation; HLA will not
automatically generate the code to compute that expression for you.
The second point to mention here is that HLA is a strongly typed language when it comes to passing
parameters. This means that, with only a few exceptions, the type of the actual parameter must exactly match the
type of the formal parameter. If the actual parameter is an int8 object, the formal parameter had better not be an int32 object or HLA will generate an error. The only exceptions to this rule are
the hexadecimal types: byte, word, dword, qword, tbyte, and lword. If a formal parameter is of type byte,
the corresponding actual parameter may be any one-byte data object. If a formal parameter is a word object, the corresponding
actual parameter can be any two-byte object. Likewise, if a formal parameter is a dword object, the actual
parameter can be any four-byte data type, and so on. Conversely, if the actual parameter is a byte, word, or dword
object, it can be passed without error to any one-, two-, or four-byte formal parameter (respectively).
Programmers who are really lazy make all their parameters bytes, words,
or dwords (at least, wherever possible).
Programmers who care about the quality of their code use untyped
parameters cautiously.
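One way to pin down this rule is to write it as a predicate. The C sketch below is a hypothetical model of the check as this section describes it, covering only a handful of types (the enum names are invented for illustration); it is not how HLA actually implements type checking.

```c
#include <assert.h>
#include <stddef.h>

/* A few HLA types, modeled as an enum (hypothetical names). */
typedef enum { T_INT8, T_CHAR, T_INT32, T_UNS32, T_BYTE, T_WORD, T_DWORD } hla_type;

size_t type_size(hla_type t)
{
    switch (t) {
    case T_INT8: case T_CHAR: case T_BYTE: return 1;
    case T_WORD:                           return 2;
    default:                               return 4; /* int32, uns32, dword */
    }
}

/* The hexadecimal (untyped) types in this model. */
int is_untyped(hla_type t)
{
    return t == T_BYTE || t == T_WORD || t == T_DWORD;
}

/* Identical types always match; otherwise, if either side is an
   untyped type, the sizes must agree. */
int param_compatible(hla_type formal, hla_type actual)
{
    if (formal == actual)
        return 1;
    if (is_untyped(formal) || is_untyped(actual))
        return type_size(formal) == type_size(actual);
    return 0;
}
```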
If you want to use the high
level calling sequence, but you don’t like the inefficient code HLA sometimes
produces when generating code to pass your parameters, you can always use the #{...}# sequence parameter to override HLA’s code
generation and substitute your own code for one or two parameters. Of course, it doesn’t make any sense to
pass all the parameters to a procedure using this trick; it would be far easier
just to use the call instruction.
Example:
PrtAplusB
(
    #{
        mov( i, eax );   // First parameter is "i+5".
        add( 5, eax );
        push( eax );
    }#,
    5
);
HLA will automatically copy an
actual value parameter into local storage for the procedure, regardless of the
size of the parameter. If your
value parameter is a one million byte array, HLA will allocate storage for
1,000,000 bytes and then copy that array in on each call. C/C++ programmers may
expect HLA to automatically pass arrays by reference (as C/C++ does) but this
is not the case. If you want your
parameters passed by reference, you must explicitly state this.
As a convenience, HLA will
allow you to pass the lexeme “edx:eax” wherever a 64-bit parameter is expected,
and “dx:ax” wherever a 32-bit parameter is expected. When HLA sees these
parameters, it pushes (e)dx onto the stack first and (e)ax onto the stack
second.
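The order matters because the x86 stack grows downward: the value pushed second (EAX) ends up at the lower address, which is exactly where a little-endian 64-bit value keeps its low dword. The following C sketch models a small dword stack to show this; the stack32 type and its helper functions are illustrative assumptions, not HLA features.

```c
#include <assert.h>
#include <stdint.h>

/* A tiny model of a 32-bit x86 stack: it grows toward lower indices,
   and each push decrements the stack index before storing. */
typedef struct {
    uint32_t slot[16];
    int sp;                 /* index of the last value pushed */
} stack32;

void push32(stack32 *s, uint32_t v)
{
    assert(s->sp > 0);
    s->slot[--s->sp] = v;
}

/* Reads the 64-bit little-endian value at the top of the stack:
   the lower-addressed dword is the low half. */
uint64_t top64(const stack32 *s)
{
    return ((uint64_t)s->slot[s->sp + 1] << 32) | s->slot[s->sp];
}

/* Mirrors the order described for an "edx:eax" argument:
   push EDX first, then EAX. */
void push_edx_eax(stack32 *s, uint32_t edx, uint32_t eax)
{
    push32(s, edx);
    push32(s, eax);
}
```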
The code HLA generates to copy
value parameters, while not particularly bad, certainly isn’t optimal. If you need the fastest possible code
when passing parameters by value on the stack, it would be better if you
explicitly pushed the data yourself.
Another alternative that sometimes helps is to use the "use reg32" procedure option to provide HLA with a hint
about a 32-bit scratchpad register that it can use when building parameters on
the stack.
16.10.6
Parameter Passing in HLA, Reference, Value/Result, and Result Parameters
The one good thing about pass
by reference, pass by value/result, and pass by result parameters is that they
are always four-byte pointers, regardless of the size of the actual parameter.
Therefore, HLA has an easier time generating code for these parameters than it
does generating code for pass by value parameters.
HLA treats reference,
value/result, and result parameters identically. The code within the procedure is responsible for
differentiating these parameter types (value/result and result parameters
generally require copying data between local storage and the actual
parameter). The following
discussion will simply refer to pass by reference parameters, but it applies
equally well to pass by value/result and pass by result.
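Because all three kinds of parameter arrive as the same four-byte address, the difference lies entirely in what the procedure body does with it. The C sketch below contrasts a body that works through the address (reference semantics) with one that copies the value into a local on entry and back out on exit (value/result semantics). The global g and the function names are hypothetical; the difference only becomes visible when the procedure can also reach the actual parameter some other way, here through g.

```c
#include <assert.h>

int g;  /* a global that the caller also passes as the actual parameter */

/* Pass by reference: every access goes through the address, so the
   increment is visible through g at once. */
void inc_ref(int *x, int *seen)
{
    *x += 1;
    *seen = g;        /* observes the updated value */
}

/* Pass by value/result: the same address arrives, but the body copies
   the value into a local on entry and copies it back just before return. */
void inc_valres(int *x, int *seen)
{
    int local = *x;   /* copy in */
    local += 1;
    *seen = g;        /* g has not been written yet */
    *x = local;       /* copy out */
}
```

With reference semantics the procedure sees its own update through g immediately; with value/result semantics g changes only when the result is copied back at the end.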
Like high level languages, HLA
places a whopper of a restriction on pass by reference parameters: they can
only be memory locations.
Constants and registers are not allowed since you cannot compute their
address. Do keep in mind, however,
that any valid memory addressing mode is a valid candidate to be passed by
reference; you do not have to
limit yourself to static and local variables. For example, "[eax]" is a perfectly valid memory
location, so you can pass this by reference (assuming you type-cast it, of
course, to match the type of the formal parameter). The following example demonstrates a simple procedure with a
pass by reference parameter:
program refDemo;
#include( "stdio.hhf" );

procedure refParm( var a:int32 );
begin refParm;

    mov( a, eax );
    mov( 12345, (type int32 [eax]) );

end refParm;

static
    i:int32:=5;

begin refDemo;

    stdout.put( "(1) i=", i, nl );
    mov( 25, i );
    stdout.put( "(2) i=", i, nl );
    refParm( i );
    stdout.put( "(3) i=", i, nl );

end refDemo;
The output produced by this code is:
(1) i=5
(2) i=25
(3) i=12345
As you can see, the parameter a in refParm exhibits pass by reference semantics: the change to the value of a in refParm changed the value of the actual parameter (i) in the main program.
Note that HLA passes the address of i to refParm; therefore, the a parameter contains the address
of i. To access the value of i through the a parameter, the refParm procedure must dereference the pointer passed in a. The
two instructions in the body of the refParm procedure accomplish this.
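For readers coming from C, refParm corresponds to a function that takes a pointer, as in the sketch below (an analogy only, not code HLA generates). One design difference is worth noting: in C the caller must write &i explicitly, whereas HLA computes and pushes the address of i automatically because the formal parameter is declared with var.

```c
#include <assert.h>
#include <stdint.h>

/* C counterpart of refParm: a receives the address of the actual
   parameter, and the body writes through it, as the HLA sequence
   mov( a, eax ); mov( 12345, (type int32 [eax]) ); does. */
void refParm(int32_t *a)
{
    *a = 12345;
}
```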
Take a look at the code that
HLA generates for the call to refParm:
pushd offset32 ?198_i
call ?197_refParm
("?198_i" is the MASM compatible name that an older
version of HLA generated for the static variable "i".)
As you can see, this program
simply computed the address of i
and pushed it onto the stack. Now
consider the following modification to the main program:
program refDemo;
#include( "stdio.hhf" );

procedure refParm( var a:int32 );
begin refParm;

    mov( a, eax );
    mov( 12345, (type int32 [eax]) );

end refParm;

static
    i:int32:=5;

var
    j:int32;

begin refDemo;

    mov( 0, j );
    refParm( j );
    refParm( i );
    lea( eax, j );
    refParm( [eax] );

end refDemo;
This version emits something
similar to the following MASM code:
mov dword ptr [ebp-8], 0 ;j
push eax
lea eax, dword ptr [ebp-8] ;j
xchg eax, [esp]
call ?197_refParm ;refParm
pushd offset32 ?198_i
call ?197_refParm ;refParm
lea eax, dword ptr [ebp-8] ;j
push eax
push eax
lea eax, dword ptr [eax+0] ;[eax]
mov [esp+4], eax
pop eax
call ?197_refParm ;refParm
As you can see, the code
emitted for the last call is pretty ugly (we could easily get rid of three of
the instructions in this code).
This call would be a good candidate for using the call instruction directly. Also see "Hybrid Parameters" a little later in
this document. Another option is
to use the "use reg32" option to tell HLA it can use one of the
32-bit registers as a scratchpad.
Consider the following:
procedure refParm( var a:int32 ); use esi;
    .
    .
    .
lea( eax, j );
refParm( [eax] );
This sequence generates the
following code (which is a little better than the previous example):
lea eax, dword ptr [ebp-8] ;j
lea eax, dword ptr [eax+0] ;[eax]
push eax
call ?197_refParm ;refParm
As a general rule, the type of
an actual reference parameter must exactly match the type of the formal
parameter. There are a couple of exceptions to this rule. First, if
the formal parameter is dword,
then HLA will allow you to pass any four-byte data type as an actual parameter
by reference to this procedure.
Second, you can pass an actual dword parameter by reference if the formal parameter is
a four-byte data type.
There is a third exception to
the "the types must exactly match" rule. If the formal reference parameter is of some type T, HLA will
allow you to pass an actual parameter that is a pointer to T. Note that HLA will actually pass the value of the pointer, rather than the address of the pointer, as the reference parameter. This turns out to be really convenient,
particularly when calling Win32 API functions and other C/C++ code. Note, however, that this behavior isn’t
always intuitive, so be careful when passing pointer variables as reference
parameters.
If you want to pass the value
of a double word or pointer variable in place of the address of such a variable
to a pass by reference, value/result, or result parameter, simply prefix the
actual parameter with the VAL reserved word in the call to the procedure, e.g.,
refParm( val dwordVar );
This tells HLA to use the value of the variable rather than its address.
You may also use the VAL
keyword to pass an arbitrary 32-bit numeric value for a string parameter. This is useful in certain Win32 API
calls where you pass either a pointer to a zero-terminated sequence of
characters (i.e., a string) or a small integer "ATOM" value.
16.10.6.1 Untyped Reference Parameters
HLA provides a special formal
parameter syntax that tells HLA that you want to pass an object by reference
and you don’t care what its type is.
Consider the following HLA procedure:
procedure zeroObject( var object:byte; size:uns32 );
begin zeroObject;
    << code to write "size" zeros to "object" >>
end zeroObject;
The problem with this procedure
is that you will have to coerce non-byte parameters to a byte before passing
them to zeroObject. That is, unless you’re passing a byte
parameter, you’ve always got to call zeroObject thusly:
zeroObject( (type byte NotAByte), sizeToZero );
For some functions you call
frequently with different types of data, this can get painful very
quickly.
The HLA untyped reference
parameter syntax solves this problem.
Consider the following declaration of zeroObject:
procedure zeroObject( var object:var; size:uns32 );
begin zeroObject;
    << code to write "size" zeros to "object" >>
end zeroObject;
Notice the use of the reserved
word "VAR" instead of a data type for the object
parameter. This syntax tells HLA
that you’re passing an arbitrary variable by reference. Now you can call zeroObject and pass any (memory) object as the first
parameter and HLA won’t complain about the type, e.g.,
zeroObject( NotAByte, sizeToZero );
Note that you may only pass
untyped objects by reference to a procedure.
Also note that untyped reference
parameters always take the address of the actual parameter to pass on to the
procedure, even if the actual parameter is a pointer (normal pass by reference
semantics in HLA will pass the value of a pointer, rather than the address of
the pointer variable, if the base type of the pointer matches the type of the
reference parameter). Sometimes you’ll
have the address of an object in a register or a pointer variable and you’ll
want to pass the value of that pointer object (i.e., the address of the
ultimate object) rather than the address of the pointer variable. To do this, simply prefix the actual
parameter with the VAL keyword, e.g.,
zeroObject( ptrVar );      // Passes the address of ptrVar.
zeroObject( val ptrVar );  // Passes ptrVar’s value.
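C’s void * parameter plays much the same role as an untyped reference parameter, and the ptrVar versus val ptrVar distinction corresponds to passing &p versus p. The sketch below is a C analogy that assumes the obvious memset-based body for zeroObject; it is not the HLA procedure itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* C analogue of an untyped reference parameter: void * accepts the
   address of any object, and size says how many bytes to clear. */
void zeroObject(void *object, size_t size)
{
    memset(object, 0, size);
}
```

Passing p clears the object that p points at (like val ptrVar), while passing &p clears the pointer variable itself (like plain ptrVar) and leaves the target untouched.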
16.10.6.2 Parameter Passing in HLA, Name and Lazy Evaluation Parameters
HLA provides a modicum of
support for pass by name and pass by lazy evaluation parameters. A pass by name parameter consists of a
thunk that returns the address of the actual parameter. A pass by lazy evaluation parameter is
a thunk that returns the value of the actual parameter. Whenever you specify the
"name" or "lazy" keywords before a parameter, HLA reserves
eight bytes to hold the corresponding thunk in the activation record. It is your responsibility to create a
thunk whenever calling the procedure.
There is a minor difference
between passing a thunk parameter by value and passing a lazy evaluation or
name parameter to a procedure.
Pass by name/lazy parameters require an immediate thunk constant; you cannot pass a thunk variable as a
pass by name or lazy parameter.
To pass a thunk constant as a
parameter to a pass by name or pass by lazy evaluation parameter, insert the
thunk’s code inside a "#{...}#" sequence in the parameter list and
preface the whole thing with the THUNK reserved word. The following example demonstrates passing a thunk as a pass
by name parameter:
program nameDemo;
#include( "stdio.hhf" );

procedure passByName( name ary:int32; var ip:int32 ); @nodisplay;
    const i:text := "(type int32 [ebx])";
    const a:text := "(type int32 [eax])";
begin passByName;

    mov( ip, ebx );
    mov( 0, i );
    while( i < 10 ) do

        ary();            // Get address of "ary[i]" into eax.
        mov( i, ecx );
        mov( ecx, a );
        inc( i );

    endwhile;

end passByName;

procedure thunkParm( t:thunk );
begin thunkParm;

    t();

end thunkParm;

var
    index:int32;
    array:int32[10];
    th:thunk;

begin nameDemo;

    thunk th := #{ stdout.put( "Thunk Variable", nl ); }#;
    thunkParm( th );
    thunkParm( thunk #{ stdout.put( "Thunk Constant", nl ); }# );

    // passByName( th, index ); -- would be illegal;

    passByName
    (
        thunk
        #{
            push( ebx );
            mov( index, ebx );
            lea( eax, array[ebx*4] );
            pop( ebx );
        }#,
        index
    );

    mov( 0, ebx );
    while( ebx < 10 ) do

        stdout.put
        (
            "array[",
            (type int32 ebx),
            "]=",
            array[ebx*4],
            nl
        );
        inc( ebx );

    endwhile;

end nameDemo;
This program produces the
following output:
Thunk Variable
Thunk Constant
array[0]=0
array[1]=1
array[2]=2
array[3]=3
array[4]=4
array[5]=5
array[6]=6
array[7]=7
array[8]=8
array[9]=9
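To see why passByName fills the array this way, it may help to model the thunk in C. In the sketch below a thunk is a function pointer plus an explicit environment structure; in HLA, the eight-byte thunk instead pairs the code address with a pointer to the caller’s activation record. The env struct, name_thunk, lazy_thunk, and pass_by_name are hypothetical names. The key point is that the thunk is re-evaluated on every call, so it always sees the current value of the shared index.

```c
#include <assert.h>

/* Environment the thunk needs: stands in for the caller's activation
   record that an HLA thunk would reach through its saved frame pointer. */
typedef struct {
    int *array;
    int *index;
} env;

/* Pass-by-name thunk: yields the ADDRESS of array[*index]. */
int *name_thunk(const env *e)
{
    return &e->array[*e->index];
}

/* Lazy-evaluation thunk: yields the VALUE of array[*index]. */
int lazy_thunk(const env *e)
{
    return e->array[*e->index];
}

/* Like passByName: for i = 0..n-1, store i through the thunk.
   ip and the thunk's index point at the same variable, so each
   call to ary sees the updated index. */
void pass_by_name(int *(*ary)(const env *), const env *e, int *ip, int n)
{
    for (*ip = 0; *ip < n; ++*ip)
        *ary(e) = *ip;
}
```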
16.10.6.3 Hybrid Parameter Passing in HLA
HLA’s automatic code generation
for parameters specified using the high level language syntax isn’t always
optimal. This is because HLA makes
very few assumptions about your program.
For example, suppose you are passing a word parameter to a procedure by
value. Since all parameters in HLA
consume an even multiple of four bytes on the stack, HLA will zero extend the
word and push it onto the stack.
It does this by producing MASM code like the following:
pushw 0
pushw Parameter
Clearly you can do better than
this if you know something about the variable. For example, if you know that the two bytes following
"Parameter" are in accessible memory (as opposed to lying on the next page of
memory, which might not be allocated, so that accessing it would cause a protection
fault), you could get by with the single MASM instruction:
push dword ptr Parameter
Unfortunately, HLA cannot make
these kinds of assumptions about the data because doing so could create
malfunctioning code.
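The following C sketch models both options on a word-granular stack (the stack16 type and the helper names are hypothetical). Pushing a zero word followed by the parameter leaves a zero-extended dword on the stack, because the word pushed second sits at the lower address and so forms the low half. Loading a full dword from the parameter’s address instead picks up whatever two bytes follow it as the high half, which is harmless only if the callee ignores those bits and the bytes are accessible; the model cannot show the protection-fault case.

```c
#include <assert.h>
#include <stdint.h>

/* A word-granular model of a downward-growing stack. */
typedef struct {
    uint16_t slot[16];
    int sp;                     /* index of the last word pushed */
} stack16;

void pushw(stack16 *s, uint16_t v)
{
    assert(s->sp > 0);
    s->slot[--s->sp] = v;
}

/* The dword at the top of the stack, read little-endian. */
uint32_t top32(const stack16 *s)
{
    return ((uint32_t)s->slot[s->sp + 1] << 16) | s->slot[s->sp];
}

/* What "push dword ptr Parameter" would load: the word itself plus
   the two bytes that happen to follow it in memory. */
uint32_t load_dword_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```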
One solution, of course, is to
forgo the HLA high level language syntax for procedure calls and manually push
all the parameters yourself and call the procedure via the CALL
instruction. However, this is a
major pain that involves lots of extra typing and produces code that is
difficult to read and understand.
Therefore, HLA provides a hybrid parameter passing mechanism that lets
you continue to use a high level language calling syntax yet still specify the
exact instructions needed to pass certain parameters. This hybrid scheme works out well because HLA actually does
a good job with most parameters (e.g., if they are an even multiple of four
bytes, HLA generates efficient code to pass the parameters; it is only for parameters with an
unusual size that HLA generates less than optimal code).
If a parameter consists of the
"#{" and "}#" brackets with some
corresponding code inside the brackets, HLA will emit the code inside the
brackets in place of any code it would normally generate for that
parameter. So if you wanted to
pass a 16-bit parameter efficiently to a procedure named "Proc" and
you’re sure there is no problem accessing the two bytes beyond this parameter,
you could use code like the following:
Proc( #{ push( (type dword WordVar) ); }# );
Notice the similarity to pass
by name/lazy evaluation parameters. However,
no THUNK reserved word prefaces the code in this instance.
Whenever you pass a non-static[11]
variable as a parameter by reference, HLA generates something like the
following MASM code to pass the address of that variable to the procedure:
push eax
push eax
lea eax, Variable
mov [esp+4], eax
pop eax
It generates this particular
code to ensure that it doesn’t change any register values (after all, you could
be passing some other parameter in the EAX register). If you have a free register available, you can generate
slightly better code using a calling sequence like the following (assuming EBX
is free):
HasRefParm
(
    #{
        lea( ebx, Variable );
        push( ebx );
    }#
);
16.10.6.4 Parameter Passing in HLA, Register Parameters
HLA provides a special syntax
that lets you specify that certain parameters are to be passed in registers
rather than on the stack. The
following are some examples of procedure declarations that use this feature:
procedure a( u:uns32 in eax ); @forward;
procedure b( w:word in bx ); @forward;
procedure d( c:char in ch ); @forward;
Whenever you call one of these
procedures, the code that HLA automatically emits for the call will load the
actual parameter value into the specified register rather than pushing this
value onto the stack. You may
specify any general purpose 8-bit, 16-bit, or 32-bit register after the
"IN" keyword following the parameter’s type. Obviously, the parameter must fit in
the specified register. You may
only pass reference parameters in 32-bit registers; you cannot pass parameters that are not one, two, or four
bytes long in a register.
You can get into trouble if you’re
not careful when using register parameters; consider the following two
procedure definitions:
procedure one( u:uns32 in eax; v:dword in ebx ); @forward;

procedure two( a:uns32 in eax );
begin two;

    one( 25, a );

end two;
The call to "one" in
procedure "two" looks like it passes the values 25 and whatever was
passed in for "a" in procedure two. However, if you study the HLA output code, you will discover
that the call to "one" passes 25 for both parameters. The reason is that HLA
emits the code to load 25 into EAX (to pass 25 in the "u" parameter) before it loads "a"
into EBX for the "v" parameter. Because "a" is simply another name for EAX, this
wipes out the value passed into "two" in the "a" variable,
hence the problem. Be aware of
this if you use register parameters often.
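The hazard can be reproduced with a C model in which the registers are ordinary variables. The names eax, ebx, one, two, and two_fixed are all hypothetical stand-ins, and the model assumes the load order just described (u before v). Since "a" is just another name for EAX, loading u first destroys a before it is copied into EBX; loading v first avoids the problem.

```c
#include <assert.h>
#include <stdint.h>

/* Model registers as plain variables. */
uint32_t eax, ebx;

/* What one( u:uns32 in eax; v:dword in ebx ) receives. */
uint32_t seen_u, seen_v;

void one(void)
{
    seen_u = eax;
    seen_v = ebx;
}

/* two( a:uns32 in eax ) calling one( 25, a ): "a" IS eax. */
void two(void)
{
    eax = 25;     /* load u: this overwrites a */
    ebx = eax;    /* load v from "a", which now holds 25 */
    one();
}

/* Safe ordering: copy a out of EAX before EAX is reused. */
void two_fixed(void)
{
    ebx = eax;    /* v := a while EAX still holds a */
    eax = 25;     /* u := 25 */
    one();
}
```

In HLA terms, the same idea means getting a’s value out of EAX (into another register or a variable) before the call sequence loads 25 into EAX.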
Note that when passing a
parameter in a register, HLA simply creates a “text equate” that substitutes
the register name wherever it appears in the procedure’s body. The parameter is
not a memory location and doesn’t behave as one. In particular, you cannot directly
pass a parameter passed in a register by reference to some other procedure. You
must either surround the parameter name with “[” and “]” or preface it with a VAL
operator. This is exactly what you would have to do if you used the bare
register name. Remember, wherever the parameter name appears, HLA simply
substitutes the register name for that parameter name.
16.11
Lexical Scope
HLA is a block-structured language that enforces the scope
of local identifiers. HLA uses
lexical scope to determine when and where an identifier is visible to the
program. Identifiers declared
within a procedure are always visible within that procedure and to any local
procedures declared after the identifier.
Local identifiers are never visible outside the procedure
declaration. The scoping rules are similar to languages like Pascal,
Ada, and Modula-2. As an example,
consider the following code:
program scopeDemo;
#include( "stdio.hhf" );

var
    i:int32;
    j:int32;
    k:int32;

procedure lex1;
var
    i:int32;
    j:int32;

    procedure lex2;
    var
        i:int32;
    begin lex2;

        mov( i, eax );       //1
        mov( ebx::j, eax );  //2
        mov( ecx::k, eax );  //3

    end lex2;

begin lex1;

    mov( i, eax );       //4
    mov( j, eax );       //5
    mov( ecx::k, eax );  //6

end lex1;

procedure alsolex1;
var
    i:int32;
    m:int32;
begin alsolex1;

    mov( i, eax );       //4
    mov( m, eax );       //5
    mov( ecx::k, eax );  //6

end alsolex1;

begin scopeDemo;

    mov( i, eax );  //7
    mov( j, eax );  //8
    mov( k, eax );  //9

end scopeDemo;
(Note: the purpose of the
ebx:: and ecx:: prefixes on certain variables will become clear in a
moment. Also note that this code
is not functional; it was written only as an illustration.)
In this example you will note
that lex2 is nested within lex1, which is nested within the main program. The alsolex1 procedure is nested within the main program but
inside no other procedure. To describe this arrangement, compiler writers use
the term lex level to denote the depth of nesting.
HLA defines the main program to be lex level zero. Each
time you nest a procedure, you increase its lex level. So lex1 is at lex level one since it is directly nested
inside the main program at lex level zero. The lex2
procedure is at lex level two because it is nested inside the lex1 procedure.
Finally, alsolex1
is also at lex level one because it is nested inside the main program (which is
lex level zero).
Within a given procedure (or
the main program), all identifiers must be unique. That is, you cannot have two symbols named "i" in
the same procedure. In different
procedures, however, you may reuse the names. If all procedures were written at lex level one, then no
procedure would be able to directly access the local variables in any other
procedure (this is the case with the C/C++ language). In block
structured languages, like HLA, it is possible to access certain
non-local variables in other procedures if the current procedure (whose code is
attempting to access said variable) is nested within the other procedure.
In the example above, lex2 accesses three variables: i, j,
and k. The i variable is local to lex2, so there is nothing surprising here. The j variable is local to lex1 and global to lex2.
Likewise, the k variable
is global to both lex1 and lex2, and lex2 can access it as well. Whenever a procedure is nested within another (either
directly or indirectly), the nested procedure can access all variables in the enclosing procedures
(including the main program)[12],
unless the procedure declares a local name with the same name as a global name
(the local name always takes precedence in this case). The term "scope" refers to
the visibility of these names.
Being able to use a name during
compilation is one thing, accessing the memory location associated with that
name at run-time is something else entirely. Most block structured high level languages (HLLs) emit lots
of extra code to access these "intermediate" and global variables for
you. Why the extra code? Well remember, local procedure
variables are accessed on the stack by indexing off the EBP register (which
points at a procedure’s "activation record"). When a procedure like lex1 above calls a local procedure like lex2, the lex2 procedure promptly saves the value in EBP (that
points at lex1’s
activation record) and it points EBP at the new activation record for lex2.
Unfortunately, lex2 no
longer has access to lex1’s
local variables since EBP no longer points at lex1’s locals.
This creates a bit of a problem.
"But wait!" you
exclaim. "EBP is pointing at
the pointer to lex1’s
activation record, why not just use double indirection to get the pointer to lex1’s locals?" This is a good idea, but it fails if lex2 is recursive. There are two or three general solutions to this problem;
HLA uses a display to access
non-local values.
A display is nothing more than an array of
pointers. Display[0] is a pointer to the most recent activation
record at lex level zero, Display[1] is a pointer to the most recent activation record at lex level one, Display[2] is a pointer to the most recent activation
record at lex level two, etc. (note the use of the phrase most recent. This
ensures that displays work properly even when recursion occurs). With a display, to access a non-local
variable, you just go to the memory location specified by Display[ varlex ] + varoffset where "varlex" is the lex level of the symbol you wish to
access and "varoffset"
is the offset into the activation record where the variable’s data can be
found.
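The following C sketch models the display idea directly: display[L] points at the most recent activation record for lex level L, each procedure entry saves and replaces its own entry, and a non-local access is Display[varlex] + varoffset. It is a simplified model under stated assumptions (the frame struct, enter, leave, nonlocal, and this lex2 are hypothetical), not HLA’s actual layout, which builds the display inside each procedure’s activation record. The recursive lex2 shows why "most recent" matters: however deep the recursion goes, display[1] still points at lex1’s record.

```c
#include <assert.h>
#include <stddef.h>

/* A toy activation record: four int "variables" at fixed offsets. */
typedef struct frame { int vars[4]; } frame;

#define MAX_LEX 8
frame *display[MAX_LEX];   /* zero-initialized: no records yet */

/* Procedure entry at lex level `level`: remember the old entry and
   point display[level] at the new activation record. */
frame *enter(int level, frame *f)
{
    assert(level >= 0 && level < MAX_LEX);
    frame *saved = display[level];
    display[level] = f;
    return saved;
}

/* Procedure exit: restore the previous entry for this lex level. */
void leave(int level, frame *saved)
{
    display[level] = saved;
}

/* Non-local access: Display[varlex] + varoffset. */
int *nonlocal(int varlex, int varoffset)
{
    return &display[varlex]->vars[varoffset];
}

/* A recursive lex-level-2 procedure that reads j, slot 0 of its
   lex-level-1 parent's record, at every recursion depth. */
int lex2(int depth)
{
    frame me = {{0}};
    frame *saved = enter(2, &me);
    me.vars[0] = depth;                  /* a local at lex level 2 */
    int j = *nonlocal(1, 0);             /* parent's j via the display */
    int result = depth > 0 ? lex2(depth - 1) : j;
    leave(2, saved);
    return result;
}
```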
Sound complex? Actually, HLA simplifies this quite a
bit. First, as long as you don’t
specify the @nodisplay
procedure option, HLA automatically emits the code to build a display at the
start of the procedure’s code[13]. HLA also defines a run-time variable, _display_, that points at this array of pointers. Accessing
a non-local variable requires two instructions: one to fetch the address of the
variable’s activation record and one to access the variable. Correcting the previous program, the
code would look something like this:
procedure lex2;
var
    i:int32;
begin lex2;

    mov( i, eax );

    // access non-local variable j
    // at lex level 1.
    mov( _display_[-1*4], ebx );
    mov( ebx::j, eax );

    // access non-local variable k
    // at lex level 0.
    mov( _display_[0], ecx );
    mov( ecx::k, eax );

end lex2;
There are two things to note about the display. First, the entries are stored at negative indices in the array (0, -1, -2, etc.) rather than at positive indices (this is due to HLA's implementation). Second, don't forget that this is a run-time array of dwords, so you must multiply each index by the array element size, which is four in this case.
Once you've loaded the address into a register, the reg::var syntax tells HLA to use the specified register rather than EBP as the pointer to the variable's activation record. The "mov( ecx::k, eax );" instruction, for example, compiles to "mov eax, [ecx+koffset]", where koffset represents the offset of k in the main program's activation record.
In general, few programs take advantage of nested procedures and access to non-local automatic variables, so it is very common to find programmers putting "@nodisplay" after all their procedures. Of course, if you do this, HLA does not generate a display, and access to non-local automatic variables (those declared in a var section) is not possible. Static variables, however, are not allocated in the activation record, so you always have access to non-local static variables even if you don't generate the code for a display.
16.12
Declarations
Programs, units, procedures,
methods, and iterators all have a declaration section. Classes and namespaces also have a
declaration section, though it is somewhat limited. A declaration section can contain one or more of the
following components:
• A label section.
• A type section.
• A const section.
• A val section.
• A var section.
• A static section.
• A namespace.
• A procedure.
• A method.
• An iterator.
The order of these sections is
irrelevant as long as you ensure that all identifiers used in a program are
defined before their first use.
Furthermore, as noted above, you may have multiple sections within the
same set of declarations. For
example, the two const sections in the following procedure declaration are
perfectly legal:
procedure TwoConsts;
const MaxVal := 5;
type Limits: int32[ MaxVal ];
const MinVal := 0;
begin TwoConsts;
//...
end TwoConsts;
C/C++ programmers who are used
to specifying "typedef" or "const" before each declaration
can do so in HLA:
type intArray: int32[4];
const pi := 3.14159;
var i:int32;
const MaxVal := 10;
const MinVal := 0;
etc.
Pascal/Delphi programmers can put as many declarations in each section as they choose. Neither style is preferable to the other.
16.12.1
Label Section
The label section allows you to
forward-declare statement labels that appear in a module. This section takes the following form:
label
    id1;
    id2; @external;
    id3; @external( "external_name" );
    etc.
endlabel; // optional, but you should use it!
For the most part, HLA already
handles forward references on labels, so you will rarely need a label section in your programs. The one time where this section is
handy is when you want to refer to a statement label at an outer lex level from
within a procedure. By predeclaring the label at the outer lex level, you can access that symbol within the procedure, e.g.,
program funnyStuff;
label
funny;
endlabel;
procedure weird;
begin weird;
jmp funny;
end weird;
begin funnyStuff;
call weird;
funny:
end funnyStuff;
Note that the jmp in this example exits the "weird" procedure without returning from it, leaving a bunch of stuff on the stack (like the return address and other parts of the activation record). This is useful in some bizarre cases, but it is not common in normal code.
Perhaps the primary use for the
label declaration section
is to declare labels that are external to the program. This is done by attaching the @external option to the label you are defining. Without the optional external name
string, HLA will define an external label using the label name you specify
(e.g., id2 in the example
above); if the optional external
string is present, HLA uses the specified external name when referencing that
label. These external declarations
are quite useful when one module needs to refer to statement labels appearing
in a different module.
Note that if you define the
label as external, then HLA treats
that as a public declaration of that statement label, e.g.,
procedure SomeProc;
label
here; @external;
begin SomeProc;
.
.
.
here:
.
.
.
end SomeProc;
In this example, the label "here" is a public symbol: it is available globally throughout the source file, and it is externally accessible by other modules.
You can also create global labels by attaching a
“::” symbol to a label rather than using a single colon. This has the same
effect as declaring the label in a label section at the global lex level.
16.12.2
Type Section
You can declare user-defined
data types in the type section.
The type section appears in a declaration section and begins with the
reserved word type. It
continues until encountering another declaration reserved word (e.g., const,
var, or val), the reserved word endtype, or the reserved word begin. Ending a type declaration
section with "endtype;"
is optional, but recommended for future compatibility with HLA and other tools.
A typical type definition begins with an identifier followed by a colon and a type definition. The following paragraphs describe the legal forms of type definitions.
id1 : forward( id2 );
This isn’t an actual type
declaration at all. What it will
do is create a text constant (id2) and initialize that constant with the string "id1". The purpose of this declaration form is
to let you defer the declaration of a symbol within a macro. For example, suppose you want to create
a data type "template" (like those in C++). A template is just a macro you use in place of a data
type. Given HLA’s declaration
syntax, however, the identifier for the template type has already appeared on
the current source line. The
forward declaration lets you "undo" this declaration and move it
later. For example, consider the
"strStorage" macro:
#macro strStorage( NumChars ):
    theIdentifier, MaxLength, CurLength;
    forward( theIdentifier );
    MaxLength: dword := NumChars;
    CurLength: dword := 0;
    theIdentifier: byte[ (NumChars+4) & $FFFF_FFFC ];
#endmacro
Now consider the following
variable declaration in the STATIC section:
static
    s: strStorage( 250 );
endstatic;
HLA expands this template/macro
to (something like) the following:
static
    s: forward( _1000_ );
    _1001_ :dword := 250;
    _1002_ :dword := 0;
    _1000_ :byte[ 252 ];
endstatic;
Note that _1000_ is a text constant containing the string "s", so the last line above expands to
s: byte[ 252 ];
This example demonstrates how you can use the "forward" clause to defer the declaration of a symbol, in this case within a static section.
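The byte count in the strStorage macro, (NumChars+4) & $FFFF_FFFC, rounds NumChars+1 up to the next multiple of four. The extra byte presumably leaves room for a terminator, although the macro itself does not say so. The following Python check is only an illustration of the arithmetic, not part of HLA. It reproduces the 252 used in the expansion above.

```python
def str_storage_bytes(num_chars):
    """Mirror of the macro's expression (NumChars+4) & $FFFF_FFFC."""
    return (num_chars + 4) & 0xFFFF_FFFC


for n in (250, 251, 252, 253):
    # Always a multiple of 4 that is at least num_chars + 1.
    print(n, str_storage_bytes(n))   # 252, 252, 256, 256
```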
id1 : pointer to id2;
This declaration creates a new
type (id1) which is a pointer to some other type (id2). Pointer objects always consume four
bytes at run-time. Note that you
may not use pointer types in constant expressions. If id2 has not been defined earlier in the program, then the program
must declare id2 before the end of the current procedure (that is, id2 must be
defined before the current lex level is reduced).
Examples:
intPtr: pointer to int32;
PtrToPtr: pointer to CharPtr;
CharPtr: pointer to char;
id1 : enum { id_list };
This declaration creates a new
type (id1) which is an enumerated data type. <id_list> is a list of names that represent the values
of this data type.
Examples:
Colors:enum {red, green, blue};
Gender:enum {female, male};
State:enum {on, off};
id : procedure( optional_parameter_list );
Defines a pointer type that
points at a procedure. The
optional parameter list consists of a list of parameter declarations (described
later) separated by semicolons. If
there are no parameters, do not include the parentheses in the type
declaration. Like other pointers,
procedure pointers are always 32 bits long (four-byte near pointers for the
flat memory model).
Examples:
ProcPtr: procedure; options
ProcI : procedure( i:int32 ); options
ProcIF: procedure( i:int32; f:real64 ); options
Procedure variables (pointers)
allow the @pascal, @cdecl, @stdcall, and @returns options immediately after the semicolon following
the optional parameters. The @returns option attaches a "returns" string for
use with instruction composition in calls through this pointer variable. For more information about the @returns clause, see the section on procedures earlier in
this documentation. The @pascal,
@cdecl, and @stdcall options (which are mutually exclusive) select the
parameter passing mechanism and calling convention for the procedure object.
Examples:
ProcPtr: procedure; @returns( "eax" );
ProcI : procedure( i:int32 ); @returns( "esi" );
ProcIF: procedure( i:int32; f:real64 ); @returns( "st0" );
id : pointer to procedure id2;
Defines a pointer type that
points at a procedure. id2 must be a previously declared procedure. id inherits all the parameters and
procedure options of procedure id2.
Examples:
procedure xyz( a:byte; b:word; c:dword ); @returns( "eax" );
.
.
.
type
    p : pointer to procedure xyz;
endtype;
The declaration for p is equivalent to:
type
    p : procedure( a:byte; b:word; c:dword ); @returns( "eax" );
endtype;
Note that the phrase "pointer to procedure xyz" does not imply that p must point at xyz; it only means that p points at a procedure whose prototype is identical to xyz's.
Warning: this declaration is deprecated and may not appear
in a future version of HLA (e.g., HLA v2.0).
id1 : id2;
Defines a new type (id1) that
has the same characteristics as the specified type (id2). This is a type isomorphism; that is, you can rename a type.
Examples:
integer : int32;
float : real64;
double : float;
id1 : id2 [ dim_list ];
This declaration defines an array type. id1 is an array whose base type is id2 and whose number of elements and dimensions (arity) are specified by the dimension list (dim_list). Dim_list is a comma-separated list of one or more integer constant expressions.
Examples:
InpBufType : char[ 128 ];
Matrix3D : real32[ 4, 4 ];
ScreenType: char[ 25, 80 ];
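The following Python sketch is a quick sanity check of the storage these array types occupy and of where an individual element lives. It assumes row-major ordering, which is the convention for HLA's multidimensional arrays. It also assumes 1-byte char elements and 4-byte real32 elements. The helper names are hypothetical.

```python
from math import prod


def array_size(element_size, dims):
    """Total bytes: element size times the product of all bounds."""
    return element_size * prod(dims)


def row_major_offset(element_size, dims, indices):
    """Byte offset of arr[i0, i1, ...] from the start of the array (row-major)."""
    flat = 0
    for bound, index in zip(dims, indices):
        if not 0 <= index < bound:
            raise IndexError(f"index {index} out of range 0..{bound - 1}")
        flat = flat * bound + index
    return flat * element_size


print(array_size(1, [25, 80]))                  # ScreenType: 2000 bytes
print(array_size(4, [4, 4]))                    # Matrix3D: 64 bytes
print(row_major_offset(1, [25, 80], [24, 79]))  # last screen cell: 1999
print(row_major_offset(4, [4, 4], [1, 2]))      # Matrix3D[1,2]: 24
```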
id1 : union
field_declarations
endunion;
This declaration creates a
discriminate union type. The field
declarations can be anything that is legal in the var declaration section (see
the var section for details) including other composite types (records, unions,
arrays, pointers, etc.). HLA allows union constants, but only if all the fields are data types that may legally appear in a const declaration section (e.g., no pointer objects and no procedure objects). Unlike records, unions do not allow inheritance.
All objects within a union begin at the same base address in memory.
Examples:
FourBytes:
union
a4: uns8[4];
b2: uns16[2];
c1: uns32;
endunion;
Str: union
s:string;
cp: [char];
endunion;
Note that a union type
definition must have at least one field declaration or HLA will generate an
error.
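Because all fields of a union begin at the same address, writing one field and reading another reinterprets the same bytes. The following Python sketch models the FourBytes union with the standard struct module. It forces little-endian byte order to match the x86. It illustrates the memory overlap and is not HLA code.

```python
import struct

# Store a value through c1 (uns32); '<' forces x86 little-endian byte order.
storage = struct.pack("<I", 0x12345678)

a4 = struct.unpack("<4B", storage)    # the same 4 bytes viewed as uns8[4]
b2 = struct.unpack("<2H", storage)    # ... viewed as uns16[2]
(c1,) = struct.unpack("<I", storage)  # ... viewed as uns32

print([hex(x) for x in a4])  # ['0x78', '0x56', '0x34', '0x12']
print([hex(x) for x in b2])  # ['0x5678', '0x1234']

# The union is as large as its largest field.
field_sizes = {"a4": 1 * 4, "b2": 2 * 2, "c1": 4}
print(max(field_sizes.values()))  # 4
```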
id1 : record
field_declarations
endrecord;
id2 : record
inherits ( optional_base_type )
field_declarations
endrecord;
This declaration creates a
record type. The field
declarations can be anything that is legal in the var declaration section (see
the var section for details) including other composite types (records, unions,
arrays, pointers, etc.). HLA allows record constants, but only if all the fields are data types that may legally appear in a const declaration section (e.g., no pointer objects and no procedure objects). If the "inherits" reserved word and optional_base_type identifier are present, then the base type identifier must also be a record type and the
current record definition “inherits” all the fields from the base type (that
is, all of the base record’s fields are automatically included in the current record’s
definition).
Examples:
student:
record
name: string;
ID: char[11];
year: int8;
endrecord;
GradStudent:
record inherits (student )
ThesisTitle: string;
TA: boolean;
RA: boolean;
endrecord;
course:
record
instructor: string;
StudentCnt: int16;
CourseName: string;
CourseID: string;
endrecord;
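Inheritance places the base record's fields first, so a GradStudent starts with exactly the same layout as a student. The following Python sketch computes field offsets under two assumptions that you should check against your HLA version. First, fields are packed in declaration order with no padding. Second, a string field is a 4-byte pointer. The layout helper and the SIZES table are hypothetical.

```python
# Assumed sizes in bytes (string = 4-byte pointer; no padding between fields).
SIZES = {"string": 4, "int8": 1, "boolean": 1, "char[11]": 11}


def layout(fields, base=None):
    """Return (offsets, size); inherited fields are laid out first."""
    offsets, offset = {}, 0
    for name, type_name in (base or []) + fields:
        offsets[name] = offset
        offset += SIZES[type_name]
    return offsets, offset


student = [("name", "string"), ("ID", "char[11]"), ("year", "int8")]
grad_extra = [("ThesisTitle", "string"), ("TA", "boolean"), ("RA", "boolean")]

student_offsets, student_size = layout(student)
grad_offsets, grad_size = layout(grad_extra, base=student)
print(student_size, grad_size)      # 16 22
print(grad_offsets["ThesisTitle"])  # 16
```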
Record type declarations may
contain anonymous union
fields. An anonymous union field
is a union declaration without a preceding field name and colon. For example, consider the following
record definition:
vType : enum { integer, real, str, character };
variant:
record
DataType: vType;
union
i : int32;
r : real64;
s : string;
c : char;
endunion;
endrecord;
Anonymous union fields add
their field names to the list of names belonging to the outside record
type. For example, if you have a
variable “x” of type “variant” you could refer to the fields in the anonymous union
as x.i, x.r, x.s, and x.c.
Contrast this with the following record definition that would require
you to use the field names x.u.i, x.u.r, x.u.s, and x.u.c, respectively:
vType2 : enum{ integer, real, str, character };
variant2:
record
DataType: vType2;
u : union
i : int32;
r : real64;
s : string;
c : char;
endunion;
endrecord;
Note that a record definition
must have at least one field present or HLA will generate an error.
You may also declare classes
in a TYPE section. Please see the section on classes and object-oriented
programming later in this document for details.
16.12.3
Const Section
You may declare manifest constants in the CONST section of an HLA program. Manifest constants are named constant values; HLA can replace the name of a manifest constant by its actual value during the assembly process. The value of an HLA constant is bound at the moment HLA encounters the constant's declaration at assembly time, and a given constant can be given exactly one value (within the current scope) during assembly. It is illegal to attempt to change the value of a constant at some later point during assembly. Of course, at run-time the constant always has a static value.
Const objects can be one of the
following types:
Boolean, enumerated types,
Uns8, Uns16, Uns32, Byte, Word, DWord, Int8, Int16, Int32, Char, WChar,
Real32, Real64, Real80, String, WString, Cset, and Text.
Constants can also be arrays,
records, or unions as long as all elements/fields of these composite objects
are valid const objects.
The constant declaration
section begins with the reserved word const and is followed by a sequence of constant
definitions. The constant
declaration section ends when HLA encounters "endconst;" or a keyword such as const, type, var, val, etc. Although the use of endconst is optional, you should use it to ensure
compatibility with future versions of HLA and other tools. Actual constant
definitions take the forms specified in the following subsections.
id1: forward( id2 );
Defers the definition of
id1. See the description of forward in the TYPE section for more details.
id := expr;
Associates the value and type
of expr with the name id.
Future references to id within the current scope will use the value of
the expression in place of the identifier. If expr evaluates to an array constant, id is stored as a
single dimension array, even if you attempt a trick like declaring an array of
array expressions. If expr
consists of an array name, then id inherits the dimensions and type of the
specified array name. The
expression must be a constant expression whose value can be computed at the
point of this particular constant declaration (i.e., no forward declared
identifiers).
Examples:
u := 5;
i := -5;
i2 := u * i;
b := true;
c := 'a';
s := "string";
us := u"Unicode String";
a := [1,2,3,4];
id1 : id2 := expr;
This declaration defines a
constant, id1, of type id2, that is given the value of expr. The type of id2 and the expression must
be compatible. If id2 is an array
type, the expression must be an array constant with the same number of
elements; the arity (number of
dimensions) does not need to agree as long as the element count is the same.
Examples:
i8 : int8 := -5;
i16 : int16 := -6;
s : string := "Hello World!";
// Assume array4x4 is defined as "array4x4 : uns8[4,4]"
a : array4x4 := [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 ];
id1 : id2 [ dimension_list ] := expr2;
This declaration creates a
constant, id1, that is an array of type id2 with the size and arity specified
by id2 (if id2 is an array type) and the dimension_list (a comma-separated list
of array dimension sizes). The id1
constant is given the value of the array constant specified by expr2 (which
must have the same base type and number of elements, though not necessarily the
same shape, as id2[dimension_list]).
Examples:
i8a : int8[4] := [1,2,3,4];
a4x4: uns8[4,4] := [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 ];
// Assume array2x2 is defined as "array2x2:uns8[2,2]"
a2222 : array2x2[2,2] := a4x4;
a2222a : array2x2[2,2] := [ a2222[1], a2222[0] ];
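The element-count rule for this form is easy to state as a check. The following Python function is a hypothetical helper, not an HLA feature. It verifies that the number of elements in id2, times the product of the dimension_list, equals the number of values in expr2. This is why a2222 can be initialized from the 16-element a4x4 even though the two shapes differ.

```python
from math import prod


def check_array_constant(base_elements, dims, values):
    """id1 : id2[dims] := expr2 requires
    elements(id2) * product(dims) == number of values in expr2."""
    expected = base_elements * prod(dims)
    if len(values) != expected:
        raise ValueError(f"expected {expected} elements, got {len(values)}")
    return expected


a4x4 = list(range(1, 17))
# array2x2 is uns8[2,2] (4 elements), so array2x2[2,2] holds 16 elements.
print(check_array_constant(4, [2, 2], a4x4))   # 16
print(check_array_constant(1, [4], [1, 2, 3, 4]))  # i8a : int8[4] -> 4
```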
id1 : id2 [] := expr2;
This declaration creates a
constant, id1, that is an array of type id2 with the size and arity specified
by id2 (if id2 is an array type) and expr2. The id1 constant is given the value of the array constant specified
by expr2. This is an
"open-ended" array declaration that lets you specify an arbitrary
number of array elements without having to explicitly specify the bounds of the
array. You may determine the number of elements in the array using the @elements compile-time function. If id2 is an array type, then the number of elements in expr2 must be a whole multiple of the number of elements that id2 possesses. Such a declaration creates an array with one more dimension (arity) than id2, and the bound of the last dimension is numelements(expr2)/numelements(id2).
Examples:
// Creates int8[4] array:
i8a : int8[] := [1,2,3,4];
// Creates uns8[16] array:
a4x4: uns8[] := [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 ];
// Assume array2x2 is defined as "array2x2:uns8[2,2]"
// a2222 is uns8[2,2,4]:
a2222 : array2x2[] := a4x4;
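The shape of an open-ended declaration follows directly from the rule above. The following Python sketch models the rule as stated here and is not HLA's implementation. It appends the computed bound as the last dimension, which matches the uns8[2,2,4] result in the example.

```python
def open_ended_shape(base_dims, num_values):
    """Shape of id1 : id2[] := expr2.

    base_dims is id2's shape ([] for a scalar type). The element count of
    expr2 must be a whole multiple of id2's element count; the quotient
    becomes the bound of the added (last) dimension.
    """
    base_elements = 1
    for d in base_dims:
        base_elements *= d
    if num_values % base_elements != 0:
        raise ValueError(f"{num_values} is not a multiple of {base_elements}")
    return base_dims + [num_values // base_elements]


print(open_ended_shape([], 4))       # int8[] with 4 values   -> [4]
print(open_ended_shape([2, 2], 16))  # array2x2[] with 16 values -> [2, 2, 4]
```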
16.12.4
Val Section
HLA allows a second type of
constant declaration: the value declaration. The major difference between const and val symbols is that you can only bind a value to a const symbol once within
a given scope; you may, however,
bind different values to a val
identifier within the same scope.
At run-time, both const
and val objects have a constant
value (at least, at any given statement in the program). At compile time, however, it is better
to view const objects as
constants and val objects as
compile-time variables. The val declaration section begins with the reserved word val and continues until encountering "endval;", another declaration section, a program unit
(procedure, macro, etc.), or the begin reserved word. Although the use of endval is optional, you should get in the habit of using it to ensure compatibility of your source code with future versions of HLA. The following subsections describe the legal syntax of the
statements that may appear within the val section.
id1: forward( id2 );
Defers the definition of
id1. See the description of
forward in the TYPE section for more details.
id := expr;
Associates the value of the specified constant expression with the identifier on the left-hand side of the assignment operator. If id is already defined within the current scope, it
must have been defined as a val
object. In this case, the type and
value of the expression on the right hand side of the assignment operator
replaces the current value and type of id.
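The difference between const and val binding can be modeled with a tiny symbol table. In the following Python sketch, the Scope class and its method names are hypothetical. Redefining a const in the same scope raises an error. Assigning to a val replaces both its type and its value, as described above. Assigning to a name that already exists as a const is rejected.

```python
class Scope:
    """Toy assembly-time symbol table (hypothetical, for illustration only)."""

    def __init__(self):
        self.symbols = {}  # name -> (kind, type_name, value)

    def define_const(self, name, type_name, value):
        if name in self.symbols:
            raise NameError(f"{name} is already defined in this scope")
        self.symbols[name] = ("const", type_name, value)

    def assign_val(self, name, type_name, value):
        existing = self.symbols.get(name)
        if existing is not None and existing[0] != "val":
            raise NameError(f"{name} is not a val object")
        # A val assignment replaces both the type and the value.
        self.symbols[name] = ("val", type_name, value)


scope = Scope()
scope.define_const("MaxVal", "uns32", 10)
scope.assign_val("counter", "uns32", 0)
scope.assign_val("counter", "int8", -1)  # legal: val objects may be rebound
print(scope.symbols["counter"])          # ('val', 'int8', -1)
```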
id1 : id2;
This declaration defines object
id1 to be a value of type id2, but does not associate a value with it. HLA will actually assign a value that
roughly corresponds to zero to id1 (e.g., integer/unsigned zero, 0.0, false, #0, the empty string, the empty character set, etc.)
although you should not depend upon this initialization within the body of your
code. Id2 must be a type
identifier that is a legal constant (val) type. The primary purpose of this declaration is to give a particular
symbol a type when future assignments may not completely specify the type. For
example, a future assignment like “v := 5;” doesn’t really specify whether v is
unsigned, signed, or generic, 8, 16, or 32 bits, etc. By predeclaring v as “v:uns32;” you can eliminate this
ambiguity (HLA would default to uns32 in this case, but it is always better
programming style to explicitly state the type of a constant object).
id1 : id2 [ bounds_list ];
Declares an array named id1 whose base type is id2 with the specified number of dimensions and
elements (bounds_list is a list
of comma-separated constant expressions that specifies the size of the
array). HLA allocates storage for id1 (assembly-time storage) and initializes each
element to a value that approximates zero for the given type. The ultimate purpose for this
declaration is to allow you to fix the element type in the declaration section
and then assign appropriate values (that could be one of many different types,
e.g., uns8, uns16, or uns32) to the individual elements later in the code. If id1 already exists, the array declaration replaces its
current type and value(s).
Otherwise this declaration creates a new constant (val) object.
Examples:
a: int32[2,2,4];
b: Some_User_Type[5];
c: char[128];
d: cset[2];
id1 : id2 := expr;
This declaration defines id1 to be of type id2 and gives it the value of expr. Id2 must be a value type identifier (that is legal for
constants) and expr must be
type compatible with this type. If
id1 is currently undefined in
the current scope, HLA creates a new val object with the specified type and value. If id1 has
already been defined in the current scope, HLA replaces its value and type with
the type of id2 and the value
of expr; the previous value of id1 would be lost in this case.
Examples: (assume array is defined in a type section as
“array:uns8[2,2];”)
i : int8 := -5;
u : uns8 := 0;
a : array := [1,2,3,4];
id1 : id2 [ bounds_list ] := expr;
Declares id1 to be an array of type id2 with the number of dimensions and elements
specified by the bounds_list
comma-delimited list of array bounds;
this declaration also assigns the values of expr (which must be an array constant containing the
same number of elements as id1)
to id1. Id2 must be a valid constant (val) type.
If id1 is already
defined in the current scope, the new value of id1 replaces the old value.
Examples:
clrs : Colors[4] := [ red, green, green, yellow ]; // Assumes Colors is an enum type that includes yellow.
clrs2 : Colors[4] := clrs;
TwoByTwo : real32[2,2] := [1.0,4.0,2.5,3.0];
id1[ bounds_list ] := expr;
Id1 must be an array constant declared in a val section.
This statement replaces the current value of the specified element of id1 with the value of the expr. The
type of the expr must be
assignment compatible with the type of the array element. If id1 has more dimensions than specified in bounds_list, then the expr must be an array constant with the same number of
elements as the array slice selected from id1.
Examples:
clrs[0] := red;
clrs2[2] := blue;
TwoByTwo[0,0] := 0.0;
id1 : id2 [] := expr2;
This declaration creates a
constant, id1, that is an array of type id2 with the size and arity specified
by id2 (if id2 is an array type) and expr2. The id1 constant is given the value of the array constant
specified by expr2. This is an
"open-ended" array declaration that lets you specify an arbitrary
number of array elements without having to explicitly specify the bounds of the
array. You may determine the number of elements in the array using the @elements compile-time function. If id2 is an array type, then the number of elements in expr2 must be a whole multiple of the number of elements that id2 possesses. Such a declaration creates an array with one more dimension (arity) than id2, and the bound of the last dimension is numelements(expr2)/numelements(id2).
Examples:
// Creates int8[4] array:
i8a : int8[] := [1,2,3,4];
// Creates uns8[16] array:
a4x4: uns8[] := [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16 ];
// Assume array2x2 is defined as "array2x2:uns8[2,2]"
// a2222 is uns8[2,2,4]:
a2222 : array2x2[] := a4x4;
id1.fieldlist := expr;
Assigns the value of expr to the specified field of the record or union
constant id1. Expr must be assignment compatible with the specified
field. This val assignment replaces the current value of the
specified field in id1. Id1 must have been previously declared as a record or
union object.
Examples:
Pt.X := 0.0;
Pt.Y := 1.0;
Student.Name.Last := "Hyde";
Note: Of course, you can
extrapolate the array and field access for recursively nested structures (i.e.,
arrays of records and fields that contain arrays). For example, given the type definitions:
type
Name : record
Last : string;
First: string;
MI : char;
endrecord;
Student : record
SName : Name;
ProjectScores : uns16[8];
ID : uns32;
endrecord;
endtype;
And the following val
declaration:
val
Course: Student[58];
endval;
Then the following are examples
of legal statements in a val section (assuming the above are all still in
scope):
Course[0].SName.Last := "Hyde";
Course[0].SName.First := "Randy";
Course[0].SName.MI := 'L';
Course[0].ProjectScores[0] := 100;
Course[0].ProjectScores[1] := 65;
Course[0].ID := 555_12_5687;
Special Syntax for val objects: Because it is sometimes convenient to modify a value object outside the val section, HLA provides a special syntax that allows you to insert any legal val statement wherever white space is legal in the program. By preceding a val declaration with a question mark ("?"), you may embed a val statement anywhere in the program. This allows you to use macros and other HLA features to automatically generate unique code within other sections, by using HLA's string handling facilities and a value object to generate unique labels. Consider the following example:
val
    lblCntr : uns16 := 0;
endval;
const
    @text( "L" + string(lblCntr) ) : uns16 := lblCntr;
    ? lblCntr := lblCntr + 1;
    @text( "L" + string(lblCntr) ) : uns16 := lblCntr;
    ? lblCntr := lblCntr + 1;
    @text( "L" + string(lblCntr) ) : uns16 := lblCntr;
    ? lblCntr := lblCntr + 1;
    @text( "L" + string(lblCntr) ) : uns16 := lblCntr;
endconst;
The sequence above would
generate the statements:
L0 : uns16 := lblCntr;
? lblCntr := lblCntr + 1;
L1 : uns16 := lblCntr;
? lblCntr := lblCntr + 1;
L2 : uns16 := lblCntr;
? lblCntr := lblCntr + 1;
L3 : uns16 := lblCntr;
(Note that “@text” expands its
string parameter as text at the point “@text” appears in the program.)
As you can see in this example,
it was useful to be able to embed val statements within the const declaration section. Of
course, this example would have been a little more realistic had it used
macros, but that would have somewhat obfuscated the use of the val objects in this example.
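The following Python sketch simulates what the const section in the lblCntr example produces. It is only a model of the expansion. Each constant captures the value lblCntr holds when HLA processes that declaration, and the counter advances between declarations.

```python
def expand_label_consts(count):
    """Simulate the lblCntr const section (a model, not HLA itself)."""
    lbl_cntr = 0  # val lblCntr : uns16 := 0;
    lines = []
    for i in range(count):
        # @text( "L" + string(lblCntr) ) : uns16 := lblCntr;
        lines.append(f"L{lbl_cntr} : uns16 := {lbl_cntr};")
        if i < count - 1:
            lbl_cntr += 1  # ? lblCntr := lblCntr + 1;
    return lines, lbl_cntr


lines, final = expand_label_consts(4)
print("\n".join(lines))
# L0 : uns16 := 0;
# L1 : uns16 := 1;
# L2 : uns16 := 2;
# L3 : uns16 := 3;
print(final)  # 3
```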
The "?" operator is
actually HLA’s compile-time assignment statement (see the section on the HLA
compile-time language for more details on the compile-time language). In addition to the straightforward
assignment noted above (which is syntactically identical to an assignment in
the VAL section), the HLA compile-time assignment statement offers two
additional forms:
?Scalar += expression;
?Scalar -= expression;
These forms add or subtract the value of the expression on the right-hand side to/from the scalar variable (non-array/non-record) on the left-hand side of the "+=" or "-=" operator. C/C++ and Java programmers should be familiar with these operators. Note that HLA only allows scalar
variables on the left hand side of the operator. This isn’t a major limitation because 99% of the time you’ll
just be incrementing or decrementing compile-time variables (VAL objects) with
these operators. And you can
always use a statement of the form:
?CompositeObject := CompositeObject + expression;
or
?CompositeObject := CompositeObject - expression;
for non-scalar compile-time variables.
16.12.5
Var Section
HLA supports two basic types of
variables: static variables and automatic variables (automatic variables are
also known as semi-dynamic variables).
HLA assumes that automatic variables are allocated on the stack in
the activation record of the current program unit (e.g., a procedure); it assumes that static variables are
allocated in the static data area (e.g., the data segment). You declare automatic run-time
variables in the var portion of
an HLA declaration section.
Example:
var
i: int8;
u: uns16;
d: dword;
r: real64;
w: wchar;
endvar;
Unlike const and val objects, you cannot assign values to a var object during assembly. Therefore, the var declaration section is rather simple and straightforward: you can associate a data type with a name, and that's about it.
HLA assumes that all var
objects are allocated on the stack immediately below the frame pointer (EBP is
usually the frame pointer value in a typical assembly language program). For each variable in a program unit,
HLA subtracts the size of the object from the current variable offset and uses
the result value as the offset for the variable. For example, HLA would associate the following offsets with
each of the corresponding variables:
var
    i : int8;     // offset -1
    j : int16;    // offset -3 (-1 minus 2, the size of an int16, produces -3).
    k : int32;    // offset -7 (-3 minus 4, the size of an int32, produces -7).
    a : uns8[9];  // offset -16 (-7 minus the size [9 bytes] is -16).
    etc.
endvar;
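The offset rule for a var section without alignment options is simple enough to model in a few lines. The following Python sketch is a model, not HLA's implementation. Given each variable's size in bytes, it reproduces the offsets in the comments above.

```python
def assign_offsets(variables, start=0):
    """Unaligned var offsets: subtract each object's size from the running
    offset and use the result as that variable's offset."""
    offsets, current = {}, start
    for name, size in variables:
        current -= size
        offsets[name] = current
    return offsets


print(assign_offsets([("i", 1), ("j", 2), ("k", 4), ("a", 9)]))
# {'i': -1, 'j': -3, 'k': -7, 'a': -16}
```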
In addition to the syntax used
above, HLA provides some additional forms of the var declaration that let you control the alignment and offsets of the variables. The generic syntax is (braces surround optional items):
var { [ maxAlignment { : minAlignment } ] } { := startingOffset ; }
Here are some examples that
demonstrate all the possible forms:
var [ 4 ]
var [4:2]
var := -4;
var [4] := -4;
var [4:2] := -4;
The maxAlignment value specifies the largest boundary upon which
HLA will align all the variables in this particular var section.
For example, "var [4:2]" tells HLA to align all variables on
no greater than a double word boundary within the activation record.
The minAlignment value specifies the smallest boundary upon which
HLA will align all the variables in the particular var section. Note that you may not specify a minAlignment value without also specifying a maxAlignment value, though you may specify a maxAlignment value without a minAlignment value (in which case HLA uses the value you specify
for both the minAlignment and maxAlignment values). The default value,
if you do not specify an alignment value at all, is one for both minAlignment and maxAlignment.
When HLA processes the VAR
section it maintains an internal "current offset" variable. At the beginning of a procedure, HLA
initializes this value to zero[14]. As you declare automatic variables in
the var section, HLA drops
the value of this current offset variable by the size of the object and then
uses the new current offset value as the offset for that variable. For example, if a procedure has a
single declaration, as follows:
var
b:byte;
endvar;
HLA assigns the offset "-1" to b since b is one byte long (and zero minus one is "-1").
Whenever you specify alignment values, HLA aligns each object's offset within the activation record on a boundary equal to the object's size, on a minAlignment boundary if the object's size is less than minAlignment, or on a maxAlignment boundary if the object's size is greater than maxAlignment. For example, with the following declaration the alignment chosen will be two, since the object's size (one) is less than the minAlignment value:
var [4:2]
b:byte;
endvar;
Since HLA aligns b on an even
offset, b’s offset will be -2 rather than -1.
If the size of the object is
greater than the maxAlignment value, then HLA will align the object on a boundary that is a multiple
of the maxAlignment value. For example, the following declaration
aligns q on a boundary that is
an even multiple of four, but not necessarily an even multiple of eight (q is a quadword value):
var [4:2]
    b:byte;   // Offset = -2
    q:qword;  // Offset = -12
endvar;
If the size of an object falls
between the minAlignment and maxAlignment values, inclusive, then HLA will
align the object on an offset that is an even multiple of the object’s
size. The following declaration aligns
all objects on a boundary that is an even multiple of their size unless the
object is larger than eight bytes:
var [8:1]
    b:byte;   // offset = -1
    w:word;   // offset = -4
    d:dword;  // offset = -8
    b2:byte;  // offset = -9
    q:qword;  // offset = -24
endvar;
One very important thing to note about these offsets: the fact that an offset is aligned on a particular boundary provides no guarantee that the object is aligned on that same boundary in memory. These offsets are relative to the value in EBP, and if the value in EBP is not aligned on the largest boundary you specify in a var section, then the variable will not be aligned on the desired address. Generally (though this is certainly not guaranteed), the stack is aligned on a double-word boundary (and, therefore, EBP usually is as well). So you can probably count on alignments up to four, but anything larger will require special coding on your part (within the procedure that needs the alignment) to guarantee memory address alignment on a larger alignment boundary.
Note that HLA resets the minAlignment and maxAlignment values back to zero at the start of each var section.
So if you do the following, only the variables appearing in the first var section obey the alignment:
var [4]
a:byte; // offset = -4
b:dword; // offset = -8
endvar;
var
c:byte; // offset = -9
d:dword; // offset = -13
endvar;
Also, remember that you can
change the alignment of a single variable by using the align directive in the var section.
The align
directive only temporarily changes the alignment for a single variable
declaration. Immediately after the
variable, the minAlignment/maxAlignment values again control the alignment, e.g.,
var [4:2]
b:byte; // offset -2
w:word; // offset -4
d:dword; // offset -8
align(1);
b2:byte; // offset -9
d2:dword; // offset -16 -- Goes back to [4:2] alignment.
endvar;
The other optional item you can
attach to a var
declaration is a starting offset for the section. You specify it by following the var reserved word (or the alignment option) with the
assignment operator (":="), an integer constant expression, and a
semicolon, e.g.,
var := -8;
<< declarations >>
endvar;
var [4:2] := -8;
<< declarations >>
endvar;
As noted earlier, HLA will normally
use a starting offset of zero when it encounters the first var declaration
section in a procedure. HLA
subtracts the size of an object from the current offset prior to assigning the
offset to the variable. The offset
assignment option, above, lets you choose a different starting value. Also note that the first declaration
after such a var clause
will use the offset assigned; it
will not first subtract its size, e.g.,
var := -8;
b:byte; // offset -8 (not -9!)
c:char; // offset -9
w:word; // offset -11
endvar;
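Here is a minimal Python sketch of the explicit starting-offset rule. It assumes no alignment option is in effect; the text does not describe how the two options interact, so the sketch ignores alignment. The function name var_offsets_from is hypothetical.

```python
# Hypothetical model of "var := start;" as described in this section.
# Alignment is ignored here; only the starting-offset rule is modeled.

def var_offsets_from(start, decls):
    """Return {name: offset}; the first object sits at start itself."""
    offsets = {}
    current = None
    for name, size in decls:
        if current is None:
            current = start      # first declaration: no size subtracted
        else:
            current -= size      # later declarations: subtract their size
        offsets[name] = current
    return offsets


print(var_offsets_from(-8, [("b", 1), ("c", 1), ("w", 2)]))
# {'b': -8, 'c': -9, 'w': -11}
```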
If a procedure contains two
var declaration sections, then the second will continue to use the current offset
value at the point of its declaration unless it has an explicit offset value,
e.g.,
var := -8;
b:byte; // offset -8 (not -9!)
endvar;
var
c:char; // offset -9
endvar;
var
w:word; // offset -11
endvar;
var := -8;
AlsoB:byte; // offset -8 (alias to "b" above)
endvar;
Probably the only sane reason
for playing around with the starting offset in the var section is that you’re not going to build a
standard activation record and access your automatic variables by indexing off EBP. If
you aren’t constantly pushing and popping data throughout the execution of your
procedure, you might be able to index all your locals off ESP and save having to preserve, set up, and restore EBP in your procedure.
Statements in the var section can take one of the following forms:
align( expr );
As noted above, this
temporarily sets the alignment for the next variable you declare in the current
var section (this
alignment will not carry over into another var section later in the procedure).
id1: forward( id2 );
Defers the definition of
id1. See the description of
forward in the TYPE section for more details.
id1 : [ id2 ];
Declares id1 to be a pointer to an object of type id2.
Since HLA generates code for the 32-bit flat model, pointers are always 32-bit
offsets. Hence, HLA always
reserves exactly four bytes for a pointer object (regardless of what type the
variable is pointing at). If id2 is not defined at the point of id1’s declaration, then id2 must be defined before the end of the current
program unit (that is, id2 must
be defined in the same scope as id1).
id : procedure; options
id : procedure ( parameter_list ); options
These declarations define a
procedure pointer variable. Like
other pointers, procedure pointers are four-byte objects. If HLA encounters id as a statement in
the main body of a program unit, it will automatically emit an indirect
call through this pointer variable.
See the section on procedures for the syntax of a valid parameter
list. The legal procedure options
include @pascal, @cdecl, @stdcall, and @returns. Note
that the @pascal, @cdecl, and @stdcall
options are mutually exclusive.
See the section on procedure declarations for a discussion of all these
options.
id : enum { enum_list };
Declares id to be an enumerated data type whose run-time values can be
one of the identifiers appearing in the enum_list (a comma-separated list of identifiers given the
consecutive values 0, 1, 2, etc.).
By default, HLA reserves one byte of storage for enumerated data types.
id1 : id2 ;
Declares the variable id1 to be an object of type id2.
Allocates enough space for id1 to hold a value of type id2. Id2 must be defined at the point of id1’s declaration.
id1 : id2 [ expr ] ;
Id1 is an array whose elements are of type id2.
There will be expr elements in this array (expr is a constant expression that HLA computes at
assembly time). HLA allocates
sufficient storage for the array in the activation record and associates the
lowest address of this block of memory with the symbol id1 (i.e., the base address of the array).
id1 : record
field_definitions
endrecord;
This declaration declares an
automatic variable that is a record type.
See the description of records in the section on type declarations for
more details. HLA computes an
offset for id1 that will
reserve sufficient space in the activation record for the specified record
data.
id1 : union
field_definitions
endunion;
This declaration declares an
automatic variable that is a union type.
See the description of unions in the section on type declarations for
more details. HLA computes an
offset for id1 that will
reserve sufficient space in the activation record for the largest object in the
union.
Note that you cannot declare
class variables directly in the var section.
You must define a class type in the type section and then declare a
variable of the specified type.
16.12.6 Static Section
The static section is syntactically similar to the var section except it begins with the reserved word
“static” rather than “var”. One
difference is that the static section only allows a single alignment form:
static ( const_expr )
<< declarations >>
endstatic;
This declaration will align the
next declaration on the boundary specified. The value of const_expr should be 1, 2, 4, 8, 10, or 16. Warning: this feature is
deprecated. Use the align
directive instead. This will be
changed in a future version of HLA (to match the alignment options for records
and the var section). Code that uses this syntax will break
at that time.
You can also use the align directive within the static section to force the alignment of the next
variable you declare. This
directive uses the following syntax:
align( constant );
The constant value should be 1,
2, 4, 8, 10, or 16. This is the
preferred way to align a single static variable declaration to a particular
boundary in the static
section.
HLA assumes that all static objects are allocated in a global data area (e.g.,
the data segment). For each
variable in a program unit, HLA allocates storage for the object in successive
memory locations in the global segment.
For example, HLA would associate the following offsets with each of the
corresponding variables (assuming no other static objects at this point):
static
i : int8; // offset 0
j: int16; // offset 1 (The size of an int8 [1] produces an offset of one).
k:int32; // offset 3 (the size of the previous variables).
a:uns8[9]; // offset 7 (the size of the previous variables).
etc.
endstatic;
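Static offsets grow upward from the start of the segment rather than downward from EBP. The following Python sketch reproduces the offsets in the example above, assuming no other static objects. It treats align( n ) as rounding the current offset up to a multiple of n relative to the segment start. static_offsets is a hypothetical helper, not part of HLA.

```python
# Hypothetical model of offset assignment in a static section:
# each object starts where the previous one ended.

def static_offsets(decls, start=0):
    """decls: (name, size) pairs; ("align", n) rounds the offset up to n."""
    offsets = {}
    current = start
    for name, value in decls:
        if name == "align":
            # Round up to the next multiple of value, like align( n );
            current = -(-current // value) * value
            continue
        offsets[name] = current
        current += value          # the next object follows this one
    return offsets


# i:int8; j:int16; k:int32; a:uns8[9];
print(static_offsets([("i", 1), ("j", 2), ("k", 4), ("a", 9)]))
# {'i': 0, 'j': 1, 'k': 3, 'a': 7}
```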
Unlike objects in the var section, static variables can be initialized
during assembly. The syntax is
similar to that used by the val section, e.g.,
static
i : int8 := -2; // Initializes i with $FE when program loads into memory.
j: int16 := 20; // Initializes j with 20.
k:int32 := 0; // Initializes k with zero.
a:uns8[9] := [0, 1, 2, 3, 4, 5, 6, 7, 8]; // Initialize array with specified values.
oea:byte[] := [1,2,3,4]; // Creates an open-ended array (byte[4]).
endstatic;
Each of the values used to
initialize static variables must be constants or constant expressions. Note that the initialization only
occurs once, when the program is loaded into memory. Static initialization that occurs inside a procedure does not imply that initialization occurs on each call of
the procedure.
Open-ended array declarations
(e.g., oea in the example above) are particularly useful for creating tables and
other objects in memory that you want to initialize but don’t want to have to
count the actual elements by hand (you can always apply the @elements
compile-time function to a static array to determine the actual number of
elements the array possesses).
To initialize procedure
variables (i.e., procedure pointers) you would normally take the address of a
procedure using the "&" (static address-of) operator. Here’s the syntax for procedure
variables in the STATIC section:
id : procedure; options optional_external
id : procedure ( parameter_list ); options optional_external
id : procedure := &procedure_name; options optional_external
id : procedure ( parameter_list ) := &procedure_name; options optional_external
The options are the same as for procedure declarations in the
VAR section with the addition of
certain "variable options" you’ll read about in a later section (see Variable Options). The optional_external clause is either @EXTERNAL or @EXTERNAL( "external_name" ).
Within the body of a procedure
or program you may also embed static variable declarations using the
static..endstatic directives. E.g.,
mov( 0, ax );
static
i:int32;
endstatic;
mov( ax, bx );
Note that HLA still inserts the
variables into the data segment area.
The variable "i"
in the example above is not inserted into the machine code between the two MOV
instructions. The object code for
the two MOV instructions is adjacent in the emitted code. The principal reason for having the static..endstatic section is to allow macros to create static
variables on the fly (unfortunately, there is no good way to generate automatic
[var] variables within the
middle of the code, so this only works for static objects).
Variables appearing in the
static section are always initialized.
If you do not specify an initial value, HLA automatically initializes
the variable with zero.
In general, you can assume that
variables you declare in the same static section (static, readonly, storage, or segment) are adjacent to one another in memory. HLA, the back-end assembler, and the
linker will typically assign higher memory addresses to variables declared
later in the same static section as other variables. You may not, however, make any assumptions about variables
declared in different static sections, even if those static sections are
adjacent to one another in the source code. I.e.,
static
i: int32;
j: int32;
endstatic;
static
k: int32;
endstatic;
You can assume that i and j are adjacent (and j immediately follows i in memory).
You cannot assume anything about the placement of k with respect to i or j. The k variable could come before or after i and j, and there could be other objects between
them. Note that the adjacency of
objects in HLA v2.0 may not be the same as v1.x, so you should not count on the
adjacency of variables in v1.x if you can help it.
You can also place "unlabelled" data
values into the static data section.
Unlabelled data objects take the following form:
typeID list_of_constants ;
TypeID must be a
predeclared type identifier (e.g., a predefined type like dword or a type
you’ve declared in the type section).
The list_of_constants
component must be a comma separated list of one or more constant items. Each constant in the list must be the
type specified by typeID. Examples:
type
eType: enum {e, f, g};
endtype;
static
eVar: eType:= e;
eType e, e, f, g, f, e;
pStr: byte := 11;
byte "Hello There";
endstatic;
Assuming enums are one byte
objects (the default), these declarations create an array of seven eType
objects and a "Pascal" string consisting of a length byte followed by
the specified number of characters.
The example above shows that
string literals may appear in a byte statement.
This does not output an HLA string constant; instead, it simply outputs
the sequence of characters in the string with no extra data (i.e., no length
values and no zero terminating byte).
If you need these, you can manually add them.
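A Pascal string's length byte must equal the number of characters that follow it, and "Hello There" contains 11 characters. The hypothetical Python helper below builds the corresponding byte sequence: a length byte followed by the raw characters, with no terminator. It only illustrates the layout.

```python
# Hypothetical helper showing the bytes of a "Pascal" string:
# one length byte, then the characters, and no terminating zero.

def pascal_string(text: str) -> bytes:
    data = text.encode("ascii")
    if len(data) > 255:
        raise ValueError("a one-byte length prefix holds at most 255")
    return bytes([len(data)]) + data


s = pascal_string("Hello There")
print(s[0], len(s))   # 11 12
```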
Initialized string constants
store the pointer to the specified string in the static segment and the actual
string data in a special (inaccessible to you) segment. Therefore, if you have a declaration
like the following:
static
s:string := "hello";
endstatic;
The string variable s consists
of a single dword pointer. This
pointer, initialized to point at the string data, is created in the static
segment in memory. The actual
characters, along with the two length dwords and zero terminating byte
associated with HLA strings, are stored in the "strings" memory
segment. The upshot of this is
that you cannot overwrite the character data of a string variable allocated in this fashion. If you absolutely, positively, must be
able to overwrite literal string constants at run-time (a very poor practice),
you can achieve this as follows:
static
s: string := &sss;
dword 5; // MaxStrLen value.
dword 5; // length value
sss: byte := 'h';
byte "ello", 0,0,0;
endstatic;
Note that some HLA library
routines assume that the string data is an even multiple of four bytes
long. Hence the extra zeros
(padding) in this example.
Also note that string literals appearing in a byte directive do not
output HLA style strings. This
example also demonstrates that you can assign a pointer constant ("&sss" above) to a
string variable. This is legal
because, after all, strings in HLA are really nothing more than pointers to the
actual data. Note that this same
discussion applies to unicode
objects (Unicode strings), the only difference being that HLA reserves two
bytes for each initialized character in the string.
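The layout spelled out by the hand-built example can be checked with a short Python sketch using the standard struct module. It assumes little-endian dwords, as on the x86, and follows the example: a MaxStrLen dword, a length dword, the characters, a zero terminating byte, and zero padding to a multiple of four bytes. The string pointer refers to the first character, eight bytes past the start of the block. hla_string_block is a hypothetical name used only for illustration.

```python
import struct

# Hypothetical sketch of the in-memory layout of an HLA string,
# following the hand-built "sss" example: MaxStrLen dword, length dword,
# the characters, a zero terminator, then zero padding to a multiple of 4.

def hla_string_block(text: str, max_len=None):
    chars = text.encode("ascii")
    if max_len is None:
        max_len = len(chars)
    body = chars + b"\x00"                 # zero-terminating byte
    body += b"\x00" * (-len(body) % 4)     # pad character data to 4 bytes
    header = struct.pack("<II", max_len, len(chars))
    block = header + body
    pointer_offset = len(header)           # string pointer -> first char
    return block, pointer_offset


block, ptr = hla_string_block("hello")
print(len(block), ptr)   # 16 8
```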
Like the VAR section, you may
use the forward clause to defer the definition of a symbol
in the STATIC section, e.g.,
id1: forward( id2 );
Defers the definition of
id1. See the description of
forward in the TYPE section for more details.
The STATIC section supports a
special syntax that lets you associate an address and type with a variable
without actually reserving any storage for that object. That syntax is as follows:
id: type; @nostorage;
The address of the variable id
is the same address as whatever declaration happens to follow in memory
(generally the next declaration in the STATIC section). This is quite useful for creating
aliases:
ValAsWord: word ; @nostorage;
ValAsDword: dword;
In this example, ValAsWord and ValAsDword both refer to the same memory location because no
storage is actually associated with the ValAsWord identifier.
Another use of the @nostorage option is to create an arbitrary table of values
using the unlabelled data feature of the STATIC section, e.g.,
MyTable: dword; @nostorage;
dword 0, 1, 2, 3;
This example creates an array
of data with four dwords.
16.12.7 Segments
Segments were a
Win32-only feature that has been removed from HLA starting with HLA v1.102. The
following discussion has been left in this documentation just in case you see
any existing code using segments or in case segments make a return in a future
version of HLA. For now, ignore
this section.
Note: Segments will change
dramatically in HLA v2.0. You
should avoid using segments in an HLA v1.x program if future compatibility with
HLA is desired.
Although HLA does not support
80x86 segmentation, it does allow you to create your own named segments in the
variable declaration section[15]. The primary purpose for segments is to
allow you to create named segments in memory with special names for interface
to high level languages and other code that expects a certain segment name or
alignment type. The general syntax
for a segment declaration is the following
segment segmentID ( alignment, "class" );
<< static declarations >>
segmentID is the
name of the segment you wish to create.
This must be either a unique identifier in the program or the name of an
existing segment. Note that
segment names are not lexically scoped.
That is, segment names are global even if you define them inside a
procedure. If you define multiple
segment sections with the same name, HLA combines them all into the same memory
segment.
The alignment parameter must be one of the following: byte, word, dword, para, or page.
This option defines the alignment boundary in memory for the start of
the segment. This value should be
greater than or equal to the largest align value you specify within the segment
(e.g., use PARA if you have an ALIGN(16) directive).
The class string specifies the combine class for this
segment. This is usually the
segmentID enclosed within quotes, but you can specify a common class for several
different segments and the linker will combine these segments together during
the link phase. "data"
is a good combination string if you want your segments merged with the HLA
static data in the STATIC section.
See the section on Segment
Names a little later in this document for more details on the SEGMENT
directive.
Following the segment
statement, up to the next VAR, STATIC, etc., statement come the variable
declarations for this particular segment.
The segment section accepts the same declarations as the STATIC section.
16.12.8 Readonly Section
The readonly section is another
section where you may declare static variables. The syntax is very similar to the static declaration with
the following three differences:
You use the
"readonly" reserved word rather than "static" to begin the
declarations.
All variables you declare in a
readonly section must have an initializer (except for @external or @nostorage
objects).
Any attempt to write to the
variable at run-time will produce a run-time error[16].
Any variable you declare in a
readonly section winds up in the READONLY segment in memory. E.g., consider the following code:
readonly
s: string := "hello world";
i: int32 := 10;
wc:wchar := u’w’;
endreadonly;
The READONLY section lets you
emit unlabelled data within the segment.
Unlabelled data consists of a type name followed by a comma-separated
list of constants of the specified type and a terminating semicolon. E.g., "int32 0, 1, 2, 3;"
emits four dwords containing the values zero, one, two, and three at the
current point in the readonly segment.
See the discussion in
"Static Section"
for more details.
Like the static section, you
can specify the alignment of the first declaration by specifying the alignment
value within parentheses after the readonly keyword:
readonly(4)
AlignedOn4: uns32 := 32;
endreadonly;
However, this feature is being
deprecated and you should not use it.
Instead, you should use the align directive as in the static section.
Like the static section, you may use the @nostorage option to define a name without actually
allocating storage.
HLA also provides a readonly..endreadonly block that may appear in the code segment. Variables you declare in such a section
are moved to the readonly
segment in memory. E.g.,
mov( 0, ax );
readonly
ro:int32 := 10;
endreadonly;
mov( ax, bx );
16.12.9 Storage Section
The storage section is yet
another static variable declaration section. Unlike the static section, however, you cannot initialize
variables in the storage section; it simply reserves storage for uninitialized
variables. Note that variables
declared in the storage section go into the "bss" segment in memory,
so they are in a different segment than variables you declare in the static or
readonly sections.
Example:
storage
i:uns32;
j:int8;
endstorage;
Like the static section, you
can specify the alignment of the first declaration by specifying the alignment
value within parentheses after the storage keyword:
storage(4)
AlignedOn4: uns32;
endstorage;
Again, like the static and
readonly sections, this feature is deprecated and will go away soon. You should use the "align" directive instead.
Note that it is not legal to
put unlabelled objects in the storage section. Unlabelled data objects may only appear in a declaration
section that supports initialization (i.e., static or
readonly). However, the
@nostorage option is
perfectly legal in the storage
section.
Also note that open-ended
arrays are not possible in the storage section because open-ended array
declarations require an initializer in order to determine the number of
elements in the array.
16.12.10 Variable Options
The syntax for the declarations
appearing in the previous sections is not totally complete. Variable declarations in the static, readonly, and storage sections also allow certain options following the
declarations. This section
discusses those options.
A typical declaration in one of
the static sections (static, readonly, or storage) takes the following form:
varname : vartype; options
The previous sections discuss
the varname and vartype components; they are not particularly interesting
to us in this section. Of interest
is the (optional) options
component. This is a sequence of
zero or more keywords that provide the HLA compiler with additional information
about these symbols.
Actually, there are three types
of options that may follow a variable in one of the static sections, depending
on the type of the variable. These
are variable options (proper), procedure
options (for procedure variables),
and the @external
option. If multiple types of
options appear after a variable declaration, they must appear in this order
(variable, procedure, @external). However, within one of these sets of
options, the order of the individual options is irrelevant (e.g., the order of
the @nostorage and @volatile options within the variable options section
doesn’t matter). Here are the
option types:
Variable Options:
• @nostorage;
• @volatile;
Procedure Options:
• @pascal;
• @cdecl;
• @stdcall;
• @returns( "string" );
External Option:
• @external;
• @external( "string" );
The procedure options may only
appear after a procedure variable;
these options are not legal following other types of variable objects.
16.12.10.1 The @NOSTORAGE Option
The @nostorage option tells HLA to associate the current offset
in the segment with the specified variable, but don’t actually allocate any
storage for the object. This
option effectively creates an alias of the current variable with the next
object you declare in one of the static sections. Consider the following example:
static
b: byte; @nostorage;
w: word; @nostorage;
d: dword;
endstatic;
Because the b and w variables both have the @nostorage option associated with them, HLA does not reserve
any storage for these variables.
The d variable
does not have the @nostorage
option, so HLA does reserve four bytes for this variable. The b and w variables, since they don’t have storage
associated with them, share the same address in memory with the d variable.
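Because b, w, and d share one address, reading b or w simply views the low-order bytes of d. This works because the x86 stores multi-byte values in little-endian order. The Python sketch below uses the standard struct module to stand in for those four bytes of memory; the value 0x12345678 is an arbitrary example.

```python
import struct

# Four bytes of memory holding the dword d = 0x12345678 (little-endian,
# as on the x86). b and w are @nostorage aliases at the same address.
memory = struct.pack("<I", 0x12345678)

d = struct.unpack_from("<I", memory, 0)[0]   # dword view
w = struct.unpack_from("<H", memory, 0)[0]   # word view: low 16 bits
b = struct.unpack_from("<B", memory, 0)[0]   # byte view: low 8 bits

print(hex(b), hex(w), hex(d))   # 0x78 0x5678 0x12345678
```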
Note that it is not legal to
supply an initializer to a variable that has the @nostorage option.
I.e., the following is illegal:
IllegalDeclaration: byte := 5; @nostorage;
This should be obvious since an initializer supplies
initial data for the variable’s storage, yet the @nostorage option implies that no such storage exists.
The @nostorage option is legal in the readonly section.
As noted above, however, you cannot supply an initial value for an
object when specifying the @nostorage option. Normally, though,
declarations in the readonly
section require an initializer.
HLA will allow a readonly
variable declaration without an initializer if the @nostorage option appears. This lets you create aliases in the readonly section, e.g.,
readonly
alias: byte; @nostorage;
aliased: byte := 0;
endreadonly;
Both alias and aliased refer to the same value in memory (zero in this
case).
Note to long-time HLA users
(and those reading code written by long-time HLA users). HLA v1.25 and earlier supported a
fourth static variable declaration section, DATA. As of HLA v1.26 this static section no longer exists. In the DATA section, all variables had
an implied "@nostorage"
option associated with them. This
section was removed after the @nostorage option was added to the language since the DATA section is
superfluous. If you find a DATA
section in some HLA code, simply change it to a static section and attach the @nostorage option to all variables appearing in that section.
16.12.10.2 The @VOLATILE Option
The @volatile option is the second variable option. Currently, HLA ignores (though allows)
this variable option. The purpose
of this option is to tell the compiler that a variable’s value can change
unexpectedly due to hardware access to this object or via modification by a
different thread of execution. An
optimizer would use this information to take special care when manipulating
volatile objects. However, since
HLA v1.x does not support an optimizer (that is slated for v2.x), HLA cannot
currently make use of this information.
Although HLA currently ignores
the @volatile option, you
should use it if a variable is indeed volatile. First, this is a good way to document the fact that the
variable’s value can change unexpectedly.
Second, when HLA v2.x finally begins to utilize this information, you
won’t have to go back and change your source code to accommodate the optimizer.
Example:
static
v: dword; @volatile;
endstatic;
Note: the @volatile option is legal in the var section as well as the static sections.
16.12.10.3 The @PASCAL, @CDECL, and @STDCALL Options
These three options are
procedure options and are only legal following a procedure variable
declaration. Remember that the @volatile or @nostorage options must appear before all procedure
options; so if you use one of
these three options along with one or more of the variable options, these
options must follow all the variable options.
The @pascal, @cdecl, and @stdcall options are mutually exclusive[17]. They define the calling sequence HLA
will use when calling the procedure variable you are declaring with these
options. If none of these options
appears, then HLA will assume the use of the pascal calling convention.
The @pascal calling convention pushes parameters in the order
of their declaration (left to right in the parameter list) and it is the
procedure’s responsibility to remove the parameters from the stack upon
return. The @cdecl calling convention pushes the parameters in the
opposite order of their declaration (right to left in the parameter list) and
it is the caller’s responsibility to remove the parameters from the stack when
the procedure returns. The @stdcall calling convention pushes the parameters in the
reverse order, like @cdecl, but
it is the procedure’s responsibility to remove the parameters (like the @pascal convention).
For more details, see 16.10.2 Procedure Declarations.
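These conventions differ in two ways: the order in which arguments are pushed and who removes the parameters from the stack. The small Python table below restates the paragraph above. It is only a summary, not an HLA API; CONVENTIONS and push_sequence are hypothetical names.

```python
# Restatement of the three calling conventions described above.
# Each entry: (push order, who removes the parameters from the stack).
CONVENTIONS = {
    "pascal":  ("left to right", "procedure"),
    "cdecl":   ("right to left", "caller"),
    "stdcall": ("right to left", "procedure"),
}


def push_sequence(convention, args):
    """Return the order in which args are pushed under a convention."""
    order, _ = CONVENTIONS[convention]
    return list(args) if order == "left to right" else list(reversed(args))


for conv in ("pascal", "cdecl", "stdcall"):
    print(conv, push_sequence(conv, ["a", "b", "c"]), CONVENTIONS[conv][1])
# pascal ['a', 'b', 'c'] procedure
# cdecl ['c', 'b', 'a'] caller
# stdcall ['c', 'b', 'a'] procedure
```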
16.12.10.4 The @RETURNS Option
As for procedure declarations (see 16.10.2 Procedure Declarations), the @returns option lets you specify a string
that HLA substitutes for a procedure invocation when using instruction
composition. For more details, see
The 8.
16.12.10.5 The @EXTERNAL Option
The @external option gives you the ability to reference static
variables that you declare in other files. Like the @external clause for procedures, there are two different syntaxes for the external
clause appearing after a variable declaration:
varName: varType; @external;
varName: varType; @external( "external_Name" );
The first form above uses the
variable’s name for both the internal and external names. The second form uses varName as the internal name that HLA uses and it
associates this variable with external_Name in the external modules. The @external
option is always the last option associated with a variable declaration. If other options (like @nostorage or @stdcall) also appear, they must appear before the @external clause.
Don’t forget that all external names in an HLA program must be
compatible with the assembly code that HLA emits. For example, if you’re emitting MASM code, you must not use
any MASM reserved words for your external symbols.
You may only attach the
external clause to static objects (those you declare in a static, readonly, or storage
section). Automatic (var) variables can never be external. Note that, unlike external procedures,
you may declare external variables at any lexical scope level. You can even declare (static) objects
in a class to be external.
Of course, if you declare an
object to be external, you are making a promise to HLA that you will define
that variable in a different object module. If you do not, then the linker will complain about an "unresolved
external" when it attempts to link your modules together.
If the actual variable
definition for an external object appears in a source file after an external
declaration, this tells HLA that the definition is a public variable that other
modules may access (the default is local to the current source file). This is the only way to declare a
variable to be public so that other modules can use it. Usually, you would put the external
declaration in a header file that all modules (wanting to access the variable)
include; you also include this
header file in the source file containing the actual variable declaration. Note that HLA scoping rules still
apply, so if you put the external declaration at one lex level and the variable
definition at a different lex level, HLA will treat them as separate objects,
e.g.,
static i:int32; @external;
procedure HideI;
static i:int32; // Not the same I as above!
begin HideI;
.
.
.
end HideI;
.
.
.
You cannot place an external
declaration after a variable definition in the source file; HLA will complain about a duplicate
defined symbol if you do. HLA will
also complain if an external definition of a variable appears twice in a source
file.
16.12.11 Segment Names
Note: segment names are a
deprecated feature and have been removed.
16.12.12 Namespaces
A namespace declaration takes
one of the following forms:
namespace identifier;
<< declarations >>
end identifier;
namespace identifier; @fast;
<< declarations >>
end identifier;
To access an identifier
declared in a namespace, you would preface the identifier with the name of the
namespace and a dot (similar to a record, class, or union reference). Using namespaces lets you reuse common
identifiers for different purposes (e.g., the HLA Standard Library string and
standard out namespaces both redefine the symbol "put" and you access
their particular symbols using the "stdout.put" and
"str.put" names).
Within a namespace, you
normally may only access other identifiers defined previously in that same
namespace. Since you may sometimes
need to access other identifiers (especially namespace’d identifiers) outside
the current namespace, a special lexeme has been added to the language to provide
access to global objects: "@global:identifier".
This form tells HLA to ignore any local symbols (in the current
namespace) and only look outside the current namespace for the specified
identifier.
If you declare a second
namespace using the same namespace identifier as a previous namespace, then HLA
will append those names to the end of the existing namespace. This only applies if the new namespace
identifier is at the same lex level (the same scope) as the previous namespace. I.e., if you create a local namespace
in a procedure using the same name as a global namespace, then the normal rules
of scope apply and the new namespace is local to that procedure and overrides
the global definition.
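The lookup rules described above can be modeled with Python dictionaries. The model covers three behaviors: qualified names such as stdout.put, a same-named namespace at the same lex level extending the existing one, and @global: directing the search outside the current namespace. This is a hypothetical sketch of the behavior only. It says nothing about HLA's internal hashed symbol tables, and all names in it are illustrative.

```python
# Hypothetical model of HLA namespace behavior using dictionaries.
global_scope = {"put": "global put"}
namespaces = {}


def declare_namespace(name, symbols):
    """A second namespace with the same name (same lex level) appends."""
    namespaces.setdefault(name, {}).update(symbols)


def qualified(ns, ident):
    """Resolve ns.ident, e.g. stdout.put."""
    return namespaces[ns][ident]


def lookup(ident, current_ns=None, force_global=False):
    """Inside a namespace, only that namespace is searched unless
    @global: (force_global=True) redirects the search outside it."""
    if current_ns is not None and not force_global:
        return namespaces[current_ns][ident]   # KeyError if undefined
    return global_scope[ident]


declare_namespace("stdout", {"put": "stdout put"})
declare_namespace("str", {"put": "str put"})
declare_namespace("stdout", {"newln": "stdout newln"})   # appended

print(qualified("stdout", "put"), "|", qualified("stdout", "newln"))
print(lookup("put", "str"), "|", lookup("put", "str", force_global=True))
```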
Namespaces have a couple of useful properties. Besides the obvious solution to the namespace pollution problem, HLA namespaces use a different symbol table search algorithm than the rest of the system. This search algorithm (a hashing algorithm) is much faster than the standard search algorithm. Therefore, searching for symbols in a large namespace (one that contains lots of symbols) is much more efficient than searching for symbols in the standard HLA global namespace. For example, on a 300 MHz Pentium II, it takes over 40 seconds to assemble an empty source file that includes all the Win32 API declarations at the global level; it takes only about two seconds to assemble that same source file when you include all the Win32 declarations in a namespace. Therefore, namespaces are great for library header file declarations and other such objects that you include, wholesale, in a typical assembly language program.
The "@fast" namespace option increases the speed of assembly even more. This option tells HLA not to bother checking for duplicate symbols within a namespace (which can consume a fair amount of time in a large namespace like the one that encapsulates the Win32 declarations). You should never use this option during the development of a namespace (that is, while you’re making changes to the namespace); otherwise, HLA won’t report any duplicate symbol errors within the namespace. The "@fast" attribute is really intended for debugged library modules that will not change very frequently but will be included often in other assembly source files. If you ever make a change to a namespace that has an "@fast" attribute attached to it, you should temporarily comment out the "@fast" and do a quick compile to verify that you didn’t introduce any errors into the namespace that HLA would miss because of the "@fast" option.
There are a couple of known
issues in the HLA v1.x implementation of namespaces. Because of the design of
HLA v1.x, it is unlikely such issues will have a resolution before HLA
v2.0. One problem is that pointer
types must reference a symbol within the current namespace or a built-in
type; e.g., the following is
currently not legal in HLA (though, logically, it should be):
type
    usertype :int32;

namespace n;
    static
        v:pointer to usertype;   //Illegal
        u:pointer to int32;      //This is okay (built-in type)
    endstatic;
end n;
The solution is to create a
type within the namespace that is an isomorphism for the external type:
type
    usertype :int32;
endtype;

namespace n;
    type
        utype : @global:usertype;
    endtype;
    static
        v:pointer to utype;      //This is now legal
        u:pointer to int32;      //This is okay (built-in type)
    endstatic;
end n;
16.13 Class Data Types
HLA supports object-oriented
programming via the class data type.
A class declaration takes the following form:
class
<< declarations >>
endclass;
Classes allow const, val, var, static, readonly, storage (uninitialized static data), procedure, method, iterator, and macro declarations. In general, just about everything allowed in a program declaration section except types, segments, and namespaces is legal in a class declaration.
Unlike C++ and Object Pascal, where class declarations are nearly identical to record/struct declarations, HLA class declarations are noticeably different from HLA records because you supply const, var, static, etc., declaration sections within the class. As an example, consider the following HLA class declaration:
type
    SomeClass: class
        var
            i:int32;
        const
            pi:=3.14159;
        method incrementI;
    endclass;
Unlike records, you must put each declaration into an appropriate section. In particular, data fields must appear in a static, readonly, storage, or var section.
Note that the body of a
procedure or method does not appear in the class declaration. Only prototypes (forward declarations)
appear within the class definition itself. The actual procedure or method is declared elsewhere in the
code.
16.13.1 Classes, Objects, and Object-Oriented Programming in HLA
HLA provides support for object-oriented programming via classes, objects, and automatic method invocation. Indeed, supporting method calls requires HLA to violate an important design principle (that HLA-generated code does not disturb values in any registers except ESP and EBP). Nevertheless, supporting object-oriented programming and automatic method calls was so important that an exception was made in this instance. But more on that in a moment.
It is worthwhile to review the syntax for a class declaration. First of all, class declarations may only appear in a type section within an HLA program. You cannot define classes in the VAR, STATIC, STORAGE, or READONLY sections, and HLA does not allow you to create class constants[18]. Within the TYPE section, a class declaration takes one of the following forms:
type
    baseClass:
        class
            Declarations, including const, val, var, and static sections, as
            well as procedure and method prototypes, and macros.
        endclass;

    derivedClass:
        class inherits( baseClass )
            Declarations, including const, val, var, and static sections, as
            well as procedure and method prototypes, and macros.
        endclass;
Note that you may not include type sections or namespaces in a class. Allowing type sections in a class creates some special problems (having to do with the possibility of nested class definitions). Namespaces are illegal because they allow type sections internally (and there is no real need for namespaces within a class).
Note that you may only place procedure, iterator, and method prototypes in a class definition. Procedure and method prototypes look like a forward declaration without the forward reserved word; they use the following syntax:
procedure procName(optional_parameters); options
method methodName(optional_parameters); options
iterator iterName(optional_parameters); optional_external
"procName", "iterName", and "methodName" are the names you wish to assign to these
program units. Note that you do not preface these names with the name of the class and
a period.
If the procedure, iterator, or method has any parameters, they immediately follow the procedure/iterator/method name, enclosed in parentheses. The parentheses must not be present if there are no parameters. A semicolon immediately follows the parameters, or the procedure/method name if there are no parameters.
Class procedure and method
prototypes allow two options: a @RETURNS clause and/or an @EXTERNAL
clause. The @pascal,
@cdecl, @stdcall, @nodisplay
and @noframe options are not
allowed in the prototype. See the
section on procedures for more details on the @returns and @external clauses.
The iterator only allows the @external option.
Unlike procedures and methods,
if you define a macro within a class you must supply the body of the macro
within the class definition.
Consider the following example
of a class declaration:
type
    baseClass:
        class
            var
                i:int32;
            procedure create; @returns( "esi" );
            procedure geti; @returns( "eax" );
            method seti( ival:int32 ); @external;
        endclass;
By convention, all classes
should have a class procedure named "create".
This is the constructor for the class. The create procedure should return a pointer to the class
object in the ESI register, hence the
@returns( "esi" ); clause in this example.
This class includes two accessor functions, geti and seti, that provide access to the class variable "i". Note that HLA classes do not support the public, private, and protected visibility options found in HLLs like C++ and Delphi. HLA’s design assumes that assembly language programmers are sufficiently disciplined that they will not access fields that should be private[19].
Of course, the class’
procedures and methods must be defined at one point or another. Here are some reasonable examples of
these class definitions (a full explanation will appear later):
procedure baseClass.create; @nodisplay; @noframe;
begin create;
    push( eax );
    if( esi = 0 ) then
        malloc( @size( baseClass ));
        mov( eax, esi );
    endif;
    mov( &baseClass._VMT_, this._pVMT_ );   // store the VMT's address
    pop( eax );
    ret();
end create;

procedure baseClass.geti; @nodisplay; @noframe;
begin geti;
    mov( this.i, eax );
    ret();
end geti;

method baseClass.seti( ival:int32 ); @nodisplay;
begin seti;
    push( eax );
    mov( ival, eax );
    mov( eax, this.i );
    pop( eax );
end seti;
These procedure and method
declarations look almost like regular procedure declarations with one important
difference: the class name and a period precede the procedure or method name on
the first line of the procedure/method declaration. Note, however, that only the procedure or method name
appears after the BEGIN and END clauses.
Another important difference is the procedure options. Only the @nodisplay/@display, @noalignstack/@alignstack, and @noframe/@frame options are legal here (the converse of the class procedure/method prototype definitions, which only allow @external and @returns). Note that class procedures, methods, and iterators do not support the @pascal, @cdecl, or @stdcall procedure options (they always use the Pascal calling convention).
Class procedures and methods must be defined at the same lex level and within the same scope as the class declaration. Usually class declarations are at lex level zero (i.e., inside the main program or within a unit), so the corresponding procedure and method declarations must appear at lex level zero as well. Of course, it is perfectly legal to declare a class type within some other procedure (at lex level one or higher). If you do this, the class procedure and method declarations must appear at the same level.
16.13.2 Inheritance
HLA classes support inheritance using the INHERITS reserved word. Consider the following class declaration that inherits the fields from the baseClass declaration in the previous section:
derivedClass:
class inherits( baseClass )
var
j:int32;
f:real64;
endclass;
This class inherits all the fields from baseClass and adds two new fields, j and f. This declaration is roughly equivalent to:
derivedClass:
    class
        var
            i:int32;
        procedure create; @returns( "esi" );
        procedure geti; @returns( "eax" );
        method seti( ival:int32 ); @external;
        var
            j:int32;
            f:real64;
    endclass;
It is "roughly"
equivalent because there is no need to create the derivedClass.create and derivedClass.geti procedures or the derivedClass.seti method.
This class inherits the procedures and methods written for baseClass along with the field definitions.
Like records, it is possible to "override" the VAR fields of a base class in a derived class. To do this, you use the OVERRIDES keyword. Note that this keyword is valid only for VAR fields in a class; you may not override static objects with this keyword. Example:
derivedClass:
    class inherits( baseClass )
        var
            overrides i: dword;   // New copy of i for this class.
            j:int32;
            f:real64;
    endclass;
Occasionally, you may want to
override a procedure in a base class.
For example, it is very common to supply a new constructor in each
derived class (since the constructor may need to initialize fields in the derived
class that are not present in the base class). The override[20] keyword tells HLA that you intend to supply a new
procedure or method declaration and you do not want to call the corresponding
functions in the base class.
Consider the following modifications to derivedClass that override the create procedure and seti method:
derivedClass:
    class inherits( baseClass )
        var
            j:int32;
            f:real64;
        override procedure create;
        override method seti;
    endclass;
When you override a procedure
or method, you are not allowed to specify any parameters or procedure options
except the @external
option. This is because the
parameters and @returns
strings must exactly match the declarations in the base class. So even though seti in this derived
class doesn’t have an explicit parameter declared, the "ival" parameter is still required in a call to seti.
Of course, once you override
procedures and methods in a derived class, you must provide those program units
in your code. Here is an example
of a section of a program that provides overridden procedures and methods along
with their declarations:
type
base: class
var
i:int32;
procedure create;
method geti;
method seti( ival:int32 );
endclass;
derived:class inherits( base )
var
j:int32;
override procedure create;
override method seti;
method getj;
method setj( jval:int32 );
endclass;
procedure base.create; @nodisplay; @noframe;
begin create;
push( eax );
if( esi = 0 ) then
malloc( @size( base ));
mov( eax, esi );
endif;
mov( &base._VMT_, this._pVMT_ );
mov( 0, this.i );
pop( eax );
ret();
end create;
method base.geti; @nodisplay; @noframe;
begin geti;
mov( this.i, eax );
ret();
end geti;
method base.seti( ival:int32 ); @nodisplay;
begin seti;
push( eax );
mov( ival, eax );
mov( eax, this.i );
pop( eax );
end seti;
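procedure derived.create; @nodisplay; @noframe;
begin create;
push( eax );
if( esi = 0 ) then
malloc( @size( derived ));   // allocate a derived object, not a base object
mov( eax, esi );
endif;
// Do any initialization done by the base class. The object call passes
// the current ESI along; a non-object call ("base.create();") would load
// ESI with NULL and allocate a second object.
(type base [esi]).create();
// Do our own specific initialization.
mov( &derived._VMT_, this._pVMT_ );
mov( 1, this.j );
// Return
pop( eax );
ret();
end create;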
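method derived.seti( ival:int32 ); @nodisplay;
begin seti;
push( eax );
mov( ival, eax );
// Call the inherited base.seti code directly. A call such as
// "(type base [esi]).seti( ival );" still dispatches through this
// object's VMT (derived's), re-enters derived.seti, and recurses forever.
push( ival );              // pass ival manually (Pascal convention)
mov( &base.seti, edi );    // address of the base class implementation
call( edi );               // ESI still holds "this"
// Now handle the code that we do specially.
mov( eax, this.j );
// Okay, return to caller.
pop( eax );
end seti;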
method derived.setj( jval:int32 ); @nodisplay;
begin setj;
push( jval );
pop( this.j );
end setj;
method derived.getj; @nodisplay; @noframe;
begin getj;
mov( this.j, eax );
ret();
end getj;
16.13.3 Abstract Methods
Sometimes you will want to
create a base class as a template for other classes. You will never create instances (variables) of this base
class, only instances of classes derived from this class. In object-oriented terminology, we call
this an abstract class. Abstract classes may contain certain
methods that will always be overridden in the derived classes. Hence, there is no need to actually
supply the method for this base class.
HLA, however, always checks to verify that you supply all methods
associated with a class.
Therefore, you normally have to supply some sort of method, even if it’s
just an empty method, to satisfy the compiler. In those instances where you really don’t need such a
method, this is an annoyance.
HLA’s abstract methods
provide a solution to this problem.
You declare an abstract method
in a class declaration as follows:
type
    c: class
        method absMethod( parameters: uns32 ); @abstract;
    endclass;
The @ABSTRACT keyword must follow the @RETURNS option
if the @RETURNS option is present.
The @ABSTRACT keyword tells HLA
not to expect an actual method associated with this class. Instead, it is the responsibility of
all classes derived from "c" to override this method. If you attempt to call an abstract
method, HLA will raise an exception and abort program execution.
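Here is a hedged sketch of the abstract-method pattern as a complete program. The class names shape and square, the field side, and the method area are hypothetical; the program follows the constructor and VMT conventions described later in this chapter (the create procedure, the _pVMT_ initialization, and the VMT statement) and uses the Standard Library malloc routine, as the other examples here do. Note that there is no shape.area body: because area is abstract in shape, HLA does not expect one, and square supplies the override.

```hla
program AbstractDemo;
#include( "stdlib.hhf" )

type
    shape: class
        procedure create; @returns( "esi" );
        method area; @returns( "eax" ); @abstract;   // no shape.area body
    endclass;

    square: class inherits( shape )
        var
            side:int32;
        override procedure create;
        override method area;
    endclass;

static
    VMT( shape );
    VMT( square );

procedure shape.create; @nodisplay; @noframe;
begin create;
    push( eax );
    if( esi = 0 ) then
        malloc( @size( shape ));
        mov( eax, esi );
    endif;
    mov( &shape._VMT_, this._pVMT_ );
    pop( eax );
    ret();
end create;

procedure square.create; @nodisplay; @noframe;
begin create;
    push( eax );
    if( esi = 0 ) then
        malloc( @size( square ));
        mov( eax, esi );
    endif;
    mov( &square._VMT_, this._pVMT_ );
    mov( 3, this.side );
    pop( eax );
    ret();
end create;

// square must supply area; shape does not.
method square.area; @nodisplay; @noframe;
begin area;
    mov( this.side, eax );
    intmul( this.side, eax );     // EAX := side * side
    ret();
end area;

static
    s: pointer to square;

begin AbstractDemo;
    square.create();              // non-object call: allocates a square
    mov( esi, s );
    s.area();                     // dispatched through square's VMT
    stdout.put( "area=", (type int32 eax), nl );
end AbstractDemo;
```

Because s points at a square object, the method call goes through square's VMT and should leave side*side (here 9) in EAX. A shape object, by contrast, would reach the abstract entry and raise the exception described above.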
16.13.4 Classes versus Objects
An object is an instance of a class. In plain English, this means that a
class is only a data type while an object is a variable whose type is some
class type. Therefore, actual
objects may be declared in the var or static section of a program. Here are a couple of typical
examples:
var
b: base;
static
d: derived;
Each of these declarations
reserves storage for all the data in the specified class type.
For reasons that will shortly
become clear, most programmers use pointers to objects rather than directly declared
objects. Pointer declarations look
like the following:
var
ptrToB: pointer to base;
static
ptrToD: pointer to derived;
Of course, if you declare a
pointer to an object, you will need to allocate storage for the object (call
the HLA Standard Library "malloc" routine) and initialize the pointer variable with the address of
the allocated storage. As you will soon see, the class
constructor typically handles this allocation for you.
16.13.5 Initializing the Virtual Method Table Pointer
Whether you allocate storage
for an object statically (in the STATIC section), automatically (in the VAR
section), or dynamically (via a call to malloc), it is important to realize that the object is
not properly initialized and must be initialized before making any method
calls. Failure to do so will, most
likely, cause your program to crash when you attempt to call a method or access
other data in the class.
The first four bytes of every
object contain a pointer to that object’s virtual method table. The
virtual method table, or VMT, is an array of pointers to the code for each
method in the class. To help you
initialize this pointer, HLA automatically adds two fields to every class you
create: _VMT_ which
is a static dword entry (the significance of this being a static entry will
become clear later) and _pVMT_
which is a VAR field of the class whose type is pointer to dword. _pVMT_ is where you must put a pointer to the virtual
method table. The pointer value to
store here is the address of the _VMT_ entry.
This initialization can be done using the following statement:
mov( &ClassName._VMT_, ObjectName._pVMT_ );
ClassName
represents the name of the class and ObjectName represents the name of the STATIC or VAR variable
object. If you’ve allocated
storage for an object pointer using malloc, you’d use code like the following:
mov( ObjectPtr, ebx );
mov( &ClassName._VMT_, (type ClassName [ebx])._pVMT_ );
In this example, ObjectPtr represents the name of the pointer variable. ClassName still represents the name of the class type.
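The following sketch puts both forms together in a complete program that has no constructor, initializing _pVMT_ by hand for a STATIC object and for a malloc'd object before calling a method on each. The class tCounter, its count field, and the bump method are hypothetical names chosen for the illustration; the VMT statement that emits the table is described in the section "Creating the Virtual Method Table".

```hla
program ManualVMT;
#include( "stdlib.hhf" )

type
    tCounter: class
        var
            count:uns32;
        method bump;
    endclass;

static
    VMT( tCounter );                  // emit tCounter's virtual method table

method tCounter.bump; @nodisplay; @noframe;
begin bump;
    inc( this.count );
    ret();
end bump;

static
    sc: tCounter;                     // object allocated statically
    pc: pointer to tCounter;          // object allocated with malloc

begin ManualVMT;
    // Static object: store the VMT's address directly in its _pVMT_ field.
    mov( &tCounter._VMT_, sc._pVMT_ );
    mov( 0, sc.count );
    sc.bump();

    // Dynamic object: allocate storage, then initialize through a register.
    malloc( @size( tCounter ));
    mov( eax, pc );
    mov( pc, ebx );
    mov( &tCounter._VMT_, (type tCounter [ebx])._pVMT_ );
    mov( 10, (type tCounter [ebx]).count );
    pc.bump();

    stdout.put( "sc.count=", sc.count, " heap count=", (type tCounter [ebx]).count, nl );
end ManualVMT;
```

If the VMT pointer were left uninitialized, either bump call would jump through a garbage (or zero) pointer, which is the crash this section warns about.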
Typically, the initialization of the pointer to the virtual method table takes place in the class’ constructor procedure (it must be a procedure, not a method!). Consider the base.create procedure from the section on inheritance:
procedure base.create; @nodisplay; @noframe;
begin create;
push( eax );
if( esi = 0 ) then
malloc( @size( base ));
mov( eax, esi );
endif;
mov( &base._VMT_, this._pVMT_ );
mov( 0, this.i );
pop( eax );
ret();
end create;
As you can see here, this example uses "this._pVMT_" rather than "(type base [esi])._pVMT_". That’s because "this" is shorthand for using the ESI register as a pointer to an object of the current class type.
16.13.6 Creating the Virtual Method Table
For various technical reasons
(related to efficiency), HLA does not automatically create the virtual method
table for you; you must explicitly
tell HLA to emit the table of pointers for the virtual method table. You can do this in either the STATIC or
the READONLY declaration sections.
The simple way is to use a statement like the following in either the
STATIC or READONLY section:
VMT( classname );
If you need to be able to
access the pointers in this table, there are two ways to do this. First, you can refer to the "classname._VMT_" dword variable in the class. Another way is to directly attach a
label to the VMT you create using a declaration like the following:
vmtLabel: VMT( classname );
The "vmtLabel" label will be a static object of type dword.
If you intend to reference a
VMT outside the source file in which you declare it, you can use the @external
option to make the symbol accessible, e.g.,
VMT( classname ); @external;
Without this declaration, any
references of the form “classname._VMT_” will generate an error when you
attempt to build and link the application.
16.13.7 Calling Methods and Class Procedures
Once the virtual method table
of an object is properly initialized, you may call the methods and procedures
of that object. The syntax is very
similar to calling a standard HLA procedure except that you must prefix the
procedure or method name with the object name and a period. For example, assume you have some
objects with the following types ("base" is the type in the examples
of the previous sections):
var
b: base;
pb: pointer to base;
With these variable
declarations, and some code to initialize the pointers to the "base" virtual method table, the calls to the base procedures and methods might look like the
following:
b.create();
b.geti();
b.seti( 5 );
pb.create();
pb.geti();
pb.seti( eax );
Note that HLA uses the same
syntax for an object call regardless of whether the object is a pointer or a
regular variable.
Whenever HLA encounters a call
to an object’s procedure or method, HLA emits some code that will load the
address of the object into the ESI register. This is the one place HLA emits code that modifies the
value in a general purpose register!
You must remember this and
not expect to be able to pass any values to an object’s procedure or methods in
the ESI register. Likewise, don’t
expect the value in ESI to be preserved across a call to an object’s procedure or
method. As you will see
momentarily, HLA may also emit code that modifies the EDI register as well as
the ESI register. So don’t count on the value in EDI,
either.
The value in ESI, upon entry into the procedure or method, is that object’s "this" pointer. This pointer is necessary because the exact same object code for a procedure or method is shared by all object instances of a given class. Indeed, the "this" reserved word within a method or class procedure is really nothing more than shorthand for "(type ClassName [esi])".
Perhaps an obvious question is "What is the difference between a class procedure and a method?" The difference is the calling mechanism. Given an object b, a call to a class procedure emits a call instruction that directly calls the procedure in memory. In other words, class procedure calls are very similar to standard procedure calls with the exception that HLA emits code to load ESI with the address of the object[21]. Methods, on the other hand, are called indirectly through the virtual method table. Whenever you call a method, HLA actually emits three machine instructions: one instruction that loads the address of the object into ESI, one instruction that loads the address of the virtual method table (i.e., the first four bytes of the object) into EDI, and a third instruction that calls the method indirectly through the virtual method table. For example, given the following four calls:
b.create();
b.geti();
pb.create();
pb.geti();
HLA emits the following 80x86
assembly language code:
lea  esi, [ebp-12]             ;b
call ?8_create
lea  esi, [ebp-12]             ;b
mov  edi, [esi]
call dword ptr [edi+0]         ;geti
mov  esi, dword ptr [ebp-16]   ;pb
call ?8_create
mov  esi, dword ptr [ebp-16]   ;pb
mov  edi, [esi]
call dword ptr [edi+0]         ;geti
HLA class procedures roughly
correspond to C++’s static member functions.
HLA’s methods roughly correspond to C++’s virtual member functions. Read
the next few sections on the impact of these differences.
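The difference is easiest to see when the declared type of a pointer differs from the type of the object it points at. In the hypothetical sketch below, dog overrides both a class procedure (kindP) and a method (kindM) of animal, and pa is declared as a pointer to animal but is made to point at a dog. Following the calling mechanisms just described, pa.kindP() is a direct call chosen from pa's declared type, so it should reach animal.kindP, while pa.kindM() goes through the object's VMT and should reach dog.kindM. All names are illustrative.

```hla
program DispatchDemo;
#include( "stdlib.hhf" )

type
    animal: class
        procedure create; @returns( "esi" );
        procedure kindP;              // class procedure: direct call
        method kindM;                 // method: indirect call through the VMT
    endclass;

    dog: class inherits( animal )
        override procedure create;
        override procedure kindP;
        override method kindM;
    endclass;

static
    VMT( animal );
    VMT( dog );

procedure animal.create; @nodisplay; @noframe;
begin create;
    push( eax );
    if( esi = 0 ) then
        malloc( @size( animal ));
        mov( eax, esi );
    endif;
    mov( &animal._VMT_, this._pVMT_ );
    pop( eax );
    ret();
end create;

procedure dog.create; @nodisplay; @noframe;
begin create;
    push( eax );
    if( esi = 0 ) then
        malloc( @size( dog ));
        mov( eax, esi );
    endif;
    mov( &dog._VMT_, this._pVMT_ );
    pop( eax );
    ret();
end create;

procedure animal.kindP; @nodisplay;
begin kindP;
    stdout.put( "animal.kindP", nl );
end kindP;

method animal.kindM; @nodisplay;
begin kindM;
    stdout.put( "animal.kindM", nl );
end kindM;

procedure dog.kindP; @nodisplay;
begin kindP;
    stdout.put( "dog.kindP", nl );
end kindP;

method dog.kindM; @nodisplay;
begin kindM;
    stdout.put( "dog.kindM", nl );
end kindM;

static
    pa: pointer to animal;            // declared type: pointer to animal

begin DispatchDemo;
    dog.create();                     // non-object call: allocates a dog, ESI points at it
    mov( esi, pa );                   // pa now points at a dog object
    pa.kindP();                       // direct call picked from pa's type: animal.kindP
    pa.kindM();                       // call through the dog VMT: dog.kindM
end DispatchDemo;
```

This mirrors the C++ analogy above: the class procedure behaves like a static member function bound at assembly time, and the method behaves like a virtual member function bound at run time.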
16.13.8 Non-object Calls of Class Procedures
In addition to the difference
in the calling mechanism, there is another major difference between class
procedures and methods: you can call a class procedure without an associated
object. To do so, you would use
the class name and a period, rather than an object name and a period, in front
of the class procedure’s name.
E.g.,
base.create();
Since there is no object here
(remember, base is a type name, not a variable name, and types do not have any
storage allocated for them at run-time), HLA cannot load the address of the
object into the ESI register before calling create. This situation can create some big problems in your code if
you attempt to use the "this" pointer within a class procedure. Remember, an instruction like "mov( this.i, eax );" really expands to "mov( (type base [esi]).i, eax );". The question that should come to mind is "where is ESI pointing when one makes a non-object call to a class procedure?"
When HLA encounters a
non-object call to a class procedure, HLA loads the value zero (NULL) into ESI
immediately before the call. So ESI doesn’t contain junk but it does contain
the NULL pointer. If you attempt
to dereference NULL (e.g., by accessing "this.i") you will probably bomb the program. Therefore, to be really safe, you must
check the value of ESI inside your class procedures to verify that it does not
contain zero.
The base.create constructor procedure demonstrates a great way to
use class procedures to your advantage.
Take another look at the code:
procedure base.create; @nodisplay; @noframe;
begin create;
push( eax );
if( esi = 0 ) then
malloc( @size( base ));
mov( eax, esi );
endif;
mov( &base._VMT_, this._pVMT_ );
mov( 0, this.i );
pop( eax );
ret();
end create;
This code follows the standard
convention for HLA constructors with respect to the value in ESI. If ESI contains zero, this function
will allocate storage for a brand new object, initialize that object, and
return a pointer to the new object in ESI[22]. On the other hand, if ESI contains a
non-null value, then this function does not allocate memory for a new object,
it simply initializes the object at the address provided in ESI upon entry into
the code.
Certainly you do not want to use this trick (automatically allocating storage if ESI contains NULL) in all class procedures; but it’s still a really good idea to check the value of ESI upon entry into every class procedure that accesses any fields using ESI or the "this" reserved word. One way to do this is to use code like the following at the beginning of each class procedure in your program:
if( ESI = 0 ) then
raise( AttemptToDerefZero );
endif;
If this seems like too much
typing, or if you are concerned about efficiency once you’ve debugged your
program, you could write a macro like the following to solve your problem:
#macro ChkESI;
    #if( CheckESI )
        if( ESI = 0 ) then
            raise( AttemptToDerefZero );
        endif;
    #endif
#endmacro
Now all you’ve got to do is stick an innocuous "ChkESI" macro invocation at the beginning of your class procedures (maybe on the same line as the "begin" clause to further hide it) and you’re in business. By defining the boolean constant "CheckESI" to be true or false at the beginning of your code, you can control whether this "inefficient" code is generated into your programs.
16.13.9 Static Class Fields
There exists only one copy,
shared by all objects, of any static data objects in a class. Since there is only one copy of the
data, you do not access variables in the class’ static section using the object
name or the "this" pointer.
Instead, you preface the field name with the class name and a period.
For example, consider the following program, which demonstrates a very common use of static variables within a class:
program DemoOverride;
#include( "stdlib.hhf" )

type
    CountedClass:
        class
            static
                CreateCnt:int32 := 0;
            procedure create;
            procedure DisplayCnt;
        endclass;

static
    VMT( CountedClass );              // needed because create references _VMT_

procedure CountedClass.create; @nodisplay; @noframe;
begin create;
    push( eax );
    if( esi = 0 ) then
        malloc( @size( CountedClass ));
        mov( eax, esi );
    endif;
    mov( &CountedClass._VMT_, this._pVMT_ );
    inc( CountedClass.CreateCnt );    // static field: use the class name, not "this"
    pop( eax );
    ret();
end create;

procedure CountedClass.DisplayCnt; @nodisplay; @noframe;
begin DisplayCnt;
    stdout.put( "Creation Count=", CountedClass.CreateCnt, nl );
    ret();
end DisplayCnt;

var
    b: CountedClass;
    pb: pointer to CountedClass;

begin DemoOverride;
    CountedClass.DisplayCnt();
    b.create();
    CountedClass.DisplayCnt();
    CountedClass.create();
    mov( esi, pb );
    CountedClass.DisplayCnt();
end DemoOverride;
In this example, a static field
(CreateCnt) is incremented
by one for each object that is created and initialized. The DisplayCnt procedure prints the value of this static
field. Note that DisplayCnt does not access any non-static fields of CountedClass. This
is why it doesn’t bother to check the value in ESI for zero.
16.13.10 Taking the Address of Class Procedures, Iterators, and Methods
You can use the static
address-of operator ("&") to obtain the memory address of a class
procedure, method, or iterator by applying this operator to the class
procedure/method/iterator’s name with a classname prefix. E.g.,
type
c : class
procedure p;
method m;
iterator i;
endclass;
procedure c.p; begin p; end p;
method c.m; begin m; end m;
iterator c.i; begin i; end i;
.
.
.
mov( &c.p, eax );
mov( &c.m, ebx );
mov( &c.i, ecx );
Please note that when you apply
the address-of operator ("&") to a class
procedure/method/iterator you must specify the class name, not an object name,
as the prefix to the procedure/method/iterator name. That is, the following is illegal given the class definition
for c, above:
static
myClass: c;
.
.
.
mov( &myClass.p, eax );
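One use for such an address is an indirect call. The sketch below (a hypothetical class greeter with a class procedure hello) loads the procedure's address with the address-of operator and calls it through a register. Because this bypasses HLA's object-call code, ESI is not set for you; the example clears ESI to mimic a non-object call, which is only safe because hello never touches "this".

```hla
program AddrDemo;
#include( "stdlib.hhf" )

type
    greeter: class
        procedure hello;
    endclass;

procedure greeter.hello; @nodisplay;
begin hello;
    stdout.put( "hello from greeter.hello", nl );
end hello;

begin AddrDemo;
    mov( &greeter.hello, eax );   // static address, taken with the class name prefix
    xor( esi, esi );              // no object: mimic a non-object call (ESI = NULL)
    call( eax );                  // indirect call through EAX
end AddrDemo;
```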
16.14 Program Unit Initializers and Finalizers
HLA does not automatically call
an object’s constructor like C++ does.
Also, there is no code associated with a unit that automatically
executes to initialize that unit as in (Turbo) Pascal or Delphi. Likewise, HLA does not automatically
call an object’s destructor.
However, HLA does provide a mechanism by which you can automatically execute initialization and shut-down code without explicitly specifying the code to execute at the beginning and end of each procedure. This is handled via the HLA "_initialize_" and "_finalize_" strings. All programs, procedures, methods, and iterators have these two predeclared string constants (VALUE strings, actually) associated with them. Whenever you declare a program unit, HLA inserts these constants into the symbol table and initializes them with the empty string.
HLA expands the "_initialize_" string immediately before the first
instruction it finds after the "BEGIN" clause for a program,
procedure, iterator, or method.
Likewise, it expands the "_finalize_" string immediately before the END clause in
these program units. Since, by
default, these string constants hold the empty string, they usually have no
effect. However, if you change the
values of these constants within a declaration section, HLA emits the
corresponding code at the appropriate point. Consider the following example:
procedure HasInitializer;
?_initialize_ := "mov( 0, eax );";
begin HasInitializer;
stdout.put( "EAX = ",
eax, nl );
end HasInitializer;
When called, this procedure prints "EAX = 0000_0000" because the "_initialize_" string contains an instruction that moves zero into EAX.
Of course, the previous example
is somewhat irrelevant since you could have more easily put the MOV instruction
directly into the program. The
real purpose of initialize and finalize strings in an HLA program is to allow
macros and include files to slip in some initialization code. For example, consider the following
macro:
#macro init_int32( initValue ):theVar;
    forward( theVar );
    theVar: int32;
    ?_initialize_ := _initialize_ +
        "mov( " + @string:initValue + ", " + @string:theVar + " );";
#endmacro
Now consider the following
procedure:
procedure HasInitedVars;
var
i: init_int32( 0 );
j: init_int32( -1 );
k: init_int32( 1 );
begin HasInitedVars;
stdout.put( "i=", i,
" j=", j, " k=", k, nl );
end HasInitedVars;
The first
"init_int32" macro above expands to (something like) the following
code:
i: forward( _1002_ );
_1002_: int32;
?_initialize_ := _initialize_ +
    "mov( " + "0" + ", " + "i" + " );";
Note that the last statement is
equivalent to:
?_initialize_ := _initialize_ + "mov( 0, i );";
Also note that the text object _1002_ expands to "i".
If you take a step back from this code and look at it from a high-level perspective, you can see that what it does is initialize a VAR variable by emitting a MOV instruction that stores the macro parameter into the VAR object. This example makes use of the FORWARD declaration clause in order to make a copy of the variable’s name for use in the MOV instruction. The following is a complete program that demonstrates this example (it prints "i=1", if you’re wondering):
program InitDemo;
#include( "stdlib.hhf" )

#macro init_int32( initVal ):theVar;
    forward( theVar );
    theVar:int32;
    ?_initialize_ :=
        _initialize_ +
        "mov( " + @string:initVal + ", " + @string:theVar + " );";
#endmacro

var
    i:init_int32( 1 );

begin InitDemo;
    stdout.put( "i=", i, nl );
end InitDemo;
Note how this example uses
string concatenation to append an initialization string to the end of the
existing string. Although "_initialize_" and "_finalize_" start out as the empty string, there may be
more than one initialization string required by the program. For example, consider the following
modification to the code above:
var
i:init_int32( 1 );
j:init_int32( 2 );
The two macro invocations above
produce the initialization string "mov( 1, i );mov( 2, j );". Had the macro not used string
concatenation to attach its string to the end of the existing "_initialize_" string, only the last initialization
statement would have been generated.
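The effect of concatenation can be modeled outside HLA. The following C sketch is only an analogy: a fixed-size buffer named `initialize` (a hypothetical stand-in for `_initialize_`, sized to the 4,096-character guideline given below) starts out empty, and a hypothetical `init_int32` helper appends one MOV statement per call, just as each macro expansion appends to the compile-time string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for HLA's compile-time _initialize_ string;
   like _initialize_, it starts out as the empty string. */
static char initialize[4096] = "";

/* Models one init_int32 expansion: append (never overwrite) the text
   "mov( value, name );" so that earlier declarations are preserved. */
static void init_int32(const char *name, const char *value)
{
    size_t used = strlen(initialize);
    snprintf(initialize + used, sizeof initialize - used,
             "mov( %s, %s );", value, name);
}
```

Calling `init_int32("i", "1")` followed by `init_int32("j", "2")` leaves `initialize` holding `mov( 1, i );mov( 2, j );`. Replacing the append with a plain copy would keep only the last statement, which is exactly the failure described above.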
You can put any number of
statements into an initialization string, although the compiler tools used to
write HLA limit the length of the string to something less than 32,768
characters. In general, you should
try to limit the length of the initialization string to something less than
4,096 characters (this includes all initialization strings concatenated
together within a single procedure).
Two very useful applications of
the initialization string are automatic constructor invocation and unit
initialization code invocation.
Let’s consider the UNIT situation first. A unit may contain some code that you
need to execute to initialize the unit when the program first loads into
memory, e.g.,
unit NeedsInit;
#include( "NeedsInit.hhf" )
static
    i:uns32;
    j:uns32;
procedure InitThisUnit;
begin InitThisUnit;
    mov( 0, i );
    mov( 1, j );
end InitThisUnit;
    .
    .
    .
end NeedsInit;
Now suppose that the
"NeedsInit.hhf" header file contains the following lines:
procedure InitThisUnit; @external;
?_initialize_ := _initialize_ +
"InitThisUnit();";
When you include the header
file in your main program (that uses this unit), the statement above will
insert a call to the "InitThisUnit" procedure into the main program. Therefore, the main program will
automatically call the "InitThisUnit" procedure without the user of this unit
having to explicitly make this call.
You can use a similar approach
to automatically invoke class constructors and destructors in a procedure. Consider the following program that
demonstrates how this could work:
program InitDemo2;
#include( "stdlib.hhf" )
type
    _MyClass:
        class
            procedure create;
            var
                i: int32;
        endclass;
#macro MyClass:theObject;
    forward( theObject );
    theObject: _MyClass;
    ?_initialize_ := _initialize_ +
        @string:theObject +
        ".create();";
#endmacro
procedure _MyClass.create;
begin create;
    push( eax );
    if( esi = 0 ) then
        malloc( @size( _MyClass ) );
        mov( eax, esi );
    endif;
    mov( &_MyClass._VMT_, this._pVMT_ );
    mov( 12345, this.i );
    pop( eax );
end create;
procedure UsesMyClass;
var
    mc:MyClass;
begin UsesMyClass;
    stdout.put( "mc.i=", mc.i, nl );
end UsesMyClass;
static
    vmt( _MyClass );
begin InitDemo2;
    UsesMyClass();
end InitDemo2;
The variable declaration "mc:MyClass;" inside the UsesMyClass procedure (effectively) expands to the following
text:
mc: _MyClass;
?_initialize_ := _initialize_ +
"mc.create();";
Therefore, when the UsesMyClass procedure executes, the first thing it does is
call the constructor for the mc/_MyClass object. Notice that the
author of the UsesMyClass
procedure did not have to explicitly call this routine.
You can use the "_finalize_" string in a similar manner to automatically
call any destructors associated with an object.
Note that if an exception
occurs and you do not handle the exception within a procedure containing "_finalize_" code, the program will not execute the
statements emitted by "_finalize_" (any more than the program will execute any other statements
within a procedure that an exception interrupts). If you absolutely, positively, must ensure that the code
calls a destructor before leaving a procedure (via an exception), then you
might try the following code:
?_initialize_ :=
    _initialize_ +
    <<string to call constructor>> +
    "try ";
?_finalize_ :=
    _finalize_ +
    "anyexception push(eax); " +
    <<string to call destructor>> +
    "pop(eax); raise( eax ); endtry; " +
    <<string to call destructor>>;
This version slips a
TRY..ENDTRY block around the whole procedure. If an exception occurs, the ANYEXCEPTION handler traps it
and calls the associated destructor, then reraises the exception so the caller
will handle it. If an exception
does not occur, then the second call to the destructor above executes to clean
up the object before control transfers back to the caller.
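If you want to experiment with this control flow outside HLA, it can be sketched in C with `setjmp`/`longjmp`. This is only an analogy under stated assumptions: the handler stack, `raise_ex`, `destroy`, and `uses_object` are hypothetical names, and the fixed-size stack merely approximates the dynamically nested exception records that HLA's run-time system maintains. It is not HLA's actual implementation.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical handler stack: each active "try" pushes its jmp_buf. */
static jmp_buf *handler_stack[16];
static int handler_top = 0;
static int current_exception = 0;   /* plays the role of EAX in ANYEXCEPTION */
static int destructor_calls = 0;

/* Transfer control to the innermost active "try" (no bounds checking). */
static void raise_ex(int code)
{
    current_exception = code;
    longjmp(*handler_stack[handler_top - 1], 1);
}

static void destroy(void) { destructor_calls++; }  /* stand-in destructor */

/* A procedure whose body is wrapped the way the _initialize_/_finalize_
   strings above wrap it: "try" before the body, ANYEXCEPTION after it. */
static void uses_object(int fail)
{
    jmp_buf frame;
    /* ...the constructor call would go here... */
    handler_stack[handler_top++] = &frame;      /* try */
    if (setjmp(frame) == 0) {
        if (fail)
            raise_ex(42);                       /* the procedure body raises */
        handler_top--;                          /* normal exit from the try */
    } else {
        handler_top--;                          /* anyexception */
        destroy();                              /* destructor on the error path */
        raise_ex(current_exception);            /* re-raise to the caller */
    }
    destroy();                                  /* destructor on the normal path */
}
```

On both paths the destructor runs exactly once, and on the error path the original exception code still reaches the caller's handler.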
16.15
HLA High Level Language Statements
HLA provides several control
structures that provide a high level language flavor to assembly language
programming. The statements HLA
provides are
try..unprotect..exception..anyexception..endtry,
raise
if..then..elseif..else..endif
switch..case..default..endswitch
while..endwhile
repeat..until
for..endfor
foreach..endfor
forever..endfor
break, breakif
continue, continueif
begin..end, exit, exitif
JT
JF
These HLL statements provide
two basic improvements to assembly language programs: (1) they make many
algorithms much easier to read;
(2) they eliminate the need to create tons of labels in a program (which
also helps make the program easier to read).
Generally, these instructions
are "macros" that emit one or two machine instructions. Therefore, these instructions are not
always as flexible as their HLL counterparts. Nevertheless, they are suitable for about 85% of the uses
people typically have for these instructions.
Do keep in mind that even
though these statements compile to efficient machine code, writing assembly
language with an HLL mindset produces intrinsically inefficient programs. If speed or size is your number one
priority in a program, you should be sure you understand exactly which
instructions each of these statements emits before using them in your code.
The JT and JF statements are
actually "medium level language" statements. They are intended for use in macros
when constructing other HLL control statements; they are not intended for use
as standard statements in your program (not that they don’t work, they’re just
not true HLL statements).
Note: The FOREACH..ENDFOR loop
is mentioned above only for completeness.
The full discussion of the FOREACH..ENDFOR statement appears a little
later in the section on iterators.
16.15.1
Exception Handling in HLA
HLA uses the TRY..EXCEPTION..ENDTRY and RAISE statements to implement exception
handling. The syntax for these
statements is as follows:
try
<< HLA Statements to execute >>
<< unprotected // Optional unprotected section.
<< HLA Statements to execute >>
>>
exception( const1 )
<< Statements to execute if exception const1
is raised >>
<< optional exception
statements for other exceptions >>
<< anyexception //Optional anyexception section.
<< HLA Statements to execute >>
>>
endtry;
raise( const2 );
Const1 and const2 must be unsigned integer constants. Usually, these are values defined in
the excepts.hhf header file. Some
examples of predefined values include the following:
ex.StringOverflow
ex.StringIndexError
ex.ValueOutOfRange
ex.IllegalChar
ex.ConversionError
ex.BadFileHandle
ex.FileOpenFailure
ex.FileCloseError
ex.FileWriteError
ex.FileReadError
ex.DiskFullError
ex.EndOfFile
ex.MemoryAllocationFailure
ex.AttemptToDerefNULL
ex.WidthTooBig
ex.TooManyCmdLnParms
ex.ArrayShapeViolation
ex.ArrayBounds
ex.InvalidDate
ex.InvalidDateFormat
ex.TimeOverflow
ex.AssertionFailed
ex.ExecutedAbstract
Hardware-related exception
values:
ex.AccessViolation
ex.Breakpoint
ex.SingleStep
ex.PrivInstr
ex.IllegalInstr
ex.BoundInstr
ex.IntoInstr
ex.DivideError
ex.fDenormal
ex.fDivByZero
ex.fInexactResult
ex.fInvalidOperation
ex.fOverflow
ex.fStackCheck
ex.fUnderflow
ex.InvalidHandle
ex.StackOverflow
ex.ControlC
This list is constantly
changing as the HLA Standard Library grows, so it is impossible to provide a
complete list of standard exceptions at this time. Please see the excepts.hhf header file for a complete list
of standard exceptions. As this
was being written, the *NIX-specific exceptions (signals) had not been added to
the list. See the excepts.hhf file
on your *NIX system to see if these have been added. Note that not all OSes
support every hardware-related exception value.
The HLA Standard Library currently reserves
exception numbers zero through 1023 for its own internal use. User-defined exceptions should use an integer value
greater than or equal to 1024 and less than or equal to 65535 ($FFFF). Exception values $10000 and above are
reserved for use by the Windows Structured Exception Handler and *NIX
signals.
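As a quick illustration of these bands, the hypothetical C helper below classifies an exception number. A user-defined exception constant would normally be chosen so that this helper reports `EX_USER`.

```c
#include <assert.h>
#include <stdint.h>

enum ex_band { EX_STDLIB, EX_USER, EX_SYSTEM };

/* Classify an exception number according to the ranges described above. */
static enum ex_band classify_exception(uint32_t code)
{
    if (code <= 1023)
        return EX_STDLIB;   /* 0..1023: reserved by the HLA Standard Library */
    if (code <= 0xFFFF)
        return EX_USER;     /* 1024..$FFFF: available for user-defined exceptions */
    return EX_SYSTEM;       /* $10000 and above: Windows SEH and *NIX signals */
}
```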
The TRY..ENDTRY statement
contains two or more blocks of statements. The statements to protect immediately follow the TRY reserved word. If the protected
statements execute without raising an exception, then when the program reaches the first exception block, control
immediately transfers to the first statement following the ENDTRY reserved word; the program skips all the
statements in the exception blocks.
If an exception occurs during
the execution of the protected block, control immediately transfers to the
exception handling block that begins with the EXCEPTION reserved word and the
constant matching the type of exception raised.
Example:
repeat
    mov( false, GoodInput );
    try
        stdout.put( "Enter an integer value:" );
        stdin.get( i );
        mov( true, GoodInput );
    exception( ex.ValueOutOfRange )
        stdout.put( "Numeric overflow, please reenter ", nl );
    exception( ex.ConversionError )
        stdout.put( "Conversion error, please reenter", nl );
    endtry;
until( GoodInput = true );
In this code, the program will
repeatedly request the input of an integer value as long as the user enters a
value that is out of range (+/- 2 billion) or as long as the user enters a
value containing illegal characters.
TRY..ENDTRY statements can be nested. If
an exception occurs within a nested TRY protected block, the EXCEPTION blocks
in the innermost TRY block containing the offending statement get first shot at
the exception. If none of the
EXCEPTION blocks in that TRY..ENDTRY statement handle the specified
exception, then the next enclosing TRY..ENDTRY block gets a crack at the
exception. This process continues
until some exception block handles the exception or there are no more
TRY..ENDTRY statements.
If an exception goes unhandled,
the HLA run-time system will handle it by printing an appropriate error message
and aborting the program.
Generally, this consists of printing "Unhandled Exception" (or
a similar message) and stopping the program. If you include the excepts.hhf header file in your main
program, then HLA will automatically link in a somewhat better default
exception handler that will print the number (and name, if known) of the
exception before stopping the program.
Note that TRY..ENDTRY blocks
are dynamically nested, not statically nested. That is, a program must actually execute the TRY in order to
activate the exception handler.
You should never jump into the middle of a protected block, skipping
over the TRY. Doing so may produce
unpredictable results.
You should not use the
TRY..ENDTRY statement as a general control structure. For example, it will probably occur to someone that one
could easily create a switch/case selection statement using TRY..ENDTRY as
follows:
try
raise( SomeValue );
exception( case1_const)
<code for case 1>
exception( case2_const)
<code for case 2>
etc.
endtry
While this might work in some
situations, there are two problems with this code.
First, if an exception occurs
while using the TRY..ENDTRY statement as a switch statement, the results may be
unpredictable. Second, HLA’s
run-time system assumes that exceptions are rare events. Therefore, the code generated for the
exception handlers doesn’t have to be efficient. You will get much better results implementing a switch/case
statement using a table lookup and indirect jump (see the Art of Assembly)
rather than a TRY..ENDTRY block.
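For comparison, here is the table-lookup approach sketched in C. The case functions are hypothetical placeholders; the dispatcher performs one bounds check and one indirect call through an array of function pointers, which is the C analogue of an indirect JMP through a table of addresses. No exception machinery is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical case handlers; each returns a value so the result is testable. */
static int case0(void) { return 100; }
static int case1(void) { return 200; }
static int case2(void) { return 300; }
static int default_case(void) { return -1; }

/* Table of handler addresses, indexed by the selector value. */
static int (*const jump_table[])(void) = { case0, case1, case2 };
#define NUM_CASES (sizeof jump_table / sizeof jump_table[0])

static int dispatch(unsigned value)
{
    if (value >= NUM_CASES)      /* one bounds check ... */
        return default_case();
    return jump_table[value]();  /* ... then one indirect call */
}
```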
Warning: The TRY statement pushes data onto the stack upon
initial entry and pops data off the stack upon leaving the TRY..ENDTRY
block. Therefore, jumping into or
out of a TRY..ENDTRY block is an absolute no-no. As explained so far, then, there are only two reasonable
ways to exit a TRY statement: by falling off the end of the protected block or
by an exception (handled by the TRY statement or a surrounding TRY statement).
The UNPROTECTED clause in the TRY..ENDTRY statement provides a safe way to exit a TRY..ENDTRY
block without raising an exception or executing all the statements in the
protected portion of the TRY..ENDTRY statement. An unprotected section is a sequence of statements, between
the protected block and the first exception handler, that begins with the
keyword UNPROTECTED. E.g.,
try
<< Protected HLA Statements >>
unprotected
<< Unprotected HLA Statements >>
exception( SomeExceptionID )
<< etc. >>
endtry;
Control flows from the
protected block directly into the unprotected block as though the UNPROTECTED
keyword were not present. However,
between the two blocks HLA compiler-generated code removes the data pushed on
the stack. Therefore, it is safe
to transfer control to some spot outside the TRY..ENDTRY statement from within
the unprotected section.
If an exception occurs in an
unprotected section, the TRY..ENDTRY statement containing that section does not
handle the exception. Instead,
control transfers to the (dynamically) nesting TRY..ENDTRY statement (or to the
HLA run-time system if there is no enclosing TRY..ENDTRY).
If you’re wondering why the
UNPROTECTED section is necessary (after all, why not simply put the statements
in the UNPROTECTED section after the ENDTRY?), just keep in mind that both the
protected sequence and the handled exceptions continue execution after the
ENDTRY. There may be some
operations you want to perform outside the protection of the TRY statement, but only if the
protected block finished successfully.
The UNPROTECTED section provides this capability. Perhaps the most common use of the UNPROTECTED
section is to break out of a loop that repeats a TRY..ENDTRY block until it executes
without an exception occurring. The
following code demonstrates this use:
forever
    try
        stdout.put( "Enter an integer: " );
        stdin.geti8();    // May raise an exception.
    unprotected
        break;
    exception( ex.ValueOutOfRange )
        stdout.put( "Value was out of range, reenter" nl );
    exception( ex.ConversionError )
        stdout.put( "Value contained illegal chars" nl );
    endtry;
endfor;
This simple example repeatedly
asks the user to input an int8 integer until the value is legal and within the int8 range (-128..127).
Another clause in the
TRY..ENDTRY statement is the ANYEXCEPTION clause. If this clause is
present, it must be the last clause in the TRY..ENDTRY statement, e.g.,
try
    << protected statements >>
    <<
        unprotected
            Optional unprotected statements
    >>
    << exception( constant )   // Note: may be zero or more of these.
            Optional exception handler statements
    >>
    anyexception
        << Exception handler if none of the others execute >>
endtry;
Without the ANYEXCEPTION clause
present, if the program raises an exception that is not specifically handled by
one of the exception clauses, control transfers to the enclosing TRY..ENDTRY
statement. The ANYEXCEPTION clause gives a
TRY..ENDTRY statement the opportunity to handle any exception, even those that
are not explicitly listed. Upon
entry into the ANYEXCEPTION block, the EAX register contains the actual
exception number.
The HLA RAISE statement generates
an exception. The single parameter
is an 8, 16, or 32-bit ordinal constant.
Control is (ultimately) transferred to the first (most deeply nested)
TRY..ENDTRY statement that has a corresponding exception handler (including
ANYEXCEPTION).
If the program executes the RAISE statement within
the protected block of a TRY..ENDTRY statement, then the enclosing TRY..ENDTRY
gets first shot at handling the exception. If the RAISE statement occurs in an UNPROTECTED block, or in
an exception handler (including ANYEXCEPTION), then the next higher level
(nesting) TRY..ENDTRY statement
will handle the exception. This
allows cascading exceptions; that is,
exceptions that the system handles in two or more exception handlers. Consider the following example:
try
    << Protected statements >>
exception( someException )
    << Code to process this exception >>
    // The following re-raises this exception, allowing
    // an enclosing try..endtry statement to handle
    // this exception as well as this handler.
    raise( someException );
    << Additional, optional, exception handlers >>
endtry;
16.15.2
The IF..THEN..ELSEIF..ELSE..ENDIF Statement in HLA
HLA provides a limited
IF..THEN..ELSEIF..ELSE..ENDIF statement that can help make your programs easier
to read. For the most part, HLA’s
IF statement provides a convenient substitute for a CMP and a conditional
branch instruction pair (or chain of such instructions when employing
ELSEIF clauses).
The generic syntax for the HLA
if statement is the following:
if( conditional_expression ) then
    << Statements to execute if expression is true >>
endif;
if( conditional_expression ) then
    << Statements to execute if expression is true >>
else
    << Statements to execute if expression is false >>
endif;
if( expr1 ) then
    << Statements to execute if expr1 is true >>
elseif( expr2 ) then
    << Statements to execute if expr1 is false and expr2 is true >>
endif;
if( expr1 ) then
    << Statements to execute if expr1 is true >>
elseif( expr2 ) then
    << Statements to execute if expr1 is false and expr2 is true >>
else
    << Statements to execute if both expr1 and expr2 are false >>
endif;
Note: HLA’s IF statement allows
multiple ELSEIF clauses. All
ELSEIF clauses must appear between the IF clause and the ELSE clause (if present)
or the ENDIF (if an ELSE clause is not present).
See the next section for a
discussion of valid boolean expressions within the IF statement (this section
appears first because the section on boolean expressions uses IF statements in
its examples).
16.15.3
Boolean Expressions for High-Level Language
Statements
The primary limitation of HLA’s
IF and other HLL statements has to do with the conditional expressions allowed
in these statements. These
expressions must take one of the following forms:
operand1 relop operand2
register in constant .. constant
register not in constant .. constant
memory in constant .. constant
memory not in constant .. constant
reg8 in CSet_Constant
reg8 in CSet_Variable
reg8 not in CSet_Constant
reg8 not in CSet_Variable
register
!register
memory
!memory
Flag
( boolean_expression )
!( boolean_expression )
boolean_expression && boolean_expression
boolean_expression || boolean_expression
For the first form,
"operand1 relop
operand2", relop is one
of:
= or == (either one; both are equivalent)
<> or != (either one)
<
<=
>
>=
Operand1 and operand2 must be operands that would be legal for a "cmp(operand1,
operand2);" instruction.
For the IF statement, HLA emits
a CMP instruction with the two operands specified and an appropriate
conditional jump instruction that skips over the statements following the
"THEN" reserved word if the condition is false. For example, consider the following
code:
if( al = 'a' ) then
    stdout.put( "Option 'a' was selected", nl );
endif;
Like the CMP instruction, the
two operands cannot both be memory operands.
Unlike the conditional branch instructions, the
six relational operators cannot differentiate between signed and unsigned
comparisons (for example, HLA uses "<" for both signed and
unsigned less than comparisons).
Since HLA must emit different instructions for signed and unsigned comparisons,
and the relational operators do not differentiate between the two, HLA must
rely upon the types of the operands to determine which conditional jump
instruction to emit.
By default, HLA emits unsigned
conditional jump instructions (i.e., JA, JAE, JB, JBE, etc.). If either (or both) operands are signed
values, HLA will emit signed conditional jump instructions (i.e., JG, JGE, JL,
JLE, etc.) instead.
HLA considers the 80x86
registers to be unsigned. This can create some problems when
using the HLA if statement.
Consider the following code:
if( eax < 0 ) then
<< do something if eax is negative >>
endif;
Since neither operand is a
signed value, HLA will emit the following code:
cmp( eax, 0 );
jnb SkipThenPart;
<< do something if eax is
negative >>
SkipThenPart:
Unfortunately, it is never the
case that the value in EAX is below zero (since zero is the minimum unsigned
value), so the body of this if statement never executes. Clearly, the programmer intended to use
a signed comparison here. The
solution is to ensure that at least one operand is signed. But what happens when, as in this example,
both operands are intrinsically unsigned?
The solution is to use coercion
to tell HLA that one of the operands is a signed value. In general, it is always possible to
coerce a register so that HLA treats it as a signed, rather than unsigned,
value. The IF statement above
could be rewritten (correctly) as
if( (type int32 eax) < 0 ) then
    << do something if eax is negative >>
endif;
HLA will emit the JNL
instruction (rather than JNB) in this example. Note that if either operand is signed, HLA will emit a
signed conditional jump instruction.
Therefore, it is not necessary to coerce both unsigned operands in this
example.
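The same pitfall exists in C, which makes it easy to experiment with. The two hypothetical functions below mirror the original and the coerced IF statements: the first compares an unsigned 32-bit value against zero (the JNB case), and the second reinterprets the same bits as `int32_t` (the JNL case). Converting an out-of-range value to `int32_t` is implementation-defined in C before C23, but mainstream compilers produce the two's-complement result, which matches the 80x86.

```c
#include <assert.h>
#include <stdint.h>

/* Models "if( eax < 0 )": an unsigned comparison can never succeed,
   because zero is the smallest unsigned value. */
static int unsigned_less_than_zero(uint32_t eax)
{
    return eax < 0u;            /* always false */
}

/* Models "if( (type int32 eax) < 0 )": the coercion selects a signed test,
   which is true exactly when bit 31 of the value is set. */
static int signed_less_than_zero(uint32_t eax)
{
    return (int32_t)eax < 0;
}
```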
The second form of a
conditional expression that the IF statement accepts is a register or memory
operand followed by "in" and then two constants separated by the
".." operator, e.g.,
if( al in 0..10 ) then ...
This code checks to see if the
first operand is in the range specified by the two constants. The constant value to the left of the
".." must be less than or equal to the constant to the right for this expression
to make any sense. The result is
true if the operand is within the specified range. For this instruction, HLA emits a pair of compare and
conditional jump instructions to test the operand to see if it is in the
specified range.
HLA also allows an exclusive
range test specified by an expression of the form:
if( al not in 0..10 ) then ...
In this case, the expression
is true if the value in AL is outside the range 0..10.
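The compare-and-branch pair behind a range test can be written out in C as follows. This is a sketch, assuming (as HLA does for registers) that AL holds an unsigned value; the function names are hypothetical, and the `not in` form simply inverts the result.

```c
#include <assert.h>
#include <stdint.h>

/* "al in lo..hi": two compare/branch pairs. */
static int in_range(uint8_t al, uint8_t lo, uint8_t hi)
{
    assert(lo <= hi);            /* the range only makes sense when lo <= hi */
    if (al < lo) return 0;       /* roughly: cmp( al, lo ); jb false;  */
    if (al > hi) return 0;       /* roughly: cmp( al, hi ); ja false;  */
    return 1;
}

/* "al not in lo..hi" is the complement of the inclusive test. */
static int not_in_range(uint8_t al, uint8_t lo, uint8_t hi)
{
    return !in_range(al, lo, hi);
}
```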
In addition to integer ranges,
HLA also lets you use the IN operator with CSET constants and variables. The generic form is one of the
following:
reg8 in CSetConst
reg8 not in CSetConst
reg8 in CSetVariable
reg8 not in CSetVariable
For example, a statement of the
form "if( al in {'a'..'z'}) then ..." checks to see if the character
in the AL register is a lowercase alphabetic character. Similarly,
if( al not in {'a'..'z', 'A'..'Z'}) then ...
checks to see if AL is not an
alphabetic character.
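Membership in a character set comes down to testing one bit. The C sketch below models an HLA cset as a 128-bit (16-byte) bitmap with one bit per character code; the type and function names are hypothetical, and treating codes 128..255 as non-members is an assumption of this sketch rather than a statement about the code HLA emits.

```c
#include <assert.h>
#include <stdint.h>

/* A character set modeled as 16 bytes: bit (c & 7) of byte (c >> 3) is set
   when character code c (0..127) is a member. */
typedef struct { uint8_t bits[16]; } cset;

static void cset_add_range(cset *s, unsigned char lo, unsigned char hi)
{
    for (unsigned c = lo; c <= hi && c < 128; ++c)
        s->bits[c >> 3] |= (uint8_t)(1u << (c & 7));
}

static int cset_member(const cset *s, unsigned char c)
{
    if (c >= 128)
        return 0;            /* outside the 128 bits: treated as not a member */
    return (s->bits[c >> 3] >> (c & 7)) & 1;
}
```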
The fifth form of a conditional
expression that the IF statement accepts is a single register name (eight,
sixteen, or thirty-two bits). The
IF statement tests the specified register to see if it is zero (false) or
non-zero (true) and branches accordingly.
If you specify the not operator ("!") before the register, HLA
reverses the sense of this test.
The sixth form of a conditional
expression that the IF statement accepts is a single memory location. The type of the memory location must be
boolean, byte, word, or dword. HLA
will emit code that compares the specified memory location against zero (false)
and generate an appropriate branch depending upon the value in the memory
location. If you put the not
operator ("!") before the variable, HLA reverses the sense of the
test.
The seventh form of a
conditional expression that the IF statement accepts is a Flags register bit or
other condition code combination handled by the 80x86 conditional jump
instructions. The following
reserved words are acceptable as IF statement expressions:
@c, @nc, @o, @no, @z, @nz, @s, @ns, @a, @na, @ae, @nae, @b, @nb, @be,
@nbe, @l, @nl, @g, @ng, @le, @nle, @ge, @nge, @e, @ne
These items emit an appropriate
jump (of the opposite sense) around the THEN portion of the IF statement if the
condition is false.
If you supply any legal boolean
expression in parentheses, HLA simply uses the value of the internal expression
for the value of the whole expression.
This allows you to override the default precedence of the &&, ||, and !
operators.
The !( boolean_expression ) form evaluates the expression
and does just the opposite. That
is, if the interior expression is false, then !( boolean_expression ) is true
and vice versa. This is mainly
useful with conjunction and disjunction since all of the other interesting terms
already allow the not operator in front of them. Note that in general, the "!" operator must
precede some parentheses. You
cannot say "! AX < BX", for example.
Originally, HLA did not include
support for the conjunction (&&) and disjunction (||) operators.
This was explicitly left out of the design so that beginning students
would be forced to rethink their logical operations in assembly language. Unfortunately, it was so inconvenient
not to have these operators that they were eventually added. So a compromise was made: these
operators were added to HLA but "The Art of Assembly Language
Programming/Win32 Edition" doesn’t bother to mention them until an
advanced chapter on control structures.
The conjunction and disjunction
operators are the operators && and ||. They expect two valid HLA boolean expressions around the
operator, e.g.,
eax < 5 && ebx <> ecx
Since the above forms a valid
boolean expression, it, too, may appear on either side of the && or ||
operator, e.g.,
eax < 5 && ebx <> ecx || !dl
HLA gives && higher
precedence than ||. Both operators
are left-associative so if multiple operators appear within the same
expression, they are evaluated from left to right if the operators have the
same precedence. Note that you can
use parentheses to override HLA’s default precedence.
One wrinkle with the addition
of && and || is that you need to be careful when using the flags in a
boolean expression. For example,
"eax < ecx && @nz" hides the fact that HLA emits a compare
instruction that affects the Z flag.
Hence, the "@nz" adds nothing to this expression since EAX
must not equal ECX if
eax<ecx. So take care when
using && and ||.
HLA uses short-circuit
evaluation when evaluating expressions containing the conjunction and
disjunction operators. For the
&& operator, this means that the resulting code will not compute the
right-hand expression if the left-hand expression evaluates false. Similarly, the code will not compute
the right-hand expression of the || operator if the left-hand expression
evaluates true.
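To make short-circuit evaluation concrete, the C function below uses explicit gotos to mimic the kind of compare-and-jump sequence that can implement `eax < 5 && ebx <> ecx || !dl`. It is an illustration, not a transcript of HLA's output: the function name is hypothetical, registers are modeled as unsigned values, and nothing is modified; only the path through the code changes.

```c
#include <assert.h>
#include <stdint.h>

/* Goto-based model of  eax < 5 && ebx <> ecx || !dl
   (&& binds tighter than ||). Each test jumps as soon as the
   outcome is known, so later operands are skipped when possible. */
static int cond_flow(uint32_t eax, uint32_t ebx, uint32_t ecx, uint8_t dl)
{
    if (eax >= 5)
        goto try_or;      /* left operand of && is false: skip ebx <> ecx */
    if (ebx == ecx)
        goto try_or;      /* && is false: fall back to the || operand */
    goto is_true;         /* && is true, so the whole || is true: skip !dl */
try_or:
    if (dl != 0)
        goto is_false;    /* !dl is false when dl is non-zero */
is_true:
    return 1;
is_false:
    return 0;
}
```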
Note that the evaluation of
complex boolean expressions involving the !(---), &&, and || operators
does not change any register or memory values. HLA strictly uses flow control to implement these operations.
Note that the "&"
and "|" operators are for compile-time expressions only, while the
"&&" and "||" operators are for run-time boolean
expressions. These two groups of
operators are not synonyms and you cannot use them interchangeably.
If you would prefer to use a less abstract scheme
to evaluate boolean expressions, one that lets you see the low-level machine
instructions, HLA provides a
solution that allows you to write code to evaluate complex boolean expressions
within the HLL statements using low-level instructions. Consider the following syntax:
if(#{
    << arbitrary HLA statements >>
}#) then
    << "True" section >>
else // or elseif...
    << "False" section >>
endif;
The "#{" and "}#" brackets tell HLA that
an arbitrary set of HLA statements will appear between the braces. HLA will not emit any code for the IF expression. Instead, it is the programmer’s
responsibility to provide the appropriate test code within the
"#{---}#" section.
Within the sequence, HLA allows the use of the boolean constants "true" and "false" as targets of conditional jump
instructions. Jumping to the
"true" label
transfers control to the true section (i.e., the code after the
"THEN" reserved word).
Jumping to the "false" label transfers control to the false section. Consider the following code that checks
to see if the character in AL is in the range "a".."z":
if(#{
    cmp( al, 'a' );
    jb false;
    cmp( al, 'z' );
    ja false;
}#) then
    << code to execute if AL in {'a'..'z'} goes here >>
endif;
With the inclusion of the
#{---}# operand, the IF statement becomes much more powerful, allowing you to
test any condition possible in assembly language. Of course, the #{---}# expression is legal in the ELSEIF
expression as well as the IF expression.
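Because the #{---}# block can contain any instructions, it can express tests that none of the expression forms cover. As a hypothetical illustration, a block containing `mov( eax, ebx ); dec( ebx ); test( ebx, eax ); jnz false; test( eax, eax ); jz false;` would select the THEN section only when EAX is a power of two. The C sketch below mirrors that sequence step by step, using a `false_label` in place of HLA's `false` target and, like the AL example above, falling through to the true section.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the hypothetical #{---}# sequence described above. */
static int is_power_of_two(uint32_t eax)
{
    uint32_t ebx = eax;              /* mov( eax, ebx ); */
    ebx -= 1;                        /* dec( ebx );  (wraps to 0xFFFFFFFF for 0) */
    if ((ebx & eax) != 0)
        goto false_label;            /* test( ebx, eax ); jnz false; */
    if (eax == 0)
        goto false_label;            /* test( eax, eax ); jz false;  */
    return 1;                        /* falling off the end selects THEN */
false_label:
    return 0;
}
```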
It would be a good idea for you
to write some code using the HLA if statement and study the MASM code produced
by HLA for these IF statements. By
becoming familiar with the code that HLA generates for the IF statement, you
will have a better idea about when it is appropriate to use the if statement
versus standard assembly language statements.
16.15.4
The WHILE..WELSE..ENDWHILE Statement in HLA
The while..endwhile statement
allows the following syntax:
while( boolean_expression ) do
    << while loop body >>
endwhile;
while( boolean_expression ) do
    << while loop body >>
welse
    << Code to execute when expression is false >>
endwhile;
while(#{ HLA_statements }#) do
    << while loop body >>
endwhile;
while(#{ HLA_statements }#) do
    << while loop body >>
welse
    << Code to execute when expression is false >>
endwhile;
The WHILE statement allows the
same boolean expressions as the HLA IF statement. Like the HLA IF statement, HLA allows you to use the boolean
constants "true"
and "false"
as labels in the #{...}# form of the WHILE statement above. Jumping to the true label executes the body of the while loop, jumping
to the false label exits the
while loop.
For the "while( expr )
do" forms, HLA moves the test for loop termination to the bottom of the
loop and emits a jump at the top of the loop to transfer control to the termination
test. For the
"while(#{stmts}#)" form, HLA compiles the termination test at the top
of the emitted code for the loop.
Therefore, the standard WHILE loop may be slightly more efficient (in
the typical case) than the hybrid form.
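The loop shape described for the standard form can be pictured with C gotos: one unconditional jump to the test, then the test at the bottom jumping back to the body, so each iteration executes a single conditional branch. The function below is an illustrative sketch (it sums 0..n-1), not HLA output.

```c
#include <assert.h>

/* Shape of "while( i < n ) do body endwhile;" with the test rotated
   to the bottom of the loop. */
static unsigned sum_below(unsigned n)
{
    unsigned i = 0, sum = 0;
    goto test;            /* jump to the termination test first */
top:
    sum += i;             /* loop body */
    ++i;
test:
    if (i < n)
        goto top;         /* conditional jump back to the body */
    return sum;
}
```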
The HLA while loop supports an
optional "welse" (while-else) section. The while loop will execute the code in this section only
when the expression evaluates false.
Note that if you exit the loop via a "break" or
"breakif" statement, the welse section does not execute. This provides logic that is sometimes
useful when you want to do something different depending upon whether you exit
the loop via the expression going false or by a break statement.
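C has no while-else, so the sketch below reproduces WELSE with a goto: a break-style exit jumps past the WELSE code, while a normal exit (the condition going false) runs it. The `search` function and its `welse_ran` flag are hypothetical.

```c
#include <assert.h>

/* Returns the index where the search stopped. *welse_ran is set only when
   the loop ended because "i < n" went false, mirroring WELSE. */
static int search(const int *arr, int n, int key, int *welse_ran)
{
    int i = 0;
    *welse_ran = 0;
    while (i < n) {
        if (arr[i] == key)
            goto after_loop;   /* like BREAK: skips the WELSE section */
        ++i;
    }
    *welse_ran = 1;            /* WELSE: the loop condition evaluated false */
after_loop:
    return i;
}
```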
16.15.5
The REPEAT..UNTIL Statement in HLA
HLA’s REPEAT..UNTIL statement
uses the following syntax:
repeat
<< statements to execute repeatedly >>
until( boolean_expression );
repeat
<< statements to execute repeatedly >>
until(#{ HLA_statements }#);
For those unfamiliar with
REPEAT..UNTIL, the body of the loop always executes at least once with the test
for loop termination occurring at the bottom of the loop. The REPEAT..UNTIL loop (unlike C/C++’s
do..while statement) terminates loop execution when the expression is true (that
is, REPEAT..UNTIL repeats while the expression is false).
As you can see, the syntax for
this is very similar to the WHILE loop.
About the only major difference is that a jump to the "true" label in the #{---}# sequence exits the loop
while jumping to the "false" label in the #{---}# sequence transfers control back to the top of
the loop.
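In C terms, REPEAT..UNTIL corresponds to a do..while loop whose condition is negated. The small function below is a hypothetical example: it counts how many times a value can be halved before reaching zero, and it still runs the body once when the value starts at zero.

```c
#include <assert.h>

/* "repeat body until( x = 0 );" becomes do { body } while (!(x == 0)). */
static int count_halvings(unsigned x)
{
    int steps = 0;
    do {
        x /= 2;               /* loop body: always executes at least once */
        ++steps;
    } while (!(x == 0));      /* repeat while the UNTIL expression is false */
    return steps;
}
```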
16.15.6
The FOR..ENDFOR Statement in HLA
The HLA for..endfor statement
is very similar to the C/C++ for loop.
The FOR clause consists of three components:
for( initialize_statement; if_boolean_expression; increment_statement ) do
The initialize_statement component is a single machine instruction. This instruction typically initializes
a loop control variable. HLA emits
this statement before the loop body so that it executes only once, before the test
for loop termination.
The if_boolean_expression component is a simple boolean expression (same
syntax as for the IF statement).
This expression determines whether the loop body executes. Note that the FOR statement tests for
loop termination before executing the body of the loop.
The increment_statement component is a single machine instruction that HLA
emits at the bottom of the loop, just before jumping back to the top of the
loop. This instruction is
typically used to modify the loop control variable.
The syntax for the HLA for
statement is the following:
for( initStmt; BoolExpr; incStmt ) do
<< loop body >>
endfor;
-or-
for( initStmt; BoolExpr; incStmt ) do
<< loop body >>
felse
<< statements to execute when BoolExpr
evaluates false >>
endfor;
Semantically, this statement is
identical to the following while loop:
initStmt;
while( BoolExpr ) do
<< loop body >>
incStmt;
endwhile;
-or-
initStmt;
while( BoolExpr ) do
<< loop body >>
incStmt;
welse
<< statements to execute when BoolExpr evaluates false >>
endwhile;
Note that HLA does not include
a form of the FOR loop that lets you bury a sequence of statements inside the
boolean expression. Use the WHILE
loop if you want to do that. If
this is inconvenient, you can always create your own version of the FOR loop
using HLA’s macro facilities.
The FELSE section in the
FOR..FELSE..ENDFOR loop executes when the boolean expression evaluates
false. Note that the FELSE section
does not execute if you break out of the FOR loop with a BREAK or BREAKIF
statement. You can use this fact
to do different logic depending on whether the code exits the loop via the
boolean expression going false or via some sort of BREAK.
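Python's `while ... else` happens to have the same rule as FELSE: the `else` block runs when the loop condition goes false, but not when the loop is left with `break`. The sketch below uses that similarity to model the FOR..FELSE..ENDFOR expansion given above (initStmt, BoolExpr, incStmt). It illustrates the semantics only and is not HLA code. The function `find_first_negative` is hypothetical.

```python
def find_first_negative(values: list[int]) -> str:
    """Model of FOR..FELSE..ENDFOR via its WHILE..WELSE equivalent.

    initStmt -> i = 0,  BoolExpr -> i < len(values),  incStmt -> i += 1
    """
    i = 0                                  # initStmt: runs once, before the first test
    while i < len(values):                 # BoolExpr: tested before every pass
        if values[i] < 0:
            result = f"found at index {i}"
            break                          # BREAK: the FELSE section is skipped
        i += 1                             # incStmt: runs at the bottom of each pass
    else:
        result = "no negative value"       # FELSE: runs only when BoolExpr goes false
    return result


print(find_first_negative([3, 1, -4, 1]))  # found at index 2
print(find_first_negative([3, 1, 4, 1]))   # no negative value
```

Note that with an empty list the FELSE branch still runs, because the boolean expression is false on the very first test.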
16.15.7
The FOREVER..ENDFOR Statement in HLA
The forever statement creates
an infinite loop. Its syntax is
forever
<< Statements to execute repeatedly >>
endfor;
This HLA statement simply emits
a single JMP instruction that unconditionally transfers control from the ENDFOR
clause back up to the beginning of the loop.
In addition to creating
infinite loops, the FOREVER..ENDFOR loop is very useful for creating loops that
test for loop termination somewhere in the middle of the loop’s body. For more details, see the BREAK and
BREAKIF statements, next.
16.15.8
The BREAK and BREAKIF Statements in HLA
The BREAK and BREAKIF
statements allow you to exit a loop at some point other than the normal test
for loop termination. These two
statements allow the following syntax:
break;
breakif( boolean_expression );
breakif(#{ stmts }#);
There are two very important
things to note about these statements.
First, unlike many HLA machine instructions, you do not follow the BREAK
statement with a pair of empty parentheses. The 80x86 machine instructions behave like compile-time
functions, so it made sense to require empty parentheses after those
instructions. The HLA HLL
statements do not behave like compile-time functions; the lack of parentheses after BREAK (and other HLL
statements, e.g., ELSE) makes sense here if you think about it for a moment.
The second thing to note is
that the BREAK and BREAKIF statements are legal only inside WHILE, FOR, FOREACH,
FOREVER, and REPEAT loops. HLA
does not recognize loops you’ve coded yourself using discrete assembly language
instructions (of course, you can probably write a macro to provide a BREAK
function for your own loops). Note
that the FOREACH loop pushes data on the stack that the BREAK statement is
unaware of. Therefore, if you
break out of a FOREACH loop, garbage will be left on the stack. The HLA BREAK statement will issue a
warning if this occurs. It is your
responsibility to clean up the stack upon exiting a FOREACH loop if you break
out of it.
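A FOREVER..ENDFOR loop combined with BREAKIF produces a loop whose exit test sits in the middle of the body. The Python sketch below models that shape and is not HLA code. The function `sum_until_sentinel` is hypothetical, and it assumes the input list actually contains the sentinel value.

```python
def sum_until_sentinel(values: list[int], sentinel: int = 0) -> int:
    """Model of a FOREVER..ENDFOR loop that tests for termination mid-body.

    Assumes `values` contains the sentinel; otherwise indexing would run
    past the end of the list.
    """
    total = 0
    index = 0
    while True:                      # forever
        value = values[index]        # first part of the body: fetch the next value
        index += 1
        if value == sentinel:        # breakif( value = sentinel );
            break
        total += value               # rest of the body runs only if we did not break
    return total                     # the first statement after endfor


print(sum_until_sentinel([4, 5, 6, 0, 7]))  # 15
```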
16.15.9
The CONTINUE and CONTINUEIF Statements in HLA
The continue and continueif
statements allow you to restart a loop.
These two statements allow the following syntax:
continue;
continueif( boolean_expression );
continueif(#{ stmts }#);
There are two very important
things to note about these statements.
First, unlike many HLA machine instructions, you do not follow the
CONTINUE statement with a pair of empty parentheses. The 80x86 machine instructions behave like compile-time
functions, so it made sense to require empty parentheses after those
instructions. The HLA HLL
statements do not behave like compile-time functions; the lack of parentheses after continue (and other HLL
statements, e.g., else) makes sense here if you think about it for a moment.
The CONTINUE and CONTINUEIF
statements are legal only inside WHILE, FOREACH, FOREVER, and REPEAT
loops. HLA does not recognize
loops you’ve coded yourself using discrete assembly language instructions (of
course, you can probably write a macro to provide a CONTINUE function for your
own loops).
For the WHILE and REPEAT
statements, the CONTINUE and CONTINUEIF statements transfer control to the test
for loop termination. For the
FOREVER loop, the CONTINUE and CONTINUEIF statements transfer control to the
first statement in the loop. For
the FOREACH loop, CONTINUE and CONTINUEIF transfer control to the bottom of the
loop (i.e., they force a return from the yield() call).
16.15.10
The BEGIN..END, EXIT, and EXITIF Statements in HLA
The BEGIN..END statement block
provides a structured goto statement for HLA. The BEGIN and END clauses surround a group of
statements; the EXIT and EXITIF
statements allow you to exit such a block of statements in much the same way
that the BREAK and BREAKIF statements allow you to exit a loop. Unlike BREAK and BREAKIF, which can
only exit the loop that immediately contains the BREAK or BREAKIF, the exit
statements allow you to specify a BEGIN label so you can exit several nested
contexts at once. The syntax for
the BEGIN..END, EXIT, and EXITIF statements is as follows:
begin contextLabel ;
<< statements within the specified context >>
end contextLabel;
exit contextLabel;
exitif( boolean_expression ) contextLabel;
exitif(#{ stmts }#) contextLabel;
The BEGIN..END clauses do not
generate any machine code (although END does emit a label to the assembly
output file). The EXIT statement
simply emits a JMP to the first instruction following the END clause. The EXITIF statement emits a compare
and a conditional jump to the statement following the specified end.
If you break out of a FOREACH
loop using the EXIT or EXITIF statements, there will be garbage left on the
stack. It is your responsibility
to be aware of this situation (i.e., HLA doesn’t warn you about it) and clean
up the stack, if necessary.
You can nest BEGIN..END blocks
and EXIT out of any enclosing BEGIN..END block at any time. The BEGIN label provides this
capability. Consider the following
example:
program ContextDemo;
#include( "stdio.hhf" );
static
    i:int32;
begin ContextDemo;
    stdout.put( "Enter an integer:" );
    stdin.get( i );
    begin c1;
        begin c2;
            stdout.put( "Inside c2" nl );
            exitif( i < 0 ) c1;
        end c2;
        stdout.put( "Inside c1" nl );
        exitif( i = 0 ) c1;
        stdout.put( "Still inside c1" nl );
    end c1;
    stdout.put( "Outside of c1" nl );
end ContextDemo;
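To make the control flow of ContextDemo concrete, the Python sketch below reproduces which lines it prints for a negative, zero, and positive input. Python has no labeled block exit, so the sketch models the c1 block as a nested function and uses `return` in place of `exit c1`. The model is an illustration only, and the name `context_demo` is hypothetical.

```python
def context_demo(i: int) -> list[str]:
    """Model of ContextDemo's output; returning from c1() stands in for 'exit c1'."""
    out: list[str] = []

    def c1() -> None:                      # begin c1;
        out.append("Inside c2")            #   begin c2; stdout.put( "Inside c2" nl );
        if i < 0:                          #   exitif( i < 0 ) c1;  leaves c2 and c1
            return
        out.append("Inside c1")            #   end c2; stdout.put( "Inside c1" nl );
        if i == 0:                         #   exitif( i = 0 ) c1;
            return
        out.append("Still inside c1")      #   stdout.put( "Still inside c1" nl );
                                           # end c1;
    c1()
    out.append("Outside of c1")
    return out


for value in (-5, 0, 7):
    print(value, context_demo(value))
```

The negative case shows the point of the BEGIN label: a single EXITIF inside c2 leaves both c2 and the enclosing c1.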
The EXIT and EXITIF statements
let you exit any BEGIN..END block,
including those associated with a program unit such as a procedure,
iterator, method, or even the main program. Consider the following (unusable)
program:
program mainPgm;
    procedure LexLevel1;
        procedure LexLevel2;
        begin LexLevel2;
            exit LexLevel2;  // Returns from this procedure.
            exit LexLevel1;  // Returns from this procedure and
                             // the LexLevel1 procedure
                             // (including cleaning up the stack).
            exit mainPgm;    // Terminates the main program.
        end LexLevel2;
    begin LexLevel1;
        .
        .
        .
    end LexLevel1;
begin mainPgm;
    .
    .
    .
end mainPgm;
Note: You may only exit from
procedures that have a display, and every procedure nested between the procedure you
wish to exit and the EXIT statement itself must also have a display. In the example above, both LexLevel1 and LexLevel2 must have displays if you wish to exit from the LexLevel1 procedure from inside LexLevel2. By
default, HLA emits code to build the display unless you use the "@nodisplay" procedure option.
Note that to exit from the
current procedure, you must not have specified the "@noframe" procedure option. This applies only to the current procedure. You may exit from enclosing (lower lex
level) procedures as long as the display has been built.
16.15.11
The SWITCH/CASE/DEFAULT/ENDSWITCH Statement in HLA
As of HLA v1.102, a multi-way
switch statement is available in the HLA language (prior to HLA v1.102, the
switch statement was handled by a macro provided in the HLA Standard
Library). This statement uses
syntax similar to the following:
switch( reg32 )
case( constant_list )
<statements>
<< any number of additional case clauses >>
default // This is optional
<statements>
endswitch;
The case clause argument list
is either a single ordinal constant, or a list of ordinal constants separated
by commas. The following is
an example of a legal switch statement with multiple case clauses:
switch( eax )
    case( 0 )
        mov( 1, eax );
    case( 1, 2 )
        mov( 2, eax );
    case( 5 )
        add( 4, eax );
endswitch;
The switch statement, like its HLL counterpart, transfers
control to the statements following the case clause containing the value held in the 32-bit
register passed into the switch
statement.
The case constant values in a single switch statement must all be unique. HLA will report an
error if two cases contain the
same constant value.
During the execution of the switch statement, if the value in the 32-bit register
passed as an argument to the switch statement is not present in any of the case clauses, then control transfers to the statements
associated with the default
clause (if one is present) or to the first statement following the endswitch clause if there is no default section present.
In general, HLA compiles the switch statement into a jump table and an indirect jmp
instruction that transfers control to the code associated with the specified
case. However, in a couple of special cases HLA will not compile a switch into
an indirect jump instruction. To understand when this occurs, there are a
couple of terms you’ll need to understand.
Jump tables created for switch statements will have one entry for every ordinal
value between the smallest case
value and the largest case value
in the table. The difference between the largest and smallest case values (plus
one) is called the spread. This means that a jump table’s size in
bytes will be four times the spread.
Note that the spread value is independent of the number of cases. Consider the following switch statement fragments:
switch( eax )
    case( 1 )
        << code to execute if EAX = 1 >>
    case( 10 )
        << code to execute if EAX = 10 >>
endswitch;
The jump table associated with
this switch statement will have ten entries, not two. This is because the spread is
10 for this switch statement.
Consider the following example:
switch( eax )
    case( 1 )
        << code to execute if EAX = 1 >>
    case( 3 )
        << code to execute if EAX = 3 >>
    case( 6 )
        << code to execute if EAX = 6 >>
    case( 10 )
        << code to execute if EAX = 10 >>
endswitch;
In this example the spread is
still 10 and the jump table will have the same number of entries (10) as the
previous example. This is true even though this latter example has twice as
many cases as the earlier example.
The case clause lets you
specify multiple values in a comma-separated list. Consider the following
example:
switch( eax )
    case( 1 )
        << code to execute if EAX = 1 >>
    case( 3, 6, 12 )
        << code to execute if EAX = 3, 6, or 12 >>
    case( 10 )
        << code to execute if EAX = 10 >>
endswitch;
It is important to realize that
this switch statement has five
cases, not three. It just happens
that three of the cases (3, 6, and 12) share the same set of instructions to
execute. Also note that the spread is 12 in this example as the minimum case value is 1 and the largest is 12. Note that the default case does not count as a case for the purposes of counting the number of case values.
The default case simply provides a sequence of instructions to execute
for all the “holes” in the spread of case values (as well as all values below
and greater than the minimum and maximum case values).
Because the jump table will
have one entry for each integer value between the smallest and largest case values, you can easily generate a huge table with
a very simple switch statement.
Consider the following example:
switch( eax )
    case( 1 )
        << code to execute if EAX = 1 >>
    case( 1000 )
        << code to execute if EAX = 1000 >>
endswitch;
Even though this example has
only two cases, a jump table for it would contain 1,000 entries (and be 4,000 bytes
long). A set of widely spaced case values produces a sparse jump table; that is, only
a few of the entries in the jump table contain pointers to sections of code
associated with the cases, while most entries contain a pointer to the default case (or the address of the first statement
following the endswitch if
there isn’t a default section).
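The arithmetic in the examples above can be checked with a small Python model. It counts every value listed in a case clause as a separate case, computes the spread as largest minus smallest plus one, and assumes one 4-byte address per jump-table entry, as the text states. This is an illustration only. The function `switch_stats` is hypothetical and does not reproduce HLA's code generator.

```python
def switch_stats(case_clauses: list[tuple[int, ...]]) -> tuple[int, int, int]:
    """Return (number of cases, spread, jump-table size in bytes).

    Each value in a case clause counts as one case; the default clause is
    not counted. Each jump-table entry is assumed to be a 4-byte address.
    """
    values = [v for clause in case_clauses for v in clause]
    cases = len(values)
    spread = max(values) - min(values) + 1
    return cases, spread, 4 * spread


print(switch_stats([(1,), (10,)]))              # (2, 10, 40)
print(switch_stats([(1,), (3,), (6,), (10,)]))  # (4, 10, 40)
print(switch_stats([(1,), (3, 6, 12), (10,)]))  # (5, 12, 48)
print(switch_stats([(1,), (1000,)]))            # (2, 1000, 4000)
```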
To improve efficiency and
reduce the space consumed by large, sparse, jump tables, HLA specially handles
a couple of situations. First of
all, if the number of cases is three or less, HLA will not emit a jump table. Instead, it will emit a
sequence of CMP and JNE instructions to test the three or fewer case
values. Second, if the spread is 256
or greater but there are 32 or fewer cases, then HLA will emit a sequence of
CMP and JNE instructions to implement the switch statement.
In all other situations, HLA will emit a jump table implementation of
the switch.
If the spread is 16384 or
greater (this is an implementation-dependent constant and may change in the
future), HLA will generate an error and refuse to compile the switch statement.
If you really want to generate a switch statement whose jump table consumes 64K (or more)
of data, you will have to implement the statement manually (or modify the
switch macro in the “switch.hhf” header file).
If the spread is 4096 or
greater but less than 16384, HLA will generate the code but issue a warning
telling you that the jump table is going to be very large. If the spread is 16 times (or more) the
number of cases, HLA will emit a warning telling you that the jump table is
going to be very sparse.
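The selection rules and size checks above can be collected into a single decision function. The Python sketch below is a rough model, not HLA's implementation, and `switch_strategy` is a hypothetical name. One interaction is not spelled out in the text, so the sketch makes an assumption and labels it: the spread error and warnings are applied only when a jump table would actually be emitted.

```python
def switch_strategy(case_values: list[int]) -> tuple[str, list[str]]:
    """Rough model of how HLA picks a switch implementation.

    ASSUMPTION: the size error/warnings apply only when a jump table would
    be emitted; the manual does not state how they interact with the
    CMP/JNE special cases.
    """
    if len(set(case_values)) != len(case_values):
        return "error", ["duplicate case value"]
    cases = len(case_values)
    spread = max(case_values) - min(case_values) + 1

    # Special cases that avoid a jump table entirely.
    if cases <= 3:
        return "cmp/jne sequence", []
    if spread >= 256 and cases <= 32:
        return "cmp/jne sequence", []

    # Jump-table implementation, subject to the size checks.
    if spread >= 16384:
        return "error", ["jump table too large"]
    diagnostics: list[str] = []
    if spread >= 4096:
        diagnostics.append("warning: jump table is very large")
    if spread >= 16 * cases:
        diagnostics.append("warning: jump table is very sparse")
    return "jump table", diagnostics


print(switch_strategy([1, 1000]))                # ('cmp/jne sequence', [])
print(switch_strategy([1, 3, 6, 10, 12]))        # ('jump table', [])
print(switch_strategy(list(range(40)) + [699]))  # ('jump table', ['warning: jump table is very sparse'])
```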
All the case values in a
particular switch statement
must be unique. If there are any duplicate case values in a particular switch statement, HLA will issue an error message.
16.15.11
The JT and JF Medium Level Instructions in HLA
The JT (jump if true) and JF
(jump if false) instructions are a cross between the 80x86 conditional jump
instruction and the HLA IF statement.
These two instructions use the following syntax:
JT ( booleanExpression ) targetLabel;
JF ( booleanExpression ) targetLabel;
The booleanExpression component can be any legal HLA boolean expression
that you’d use in an IF, WHILE, REPEAT..UNTIL, or other HLA HLL statement. For JT, the HLA compiler emits code that
transfers control to the specified target label in your program if the condition
is true; for JF, the emitted code transfers control to the target label if the condition is false.
These instructions are
primarily intended for use in macros when creating your own HLL control
statements. For a discussion of
macros and creating your own control structures, see the HLA documentation on
the compile-time language.
16.15.12
Iterators and the HLA Foreach Loop
HLA provides a very powerful
user-defined looping control structure, the FOREACH..ENDFOR loop. The FOREACH loop uses the following
syntax:
foreach iteratorProc(
parameters ) do
<< foreach loop body >>
endfor;
The iteratorProc( parameters ) component
is a call to a special kind of procedure known as an iterator[23]. Iterators have the special property
that they return one of two states, success or failure. If an iterator returns success, it
generally also returns a function result.
If an iterator returns success, the foreach loop will execute the loop
body and reenter the iterator (more on that later) at the top of the loop. If an iterator returns failure, then
the loop terminates.
If you’ve never used true
iterators before, you may be thinking "big deal, an iterator is simply a
function that returns a boolean value." This, however, isn’t entirely true. An iterator behaves like a value-returning
function when it succeeds, but it behaves like a procedure when it
fails. The success or failure
state of the iterator call is not the return value. To
understand the difference, consider the syntax for an iterator:
iterator iteratorName <<( optional_parameters )>>;
<< procedure options >>
<< local declarations >>
begin iteratorName;
<< iterator statements >>
end iteratorName;
Other than the use of the
"ITERATOR" keyword rather than "PROCEDURE," this declaration
looks just like a procedure or method declaration. However, there are some crucial differences. First of all, HLA emits different code
for building iterator activation records than it does for procedures and
methods. Furthermore, whenever you
declare an iterator, HLA automatically creates a special thunk variable named
"yield". Also, HLA will not let you call an
iterator directly by specifying the iterator’s name as an HLA statement
(although you can still use the CALL instruction to call an iterator procedure,
though you’d better have set the stack up properly before doing so).
If an iterator returns via an
EXIT iteratorName or
RET() statement, or returns by "falling off the end of the
function" (i.e., executing the "end" clause), then the iterator
returns failure to the calling FOREACH loop (hence, the loop will
terminate). To return success, and
return a value to the body of the FOREACH loop, you must invoke the "yield" thunk.
Yield doesn’t actually
return to the FOREACH loop; instead, it calls the body of the FOREACH loop, and
at the bottom of the FOREACH loop HLA emits a return instruction that transfers
control back into the iterator (to the first statement following the yield).
This may seem counter-intuitive, but it has some important ramifications. First of all, an iterator maintains
its context until it fails. This
means that local variables maintain their values across the yield calls.
Likewise, when a FOREACH loop reenters an iterator, it picks up
immediately after the yield; it
does not pass new parameters and begin execution at the top of the iterator code.
Consider the following typical
iterator code:
program iteratorDemo;
#include( "stdio.hhf" );
iterator range( start:int32; stop:int32 ); @nodisplay;
begin range;
    forever
        mov( start, eax );
        breakif( eax > stop );
        yield();
        inc( start );
    endfor;
end range;
static
    i:int32;
begin iteratorDemo;
    foreach range( 1, 10 ) do
        stdout.put( "eax = ", eax, nl );
    endfor;
end iteratorDemo;
This example demonstrates how
to create a standard "for" loop like those found in Pascal or C++[24]. The range iterator is passed two parameters, a starting
value and an ending value. It
returns the sequence of values from the starting value through the ending value
and fails once the next value would exceed the ending
value. The FOREACH loop in this
example prints the values one through ten to the display.
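The mechanism described above can be modeled in Python by passing the FOREACH loop body to the iterator as a callback. Calling the callback stands in for `yield()`, and when the body returns, execution resumes right after the call with the iterator's locals intact. Returning from the iterator corresponds to failure. This is a semantic model only, not HLA's generated code, and the names `range_iterator` and `yield_` are hypothetical.

```python
from typing import Callable


def range_iterator(start: int, stop: int, yield_: Callable[[int], None]) -> None:
    """Model of the HLA 'range' iterator; yield_ plays the FOREACH loop body."""
    while True:                 # forever
        value = start           # mov( start, eax );
        if value > stop:        # breakif( eax > stop );
            break
        yield_(value)           # yield(); calls the FOREACH body
        start += 1              # inc( start ); start kept its value across the call
    # Falling off the end means the iterator fails, so the FOREACH loop ends.


printed: list[str] = []
range_iterator(1, 10, lambda eax: printed.append(f"eax = {eax}"))
print("\n".join(printed))  # eax = 1 ... eax = 10, one per line
```

A Python generator (`yield` inside a `def`) gives a similar "resume after the yield" effect. The callback form is used here because it mirrors the text's point that the loop body is itself called from inside the iterator.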
Warning: because the iterator’s activation is left on the
stack while executing a FOREACH loop, you should take care when breaking out of
a FOREACH loop using BREAK, BREAKIF, EXIT, EXITIF, or some sort of jump. Cavalierly jumping out of a loop in
this fashion leaves the iterator’s activation record on the stack. You will need to clean this up manually
if you exit an iterator in this fashion.
Since HLA cannot detect every way one could jump out of a
FOREACH loop body, it is up to you to make sure you don’t do this (or that you
handle the garbage on the stack in an appropriate way).
Keep in mind that the body of a
FOREACH loop is actually a procedure your program calls when it encounters the yield statement[25]. Therefore, any registers whose values
you change will be changed when control returns to the code following the yield. If
you need to preserve any registers across a yield, either push and pop them at the beginning of the
FOREACH loop body or place the PUSH and POP instructions around the yield.
16.16
HLA Compile-Time Language and Pragmas
This topic section describes
one of HLA’s more impressive features: the compile-time language. Combined with the macro preprocessor,
the HLA compile-time language lets you customize the HLA language in almost an
infinite variety of ways.
Compile-time programs are just
that: programs that execute while HLA is compiling your source file. You embed compile-time language
statements directly in your HLA source files and these short program fragments
control how HLA compiles your assembly code.
This section doesn’t fully
explain the HLA compile-time language because you’ve already seen some major
parts of it. For example, VAL
constants in the HLA source file are equivalent to compile-time variables. The "?" statement is the
compile-time assignment statement.
This topic section, therefore, builds on the material that appears
elsewhere in this document.
16.16.1
The #Include Directive
Like most languages, HLA provides a source
inclusion directive that inserts some other file into the middle of a source
file during compilation. HLA’s
#INCLUDE directive is very similar to the pragma of the same name in C/C++ and
you primarily use them both for the same purpose: including library header
files into your programs.
HLA’s include directive has the
following syntax:
#include( string_expression );
Note that any arbitrary
compile-time string expression is legal.
You are not limited to a literal string constant.
The #INCLUDE directive is legal
anywhere whitespace is legal. The
string specifies a filename that HLA will insert into the program during
compilation at the point the #INCLUDE appears. If HLA cannot find the file specified by the string constant
in the current directory (or in the directory specified if the string contains
a pathname), then HLA tries to find the file in the location specified by the
"hlainc" environment variable.
If HLA still doesn’t find the file, HLA will report an error.
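The search order just described can be sketched as a small lookup function. This is a Python model under stated assumptions, not HLA's actual code. The name `resolve_include` is hypothetical, "hlainc" is treated as naming a single directory, and the file-existence check is injectable so the example can run against a made-up file set.

```python
import os
from typing import Callable, Optional


def resolve_include(name: str,
                    hlainc: Optional[str],
                    exists: Callable[[str], bool] = os.path.isfile) -> str:
    """Model of the #INCLUDE lookup order (assumption-level sketch).

    1. The name exactly as written: relative to the current directory, or
       the path it already contains.
    2. The same name inside the directory given by the "hlainc" variable.
    If neither exists, HLA reports an error; this model raises FileNotFoundError.
    """
    if exists(name):
        return name
    if hlainc:
        candidate = os.path.join(hlainc, name)
        if exists(candidate):
            return candidate
    raise FileNotFoundError(f"cannot find include file {name!r}")


# Hypothetical file system for illustration: only the hlainc copy exists.
fake_files = {os.path.join("hla", "include", "stdio.hhf")}
print(resolve_include("stdio.hhf", os.path.join("hla", "include"), fake_files.__contains__))
```

With a real file system you would call something like `resolve_include("stdio.hhf", os.environ.get("hlainc"))`.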
Although you can use the
#INCLUDE directive to insert any arbitrary text at an arbitrary point in your
program, the vast majority of the time you will use #INCLUDE to include a
library header file (either an HLA Standard Library header file or a library header
file you’ve written) into your program.
HLA requires that you compile all external files at lex level zero. Therefore, if you are including some
declarations into your program, the #INCLUDE directive should be just inside
the main program. Convention
dictates that #INCLUDE directives that include library headers should appear
immediately after the "program" or "unit" header in a file.
16.16.2
The #IncludeOnce Directive
When composing complex header
files, particularly when constructing library header files, you may find it
necessary to insert a #INCLUDE("file") directive into some other
header files. Generally, this is
not a problem; HLA certainly allows nested include files (up to 256 files
deep). However, unless you are
very careful about how you organize your files, it is very easy to create an
"include loop" where one header file includes another and that other
header file includes the first.
Attempting to compile a program that includes either header file results
in an infinite "include loop" during compilation. Clearly, this is not desirable.
The standard way to handle this situation is to
surround all the statements in an include file with a #IF statement as follows:
#if( !@defined( headerfilename_hhf ))
?headerfilename_hhf := true;
<< Statements associated with this header
file go here >>
#endif
The first time HLA includes
this file the symbol "headerfilename_hhf" is not defined, so HLA
processes the statements in the body of the #IF statement. The very first statement defines this
"headerfilename_hhf" symbol (the value and type of this symbol are
irrelevant for our purposes; only
the fact that the symbol exists is important). Thereafter, if some other header file includes this file a
second (or additional) time, the "headerfilename_hhf" symbol is
defined, so HLA skips all the statements in the header file since the value of
the boolean expression in the #IF statement is false. Therefore, HLA only processes the statements of this header
file (at least those inside the #IF statement) the first time it encounters
this particular header file.
A drawback to this scheme is
that HLA must still open the header file and read each and every line from the
file, even if it ignores all the lines in the file. For large header files (e.g., the "stdlib.hhf"
header file) this can consume a significant amount of time during compilation. The #includeonce directive provides a solution for this problem.
You use the #INCLUDEONCE
directive just like the #INCLUDE directive. The only difference between the two is that HLA keeps track
of all files it has processed using the #INCLUDE or #INCLUDEONCE directives and
will not process a header file a second time if you attempt to include it using
the #INCLUDEONCE directive.
Whenever HLA processes the
#INCLUDEONCE directive, it first compares its string operand with a list of
strings appearing in previous #INCLUDE or #INCLUDEONCE directives. If it matches one of these previous strings,
then HLA ignores the #INCLUDEONCE directive; if the include filename does not appear in HLA’s internal
list, then HLA adds this filename to the list and includes the file.
Note that HLA’s #INCLUDEONCE
directive only compares strings for equality. If you use two separate filenames for the same file, HLA
will not detect this and it will include the file a second time. E.g., if the current directory is
"C:\hlafiles" then the following sequence will include the file
"whoops.hhf" twice:
#IncludeOnce( "whoops.hhf" )
#IncludeOnce( "c:\hlafiles\whoops.hhf" )
Also note that the #INCLUDE
directive will include its file regardless of whether the program previously
included that file with a #INCLUDEONCE directive, e.g., the following sequence
also includes "whoops.hhf" twice:
#IncludeOnce( "whoops.hhf" )
#Include( "whoops.hhf" )
For these two reasons, it’s
still a good idea to protect all header files using the #IF technique mentioned
earlier, even if you use the #IncludeOnce directive throughout.
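Both pitfalls, and the reason the #IF guard still helps, can be seen in a toy Python model. It assumes every name passed in refers to the same file, whoops.hhf, and models #INCLUDEONCE as a plain string-equality check, as described above. The class name and the guard symbol `whoops_hhf` are hypothetical, and the model does not reproduce HLA's internals.

```python
class IncludeModel:
    """Toy model of #INCLUDE / #INCLUDEONCE bookkeeping, with an optional #IF guard.

    Every name passed in is assumed to refer to the same file, whoops.hhf.
    """

    def __init__(self, guarded: bool) -> None:
        self.guarded = guarded            # does whoops.hhf use the #if guard?
        self.seen: set[str] = set()       # operand strings of earlier includes
        self.defined: set[str] = set()    # compile-time symbols seen by @defined
        self.times_compiled = 0           # how often the header body was processed

    def _read_header(self) -> None:
        # HLA still opens and reads the file here, even when the guard skips it.
        if self.guarded:
            if "whoops_hhf" in self.defined:   # #if( !@defined( whoops_hhf ))
                return
            self.defined.add("whoops_hhf")     # ?whoops_hhf := true;
        self.times_compiled += 1

    def include(self, name: str) -> None:
        self.seen.add(name)               # #INCLUDE always includes the file
        self._read_header()

    def include_once(self, name: str) -> None:
        if name in self.seen:             # exact string comparison only
            return
        self.seen.add(name)
        self._read_header()


for guarded in (False, True):
    m = IncludeModel(guarded)
    m.include_once("whoops.hhf")
    m.include_once(r"c:\hlafiles\whoops.hhf")   # same file, different string
    m.include("whoops.hhf")                     # #INCLUDE ignores the once-list
    print(guarded, m.times_compiled)            # False 3, then True 1
```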
16.16.3
Macros
HLA has one of the most
powerful macro expansion facilities of any programming language. HLA’s macros are the key to extending
the HLA language. The following
subsections describe HLA’s powerful macro processing facilities.
16.16.3.1 Standard Macros
HLA provides powerful macro
capabilities. You can declare
macros almost anywhere whitespace is allowed in a program using the following
syntax:
#macro identifier ( optional_parameter_list ) ;
statements
#endmacro
Note that a semicolon does not
follow the #endmacro
clause. However, HLA will allow an
optional semicolon after #endmacro without ill effects in the following source
code.
Example:
#macro MyMacro;
?i = i + 1;
#endmacro
The optional parameter list
must be a list of one or more identifiers separated by commas. Unlike procedure declarations, you do
not associate a type with macro parameters. HLA automatically associates the type “text” with all macro
parameters (except for two special cases noted below). Example:
#macro MacroWParms( a, b, c );
?a = b + c;
#endmacro
Optionally, the last (or only) name in the
identifier list may take the form “identifier[]”. This syntax tells the
macro that it may allow a variable number of parameters and HLA will create an
array of string objects to hold all the parameters (HLA uses a string array
rather than a text array because text arrays are illegal). Example:
#macro MacroWVarParms( a, b, c[] );
?a = b + text(c[0]) + text(c[1]);
#endmacro
If the macro does not allow any
parameters, then you follow the identifier with a semicolon (i.e., no
parentheses or parameter identifiers).
See the first example in this section for a macro without any parameters.
When using the array form
(variable parameters) in a macro argument list, HLA will parse the remaining
actual parameters and shove them into the array, one (perceived) parameter per
string array element. Sometimes, however, you might want to handle the parameter
parsing chores yourself (for example, to allow commas as part of an actual
macro parameter) rather than have HLA handle this task for you. HLA provides an
option to tell it to grab all remaining (or simply all) parameter text passed
in the actual parameter list and store all this data into a compile-time string
object. To achieve this, you prefix the last (or only) formal macro parameter
with the reserved word “string”, e.g.,
#macro MacroWStringParms( a, b,
string c );
<<macro body>>
#endmacro
In this example, the first two
actual parameters will be assigned to the text objects a and b within the
macro. Any remaining parameters will be collected as a single string and stored
into the c formal parameter as a string.
One very useful purpose for
string macro parameters is to allow you to grab a list of parameters you want
to pass on to some other macro or procedure as a single object. E.g.,
procedure abc( a:byte; b:word; c:dword );
begin abc;
    .
    .
    .
end abc;
#macro CallsAbc( string abcParms );
    .
    .
    .
    abc( @text( abcParms ));
    .
    .
    .
#endmacro
    .
    .
    .
CallsAbc( 1, 2, 3 );
The final macro invocation in
this sequence passes the three parameters “1,2,3” to the abc function.
Occasionally you may need to define some symbols
that are local to a particular macro invocation (that is, each invocation of
the macro generates a unique symbol for a given identifier). The local
identifier list allows you to do this.
To declare a list of local identifiers, simply follow the parameter
list (after the parenthesis but before the semicolon) with a colon (“:”) and a
comma separated list of identifiers, e.g.,
#macro ThisMacro(parm1):id1,id2;
...
HLA automatically renames each
symbol appearing in the local identifier list so that the new name is unique
throughout the program. HLA
creates unique symbols of the form “_XXXX_” where XXXX is some hexadecimal numeric value. To guarantee that HLA can generate
unique symbols, you should avoid defining symbols of this form in your own
programs (in general, symbols that begin and end with an underscore are
reserved for use by the compiler and the HLA standard library). Example:
#macro LocalSym : i,j;
    j:  cmp( ax, 0 );
        jne( i );
        dec( ax );
        jmp( j );
    i:
#endmacro
Without the local identifier
list, multiple expansions of this macro within the same procedure would yield
duplicate definitions of the statement labels “i” and “j”.
With the local identifier list present, however, HLA substitutes symbols
similar to “_0001_”
and “_0002_” for i and j for the first invocation and symbols like “_0003_” and “_0004_” for i and j on the second invocation, etc. This avoids duplicate symbol errors as long as
you do not use (poorly chosen) identifiers like “_0001_” and “_0004_” in your code.
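The renaming scheme can be modeled as a counter that hands out a fresh hexadecimal "_XXXX_" name for each local identifier on every expansion. The Python sketch below is a toy model, not HLA's macro processor. The class `LocalRenamer` is hypothetical, and it assumes names are assigned in the order the locals are listed, which matches the “_0001_”/“_0002_” numbering in the text.

```python
import itertools
import re


class LocalRenamer:
    """Toy model of per-expansion renaming of macro local identifiers."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def expand(self, body: str, locals_: list[str]) -> str:
        if not locals_:
            return body
        # Give each local a fresh "_XXXX_" (hexadecimal) name for this expansion.
        mapping = {name: f"_{next(self._counter):04x}_" for name in locals_}
        # Replace whole identifiers only, so 'j' does not match inside 'jne'.
        pattern = re.compile(r"\b(" + "|".join(map(re.escape, locals_)) + r")\b")
        return pattern.sub(lambda m: mapping[m.group(1)], body)


renamer = LocalRenamer()
body = "j: cmp( ax, 0 ); jne( i ); dec( ax ); jmp( j ); i:"
print(renamer.expand(body, ["i", "j"]))  # first expansion uses _0001_ and _0002_
print(renamer.expand(body, ["i", "j"]))  # second expansion uses _0003_ and _0004_
```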
The statements section of the
macro may contain any legal HLA statements (including definitions of other
macros). However, the legality of
such statements is controlled by where you expand the macro.
To invoke a macro, you simply
supply its name and an appropriate set of parameters. Unless you specify a variable number of parameters (using
the array syntax) then the number of actual parameters must exactly match the
number of formal parameters. If
you specify a variable number of parameters, then the number of actual
parameters must be greater than or equal to the number of formal parameters
(not counting the array parameter).
During macro expansion, HLA
automatically substitutes the text associated with an actual parameter for the
formal parameter in the macro’s body.
The array parameter, however, is a string array rather than a text array
so you will have to force the expansion yourself using the “@text” function:
#macro example( variableParms[] );
    ?@text(variableParms[0]) := @text(variableParms[1]);
#endmacro
Actual macro parameters consist
of a string of characters up to, but not including, a separating comma or the
closing parenthesis, e.g.,
example( v1, x+2*y )
“v1” is the text for parameter #1, “x+2*y” is the text for parameter #2. Note that HLA strips all leading
and trailing whitespace and control characters from the actual parameter when
expanding the code in-line. The
example immediately above would expand to the following:
?v1 := x+2*y;
If (balanced) parentheses appear in some macro’s
actual parameter list, HLA does not count the closing parenthesis as the end of
the macro parameter. That is, the
following is perfectly legal:
example( v1, ((x+2)*y) )
This expands to:
?v1 := ((x+2)*y);
If you need to embed commas or unmatched
parentheses in the text of an actual parameter, use the HLA literal quotes “#(“
and “)#” to surround the text.
Everything (except surrounding whitespace) inside the literal quotes
will be included as part of the macro parameter’s text. Example:
example( v1, #( array[0,1,i] )# )
The above expands to:
?v1 := array[0,1,i];
Without the literal quote
operator, HLA would have expanded the code to
?v1 := array[0;
and then generated an error
because (1) there were too many actual macro parameters (four instead of two)
and (2) the expansion produces a syntax error.
Of course, HLA’s macro
parameter parser does not consider commas appearing inside string or character
constants as parameter separators.
The following is perfectly legal, as you would expect:
example( charVar, ',' )
As you may have noticed in
these examples, a macro invocation does not require a terminating
semicolon. Macro expansion occurs
upon encountering the closing parenthesis of the macro invocation. HLA uses this syntax to allow a macro
expansion anywhere in an HLA
source file. Consider the
following:
#macro funny( dest );
, dest );
#endmacro
mov( 0 funny( ax )
This code expands to “mov( 0, ax );” and produces a legal machine instruction. Of course, this is a truly horrible example of macro use
(the style is really bad), but it demonstrates the power of HLA macros in your
program. This “expand anywhere”
philosophy is the primary reason macro invocations do not end with a semicolon.
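The parameter-splitting rules described above (commas separate parameters, commas inside string and character constants do not, a balanced closing parenthesis does not end the list, and “#(” ... “)#” passes its contents through as a single parameter) can be modeled in a few lines of Python. This is a hypothetical sketch for experimenting with the rules, not HLA's parser: split_macro_args is an invented name, it expects the text that follows the invocation's opening parenthesis, it ignores doubled quotes inside string constants, and, following the CallToAnotherMacro discussion under Processing Macro Parameters, it lets a comma end a parameter even inside nested parentheses.

```python
def split_macro_args(text: str) -> list[str]:
    """Split macro actual parameters following the rules in this section.

    `text` starts just after the invocation's opening '(' and must contain
    the matching closing ')'. This is a simplified model, not HLA's parser.
    """
    params: list[str] = []
    current: list[str] = []
    depth = 0  # nesting level of ordinary parentheses inside the list
    i = 0
    while i < len(text):
        ch = text[i]
        if text.startswith("#(", i):
            # Literal quotes: copy everything up to ")#" without interpretation.
            end = text.index(")#", i + 2)
            current.append(text[i + 2:end])
            i = end + 2
            continue
        if ch in "\"'":
            # String or character constant: its commas are not separators.
            end = text.index(ch, i + 1)
            current.append(text[i:end + 1])
            i = end + 1
            continue
        if ch == ")" and depth == 0:
            # The invocation's closing parenthesis ends the last parameter.
            params.append("".join(current).strip())
            return params
        if ch == ",":
            # A comma ends a parameter (even inside nested parentheses).
            params.append("".join(current).strip())
            current = []
        else:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            current.append(ch)
        i += 1
    raise ValueError("missing closing parenthesis")


print(split_macro_args(" v1, ((x+2)*y) )"))           # ['v1', '((x+2)*y)']
print(split_macro_args(" v1, array[0,1,i] )"))         # ['v1', 'array[0', '1', 'i]']
print(split_macro_args(" v1, #( array[0,1,i] )# )"))   # ['v1', 'array[0,1,i]']
print(split_macro_args(" charVar, ',' )"))             # ['charVar', "','"]
```

The second call reproduces the four-parameter failure described above; the third shows how the literal quotes repair it.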
16.16.3.2 Where You Declare a Macro Affects its Visibility
Prior to HLA v1.46, macro
declarations had to appear in the declaration section of a program, procedure,
iterator, method, in a class, or in a namespace. In HLA v1.46 this restriction was lifted. Now you may declare a macro almost
anywhere whitespace is allowed in a program. This increases the utility of macros as part of the HLA
Compile-time Language. However,
there are some issues of which you should be aware when you declare macros at
arbitrary points; this section will discuss those issues so you can avoid some
pitfalls of this new flexibility.
First of all, unless you have
good reason to do otherwise, you really should declare your macros in a
declaration section of your program.
Long-time HLA programmers have grown used to finding them there and by
placing your macros in a declaration section (e.g., wherever a procedure
declaration is allowed) you’ll make your programs easier to read because other
programmers can look for such declarations in a few known locations. Arbitrarily scattering your macro declarations
all over the place can make your programs harder to read. Also, it should go without saying that
you must declare a macro before the first invocation.
Like other identifiers in an
HLA program, macro identifiers have a scope that limits their visibility. If you declare a macro within a procedure,
then that macro’s identifier is only visible within that procedure and you
cannot invoke (call) the macro outside of the procedure (that is, beyond the end statement associated with the procedure). Note that this is true even if you
declare the macro in the body of the procedure, outside the procedure’s
declaration section, e.g.,
procedure SomeProc;
begin SomeProc;
#macro mov0eax;
mov( 0, eax )
#endmacro
mov0eax; // legal here
end SomeProc;
mov0eax; // undefined symbol here.
If you declare a macro in a
namespace or within an HLA class, you may invoke that macro from outside the
namespace or class declaration by prefixing the macro identifier with the
namespace or class identifier (or by an object identifier, if that object is a
variable of the class type containing the macro) using the normal dot-notation
for access to fields of the namespace or class. Note that you may invoke namespace or class macros within
the namespace or class without the namespace prefix (just as you may access
other symbol types within the namespace or class without the prefix).
You may also embed macro
definitions within records and unions.
However, when you do this HLA will insert the macro’s symbol into the
field list for the record or union.
Because HLA does not provide a way to access anything other than
variable fields of a record or union outside the declaration of that type, you
will not be able to invoke the macro from outside the record or union
declaration. However, you may
invoke that macro within the same record/union declaration that contains the
macro definition, e.g.,
type
r :record
i:int32;
#macro inrec;
k:int32;
#endmacro
j:int32;
inrec; // Legal expansion here
endrecord;
var
r.inrec; // this is not legal here. Use a namespace or class to do this.
Because of some limitations of
the HLA implementation language (Flex/Bison), there is an important peculiarity you need to be aware of
when declaring macros. In particular,
HLA may process a macro declaration before it finishes processing whatever
occurs immediately before the macro.
Therefore, if the successful definition of a macro depends on whatever
appears immediately before the macro, the declaration may fail. Though this is rare, it does occur once
in a while. Should this happen to
you, try inserting an innocuous syntactical item (like a semicolon) before the
macro declaration.
16.16.3.3 Multi-part (Context Free) Macro Invocations:
HLA macros provide some very
powerful facilities not found in other macro assemblers. One of the really unique features that
HLA macros provide is support for multi-part (or context-free) macro
invocations. This feature is
accessed via the
#terminator and #keyword reserved words. Consider the following macro declaration:
program demoTerminator;
#include( "stdio.hhf" );
#macro InfLoop:TopOfLoop, LoopExit;
TopOfLoop:
#terminator endInfLoop;
jmp TopOfLoop;
LoopExit:
#endmacro;
static
i:int32;
begin demoTerminator;
mov( 0, i );
InfLoop
stdout.put( "i=", i, nl );
inc( i );
endInfLoop;
end demoTerminator;
The #terminator keyword, if it appears within a macro, defines a
second macro that is available for a one-time use after invoking the main
macro. In the example above, the “endInfLoop” macro is available only after the invocation of
the “InfLoop” macro. Once you invoke the endInfLoop macro, it is no longer available (though the macro
calls can be nested, more on that later).
During the invocation of the #terminator macro, all local symbols declared in the main
macro (InfLoop above) are
available (note that these symbols are not available outside the macro
body. In particular, you could not
refer to either “TopOfLoop”
or “LoopExit” in the
statements appearing between the InfLoop and endInfLoop invocations above). The code above, by the way, emits code similar to the
following:
_0000_:
stdout.put( "i=", i, nl );
inc( i );
jmp _0000_;
_0001_:
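In this listing, the two labels and the jmp instruction come from the macro
expansion; the stdout.put and inc statements are the loop body written in the program. This program,
therefore, generates an infinite loop that prints successive integer values.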
These macros are called
multi-part macros for the obvious reason: they come in multiple pieces (note,
though, that HLA only allows a single #terminator macro).
They are also referred to as Context-Free macros because of their syntactical nature. Earlier, this document claimed that you
could refer to the #terminator
macro only once after invoking the main macro. Technically, this should have said “you can invoke the
terminator once for each outstanding invocation of the main macro.” In other words, you can nest these
macro calls, e.g.,
InfLoop
mov( 0, j );
InfLoop
inc( i );
inc( j );
stdout.put( "i=", i, " j=", j, nl );
endInfLoop;
endInfLoop;
The term Context-Free comes from automata theory; it describes this nestable feature of
these macros.
As should be painfully obvious
from this InfLoop
macro example, it would be really nice if one could define more than one macro
within this context-free group.
Furthermore, the need often arises to define limited-scope macros
that can be invoked more than once (limited-scope means between the main macro
call and the terminator macro invocation). The #keyword
definition allows you to create such macros.
In the InfLoop example above, it would be really nice if you
could exit the loop using a statement like “brkLoop” (note that “BREAK” is an HLA reserved word and
cannot be used for this purpose).
The #keyword
section of a macro allows you to do exactly this. Consider the following macro definition:
#macro InfLoop:TopOfLoop, LoopExit;
TopOfLoop:
#keyword brkLoop;
jmp LoopExit;
#terminator endInfLoop;
jmp TopOfLoop;
LoopExit:
#endmacro;
As with the “#terminator” section, the “#keyword” section defines a macro that is active after the
main macro invocation until the terminator macro invocation. However, #keyword macro invocations do not terminate the multi-part
invocation. Furthermore, #keyword invocations may occur more than once. Consider the following code that might
appear in the main program:
mov( 0, i );
InfLoop
mov( 0, j );
InfLoop
inc( j );
stdout.put( "i=", i, " j=", j, nl );
if( j >= 10 ) then
brkLoop;
endif;
endInfLoop;
inc( i );
if( i >= 10 ) then
brkLoop;
endif;
endInfLoop;
The “brkLoop” invocation inside the “if( j >= 10)” statement
will break out of the inner-most loop, as expected (another feature of the
context-free behavior of HLA’s macros).
The “brkLoop”
invocation associated with the “if( i >= 10 )” statement breaks out of the
outer-most loop. Of course, the
HLA language provides the FOREVER..ENDFOR loop and the BREAK and BREAKIF
statements, so there is no need for this InfLoop macro; nevertheless, this example is useful
because it is easy to understand.
If you are looking for a challenge, try creating a statement similar to
the C/C++ switch/case statement; it is perfectly possible to implement such a
statement with HLA’s macro facilities (see the HLA Standard Library for an
example of the SWITCH statement implemented as a macro).
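One way to picture the context-free nesting is as a stack of open invocations: each InfLoop pushes a fresh set of local symbols, each brkLoop refers to the entry on top of the stack, and endInfLoop pops it. The Python sketch below is a hypothetical model of that bookkeeping only, not a description of how HLA implements it; the class and method names are invented, and the label format merely imitates the _0000_ names in the listing above.

```python
class InfLoopExpander:
    """Toy model of the InfLoop / brkLoop / endInfLoop multi-part macro."""

    def __init__(self) -> None:
        self.counter = 0                       # source of unique local-symbol names
        self.stack: list[dict[str, str]] = []  # one entry per open InfLoop
        self.out: list[str] = []               # emitted lines

    def _fresh(self) -> str:
        name = f"_{self.counter:04}_"
        self.counter += 1
        return name

    def inf_loop(self) -> None:
        # Main macro: allocate this invocation's locals and emit TopOfLoop.
        locals_ = {"TopOfLoop": self._fresh(), "LoopExit": self._fresh()}
        self.stack.append(locals_)
        self.out.append(f"{locals_['TopOfLoop']}:")

    def brk_loop(self) -> None:
        # #keyword: usable many times; binds to the innermost open InfLoop.
        if not self.stack:
            raise SyntaxError("brkLoop outside of InfLoop")
        self.out.append(f"jmp {self.stack[-1]['LoopExit']};")

    def end_inf_loop(self) -> None:
        # #terminator: closes (pops) the innermost open InfLoop.
        if not self.stack:
            raise SyntaxError("endInfLoop without InfLoop")
        locals_ = self.stack.pop()
        self.out.append(f"jmp {locals_['TopOfLoop']};")
        self.out.append(f"{locals_['LoopExit']}:")


e = InfLoopExpander()
e.inf_loop()        # outer loop: TopOfLoop=_0000_, LoopExit=_0001_
e.inf_loop()        # inner loop: TopOfLoop=_0002_, LoopExit=_0003_
e.brk_loop()        # binds to the inner loop
e.end_inf_loop()
e.brk_loop()        # now binds to the outer loop
e.end_inf_loop()
print("\n".join(e.out))
```

Because each brkLoop consults only the top of the stack, the inner brkLoop exits the inner loop and the outer one exits the outer loop, matching the behavior described for the nested example.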
The discussion above introduced
the “#keyword” and “#terminator” macro sections in an informal way. There are a few details omitted from
that discussion. First, the full
syntax for HLA macro declarations is actually:
#macro identifier ( optional_parameter_list ) :optional_local_symbols;
statements
#keyword identifier ( optional_parameter_list ) :optional_local_symbols;
statements
note: additional #keyword declarations may appear here
#terminator identifier ( optional_parameter_list ) :optional_local_symbols;
statements
#endmacro
There are three things that
should immediately stand out here: (1) You may define more than one #keyword within a macro. (2) #keywords
and #terminators allow
optional parameters. (3) #keywords and #terminators
allow their own local symbols.
The scope of the parameters and local symbols
isn’t particularly intuitive (although it turns out that the scope rules are
exactly what you would want). The
parameters and local symbols declared in the main macro declaration are
available to all statements in the macro (including the statements in the #keyword and #terminator sections).
The InfLoop
macro used this feature since the JMP instructions in the brkLoop and endInfLoop sections referred to the local symbols declared in
the main macro. The InfLoop macro did not declare any parameters, but had they
been present, the brkLoop
and endInfLoop sections could
have used those parameters as well.
Parameters and local symbols
declared in a #keyword
or #terminator section are
local to that particular section.
In particular, parameters and/or local symbols declared in a #keyword section are not visible in other #keyword sections or in the #terminator section.
One important issue is that
local symbols in a multipart macro are visible in the main code between the
start of the multipart macro and the terminating macro. That is, if you have some sequence like
the following:
InfLoop
jmp LoopExit;
endInfLoop;
HLA substitutes the
appropriate internal symbol for the LoopExit symbol.
This is somewhat unintuitive and might be considered a flaw in HLA’s
design. Future versions of HLA may
deal with this issue; in the
meantime, however, some code takes advantage of this feature (to mask global
symbols), so it’s not easy to change without breaking a lot of code. Before taking advantage
of this "feature", be forewarned that it will probably change in HLA v2.x. An important aspect of this behavior is
that macro parameter names are also visible in the code section between the
initial macro and the terminator macro.
Therefore, you must take care to choose macro parameter names that will
not conflict with other identifiers in your program. E.g., the following will probably lead to some problems:
static
i:int32;
#macro parmi(i);
mov( i, eax );
#terminator endParmi;
mov( eax, i );
#endmacro
.
.
.
parmi( xyz );
mov( i, ebx ); // actually moves xyz into ebx, since the parameter i
// overrides the global variable i here.
endParmi;
16.16.3.4 Macro Invocations and Macro Parameters:
As mentioned earlier, HLA
treats all non-array macro parameters as text constants that are assigned a
string corresponding to the actual parameter(s) passed to the macro. I.e., consider the following:
#macro SetI( v );
?i := v;
#endmacro
SetI( 2 );
The above macro and invocation
is roughly equivalent to the following:
const
v : text := "2";
?i := v;
When utilizing variable
parameter lists in a macro, HLA treats the parameter object as a string array
rather than a text array (because HLA v1.x does not currently support text
arrays). For example, consider the
following macro and invocation:
#macro SetI2( v[] );
?i := v[0];
#endmacro
SetI2( 2 );
Although this looks quite
similar to the previous example, there is a subtle difference between the
two. The former example will
initialize the constant (value) i with the int32 value two.
The second example will initialize i with the string value “2”.
If you need to treat a macro
array parameter as text rather than as a string object, use the HLA “@text”
function that expands a string parameter as text. E.g., the former example could be rewritten as:
#macro SetI2( v[] );
?i := @text( v[0]);
#endmacro
SetI2( 2 );
In this example, the @text
function tells HLA to expand the string value v[0] (which is “2”) directly as text, so the
"SetI2( 2 )" invocation expands as
?i := 2;
rather than as
?i := "2";
On occasion, you may need to do
the converse of this operation.
That is, you may want to treat a standard (non-array) macro parameter as
a string object rather than as a text object. Unfortunately, text objects are expanded by the lexer
in-line upon initial processing;
the compiler never sees the text variable name (or parameter name, in
this particular case). Therefore,
writing an “@string” function in HLA wouldn’t work because the
lexer would simply expand the text object parameter before HLA got a chance to
process it.
To work around this limitation,
the lexer provides a special syntactical entity that converts a text object to
the corresponding string. The
syntax is “@string:identifier” where identifier is the name of the
text constant (or macro parameter or macro local symbol) that you wish
converted to a string. When HLA
encounters this construct, it will substitute a string constant for the
identifier. The following example
demonstrates one possible use of this feature:
program demoString;
#macro seti3( v );
#print( "i is being set to " + @string:v )
?i := v;
#endmacro
begin demoString;
seti3( 4 )
#print( "i = " + string( i ) )
seti3( 2 )
#print( "i = " + string( i ) )
end demoString;
If an identifier is a TEXT
constant (e.g., a macro parameter or a const/value identifier of type TEXT),
special care must be taken to modify the string associated with that text
object. A simple VAL expression
like the following won’t work:
?textVar:text := "SomeNewText";
The reason this doesn’t work is
subtle: if textVar is
already a text object, HLA immediately replaces textVar with its corresponding string; this includes the occurrence of the
identifier immediately after the "?" in the example above. So were you to execute the following
two statements:
?textVar:text := "x";
?textVar:text := "1";
the second statement would not
change textVar’s value from
"x" to
"1". Instead, the second
statement above would be converted to:
?x:text := "1";
and textVar’s value would remain "x".
To overcome this problem, HLA provides a special syntactical entity that
converts a text object to a string and then returns the text object ID. The syntax for this special form is
"@tostring:identifier".
The example above could be rewritten as:
?textVar:text := "x";
?@tostring:textVar:text := "1";
In this example, textVar would
be a text object that expands to the string "1".
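The problem comes down to ordering: text expansion happens in the lexer, before the statement is parsed, so the parser only ever sees the expanded name. The Python sketch below is a hypothetical model of just that ordering. It handles only the ?name:text := "value"; form and a single level of expansion, run_val and the regular expressions are invented for illustration, and it is not HLA's lexer.

```python
import re

# Toy model: a "lexer" pass replaces every identifier bound to text before the
# statement  ?name:text := "value";  is parsed. String literals are left alone.
TOKEN = re.compile(r'"[^"]*"|(@tostring:)?([A-Za-z_]\w*)')
VAL_STMT = re.compile(r'\?(\w+):text := "([^"]*)";')


def run_val(stmt: str, texts: dict[str, str]) -> None:
    def lex(m):
        if m.group(2) is None:             # a string literal: copy it unchanged
            return m.group(0)
        if m.group(1):                     # @tostring:name -> the name itself
            return m.group(2)
        return texts.get(m.group(2), m.group(2))   # expand text objects
    lexed = TOKEN.sub(lex, stmt)
    m = VAL_STMT.fullmatch(lexed)
    if m is None:
        raise SyntaxError(f"cannot parse {lexed!r}")
    texts[m.group(1)] = m.group(2)


plain: dict[str, str] = {}
run_val('?textVar:text := "x";', plain)
run_val('?textVar:text := "1";', plain)    # lexed as  ?x:text := "1";
print(plain)    # {'textVar': 'x', 'x': '1'}

fixed: dict[str, str] = {}
run_val('?textVar:text := "x";', fixed)
run_val('?@tostring:textVar:text := "1";', fixed)
print(fixed)    # {'textVar': '1'}
```

In the first run the second assignment silently creates x and leaves textVar unchanged; in the second run the @tostring: prefix keeps the lexer from expanding the name, so the assignment reaches textVar itself.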
16.16.3.5 Processing Macro Parameters
As described earlier, HLA
processes as parameters all text between a set of matching parentheses after
the macro’s name in a macro invocation.
HLA macro parameters are delimited by the surrounding parentheses and
commas. That is, the first
parameter consists of all text beyond the left parenthesis up to the first
comma (or up to the right parenthesis if there is only one parameter). The second parameter consists of all
text just beyond the first comma up to the second comma (or right parenthesis
if there are only two parameters).
Etc. The last parameter
consists of all text from the last comma to the closing right parenthesis.
Note that HLA will strip away
any white space at the beginning and end of the parameter’s text (though it
does not remove any white space from the interior of the parameter’s text).
If a single parameter must contain commas or unbalanced parentheses, you must surround
the parameter with the literal text macro quotes “#(” and “)#”. HLA considers everything but leading
and trailing space between these macro quote symbols as a single
parameter. Note that this applies
to macro invocations appearing within a parameter list. Consider the following (erroneous)
code:
CallToAMacro( 5, "a", CallToAnotherMacro( 6,7 ), true );
Presumably, the “( 6,7 )” text
is the parameter list for the “CallToAnotherMacro” invocation.
When HLA encounters a macro invocation in a parameter list, it defers
the expansion of the macro. That
is, the third parameter of “CallToAMacro” should expand to “CallToAnotherMacro( 6,7 )”, not the text that “CallToAnotherMacro” would expand to. Unfortunately, this example will not compile correctly
because the macro processor treats the comma between the 6 and the 7 as the end
of the third parameter to CallToAMacro (in other words, the third parameter is actually “CallToAnotherMacro( 6” and the fourth parameter is “7 )”). If
you really need to pass a macro invocation as a parameter, use the “#(” and “)#” macro quotes to surround the interior
invocation:
CallToAMacro( 5, "a", #( CallToAnotherMacro( 6,7 ) )#, true );
In this example, HLA passes
all the text between the “#(” and “)#” markers as a single parameter (the third
parameter) to the “CallToAMacro” macro.
This example demonstrates
another feature of HLA’s macro processing system - HLA uses deferred macro parameter expansion. That
is, the text of a macro parameter is expanded when HLA encounters the formal
parameter within the macro’s body, not while HLA is processing the actual parameters in the macro invocation
(which would be eager evaluation).
There are three exceptions to
the rule of deferred parameter evaluation: (1) text constants are always
expanded in an eager fashion (that is, the value of the text constant, not the
text constant’s name, is passed as the macro parameter). (2) The @text function, if it appears
in a parameter list, expands the string parameter in an eager fashion. (3) The @eval function immediately evaluates its
parameter; the discussion of @eval
appears a little later.
In general, there is very
little difference between eager and deferred evaluation of macro
parameters. In some rare cases
there is a semantic difference between the two. For example, consider the following two programs:
program demoDeferred;
#macro two( x, y ):z;
?z:text:="1";
x+y
#endmacro
const
z:string := "2";
begin demoDeferred;
?i := two( z, 2 );
#print( "i=" + string( i ))
end demoDeferred;
In the example above, the code
passes the actual parameter “z”
as the value for the formal parameter “x”.
Therefore, whenever HLA expands “x” it gets the value “z” which is a local symbol inside the “two” macro
that expands to the value “1”. Therefore, this code prints “3” ( “1” plus y’s value which is “2”) during assembly. Now consider the following code:
program demoEager;
#macro two( x, y ):z;
?z:text:="1";
x+y
#endmacro
const
z:string := "2";
begin demoEager;
?i := two( @text( z ), 2 );
#print( "i=" + string( i ))
end demoEager;
The only differences between
these two programs are their names and the fact that the demoEager invocation of “two” uses the @text function to eagerly expand z’s text.
As a result, the formal parameter “x” is given the value of z’s expansion (“2”) and HLA ignores the local value for “z” in macro “two”.
This code prints the value “4” during assembly. Note that changing “z” in the main program to a text constant (rather
than a string constant) has the same effect:
program demoEager;
#macro two( x, y ):z;
?z:text:="1";
x+y
#endmacro
const
z:text := "2";
begin demoEager;
?i := two( z, 2 );
#print( "i=" + string( i ))
end demoEager;
This program also prints “4”
during assembly.
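The three programs differ only in where the identifier z is looked up. The hypothetical Python model below mimics that with a tiny text-substitution function: with deferred evaluation the macro receives the text "z", which is then resolved inside the macro, where the local z means "1"; with eager evaluation (as with @text( z ) or a text constant) z is resolved in the caller first. The names expand and two are invented for this sketch, and HLA's real lexer is considerably more involved.

```python
import re


def expand(text: str, scope: dict[str, str]) -> str:
    """Replace each identifier bound in `scope` with its (recursively expanded) text."""
    def sub(m):
        name = m.group(0)
        return expand(scope[name], scope) if name in scope else name
    return re.sub(r"[A-Za-z_]\w*", sub, text)


def two(x_text: str, y_text: str) -> int:
    """Model of:  #macro two( x, y ):z;  ?z:text := "1";  x+y  #endmacro"""
    # Inside the macro, the formal parameters and the local z shadow the caller's names.
    macro_scope = {"x": x_text, "y": y_text, "z": "1"}
    left, right = expand("x+y", macro_scope).split("+")
    return int(left) + int(right)


caller_scope = {"z": "2"}                      # the program-level z

deferred = two("z", "2")                       # like demoDeferred: "z" resolved inside
eager = two(expand("z", caller_scope), "2")    # like @text( z ): "2" is passed in
print(deferred, eager)                         # 3 4
```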
One place where deferred vs.
eager evaluation can get you into trouble is with some of the HLA built-in
functions. Consider the following
HLA macro:
#macro DemoProblem( Parm );
#print( string( Parm ) )
#endmacro
.
.
.
DemoProblem( @linenumber );
(The @linenumber function returns, as an uns32 constant, the current line number in the file).
When this program fragment
compiles, HLA will use deferred evaluation and pass the text
"@linenumber" as the parameter "Parm".
Upon compilation of this fragment, the macro will expand to
"#print( string( @linenumber ))" with the intent, apparently, being
to print the line number of the statement containing the DemoProblem
invocation. In reality, that is
not what this code will do.
Instead, it will print the line number, in the macro, of the
"#print( string (Parm));" statement. By delaying the substitution of the current line number for
the "@linenumber" function call until inside the macro, deferred
execution produces the wrong result.
What is really needed here is eager evaluation so that the @linenumber
function expands to the line number string before being passed as a parameter
to the DemoProblem macro. The @eval built-in function provides
this capability. The following
coding of the DemoProblem
macro invocation will solve the problem:
DemoProblem( @eval( @linenumber ) );
Now the compiler will execute
the @linenumber function and pass that number as the macro parameter text
rather than the string "@linenumber". Therefore, the #print statement inside the macro will print
the actual line number of the DemoProblem statement rather than the line number
of the #print statement.
Keep these minor differences in
mind if you run into trouble using macro parameters.
16.16.4 Built-in Functions:
HLA provides several
built-in functions that take constant operands and produce constant
results. It is important that you
differentiate these compile-time functions from run-time functions. These functions do not emit any object
code, and therefore do not exist while your program is running. They are only available while HLA is
compiling your program. Note that
many of these functions are trivial to implement in assembly language or have
counterparts in the HLA standard library.
Therefore, the fact that they are not available at run-time shouldn’t
prove to be much of a problem.
16.16.4.1 Constant Type Conversion Functions
boolean( const_expr )
The expression must be an
ordinal or string expression. If const_expr is numeric, this function returns false for zero
and true for everything else. If const_expr is a character, this function returns true for
"T" and false for "F". It generates an error for any other character value. If const_expr is a string, the string must contain
"true" or "false" else HLA generates an error.
int8( const_expr )
int16( const_expr )
int32( const_expr )
int64( const_expr )
int128( const_expr )
uns8( const_expr )
uns16( const_expr )
uns32( const_expr )
uns64( const_expr )
uns128( const_expr )
byte( const_expr )
word( const_expr )
dword( const_expr )
qword( const_expr )
lword( const_expr )
These functions convert
their parameter to the specified integer type.
For real operands, the result is truncated to form an integer
value. For all other numeric
operands, the result is range checked.
For character operands, the ASCII code of the specified character is
returned. For boolean objects,
zero or one is returned. For string operands, the string must be a sequence of
decimal characters which are converted to the specified type. Note that byte,
word, and dword types are synonymous with uns8, uns16, and uns32 for the
purposes of range checking.
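The rules above can be summarized in a small Python model. This is a sketch under stated assumptions, not HLA's implementation: convert and Char are invented names (Char stands in for an HLA character constant, since Python has no separate char type), treating qword and lword like uns64 and uns128 is an assumption, and because the text does not say whether truncated reals or decimal strings are range checked, this sketch range checks strings but not reals.

```python
import math
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Char:
    """Stand-in for an HLA char constant (hypothetical helper)."""
    c: str


def _signed(bits: int) -> tuple[int, int]:
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


def _unsigned(bits: int) -> tuple[int, int]:
    return 0, (1 << bits) - 1


RANGES: dict[str, tuple[int, int]] = {}
for bits in (8, 16, 32, 64, 128):
    RANGES[f"int{bits}"] = _signed(bits)
    RANGES[f"uns{bits}"] = _unsigned(bits)
# byte/word/dword range-check like uns8/uns16/uns32 (per the text);
# qword/lword as uns64/uns128 is an assumption of this sketch.
for name, bits in (("byte", 8), ("word", 16), ("dword", 32), ("qword", 64), ("lword", 128)):
    RANGES[name] = _unsigned(bits)


def convert(type_name: str, value) -> int:
    """Model of int8( x ), uns16( x ), dword( x ), ... on constant operands."""
    if isinstance(value, bool):        # test bool first: bool is a subclass of int
        return int(value)              # false -> 0, true -> 1
    if isinstance(value, Char):
        return ord(value.c)            # ASCII code of the character
    if isinstance(value, float):
        return math.trunc(value)       # reals are truncated (not range checked here)
    if isinstance(value, str):
        if not re.fullmatch(r"[0-9]+", value):
            raise ValueError(f"{value!r} is not a string of decimal digits")
        value = int(value)
    lo, hi = RANGES[type_name]
    if not lo <= value <= hi:          # other numeric operands are range checked
        raise ValueError(f"{value} is out of range for {type_name}")
    return value


print(convert("int8", -3.9), convert("uns8", Char("A")), convert("uns16", "1234"))
```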
real32( const_expr )
real64( const_expr )
real80( const_expr )
Similar to the integer
functions above, except these functions produce the obvious real results. Only numeric and string parameters are
legal.
char( const_expr )
Const_expr
must be an ordinal or string value.
This function returns a character whose ASCII code is that ordinal
value. For strings, this function
returns the first character of the string.
string( const_expr )
This function produces a
reasonable string representation of the parameter. Almost all data types are
legal.
cset( const_expr )
The parameter must be a
character, string, or cset. For
character parameters, this function returns the singleton set containing only
the specified character. For
strings, each character in the string is unioned into the set and the function
returns the result. If the
parameter is a cset, this function makes a copy of that character set.
16.16.4.2 Bitwise Type Transfer Functions
The type conversion functions
of the previous section will
automatically convert their operands from the source type to the destination
type. Sometimes you might want to
change the type of some object without changing its value. For many "conversions" this is
exactly what takes place. For
example, when converting an uns8 object to an uns16
value using the uns16( )
function, HLA does not modify the bit pattern at all. For other conversions, however, HLA may completely change
the underlying bit pattern when doing the conversion. For example, when
converting the real32
value 1.0 to a dword
value, HLA replaces the underlying bit pattern ($3F80_0000) with the bit pattern of
the dword value
one. On occasion, however, you
might actually want to copy the bits straight across so that the resulting dword value is $3F80_0000. The HLA bit-transfer type conversion compile-time functions
provide this facility.
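The difference between a value conversion and a bit transfer is easy to reproduce with Python's struct module, which packs and unpacks IEEE-754 values. In this sketch, dword_value stands in for the value-converting dword( ) function and dword_bits for the bit-transferring @dword( ) function; both names are hypothetical, and real32_from_bits assumes @real32( ) behaves as the reverse transfer when handed a 32-bit integer.

```python
import struct


def dword_value(x: float) -> int:
    """Value conversion, like dword( 1.0 ): keep the number, change the bits."""
    return int(x)


def dword_bits(x: float) -> int:
    """Bit transfer, like @dword( 1.0 ): keep the real32 bits, change the type."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits


def real32_from_bits(bits: int) -> float:
    """Assumed reverse transfer, like @real32( $3F80_0000 )."""
    (x,) = struct.unpack("<f", struct.pack("<I", bits))
    return x


print(dword_value(1.0))               # 1
print(hex(dword_bits(1.0)))           # 0x3f800000
print(real32_from_bits(0x3F800000))   # 1.0
```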
The HLA bit-transfer type
conversion functions are the following:
@int8( const_expr )
@int16( const_expr )
@int32( const_expr )
@int64( const_expr )
@int128( const_expr )
@uns8( const_expr )
@uns16( const_expr )
@uns32( const_expr )
@uns64( const_expr )
@uns128( const_expr )
@byte( const_expr )
@word( const_expr )
@dword( const_expr )
@qword( const_expr )
@lword( const_expr )
@real32( const_expr )
@real64( const_expr )
@real80( const_expr )
@char( const_expr )
@cset( const_expr )
The above functions extract
8, 16, 32, 64, 80, or 128 bits from the constant expression for use as the
value of the function. Note that
supplying a string expression as an argument isn’t particularly useful since
the functions above will simply return the address of the string data in memory
while HLA is compiling the program.
The @byte
function provides an additional syntax with two parameters, see the next
section for details.
@string( const_expr )
HLA string objects are pointers
(in both the language as well as within the compiler). So simply copying the bits to the
internal string object would create problems since the bit pattern probably is
not a valid pointer to string data during the compilation. With just a few exceptions, what the
@string function does is take the bit data of its argument and translate this
to a string (up to 16 characters long).
Note that the actual string may be between zero and 16 characters long
since the HLA compiler (internally) uses zero-terminated strings to represent
string constants. Note that the
first zero byte found in the argument will end the string.
If you supply a string expression
as an argument to @string,
HLA simply returns the value of the string argument as the value for the @string function.
If you supply a text object as an argument to the @string function, HLA returns the text data as a string
without first expanding the text value (similar to the @string:identifier token).
If you supply a pointer constant as an argument to the @string function, HLA returns the string that HLA will
substitute for the static object when it emits the assembly file.
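For a numeric argument, the behavior described above (copy up to 16 bytes of the value, stop at the first zero byte) can be sketched in Python. string_from_bits is a hypothetical name, and the byte order is an assumption: the sketch lays the value out little-endian, as it would sit in x86 memory.

```python
def string_from_bits(value: int) -> str:
    """Model of @string applied to a non-string constant of up to 128 bits."""
    value &= (1 << 128) - 1                # two's-complement view of negative values
    data = value.to_bytes(16, "little")    # assumption: x86 little-endian layout
    data = data.split(b"\0", 1)[0]         # the first zero byte ends the string
    return data.decode("latin-1")          # one character per byte


print(string_from_bits(0x6948))     # Hi
print(string_from_bits(0x690048))   # H  (the zero byte hides the 'i')
print(repr(string_from_bits(0)))    # ''
```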
16.16.4.3 General functions
@abs( numeric_expr )
Returns the absolute value
of the numeric expression passed as a parameter.
@byte( integer_expr, which )
The which parameter is a value in the range 0..15. This function extracts the specified
byte from the value of the integer_expr parameter.
(This is an extension of the @byte type transfer function.)
@byte( real32_expr, which )
The which parameter is a value in the range 0..3. This function extracts the specified
byte from the value of the real32_expr parameter.
@byte( real64_expr, which )
The which parameter is a value in the range 0..7. This function extracts the specified
byte from the value of the real64_expr parameter.
@byte( real80_expr, which )
The which parameter is a value in the range 0..9. This function extracts the specified
byte from the value of the real80_expr parameter.
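A rough Python model of @byte follows. The text does not say which end byte 0 refers to; this sketch assumes byte 0 is the least significant byte (x86 little-endian order). The helper names are hypothetical, and real80 is omitted because Python has no 80-bit float type.

```python
import struct


def byte_of_int(value: int, which: int) -> int:
    """Like @byte( integer_expr, which ), with which in 0..15."""
    if not 0 <= which <= 15:
        raise ValueError("which must be in 0..15")
    return (value >> (8 * which)) & 0xFF    # >> on negatives keeps two's-complement bytes


def byte_of_real(value: float, which: int, fmt: str) -> int:
    """Like @byte( real32_expr, which ) with fmt "<f", or real64 with fmt "<d"."""
    data = struct.pack(fmt, value)          # little-endian IEEE-754 bytes
    if not 0 <= which < len(data):
        raise ValueError(f"which must be in 0..{len(data) - 1}")
    return data[which]


print(hex(byte_of_int(0x12345678, 3)))      # 0x12
print(hex(byte_of_real(1.0, 3, "<f")))      # 0x3f  (1.0 is $3F80_0000)
```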
@ceil( real_expr )
This function returns the
smallest integer value greater than or equal to the expression passed as a
parameter. Note that although the
result will be an integer, this function returns a real80 value.
@cos( real_expr )
The real parameter is an angle
in radians. This function returns
the cosine of that angle.
@date
This function returns a string
of the form "YYYY/MM/DD" containing the current date.
@env( string_expr )
This function returns a string
containing the value of a system environment variable (whose name you pass as
the string parameter). If the specified environment variable does not exist,
this function returns the empty string.
@exp( real_expr )
This function returns a real80
value that is the result of the computation e**real_expr (i.e., e raised to the specified power).
@extract( cset_expr )
This function returns a
character from the specified character set constant. Note that this function doesn’t actually remove the
character from the set; if you want to do that, you will need to
explicitly remove the character yourself.
The following code demonstrates how to do this:
program extractDemo;
val
c:cset := {'a'..'z'};
begin extractDemo;
#while( c <> {} )
?b := @extract( c );
#print( "b=" + b )
?c := c - {b};
#endwhile
end extractDemo;
@floor( real_expr )
This function returns the
largest integer value less than or equal to the supplied expression. Note that
the returned result is of type real80 even though it is an integer value.
@isalpha( char_expr )
This function returns true if
the character expression is an upper or lower case alphabetic character.
@isalphanum( char_expr )
This function returns true if
the parameter is an alphabetic or numeric character. It returns false otherwise.
@isdigit( char_expr )
This function returns true if
the character expression is a decimal digit.
@islower( char_expr )
This function returns true if
the character expression is a lower case alphabetic character.
@isspace( char_expr )
This function returns true if
the character expression is a "whitespace" character. Typically, this would be spaces, tabs,
newlines, returns, linefeeds, etc.
@isupper( char_expr )
This function returns true if
the character expression is an upper case alphabetic character.
@isxdigit( char_expr )
This function returns true if
the supplied character expression is a hexadecimal digit.
@log( real_expr )
This function returns the
natural (base e) logarithm of the supplied parameter.
@log10( real_expr )
This function returns the
base-10 logarithm of the supplied parameter.
@max( comma_separated_list_of_ordinal_or_real_values )
This function returns the
largest value from the specified list.
@min( comma_separated_list_of_ordinal_or_real_values )
This function returns the least
of the values in the specified list.
@odd( int_expr )
This function returns true if
the integer expression is an odd number.
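The sketch below shows @max, @min, and @odd selecting values and code paths entirely at assembly time; no machine code is generated. The program and value names are hypothetical, and the expected values in the comments follow from the descriptions above.

```hla
// Illustrative sketch; all identifiers are hypothetical.
program minMaxDemo;
begin minMaxDemo;

    ?largest  := @max( 3, 17, -4, 9 );     // expected: 17
    ?smallest := @min( 3, 17, -4, 9 );     // expected: -4

    #if( (largest <> 17) | (smallest <> -4) )
        #error( "@max/@min returned unexpected values" )
    #endif

    // @odd can pick between alternatives based on a constant.
    #if( @odd( largest ) )
        #print( "largest is odd: ", largest )
    #else
        #print( "largest is even: ", largest )
    #endif

end minMaxDemo;
```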
@random( int_expr )
This function returns a random
uns32 value.
@randomize( int_expr )
This function uses the integer
expression passed as a parameter as the new seed value for the random number
generator.
@sin( real_expr )
This function returns the sine
of the angle (in radians) passed as a parameter.
@sort( array_expr, int_expr, left_compare_id,
right_compare_id, str_expr )
This function returns an array
containing the elements of array_expr sorted in ascending order.
The second parameter (int_expr) specifies the number of elements in the array to sort (sorting always
begins with element zero and continues for int_expr elements). Note that @sort always returns an array
that is the same size as array_expr, but only the first int_expr elements are sorted.
Because array_expr elements can be an arbitrary type, you must supply
a mechanism for comparing individual elements of the array. This is
accomplished using the last three parameters to @sort. First of all, you must
supply the names of two HLA VAL objects as the left_compare_id and
right_compare_id parameters. These two value objects must be the same type as
an element of array_expr. The
last parameter must be a string constant holding the name of a macro that will
compare the values in these two identifiers and return true if left_compare_id is less than right_compare_id (This has to be a string constant so that HLA
won’t attempt to immediately expand the macro when encountering the name).
Though it shouldn’t matter much, the current implementation of
@sort uses a quick-sort algorithm. There is no guarantee that this function
will continue to use quicksort in the future, however.
Here’s a quick example:
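#macro abcmp;
    (a < b)     // true when the left value is less than the right value
#endmacro

val
    a           :int32;     // left comparison value used by abcmp
    b           :int32;     // right comparison value used by abcmp
    array       :int32[8] := [8,7,6,5,4,3,2,1];

    // The macro name is passed as a string so HLA does not expand it here.
    sortedArray :int32[8] :=
        @sort( array, @elements(array), a, b, "abcmp" );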
@sqrt( real_expr )
This function returns the
square root of the parameter.
@system( string_expr )
This function executes the
system command specified by the string (i.e., a command-line operation for a
shell interpreter). It captures all the output sent to the standard output
device by this command and returns that data as a string value.
@tan( real_expr )
This function returns the
tangent of the angle (in radians) passed as a parameter.
@time
This function returns a string
of the form "HH:MM:SS xM" (x= A or P) denoting the time at the point
this function was called (according to the system clock).
16.16.5
String functions:
@delete( str_expr, int_start, int_len )
This function returns a string
consisting of the str_expr
passed as a parameter with (possibly) some characters removed. This function removes int_len characters from the string starting at index int_start (note that strings have a starting index of zero).
@index( str_expr1, int_start, str_expr2 )
This function searches for str_expr2 within str_expr1 starting at character position int_start within str_expr1. If
the string is found, this function returns the index into str_expr1 of the first match (starting at int_start).
This function returns -1 if there is no match.
@insert( str_expr1, int_start, str_expr2 )
This function returns a string consisting of str_expr1 with str_expr2 inserted just before the character at index int_start.
@length( str_expr )
This function returns the
length of the specified string.
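The functions above can be exercised together in a small compile-time sketch. It assumes zero-based indexes (as stated for @delete) and that strings compare with <> in constant expressions; the program and value names are hypothetical.

```hla
// Illustrative sketch; all identifiers are hypothetical.
program strEditDemo;
begin strEditDemo;

    ?s    := "hello world";
    ?pos  := @index( s, 0, "world" );        // expected: 6
    ?miss := @index( s, 0, "xyz" );          // expected: -1 (no match)

    ?cut   := @delete( s, 5, 6 );            // removes " world": "hello"
    ?glued := @insert( "helloworld", 5, " " ); // "hello world"

    #if( (pos <> 6) | (miss <> -1) | (cut <> "hello") | (glued <> s) )
        #error( "string editing functions returned unexpected results" )
    #endif

    #if( @length( glued ) <> 11 )
        #error( "unexpected length" )
    #endif

end strEditDemo;
```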
@lowercase( str_expr, int_start )
This function returns a string
of characters from str_expr
with all uppercase alphabetic characters converted to lower case. Only those characters from int_start on are copied into the result string.
@rindex( str_expr1, int_start, str_expr2 )
Similar to the @index function,
but this function searches for the last occurrence of str_expr2 in str_expr1 rather than the first occurrence.
@strbrk( str_expr, int_start, cset_expr )
This function returns the index
of the first character beyond int_start in str_expr
that is a member of the cset_expr
parameter. This function returns
-1 if none of the characters are in the set.
@strset( char_expr, int_len )
This function returns a string
consisting of int_len
copies of char_expr.
@strspan( str_expr, int_start, cset_expr )
This function returns the index
of the first character beyond position int_start in str_expr that is not a member of the cset_expr parameter. This function returns -1 if all of the
characters are in the set.
@substr( str_expr, int_start, int_len )
This function returns the
substring specified by the starting position and length in str_expr.
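A common use of @strspan and @strbrk is a small compile-time scanner: skip leading blanks, find a separator, and cut out the field between them with @substr. The sketch below assumes the index semantics described above; the program and value names are hypothetical.

```hla
// Illustrative sketch; all identifiers are hypothetical.
program scanDemo;
begin scanDemo;

    ?text := "   abc,def";

    // First index whose character is NOT a blank.
    ?startPos := @strspan( text, 0, {' '} );       // expected: 3

    // First index at or after startPos whose character IS a comma.
    ?commaPos := @strbrk( text, startPos, {','} ); // expected: 6

    ?field := @substr( text, startPos, commaPos - startPos ); // "abc"

    #if( (startPos <> 3) | (commaPos <> 6) | (field <> "abc") )
        #error( "scan produced unexpected results" )
    #endif

    // @strset builds a run of repeated characters, e.g., an underline.
    #print( field )
    #print( @strset( '-', @length( field ) ) )

end scanDemo;
```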
@tokenize( str_expr, int_start, cset_delims,
cset_quotes )
This function returns an array
of strings obtained by doing a lexical scan of the str_expr passed as a parameter (starting at character index
int_start). Each element of this array consists of
all characters between any sequence of delimiter characters (specified by the cset_delims parameter).
The only exceptions are strings appearing between bracketing (quoting)
symbols. The fourth parameter
specifies the possible bracketing characters. If cset_quotes
contains a quotation mark (") then all sequences of characters between a
pair of quotes will be treated as a single string. Similarly, if cset_quotes contains an apostrophe, then all characters
between a pair of apostrophes will be treated as a single string. If the cset_quotes parameter contains one of the pairs "(" / ")", "{" / "}", or "[" / "]" (both characters from a given pair must be present), then @tokenize will consider all characters between these
bracketing symbols to be a single string.
You should use the @elements
function to determine how many strings are present in the resulting array of
strings (this will always be a one-dimensional array, although it is possible
for it to have zero elements).
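The following sketch splits an instruction-like string on blanks and commas with @tokenize. It assumes, per the description above, that a run of delimiters such as ", " separates tokens without producing empty elements, and that an empty cset disables quote handling; the program and value names are hypothetical.

```hla
// Illustrative sketch; all identifiers are hypothetical.
program tokenizeDemo;
begin tokenizeDemo;

    // Delimiters: blank and comma. No quoting characters ({}).
    ?words := @tokenize( "mov eax, ebx", 0, {' ', ','}, {} );

    // Expected tokens: "mov", "eax", "ebx".
    #if( @elements( words ) <> 3 )
        #error( "expected three tokens" )
    #elseif( (words[0] <> "mov") | (words[1] <> "eax") | (words[2] <> "ebx") )
        #error( "unexpected token text" )
    #endif

end tokenizeDemo;
```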
@trim( str_expr, int_start )
This function returns a string
consisting of the characters in str_expr starting at position int_start with all leading and trailing whitespace removed.
@uppercase( str_expr, int_start )
This function returns a string
consisting of the characters in str_expr starting at position int_start with all lower case alphabetic characters converted
to uppercase.
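A brief sketch of the trimming and case-conversion functions follows, with a start index of zero so that the whole string is processed. The program and value names are hypothetical; the expected strings come from the descriptions above.

```hla
// Illustrative sketch; all identifiers are hypothetical.
program caseDemo;
begin caseDemo;

    ?rawStr   := "   Hello World   ";
    ?cleanStr := @trim( rawStr, 0 );          // "Hello World"
    ?upperStr := @uppercase( cleanStr, 0 );   // "HELLO WORLD"
    ?lowerStr := @lowercase( cleanStr, 0 );   // "hello world"
    ?lastWord := @substr( cleanStr, 6, 5 );   // "World"

    #if( (cleanStr <> "Hello World") | (upperStr <> "HELLO WORLD") )
        #error( "@trim or @uppercase returned an unexpected string" )
    #endif

    #if( (lowerStr <> "hello world") | (lastWord <> "World") )
        #error( "@lowercase or @substr returned an unexpected string" )
    #endif

end caseDemo;
```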
16.16.6
String/Pattern matching functions
The HLA string/pattern matching
functions all attempt to match a string against a pattern. These functions all return a boolean
result indicating success or failure (i.e., whether the string matches the
pattern).
Most of these functions have two
optional parameters: Remainder
and Matched. If the function succeeds, it generally
copies the matched string into the VAL string object specified by the Matched parameter and it copies all the characters in the InputStr parameter that follow the matched text into the Remainder parameter.
You may specify the Remainder parameter without also specifying the Matched parameter, but if you need the Matched result, you must specify all the parameters. The Remainder and Matched parameters are optional in all of the
following functions.
If the function fails, the
values of the Remainder
and Matched parameters are
generally undefined.
@peekCset( InputStr, charSet, Remainder, Matched )
This function checks the first
character of InputStr
to see if it is a member of charSet. The function returns
true/false depending on the result of the set membership test. If the function succeeds, it copies the
value of the InputStr
parameter to Remainder
and creates a single character string from the first character of InputStr and stores this into Matched.
@oneCset( InputStr, charSet, Remainder, Matched )
This function checks the first
character of InputStr
to see if it is a member of charSet. The function returns
true/false depending on the result of the set membership test. If the function succeeds, it copies all
characters but the first of the InputStr parameter to Remainder
and copies the first character of InputStr into Matched.
@uptoCset( InputStr, charSet, Remainder, Matched )
This function matches all
characters up to, but not including, a single character from the charSet character set parameter. If the InputStr parameter does not contain a character in the specified cset, this
function fails. If it succeeds, it
copies all the matched characters (not including the character in the cset) to
the Matched parameter and it
copies all remaining characters (including the character in the cset) to the Remainder parameter.
@zeroOrOneCset( InputStr, charSet, Remainder, Matched )
If the first character of InputStr is a member of charSet, this function succeeds and returns that character
in the Matched parameter. It also returns the remaining
characters in the string in the Remainder parameter.
This function always succeeds
(since it can match zero characters).
If the first character of InputStr is not in charSet, then this function returns InputStr in Remainder and returns the empty string in Matched.
@exactlynCset( InputStr, charSet, n, Remainder, Matched )
This function returns true if
the first n
characters of InputStr
are in the cset specified by charSet. The n+1st character must not be in the character set
specified by charSet. If this function succeeds (i.e.,
returns true), then it copies the first n characters to the Matched string and it copies all remaining characters into
the Remainder string. If this function fails and returns
false, Remainder and Matched are undefined.
@firstnCset( InputStr, charSet, n, Remainder, Matched )
This function is very similar
to @exactlynCset except it
doesn’t require that the n+1st character not be a member of the charSet set.
If the first n
characters of InputStr
are in charSet, this function
succeeds (returning true) and copies those n characters into the Matched string;
it also copies any following characters into the Remainder string.
@nOrLessCset( InputStr, charSet, n, Remainder, Matched )
This function always
succeeds. It will match between
zero and n
characters in InputStr
from the charSet
set. The n+1st character may be in charSet; this function doesn’t care and only matches up to
the nth character. This function
copies up to n matched
characters to the Matched
string (the empty string if it matches zero characters); the remaining characters in the string
are copied to the Remainder
parameter.
@nOrMoreCset( InputStr, charSet, n, Remainder, Matched )
This function succeeds if it
matches at least n
characters from InputStr
against the charSet
set. It returns false if there are
fewer than n
characters from charSet at
the beginning of InputStr. If this function succeeds, it copies
the characters it matches to the Matched string and all characters after that sequence to
the Remainder string.
@ntomCset( InputStr, charSet, n, m, Remainder, Matched )
This function succeeds if InputStr begins with at least n characters from charSet. If
additional characters in InputStr are in this set, @ntomCset
will match up to m
characters (n < m). It will not match any additional
characters beyond the mth character, although those characters may be in the
charSet set without
affecting the success/failure of this routine. If this routine succeeds, it copies all the characters it
matches to the Matched
parameter and any remaining characters to the Remainder parameter.
@exactlyntomCset( InputStr, charSet, n, m, Remainder, Matched )
Similar to the @ntomCset function, except this function fails if more than
m characters at the
beginning of InputStr
are in the specified character set.
@zeroOrMoreCset( InputStr, charSet, Remainder, Matched )
This function always
succeeds. If the first character
of InputStr is not in charSet, this function copies InputStr to Remainder, sets Matched to the empty string, and returns true. If some sequence of characters at the
beginning of InputStr are in charSet, this function copies those characters to Matched and copies the following characters to Remainder.
@oneOrMoreCset( InputStr, charSet, Remainder, Matched )
This function succeeds if InputStr begins with at least one character from charSet. It
will match all characters at the beginning of InputStr that are members of charSet. It
copies the matched chars to the Matched string and any remaining characters to the Remainder string.
It fails if the first character of InputStr is not a member of charSet.
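To show how Matched and Remainder are used in practice, the sketch below peels the leading run of digits off a string with @oneOrMoreCset and then shows the failing case, supplying only the optional Remainder parameter. The VAL objects are initialized to empty strings; the program and value names are hypothetical.

```hla
// Illustrative sketch; all identifiers are hypothetical.
program csetMatchDemo;
val
    rest   :string := "";
    digits :string := "";

begin csetMatchDemo;

    // Expected: digits = "1234", rest = "abc".
    #if( @oneOrMoreCset( "1234abc", {'0'..'9'}, rest, digits ) )
        #print( "digits = ", digits, ", remainder = ", rest )
    #else
        #error( "expected at least one digit" )
    #endif

    // No leading digit, so this match fails (only Remainder is given).
    #if( @oneOrMoreCset( "abc", {'0'..'9'}, rest ) )
        #error( "should not have matched" )
    #endif

end csetMatchDemo;
```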
@peekChar( InputStr, Character, Remainder, Matched )
This function succeeds if the
first character of InputStr
matches Character. If it succeeds, it copies the character
to the Matched string and copies
the entire string (including the first character) to Remainder.
@oneChar( InputStr, Character, Remainder, Matched )
This function succeeds if the
first character of InputStr
is equal to Character. If it succeeds, it copies the matched
character to Matched
and any remaining characters to Remainder. If
it fails, then Remainder
and Matched are undefined.
@uptoChar( InputStr, Character, Remainder, Matched )
This function matches all
characters up to, but not including, the specified character. It fails if the specified character is
not in the InputStr
string. If this function succeeds
and returns true, it copies the matched characters to the Matched string and copies all remaining characters to the Remainder string (the Remainder string will begin with the value found in Character). If
this function fails, it leaves Remainder and Matched
undefined.
@zeroOrOneChar( InputStr, Character, Remainder, Matched )
This function always succeeds
since it can match zero characters.
If the first character of InputStr is not equal to Character, this function returns true and sets Remainder equal to InputStr and sets Matched to the empty string. If the first character of InputStr is equal to Character, then this function returns that character in Matched and returns any remaining characters from InputStr in Remainder.
@zeroOrMoreChar( InputStr, Character, Remainder, Matched )
This function always succeeds
since it can match zero characters.
If the first character of InputStr is not equal to Character, this function returns true and sets Remainder equal to InputStr and sets Matched to the empty string. If InputStr
begins with a sequence of characters that are all equal to Character, then this function returns those characters in Matched and returns any remaining characters from InputStr in Remainder.
@oneOrMoreChar( InputStr, Character, Remainder, Matched )
This function succeeds if InputStr begins with at least one copy of Character.
It matches the entire sequence of characters equal to Character at the beginning of InputStr, returns those characters in Matched, and returns any remaining characters from InputStr in Remainder. It fails and returns false if the first character of InputStr is not equal to Character.
@exactlynChar( InputStr, Character, n, Remainder, Matched )
This function returns true if
the first n
characters of InputStr
are equal to Character. The n+1st character cannot be equal to Character. If
this function succeeds, it returns a string consisting of n copies of Character in Matched and returns any remaining characters in Remainder. Matched and Remainder are undefined if this function returns false.
@firstnChar( InputStr, Character, n, Remainder, Matched )
This function returns true if
the first n
characters of InputStr
are equal to Character. The n+1st character may or may not be equal to Character. If
this function succeeds, it returns a string consisting of n copies of Character in Matched and returns any remaining characters in Remainder.
@nOrLessChar( InputStr, Character, n, Remainder, Matched )
This function always returns
true. It matches up to n copies of Character at the beginning of InputStr. More
than n characters can be equal
to Character and this
routine will still succeed.
However, this routine only matches the first n copies of Character in InputStr. It
copies the matched characters to the Matched string and copies any remaining characters to the Remainder string.
@nOrMoreChar( InputStr, Character, n, Remainder, Matched )
The @nOrMoreChar function matches any string that begins with at
least n copies of Character. If
it succeeds, it copies the sequence of Character characters to the Matched string and copies any remaining characters (which
must begin with something other than Character) to the Remainder string.
This function fails and returns false if the string doesn’t begin with
at least n copies
of Character. Note that Remainder and Matched are undefined if this function fails.
@ntomChar( InputStr, Character, n, m, Remainder, Matched )
This function returns true if
the first n
characters of InputStr
are equal to Character. It will
match up to m
characters (m >= n). The m+1st character does not have to be different from Character, although this function will match, at most, m characters.
If this function succeeds, it copies the matched characters to the Matched string and any following characters to the Remainder string.
If this function fails and returns false, the values of Matched and Remainder are undefined.
@exactlyntomChar( InputStr, Character, n, m, Remainder, Matched )
This function succeeds and
returns true if there are at least n copies of Character at the beginning of InputStr and no more than m copies of Character at the beginning of InputStr. If
this function succeeds, it copies the matched characters at the beginning of InputStr to the Matched parameter and any following characters to the Remainder parameter.
If this function fails, the values of Remainder and Matched are undefined upon return.
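The differences among @exactlynChar, @firstnChar, and @nOrMoreChar are easiest to see on the same input. In this sketch the string begins with three '#' characters; the program and value names are hypothetical, and the expected results follow the descriptions above.

```hla
// Illustrative sketch; all identifiers are hypothetical.
program charRunDemo;
val
    rest :string := "";
    run  :string := "";

begin charRunDemo;

    ?s := "###title";

    // Exactly two fails: the third character is also '#'.
    #if( @exactlynChar( s, '#', 2, rest, run ) )
        #error( "@exactlynChar should fail on three leading '#'" )
    #endif

    // First two succeeds; expected run = "##", rest = "#title".
    #if( @firstnChar( s, '#', 2, rest, run ) )
        #print( "firstn:  matched=", run, " remainder=", rest )
    #endif

    // Two or more takes the whole run; expected "###" and "title".
    #if( @nOrMoreChar( s, '#', 2, rest, run ) )
        #print( "nOrMore: matched=", run, " remainder=", rest )
    #endif

end charRunDemo;
```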
@peekiChar
@oneiChar
@uptoiChar
@zeroOrOneiChar
@zeroOrMoreiChar
@oneOrMoreiChar
@exactlyniChar
@firstniChar
@nOrLessiChar
@nOrMoreiChar
@ntomiChar
@exactlyntomiChar
These functions use the same
syntax as the standard xxxxxChar functions.
The difference is that these functions do a case insensitive comparison
of the Character parameter with
the InputStr parameter.
@matchStr( InputStr, String, Remainder, Matched )
This function checks to see if
the string specified by String
appears as the first set of characters at the beginning of InputStr. This
function returns true if InputStr begins with String. If this function succeeds, it copies String to Matched and any following characters to Remainder.
@matchiStr( InputStr, String, Remainder, Matched )
Just like @matchStr except this function does a case insensitive
comparison.
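The sketch below contrasts @matchStr with @matchiStr on an upper case mnemonic, supplying only the Remainder parameter. The program and value names are hypothetical.

```hla
// Illustrative sketch; all identifiers are hypothetical.
program matchStrDemo;
val
    rest :string := "";

begin matchStrDemo;

    ?stmt := "MOV eax, 1";

    // Case sensitive: "mov" is not a prefix of "MOV eax, 1".
    #if( @matchStr( stmt, "mov", rest ) )
        #error( "@matchStr should be case sensitive" )
    #endif

    // Case insensitive: succeeds; rest should be " eax, 1".
    #if( @matchiStr( stmt, "mov", rest ) )
        #print( "operands:", rest )
    #else
        #error( "@matchiStr should ignore case" )
    #endif

end matchStrDemo;
```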
@uptoStr( InputStr, String, Remainder, Matched )
The uptoStr function matches all characters in InputStr up to,
but not including, the string specified by "String".
If it succeeds, it copies all the matched characters (not including the
string specified by String)
into the Matched
parameter and any following characters to Remainder. If
this function returns false, the values of Remainder and Matched are undefined.
@uptoiStr( InputStr, String, Remainder, Matched )
Same as @uptoStr function except that this function does a case
insensitive comparison.
@matchToStr( InputStr, String, Remainder, Matched )
Similar to @uptoStr except this function matches all characters up to
and including the characters in the String parameter.
@matchToiStr( InputStr, String, Remainder, Matched )
Same as @matchToStr except this function does a case insensitive
comparison.
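The only difference between @uptoStr and @matchToStr is which side of the delimiter ends up holding it. The sketch below splits a hypothetical qualified name at "::" both ways; the program and value names are also hypothetical.

```hla
// Illustrative sketch; all identifiers are hypothetical.
program uptoStrDemo;
val
    rest   :string := "";
    before :string := "";

begin uptoStrDemo;

    ?qualified := "outer::inner";

    // Stops in front of "::": expected "outer" and "::inner".
    #if( @uptoStr( qualified, "::", rest, before ) )
        #print( "uptoStr:    matched=", before, " remainder=", rest )
    #endif

    // Consumes through "::": expected "outer::" and "inner".
    #if( @matchToStr( qualified, "::", rest, before ) )
        #print( "matchToStr: matched=", before, " remainder=", rest )
    #endif

end uptoStrDemo;
```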
@matchID( InputStr, Remainder, Matched )
This is a special matching
function that matches characters in InputStr that correspond to an HLA identifier. That is, InputStr must begin with an alphabetic character or an
underscore and @matchID
will match all following alphanumeric or underscore characters. If this function succeeds by matching a
prefix of InputStr
that looks like an identifier, it copies the matched characters to Matched and all following characters to Remainder. This
function returns false if the first character of InputStr is not an underscore or an alphabetic
character. Note that the first
character beyond a matched identifier can be anything other than an
alphanumeric or underscore character and this function will still succeed.
@matchIntConst( InputStr, Remainder, Matched )
This function matches a string
of one or more decimal digit characters (i.e., an unsigned integer
constant). The Matched parameter, if present, must be an
"int32" VAL object. If @matchIntConst succeeds, it will convert the string to an integer
and copy this integer to the Matched parameter; it will also
copy any characters following the integer string to the Remainder parameter.
@matchRealConst( InputStr, Remainder, Matched )
This function matches a
sequence of characters at the beginning of InputStr that correspond to a real constant (note that a
simple sequence of digits, i.e., an integer, satisfies this). The number may have a leading plus or
minus sign followed by at least one decimal digit, an optional fractional part
and an optional exponent part (see the definition of an HLA real literal
constant for more details). If
this function succeeds, it converts the string to a real80 value and stores
this value into Matched
(which must be a real80 VAL object).
The characters after the matched string are copied into the Remainder parameter.
If this function fails, the values of Matched and Remainder are undefined.
@matchNumericConst( InputStr, Remainder, Matched )
This is a combination of @matchRealConst and @matchIntConst. It
checks the prefix of InputStr. If it corresponds to an integer
constant it will behave like @matchIntConst. If
the prefix string corresponds to a real constant, this function behaves like @matchRealConst. If
the prefix matches neither, this function returns false.
@matchStrConst( InputStr, Remainder, Matched )
This function matches a
sequence of characters that correspond to an HLA literal string constant. Note that such constants generally
contain quotes surrounding the string.
If this function returns true, it copies the matched string, minus the
quote delimiters, to the Matched parameter and it copies the following characters to the Remainder parameter.
If this function fails, those two parameter values are undefined.
This function automatically
handles several idiosyncrasies of HLA literal string constants. For example, if two adjacent quotes
appear within a string, @matchStrConst copies only a single quote to the Matched parameter. If two quoted strings appear at the beginning of
InputStr separated only
by whitespace (a space or any control character other than NUL), then this
function concatenates the two strings together. Likewise, any character objects (surrounded by apostrophes
or taking the form #ddd, #$hh, or #%bbbbbbbb where ddd is a decimal constant,
hh is a hexadecimal constant, and bbbbbbbb is a binary constant) are
automatically concatenated into the result string. See the definition of HLA literal constants for more
details.
@zeroOrMoreWS( InputStr, Remainder )
This function always
succeeds. It matches zero or more
whitespace characters (white space is defined here as a space or any control
character other than NUL [ASCII code zero]). This function copies any characters following the white
space characters to the Remainder
parameter (this could be the empty string).
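Because each matching function hands its Remainder to the next step, the functions chain naturally into a small recursive-descent parser. The sketch below parses "identifier = integer" at assembly time using @matchID, @zeroOrMoreWS, @oneChar, and @matchIntConst. It uses nested #if statements so that no step reads a Remainder left undefined by a failed match. All identifiers are hypothetical.

```hla
// Illustrative sketch; all identifiers are hypothetical.
program parseDemo;
val
    afterID  :string := "";
    afterWS  :string := "";
    afterEq  :string := "";
    afterWS2 :string := "";
    tail     :string := "";
    ident    :string := "";
    eqSign   :string := "";
    intVal   :int32  := 0;

begin parseDemo;

    #if( @matchID( "count = 42", afterID, ident ) )
        ?ws1 := @zeroOrMoreWS( afterID, afterWS );       // always true
        #if( @oneChar( afterWS, '=', afterEq, eqSign ) )
            ?ws2 := @zeroOrMoreWS( afterEq, afterWS2 );  // always true
            #if( @matchIntConst( afterWS2, tail, intVal ) )
                // Expected: ident = "count", intVal = 42.
                #print( "ident=", ident, " value=", intVal )
            #else
                #error( "expected an integer after '='" )
            #endif
        #else
            #error( "expected '=' after the identifier" )
        #endif
    #else
        #error( "expected an identifier" )
    #endif

end parseDemo;
```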
@oneOrMoreWS( InputStr, Remainder )
This function matches one or
more whitespace characters (white space is defined here as a space or any
control character other than NUL [ASCII code zero]). If this function succeeds, it copies any characters
following the white space characters to the Remainder parameter.
If this function fails, the Remainder string’s value is undefined.
@WSorEOS( InputStr, Remainder )
This function always
succeeds. It matches zero or more
whitespace characters (white space is defined here as a space or any control
character) or the end of string token (a zero terminating byte). This function copies any characters
following the white space characters to the Remainder parameter (this could be the empty string if it
matches EOS or there is only white space at the end of the string).
@WSthenEOS( InputStr )
This function matches zero or
more whitespace characters (white space is defined here as a space or any
control character) immediately followed by the EOS token (a zero terminating
byte). Technically, it allows a Remainder parameter, but such a parameter will always be set
to the empty string if this function succeeds, so it’s hardly useful to supply
the parameter.
@peekWS( InputStr, Remainder )
This function returns true if
the first character of InputStr
is a white space character. If it
succeeds and the Remainder
parameter is present, this function copies InputStr to Remainder.
@EOS( InputStr )
This function returns true if InputStr is the empty string.
@reg( InputStr )
This function returns true if InputStr matches a valid register name.
@reg8( InputStr )
This function returns true if InputStr matches a valid eight-bit register name.
@reg16( InputStr )
This function returns true if InputStr matches a valid 16-bit register name.
@reg32( InputStr )
This function returns true if InputStr matches a valid 32-bit register name.
16.16.7
Symbol and constant related functions and assembler control functions
@name( identifier )
This function returns a string
of characters that corresponds to the name of the identifier (note: after
text/macro expansion). This is
useful inside macros when attempting to determine the name of a macro parameter
variable (e.g., for error messages).
This function returns the empty string if the parameter is not an
identifier.
@type( identifier_or_expression )
This function returns a unique
integer value that specifies the type of the specified symbol. Unfortunately, this unique integer may
be different across assemblies. Do
not use this function when comparing types of objects in different source code
modules. This is a deprecated
function. Future versions of the assembler will return the same value as
@typename. Do not use this function in new code, and change any existing uses
to use @typename instead.
@typename( identifier_or_expression )
This function returns the
string name of the type of the identifier or constant expression. Examples include "int32",
"boolean", and "real80".
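The sketch below prints the type names of a few hypothetical static variables and checks two of them. The expectation that @basetype reports "byte" for a byte array is an assumption based on the @basetype description that follows.

```hla
// Illustrative sketch; all identifiers are hypothetical.
program typenameDemo;
static
    counter :int32;
    ratio   :real64;
    buffer  :byte[ 16 ];

begin typenameDemo;

    #print( "counter: ", @typename( counter ) )
    #print( "ratio:   ", @typename( ratio ) )
    #print( "buffer:  ", @typename( buffer ) )

    #if( @typename( counter ) <> "int32" )
        #error( "counter should be an int32" )
    #endif

    // Assumed: the base type of a byte array is "byte".
    #if( @basetype( buffer ) <> "byte" )
        #error( "unexpected base type for buffer" )
    #endif

end typenameDemo;
```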
@basetype( identifier_or_expression )
Similar to @typename, except
this function returns the underlying primitive type for array and pointer
objects. For other types, it behaves just like @typename.
@ptype( identifier_or_expression )
This function returns a small
integer constant denoting the primitive type of the specified identifier or expression. Primitive types
would include things like int32, boolean, and real80. See the "hla.hhf" header file for the latest set
of constant definitions for pType.
At the time this was written, the definitions were:
// pType constants.
hla.ptIllegal    = 0
hla.ptBoolean    = 1
hla.ptEnum       = 2
hla.ptUns8       = 3
hla.ptUns16      = 4
hla.ptUns32      = 5
hla.ptByte       = 6
hla.ptWord       = 7
hla.ptDWord      = 8
hla.ptInt8       = 9
hla.ptInt16      = 10
hla.ptInt32      = 11
hla.ptChar       = 12
hla.ptReal32     = 13
hla.ptReal64     = 14
hla.ptReal80     = 15
hla.ptString     = 16
hla.ptCset       = 17
hla.ptArray      = 18
hla.ptRecord     = 19
hla.ptUnion      = 20
hla.ptClass      = 21
hla.ptProcptr    = 22
hla.ptThunk      = 23
hla.ptPointer    = 24
hla.ptQWord      = 25
hla.ptTByte      = 26
hla.ptLabel      = 27
hla.ptProc       = 28
hla.ptMethod     = 29
hla.ptClassProc  = 30
hla.ptClassIter  = 31
hla.ptProgram    = 32
hla.ptMacro      = 33
hla.ptText       = 34
hla.ptNamespace  = 35
hla.ptSegment    = 36
hla.ptAnonRec    = 37
hla.ptVariant    = 38
hla.ptError      = 39
@baseptype( identifier_or_expression )
This function returns a small
integer constant denoting the underlying primitive type of the specified identifier or expression.
See the discussion for @ptype for details. The difference between @ptype and
@baseptype is that @baseptype returns the element type for arrays and the base
type for ptPointer types.
@class( identifier_or_expression )
This function returns a symbol’s class
type, which describes how the symbol was declared (constant,
value, variable, static, etc.); this has little to do with the class abstract
data type. See the
"hla.hhf" header file for the current symbol class definitions. At the time this was written, the
definitions were:
hla.cIllegal     = 0
hla.cConstant    = 1
hla.cValue       = 2
hla.cType        = 3
hla.cVar         = 4
hla.cParm        = 5
hla.cStatic      = 6
hla.cLabel       = 7
hla.cMacro       = 8
hla.cKeyword     = 9
hla.cTerminator  = 10
hla.cProgram     = 11
hla.cProc        = 12
hla.cClassProc   = 13
hla.cMethod      = 14
hla.cNamespace   = 15
hla.cNone        = 16
@size( identifier_or_expression )
This function returns the size,
in bytes, of the specified object.
@elementsize( identifier_or_expression )
This function returns the size,
in bytes, of an element of the specified array. If the parameter is not an array identifier, this function
generates an assembly-time error.
@offset( identifier )
For VAR, PARM, METHOD, and
class ITERATOR objects only, this function returns the integer offset into the
activation record (or object record) of the specified symbol.
@staticname( identifier )
For STATIC objects, procedures,
methods, iterators, and external objects, this function returns a string
specifying the "static" name of that object. This is the name that HLA emits to the
assembly output file for certain objects.
@lex( identifier )
This function returns an
integer constant specifying the static lexical nesting for the specified
symbol. Variables declared in the
main program have a lex level of zero.
Variables declared in procedures (etc.) that are in the main program
have a lex level of one. This
function is useful as an index into the _display_ array when accessing
non-local variables.
@IsExternal( identifier )
This function returns true if
the specified identifier is an external symbol.
@arity( identifier_or_expression )
This function returns zero if
the specified identifier is not an array.
Otherwise it returns the number of dimensions of that array.
@dim( array_identifier_or_expression )
This function returns a single
array of integers with one element for each dimension of the array passed as a
parameter. Each element of the
array returned by this function gives the number of elements in the specified
dimension. For example, given the
following code:
val
    threeD: int32[ 2, 4, 6 ];
    tdDims := @dim( threeD );
The tdDims constant would be an array with the three elements
[2, 4, 6].
@elements( array_identifier_or_expression )
This function returns the total
number of elements in the specified array. For multi-dimensional array constants, this function returns
the number of all elements, not just a particular row or column.
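For a two-dimensional array the size-related functions relate as size = elements × elementsize. The sketch below checks this for a hypothetical 3×4 array of 16-bit integers (12 elements of 2 bytes each, 24 bytes in total).

```hla
// Illustrative sketch; all identifiers are hypothetical.
program sizeDemo;
static
    grid :int16[ 3, 4 ];

begin sizeDemo;

    #if( (@size( grid ) <> 24) | (@elementsize( grid ) <> 2) )
        #error( "unexpected byte sizes for grid" )
    #endif

    #if( (@elements( grid ) <> 12) | (@arity( grid ) <> 2) )
        #error( "unexpected element count or arity for grid" )
    #endif

    #print( "size=", @size( grid ), " elements=", @elements( grid ) )

end sizeDemo;
```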
@defined( identifier )
This function returns true if
the specified identifier has been previously defined in the program and is
currently in scope.
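A typical use of @defined is supplying a default value that an earlier declaration (for example, one in an included configuration file) may override. The symbol bufferSize and the default of 256 are hypothetical.

```hla
// Illustrative sketch; all identifiers are hypothetical.
program definedDemo;
begin definedDemo;

    // Only create bufferSize if nothing earlier defined it.
    #if( !@defined( bufferSize ) )
        ?bufferSize := 256;
    #endif

    #print( "bufferSize = ", bufferSize )

end definedDemo;
```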
@pclass( identifier )
If the specified identifier is a
parameter, this function returns a small integer indicating how the parameter
was passed to the function. These
constants are defined in the hla.hhf header file. At the time this document was written, these constants had
the following values:
hla.illegal_pc := 0;
hla.valp_pc := 1;
hla.refp_pc := 2;
hla.vrp_pc := 3;
hla.result_pc := 4;
hla.name_pc := 5;
hla.lazy_pc := 6;
valp_pc means pass by
value. refp_pc means pass by
reference. vrp_pc means pass by value/result (value/returned). result_pc means pass by result. name_pc
means pass by name. lazy_pc means pass by lazy evaluation.
@localsyms( record_union_procedure_method_or_iterator_identifier )
This function returns an array
of strings listing the local names associated with the argument. If the argument is a record or union
object, the elements of the string array contain the field names for the specified
record or union. Note that the
field names appear in their declaration order (that is, element zero contains
the name of the first field, element one contains the name of the second field,
etc.).
If the argument is a procedure,
method, or iterator, the string array this function returns is a list of all
the local identifiers in that program unit. Note that the local object names appear in the reverse order
of their declarations (that is, element zero contains the name of the last
local name in the program unit, element one contains the second identifier,
etc.). Note that parameters are
consider local identifiers and will appear in this array. Also note that HLA automatically
predefines several symbols when you declare a program unit, those HLA declared
symbols also appear in the array of strings @localsyms creates.
Currently, @localsyms does not
allow namespace, program, or class identifiers. This restriction may be lifted in the future if there is
sufficient need.
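The fragment below walks the field names of a record with @localsyms and a compile-time #for loop. It is a minimal sketch: point3D and fieldNames are hypothetical names. The sketch also assumes that @localsyms accepts a record type name. If your HLA version requires a record object, declare a static variable of type point3D and pass that instead.

```hla
type
    point3D:                  // hypothetical record type
        record
            x: int32;
            y: int32;
            z: int32;
        endrecord;

val
    fieldNames := @localsyms( point3D );   // field names in declaration order

#for( i := 0 to @elements( fieldNames ) - 1 )
    #print( "field ", i, ": ", fieldNames[ i ] )
#endfor
```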
@isconst( expr )
This function returns true if
the specified parameter is a constant identifier or expression.
@isreg( expr )
This function returns true if
the specified parameter is one of the 80x86 general purpose registers. It returns false otherwise.
@isreg8( expr )
This function returns true if
the specified parameter is one of the 80x86 eight-bit general purpose
registers. It returns false
otherwise.
@isreg16( expr )
This function returns true if
the specified parameter is one of the 80x86 16-bit general purpose
registers. It returns false
otherwise.
@isreg32( expr )
This function returns true if
the specified parameter is one of the 80x86 32-bit general purpose
registers. It returns false
otherwise.
@isfreg( expr )
This function returns true if
the specified parameter is one of the 80x86 FPU registers. It returns false otherwise.
@ismem( expr )
This function returns true if
the specified expression is a memory address.
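The classification functions above are mostly used inside macros that pick an instruction sequence based on the kind of operand they receive. The following is a sketch with a hypothetical macro, clearIt. It zeroes a 32-bit register with XOR, zeroes a memory operand with MOV (assuming a dword-sized operand), and reports a compile-time error for anything else.

```hla
// Hypothetical macro: zero a 32-bit register or a dword memory operand,
// choosing the instruction sequence at compile time.
#macro clearIt( operand );
    #if( @isreg32( operand ))
        xor( operand, operand );        // register form
    #elseif( @ismem( operand ))
        mov( 0, operand );              // assumes a dword-sized memory operand
    #else
        #error( "clearIt: expected a 32-bit register or a memory operand" )
    #endif
#endmacro
```

Inside executable code you would write, for example, clearIt( eax ); or clearIt( someDwordVar );, where someDwordVar is a hypothetical int32 or dword variable.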
@isclass( expr )
This function returns true if
the specified parameter is a class or a class object.
@istype( identifier )
This function returns true if
the specified identifier is a type id.
@linenumber
This function returns the
current line number in the source file.
@filename
This function returns the name
of the current source file.
@curlex
This function returns the
current static lex level (e.g., zero for the main program).
@curoffset
This function returns the
current VAR offset within the activation record.
@curdir
This function returns +1 when processing parameters and -1 otherwise (for example, when processing local variables). This corresponds to whether variable offsets are increasing or decreasing in an activation record during compilation. This function also returns +1 when processing the fields of a record or class, and zero when processing the fields of a union.
@addofs1st
This function returns true when processing local variables and false when processing parameters and record/class/union fields.
@lastobject
This function returns a string
containing the name of the last macro object processed.
@curobject
This function returns a string
containing the name of the last class object processed.
16.16.8
Pseudo-Variables
HLA provides several special
identifiers that act as functions in expressions and as variables in VAL
assignments. These
"pseudo-variables" let you control the code emission during
compilation. Typically, you would
use these pseudo-variables in a statement like "?@bound:=true;" in
order to set their values.
@parmoffset
This variable contains the starting offset for parameters. This is generally eight for most procedures, since the saved EBP value and the return address occupy offsets zero through seven of a standard activation record. You can change this value during assembly by assigning a value to this variable (e.g., ?@parmoffset := 10;). However, this is not recommended except for advanced programmers.
@localoffset
This variable returns the
starting offset for local variables in an activation record. This is typically zero. You can change this value during assembly by assigning a value to this variable (e.g., ?@localoffset := -10;). However, this is not recommended except for advanced programmers.
@basereg
This variable returns a string
containing either "ebp" or "esp". You assign either ebp or esp (the registers, not a string) to this
variable. This sets the base
register that HLA uses for automatic (VAR) variables. The default is ebp.
Examples:
?SaveBase :string := @basereg;
?@basereg := esp;
    << code that uses esp to access locals and parameters >>
?@basereg := @text( SaveBase ); // Restore to original register.
Note the use of @text to convert the string to an actual register
name. This must be done because
HLA only allows the assignment of the actual ebp/esp registers to @basereg, not a string.
@enumsize
This assembly time variable
specifies the size (in bytes) of enumerated objects. This has a default value of one.
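The fragment below changes @enumsize before declaring an enumerated type and then checks the resulting size with @size. It is a sketch: color is a hypothetical type. The sketch assumes that @enumsize governs enum types declared after the assignment.

```hla
?@enumsize := 4;                        // subsequent enum types occupy four bytes

type
    color: enum{ red, green, blue };    // hypothetical enumerated type

#print( "@size( color ) = ", @size( color ) )
```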
@minparmsize
This assembly time variable has
the initial value four. You should
not change the value of this object when running under Win32, Linux, or other 32-bit OS.
@bound
This assembly time variable is
a boolean value that indicates whether HLA compiles the BOUND instruction into
actual machine code (or ignores the BOUND instruction).
@into
This assembly time variable is
a boolean value that indicates whether HLA compiles the INTO instruction into
actual machine code.
@exceptions
This assembly time variable
controls whether HLA emits full exception handling code or an abbreviated set
of routines. If this variable
contains true, then HLA emits the full exception handling code. If false, then HLA emits the minimal amount of code needed to pass exceptions on to Windows or Linux. Note that this variable only affects code generation in the main program; it does not affect code generation in a UNIT. You must set this variable before the BEGIN clause associated with the main program if it is to have any effect. Note that including the EXCEPTS.HHF file automatically sets this to true, so you will have to explicitly set it to false if you include this file (or some other file that includes EXCEPTS.HHF, like STDLIB.HHF).
@optstring
By default, HLA folds string constants to generate better code. This means that whenever you ask the
compiler to emit code for a string constant like "Hello World" the
compiler will first check to see if it has already emitted such a string. If so, the compiler uses the reference
to the original string constant rather than emitting a second copy of the
string; this reduces the size of
your program if the same string occurs multiple times in the
program. Since string constants
generally go into a read-only section of memory, the program cannot
accidentally change this unique occurrence. However, if you elect to make the CONSTS segment writable,
you might not want HLA to fold string constants in this manner. The @optstrings pseudo-variable lets
you control this optimization. If
@optstrings is true (the default condition), then HLA folds all duplicate
string constants; if @optstrings
is false, then HLA emits duplicate strings to the CONSTS section.
@trace
This boolean variable controls
the emission of "trace" statements by the HLA compiler. This feature is offered in lieu of a
decent debugger for tracing through HLA programs. When this variable is false (the default), HLA emits the
code you specify. However, if you
set this compile-time variable to true, HLA emits the following code before
most statements in the program:
_traceLine_( filename, linenumber );
The filename parameter is a string that specifies the current filename HLA is processing. The linenumber parameter is an uns32 value that specifies the current line number in the file. You are responsible for supplying the "_traceLine_" procedure somewhere in your program.
Here’s a typical implementation:
procedure trace( filename:string; linenumber:uns32 ); @external( "_traceLine_" );
procedure trace( filename:string; linenumber:uns32 ); @nodisplay;
begin trace;
    pushfd();
    // This function must preserve all registers and flags!
    stdout.put( filename, ": #", linenumber, nl );
    popfd();
end trace;
As the comments above note, it
is your responsibility to preserve all registers and flags in the _traceLine_ procedure.
If you fail to do this, it will corrupt those values in the code that
calls _traceLine_.
A common operation inside the _traceLine_ procedure is to display register values. Don’t forget that EBP’s and ESP’s values
are modified by this call.
Furthermore, almost any processing you do will change the flag values. To obtain EBP’s value prior to the call, fetch the dword at address [EBP+0]. To obtain ESP’s value prior to the call, take the value of EBP inside _traceLine_ and add 16 to it. The saved EBP value, the return address, and eight bytes of parameters lie between EBP and the caller’s stack top, and the stack grows toward lower addresses. Obviously, if you build _traceLine_’s activation record yourself, these values can change.
To display the flag values, access the copy of the FLAGs register you
pushed on the stack (at offset [EBP-4] in the code above).
In addition to simply
displaying values, you can write some very sophisticated debugging routines
that let you set breakpoints, watch values, and so on. Someday the HLA Standard Library will include some trace support functions; until then, have fun doing whatever you want.
16.16.9
Text emission functions
@text( str_expr )
This function replaces itself
with the text of the specified parameter.
The result is then processed by HLA. E.g.,
@text( "mov( 0, eax );" );
The above is equivalent to the single mov instruction.
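@text is most useful when the instruction text is computed at compile time rather than written literally. The following is a minimal sketch for a procedure or main-program body. regName is a hypothetical VAL object holding the register chosen at assembly time.

```hla
?regName := "ebx";                         // register chosen at compile time
@text( "mov( 0, " + regName + " );" );     // expands to mov( 0, ebx );
```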
@string:identifier
The identifier must be a
constant of type text. HLA
replaces this item with the string data assigned to the text object. Note that
this operation is deprecated. HLA now allows @string( textVal ) to convert a text
object to a string value.
@tostring:identifier
Like @string:identifier, the
identifier must be a constant of type text. Also like @string:identifier, HLA replaces this item with
the string data assigned to the text object. However, this function also converts the identifier itself from a text object to a string object.
16.16.10 Miscellaneous
Functions
@section
This function returns a 32-bit
bitmap that identifies the current point in the source. Identification is as follows:
Bit 0: Currently processing the CONST section.
Bit 1: Currently processing the VAL section.
Bit 2: Currently processing the TYPE section.
Bit 3: Currently processing the VAR section.
Bit 4: Currently processing the STATIC section.
Bit 5: Currently processing the READONLY section.
Bit 6: Currently processing the STORAGE section.
Bit 12: Currently processing statements in the "main" program.
Bit 13: Currently processing statements in a procedure.
Bit 14: Currently processing statements in a method.
Bit 15: Currently processing statements in an iterator.
Bit 16: Currently processing statements in a #macro.
Bit 17: Currently processing statements in a #keyword macro.
Bit 18: Currently processing statements in a #terminator macro.
Bit 19: Currently processing statements in a thunk.
Bit 23: Currently processing statements in a Unit.
Bit 24: Currently processing statements in a Program.
Bit 25: Currently processing statements in a record.
Bit 26: Currently processing statements in a union.
Bit 27: Currently processing statements in a class.
Bit 28: Currently processing statements in a namespace.
This function is useful in
macros to determine if a macro expansion is legal at a given point in a
program.
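The macro below is one way to use @section as a guard. It masks bits 12 through 15 (main program, procedure, method, iterator) to reject expansion outside executable statements. It is a sketch only: mustBeInCode is a hypothetical name. The sketch assumes that those bits still reflect the enclosing program unit while the macro is being expanded.

```hla
// Hypothetical guard macro: reject expansion outside executable code.
// $F000 = bits 12..15 (main program, procedure, method, iterator).
#macro mustBeInCode;
    #if( ( @section & $F000 ) = 0 )
        #error( "mustBeInCode: expand this macro only inside executable statements" )
    #endif
#endmacro
```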
16.16.11
#Text and #endtext Text Collection Directives
The #TEXT and #ENDTEXT
directives surround a block of text in an HLA program from which HLA will
create an array of string constants.
The syntax for these directives is:
#text( identifier )
    << arbitrary lines of text >>
#endtext
The identifier must either be an undefined symbol or an object
declared in the VAL section.
This directive converts each
line of text between the #TEXT and #ENDTEXT directives into a string and then
builds an array of strings from all this text. After building the array of strings, HLA assigns this array
to the identifier symbol. This is
a VAL constant array of strings.
The #TEXT..#ENDTEXT directives may appear anywhere in the program where
white space is allowed.
Although these directives
provide an easy way to initialize a constant array of strings, the real purpose
for these directives is to allow the inclusion of Domain Specific Embedded Language (DSEL) text within an HLA
program. Presumably, a parser
(written with macros, regular expression macros, and the HLA compile-time
language) would process the statements between the #TEXT and #ENDTEXT
directives.
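As a minimal sketch, the fragment below collects two lines with #text and walks the resulting VAL array of strings. The identifier greetings is hypothetical. The text lines are left unindented on the assumption that any leading white space would become part of each string.

```hla
#text( greetings )
Hello
World
#endtext

#for( i := 0 to @elements( greetings ) - 1 )
    #print( "line ", i, ": ", greetings[ i ] )
#endfor
```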
16.16.12
#String and #endstring Text Collection Directives
The #STRING and #ENDSTRING directives surround a block of text in an HLA program from which HLA will create a single string constant.
The syntax for these directives is:
#string( identifier )
    << arbitrary lines of text >>
#endstring
The identifier must either be an undefined symbol or an object
declared in the VAL section.
These directives are similar in
principle to the #text..#endtext directives except that they produce a single
string (including new line characters) holding the entire block of text rather
than an array of strings.
Although these directives
provide an easy way to initialize a string, the real purpose for these
directives is to allow the inclusion of
Domain Specific Embedded Language (DSEL) text within an HLA program. Presumably, a parser (written with
macros, regular expression macros, and the HLA compile-time language) would
process the statements between the #STRING and #ENDSTRING directives.
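For comparison with #text, the sketch below gathers the same kind of block into one string, including the embedded new line characters, and prints its length. The identifier banner is hypothetical.

```hla
#string( banner )
first line
second line
#endstring

#print( banner )
#print( "banner length: ", @length( banner ) )
```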
16.16.13
Regular Expression Macros and the @match/@match2 Functions
HLA v1.87 introduced a new (and
very powerful) macro form known as regular expression macros.
Regular expression macros contain sequences of pattern-matching
statements that you can use to determine if some string takes a particular
form. With HLA’s regular expression macros and the attendant @match and @match2
functions, you can develop sophisticated language processors inside HLA and
specify whatever syntax you like (well, within certain bounds) for those
languages.
Technical Note: although these features are called “regular
expression macros”, the purists out there will note that “regular expression”
is actually a misnomer here. HLA’s
regular expression macros actually handle a subset of the context-free
languages. This language facility is called “regular expression macros” because
most programmers, even those not intimately familiar with automata theory,
recognize the term and associate “pattern matching” with the term. Hence the
use of the term “regular expression” when “context-free grammar” would probably
be a better choice. For those of you who aren’t intimately familiar with automata theory, fear not: the regular languages are a proper subset of the context-free languages, so you’re not getting short-changed here. HLA’s “regular expression” macros will actually handle everything you can do with a regular expression, and more.
Before describing the syntax
for a regular expression macro, it’s probably best to begin by discussing how
you use them in a program. This will better motivate you when this document
actually discusses the regular expression syntax.
Regular expressions are used
for pattern matching.[26]
Generally, a regular expression is applied to some string of text and a
boolean “success (matched) /
failure (no match)” result comes back from the operation. The HLA compile-time
function @match (and @match2) is how you achieve this task. The basic syntax
for the @match[27]
function is the following:
@match( stringToMatch, RegexMacroName,
ReturnsResult, Remainder, MatchedString )
This function returns the
boolean result true if the regular expression specified by RegexMacroName
matches some prefix of the string stringToMatch. The remaining three arguments are optional, though if one argument is present then any
preceding arguments must also be present.
The optional ReturnsResult argument
must be an HLA VAL identifier. The @match function will store a special
“#return” string into this VAL object. We’ll take a look at what a “#return”
string is a little later in this documentation. For now, suffice to say that
this is the “text” that the regex macro expands into (regex macros do not
expand in-place as standard HLA macros do). If this argument is not present and
the regex macro produces a “#return” string, then HLA simply throws away
the associated string data.
The optional Remainder argument
must be an HLA VAL identifier. If this argument is present, then the
ReturnsResult argument must also be present. This argument is identical to the
“remainder” arguments of the string matching functions given earlier. When
matching stringToMatch with RegexMacroName, the regex macro might not match the
entire string, only a prefix of the string (this is still a successful match).
Any remaining characters that are not matched once @match exhausts the regular
expression are collected and stored into the Remainder argument, if it is
present. @match will not generate this string if you do not pass the Remainder
argument (and the string information is simply thrown away at that point).
The optional MatchedString
argument must be an HLA VAL identifier. If this argument is present, then the
Remainder and ReturnsResult
arguments must also be present. This argument is identical to the “matched”
arguments of the string matching functions given earlier. If the regular
expression macro successfully matches stringToMatch, then @match will store a
copy of the sequence that has been matched into this VAL argument.
Note that if the @match
function returns false, because RegexMacroName failed to match the characters
in stringToMatch, then @match will not disturb the existing values of the
ReturnsResult, Remainder, and MatchedString parameters. Therefore, you should
only expect those arguments to contain reasonable values if @match returns
true.
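The sketch below passes all five @match arguments to a hypothetical regex, helloPrefix, that matches the literal prefix "Hello". It pre-declares the three result arguments as VAL strings so they exist whether or not the match succeeds. Based on the rules above, a successful match should leave "Hello" in matched and " World" in rest. Because helloPrefix has no #return clause, rtnStr should receive the empty string.

```hla
#regex helloPrefix;
    "Hello"
#endregex

val
    rtnStr  := "";      // receives the #return string
    rest    := "";      // receives the unmatched suffix
    matched := "";      // receives the matched prefix

#if( @match( "Hello World", helloPrefix, rtnStr, rest, matched ))
    #print( "matched=[", matched, "] remainder=[", rest, "]" )
#else
    #print( "no match" )
#endif
```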
16.16.13.1 #regex..#endregex
The syntax for a regular
expression macro is very similar to a standard macro declaration. Here is the
basic form:
#regex macroName( optional_parameter_list ) : optional_locals_list;
    << regex body >>
#endregex
The optional_parameter_list and
optional_locals_list items are identical (in syntax) to a macro declaration.
The following #regex statements demonstrate some of the legal permutations:
#regex noParmsOrLocals;
#regex oneParmNoLocals( oneParm );
#regex oneLocalNoParms:oneLocal;
#regex variableParms( a, b, c[] );
#regex stringParms( string parms );
It’s actually a somewhat rare
occurrence for a regular expression macro to have parameters. The semantics for
parameters (and locals) are different for compiled and precompiled regular
expression macros. Therefore, it’s a good idea to avoid using parameters unless they are absolutely
necessary.
The body of a #regex macro consists of zero or more regular expression items followed by an optional #return clause. If the regular expression body is empty, then the regular expression will match the empty string, which means it will match any string appearing in an @match function call.
The section Regular Expression Elements
describes the exact syntax for the body of a regular
expression macro. The next section
describes the optional #return clause.
16.16.13.2 The #return Clause
A #regex macro declaration may
optionally contain a #return clause immediately after the regular expression
body (and immediately before the #endregex clause). The #return clause
specifies a string expression to return (via the ReturnsResult argument in the
@match function call). Here is a typical example:
#regex newMov;
    <<body for newMov>>
#returns "mov( eax, ebx )"
#endregex
Note that an arbitrary HLA
string expression is legal after the #returns clause, not just a simple literal
constant. So you can use the concatenation operator (+) or any other HLA
compile-time string functions to build up the #return string. Note that there is no semicolon at the
end of the string expression. The #endregex properly terminates the string
expression.
If no #return clause is present
in a #regex macro, then that #regex macro returns the empty string as the
#return string result.
The main purpose for the
#return clause is to return some text to expand in the invoking code should the
@match function succeed. Unlike standard macros, you cannot expect to be able
to arbitrarily expand text found in a #regex macro because you only “invoke”
#regex macros in an @match function call, and those generally appear in a
compile-time boolean expression. For example, if the #regex macro above
directly emitted the mov instruction during the invocation of this macro, you’d
get syntax errors whenever you made calls like:
#if( @match( "Hello World", newMov ))
    .
    .
#endif
because HLA would emit the
“mov” instruction right into the boolean expression associated with the #if
statement (which is syntactically incorrect). By putting the #return value into a string and returning
that string result, the system can defer the expansion of the text until the
caller gets to an appropriate context, e.g., (from earlier)
#if( @match( "Hello World", newMov, returnResult ))
    @text( returnResult );
#endif
This example expands the “mov(
eax, ebx )” instruction if and only if the pattern matches “Hello World”.
If you would like the default
situation to be “expand text if match” then it’s easy enough to write a macro
to do this job for you:
#macro expand( theStr, theRegex ):returnResult;
    #if( @match( theStr, theRegex, returnResult ))
        @text( returnResult );
    #endif
#endmacro
    .
    .
expand( "Hello World", newMov );
The return string is
automatically processed by the #match(regex)..#endmatch block. See the
description of #match..#endmatch for more details.
The “meat” of a regular
expression macro is the sequence of regular expression elements that appear in
a #regex macro body. Each element in a regular expression body can match a part
of the source string. The following subsections describe each regular expression
element in detail.
With only a couple exceptions
(which will be noted as they arise), each time a regular expression element
matches a character in the source string (the first parameter provided to
@match), the match operation consumes that character. For example, if the source string is “Hello World” and
the first regular expression element matches the single character ‘H’, then ‘H’
is consumed from the source string (yielding “ello World”) and further regular
expression elements operate on that remainder of the string.
16.16.13.3.1 Kleene Star, Plus, and Numeric Range Specifications
Most regular expression
elements we’re about to explore match a single instance of themselves. For
example, a literal character constant in the body of a regular expression macro
will match a single character in the source string (see the next section). You
can modify this match operation by supplying one of the following suffixes to
the literal character constant.
Suffix  | Meaning
*       | (Kleene star) Matches zero or more occurrences of the preceding operand.
+       | (Kleene plus) Matches one or more occurrences of the preceding operand.
:[n]    | Matches exactly n occurrences of the preceding operand. 'n' must be a reasonably-valued unsigned integer constant expression.
:[n,m]  | Matches between n and m occurrences of the preceding operand. 'n' and 'm' must be reasonable unsigned integer constants with n<m.
:[n,*]  | Matches n or more occurrences of the preceding operand. 'n' must be a reasonably-valued unsigned integer constant expression.
Examples:
'c'*        Matches zero or more 'c' characters.
'c'+        Matches one or more 'c' characters.
'c':[4]     Matches exactly four 'c' characters.
'c':[4,6]   Matches between four and six 'c' characters.
'c':[4,*]   Matches four or more 'c' characters.
Exceptions to this syntax will
be noted whenever they occur.
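The sketch below applies a numeric range suffix to a character set. The hypothetical regex threeToFiveDigits requires at least three and at most five leading decimal digits. Under that rule, "123abc" should match and "12abc" should not.

```hla
#regex threeToFiveDigits;
    {'0'..'9'}:[3,5]
#endregex

val
    rtnStr := "";
    rest   := "";

#if( @match( "123abc", threeToFiveDigits, rtnStr, rest ))
    #print( "remainder after the digits: ", rest )
#endif
```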
16.16.13.3.2 Matching Characters in a Regular Expression
A character literal constant
within a #regex body matches the corresponding character in the source string.
For example, the following regular expression macro matches a string beginning
with the single character ‘c’:
#regex matchesC;
    'c'
#endregex
Note that this form only allows
a single character constant. In particular, you cannot specify an arbitrary HLA
character expression. However, you
can also use the HLA @matchChar (synonym: @oneChar) function in a regular
expression body to specify a character expression. @matchChar requires a single
parameter that must evaluate to a single character. For example,
#regex matchesC;
    @matchChar( char( uns8('b') + 1 )) // Matches 'c'
#endregex
The single character match
operation consumes a single character from the beginning of the source string
if it successfully matches the first character of the source string.
Examples of character matching
repetition:
'c'*        Matches zero or more 'c' characters.
'c'+        Matches one or more 'c' characters.
'c':[4]     Matches exactly four 'c' characters.
'c':[4,6]   Matches between four and six 'c' characters.
'c':[4,*]   Matches four or more 'c' characters.
@matchChar( char( uns8('b') + 1 ))*   Matches zero or more 'c' characters.
16.16.13.3.3 Case-insensitive Character Matching in a Regular Expression
You can perform a case-insensitive character match by prefixing a literal character constant with the "!" operator. For example, !'c' matches either 'c' or 'C'. Here is an explicit example:
#regex matchesCorc;
    !'c'
#endregex
If you want to specify a character
expression rather than a single literal character constant, you can use the
@matchiChar function in a manner similar to @matchChar given earlier. This
operation also consumes a single character from the source string if a match
occurs.
Examples of character matching
repetition:
!'c'*       Matches zero or more 'c' or 'C' characters.
!'c'+       Matches one or more 'c' or 'C' characters.
!'c':[4]    Matches exactly four 'c' or 'C' characters.
!'c':[4,6]  Matches between four and six 'c' or 'C' characters.
!'c':[4,*]  Matches four or more 'c' or 'C' characters.
@matchiChar( char( uns8('b') + 1 ))*   Matches zero or more 'c' or 'C' characters.
Note that repetitive matches
allow any combination of upper and lower case characters. For example, !'c'+ will match the sequence "ccCcCCc".
16.16.13.3.4 Negated Character Matching
Sometimes you’ll want to match
“anything but a given character.” The HLA #regex macro body provides a shortcut
for matching anything but a single character. By placing a minus sign in front
of a single literal character constant, you can tell HLA to match anything but
that character. E.g., -'c' matches anything but the 'c' character. You can combine this with the "!" operator to match anything but the upper or lower case version of a character. For example, -!'c' matches anything but 'c' or 'C'.
There is no generic function
you can call like @matchChar or @matchiChar if you want to specify a character
expression rather than a character literal constant. However, you can easily
achieve the same effect by using negated character sets. See the discussion of
matching character sets a little later in this documentation.
If the first character of the
source string is not the specified literal constant, then this operation
consumes the first character of the source string.
Examples of character matching
repetition:
-'c'*       Matches zero or more characters that are not 'c'.
-'c'+       Matches one or more characters that are not 'c'.
-'c':[4]    Matches exactly four characters that are not 'c'.
-'c':[4,6]  Matches between four and six characters that are not 'c'.
-'c':[4,*]  Matches four or more characters that are not 'c'.
16.16.13.3.5 String Matching in Regular Expressions
A string literal constant
within a #regex body matches the corresponding sequence of characters in the
source string. For example, the following regular expression macro matches a
string beginning with the sequence “str”:
#regex matchesStr;
    "str"
#endregex
Note that this form only allows
a single literal string constant. In particular, you cannot specify an
arbitrary HLA string expression.
However, you can also use the HLA @matchStr function in a regular
expression body to specify a string expression. @matchStr requires a single
parameter that must evaluate to a single string. For example,
#regex matchesHelloWorld;
    @matchStr( "Hello " + "World" ) // Matches "Hello World"
#endregex
The string match operation
consumes one character from the source string for each character in the regular
expression element, but only if the match is completely successful. That is, if
the first few characters of the source string match the regular expression
element but not all the characters match, then the operation consumes no
characters.
Although it is not commonly
done, the repetition operations apply to string objects as well as characters.
Examples of string matching repetition:
"str"*        Matches zero or more "str" sequences.
"str"+        Matches one or more "str" sequences.
"str":[4]     Matches exactly four "str" sequences.
"str":[4,6]   Matches between four and six "str" sequences.
"str":[4,*]   Matches four or more "str" sequences.
@matchStr( "Hello" + " world" )*   Matches zero or more "Hello world" sequences.
16.16.13.3.6 Case-insensitive String Matching in Regular Expressions
Like character matching, you
can do a case-insensitive string match by prefixing a string literal constant
with “!” or by using the @matchiStr function. E.g.,
#regex caseInsensitive;
    @matchiStr( "Hello world" )
#endregex
Another example:
#regex caseInsensitive;
    !"Hello world"
#endregex
Although it is not commonly
done, the repetition operations apply to string objects as well as characters.
Examples of case-insensitive string matching repetition:
!"str"*        Matches zero or more "str" sequences (case insensitive).
!"str"+        Matches one or more "str" sequences (case insensitive).
!"str":[4]     Matches exactly four "str" sequences (case insensitive).
!"str":[4,6]   Matches between four and six "str" sequences (case insensitive).
!"str":[4,*]   Matches four or more "str" sequences (case insensitive).
@matchiStr( "Hello" + " world" )*   Matches zero or more "Hello world" sequences (case insensitive).
16.16.13.3.7 Negated String Matching
You can put the “-” operator in
front of a string literal expression to specify that the match should fail if
the following characters match a given string. For example,
#regex notHelloWorld;
    -"Hello world"
#endregex
will succeed as long as the
next 11 characters are not “Hello world”.
You can also apply the case-insensitive operator to this sequence, e.g., -!"Hello world".
Note: negated string matching never consumes any
characters from the source string. That is, once this pattern succeeds, the
source string contains the same data it did before the match operation.
Character consumption doesn’t make sense for this operation because the source
string could actually be shorter than the negated match string (in which case
we still want the pattern to succeed because the source string doesn’t begin
with the negated string).
The repetition operators do not
apply to negated string matching operations.
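Because a negated string match consumes nothing, it works as a "does not start with" test in front of other elements. The following is a sketch with a hypothetical regex, notEndWord. It accepts an alphabetic word unless that word begins with "end" in any combination of case. Under the rules above, "while" should match and "ENDIF" should not.

```hla
// The negated match consumes nothing, so the character-set
// repetition that follows still sees the whole word.
#regex notEndWord;
    -!"end"
    {'a'..'z', 'A'..'Z'}+
#endregex
```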
16.16.13.3.8 String List Matching
The following regular
expression syntax tells HLA to successfully match if any one of a list of
strings matches the front of the source string:
[ "string1", "string2", ..., "stringn" ]
The match operation fails only
if all the strings in the list fail to match the front of the source string. If
multiple strings match the start of the source string, then the first string in
the list is the one that will match. So if you want a maximal match, put the
longest strings at the beginning of the list, e.g.,
[ "these", "the", "th" ]
Similarly, if you want a
minimal match, put the shortest strings first in the list.
If this operation succeeds,
then it consumes the matching characters from the source string.
The repetition operators do not
apply to string list matching operations. If you really need this capability,
use the alternation operator (discussed later).
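The sketch below shows the first-match rule with a hypothetical regex, theWords, applied to "thesaurus". "these" fails at the fifth character, so under the rule above the first list entry that matches is "the". The matched string should then be "the" and the remainder "saurus".

```hla
#regex theWords;
    [ "these", "the", "th" ]
#endregex

val
    rtnStr  := "";
    rest    := "";
    matched := "";

#if( @match( "thesaurus", theWords, rtnStr, rest, matched ))
    #print( "matched=[", matched, "] remainder=[", rest, "]" )
#endif
```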
16.16.13.3.9 Character Set Matching in a Regular Expression
A character set literal
constant within a #regex body matches a character from the set in the source
string. For example, the following regular expression macro matches a string
beginning with any of the characters ‘c’, ‘s’, ‘e’, or ‘t’:
#regex matchesC;
{‘c’, ‘s’, ‘e’, ‘t’}
#endregex
Note that this form only allows
a single character set constant. In particular, you cannot specify an arbitrary
HLA character set expression.
However, you can also use the HLA @matchCset (synonym: @oneCset) function
in a regular expression body to specify a character set expression. @matchCset
requires a single parameter that must evaluate to a character set. For
example,
#regex matchesC;
@matchCset( -({‘c’,‘C’} + numericCset) ) // Matches anything but ‘c’, ‘C’, or a digit
#endregex
The single character set match
operation consumes a single character from the beginning of the source string
if it successfully matches the first character of the source string.
Examples of character matching
repetition:
{‘0’..‘9’}*   Matches zero or more digit characters.
{‘0’..‘9’}+   Matches one or more digit characters.
{‘0’..‘9’}:[4]   Matches exactly four decimal digit characters.
{‘0’..‘9’}:[4,6]   Matches between four and six decimal digit characters.
{‘0’..‘9’}:[4,*]   Matches four or more digit characters.
@matchCset( {“0123456789”} )*   Matches zero or more digit characters.
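Here is a rough Python model of greedy counted repetition for a single-character pattern. As an illustration only, it assumes a pattern yields candidate end positions with the most preferred first. The names cset, repeat, and matched_text are hypothetical. The model shows which prefix each form matches; it does not reproduce HLA internals.

```python
# Hypothetical model (not HLA): a pattern takes (source, pos) and yields end
# positions, most preferred first. Yielding nothing means failure.

DIGITS = set("0123456789")

def cset(chars):
    """Model of a character set literal: consume one character in chars."""
    def pattern(source, pos):
        if pos < len(source) and source[pos] in chars:
            yield pos + 1
    return pattern

def repeat(single, low, high=None):
    """Greedy model of X:[low,high] for a single-character X.
    high=None stands for '*'. So X* is repeat(X, 0), X+ is repeat(X, 1),
    and X:[4] is repeat(X, 4, 4)."""
    def pattern(source, pos):
        ends = [pos]  # ends[k] is the position after k copies of X
        while high is None or len(ends) - 1 < high:
            nxt = next(single(source, ends[-1]), None)
            if nxt is None:
                break
            ends.append(nxt)
        # Greedy: offer the longest run first, then back off one at a time.
        for count in range(len(ends) - 1, low - 1, -1):
            yield ends[count]
    return pattern

def matched_text(pattern, source):
    end = next(pattern(source, 0), None)
    return None if end is None else source[:end]

digit = cset(DIGITS)
print(matched_text(repeat(digit, 0), "1234567x"))     # 1234567
print(matched_text(repeat(digit, 4, 4), "1234567x"))  # 1234
print(matched_text(repeat(digit, 4, 6), "1234567x"))  # 123456
print(matched_text(repeat(digit, 4), "123x"))         # None
```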
16.16.13.3.10 Negated Character Set Matching
Although you can use the
@matchCset function to specify a negated character set (e.g., @matchCset(
-someSet )), for simple literal character set constants HLA allows a shortcut
operation. Just put a minus sign in front of the literal character set
constant. E.g., -{‘c’, ‘C’,’d’,’D’} matches anything except upper/lower case C
and D.
16.16.13.3.11 Matching Arbitrary Characters
You can match a single
character (regardless of its value) using the negated empty character set
(i.e., -{}). However, HLA provides a shortcut for this: the period operator. A
period appearing in a regular expression body will match any single character and
consume that character from the source string. It fails only if there are no
more characters in the source string.
.*   Matches zero or more characters.
.+   Matches one or more characters.
.:[4]   Matches exactly four characters.
.:[4,6]   Matches between four and six characters.
.:[4,*]   Matches four or more characters.
The .* pattern is useful at the
beginning of a pattern if you want to match some subsequent pattern anywhere in
the source string. The .* pattern will skip over any characters up to the desired pattern.
Note that there are some
performance issues (at compile time) concerning the use of the repeated “.”
operator in complex regular expressions. Please see the section on regular
expression performance later in this document.
16.16.13.3.12 Sequences (Concatenation) – The ‘,’ Operator
Most regular expressions will
consist of more than a single regular expression item. The “,” operator lets
you create a sequence of regular expression items in a regular expression
macro. The resulting regular expression is effectively a concatenation of the
match semantics. For example, consider the following regular expression macro:
#regex identifier;
{‘a’..‘z’, ‘A’..‘Z’, ‘_’}, {‘a’..‘z’, ‘A’..‘Z’, ‘0’..‘9’, ‘_’}*
#endregex
This regular expression
matches a sequence of characters that begins with an alphabetic or
underscore character followed by zero or more alphanumeric or underscore characters (i.e., the
definition of an HLA identifier). Here is another example that matches signed
integer literal constants:
#regex intConst;
‘-’:[0,1], {‘0’..’9’}+
#endregex
The repetition operators do not
apply to sequences (they apply,
instead, to the last element of the regular expression sequence). See the
discussion of parentheses (“()”) for a way to apply a repetition to a sequence.
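Sequencing can be modeled by letting each item start where the previous one ended. When a later item fails, the model backtracks through the previous item's alternatives. The Python sketch below uses hypothetical helper names and is not HLA code. It models the intConst macro above.

```python
# Hypothetical model (not HLA): a pattern takes (source, pos) and yields the
# positions where a match could end, most preferred first.

DIGITS = "0123456789"

def lit(text):
    """A literal string item."""
    def pattern(source, pos):
        if source.startswith(text, pos):
            yield pos + len(text)
    return pattern

def cset(chars):
    """A character set item: consume one character found in chars."""
    def pattern(source, pos):
        if pos < len(source) and source[pos] in chars:
            yield pos + 1
    return pattern

def optional(item):
    """Greedy X:[0,1]: prefer one occurrence, fall back to none."""
    def pattern(source, pos):
        yield from item(source, pos)
        yield pos
    return pattern

def one_or_more(single):
    """Greedy X+ for a single-character item: longest run first."""
    def pattern(source, pos):
        ends = []
        nxt = next(single(source, pos), None)
        while nxt is not None:
            ends.append(nxt)
            nxt = next(single(source, nxt), None)
        yield from reversed(ends)
    return pattern

def seq(*items):
    """The ',' operator: each item starts where the previous one ended;
    if a later item fails, the earlier item's next candidate is tried."""
    def pattern(source, pos, k=0):
        if k == len(items):
            yield pos
            return
        for mid in items[k](source, pos):
            yield from pattern(source, mid, k + 1)
    return pattern

def matched_text(pattern, source):
    """Text consumed by the first successful match, or None on failure."""
    end = next(pattern(source, 0), None)
    return None if end is None else source[:end]

# #regex intConst;  '-':[0,1], {'0'..'9'}+  #endregex
int_const = seq(optional(lit("-")), one_or_more(cset(DIGITS)))

print(matched_text(int_const, "-123+x"))  # -123
print(matched_text(int_const, "42"))      # 42
print(matched_text(int_const, "-x"))      # None
```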
16.16.13.3.13 Alternation – The “|” Operator
The alternation operator (“|”)
lets HLA select from amongst several different alternative regular expression
elements. The basic syntax is:
RX1 | RX2
where RX1 and RX2 are two
regular expressions (e.g., the regular expression elements we’ve discussed thus
far). The @match function will try to match the first regular expression
against the source string. If this succeeds, then the whole expression succeeds
and the @match function ignores the second alternative. If matching the first
regular expression fails, then the @match function tries to match against the
second regular expression. The success or failure of the match is then based on
the result of this second match.
Because RX1 | RX2 is itself a
regular expression, recursively we can come up with an arbitrary list of
alternatives, e.g.,
RX1 | RX2 | RX3 | RX4 | ... | RXn
The @match function will try
to match the first expression. If that fails it will try the second; if that
fails it will try the third, etc. If any of the n regular expressions succeeds,
then the alternation succeeds and @match ignores any remaining regular
expressions in the alternation expression. The alternation sequence fails only
if all the subpatterns fail. Note that the string list operator, [ “str1”,
“str2”, “str3”, ..., “strn” ] is just a shorthand for:
“str1” | “str2” | ... | “strn”
The repetition operators do not
apply to alternative sequences
(they apply, instead, to the last element of the alternation sequence). See the
discussion of parentheses (“()”) for a way to apply a repetition to an
alternation sequence.
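Here is a Python sketch of the left-to-right rule. It uses a hypothetical alt helper in which each pattern yields candidate end positions. This models the described behavior; it is not HLA's implementation.

```python
# Hypothetical model (not HLA): a pattern takes (source, pos) and yields end
# positions, most preferred first. Yielding nothing means failure.

def lit(text):
    def pattern(source, pos):
        if source.startswith(text, pos):
            yield pos + len(text)
    return pattern

def alt(*choices):
    """Model of RX1 | RX2 | ... | RXn: try the alternatives left to right."""
    def pattern(source, pos):
        for choice in choices:
            yield from choice(source, pos)
    return pattern

def matched_text(pattern, source):
    end = next(pattern(source, 0), None)
    return None if end is None else source[:end]

print(matched_text(alt(lit("a"), lit("ab")), "abc"))  # a    (first success wins)
print(matched_text(alt(lit("x"), lit("ab")), "abc"))  # ab   (RX1 failed, RX2 tried)
print(matched_text(alt(lit("x"), lit("y")), "abc"))   # None (every alternative failed)
# [ "str1", "str2", "str3" ] behaves like "str1" | "str2" | "str3":
print(matched_text(alt(lit("str1"), lit("str2"), lit("str3")), "str2!"))  # str2
```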
16.16.13.3.14 Subexpressions – The “()” operator
Like arithmetic operators,
regular expression operators exhibit an operator precedence. The precedence
order is repetition operators (e.g., “*” and “:[2]”) first, then sequences (“,”), and
last, alternation (“|”). This precedence is natural and eliminates some
ambiguity that would otherwise be present in a regular expression. For example,
consider the following regular expression sequence:
‘c’, ‘d’ | ‘e’
Does this mean match the
string “cd” or “e” (that is, match ‘c’, ‘d’ or match ‘e’), or does this mean
match either of the strings “cd” or “ce” (that is, match ‘c’ followed by ‘d’ or
‘e’)? An argument could be made for either resolution of the ambiguity.
However, the ‘,’ operator has higher precedence than the “|” operator in HLA,
so the first possibility is the one that HLA uses (that is, it matches “cd” or
“e”).
No matter which choice is made
with respect to precedence, there will be situations where you need to override
the precedence. As with arithmetic expressions, you can use parentheses to
override precedence. For example, if you really want to match “cd” or “ce” in
the previous example, you could rewrite the expression as follows:
‘c’, ( ‘d’ | ‘e’ )
You may apply the repetition
operators to a parenthetical regular expression. For example, the regular expression
‘c’, ( ‘d’ | ‘e’ )*
matches the character ‘c’
followed by a string of zero or more ‘d’ and ‘e’ characters.
Some regular expression items
don’t directly support the repetition operators. For example, sequences don’t
support the repetition operators (because
of precedence issues). You can use parentheses to overcome this
problem, e.g.,
( ‘a’, ‘b’, {‘c’,’d’}):+
matches one or more repetitions of “abc” or “abd”, in any combination
(e.g., “abcabdabc”).
Note: some operators don’t
support repetition because it just doesn’t make sense to do so. Be careful when
you force repetition onto an operation that doesn’t otherwise support it. It’s
very easy to create a regular expression that never succeeds, or always
succeeds, by misapplying the repetition operators.
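To see the effect of precedence, the following Python model builds both readings of ‘c’, ‘d’ | ‘e’. It also builds the repeated group ‘c’, ( ‘d’ | ‘e’ )*. The helpers are hypothetical and this is not HLA code. Here star is a greedy repetition that works on any subexpression, which is what the parentheses make possible.

```python
# Hypothetical model (not HLA): a pattern takes (source, pos) and yields end
# positions, most preferred first. Yielding nothing means failure.

def lit(text):
    def pattern(source, pos):
        if source.startswith(text, pos):
            yield pos + len(text)
    return pattern

def alt(*choices):
    def pattern(source, pos):
        for choice in choices:
            yield from choice(source, pos)
    return pattern

def seq(*items):
    def pattern(source, pos, k=0):
        if k == len(items):
            yield pos
            return
        for mid in items[k](source, pos):
            yield from pattern(source, mid, k + 1)
    return pattern

def star(item):
    """Greedy (RX)*: as many repetitions as possible first, then fewer."""
    def pattern(source, pos):
        for mid in item(source, pos):
            if mid != pos:  # ignore empty repetitions so recursion terminates
                yield from pattern(source, mid)
        yield pos
    return pattern

def matched_text(pattern, source):
    end = next(pattern(source, 0), None)
    return None if end is None else source[:end]

no_parens = alt(seq(lit("c"), lit("d")), lit("e"))            # ('c', 'd') | 'e'
with_parens = seq(lit("c"), alt(lit("d"), lit("e")))          # 'c', ('d' | 'e')
repeated = seq(lit("c"), star(alt(lit("d"), lit("e"))))       # 'c', ('d' | 'e')*

print(matched_text(no_parens, "ce"))     # None
print(matched_text(no_parens, "e"))      # e
print(matched_text(with_parens, "ce"))   # ce
print(matched_text(repeated, "cdeedx"))  # cdeed
```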
16.16.13.3.15 Extracting Substrings – The Extraction Operator “<>:”
On occasion, you’ll want to
save some part of the source string you’ve matched. Granted, the @match
function has a “MatchedString” argument that returns the entire matched string,
but sometimes you’ll want to extract only a portion of the entire matched
string. The regular expression extraction operator lets you achieve this. The
extraction operator uses the following syntax:
< Regular_Expression_sequence >:identifier
For the purposes of pattern matching,
the extraction operator behaves exactly like the subexpression (parentheses)
operator. Everything between the two angle brackets (“<” and “>”) is used
as a unit. If this sequence matches the source string, then the @match function
will extract the substring matched by this subexpression and store that string
into the compile-time variable specified by identifier. This identifier must either be a regular
expression macro parameter, a regular expression local symbol, or a global VAL
object.
One very common use of the
#return statement is to return some string composed of items processed by the
extraction operator. For example, if you want to create a LISP-like assembly
language, you could use a regular expression macro like the following (for the
“mov”, “add”, and “sub” instructions):
#regex stmt:mnemonic, op1, op2;
‘(’,
<[“mov”, “add”, “sub”]>:mnemonic, // Match the mnemonic
‘,’,
<.*>:op1, // Everything up to the 2nd comma is the 1st operand
‘,’,
<.*>:op2, // Everything up to the ‘)’ is the 2nd operand
‘)’
#return mnemonic + “(” + op1 + “,” + op2 + “)” // Construct HLA statement
#endregex
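The extraction operator can be modeled by threading a dictionary of captured substrings through the match. The Python sketch below is an illustrative model, not HLA, and all names are hypothetical. Each binding creates a new dictionary, so backtracking never leaves a stale capture behind. The sketch follows the stmt macro above.

```python
# Hypothetical model (not HLA): a pattern takes (source, pos, captures) and
# yields (end_position, captures) pairs, most preferred first.

def lit(text):
    def pattern(source, pos, caps):
        if source.startswith(text, pos):
            yield pos + len(text), caps
    return pattern

def string_list(*strings):
    def pattern(source, pos, caps):
        for s in strings:
            if source.startswith(s, pos):
                yield pos + len(s), caps
    return pattern

def dot_star(source, pos, caps):
    """Greedy .*: try the longest possible run first."""
    for end in range(len(source), pos - 1, -1):
        yield end, caps

def extract(item, name):
    """Model of < item >:name: bind the text item matched to name."""
    def pattern(source, pos, caps):
        for end, inner in item(source, pos, caps):
            yield end, {**inner, name: source[pos:end]}  # new dict per binding
    return pattern

def seq(*items):
    def pattern(source, pos, caps, k=0):
        if k == len(items):
            yield pos, caps
            return
        for mid, caps2 in items[k](source, pos, caps):
            yield from pattern(source, mid, caps2, k + 1)
    return pattern

stmt = seq(lit("("),
           extract(string_list("mov", "add", "sub"), "mnemonic"), lit(","),
           extract(dot_star, "op1"), lit(","),
           extract(dot_star, "op2"), lit(")"))

def translate(source):
    # Mimics: #return mnemonic + "(" + op1 + "," + op2 + ")"
    for _end, caps in stmt(source, 0, {}):
        return caps["mnemonic"] + "(" + caps["op1"] + "," + caps["op2"] + ")"
    return None

print(translate("(mov,eax,ebx)"))  # mov(eax,ebx)
print(translate("(nop,eax,ebx)"))  # None
```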
16.16.13.3.16 Invoking Other #regex Macros in a Regular Expression
HLA’s #regex macros allow you
to call other #regex macros as though they were pattern matching functions.
This one feature alone is what
gives HLA’s “regular expressions” the power to handle many context-free grammars
(rather than being limited to just the regular language subset). If you include the name of some #regex
macro within a regular expression, the @match function will match the current
source string using that other regular expression, and its success or failure
will determine whether the match proceeds upon return from that other #regex macro.
Consider the following example:
#regex ID;
{‘a’..‘z’, ‘A’..‘Z’, ‘_’}, {‘a’..‘z’, ‘A’..‘Z’, ‘0’..‘9’, ‘_’}*
#endregex
#regex arrayAccess;
ID, ‘[‘, {‘0’..’9’}+, ‘]’
#endregex
The arrayAccess regular
expression matches an identifier followed by a numeric constant surrounded by
square brackets, e.g., “myArray[4]”.
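Invoking one #regex macro from another behaves much like calling one matching function from another. The Python model below uses hypothetical helper names and is not HLA code. It defines ID once and reuses it inside arrayAccess.

```python
import string

# Hypothetical model (not HLA): a pattern takes (source, pos) and yields end
# positions, most preferred first. Yielding nothing means failure.

def cset(chars):
    def pattern(source, pos):
        if pos < len(source) and source[pos] in chars:
            yield pos + 1
    return pattern

def lit(text):
    def pattern(source, pos):
        if source.startswith(text, pos):
            yield pos + len(text)
    return pattern

def seq(*items):
    def pattern(source, pos, k=0):
        if k == len(items):
            yield pos
            return
        for mid in items[k](source, pos):
            yield from pattern(source, mid, k + 1)
    return pattern

def star(item):
    """Greedy X*: as many repetitions as possible first."""
    def pattern(source, pos):
        for mid in item(source, pos):
            if mid != pos:
                yield from pattern(source, mid)
        yield pos
    return pattern

def matched_text(pattern, source):
    end = next(pattern(source, 0), None)
    return None if end is None else source[:end]

# #regex ID;  {'a'..'z','A'..'Z','_'}, {'a'..'z','A'..'Z','0'..'9','_'}*
ID = seq(cset(string.ascii_letters + "_"),
         star(cset(string.ascii_letters + string.digits + "_")))

# #regex arrayAccess;  ID, '[', {'0'..'9'}+, ']'
# ID is invoked here just like any other item; X+ is written as X, X*.
digit = cset(string.digits)
array_access = seq(ID, lit("["), digit, star(digit), lit("]"))

print(matched_text(array_access, "myArray[4]"))  # myArray[4]
print(matched_text(array_access, "myArray[]"))   # None
print(matched_text(array_access, "4Array[4]"))   # None
```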
Regular expression invocations can even be recursive. However, you must
be careful not to create an infinitely recursive loop (that is, creating a
“left recursive” expression, using compiler terminology). Advanced HLA users
(and hopefully you are an advanced HLA user if you’re reading this stuff) might
think that they can use HLA’s conditional assembly directives (e.g., #if) to
halt the recursion. Though the compile-time language elements may appear in a
#regex macro, they don’t work the way you probably think that they do; in
particular, they cannot be used to terminate left recursion (see the section on
Compiling and Precompiling Regular Expressions for
details on this issue). The primary ways to make decisions in regular
expressions are via success/failure and via alternation. Specifically, if you
have two regular expressions R and S, then the expression “R, S” will not
execute S if R fails. Similarly, the expression “R | S” will not execute S if R
succeeds. If S contains one of these sequences (that is, S invokes itself after R),
then the success or failure of R can stop the infinite recursion.
Eliminating left recursion (and
left factoring, another important operation for creating grammars that a
predictive parser like @match can use) is a subject well beyond the scope of
this manual. Pick up any decent compiler design text for details.
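The difference between safe recursion and left recursion can be sketched in Python. This is a model only, and the names are hypothetical. balanced consumes an opening parenthesis before it recurses, so each level makes progress. left_recursive calls itself before consuming anything and never terminates, which Python reports as a RecursionError.

```python
def balanced(source, pos):
    """Model of a recursive #regex matching balanced parentheses:
         '(', balanced, ')', balanced  |  <empty>
    The recursive call happens only after '(' is consumed, so every
    level of recursion makes progress (not left recursive)."""
    if pos < len(source) and source[pos] == "(":
        for mid in balanced(source, pos + 1):
            if mid < len(source) and source[mid] == ")":
                yield from balanced(source, mid + 1)
    yield pos  # the empty alternative always succeeds

def fully_balanced(source):
    """True if some way of matching balanced consumes the whole source."""
    return any(end == len(source) for end in balanced(source, 0))

def left_recursive(source, pos):
    """expr := expr, '+', 'x' | 'x'. This calls itself before consuming
    anything, so the recursion never bottoms out."""
    for mid in left_recursive(source, pos):
        if source.startswith("+x", mid):
            yield mid + 2
    if source.startswith("x", pos):
        yield pos + 1

print(fully_balanced("(()())"))  # True
print(fully_balanced("(()"))     # False

try:
    next(left_recursive("x+x", 0))
except RecursionError:
    print("left recursion never terminates")
```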
There are some important compile-time performance issues associated with
invoking regular expression macros from within another regular expression. See
the section Compiling and Precompiling Regular Expressions for
more details.
16.16.13.3.17 Lookahead (peeking)
Sometimes when matching a
string, you’ll need to look ahead one or more characters to determine whether
you can satisfy the current regular expression. A classic example is the “less
than” operator in many programming languages (“<”). A simple regular
expression of the form ‘<’ is insufficient because the next character might
be “=” or “>” (for languages that use “<>” to denote ‘not equals’,
such as HLA). Of course, with HLA’s regular expressions you could use the
string list [“<=”, “<>”, “<”] to handle this specific match, but in
general you might want the ability to look ahead a character or two before
deciding if you’re going to succeed. This is accomplished using the peek
operator and functions.
For literal constants,
prefacing the constant with “/” tells the @match function that the following
literal constant must appear in the source string, but @match will not consume
any of those characters. For example, ‘a’/’b’ requires that the source string
begin with “ab” but it only consumes the ‘a’ from the source string. Similarly,
!“ax”/-{‘a’..’z’, ‘A’..’Z’, ‘0’..’9’, ‘_’} matches “ax” (case-insensitive) as
long as whatever follows is not an alphanumeric or underscore character (btw,
this expression isn’t quite good enough, you’ll also want to allow end of
string after the “ax”, but we haven’t discussed how to match end of string yet,
so that will have to wait).
You can also use the @peekChar,
@peekiChar, @peekStr, @peekiStr, and @peekCset functions to look ahead without
consuming any characters in the source string. E.g., this last example is
equivalent to:
!“ax”, @peekCset( -{‘a’..‘z’, ‘A’..‘Z’, ‘0’..‘9’, ‘_’} )
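The following Python sketch models lookahead as a pattern that checks its operand but returns the original position. It is not HLA code. The names lit, not_in, eos, peek, and the rest are hypothetical. eos stands in for @eos from the Utility Matching Functions table.

```python
import string

# Hypothetical model (not HLA): a pattern takes (source, pos) and yields end
# positions, most preferred first. Yielding nothing means failure.

WORD_CHARS = set(string.ascii_letters + string.digits + "_")

def lit(text, ignore_case=False):
    def pattern(source, pos):
        head = source[pos:pos + len(text)]
        if head == text or (ignore_case and head.lower() == text.lower()):
            yield pos + len(text)
    return pattern

def not_in(chars):
    """Model of -{...}: consume one character that is not in chars."""
    def pattern(source, pos):
        if pos < len(source) and source[pos] not in chars:
            yield pos + 1
    return pattern

def eos(source, pos):
    """Model of @eos: succeed, consuming nothing, at the end of the source."""
    if pos == len(source):
        yield pos

def peek(item):
    """Model of /X or the @peek... functions: item must match, nothing is consumed."""
    def pattern(source, pos):
        if next(item(source, pos), None) is not None:
            yield pos
    return pattern

def alt(*choices):
    def pattern(source, pos):
        for choice in choices:
            yield from choice(source, pos)
    return pattern

def seq(*items):
    def pattern(source, pos, k=0):
        if k == len(items):
            yield pos
            return
        for mid in items[k](source, pos):
            yield from pattern(source, mid, k + 1)
    return pattern

def matched_text(pattern, source):
    end = next(pattern(source, 0), None)
    return None if end is None else source[:end]

print(matched_text(seq(lit("a"), peek(lit("b"))), "abc"))  # a  ('b' not consumed)
ax = seq(lit("ax", ignore_case=True), peek(not_in(WORD_CHARS)))
print(matched_text(ax, "AX+1"))  # AX
print(matched_text(ax, "axe"))   # None
print(matched_text(ax, "ax"))    # None: end of string not handled
ax_or_end = seq(lit("ax", ignore_case=True), peek(alt(not_in(WORD_CHARS), eos)))
print(matched_text(ax_or_end, "ax"))  # ax
```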
16.16.13.3.18 Utility Matching Functions
HLA’s regular expression macros
support several utility functions that match common strings, thus sparing you
from having to write regular expressions for these common items. The following
table lists the built-in functions.
Name | Parameters | Supports Repetition | Description |
@eos | | No | Matches the end of the string. |
@ws | | Yes | Matches a whitespace character. |
@reg | | No | Matches an x86 general-purpose 8, 16, or 32-bit register. |
@reg8 | | No | Matches an x86 8-bit register name. |
@reg16 | | No | Matches an x86 16-bit register name. |
@reg32 | | No | Matches an x86 32-bit register name. |
@regfpu | | No | Matches an x86 FPU register name (HLA syntax: st0, st1, ..., st7). |
@regmmx | | No | Matches an x86 MMX register name (HLA syntax: mm0, mm1, ..., mm7). |
@regxmm | | No | Matches an x86 SSE register name (HLA syntax: xmm0, xmm1, ..., xmm7). |
@matchid | | No | Matches a sequence that looks like an HLA identifier (begins with an alphabetic or underscore character, followed by zero or more alphanumeric or underscore characters). |
@matchIntConst | | No | Matches a sequence of one or more decimal digits. |
@matchRealConst | | No | Matches a sequence that is a syntactically valid HLA floating-point literal constant. |
@matchStrConst | | No | Matches an HLA string literal (including the quotes around the object). |
@matchWord | ( “string” ) | No | Similar to @matchStr (or “literal String”) except that the next character after the string it matches must not be alphanumeric or underscore. |
@matchiWord | ( “string” ) | No | Case-insensitive variant of @matchWord. |
@arb | | Yes | Matches an arbitrary character. Similar to ‘.’ but uses a lazy algorithm rather than a greedy algorithm (that is, it matches as few characters as possible rather than as many characters as possible when the repetition operator allows an arbitrary number of characters). |
@pos | ( n ) | No | n is a small unsigned integer. This pattern succeeds if the current character being matched is the nth character in the original source string (the one passed to @match). Note that the first character in the string is at @pos(0). |
@tab | ( n ) | No | n is a small unsigned integer. This pattern succeeds if n is greater than or equal to the current character position in the original source string. If the current character position is less than n, then @tab matches all characters up to the nth position. Note that the first character in the string is at @tab(0). |
@at:identifier | | No | Stores the current zero-based index into the source string into the VAL object identifier (identifier can also be a #regex parameter or local symbol). The type of this value is UNS32. |
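As an illustration of the position-based functions, here is a Python model of @pos and @tab. The helper names are hypothetical and this is not HLA code. The table does not say what @tab does when the source is shorter than n, so the model assumes it fails in that case.

```python
# Hypothetical model (not HLA): a pattern takes (source, pos) and yields end
# positions, most preferred first. Yielding nothing means failure.

def lit(text):
    def pattern(source, pos):
        if source.startswith(text, pos):
            yield pos + len(text)
    return pattern

def at_pos(n):
    """Model of @pos(n): succeed, consuming nothing, only at index n."""
    def pattern(source, pos):
        if pos == n:
            yield pos
    return pattern

def tab(n):
    """Model of @tab(n): if the current index is <= n, consume everything up
    to index n. Assumption: fails if the source is shorter than n."""
    def pattern(source, pos):
        if pos <= n <= len(source):
            yield n
    return pattern

def seq(*items):
    def pattern(source, pos, k=0):
        if k == len(items):
            yield pos
            return
        for mid in items[k](source, pos):
            yield from pattern(source, mid, k + 1)
    return pattern

def matched_text(pattern, source):
    end = next(pattern(source, 0), None)
    return None if end is None else source[:end]

# "mov" must start at zero-based column 4; @tab skips whatever is before it.
col4_mov = seq(tab(4), lit("mov"))
print(repr(matched_text(col4_mov, "    mov eax, 0")))  # '    mov'
print(repr(matched_text(col4_mov, "  mov eax, 0")))    # None
print(repr(matched_text(seq(lit("ab"), at_pos(2)), "abc")))  # 'ab'
print(repr(matched_text(seq(lit("a"), at_pos(2)), "abc")))   # None
```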
16.16.13.4 Backtracking
#regex regular expressions
fully support backtracking
during pattern matching. This means that if a regular expression ambiguously
specifies the text to match (and most non-trivial regular expressions are
ambiguous), then the @match function will back up and try possible alternatives
if one possibility fails. The most
obvious example is the alternation operator. If you have a regular expression
of the form R | S and R fails to match, then the @match function will backtrack
in the source string to where R began its match (“unconsuming” any
characters consumed by R) and retry the match using S.
Alternation certainly isn’t the
only case where backtracking occurs. Consider the following regular expression:
.*, “hello”
This regular expression
matches the string “hello” anywhere in the source string. The .* prefix skips
over an arbitrary number of characters and then “hello” must match some
substring of the source string. Note that the .* regular expression is greedy. That is, it will match as many characters as
possible. Indeed, when @match first encounters .*, it will match the remainder
of the string. Such a match, of course, will cause the next pattern (“hello”)
to fail as there are no characters left in the string. When this happens,
@match will back up some characters (no further than the first character that .* matched)
and then see if the following regular expression matches. If so, then @match
succeeds. The @match function fails only if it backtracks
all the way to the start of what .* matched and the subsequent pattern
still fails.
One thing to note here: because
.* is greedy, a regular expression like .*, “hello” will match everything up to
the last occurrence of “hello”
in the source string, not up to the first occurrence. If you would prefer to
match up to the first “hello” in the source string, you cannot use a greedy
algorithm when skipping arbitrary characters. The @arb function matches
arbitrary characters, like ‘.’, except it uses a lazy (or deferred) matching
algorithm, matching as few characters as possible. An expression like @arb*
begins by matching zero characters. If the subsequent pattern fails, it matches
one character. If the subsequent pattern fails, it tries matching two
characters, and so on. Therefore,
the regular expression @arb*, “hello” will match up to the first occurrence of
“hello” in the source string.
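The contrast between .* and @arb* can be modeled in Python, assuming a pattern yields candidate end positions in preference order. Greedy offers the longest skip first; lazy offers the shortest. The helper names are hypothetical and this is not HLA's implementation. The source string is the example from the Lazy Versus Greedy Evaluation section.

```python
# Hypothetical model (not HLA): a pattern takes (source, pos) and yields end
# positions, most preferred first.

def lit(text):
    def pattern(source, pos):
        if source.startswith(text, pos):
            yield pos + len(text)
    return pattern

def dot_star(source, pos):
    """Greedy .*: offer the longest skip first, then back off one at a time."""
    for end in range(len(source), pos - 1, -1):
        yield end

def arb_star(source, pos):
    """Lazy @arb*: offer a zero-length skip first, then one more each time."""
    for end in range(pos, len(source) + 1):
        yield end

def seq(*items):
    def pattern(source, pos, k=0):
        if k == len(items):
            yield pos
            return
        for mid in items[k](source, pos):
            yield from pattern(source, mid, k + 1)
    return pattern

def remainder_after(skip, word, source):
    """Match  skip, "word"  and return the unmatched rest, or None."""
    end = next(seq(skip, lit(word))(source, 0), None)
    return None if end is None else source[end:]

text = "hello world, hello people, hello creation"
print(repr(remainder_after(dot_star, "hello", text)))  # ' creation'
print(repr(remainder_after(arb_star, "hello", text)))  # ' world, hello people, hello creation'
```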
Backtracking can be a very
expensive operation if you’re not careful when designing your regular
expressions. Consider the following regular expression:
‘a’+, ‘a’+, ‘a’+
This regular expression
(ambiguously) matches three or more ‘a’ characters. Consider what happens,
however, when it is fed a source string such as “aaa”. The first ‘a’+ term
above matches the entire string. This causes the second ‘a’+ term to fail, so
backtracking occurs. The first ‘a’+ term backs off one character and now the
second ‘a’+ term can succeed. At this point, the third ‘a’+ term fails. So the
second ‘a’+ expression attempts to backtrack, but it fails to match, so the
first ‘a’+ term backs up one more character. Now, the second ‘a’+ term greedily
grabs the two available characters. The third ‘a’+ term fails at this point, so
backtracking occurs yet again. The second ‘a’+ term backs up one character and,
finally, the third ‘a’+ term succeeds. As you can see, this is a lot of work to
match a three-character string. In general, backtracking has exponential time
complexity (that is, the number of backtracking operations that can take place
is proportional to 2**n, where n is the number of regular expression elements).
Fortunately, with a little care, you can almost always avoid the degenerate
cases that exhibit such poor performance. For example, the previous expression
could be efficiently written as ‘a’:[3,*].
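To put numbers on the walk-through above, this Python sketch counts how many candidate end positions the greedy repetitions offer before the match succeeds. It is a model under a stated assumption: each ‘a’+ backs off one character at a time, as described. It is not a measurement of HLA, and the names are hypothetical.

```python
steps = 0  # candidate end positions offered by the repetition items

def a_plus(source, pos):
    """Greedy model of 'a'+: longest run first, then back off by one."""
    global steps
    run_end = pos
    while run_end < len(source) and source[run_end] == "a":
        run_end += 1
    for end in range(run_end, pos, -1):
        steps += 1
        yield end

def a_at_least(n):
    """Greedy model of 'a':[n,*]."""
    def pattern(source, pos):
        global steps
        run_end = pos
        while run_end < len(source) and source[run_end] == "a":
            run_end += 1
        for end in range(run_end, pos + n - 1, -1):
            steps += 1
            yield end
    return pattern

def seq(*items):
    def pattern(source, pos, k=0):
        if k == len(items):
            yield pos
            return
        for mid in items[k](source, pos):
            yield from pattern(source, mid, k + 1)
    return pattern

def count_steps(pattern, source):
    """Return (end of first match or None, number of candidates offered)."""
    global steps
    steps = 0
    end = next(pattern(source, 0), None)
    return end, steps

print(count_steps(seq(a_plus, a_plus, a_plus), "aaa"))  # (3, 7)
print(count_steps(a_at_least(3), "aaa"))                # (3, 1)
```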
Matching an arbitrary number of
characters is best done at the end of a regular expression rather than at the
beginning or in the middle of a regular expression. Doing so reduces the amount
of backtracking that will take place. If you cannot avoid matching an arbitrary
sequence of characters, then the next best thing is to avoid having two or more
subexpressions in a regular expression that match arbitrary sequences. When
you have two or more subexpressions that can match an arbitrary number of
characters, backtracking can get pretty ugly. Fortunately, you can usually
avoid such degenerate cases by carefully choosing your regular expressions.
16.16.13.5 Lazy Versus Greedy Evaluation
By default, the algorithms that
@match uses are greedy. That is, if a given subexpression can match an
arbitrary number of characters it will attempt to match as many as possible. If
matching too many would cause the match operation to fail, then backtracking will come to
the rescue and allow the pattern
match to succeed (if at all possible). If all you care about is whether the
pattern matches, then it really doesn’t matter whether the match algorithm is
greedy or non-greedy. There are two cases, however, where you might want to use
a non-greedy (“lazy”) algorithm: compile-time performance and minimal string
matching.
As you saw in the previous
section on backtracking, using a greedy algorithm can produce very slow
performance in certain degenerate situations. A lazy algorithm (which matches
as few characters as possible rather than as many characters as possible) will
generally produce much better performance as it can reduce the amount of
backtracking that takes place. For example, if you could run the ‘a’+, ‘a’+,
‘a’+ algorithm from the previous section using lazy evaluation, then it would
match the first three ‘a’ characters it finds and stop. No backtracking would
take place.
Another issue with greedy
evaluation is that it always matches the maximum length string. Perhaps this is
not what you want. Perhaps you want to match the minimal length string and then
process the remainder of the string (after the match) separately. For example,
you might expect the following pattern to match everything up to “hello” in the
source string and leave the rest of the source string in the remainder operand:
.*, “hello”
In fact, this regular
expression matches everything up to the last occurrence of “hello” in the
source string. So if the source string is something like “hello world, hello people,
hello creation” then the remainder string winds up being “ creation”. Sometimes
you want minimal string matching so greedy evaluation is inappropriate.
You can specify lazy evaluation
in a pattern using the following repetition forms (assume R is some regular
expression that supports repetition):
R::[n,m]   Matches between n and m copies of R
R::[n,*]   Matches n or more copies of R
Although you cannot directly
specify lazy evaluation for the unadorned * and + operators, you can easily
synthesize lazy evaluation for these operators as follows:
R::[0,*]   Matches zero or more copies of R
R::[1,*]   Matches one or more copies of R
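The lazy forms can be modeled by offering the legal repetition counts fewest-first instead of most-first. The Python sketch below is a model with hypothetical names, not HLA code. It compares R:[n,m] with R::[n,m] for a single-character R.

```python
# Hypothetical model (not HLA): a pattern takes (source, pos) and yields end
# positions, most preferred first. Yielding nothing means failure.

def cset(chars):
    def pattern(source, pos):
        if pos < len(source) and source[pos] in chars:
            yield pos + 1
    return pattern

def lit(text):
    def pattern(source, pos):
        if source.startswith(text, pos):
            yield pos + len(text)
    return pattern

def repeat(single, low, high=None, lazy=False):
    """Model of X:[low,high] (greedy) and X::[low,high] (lazy) for a
    single-character X; high=None stands for '*'."""
    def pattern(source, pos):
        ends = [pos]  # ends[k] is the position after k copies of X
        while high is None or len(ends) - 1 < high:
            nxt = next(single(source, ends[-1]), None)
            if nxt is None:
                break
            ends.append(nxt)
        counts = range(low, len(ends))  # legal repetition counts, fewest first
        for count in (counts if lazy else reversed(counts)):
            yield ends[count]
    return pattern

def seq(*items):
    def pattern(source, pos, k=0):
        if k == len(items):
            yield pos
            return
        for mid in items[k](source, pos):
            yield from pattern(source, mid, k + 1)
    return pattern

def matched_text(pattern, source):
    end = next(pattern(source, 0), None)
    return None if end is None else source[:end]

digit = cset("0123456789")
print(matched_text(repeat(digit, 2, 4), "123456"))             # 1234
print(matched_text(repeat(digit, 2, 4, lazy=True), "123456"))  # 12
# X::[1,*] followed by '5' stops at the first '5'; X:[1,*] reaches the last.
print(matched_text(seq(repeat(digit, 1, None, lazy=True), lit("5")), "15255"))  # 15
print(matched_text(seq(repeat(digit, 1), lit("5")), "15255"))                   # 15255
```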
16.16.13.6 The @match and @match2 Functions
Consider a simple regular
expression that matches a string of the form “id+id” (that is, a simple
arithmetic expression). The #regex macro might take the following form:
#regex simpleExpr;
@matchID, ‘+’, @matchID
#endregex
and you could use this regular
expression with an @match invocation like this:
?boolResult := @match( “value1+value2”, simpleExpr );
This will work great right up
to the point you try something like the following, at which point the pattern
matching operation will fail:
?falseResult := @match( “value1 + value2”, simpleExpr );
(notice there are spaces
around the ‘+’ operator in the source string.)
You can solve this problem, and
allow arbitrary whitespace in an expression, by inserting @ws* regular
expressions at appropriate points in your regular expression. For example, you
could rewrite simpleExpr thusly:
#regex simpleExpr;
@ws*, @matchID, @ws*, ‘+’, @ws*, @matchID
#endregex
This new regular expression
will ignore whitespace at all the appropriate points in the source string.
There are three problems with
sticking @ws* terms throughout your regular expression. First, it clutters up
the regular expression and makes it difficult to read. Second, it’s easy to
misplace (or leave out) one of the @ws* terms. Finally, a bunch of terms like
@ws* can have a serious impact on the processing time needed by @match when
backtracking occurs.
The @match2 function solves
these three problems. @match2 automatically skips any whitespace present
before each term it finds in a regular expression that it processes. This
spares you having to clutter your code with @ws* items, it guarantees that it
skips whitespace before each term, and the whitespace it skips is not subject
to backtracking issues. So unless you want absolute control over matching
whitespace in your source strings, you should really use the @match2 function
rather than @match.
In some very rare cases, you
may need the ability to switch between @match and @match2 semantics within the
same regular expression. For example, if you want to be able to parse HLA-style
character constants, you might be tempted to use a regular expression like the
following:
“‘’’’” | ‘’’’, ., ‘’’’
(that is, match ‘’’’ or a
single character surrounded by apostrophes.)
Unfortunately, if you use @match2 to process this
regular expression it will fail when you attempt to match the character
constant ‘ ’. This is because @match2 will skip the space between the two
apostrophes. To avoid this problem, the solution is to make a recursive call
to @match within the regular expression, as follows:
“‘’’’” | @match( ‘’’’, ., ‘’’’ )
This guarantees @match
semantics (no whitespace skipping) for the specified subexpression. Note that
there are no returns, remainder, or matched parameters allowed here, and the
source string is always the current string being processed.
You can also call @match2 in a
similar manner if you want to guarantee @match2 semantics in a subexpression.
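The whitespace-skipping difference can be sketched in Python by giving each term a flag that says whether to skip leading whitespace. This models the described behavior; it is not HLA's implementation, and all names are hypothetical. The character-constant pattern above fails on ‘ ’ when every term skips whitespace. It succeeds again when the quoted-character subexpression does not skip.

```python
def skip_ws(source, pos):
    """Advance past any whitespace characters."""
    while pos < len(source) and source[pos].isspace():
        pos += 1
    return pos

def lit(text, skip):
    """A literal term; skip=True models @match2 skipping leading whitespace."""
    def pattern(source, pos):
        if skip:
            pos = skip_ws(source, pos)
        if source.startswith(text, pos):
            yield pos + len(text)
    return pattern

def any_char(skip):
    """The '.' term: consume any one character."""
    def pattern(source, pos):
        if skip:
            pos = skip_ws(source, pos)
        if pos < len(source):
            yield pos + 1
    return pattern

def alt(*choices):
    def pattern(source, pos):
        for choice in choices:
            yield from choice(source, pos)
    return pattern

def seq(*items):
    def pattern(source, pos, k=0):
        if k == len(items):
            yield pos
            return
        for mid in items[k](source, pos):
            yield from pattern(source, mid, k + 1)
    return pattern

def matched_text(pattern, source):
    end = next(pattern(source, 0), None)
    return None if end is None else source[:end]

def char_constant(skip):
    # "''''" | '''', ., ''''  : four apostrophes, or one character in apostrophes
    return alt(lit("''''", skip),
               seq(lit("'", skip), any_char(skip), lit("'", skip)))

print(repr(matched_text(char_constant(skip=False), "' '")))  # "' '"  (@match)
print(repr(matched_text(char_constant(skip=True), "' '")))   # None   (@match2)
# @match2 overall, with the quoted-character part wrapped in @match( ... ):
mixed = alt(lit("''''", True),
            seq(lit("'", False), any_char(False), lit("'", False)))
print(repr(matched_text(mixed, "' '")))  # "' '"
```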
16.16.13.7 Compiling and Precompiling Regular Expressions
To improve pattern matching
performance, particularly when backtracking occurs, HLA does not interpret the
text of a #regex macro directly. Instead, HLA compiles a #regex macro into an internal format and operates on that
internal format rather than on the #regex text directly. This affects the
operation and usage of #regex macros in several subtle ways. To avoid
complications when using #regex macros, it’s important to understand how
compiling #regex macros affects their operation.
Prior to the introduction of
#regex macros, there were two distinct times a programmer had to be concerned
with: assembly (compile) time and run time. For example, the #if statement
operates at compile time whereas the if statement operates at run time. In
order to fully utilize the HLA compile-time language, a programmer has to
become comfortable with the difference between compile-time operations and
run-time code. #regex regular
expressions also exhibit two distinct phases, compile time and run time, though
both of these phases take place during the HLA
compilation phase. Unfortunately, and this is the confusing part, the complete
facilities of the HLA compile-time language are only available during regular
expression compilation, not while HLA is executing those regular expressions.
Consider, for a moment, the
following #regex macro definition:
#regex sample( count );
#for( i:= 1 to count )
‘a’,
#endfor
‘b’
#endregex
At first glance, this code
seems rather straight-forward. You would think that it would match the number
of ‘a’ characters passed as the parameter, followed by a single ‘b’ character.
In fact, the behavior is subtly different. As with machine instructions, the
#for loop simply replicates the body while compiling the regular expression.
Once compiled, the number of matching ‘a’ characters is immutable. For example,
if you compile a regular expression using the value 5 as the actual argument
value, the above regular expression macro is equivalent to:
#regex sample( count );
‘a’, ‘a’, ‘a’, ‘a’, ‘a’,
‘b’
#endregex
Unless you recompile this regular expression with a different
argument value, the value will never be anything other than five.
Of course, one question that
naturally arises is “how does one compile a #regex macro?” None of the examples
to date have required the use of a special “regular expression compiler” to
process a #regex macro before using it. Well, as it turns out, HLA will
automatically compile a #regex macro to its internal form if you use such a
macro within an @match/@match2 function call or if a #regex macro name appears
within some other regular expression. Because the regular expression is
compiled on the spot, the distinction between compile time and run time for the
regular expression almost becomes
a moot point.
The only problem with compiling
a regular expression every time you encounter it is that compilation can be an
expensive operation if you recompile a regular expression on each use. Consider
the following #regex macros:
#regex matchHello;
“hello”
#endregex
#regex hasHello;
.*, matchHello
#endregex
The .* operand in hasHello
guarantees that backtracking will occur within this regular expression.
Unfortunately, on each backtracking instance (and there will be five of them in
this case), HLA is forced to recompile the regular expression. This is
extremely inefficient. For this reason, you should try to avoid placing
uncompiled regular expression macro invocations inside a #regex definition.
Instead, you should precompile the regular expression to the internal form and
specify that compiled version. This saves the expense of recompiling the
regular expression on each invocation of the internal #regex macro.
The obvious question is “how
does one precompile a #regex macro?” This is accomplished by creating a VAL
object of type “regex” and assigning a #regex macro to that VAL identifier. For
example:
#regex matchHello;
“hello”
#endregex
val
compiledMatchHello :regex := matchHello;
When HLA sees a statement like
this, it compiles the #regex macro (matchHello in this example) to the internal form and stores
this internal data structure into the regex VAL object (compiledMatchHello in this example). Now you can use the compiled
variant of the #regex macro just like the macro itself with one very important
difference: compiled regexes do not allow any actual arguments. The processing
of the #regex parameters (and of any HLA compile-time language statements
appearing in the macro) takes
place when the #regex macro is compiled; by the time HLA actually executes the
regular expression, those parameters and compile-time statements are gone.
If you’re only going to use a
regular expression macro once in a source file, precompiling the macro won’t
achieve anything. However, if you use a regular expression macro several times,
and especially if you use the regex macro within some other regular expression,
you should get in the habit of precompiling the #regex macro and using the
compiled version. Here’s a good convention to use: prefix your #regex macro
names with an underscore and then immediately follow the #regex macro with a
VAL statement that compiles the macro to the unadorned name, e.g.,
#regex _matchHello;
“hello”
#endregex
val
matchHello :regex := _matchHello;
16.16.13.8 The #match..#endmatch Block
Although you can use @match and
regular expression macros as generic pattern-matching functions in your HLA
compile-time program, the true intended purpose of these pattern-matching
facilities is to allow you to write your own “mini-languages” (i.e., domain-specific
languages) directly in your HLA source files. The #match..#endmatch directives
provide a convenient way to compile such domain-specific languages
(DSELs). A #match..#endmatch block
takes the following form:
#match( regexID )
<<body>>
#endmatch
The #match directive converts
the block of text after the closing parenthesis and up to the #endmatch
directive into a single string, runs @match on this string along with the
regular expression specified by regexID, and then expands the return string to
text if the @match function returns true. This is roughly equivalent to:
?returnStr:string;
#if( @match( <<body text as a string>>, regexID, returnStr ))
@text( returnStr );
#endif
Here is a hypothetical example
of #match..#endmatch in action:
#match( smallBASIClanguage )
for i = 1 to 10
print i
next i
#endmatch
Presumably, the
smallBASIClanguage regular expression would contain the statements to compile
the body of the #match..#endmatch statement into the corresponding machine
instructions.
16.16.13.9 Using Regular Expressions in Your Assembly Programs
Unless you’ve had a firm
grounding in compiler theory and pattern-matching theory, you’re probably
wondering what the heck these #regex macros are all about. What do they have to
do with assembly language? Although this documentation cannot begin to go into
details about automata theory and
what-not, it is useful to describe exactly why you might want to create and use
#regex macros in your assembly programs.
HLA’s standard macro facilities
let you extend the HLA language, but you don’t have a whole lot of say in the
design of the syntax for those macro invocations. Though HLA’s context-free
macro facilities provide lots of
options you just don’t see in other assemblers, the truth is that you’re stuck
using the standard HLA syntax when using macros. Regular expressions give you
the ability to design a syntax of your own choosing. You can even create full
programming languages inside HLA using #regex pattern matching macros. All you
need to do is place your “program” inside some HLA compile-time string object
(e.g., using the #text..#endtext directive) and then call @match to compile
your program.
Examples of #regex macros
appear in the HLA examples download module. Please grab a copy of that
module to see some working HLA #regex macros.
16.16.14 The #asm..#endasm and #emit Directives
These directives are deprecated
and should not appear in new HLA programs. They will definitely be gone in HLA
v2.0 and will probably disappear soon from HLA v1.xx. Much of the need for
these statements has gone away over the years as HLA’s instruction set was
expanded to incorporate most x86 instructions.
16.16.15 The #system Directive
The #SYSTEM directive requires
a single string parameter. It
executes this string as an operating system (shell/command interpreter)
operation via the C "system" function call. This call is useful, for example, to run a program during
compilation that dynamically creates a text file that an HLA program may
include immediately after the #system invocation.
Example:
#system( "dir" )
Note that the
"#system" directive is legal anywhere white space is allowable and
doesn’t require a semicolon at the end of the statement.
16.16.16 The #print and #error Directives
The #PRINT directive displays its parameter values
during compilation. The basic
syntax is the following:
#print( comma, separated, list, of, constant, expressions, ... )
The #PRINT statement is very useful for displaying
messages during assembly (e.g., when debugging complex macros or compile-time
programs). The items in the #PRINT
list must evaluate to constant (CONST or VAL) values at compile time.
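As a quick illustration of this rule, the following sketch mixes CONST and VAL operands in a single #print statement. The program and symbol names are made up for this example, and the program produces no interesting run-time code; all of its work happens during compilation.

```hla
program printDemo;

const
    maxItems := 16;         // an integer constant
    version  := "1.2";      // a string constant

val
    itemSize := 4;          // a VAL object; it could be reassigned later

// Every operand is a compile-time constant, so #print can display them.
// The operands are written one after another during compilation.
#print( "version ", version, ": table needs ", maxItems * itemSize, " bytes" )

begin printDemo;
end printDemo;
```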
The #ERROR directive behaves like #PRINT insofar as it prints
its parameter to the console device during compilation. However, this instruction also
generates an HLA error message and does not allow the creation of an object
file after compilation. This
statement only allows a single string expression as a parameter. If you need to print multiple values of
different types, use string concatenation and the @string function to achieve
this. Example:
#error( "Error, unexpected value. Value = " + @string( theValue ))
Notice that neither the #print
nor the #error statements end with a semicolon.
16.16.17 Compile-Time File Output (#openwrite, #append, #write, #closewrite)
These compile-time statements
let you do simple file output during compilation. The #openwrite
statement opens a single file for output, #write writes data to that output file, and #closewrite closes the file when output is complete. These statements are useful for
automatically generating INCLUDE files that the source file will include later
on during the compilation. These
statements are also useful for storing bulk data for later retrieval or
generating a log during assembly.
The #openwrite statement uses the following syntax:
#openwrite( string_expression )
This call opens a single
output file using the filename specified by the string expression. If the system cannot open the file, HLA
emits a compilation error. Note
that #openwrite only allows one output file to be active at a time. HLA will report an error if you execute
#openwrite and there is already an output file open. If the file already exists, HLA deletes it prior to opening
it (so be careful!). If the file
does not already exist, HLA creates a new one with the specified name.
The #append statement has the same syntax as #openwrite. The difference is that using #append will not first delete the file you are opening. Instead, all data
written to the file will be appended to the end of the existing file (if any).
The #write statement uses the same syntax as the #print directive.
Note, however, that #write doesn’t automatically emit a newline after writing all its operands to
the file; if you want a newline
output you must explicitly supply it as the last parameter to #write.
The #closewrite statement closes the file opened via #openwrite. HLA
automatically closes this file at the end of assembly if you leave it
open. However, you must explicitly
close this file before attempting to use the data (via include or #openread) in your program. Also, since HLA allows only one open output file at a time,
you must use #closewrite
to close the file before you can open another with #openwrite.
Warning: Internally, the #write statement simply redirects the standard output
stream to send output to the write file and then invokes #print, restoring the standard output file handle upon
return. This creates a minor
problem if there is a syntax error in the #write operand list -- the error message gets written to
the output file! If you’re having
problems with the #write
output, temporarily change it to #print to see if there’s an error in the statement. This defect will probably get fixed in
some future version (beyond HLA v1.32).
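The include-file use case mentioned at the start of this section can be sketched as follows. This is a minimal example that assumes the current directory is writable. The file name squares.hhf and the sqN constants are hypothetical. A #for loop (described later in this section) writes one CONST declaration per line, #closewrite closes the file, and the program then includes the file it just wrote.

```hla
program genIncludeDemo;
#include( "stdlib.hhf" )

// Generate a header during compilation containing:
//     const
//         sq1 := 1;  sq2 := 4;  ...  sq5 := 25;   (one per line)
#openwrite( "squares.hhf" )
#write( "const", nl )
#for( i := 1 to 5 )
    #write( "    sq", i, " := ", i*i, ";", nl )
#endfor
#closewrite     // the file must be closed before it is included

#include( "squares.hhf" )

begin genIncludeDemo;
    stdout.put( "sq5 = ", sq5, nl );
end genIncludeDemo;
```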
16.16.18 Compile-time File Input (#openread, @read, #closeread)
These compile-time statements
and function let you do simple file input during compilation. The #openread statement opens a single file for input, @read is a compile-time function that reads a line of
text from the file, and #closeread
closes the file when input is complete.
These statements are useful for reading files produced by #openwrite/#write/#closewrite or any other text file during compilation.
The #openread statement uses
the following syntax:
#openread( filename )
The filename parameter must be a string expression or HLA
reports an error. HLA attempts to
open the specified file for reading;
HLA prints an error message if it cannot open the file.
The @read function uses the following call syntax:
@read( val_object )
The val_object parameter must either be a symbol you’ve defined
in a VAL section (or via
"?") or it must be an undefined symbol (in which case @read defines it as a VAL object). @read is an HLA compile-time function (hence the
"@" prefix rather than "#"; HLA uses "#" for
compile-time statements). It
returns either true or false, true if the read was successful, false if the
read operation encountered the end of file. Note that if any other read error occurs, HLA will print an
error message and return false as the function result. If the read operation is successful,
then HLA stores the string it read (up to 4095 characters) into the VAL object specified by the parameter. Unlike #openread and #closeread, the @read function may not appear arbitrarily in
your source file. It must appear
within a constant expression since it returns a boolean result (and it is your
responsibility to check for EOF).
The #closeread
statement closes the input file.
Since you may only have one open input file at a time, you must close an
open input file with #closeread prior to opening a second file. Syntax:
#closeread
Example of using compile-time
file I/O:
#openwrite( "hw.txt" )
#write( "Hello World", nl )
#closewrite
#openread( "hw.txt" )
?goodread := @read( s );
#closeread
#print( "data read from file = ", s )
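Because @read returns false at end of file, it can drive a #while loop (described later in this section) that processes every line of a file. The sketch below first writes its own two-line file so that it is self-contained. The file name lines.txt and the symbol names are illustrative only.

```hla
program readLinesDemo;

// Create a small input file first so the example is self-contained.
#openwrite( "lines.txt" )
#write( "alpha", nl )
#write( "beta", nl )
#closewrite

// @read returns true after each successful read and false at EOF,
// so it can serve directly as the #while loop condition.
#openread( "lines.txt" )
?lineCount := 0;
#while( @read( textLine ) )
    ?lineCount := lineCount + 1;
    #print( "line ", lineCount, ": ", textLine )
#endwhile
#closeread

begin readLinesDemo;
end readLinesDemo;
```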
16.16.19 The Conditional Compilation Statements (#if)
The conditional compilation
statements in HLA use the following syntax:
#if( constant_boolean_expression )
<< Statements to compile if the >>
<< expression above is true. >>
#elseif( constant_boolean_expression )
<< Statements to compile if the >>
<< expression immediately above >>
<< is true and the first expression >>
<< above is false. >>
#else
<< Statements to compile if both >>
<< the expressions above are false. >>
#endif
The #ELSEIF and #ELSE clauses are optional. As you would expect, there may be more
than one #ELSEIF clause in the same conditional if sequence.
Unlike some other assemblers
and high level languages, HLA’s conditional compilation directives are legal
anywhere whitespace is legal. You
could even embed them in the middle of an instruction! While directly embedding these
directives in an instruction isn’t recommended (because it would make your code
very hard to read), it’s nice to know that you can place these directives in a macro and then replace
an instruction operand with a macro invocation.
An important thing to note
about this directive is that the constant expression in the #IF and #ELSEIF
clauses must be of type boolean or HLA will emit an error. Any legal constant expression that
produces a boolean result is legal here.
In particular, you are not limited to expressions like those allowed by the
HLA HLL IF statement.
Keep in mind that conditional
compilation directives are executed at compile-time, not at run-time. You would not use these directives to
(attempt to) make decisions while your program is actually running.
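A typical use of #if..#elseif..#else..#endif is selecting debugging code at compile time. In this sketch, traceLevel is a hypothetical configuration constant. Changing its value determines which stdout.put call HLA compiles into the program, if it compiles one at all.

```hla
program condCompileDemo;
#include( "stdlib.hhf" )

const
    // Hypothetical build switch: 0 = no tracing, 1 = brief, 2 = verbose.
    traceLevel := 2;

begin condCompileDemo;

    mov( 10, eax );

    // HLA picks one branch while compiling; only that branch's code
    // ends up in the object file.
    #if( traceLevel >= 2 )
        stdout.put( "verbose: eax = ", (type uns32 eax), nl );
    #elseif( traceLevel = 1 )
        stdout.put( "trace on", nl );
    #else
        // traceLevel = 0: no tracing code is emitted.
    #endif

end condCompileDemo;
```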
16.16.20 The Compile-Time Loop Statements (#while and #for)
The HLA compile time language
also provides a couple of looping structures -- the #WHILE loop and the #FOR
loop.
The #while..#endwhile
compile-time loop takes the following form:
#while( constant_boolean_expression )
<< Statements to execute as long >>
<< as the expression is true. >>
#endwhile
While processing the #while..#endwhile loop, HLA evaluates the constant boolean
expression. If it is false, HLA
immediately skips to the first statement beyond the #endwhile directive.
If the expression is true, then
HLA proceeds to compile the body of the #while loop.
Upon encountering the #endwhile directive, HLA jumps back up to the #while clause in the source code and repeats this process
until the expression evaluates false.
Warning: since HLA allows you to create loops in
your source code that execute during the compilation process, HLA also
allows you to create infinite
loops that will lock up the system during compilation. If HLA seems to have gone off into la-la
land during compilation and you’re using #while loops in your code, it might not be a bad idea to
put some #print
directives into your loop(s) to see if you’ve created an infinite loop.
Note: because of the
limitations of HLA’s implementation language (FLEX and BISON), it is not
possible to begin a #while
loop and have the matching #endwhile appear in a (different) macro or TEXT constant. When the HLA compiler encounters a #while statement it scans the source code looking for the
matching #endwhile
collecting up the statements that make up the body of the loop. During this scan it does not expand
TEXT constants or macros. Hence,
if you bury the #endwhile
in a macro or TEXT constant HLA will not be able to find it. For performance and functional reasons,
HLA cannot expand macros and TEXT constants during this scan. This is a limitation we will all have
to live with until v2.0 of HLA (which will be rewritten in a different
language).
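The following minimal sketch uses #while to print the powers of two from 2**0 through 2**7 during compilation. Each iteration updates the VAL object that the loop condition tests. Omitting that update is the usual way to produce the infinite loop warned about above. The names are illustrative.

```hla
program whileDemo;

?exponent := 0;
?power := 1;
#while( exponent < 8 )
    #print( "2**", exponent, " = ", power )
    ?power := power * 2;
    ?exponent := exponent + 1;    // advances the loop toward termination
#endwhile

begin whileDemo;
end whileDemo;
```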
The #for..#endfor loop can take one of the following forms:
#for( loop_control_var := Start_expr to end_expr )
<< Statements to execute as long as the loop control variable’s >>
<< value is less than or equal to the ending expression. >>
#endfor
#for( loop_control_var := Start_expr downto end_expr )
<< Statements to execute as long as the loop control variable’s >>
<< value is greater than or equal to the ending expression. >>
#endfor
The HLA compile-time #for..#endfor statement is very similar to the for loops found
in languages like Pascal and BASIC.
This is a definite loop that executes some number of times determined
when HLA first encounters the #for directive (this can be zero or more times, but the number is computed
only once, when HLA encounters the #for). The
loop control variable must be a VALUE object or an undefined identifier (in
which case HLA will create a new VALUE object with the specified name). The loop control variable must also
be an eight, sixteen, or thirty-two bit integer value (uns8, uns16, uns32,
int8, int16, or int32), and the
starting and ending expressions must be values that an int32 VALUE object can
hold.
The #for loop with the to clause initializes the loop control variable with
the starting value and repeats the loop as long as the loop control variable’s
value is less than or equal to the ending expression’s value. The #for..to..#endfor loop increments the loop control variable on each
iteration of the loop.
The #for loop with the downto clause initializes the loop control variable with
the starting value and repeats the loop as long as the loop control variable’s
value is greater than or equal to the ending expression’s value. The #for..downto..#endfor loop decrements the loop control variable on each
iteration of the loop.
Note that the #for..to/downto..#endfor loop only computes the value of the ending
expression once, when HLA first encounters the #for statement.
If the components of this expression would change as a result of the
execution of the #for
loop’s body, this will not affect the number of loop iterations.
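Because compile-time statements are legal anywhere whitespace is, a #for..to loop can generate the elements of a static array initializer. The sketch below builds a table of squares (the names are hypothetical), and a small #for..downto loop prints a countdown during compilation. The last initializer element sits outside the loop so that no comma precedes the closing bracket.

```hla
program forTableDemo;
#include( "stdlib.hhf" )

static
    // 0*0, 1*1, ..., 9*9, generated at compile time.
    squares: uns32[ 10 ] :=
        [
            #for( i := 0 to 8 )
                i*i,
            #endfor
            9*9
        ];

// Counts down from the start value to the end value: prints 3, 2, 1.
#for( j := 3 downto 1 )
    #print( "countdown: ", j )
#endfor

begin forTableDemo;
    mov( 7, ebx );
    stdout.put( "7 squared = ", squares[ ebx*4 ], nl );
end forTableDemo;
```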
The #for..#endfor loop can also take the following form:
#for( loop_control_var in composite_expr )
<< Statements to execute for each element present in the expression >>
#endfor
The composite_expr in this syntactical form may be a string, a
character set, an array, or a record constant.
This particular form of the #for loop repeats once for each item that is a member
of the composite expression. For
strings, the loop repeats once for each character in the string and the loop
control variable is set to each successive character in the string. For character sets, the loop repeats
for each character that is a member of the set; the loop control variable is assigned the value of each
character found in the set (you should assume that the extraction of characters
from the set is arbitrary, even though the current implementation extracts them
in order of their ASCII codes).
For arrays, this #for loop
variant repeats for each element of the array and assigns each successive array
element to the loop control variable.
For record constants, the #for loop extracts each field and assigns the fields,
in turn, to the loop control variable.
Examples:
#for( c in "Hello" )
#print( c ) // Prints the five characters 'H', 'e', ..., 'o'
#endfor
// The following prints a..z and 0..9 (not necessarily in that order):
#for( c in {'a'..'z', '0'..'9'} )
#print( c )
#endfor
// The following prints 1, 10, 100, 1000
#for( i in [1, 10, 100, 1000] )
#print( i )
#endfor
// The following prints all the fields of the record type r
// (presumably, r is a record type you’ve defined elsewhere):
#for( rv in r:[0, 'a', "Hello", 3.14159] )
#print( rv )
#endfor
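The composite form of #for combines naturally with other compile-time statements. This sketch totals the elements of an array constant. It also counts the vowels in a string by testing each character against a character set with the in operator. The symbol names are illustrative.

```hla
program forInDemo;

// Sum the elements of an array constant.
?total := 0;
#for( v in [1, 10, 100, 1000] )
    ?total := total + v;
#endfor
#print( "total = ", total )       // 1111

// Count the vowels in a string using a character-set membership test.
?vowels := 0;
#for( c in "compile time" )
    #if( c in {'a', 'e', 'i', 'o', 'u'} )
        ?vowels := vowels + 1;
    #endif
#endfor
#print( "vowels = ", vowels )     // 5

begin forInDemo;
end forInDemo;
```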
16.16.21 Compile-Time Functions (macros)
Keep in mind that HLA macros
are text expansion devices that may appear anywhere whitespace is allowed. Therefore, you can use them for so much
more than 80x86 instruction synthesis.
In particular, along with the "?" operator, you can create
compile-time functions. For
example, consider the following macro that converts the first character of a
string to upper case and forces the remaining characters to lower case:
program macroFuncDemo;
#include( "stdio.hhf" );
#macro Capitalize( s );
@uppercase( @substr( s, 0, 1 ), 0 ) +
@lowercase( @substr( s, 1, 1000 ), 0 )
#endmacro
static
Hello: string := Capitalize( "hELLO" );
World: string := Capitalize( "world" );
begin macroFuncDemo;
stdout.put( Hello, " ", World, nl );
end macroFuncDemo;
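Macros that expand to numeric constant expressions are an even simpler kind of compile-time function. In this sketch, KB, bufSize, and bigSize are hypothetical names. The KB macro wraps its parameter in parentheses. Without them, KB( 2 + 2 ) would expand to 2 + 2 * 1024 and produce 2050 instead of 4096.

```hla
program kbDemo;
#include( "stdlib.hhf" )

// Expands to a constant expression, so the result can initialize
// CONST and VAL objects or size an array.
#macro KB( n );
    ((n) * 1024)
#endmacro

const
    bufSize := KB( 4 );         // 4096
    bigSize := KB( 2 + 2 );     // also 4096, thanks to the parentheses

static
    buffer: byte[ bufSize ];

begin kbDemo;
    stdout.put( "buffer holds ", @size( buffer ), " bytes", nl );
end kbDemo;
```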
16.17 HLA Units and External Compilation
This section discusses how to
create separately compilable modules in HLA and how you can link HLA code with
code written in other languages.
16.17.1 External Declarations
HLA provides two features to
support separate compilation: units and external objects. HLA uses a very general scheme, similar
to that of C++, to communicate linkage information between object modules. This scheme lets HLA programmers link
code written in HLA, "pure" assembly (e.g.,
MASM code), and even other high level languages (HLLs) with their HLA programs. Conversely, HLA programmers can also
write modules to be linked with programs written in these other languages (as
well as HLA).
Writing separate modules is
quite similar to writing a single HLA program. The first thing to note is that an executable can have only
one main program. When writing HLA
programs, the "PROGRAM" reserved word tells HLA that you are writing
a module that contains a main program.
When writing other modules, you must use a "UNIT" rather than
a "PROGRAM" so as not to generate an extra main procedure. If you wish to write a library module
that contains only procedures and no main program, you would use an HLA
unit. Units have a syntax that is
nearly identical to that of programs; there just isn’t a BEGIN associated with the
unit, e.g.,
unit UnitName;
<< Declarations >>
end UnitName;
Since a unit does not contain a
main program, it cannot compile into a stand-alone program; therefore, you
should always compile units with the "-c" command line option to
avoid running the linker on the unit code (which will always produce a link
error)[28].
HLA uses the
"@EXTERNAL" keyword to communicate names between modules in a
compilation group. If a symbol is
defined to be external, HLA assumes that the symbol is declared in a separate
module and leaves it up to the linker to resolve the symbol’s address.
Only two types of symbols may
be external: procedures and static variables[29]. Variables declared in the VAR section
cannot be external because the linker cannot statically resolve their run-time
address. Constants declared in the
CONST or VAL section cannot be external; however, this is not a limitation because
most programmers place public constants in header files and include them in the
source files that require them.
Recall the syntax for a
procedure declaration presented in the basic HLA documentation:
procedure identifier ( optional_parameter_list ); procedure_options
declarations
begin identifier;
statements
end identifier;
There are two additional forms
to consider:
procedure identifier ( optional_parameter_list );
options
@external;
procedure identifier ( optional_parameter_list );
options
@external("extname");
These two forms tell the HLA
compiler that it is okay to call the specified procedure, but the procedure
itself may not otherwise appear in the current source file. It is the responsibility of the linker
to ensure that the specified external procedures actually appear within the
object modules the linker is combining.
The first form above is
generally used when the external procedure is an HLA procedure that appears in
a different source module. HLA
assumes that the external name is the same name as the procedure identifier[30].
The second form above is generally used when calling
code written in a language other than HLA[31]. This form lets you explicitly state
(via the string constant "extname") the name of the external
procedure. This is especially
important when calling procedures whose names contain characters that are not
"HLA-Friendly." For
example, many Windows API calls have at signs ("@") in
their names; to call such routines
you would use the second form of the external declaration above supplying the
Windows API compatible name as the parameter to the @external reserved word.
It is perfectly legal to
declare an external procedure in the same source file in which the procedure’s
actual code appears. However, the
external declaration must appear before the actual declaration or HLA will generate an error. Whenever an external declaration
appears in the same source file as the actual procedure code, HLA emits code to
ensure that the procedure’s name is public. Therefore, if you want the linker to be able to resolve
the procedure’s address at link time, the external declaration must appear in the
same file as the procedure’s code.
This external declaration serves the same purpose as the
"public" directive in other assemblers (e.g., MASM). Note that, unlike C/C++, HLA does not make procedure
names public automatically.
Also note that the only options
an external procedure declaration supports are the @returns, @pascal, @cdecl, and @stdcall options.
You cannot use the @align, @noalignstack, @noframe or @nodisplay options in an external declaration. Conversely, if an @external (or @forward, for that matter) declaration appears in a source
file, the corresponding procedure code may only contain the @align,
@noalignstack, @noframe, and/or
@nodisplay options. The @returns, @pascal, @cdecl, and @stdcall options are not legal in a procedure declaration
if a corresponding @external
(or @forward) declaration is
present in the source code.
Note: External procedures are
only legal at lex level one. You
cannot declare an external procedure that is embedded inside another procedure.
In addition to procedures, HLA
also lets you declare @external
variables. You may reference such
variables in different source modules.
The declaration of an external variable is very similar to the
declaration of an external procedure: you follow the variable’s name with the
external clause. If the optional
string parameter is not present, HLA uses the variable’s name as its external
name. If you need a different external name (for example, to avoid conflicts with MASM or to include characters that are illegal in
an HLA identifier), supply that name as the string parameter.
Note that HLA does not allow
the @EXTERNAL keyword after every static declaration. Instead, only the following variable declarations allow the
@EXTERNAL keyword:
name: procedure optional_parameters; @external;
name: pointer to typename; @external;
name: typename; @external;
name: typename [ dimensions ]; @external;
In particular, note that static
variable declarations with initializers cannot be external. Also note that ENUM, RECORD, and UNION
variables (those variables you directly create as ENUM, RECORD, or UNION) may
not be external; this is not a
serious limitation, however, since you can declare a named type in the
"TYPE" section and use the third form above to create an external
object of the desired type (this is also how you would declare @EXTERNAL class
variables).
Like the C/C++ language, you normally put all your
external declarations in a header file and include that header file using the
"#include" directive in each of the source
files that reference the external symbols. This eases program maintenance, since you only have to change a
single definition in an include file rather than multiple definitions scattered across
different source files. See the HLA Standard Library code for some good examples of
using HLA header files.
By convention, HLA header files
that contain external declarations always have an ".HHF" suffix (HLA
Header File). To help make your
programs easy to read by others, you should always use this same suffix for
your HLA header files.
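The pieces described in this section fit together roughly as follows. This sketch shows three hypothetical files in one listing. The header counter.hhf holds the @external declarations, a unit supplies the definitions, and a main program uses them. Compile the unit with -c as described earlier, then supply its object file when you build the main program. The exact command line and object-file suffix depend on your platform.

```hla
// ---- file: counter.hhf (shared header; all names are hypothetical) ----
static
    counter: uns32; @external;

procedure bumpCounter; @external;


// ---- file: counter.hla (compile with the -c option) ----
unit counterUnit;
#include( "counter.hhf" )

// These definitions follow the @external declarations in the header,
// so HLA makes both symbols public.
static
    counter: uns32;             // uninitialized static storage starts at zero

procedure bumpCounter; @nodisplay;
begin bumpCounter;
    inc( counter );
end bumpCounter;

end counterUnit;


// ---- file: main.hla ----
program counterDemo;
#include( "stdlib.hhf" )
#include( "counter.hhf" )

begin counterDemo;
    bumpCounter();
    bumpCounter();
    stdout.put( "counter = ", counter, nl );
end counterDemo;
```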
16.17.2 HLA Naming Conventions and Other Languages
If you wish to link together
code written in a different language with code written in HLA, you must be
aware of the differences in naming conventions between the two languages.
With respect to names, keep in
mind that HLA is a case-neutral language. To the outside world, this means that HLA is case sensitive. Therefore, all public names that HLA
exports are case sensitive. If you
are using a case insensitive language like Pascal or Delphi, you should check
with your compiler vendor to determine how the language emits public names
(usually, case insensitive languages convert all public symbols to all upper
case or all lower case). Some
languages, e.g., MASM, let you choose whether public symbols are case sensitive
or case insensitive; for such
languages, you should select case sensitivity as the default and spell your
names the same (with respect to case) between the HLA code and the other
language.
In some cases, it might not be
possible to match an HLA identifier with a public or external identifier in
another language. One possible
reason for this problem is that HLA only allows alphanumeric characters and
underscores in identifiers; some other languages (e.g., MASM) allow other
characters in their names, while other languages (e.g., C++) often "mangle" their names by adding additional characters
that are normally illegal within identifiers (e.g., the at sign,
"@").
The HLA @EXTERNAL directive
provides an option that lets you use a standard HLA identifier within your
program, but utilize a completely different identifier as the public
symbol. The standard HLA
identifier restrictions do not apply to the external name[32]. This variant of the external directive
takes the following forms:
External procedure declaration:
procedure ProcName; @external( "ExtProcName" );
External variable declaration:
varName: SomeType; @external( "ExtVarName" );
Within the confines of the HLA
program, you would use the HLA identifiers "ProcName" and "varName".
To the outside world, however, you would use the names "ExtProcName" and "ExtVarName" to reference these objects.
Since the "@EXTERNAL"
parameter is a string constant rather than an HLA identifier, you can use
characters that would otherwise be illegal in an HLA identifier. For example, Microsoft’s Visual C++ language
and Windows often insert the "@" symbol into identifiers. Normally, this character is illegal in
(user-defined) HLA symbols. You may, however, give an identifier a legal HLA
name and then specify the VC++ compatible name within the string constant. For example, here is a typical
procedure declaration found in the HLA standard library "fileio.hla"
source file:
procedure WriteFile
(
overlapped: dword;
var bytesWritten: dword;
len: dword;
var buffer: byte;
Handle: dword
);
@external( "_WriteFile@20" );
(The "@20" suffix is
a Win32 convention that indicates that there are 20 bytes of parameter data in
this external function.)
As noted above, many languages
"mangle" their external names for one reason or another. In addition to the "@20"
suffix in the previous example, you will also note that VC++ added a leading
underscore to the name (this procedure calls the Win32 API "WriteFile" function). Once again, this name mangling is a function of the
particular compiler being used.
Since Windows itself is written in VC++, Win32 API calls follow the
VC++ standards for name mangling.
In addition to giving you the
ability to conform external names as needed by external languages, the string
parameter of the @EXTERNAL directive will let you change the name for more
mundane reasons. For example, if
you really don’t like the external name (perhaps it is not descriptive of the
operation), you can use the string parameter of the external directive to
allow the use of a different, perhaps more descriptive, name in your HLA code.
Some languages, for example
C++, provide function overloading. This means
that a program can use the same name to reference two completely different
procedures in the code. Within the
object file, however, all names must be unique. Once again, the compiler’s name mangling facilities come
into play to generate unique names.
How a particular name is mangled is extremely compiler sensitive (e.g.,
Borland’s C++ mangles names differently than Microsoft’s Visual C++, even when
compiling the same exact C++ program).
When deciding on the name with which to reference an external procedure,
you may need to consult your compiler documentation or be willing to experiment
around a bit.
Macintosh users, and HLA users
who think their code might someday be compiled for Mac OSX, should be aware of
an issue with the Macintosh version of Gas used as a back-end assembler for
HLA. The Macintosh Gas assembler
treats all symbols beginning with uppercase “L” as local symbols to the
assembly language module. This means that you cannot create any external
symbols that begin with uppercase “L” – such symbols will not be exported from
the Gas assembly language file that HLA produces. Stick an underscore, or some
other character, in front of the external name.
16.17.3 HLA Calling Conventions and Other Languages
Of course, HLA is an assembly
language, so it is possible via the PUSH and CALL instructions to mimic any
calling sequence used by any language that allows the call of external assembly
language code (which covers almost all languages). However, when using the HLA high level language features, in
particular, HLA procedure declarations and calls, there are some details you
must be aware of in order to successfully call code written in other languages
or have those other languages call your code.
By default, HLA assumes that all parameters are pushed on the
stack in a left-to-right order as the parameters appear in the formal parameter
list. Some languages, like Pascal
and Delphi, use this same calling mechanism. A few languages, most notably C/C++, push their parameters
in the right-to-left order. If the
language expects the parameters to be in the reverse order (right-to-left), a
simple solution is to use the @cdecl or @stdcall procedure options to specify the calling
convention.
Many languages, like HLA,
Pascal, and Delphi, make it the procedure’s responsibility to clear parameters
from the stack when the procedure returns to the caller. Some languages, like C/C++ make it the
caller’s responsibility to clear parameters from the stack after the procedure
returns to the caller. Procedures
you declare with the @pascal and
@stdcall procedure options
automatically remove their parameter data from the stack when they return. Procedures you declare with the @cdecl option leave it up to the caller to remove the
parameter data from the stack.
Note that when using the HLA high-level procedure calling syntax, HLA
automatically pushes the parameters on the stack in the correct order
("correct" as defined by the procedure’s calling convention).
HLA procedures do not support a
variable number of parameters in a parameter list. If you need this facility (e.g., to call a C/C++ function)
then you will need to manually push the parameters on the stack yourself prior
to calling the function. Procedures
that have a variable number of parameters almost always use the @cdecl calling convention; since only the caller knows how much parameter data to
remove from the stack, the procedure generally cannot remove the parameter data
(as the @pascal and
@stdcall conventions do).
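The following sketch makes the right-to-left order and caller cleanup concrete. It calls a @cdecl procedure with hand-written PUSH and CALL instructions, which is the same technique you would use for a C function that takes a variable number of parameters. The procedure name subtract is hypothetical.

```hla
program cdeclDemo;
#include( "stdlib.hhf" )

// With @cdecl the parameters are pushed right-to-left and the
// caller, not this procedure, removes them after the call.
procedure subtract( a:int32; b:int32 ); @cdecl;
begin subtract;
    mov( a, eax );
    sub( b, eax );          // EAX := a - b
end subtract;               // exit code returns without popping parameters

begin cdeclDemo;

    pushd( 3 );             // b, the rightmost parameter, is pushed first
    pushd( 10 );            // a is pushed last, so it lies nearest the return address
    call( subtract );
    add( 8, esp );          // caller discards two dword parameters (8 bytes)

    stdout.put( "10 - 3 = ", (type int32 eax), nl );

end cdeclDemo;
```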
16.17.4 Calling Procedures Written in a Different Language
When calling a subroutine written in a different language, your code must pass the parameters as the other language expects and, if the target language requires it, clean up the parameters upon return.
Calling code written in other languages is generally easy. You must ensure that you pass the parameters in the proper places (e.g., in registers, or pushed on the stack in an appropriate order). Usually such a call only requires a suitable external procedure declaration (e.g., swapping the order of the parameters in the parameter list if the language passes parameters in right-to-left order). Some languages may require additional data structures (e.g., static links) to be passed. It is your responsibility to determine whether such data is necessary and to pass it to the subroutine you are calling.
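Reversing the parameter list works because the callee sees only the resulting stack image. The toy C model below (ToyStack, toy_push, and the push_* helpers are hypothetical names, and the model ignores the return address) shows that pushing a C function's parameters right to left produces the same stack image as pushing the reversed list left to right, which is what HLA's default left-to-right convention does with a swapped external declaration.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_SLOTS 16

/* Toy model of a 32-bit stack that grows toward lower indices, like ESP. */
typedef struct {
    uint32_t slots[TOY_STACK_SLOTS];
    size_t   sp;   /* index of the most recently pushed slot */
} ToyStack;

void toy_init(ToyStack *s) { s->sp = TOY_STACK_SLOTS; }

void toy_push(ToyStack *s, uint32_t value) { s->slots[--s->sp] = value; }

/* Pascal/HLA default: push the declared parameters left to right. */
void push_left_to_right(ToyStack *s, const uint32_t *args, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        toy_push(s, args[i]);
}

/* C/C++ (@cdecl, @stdcall): push the parameters right to left, so the
   first parameter ends up at the lowest address (the top of the stack). */
void push_right_to_left(ToyStack *s, const uint32_t *args, size_t n)
{
    for (size_t i = n; i > 0; --i)
        toy_push(s, args[i - 1]);
}
```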
16.17.5
Calling HLA Procedures From Another Language
Calling HLA procedures from another language is somewhat more complex than the converse operation. You still have the problem of parameter ordering, though you can usually fix this by reversing the parameters in the parameter list (e.g., using the @cdecl or @stdcall procedure options).
A bigger problem is the responsibility for cleaning up the parameters on the stack. By default, an HLA procedure automatically removes its parameter data from the stack upon return. If the calling code thinks that it is responsible for this cleanup, the parameter data will be removed twice, with disastrous results. Such procedures must use the @cdecl calling convention, or you must use the @noframe option (and probably @nodisplay as well) to disable the automatic generation of procedure entry and exit code. In the latter case you must manually write the code that sets up the activation record and returns from the procedure; upon return, you must use the "RET()" instruction without a numeric operand.
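The stack arithmetic behind the double-cleanup problem is easy to model. In the hypothetical C function below, each pushed item is a 32-bit slot; a callee that returns with "ret( n )" and a caller that also adds n to ESP leave ESP n bytes too high, while a mismatch where neither side cleans up leaves it n bytes too low.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns ESP after a call that pushed param_bytes of arguments.
   callee_pops: the procedure returns with "ret( param_bytes )" (pascal/stdcall).
   caller_pops: the caller executes "add( param_bytes, esp )" (cdecl). */
uint32_t esp_after_call(uint32_t esp, uint32_t param_bytes,
                        bool callee_pops, bool caller_pops)
{
    esp -= param_bytes;          /* caller pushes the parameters         */
    esp -= 4;                    /* CALL pushes the return address       */
    esp += 4;                    /* RET pops the return address ...      */
    if (callee_pops)
        esp += param_bytes;      /* ... and RET n also discards params   */
    if (caller_pops)
        esp += param_bytes;      /* caller removes the parameters itself */
    return esp;
}
```

Only the two combinations in which exactly one side removes the parameters restore ESP to its value before the call.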
HLA external procedures must
always be declared at lex level one.
Since the condition of the stack is unknown upon entry into HLA code
from some externally written code, your external HLA procedures should not
depend upon the display to access non-local variables. HLA procedures that other languages
call should always have the @nodisplay option associated with them.
While it is okay to access non-local STATIC objects, you should never attempt
to access non-local VAR objects from a procedure that code written in a
different language will call.
HLA’s @pascal, @stdcall, and @cdecl procedure options cover the calling conventions of most modern high-level languages. However, other calling conventions do exist (for example, the METAWARE compilers give you the option of passing parameters in left-to-right order while making the caller responsible for cleaning up the stack afterwards). Some languages don’t pass their parameters on the stack at all; they pass some or all of them in registers. If you are linking your HLA code with a language that uses one of these non-standard calling conventions, it is your responsibility to write the explicit HLA code that passes these parameters and cleans up the parameter data upon return from the procedure.
16.17.6
Linking in Code Written in Other Languages
When linking code written in a different language into an HLA main program, keep in mind that the foreign code may call the standard library associated with the other language; you may need to link in that library as well. Also keep in mind that some compilers emit code that assumes certain initialization has occurred when the program is loaded into memory. If the main program is not written in that other language (i.e., main is written in HLA), this initialization might not have been done, which may very well cause the routine you’re linking into an HLA program to fail. Also note that HLA’s exception handling system is probably quite different from the exception handling present in other languages; if some foreign code raises an exception, the HLA exception handling system may not be able to respond to it.
Conversely, be very careful
about calling HLA standard library routines in code you expect to link into
programs written in other languages.
The HLA standard library routines (and the exception handling code, in
particular), rely upon initialization that the HLA main program performs. This could create a problem, for
example, if you attempt to execute some procedure that raises an exception and
the exception handling code has not been initialized.
17
The 80x86 Instruction Set in HLA
One of the most obvious differences between HLA and standard 80x86 assembly language is the syntax of the machine instructions. The two primary differences are that HLA uses a functional notation for machine instructions and that HLA arranges the operands in a (source, dest) format rather than the (dest, source) format used by Intel.
A further difference, which follows from HLA’s functional notation, is that HLA allows you to compose instructions. That is, one instruction may appear as an operand to a second instruction, e.g.,
mov( mov( 0, eax ), ebx );
To decipher this instruction, all you need to realize is that at compile time each instruction returns a string that HLA substitutes in place of the composed instruction. Usually, the string an instruction returns is that instruction’s destination operand. In the example above, the interior mov instruction’s destination operand is EAX, so that mov instruction “returns” the string “EAX”, which HLA substitutes for the interior mov instruction, producing “mov( eax, ebx );” as the outside instruction.
HLA always processes interior instructions from left to right, innermost first. Therefore, the above instruction is equivalent to the MASM sequence:
mov eax, 0
mov ebx, eax
Consider a second example:
add( mov( i, eax ), mov( j, ebx ));
This instruction is equivalent
to the MASM sequence:
mov eax, i
mov ebx, j
add ebx, eax
Used sparingly, instruction composition can improve the readability of your HLA programs in certain contexts, but be careful with it, because it can quickly produce unreadable code. Even the second example above (add(mov,mov)) would probably be difficult for most programmers to read.
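The substitution rule is mechanical enough to model. In the C sketch below, mov and add are hypothetical toy functions that append the equivalent MASM line to a listing buffer and return their destination operand, just as an HLA instruction yields its destination as its "returns" string. Because C leaves the evaluation order of function arguments unspecified, demo_add_composition sequences the two inner calls explicitly to match HLA's left-to-right rule.

```c
#include <stdio.h>
#include <string.h>

/* Toy listing that collects the MASM-style lines the "instructions" emit. */
char listing[256];

void emit(const char *op, const char *dest, const char *src)
{
    char line[64];
    snprintf(line, sizeof line, "%s %s, %s\n", op, dest, src);
    strncat(listing, line, sizeof listing - strlen(listing) - 1);
}

/* Each toy instruction emits its line and "returns" its destination operand. */
const char *mov(const char *src, const char *dest) { emit("mov", dest, src); return dest; }
const char *add(const char *src, const char *dest) { emit("add", dest, src); return dest; }

/* add( mov( i, eax ), mov( j, ebx ) ): inner instructions first, left to right. */
void demo_add_composition(void)
{
    const char *src  = mov("i", "eax");
    const char *dest = mov("j", "ebx");
    add(src, dest);
}
```

With this model, mov( mov( "0", "eax" ), "ebx" ) leaves "mov eax, 0" followed by "mov ebx, eax" in the listing, the same sequence shown above.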
If you need to modify the RETURNS value of an instruction (in a macro, for
example), you may use the "returns" statement in HLA. This statement takes the following
form:
RETURNS( { statements }, "string Constant" )
This statement emits the code
for the statement(s) between the curly braces and then returns the specified
string constant as the "returns" value for this statement.
The following paragraphs describe each of the HLA machine instructions. They also describe the string each instruction yields at compile time (this is called the “returns” string). Note that some instructions return the empty string because there is no return value one could reasonably associate with them. Such instructions generally cannot be used as operands within other instructions.
These descriptions do not
describe the purpose for each instruction; see an assembly text like “The Art of Assembly Language
Programming” for details on the operation of each instruction.
17.1
Zero Operand Instructions (Null Operand Instructions)
Instruction | Description |
aaa( ) | ASCII adjust for addition. Returns "ax". |
aad( ) | ASCII adjust for division. Returns "ax". |
aam( ) | ASCII adjust for multiplication. Returns "ax". |
aas( ) | ASCII adjust for subtraction. Returns "ax". |
cbw( ) | Convert byte to word (sign extension). Returns "ax". |
cdq( ) | Convert doubleword to quadword (sign extension). Returns "eax". Note: in the future, this may return "edx:eax". |
clc( ) | Clear carry flag. Returns "". |
cld( ) | Clear direction flag. Returns "". |
cli( ) | Clear interrupt flag. Returns "". |
clts() | Clear task switched flag in CR0 (OS use only). |
cmc( ) | Complement carry flag. Returns "". |
cmpsb( ) | Compares the byte at [esi] to the byte at [edi] and increments or decrements ESI & EDI by one. Returns "". |
cmpsd( ) | Compares the dword at [esi] to the dword at [edi] and increments or decrements ESI & EDI by four. Returns "". |
cmpsw( ) | Compares the word at [esi] to the word at [edi] and increments or decrements ESI & EDI by two. Returns "". |
cpuid() | On entry, EAX contains zero, one, or two to determine how this instruction behaves. If EAX contains zero, this instruction returns vendor information in EAX, EBX, ECX, and EDX. If EAX contains one upon entry, EAX returns with version information and EDX contains feature information. If EAX contains two upon entry, EAX..EDX return with cache information. See the Intel documentation for more details concerning this instruction. |
cwd( ) | Convert word to doubleword. Returns "ax". Note: in the future, this may return "dx:ax". |
cwde( ) | Convert word to dword, extended. Returns "eax". |
daa( ) | Decimal adjust for addition. Returns "al". |
das( ) | Decimal adjust for subtraction. Returns "al". |
hlt() | Halt instruction (OS and embedded use only). |
insb( ) | Inputs a byte from the port specified by DX and stores the byte at [EDI], then increments or decrements EDI by one. Returns "". |
insd( ) | Inputs a dword from the port specified by DX and stores the dword at [EDI], then increments or decrements EDI by four. Returns "". |
insw( ) | Inputs a word from the port specified by DX and stores the word at [EDI], then increments or decrements EDI by two. Returns "". |
into( ) | Interrupt on overflow. Returns "". Raises the ex.IntoInstr exception if the overflow flag is set when you execute this instruction. |
invd() | Invalidate internal caches (OS use only). |
iret( ) | Interrupt return. Returns "". |
iretd( ) | Interrupt return, popping 32-bit flags. Returns "". |
lahf( ) | Load AH from flags. Returns "al". |
leave( ) | Remove activation record from stack. Returns "". |
lodsb( ) | Loads al from [ESI], then increments or decrements ESI by one. Returns "al". |
lodsd( ) | Loads eax from [ESI], then increments or decrements ESI by four. Returns "eax". |
lodsw( ) | Loads ax from [ESI], then increments or decrements ESI by two. Returns "ax". |
movsb( ) | Moves a byte from the location specified by [ESI] to the location specified by [EDI], then increments or decrements ESI & EDI by one. Returns "". |
movsd( ) | Moves a dword from the location specified by [ESI] to the location specified by [EDI], then increments or decrements ESI & EDI by four. Returns "". |
movsw( ) | Moves a word from the location specified by [ESI] to the location specified by [EDI], then increments or decrements ESI & EDI by two. Returns "". |
nop( ) | No operation. Returns "". |
outsb( ) | Outputs the byte at address [ESI] to the port specified by DX, then increments or decrements ESI by one. Returns "". |
outsd( ) | Outputs the dword at address [ESI] to the port specified by DX, then increments or decrements ESI by four. Returns "". |
outsw( ) | Outputs the word at address [ESI] to the port specified by DX, then increments or decrements ESI by two. Returns "". |
popad( ) | Pop all general purpose 32-bit registers from stack. Returns "". |
popa( ) | Pop all general purpose 16-bit registers from stack. Returns "". |
popf( ) | Pop 16-bit flags register from stack. Returns "". |
popfd( ) | Pop 32-bit flags register from stack. Returns "". |
pusha( ) | Push all general purpose 16-bit registers onto the stack. Returns "". |
pushad( ) | Push all general purpose 32-bit registers onto the stack. Returns "". |
pushf( ) | Push 16-bit flags register onto the stack. Returns "". |
pushfd( ) | Push 32-bit flags register onto the stack. Returns "". |
rdmsr() | Read from model specific register specified by ECX into EDX:EAX (OS use only). |
rdpmc() | Read performance monitoring counter specified by ECX into EDX:EAX (OS use only). |
rdtsc() | Reads the "time stamp" counter and returns the 64-bit result in edx:eax. |
rep.insb( ) | Transfers ECX bytes from the port specified by DX to the location specified by [EDI]. Increments or decrements EDI by one after each transfer. Returns "". |
rep.insd( ) | Transfers ECX dwords from the port specified by DX to the location specified by [EDI]. Increments or decrements EDI by four after each transfer. Returns "". |
rep.insw( ) | Transfers ECX words from the port specified by DX to the location specified by [EDI]. Increments or decrements EDI by two after each transfer. Returns "". |
rep.movsb( ) | Copies ECX bytes from the memory location specified by [ESI] to the location specified by [EDI]. Increments or decrements EDI & ESI by one after each transfer. Returns "". |
rep.movsd( ) | Copies ECX dwords from the memory location specified by [ESI] to the location specified by [EDI]. Increments or decrements EDI & ESI by four after each transfer. Returns "". |
rep.movsw( ) | Copies ECX words from the memory location specified by [ESI] to the location specified by [EDI]. Increments or decrements EDI & ESI by two after each transfer. Returns "". |
rep.outsb( ) | Transfers ECX bytes from the location specified by [ESI] to the port specified by DX. Increments or decrements ESI by one after each transfer. Returns "". |
rep.outsd( ) | Transfers ECX dwords from the location specified by [ESI] to the port specified by DX. Increments or decrements ESI by four after each transfer. Returns "". |
rep.outsw( ) | Transfers ECX words from the location specified by [ESI] to the port specified by DX. Increments or decrements ESI by two after each transfer. Returns "". |
rep.stosb( ) | Copies ECX bytes from AL to the location specified by [EDI]. Increments or decrements EDI by one after each transfer. Returns "". |
rep.stosd( ) | Copies ECX dwords from EAX to the location specified by [EDI]. Increments or decrements EDI by four after each transfer. Returns "". |
rep.stosw( ) | Copies ECX words from AX to the location specified by [EDI]. Increments or decrements EDI by two after each transfer. Returns "". |
repe.cmpsb( ) | Compares ECX bytes starting at location [ESI] to the set of bytes at location [EDI] as long as the bytes are equal. The comparison stops once two unequal bytes are found. After each successful compare, this instruction increments or decrements ESI and EDI by one (and decrements ECX). Returns "". |
repe.cmpsd( ) | Compares ECX dwords starting at location [ESI] to the set of dwords at location [EDI] as long as the dwords are equal. The comparison stops once two unequal dwords are found. After each successful compare, this instruction increments or decrements ESI and EDI by four (and decrements ECX). Returns "". |
repe.cmpsw( ) | Compares ECX words starting at location [ESI] to the set of words at location [EDI] as long as the words are equal. The comparison stops once two unequal words are found. After each successful compare, this instruction increments or decrements ESI and EDI by two (and decrements ECX). Returns "". |
repe.scasb( ) | Compares AL against ECX bytes starting at location [EDI] as long as the bytes are equal. The comparison stops once two unequal bytes are found. After each successful compare, this instruction increments or decrements EDI by one (and decrements ECX). Returns "". |
repe.scasd( ) | Compares EAX against ECX dwords starting at location [EDI] as long as the dwords are equal. The comparison stops once two unequal dwords are found. After each successful compare, this instruction increments or decrements EDI by four (and decrements ECX). Returns "". |
repe.scasw( ) | Compares AX against ECX words starting at location [EDI] as long as the words are equal. The comparison stops once two unequal words are found. After each successful compare, this instruction increments or decrements EDI by two (and decrements ECX). Returns "". |
repne.cmpsb( ) | Compares ECX bytes starting at location [ESI] to the set of bytes at location [EDI] as long as the bytes are not equal. The comparison stops once two equal bytes are found. After each successful compare, this instruction increments or decrements ESI and EDI by one (and decrements ECX). Returns "". |
repne.cmpsd( ) | Compares ECX dwords starting at location [ESI] to the set of dwords at location [EDI] as long as the dwords are not equal. The comparison stops once two equal dwords are found. After each successful compare, this instruction increments or decrements ESI and EDI by four (and decrements ECX). Returns "". |
repne.cmpsw( ) | Compares ECX words starting at location [ESI] to the set of words at location [EDI] as long as the words are not equal. The comparison stops once two equal words are found. After each successful compare, this instruction increments or decrements ESI and EDI by two (and decrements ECX). Returns "". |
repne.scasb( ) | Compares AL against ECX bytes starting at location [EDI] as long as the bytes are not equal. The comparison stops once two equal bytes are found. After each successful compare, this instruction increments or decrements EDI by one (and decrements ECX). Returns "". |
repne.scasd( ) | Compares EAX against ECX dwords starting at location [EDI] as long as the dwords are not equal. The comparison stops once two equal dwords are found. After each successful compare, this instruction increments or decrements EDI by four (and decrements ECX). Returns "". |
repne.scasw( ) | Compares AX against ECX words starting at location [EDI] as long as the words are not equal. The comparison stops once two equal words are found. After each successful compare, this instruction increments or decrements EDI by two (and decrements ECX). Returns "". |
rsm() | Resume from system management mode (OS use only). |
sahf( ) | Store AH into the flags register. Returns "ah". |
scasb( ) | Compares the byte in al to the location specified by [EDI], then increments or decrements EDI by one. Returns "". |
scasd( ) | Compares the dword in eax to the location specified by [EDI], then increments or decrements EDI by four. Returns "". |
scasw( ) | Compares the word in ax to the location specified by [EDI], then increments or decrements EDI by two. Returns "". |
stc( ) | Set the carry flag. Returns "". |
std( ) | Set the direction flag. Returns "". |
sti( ) | Set the interrupt flag. Returns "". |
stosb( ) | Stores the byte in al to the location specified by [EDI], then increments or decrements EDI by one. Returns "". |
stosd( ) | Stores the dword in eax to the location specified by [EDI], then increments or decrements EDI by four. Returns "". |
stosw( ) | Stores the word in ax to the location specified by [EDI], then increments or decrements EDI by two. Returns "". |
ud2() | Undefined opcode instruction. This instruction always raises an undefined (invalid) opcode exception. |
wbinvd() | Write back and invalidate cache (OS use only). |
wait( ) | Coprocessor wait instruction. Returns "". |
xlat( ) | Translate instruction. Returns "". |
Note: if a null-operand instruction appears as a stand-alone instruction (i.e., it is not part of an instruction composition and, thus, does not appear as the operand to another instruction), you can drop the "( )" after the instruction as long as you terminate the instruction with a semicolon.
17.2
General Arithmetic and Logical Instructions
These instructions include adc, add, and, mov, or, sbb, sub, test, and xor.
They all take the same basic form (substitute the appropriate mnemonic
for "adc" in the syntax examples below):
Generic Form:
adc( source, dest );
lock.adc( source, dest );
Specific forms allowed:
adc( Reg8, Reg8 )
adc( Reg16, Reg16 )
adc( Reg32, Reg32 )
adc( const, Reg8 )
adc( const, Reg16 )
adc( const, Reg32 )
adc( const, mem )
adc( Reg8, mem )
adc( Reg16, mem )
adc( Reg32, mem )
adc( mem, Reg8 )
adc( mem, Reg16 )
adc( mem, Reg32 )
adc( Reg8, AnonMem )
adc( Reg16, AnonMem )
adc( Reg32, AnonMem )
adc( AnonMem, Reg8 )
adc( AnonMem, Reg16 )
adc( AnonMem, Reg32 )
Note: for the form "adc(
const, mem )", if the specified memory location does not have a size or
type associated with it, you must explicitly specify the size of the memory
operand, e.g., "adc(5,(type byte [eax]));"
These instructions all return
their destination operand as the "returns" value.
See Chapter Six in "Art of
Assembly" for a further discussion of these instructions.
If the "lock." prefix
is present, the instruction asserts the bus lock signal during execution. The "lock." prefix is valid
only on instructions that reference memory.
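Because of the (source, dest) order, each of these instructions computes dest := dest op source. As a sketch, the C functions below (add32 and adc32 are hypothetical names, with the carry flag passed explicitly) model the classic pairing of add and adc for a 64-bit addition, which in HLA you would write as add on the low dwords followed by adc on the high dwords.

```c
#include <stdint.h>

/* Model of add( source, dest ) on 32-bit operands:
   returns dest + source and sets *carry to the carry out of bit 31. */
uint32_t add32(uint32_t source, uint32_t dest, unsigned *carry)
{
    uint32_t sum = dest + source;
    *carry = sum < dest;               /* unsigned wrap-around means a carry out */
    return sum;
}

/* Model of adc( source, dest ): like add, but also adds the incoming carry. */
uint32_t adc32(uint32_t source, uint32_t dest, unsigned *carry)
{
    uint64_t sum = (uint64_t)dest + source + *carry;
    *carry = (unsigned)(sum >> 32);    /* bit 32 becomes the new carry */
    return (uint32_t)sum;
}
```

For example, adding 1 to the 64-bit value 0x00000001FFFFFFFF produces a carry out of the low dword that adc folds into the high dword, giving 0x0000000200000000.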
17.3
The XCHG Instruction
The xchg instruction allows the following syntactical forms:
Generic Form:
xchg( source, dest );
lock.xchg( source, dest );
Specific Forms:
xchg( Reg8, Reg8 )
xchg( Reg8, mem )
xchg( Reg8, AnonMem)
xchg( mem, Reg8 )
xchg( AnonMem, Reg8 )
xchg( Reg16, Reg16 )
xchg( Reg16, mem )
xchg( Reg16, AnonMem)
xchg( mem, Reg16 )
xchg( AnonMem, Reg16 )
xchg( Reg32, Reg32 )
xchg( Reg32, mem )
xchg( Reg32, AnonMem)
xchg( mem, Reg32 )
xchg( AnonMem, Reg32 )
This instruction returns its
destination operand as its "returns" value.
See Chapter Six in "Art of
Assembly" for a further discussion of this instruction.
If the "lock." prefix
is present, the instruction asserts the bus lock signal during execution. The "lock." prefix is valid
only on instructions that reference memory.
17.4
The CMP Instruction
The "cmp" instruction uses the following general
forms:
Generic:
cmp( LeftOperand, RightOperand );
Specific Forms:
cmp( Reg8, Reg8 );
cmp( Reg8, mem );
cmp( Reg8, AnonMem );
cmp( mem, Reg8 );
cmp( AnonMem, Reg8 );
cmp( Reg8, const );
cmp( Reg16, Reg16 );
cmp( Reg16, mem );
cmp( Reg16, AnonMem );
cmp( mem, Reg16 );
cmp( AnonMem, Reg16 );
cmp( Reg16, const );
cmp( Reg32, Reg32 );
cmp( Reg32, mem );
cmp( Reg32, AnonMem );
cmp( mem, Reg32 );
cmp( AnonMem, Reg32 );
cmp( Reg32, const );
cmp( mem, const );
Note that the CMP instruction’s
operands are ordered "dest, source" rather than the usual
"source,dest" format (that is, the operands are in the same order as
MASM expects them). This is to
allow an intuitive use of the instruction mnemonic (that is, CMP normally reads
as "compare dest to source.").
We will avoid this confusion by simply referring to the operands as the
"left operand" and the "right operand". Left vs. right signifies the placement
of the operands around a comparison operator like "<=" (e.g.,
"left <= right").
For the "cmp( mem, const
)" form, the memory operand must have a type or size associated with
it. When using anonymous memory
locations you must always coerce the type of the memory location, e.g., "cmp( (type word [ebp-4]), 0
);".
These instructions return their
dest (first) operand as their "returns" value.
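Reading the operands as "left" and "right" maps directly onto the flags cmp sets: the CPU computes left - right, discards the difference, and keeps only the flags. The C sketch below (cmp32, below, and less are hypothetical names) models this for 32-bit operands and shows why the same compare is followed by an unsigned test (the carry flag, which JB checks) or a signed test (sign flag differs from overflow flag, which JL checks).

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool cf, zf, sf, of; } Flags;

/* Model of cmp( left, right ) on 32-bit operands: computes left - right,
   discards the difference, and returns only the resulting flags. */
Flags cmp32(uint32_t left, uint32_t right)
{
    uint32_t diff = left - right;
    Flags f;
    f.cf = left < right;                                   /* borrow out     */
    f.zf = diff == 0;
    f.sf = (diff >> 31) & 1;
    f.of = (((left ^ right) & (left ^ diff)) >> 31) & 1;   /* signed overflow */
    return f;
}

/* "left < right" as an unsigned comparison (JB) and a signed one (JL). */
bool below(Flags f) { return f.cf; }
bool less(Flags f)  { return f.sf != f.of; }
```

For instance, comparing 1 with 0xFFFFFFFF is "below" as unsigned values but not "less" as signed values, because 0xFFFFFFFF is -1 in two's complement.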
17.5
The Multiply Instructions
HLA supports several variations
on the 80x86 "MUL" and IMUL instructions. The supported forms are:
Standard Syntax:
mul( reg8 )
mul( reg16)
mul( reg32 )
mul( mem )
mul( reg8, al )
mul( reg16, ax )
mul( reg32, eax )
mul( mem, al )
mul( mem, ax )
mul( mem, eax )
mul( AnonMem, ax )
mul( AnonMem, dx:ax )
mul( AnonMem, edx:eax )
imul( reg8 )
imul( reg16)
imul( reg32 )
imul( mem )
imul( reg8, al )
imul( reg16, ax )
imul( reg32, eax )
imul( mem, al )
imul( mem, ax )
imul( mem, eax )
imul( AnonMem, ax )
imul( AnonMem, dx:ax )
imul( AnonMem, edx:eax )
intmul( const, Reg16 )
intmul( const, Reg16, Reg16 )
intmul( const, mem, Reg16 )
intmul( const, AnonMem, Reg16 )
intmul( const, Reg32 )
intmul( const, Reg32, Reg32 )
intmul( const, mem, Reg32 )
intmul( const, AnonMem, Reg32 )
intmul( Reg16, Reg16 )
intmul( mem, Reg16 )
intmul( AnonMem, Reg16 )
intmul( Reg32, Reg32 )
intmul( mem, Reg32 )
intmul( AnonMem, Reg32 )
Extended Syntax:
mul( const, al )
mul( const, ax )
mul( const, eax )
imul( const, al )
imul( const, ax )
imul( const, eax )
The first, and probably most important, thing to note about HLA’s multiply instructions is that HLA uses a different mnemonic for the extended-precision integer multiply than for the single-precision integer multiply (i.e., IMUL vs. INTMUL). Standard MASM syntax uses the same mnemonic for both instructions. There are two reasons for this change of syntax in HLA. First, there needed to be some way to differentiate the "imul( const, ax )" and the "intmul( const, ax )" instructions (likewise for the instructions involving EAX). Second, the behavior of the INTMUL instruction is substantially different from that of the IMUL instruction, so it makes sense to use different mnemonics for these instructions.
The extended syntax
instructions create a static data variable, initialized with the specified
constant, and then specify the address of this variable as the source operand
of the MUL or IMUL instruction.
These instructions return their
destination operand (AX, DX:AX, or EDX:EAX for the extended precision MUL and
IMUL instructions) as their "returns" value.
See "The Art of Assembly Language Programming" for
more details on these instructions.
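The difference in behavior is easiest to see side by side. This C sketch (mul32 and intmul32 are hypothetical names; only 32-bit operands are modeled) contrasts mul( src, eax ), which keeps the full 64-bit product in EDX:EAX, with intmul( src, dest ), which keeps only the low 32 bits of the product in the destination register.

```c
#include <stdint.h>

/* Extended precision: mul( src, eax ) leaves the full 64-bit product in EDX:EAX. */
void mul32(uint32_t src, uint32_t *eax, uint32_t *edx)
{
    uint64_t product = (uint64_t)*eax * src;
    *eax = (uint32_t)product;            /* low dword  */
    *edx = (uint32_t)(product >> 32);    /* high dword */
}

/* Single precision: intmul( src, dest ) keeps only the low 32 bits of the
   signed product; any overflow is simply discarded.
   (The final conversion assumes two's complement, as on every x86 compiler.) */
int32_t intmul32(int32_t src, int32_t dest)
{
    return (int32_t)(uint32_t)((int64_t)dest * src);
}
```

Multiplying 0x10000 by 0x10000 shows the difference: mul leaves 1 in EDX and 0 in EAX, while intmul produces 0.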
17.6
The Divide Instructions
HLA supports several variations on the 80x86 DIV and IDIV instructions.
The supported forms are:
Generic Forms:
div( source );
div( source, dest );
mod( source );
mod( source, dest );
idiv( source );
idiv( source, dest );
imod( source );
imod( source, dest );
Specific Forms:
div( reg8 )
div( reg16)
div( reg32 )
div( mem )
div( reg8, ax )
div( reg16, dx:ax)
div( reg32, edx:eax )
div( mem, ax )
div( mem, dx:ax)
div( mem, edx:eax )
div( AnonMem, ax )
div( AnonMem, dx:ax )
div( AnonMem, edx:eax )
mod( reg8 )
mod( reg16)
mod( reg32 )
mod( mem )
mod( reg8, ax )
mod( reg16, dx:ax)
mod( reg32, edx:eax )
mod( mem, ax )
mod( mem, dx:ax)
mod( mem, edx:eax )
mod( AnonMem, ax )
mod( AnonMem, dx:ax )
mod( AnonMem, edx:eax )
idiv( reg8 )
idiv( reg16)
idiv( reg32 )
idiv( mem )
idiv( reg8, ax )
idiv( reg16, dx:ax)
idiv( reg32, edx:eax )
idiv( mem, ax )
idiv( mem, dx:ax)
idiv( mem, edx:eax )
idiv( AnonMem, ax )
idiv( AnonMem, dx:ax )
idiv( AnonMem, edx:eax )
imod( reg8 )
imod( reg16)
imod( reg32 )
imod( mem )
imod( reg8, ax )
imod( reg16, dx:ax)
imod( reg32, edx:eax )
imod( mem, ax )
imod( mem, dx:ax)
imod( mem, edx:eax )
imod( AnonMem, ax )
imod( AnonMem, dx:ax )
imod( AnonMem, edx:eax )
Extended Syntax:
div( const, ax )
div( const, dx:ax )
div( const, edx:eax )
mod( const, ax )
mod( const, dx:ax )
mod( const, edx:eax )
idiv( const, ax )
idiv( const, dx:ax )
idiv( const, edx:eax )
imod( const, ax )
imod( const, dx:ax )
imod( const, edx:eax )
The destination operand is
always implied by the 80x86 "div" and "idiv" instructions
(AX, DX:AX, or EDX:EAX ). HLA
allows the specification of the destination operand in order to make your programs
easier to read (although the use of the destination operand is optional).
The HLA divide instructions support an extended syntax that allows you to specify a constant as the divisor (source operand). HLA allocates storage in the static data segment, initializes that storage with the specified constant, and then divides the accumulator by this newly allocated memory location.
The DIV and IDIV instructions
return "AL", "AX", or "EAX" as their
"returns" value (the quotient is left in the accumulator
register). The MOD and IMOD
instructions return "AH", "DX", or "EDX" as their
"returns" value. Indeed,
the "returns" value is the only difference between these
instructions. The DIV and MOD
instructions compile into the 80x86 DIV instruction; the IDIV and IMOD
instructions compile into the 80x86 IDIV instruction.
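Because DIV and MOD differ only in their "returns" string, one model covers both. The C sketch below (div32 is a hypothetical name) mimics the 32-bit form: it divides the 64-bit value EDX:EAX by the source operand, leaving the quotient in EAX (what div returns) and the remainder in EDX (what mod returns). The asserts stand in for the divide error the CPU raises when the divisor is zero or the quotient does not fit in EAX.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the 80x86 DIV r/m32 instruction, which both div( src, edx:eax )
   and mod( src, edx:eax ) compile into. */
void div32(uint32_t src, uint32_t *eax, uint32_t *edx)
{
    uint64_t dividend = ((uint64_t)*edx << 32) | *eax;

    assert(src != 0);                          /* CPU: divide error   */
    assert(dividend / src <= UINT32_MAX);      /* CPU: quotient overflow */
    *eax = (uint32_t)(dividend / src);         /* quotient:  div's "returns" */
    *edx = (uint32_t)(dividend % src);         /* remainder: mod's "returns" */
}
```

For example, with EDX = 0 and EAX = 100, dividing by 7 leaves 14 in EAX and 2 in EDX. Note that EDX must be set up (zeroed, for an unsigned 32-bit dividend) before the division.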
See the "Art of
Assembly" for a further discussion of these instructions.
17.7
Single Operand Arithmetic and Logical Instructions
These instructions include dec, inc, neg, and not.
They take the following general forms (substituting the specific
mnemonic for ‘dec’ as appropriate):
Generic Form:
dec( dest );
lock.dec( dest );
Specific forms allowed:
dec( Reg8 );
dec( Reg16 );
dec( Reg32 );
dec( mem );
Note: if mem is an untyped or unsized memory location (i.e., an
anonymous memory location), you must explicitly provide a size; e.g., "dec( (type word
[edi]));"
These instructions all return
their destination operand as the "returns" value.
See the "Art of
Assembly" for a further discussion of these instructions.
If the "lock." prefix
is present, the instruction asserts the bus lock signal during execution. The "lock." prefix is valid
only on instructions that reference memory.
17.8
Shift and Rotate Instructions
These instructions include RCL, RCR, ROL, ROR, SAL, SAR, SHL, and SHR.
These instructions support the following generic syntax, making the
appropriate mnemonic substitution.
Generic Form:
shl( count, dest );
Specific Forms:
shl( const, Reg8 );
shl( const, Reg16 );
shl( const, Reg32 );
shl( const, mem );
shl( cl, Reg8 );
shl( cl, Reg16 );
shl( cl, Reg32 );
shl( cl, mem );
The "const" operand is an unsigned integer constant between zero and the maximum number of bits in the destination operand. The forms with a memory operand must have a type or size associated with the operand; when using anonymous memory locations, you must coerce the type, e.g.,
"shl( 2, (type dword [esi]));"
These instructions return their
destination operand as their "returns" value.
See the "Art of
Assembly" for a further discussion of these instructions.
17.9
The Double Precision Shift Instructions
These instructions use the following general form (you can substitute SHRD for SHLD below):
Generic Form:
shld( count, source, dest )
Specific Forms:
shld( const, Reg16, Reg16 )
shld( const, Reg16, mem )
shld( const, Reg16, AnonMem )
shld( cl, Reg16, Reg16 )
shld( cl, Reg16, mem )
shld( cl, Reg16, AnonMem )
shld( const, Reg32, Reg32 )
shld( const, Reg32, mem )
shld( const, Reg32, AnonMem )
shld( cl, Reg32, Reg32 )
shld( cl, Reg32, mem )
shld( cl, Reg32, AnonMem )
These instructions return their
destination operand as the "returns" value.
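As a sketch of what the (count, source, dest) operands do, the C functions below (shld32 and shrd32 are hypothetical names; 32-bit operands only) shift dest by count and fill the vacated bits from source, which itself is left unchanged.

```c
#include <stdint.h>

/* shld( count, source, dest ): shift dest left by count, filling the
   vacated low bits with the high bits of source. */
uint32_t shld32(unsigned count, uint32_t source, uint32_t dest)
{
    count &= 31;                 /* the CPU uses only the low 5 bits of the count */
    if (count == 0)
        return dest;             /* also avoids an undefined 32-bit shift in C */
    return (dest << count) | (source >> (32 - count));
}

/* shrd( count, source, dest ): shift dest right by count, filling the
   vacated high bits with the low bits of source. */
uint32_t shrd32(unsigned count, uint32_t source, uint32_t dest)
{
    count &= 31;
    if (count == 0)
        return dest;
    return (dest >> count) | (source << (32 - count));
}
```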
See the "Art of
Assembly" for a further discussion of these instructions.
17.10
The LEA Instruction
These instructions use the
following syntax:
lea( Reg32, memory )
lea( Reg32, AnonMem )
lea( Reg32, ProcID )
lea( Reg32, LabelID )
Extended Syntax:
lea( Reg32, StringConstant )
lea( Reg32, const ConstExpr )
lea( memory, Reg32 )
lea( AnonMem, Reg32 )
lea( ProcID, Reg32 )
lea( LabelID, Reg32 )
lea( StringConstant, Reg32 )
lea( const ConstExpr, Reg32 )
The "lea" instruction loads the specified 32-bit register with the address of the specified memory operand, procedure, or statement label. Note that in the extended syntax you can reverse the order of the operands. Since exactly one operand must be a register, there is no ambiguity between the two forms (this syntax was added to satisfy those who complained about the (reg, memory) syntax). Of course, good programming style suggests that you use only one form (either reg, memory or memory, reg) within your programs.
The extended syntax form lets you specify a constant rather than a memory address. There is no such thing as the address of a constant, but HLA will create a memory variable in the constants data segment, initialize that variable with the value of the specified constant, and then load the address of this variable into the specified register (or push it onto the stack).
There is a subtle difference
between the following two instructions:
lea( eax, "String" );
lea( eax, const "String" );
The first instruction loads EAX
with the address of the first character of the literal string constant. The second form loads the EAX register
with the address of a string variable (which is a pointer containing the
address of the first character of the string literal).
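In C terms, the difference is between the address of a character array and the address of a pointer variable that holds that array's address. In the sketch below, literal and str_var are hypothetical stand-ins for the data HLA emits, and the model ignores any additional bookkeeping data HLA may store alongside a string.

```c
#include <stdint.h>

static const char literal[] = "String";        /* the literal's characters           */
static const char *const str_var = literal;    /* a string variable: pointer to them  */

/* lea( eax, "String" ): EAX receives the address of the first character. */
uintptr_t lea_literal(void) { return (uintptr_t)literal; }

/* lea( eax, const "String" ): EAX receives the address of the pointer variable,
   so one more indirection is needed to reach the characters. */
uintptr_t lea_const_literal(void) { return (uintptr_t)&str_var; }
```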
The LEA instructions return the
32-bit register as their "returns" value.
See Chapter Six in "Art of
Assembly" for a further discussion of the LEA instruction.
Note: HLA does not support an LEA instruction that
loads a 16-bit address into a 16-bit register. That form of the LEA instruction is not very useful in
32-bit programs running on 32-bit operating systems.
17.11
The Sign and Zero Extension Instructions
The HLA MOVSX and MOVZX instructions use the following syntax:
Generic Forms:
movsx( source, dest );
movzx( source, dest );
Specific Forms:
movsx( Reg8, Reg16 )
movsx( Reg8, Reg32 )
movsx( Reg16, Reg32 )
movsx( mem8, Reg16 )
movsx( mem8, Reg32 )
movsx( mem16, Reg32 )
movzx( Reg8, Reg16 )
movzx( Reg8, Reg32 )
movzx( Reg16, Reg32 )
movzx( mem8, Reg16 )
movzx( mem8, Reg32 )
movzx( mem16, Reg32 )
These instructions sign (MOVSX)
or zero (MOVZX) extend their source operand into the destination operand. They return their destination operand
as their "returns" value.
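Here is a minimal sketch of the difference. It assumes a recent HLA 1.x compiler and the standard library header stdlib.hhf, and the program and variable names are hypothetical. The program extends the same byte both ways:

```hla
program extendDemo;
#include( "stdlib.hhf" )

static
    sx: int32;              // hypothetical: result of sign extension
    zx: int32;              // hypothetical: result of zero extension

begin extendDemo;

    mov( $FF, al );         // AL = %1111_1111 (-1 signed, 255 unsigned)
    movsx( al, ebx );       // copy bit 7 into bits 8..31: EBX = $FFFF_FFFF
    movzx( al, ecx );       // clear bits 8..31:           ECX = $0000_00FF
    mov( ebx, sx );
    mov( ecx, zx );
    stdout.put( "movsx: ", sx, "  movzx: ", zx, nl );   // -1 and 255

end extendDemo;
```

AL holds $FF, which has its high bit set. Sign extension replicates that bit into the upper 24 bits, and zero extension clears them.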
See the "Art of
Assembly" for a further discussion of these instructions.
17.12
The Push and Pop Instructions
These instructions take the
following general forms:
pop( reg16 );
pop( reg32 );
pop( mem );
push( Reg16 )
push( Reg32 )
push( memory )
pushw( Reg16 )
pushw( memory )
pushw( AnonMem )
pushw( Const )
pushd( Reg32 )
pushd( memory )
pushd( AnonMem )
pushd( Const )
These instructions push or pop
their specified operand. They all
return their operand as their "returns" value.
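The following sketch makes the same assumptions, and its names are hypothetical. It shows that POP retrieves values in the reverse of the order in which they were pushed. It also mixes a memory PUSH with a constant PUSHD:

```hla
program pushPopDemo;
#include( "stdlib.hhf" )

static
    startVal: int32 := 10;  // hypothetical memory operand to push
    topVal:   int32;
    nextVal:  int32;

begin pushPopDemo;

    push( startVal );       // push a 32-bit memory operand (10)
    pushd( 42 );            // push a 32-bit constant
    pop( eax );             // last in, first out: EAX = 42
    pop( ebx );             // EBX = 10; the stack is balanced again
    mov( eax, topVal );
    mov( ebx, nextVal );
    stdout.put( "popped ", topVal, " then ", nextVal, nl );

end pushPopDemo;
```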
17.13
Procedure Calls
HLA provides several different ways to call a
procedure. Given a procedure named
"MyProc", any of the following syntaxes are legal:
MyProc( parameter_list );
call( MyProc );
call MyProc;
If MyProc has a set of declared
parameters, the number and types of actual parameters must match the number and
types of the formal parameters.
When you use the first form, HLA emits the code needed to push the parameter list onto the
stack. In the two call statements
above, it is the programmer’s responsibility to pass any needed
parameters. For more details, see
the section on procedure declarations.
In the examples above, MyProc can be either the name of an actual procedure or a
procedure variable (that is, a pointer to a procedure, declared as
"myproc:procedure( parameters );" in a VAR or STATIC section). If you need to call a procedure through an anonymous memory
variable (i.e., an addressing mode like [ebx]), an untyped dword value, or
a register, you must use the syntax of the second call above, e.g., "call( ebx );".
Any legal HLA/80x86 addressing mode is allowed here.
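This minimal sketch calls a parameterless procedure three ways: through a register, through an untyped dword variable, and through an anonymous memory operand. The procedure and variable names are hypothetical. It also assumes that the & address-of operator yields a procedure's address, just as it does for static objects:

```hla
program indirectCallDemo;
#include( "stdlib.hhf" )

static
    procPtr: dword;             // hypothetical untyped dword holding a procedure address

procedure SayHello;
begin SayHello;

    stdout.put( "Hello from SayHello", nl );

end SayHello;

begin indirectCallDemo;

    mov( &SayHello, ebx );      // EBX = address of SayHello
    mov( ebx, procPtr );        // save a copy in an untyped dword
    call( ebx );                // call through a 32-bit register
    call( procPtr );            // call through a dword variable
    mov( &procPtr, ebx );       // EBX = address of the pointer variable
    call( [ebx] );              // call through an anonymous memory operand

end indirectCallDemo;
```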
When declaring a standard
procedure, the procedure declaration syntax allows you to specify a
"returns" value for that procedure, e.g.,
procedure MyProc; returns( "eax" );
HLA substitutes the string that
appears as the "returns" argument for the call when using the first
syntax above. For example,
supposing that MyProc is
a function returning its result in EAX, you could use the following to call MyProc and save the return value in the
"Result" variable:
mov( MyProc(), Result );
For more details, see the
section on procedure declarations.
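The following sketch applies the "returns" substitution to a function that takes parameters. It assumes a recent HLA 1.x compiler, which spells the procedure option @returns. AddTwo and total are hypothetical names:

```hla
program returnsDemo;
#include( "stdlib.hhf" )

static
    total: int32;

// Hypothetical function: computes a + b in EAX and tells HLA
// (via @returns) to substitute "eax" wherever a call appears
// as an instruction operand.
procedure AddTwo( a:int32; b:int32 ); @returns( "eax" );
begin AddTwo;

    mov( a, eax );
    add( b, eax );

end AddTwo;

begin returnsDemo;

    // HLA pushes 5 and 7, calls AddTwo, and then substitutes "eax"
    // for the call, so this becomes mov( eax, total ).
    mov( AddTwo( 5, 7 ), total );
    stdout.put( "total = ", total, nl );    // 12

end returnsDemo;
```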
To call a class procedure, use one of the following syntaxes:
className.ProcName( parameters );
call( className.ProcName );
call className.ProcName;
objectName.ProcName( parameters );
call( objectName.ProcName );
call objectName.ProcName;
The difference between "className" and "objectName" is that "className" represents the actual name of the class data
type whereas "objectName"
represents the name of an instance of this class (i.e., a variable of type
"className" declared
in the VAR or a static section).
When calling a class procedure, HLA loads the ESI
register with the address of the object before calling the specified
procedure. Since there is no
instance variable (object) associated with the className form, HLA loads ESI with zero (NULL). Inside the class procedure you can test
the value of ESI to determine if the procedure was called via the class name or
an object name. This is quite useful,
for example, when writing constructors, to determine whether the procedure needs
to allocate storage for an object.
Consider the following program that demonstrates the use of an object
constructor (create):
program demo;
#include( "memory.hhf" );
#include( "stdio.hhf" );
type
    cc: class
        var
            i:int32;
        procedure create; returns( "esi" );
    endclass;
var
    ccVar: cc;
    ccPtr: pointer to cc;
static
    ccStat:cc;
procedure cc.create; @nodisplay;
begin create;
    push( eax );
    if( esi = 0 ) then
        stdout.put( "Allocating" nl );
        malloc( @size( cc ));
        mov( eax, esi );
    else
        stdout.put( "Already allocated" nl );
    endif;
    mov( &cc._VMT_, this._pVMT_ );
    mov( 0, this.i );
    pop( eax );
end create;
begin demo;
    // This first call to create allocates storage.
    mov( cc.create(), ccPtr );
    // In all the remaining calls, ESI is loaded with
    // the address of the object and no storage is
    // created.
    ccPtr.create();
    ccVar.create();
    ccStat.create();
end demo;
The call( ) statement allows
any one of the following syntaxes:
call ProcID;
call( ProcID );
call( dwordvar );
call( anonmem ); // Addressing mode like [ebx].
call( Reg32 );
The second form above returns
the string (if any) specified by ProcID’s "returns" option. The remaining call instructions return
the empty string as their "returns" value.
You may also call an iterator
procedure via the CALL instruction.
However, it is your responsibility to set up the parameters and other
state information prior to the call (see the section on iterators for more
details).
17.14
The Ret Instruction
The RET( ) statement allows two
syntactical forms:
ret( );
ret( integer_constant_expression );
The first form emits a simple
80x86 RET instruction, the second form emits the 80x86 RET instruction with the
specified numeric constant expression value (used to remove parameters from the
stack).
Normally, you would use these
instructions in a procedure that has the "@noframe" option. Unless you know exactly what you are doing, you should never
use the "RET" instruction inside a standard HLA procedure without
this option, since doing so almost always produces disastrous results. If you do use this instruction within
such a procedure, it is your responsibility to deallocate local variables and
the display (if any), restore EBP, and remove any parameters from the stack.
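This sketch shows a procedure that manages its own return. It assumes the caller pushes two dword values, and AddPair and sum are hypothetical names. The @noframe procedure reads its parameters relative to ESP and removes them with ret( 8 ):

```hla
program noframeDemo;
#include( "stdlib.hhf" )

static
    sum: int32;

// Hypothetical @noframe procedure: the caller pushes two dwords;
// the procedure returns their sum in EAX and removes them itself.
procedure AddPair; @noframe;
begin AddPair;

    // On entry: [esp] = return address, [esp+4] = last value pushed,
    // [esp+8] = first value pushed.
    mov( [esp+4], eax );
    add( [esp+8], eax );
    ret( 8 );               // return and discard 8 bytes of parameters

end AddPair;

begin noframeDemo;

    pushd( 10 );
    pushd( 32 );
    call AddPair;           // the stack is balanced again after ret( 8 )
    mov( eax, sum );
    stdout.put( "sum = ", sum, nl );   // 42

end noframeDemo;
```

@noframe suppresses HLA's entry and exit code, so ESP still points at the return address on entry. That is why the parameters start at [esp+4].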
17.15
The Jmp Instructions
The HLA "jmp" instruction supports the following
syntax:
jmp Label;
jmp ProcedureName;
jmp( dwordMemPtr );
jmp( anonMemPtr );
jmp( reg32 );
"Label" represents a statement label in the current
procedure. (You are not allowed to jump to labels in other procedures in the
current version of HLA. This
restriction may be relaxed somewhat in future versions.) A statement label is a unique (within
the current procedure) identifier with a colon after the identifier, e.g.,
InfiniteLoop:
    << Code inside the infinite loop >>
    jmp InfiniteLoop;
Jumping to a procedure
transfers control to the first instruction in the specified procedure. You are responsible for explicitly
pushing any parameters and the return address for that procedure.
These instructions all return
the empty string as their "returns" value.
17.16
The Conditional Jump Instructions
These instructions include JA, JAE, JB, JBE, JC, JE, JG, JGE, JL, JLE, JO, JP, JPE, JPO, JS, JZ, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JCXZ, JECXZ, LOOP, LOOPE, LOOPZ, LOOPNE, and LOOPNZ.
They all take the following generic form (substituting the appropriate
instruction for "JA").
ja LocalLabel;
"LocalLabel" must be a statement label defined in the
current procedure (or a globally visible label declared in a label section or a
global label defined with the “::” symbol).
These instructions all return
the empty string as their "returns" value.
Note: due to the nature of the HLA compilation
process, you should avoid the JCXZ, JECXZ, LOOP, LOOPE, LOOPZ,
LOOPNE, and LOOPNZ instructions.
Unlike the other conditional jump instructions, these instructions have
a very limited range of -128..+127 bytes.
Unfortunately, HLA cannot detect whether such a branch is out of range (this
task is handled by the back-end assembler), so if a range error occurs, HLA cannot
warn you about it. The back-end
assembly will fail, but the resulting error message will be hard to decipher. Fortunately, these instructions are
easily, and usually more efficiently, implemented using other 80x86 instructions,
so this should not prove to be a problem.
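For example, the following sketch replaces JECXZ with TEST/JZ and LOOP with DEC/JNZ. The names are hypothetical, and it assumes stdlib.hhf. JZ and JNZ do not have the short-range restriction described above:

```hla
program loopDemo;
#include( "stdlib.hhf" )

static
    total: int32 := 0;

begin loopDemo;

    mov( 10, ecx );         // loop counter
    test( ecx, ecx );       // replaces JECXZ: skip the loop when ECX = 0
    jz LoopDone;

    TopOfLoop:
        add( ecx, total );  // total := total + ECX
        dec( ecx );         // replaces LOOP: decrement ECX...
        jnz TopOfLoop;      // ...and branch back while it is nonzero

    LoopDone:
    stdout.put( "total = ", total, nl );   // 10 + 9 + ... + 1 = 55

end loopDemo;
```

There is one behavioral difference. DEC updates the flags (all except carry), whereas LOOP leaves them unchanged.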
In a few special cases, the
boolean constants "true" and "false" are legal labels. See the discussion of HLA’s high level
language features for more details.
17.17
The Conditional Set Instructions
These instructions include: SETA, SETAE, SETB, SETBE, SETC, SETE, SETG, SETGE, SETL, SETLE, SETO, SETP, SETPE, SETPO, SETS, SETZ, SETNA, SETNAE, SETNB, SETNBE, SETNC, SETNE, SETNG, SETNGE, SETNL, SETNLE, SETNO, SETNP, SETNS, and SETNZ.
They take the following generic forms (substituting the appropriate
mnemonic for seta):
seta( Reg8 )
seta( mem )
seta( AnonMem )
See the "Art of
Assembly" for a further discussion of these instructions.
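A SETcc instruction can turn a comparison into a boolean value without a branch. SETL is the signed "less than" form. In this brief sketch the names are hypothetical, and it assumes stdlib.hhf:

```hla
program setccDemo;
#include( "stdlib.hhf" )

static
    a:      int32 := -5;
    b:      int32 := 3;
    isLess: boolean;        // boolean is a byte-sized memory operand

begin setccDemo;

    mov( a, eax );
    cmp( eax, b );          // signed comparison of a with b
    setl( isLess );         // isLess := 1 if a < b, else 0 (no branch)
    stdout.put( "a < b: ", isLess, nl );

end setccDemo;
```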
17.18
The Conditional Move Instructions
These instructions include CMOVA, CMOVAE, CMOVB, CMOVBE, CMOVC, CMOVE, CMOVG, CMOVGE, CMOVL, CMOVLE, CMOVO, CMOVP, CMOVPE, CMOVPO, CMOVS, CMOVZ, CMOVNA, CMOVNAE, CMOVNB, CMOVNBE, CMOVNC, CMOVNE, CMOVNG, CMOVNGE, CMOVNL, CMOVNLE, CMOVNO, CMOVNP, CMOVNS, and CMOVNZ.
They use the following general syntax:
CMOVcc( src, dest );
Allowable operands:
CMOVcc( reg16, reg16 );
CMOVcc( reg32, reg32 );
CMOVcc( mem16, reg16 );
CMOVcc( mem32, reg32 );
These instructions move the
data if the specified condition is true (specified by the cc condition).
If the condition is false, these instructions behave like a
no-operation.
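A common use is a branch-free maximum. The sketch below uses the mem32, reg32 form. Its names are hypothetical, and it requires a Pentium Pro or later CPU, since earlier processors lack CMOVcc:

```hla
program cmovDemo;
#include( "stdlib.hhf" )

static
    x:      int32 := 17;
    y:      int32 := 42;
    maxVal: int32;

begin cmovDemo;

    mov( x, eax );          // tentatively assume x is the maximum
    cmp( eax, y );          // signed comparison of x with y
    cmovl( y, eax );        // if x < y then EAX := y; otherwise EAX is unchanged
    mov( eax, maxVal );
    stdout.put( "max = ", maxVal, nl );   // 42

end cmovDemo;
```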
17.19
The Input and Output Instructions
The "in" and "out" instructions use the following syntax:
in( port, al )
in( port, ax )
in( port, eax )
in( dx, al )
in( dx, ax )
in( dx, eax )
out( al, port )
out( ax, port )
out( eax, port )
out( al, dx )
out( ax, dx )
out( eax, dx )
The "port" parameter
must be an unsigned integer constant in the range 0..255. The IN instructions return the
accumulator register (AL, AX, or EAX) as their "returns" value. The OUT instructions return the port
number (or DX) as their "returns" value.
Note that these instructions
may be privileged when running under Win32 or Linux. Their use may generate a fault in
certain instances or when accessing certain ports.
See the "Art of
Assembly" for a further discussion of these instructions.
17.20
The Interrupt Instruction
This instruction uses the
syntax "int( constant)" where the constant operand is
an unsigned integer value in the range 0..255.
This instruction returns the
empty string as its "returns" value.
See Chapter Six in "Art of
Assembly" (DOS version) for a further discussion of this instruction. Note, however, that one generally does
not use "int" under Win32 to make OS or BIOS calls. Under Linux and other *NIX systems, the "int( $80 )" instruction is
what you’d normally use to make very low-level system calls.
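Here is a minimal sketch that assumes 32-bit Linux, where system call 1 is sys_exit and EBX carries its first argument. The program terminates itself with a specific exit status (the program name is hypothetical):

```hla
program exitDemo;

begin exitDemo;

    // Linux i386 convention: EAX = system call number, EBX = first argument.
    mov( 1, eax );          // 1 = sys_exit in the 32-bit Linux system call table
    mov( 3, ebx );          // exit status handed back to the shell
    int( $80 );             // trap into the kernel; sys_exit does not return

end exitDemo;
```

After it runs, the shell's $? variable should report 3.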
17.21
Bound Instruction
This instruction takes the following forms:
bound( Reg16, mem )
bound( Reg16, AnonMem )
bound( Reg32, mem )
bound( Reg32, AnonMem )
Extended Syntax Form:
bound( Reg16, ConstL, ConstH )
bound( Reg32, ConstL, ConstH )
These instructions return the
register as their "returns" value.
The extended syntax forms emit
the two constants to the static data segment and substitute the address of the
first constant (ConstL) as
their memory operand.
The BOUND instruction compares
the register operand against the two constants (or the two consecutive memory
locations at the specified address).
If the register value is outside the range specified by the operand(s),
the 80x86 CPU raises a bound-range fault, which HLA reports as an ex.BoundInstr exception. You can handle this exception using the TRY..ENDTRY HLL
statement in HLA.
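This sketch traps an out-of-range index, using the hypothetical variable idx. It assumes @bound is true (its default). It also assumes that the HLA run-time system on your platform maps the processor's bound-range fault to ex.BoundInstr, as described above:

```hla
program boundDemo;
#include( "stdlib.hhf" )

static
    idx: int32 := 12;       // hypothetical index, deliberately out of range

begin boundDemo;

    try

        mov( idx, eax );
        bound( eax, 0, 9 );         // extended form: HLA emits the bounds 0 and 9
        stdout.put( "index in range", nl );

      exception( ex.BoundInstr )

        stdout.put( "index ", idx, " is outside 0..9", nl );

    endtry;

end boundDemo;
```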
Because the BOUND instruction
tends to be slow, and of course it consumes memory, many programmers don’t use
it as often as they should for fear it will make their programs less
efficient. HLA solves this problem
through the use of the "@bound" compile-time pseudo-variable. If @bound contains true (the default
value) then HLA will compile the BOUND instruction and it will behave
normally. If @bound contains
false, then HLA will not emit any code for the bound instruction (this is
similar to "asserts" in C/C++).
You can set the value of @bound in the VAL section or with the
"?" operator, e.g.,
?@bound := false;
// Code that ignores BOUND instructions
.
.
.
?@bound := true;
// BOUND instructions are active again.
17.22
The Enter Instruction
The ENTER instruction uses the
syntax: "enter( const, const );". The first constant operand is the number of bytes of local
variables in a procedure, the second constant operand is the lex level of the
procedure. As a general rule, you
should not use this instruction (or the corresponding LEAVE
instruction). HLA procedures automatically
construct the display and activation record for you (more efficiently than
ENTER does).
See the "Art of
Assembly" for a further discussion of this instruction and the LEAVE instruction.
17.23
CMPXCHG Instruction
This instruction uses the
following syntax:
Generic Form:
cmpxchg( reg, reg/mem );
lock.cmpxchg( reg, mem );
Specific Forms:
cmpxchg( Reg8, Reg8 )
cmpxchg( Reg8, Memory )
cmpxchg( Reg8, AnonMem )
cmpxchg( Reg16, Reg16 )
cmpxchg( Reg16, Memory )
cmpxchg( Reg16, AnonMem )
cmpxchg( Reg32, Reg32 )
cmpxchg( Reg32, Memory )
cmpxchg( Reg32, AnonMem )
This instruction returns the
empty string as its "returns" value.
See the "Art of
Assembly" for a further discussion of this instruction.
If the "lock." prefix
is present, the instruction asserts the bus lock signal during execution. The "lock." prefix is valid
only on instructions that reference memory.
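CMPXCHG compares the accumulator (AL, AX, or EAX) with its second (destination) operand. If they are equal, it stores the first operand in the destination and sets the zero flag. Otherwise it loads the destination into the accumulator and clears the zero flag. The following sketch uses that behavior for an atomic read-modify-write retry loop. counter is a hypothetical variable, and the operand order (register first, memory second) follows the specific forms listed above:

```hla
program casDemo;
#include( "stdlib.hhf" )

static
    counter: uns32 := 100;  // hypothetical shared variable

begin casDemo;

    // Atomically add 5 to counter using a compare-and-exchange retry loop.
    RetryCAS:
        mov( counter, eax );            // EAX = value we expect to find
        mov( eax, ebx );
        add( 5, ebx );                  // EBX = value we want to store
        lock.cmpxchg( ebx, counter );   // if counter = EAX: counter := EBX, ZF = 1
                                        // else:             EAX := counter, ZF = 0
        jnz RetryCAS;                   // another thread changed counter: retry

    stdout.put( "counter = ", counter, nl );   // 105 in a single-threaded run

end casDemo;
```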
17.24
CMPXCHG8B Instruction
This instruction uses the
following syntax:
Generic Form:
cmpxchg8b( mem64 );
lock.cmpxchg8b( mem64 );
This instruction compares
EDX:EAX with the specified qword operand.
If the values are equal, this instruction stores the value in ECX:EBX
into the destination operand;
otherwise it loads the memory operand into EDX:EAX.
This instruction returns the
empty string as its "returns" value.
See the "Art of
Assembly" for a further discussion of this instruction.
If the "lock." prefix
is present, the instruction asserts the bus lock signal during execution. The "lock." prefix is valid
only on instructions that reference memory.
17.25
The XADD Instruction
The XADD instruction uses the
following syntax:
Generic Form:
xadd( source, dest );
lock.xadd( source, dest );
Specific Forms:
xadd( Reg8, Reg8 )
xadd( Reg8, mem )
xadd( Reg8, AnonMem )
xadd( Reg16, Reg16 )
xadd( Reg16, mem )
xadd( Reg16, AnonMem )
xadd( Reg32, Reg32 )
xadd( Reg32, mem )
xadd( Reg32, AnonMem )
This instruction returns its
destination operand as its "returns" value.
See the "Art of
Assembly" for a further discussion of this instruction.
If the "lock." prefix
is present, the instruction asserts the bus lock signal during execution. The "lock." prefix is valid
only on instructions that reference memory.
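XADD exchanges its operands and stores their sum in the destination, which makes lock.xadd a natural fetch-and-add. The sketch below hands out ticket numbers, and its names are hypothetical. It puts the register first and the memory operand second, since the processor requires XADD's source to be a register:

```hla
program xaddDemo;
#include( "stdlib.hhf" )

static
    nextTicket: uns32 := 7; // hypothetical shared ticket counter
    myTicket:   uns32;

begin xaddDemo;

    mov( 1, eax );
    lock.xadd( eax, nextTicket );   // atomically: EAX := old nextTicket,
                                    //             nextTicket := old nextTicket + 1
    mov( eax, myTicket );
    stdout.put( "myTicket = ", myTicket, ", nextTicket = ", nextTicket, nl );
    // single-threaded run: myTicket = 7, nextTicket = 8

end xaddDemo;
```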
17.26
BSF and BSR Instructions
The bit scan instructions use the following syntax
(substitute BSR for BSF as appropriate):
Generic Form:
bsr( source, dest );
Specific Forms Allowed:
bsf( Reg16, Reg16 );
bsf( mem, Reg16 );
bsf( AnonMem, Reg16 );
bsf( Reg32, Reg32 );
bsf( mem, Reg32 );
bsf( AnonMem, Reg32 );
These instructions return the
destination register as their "returns" value.
See the "Art of
Assembly" for a further discussion of these instructions.
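BSF returns the index of the lowest set bit, and BSR returns the index of the highest set bit. In this sketch both scan the same value. The names are hypothetical, and it assumes stdlib.hhf:

```hla
program bitScanDemo;
#include( "stdlib.hhf" )

static
    lowBit:  uns32;
    highBit: uns32;

begin bitScanDemo;

    mov( %0010_1000, eax ); // bits 3 and 5 are set
    bsf( eax, ebx );        // EBX = 3 (index of the lowest set bit)
    bsr( eax, ecx );        // ECX = 5 (index of the highest set bit)
    mov( ebx, lowBit );
    mov( ecx, highBit );
    stdout.put( "bsf: ", lowBit, "  bsr: ", highBit, nl );

end bitScanDemo;
```

If the source operand is zero, both instructions set the zero flag and leave the destination undefined. When the input can be zero, test the zero flag before using the result.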
17.27
The BSWAP Instruction
This instruction takes the form
"bswap( reg32 )". It converts between little endian and big endian data formats in the specified 32-bit
register.
It returns the 32-bit register
as its "returns" value.
See the "Art of
Assembly" for a further discussion of this instruction.
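This minimal sketch reverses the byte order of a 32-bit value. The names are hypothetical, and it assumes stdlib.hhf:

```hla
program bswapDemo;
#include( "stdlib.hhf" )

static
    sample: dword := $1234_5678;    // hypothetical value to convert

begin bswapDemo;

    mov( sample, eax );
    bswap( eax );           // reverse the byte order: EAX = $7856_3412
    mov( eax, sample );
    stdout.put( "swapped: $", sample, nl );   // dword values print in hex

end bswapDemo;
```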
17.28
Bit Test Instructions
This group of instructions
includes BT, BTC, BTR, and BTS.
They allow the following generic forms:
Generic Form:
bt( BitNumber, Dest );
Specific Forms:
bt( const, Reg16 );
bt( const, Reg32 );
bt( const, mem );
bt( Reg16, Reg16 );
bt( Reg16, mem );
bt( Reg16, AnonMem );
bt( Reg32, Reg32 );
bt( Reg32, mem );
bt( Reg32, AnonMem );
bt( Reg16, CharacterSetVariable );
bt( Reg32, CharacterSetVariable );
Substitute the BTC, BTR, or BTS
mnemonic for BT in the examples above for these other instructions. The BTC, BTR, and BTS instructions also
allow a "lock." prefix, e.g., "lock.btc( reg32, mem );". If the "lock."
prefix is present, the instruction asserts the bus lock signal during
execution. The "lock."
prefix is valid only on instructions that reference memory.
These instructions return the
destination operand as their "returns" value.
Notice the two special forms
that allow character set variables.
HLA actually casts these 16-byte objects as word or dword memory variables,
but they otherwise work just fine with cset objects.
Special forms available only
with the BT instruction:
bt( reg16, CharacterSetConstant );
bt( reg32, CharacterSetConstant );
These two forms return the
source register (BitNumber) as their "returns" value. Note that HLA will create a phantom
variable that contains the character set constant and then supply the name of
this variable, effectively making these two instructions equivalent to "bt(
reg, CharacterSetVariable );".
See the "Art of Assembly"
for a further discussion of these instructions.
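This sketch shows the character set forms. It zero-extends a character into a 32-bit register and uses it as the bit number, and BT copies the corresponding bit of the set into the carry flag. theChar is a hypothetical variable, and the sketch assumes stdlib.hhf:

```hla
program btCsetDemo;
#include( "stdlib.hhf" )

static
    theChar: char := '7';   // hypothetical character to classify

begin btCsetDemo;

    movzx( theChar, eax );  // bit number = character code
    bt( eax, {'0'..'9'} );  // CF := bit EAX of the digit set
    if( @c ) then

        stdout.put( theChar, " is a decimal digit", nl );

    else

        stdout.put( theChar, " is not a decimal digit", nl );

    endif;

end btCsetDemo;
```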
17.29
Floating Point Instructions
HLA supports the following FPU
instructions. Note: all FPU
instructions have a "returns" value of "st0" unless
otherwise noted.
fld( FPreg );
fst( FPreg );
fld( FPmem );       // Returns operand.
fst( FPmem );       // 32 and 64-bits only! Returns operand.
fstp( FPmem );      // Returns operand.
fxch( FPreg );
fild( FPmem );      // Returns operand.
fist( FPmem );      // 16 and 32-bits only! Returns operand.
fistp( FPmem );     // Returns operand.
fbld( FPmem );      // Returns operand.
fbstp( FPmem );     // Returns operand.
fadd( );
fadd( FPreg, st0 );
fadd( st0, FPreg );
fadd( FPmem );      // Returns operand.
fadd( FPconst );    // Returns operand.
faddp( );
faddp( st0, FPreg );
fmul( );
fmul( FPreg, st0 );
fmul( st0, FPreg );
fmul( FPmem );      // Returns operand.
fmul( FPconst );    // Returns operand.
fmulp( );
fmulp( st0, FPreg );
fsub( );
fsub( FPreg, st0 );
fsub( st0, FPreg );
fsub( FPmem );      // Returns operand.
fsub( FPconst );    // Returns operand.
fsubp( );
fsubp( st0, FPreg );
fsubr( );
fsubr( FPreg, st0 );
fsubr( st0, FPreg );
fsubr( FPmem );     // Returns operand.
fsubr( FPconst );   // Returns operand.
fsubrp( );
fsubrp( st0, FPreg );
fdiv( );
fdiv( FPreg, st0 );
fdiv( st0, FPreg );
fdiv( FPmem );      // Returns operand.
fdiv( FPconst );    // Returns operand.
fdivp( );
fdivp( st0, FPreg );
fdivr( );
fdivr( FPreg, st0 );
fdivr( st0, FPreg );
fdivr( FPmem );     // Returns operand.
fdivr( FPconst );   // Returns operand.
fdivrp( );
fdivrp( st0, FPreg );
fiadd( mem16 );     // Returns operand.
fiadd( mem32 );     // Returns operand.
fiadd( const );     // Returns operand.
fimul( mem16 );     // Returns operand.
fimul( mem32 );     // Returns operand.
fimul( const );     // Returns operand.
fidiv( mem16 );     // Returns operand.
fidiv( mem32 );     // Returns operand.
fidiv( const );     // Returns operand.
fidivr( mem16 );    // Returns operand.
fidivr( mem32 );    // Returns operand.
fidivr( const );    // Returns operand.
fcom( );
fcom( FPreg );
fcom( FPmem );      // Returns operand.
fcomp( );
fcomp( FPreg );
fcomp( FPmem );     // Returns operand.
fucom( );
fucom( FPreg );
fucomp( );
fucomp( FPreg );
fcompp();
fucompp();
ficom( mem16 );     // Returns operand.
ficom( mem32 );     // Returns operand.
ficom( const );     // Returns operand.
ficomp( mem16 );    // Returns operand.
ficomp( mem32 );    // Returns operand.
ficomp( const );    // Returns operand.
fsqrt();            // The following all return "st0".
fscale();
fprem();
fprem1();
frndint();
fxtract();
fabs();
fchs();
ftst();
fxam();
fldz();
fld1();
fldpi();
fldl2t();
fldl2e();
fldlg2();
fldln2();
f2xm1();
fsin();
fcos();
fsincos();
fptan();
fpatan();
fyl2x();
fyl2xp1();
finit();            // Returns "".
fwait();
fclex();
fincstp();
fdecstp();
fnop();
ffree( FPreg );
fldcw( mem );
fstcw( mem );
fstsw( mem );
See the chapter on real
arithmetic in "The Art of Assembly Language Programming" for details
on these instructions. Note that
HLA does not support the entire FPU instruction set. If you absolutely need the few remaining instructions, use
the #ASM..#ENDASM or #EMIT directives to generate them.
Note: prior to HLA v1.102, the
fadd(), fsub(), fsubr(), fmul(), fdiv(), and fdivr() instructions all emitted
the same code as the “pop” variants of these instructions (e.g., fadd() was the
same as faddp()). This was a design flaw in the HLA language that was corrected
in HLA v1.102. These instructions no longer affect the floating-point stack
pointer. That is, an instruction like fadd() will add ST1 to ST0 without
removing anything from the stack.
This may create some problems for older code that used the non-pop
variants and expected them to pop the top item from the stack after the
instruction completed. Be aware of this.
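This short sketch of stack-based evaluation computes sqrt( a*a + b*b ). It uses only the explicit pop forms, so it should behave the same before and after the v1.102 change. The names are hypothetical, and it assumes stdlib.hhf:

```hla
program hypotDemo;
#include( "stdlib.hhf" )

static
    a:   real64 := 3.0;
    b:   real64 := 4.0;
    hyp: real64;

begin hypotDemo;

    finit();                // reset the FPU to a known state
    fld( a );               // ST0 = a
    fld( a );               // ST0 = a, ST1 = a
    fmulp();                // ST0 = a*a
    fld( b );
    fld( b );
    fmulp();                // ST0 = b*b, ST1 = a*a
    faddp();                // ST0 = a*a + b*b
    fsqrt();                // ST0 = sqrt( a*a + b*b )
    fstp( hyp );            // store the result and leave the FPU stack empty
    stdout.put( "hyp = ", hyp:8:3, nl );   // 5.000

end hypotDemo;
```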
17.30
Additional Floating Point Instructions for Pentium Pro and Later Processors
The FCMOVcc instructions (cc= a, ae, b, be, na, nae,
nb, nbe, e, ne, u, nu) use the following basic syntax:
FCMOVcc( stn, st0); // n=0..7
They move the specified floating
point register to ST0 if the specified condition is true.
The FCOMI and FCOMIP instructions use the following syntax:
fcomi( st0, stn );
fcomip( st0, stn );
These instructions behave like
their syntactically equivalent FCOM and FCOMP brethren, except that they store the
status in the EFLAGS register directly rather than in the floating-point status
register.
17.31
MMX Instructions
HLA supports the following MMX
instructions found on the Pentium and later processors (note that some
instructions are only available on Pentium III and later processors; see the
Intel reference manuals for details):
HLA uses the symbols mm0, mm1,
..., mm7 for the MMX register set.
The following MMX instructions
all use the same syntax. The
syntax is
mmxInstr( mmxReg, mmxReg );
mmxInstr( mem64, mmxReg );
mmxInstrs:
paddb
paddw
paddd
paddsb
paddsw
paddusb
paddusw
psubb
psubw
psubd
psubsb
psubsw
psubusb
psubusw
pmulhuw
pmulhw
pmullw
pmaddwd
pavgb
pavgw
pcmpeqb
pcmpeqw
pcmpeqd
pcmpgtb
pcmpgtw
pcmpgtd
packsswb
packuswb
packssdw
punpcklbw
punpcklwd
punpckldq
punpckhbw
punpckhwd
punpckhdq
pand
pandn
por
pxor
pmaxsw
pmaxub
pminsw
pminub
psadbw
The following MMX instructions
require a special syntax. The
syntax is listed for each instruction.
pextrw( constant, mmxReg, Reg32 );
pinsrw( constant, Reg32, mmxReg );
pmovmskb( mmxReg, Reg32 );
pshufw( constant, mmxReg, mmxReg );
pshufw( constant, mem64, mmxReg );
movd( mem32, mmxReg );
movd( mmxReg, mem32 );
movq( mem64, mmxReg );
movq( mmxReg, mem64 );
emms();
The following MMX shift instructions
also require a special syntax.
They allow the following two forms:
mmxshift( immConst, mmxReg );
mmxshift( mmxReg, mmxReg );
psllw
pslld
psllq
psrlw
psrld
psrlq
psraw
psrad
Note that the psllw,
psrlw, and psraw instructions only allow an immediate constant in
the range 0..15, the pslld, psrld, and psrad
instructions only allow constants in the range 0..31, and the psllq and psrlq instructions only allow immediate constants in the
range 0..63.
Please see the appropriate
Intel documentation or "The Art of Assembly Language" for a discussion of the behavior of
these instructions.
17.32
SSE Instructions
HLA supports the following SSE
and SSE2 instructions found on the Pentium III, Pentium IV, and later processors (note
that some instructions are only available on Pentium IV and later processors;
see the Intel reference manuals for details):
HLA uses the symbols xmm0,
xmm1, ..., xmm7 for the SSE register set.
SSE Instrs:
addsd( sseReg/mem64, sseReg );
addpd( sseReg/mem128, sseReg );
addps( sseReg/mem128, sseReg );
addss( sseReg/mem32, sseReg );
andnpd( sseReg/mem128, sseReg );
andnps( sseReg/mem128, sseReg );
andpd( sseReg/mem128, sseReg );
andps( sseReg/mem128, sseReg );
clflush( mem8 );
cmppd( imm8, sseReg/mem128, sseReg );
cmpps( imm8, sseReg/mem128, sseReg );
cmpsdp( imm8, sseReg/mem64, sseReg );
cmpss( imm8, sseReg/mem32, sseReg );
cmpeqss( sseReg, sseReg );
cmpltss( sseReg, sseReg );
cmpless( sseReg, sseReg );
cmpneqss( sseReg, sseReg );
cmpnltss( sseReg, sseReg );
cmpnless( sseReg, sseReg );
cmpordss( sseReg, sseReg );
cmpunordss( sseReg, sseReg );
cmpeqsd( sseReg, sseReg );
cmpltsd( sseReg, sseReg );
cmplesd( sseReg, sseReg );
cmpneqsd( sseReg, sseReg );
cmpnltsd( sseReg, sseReg );
cmpnlesd( sseReg, sseReg );
cmpordsd( sseReg, sseReg );
cmpunordsd( sseReg, sseReg );
cmpeqps( sseReg, sseReg );
cmpltps( sseReg, sseReg );
cmpleps( sseReg, sseReg );
cmpneqps( sseReg, sseReg );
cmpnltps( sseReg, sseReg );
cmpnleps( sseReg, sseReg );
cmpordps( sseReg, sseReg );
cmpunordps( sseReg, sseReg );
cmpeqpd( sseReg, sseReg );
cmpltpd( sseReg, sseReg );
cmplepd( sseReg, sseReg );
cmpneqpd( sseReg, sseReg );
cmpnltpd( sseReg, sseReg );
cmpnlepd( sseReg, sseReg );
cmpordpd( sseReg, sseReg );
cmpunordpd( sseReg, sseReg );
comisd( sseReg/mem64, sseReg );
comiss( sseReg/mem32, sseReg );
cvtdq2pd( sseReg/mem64, sseReg );
cvtdq2ps( sseReg/mem128, sseReg );
cvtpd2dq( sseReg/mem128, sseReg );
cvtpd2pi( sseReg/mem128, mmxReg );
cvtpd2ps( sseReg/mem128, sseReg );
cvtpi2pd( mmxReg/mem64, sseReg );
cvtpi2ps( mmxReg/mem64, sseReg );
cvtps2dq( sseReg/mem128, sseReg );
cvtps2pd( sseReg/mem64, sseReg );
cvtps2pi( sseReg/mem64, mmxReg );
cvtsd2si( sseReg/mem64, Reg32 );
cvtsi2sd( Reg32/mem32, sseReg );
cvtsi2ss( Reg32/mem32, sseReg );
cvtss2sd( sseReg/mem32, sseReg );
cvtsd2ss( sseReg/mem64, sseReg );
cvtss2si( sseReg/mem32, Reg32 );
cvttpd2pi( sseReg/mem128, mmxReg );
cvttpd2dq( sseReg/mem128, sseReg );
cvttps2dq( sseReg/mem128, sseReg );
cvttps2pi( sseReg/mem64, mmxReg );
cvttsd2si( sseReg/mem64, Reg32 );
cvttss2si( sseReg/mem32, Reg32 );
divpd( sseReg/mem128, sseReg );
divps( sseReg/mem128, sseReg );
divsd( sseReg/mem64, sseReg );
divss( sseReg/mem32, sseReg );
fxsave( mem512 );
fxrstor( mem512 );
ldmxcsr( mem32 );
lfence
maskmovdqu( sseReg, sseReg );
maskmovq( mmxReg, mmxReg );
maxpd( sseReg/mem128, sseReg );
maxps( sseReg/mem128, sseReg );
maxsd( sseReg/mem64, sseReg );
maxss( sseReg/mem32, sseReg );
mfence
minpd( sseReg/mem128, sseReg );
minps( sseReg/mem128, sseReg );
minsd( sseReg/mem64, sseReg );
minss( sseReg/mem32, sseReg );
movapd( sseReg/mem128, sseReg );
movapd( sseReg, sseReg/mem128 );
movaps( sseReg/mem128, sseReg );
movaps( sseReg, sseReg/mem128 );
movdqa( sseReg/mem128, sseReg );
movdqa( sseReg, sseReg/mem128 );
movdqu( sseReg/mem128, sseReg );
movdqu( sseReg, sseReg/mem128 );
movdq2q( sseReg, mmxReg );
movhlps( sseReg, sseReg );
movhpd( mem64, sseReg );
movhpd( sseReg, mem64 );
movhps( mem64, sseReg );
movhps( sseReg, mem64 );
movlpd( mem64, sseReg );
movlpd( sseReg, mem64 );
movlps( mem64, sseReg );
movlps( sseReg, mem64 );
movlhps( sseReg, sseReg );
movmskpd( sseReg, Reg32 );
movmskps( sseReg, Reg32 );
movnti( Reg32, mem32 );
movntpd( sseReg, mem128 );
movntps( sseReg, mem128 );
movntq( mmxReg, mem64 );
movntdq( sseReg, mem128 );
movq2dq( mmxReg, sseReg );
movsdp( sseReg, sseReg );
movsdp( mem64, sseReg );
movsdp( sseReg, mem64 );
movss( sseReg, sseReg );
movss( mem32, sseReg );
movss( sseReg, mem32 );
movupd( sseReg, sseReg );
movupd( sseReg, mem128 );
movupd( mem128, sseReg );
movups( sseReg, sseReg );
movups( sseReg, mem128 );
movups( mem128, sseReg );
mulpd( sseReg/mem128, sseReg );
mulps( sseReg/mem128, sseReg );
mulss( sseReg/mem32, sseReg );
mulsd( sseReg/mem64, sseReg );
orpd( sseReg/mem128, sseReg );
orps( sseReg/mem128, sseReg );
pause
pmuludq( mmxReg/mem64, mmxReg );
pmuludq( sseReg/mem128, sseReg );
prefetcht0( mem8 );
prefetcht1( mem8 );
prefetcht2( mem8 );
prefetchnta( mem8 );
pshufd( imm8, sseReg/mem128, sseReg );
pslldq( imm8, sseReg );
psrldq( imm8, sseReg );
punpckhqdq( sseReg/mem128, sseReg );
punpcklqdq( sseReg/mem128, sseReg );
rcpps( sseReg/mem128, sseReg );
rcpss( sseReg/mem32, sseReg );
rsqrtps( sseReg/mem128, sseReg );
rsqrtss( sseReg/mem32, sseReg );
sfence;
shufpd( imm8, sseReg/mem128, sseReg );
shufps( imm8, sseReg/mem128, sseReg );
sqrtpd( sseReg/mem128, sseReg );
sqrtps( sseReg/mem128, sseReg );
sqrtsd( sseReg/mem64, sseReg );
sqrtss( sseReg/mem32, sseReg );
stmxcsr( mem32 );
subps( sseReg/mem128, sseReg );
subpd( sseReg/mem128, sseReg );
subsd( sseReg/mem64, sseReg );
subss( sseReg/mem32, sseReg );
ucomisd( sseReg/mem64, sseReg );
ucomiss( sseReg/mem32, sseReg );
unpckhpd( sseReg/mem128, sseReg );
unpckhps( sseReg/mem128, sseReg );
unpcklpd( sseReg/mem128, sseReg );
unpcklps( sseReg/mem128, sseReg );
xorpd( sseReg/mem128, sseReg );
xorps( sseReg/mem128, sseReg );
17.33
OS/Privileged Mode Instructions
Although HLA was originally
intended for writing 32-bit flat model user mode applications, some HLA users
may wish to write an operating system kernel or device drivers in HLA. Therefore, HLA provides support for
various privileged instructions and instructions that manipulate segment registers
on the 80x86 processor. This
section describes those instructions.
Normal application programs should not use these instructions (most will
cause a "General Protection Fault" if you attempt to execute them).
For additional information on
these instructions, please see the Intel documentation for the Pentium-family
processors.
arpl( r16, r/m16 );
Adjusts the RPL field of a
segment selector.
clts();
Clears the task switched flag
in CR0.
hlt();
Halts the processor until an
interrupt or reset comes along.
invd();
Invalidates the internal cache.
invlpg( mem );
Invalidates the TLB entry
associated with the memory address specified as the source operand.
lar( r/m16, r16 );
lar( r/m32, r32 );
Load access rights from the
segment descriptor specified by the first operand into the second operand.
lds( r32, m48 );
les( r32, m48 );
lfs( r32, m48 );
lgs( r32, m48 );
lss( r32, m48 );
Load a far (48-bit) segmented
pointer into DS, ES, FS, GS, or SS (the selector) and the specified 32-bit register (the offset). Note that HLA does not support an fword data type,
but these instructions nonetheless require a 48-bit memory operand. You may create your own 48-bit fword data type using a record declaration like the
following:
type
fword: record
offset: dword;
selector: word;
endrecord;
lgdt( mem48 );
lidt( mem48 );
sgdt( mem48 );
sidt( mem48 );
Loads or stores the global
descriptor table pointer (lgdt/sgdt) or interrupt descriptor table pointer
(lidt/sidt) via the specified 48-bit memory operand. HLA does not support a 48-bit data type specifically for
these instructions, but you can easily create one as follows:
type
descPtr: record
lowerLimit: word;
baseAdrs: dword;
endrecord;
lldt( r/m16 );
sldt( r/m16 );
These instructions load the local descriptor table register (LDTR) from the
specified operand (lldt) or store the LDTR into it (sldt).
lsl( r/m16, r16 );
lsl( r/m32, r32 );
Load segment limit: loads the limit of the segment selected by the
first operand into the second operand.
ltreg( r/m16 );
streg( r/m16 );
Load and store the task
register. Note that Intel uses the
mnemonics "ltr" and "str" for these instructions. HLA changes these mnemonics to avoid
conflicts with the commonly-used "str" namespace (the HLA strings
module).
mov( r/m16, segreg );
mov( segreg, r/m16 );
Copies data between an 80x86
segment register and a 16-bit register or memory location. Note that HLA uses the following
register names for the segment registers:
cseg  The 80x86 CS register.
dseg  The 80x86 DS register.
eseg  The 80x86 ES register.
fseg  The 80x86 FS register.
gseg  The 80x86 GS register.
sseg  The 80x86 SS register.
HLA uses these names rather
than the Intel standard register names to avoid conflicts with the
"cs" (cset) namespace identifier and other commonly used application
identifiers. Note that CSEG may not be a destination register for the MOV instruction.
mov( r32, crx ); // note: x= 0, 2, 3, or 4.
mov( crx, r32 );
These instructions move data
between one of the 32-bit registers and one of the x86’s control
registers. Note that HLA reserves
names cr0..cr7 even though Intel doesn’t currently define all eight control
registers.
mov( r32, drx ); // note: x=0, 1, 2, 3, 6, 7
mov( drx, r32 );
These instructions move data
between the general purpose 32-bit registers and the x86 debug registers. Note that HLA reserves names dr0..dr7
even though the assembler doesn’t currently support the use of the dr4 and dr5
registers.
push( segreg );
pop( segreg );
These instructions push and pop
the x86 segment registers (cseg, dseg, eseg, fseg, gseg, and sseg). Note, however, that you cannot pop the
cseg register. (see the comment
earlier about HLA segment register names).
rdmsr();
rdpmc();
These instructions read
model-specific registers (rdmsr) or performance-monitoring counters (rdpmc) on the x86. The ECX register specifies the register
to read; these instructions copy the data to EDX:EAX.
rsm();
Resumes from system management
mode.
verr( r/m16 );
verw( r/m16 );
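Verifies whether the segment identified by the 16-bit selector operand is readable (verr) or writable (verw) from the current privilege level. Each instruction sets the zero flag if the segment is accessible and clears it otherwise.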
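Because verr and verw are not privileged, an ordinary application can use them. This fragment assumes the HLA Standard Library is included (for stdout.put) and tests the zero flag with HLA's @z flag designator.

```hla
mov( dseg, ax );        // selector to test: the current DS
verw( ax );             // ZF = 1 if that segment is writable at this privilege level
if( @z ) then
    stdout.put( "DS is writable", nl );
else
    stdout.put( "DS is not writable", nl );
endif;
```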
wbinvd();
Write-back and invalidate
cache.
17.34
Other Instructions and Features
Currently, HLA does not support AMD’s 3DNow! instructions. HLA does support all the 32-bit Intel instructions, including all SSE instructions, on CPUs up through Intel’s Pentium 4 and Core processors (minus the 64-bit instructions).
Note that HLA does not support the LMSW and SMSW instructions (obsolete 80286 instructions). Use MOV with CR0 instead.
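A sketch of the MOV-based replacement, valid only at privilege level 0. SMSW returned the low-order 16 bits of CR0 (the machine status word), and LMSW loaded CR0's low-order bits such as TS (bit 3); reading CR0 into EAX and writing it back covers both uses.

```hla
// Privilege level 0 only.
mov( cr0, eax );        // SMSW equivalent: AX now holds the low-order word of CR0
or( 8, eax );           // set CR0.TS (bit 3), one of the bits LMSW could load
mov( eax, cr0 );        // LMSW-style update: write the modified value back
```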
HLA does not currently support segment prefixes within address expressions. However, if you place a segment register name followed by a colon in front of an instruction, HLA emits the segment-override prefix byte for that register, so the prefix affects the memory address that instruction references:
fseg: mov( [eax], eax ); // Fetches from fs:[eax].
HLA does not otherwise provide for segment overrides because it was intended for use in flat-model 32-bit OS environments. However, operating system kernels (even flat-model ones) sometimes need to apply a segment override, hence this discussion.
18
Memory Addressing Modes in HLA
HLA supports all the 32-bit
addressing modes of the Intel 80x86 instruction set[33]. A memory address on the 80x86 may
consist of one to three different components: a displacement (also called an
offset), a base pointer, and a scaled index value. The following are the legal combinations of these
components:
displacement
basePointer
displacement + basePointer
displacement + scaledIndex
basePointer + scaledIndex
displacement + basePointer + scaledIndex
The following addressing modes are legal, but are mainly useful within an LEA instruction:
scaledIndex
scaledIndex + displacement
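Whatever combination appears, the CPU forms the effective address by adding the components that are present (an absent component contributes zero), with the sum taken modulo 2^32. As a worked instance, assume (hypothetically) that EBX contains $1000 and ECX contains 3:

```latex
\begin{aligned}
\mathrm{EA} &= \mathit{displacement} + \mathit{base} + \mathit{index} \times s \pmod{2^{32}}, \qquad s \in \{1, 2, 4, 8\} \\
\mathrm{EA}\bigl([\,\mathrm{ebx} + \mathrm{ecx}*4 + 8\,]\bigr) &= 1000_{16} + 3 \times 4 + 8 = 1014_{16}
\end{aligned}
```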
HLA’s syntax for memory
addressing modes takes the following forms:
staticVarName
staticVarName[ constant ]
staticVarName[ breg32 ]
staticVarName[ ireg32 ]
staticVarName[ ireg32*index ]
staticVarName[ breg32 + ireg32 ]
staticVarName[ breg32 + ireg32*index ]
staticVarName[ breg32 + constant ]
staticVarName[ ireg32 + constant ]
staticVarName[ ireg32*index + constant ]
staticVarName[ breg32 + ireg32 + constant ]
staticVarName[ breg32 + ireg32*index + constant ]
staticVarName[ breg32 - constant ]
staticVarName[ ireg32 - constant ]
staticVarName[ ireg32*index - constant ]
staticVarName[ breg32 + ireg32 - constant ]
staticVarName[ breg32 + ireg32*index - constant ]

localVarName
localVarName[ constant ]
localVarName[ ireg32 ]
localVarName[ ireg32*index ]
localVarName[ ireg32 + constant ]
localVarName[ ireg32*index + constant ]
localVarName[ ireg32 - constant ]
localVarName[ ireg32*index - constant ]

basereg:globalVarName
basereg:globalVarName[ constant ]
basereg:globalVarName[ ireg32 ]
basereg:globalVarName[ ireg32*index ]
basereg:globalVarName[ ireg32 + constant ]
basereg:globalVarName[ ireg32*index + constant ]
basereg:globalVarName[ ireg32 - constant ]
basereg:globalVarName[ ireg32*index - constant ]

[ breg32 ]
[ breg32 + ireg32 ]
[ breg32 + ireg32*index ]
[ breg32 + constant ]
[ breg32 + ireg32 + constant ]
[ breg32 + ireg32*index + constant ]
[ breg32 - constant ]
[ breg32 + ireg32 - constant ]
[ breg32 + ireg32*index - constant ]
The following are legal, but
are only useful within the LEA instruction:
[ ireg32*index ]
[ ireg32*index + constant ]
"staticVarName" denotes any static variable currently in
scope (local or global).
"localVarName" denotes a local, automatic, variable
declared in the var section of the current procedure.
"basereg" denotes any general purpose 32-bit register.
"globalVarName" denotes a non-local variable declared in the VAR section of some procedure other than the current procedure.
"breg32" denotes a base register and can be any general purpose 32-bit register.
"ireg32" denotes an index register and may be any general purpose 32-bit register other than ESP (the x86 instruction encoding cannot use ESP as an index register); it may even be the same register as the base register in the address expression.
"index" denotes one of the four constants "1", "2", "4", or "8". In those address expressions that have an index register without an index constant, "*1" is the default index.
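A fragment illustrating several of these forms. The array dwArray is a hypothetical static variable. Note that the bracketed expression is a byte offset from the variable's address, so code indexing an array of dwords scales the index register by 4 itself.

```hla
static
    dwArray: dword[ 16 ];                     // hypothetical array of 16 dwords
    .
    .
    .
    mov( dwArray[ ebx*4 ], eax );             // staticVarName[ ireg32*index ]
    mov( dwArray[ esi + edi*4 + 8 ], eax );   // staticVarName[ breg32 + ireg32*index + constant ]
    mov( [ ebx + 4 ], eax );                  // [ breg32 + constant ] (anonymous; EAX sets the size)
    lea( eax, [ ebx*8 + 5 ] );                // [ ireg32*index + constant ]: EAX := EBX*8 + 5
```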
Those memory addressing modes
that do not have a variable name preceding them are known as "anonymous memory
locations." Anonymous memory
locations do not have a data type associated with them and in many instances
you must use the type coercion operator in order to keep HLA happy.
Those memory addressing modes
that do have a variable name attached to them inherit the base type of the
variable. Read the next section
for more details on data typing in HLA.
HLA allows another way to
specify addition of the various addressing mode components in an address
expression – by putting the components in separate brackets and concatenating
them together. The following
examples demonstrate the standard syntax and the alternate syntax:
Standard syntax      Alternate syntax
[ebx+2]              [ebx][2]
[ebx+ecx*4+8]        [ebx][ecx*4][8]
lbl[ebp-2]           lbl[ebp][-2]
[ ebx*8 + 5 ]        [ebx*8][5]
HLA allows the extended syntax because you might want to construct these addressing modes inside a macro from the individual pieces, and it’s much easier to concatenate two operands already surrounded by brackets than it is to pick the expressions apart and construct the standard addressing mode.
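The sketch below shows equivalent instruction pairs and a macro that builds an operand from separately supplied pieces. The macro name memAt and its parameter names are hypothetical; the sketch assumes HLA's #macro ... #endmacro; declaration syntax and plain textual substitution of the arguments.

```hla
// Hypothetical macro: wraps each piece in its own brackets and
// concatenates them into one anonymous memory operand.
#macro memAt( basePart, offsetPart );
    [basePart][offsetPart]
#endmacro;
    .
    .
    .
    // Each pair below uses the same memory operand.
    mov( [ebx+2], eax );
    mov( [ebx][2], eax );

    mov( [ebx+ecx*4+8], eax );
    mov( [ebx][ecx*4][8], eax );

    mov( memAt( ebx, ecx*4 ), eax );    // expands to mov( [ebx][ecx*4], eax );
```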
19
Type Coercion in HLA
While an assembly language can
never really be a strongly typed language, HLA is much more strongly typed than
most other assembly languages.
Strong typing in an assembly
language can be very frustrating.
Therefore, HLA makes certain concessions to prevent the type system from
interfering with the typical assembly language programmer. Within an 80x86 machine instruction,
the only checking that takes place is a verification that the sizes of the
operands are compatible.
Despite HLA playing fast and
loose with machine instructions, there are many times when you will need to
coerce the type of some operand.
HLA uses the following syntax to coerce the type of a memory location or
register operand:
(type typeID memOrRegOperand)
There are two instances where
type coercion is especially important: (1) when you need to assign a type other
than byte, word, or dword to a register[34]; (2) when
you need to assign an anonymous memory location a type.
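A fragment for the second case, assuming EBX and EDI hold valid addresses. When the only memory operand is anonymous and the other operand is a constant, nothing tells HLA how many bytes to access, so the coercion supplies the size.

```hla
// mov( 0, [ebx] );                 // ambiguous: the operand size is unknown
mov( 0, (type dword [ebx]) );       // stores a 32-bit zero at [ebx]
mov( 0, (type byte [ebx]) );        // stores an 8-bit zero at [ebx]
inc( (type word [edi+2]) );         // increments the 16-bit value at EDI+2
```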
Type coercion is very useful in
HLA when manipulating pointer objects, especially pointers to classes and
records. Consider the following
example:
type
myRec_t: record
i:int32;
c:char;
endrecord;
mrPtr_t: pointer to myRec_t;
static
mpr: mrPtr_t;
.
.
.
malloc( @size( myRec_t ) );
mov( eax, mpr );
.
.
.
mov( mpr, ebx );
mov( cl, (type myRec_t [ebx]).c );
mov( 0, (type myRec_t [ebx]).i );
As you can see here, whatever
memory address appears inside the parentheses is treated like an object of the
specified type. So you can treat
that whole entity as though it were a variable of the specified type (myRec_t in this example) and you can apply the dot
operator or any other operation that would be legal on a variable of that type.
By default, the x86 general purpose registers have the types byte, word, or dword (depending, of course, on their size). Sometimes you might want to coerce these registers to a different type, especially when outputting the value of a register or comparing a register with a constant. Coercion of a register is perfectly legal as long as the coerced data type is the same size as the register, e.g.,
(type int32 eax)
A coercion like this last example is especially useful when using the register within an output statement (like stdout.put) or in a run-time boolean expression. Consider the following:
if( eax < 0 ) then
<< do something if EAX is negative>>
endif;
In this example, the expression is always false because EAX is a dword object (which is unsigned). Therefore, EAX can never be less than zero (even if EAX contains something that you want interpreted as a negative value). You can solve this problem by coercing EAX to an INT32 object:
if( (type int32 eax) < 0 ) then
<< do something if EAX is negative>>
endif;
This code example will work
properly since HLA is smart enough to generate the appropriate signed
comparison/conditional jump sequence when it realizes one or more of the
operands are signed.
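The small program below puts both comparisons side by side. The program name signDemo is hypothetical, and the code assumes the HLA Standard Library. With the bit pattern $FFFF_FFFF in EAX, the dword interpretation is 4,294,967,295 while the int32 interpretation is -1, so only the coerced test should take its "negative" branch.

```hla
program signDemo;
#include( "stdlib.hhf" )

begin signDemo;

    mov( -1, eax );     // EAX = $FFFF_FFFF

    // Unsigned (dword) view: 4,294,967,295 is not less than zero.
    if( eax < 0 ) then
        stdout.put( "dword compare: negative", nl );
    else
        stdout.put( "dword compare: not negative", nl );
    endif;

    // Signed (int32) view: the same bits represent -1.
    if( (type int32 eax) < 0 ) then
        stdout.put( "int32 compare: negative", nl );
    endif;

    // Coercion also changes how stdout.put formats the register.
    stdout.put( "EAX as int32: ", (type int32 eax), nl );

end signDemo;
```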
[1]This section will use the term "HLA/86" when specifically talking about the High Level Assembler product this documentation describes and use "HLA" as a generic term. After this section, this documentation will use the term "HLA" to specifically describe the "HLA/86" product.
[2]You must admit, though, HLA’s documentation is better than that of most free software.
[3]The ".exe" suffix appears only in the Windows’ version.
[4]Windows object files use the ".obj" suffix while Linux object files have the ".o" suffix. Although Linux users who write assembly code with Gas typically use a ".s" or ".S" suffix, HLA still uses ".asm" since Gas happily accepts this.
[5]link.exe is the Microsoft linker
[6]For C/C++ programmers: an HLA record is similar to a C struct. In language design terminology, a record is often referred to as a "cartesian product."
[7]As this is being written, HLA doesn’t fully support wchar or wstring types; ultimately the support will appear and you can add the sets {char, wchar} and {string, wstring} to the list.
[8]In the future, HLA may also promote char objects to wchar and string objects to wstring. However, this was not functional as this is being written.
[9]In theory, this should never happen since HLA maintains boolean values as zero or one.
[10]This section only discusses procedure declarations. Other sections will describe iterators and methods.
[11]Static variables are those you declare in the static, readonly, and storage sections. Non-static variables include parameters, VAR objects, and anonymous memory locations.
[12]Strictly speaking, this isn’t true. The nested procedure has access to all global variables that were declared before the procedure’s declaration.
[13]It is important that all nested procedures construct the display. You couldn’t use the @nodisplay option in lex1 and expect lex2 to properly build the display. In general, unless you know exactly what you are doing, your procedures should all have the @nodisplay option, or none of them should have it.
[14]Note, however, that HLA may automatically allocate storage for a display within the procedure. If you do not specify the @nodisplay procedure option, then the starting offset will be some negative number (depending on the lex level) to allow room for the display array. This is why the main program’s current offset always starts at -4: HLA always allocates storage for a four-byte display entry for the main program (there is no way to specify @nodisplay for the main program).
[15]Currently, this feature is available only under Windows as of HLA v1.32; plans are to add it to the Linux version at some point in the future. Please see the HLA change log to see if this feature has been added to the version you’re using.
[16]This feature depends upon operating system support.
[17]Actually, HLA doesn’t enforce this mutual exclusivity. However, if more than one of these options appears in a declaration, HLA only uses the last such declaration.
[18]Of course, you may create class variables (objects) by specifying the class type name in the var or static sections.
[19]Actually, HLA was designed this way because far too often programmers make fields private and other programmers decide they really needed access to those fields, software engineering be damned. HLA relies upon the discipline of the programmers to stay out of trouble on this matter.
[20]Note that the syntax is override, not overrides as is used for overriding data fields. This is an unfortunate consequence of HLA’s grammar.
[21]When calling a class procedure, HLA never disturbs the value in the EDI register. EDI is only tweaked when you call methods.
[22]Of course, it is the caller’s responsibility to save this pointer away into an object pointer variable upon return from the class procedure.
[23]HLA’s iterators are based on the similar control structure from the CLU language. CLU’s iterators are considerably more powerful than the misnamed "iterators" found in the C/C++ language/library (which, technically, should be called "cursors" not iterators).
[24]Mind you, this is not a very efficient implementation of a standard for loop.
[25]Technically, yield is a variable of type thunk, not a statement. However, this discussion is somewhat clearer if we think of yield as a statement rather than a variable.
[26]Actually, the purists will argue that regular expressions are used for pattern generation, not recognition. Because these two facilities are technically equivalent in theoretical computer science, this documentation will ignore this issue and claim that regular expressions are pattern matching devices.
[27]For brevity, this document will use @match to imply the use of @match or @match2. The two functions are almost identical in usage other than how they handle whitespace.
[28]Actually, the HLA.EXE program allows you to specify several ".HLA" files on the command line. The command line option "-c" is only necessary if none of the files on the command line contain a main program.
[29]For the purposes of this discussion, variables appearing in the READONLY and STORAGE sections are treated as static variables along with variables declared in the STATIC section.
[30]Because HLA emits MASM source code as its output, you must take care not to use any MASM reserved words as HLA external procedure names. Otherwise, MASM will generate an error when it attempts to assemble HLA’s output.
[31]Or when the HLA procedure name is a MASM reserved word.
[32]However, since HLA emits the identifier to the MASM assembly language output file, the external identifier must be MASM compatible.
[33]It does not support the 16-bit addressing modes since these are not very useful under Win32 or Linux.
[34]Probably the most common case is treating a register as a signed integer in one of HLA’s high level language statements. See the section on HLA High Level Language statements for more details.