This chapter describes Sassy’s instruction syntax and its facilities for controlling the flow of computation. If you only want to write traditional-looking assembly programs with Sassy, consisting of just labels and instructions, then Sassy can handle that just fine.
    (text <text-top-level> ...)

    text-top-level = (label <label>)                              ; "empty" label definition
                   | (label <label> <text-item> ...)              ; label definition
                   | (locals (<label-name> ...) <text-item> ...)  ; locals declaration
                   | <text-item>                                  ; anonymous text
                   | (align <amount>)
                   | (text <text-item> ...)                       ; splice
                   | (data <data-item> ...)                       ; one time section switch
                   | (heap <heap-item> ...)                       ; ditto

    text-item = (label <label>)
              | (label <label> <text-item> ...)
              | (locals (<label-name> ...) <text-item> ...)
              | <instruction>
              | <assertion>
              | <control-primitive>
              | (bytes <any-data> ...)                            ; raw data in a text section
              | (words <any-data> ...)
              | (dwords <any-data> ...)
Note: in the text section, the align special form may only appear at a text directive’s top level.
Sassy’s instruction syntax is based on Intel’s, except that each instruction looks like a Scheme function call. Thus there are no commas. Sassy uses the same instruction and register names and recognizes the same operands as those listed in the Intel manuals. The order of the operands is also identical, and so, like the set! idiom of Scheme, the destination comes first, the source second.
    intel: add eax, 4        sassy: (add eax 4)
    intel: mov cx, 3         sassy: (mov cx 3)
Immediates are usually integers of the appropriate size. If an operand is dword-sized and you write a float, Sassy converts it to its little-endian IEEE-754 single-precision representation.
You may also write characters and strings. Sassy places the byte value of the character in the lowest-order byte and places 0 in the rest. A string can have no more characters than the size in bytes of the operand; Sassy pads the remainder with zeros.
    (mov eax (dword #\a))    => (mov eax #x61)
    (mov eax (dword "abc"))  => (mov eax #x61626300)
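The conversions described above can be modeled in a few lines of Python. This is only a sketch of the stated rules, not Sassy's implementation: the `Char` wrapper and the function name `dword_immediate` are hypothetical, and the string case simply reproduces the notation of the "abc" expansion shown above.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Char:
    """Hypothetical stand-in for a Scheme character literal."""
    value: str


def dword_immediate(operand):
    """Return the 32-bit integer that a dword immediate operand denotes."""
    if isinstance(operand, float):
        # Reinterpret the IEEE-754 single-precision bit pattern as an integer.
        return struct.unpack("<I", struct.pack("<f", operand))[0]
    if isinstance(operand, Char):
        # Character: byte value in the lowest-order byte, zeros elsewhere.
        return ord(operand.value) & 0xFF
    if isinstance(operand, str):
        data = operand.encode("latin-1")
        if len(data) > 4:
            raise ValueError("string has more characters than the operand has bytes")
        # String: pad with zero bytes, written as in the "abc" example.
        return int.from_bytes(data.ljust(4, b"\x00"), "big")
    # Plain integer: keep the low 32 bits (two's complement for negatives).
    return operand & 0xFFFFFFFF


print(hex(dword_immediate(Char("a"))))
print(hex(dword_immediate("abc")))
print(hex(dword_immediate(1.0)))
```

The float case is the one that is easiest to get wrong by hand, since the operand that ends up in the instruction is the raw bit pattern rather than a rounded integer.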
Sassy currently only understands 32-bit addressing syntax, regardless of the current setting of the bits directive. In 16-bit mode, Sassy emits an extra prefix byte (#x67) to signal to the processor that the following instruction is using 32-bit addressing syntax.
Write effective addresses using the following form:
(& <items> ...)
Supply at least one item. The <items> may appear in any order and are drawn from the following. The effective address is the implied sum of all the items.
Any number of integers (displacements)
Zero or one 32-bit general purpose registers (base)
Zero or one labels or custom relocations (displacements)
Zero or one indexes and scales, written in either of the following ways, where <scale> is 1, 2, 4, or 8:
    (* <32-bit-reg> <scale>)
    (* <scale> <32-bit-reg>)
    (add ecx (& edx))
    (mov edx (& (* 8 ecx)))
    (add eax (& #x64))
    (mov eax (& foo (* ebx 4) edx 1000))
    (add eax (& -1 2 -3 4 ebx -5 6 -7 8))
Sassy understands these idioms as well:
    (& edx ebx)   => (& edx (* ebx 1))  ; If two registers are supplied,
                                        ; the second is assembled as an index
                                        ; with a scale of 1
    (& (* eax 2)) => (& eax (* eax 1))  ; If only a scale and index are given,
                                        ; as here with a scale of 2, it is
                                        ; assembled as base + index*(scale/2)
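As a rough illustration of how the items of an `(& ...)` form combine, the following Python sketch folds them into a base, an index with its scale, a label, and a summed displacement, applying the two idioms above. The tuple encoding `("*", reg, scale)` and the name `normalize_address` are assumptions made for this example; Sassy's own data structures are not shown in this chapter.

```python
REGS32 = {"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"}


def normalize_address(*items):
    """Fold the items of an (& ...) form into its components."""
    base = index = label = None
    scale = 1
    disp = 0
    for item in items:
        if isinstance(item, int):
            disp += item                    # any number of integer displacements
        elif isinstance(item, tuple) and item[0] == "*":
            a, b = item[1], item[2]
            reg, factor = (a, b) if isinstance(a, str) else (b, a)
            if index is not None:
                raise ValueError("more than one index")
            if reg not in REGS32 or factor not in (1, 2, 4, 8):
                raise ValueError("bad index or scale")
            index, scale = reg, factor
        elif item in REGS32:
            if base is None:
                base = item
            elif index is None:
                index, scale = item, 1      # second bare register becomes the index
            else:
                raise ValueError("too many registers")
        else:
            if label is not None:
                raise ValueError("more than one label")
            label = item
    if base is None and index is not None and scale == 2:
        base, scale = index, 1              # (* eax 2) alone becomes eax + eax*1
    return {"base": base, "index": index, "scale": scale,
            "label": label, "disp": disp}


print(normalize_address("edx", "ebx"))
print(normalize_address(("*", "eax", 2)))
print(normalize_address(-1, 2, -3, 4, "ebx", -5, 6, -7, 8))
```

The last call corresponds to `(& -1 2 -3 4 ebx -5 6 -7 8)` from the examples above, where the eight integers collapse into a single displacement.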
Finally, if you want to tell Sassy to emit a segment override prefix for a particular memory operand, use one of the following syntaxes for the addressing operand. (If you are trying to generate branch taken/branch not taken prefixes, which share their prefix bytes with cs and ds, please see below.)
    (cs (& edx))
    (ds (& (* 8 ecx)))
    (ss (& #x64))
    (es (& edx))
    (fs (& (* 8 ecx)))
    (gs (& #x64))
    (cs: edx)         => (cs (& edx))
    (ds: (* 8 ecx))   => (ds (& (* 8 ecx)))
    (ss: #x64)        => (ss (& #x64))
    (es: edx)         => (es (& edx))
    (fs: (* 8 ecx))   => (fs (& (* 8 ecx)))
    (gs: #x64)        => (gs (& #x64))
Because many of the x86’s instructions are overloaded, meaning the same instruction can sometimes accept different operands of various sizes in various orders and will output different opcode sequences, Sassy has to try to infer the operand size from the context in which the operand appears. To do so, Sassy uses the opcode of the instruction itself, the other operands in the instruction, and the current setting of the bits directive (the default is 32).
If instead you would like to be explicit, you may use the supplied hinting mechanism to specify an operand size for immediates and memory addresses (registers always have an implied size):
    (byte <operand>)    => 8-bit
    (word <operand>)    => 16-bit
    (dword <operand>)   => 32-bit
    (qword <operand>)   => 64-bit
    (tword <operand>)   => 80-bit
    (dqword <operand>)  => 128-bit
If you don’t use the hinting mechanism, Sassy tries, with one exception (see below), to match an ambiguous operand size to the size of another operand in the instruction. Any hint you supply to one operand will be used to infer the size of the other:
    (mov ebx 4)                => (mov ebx (dword 4))
    (mov cx (& foo))           => (mov cx (word (& foo)))
    (mov al 100)               => (mov al (byte 100))
    (mov (& foo) (byte 100))   => (mov (byte (& foo)) (byte 100))
If that’s not possible, Sassy examines the current bits setting and uses that size for the operands:
    (bits 32)
    (mov (& foo) 10)  => (mov (dword (& foo)) (dword 10))
    (bits 16)
    (mov (& foo) 10)  => (mov (word (& foo)) (word 10))
The exception to the above is the case where certain instructions can generate shorter opcode sequences when their source operand is an immediate and a byte, instead of a word or dword. In those cases, Sassy uses the shorter form when the source operand is in fact a byte. This applies to the following instructions:
adc add and cmp or sbb sub xor push imul.
    (add ecx 4)          => Sassy assumes the default of (add ecx (byte 4))
    (add ecx (dword 4))  => Sassy uses the long form
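The inference rules for two-operand instructions can be summarized as a small decision procedure. The Python sketch below is a model of the rules as stated, under several assumptions: operands are encoded as plain Python values (register names as strings, hints as tuples such as `("byte", 100)`, addresses as `("&", "foo")`), the register table is deliberately partial, and "is in fact a byte" is taken to mean the signed range -128 to 127. The names `known_size` and `infer_sizes` are hypothetical.

```python
REG_BITS = {
    "al": 8, "cl": 8, "dl": 8, "bl": 8,
    "ax": 16, "cx": 16, "dx": 16, "bx": 16,
    "eax": 32, "ecx": 32, "edx": 32, "ebx": 32,
}
HINT_BITS = {"byte": 8, "word": 16, "dword": 32}
SHORT_IMMEDIATE_OPS = {"adc", "add", "and", "cmp", "or",
                       "sbb", "sub", "xor", "push", "imul"}


def known_size(operand):
    """Size in bits if the operand fixes one by itself, else None."""
    if isinstance(operand, str):
        return REG_BITS.get(operand)      # registers carry an implied size
    if isinstance(operand, tuple) and operand[0] in HINT_BITS:
        return HINT_BITS[operand[0]]      # explicit hint
    return None                           # bare immediate or bare address


def infer_sizes(opcode, dst, src, bits=32):
    """Return (dst_bits, src_bits) for a two-operand instruction."""
    dst_size, src_size = known_size(dst), known_size(src)
    if (src_size is None and isinstance(src, int)
            and opcode in SHORT_IMMEDIATE_OPS and -128 <= src <= 127):
        # The exception: an unhinted byte-sized immediate uses the short form.
        return dst_size or bits, 8
    size = dst_size or src_size or bits   # match the other operand, else use bits
    return dst_size or size, src_size or size


print(infer_sizes("mov", "cx", ("&", "foo")))
print(infer_sizes("mov", ("&", "foo"), 10, bits=16))
print(infer_sizes("add", "ecx", 4))
print(infer_sizes("add", "ecx", ("dword", 4)))
```

Each call mirrors one of the expansions listed above, so the sketch can be checked line by line against the examples in this section.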
Finally, for any floating-point, mmx, or sse instruction that can accept memory operands of different sizes, the default is always a dword-sized operand. In these cases, other operand sizes of memory addresses must be explicitly specified:
    (fst (& foo))          => Sassy assumes the default of (fst (dword (& foo)))
    (fst (qword (& foo)))  => Explicit qword memory operand
The normal syntax for writing direct branches or conditional branches is (jmp foo) or (jnz bar). For direct branches that you write, Sassy assumes that they are near branches, and thus generates a 2-byte or 4-byte relative address depending on the current setting of the bits directive. You always write the branch target you want (not the relative distance; Sassy computes that).
Some special forms exist for designating explicit short, near, and far versions of jmp, call, and the jcc-family of instructions. For branches that you write (not Sassy’s internally generated branches; see below), if you write a “short” branch, Sassy assembles a short branch provided the branch target is within range. Otherwise an “out of range” error is signalled.
    (jnz short foo)
    (jnz near foo)
    (jmp short foo)
    (jmp near foo)
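To make the "within range" condition concrete: on the x86 a short branch carries a signed 8-bit displacement measured from the end of the branch instruction, and short jmp and jcc encodings are two bytes long. The Python sketch below checks that condition. The function name and the error text are illustrative, not Sassy's.

```python
def short_displacement(branch_address, target_address, branch_length=2):
    """Return the signed 8-bit displacement for a short branch.

    The displacement is relative to the address of the instruction that
    follows the branch. Raises ValueError when the target is too far away.
    """
    displacement = target_address - (branch_address + branch_length)
    if not -128 <= displacement <= 127:
        raise ValueError("out of range")
    return displacement


print(short_displacement(0x100, 0x100))   # a branch to itself
print(short_displacement(0, 129))         # the farthest forward target
```

This is also why you write the target rather than the distance: the distance depends on where the branch itself ends up in the output.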
For far jumps and calls to other segments, if you want to write a direct call, you specify a far pointer with two operands:
    (jmp <imm16> <imm32>)     ; jmp #x1234:12345678
    (jmp <imm16> <imm16>)     ; jmp #x1234:1234
    (call <imm16> <imm32>)    ; call #x1234:12345678
    (call <imm16> <imm16>)    ; call #x1234:1234
The first operand specifies the segment, and the second the offset into that segment. To be explicit, you can specify an operand size of word or dword for either operand.
To write an indirect far call, where the segment and offset are specified at a memory address, you use the keyword far in the instruction:
    (jmp far <mem32>)
    (jmp far (word <mem32>))
    (call far <mem32>)
    (call far (word <mem32>))
Sassy knows the usual instruction prefixes, such as lock and repnz. Write them in the following manner:
    (<prefix> <instruction>)
    e.g. (lock (inc (& my-guard)))
Sassy also knows about the branch hint prefixes used to control the processor’s default branch-prediction behavior. Sassy uses brt to generate a “branch taken” prefix, and brnt to generate a “branch not taken” prefix. Use these prefixes with a jcc instruction, as above:
    (brt (jnz foo))
    (brnt (jz foo))
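Every prefix form discussed so far boils down to one extra byte placed in front of the encoded instruction. The Python table below lists the byte values for the prefixes named in this chapter. The values come from the Intel manuals rather than from Sassy's source, and `add_prefix` is a hypothetical helper that only shows the effect on the byte stream. Note how the branch hints reuse the cs and ds override bytes, which is why they need their own names.

```python
PREFIX_BYTES = {
    "lock": 0xF0,
    "repnz": 0xF2,
    # Segment overrides
    "cs": 0x2E, "ss": 0x36, "ds": 0x3E,
    "es": 0x26, "fs": 0x64, "gs": 0x65,
    # Branch hints
    "brnt": 0x2E,   # "branch not taken" reuses the cs byte
    "brt": 0x3E,    # "branch taken" reuses the ds byte
}


def add_prefix(prefix, encoded_instruction):
    """Return the instruction bytes with the one-byte prefix in front."""
    return bytes([PREFIX_BYTES[prefix]]) + bytes(encoded_instruction)


# #x75 #x10 is a short jnz with a displacement of 16 bytes.
print(add_prefix("brt", b"\x75\x10").hex())
```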
You can control the flow of computation by using “assertions” and “control primitives”.
Assertions check whether or not particular flags are set in x86’s “eflags” register, and alter the flow of computation accordingly by inserting conditional and unconditional branches. Exactly how the flow of computation is altered depends on their contextual use within a particular control primitive.
You write the assertions by writing the cc-code for the jcc-family of instructions followed by an exclamation point.
    o!                => assert overflow
    no!               => assert not overflow
    b! / c! / nae!    => assert carry
    ae! / nb! / nc!   => assert not carry
    e! / z!           => assert zero
    ne! / nz!         => assert not zero
    be! / na!         => assert either carry or zero
    a! / nbe!         => assert neither carry nor zero
    s!                => assert sign
    ns!               => assert not sign
    p! / pe!          => assert parity
    np! / po!         => assert not parity
    l! / nge!         => assert less than
    ge! / nl!         => assert greater than or equal to
    le! / ng!         => assert less than or equal to
    g! / nle!         => assert greater than
Since assertions may succeed or fail, there are always two possible paths to take, called the “win” and “lose” continuations. In addition, control primitives themselves may also “win” or “lose” depending upon whether they succeed or fail, but instructions always succeed or “win”. By saying “something wins” I mean that the computation immediately proceeds with the “win” continuation, possibly via a branch, and when “something loses”, computation immediately proceeds with the “lose” continuation, also possibly by branching.
In the following, item refers to a <text-item>. For illustrative examples (and the code they compile down to), please have a look at the Scheme files in the tests/prims directory of Sassy’s distribution.
The following implement Baker’s semantics:
(while test body) is another looping construct. Each time through the loop, test is tried. If it succeeds, the body is executed. If it fails, then the whole while wins. On the other hand, if the body fails, then the whole while loses.
(begin item ... tail) executes each item with both a win and lose continuation of the next item. The exception is the tail, which is executed with the win and lose continuations of the whole begin. So if tail succeeds, the begin wins. Otherwise it loses.
At the top level of a text directive, Sassy implicitly wraps all of the <text-items> in a begin. As well, following a <label> declaration, all of the <text-items> at the label’s top level are explicitly wrapped in a begin.
(until test body) is like while, except that the test is subjected to an inv. So each time through the loop, if test fails, the body is executed, but if it succeeds, then the whole until wins. Like while, if the body fails, the whole until loses.
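The win/lose behavior of begin, while, and until can be modeled at a high level by treating every item as a function that returns True when it wins and False when it loses. The Python sketch below is such a model, not a description of the branches Sassy emits. It assumes that a failing body makes the enclosing loop lose, and the names `begin`, `while_`, and `until` are local to the example.

```python
def begin(*items):
    """Run every item in order; only the tail decides win or lose."""
    *init, tail = items              # requires at least one item
    def run():
        for item in init:
            item()                   # win or lose, control reaches the next item
        return tail()
    return run


def while_(test, body):
    def run():
        while True:
            if not test():
                return True          # test fails: the whole while wins
            if not body():
                return False         # body fails: the whole while loses
    return run


def until(test, body):
    def run():
        while True:
            if test():
                return True          # test succeeds: the whole until wins
            if not body():
                return False         # body fails: the whole until loses
    return run


# A countdown, with a dict standing in for the register file.
regs = {"eax": 3}

def eax_nonzero():
    return regs["eax"] != 0

def dec_eax():
    regs["eax"] -= 1
    return True                      # plain instructions always win

print(while_(eax_nonzero, dec_eax)(), regs["eax"])
```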
The following forms provide some means of “capturing” and overriding the continuations.
    (with-win k-win [item])
    (with-lose k-lose [item])
    (with-win-lose k-win k-lose [item])
(The square brackets around item are meant to indicate that it is optional. See below)
Each of these compiles item with an explicit win or lose continuation (or both) of k-win or k-lose, effectively overriding the particular default or implicit continuation Sassy would normally supply to the item. The continuation may be a text-item or one of the special symbols $win or $lose. Thus it is possible to express the semantics of many of Sassy’s primitives in an explicit continuation-passing style. Some examples:
    (with-win bar
      (if (seq (cmp eax 3) e!)
          (push eax)      ; after the push jmp to bar
          (push ebx)))    ; after the push jmp to bar
    (with-win-lose (jmp 1000) (call foo)
      (seq (push eax)
           (= ecx 4)))    ; if ecx is 4, then (jmp 1000), else (call foo)
    (label and-some-blocks
      (with-win (begin (push eax)
                       (push ebx))
        (with-win (zero? ebx)
          (zero? eax))))

    ==

    (label and-some-blocks
      (seq (zero? eax)
           (zero? ebx)
           (begin (push eax)
                  (push ebx))))
Sassy places the win or lose continuations after the items. If you use with-win-lose, the lose continuation occurs last, the win continuation second, and the item first.
If an explicit continuation is either an unconditional branch (jmp ...) or the instruction (ret), Sassy does not emit an extra branch to the contextual continuation of the “jmp” or “ret”, since these imply that the actual continuation of the thread of computation is the target of these branches.
In addition, sometimes you may want Sassy to emit a “single instruction” as a continuation, but nothing else. This might occur in the succeed or fail arm of an if, for instance. In this case you can write either (seq) or (begin) (an “empty” sequence or block) for the item. This triggers the continuation generators without emitting anything else into the instruction stream. (The empty sequence and block are actually valid syntax anywhere.) Or you may simply elide the item, and the compiler will insert the empty form for you.
    (text
      (mov eax 10)
      (label foo
        (if (= eax 3)
            (with-win (ret))     ; if eax is 3, just (ret)
            (with-win foo        ; otherwise loop to foo
              (sub eax 1)))))
$win and $lose are two special symbols that Sassy reserves for itself so that you may explicitly refer to the values (the addresses) of the current win and lose continuations. They always refer to the exact win or lose continuation in effect at the point of their usage, including explicit continuations given by with-win etc. (Sassy records relocations for every usage of these.)
    (seq (add eax 1)
         (push $win)     ; pushes the address of (add ebx 2)
         (add ebx 2)
         (push $lose)    ; pushes the address of the
                         ; lose continuation of the enclosing seq
         (add ecx 3))
$eip is a special symbol that Sassy reserves for itself to allow you to refer to the address of the next instruction. It always refers to the next instruction.
(esc (instruction ...) item) “turns off” Sassy’s continuation tracking for a moment so that you may explicitly store the value of a continuation (which is just an address). Sassy compiles item in the normal way, but it places each instruction in order just before the item, and each instruction is compiled with the item’s win and lose continuations. Thus, if any of the instructions use the special symbols $win or $lose, they will represent the win and lose addresses of the item.
This is useful, for instance, in the following “multiple-dispatch” situation, where the calling convention consists of pushing the return address first, and the arguments second. Assume the functions “foo” and “bar” pop their arguments, do their thing, and end with a (ret) (or a pop and a branch).
    (esc ((push $win))
      (if (seq (cmp eax 10) z!)
          (with-win foo (push ebx))
          (with-win bar (push ecx))))
The functions “foo” and “bar” will both return to the win continuation of the if, rather than into an arm of the if itself, from which they would immediately branch out again (“branch tensioning”, in other words).
    (leap item-with-mark)
    (mark item)
These two forms work together to allow you to write a branch into the middle of an otherwise nested structure. At the desired entry point to the structure use mark, and wrap the whole thing in a leap. If leap can’t find a mark, it does nothing. This is useful, for instance, for entering a loop at an arbitrary point.
    (leap
      (iter
        (seq (add esp 8)
             (mark (pop ecx))
             (= ecx 3))))
Sassy currently optimizes all of its internally generated branches for size, so whenever it can assemble the “short” form of an internally generated conditional or unconditional branch, it does so (provided the branch is not to an explicit continuation that is a label), regardless of the branch’s direction. This comes at a small cost, because Sassy has to make at least two passes, and possibly several more, over its looping forms (such as until). Though this is the only time Sassy makes more than one pass, if this cost becomes unbearable I may in the future provide a compiler option for strict one-pass assembly of these forms, using Baker’s techniques (see section D.4).
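Why choosing short branches can take several passes is easiest to see in a toy model. The Python sketch below is a generic branch-relaxation loop, not Sassy's algorithm: it starts with every jmp in its 2-byte short form, lays the code out, promotes any jmp whose target is out of 8-bit range to the 5-byte near form used in 32-bit mode, and repeats until nothing changes. The item encoding and the name `relax` are assumptions of this example, and its pass count is not meant to match Sassy's.

```python
SHORT, NEAR = 2, 5     # sizes of short and near jmp in 32-bit mode


def relax(items):
    """items is a list of ("code", n_bytes) or ("jmp", target_index).

    A target index names the item the jmp lands on; len(items) means
    the end of the code. Returns (sizes, passes).
    """
    sizes = [arg if kind == "code" else SHORT for kind, arg in items]
    passes = 0
    while True:
        passes += 1
        # Lay out the code with the current size choices.
        offsets = [0]
        for size in sizes:
            offsets.append(offsets[-1] + size)
        changed = False
        for i, (kind, target) in enumerate(items):
            if kind != "jmp" or sizes[i] == NEAR:
                continue
            displacement = offsets[target] - offsets[i + 1]
            if not -128 <= displacement <= 127:
                sizes[i] = NEAR          # growing this jmp may push others out of range
                changed = True
        if not changed:
            return sizes, passes


# The second jmp must grow, and that growth pushes the first one out of range.
program = [("jmp", 3), ("code", 123), ("jmp", 4), ("code", 200)]
print(relax(program))
```

Because a branch only ever grows in this scheme, the loop is guaranteed to stop, but each promotion can invalidate an earlier decision, which is what forces the extra passes.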
If you want to place raw data into a text section that isn’t the generated output of an instruction or control primitive, you may use the following “pseudo-opcodes” to do so within a text section.
    (bytes <any-data> ...)
    (words <any-data> ...)
    (dwords <any-data> ...)
<any-datas> may be numbers, characters, strings, or labels (including custom relocs), and follow the same conventions as for writing data in a data section, including the zero-filling of strings. For the purposes of flow control, Sassy considers each occurrence of the above as an indivisible "opcode" that always wins.