Ten tips for writing tiny x86 programs
Tags: low-level
This blog post is also available as a pretty PDF.
Recently I wrote an x86 assembler in 512 bytes of machine code: https://github.com/kvakil/0asm. This is called a “bootsector” program, because it fits in a typical hard-drive sector. This zine will give you pointers on writing bootsector x86 programs of your own, assuming familiarity with x86 assembly.
x86 resources I found useful:
- 80x86 is an octal machine: https://bit.ly/2OYLHLI Good pseudocode, and information about the ISA.
- 80386 reference manual: https://bit.ly/2OELlLr (particularly Chapter 17, and the appendices)
Ten Tips:
Baby’s first bootsector: You could write a bootsector from scratch, but I’ve made a bootsector skeleton to extract common scaffolding: https://github.com/kvakil/boot-skel. It also provides some nice debugging features–see the repository for details!
Use registers for their purpose:
sp
: use as stack pointer, too good to pass up.ax
: aim for comparisons to operate onax
, many instructions are shorter when they useax
. (For example,cmp ax,1
is shorter thancmp bx,1
.)cx
: useful as a loop counter: seeloop
,rep
, andjcxz
instructions.si
,di
: use as source and destination of memory accesses (respectively): used by some instructions, especiallylodsb
,stosb
, …. Prefer these tomov
since they are shorter and increment the pointers!bx
,bp
: can be used for addressing, likemov ax,[bx]
. Generallybx
is better thanbp
because the common zero offset case likemov ax,[bx+0]
is a byte shorter thanmov ax,[bp+0]
(Table 17-2 in manual).dx
: used bydiv
andmul
, otherwise not useful.
Know the instruction set: here is a non-exhaustive tier ranking of instructions you probably haven’t seen.
- Useful:
lodsb
,stosb
,inc
,dec
,xchg
. - Sometimes useful:
cbw
,scasb
,movsb
,loop
,stc
,clc
,neg
,not
, carry flag stuff likeadc
. - Usually useless: anything else (especially BCD instructions like
aaa
).
- Useful:
Use
FLAGS
: almost all instructions affect theFLAGS
register (Appendix C of manual). Because conditional jumps rely onFLAGS
, aim to have your function return boolean results inFLAGS
. Instructions likestc
let you manipulate flags manually, but try to have your code correctly modifyFLAGS
without them to save bytes.Forget calling conventions: you should think of functions as common snippets of code which may affect many registers. Using any “result” register may be useful. Any time you use
push
/pop
orleave
should be suspect. This also typically makes it easier to reuse code snippets between functions.Know the idioms: there are many “peephole” optimizations possible, I’ll just list the ones I find most useful. The best way to find them is by reading through other people’s code or by mucking around with the instruction set.
Zeroes: rather than
mov ax,0
(3 bytes) usexor ax,ax
(2 bytes). Similarly instead ofcmp ax,0
(3 bytes) usetest ax,ax
(2 bytes).Prefer
xchg
tomov
: If you are moving a register to or fromax
, consider usingxchg
(1 byte) instead of usingmov
(2 bytes). This is also useful since some instructions must useax
or have shorter encodings when they do.Shifts to multiply: You can use bitshifting to multiply or divide by powers of two.
Set register to -CF:
sbb bl,bl
(2 bytes) setsbl = -carry flag
. If you are OK with settingal
instead of another register, you can use the undocumented instructionsalc
(1 byte). Since-1
has all bits set and0
has no bits set (in two’s complement), this is useful as a bitmask.Tail call: if you have
call F & ret
(typically at the end of a function), you can replace this with justjmp F
, saving up to two bytes. You can also remove thejmp
completely by movingF
inline, but of course you can only inline once.
Beware
rel16
jumps: if you jump farther than 127 bytes, you use a long jump (costing an extra byte), and your assembler won’t tell you! Check the assembly listing to monitor for these, and reorder your code as appropriate.Beware unconditional jumps: jumping using a condition (like
jc
) doesn’t cost more than jumping unconditionally. Unconditional jumps with a nearby conditional jump are usually a sign that the code can be refactored to a single conditional jump (typically by rewriting awhile
-loop as ado-while
-loop).Self-modifying code: rarely useful, but very cool when it works. One use is global variables (saving one byte over the naïve solution). For example, this creates a counter starting at 1234:
LA: mov ax,1234 ; initial value is 1234
; LA+1 is the address of 1234
mov bx,LA+1
; use ax as counter value
inc word [bx] ; increment counter
- Ignore these rules: these are just guidelines which I’ve found typically reduce code size. It’s very difficult to write a small program, so all of these are really just heuristics. Happy hacking!