Ten tips for writing tiny x86 programs
Tags: low-level
This blog post is also available as a pretty PDF.
Recently I wrote an x86 assembler in 512 bytes of machine code: https://github.com/kvakil/0asm. This is called a “bootsector” program, because it fits in a typical hard-drive sector. This zine will give you pointers on writing bootsector x86 programs of your own, assuming familiarity with x86 assembly.
x86 resources I found useful:
- 80x86 is an octal machine: https://bit.ly/2OYLHLI Good pseudocode, and information about the ISA.
- 80386 reference manual: https://bit.ly/2OELlLr (particularly Chapter 17, and the appendices)
Ten Tips:
Baby’s first bootsector: You could write a bootsector from scratch, but I’ve made a bootsector skeleton to extract common scaffolding: https://github.com/kvakil/boot-skel. It also provides some nice debugging features–see the repository for details!
Use registers for their purpose:
sp: use as stack pointer, too good to pass up.ax: aim for comparisons to operate onax, many instructions are shorter when they useax. (For example,cmp ax,1is shorter thancmp bx,1.)cx: useful as a loop counter: seeloop,rep, andjcxzinstructions.si,di: use as source and destination of memory accesses (respectively): used by some instructions, especiallylodsb,stosb, …. Prefer these tomovsince they are shorter and increment the pointers!bx,bp: can be used for addressing, likemov ax,[bx]. Generallybxis better thanbpbecause the common zero offset case likemov ax,[bx+0]is a byte shorter thanmov ax,[bp+0](Table 17-2 in manual).dx: used bydivandmul, otherwise not useful.
Know the instruction set: here is a non-exhaustive tier ranking of instructions you probably haven’t seen.
- Useful:
lodsb,stosb,inc,dec,xchg. - Sometimes useful:
cbw,scasb,movsb,loop,stc,clc,neg,not, carry flag stuff likeadc. - Usually useless: anything else (especially BCD instructions like
aaa).
- Useful:
Use
FLAGS: almost all instructions affect theFLAGSregister (Appendix C of manual). Because conditional jumps rely onFLAGS, aim to have your function return boolean results inFLAGS. Instructions likestclet you manipulate flags manually, but try to have your code correctly modifyFLAGSwithout them to save bytes.Forget calling conventions: you should think of functions as common snippets of code which may affect many registers. Using any “result” register may be useful. Any time you use
push/poporleaveshould be suspect. This also typically makes it easier to reuse code snippets between functions.Know the idioms: there are many “peephole” optimizations possible, I’ll just list the ones I find most useful. The best way to find them is by reading through other people’s code or by mucking around with the instruction set.
Zeroes: rather than
mov ax,0(3 bytes) usexor ax,ax(2 bytes). Similarly instead ofcmp ax,0(3 bytes) usetest ax,ax(2 bytes).Prefer
xchgtomov: If you are moving a register to or fromax, consider usingxchg(1 byte) instead of usingmov(2 bytes). This is also useful since some instructions must useaxor have shorter encodings when they do.Shifts to multiply: You can use bitshifting to multiply or divide by powers of two.
Set register to -CF:
sbb bl,bl(2 bytes) setsbl = -carry flag. If you are OK with settingalinstead of another register, you can use the undocumented instructionsalc(1 byte). Since-1has all bits set and0has no bits set (in two’s complement), this is useful as a bitmask.Tail call: if you have
call F & ret(typically at the end of a function), you can replace this with justjmp F, saving up to two bytes. You can also remove thejmpcompletely by movingFinline, but of course you can only inline once.
Beware
rel16jumps: if you jump farther than 127 bytes, you use a long jump (costing an extra byte), and your assembler won’t tell you! Check the assembly listing to monitor for these, and reorder your code as appropriate.Beware unconditional jumps: jumping using a condition (like
jc) doesn’t cost more than jumping unconditionally. Unconditional jumps with a nearby conditional jump are usually a sign that the code can be refactored to a single conditional jump (typically by rewriting awhile-loop as ado-while-loop).Self-modifying code: rarely useful, but very cool when it works. One use is global variables (saving one byte over the naïve solution). For example, this creates a counter starting at 1234:
LA: mov ax,1234 ; initial value is 1234
; LA+1 is the address of 1234
mov bx,LA+1
; use ax as counter value
inc word [bx] ; increment counter- Ignore these rules: these are just guidelines which I’ve found typically reduce code size. It’s very difficult to write a small program, so all of these are really just heuristics. Happy hacking!