Z80 Assembly 18: Optimizing for T-States (Clock Cycles)

The T-State: Your Ultimate Performance Metric
In Z80 programming, the speed of your code isn’t measured in lines, but in T-states (clock cycles). Every instruction takes a precise, documented number of T-states to execute; conditional jumps, calls, and returns have one count when the condition is met and another when it is not. To write fast code, you must choose instructions that minimize this count.
Why T-States Matter: If your Z80 runs at 3.5 MHz (as in the ZX Spectrum), 3,500,000 T-states happen every second. A faster instruction in a critical loop saves a few T-states on every pass, and those savings add up over each frame, leading to smoother graphics or faster game logic. ...
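As a small illustration of the idea in this teaser (not taken from the post itself), the sketch below pairs two common instructions with cheaper equivalents. The T-state and byte counts are the standard documented Z80 timings; they ignore machine-specific effects such as memory contention on the ZX Spectrum.

```z80
; Two ways to load zero into A
    ld   a,0        ; 7 T-states, 2 bytes, flags unchanged
    xor  a          ; 4 T-states, 1 byte, but sets Z and clears carry

; Two ways to add one to A
    add  a,1        ; 7 T-states, 2 bytes, updates carry
    inc  a          ; 4 T-states, 1 byte, leaves carry untouched
```

The cheaper forms are not drop-in replacements: they affect the flags differently, so they are only safe where the following code does not depend on the flags the longer form would have produced.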

September 27, 2025

Z80 Assembly 17: Bitwise State Machines (Managing Flags with BIT/SET/RES)

State Machines: Why Bitwise is Best
A state machine manages the status of a system (e.g., a game character is ‘Jumping’, ‘Firing’, or ‘Hit’). In high-level languages, you might use three separate Boolean variables for these. In Z80 assembly, that wastes memory and registers. By using a single register (like C) as a status register, you can manage up to eight independent Boolean flags, dedicating one bit to each status. The BIT, SET, and RES instructions test, set, and clear any one flag without disturbing the others, which makes this one of the fastest ways to manage state. ...
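A minimal sketch of the approach described above, assuming C holds the status byte. The bit assignments, the `FLAG_*` names, and the `UpdateState` and `HandleHit` labels are hypothetical and chosen only for illustration; `equ` syntax may differ slightly between assemblers.

```z80
FLAG_JUMPING equ 0          ; hypothetical bit assignments
FLAG_FIRING  equ 1
FLAG_HIT     equ 2

UpdateState:
    set  FLAG_JUMPING,c     ; character starts jumping  (8 T-states)
    res  FLAG_FIRING,c      ; character stops firing    (8 T-states)
    bit  FLAG_HIT,c         ; test the Hit flag: Z is set when the bit is 0
    call nz,HandleHit       ; only call the handler when the bit is 1
    ret

HandleHit:
    res  FLAG_HIT,c         ; acknowledge the hit by clearing its flag
    ret
```

Each of the three instructions touches exactly one bit, so the other seven flags in C keep their values.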

September 27, 2025

Z80 Assembly 16: Efficient 16-bit Multiplication Algorithm

The Challenge: Multiplying 16-bit Numbers
The Z80 performs 8-bit and 16-bit addition natively (ADD), but has no multiplication instruction at all, let alone one for 16-bit operands (RR×RR). When you multiply two 16-bit numbers (like HL and DE), the result can be up to 32 bits wide. We must build this algorithm using basic operations.
The Algorithm: We use the same technique taught in grade school: break the numbers into parts, multiply each part, and sum the results. In binary, each partial product is either zero or a shifted copy of the multiplicand, so in assembly the whole process reduces to repeated shifting and conditional addition. ...
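The shift-and-add idea can be sketched with a widely used routine. The register assignment here (BC times DE, with the 32-bit result in DE:HL) is an assumption made for this sketch and may not match the version in the full post.

```z80
; Unsigned 16 x 16 -> 32-bit multiply
; Input:  BC = multiplicand, DE = multiplier
; Output: DE:HL = product (DE = high word, HL = low word)
; Uses:   A as the loop counter
Mul16:
    ld   hl,0           ; low word of the running product
    ld   a,16           ; one pass per multiplier bit
Mul16Loop:
    add  hl,hl          ; shift DE:HL left by one bit...
    rl   e
    rl   d              ; ...top multiplier bit falls into carry
    jr   nc,Mul16Skip   ; bit was 0: nothing to add
    add  hl,bc          ; bit was 1: add the multiplicand
    jr   nc,Mul16Skip
    inc  de             ; propagate the carry into the high word
Mul16Skip:
    dec  a
    jr   nz,Mul16Loop
    ret
```

The multiplier is consumed from the top of DE while the high word of the product grows from the bottom, so no extra register pair is needed for the result.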

September 27, 2025

Z80 Assembly 11: Self-Modifying Code and Look-Up Tables

Look-Up Tables (LUTs): Trading Memory for Speed
A Look-Up Table (LUT) is a pre-calculated block of data stored in memory. Instead of performing a time-consuming calculation (like a sine function or a multiplication), the CPU simply uses the input value as an index to quickly read the pre-computed result.
How to Use a LUT:
1. Use the input value (N) to calculate the offset within the table.
2. Add this offset to the table’s starting address.
3. Read the byte (result) at that final calculated address.
Example: 10-Value Square Root Table ...
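The three lookup steps can be sketched as follows. This is a separate illustration, not the post's truncated square root example: it uses a hypothetical table of squares, and the `GetSquare` and `SquareTable` names are assumptions. The `db` directive is spelled `defb` in some assemblers.

```z80
; Input:  A = N (0..9, not range-checked in this sketch)
; Output: A = N * N
GetSquare:
    ld   e,a
    ld   d,0            ; step 1: offset = N (one byte per entry)
    ld   hl,SquareTable
    add  hl,de          ; step 2: table start + offset
    ld   a,(hl)         ; step 3: read the pre-computed result
    ret

SquareTable:
    db   0,1,4,9,16,25,36,49,64,81
```

With two-byte entries the offset would be N doubled, and the read would fetch two consecutive bytes instead of one.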

September 27, 2025