Optimization

Z80 Assembly 18: Optimizing for T-States (Clock Cycles)

The T-State: Your Ultimate Performance Metric. In Z80 programming, the speed of your code isn’t measured in lines but in T-states (clock cycles). Every instruction takes a precise, documented number of T-states to execute; a few conditional instructions, such as DJNZ and JR cc, take more when the branch is taken. To write fast code, you must choose instructions that minimize this count. Why T-States Matter: If your Z80 runs at 3.5 MHz (as in the ZX Spectrum), 3,500,000 T-states happen every second. A faster instruction in a critical loop can save dozens of T-states every frame, leading to smoother graphics or faster game logic. ...
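To make the idea of counting T-states concrete, here is a minimal sketch comparing equivalent instruction choices. The timings in the comments are the documented Z80 figures; the labels and the NOP loop bodies are placeholders chosen for illustration, not code from the post.

```asm
; Sketch only: label names and loop bodies are hypothetical.

; Two ways to zero the accumulator
        LD   A, 0        ; 7 T-states, 2 bytes, flags unchanged
        XOR  A           ; 4 T-states, 1 byte, but alters flags (Z set, carry cleared)

; Two ways to close a loop that counts B down to zero
        LD   B, 10
LoopA:  NOP              ; placeholder for the loop body
        DEC  B           ; 4 T-states
        JR   NZ, LoopA   ; 12 taken, 7 not taken: 16 per pass while looping

        LD   B, 10
LoopB:  NOP              ; placeholder for the loop body
        DJNZ LoopB       ; 13 taken, 8 not taken: 13 per pass while looping
```

XOR A is only a safe substitute when the code that follows does not rely on the previous flag values.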

Z80 Assembly 17: Bitwise State Machines (Managing Flags with BIT/SET/RES)

State Machines: Why Bitwise is Best. A state machine manages the status of a system (e.g., a game character is ‘Jumping’, ‘Firing’, or ‘Hit’). In high-level languages, you might use three separate Boolean variables for these. In Z80 assembly, that wastes registers and memory. By using a single register (like C) as a status register, you can manage up to eight independent Boolean flags, dedicating one bit to each status. Each flag can then be set, cleared, or tested with a single SET, RES, or BIT instruction, which makes this one of the fastest ways to track state. ...
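A minimal sketch of the one-bit-per-flag idea, using register C as the status register. The bit assignments (F_JUMP, F_FIRE, F_HIT), the label name, and the rule that a hit cancels a jump are assumptions made for illustration; the EQU syntax may differ slightly between assemblers.

```asm
; Hypothetical flag layout in register C
F_JUMP  EQU  0               ; bit 0: character is jumping
F_FIRE  EQU  1               ; bit 1: character is firing
F_HIT   EQU  2               ; bit 2: character has been hit

        LD   C, 0            ; start with every flag clear
        SET  F_JUMP, C       ; begin a jump (8 T-states)
        SET  F_FIRE, C       ; fire while jumping; the flags are independent

        BIT  F_HIT, C        ; test 'Hit': Z flag is set when the bit is 0
        JR   Z, NotHit       ; not hit, keep the current state
        RES  F_JUMP, C       ; hit: cancel the jump (8 T-states)
NotHit:
```

BIT only affects the flags register, so testing a state never disturbs the other seven bits.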

Z80 Assembly 16: Efficient 16-bit Multiplication Algorithm

The Challenge: Multiplying 16-bit Numbers. The Z80 performs addition natively (ADD A,r for 8-bit values and ADD HL,rr for 16-bit values), but it has no multiplication instruction at all, let alone one for 16-bit operands (RR×RR). When you multiply two 16-bit numbers (like HL and DE), the result can need up to 32 bits. We must build this algorithm using basic operations. The Algorithm: We use the same technique taught in grade school: break the numbers into parts, multiply each part, and sum the results. In binary, each digit is 0 or 1, so every partial product is either zero or a shifted copy of the multiplicand, and the whole method reduces to repeated shifting and conditional addition. ...
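The shift-and-add approach described above can be sketched as follows. This is a commonly used form of the routine, not necessarily the one the post develops; the register assignments (inputs in HL and DE, product returned in DE:HL) and the label names are assumptions.

```asm
; Mul16: unsigned 16 x 16 -> 32-bit multiply by shift-and-add.
; In:  HL = multiplicand, DE = multiplier
; Out: DE:HL = 32-bit product (DE = high word, HL = low word)
; Clobbers: A, BC, flags
Mul16:
        LD   B, H
        LD   C, L            ; BC = multiplicand
        LD   HL, 0           ; low word of the running product
        LD   A, 16           ; one pass per multiplier bit
Mul16Loop:
        ADD  HL, HL          ; shift DE:HL left one bit...
        RL   E
        RL   D               ; ...top multiplier bit lands in carry
        JR   NC, Mul16Skip   ; bit was 0: nothing to add
        ADD  HL, BC          ; bit was 1: add the multiplicand
        JR   NC, Mul16Skip
        INC  DE              ; carry out of HL goes into the high word
Mul16Skip:
        DEC  A
        JR   NZ, Mul16Loop
        RET
```

The multiplier and the high word of the product share DE: as each multiplier bit is shifted out at the top, a product bit moves in at the bottom, so no extra register pair is needed for the 32-bit result.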

Z80 Assembly 11: Self-Modifying Code and Look-Up Tables

Look-Up Tables (LUTs): Trading Memory for Speed. A Look-Up Table (LUT) is a pre-calculated block of data stored in memory. Instead of performing a time-consuming calculation (like a sine function or a multiplication), the CPU simply uses the input value as an index to quickly read the pre-computed result. How to Use a LUT: (1) Use the input value (N) to calculate the offset within the table. (2) Add this offset to the table’s starting address. (3) Read the byte (result) at that final calculated address. Example: 10-Value Square Root Table ...
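The three look-up steps can be sketched as below. To avoid guessing at the post's own square root example, this sketch uses a different, hypothetical table of squares for N = 0 to 9; the routine name, table name, and the DEFB directive (DB in some assemblers) are assumptions.

```asm
; Square: return N*N from a pre-computed table.
; In:  A = N (caller must ensure 0..9)
; Out: A = N*N
; Clobbers: DE, HL, flags
Square:
        LD   E, A
        LD   D, 0            ; step 1: DE = offset (one byte per entry)
        LD   HL, SquareTable
        ADD  HL, DE          ; step 2: HL = table start + offset
        LD   A, (HL)         ; step 3: read the pre-computed result
        RET

SquareTable:
        DEFB 0, 1, 4, 9, 16, 25, 36, 49, 64, 81
```

The routine does no range checking, so an input above 9 would read whatever bytes follow the table.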