Optimization

Z80 Assembly 85: Fast Block Fill (memset/bfill) with a Constant

The Need for a Fast Block Fill
In game development and system programming, you frequently need to clear large areas of RAM, such as setting the entire screen buffer to a black background or zeroing out a data structure. Doing this one byte at a time with a simple loop is slow.
Goal: Write a routine that fills a specified memory range (`START_ADDR` to `END_ADDR`) with a single constant byte (`FILL_BYTE`) as quickly as possible. ...
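
The usual way to get a fast fill out of LDIR is the overlapping-copy trick. Below is a minimal sketch of it, written as a C model of Z80 memory so the behavior can be checked on a PC. The helper names `ldir_model` and `block_fill` and the 64 KB `mem` array are hypothetical. The Z80 sequence in the comment is the standard idiom, not code taken from the article.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of the Z80 LDIR instruction on a 64 KB address space:
   copy mem[HL] to mem[DE], increment HL and DE, decrement BC, repeat until
   BC reaches 0. HL and DE are 16-bit, so addresses wrap at 64 KB. */
void ldir_model(uint8_t mem[65536], uint16_t hl, uint16_t de, uint16_t bc)
{
    do {
        mem[de] = mem[hl];
        hl++;
        de++;
    } while (--bc != 0);   /* as on the Z80, BC = 0 on entry would mean 65536 bytes */
}

/* Hypothetical helper: fill START_ADDR..END_ADDR (inclusive) with FILL_BYTE.
   Assumed Z80 equivalent:
       LD   HL,START_ADDR
       LD   (HL),FILL_BYTE
       LD   DE,START_ADDR+1
       LD   BC,END_ADDR-START_ADDR
       LDIR
   The first byte is written by hand; every LDIR step then copies the byte it
   just wrote onto its right-hand neighbour, so the value ripples forward. */
void block_fill(uint8_t mem[65536], uint16_t start, uint16_t end, uint8_t value)
{
    assert(end >= start);
    mem[start] = value;                      /* LD (HL),FILL_BYTE */
    if (end == start)
        return;                              /* BC would be 0, i.e. 64 KB: skip LDIR */
    ldir_model(mem, start, (uint16_t)(start + 1), (uint16_t)(end - start));
}
```

Because LDIR copies from low to high addresses one byte at a time, each step reads a byte that the previous step has already filled. On a real Z80 this costs 21 T-states per byte (16 for the final one).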

Z80 Assembly 82: Fixed-Point Arithmetic (Fast Fractions)

The Need for Fixed-Point Math
As established (Part 30), floating-point math on the Z80 is extremely slow, because the CPU has no floating-point hardware and every operation runs as a long software routine. Fixed-point arithmetic is the solution: it lets the Z80 perform calculations involving fractions using only fast integer operations.
The Principle: A fixed-point number is an integer where the position of the binary point is implied and fixed by the programmer.
The Q-Format (Standard Fixed-Point)
We represent a fractional number (like 3.14) by scaling it by a fixed power of two and rounding the result to an integer. This is often called a Q-format (e.g., Q16, Q8), where the number after the Q gives how many bits hold the fraction. ...
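
As a worked example of the scaling: with 8 fractional bits, 3.14 × 256 = 803.84, so 3.14 would be stored as 804, which actually stands for 804 / 256 = 3.140625. The sketch below models Q8.8 arithmetic in C using 16-bit values to match a Z80 register pair. The type `q8_8` and the `q_` helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q8.8 type: 8 integer bits and 8 fractional bits in a signed
   16-bit word (the size of a Z80 register pair such as HL).
   The value represented is raw / 256. */
typedef int16_t q8_8;

#define Q_ONE 256                           /* 1.0 in Q8.8 */

/* Integer -> Q8.8 is a left shift by 8 bits (multiply by 256). */
q8_8 q_from_int(int n)
{
    assert(n >= -128 && n <= 127);          /* must fit in 8 integer bits */
    return (q8_8)(n * Q_ONE);
}

/* Integer part, truncated toward zero. */
int q_int_part(q8_8 q) { return q / Q_ONE; }

/* Addition needs no correction: both operands share the same scale. */
q8_8 q_add(q8_8 a, q8_8 b) { return (q8_8)(a + b); }

/* Multiplication: the product of two Q8.8 values has 16 fractional bits,
   so it is shifted right by 8 to get back to Q8.8 (low bits are dropped).
   Assumes >> on a negative int32_t is an arithmetic shift, as on mainstream
   compilers (the C standard leaves this implementation-defined). */
q8_8 q_mul(q8_8 a, q8_8 b)
{
    int32_t product = (int32_t)a * b;
    return (q8_8)(product >> 8);
}
```

On the Z80, Q8.8 addition maps directly onto a 16-bit `ADD HL,DE`; only multiplication needs the extra shift.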

Z80 Assembly 81: Floating-Point Multiplication/Division by Powers of Two

The Optimization: Exploiting the Format
In standard integer math, multiplying by a power of two is done with a fast left shift (`SLA`). In floating-point math (Part 30), multiplying or dividing by a power of two (like $2^N$ or $2^{-N}$) is even cheaper: the Mantissa (precision) does not change at all, and only the Exponent needs a simple addition or subtraction.
The Rule:
Multiplication by $2^N$: Add $N$ to the Exponent.
Division by $2^N$: Subtract $N$ from the Exponent.
This holds as long as the number is not zero and the adjusted exponent stays within its valid range; otherwise the result overflows or underflows and must be handled separately.
The Exponent Field
Recall that the exponent is usually stored as a biased 8-bit integer (e.g., a bias of 127 is added to the true exponent). Adjusting the number simply means adjusting this byte. ...
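
The exponent rule can be demonstrated on the IEEE 754 single-precision layout, which uses the same bias of 127 mentioned above. Z80 floating-point packages use a variety of layouts, so treat this as an illustration of the rule rather than of any particular library. `scale_pow2` is a hypothetical name. The sketch also shows the guard conditions: zero and special values cannot be scaled this way, and the new exponent must stay in range.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: multiply *x by 2^n (n < 0 divides) by changing only
   the biased exponent field. Assumes float is IEEE 754 single precision:
   1 sign bit, 8 exponent bits with a bias of 127, 23 mantissa bits.
   Returns 1 on success. Returns 0 and leaves *x untouched when the shortcut
   does not apply: zero, denormal, infinity or NaN input, or a new exponent
   outside the normal range 1..254 (overflow or underflow). */
int scale_pow2(float *x, int n)
{
    assert(x != NULL);
    uint32_t bits;
    memcpy(&bits, x, sizeof bits);                /* view the float's bit pattern */
    int biased = (int)((bits >> 23) & 0xFFu);     /* biased exponent field */
    if (biased == 0 || biased == 0xFF)
        return 0;                                 /* zero/denormal or Inf/NaN */
    biased += n;                                  /* the entire multiply/divide */
    if (biased < 1 || biased > 254)
        return 0;                                 /* result out of range */
    /* Replace the exponent; sign and mantissa bits are left unchanged. */
    bits = (bits & ~(0xFFu << 23)) | ((uint32_t)biased << 23);
    memcpy(x, &bits, sizeof bits);
    return 1;
}
```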

Z80 Assembly 44: Sprite Multiplexing and Display Lists

The Limitation: Hardware Sprite Count
Many systems with dedicated sprite hardware (like the MSX or certain arcade boards) have a severe limit on the number of sprites that can be displayed on a single horizontal scanline (e.g., only 4 on the original MSX's TMS9918 video chip and 8 on the MSX2). Exceeding this limit causes sprites to disappear or flicker.
The Solution: Sprite Multiplexing
Sprite Multiplexing is a software technique that dynamically reuses the hardware’s limited number of sprite slots multiple times per frame. The goal is to quickly change each sprite’s position and pattern as the raster beam moves down the screen, so that a slot freed by a sprite higher up can be reused by one lower down. ...
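
One way to decide which virtual sprite goes into which hardware slot is sketched below in C: sort the sprites by Y, then hand out slots round-robin, reusing a slot only once its previous sprite has finished drawing. `HW_SLOTS`, `SPRITE_H`, `VSprite` and `multiplex` are hypothetical, and the sketch ignores the raster timing needed to actually rewrite the sprite registers in time.

```c
#include <assert.h>
#include <stddef.h>

#define HW_SLOTS 4    /* hypothetical number of hardware sprite slots */
#define SPRITE_H 16   /* hypothetical sprite height in scanlines */

typedef struct {
    int y;      /* top scanline of the virtual (software) sprite, assumed >= 0 */
    int slot;   /* hardware slot assigned this frame, or -1 if it cannot be shown */
} VSprite;

/* Hypothetical scheduler. Returns how many sprites received a slot. */
size_t multiplex(VSprite *s, size_t n)
{
    assert(s != NULL || n == 0);

    /* 1. Sort by Y so slots are handed out in beam order. Insertion sort is
          cheap for a handful of sprites and fast when the order barely
          changes between frames. */
    for (size_t i = 1; i < n; i++) {
        VSprite key = s[i];
        size_t j = i;
        while (j > 0 && s[j - 1].y > key.y) {
            s[j] = s[j - 1];
            j--;
        }
        s[j] = key;
    }

    /* 2. Round-robin assignment. free_at[k] is the first scanline on which
          slot k has finished drawing the sprite it was last given. */
    int free_at[HW_SLOTS] = {0};
    int next = 0;
    size_t shown = 0;
    for (size_t i = 0; i < n; i++) {
        s[i].slot = -1;
        for (int t = 0; t < HW_SLOTS; t++) {
            int k = (next + t) % HW_SLOTS;
            if (free_at[k] <= s[i].y) {          /* slot is free again: reuse it */
                s[i].slot = k;
                free_at[k] = s[i].y + SPRITE_H;
                next = (k + 1) % HW_SLOTS;
                shown++;
                break;
            }
        }
    }
    return shown;
}
```

A sprite left with slot -1 is the one that would disappear. Rotating which sprite loses out from frame to frame spreads the loss around, which is where the visible flicker comes from.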

Z80 Assembly 36: Rendering Large Worlds with Tilemaps

The Tilemap Concept
A tilemap is a technique that breaks a large game world (the map) into small, reusable graphic squares (tiles), typically $8\times 8$ or $16\times 16$ pixels.
Why Tilemaps Are Essential:
Memory Saving: Instead of storing the pixel data for the entire map, you only store the data for a small set of unique tiles (the tileset) and then store the map itself as a small array of tile IDs (indices).
Fast Rendering: The CPU can draw the screen by reading the tile ID from the map and using that ID as an index to quickly look up and draw the corresponding tile graphic.
The Map Data Structure
The map data is a simple, linear array stored in memory. ...
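
The lookup described above comes down to one multiply-and-add to find the tile ID, then a second index into the tileset. A minimal C sketch, assuming 1-bit-per-pixel 8×8 tiles stored as 8 bytes each and a tiny row-major map; `tileset`, `map`, `tile_at` and `pixel_row_at` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_W 8     /* 8x8-pixel tiles, one byte per pixel row (1 bpp) */
#define TILE_H 8
#define MAP_W  4     /* hypothetical map size, in tiles */
#define MAP_H  2

/* Hypothetical tileset: each tile is 8 bytes, one per pixel row. */
static const uint8_t tileset[][TILE_H] = {
    {0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00},   /* tile 0: empty */
    {0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF},   /* tile 1: solid */
    {0xFF, 0x81, 0x81, 0x81, 0x81, 0x81, 0x81, 0xFF},   /* tile 2: outline box */
};

/* The map itself: a linear, row-major array of tile IDs. */
static const uint8_t map[MAP_W * MAP_H] = {
    0, 1, 2, 0,
    2, 2, 1, 1,
};

/* Tile ID at tile coordinates (col, row). */
uint8_t tile_at(int col, int row)
{
    assert(col >= 0 && col < MAP_W && row >= 0 && row < MAP_H);
    return map[row * MAP_W + col];
}

/* Pixel-row byte of the tile covering world pixel (px, py). */
uint8_t pixel_row_at(int px, int py)
{
    uint8_t id = tile_at(px / TILE_W, py / TILE_H);
    return tileset[id][py % TILE_H];
}
```

When the map width is a power of two, `row * MAP_W` reduces to shifts, which is a common reason to pick such widths on the Z80.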

Z80 Assembly 34: Simple Physics (Gravity, Velocity, and Jumping)

The Concept of Velocity and Gravity
To make a sprite move realistically (fall, jump), we need two new quantities:
Velocity (DY): The current speed and direction of the sprite on the Y-axis, stored in our sprite data structure.
Gravity: A constant value that is added to the velocity every frame. Because screen Y grows downward on most systems, this makes the sprite accelerate downward.
The Calculation: Each frame, the physics update is done in two steps:
Velocity ← Velocity + Gravity
Y-Coordinate ← Y-Coordinate + Velocity
Updating Velocity with Gravity
We will need a new field in our sprite descriptor (e.g., offset +A for an 8-bit velocity) and a constant for gravity. ...
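
The two-step update above, plus a jump, can be sketched in C with the signed 8-bit velocity the text mentions. The constants `GRAVITY`, `MAX_FALL`, `JUMP_SPEED` and `FLOOR_Y`, the `Body` type, and the floor check are hypothetical additions for the example. The velocity clamp keeps DY well inside its 8-bit range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning constants (pixels and pixels per frame). */
#define GRAVITY      1      /* added to DY every frame */
#define MAX_FALL     8      /* terminal velocity: keeps DY small */
#define JUMP_SPEED (-10)    /* negative = upward, since screen Y grows downward */
#define FLOOR_Y    200      /* hypothetical ground line */

typedef struct {
    int16_t y;    /* Y coordinate */
    int8_t  dy;   /* signed 8-bit velocity */
} Body;

/* One frame: Velocity <- Velocity + Gravity, then Y <- Y + Velocity. */
void physics_step(Body *b)
{
    assert(b != NULL);
    int v = b->dy + GRAVITY;
    if (v > MAX_FALL)
        v = MAX_FALL;
    b->dy = (int8_t)v;
    b->y = (int16_t)(b->y + b->dy);
    if (b->y >= FLOOR_Y) {          /* landed: snap to the floor and stop */
        b->y = FLOOR_Y;
        b->dy = 0;
    }
}

/* A jump is just an upward kick to the velocity, allowed only on the floor. */
void jump(Body *b)
{
    assert(b != NULL);
    if (b->y == FLOOR_Y)
        b->dy = JUMP_SPEED;
}
```

With these values a jump rises for 10 frames, peaks 45 pixels above the floor, and lands 20 frames after take-off.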

Z80 Assembly 33: Managing Multiple Sprites with a Linked List

The Need for a Dynamic List
When creating a game, you don’t want to waste time checking 50 sprite slots if only 5 sprites are currently active. A linked list is an efficient data structure in Z80 assembly for managing a variable number of objects: the CPU visits only the active sprites, and adding or removing one just rewrites a pointer.
The Linked List Principle: Each sprite’s data block (the node) contains a pointer to the next sprite in the sequence. When the CPU finishes processing the current sprite, it simply follows the pointer to the next one. The list ends when a pointer value is zero or another agreed-upon terminator. ...
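
The traversal described above, together with constant-time insertion and removal, looks like this as a C sketch. A NULL `next` pointer plays the role of the zero terminator; the `Sprite` layout and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sprite node. On the Z80, the first two bytes of each
   descriptor might hold the address of the next one; here a pointer does
   that job and NULL stands in for the zero terminator. */
typedef struct Sprite {
    struct Sprite *next;
    int x, y;
} Sprite;

/* Walk the list from head, following next until NULL, "processing" each
   active sprite by moving it dx pixels right. Returns the number visited. */
int update_all(Sprite *head, int dx)
{
    int count = 0;
    for (Sprite *s = head; s != NULL; s = s->next) {
        s->x += dx;
        count++;
    }
    return count;
}

/* Activate a sprite by linking it in at the front: O(1), no searching. */
Sprite *push_front(Sprite *head, Sprite *s)
{
    s->next = head;
    return s;
}

/* Deactivate s, given the node before it (NULL if s is the head): O(1).
   Returns the possibly new head. */
Sprite *unlink_sprite(Sprite *head, Sprite *prev, Sprite *s)
{
    assert(prev == NULL ? head == s : prev->next == s);
    if (prev == NULL)
        head = s->next;
    else
        prev->next = s->next;
    s->next = NULL;
    return head;
}
```

Sprites that are not linked in are never touched by `update_all`, no matter how many slots the pool has.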

Z80 Assembly 27: Horizontal and Vertical Scrolling (Fast Block Moves)

The Challenge of Scrolling
Creating the illusion of movement across a large background map (scrolling) is a major performance challenge for 8-bit CPUs. You must shift thousands of bytes of screen memory every single frame, and slow code will cause noticeable flicker or lag.
The Solution: LDIR
The Z80’s LDIR (Load, Increment, Repeat) instruction is the standard tool for this job. At 21 T-states per byte, it runs far faster than an equivalent software loop built with LD, INC, DEC, and JP. (When every cycle counts, an unrolled sequence of LDI instructions, at 16 T-states per byte, is faster still.) ...
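
A C model of one-column and one-row scrolls, assuming a hypothetical linear 32×24-byte screen buffer (real screen layouts, such as the ZX Spectrum's, are often not linear). `memmove` stands in for the block move; the comments note where LDIR applies and where the reverse-direction LDDR would be needed. `scroll_left` and `scroll_up` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCREEN_W 32   /* hypothetical: bytes per screen row */
#define SCREEN_H 24   /* hypothetical: rows */

/* Scroll left by one byte column and feed a new column in on the right.
   Each row moves toward lower addresses, which is the overlap LDIR handles
   because it copies forward (low to high). Scrolling right would need the
   reverse order, i.e. LDDR. memmove is safe for overlap in both directions. */
void scroll_left(uint8_t *screen, const uint8_t *new_col)
{
    assert(screen != NULL && new_col != NULL);
    for (int r = 0; r < SCREEN_H; r++) {
        uint8_t *row = screen + r * SCREEN_W;
        memmove(row, row + 1, SCREEN_W - 1);     /* one LDIR of 31 bytes per row */
        row[SCREEN_W - 1] = new_col[r];
    }
}

/* Scroll up by one row: with rows stored contiguously this is a single
   forward block move of (SCREEN_H - 1) * SCREEN_W bytes. */
void scroll_up(uint8_t *screen, const uint8_t *new_row)
{
    assert(screen != NULL && new_row != NULL);
    memmove(screen, screen + SCREEN_W, (SCREEN_H - 1) * SCREEN_W);
    memcpy(screen + (SCREEN_H - 1) * SCREEN_W, new_row, SCREEN_W);
}
```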

Z80 Assembly 26: Collision Detection (Bounding Boxes and Bitwise Checks)

Bounding Box Collision (The Quick Check)
Collision detection can be computationally expensive, so the usual first test is the cheap Bounding Box check. This method checks whether the rectangular areas occupied by two objects (their boxes) overlap on both the X and Y axes. If the boxes don’t overlap, the objects cannot be touching.
The Principle: An overlap exists if:
Object A’s right edge (A.X + A.Width) is > Object B’s left edge (B.X), AND
Object A’s left edge (A.X) is < Object B’s right edge (B.X + B.Width), AND
The same two conditions are met for the Y-axis (using Y and Height).
Z80 Implementation (X-Axis Check): ...
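
The X- and Y-axis conditions above translate directly into C. This sketch assumes 8-bit screen coordinates like those a Z80 game would keep in registers; the `Box` type and `boxes_overlap` are hypothetical names. C's `&&` short-circuits, which mirrors the early exit a Z80 version would take on the first failed comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical box layout: top-left corner plus size, all 8-bit. */
typedef struct {
    uint8_t x, y;   /* top-left corner in pixels */
    uint8_t w, h;   /* width and height in pixels */
} Box;

/* Overlap test from the text, per axis:
   A.X + A.Width > B.X  AND  A.X < B.X + B.Width  (and likewise for Y).
   The uint8_t fields are promoted to int, so X + Width cannot wrap past 255
   the way an 8-bit ADD on the Z80 would if the carry were ignored.
   Boxes whose edges merely touch do not count as overlapping. */
bool boxes_overlap(Box a, Box b)
{
    return a.x + a.w > b.x && a.x < b.x + b.w &&
           a.y + a.h > b.y && a.y < b.y + b.h;
}
```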

Z80 Assembly 19: Relocatable Code and Position-Independent Code (PIC)

The Problem with Absolute Addressing
Normally, Z80 instructions like JP 8000H or LD HL, DATA_ADDR use absolute addressing. If you assemble a program to run at address 8000H but later load it at 9000H, every absolute address that points into the program will be off by 1000H, and the program will crash. Relocatable Code and Position-Independent Code (PIC) solve this problem.
Relocatable Code (The Assembler’s Job)
Relocatable code is written with absolute addresses, but the assembler generates a file that includes a relocation table. ...
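
Using a relocation table at load time amounts to adding the load offset to every 16-bit address the table points at. The sketch below assumes a hypothetical table format (a list of byte offsets, each marking the low byte of a little-endian address operand); real object and executable formats differ, and `relocate` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical loader step: patch every address listed in the relocation
   table. Each entry is the offset, from the start of the image, of a 16-bit
   little-endian operand such as the nn in JP nn or LD HL,nn. */
void relocate(uint8_t *image, size_t image_len,
              const uint16_t *table, size_t count,
              uint16_t assembled_at, uint16_t loaded_at)
{
    uint16_t delta = (uint16_t)(loaded_at - assembled_at);   /* wraps mod 64 KB */
    for (size_t i = 0; i < count; i++) {
        size_t off = table[i];
        assert(off + 1 < image_len);
        /* The Z80 stores addresses low byte first. */
        uint16_t addr = (uint16_t)(image[off] | (image[off + 1] << 8));
        addr = (uint16_t)(addr + delta);
        image[off] = (uint8_t)(addr & 0xFF);
        image[off + 1] = (uint8_t)(addr >> 8);
    }
}
```

For example, an image assembled at 8000H containing `C3 05 80` (`JP 8005H`) and `21 10 80` (`LD HL,8010H`), with table entries 1 and 4, reads `JP 9005H` and `LD HL,9010H` after being relocated to 9000H. Opcode bytes are never touched, only the listed operands.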