Your screen is flat. The virtual worlds you see in games and 3D applications exist in three dimensions - characters walk forward and backward, cameras rotate around scenes, objects have depth and volume. Yet somehow this 3D information must be projected onto your 2D display. The mathematics that makes this possible - [[projection]] - is one of the fundamental operations in computer graphics, transforming coordinate systems and creating the illusion of depth that makes digital worlds feel real.
The core challenge is that we are discarding a dimension. Any 3D point (x, y, z) must become a 2D screen point (x', y'). There are infinitely many ways to do this, but two dominate graphics: [[perspective projection]] (where distant things look smaller, like human vision) and [[orthographic projection]] (where size is independent of distance, like architectural blueprints). Understanding both, and when to use each, is essential for 3D graphics programming.
The pinhole camera model
Perspective projection mimics how light enters a pinhole camera (or your eye). Light rays from the scene pass through a single point (the pinhole or lens center) and project onto a flat surface (the film or retina). Rays from distant objects arrive at shallower angles, creating smaller images. Rays from nearby objects arrive at steeper angles, creating larger images. This is why railroad tracks appear to converge in the distance - they are not actually converging, but their constant separation projects to progressively smaller screen distances as they recede.
Mathematically, if you have a point at position (x, y, z) relative to a camera at the origin looking down the negative z-axis, perspective projection divides x and y by z (or by -z, depending on coordinate conventions). The projected point becomes (x/z, y/z). Points farther away (larger |z|) get divided by larger numbers, making them smaller on screen. Points closer (smaller |z|) get divided by smaller numbers, making them larger.
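This division can be sketched in a few lines of Python (an illustrative sketch, not any particular API; it assumes the camera at the origin looking down the negative z-axis and an image plane at distance 1, and `project_perspective` is a hypothetical helper name):

```python
# Core of perspective projection: divide by depth.
# Assumes the camera at the origin looking down -z, so visible points have z < 0.
def project_perspective(x, y, z):
    """Project a camera-space point onto the image plane at z = -1."""
    return (x / -z, y / -z)

# Two points with the same lateral offset but different depths:
near_pt = project_perspective(1.0, 0.0, -2.0)    # -> (0.5, 0.0)
far_pt = project_perspective(1.0, 0.0, -10.0)    # -> (0.1, 0.0)
# The farther point lands closer to the center: smaller on screen.
```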
This division by z is the essence of perspective, but it creates a problem: division is not a linear operation, so we cannot represent it with matrix multiplication. The graphics pipeline solves this using [[homogeneous coordinates]] - adding a fourth component w to create (x, y, z, w). The projection matrix transforms coordinates, placing z into w. After matrix multiplication, a perspective divide (dividing xyz by w) performs the actual perspective projection. This lets us keep everything as matrix operations until the very end.
Building the projection matrix
A perspective projection matrix is defined by several parameters: the [[field of view]] (fov) - how wide the camera sees; the aspect ratio - typically screen width divided by height; the [[near plane]] and [[far plane]] - the distance range within which objects are rendered. The near and far planes define a [[view frustum]] - a truncated pyramid shape representing the visible region of space.
// Building a perspective projection matrix
function perspectiveMatrix(fovY, aspect, near, far):
    // fovY is the vertical field of view in radians
    f = 1.0 / tan(fovY / 2)  // cotangent of half the fov
    // Column-major layout (each line of four values is one column):
    return Matrix4x4(
        f / aspect, 0, 0, 0,
        0, f, 0, 0,
        0, 0, (far + near) / (near - far), -1,
        0, 0, (2 * far * near) / (near - far), 0
    )
// After multiplying a point (x,y,z,1) by this matrix:
// x' = x * (f / aspect)
// y' = y * f
// z' = z * (far+near)/(near-far) + (2*far*near)/(near-far)
// w' = -z
//
// The perspective divide gives: (x'/w', y'/w', z'/w') = (x'/(-z), y'/(-z), ...)

Notice that after applying this matrix, z ends up in the w component (with a sign flip). The perspective divide then creates the (x/z, y/z) relationship we want. The z' formula maps the depth range [near, far] to [[NDC]] (Normalized Device Coordinates), typically [-1, 1] or [0, 1] depending on the API. This normalized depth is stored in the z-buffer for hidden surface determination.
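To make the mapping concrete, here is a small Python sketch of the same matrix pushed through a multiply and divide (written row-major for readability, so the entries sit transposed relative to the column-major pseudocode; the helper names are illustrative):

```python
import math

def perspective(fov_y, aspect, near, far):
    """Row-major 4x4 perspective matrix; same math as the pseudocode above."""
    f = 1.0 / math.tan(fov_y / 2)
    return [
        [f / aspect, 0, 0, 0],
        [0, f, 0, 0],
        [0, 0, (far + near) / (near - far), (2 * far * near) / (near - far)],
        [0, 0, -1, 0],
    ]

def mat_vec(m, v):
    """Multiply a row-major 4x4 matrix by a 4-component vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

m = perspective(math.radians(60), 16 / 9, 0.1, 100.0)
x, y, z, w = mat_vec(m, [1.0, 1.0, -10.0, 1.0])  # clip-space coordinates
ndc = (x / w, y / w, z / w)                      # perspective divide
# w == 10.0 (that is, -z), and the NDC depth lands inside [-1, 1]
```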
Field of view and its effects
[[FOV]] dramatically affects how the scene feels. A narrow field of view (like 30°) creates a telephoto lens effect - objects appear flattened, distances are compressed, and the scene feels zoomed in. A wide field of view (like 120°) creates a fisheye effect - the periphery stretches, objects appear to rush past quickly, and the scene feels expansive but distorted.
First-person games typically use 60-90° horizontal FOV. Too narrow causes motion sickness because head movements translate to small view changes (your brain expects more). Too wide causes distortion that breaks immersion. Different players have different preferences, which is why FOV sliders are common in PC games. VR requires even wider FOV (ideally 100°+) to match human peripheral vision.
| FOV | Feel | Common Uses |
|---|---|---|
| 30-45° | Telephoto, compressed, cinematic | Cutscenes, sniper scopes, dramatic shots |
| 60-75° | Natural, balanced | Third-person games, realistic simulations |
| 75-90° | Wide, immersive | First-person shooters, action games |
| 90-110° | Very wide, fast | Racing games, competitive shooters |
| 110°+ | Ultra-wide, distorted | VR, panoramic views |
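Note that the projection matrix earlier takes a vertical FOV while the figures above are quoted horizontally; converting between the two depends on aspect ratio. A sketch of the usual trigonometric relationship (the function name is illustrative):

```python
import math

def hfov_to_vfov(hfov_deg, aspect):
    """Convert horizontal FOV to vertical: tan(hfov/2) = tan(vfov/2) * aspect."""
    h = math.radians(hfov_deg)
    return math.degrees(2 * math.atan(math.tan(h / 2) / aspect))

# 90 degrees horizontal at a 16:9 aspect ratio is roughly 59 degrees vertical:
vfov = hfov_to_vfov(90, 16 / 9)
```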
Orthographic projection
[[Orthographic projection]] preserves size regardless of distance - a 1-meter cube looks the same size whether it is 10 meters or 100 meters from the camera. This is achieved by simply dropping the z coordinate (or using it only for depth sorting) without any perspective division. Lines that are parallel in 3D remain parallel in 2D, unlike perspective where they converge.
Orthographic projection is defined by a box rather than a frustum: left, right, bottom, top, near, and far boundaries. Everything within this box is visible, linearly mapped to screen coordinates. There is no foreshortening, no convergence of parallel lines, no sense of depth from size differences.
// Building an orthographic projection matrix
function orthographicMatrix(left, right, bottom, top, near, far):
    // Column-major layout (each line of four values is one column):
    return Matrix4x4(
        2 / (right - left), 0, 0, 0,
        0, 2 / (top - bottom), 0, 0,
        0, 0, 2 / (near - far), 0,
        -(right + left) / (right - left), -(top + bottom) / (top - bottom), (far + near) / (near - far), 1
    )
// This linearly maps the box [left,right] x [bottom,top] x [near,far]
// to the NDC cube [-1,1] x [-1,1] x [-1,1]
// No perspective divide needed (w = 1 throughout)Orthographic projection is essential for 2D games (which are really 3D scenes viewed orthographically), user interfaces, CAD applications, and any context where accurate measurement matters more than realistic depth perception. It is also used for shadow mapping - rendering the scene from a light's perspective to determine what is in shadow.
The complete transformation pipeline
A vertex's journey from model space to screen space involves multiple transformations, each with its own matrix. Understanding this pipeline helps debug visual issues and optimize rendering.
- Model Space: Vertex positions as defined in the 3D model file, centered on the object's origin
- World Space: After applying the model matrix - object positioned, rotated, and scaled in the scene
- View Space (Camera Space): After applying the view matrix - coordinates relative to camera at origin
- Clip Space: After applying the projection matrix - homogeneous coordinates ready for clipping
- NDC: After perspective divide - normalized coordinates in [-1,1] range
- Screen Space: After viewport transform - actual pixel coordinates on screen
Vertices outside the view frustum must not be rendered, but this is tricky because the perspective divide can create discontinuities (dividing by zero or negative values). The solution is clipping in clip space, before the perspective divide, using the homogeneous w coordinate. A point is inside the frustum if -w ≤ x,y,z ≤ w. Clipping generates new vertices where geometry crosses frustum boundaries, then these clipped vertices undergo perspective divide safely.
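The inside test itself is one comparison per axis; a sketch of the -w ≤ x, y, z ≤ w check on clip-space coordinates:

```python
def inside_frustum(x, y, z, w):
    """True if a clip-space point lies inside all six frustum planes."""
    return -w <= x <= w and -w <= y <= w and -w <= z <= w

inside_frustum(0.5, 0.2, 0.9, 1.0)   # True: within all six planes
inside_frustum(2.0, 0.0, 0.0, 1.0)   # False: outside the right plane
```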
Depth precision and z-fighting
One subtle but important aspect of projection is depth buffer precision. The projection matrix maps the near-to-far range to NDC depth, but this mapping is not linear - it concentrates most of the precision near the near plane and leaves little for the far plane. This non-linearity, inherent to perspective projection, can cause z-fighting: flickering artifacts where two surfaces at nearly the same depth alternate from frame to frame in which one appears in front.
The depth precision problem worsens with larger near-to-far ratios. A scene with near=0.1 and far=10000 (ratio of 100,000:1) will have terrible depth precision at large distances. The solution is to push the near plane as far as possible and pull the far plane as close as possible, minimizing the ratio. Alternatively, reversed-z (flipping depth direction to use floating-point precision more effectively) or logarithmic depth can help.
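The non-linearity is easy to see by evaluating the z'/w' mapping from the perspective matrix above at a few depths (a sketch using that matrix's z terms; note how much of the NDC range the first meter consumes):

```python
def ndc_depth(z_eye, near, far):
    """NDC depth for an eye-space z (negative in front of the camera)."""
    a = (far + near) / (near - far)
    b = (2 * far * near) / (near - far)
    return (z_eye * a + b) / -z_eye   # z'/w' after the perspective divide

near, far = 0.1, 10000.0
ndc_depth(-near, near, far)   # -1.0: the near plane
ndc_depth(-1.0, near, far)    # ~0.80: 90% of [-1, 1] spent in the first meter
ndc_depth(-far, near, far)    # 1.0: the far plane
```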
Beyond simple projection
The standard perspective projection assumes a centered, symmetric view frustum - the camera looks straight ahead with equal extent left and right, up and down. But sometimes you want off-axis projections: virtual monitors that are not directly in front of you, curved screens, multi-projector setups, or VR headsets where each eye's view is slightly off-center.
An asymmetric perspective projection allows different left, right, bottom, top values rather than symmetric ±width/2, ±height/2. This creates frustums that point slightly to the side, essential for correct stereo 3D (each eye needs a different projection) and for projector setups where the projector is not perpendicular to the screen.
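One common way to build such a projection is directly from its six bounds, in the style of the classic glFrustum matrix (a row-major Python sketch; symmetric bounds reduce to the standard perspective matrix, while off-center bounds introduce a skew term in the third column):

```python
def frustum(left, right, bottom, top, near, far):
    """General, possibly asymmetric perspective frustum (row-major 4x4)."""
    return [
        [2 * near / (right - left), 0, (right + left) / (right - left), 0],
        [0, 2 * near / (top - bottom), (top + bottom) / (top - bottom), 0],
        [0, 0, (far + near) / (near - far), (2 * far * near) / (near - far)],
        [0, 0, -1, 0],
    ]

sym = frustum(-1, 1, -1, 1, 1, 100)   # centered: the x-skew entry is 0
off = frustum(0, 2, -1, 1, 1, 100)    # shifted right: the x-skew entry is 1.0
```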
The mathematics of projection extend far beyond what we have covered - lens distortion correction, anamorphic projections, environment map projections, shadow map projections, and more. But the core principle remains: converting 3D coordinates to 2D through careful mathematical transformation, whether preserving depth perception with perspective or preserving size with orthographic projection. Understanding these fundamentals enables everything from simple games to complex visualization systems.