The shader pipeline

How graphics get drawn


Shaders are one of the most beautiful examples of how constraints breed creativity in programming. They are small programs that run massively in parallel on the GPU, with each instance responsible for calculating just one thing - the position of a single vertex or the color of a single pixel. The constraints are severe: limited memory, no communication between instances, and the same code must run identically across thousands of invocations. Yet within these constraints, programmers have created photorealistic rendering, stunning visual effects, and entirely new art forms.

Before shaders existed, graphics hardware had a 'fixed-function pipeline' - a predetermined sequence of operations with limited configurability. You could tweak parameters, but you could not fundamentally change how lighting was calculated or how textures were applied. Shaders made the graphics pipeline programmable, transforming GPUs from specialized rendering circuits into general-purpose parallel processors that happen to be very good at graphics.

The rendering pipeline overview

Every 3D scene you see on screen - whether a video game, 3D movie, or CAD application - goes through a series of transformation stages to convert 3D geometry into 2D pixels. This sequence is called the rendering pipeline or graphics pipeline. Understanding it is fundamental to graphics programming because shaders plug into specific stages of this pipeline.

The pipeline takes as input a description of 3D geometry (typically triangles defined by vertices) and produces as output a 2D image (pixels in a framebuffer). The major stages are: vertex processing, primitive assembly, rasterization, fragment processing, and output merging. Let us trace a single triangle through this entire journey.

[Interactive diagram: The Graphics Pipeline. Vertex Data (raw triangle vertices) → Vertex Shader (transform positions) → Primitive Assembly (form triangles) → Rasterization (convert to fragments) → Fragment Shader (calculate colors) → Framebuffer (final pixels). Step through each stage of the graphics pipeline to see how 3D geometry becomes pixels.]

Stage 1: Vertex processing

Everything starts with vertices. A [[vertex]] is a point in 3D space, but it is usually much more than just a position. Each vertex typically carries additional [[attribute]]s: a normal vector (the direction the surface faces at that point), texture coordinates (where to sample textures), vertex colors, tangent vectors for normal mapping, bone weights for skeletal animation, and potentially custom data for special effects.

The [[vertex shader]] runs once for each vertex in your scene. Its primary job is to transform vertex positions through a series of coordinate spaces: from the object's local model space, to world space (where the object is positioned in the scene), to view space (relative to the camera), and finally to clip space (a normalized coordinate system ready for rasterization).

This transformation is accomplished through matrix multiplication. Three matrices encode the necessary transformations: the model matrix (positioning the object in the world), the view matrix (positioning and orienting the camera), and the projection matrix (applying perspective to make distant objects smaller). These are typically combined into a single Model-View-Projection (MVP) matrix and passed to the shader as a [[uniform]].
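To make the matrix chain concrete, here is a minimal CPU-side sketch in Python of building and applying an MVP matrix, following the common OpenGL conventions (column-style math written row-major, camera looking down -Z). The helper names `translation` and `perspective` are illustrative, not from any particular library.

```python
import math

def mat_mul(a, b):
    # 4x4 row-major matrix product
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

def translation(tx, ty, tz):
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def perspective(fov_y, aspect, near, far):
    f = 1.0 / math.tan(fov_y / 2.0)
    return [[f / aspect, 0, 0, 0],
            [0, f, 0, 0],
            [0, 0, (far + near) / (near - far), 2 * far * near / (near - far)],
            [0, 0, -1, 0]]

identity = [[float(i == j) for j in range(4)] for i in range(4)]

model = translation(0.0, 0.0, -5.0)        # place the object 5 units ahead
view = identity                            # camera at origin, looking down -Z
proj = perspective(math.radians(60), 16 / 9, 0.1, 100.0)

mvp = mat_mul(proj, mat_mul(view, model))  # right-to-left order, as in GLSL
clip = mat_vec(mvp, [0.0, 0.0, 0.0, 1.0])  # object-space origin -> clip space
ndc = [c / clip[3] for c in clip[:3]]      # perspective divide
print(ndc)                                 # z lands inside the [-1, 1] NDC cube
```

The perspective divide shown in the last step is performed by the GPU automatically after the vertex shader writes `gl_Position`.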

// A complete vertex shader showing the transformation pipeline
#version 330 core

// Input attributes - per-vertex data
layout(location = 0) in vec3 aPosition;  // Model space position
layout(location = 1) in vec3 aNormal;    // Model space normal
layout(location = 2) in vec2 aTexCoord;  // Texture coordinates

// Uniforms - same for all vertices
uniform mat4 uModel;       // Model matrix: local -> world
uniform mat4 uView;        // View matrix: world -> camera
uniform mat4 uProjection;  // Projection matrix: camera -> clip
uniform mat3 uNormalMatrix; // For transforming normals correctly

// Outputs to fragment shader (will be interpolated)
out vec3 vWorldPos;    // Position in world space
out vec3 vNormal;      // Normal in world space  
out vec2 vTexCoord;    // Passed through unchanged

void main() {
    // Transform position through all spaces
    vec4 worldPos = uModel * vec4(aPosition, 1.0);
    vWorldPos = worldPos.xyz;
    
    // Transform normal (must use normal matrix to handle non-uniform scaling)
    vNormal = normalize(uNormalMatrix * aNormal);
    
    // Pass texture coordinates to fragment shader
    vTexCoord = aTexCoord;
    
    // Final clip-space position
    gl_Position = uProjection * uView * worldPos;
}

Note that we also transform the normal vector, but we cannot just multiply it by the model matrix. If the object has been non-uniformly scaled (stretched more in one direction than another), multiplying the normal by the model matrix would skew it incorrectly. The normal matrix is derived from the inverse transpose of the model matrix to handle this correctly.
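A hand-worked example makes the problem visible. For a purely diagonal scale matrix the inverse transpose is simply the reciprocal of each scale factor, so this Python sketch needs no general matrix inversion; the numbers are chosen for illustration.

```python
# A surface tangent and its normal are perpendicular. After a non-uniform
# scale, multiplying the normal by the model matrix breaks that property;
# the inverse-transpose (here: reciprocal scales) preserves it.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

scale = (2.0, 1.0, 1.0)                # stretch the geometry 2x along X

tangent = (1.0, 1.0, 0.0)              # lies in the surface
normal = (1.0, -1.0, 0.0)              # perpendicular to the tangent
assert dot(tangent, normal) == 0.0

t_world = tuple(s * t for s, t in zip(scale, tangent))     # (2, 1, 0)

naive = tuple(s * n for s, n in zip(scale, normal))        # model matrix: wrong
correct = tuple(n / s for s, n in zip(scale, normal))      # inverse transpose

print(dot(t_world, naive))    # 3.0 -> no longer perpendicular to the surface
print(dot(t_world, correct))  # 0.0 -> still a valid normal
```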

Stage 2: Primitive assembly and clipping

After vertex processing, the GPU groups vertices back into primitives - usually triangles. Each triangle is defined by three vertices with their transformed positions and interpolatable attributes. The GPU then performs clipping against the view frustum (the visible region of space). Triangles entirely outside the frustum are discarded. Triangles partially outside are clipped, potentially creating additional triangles to represent the visible portion.
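The test the clipper applies can be sketched in a few lines of Python. In clip space (OpenGL convention), a vertex is inside the view frustum exactly when every coordinate satisfies -w ≤ c ≤ w; Direct3D differs only in using 0 ≤ z ≤ w.

```python
# Minimal clip-space frustum test (OpenGL convention).
def inside_frustum(clip):
    x, y, z, w = clip
    return all(-w <= c <= w for c in (x, y, z))

print(inside_frustum((0.2, -0.5, 0.9, 1.0)))   # True: inside the frustum
print(inside_frustum((1.5, 0.0, 0.0, 1.0)))    # False: past the right plane

# If all three vertices fail the same plane, the triangle can be culled
# outright; mixed results mean it straddles a plane and must be clipped.
tri = [(-2.0, 0.0, 0.0, 1.0), (0.5, 0.0, 0.0, 1.0), (0.0, 0.5, 0.0, 1.0)]
print([inside_frustum(v) for v in tri])        # [False, True, True] -> clip
```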

Modern pipelines may include additional optional stages here. Tessellation shaders can subdivide geometry into finer triangles for detailed surfaces. Geometry shaders can create new primitives from existing ones (useful for particle systems, shadows, or procedural geometry). These stages add flexibility but are less commonly used than vertex and fragment shaders.

Stage 3: Rasterization

[[Rasterization]] is the heart of real-time graphics - the process that converts smooth vector triangles into discrete pixels. For each triangle, the rasterizer determines which pixels on the screen the triangle covers. For each covered pixel, it generates a [[fragment]] - a potential pixel with interpolated data.

The interpolation is crucial. Each vertex has attributes - position, normal, texture coordinates, color. The rasterizer computes these attributes for each fragment by interpolating between the three vertices of the triangle. A fragment in the center of a triangle gets values averaged from all three vertices. A fragment near one vertex gets values weighted toward that vertex.

This interpolation uses barycentric coordinates - a way of expressing any point inside a triangle as a weighted combination of its three vertices. If a fragment has barycentric coordinates (0.5, 0.3, 0.2), its interpolated normal would be 0.5 times vertex 0's normal plus 0.3 times vertex 1's normal plus 0.2 times vertex 2's normal.
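The weighted combination just described is plain arithmetic; this Python sketch reproduces the example from the text, with three made-up vertex normals.

```python
# Barycentric interpolation of a per-vertex attribute.
def bary_lerp(weights, values):
    # weights: (w0, w1, w2) summing to 1; values: one attribute per vertex
    return tuple(sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0])))

normals = [(0.0, 0.0, 1.0),   # vertex 0's normal
           (0.0, 1.0, 0.0),   # vertex 1's normal
           (1.0, 0.0, 0.0)]   # vertex 2's normal

n = bary_lerp((0.5, 0.3, 0.2), normals)
print(n)  # the blend is no longer unit length: fragment shaders renormalize
```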

Perspective-correct interpolation adds another layer of complexity. In 3D, a texture that looks evenly spaced in world space should not be evenly spaced in screen space - distant parts of a surface should have more texture compressed into fewer pixels. The rasterizer handles this automatically, adjusting interpolation to account for perspective.
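The standard trick, sketched below for a single texture coordinate: interpolate u/w and 1/w linearly in screen space, then divide. The endpoint values are illustrative, with the far end ten times as deep as the near end.

```python
# Naive (screen-linear) vs. perspective-correct interpolation of a
# texture u coordinate at the screen-space midpoint of an edge.
def naive(u0, u1, t):
    return (1 - t) * u0 + t * u1

def perspective_correct(u0, w0, u1, w1, t):
    # Interpolate u/w and 1/w linearly in screen space, then divide.
    u_over_w = (1 - t) * (u0 / w0) + t * (u1 / w1)
    one_over_w = (1 - t) * (1 / w0) + t * (1 / w1)
    return u_over_w / one_over_w

u0, w0 = 0.0, 1.0    # near endpoint
u1, w1 = 1.0, 10.0   # far endpoint, ten times deeper

print(naive(u0, u1, 0.5))                        # 0.5
print(perspective_correct(u0, w0, u1, w1, 0.5))  # ~0.09
```

The correct value at the screen midpoint is only about 0.09, meaning over 90% of the texture is compressed into the distant half of the edge, exactly the behavior the rasterizer must reproduce.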

Stage 4: Fragment processing

The [[fragment shader]] (also called pixel shader in DirectX terminology) runs once for every fragment generated by rasterization. This is where the magic happens - lighting, texturing, shadows, reflections, and every visual effect you see. The fragment shader's job is to determine the final color (and sometimes depth) of each potential pixel.

Fragment shaders receive interpolated data from the vertex shader via [[varying]] variables (called 'in' variables in modern GLSL). They can read from textures, perform complex lighting calculations, and output a color. Unlike vertex shaders, which process a fixed number of vertices, fragment shaders run for every visible pixel of every triangle - potentially millions of invocations per frame.

// Phong lighting model - a classic fragment shader
#version 330 core

// Interpolated inputs from vertex shader
in vec3 vWorldPos;
in vec3 vNormal;
in vec2 vTexCoord;

// Uniforms
uniform vec3 uLightPos;
uniform vec3 uLightColor;
uniform vec3 uViewPos;       // Camera position
uniform sampler2D uTexture;  // Diffuse texture

// Output
out vec4 fragColor;

void main() {
    // Sample texture for base color
    vec3 objectColor = texture(uTexture, vTexCoord).rgb;
    
    // Normalize interpolated normal (interpolation can denormalize)
    vec3 normal = normalize(vNormal);
    
    // Calculate light direction
    vec3 lightDir = normalize(uLightPos - vWorldPos);
    
    // AMBIENT: constant minimum illumination
    vec3 ambient = 0.1 * uLightColor;
    
    // DIFFUSE: light scattered equally in all directions
    // Intensity depends on angle between surface and light
    float diff = max(dot(normal, lightDir), 0.0);
    vec3 diffuse = diff * uLightColor;
    
    // SPECULAR: mirror-like reflection creating highlights
    vec3 viewDir = normalize(uViewPos - vWorldPos);
    vec3 reflectDir = reflect(-lightDir, normal);
    float spec = pow(max(dot(viewDir, reflectDir), 0.0), 32.0);
    vec3 specular = spec * uLightColor;
    
    // Combine all lighting components
    vec3 result = (ambient + diffuse + specular) * objectColor;
    fragColor = vec4(result, 1.0);
}

This shader implements the Phong lighting model with three components: ambient light (a constant term preventing surfaces from being completely black), diffuse light (matte reflection that depends on the angle between the surface and light), and specular light (shiny highlights that depend on the viewing angle). More sophisticated models like Physically Based Rendering (PBR) build on these same principles with more accurate physics.

Stage 5: Output merging and the depth buffer

After the fragment shader outputs a color, that color is not immediately written to the screen. First, it goes through output merging, which handles depth testing, stencil testing, and blending. The most important of these is depth testing using the [[z-buffer]].

The z-buffer stores a depth value for every pixel in the [[framebuffer]]. When a fragment is about to be written, the GPU compares its depth (distance from camera) to the stored depth for that pixel. If the fragment is closer than what was previously drawn, both the color and depth buffers are updated. If the fragment is farther away (behind existing geometry), it is discarded - something else is already in front of it.

This elegant mechanism means we can draw triangles in any order and still get correct occlusion. Without a z-buffer, we would need to sort all triangles from back to front (the painter's algorithm), which is slow and fails for intersecting geometry.
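The depth-test loop can be sketched in miniature. In this Python illustration, fragments arrive in arbitrary order and a per-pixel depth buffer keeps whichever is closest to the camera (the GL_LESS comparison, where smaller depth means closer).

```python
# A tiny framebuffer with a depth buffer: out-of-order fragments still
# resolve to correct occlusion.
WIDTH, HEIGHT = 4, 4
depth = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
color = [[None] * WIDTH for _ in range(HEIGHT)]

def write_fragment(x, y, frag_depth, frag_color):
    # Depth test: write only if this fragment is closer than what's stored.
    if frag_depth < depth[y][x]:
        depth[y][x] = frag_depth
        color[y][x] = frag_color

write_fragment(1, 1, 0.9, "blue")    # passes: buffer was empty
write_fragment(1, 1, 0.3, "red")     # passes: closer than 0.9
write_fragment(1, 1, 0.6, "green")   # fails: behind the red fragment

print(color[1][1])  # red -- draw order didn't matter
```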

The shader language landscape

Different graphics APIs use different shading languages, though they are all conceptually similar. The syntax and some features differ, but the core concepts of vertex shaders, fragment shaders, uniforms, and varyings are universal.

Language | API            | Platform        | Notes
GLSL     | OpenGL / WebGL | Cross-platform  | C-like syntax, most common for learning
HLSL     | DirectX        | Windows / Xbox  | Microsoft's shader language, very similar to GLSL
MSL      | Metal          | Apple platforms | C++ based, modern syntax
WGSL     | WebGPU         | Web browsers    | New standard, influenced by Rust
SPIR-V   | Vulkan         | Cross-platform  | Bytecode format, compiled from other languages

In practice, many game engines and frameworks abstract these differences. You write shaders in one language (often HLSL or a custom format), and the engine cross-compiles to whatever the target platform needs. This is one reason frameworks like Unity, Unreal, and Three.js are so popular - they handle the cross-platform complexity.

Beyond the basics: modern pipeline features

Modern GPUs support several additional programmable stages beyond vertex and fragment shaders. Tessellation shaders (hull/domain shaders) can dynamically subdivide geometry based on camera distance, adding detail up close while keeping distant objects simpler. Geometry shaders can generate new primitives, useful for particle systems or creating shadow volumes. Compute shaders run outside the traditional graphics pipeline entirely, enabling general-purpose GPU computation.

Modern techniques also blur the lines of the traditional pipeline. Deferred rendering separates geometry processing from lighting by first rendering material properties to multiple textures (a G-buffer), then computing lighting in screen space. This allows hundreds of lights without the cost multiplying with geometry. Screen-space effects like ambient occlusion, reflections, and global illumination work on the 2D rendered image rather than 3D geometry.

Understanding the pipeline is not just academic - it directly informs optimization decisions. Vertex-bound scenes (with many vertices but simple shading) need different optimization strategies than fragment-bound scenes (with complex shaders covering many pixels). Knowing where your bottleneck is requires understanding what each stage does and how they interact.

How Things Work - A Visual Guide to Technology