How Things Work - A Visual Guide to Technology

The lossy tradeoff

JPEG changed the world. Before it, a single uncompressed photo could take many minutes to download over a modem, and digital cameras would have been impractical: a 1-megapixel image at 3 bytes per pixel is about 3 MB, more than two 1.44 MB floppy disks. JPEG made images practical on the web, in digital cameras, and in messaging apps. The secret? It throws away information your eyes probably won't miss.

This is [[lossy compression]]: unlike ZIP or PNG, you can never get the original data back from a JPEG. Something is permanently lost. But here's the genius - JPEG is carefully designed to lose the things humans are least likely to notice, exploiting quirks of our visual perception that evolution built into our brains.

The JPEG committee spent years studying human vision to understand what we're sensitive to and what we're not. The result is [[perceptual coding]] - compression that's psychovisually optimized. Two images might be mathematically very different, but if they look identical to humans, the compression worked. It's not about preserving data; it's about preserving the perceptual experience.

Step 1: Color space conversion

JPEG doesn't usually work directly with RGB. Instead, encoders following the common JFIF convention convert to YCbCr: Y for luminance (brightness), Cb for blue-difference chroma, and Cr for red-difference chroma. Why? Because human eyes are much more sensitive to changes in brightness than to changes in color. Our visual system resolves fine detail mostly through brightness and carries color at a much lower spatial resolution.

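The conversion is a weighted sum of R, G, and B for each channel (plus an offset of 128 for Cb and Cr), so it is reversible apart from small rounding errors when the results are stored as 8-bit integers: Y = 0.299R + 0.587G + 0.114B. Notice how green contributes most to perceived brightness. That's because the eye's brightness sensitivity peaks in the green part of the spectrum (around 555 nm), where both the long- and medium-wavelength cones respond strongly. These weights, taken from the ITU-R BT.601 standard, aren't arbitrary; they match how we actually see.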
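Here's a minimal Python sketch of the conversion, assuming the full-range JFIF formulas that most JPEG files use. The Cb and Cr equations and the inverse are not given above; they come from that convention. The function names are illustrative only.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range JFIF conversion: 8-bit RGB -> (Y, Cb, Cr) as floats."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse JFIF conversion, returned as unrounded floats."""
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return r, g, b


# Green dominates luminance: pure green is far brighter than pure blue.
print(rgb_to_ycbcr(0, 255, 0)[0])   # about 149.7
print(rgb_to_ycbcr(0, 0, 255)[0])   # about 29.1

# Round trip: exact up to floating-point error; 8-bit storage would add rounding.
print(ycbcr_to_rgb(*rgb_to_ycbcr(200, 100, 50)))
```

Gray pixels (R = G = B) land at Cb = Cr = 128, which is why the two chroma channels carry only "how far from gray" information.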

Once in YCbCr, JPEG can use [[chroma subsampling]]: storing color information at reduced resolution (half horizontally, or half in both directions) while keeping full-resolution luminance. In the common 4:2:0 scheme, for every four luminance samples, there's only one Cb and one Cr sample. Compared with full-resolution color, this alone halves the data with barely perceptible quality loss, because your eye simply can't resolve color at the same precision as brightness.

Chroma subsampling schemes:

4:4:4 - Full resolution for all channels (no subsampling)
        Used for professional work, archival

4:2:2 - Half horizontal resolution for color
        Used in broadcast video, some professional cameras

4:2:0 - Half resolution in both dimensions for color
        Standard for JPEG, most video codecs
        
Original 1920×1080 image with 4:2:0:
  Y channel:  1920×1080 = 2,073,600 samples
  Cb channel:  960×540  =   518,400 samples
  Cr channel:  960×540  =   518,400 samples
  
Total: 3.1 million samples vs 6.2 million for RGB
50% reduction before any frequency-domain compression!
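The sample counts above can be reproduced with a short sketch. It assumes plain 2×2 averaging for 4:2:0 downsampling; real encoders pick their own filters and rounding, and the helper names are hypothetical.

```python
import math


def subsample_420(plane):
    """Average each 2x2 neighbourhood of a chroma plane (a list of rows).

    Odd widths/heights are handled by averaging only the pixels that exist.
    """
    h, w = len(plane), len(plane[0])
    out = []
    for y in range(0, h, 2):
        row = []
        for x in range(0, w, 2):
            vals = [plane[yy][xx]
                    for yy in range(y, min(y + 2, h))
                    for xx in range(x, min(x + 2, w))]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def sample_count(width, height, scheme):
    """Total Y + Cb + Cr samples for a given subsampling scheme."""
    luma = width * height
    if scheme == "4:4:4":
        cw, ch = width, height
    elif scheme == "4:2:2":
        cw, ch = math.ceil(width / 2), height
    elif scheme == "4:2:0":
        cw, ch = math.ceil(width / 2), math.ceil(height / 2)
    else:
        raise ValueError(f"unknown scheme {scheme!r}")
    return luma + 2 * cw * ch


for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    print(scheme, sample_count(1920, 1080, scheme))
# 4:4:4 6220800
# 4:2:2 4147200
# 4:2:0 3110400
```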

Step 2: Block splitting and DCT

JPEG divides each channel into 8×8 pixel blocks, shifts the sample values from 0..255 down to -128..127 so they are centered on zero, then applies the [[DCT]] (Discrete Cosine Transform) to each block. This converts the spatial pixel values into frequency components - essentially decomposing the block into a sum of cosine waves of different frequencies and orientations.

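Why cosines? Cosine functions are orthogonal (none of the patterns can be built from the others), and the DCT implicitly mirrors the block at its edges, so it avoids the artificial jump a plain Fourier transform would see where the block's right edge wraps around to meet its left edge. The DCT is closely related to the Fourier transform but packs the energy of typical image blocks into fewer coefficients. It is also fast: factorizations such as the Arai-Agui-Nakajima (AAN) algorithm compute an 8-point DCT with only 5 multiplications and 29 additions once its scaling factors are folded into quantization, and the 8×8 transform is just 8 row transforms followed by 8 column transforms.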
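A direct, deliberately slow sketch of the 8×8 forward DCT, written from the formula in the JPEG specification and including the level shift by 128 that baseline JPEG applies first. Production encoders use fast factorized versions; this one only shows what is computed. The function name is hypothetical, and `coeffs[v][u]` holds vertical frequency `v` and horizontal frequency `u`.

```python
import math


def fdct_8x8(block):
    """Forward 8x8 DCT of 8-bit samples given as 8 rows of 8 values.

    Samples are level-shifted by 128 first. Result[v][u]: v = vertical
    frequency (down), u = horizontal frequency (right); [0][0] is DC.
    """
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    shifted = [[s - 128 for s in row] for row in block]
    coeffs = [[0.0] * 8 for _ in range(8)]
    for v in range(8):
        for u in range(8):
            total = 0.0
            for y in range(8):
                for x in range(8):
                    total += (shifted[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[v][u] = 0.25 * c(u) * c(v) * total
    return coeffs


flat = fdct_8x8([[100] * 8 for _ in range(8)])
print(round(flat[0][0], 6))   # -224.0: a flat block puts everything in DC

# A left-to-right ramp has no vertical variation, so rows v >= 1 are all ~0.
ramp = fdct_8x8([[100 + 10 * x for x in range(8)] for _ in range(8)])
print(round(ramp[0][0], 6))   # 56.0
print(all(abs(ramp[v][u]) < 1e-9 for v in range(1, 8) for u in range(8)))   # True
```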

[Interactive figure: JPEG DCT Compression. Panels show an original 8×8 block, its DCT coefficients (positive and negative), and the coefficients after quantization, with a quality slider (shown at 50%). Lower quality = more aggressive quantization = more zeros = smaller file = more artifacts.]

In exact arithmetic, the DCT itself is lossless and reversible; practical integer implementations add only tiny rounding errors. The magic happens in how we treat the output. The 64 DCT coefficients represent different frequencies: the top-left coefficient (called DC) is proportional to the average brightness of the block, and the other 63 are called AC coefficients. As you move right, horizontal frequency increases. As you move down, vertical frequency increases. The bottom-right represents the highest-frequency diagonal patterns.
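Written out, the forward DCT of baseline JPEG (as defined in the ITU-T T.81 specification) makes the DC relationship explicit. Here f(x,y) is the level-shifted sample (value minus 128) in column x and row y, u and v are the horizontal and vertical frequencies, and f̄ is the block's mean level-shifted value. Setting u = v = 0 turns both cosines into 1:

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)
         \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16},
\qquad
C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}

F(0,0) = \frac{1}{4} \cdot \frac{1}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}}
         \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y)
       = \frac{1}{8} \cdot 64\,\bar{f}
       = 8\,\bar{f}
```

For example, a flat block of 100s has f = 100 - 128 = -28 everywhere, so its DC coefficient is 8 × (-28) = -224 and all 63 AC coefficients are zero.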

The 64 DCT basis patterns for an 8×8 block:

     DC → increasing horizontal frequency →
  ↓  [▓▓][▓░][▓░][▓░][▓░][▓░][▓░][▓░]
  ↓  [▓▓][▓░][░▓][░▓][▓░][▓░][░▓][░▓]
  ↓  [▓▓][▓░][░▓][░▓][░▓][▓░][▓░][░▓]
  ↓  [▓▓][▓░][░▓][░▓][░▓][▓░][░▓][▓░]
  ↓  [▓▓][▓░][░▓][░▓][░▓][░▓][▓░][░▓]
  ↓  [▓▓][▓░][░▓][░▓][░▓][░▓][▓░][▓░]
  ↓  [▓▓][▓░][░▓][░▓][░▓][░▓][░▓][▓░]
  ↓  [▓▓][▓░][░▓][░▓][░▓][░▓][░▓][░▓]
     ↓ increasing vertical frequency

Each block is a weighted sum of these patterns.
A smooth area needs mostly low frequencies.
Sharp edges need high frequencies.
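To see the "weighted sum" idea concretely, here's a sketch (hypothetical helper names) that builds all 64 patterns from the same cosine formula JPEG's DCT uses, measures how much of each pattern a block contains, and rebuilds the block by adding the weighted patterns back together. Because the patterns are orthonormal, measuring a weight is just a sum of products. The ▓/░ rendering follows the diagram: ▓ where a pattern is positive, ░ where it is negative.

```python
import math

N = 8


def c(k):
    """DCT normalization: 1/sqrt(2) for the zero-frequency term, else 1."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def basis(u, v):
    """8x8 pattern with horizontal frequency u and vertical frequency v."""
    return [[c(u) * c(v) / 4
             * math.cos((2 * x + 1) * u * math.pi / 16)
             * math.cos((2 * y + 1) * v * math.pi / 16)
             for x in range(N)]
            for y in range(N)]


def inner(a, b):
    """Sum of element-wise products of two 8x8 arrays."""
    return sum(a[y][x] * b[y][x] for y in range(N) for x in range(N))


def analyse(block):
    """Weight of each pattern in the block, as coeffs[v][u]."""
    return [[inner(block, basis(u, v)) for u in range(N)] for v in range(N)]


def synthesise(coeffs):
    """Rebuild a block as the weighted sum of all 64 patterns."""
    out = [[0.0] * N for _ in range(N)]
    for v in range(N):
        for u in range(N):
            pattern = basis(u, v)
            for y in range(N):
                for x in range(N):
                    out[y][x] += coeffs[v][u] * pattern[y][x]
    return out


def ascii_pattern(u, v):
    """Render a pattern as text: ▓ where positive, ░ where negative."""
    return "\n".join("".join("▓" if val >= 0 else "░" for val in row)
                     for row in basis(u, v))


print(ascii_pattern(1, 0))   # eight identical rows of ▓▓▓▓░░░░
```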

Human vision is much less sensitive to high-frequency variations - we see smooth gradients much better than we see fine textures and rapid changes. This is partly because of the optics of our eye and partly because of neural processing. JPEG exploits this: we can be quite aggressive about discarding high-frequency coefficients without people noticing.

Step 3: Quantization - where data dies

[[Quantization]] is the heart of JPEG compression, and it's where information is irreversibly lost. Each of the 64 DCT coefficients is divided by a value from a quantization table, then rounded to the nearest integer. Large quantization divisors mean aggressive rounding, which means more data loss but smaller files.

Standard JPEG Luminance Quantization Table (Annex K example table; Q=50 in libjpeg):
┌────┬────┬────┬────┬────┬────┬────┬────┐
│ 16 │ 11 │ 10 │ 16 │ 24 │ 40 │ 51 │ 61 │
├────┼────┼────┼────┼────┼────┼────┼────┤
│ 12 │ 12 │ 14 │ 19 │ 26 │ 58 │ 60 │ 55 │
├────┼────┼────┼────┼────┼────┼────┼────┤
│ 14 │ 13 │ 16 │ 24 │ 40 │ 57 │ 69 │ 56 │
├────┼────┼────┼────┼────┼────┼────┼────┤
│ 14 │ 17 │ 22 │ 29 │ 51 │ 87 │ 80 │ 62 │
├────┼────┼────┼────┼────┼────┼────┼────┤
│ 18 │ 22 │ 37 │ 56 │ 68 │109 │103 │ 77 │
├────┼────┼────┼────┼────┼────┼────┼────┤
│ 24 │ 35 │ 55 │ 64 │ 81 │104 │113 │ 92 │
├────┼────┼────┼────┼────┼────┼────┼────┤
│ 49 │ 64 │ 78 │ 87 │103 │121 │120 │101 │
├────┼────┼────┼────┼────┼────┼────┼────┤
│ 72 │ 92 │ 95 │ 98 │112 │100 │103 │ 99 │
└────┴────┴────┴────┴────┴────┴────┴────┘

Notice: values increase toward bottom-right (high frequencies)
High frequencies get divided by larger numbers (40-121)
More of them become zero after rounding → compresses well!
Low frequencies (top-left) keep more precision (10-24)
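A minimal quantize/dequantize sketch using the table above. It assumes round-half-away-from-zero (encoders differ slightly here), and the three non-zero input coefficients are made-up values chosen for illustration.

```python
import math

# Luminance table from above; rows = vertical frequency, columns = horizontal.
LUMA_Q50 = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def round_half_away(x):
    """Round to the nearest integer, moving .5 cases away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def quantize(coeffs, table):
    """Encoder side: divide each DCT coefficient by its table entry and round."""
    return [[round_half_away(coeffs[v][u] / table[v][u]) for u in range(8)]
            for v in range(8)]


def dequantize(levels, table):
    """Decoder side: multiply back. The rounding error cannot be recovered."""
    return [[levels[v][u] * table[v][u] for u in range(8)] for v in range(8)]


# Illustrative coefficients: a strong DC term, a moderate low-frequency
# term, and a weak high-frequency term.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0] = -415.4
coeffs[0][1] = -30.0
coeffs[7][7] = 30.0

levels = quantize(coeffs, LUMA_Q50)
print(levels[0][0], levels[0][1], levels[7][7])   # -26 -3 0
print(dequantize(levels, LUMA_Q50)[0][0])          # -416: close, but not -415.4
```

The same magnitude (30) survives as -3 at a low frequency but vanishes entirely at the highest frequency, because its divisor there is 99 instead of 11.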

When you adjust JPEG quality in your image editor, the encoder typically scales this quantization table. The JPEG standard itself defines no quality scale, so the exact mapping varies by encoder. In the widely used IJG libjpeg convention, quality 50 uses the table as shown, quality 100 sets every entry to 1 (minimal loss, huge file), and quality 10 multiplies the table by 5 (massive loss, tiny file). After quantization, many coefficients become zero, especially in the high-frequency positions.
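As one concrete example, here is a Python transcription of the IJG libjpeg scaling rule (the logic of its `jpeg_quality_scaling` and `jpeg_add_quant_table` functions), assuming baseline mode, where table entries are capped at 255. The Python function names are hypothetical, and other encoders use different curves.

```python
def libjpeg_scale_factor(quality):
    """Map quality 1..100 to a percentage scale factor (IJG libjpeg rule)."""
    quality = max(1, min(100, quality))
    if quality < 50:
        return 5000 // quality
    return 200 - 2 * quality


def scale_table(base_table, quality, baseline=True):
    """Scale a flat list of base quantization values for the given quality."""
    factor = libjpeg_scale_factor(quality)
    scaled = []
    for q in base_table:
        value = (q * factor + 50) // 100   # percentage scaling, rounded
        value = max(1, min(value, 32767))  # entries must be at least 1
        if baseline:
            value = min(value, 255)        # baseline JPEG stores 8-bit entries
        scaled.append(value)
    return scaled


# Two entries of the standard luminance table: top-left (16) and the largest (121).
for quality in (10, 50, 90, 100):
    print(quality, scale_table([16, 121], quality))
# 10 [80, 255]
# 50 [16, 121]
# 90 [3, 24]
# 100 [1, 1]
```

Note how the 255 cap flattens the high-frequency entries at very low quality.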

These zeros are crucial for compression. After quantization, the DC coefficient is stored as the difference from the previous block's DC, and the 63 AC coefficients are read in a zigzag pattern from low to high frequency, then run-length encoded (many zeros in a row → short code) and Huffman coded. Most of the actual file size reduction comes from all those zeros in the high-frequency coefficients.
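A sketch of the zigzag scan plus a simplified form of baseline JPEG's AC run-length step, with hypothetical function names. Each non-zero AC coefficient becomes a (zeros skipped, value) pair; (15, 0) stands for a run of 16 zeros (ZRL) and (0, 0) marks the end of the block (EOB). The real format then Huffman-codes each run together with the value's bit length; that step is omitted here.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in JPEG zigzag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk anti-diagonals (r + c constant): odd diagonals run down-left,
    # even diagonals run up-right.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def run_length_ac(zz):
    """Encode the 63 AC terms of a zigzag-ordered block as (run, value) pairs."""
    pairs, run = [], 0
    for value in zz[1:]:          # zz[0] is DC, coded separately
        if value == 0:
            run += 1
            continue
        while run > 15:           # runs longer than 15 need ZRL markers
            pairs.append((15, 0))
            run -= 16
        pairs.append((run, value))
        run = 0
    if run:                       # trailing zeros collapse into one EOB
        pairs.append((0, 0))
    return pairs


block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0], block[1][1], block[0][2] = -26, -3, -3, -2, -6
zz = [block[r][c] for r, c in zigzag_order()]
print(zz[:6])             # [-26, -3, 0, -3, -2, -6]
print(run_length_ac(zz))  # [(0, -3), (1, -3), (0, -2), (0, -6), (0, 0)]
```

Fifty-eight trailing zeros cost a single EOB pair, which is where most of the savings come from.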

Why the artifacts appear

Now you can understand exactly where JPEG artifacts come from. [[Blocking artifacts]] appear because each 8×8 block is processed independently - at low quality, neighboring blocks can have noticeably different quantization errors, creating a visible grid pattern. The DC coefficients (block averages) of adjacent blocks should match smoothly, but quantization error makes them jagged.
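A rough illustration, assuming the extreme low-quality case where every AC coefficient quantizes to zero, so each 8×8 block keeps only its quantized DC term. The input is a gentle horizontal ramp (one level per pixel) spanning two blocks; the helper name and the DC step of 16 are illustrative choices.

```python
import math


def dc_only_reconstruct(image, dc_step):
    """Rebuild each 8x8 block from its quantized DC coefficient alone.

    Assumes image dimensions are multiples of 8. DC = (sum of level-shifted
    samples) / 8, i.e. 8 x the block mean, so a DC-only block decodes to a
    flat patch at (quantized DC) / 8 + 128.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, 8):
        for bx in range(0, w, 8):
            cells = [(y, x) for y in range(by, by + 8) for x in range(bx, bx + 8)]
            dc = sum(image[y][x] - 128 for y, x in cells) / 8
            level = math.floor(abs(dc) / dc_step + 0.5)
            dc_q = math.copysign(level * dc_step, dc)
            for y, x in cells:
                out[y][x] = dc_q / 8 + 128
    return out


ramp = [[100 + x for x in range(16)] for _ in range(8)]   # 8 rows, 16 columns
flat = dc_only_reconstruct(ramp, dc_step=16)
print(ramp[0][6:10])   # [106, 107, 108, 109]: steps of 1 everywhere
print(flat[0][6:10])   # [104.0, 104.0, 112.0, 112.0]: a jump of 8 at the block edge
```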

[[Ringing artifacts]] (halos around sharp edges, sometimes called Gibbs phenomenon or mosquito noise) happen because sharp edges need high frequencies to represent accurately. A perfect step edge requires infinite frequency content. When we discard high frequencies, we can't reconstruct sharp transitions - the math produces overshoots and ripples that look like ghostly echoes around text and hard edges.
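A one-dimensional sketch of the effect: take an 8-sample step edge (0 to 255), transform it with an orthonormal 8-point DCT, keep only the four lowest-frequency coefficients, and transform back. Zeroing the high frequencies stands in for coarse quantization; the cutoff of four is an arbitrary choice for illustration, and the function names are hypothetical.

```python
import math

N = 8


def dct_1d(signal):
    """Orthonormal 8-point DCT-II (the 1-D building block of JPEG's DCT)."""
    out = []
    for k in range(N):
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * sum(s * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                               for n, s in enumerate(signal)))
    return out


def idct_1d(coeffs):
    """Inverse of dct_1d (an orthonormal DCT-III)."""
    out = []
    for n in range(N):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
            total += scale * c * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
        out.append(total)
    return out


edge = [0, 0, 0, 0, 255, 255, 255, 255]
coeffs = dct_1d(edge)
kept = coeffs[:4] + [0.0] * 4          # discard the four highest frequencies
approx = idct_1d(kept)
print([round(v) for v in approx])      # overshoots above 255 and dips below 0 near the edge
```

A real decoder clamps results to 0..255, which hides the part of the overshoot outside that range but not the ripples that stay inside it.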

Color bleeding occurs because chroma subsampling spreads color across larger areas. A sharp red line on a white background might have its red bleed slightly into the white areas because the color information is at half resolution. This is most visible at sharp color boundaries.
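A one-row sketch, assuming full-range JFIF conversion and plain averaging of horizontal pixel pairs for Cb and Cr (the horizontal half of 4:2:0). A single red pixel sits among white ones; after its color is shared with its white neighbor and the row is converted back, the neighbor turns pink and the red itself is darkened. Chroma values are kept as unclamped floats for simplicity, and the helper names are hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range JFIF conversion, unclamped floats."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Inverse JFIF conversion, rounded and clamped to 8-bit integers."""
    rgb = (y + 1.402 * (cr - 128),
           y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128),
           y + 1.772 * (cb - 128))
    return tuple(max(0, min(255, round(v))) for v in rgb)


def halve_chroma_row(pixels):
    """Keep Y per pixel; share one averaged (Cb, Cr) per pair of pixels."""
    ycc = [rgb_to_ycbcr(*p) for p in pixels]
    out = []
    for i in range(0, len(ycc), 2):
        pair = ycc[i:i + 2]
        cb = sum(p[1] for p in pair) / len(pair)
        cr = sum(p[2] for p in pair) / len(pair)
        out.extend(ycbcr_to_rgb(p[0], cb, cr) for p in pair)
    return out


WHITE, RED = (255, 255, 255), (255, 0, 0)
print(halve_chroma_row([WHITE, WHITE, RED, WHITE]))
# [(255, 255, 255), (255, 255, 255), (166, 38, 38), (255, 217, 217)]
```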

Modern alternatives

JPEG is from 1992 - three decades old. We've learned enormous amounts about perception, compression, and computing since then. Modern formats achieve better quality at smaller sizes through more sophisticated techniques that weren't practical on 1990s hardware.

WebP (Google, 2010) uses 16×16 macroblocks that can be predicted as a whole or as 4×4 sub-blocks (vs JPEG's fixed 8×8 blocks), prediction from neighboring pixels, and better entropy coding. HEIC/HEIF (standardized by MPEG, adopted by Apple in 2017), based on the H.265 video codec, uses even more advanced prediction, larger transform sizes (up to 32×32), and in-loop deblocking filters that smooth block boundaries before the decoder outputs the image.

AVIF (2019), based on the AV1 video codec, is among the most efficient widely supported formats for lossy images. It uses film grain synthesis (encoding grain parameters instead of actual grain), many intra prediction modes (including predicting color from luminance), and multiple transform types. The complexity is enormous compared to JPEG, but the results speak for themselves.

Format	Year	Typical savings vs JPEG	Browser Support	Key Improvement
JPEG	1992	baseline	Universal	First practical lossy format
WebP	2010	25-35%	All modern	Variable blocks, prediction
HEIC	2017	40-50%	Safari, iOS	H.265 intra coding
AVIF	2019	50%+	Chrome, FF, Safari	AV1 technology, grain synthesis
JPEG XL	2021	60%+	Limited (Safari 17+; Chrome removed it in 2023)	Progressive, HDR, lossless option

Despite far more efficient alternatives, JPEG persists because of its universal support. Every browser, every image viewer, every camera, every printer, every operating system understands JPEG. That's the power of being first and being "good enough" - even 30+ years later, JPEG remains the lingua franca of images on the web.