Every digital photograph begins with photons - particles of light streaming through a lens and striking a silicon sensor. The journey from those photons to the JPEG file you share online involves physics, electronics, signal processing, and sophisticated compression algorithms. Understanding this pipeline reveals both the magic and the limitations of digital imaging.
The photoelectric effect
At the heart of every digital camera is an image sensor - a silicon chip covered with millions of tiny light-sensitive elements called photosites. When a photon strikes a photosite, it can knock an electron loose from the silicon lattice through the photoelectric effect. These freed electrons accumulate in a potential well, and the number collected over the exposure time represents the amount of light that hit that location.
Modern sensors are remarkably efficient. Quantum efficiency - the percentage of incident photons that successfully generate electrons - can exceed 90% for well-designed sensors. But efficiency varies with wavelength: most silicon sensors are more sensitive to red and near-infrared light than to blue.
Each photosite is essentially a tiny bucket collecting electrons. The bucket has a limited capacity - its full well capacity - typically tens of thousands of electrons for consumer cameras, hundreds of thousands for professional models. When the bucket overflows, excess electrons spill into neighboring pixels, causing the 'blooming' artifacts sometimes visible around extremely bright light sources. Modern sensors include anti-blooming drains to channel overflow electrons away before they contaminate neighbors.
The Bayer filter pattern
Silicon photosites cannot distinguish color - they only count photons regardless of wavelength. To capture color information, nearly all consumer cameras use a Color Filter Array (CFA) over the sensor. The most common pattern, invented by Bryce Bayer at Kodak in 1976, places red, green, and blue filters over individual photosites in a repeating pattern: RGGB, with twice as many green filters as red or blue.
[Diagram: Bayer filter pattern. Each photosite captures only one color channel; the RGGB layout uses twice as many green filters because human eyes are most sensitive to green wavelengths (peak sensitivity near 555 nm).]
The predominance of green filters matches human vision: we are most sensitive to green light and derive most of our luminance information from it. The Bayer pattern trades spatial resolution for color information - at each photosite, two of the three color channels are missing and must be estimated from neighbors.
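The RGGB tiling reduces to a parity check on the pixel coordinates. A minimal sketch, assuming the common convention that row 0 starts R G R G and row 1 starts G B G B:

```typescript
// Which filter color covers the photosite at (x, y) in an RGGB mosaic?
// Convention (assumed for illustration): row 0 = R G R G..., row 1 = G B G B...
function bayerColor(x: number, y: number): "R" | "G" | "B" {
  const evenRow = y % 2 === 0
  const evenCol = x % 2 === 0
  if (evenRow) return evenCol ? "R" : "G"
  return evenCol ? "G" : "B"
}
```

In every 2x2 tile this yields exactly one red, one blue, and two green photosites, matching the 2:1 green ratio described above.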
Alternative filter patterns exist. Fujifilm's X-Trans pattern uses a more complex 6x6 arrangement that reduces moiré artifacts without an optical low-pass filter. Some cameras aimed at astrophotography omit the filter array entirely for maximum sensitivity, capturing monochrome images. Foveon sensors stack three layers of silicon, each sensitive to different wavelengths, capturing full color at every pixel location - but this approach has struggled with noise and market adoption.
Demosaicing: reconstructing color
The process of estimating the missing color values is called demosaicing or debayering. At a red photosite, the camera must estimate what the green and blue values would have been. The simplest approach is bilinear interpolation - average the surrounding known values - but this produces visible artifacts along high-contrast edges.
Modern demosaicing algorithms are far more sophisticated. Edge-directed interpolation detects the orientation of edges and interpolates along them rather than across. Adaptive algorithms analyze local image content and switch strategies based on what they find. The best algorithms produce remarkably clean results, but all demosaicing is fundamentally making educated guesses - the true full-color information was never captured.
Demosaicing can produce several types of artifacts. Zipper artifacts appear as alternating light and dark pixels along edges. False colors appear when the algorithm misjudges edge orientation, creating colored fringes that do not exist in the original scene. Maze artifacts create a waffle-like pattern in areas of fine detail. These artifacts are most visible in high-frequency regions like fabric patterns, brick walls, or wire fences - which is why many cameras include an optical low-pass filter to slightly blur the image before it reaches the sensor.
// Simple bilinear demosaicing at a green photosite (x, y) in an RGGB mosaic
interface RGB { r: number; g: number; b: number }

function demosaicGreen(raw: number[][], x: number, y: number): RGB {
  // Green is measured directly at this photosite
  const g = raw[y][x]
  // A green pixel's horizontal and vertical neighbors hold the other two
  // colors; its diagonal neighbors are also green, so they are not used
  const horizontal = (raw[y][x - 1] + raw[y][x + 1]) / 2
  const vertical = (raw[y - 1][x] + raw[y + 1][x]) / 2
  // In even (R G R G...) rows red lies left/right and blue above/below;
  // in odd (G B G B...) rows the roles swap
  const onRedRow = y % 2 === 0
  const r = onRedRow ? horizontal : vertical
  const b = onRedRow ? vertical : horizontal
  return { r, g, b }
}

Analog to digital conversion
The electrons accumulated in each photosite must be converted to digital values. First, the charge is read out row by row, converted to a voltage by a charge amplifier. This analog signal then passes through an analog-to-digital converter (ADC) that outputs a digital number - typically 12 or 14 bits, giving 4096 or 16384 possible values.
The choice of bit depth matters for post-processing latitude. A 14-bit raw file has four times as many tonal levels as a 12-bit file (16,384 versus 4,096), which can make a visible difference when pushing shadows or recovering highlights. But more bits also mean larger files and slower processing.
The conversion process introduces read noise - random errors that occur every time the sensor is read. This noise sets a floor on the minimum detectable signal. Dynamic range - the ratio between the brightest and darkest tones a sensor can capture - is limited by full well capacity on the bright end and read noise on the dark end. Modern sensors achieve 12-14 stops of dynamic range, meaning the brightest recordable tone is roughly 4000-16000 times brighter than the darkest.
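The stops figure follows directly from the ratio of full well capacity to read noise. A quick sketch - the 50,000-electron well and 3-electron read noise below are illustrative assumptions, not measurements of any particular sensor:

```typescript
// Dynamic range in stops: log2(full well capacity / read noise floor).
function dynamicRangeStops(fullWellElectrons: number, readNoiseElectrons: number): number {
  return Math.log2(fullWellElectrons / readNoiseElectrons)
}

// A hypothetical sensor: 50,000-electron well, 3 electrons of read noise
const stops = dynamicRangeStops(50000, 3)  // ≈ 14 stops
```

Doubling the well capacity or halving the read noise each buys exactly one additional stop, which is why sensor designers attack both ends.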
Image processing pipeline
The raw sensor data must be processed before it becomes a viewable image. White balance adjusts the relative gains of the red, green, and blue channels to compensate for the color temperature of the illumination - so a white object appears white regardless of whether it was photographed under warm tungsten light or cool daylight.
Tone curves map the linear sensor response to a perceptually pleasing output. Noise reduction suppresses the random variations that become especially visible in shadows. Sharpening enhances edge contrast to counteract the softening effects of the lens and demosaicing. Color matrices transform the sensor's native color response to a standard color space like sRGB.
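The first two of these steps can be sketched in a few lines - white balance is a per-channel gain and a tone curve is a 1D mapping from linear to display values. The gains and the gamma of 2.2 below are illustrative assumptions, not any camera's actual values:

```typescript
// Per-channel white balance gains applied to a linear RGB pixel (values in 0..1).
function whiteBalance(rgb: [number, number, number], gains: [number, number, number]): [number, number, number] {
  return [rgb[0] * gains[0], rgb[1] * gains[1], rgb[2] * gains[2]]
}

// Simple gamma tone curve mapping linear sensor values to display values,
// clamped to the valid range before the power function.
function toneCurve(linear: number, gamma = 2.2): number {
  const clamped = Math.min(Math.max(linear, 0), 1)
  return Math.pow(clamped, 1 / gamma)
}
```

Real cameras use more elaborate curves (with a toe and shoulder) and full 3x3 color matrices rather than independent gains, but the structure is the same: per-channel scaling followed by a nonlinear remap.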
Modern cameras also correct for lens imperfections. Vignetting (darkening at corners) is compensated by brightening the edges. Distortion (barrel or pincushion warping) is mathematically unwrapped. Chromatic aberration (color fringing from wavelength-dependent refraction) is corrected by slightly scaling the color channels differently. These corrections happen automatically for known lenses, using profiles stored in camera firmware or raw processing software.
JPEG compression basics
JPEG compression begins with color space conversion: from RGB to Y'CbCr - a luminance channel (Y') plus two chrominance channels (Cb and Cr). Human vision is much more sensitive to luminance detail than to color detail, so the chrominance channels can be subsampled, reducing them to half or quarter resolution with minimal visible impact.
Chroma subsampling is described with ratios like 4:4:4, 4:2:2, and 4:2:0. The numbers represent, in a region four pixels wide, how many chroma samples exist in the first row and in the second row. 4:4:4 means full resolution for all channels (no subsampling). 4:2:2 means half horizontal chroma resolution. 4:2:0, the most common for JPEGs, means half resolution in both dimensions - one chroma sample for every four luminance samples. This alone halves the amount of pixel data to be compressed, with minimal quality loss for most images.
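The RGB to Y'CbCr step itself is a fixed linear transform. A sketch using the BT.601 coefficients that JFIF specifies for JPEG, with 8-bit inputs and the chroma channels centered on 128:

```typescript
// RGB (0..255) to Y'CbCr using the BT.601 coefficients JFIF specifies for JPEG.
function rgbToYCbCr(r: number, g: number, b: number): { y: number; cb: number; cr: number } {
  const y = 0.299 * r + 0.587 * g + 0.114 * b
  const cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
  const cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
  return { y, cb, cr }
}
```

For any neutral gray (r = g = b), both chroma channels land exactly on 128 - neutral tones carry no chroma information, which is part of why subsampling Cb and Cr costs so little.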
[Diagram: JPEG DCT compression. Lower quality means more aggressive quantization and fewer retained DCT coefficients; high-frequency detail (textures, edges) is discarded first because humans are less sensitive to those components.]
The Discrete Cosine Transform
JPEG divides the image into 8x8 pixel blocks and applies the Discrete Cosine Transform (DCT) to each block. The DCT converts the spatial pattern of pixel values into a set of frequency components - essentially describing the block as a sum of cosine waves at different frequencies and orientations.
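The transform itself is a direct double sum over the block. A straightforward (unoptimized - real encoders use fast factorizations) sketch of the forward 2D DCT-II on one 8x8 block:

```typescript
// Forward 2D DCT-II on an 8x8 block, as defined in the JPEG standard.
// Coefficient (0,0) is the DC term (block average, scaled); higher indices
// represent progressively finer spatial frequencies.
function dct8x8(block: number[][]): number[][] {
  const N = 8
  const alpha = (k: number) => (k === 0 ? Math.SQRT1_2 : 1)
  const out: number[][] = []
  for (let u = 0; u < N; u++) {
    out.push([])
    for (let v = 0; v < N; v++) {
      let sum = 0
      for (let y = 0; y < N; y++)
        for (let x = 0; x < N; x++)
          sum += block[y][x]
            * Math.cos(((2 * x + 1) * v * Math.PI) / (2 * N))
            * Math.cos(((2 * y + 1) * u * Math.PI) / (2 * N))
      out[u].push((alpha(u) * alpha(v) * sum) / 4)
    }
  }
  return out
}
```

A perfectly flat block produces a single nonzero DC coefficient and 63 zero AC coefficients - an extreme illustration of how the DCT concentrates smooth image content into a few low-frequency terms.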
The DCT is lossless - you can perfectly reconstruct the original block from its DCT coefficients. The compression comes from the next step: quantization. Each DCT coefficient is divided by a value from a quantization table and rounded to the nearest integer. This discards information, especially in high-frequency components that contribute less to perceived image quality.
The quantization table is where quality settings matter. Lower quality means larger quantization divisors, more aggressive rounding, and more information loss. The standard JPEG quantization tables apply stronger compression to high-frequency coefficients (which represent fine detail) than to low-frequency ones (which represent broad tonal variations). After quantization, many coefficients become zero, especially in the high-frequency positions, which is what makes the subsequent entropy coding so effective.
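Quantization and its inverse are each one line per coefficient. A sketch - the sample coefficients below are made up, and the divisors 16, 40, and 61 are entries from the first row of the standard JPEG luminance table:

```typescript
// Quantize DCT coefficients: divide by the table entry, round to nearest integer.
// This rounding is the lossy step of JPEG compression.
function quantize(coeffs: number[], table: number[]): number[] {
  return coeffs.map((c, i) => Math.round(c / table[i]))
}

// Dequantize on decode: multiply back, revealing what the rounding discarded.
function dequantize(quantized: number[], table: number[]): number[] {
  return quantized.map((q, i) => q * table[i])
}

// A large DC term survives; small high-frequency terms round to zero.
const q = quantize([-415, 12, -3], [16, 40, 61])  // → [-26, 0, 0]
```

Dequantizing gives back -416, 0, 0: the DC term is off by one table unit and the fine detail is gone entirely, but the resulting runs of zeros are exactly what makes the entropy coding stage so effective.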
Compression artifacts
Heavy JPEG compression produces characteristic artifacts. Blocking artifacts appear when the 8x8 DCT blocks become visible as a grid pattern. Ringing artifacts - halos around sharp edges - result from discarding the high-frequency components needed to represent sharp transitions. Color bleeding occurs in highly compressed images when chrominance subsampling smears color across edges.
Mosquito noise is a particularly annoying artifact that appears as shimmering or buzzing around sharp edges, especially in video. It results from the interaction between quantization and the block structure - the same edge might fall at different positions within its 8x8 block in adjacent frames, causing the artifacts to shift position. This is why video codecs use motion compensation to keep blocks aligned across frames.
Understanding these artifacts helps in choosing appropriate compression levels. For archival purposes, use high quality settings that preserve detail. For web delivery, moderate compression is usually acceptable. For thumbnails, aggressive compression is fine since the small size hides most artifacts.
Beyond JPEG
JPEG dates from 1992 and reflects the hardware constraints of that era. Modern codecs like HEIC (based on HEVC/H.265) and AVIF (based on AV1) achieve significantly better quality at the same file size - or smaller files at the same quality. They use larger variable-size blocks, more sophisticated prediction, and better entropy coding.
WebP, developed by Google, offers both lossy and lossless compression with smaller files than JPEG at equivalent quality. AVIF pushes even further, achieving impressive compression ratios but at the cost of encode time - great for serving images, slow for editing workflows. JPEG XL is an emerging standard that combines excellent compression with features photographers want: lossless transcoding of existing JPEGs, high bit depth, and wide gamut support. The challenge is not creating better formats - it is getting browsers, cameras, and software to support them.
The photon-to-JPEG pipeline is a remarkable chain of physical, analog, and digital processes - each step introducing constraints and trade-offs that propagate through to the final image. Understanding this pipeline helps photographers make better technical decisions and helps engineers build better imaging systems.