Decoding QR codes

How those squares store data


Those black and white squares are everywhere - restaurant menus, business cards, payment systems, vaccine passports. QR codes seem like magic: point your camera at a jumble of pixels and suddenly you're on a website. But there's no magic, just clever engineering that packs redundancy, error correction, and elegant geometry into a 2D barcode that can survive coffee stains and crumpled paper. Understanding how QR codes work reveals beautiful mathematics hiding in plain sight.

A brief history: from factory floors to everywhere

QR codes were invented in 1994 by Masahiro Hara at Denso Wave, a Japanese automotive company, to track vehicle parts during manufacturing. Traditional 1D barcodes were too limited - they could only store about 20 characters and had to be scanned at specific angles. Hara's team needed something that could store more data, be scanned quickly from any direction, and survive the harsh factory environment.

The name stands for 'Quick Response' - the primary design goal was speed. While regular barcodes encode data in one dimension (varying line widths), QR codes use both dimensions, dramatically increasing capacity. A typical 1D barcode holds 20-25 characters; a QR code can store up to 4,296 alphanumeric characters or 7,089 digits. The distinctive finder patterns in the corners allow instant orientation detection, living up to the 'quick' promise.

Denso Wave made a crucial decision: they patented the technology but chose not to exercise the patent rights. This openness allowed QR codes to spread freely. After initially being used mainly in Japanese manufacturing and logistics, QR codes exploded into mainstream use with the smartphone era. The 2020 pandemic accelerated adoption further, as contactless menus and digital health passes became ubiquitous.

Anatomy of a QR code: every square has a purpose

Every QR code consists of several distinct regions, each serving a specific purpose in the detection and decoding process. Understanding these components reveals why QR codes are so robust - nearly every design choice optimizes for reliable scanning under adverse conditions.

Finder patterns: the eyes that never blink

The three large squares in the corners are [[finder patterns]] - they're why QR codes are instantly recognizable. These 7x7 module patterns have a specific ratio of black:white:black:white:black (1:1:3:1:1) that's unlikely to occur naturally in other images. Scanners look for this ratio in horizontal, vertical, and diagonal scans to locate the code regardless of orientation or perspective.

Why three corners and not four? With three finder patterns, the scanner can determine both the location AND orientation of the code unambiguously. The fourth corner is intentionally left open - this asymmetry lets you always know which corner is 'top-left' because it's the only one adjacent to two other finder patterns. This was a key insight from Hara's team: the minimum information needed for complete orientation.

FINDER PATTERN STRUCTURE (7x7 modules)

█ █ █ █ █ █ █     <- Row of 7 black
█ ░ ░ ░ ░ ░ █     <- Black, 5 white, black
█ ░ █ █ █ ░ █     <- Black, white, 3 black, white, black
█ ░ █ █ █ ░ █     <- Black, white, 3 black, white, black (center row)
█ ░ █ █ █ ░ █     <- Black, white, 3 black, white, black
█ ░ ░ ░ ░ ░ █     <- Black, 5 white, black  
█ █ █ █ █ █ █     <- Row of 7 black

Cross-section ratio (any line through center):
Black : White : Black : White : Black
  1   :   1   :   3   :   1   :   1

This ratio is checked in all directions during scanning.
The 1:1:3:1:1 ratio is distinctive enough to avoid false positives.
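
The ratio check can be sketched in a few lines of JavaScript. This is a minimal illustration, not how any particular scanner implements it: the function names and the `tolerance` parameter (allowed error as a fraction of one module) are assumptions made for the example.

```javascript
// Hypothetical helper: do five run lengths (dark, light, dark, light, dark)
// look like 1:1:3:1:1? `tolerance` is an assumed fraction of one module.
function looksLikeFinder(runs, tolerance = 0.5) {
  if (runs.length !== 5 || runs.some((n) => n <= 0)) return false;
  const total = runs.reduce((sum, n) => sum + n, 0);
  const moduleSize = total / 7; // 1 + 1 + 3 + 1 + 1 = 7 modules
  const expected = [1, 1, 3, 1, 1];
  return runs.every(
    (n, i) =>
      Math.abs(n - expected[i] * moduleSize) <=
      expected[i] * tolerance * moduleSize
  );
}

// Split a binarized scan line (1 = dark, 0 = light) into runs.
function runLengths(line) {
  const runs = [];
  for (let i = 0; i < line.length; i++) {
    if (i > 0 && line[i] === line[i - 1]) runs[runs.length - 1].length++;
    else runs.push({ value: line[i], length: 1 });
  }
  return runs;
}

// Return the center position of every dark-light-dark-light-dark window
// on the line that matches the finder ratio.
function findFinderCandidates(line) {
  const runs = runLengths(line);
  const centers = [];
  let offset = 0;
  for (let i = 0; i < runs.length; i++) {
    if (runs[i].value === 1 && i + 5 <= runs.length) {
      const lengths = runs.slice(i, i + 5).map((r) => r.length);
      if (looksLikeFinder(lengths)) {
        const width = lengths.reduce((sum, n) => sum + n, 0);
        centers.push(offset + width / 2);
      }
    }
    offset += runs[i].length;
  }
  return centers;
}
```

Because the check only compares run lengths against their own total, it works at any scale: a finder pattern that is 7 pixels wide and one that is 70 pixels wide pass the same test.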

Alignment patterns: fighting distortion

Larger QR codes (Version 2 and above) include [[alignment patterns]] - smaller 5x5 patterns distributed throughout the code. These help correct for perspective distortion when the code is photographed at an angle, wrapped around a curved surface, or slightly warped by printing imperfections. The scanner uses these reference points to 'unwarp' the image before reading data.

The number of alignment patterns increases with the QR code version. Version 2 has one alignment pattern; Version 7 has six; Version 40 has forty-six. They're carefully positioned to avoid overlapping with finder patterns and to provide evenly distributed reference points across the code surface.
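
The counts above follow from a simple rule: alignment centers sit on a square grid of coordinates, and the three grid points that would collide with finder patterns are skipped. A minimal sketch, with `alignmentPatternCount` as a hypothetical helper name:

```javascript
// Number of alignment patterns for a given version (1-40).
// Version 1 has none; later versions use a grid of
// floor(version / 7) + 2 coordinates per axis, minus the three
// grid points that fall on finder patterns.
function alignmentPatternCount(version) {
  if (!Number.isInteger(version) || version < 1 || version > 40) {
    throw new RangeError("version must be an integer from 1 to 40");
  }
  if (version === 1) return 0;
  const perAxis = Math.floor(version / 7) + 2;
  return perAxis * perAxis - 3;
}

console.log(alignmentPatternCount(2), alignmentPatternCount(7), alignmentPatternCount(40));
```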

Timing patterns: establishing the grid

Alternating black and white modules run in one row and one column between the finder patterns, forming an L shape along the top and left sides of the code. These [[timing patterns]] help the scanner determine the exact module (pixel) size and count. Even if the code is slightly stretched, compressed, or photographed at an angle, the timing patterns provide a reliable reference for establishing the grid structure.

Format and version information: metadata zones

A 15-bit [[format string]] near the finder patterns encodes two critical pieces of information: the error correction level (L/M/Q/H) and which mask pattern was applied to the data. This information is duplicated in two locations for redundancy - if one copy is damaged, the other can be read. The format string itself is protected by a BCH error-correcting code: 5 data bits followed by 10 check bits.
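
As a rough sketch of how those 15 bits are produced: the 5 data bits (2 for the EC level, 3 for the mask) are extended with a 10-bit BCH remainder, and the result is XORed with a fixed pattern. The generator polynomial `0x537` and the XOR pattern `0x5412` (binary 101010000010010) are the constants the QR specification uses; the function and variable names below are made up for illustration.

```javascript
// 2-bit indicators for each error correction level.
const EC_LEVEL_BITS = new Map([["L", 0b01], ["M", 0b00], ["Q", 0b11], ["H", 0b10]]);

// Build the 15-bit format string for an EC level and mask number (0-7).
function formatString(ecLevel, mask) {
  if (!EC_LEVEL_BITS.has(ecLevel)) throw new RangeError("EC level must be L, M, Q or H");
  if (!Number.isInteger(mask) || mask < 0 || mask > 7) {
    throw new RangeError("mask must be an integer from 0 to 7");
  }
  const data = (EC_LEVEL_BITS.get(ecLevel) << 3) | mask; // 5 data bits
  // Polynomial division by the BCH generator 0x537, one bit at a time.
  let remainder = data;
  for (let i = 0; i < 10; i++) {
    remainder = (remainder << 1) ^ ((remainder >>> 9) * 0x537);
  }
  // Append the 10 check bits, then XOR so the result is never all zeros.
  const bits = ((data << 10) | remainder) ^ 0x5412;
  return bits.toString(2).padStart(15, "0");
}

console.log(formatString("M", 0)); // level M with mask 0 leaves only the XOR pattern
```

With only 32 possible inputs, a decoder does not need algebraic BCH decoding: it can compare the bits it read against all 32 valid strings and pick the closest one.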

Larger codes (Version 7 and above) also include [[version information]] - an 18-bit string that specifies exactly which of the 40 possible QR code sizes is being used. This is also duplicated and error-protected. Having version information embedded means the scanner doesn't have to guess the size by counting modules - crucial for handling low-resolution or partially obscured images.

Encoding modes: optimizing for your data

QR codes support multiple [[encoding modes]] optimized for different types of data. The mode affects how efficiently data is packed into the available space. Smart mode selection can significantly increase capacity for specific data types.

| Mode | Characters supported | Bits per char | Best for |
|---|---|---|---|
| Numeric | 0-9 only | 3.33 | Phone numbers, product IDs, ZIP codes |
| Alphanumeric | 0-9, A-Z (uppercase), space, $%*+-./: | 5.5 | URLs (uppercase), short codes |
| Byte | ISO 8859-1 (Latin-1) or UTF-8 with ECI | 8 | General text, binary data, URLs with lowercase |
| Kanji | Shift JIS double-byte characters | 13 | Japanese text (13 bits per character instead of 16 in byte mode) |

A single QR code can switch between modes to optimize capacity. For example, encoding 'TEL:5551234567' might use alphanumeric for 'TEL:' and numeric for the phone number. The mode indicator (4 bits) precedes each data segment, followed by a character count indicator whose length varies by mode and version.
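
To see whether a mode switch pays off, it helps to count the header bits as well as the payload. The sketch below is an illustration only: `segmentBits` and `payloadBits` are hypothetical names, Kanji is left out, and the character-count widths are the standard ones for versions 1-9, 10-26 and 27-40.

```javascript
// Character-count field widths for versions 1-9, 10-26 and 27-40.
const COUNT_BITS = {
  numeric: [10, 12, 14],
  alphanumeric: [9, 11, 13],
  byte: [8, 16, 16],
};

// Payload size in bits for `length` characters in the given mode.
function payloadBits(mode, length) {
  switch (mode) {
    case "numeric": // 3 digits -> 10 bits, 2 -> 7 bits, 1 -> 4 bits
      return 10 * Math.floor(length / 3) + [0, 4, 7][length % 3];
    case "alphanumeric": // 2 chars -> 11 bits, 1 -> 6 bits
      return 11 * Math.floor(length / 2) + 6 * (length % 2);
    case "byte":
      return 8 * length;
    default:
      throw new RangeError(`unsupported mode: ${mode}`);
  }
}

// Total segment size: 4-bit mode indicator + count field + payload.
function segmentBits(mode, length, version = 1) {
  const payload = payloadBits(mode, length);
  const tier = version <= 9 ? 0 : version <= 26 ? 1 : 2;
  return 4 + COUNT_BITS[mode][tier] + payload;
}

// 'TEL:5551234567' as one alphanumeric segment vs. alphanumeric + numeric.
const single = segmentBits("alphanumeric", 14);
const mixed = segmentBits("alphanumeric", 4) + segmentBits("numeric", 10);
console.log({ single, mixed });
```

For this input the mixed encoding comes out slightly smaller despite paying for a second header, because ten digits are long enough to amortize it.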

ENCODING EFFICIENCY EXAMPLE

Data: "HELLO123"
Contains: Letters (A-Z) and digits (0-9)

Option 1: All Byte mode
  8 characters × 8 bits = 64 bits

Option 2: All Alphanumeric mode  
  8 characters × 5.5 bits = 44 bits (31% smaller!)

Option 3: Mixed modes
  "HELLO" in Alphanumeric: 2 pairs × 11 bits + 6 bits = 28 bits
  "123" in Numeric: 10 bits
  Extra segment header: 4-bit mode indicator + 10-bit character count = 14 bits
  Total: 28 + 10 + 14 = 52 bits, larger than pure alphanumeric

Best choice: Pure alphanumeric (mode switching not worth it for this data)
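
The 3.33 bits per digit figure for numeric mode comes from packing three decimal digits into one 10-bit number. A minimal sketch of that packing, with `encodeNumeric` as a hypothetical helper that returns the payload as a string of '0' and '1' characters (no mode indicator or character count):

```javascript
// Pack a digit string into numeric-mode payload bits.
// Full groups of three digits take 10 bits; a trailing group of
// two digits takes 7 bits and a single trailing digit takes 4 bits.
function encodeNumeric(digits) {
  if (!/^[0-9]*$/.test(digits)) {
    throw new RangeError("numeric mode accepts digits 0-9 only");
  }
  let bits = "";
  for (let i = 0; i < digits.length; i += 3) {
    const group = digits.slice(i, i + 3);
    const width = [0, 4, 7, 10][group.length];
    bits += Number(group).toString(2).padStart(width, "0");
  }
  return bits;
}

console.log(encodeNumeric("123"));
```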

NUMERIC MODE ENCODING:
Groups digits into threes, encodes each group as 10-bit number
"123" → 123 → 0001111011 (10 bits for 3 digits!)
"45"  → 45  → 0101101    (7 bits for 2 digits)  
"6"   → 6   → 0110       (4 bits for 1 digit)

Error correction: the mathematical miracle

The most impressive feature of QR codes is their ability to remain readable even when partially damaged. This uses [[Reed-Solomon error correction]], the same algorithm used in CDs, DVDs, deep-space communication, and RAID storage systems. It's arguably the most important enabling technology for QR codes' real-world reliability.

| Level | Name | Recovery capacity | Overhead | Use case |
|---|---|---|---|---|
| L | Low | ~7% damage | ~20% extra data | Clean environment, maximum data capacity |
| M | Medium | ~15% damage | ~38% extra data | General purpose (default) |
| Q | Quartile | ~25% damage | ~55% extra data | Industrial, outdoor, rough handling |
| H | High | ~30% damage | ~65% extra data | Harsh conditions, decorative logos |

Reed-Solomon works by treating the data as coefficients of a polynomial and evaluating it at multiple points. The genius is that from enough points, the original polynomial (and thus the data) can be reconstructed even if some points are wrong or missing. It can correct up to t errors if 2t redundant symbols are added - this mathematical guarantee is why QR codes can promise specific recovery percentages.

REED-SOLOMON ERROR CORRECTION (simplified)

1. VIEW DATA AS POLYNOMIAL
   Data bytes: [67, 85, 70, 69]
   Polynomial: 67x³ + 85x² + 70x + 69

2. GENERATE ERROR CORRECTION CODEWORDS
   Divide data polynomial by generator polynomial
   Remainder = error correction codewords
   These are appended to the data

3. TRANSMIT/ENCODE
   Send: [data bytes] + [error correction bytes]
   In QR: Interleave across blocks for burst error resistance

4. ON DECODE: DETECT AND CORRECT
   Check if received polynomial evaluates correctly at test points
   If errors found, solve system of equations to locate and fix them

WHY IT WORKS:
- Any k points uniquely determine a polynomial of degree k-1
- With n=k+2t points, can reconstruct even if t are wrong
- Math guarantees: can ALWAYS correct up to t errors

EXAMPLE: Level M with ~15% recovery
- For every 100 data symbols, add ~38 error correction symbols
- 38 EC symbols can correct up to 19 errors at unknown positions (t = 38 / 2)
- 19 of the 138 total symbols is about 14%, which is where "15% recovery" comes from
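
Steps 1, 2 and the detection half of step 4 can be sketched compactly. The code below assumes the field QR codes use, GF(256) with reducing polynomial 0x11D, and a generator polynomial whose roots are successive powers of 2. Function names are illustrative, and locating and fixing errors (the hard part of decoding) is deliberately left out.

```javascript
// Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D).
function gfMul(a, b) {
  let product = 0;
  for (let i = 7; i >= 0; i--) {
    product = (product << 1) ^ ((product >>> 7) * 0x11d);
    product ^= ((b >>> i) & 1) * a;
  }
  return product;
}

// Generator polynomial (x - 2^0)(x - 2^1)...(x - 2^(n-1)) over GF(256),
// highest power first, with the leading coefficient 1 omitted.
function rsGenerator(ecCount) {
  const gen = new Array(ecCount).fill(0);
  gen[ecCount - 1] = 1;
  let root = 1;
  for (let i = 0; i < ecCount; i++) {
    for (let j = 0; j < ecCount; j++) {
      gen[j] = gfMul(gen[j], root);
      if (j + 1 < ecCount) gen[j] ^= gen[j + 1];
    }
    root = gfMul(root, 2);
  }
  return gen;
}

// EC codewords = remainder of data(x) * x^n divided by the generator.
function rsEncode(data, ecCount) {
  const gen = rsGenerator(ecCount);
  const rem = new Array(ecCount).fill(0);
  for (const byte of data) {
    const factor = byte ^ rem.shift();
    rem.push(0);
    gen.forEach((coef, i) => {
      rem[i] ^= gfMul(coef, factor);
    });
  }
  return rem;
}

// Syndromes: evaluate the received codeword at each generator root.
// All zeros means no detectable error.
function rsSyndromes(codeword, ecCount) {
  const syndromes = [];
  let root = 1;
  for (let i = 0; i < ecCount; i++) {
    syndromes.push(codeword.reduce((acc, byte) => gfMul(acc, root) ^ byte, 0));
    root = gfMul(root, 2);
  }
  return syndromes;
}

const data = [67, 85, 70, 69];
const codeword = data.concat(rsEncode(data, 4));
console.log(rsSyndromes(codeword, 4)); // all zeros for an undamaged codeword
```

Flipping any byte of `codeword` makes at least one syndrome non-zero. A full decoder would then use those syndromes to locate and repair up to `ecCount / 2` damaged bytes, typically with the Berlekamp-Massey or Euclidean algorithm.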

Data masking: avoiding optical illusions

Raw encoded data might accidentally create patterns that look like finder patterns or timing patterns, confusing the scanner. Large areas of the same color can also cause scanning problems, because the scanner relies on frequent dark/light transitions to threshold the image and keep track of module boundaries. To prevent this, QR codes apply one of eight [[mask patterns]] to the data area.

The encoder tries all eight masks and chooses the one that minimizes a penalty score. The penalty function considers: runs of same-color modules, 2x2 blocks of same color, patterns resembling finder patterns, and overall color imbalance. The chosen mask number is stored in the format information so the decoder can reverse it.

// The 8 QR code mask patterns
// Each formula determines if module at (row, col) should be inverted

const masks = [
  (r, c) => (r + c) % 2 === 0,           // Mask 0: checkerboard
  (r, c) => r % 2 === 0,                  // Mask 1: horizontal lines
  (r, c) => c % 3 === 0,                  // Mask 2: vertical stripes
  (r, c) => (r + c) % 3 === 0,            // Mask 3: diagonal stripes
  (r, c) => (Math.floor(r/2) + Math.floor(c/3)) % 2 === 0,  // Mask 4
  (r, c) => (r*c) % 2 + (r*c) % 3 === 0,  // Mask 5
  (r, c) => ((r*c) % 2 + (r*c) % 3) % 2 === 0,  // Mask 6
  (r, c) => ((r+c) % 2 + (r*c) % 3) % 2 === 0   // Mask 7
]

// PENALTY SCORING (applied to find best mask)
// Rule 1: 5+ same-color in row/column → 3 + (count-5) points
// Rule 2: 2×2 same-color block → 3 points each
// Rule 3: Pattern matching finder (1:1:3:1:1) → 40 points each
// Rule 4: Color imbalance → 10 points per 5% deviation from 50%

// The mask with LOWEST total penalty wins
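
A small sketch of how a mask is applied and how Rule 1 is scored for a single row or column. `applyMask`, `runPenalty` and the `isData` callback (which would mark the modules that belong to the data area) are hypothetical names for this example; a real encoder would also score Rules 2-4 over the whole matrix.

```javascript
// Toggle every data module for which the mask predicate is true.
// `isData(r, c)` is an assumed callback; by default every module counts as data.
function applyMask(matrix, predicate, isData = () => true) {
  return matrix.map((row, r) =>
    row.map((bit, c) => (isData(r, c) && predicate(r, c) ? bit ^ 1 : bit))
  );
}

// Rule 1 for one line: each run of 5 or more same-color modules
// costs 3 points plus 1 for every module beyond the fifth.
function runPenalty(line) {
  let penalty = 0;
  let run = 1;
  for (let i = 1; i <= line.length; i++) {
    if (i < line.length && line[i] === line[i - 1]) {
      run++;
    } else {
      if (run >= 5) penalty += 3 + (run - 5);
      run = 1;
    }
  }
  return penalty;
}

const checkerboard = (r, c) => (r + c) % 2 === 0; // Mask 0
const blank = [[0, 0, 0, 0, 0, 0, 0]];
const masked = applyMask(blank, checkerboard);
console.log(runPenalty(blank[0]), runPenalty(masked[0]));
```

Because masking is a plain XOR, applying the same mask a second time restores the original modules, which is exactly what the decoder does.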

The complete encoding process

Let's trace how the text 'HELLO' gets encoded into a QR code, step by step:

ENCODING "HELLO" INTO A QR CODE

1. ANALYZE DATA
   "HELLO" contains only uppercase letters
   Best mode: Alphanumeric (5.5 bits/char vs 8 for byte)
   
2. DETERMINE VERSION AND ERROR CORRECTION
   5 characters in alphanumeric = 28 bits of data
   Version 1 with EC Level M can hold 20 alphanumeric chars ✓
   
3. BUILD DATA BITSTREAM
   Mode indicator:    0010 (alphanumeric mode)
   Char count:        000000101 (5 characters, 9 bits for V1 alphanum)
   
   Encode character pairs (each pair → 11 bits):
   H=17, E=14 → 17×45 + 14 = 779 → 01100001011
   L=21, L=21 → 21×45 + 21 = 966 → 01111000110
   Final O=24 (odd char) → 6 bits: 011000
   
   Data: 0010 000000101 01100001011 01111000110 011000
   
4. ADD TERMINATOR AND PADDING
   Terminator: 0000 (end of data)
   Pad to byte boundary with zeros
   Fill remaining capacity with 11101100 00010001 alternating

5. CALCULATE ERROR CORRECTION
   Version 1-M uses RS(26,16,10) - 26 total, 16 data, 10 EC
   Divide data polynomial by generator polynomial
   Append 10 EC codewords
   
6. STRUCTURE FINAL MESSAGE
   Interleave data and EC codewords (for V1, just one block)
   Add remainder bits if needed (0 for V1)
   
7. PLACE IN MATRIX
   Reserve areas for finder, timing, format, version info
   Place data in zigzag pattern from bottom-right
   
8. APPLY BEST MASK
   Try all 8 masks, calculate penalty for each
   Apply mask with lowest penalty
   
9. ADD FORMAT INFORMATION
   EC level (M=00) + Mask pattern → 5 bits
   BCH error protection → 10 bits  
   XOR with 101010000010010 (ensures no all-zero format)
   Place around finder patterns (twice for redundancy)
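
Steps 3 and 4 of the walkthrough are easy to reproduce in code. The following is a sketch under the walkthrough's own assumptions (alphanumeric mode, a 9-bit character count, 16 data codewords for Version 1-M); `encodeAlphanumeric` and `buildDataCodewords` are illustrative names.

```javascript
// Alphanumeric character set; a character's index is its value (H = 17, O = 24).
const ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:";

// Pairs of characters become 11-bit numbers, a final odd character 6 bits.
function encodeAlphanumeric(text) {
  let bits = "";
  for (let i = 0; i < text.length; i += 2) {
    const values = [...text.slice(i, i + 2)].map((ch) => ALPHANUMERIC.indexOf(ch));
    if (values.includes(-1)) {
      throw new RangeError("character outside the alphanumeric set");
    }
    bits += values.length === 2
      ? (values[0] * 45 + values[1]).toString(2).padStart(11, "0")
      : values[0].toString(2).padStart(6, "0");
  }
  return bits;
}

// Mode indicator + count + payload, then terminator, byte alignment and
// alternating pad bytes up to the block's data capacity.
function buildDataCodewords(text, dataCodewords = 16, countBits = 9) {
  const capacity = dataCodewords * 8;
  let bits = "0010" + text.length.toString(2).padStart(countBits, "0") +
    encodeAlphanumeric(text);
  if (bits.length > capacity) throw new RangeError("data does not fit");
  bits += "0".repeat(Math.min(4, capacity - bits.length)); // terminator
  bits += "0".repeat((8 - (bits.length % 8)) % 8); // pad to byte boundary
  const bytes = bits.match(/.{8}/g).map((b) => parseInt(b, 2));
  for (let pad = 0xec; bytes.length < dataCodewords; pad ^= 0xec ^ 0x11) {
    bytes.push(pad); // alternates 11101100, 00010001
  }
  return bytes;
}

console.log(buildDataCodewords("HELLO"));
```

For 'HELLO' only 6 of the 16 data codewords carry information; the remaining 10 are pad bytes, which is why such a short message fits Version 1 even at higher error correction levels.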

QR code versions and capacity

QR codes come in 40 [[versions]], from 21×21 modules (Version 1) to 177×177 modules (Version 40). Each version increase adds 4 modules per side, providing more data capacity but requiring higher resolution to scan reliably. The version is typically chosen automatically based on data length and error correction level.

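| Version | Modules | Numeric (L) | Alphanumeric (L) | Bytes (L) | Typical use |
|---|---|---|---|---|---|
| 1 | 21×21 | 41 | 25 | 17 | Short URLs, WiFi passwords |
| 5 | 37×37 | 255 | 154 | 106 | Business cards, product IDs |
| 10 | 57×57 | 652 | 395 | 271 | Long URLs, contact info |
| 20 | 97×97 | 2,061 | 1,249 | 858 | Large data, documents |
| 40 | 177×177 | 7,089 | 4,296 | 2,953 | Maximum capacity |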
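
The relationship between version and size is linear, so it can be written both ways in a couple of lines. The helper names below are made up for illustration.

```javascript
// Side length in modules for a version: 21 for Version 1, plus 4 per step.
function modulesPerSide(version) {
  if (!Number.isInteger(version) || version < 1 || version > 40) {
    throw new RangeError("version must be an integer from 1 to 40");
  }
  return 17 + 4 * version;
}

// Inverse: recover the version from a measured module count.
function versionFromModules(size) {
  const version = (size - 17) / 4;
  if (!Number.isInteger(version) || version < 1 || version > 40) {
    throw new RangeError(`${size} is not a valid QR code size`);
  }
  return version;
}

console.log(modulesPerSide(10), versionFromModules(97));
```

This inverse is what a scanner relies on for small codes, where the module count is estimated from the image rather than read from embedded version information.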

The scanning and decoding pipeline

Modern smartphone cameras decode QR codes through a sophisticated pipeline that happens in milliseconds. Understanding this process explains why some QR codes scan easily while others frustrate users.

QR CODE SCANNING PIPELINE

1. IMAGE ACQUISITION
   └── Camera captures frame (typically 720p+ for reliable scanning)
   └── Image may be blurry, tilted, partially obscured
   
2. PREPROCESSING
   └── Convert to grayscale
   └── Apply adaptive thresholding (handle varying lighting)
   └── Optional: edge enhancement, noise reduction
   
3. FINDER PATTERN DETECTION
   └── Scan rows for 1:1:3:1:1 black/white ratio
   └── Scan columns for same ratio
   └── Find intersections → candidate finder centers
   └── Validate: three patterns forming valid triangle?
   └── Check relative sizes consistent?
   
4. PERSPECTIVE CORRECTION
   └── Use the three finder centers plus a fourth reference point (an alignment pattern or the estimated fourth corner) to compute a homography
   └── Apply the homography to "unwarp" the image to a square
   └── For larger codes: use alignment patterns for refinement
   
5. GRID SAMPLING
   └── Read timing patterns to determine module count
   └── Calculate grid cell size
   └── Sample center of each cell → black (1) or white (0)
   └── Handle borderline cases with confidence scoring
   
6. FORMAT DECODING
   └── Read format bits near finder patterns
   └── Apply BCH error correction
   └── Extract: EC level (2 bits) + mask pattern (3 bits)
   └── If first copy fails, try redundant copy
   
7. DATA EXTRACTION
   └── XOR data area with mask pattern to remove masking
   └── Read bits in zigzag pattern from bottom-right
   └── Separate data codewords and EC codewords
   └── De-interleave if multiple blocks
   
8. ERROR CORRECTION
   └── For each block: apply Reed-Solomon decoding
   └── Detect errors via syndrome calculation
   └── Locate and correct errors (if within capacity)
   └── If errors exceed capacity → scan failed
   
9. DATA DECODING
   └── Parse mode indicators
   └── For each segment:
       └── Read character count
       └── Decode characters per mode (numeric/alphanum/byte/kanji)
   └── Concatenate segments
   └── Return decoded string!
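
Step 9 is the mirror image of encoding. The sketch below parses a corrected bitstream (given as a string of '0' and '1') for versions 1-9 and handles only numeric and alphanumeric segments; `decodeSegments` is a hypothetical name and byte, Kanji and ECI modes are left out.

```javascript
const ALPHANUMERIC_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:";

// Decode numeric and alphanumeric segments (character-count widths for versions 1-9).
function decodeSegments(bits) {
  let pos = 0;
  const take = (n) => {
    if (pos + n > bits.length) throw new RangeError("bitstream ended early");
    const value = parseInt(bits.slice(pos, pos + n), 2);
    pos += n;
    return value;
  };

  let text = "";
  while (bits.length - pos >= 4) {
    const mode = take(4);
    if (mode === 0b0000) break; // terminator: the rest is padding
    if (mode === 0b0001) {
      // Numeric: groups of 3 digits in 10 bits, then 7 or 4 bits for the rest.
      let count = take(10);
      for (; count >= 3; count -= 3) text += String(take(10)).padStart(3, "0");
      if (count === 2) text += String(take(7)).padStart(2, "0");
      if (count === 1) text += String(take(4));
    } else if (mode === 0b0010) {
      // Alphanumeric: pairs in 11 bits, a final single character in 6 bits.
      let count = take(9);
      for (; count >= 2; count -= 2) {
        const pair = take(11);
        text += ALPHANUMERIC_CHARS[Math.floor(pair / 45)] + ALPHANUMERIC_CHARS[pair % 45];
      }
      if (count === 1) text += ALPHANUMERIC_CHARS[take(6)];
    } else {
      throw new RangeError(`mode ${mode.toString(2).padStart(4, "0")} is not handled in this sketch`);
    }
  }
  return text;
}

const helloBits = "0010" + "000000101" + "01100001011" + "01111000110" + "011000" + "0000";
console.log(decodeSegments(helloBits));
```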

Security considerations: QR codes as attack vectors

QR codes are essentially machine-readable URLs or data, which makes them potential vectors for attacks. A malicious QR code could redirect to a phishing site, trigger automatic downloads, inject commands into vulnerable apps, or social-engineer users into dangerous actions.

QR CODE SECURITY THREATS

1. PHISHING
   - QR leads to fake login page mimicking legitimate service
   - User enters credentials, attacker captures them
   - Defense: Always preview URL before opening

2. STICKER ATTACKS
   - Attacker places malicious QR sticker over legitimate one
   - Common on parking meters, restaurant tables
   - Defense: Check for stickers, verify URL domain

3. MALWARE DELIVERY
   - QR links to malicious APK/app download
   - May exploit auto-download on some browsers
   - Defense: Never install apps from QR codes

4. WIFI CREDENTIAL THEFT
   - Fake "free wifi" QR actually joins attacker's network
   - All traffic then passes through attacker
   - Defense: Verify network legitimacy independently

5. PAYMENT FRAUD
   - QR code for payment modified to attacker's account
   - Common in regions with QR-based payment systems
   - Defense: Verify recipient before confirming payment

6. QUISHING (QR + Phishing)
   - QR image in an email can slip past scanners that only check text links
   - Many email security filters do not decode QR code images
   - Defense: Treat QR codes in emails with extra suspicion

SAFE SCANNING PRACTICES:
- Use camera app that shows URL preview
- Verify domain before tapping
- Be suspicious of stickers or tampering
- Never scan QR codes in unsolicited emails
- Don't assume QR in physical location is legitimate
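
A scanner app can automate part of this checklist before it offers to open anything. The sketch below is only an illustration of the idea: `inspectQrPayload` and its `trustedHosts` parameter are hypothetical, the checks are far from complete, and it relies on the standard `URL` class available in browsers and Node.js.

```javascript
// Classify a decoded QR payload and collect simple warnings for the user.
// `trustedHosts` is an assumed optional allow-list of exact hostnames.
function inspectQrPayload(payload, trustedHosts = []) {
  let url;
  try {
    url = new URL(payload);
  } catch {
    return { kind: "text", host: null, warnings: [] }; // not a URL at all
  }
  const warnings = [];
  if (url.protocol !== "https:") {
    warnings.push(`unexpected scheme ${url.protocol}`);
  }
  if (url.username || url.password) {
    warnings.push("URL embeds credentials before the host name");
  }
  if (trustedHosts.length > 0 && !trustedHosts.includes(url.hostname)) {
    warnings.push(`host ${url.hostname} is not in the trusted list`);
  }
  return { kind: "url", host: url.hostname, warnings };
}

console.log(inspectQrPayload("https://examp1e.com/login", ["example.com"]));
```

Showing the parsed host name rather than the raw string matters: in a URL like `https://example.com@evil.test/`, the part before the `@` is a user name, and the real destination is `evil.test`.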

Beyond basic QR: variations and future

The original QR code has spawned several variants for specific use cases:

**Micro QR**: A smaller format (11×11 to 17×17 modules) with only one finder pattern, used where space is extremely limited like on tiny electronic components. Lower capacity (up to 35 numeric characters) but much smaller minimum size.

**iQR**: A rectangular variant that can be as small as a traditional barcode while still offering 2D data density. Can also store more data than standard QR of the same area. Developed by Denso Wave for industrial applications where shape constraints matter.

**Frame QR**: Designed to incorporate graphics or text in a large center region by placing data only around the edges. Maintains scannability while allowing substantial visual customization - great for branded marketing materials.

**Secure QR (SQRC)**: Contains both public and private data regions, where the private portion requires a special scanner with the correct key to decode. Used for authentication, anti-counterfeiting, and access control applications.

**JAB Code**: A newer colored barcode that uses 8 colors instead of just black and white, increasing data density by roughly 3×. Not widely adopted yet but promising for applications needing maximum capacity in minimum space.

The QR code was designed to be decoded quickly and accurately. The key was making a code that could convey a lot of information but could be read even if it was dirty or damaged.

Masahiro Hara, QR Code inventor
How Things Work - A Visual Guide to Technology