Database Schema Design for Web Applications
A database schema is a foundation that becomes harder to change after launch. Improperly normalized tables, missing foreign keys, or incorrect data types become technical debt that accumulates over years. Let's examine principles and concrete solutions for typical web applications.
Normalization and Denormalization
Third Normal Form (3NF) is a reasonable level for most applications. Each field depends only on the primary key, without transitive dependencies.
When denormalization is justified:
- Counters (
comments_count,likes_count) — instead of COUNT JOIN for every query. - Cached aggregates for reports (monthly order totals).
- Flattening hierarchical data for search.
When denormalization is harmful:
- Personal data duplicated across multiple tables.
- Statuses that change frequently.
Data Types: Common Mistakes
-- Bad
user_id INT -- overflows at 2.1 billion records
price FLOAT -- precision loss in financial calculations
status INT -- magic numbers, no domain constraint
created VARCHAR(30) -- string sorting instead of dates
settings TEXT -- no structure, not indexable
-- Good
user_id BIGINT -- or UUID
price DECIMAL(12, 2) -- exact arithmetic
status VARCHAR(20) CHECK (status IN ('draft', 'published', 'archived'))
created TIMESTAMPTZ -- with timezone
settings JSONB -- structured, indexable
TIMESTAMPTZ stores time in UTC and converts when reading according to the session's TimeZone. TIMESTAMP stores "as-is" — if server timezone changes, data loses meaning.
Primary Keys: SERIAL vs UUID vs ULID
-- SERIAL (auto-increment): simple, compact (8 bytes), predictable
id BIGSERIAL PRIMARY KEY
-- UUID v4: globally unique, but 16 bytes, random order = index fragmentation
id UUID PRIMARY KEY DEFAULT gen_random_uuid()
-- ULID via pg_ulid or application-side generation:
-- lexicographically sorted by time, 16 bytes
id UUID PRIMARY KEY DEFAULT uuid_generate_v7() -- PostgreSQL 17+
For most web applications, BIGSERIAL is the optimal choice. UUID is needed when IDs are generated client-side or predictability needs to be hidden.
Example: E-commerce Schema
CREATE TABLE categories (
id BIGSERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
slug VARCHAR(220) NOT NULL UNIQUE,
parent_id BIGINT REFERENCES categories(id) ON DELETE SET NULL,
sort_order INT NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE products (
id BIGSERIAL PRIMARY KEY,
title VARCHAR(500) NOT NULL,
slug VARCHAR(520) NOT NULL UNIQUE,
category_id BIGINT NOT NULL REFERENCES categories(id) ON DELETE RESTRICT,
price DECIMAL(12, 2) NOT NULL CHECK (price > 0),
status VARCHAR(20) NOT NULL DEFAULT 'draft'
CHECK (status IN ('draft', 'published', 'archived')),
stock INT NOT NULL DEFAULT 0 CHECK (stock >= 0),
specs JSONB,
search_vector TSVECTOR, -- for full-text search
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
status VARCHAR(20) NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'paid', 'shipped', 'completed', 'cancelled')),
total DECIMAL(12, 2) NOT NULL,
currency CHAR(3) NOT NULL DEFAULT 'USD',
meta JSONB, -- delivery address etc.
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE TABLE order_items (
id BIGSERIAL PRIMARY KEY,
order_id BIGINT NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id BIGINT NOT NULL REFERENCES products(id) ON DELETE RESTRICT,
quantity INT NOT NULL CHECK (quantity > 0),
unit_price DECIMAL(12, 2) NOT NULL, -- price at time of purchase
UNIQUE (order_id, product_id)
);
unit_price in order_items is intentional denormalization: product price changes over time, but historical price in the order must remain unchanged.
ON DELETE RESTRICT vs CASCADE — rule: CASCADE only when child records have no meaning without parent (order_items without order). RESTRICT when deleting the parent should be explicitly prevented (can't delete category with products).
Indexes: What to Add When Designing
Add these when creating the schema:
-- FK columns — always, otherwise DELETE parent = seq scan of child table
CREATE INDEX idx_products_category_id ON products (category_id);
CREATE INDEX idx_order_items_order_id ON order_items (order_id);
CREATE INDEX idx_order_items_product_id ON order_items (product_id);
-- Common filters
CREATE INDEX idx_products_status_created ON products (status, created_at DESC);
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC);
-- Partial index for active records
CREATE INDEX idx_products_published ON products (category_id, created_at DESC)
WHERE status = 'published';
-- GIN for JSONB
CREATE INDEX idx_products_specs ON products USING GIN (specs);
Audit and Soft Delete
Soft delete pattern:
ALTER TABLE products ADD COLUMN deleted_at TIMESTAMPTZ;
CREATE INDEX idx_products_deleted_at ON products (deleted_at) WHERE deleted_at IS NULL;
Partial index by WHERE deleted_at IS NULL — active records are indexed separately. Deleted records don't enter the index and don't slow down queries.
Audit pattern via trigger:
CREATE TABLE audit_log (
id BIGSERIAL PRIMARY KEY,
table_name VARCHAR(100) NOT NULL,
row_id BIGINT NOT NULL,
operation CHAR(1) NOT NULL CHECK (operation IN ('I', 'U', 'D')),
old_data JSONB,
new_data JSONB,
changed_by BIGINT REFERENCES users(id),
changed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
Timelines
Schema design for a new project (up to 15 tables, ER diagram, SQL DDL, discussion of indexes and FK strategies): 1–2 days. Review and refactoring of existing schema with problem identification (wrong types, missing FK, redundant indexes): 1–3 days depending on volume.







