SQL Query Optimization for Web Applications
Slow queries kill UX faster than any other factor. In practice, the vast majority of database performance problems are solved by one of four methods: adding an index, rewriting the query, denormalizing, or caching. Let's examine diagnostics and concrete techniques.
Diagnostic Tools
pg_stat_statements — first thing to enable on production:
-- postgresql.conf
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
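Changing shared_preload_libraries requires a server restart; after that, the extension's view must also be created in each database you want to inspect:

-- Run once per database after the restart
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
-- Optionally reset counters before a measurement window
SELECT pg_stat_statements_reset();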
-- Top 20 queries by total time
SELECT
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS mean_ms,
calls,
round((stddev_exec_time / mean_exec_time * 100)::numeric, 1) AS coeff_var_pct,
left(query, 120) AS query
FROM pg_stat_statements
WHERE calls > 100
ORDER BY total_exec_time DESC
LIMIT 20;
coeff_var_pct — coefficient of variation: a high percentage indicates an unstable plan (different parameters give drastically different times).
EXPLAIN ANALYZE:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT p.*, c.name AS category_name
FROM products p
JOIN categories c ON c.id = p.category_id
WHERE p.status = 'published'
AND p.created_at > NOW() - INTERVAL '30 days'
ORDER BY p.created_at DESC
LIMIT 50;
Key nodes in the plan:
- Seq Scan on a large table — no index, or the planner decided the index isn't worthwhile.
- Nested Loop with many iterations — N+1 at the SQL level.
- Hash Join with Batches > 1 — not enough work_mem.
- Sort without an Index Scan on the ORDER BY column — no suitable index.
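For illustration, a problematic node in EXPLAIN (ANALYZE, BUFFERS) output looks roughly like this (the numbers here are invented):

Seq Scan on products  (cost=0.00..25480.00 rows=1200 width=64) (actual time=0.4..312.8 rows=1180 loops=1)
  Filter: (status = 'published')
  Rows Removed by Filter: 998820

Rows Removed by Filter in the hundreds of thousands against a small result set is a strong hint that a composite or partial index would pay off.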
Typical Antipatterns and Solutions
1. SELECT * in ORM
-- Bad: fetching all 30 columns, including BLOB fields
SELECT * FROM products WHERE category_id = 5;
-- Good: only needed fields
SELECT id, title, slug, price, status FROM products WHERE category_id = 5;
2. OFFSET pagination on large tables
-- Bad: at offset=10000 PostgreSQL reads 10050 rows and discards 10000
SELECT * FROM products ORDER BY created_at DESC LIMIT 50 OFFSET 10000;
-- Good: keyset pagination
SELECT * FROM products
WHERE (created_at, id) < ('2024-03-01 12:00:00', 5000)
ORDER BY created_at DESC, id DESC
LIMIT 50;
Keyset pagination requires a composite index on (created_at DESC, id DESC) and passing the last value from the previous page.
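The supporting index for the keyset query might look like this (the index name is illustrative):

-- Matches the ORDER BY created_at DESC, id DESC exactly,
-- so the planner can walk the index without a Sort node
CREATE INDEX idx_products_created_id ON products (created_at DESC, id DESC);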
3. OR on different columns
-- Bad: OR often doesn't use index
SELECT * FROM users WHERE email = $1 OR phone = $1;
-- Good: UNION
SELECT * FROM users WHERE email = $1
UNION ALL
SELECT * FROM users WHERE phone = $1
LIMIT 1;
4. Functions in WHERE
-- Bad: function on column blocks index
SELECT * FROM orders WHERE DATE(created_at) = '2024-03-15';
-- Good: range condition
SELECT * FROM orders
WHERE created_at >= '2024-03-15'
AND created_at < '2024-03-16';
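If a query genuinely must filter on the function's result, an expression index is another option. Note the caveat: DATE(created_at) is only indexable when created_at is timestamp without time zone — for timestamptz the result depends on the session time zone, so the expression is not immutable:

-- Works for timestamp without time zone; the index name is illustrative
CREATE INDEX idx_orders_created_date ON orders (DATE(created_at));

For timestamptz columns, stick with the range-condition rewrite above.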
5. NOT IN with subquery
-- Bad: if subquery returns NULL — NOT IN returns empty result
SELECT * FROM products WHERE id NOT IN (SELECT product_id FROM order_items);
-- Good: NOT EXISTS or LEFT JOIN
SELECT p.* FROM products p
LEFT JOIN order_items oi ON oi.product_id = p.id
WHERE oi.product_id IS NULL;
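The NOT EXISTS form mentioned above is equivalent, NULL-safe, and typically plans the same way (PostgreSQL turns both into an anti-join):

SELECT p.* FROM products p
WHERE NOT EXISTS (
  SELECT 1 FROM order_items oi WHERE oi.product_id = p.id
);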
JOIN Optimization
-- Add composite index for typical filter
CREATE INDEX idx_orders_user_status_created
ON orders (user_id, status, created_at DESC);
-- Query uses index scan without Sort
SELECT id, total, status, created_at
FROM orders
WHERE user_id = $1
AND status = 'completed'
ORDER BY created_at DESC
LIMIT 10;
Index column order matters: equality conditions first (user_id = $1, status = 'completed'), then range/sort (created_at DESC).
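When queries almost always filter on one status value, a partial index can be smaller and cheaper to maintain than the composite one above — a judgment call, not a drop-in replacement:

-- Covers only the hot predicate; a query must repeat
-- status = 'completed' verbatim for the planner to use it
CREATE INDEX idx_orders_user_completed
  ON orders (user_id, created_at DESC)
  WHERE status = 'completed';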
work_mem and Sort Spill
If EXPLAIN ANALYZE shows external merge Disk: ... on a Sort node, the sort spilled to disk — increase work_mem for the session:
SET work_mem = '64MB';
-- Execute heavy analytical query
-- Reset or use connection-level setting
In postgresql.conf keep work_mem low (4-8MB default) and raise for specific queries via SET LOCAL work_mem.
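SET LOCAL scopes the setting to the current transaction, so it cannot leak to unrelated queries on a pooled connection:

BEGIN;
SET LOCAL work_mem = '64MB';
-- heavy analytical query here
COMMIT;  -- work_mem reverts automatically at transaction end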
CTE vs Subquery vs Lateral
-- CTE (WITH) in PostgreSQL 12+ optimizes as subquery by default
-- Before PG12: CTE always materializes — optimization fence
WITH recent_orders AS (
SELECT user_id, COUNT(*) AS cnt
FROM orders
WHERE created_at > NOW() - INTERVAL '30 days'
GROUP BY user_id
)
SELECT u.*, ro.cnt
FROM users u
JOIN recent_orders ro ON ro.user_id = u.id;
-- LATERAL: for row-dependent subqueries
SELECT u.id, u.email, recent.total
FROM users u
CROSS JOIN LATERAL (
SELECT SUM(total) AS total
FROM orders o
WHERE o.user_id = u.id
AND o.created_at > NOW() - INTERVAL '30 days'
) AS recent;
LATERAL allows using variables from outer query in subquery — often gives better plan than JOIN on aggregated CTE.
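On PostgreSQL 12+ the old fencing behavior can be brought back per CTE when materializing once is genuinely cheaper — for example, when the CTE is referenced several times:

-- Force single evaluation of the CTE (PG12+ syntax)
WITH recent_orders AS MATERIALIZED (
    SELECT user_id, COUNT(*) AS cnt
    FROM orders
    WHERE created_at > NOW() - INTERVAL '30 days'
    GROUP BY user_id
)
SELECT u.*, ro.cnt
FROM users u
JOIN recent_orders ro ON ro.user_id = u.id;

The inverse, NOT MATERIALIZED, forces inlining even when the CTE is referenced more than once.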
Planner Statistics
If the planner chooses the wrong access method (row estimates far from actual values in EXPLAIN ANALYZE), update statistics:
ANALYZE products;
-- or to update specific column with higher accuracy:
ALTER TABLE products ALTER COLUMN status SET STATISTICS 500;
ANALYZE products (status);
default_statistics_target defaults to 100. For high-cardinality columns (timestamps, UUIDs), raise it to 200–500.
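For correlated columns, per-column statistics mislead the planner into multiplying selectivities that aren't independent. PostgreSQL 10+ supports extended statistics for this (column names here are illustrative):

-- Teach the planner that category_id and status are functionally dependent
CREATE STATISTICS products_cat_status_stats (dependencies)
  ON category_id, status FROM products;
ANALYZE products;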
Timelines
Diagnostics and optimization of 10–15 slow queries (via pg_stat_statements, EXPLAIN ANALYZE, adding indexes, rewriting): 2–3 days. Deep audit of schema and queries for high-load application: 3–5 days.