Website SEO Audit
An SEO audit is a systematic check of a website's technical state, content factors, and link profile. The result is not an abstract score like "7 out of 10" but a concrete list of problems with priorities and solutions. The audit divides into a technical part, on-page analysis, and signal analysis.
Technical Part
Site Crawling
The first step is to crawl the site with a crawler (Screaming Frog, Sitebulb, or a custom Python script) and gather all URLs:
```python
# Quick check via sitemap
import requests
from xml.etree import ElementTree

resp = requests.get('https://mysite.ru/sitemap.xml')
tree = ElementTree.fromstring(resp.content)
ns = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}
urls = [loc.text for loc in tree.findall('.//sm:loc', ns)]
print(f'URLs in sitemap: {len(urls)}')

# Check status codes
errors = []
for url in urls[:200]:  # first 200 for demo
    r = requests.head(url, allow_redirects=True, timeout=5)
    if r.status_code not in (200, 301, 302):
        errors.append({'url': url, 'status': r.status_code})
print(f'Error URLs: {len(errors)}')
```
Technical Parameters Checked:
| Parameter | What to Look For | Criticality |
|---|---|---|
| Status Codes | 4xx in sitemap, 5xx in navigation | High |
| Duplicate Pages | Without canonical, ?sort=, ?page= | High |
| robots.txt | Needed sections accidentally blocked | High |
| HTTPS | Mixed content, www/http redirects | High |
| Speed (Core Web Vitals) | LCP, INP, CLS | High |
| Canonical | Missing or incorrect self-canonical | Medium |
| Hreflang | Duplicate/incorrect language codes | Medium |
| Structured Data | schema.org errors | Medium |
Robots.txt and Indexation:
```shell
# Check robots.txt availability
curl -s https://mysite.ru/robots.txt

# Check indexation in Google
# site:mysite.ru        — how many pages are in the index
# site:mysite.ru/admin  — closed sections must not be in the index
```
Sitemap:
- The XML sitemap should contain only pages that return 200
- No pages with noindex
- `<lastmod>` should match the real change date
- Sitemap size: max 50,000 URLs or 50 MB per file
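The sitemap rules above can be spot-checked without a full crawler. A minimal sketch (the sample XML and URLs are made up for illustration):

```python
from xml.etree import ElementTree

SM_NS = '{http://www.sitemaps.org/schemas/sitemap/0.9}'

def sitemap_entries(xml_text):
    """Parse <url> entries into (loc, lastmod) tuples; lastmod may be None."""
    root = ElementTree.fromstring(xml_text)
    return [(u.findtext(f'{SM_NS}loc'), u.findtext(f'{SM_NS}lastmod'))
            for u in root.findall(f'{SM_NS}url')]

# Made-up sitemap fragment for illustration
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://mysite.ru/a</loc><lastmod>2024-01-10</lastmod></url>
  <url><loc>https://mysite.ru/b</loc></url>
</urlset>"""

entries = sitemap_entries(sample)
missing_lastmod = [loc for loc, lm in entries if lm is None]
```

Combine this with the status-code loop from the crawl step to flag non-200 sitemap entries as well.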
On-page Analysis
Title and Meta Description:
```python
# Mass check of title/description
import requests
from bs4 import BeautifulSoup

issues = []
for url in urls:
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    title = soup.find('title')
    desc = soup.find('meta', {'name': 'description'})
    t_len = len(title.text) if title else 0
    d_len = len(desc['content']) if desc and desc.get('content') else 0
    if t_len == 0:
        issues.append((url, 'no_title'))
    elif t_len > 70:
        issues.append((url, f'title_too_long:{t_len}'))
    if d_len == 0:
        issues.append((url, 'no_description'))
    elif d_len > 160:
        issues.append((url, f'desc_too_long:{d_len}'))
```
Norms: title 50–70 characters, description 120–160 characters. Templated descriptions ("Buy X in our store. Best prices!" on every product) are a duplication problem.
Headings H1–H6:
- One H1 per page with main keyword
- H2–H3 structure content, don't duplicate H1
- Order: H1 → H2 → H3 (no skipping)
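The H1 and heading-order rules can be verified in the same crawl loop. A stdlib-only sketch (no BeautifulSoup needed for this check):

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collect heading levels (h1-h6) in document order."""
    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == 'h' and tag[1].isdigit():
            self.levels.append(int(tag[1]))

def heading_issues(html):
    parser = HeadingCollector()
    parser.feed(html)
    issues = []
    if parser.levels.count(1) != 1:
        issues.append(f'h1_count:{parser.levels.count(1)}')
    for prev, cur in zip(parser.levels, parser.levels[1:]):
        if cur - prev > 1:  # e.g. H1 followed directly by H3
            issues.append(f'level_skip:h{prev}->h{cur}')
    return issues
```

Feed it the raw HTML of each crawled page and collect `(url, issue)` pairs the same way as the title/description check.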
Content:
- Thin content: pages with fewer than 300 words and no structure
- Duplicate content: identical product descriptions copied from a supplier feed
- Pagination pages: `rel="next"`/`rel="prev"` or canonical to the first page (note: Google no longer uses `rel="next"`/`rel="prev"` as an indexing signal, so the canonical strategy matters more)
Internal Linking
A good internal link structure distributes link equity and helps crawlers. Typical problems:
- Orphan pages — pages without incoming internal links (crawler won't find them)
- Click depth > 3 for important pages
- Broken internal links — links to deleted/redirected pages
```sql
-- For a CMS with a SQL backend: find pages without incoming internal links
SELECT p.url
FROM pages p
LEFT JOIN internal_links il ON il.target_url = p.url
WHERE il.id IS NULL AND p.status = 'published';
```
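Click depth can be computed from the same link data with a breadth-first search from the homepage; pages that never appear in the result are orphans. A sketch with a made-up link graph:

```python
from collections import deque

def click_depths(links, start='/'):
    """BFS over the internal link graph; depth = clicks from the homepage."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Hypothetical link graph: {source: [targets]}
links = {
    '/': ['/cat'],
    '/cat': ['/cat/p1'],
    '/cat/p1': ['/cat/p1/x'],
    '/cat/p1/x': ['/deep'],
    '/orphan': [],          # exists, but nothing links to it
}
depths = click_depths(links)
too_deep = [u for u, lvl in depths.items() if lvl > 3]
all_pages = set(links) | {t for ts in links.values() for t in ts}
orphans = all_pages - set(depths)
```

In this example `/deep` sits 4 clicks from the homepage and `/orphan` has no incoming links, matching the two problems listed above.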
Structured Data
Google supports rich snippets for: Product, Article, BreadcrumbList, FAQPage, Review, LocalBusiness, Event.
Validation:
```shell
# Search Console URL Testing Tools API (this endpoint runs the Mobile-Friendly
# Test, not the Rich Results Test, and was retired by Google in December 2023;
# the Rich Results Test has no public API)
curl "https://searchconsole.googleapis.com/v1/urlTestingTools/mobileFriendlyTest:run" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://mysite.ru/product/123"}'
```
Manual check: search.google.com/test/rich-results.
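Before the manual test, structured data presence can be checked in bulk by pulling JSON-LD blocks out of the crawled HTML. A stdlib-only sketch (the sample HTML is made up):

```python
import json
from html.parser import HTMLParser

class JsonLdParser(HTMLParser):
    """Collect parsed JSON-LD blocks from <script type="application/ld+json">."""
    def __init__(self):
        super().__init__()
        self.in_ld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == 'script' and ('type', 'application/ld+json') in attrs:
            self.in_ld = True

    def handle_endtag(self, tag):
        if tag == 'script':
            self.in_ld = False

    def handle_data(self, data):
        if self.in_ld:
            self.blocks.append(json.loads(data))

html = ('<html><script type="application/ld+json">'
        '{"@type": "Product", "name": "Widget"}'
        '</script></html>')
p = JsonLdParser()
p.feed(html)
types = [b.get('@type') for b in p.blocks]
```

Pages missing an expected `@type` (e.g. `Product` on product pages) go straight into the report table.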
Core Web Vitals Analysis in SEO Context
Google uses CrUX data (field data) for ranking. If a competitor's LCP is < 2.5 s while yours is > 4 s, this factor affects positions, especially when everything else is equal. Check via GSC: Search Console > Core Web Vitals shows the percentage of URLs rated Good / Needs improvement / Poor.
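Google's published CWV thresholds make the Good / Needs improvement / Poor split easy to reproduce locally, e.g. for p75 field values pulled from the CrUX API. A sketch:

```python
# Google's published CWV thresholds: (good upper bound, poor lower bound)
CWV_THRESHOLDS = {
    'lcp_ms': (2500, 4000),   # Largest Contentful Paint
    'inp_ms': (200, 500),     # Interaction to Next Paint
    'cls':    (0.1, 0.25),    # Cumulative Layout Shift (unitless)
}

def cwv_bucket(metric, value):
    """Classify a p75 field value into Google's three CWV buckets."""
    good, poor = CWV_THRESHOLDS[metric]
    if value <= good:
        return 'good'
    if value <= poor:
        return 'needs-improvement'
    return 'poor'
```

This lets the audit bucket competitor URLs the same way GSC buckets your own.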
Link Profile (Basic Level)
Within the scope of a technical audit we check:
- the Ahrefs/Semrush index — toxic links from spammy domains
- anchor text distribution — over-optimization (>60% commercial anchors)
- lost links to deleted pages without a 301 redirect
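The >60% commercial-anchor check is easy to script once anchors are exported from Ahrefs/Semrush. A sketch with a made-up marker list and sample anchors:

```python
# Assumption: a simple keyword match identifies "commercial" anchors;
# real audits use hand-tuned marker lists per niche and language.
COMMERCIAL_MARKERS = ('buy', 'price', 'cheap', 'order')

def commercial_anchor_share(anchors):
    """Fraction of anchors containing a commercial marker."""
    if not anchors:
        return 0.0
    hits = sum(1 for a in anchors
               if any(m in a.lower() for m in COMMERCIAL_MARKERS))
    return hits / len(anchors)

anchors = ['buy widgets', 'mysite.ru', 'best price widgets', 'home', 'widgets']
share = commercial_anchor_share(anchors)  # 0.4 — below the 60% red line
```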
Report Format
The audit is delivered as a table with priorities:
| Issue | Pages | Priority | Effort | Expected Effect |
|---|---|---|---|---|
| Duplicate titles | 340 | Critical | 2 days | Rise in SERP CTR |
| Missing canonical on pagination | 180 | High | 0.5 day | Eliminate duplicates |
| LCP > 4s on mobile | All pages | High | 3–5 days | Rise in positions |
| Orphan pages in catalog | 52 | Medium | 1 day | Improve crawling |
| Missing schema.org Product | 1200 products | Medium | 2 days | Rich snippets |
Audit Timeline
Technical crawl, on-page analysis, structured data, CWV, and a prioritized report for a site with 1,000–5,000 URLs: 3–4 days. Large eCommerce (50,000+ SKUs): 5–7 days.







