Implementing a Content Audit Before Migration
A content audit is an inventory and analysis of all materials on the current site before they are transferred to a new platform. It clarifies what to migrate, what to update, and what to delete.
Audit Tasks
- Compile a complete list of the site's URLs
- Identify outdated and duplicate content
- Collect SEO metrics for each page
- Find broken links and missing metadata
- Prioritize content for migration
Site Crawling for Inventory
Screaming Frog is the standard tool for this audit.
Configuration:
- Configuration → Spider → Crawl all subdomains
- Enable JavaScript rendering (needed for SPA sites)
- Export: All tabs → Save as CSV
The result is a CSV with the fields URL, Title, Meta Description, H1, Status Code, Indexability, Word Count, Inlinks, Outlinks.
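The export can be inspected with pandas for a first pass. A minimal sketch — the file name `internal_all.csv` is an assumption, and the block writes a tiny sample CSV first so it is self-contained (the column headers match Screaming Frog's export convention):

```python
import pandas as pd

# Tiny sample standing in for a real Screaming Frog "Internal" export;
# the column names mirror the real CSV headers.
sample = pd.DataFrame({
    'Address': ['https://company.com/', 'https://company.com/old', 'https://company.com/blog'],
    'Status Code': [200, 404, 200],
    'Indexability': ['Indexable', 'Non-Indexable', 'Indexable'],
})
sample.to_csv('internal_all.csv', index=False)

df = pd.read_csv('internal_all.csv')
status_counts = df['Status Code'].value_counts()
non_indexable = df[df['Indexability'] != 'Indexable']
print(status_counts.to_dict())  # {200: 2, 404: 1}
print(f"Non-indexable: {len(non_indexable)} of {len(df)} URLs")
```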
A Python crawler for automating the same inventory:

```python
import scrapy
from scrapy.crawler import CrawlerProcess

class ContentAuditSpider(scrapy.Spider):
    name = 'content_audit'
    start_urls = ['https://company.com']
    allowed_domains = ['company.com']  # keep the crawl on our own site
    custom_settings = {
        'DEPTH_LIMIT': 10,
        'DOWNLOAD_DELAY': 0.5,
        'FEEDS': {'audit_results.csv': {'format': 'csv'}},
    }

    def parse(self, response):
        yield {
            'url': response.url,
            'status': response.status,
            'title': response.css('title::text').get(''),
            'h1': response.css('h1::text').get(''),
            'meta_description': response.css('meta[name="description"]::attr(content)').get(''),
            'canonical': response.css('link[rel="canonical"]::attr(href)').get(''),
            'robots': response.css('meta[name="robots"]::attr(content)').get('all'),
            'word_count': len(' '.join(response.css('main *::text').getall()).split()),
            'internal_links': len(response.css('a[href^="/"]')),
            'images_without_alt': len(response.css('img:not([alt])')),
            'last_modified': response.headers.get('Last-Modified', b'').decode(),
        }
        # Follow every link on the page; allowed_domains filters out external URLs
        for link in response.css('a::attr(href)').getall():
            yield response.follow(link, self.parse)

if __name__ == '__main__':
    process = CrawlerProcess()
    process.crawl(ContentAuditSpider)
    process.start()
```
Analyzing SEO Data
Export from Google Search Console:
- Performance → Pages: clicks, impressions, CTR, position
- Coverage: indexed / not indexed pages
- URL Inspection: status of specific URLs
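The same page-level data can be pulled programmatically through the Search Console API instead of the UI export. A hedged sketch that only builds the request body (the date range is illustrative; an authenticated `service` from `google-api-python-client` is assumed for the actual call):

```python
def build_gsc_query(start_date, end_date, row_limit=25000):
    """Build the request body for searchanalytics().query() in the GSC API."""
    return {
        'startDate': start_date,
        'endDate': end_date,
        'dimensions': ['page'],  # one row per URL
        'rowLimit': row_limit,   # the API caps a single request at 25 000 rows
    }

body = build_gsc_query('2024-01-01', '2024-03-31')

# With an authenticated client (assumed):
# service = build('searchconsole', 'v1', credentials=creds)
# rows = service.searchanalytics().query(
#     siteUrl='https://company.com/', body=body).execute()['rows']
```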
Joining the crawl data with GSC data in Python:

```python
import pandas as pd

crawl_data = pd.read_csv('audit_results.csv')
gsc_data = pd.read_csv('gsc_pages.csv')  # GSC export; rename its page column to 'url' if needed

merged = crawl_data.merge(gsc_data, on='url', how='left')
merged['has_seo_value'] = merged['clicks'] > 0  # pages with organic traffic
```
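In practice a merge on `url` often fails silently because the crawler and GSC format URLs differently (trailing slash, host casing, tracking parameters). A sketch of normalizing both sides before joining — the exact rules to apply are a judgment call per site:

```python
from urllib.parse import urlsplit

def normalize_url(url: str) -> str:
    """Canonicalize a URL for joining crawl and GSC data:
    lowercase the host, drop query string and fragment, strip the trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip('/') or '/'
    return f"{parts.scheme}://{parts.netloc.lower()}{path}"

print(normalize_url('https://Company.com/blog/?utm_source=x'))
# -> https://company.com/blog

# Apply to both frames before merging:
# crawl_data['url'] = crawl_data['url'].map(normalize_url)
# gsc_data['url'] = gsc_data['url'].map(normalize_url)
```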
Content Classification
Each page gets a label:
| Decision | Criteria |
|---|---|
| Transfer | Clicks > 0, unique content, relevant |
| Update during transfer | Content outdated but has SEO value |
| Consolidate | Duplicate pages on same topic |
| Delete + redirect | No traffic, duplicate, thin content |
| Don't transfer | Test pages, archive, service URLs |
```python
def classify_page(row):
    if row.get('noindex') or row['status'] != 200:
        return 'skip'
    if row.get('clicks', 0) > 100 or row.get('inlinks', 0) > 5:
        return 'migrate_priority_high'
    if row.get('word_count', 0) < 100:
        return 'review_thin_content'
    if row.get('clicks', 0) > 0:
        return 'migrate'
    return 'archive'
```
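Applied row by row with `DataFrame.apply`, the function yields a decision column that can then be aggregated. A self-contained sketch on illustrative rows (the function is repeated here so the block runs on its own):

```python
import pandas as pd

def classify_page(row):
    # Same logic as classify_page above
    if row.get('noindex') or row['status'] != 200:
        return 'skip'
    if row.get('clicks', 0) > 100 or row.get('inlinks', 0) > 5:
        return 'migrate_priority_high'
    if row.get('word_count', 0) < 100:
        return 'review_thin_content'
    if row.get('clicks', 0) > 0:
        return 'migrate'
    return 'archive'

merged = pd.DataFrame([
    {'status': 200, 'clicks': 250, 'inlinks': 2, 'word_count': 800, 'noindex': False},
    {'status': 404, 'clicks': 0,   'inlinks': 0, 'word_count': 0,   'noindex': False},
    {'status': 200, 'clicks': 3,   'inlinks': 1, 'word_count': 50,  'noindex': False},
])
merged['decision'] = merged.apply(classify_page, axis=1)
print(merged['decision'].tolist())
# -> ['migrate_priority_high', 'skip', 'review_thin_content']
```

Note that the thin-content check fires before the traffic check, so a 50-word page with a few clicks lands in review rather than straight migration.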
Media File Analysis
```shell
# Count files by extension in the uploads directory
find /var/www/uploads -type f | awk -F. '{print $NF}' | sort | uniq -c

# Files not referenced from content (potential garbage)
# Step 1: extract all image URLs from the DB
#   (-N skips the column header; your_database is a placeholder)
mysql -N -e "SELECT DISTINCT image_url FROM posts WHERE image_url IS NOT NULL" your_database > used_files.txt

# Step 2: reduce URLs to bare file names, then diff against the files on disk
awk -F/ '{print $NF}' used_files.txt | sort > used_names.txt
comm -23 <(ls /var/www/uploads | sort) used_names.txt
```
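The same orphan check can be done in Python, which makes the URL-to-filename reduction explicit (all paths and names here are illustrative):

```python
from pathlib import Path
from urllib.parse import urlsplit

def find_orphans(disk_files, used_urls):
    """Files present on disk but never referenced from content.
    The DB may store full URLs, so reduce each one to its bare file name."""
    used_names = {Path(urlsplit(u).path).name for u in used_urls}
    return sorted(set(disk_files) - used_names)

orphans = find_orphans(
    disk_files=['logo.png', 'old-banner.jpg', 'team.jpg'],
    used_urls=['https://company.com/uploads/logo.png',
               'https://company.com/uploads/team.jpg'],
)
print(orphans)  # ['old-banner.jpg']
```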
SEO Metadata Inventory
```python
# Find pages without a meta description
missing_meta = merged[merged['meta_description'].isna() | (merged['meta_description'] == '')]
print(f"Without meta description: {len(missing_meta)} pages")

# Find duplicate titles
duplicate_titles = merged[merged.duplicated(subset='title', keep=False)]
print(f"Duplicate titles: {len(duplicate_titles)} pages")

# Export tasks for copywriters
missing_meta[['url', 'title', 'h1']].to_csv('tasks_add_meta.csv', index=False)
```
Final Audit Report
Report structure:
- Summary statistics (total URLs, status codes, distribution by page type)
- SEO health (% of pages with a meta description, H1, canonical)
- Technical issues (broken links, error pages)
- Page list by decision (table with URL and action)
- Recommendations for transfer priorities
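The summary section can be generated straight from the merged DataFrame. A minimal sketch with illustrative columns and rows:

```python
import pandas as pd

merged = pd.DataFrame([
    {'status': 200, 'meta_description': 'About us', 'decision': 'migrate'},
    {'status': 200, 'meta_description': '',         'decision': 'migrate_priority_high'},
    {'status': 404, 'meta_description': '',         'decision': 'skip'},
])

summary = {
    'total_urls': len(merged),
    'status_counts': merged['status'].value_counts().to_dict(),
    # Share of pages that have a non-empty meta description
    'pct_with_meta': round(100 * (merged['meta_description'] != '').mean(), 1),
    'by_decision': merged['decision'].value_counts().to_dict(),
}
print(summary)
```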
Execution Time
Auditing a site of up to 1,000 pages, including classification and the final report, takes 3–5 working days.