Optimizing Internal Linking Structure of Your Website
Internal linking is the network of links between pages of the same site. It distributes link weight (PageRank) across pages and helps search engine crawlers discover and index content. A well-planned structure channels more of that weight to the most important pages, which makes them more authoritative.
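To make "link weight" concrete, the sketch below computes a simplified PageRank by power iteration over a list of internal links given as (source URL, target URL) pairs. It is an illustration of the idea, not how any search engine scores pages today. The damping factor 0.85 is the value proposed in the original PageRank paper. Pages without outgoing links spread their weight evenly over all pages. The function name and the edge-list input format are assumptions made for this example. networkx's `nx.pagerank` computes the same quantity with more options.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration over an internal link graph."""
    edges = set(edges)  # a repeated link between the same two pages counts once
    pages = sorted({page for edge in edges for page in edge})
    out_links = {page: [] for page in pages}
    for source, target in edges:
        out_links[source].append(target)

    n = len(pages)
    rank = {page: 1 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first
        new_rank = {page: (1 - damping) / n for page in pages}
        for page in pages:
            targets = out_links[page]
            if targets:
                # A page splits its weight equally among the pages it links to
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its weight evenly over all pages
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank
```

With links `/ → /a`, `/ → /b`, `/a → /b` and `/b → /`, page `/b` ends up with more weight than `/a` because it receives links from two pages instead of one.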
Audit of Current Structure
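```python
from urllib.parse import urldefrag, urlparse

import networkx as nx
import scrapy
from scrapy.http import TextResponse


class InternalLinksSpider(scrapy.Spider):
    name = 'internal_links'
    allowed_domains = ['company.com']
    start_urls = ['https://company.com']

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.graph = nx.DiGraph()  # edge = link from source page to target page
        # Every URL that should exist (e.g. from sitemap.xml). A link-following
        # crawl never reaches orphan pages, so they are found by comparison.
        self.known_urls = set()

    def parse(self, response):
        if not isinstance(response, TextResponse):
            return  # skip images, PDFs and other non-text responses
        current_url = response.url
        self.graph.add_node(current_url)
        for link in response.css('a[href]::attr(href)').getall():
            absolute, _ = urldefrag(response.urljoin(link))  # drop #fragments
            host = urlparse(absolute).hostname or ''
            # Exact host match, so "notcompany.com" is not treated as internal
            if host == 'company.com' or host.endswith('.company.com'):
                self.graph.add_edge(current_url, absolute)
                yield response.follow(absolute, self.parse)

    def closed(self, reason):
        # Pages with the highest PageRank (networkx needs SciPy for this)
        pagerank = nx.pagerank(self.graph)
        top_pages = sorted(pagerank.items(), key=lambda x: x[1], reverse=True)[:20]
        for url, score in top_pages:
            print(f"{score:.4f}  {url}")
        # Orphan pages: known URLs that no crawled page links to
        linked = {node for node in self.graph.nodes() if self.graph.in_degree(node) > 0}
        orphans = sorted(self.known_urls - linked - set(self.start_urls))
        print(f"Orphan pages: {len(orphans)}")
        for url in orphans[:10]:
            print(f"  {url}")
```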
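A crawl that follows links can only reach pages that are already linked from somewhere, so orphan pages appear only when the crawled link graph is compared with an independent list of URLs. A minimal sketch using the XML sitemap as that list, assuming a plain `<urlset>` sitemap (a sitemap index file would need one more level of parsing). `sitemap_urls` and `find_orphans` are hypothetical helper names, and the link graph is passed as a list of (source, target) pairs:

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall('sm:url/sm:loc', SITEMAP_NS)
        if loc.text
    }


def find_orphans(known_urls, edges, home):
    """Known URLs that no crawled page links to (the home page is exempt)."""
    linked = {target for _, target in edges}
    return sorted(known_urls - linked - {home})
```

In practice `xml_text` would be the body of `https://company.com/sitemap.xml` and `edges` the edges of the crawled graph.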
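Metrics for analysis:
- Orphan pages: pages with no incoming internal links
- Crawl depth: the number of clicks needed to reach a page from the home page (important pages should be 1–3 clicks away)
- PageRank distribution: how internal link weight is spread across the site, and whether key pages (categories, core products, pillar guides) receive a proportionate share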
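Crawl depth is the shortest path from the home page in the link graph, which a breadth-first search finds directly. A minimal sketch over (source, target) link pairs; the helper names and the default limit of 3 clicks (taken from the guideline above) are assumptions. With networkx, `nx.single_source_shortest_path_length(graph, home)` returns the same mapping.

```python
from collections import deque


def click_depths(edges, home):
    """Breadth-first search from the home page: minimum clicks to reach each page."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in depths:  # first visit in BFS is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(edges, home, max_depth=3):
    """Pages that can only be reached in more than max_depth clicks."""
    return sorted(url for url, depth in click_depths(edges, home).items() if depth > max_depth)
```

Pages missing from the `click_depths` result are unreachable from the home page, which usually means they are orphans.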
Principles of Proper Structure
Flat hierarchy — important pages close to home:
Home → Category → Product page (2 clicks; no important page should be more than 3 clicks from home)
Thematic clusters: a pillar page and the pages on its subtopics link to one another:
Pillar page (main): /guide/seo
↔ /guide/seo/technical
↔ /guide/seo/on-page
↔ /guide/seo/link-building
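The cluster diagram above can be turned into a checklist of required links. A small sketch, assuming the rule shown by the arrows: the pillar links to every subpage and every subpage links back to the pillar (cross-links between sibling subpages are treated as optional here). `missing_cluster_links` is a hypothetical name, and `edges` is the crawled link graph as (source, target) pairs.

```python
def missing_cluster_links(pillar, subpages, edges):
    """Return the pillar <-> subpage links that do not exist yet."""
    existing = set(edges)
    # Pillar -> each subpage, then each subpage -> pillar
    required = [(pillar, page) for page in subpages]
    required += [(page, pillar) for page in subpages]
    return [link for link in required if link not in existing]
```

Each returned pair reads as "add a link from the first URL to the second".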
Breadcrumbs — automatic internal links with Schema.org markup:
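```html
<nav aria-label="breadcrumb">
  <ol itemscope itemtype="https://schema.org/BreadcrumbList">
    <li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
      <a itemprop="item" href="/"><span itemprop="name">Home</span></a>
      <meta itemprop="position" content="1">
    </li>
    <li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">
      <a itemprop="item" href="/guide/seo"><span itemprop="name">SEO guide</span></a>
      <meta itemprop="position" content="2">
    </li>
  </ol>
</nav>
```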
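Because breadcrumbs follow the URL hierarchy, they can be generated rather than written by hand. A sketch under two assumptions: URLs mirror the section structure (as in `/guide/seo/technical`), and `titles` is a hypothetical dictionary mapping each URL to its page title, with the URL slug as a fallback. The output uses the same microdata markup as the example above.

```python
from html import escape


def crumbs_from_path(path, titles):
    """Build (name, url) pairs for the home page and every ancestor of path."""
    crumbs = [('Home', '/')]
    parts = [part for part in path.strip('/').split('/') if part]
    for i in range(len(parts)):
        url = '/' + '/'.join(parts[:i + 1])
        crumbs.append((titles.get(url, parts[i]), url))
    return crumbs


def breadcrumb_html(crumbs):
    """Render (name, url) pairs as a Schema.org BreadcrumbList in microdata."""
    items = []
    for position, (name, url) in enumerate(crumbs, start=1):
        items.append(
            '<li itemprop="itemListElement" itemscope itemtype="https://schema.org/ListItem">'
            f'<a itemprop="item" href="{escape(url)}"><span itemprop="name">{escape(name)}</span></a>'
            f'<meta itemprop="position" content="{position}"></li>'
        )
    return ('<nav aria-label="breadcrumb">'
            '<ol itemscope itemtype="https://schema.org/BreadcrumbList">'
            + ''.join(items) + '</ol></nav>')
```

For example, `breadcrumb_html(crumbs_from_path('/guide/seo/technical', titles))` yields four list items: Home, the `/guide` section, the SEO guide and the current page.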
Automatic Related Articles
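```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def find_related_articles(target_article, all_articles, top_n=5, min_similarity=0.1):
    # Never recommend the article to itself (same object inside all_articles)
    candidates = [a for a in all_articles if a is not target_article]
    if not candidates:
        return []
    texts = [a['title'] + ' ' + a['body'] for a in candidates]
    target_text = target_article['title'] + ' ' + target_article['body']
    # TF-IDF vectors for the target (row 0) and every candidate
    vectorizer = TfidfVectorizer(max_features=1000, stop_words='english')
    tfidf_matrix = vectorizer.fit_transform([target_text] + texts)
    similarities = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix[1:]).flatten()
    # Indices of the top_n most similar candidates, best first
    top_indices = similarities.argsort()[-top_n:][::-1]
    return [candidates[i] for i in top_indices if similarities[i] > min_similarity]
```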
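Once related articles are selected, they still have to be rendered as links. A small sketch that outputs a "Related articles" block using each article title as the anchor text, which also keeps the anchors descriptive. It assumes each article dict has `url` and `title` keys; the `url` key and the `<aside>` wrapper are illustrative assumptions.

```python
from html import escape


def related_links_html(articles, heading='Related articles'):
    """Render related articles as a list of links whose anchor text is the title."""
    items = ''.join(
        f'<li><a href="{escape(a["url"])}">{escape(a["title"])}</a></li>'
        for a in articles
    )
    # Emit nothing when there is nothing to recommend
    return f'<aside><h2>{escape(heading)}</h2><ul>{items}</ul></aside>' if items else ''
```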
Anchor Text Optimization
Anchor text tells search engines about the topic of the target page:
Bad: <a href="/guide/seo">here</a>
Bad: <a href="/guide/seo">click</a>
Good: <a href="/guide/seo">SEO guide</a>
Good: <a href="/guide/technical-seo">technical SEO audit</a>
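Generic anchors like the "Bad" examples above can be found automatically. A sketch using Python's standard `html.parser` to collect every `<a href>` with its text and flag anchors from a list of non-descriptive phrases. The `GENERIC_ANCHORS` set is an assumption to adapt to the site's language, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Assumed list of non-descriptive anchor texts; extend it for your site
GENERIC_ANCHORS = {'here', 'click', 'click here', 'read more', 'more', 'link', 'this'}


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from every <a href> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')  # None for <a> without href
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            # Collapse whitespace so "click\n here" compares as "click here"
            self.links.append((self._href, ' '.join(''.join(self._text).split())))
            self._href = None


def generic_anchors(html):
    """Return links whose anchor text says nothing about the target page."""
    parser = AnchorCollector()
    parser.feed(html)
    return [(href, text) for href, text in parser.links if text.lower() in GENERIC_ANCHORS]
```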
Fixing Orphan Pages
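```python
import re

STOP_WORDS = {'a', 'an', 'and', 'for', 'how', 'of', 'on', 'the', 'to', 'with'}

def extract_keywords(title):
    """Naive keyword extraction: lowercase title words minus stop words."""
    return [w for w in re.findall(r'[a-z0-9]+', title.lower()) if w not in STOP_WORDS]

def fix_orphan_pages(orphan_urls, content_db):
    """Find logical places to add links to orphan pages.
    content_db.get_by_url() and search() stand in for your CMS or search index."""
    for url in orphan_urls:
        page = content_db.get_by_url(url)
        keywords = extract_keywords(page['title'])
        # Find pages mentioning these keywords
        related = content_db.search(keywords, exclude_url=url, limit=5)
        for related_page in related:
            print(f"Add link to {url} from {related_page['url']}")
```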
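When there is no search index to play the role of `content_db.search`, a simple in-memory version can rank candidate pages by how many of the orphan's title keywords appear in their body text. This is a keyword-overlap heuristic, not full-text search. The page dicts with `url`, `title` and `body` keys and the helper names are assumptions for this sketch.

```python
import re

STOP_WORDS = {'a', 'an', 'and', 'for', 'how', 'in', 'of', 'on', 'the', 'to', 'with'}


def keywords(text):
    """Lowercase words from text, minus a few stop words."""
    return {w for w in re.findall(r'[a-z0-9]+', text.lower()) if w not in STOP_WORDS}


def suggest_link_sources(orphan, pages, limit=5):
    """Rank pages by how many of the orphan's title keywords their body mentions."""
    wanted = keywords(orphan['title'])
    scored = []
    for page in pages:
        if page['url'] == orphan['url']:
            continue  # a page cannot fix its own orphan status
        overlap = len(wanted & keywords(page['body']))
        if overlap:
            scored.append((overlap, page['url']))
    # Highest overlap first; URL as a tie-breaker for stable output
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in scored[:limit]]
```

Each returned URL is a page where a link to the orphan would fit naturally, ideally with the orphan's title or main keyword as anchor text.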
Timeline
Internal linking audit + recommendations for structure improvement — 2–3 business days.