Setting Up Automatic Indexation Checking for New Pages
Publishing a new page doesn't mean Google has found it. The bot might visit in 3 hours or in 3 weeks, depending on crawl budget, sitemap structure, and update frequency. Manual checks via the "site:" operator or GSC don't scale: with 50+ pages published monthly, automation is needed.
Task: on new page publication — submit for indexation, await confirmation, log result, alert on issues.
Level 1: Sitemap + Ping
Minimal option — auto-update sitemap.xml and ping search engines when adding new pages.
Google's ping endpoint accepted a plain GET request:
https://www.google.com/ping?sitemap=https://example.com/sitemap.xml
Note: Google deprecated this endpoint in June 2023, and it now returns 404. Keep the sitemap referenced in robots.txt via the Sitemap: directive and submitted in GSC instead. For legacy setups, the ping was typically built into a deploy or CMS publication hook:
import requests

def notify_google_sitemap(sitemap_url: str) -> bool:
    ping_url = f"https://www.google.com/ping?sitemap={sitemap_url}"
    resp = requests.get(ping_url, timeout=10)
    return resp.status_code == 200
For WordPress — Yoast/RankMath plugins do this automatically. On custom CMS — hook into post_published event.
Sitemap with lastmod:
<url>
  <loc>https://example.com/new-page/</loc>
  <lastmod>2024-11-15T10:30:00+03:00</lastmod>
  <changefreq>weekly</changefreq>
  <priority>0.8</priority>
</url>
lastmod should be updated on every real page change; if it never changes, Google learns to treat the value as static and ignores it.
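To keep lastmod honest, it helps to render the entry from the page's actual modification time rather than hard-coding it. A minimal sketch (the helper name is hypothetical):

```python
from datetime import datetime, timezone

def sitemap_url_entry(loc: str, lastmod: datetime, changefreq: str = "weekly",
                      priority: float = 0.8) -> str:
    """Render one <url> block; lastmod in W3C datetime format."""
    return (
        "<url>\n"
        f"  <loc>{loc}</loc>\n"
        f"  <lastmod>{lastmod.isoformat(timespec='seconds')}</lastmod>\n"
        f"  <changefreq>{changefreq}</changefreq>\n"
        f"  <priority>{priority}</priority>\n"
        "</url>"
    )
```

The CMS would call this with the row's updated_at timestamp when regenerating the sitemap.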
Level 2: Google Indexing API
Official way to request an indexation check. Google documents it only for pages with JobPosting/BroadcastEvent markup; in practice it accepts any URL and noticeably speeds up indexation, but submissions outside those content types are unofficial and Google may ignore or restrict them.
Service account setup:
- Google Cloud Console → IAM → Service Accounts → Create
- Create JSON key
- In GSC add service account as Owner (not Viewer — returns 403)
Notification code:
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/indexing']
KEY_FILE = 'service-account-key.json'

credentials = service_account.Credentials.from_service_account_file(
    KEY_FILE, scopes=SCOPES
)
service = build('indexing', 'v3', credentials=credentials)

def request_indexing(url: str, update_type: str = 'URL_UPDATED') -> dict:
    """update_type: URL_UPDATED | URL_DELETED"""
    response = service.urlNotifications().publish(
        body={
            'url': url,
            'type': update_type
        }
    ).execute()
    return response

# Result contains urlNotificationMetadata:
# {'url': 'https://...', 'latestUpdate': {...}, 'latestRemove': {...}}
Batch requests (up to 100 URLs):
def batch_request_indexing(urls: list[str]) -> list:
    batch = service.new_batch_http_request()
    results = []

    def callback(request_id, response, exception):
        if exception:
            results.append({'url': request_id, 'error': str(exception)})
        else:
            results.append(response)

    for url in urls[:100]:  # API limit per batch
        batch.add(
            service.urlNotifications().publish(
                body={'url': url, 'type': 'URL_UPDATED'}
            ),
            request_id=url,
            callback=callback
        )
    batch.execute()
    return results
Quota: 200 publish requests/day by default. An increase can be requested via the Indexing API quota page in Google Cloud Console.
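With a 200/day ceiling, a backlog larger than the quota has to be spread across days. A simple sketch of the planning step (the worker would then submit one batch per day):

```python
DAILY_QUOTA = 200  # default publish quota for the Indexing API

def plan_submissions(urls: list[str], quota: int = DAILY_QUOTA) -> list[list[str]]:
    """Split a URL backlog into per-day chunks that fit the daily quota."""
    return [urls[i:i + quota] for i in range(0, len(urls), quota)]
```

In practice the remainder would live in the queue table and be picked up on the next day's cron run.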
Level 3: Checking Indexation Status
Submitting is only half the task: you also need to know whether the page actually made it into the index.
GSC URL Inspection API:
def check_indexing_status(site_url: str, page_url: str) -> dict:
    """
    site_url: 'https://example.com' — as registered in GSC
    page_url: full page URL
    """
    # Note: the Search Console API requires the
    # 'https://www.googleapis.com/auth/webmasters' scope,
    # which the Indexing API credentials above don't include.
    service = build('searchconsole', 'v1', credentials=credentials)
    result = service.urlInspection().index().inspect(
        body={
            'inspectionUrl': page_url,
            'siteUrl': site_url
        }
    ).execute()

    inspection = result.get('inspectionResult', {})
    index_status = inspection.get('indexStatusResult', {})
    return {
        'verdict': index_status.get('verdict'),  # PASS | FAIL | NEUTRAL
        'coverage_state': index_status.get('coverageState'),
        'last_crawl_time': index_status.get('lastCrawlTime'),
    }
Possible coverageState values:
- Submitted and indexed — page is in the index
- Crawled - currently not indexed — bot visited but didn't add the page
- Discovered - currently not indexed — URL found, not yet processed
- Excluded by 'noindex' tag — site-side issue
Full Automation: Complete Cycle
Process diagram:
Page publication
↓
Webhook/cron trigger
↓
Add URL to queue (Redis / DB)
↓
Worker: Indexing API → publish URL_UPDATED
↓
Cron after 48h: GSC Inspection API → check status
↓
If not indexed → retry + alert
↓
Log results
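The diagram above can be sketched as a single worker pass. All interfaces here (submit, inspect, record, alert) are hypothetical stand-ins for the Indexing API call, the Inspection API call, the queue table, and the alert channel:

```python
def run_cycle(new_urls, stale_urls, submit, inspect, record, alert):
    # Step 1: push freshly published URLs to the Indexing API.
    for url in new_urls:
        submit(url)                # URL_UPDATED notification
        record(url, "submitted")
    # Step 2: recheck URLs submitted 48h+ ago via the Inspection API.
    for url in stale_urls:
        if inspect(url) == "Submitted and indexed":
            record(url, "indexed")
        else:
            record(url, "pending")  # leave in queue for retry
            alert(url)              # notify about the laggard
```

A cron job would call this with the output of the recheck query below.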
Monitoring table (PostgreSQL):
CREATE TABLE indexing_queue (
    id SERIAL PRIMARY KEY,
    url TEXT NOT NULL UNIQUE,
    submitted_at TIMESTAMPTZ,
    last_checked_at TIMESTAMPTZ,
    status VARCHAR(64),
    coverage_state TEXT,
    attempts INT DEFAULT 0,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
-- Pages needing recheck
SELECT url, status, attempts
FROM indexing_queue
WHERE status NOT IN ('indexed', 'excluded')
AND attempts < 5
AND (last_checked_at IS NULL OR last_checked_at < NOW() - INTERVAL '48 hours')
ORDER BY created_at DESC;
Telegram alerts on issues:
import httpx

async def send_telegram_alert(message: str, bot_token: str, chat_id: str):
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    async with httpx.AsyncClient() as client:  # close the client properly
        await client.post(url, json={
            'chat_id': chat_id,
            'text': message,
            'parse_mode': 'HTML'
        })
# Usage
pages_not_indexed = get_pages_not_indexed_after_7_days()
if pages_not_indexed:
    msg = f"⚠️ {len(pages_not_indexed)} pages not indexed in 7 days:\n"
    msg += "\n".join(p.url for p in pages_not_indexed[:10])
    await send_telegram_alert(msg, BOT_TOKEN, CHAT_ID)
CMS Integration
WordPress — custom plugin with publish_post hook:
add_action('publish_post', function (int $post_id) {
    $url = get_permalink($post_id);
    wp_schedule_single_event(time() + 60, 'submit_url_to_indexing_api', [$url]);
});
Laravel — via Events/Listeners:
class PageCreated
{
    public function __construct(public readonly Page $page) {}
}

class SubmitPageToGoogleIndexing implements ShouldQueue
{
    public function handle(PageCreated $event): void
    {
        $url = route('page.show', $event->page->slug);
        GoogleIndexingService::submit($url);
    }
}
Timeline
Basic setup (sitemap ping + Indexing API on publication) — 1–2 business days. Full system with status monitoring, retry queue, alerts — 3–5 days.