Database Old Data Archiving Implementation
Table events with 500 million rows and growth of 2 million daily is a problem that becomes acute daily. SELECT slows down, VACUUM can't keep up, indexes consume gigabytes. Archiving solves it: hot data stays in main DB, cold data moves to archive with different storage mode.
Archiving Strategies
Partition detach — if table is partitioned, old partitions detach and move to archive DB or tablespace. Fastest approach.
INSERT + DELETE in batches — for unpartitioned tables. Copy rows to archive table in batches, delete from main. Doesn't create long transactions.
Logical replica — set up publication on main DB, subscription on archive DB, filtered by date. Archive updates in real time.
Dump + truncate — export to CSV/parquet, delete from DB. Data no longer in PostgreSQL/MySQL — only in file archive.
PostgreSQL: Batch Movement
-- Archive table (can be separate schema or database)
CREATE TABLE archive.events (
LIKE public.events INCLUDING ALL
);
-- Archiving function with batches
CREATE OR REPLACE FUNCTION archive_old_events(
p_before_date TIMESTAMPTZ,
p_batch_size INTEGER DEFAULT 10000
) RETURNS TABLE(batches_processed INTEGER, rows_archived BIGINT)
LANGUAGE plpgsql AS $$
DECLARE
v_batches INTEGER := 0;
v_total BIGINT := 0;
v_moved INTEGER;
BEGIN
LOOP
-- Move one batch to archive
WITH moved AS (
DELETE FROM public.events
WHERE id IN (
SELECT id FROM public.events
WHERE created_at < p_before_date
LIMIT p_batch_size
FOR UPDATE SKIP LOCKED -- skip locked rows
)
RETURNING *
)
INSERT INTO archive.events SELECT * FROM moved;
GET DIAGNOSTICS v_moved = ROW_COUNT;
EXIT WHEN v_moved = 0;
v_batches := v_batches + 1;
v_total := v_total + v_moved;
-- Pause between batches — don't overload disk
PERFORM pg_sleep(0.1);
-- Progress
RAISE NOTICE 'Batch %: % rows archived (total: %)', v_batches, v_moved, v_total;
END LOOP;
RETURN QUERY SELECT v_batches, v_total;
END $$;
Run:
SELECT * FROM archive_old_events('2024-01-01'::timestamptz, 10000);
-- Batch 1: 10000 rows archived (total: 10000)
-- Batch 2: 10000 rows archived (total: 20000)
-- ...
SKIP LOCKED allows multiple parallel workers without conflicts.
Scheduler: Artisan Command
// app/Console/Commands/ArchiveOldData.php
class ArchiveOldData extends Command
{
protected $signature = 'db:archive {--days=365 : Archive data older than N days}';
protected $description = 'Archive old records to archive tables';
public function handle(): int
{
$beforeDate = now()->subDays($this->option('days'))->toDateTimeString();
$this->info("Archiving events before {$beforeDate}...");
$result = DB::selectOne(
'SELECT * FROM archive_old_events(?::timestamptz, 5000)',
[$beforeDate]
);
$this->info("Done: {$result->batches_processed} batches, {$result->rows_archived} rows");
// VACUUM after bulk delete
DB::statement('VACUUM ANALYZE events');
return 0;
}
}
Schedule:
// app/Console/Kernel.php
$schedule->command('db:archive --days=365')->daily('03:00');







