AI User Content Moderation


Implementing AI Content Moderation for User-Generated Content

User-generated content — comments, reviews, images, chat messages — requires moderation. Manual review doesn't scale: at 10,000 posts per day, a team of five moderators simply can't handle it. AI moderation solves this automatically, leaving only borderline cases to humans.

What Can Be Automatically Moderated

Text content: spam, profanity, hate speech, threats, exposed personal data. Images: explicit material, violence, copyright infringement (via perceptual hashing). Links: phishing, malicious domains. Tone: toxic comments without explicit banned words.

Each category requires its own model or API endpoint — there's no universal solution.

Moderation System Architecture

Synchronous pre-publication check — user submits content, server checks before saving. Latency 200–800ms. Suitable for critical scenarios: paid reviews, legally significant posts.

Asynchronous queue — content is saved with status pending, background worker checks via queue (RabbitMQ, SQS, Redis Streams). Publication happens after approval or after N minutes if no violations. Suitable for high-load forums and chats.

Hybrid scheme — fast synchronous check by simple rules (banned words, length, patterns) + asynchronous ML-check for content that passed the initial filter.

POST /api/comment
  → sync: banned words check (< 5ms)
  → sync: OpenAI Moderation API (< 300ms)
  → save with status=published/flagged
  → async: image scan if attachments

Tools and APIs

OpenAI Moderation API — free /v1/moderations endpoint. Returns categories: hate, hate/threatening, self-harm, sexual, violence, harassment. Text only. No prompt is required: it is a dedicated classification model, not a chat model.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def moderate_text(content: str) -> dict:
    response = client.moderations.create(input=content)
    result = response.results[0]

    if result.flagged:
        # Keep only the categories that were actually triggered
        categories = {k: v for k, v in result.categories.model_dump().items() if v}
        return {"allowed": False, "categories": categories}

    return {"allowed": True}

Google Perspective API — toxicity analysis with scores from 0 to 1. Attributes: TOXICITY, SEVERE_TOXICITY, IDENTITY_ATTACK, INSULT, PROFANITY, THREAT. Supports multilingual content. The API is free of charge; the default quota is 1 QPS, and higher quotas can be requested.
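A minimal Perspective client can be written with only the standard library. The request and response shapes below follow the public API reference; the attribute list and the `doNotStore` flag are example choices, not requirements.

```python
import json
import urllib.request

PERSPECTIVE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"

def build_request(text: str, attributes=("TOXICITY", "INSULT", "THREAT")) -> dict:
    # Request body shape per the Perspective API reference
    return {
        "comment": {"text": text},
        "requestedAttributes": {attr: {} for attr in attributes},
        "doNotStore": True,  # ask Perspective not to retain the content
    }

def extract_scores(response: dict) -> dict:
    # Each requested attribute carries a 0..1 summary score
    return {
        attr: data["summaryScore"]["value"]
        for attr, data in response.get("attributeScores", {}).items()
    }

def analyze(text: str, api_key: str) -> dict:
    req = urllib.request.Request(
        f"{PERSPECTIVE_URL}?key={api_key}",
        data=json.dumps(build_request(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_scores(json.load(resp))
```

Keeping `extract_scores` as a pure function makes the score-to-decision logic easy to unit-test without network access.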

AWS Rekognition — image moderation. API DetectModerationLabels returns label hierarchy with confidence score. Categories: Explicit Nudity, Violence, Visually Disturbing, Hate Symbols.
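With boto3, a DetectModerationLabels call might look like the sketch below. The 60% confidence floor is an assumed default, and the flattening helper simply reshapes the documented response structure.

```python
def detect_unsafe_labels(image_bytes: bytes, min_confidence: float = 60.0) -> list:
    # boto3 is imported lazily so summarize_labels stays usable without it
    import boto3

    client = boto3.client("rekognition")
    response = client.detect_moderation_labels(
        Image={"Bytes": image_bytes},
        MinConfidence=min_confidence,
    )
    return summarize_labels(response)

def summarize_labels(response: dict) -> list:
    # Flatten the label hierarchy into simple (name, parent, confidence) records
    return [
        {
            "name": label["Name"],
            "parent": label.get("ParentName", ""),
            "confidence": label["Confidence"],
        }
        for label in response.get("ModerationLabels", [])
    ]
```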

Azure Content Safety — text and images in one API. Categories: hate, sexual, violence, self-harm. Each scored 0–6. Includes Groundedness Detection for response verification.
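A text call to Azure Content Safety over plain REST could be sketched as follows. The `api-version` and header name follow Azure's published REST reference, but treat the exact versions and response fields as assumptions to verify against current docs.

```python
import json
import urllib.request

def parse_severities(response: dict) -> dict:
    # Map each category ("Hate", "Sexual", ...) to its severity score
    return {
        item["category"]: item["severity"]
        for item in response.get("categoriesAnalysis", [])
    }

def analyze_text(endpoint: str, api_key: str, text: str) -> dict:
    url = f"{endpoint}/contentsafety/text:analyze?api-version=2023-10-01"
    req = urllib.request.Request(
        url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return parse_severities(json.load(resp))
```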

Custom Fine-Tuned Model

For domain-specific content (technical forum with specialized terminology, medical platform), third-party APIs produce many false positives. Solution — fine-tune on your own data.

Process: gather dataset of 2000–5000 labeled examples (approved/rejected), fine-tune distilbert-base-multilingual-cased via Hugging Face Transformers, deploy as separate service.

from transformers import pipeline

# Load the fine-tuned classifier from a local checkpoint
classifier = pipeline(
    "text-classification",
    model="./moderation-model",
    device=0,  # GPU index; use device=-1 for CPU-only inference
)

def classify_content(text: str) -> tuple[str, float]:
    # Truncate long inputs to the model's 512-token window
    result = classifier(text, truncation=True, max_length=512)[0]
    return result["label"], result["score"]

Inference on CPU — ~50ms for text up to 512 tokens. On GPU (T4) — ~5ms.

Image Processing

Before sending to an API, preprocess: resize to 2048px on the longest side, re-encode as JPEG at quality 85, strip EXIF metadata. This reduces cost and speeds up responses.
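With Pillow this preprocessing fits in a few lines. A sketch using the values above; `max_side` and `quality` are the 2048px / 85 figures from the text.

```python
from io import BytesIO
from PIL import Image

def preprocess(image_bytes: bytes, max_side: int = 2048, quality: int = 85) -> bytes:
    img = Image.open(BytesIO(image_bytes)).convert("RGB")
    # Downscale in place so the longest side is at most max_side
    img.thumbnail((max_side, max_side))
    out = BytesIO()
    # Saving a freshly created image without an exif argument drops EXIF metadata
    img.save(out, format="JPEG", quality=quality)
    return out.getvalue()
```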

To protect against re-uploads of known-banned content — PhotoDNA (Microsoft) or pHash comparison against a hash database. PhotoDNA integrates via Azure; pHash can be implemented with the imagehash library:

import imagehash
from PIL import Image

def compute_phash(image_path: str) -> str:
    img = Image.open(image_path)
    return str(imagehash.phash(img))

def is_known_violation(phash: str, banned_hashes: set, threshold: int = 10) -> bool:
    # Subtracting two hashes gives the Hamming distance; a small distance
    # means the images are visually near-identical
    candidate = imagehash.hex_to_hash(phash)
    for banned in banned_hashes:
        if candidate - imagehash.hex_to_hash(banned) < threshold:
            return True
    return False

Manual Moderation Dashboard

Automation doesn't decide borderline cases — they're shown to a moderator. Manual queue contains:

  • content with confidence 0.4–0.7 (uncertain result)
  • content reported by users
  • content from new accounts without history

UI: list with filters, hotkeys for quick decisions (approve/reject/escalate), decision history tied to operator, accuracy metrics per operator.
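The routing rules above can be expressed as a single predicate. The confidence band is the 0.4–0.7 range from the list; the 7-day cutoff for "new account" is an assumed value.

```python
def needs_manual_review(confidence: float, user_reports: int,
                        account_age_days: int) -> bool:
    # Uncertain automatic result goes to a human
    if 0.4 <= confidence <= 0.7:
        return True
    # Anything reported by users is re-checked
    if user_reports > 0:
        return True
    # New accounts without posting history (7-day cutoff is an assumption)
    if account_age_days < 7:
        return True
    return False
```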

Feedback and Retraining

Model degrades as content patterns change. Improvement cycle:

  1. Save all decisions (automatic and manual) with labels
  2. Weekly analyze discrepancies: where automation failed, moderator corrected
  3. Monthly retrain model on accumulated corrections
  4. A/B test new version on 10% traffic before full rollout

Monitoring

Metrics for Grafana/Datadog:

  • moderation.requests.total — total volume
  • moderation.latency.p99 — 99th percentile latency
  • moderation.flagged.rate — share of blocked content
  • moderation.false_positive.rate — share of incorrectly blocked (by appeals)
  • moderation.queue.depth — manual moderation queue depth

Alert: if false_positive.rate > 5% in 24 hours — model needs review.
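The alert condition can be computed directly from appeal counts. A sketch: "upheld appeals over total blocked in the window" is one reasonable definition of the false-positive rate by appeals; the 5% threshold matches the rule above.

```python
def false_positive_rate(appeals_upheld: int, total_blocked: int) -> float:
    # Appeals where a human overturned the automatic block,
    # over everything blocked in the same window
    return appeals_upheld / total_blocked if total_blocked else 0.0

def should_alert(appeals_upheld: int, total_blocked: int,
                 threshold: float = 0.05) -> bool:
    return false_positive_rate(appeals_upheld, total_blocked) > threshold
```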

Timeline

  • OpenAI Moderation API + basic rules integration: 3–5 days
  • Asynchronous queue + content statuses: 3–4 days
  • Manual moderation dashboard: 5–7 days
  • Image moderation (AWS Rekognition): 2–3 days
  • Fine-tuning a custom model: 10–15 days
  • Retraining cycle + monitoring: 3–5 days

Basic integration with OpenAI Moderation API and manual review queue — 2 weeks. Full system with custom model, monitoring, and dashboard — 5–6 weeks.