AI system for media and publishing businesses
The media industry is experiencing a structural crisis: audience attention is fragmented, and advertising revenues are declining. AI is helping to produce more relevant content, automate routine tasks, and retain readers through personalization.
Automation of content production
Automatic news generation:
Structured data → news text. Applications:
- Sports results: a 3-1 final score plus player statistics → an automatic match report
- Financial reports: quarterly results → a brief analysis for the business press
- Registry data: real estate transactions, changes to legal entities → business briefs
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_sports_report(match_data):
    """Generate a match report from structured match data."""
    prompt = f"""
Write a 150-200 word sports report based on the following match data:
Tournament: {match_data['tournament']}
Date: {match_data['date']}
Teams: {match_data['home_team']} {match_data['score']} {match_data['away_team']}
Goals: {match_data['goals']}
Player of the match: {match_data['man_of_match']}
Key events: {match_data['key_events']}
Style: professional sports journalism.
Avoid clichés such as "the teams clashed in a thrilling match".
"""
    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0.7,  # allows some variety in wording between reports
    )
    return response.choices[0].message.content
```
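For registry data, where every brief follows the same few patterns, a plain template can replace the LLM call, which guarantees that names and figures are copied verbatim rather than regenerated. This is a swapped-in technique shown as a sketch under assumptions: the `PropertyDeal` record, its field names and the wording of the brief are hypothetical, not a real registry schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PropertyDeal:
    """Hypothetical registry record for a real estate transaction."""
    buyer: str
    seller: str
    address: str
    price_eur: float
    deal_date: date


def deal_brief(deal: PropertyDeal) -> str:
    """Render a one-sentence business brief from a registry record (template, no LLM)."""
    price_m = deal.price_eur / 1_000_000
    return (
        f"{deal.buyer} has acquired the property at {deal.address} from "
        f"{deal.seller} for €{price_m:.1f} million, according to registry "
        f"data dated {deal.deal_date:%d %B %Y}."
    )


deal = PropertyDeal("Acme Holdings", "City Estates", "12 Harbour Street",
                    4_300_000, date(2024, 3, 5))
print(deal_brief(deal))
```

Month names come from `strftime`, so they follow the process locale (English under the default C locale).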
AI editorial assistant:
An LLM plus tools for journalists:
- Interview transcription (Whisper) and extraction of key quotes
- Fact-checking: automatic verification of figures and facts against databases
- SEO optimization: keyword analysis, recommendations for headlines and subheadings
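The figure-checking step can be approximated without an LLM: split the draft into sentences, find those that mention a known metric, and compare the first number in each such sentence with a trusted reference value. This is a simplified sketch; the `check_figures` helper, the reference dictionary and the 1% relative tolerance are assumptions, and a production system would need proper parsing of numbers, units and scales ("million", "billion").

```python
import re
from dataclasses import dataclass

# First number in a sentence: optional sign, thousands separators, decimal part
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


@dataclass
class FigureCheck:
    metric: str
    sentence: str
    found: float
    expected: float
    ok: bool


def check_figures(text: str, reference: dict, rel_tol: float = 0.01) -> list:
    """Compare numbers in a draft with trusted values for the metrics they mention."""
    results = []
    # Naive sentence split: ., ! or ? followed by whitespace
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        for metric, expected in reference.items():
            if metric.lower() not in lowered:
                continue
            match = NUMBER.search(sentence)
            if match is None:
                continue
            found = float(match.group().replace(",", ""))
            ok = abs(found - expected) <= rel_tol * abs(expected)
            results.append(FigureCheck(metric, sentence, found, expected, ok))
    return results


draft = "Quarterly revenue was 4.2 billion euros. Net profit reached 610 million euros."
for check in check_figures(draft, {"revenue": 4.2, "net profit": 590}):
    print(check.metric, check.found, "OK" if check.ok else f"expected {check.expected}")
```

Flagged sentences would go to the editor for review rather than being corrected automatically.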
Personalization and recommendations
Next Article Recommendation:
Keep the reader on the site after they finish an article:
- Content-based filtering: articles with similar content (embedding similarity)
- Collaborative filtering: what users with similar behavior read
- Hybrid: a weighted combination of the two, adjusted for freshness (newer articles get a boost)
```python
from datetime import datetime

import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity


class ArticleRecommender:
    def __init__(self):
        # Multilingual model, so articles in different languages share one embedding space
        self.model = SentenceTransformer('paraphrase-multilingual-MiniLM-L12-v2')
        self.article_embeddings = {}

    def index_article(self, article_id, title, body, category, pub_date):
        # Embed the title plus the first 500 characters of the body
        text = f"{title}. {body[:500]}"
        embedding = self.model.encode(text)
        self.article_embeddings[article_id] = {
            'embedding': embedding,
            'category': category,
            'pub_date': pub_date,  # datetime of publication
            'title': title,
        }

    def recommend(self, current_article_id, user_history=None, top_k=5):
        # user_history is assumed to be a list of article IDs the user has already read
        seen = set(user_history or [])
        current_emb = self.article_embeddings[current_article_id]['embedding']
        all_ids = [aid for aid in self.article_embeddings
                   if aid != current_article_id and aid not in seen]
        if not all_ids:
            return []
        all_embs = np.array([self.article_embeddings[aid]['embedding'] for aid in all_ids])
        similarities = cosine_similarity([current_emb], all_embs)[0]

        # Freshness: the recency score decays linearly from 1 to 0 over 30 days
        now = datetime.now()
        recency_scores = np.array([
            max(0.0, 1 - (now - self.article_embeddings[aid]['pub_date']).days / 30)
            for aid in all_ids
        ])

        # Hybrid score: 70% content similarity, 30% freshness
        final_scores = similarities * 0.7 + recency_scores * 0.3
        top_indices = np.argsort(final_scores)[::-1][:top_k]
        return [(all_ids[i], float(final_scores[i])) for i in top_indices]
```
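The recommender above covers only the content-based and freshness parts of the hybrid. Collaborative filtering can be sketched as item-to-item cosine similarity over co-reading data: two articles are similar if the same readers opened both. The `reads` mapping (user ID to the set of article IDs read) is a hypothetical input format; in practice it would come from the site's analytics, and the resulting score could be added to `final_scores` with its own weight.

```python
import numpy as np


def item_item_similarity(reads: dict, article_ids: list) -> np.ndarray:
    """Cosine similarity between articles based on which users read them."""
    index = {aid: j for j, aid in enumerate(article_ids)}
    users = list(reads)
    # Binary user x article matrix: 1 if the user read the article
    m = np.zeros((len(users), len(article_ids)))
    for u, user in enumerate(users):
        for aid in reads[user]:
            if aid in index:
                m[u, index[aid]] = 1.0
    co = m.T @ m                   # co[i, j] = number of users who read both i and j
    norms = np.sqrt(np.diag(co))   # sqrt of each article's reader count
    norms[norms == 0] = 1.0        # avoid division by zero for unread articles
    sim = co / np.outer(norms, norms)
    np.fill_diagonal(sim, 0.0)     # an article should not recommend itself
    return sim


def recommend_cf(current_id, reads, article_ids, top_k=3):
    """Top-k articles most often co-read with current_id (zero-similarity ones dropped)."""
    sim = item_item_similarity(reads, article_ids)
    row = sim[article_ids.index(current_id)]
    order = np.argsort(row)[::-1]
    return [article_ids[j] for j in order[:top_k] if row[j] > 0]


reads = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"c", "d"}}
print(recommend_cf("a", reads, ["a", "b", "c", "d"]))  # ['b', 'c']
```

The dense matrix is fine for a toy example; a real site with millions of readers would need sparse matrices or an approximate nearest-neighbor index.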
Monetization and audience analytics
Propensity to Subscribe:
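The goal is converting free readers into paid subscribers. An ML model predicts P(subscribe_7d), the probability that a reader subscribes within 7 days:
- Features: reading depth, number of articles read, RFM pattern (recency, frequency, monetary value), traffic source
- Trigger email: when P > 0.4, send a personalized offer (trial or discount)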
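A propensity model of this kind can be sketched with logistic regression from scikit-learn. Everything except the 0.4 trigger threshold is an assumption for illustration: the four features are simplified stand-ins for the ones listed above, and the training data is synthetic, generated by `make_toy_data` from a hand-picked relationship, so the printed count says nothing about real readers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def make_toy_data(n=500, seed=0):
    """Synthetic readers; the feature set and the true relationship are invented."""
    rng = np.random.default_rng(seed)
    depth = rng.uniform(0, 1, n)          # share of each article scrolled
    articles = rng.poisson(5, n)          # articles read in the last 30 days
    recency = rng.integers(0, 30, n)      # days since last visit
    from_search = rng.integers(0, 2, n)   # 1 = arrived via a search engine
    X = np.column_stack([depth, articles, recency, from_search]).astype(float)
    # Assumed ground truth: engaged, recent readers subscribe more often
    logit = -3 + 2.5 * depth + 0.3 * articles - 0.08 * recency - 0.5 * from_search
    y = rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logit))
    return X, y.astype(int)


def readers_to_target(model, X, threshold=0.4):
    """Indices of readers whose predicted P(subscribe_7d) exceeds the threshold."""
    proba = model.predict_proba(X)[:, 1]  # column 1 = probability of class 1
    return np.flatnonzero(proba > threshold), proba


X, y = make_toy_data()
model = LogisticRegression(max_iter=1000).fit(X, y)
targets, proba = readers_to_target(model, X)
print(f"{len(targets)} of {len(X)} readers would get a personalized offer")
```

On real data the model would be trained on historical readers and evaluated on a held-out period before the threshold is fixed.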
Dynamic paywall:
Instead of a fixed "3 free articles" rule, the paywall adapts to each reader:
- An ML model decides whether to show the paywall or grant another free article, based on P(subscribe)
- High intent: show the paywall; low intent: offer more content to "warm up" the reader
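The decision rule can be written as a small policy function. This is a sketch: the 0.4 intent threshold simply reuses the email-trigger value from the propensity model, and the cap on free articles is a hypothetical safeguard so that low-intent readers do not receive unlimited content; both values would be tuned in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PaywallPolicy:
    high_intent: float = 0.4       # assumed threshold on P(subscribe)
    max_free_articles: int = 10    # hypothetical hard cap on free articles


def paywall_decision(p_subscribe: float, free_articles_read: int,
                     policy: PaywallPolicy = PaywallPolicy()) -> str:
    """Return 'show_paywall' or 'grant_article' for the current page view."""
    if p_subscribe >= policy.high_intent:
        return "show_paywall"      # high intent: ask for the subscription now
    if free_articles_read >= policy.max_free_articles:
        return "show_paywall"      # cap reached even for a low-intent reader
    return "grant_article"         # low intent: keep warming the reader up


print(paywall_decision(0.65, 2))  # show_paywall
print(paywall_decision(0.10, 2))  # grant_article
```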
Advertising ML:
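- Cookieless contextual targeting: ads are matched to the content of the page rather than to a tracked user profile, which reduces reliance on personal data under GDPR
- Brand safety: ML checks whether an article is a suitable environment for a brand's advertising
- Viewability prediction: ML predicts whether a user will actually see a banner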
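Contextual targeting reduces to matching page text against ad categories, with no user data involved. The sketch below uses TF-IDF cosine similarity from scikit-learn as a simple stand-in for the ML models mentioned above; the category keyword lists and the brand-safety blocklist are hypothetical, and a keyword blocklist is only a baseline, not the ML brand-safety classifier the text describes.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ad categories, each described by a bag of keywords
AD_CATEGORIES = {
    "sports_gear": "football match goal stadium team player training boots",
    "travel": "flight hotel vacation beach tourism booking trip",
    "finance": "bank loan mortgage interest rate investment savings",
}
# Hypothetical brand-safety blocklist
UNSAFE_TERMS = {"terror", "attack", "disaster", "shooting"}


def contextual_match(page_text, categories=AD_CATEGORIES, min_score=0.05):
    """Rank ad categories by TF-IDF cosine similarity to the page text."""
    names = list(categories)
    vectorizer = TfidfVectorizer(stop_words="english")
    matrix = vectorizer.fit_transform([page_text] + [categories[n] for n in names])
    scores = cosine_similarity(matrix[0], matrix[1:])[0]
    ranked = sorted(zip(names, scores), key=lambda item: item[1], reverse=True)
    return [(name, float(score)) for name, score in ranked if score >= min_score]


def is_brand_safe(page_text, unsafe_terms=UNSAFE_TERMS):
    """Keyword baseline: unsafe if any blocklisted word appears on the page."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return not (words & unsafe_terms)


page = "The team scored a late goal at the stadium as the player celebrated the match win."
print(contextual_match(page), is_brand_safe(page))
```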
Combating disinformation
Fact-checking:
- Cross-referencing claims against a fact base (Wikidata, verified sources)
- Stance detection: does the article contradict other publications on the same topic?
- Source credibility scoring: ML assessment of a source's reliability
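Cross-referencing against a fact base can be sketched as a lookup of (subject, property) pairs, which roughly mirrors the shape of Wikidata statements. This is a toy under stated assumptions: the two-entry `FACT_BASE` is hypothetical and held in memory, and extracting structured claims from article text (the hard part) is assumed to have happened already. A real system would query Wikidata or a curated database instead.

```python
from typing import NamedTuple


class Claim(NamedTuple):
    subject: str
    prop: str
    value: str


# Hypothetical in-memory fact base keyed by normalized (subject, property)
FACT_BASE = {
    ("berlin", "capital of"): "germany",
    ("mount everest", "elevation (m)"): "8849",
}


def verify(claim: Claim, facts: dict = FACT_BASE) -> str:
    """Label a claim as supported, contradicted or unverifiable."""
    key = (claim.subject.strip().lower(), claim.prop.strip().lower())
    known = facts.get(key)
    if known is None:
        return "unverifiable"  # no entry in the fact base: route to a human checker
    return "supported" if claim.value.strip().lower() == known else "contradicted"


print(verify(Claim("Berlin", "capital of", "Germany")))         # supported
print(verify(Claim("Mount Everest", "elevation (m)", "8611")))  # contradicted
print(verify(Claim("Paris", "capital of", "France")))           # unverifiable
```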
Development time: 4–7 months for a media AI platform combining automated content generation, a recommendation system, and paywall optimization.