Synonym Search Implementation for Web Applications
Synonyms expand search coverage: a user who searches for "laptop" also gets results that mention "notebook". Without synonyms, search matches only the exact terms the user typed and misses relevant results.
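The idea can be illustrated without any search engine. The following is a minimal sketch, assuming a hypothetical in-memory list of synonym groups (SYNONYM_GROUPS) and naive whitespace tokenization; the engines described below do the same job with proper analysis and indexing.

```python
# Hypothetical synonym groups, for illustration only.
SYNONYM_GROUPS = [
    {"laptop", "notebook"},
    {"phone", "smartphone"},
]


def expand_query(query: str) -> list[set[str]]:
    """Return, for each query word, the set of terms that should match it."""
    expanded = []
    for word in query.lower().split():
        terms = {word}
        for group in SYNONYM_GROUPS:
            if word in group:
                terms |= group
        expanded.append(terms)
    return expanded


def matches(document: str, query: str) -> bool:
    """A document matches when every query word, or one of its synonyms, occurs in it."""
    doc_words = set(document.lower().split())
    return all(terms & doc_words for terms in expand_query(query))


print(matches("Dell notebook 15 inch", "laptop"))
print(matches("Dell monitor 27 inch", "laptop"))
```

The first document matches only because "notebook" is in the same group as "laptop"; the second has no term from that group and is rejected.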
PostgreSQL: Thesaurus Dictionary
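PostgreSQL full-text search (FTS) supports a thesaurus dictionary: a file of rules that replace words or phrases with a canonical term whenever text is normalized.
Create thesaurus_ru.ths in the tsearch_data directory under PostgreSQL's SHAREDIR (on Debian/Ubuntu packages typically /usr/share/postgresql/14/tsearch_data/; the configuration directory /etc/postgresql/14/main is not searched):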
# Syntax: sample word or phrase : replacement
# The left side is matched as one phrase, so each variant gets its own rule
notebook : laptop
computer : laptop
smartphone : phone
mobile : phone
earphones : headphones
television : tv
fridge : refrigerator
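Because the left side of a thesaurus rule is matched as a phrase, a list of alternative spellings has to be written as one rule per variant. A minimal sketch that generates such rules from synonym groups; the function name thesaurus_lines and the input shape (canonical term mapped to its variants) are assumptions made for illustration:

```python
def thesaurus_lines(groups: dict[str, list[str]]) -> list[str]:
    """Render {canonical: [variants]} as thesaurus rules, one rule per variant."""
    lines = []
    for canonical, variants in groups.items():
        for variant in variants:
            variant = variant.strip().lower()
            # Skip empty entries and rules that would map a term to itself.
            if variant and variant != canonical:
                lines.append(f"{variant} : {canonical}")
    return lines


rules = thesaurus_lines({
    "laptop": ["notebook", "computer"],
    "phone": ["smartphone", "mobile phone"],
})
print("\n".join(rules))
```

A multiword variant such as "mobile phone" stays on the left side as a single phrase, which is exactly the case the thesaurus format is designed for.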
Create the dictionary and a text search configuration that uses it:
CREATE TEXT SEARCH DICTIONARY thesaurus_ru (
    TEMPLATE = thesaurus,
    DictFile = thesaurus_ru,      -- file name without the .ths extension
    Dictionary = russian_ispell   -- subdictionary for input normalization; must already exist (russian_stem is a built-in alternative)
);
CREATE TEXT SEARCH CONFIGURATION search_ru (COPY = russian);
ALTER TEXT SEARCH CONFIGURATION search_ru
    ALTER MAPPING FOR asciiword, word, numword
    WITH thesaurus_ru, russian_stem;
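Check:
SELECT to_tsvector('search_ru', 'notebook Dell with SSD');
-- Expected: 'laptop':1 in place of "notebook", next to 'dell':2 and 'ssd':4.
-- "with" is dropped only if a dictionary with English stop words handles it;
-- russian_stem has a Russian stop list, so here it is likely kept as 'with':3.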
Recompute the stored vectors with the new configuration:
UPDATE products SET search_vector =
    setweight(to_tsvector('search_ru', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('search_ru', coalesce(description, '')), 'C');
-- A search for "laptop" now also finds rows that mention "notebook":
SELECT id, title
FROM products
WHERE search_vector @@ plainto_tsquery('search_ru', 'laptop');
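Limitation: the thesaurus is applied whenever text passes through the configuration, in to_tsvector as well as in plainto_tsquery, but the stored search_vector values are computed at write time. Adding or changing a synonym therefore requires recomputing the stored vectors.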
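Why a rule change calls for recomputing stored vectors can be shown with a toy model. This is a sketch only: normalize is a hypothetical stand-in for to_tsvector (lowercase, split, map variants to the canonical term), and the rule sets are invented for the example.

```python
def normalize(text: str, synonyms: dict[str, str]) -> set[str]:
    """Toy stand-in for to_tsvector: lowercase, split, map variants to canonical terms."""
    return {synonyms.get(word, word) for word in text.lower().split()}


old_rules = {"notebook": "laptop"}
new_rules = {"notebook": "laptop", "ultrabook": "laptop"}

# The stored vector was computed at write time, before "ultrabook" became a synonym.
stored = normalize("Asus ultrabook 13", old_rules)
stale_match = "laptop" in stored

# Recomputing the stored vector with the new rules makes the row findable.
stored = normalize("Asus ultrabook 13", new_rules)
fresh_match = "laptop" in stored

print(stale_match, fresh_match)
```

Until the vector is recomputed, a search for "laptop" cannot find the ultrabook row, because the stored lexemes still reflect the old rules.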
Elasticsearch: Synonym Token Filter
Elasticsearch can apply synonyms at index time, at search time (via search_analyzer), or both.
Option 1: Synonym file:
# config/synonyms_ru.txt
laptop, notebook, computer
phone, smartphone, mobile
headphones, earphones
tv, television
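A file in this comma-separated format can be generated from synonym groups kept elsewhere, for example in a database table. A minimal sketch, assuming a hypothetical helper solr_synonym_lines; rejecting a word that appears in two groups is a design choice of this sketch to catch editing mistakes, not an Elasticsearch requirement:

```python
def solr_synonym_lines(groups: list[list[str]]) -> list[str]:
    """Render each group as one comma-separated line of equivalent synonyms."""
    lines = []
    seen: set[str] = set()
    for group in groups:
        # Normalize and drop duplicates while keeping the original order.
        words = list(dict.fromkeys(w.strip().lower() for w in group if w.strip()))
        clash = seen.intersection(words)
        if clash:
            raise ValueError(f"words listed in more than one group: {sorted(clash)}")
        # A group needs at least two distinct words to be a synonym rule.
        if len(words) >= 2:
            lines.append(", ".join(words))
            seen.update(words)
    return lines


print("\n".join(solr_synonym_lines([
    ["laptop", "notebook", "computer"],
    ["tv", "television"],
])))
```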
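PUT /products
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_ru": {
          "type": "synonym",
          "synonyms_path": "synonyms_ru.txt",
          "updateable": true
        },
        "russian_stop": {
          "type": "stop",
          "stopwords": "_russian_"
        },
        "russian_stemmer": {
          "type": "stemmer",
          "language": "russian"
        }
      },
      "analyzer": {
        "ru_index": {
          "tokenizer": "standard",
          "filter": ["lowercase", "russian_stop", "russian_stemmer"]
        },
        "ru_with_synonyms": {
          "tokenizer": "standard",
          "filter": ["lowercase", "russian_stop", "russian_stemmer", "synonym_ru"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ru_index",
        "search_analyzer": "ru_with_synonyms"
      }
    }
  }
}
With "updateable": true the synonym file can be changed without reindexing. Elasticsearch accepts updateable filters only in search-time analyzers, which is why the title field is indexed with ru_index (no synonyms) and searched with ru_with_synonyms. After replacing the file on every node, reload the analyzers via the API: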
POST /products/_reload_search_analyzers
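The reload call is easy to script as part of a deployment step. A minimal sketch using only the Python standard library; the base URL http://localhost:9200 and the absence of authentication are assumptions, and build_reload_request and reload_search_analyzers are hypothetical helper names:

```python
import json
import urllib.request


def build_reload_request(base_url: str, index_name: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that reloads search analyzers."""
    url = f"{base_url.rstrip('/')}/{index_name}/_reload_search_analyzers"
    return urllib.request.Request(url, method="POST")


def reload_search_analyzers(base_url: str, index_name: str) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_reload_request(base_url, index_name)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Building the request needs no running cluster; sending it does.
request = build_reload_request("http://localhost:9200", "products")
print(request.get_method(), request.full_url)
```

Calling reload_search_analyzers("http://localhost:9200", "products") against a running cluster sends the request; a secured cluster would additionally need credentials.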
Option 2: Query-time synonyms:
GET /products/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "laptop",
              "analyzer": "ru_with_synonyms"
            }
          }
        }
      ]
    }
  }
}
Query-time synonyms are flexible: they are applied when the query is analyzed, so a dictionary change requires no reindexing.
Option 3: Synonym graph for multiword phrases:
{
  "filter": {
    "synonym_graph_ru": {
      "type": "synonym_graph",
      "synonyms": [
        "washing machine => washer",
        "mobile phone => smartphone, phone",
        "ssd drive => solid state drive"
      ]
    }
  }
}
The synonym_graph filter handles multiword synonyms correctly, whereas the plain synonym filter can produce wrong token positions and break phrase queries. It is intended for search-time analyzers only.
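Multiword rules differ from single-word ones in that a whole token sequence has to be recognized before anything is replaced. The following toy sketch shows longest-match replacement over a token list; it does not model the token graph that Elasticsearch builds, and apply_phrase_synonyms and RULES are hypothetical names used only here:

```python
def apply_phrase_synonyms(
    tokens: list[str], rules: dict[tuple[str, ...], list[str]]
) -> list[str]:
    """Replace the longest matching token sequence at each position."""
    max_len = max((len(key) for key in rules), default=1)
    result: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so "washing machine" wins over "washing".
        for size in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + size])
            if key in rules:
                result.extend(rules[key])
                i += size
                break
        else:
            # No rule starts here: keep the token as it is.
            result.append(tokens[i])
            i += 1
    return result


RULES = {
    ("washing", "machine"): ["washer"],
    ("ssd", "drive"): ["solid", "state", "drive"],
}
print(apply_phrase_synonyms("quiet washing machine".split(), RULES))
```

Note that a replacement can change the number of tokens in either direction, which is the reason position handling matters for phrase queries.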
Meilisearch: Built-in Synonyms
import meilisearch
client = meilisearch.Client('http://localhost:7700', 'masterKey')
index = client.index('products')
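# Update the synonym dictionary. Each entry is one-way: searching "laptop" also
# matches "notebook" and "computer", but not the reverse, unless the reverse
# entries are added as well.
index.update_synonyms({
    'laptop': ['notebook', 'computer'],
    'phone': ['smartphone', 'mobile'],
    'headphones': ['earphones'],
    'tv': ['television'],
})
Meilisearch applies synonyms at search time, so no manual reindexing step is needed. A dictionary update is a single API call that Meilisearch processes as an asynchronous task, typically within seconds.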
Synonym Dictionary Management
Synonyms should be editable by business users, not only by developers. A small admin API on top of the search index is enough:
# api/synonyms.py (FastAPI)
import meilisearch
from fastapi import APIRouter
from pydantic import BaseModel

# Connection details match the Meilisearch example above; adjust for your setup.
client = meilisearch.Client('http://localhost:7700', 'masterKey')
index = client.index('products')

router = APIRouter(prefix='/admin/synonyms')


class SynonymGroup(BaseModel):
    words: list[str]  # all words in a group are mutual synonyms


@router.get('/')
async def list_synonyms():
    return index.get_synonyms()


@router.put('/')
async def update_synonyms(groups: list[SynonymGroup]):
    """Replace the entire synonym dictionary."""
    synonym_dict: dict[str, list[str]] = {}
    for group in groups:
        words = [w.lower() for w in group.words]
        for word in words:
            # each word maps to the other words in its group
            synonym_dict[word] = [w for w in words if w != word]
    task = index.update_synonyms(synonym_dict)
    return {'task_uid': task.task_uid, 'status': 'accepted'}


@router.delete('/')
async def clear_synonyms():
    task = index.reset_synonyms()
    return {'task_uid': task.task_uid, 'status': 'accepted'}
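One edge case when expanding groups into a dictionary is a word that appears in more than one group: plain assignment per group keeps only the list from the last group. A standalone sketch that takes the union instead; build_synonym_dict is a hypothetical helper, not part of the Meilisearch client:

```python
def build_synonym_dict(groups: list[list[str]]) -> dict[str, list[str]]:
    """Map every word to the sorted list of all other words it shares a group with."""
    related: dict[str, set[str]] = {}
    for group in groups:
        words = {w.strip().lower() for w in group if w.strip()}
        for word in words:
            # Union instead of assignment, so a word in two groups keeps both sets.
            related.setdefault(word, set()).update(words - {word})
    # Words without any partner (single-word groups) are left out.
    return {word: sorted(others) for word, others in related.items() if others}


print(build_synonym_dict([["Phone", "smartphone"], ["phone", "mobile"]]))
```

Here "phone" ends up with both "mobile" and "smartphone", while "mobile" and "smartphone" each map back to "phone" only, since they never share a group.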
Synonym Testing
import meilisearch

index = meilisearch.Client('http://localhost:7700', 'masterKey').index('products')


def test_synonym_search():
    results_laptop = index.search('laptop', {'limit': 5})
    results_notebook = index.search('notebook', {'limit': 5})
    ids_laptop = {h['id'] for h in results_laptop['hits']}
    ids_notebook = {h['id'] for h in results_notebook['hits']}
    # Both queries should return at least one common document
    assert len(ids_laptop & ids_notebook) > 0, (
        f"Synonyms don't work: {ids_laptop} vs {ids_notebook}"
    )
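Besides the live search test, the dictionary itself can be checked offline before it is uploaded. Since Meilisearch treats each entry as one-way, a missing reverse entry is a common source of surprises. A minimal sketch, with asymmetric_pairs as a hypothetical helper name:

```python
def asymmetric_pairs(synonyms: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (word, synonym) pairs where the reverse mapping is missing."""
    missing = []
    for word, others in synonyms.items():
        for other in others:
            if word not in synonyms.get(other, []):
                missing.append((word, other))
    return sorted(missing)


one_way = {"laptop": ["notebook"]}
mutual = {"laptop": ["notebook"], "notebook": ["laptop"]}
print(asymmetric_pairs(one_way))
print(asymmetric_pairs(mutual))
```

An empty result means every synonym is defined in both directions; one-way entries that are intentional can simply be excluded from the check.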
Timelines
PostgreSQL thesaurus (dictionary, configuration, reindexing): 1 day. Elasticsearch with synonym_graph and admin API: 1–2 days. Meilisearch (synonyms and API management): half a day to 1 day.







