Typo-Tolerant Search Implementation for Web Application

Users make typos: "hedphones", "wireles", "samsnge". A search without fuzzy matching returns empty results for such queries and loses conversions. Typo-tolerant search can be implemented in three ways (PostgreSQL pg_trgm, Meilisearch, or Elasticsearch), and the right choice depends on scale and requirements.

Distance Metrics: Levenshtein vs Damerau-Levenshtein

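Levenshtein distance is the minimum number of single-character insertions, deletions and substitutions needed to transform one string into another.

Damerau-Levenshtein distance adds transposition (a swap of two adjacent characters) as a single operation: "haedphones" → "headphones" is 1 transposition rather than 2 substitutions. Swapped letters are a common typing error, so this metric usually suits search better.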
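
To make the difference concrete, here is a minimal sketch of both metrics in plain Python. The function names are illustrative, and the second function implements the restricted "optimal string alignment" variant of Damerau-Levenshtein, which is enough to show the effect of transpositions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance using insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def osa_distance(a: str, b: str) -> int:
    """Levenshtein plus adjacent transposition (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Two adjacent characters swapped: count as one operation
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(levenshtein("haedphones", "headphones"))   # 2
print(osa_distance("haedphones", "headphones"))  # 1
print(levenshtein("hedphones", "headphones"))    # 1 (one missing letter)
```

A swapped pair costs two edits under plain Levenshtein, which can push a query past a "one typo allowed" limit; with transpositions counted as one edit it stays within it.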

PostgreSQL: pg_trgm

pg_trgm is a PostgreSQL extension for trigram-based similarity search. It works inside the database and needs no external services.

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Index for similarity search
CREATE INDEX idx_products_title_trgm ON products USING GIN (title gin_trgm_ops);
CREATE INDEX idx_products_description_trgm ON products USING GIN (description gin_trgm_ops);

-- Search with similarity threshold
SET pg_trgm.similarity_threshold = 0.3;

SELECT id, title, similarity(title, 'hedphones') AS sim
FROM products
WHERE title % 'hedphones'         -- similarity operator
ORDER BY sim DESC
LIMIT 10;

-- Or combine FTS + fuzzy:
SELECT
    p.id,
    p.title,
    p.price,
    greatest(
        similarity(p.title, 'wireles hedphones'),
        ts_rank(p.search_vector, plainto_tsquery('russian', 'wireles hedphones'))
    ) AS relevance
FROM products p
WHERE
    p.title % 'wireles hedphones'
    OR p.search_vector @@ plainto_tsquery('russian', 'wireles hedphones')
ORDER BY relevance DESC
LIMIT 20;

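The % operator returns true when the similarity of its arguments is greater than pg_trgm.similarity_threshold, and it can use the GIN index. Without the index the query degrades to a sequential scan.

Threshold tuning: 0.3 is permissive (more matches, more noise), while 0.5 is strict (fewer false positives, but fewer typos are caught). Short queries of 1–2 words share few trigrams with a long title, so they usually need a lower threshold.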
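
The following sketch approximates how a trigram similarity score is computed, to show why the threshold matters. It assumes the scheme described in the pg_trgm documentation (lowercase the text, pad each word with two leading spaces and one trailing space, then divide shared trigrams by total distinct trigrams). The function names are illustrative, and values returned by PostgreSQL itself may differ in edge cases such as non-ASCII text.

```python
import re


def trigrams(text: str) -> set[str]:
    """Set of trigrams for every alphanumeric word in the text."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"  {word} "  # two leading spaces, one trailing space
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def trigram_similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


short_title = trigram_similarity("hedphones", "headphones")
long_title = trigram_similarity("hedphones", "wireless headphones black")
print(round(short_title, 3))  # 0.615
print(round(long_title, 3))   # 0.286
```

In this sketch the same typo scores well above 0.3 against a one-word title but falls below it against a three-word title, because the extra words add trigrams that the query does not share. This is why long titles and short queries tend to need a lower threshold.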

Meilisearch: Dedicated Fuzzy Search Engine

Meilisearch is a search engine written in Rust. It supports typo tolerance out of the box and is simple to configure.

# Docker
docker run -p 7700:7700 -e MEILI_MASTER_KEY="your-master-key" getmeili/meilisearch:latest

# Or binary
curl -L https://install.meilisearch.com | sh
./meilisearch --master-key="your-master-key"

Index configuration:

import meilisearch

client = meilisearch.Client('http://localhost:7700', 'your-master-key')
index = client.index('products')

# Search settings
index.update_settings({
    'searchableAttributes': ['title', 'brand', 'description', 'tags'],
    'filterableAttributes': ['category_id', 'status', 'price', 'brand'],
    'sortableAttributes': ['price', 'created_at', 'popularity'],
    'rankingRules': [
        'words',
        'typo',
        'proximity',
        'attribute',
        'sort',
        'exactness',
    ],
    'typoTolerance': {
        'enabled': True,
        'minWordSizeForTypos': {
            'oneTypo': 5,     # words >= 5 chars allow 1 typo
            'twoTypos': 9,    # words >= 9 chars allow 2 typos
        },
        'disableOnWords': ['iPhone', 'iPad', 'MacBook'],  # brands without fuzzy
        'disableOnAttributes': ['sku', 'barcode'],
    },
    'pagination': {
        'maxTotalHits': 10000,
    },
})
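
The effect of minWordSizeForTypos can be pictured with a small helper. This is a simplified model of the rule stated in the comments above, not Meilisearch's internal logic; the function name and defaults are assumptions for illustration.

```python
def allowed_typos(word: str, one_typo: int = 5, two_typos: int = 9) -> int:
    """Number of typos tolerated for a query word, based on its length."""
    length = len(word)
    if length >= two_typos:
        return 2
    if length >= one_typo:
        return 1
    return 0  # short words must match exactly


for word in ("sony", "wireles", "hedphones"):
    print(word, allowed_typos(word))
# sony 0
# wireles 1
# hedphones 2
```

Raising oneTypo and twoTypos makes matching stricter for short words, which helps when the catalog contains many short, similar brand or model names.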

Data indexing:

from typing import Any

def sync_products_to_meilisearch(products: list[dict[str, Any]]) -> None:
    """Batch product sync (blocking: waits for each indexing task to finish)."""
    documents = [
        {
            'id': p['id'],
            'title': p['title'],
            'brand': p.get('brand', ''),
            'description': p.get('description', ''),
            'category_id': p['category_id'],
            'price': float(p['price']),
            'status': p['status'],
            'tags': [t['name'] for t in p.get('tags', [])],
            'created_at': p['created_at'].timestamp(),
            'popularity': p.get('view_count', 0),
        }
        for p in products
        if p['status'] == 'published'
    ]

    # Meilisearch accepts batches up to 100MB
    batch_size = 1000
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        task = index.add_documents(batch)
        index.wait_for_task(task.task_uid)

Search:

from dataclasses import dataclass

@dataclass
class SearchParams:
    query: str
    category_id: int | None = None
    price_min: float | None = None
    price_max: float | None = None
    page: int = 1
    hits_per_page: int = 20
    sort: str = 'relevance'  # relevance | price:asc | price:desc | created_at:desc

def build_filter(params: SearchParams) -> str:
    # Always restrict results to published products
    filters = ['status = "published"']

    # Compare with None so that a falsy value such as 0 is not silently ignored
    if params.category_id is not None:
        filters.append(f'category_id = {params.category_id}')
    if params.price_min is not None:
        filters.append(f'price >= {params.price_min}')
    if params.price_max is not None:
        filters.append(f'price <= {params.price_max}')

    return ' AND '.join(filters)

def search_products(params: SearchParams) -> dict:
    sort_map = {
        'price:asc':   ['price:asc'],
        'price:desc':  ['price:desc'],
        'created_at:desc': ['created_at:desc'],
        'relevance':   [],  # default ranking rules
    }

    results = index.search(params.query, {
        'filter':       build_filter(params),
        'sort':         sort_map.get(params.sort, []),
        'page':         params.page,
        'hitsPerPage':  params.hits_per_page,
        'attributesToHighlight': ['title', 'description'],
        'highlightPreTag':  '<mark>',
        'highlightPostTag': '</mark>',
        'attributesToCrop': ['description:200'],  # "attribute:length", length counted in words
        'showMatchesPosition': False,
    })

    return results

Elasticsearch: Fuzzy Query

If Elasticsearch is already used for full-text search, fuzzy matching is built in:

{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "hedphones",
            "fields": ["title^3", "brand^2", "description"],
            "fuzziness": "AUTO",
            "prefix_length": 2,
            "max_expansions": 50
          }
        },
        {
          "match_phrase": {
            "title": {
              "query": "hedphones",
              "slop": 2
            }
          }
        }
      ]
    }
  }
}

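prefix_length: 2 means the first 2 characters must match exactly. This reduces false positives and speeds up the query, at the cost of missing typos in those first two characters.

With fuzziness set to AUTO, the allowed edit distance depends on term length: 0 edits for 1–2 characters, 1 edit for 3–5 characters, 2 edits for 6 or more.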
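
As an illustration, the AUTO rule and the request body above can be expressed in Python. This is a sketch: auto_fuzziness only mirrors the length thresholds for reasoning about which terms tolerate typos, and build_fuzzy_query is a hypothetical helper that produces the same JSON for an arbitrary user query.

```python
import json


def auto_fuzziness(term: str) -> int:
    """Edit distance allowed by fuzziness AUTO for a term of this length."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2


def build_fuzzy_query(text: str, prefix_length: int = 2, max_expansions: int = 50) -> dict:
    """Request body combining a fuzzy multi_match with a phrase clause on title."""
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "multi_match": {
                            "query": text,
                            "fields": ["title^3", "brand^2", "description"],
                            "fuzziness": "AUTO",
                            "prefix_length": prefix_length,
                            "max_expansions": max_expansions,
                        }
                    },
                    # Phrase clause is not fuzzy; it boosts exact phrase matches
                    {"match_phrase": {"title": {"query": text, "slop": 2}}},
                ]
            }
        }
    }


print(auto_fuzziness("tv"), auto_fuzziness("sony"), auto_fuzziness("hedphones"))  # 0 1 2
print(json.dumps(build_fuzzy_query("hedphones"), indent=2))
```

The resulting dictionary can be passed as the body of a search request by whichever Elasticsearch client the project already uses.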

Approach Selection

For a small catalog (up to about 100k records) already stored in PostgreSQL, pg_trgm is sufficient. For a large catalog with facets, filters and a <10 ms response-time requirement, use Meilisearch. For an analytics platform that also needs aggregations, use Elasticsearch.

Timelines

pg_trgm (extension, indexes, queries, threshold tuning): 1 day. Meilisearch (deployment, index configuration, sync, API): 2–3 days. Fuzzy search in an existing Elasticsearch cluster: 1 day.