Typo-Tolerant Search Implementation for Web Application

Our company is engaged in the development, support and maintenance of sites of any complexity. From simple one-page sites to large-scale cluster systems built on micro services. Experience of developers is confirmed by certificates from vendors.

Development and maintenance of all types of websites:

Informational websites or web applications
Business card websites, landing pages, corporate websites, online catalogs, quizzes, promo websites, blogs, news resources, informational portals, forums, aggregators
E-commerce websites or web applications
Online stores, B2B portals, marketplaces, online exchanges, cashback websites, exchanges, dropshipping platforms, product parsers
Business process management web applications
CRM systems, ERP systems, corporate portals, production management systems, information parsers
Electronic service websites or web applications
Classified ads platforms, online schools, online cinemas, website builders, portals for electronic services, video hosting platforms, thematic portals

These are just some of the technical types of websites we work with, and each of them can have its own specific features and functionality, as well as be customized to meet the specific needs and goals of the client.

Showing 1 of 1 servicesAll 2065 services
Typo-Tolerant Search Implementation for Web Application
Complex
~2-3 business days
FAQ

Our competencies:

Development stages

Latest works

  • image_web-applications_feedme_466_0.webp
    Development of a web application for FEEDME
    1171
  • image_ecommerce_furnoro_435_0.webp
    Development of an online store for the company FURNORO
    1094
  • image_crm_enviok_479_0.webp
    Development of a web application for Enviok
    831
  • image_crm_chasseurs_493_0.webp
    CRM development for Chasseurs
    879
  • image_website-sbh_0.png
    Website development for SBH Partners
    999
  • image_website-_0.png
    Website development for Red Pear
    453

Typo-Tolerant Search Implementation for Web Applications

Users make mistakes: "hedphones", "wireles", "samsnge". Search without fuzzy matching returns empty results, losing conversions. Implement typo-tolerant search three ways — depending on scale and requirements.

Distance Metrics: Levenshtein vs Damerau-Levenshtein

Levenshtein distance: minimum insertions, deletions, substitutions to transform one string into another.

Damerau-Levenshtein adds transposition (adjacent character swap): "haedphones" → "headphones" — is 1 transposition, not 2 operations. Better for search.

PostgreSQL: pg_trgm

pg_trgm — PostgreSQL extension for similarity search based on trigrams. Works without external services.

CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Index for similarity search
CREATE INDEX idx_products_title_trgm ON products USING GIN (title gin_trgm_ops);
CREATE INDEX idx_products_description_trgm ON products USING GIN (description gin_trgm_ops);

-- Search with similarity threshold
SET pg_trgm.similarity_threshold = 0.3;

SELECT id, title, similarity(title, 'hedphones') AS sim
FROM products
WHERE title % 'hedphones'         -- similarity operator
ORDER BY sim DESC
LIMIT 10;

-- Or combine FTS + fuzzy:
SELECT
    p.id,
    p.title,
    p.price,
    greatest(
        similarity(p.title, 'wireles hedphones'),
        ts_rank(p.search_vector, plainto_tsquery('russian', 'wireles hedphones'))
    ) AS relevance
FROM products p
WHERE
    p.title % 'wireles hedphones'
    OR p.search_vector @@ plainto_tsquery('russian', 'wireles')
ORDER BY relevance DESC
LIMIT 20;

% — similarity operator, uses GIN index. Without index degrades to seq scan.

Threshold tuning: 0.3 — liberal (lots of noise), 0.5 — strict (few typos). For short queries (1–2 words) threshold should be lower.

Meilisearch: Dedicated Fuzzy Search Engine

Meilisearch written in Rust, supports typo tolerance out-of-the-box, simple to configure.

# Docker
docker run -p 7700:7700 getmeili/meilisearch:latest

# Or binary
curl -L https://install.meilisearch.com | sh
./meilisearch --master-key="your-master-key"

Index configuration:

import meilisearch

client = meilisearch.Client('http://localhost:7700', 'your-master-key')
index = client.index('products')

# Search settings
index.update_settings({
    'searchableAttributes': ['title', 'brand', 'description', 'tags'],
    'filterableAttributes': ['category_id', 'status', 'price', 'brand'],
    'sortableAttributes': ['price', 'created_at', 'popularity'],
    'rankingRules': [
        'words',
        'typo',
        'proximity',
        'attribute',
        'sort',
        'exactness',
    ],
    'typoTolerance': {
        'enabled': True,
        'minWordSizeForTypos': {
            'oneTypo': 5,     # words >= 5 chars allow 1 typo
            'twoTypos': 9,    # words >= 9 chars allow 2 typos
        },
        'disableOnWords': ['iPhone', 'iPad', 'MacBook'],  # brands without fuzzy
        'disableOnAttributes': ['sku', 'barcode'],
    },
    'pagination': {
        'maxTotalHits': 10000,
    },
})

Data indexing:

import asyncio
from typing import Any

async def sync_products_to_meilisearch(products: list[dict[str, Any]]) -> None:
    """Batch product sync."""
    documents = [
        {
            'id': p['id'],
            'title': p['title'],
            'brand': p.get('brand', ''),
            'description': p.get('description', ''),
            'category_id': p['category_id'],
            'price': float(p['price']),
            'status': p['status'],
            'tags': [t['name'] for t in p.get('tags', [])],
            'created_at': p['created_at'].timestamp(),
            'popularity': p.get('view_count', 0),
        }
        for p in products
        if p['status'] == 'published'
    ]

    # Meilisearch accepts batches up to 100MB
    batch_size = 1000
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        task = index.add_documents(batch)
        index.wait_for_task(task.task_uid)

Search:

from dataclasses import dataclass

@dataclass
class SearchParams:
    query: str
    category_id: int | None = None
    price_min: float | None = None
    price_max: float | None = None
    page: int = 1
    hits_per_page: int = 20
    sort: str = 'relevance'  # relevance | price:asc | price:desc | created_at:desc

def build_filter(params: SearchParams) -> str | None:
    filters = ['status = "published"']

    if params.category_id:
        filters.append(f'category_id = {params.category_id}')
    if params.price_min is not None:
        filters.append(f'price >= {params.price_min}')
    if params.price_max is not None:
        filters.append(f'price <= {params.price_max}')

    return ' AND '.join(filters) if filters else None

def search_products(params: SearchParams) -> dict:
    sort_map = {
        'price:asc':   ['price:asc'],
        'price:desc':  ['price:desc'],
        'created_at:desc': ['created_at:desc'],
        'relevance':   [],  # default ranking rules
    }

    results = index.search(params.query, {
        'filter':       build_filter(params),
        'sort':         sort_map.get(params.sort, []),
        'page':         params.page,
        'hitsPerPage':  params.hits_per_page,
        'attributesToHighlight': ['title', 'description'],
        'highlightPreTag':  '<mark>',
        'highlightPostTag': '</mark>',
        'attributesToCrop': {'description': 200},
        'showMatchesPosition': False,
    })

    return results

Elasticsearch: Fuzzy Query

If Elasticsearch already used for FTS — fuzzy built-in:

{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "hedphones",
            "fields": ["title^3", "brand^2", "description"],
            "fuzziness": "AUTO",
            "prefix_length": 2,
            "max_expansions": 50
          }
        },
        {
          "match_phrase": {
            "title": {
              "query": "hedphones",
              "slop": 2
            }
          }
        }
      ]
    }
  }
}

prefix_length: 2 — first 2 characters must match exactly. Reduces false positives and speeds query.

AUTO fuzziness: 0 typos for ≤2 chars, 1 for 3–5 chars, 2 for 6+ chars.

Approach Selection

For small catalog (up to 100k records) with PostgreSQL — pg_trgm sufficient. For large catalog with facets, filters and <10ms requirement — Meilisearch. For analytics platform with aggregations — Elasticsearch.

Timelines

pg_trgm (extension, indexes, queries, threshold tuning): 1 day. Meilisearch (deploy, index config, sync, API): 2–3 days. Fuzzy in existing Elasticsearch cluster: 1 day.