Mar 13, 2024
19 min read

How I Built a Real-World Search System (That Doesn't Suck)

Lessons learned building a scalable search stack—starting with naive SQL, evolving through PostgreSQL full-text and trigram matching, and landing on a hybrid approach with Elasticsearch. What worked, what didn't, and why.

Prologue: The Day Everything Broke

It was 3 AM when the first email hit my inbox.

“Search is broken. Takes forever to load.”

I rolled my eyes and went back to sleep. Probably just a slow internet connection.

By 9 AM, my inbox was a war zone. Twenty-seven angry emails, three Slack threads, and one particularly colorful message that would make a sailor blush. Users were rioting. My beautiful, simple search system—the one I’d been so proud of—was failing spectacularly.

I opened my terminal with trembling fingers and ran the query that would change everything:

EXPLAIN ANALYZE
SELECT COUNT(*) FROM articles
WHERE title ILIKE '%solutio%' OR content ILIKE '%solutio%';

The result made my blood run cold.

Execution Time: 3902.948 ms

Four seconds. Four. Freaking. Seconds. For a search that should be instant.

That’s when I realized I had a problem. A big one.


Chapter 1: The Naive Beginning

The Siren Song of Simple SQL

Every developer has that moment—the moment they think they’ve solved a complex problem with a simple solution. I had mine in the spring of 2022.

I was working as a freelance developer, building a content platform for a client, and I needed search. Simple, right? I had articles in a database, users needed to find them. So I wrote the most obvious query in the world:

SELECT * FROM articles 
WHERE title LIKE '%search_term%' OR content LIKE '%search_term%';

It was beautiful in its simplicity. It worked on my test data. I shipped it to production and patted myself on the back. “Look at me,” I thought, “solving complex problems with elegant simplicity.”

I was an idiot.

The Performance Nightmare

Fast forward three months. The platform had grown from a few hundred articles to over a million. Users were happy, traffic was up, and I was feeling like a rock star developer.

Then the complaints started trickling in.

“Search is slow.” “Takes forever to find anything.” “Is the site broken?”

I brushed them off. “Must be their internet,” I told myself. “Or maybe they’re searching for something that doesn’t exist.”

But the complaints kept coming. And coming. And coming.

Finally, one user sent me a screenshot. They’d typed “solution” into the search box and waited. And waited. And waited. After 15 seconds, they gave up.

That’s when I decided to investigate.

The Horror Show

I ran the same search that user had tried:

EXPLAIN ANALYZE
SELECT COUNT(*) FROM articles
WHERE title ILIKE '%solutio%' OR content ILIKE '%solutio%';

The execution plan was a horror story:

Aggregate (cost=62943.74..62943.75 rows=1 width=8)
          (actual time=3902.913..3902.920 rows=1 loops=1)
  ->  Seq Scan on articles (cost=0.00..62892.99 rows=20300 width=0)
                           (actual time=0.041..3844.906 rows=44456 loops=1)
        Filter: ((title ~~* '%solutio%'::text) OR (content ~~* '%solutio%'::text))
        Rows Removed by Filter: 955544
Planning Time: 0.764 ms
Execution Time: 3902.948 ms

Let me break down what this nightmare meant:

  • Seq Scan: Postgres had to read every single row in the table. All 1 million of them.
  • Rows Removed by Filter: It looked at 955,544 rows and said “nope, not this one” for each.
  • Execution Time: Almost 4 seconds. For a simple search.

I was basically asking Postgres to read through an entire library, shelf by shelf, to find one book instead of just checking the card catalog.

The Wake-Up Call

When I saw those numbers, I knew I had to change something. Four seconds for a search query is unacceptable. Users expect search to be instant—like Google instant.

That’s when I discovered PostgreSQL’s full-text search capabilities. And let me tell you, it was like discovering fire.


Chapter 2: The Redemption

The Magic of tsvector and tsquery

PostgreSQL’s full-text search is built around two concepts that sound complicated but are actually pretty clever:

  • tsvector: Think of it as a compressed, searchable version of your text
  • tsquery: The user’s search terms, parsed and ready to match

Here’s how it works. Take a simple sentence:

SELECT to_tsvector('english', 'The quick brown fox');
'brown':3 'fox':4 'quick':2

Notice what happened:

  • “The” disappeared (it’s a stopword—too common to be useful)
  • Words got stemmed (“quick” stays “quick” but “running” would become “run”; there’s a quick check after this list)
  • Positions are preserved (useful for ranking later)
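
If you want to see the stemmer do its thing, feed it something messier. The output below is roughly what a stock ‘english’ configuration gives you (the sentence is just a made-up example):

SELECT to_tsvector('english', 'Developers were running complex queries');
'complex':4 'develop':1 'queri':5 'run':3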

Now for the user’s search:

SELECT plainto_tsquery('english', 'quick fox');
'quick' & 'fox'

Postgres converts this into a query that looks for both words. The & means “AND”—both words must be present.
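
plainto_tsquery ANDs everything together, which is usually what you want. If your users expect Google-style input (quoted phrases, “or”), websearch_to_tsquery, available since PostgreSQL 11, is worth knowing about. The output below is roughly what you’d see:

SELECT websearch_to_tsquery('english', '"quick fox" or lazy');
'quick' <-> 'fox' | 'lazi'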

I had an articles table with title and content fields. To make them searchable, I added a tsvector column:

ALTER TABLE articles ADD COLUMN search_vector tsvector;

Then I populated it, giving titles more weight than content (because titles are usually more important):

UPDATE articles
SET search_vector =
  setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
  setweight(to_tsvector('english', coalesce(content, '')), 'B');

The setweight function lets you assign importance levels:

  • A: Highest weight (titles)
  • B: Medium weight (content)
  • C: Lower weight
  • D: Lowest weight

This took a while to run on a million rows, but it was a one-time cost.

The GIN Index That Changed Everything

The real magic happened when I added a GIN index:

CREATE INDEX idx_search_vector ON articles USING GIN(search_vector);

GIN stands for “Generalized Inverted Index,” and it’s the secret sauce. Instead of storing rows and pointing to words, it stores words and points to rows. It’s like having a reverse index—you look up a word and immediately get all the rows that contain it.

The Performance Transformation

Here’s what happened when I ran the same search with full-text search:

EXPLAIN ANALYZE
SELECT *
FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'Solutio arbitro est universe')
ORDER BY ts_rank(search_vector, plainto_tsquery('english', 'Solutio arbitro est universe')) DESC;
Sort (cost=236.01..236.02 rows=3 width=1014) (actual time=8.285..8.300 rows=5 loops=1)
  Sort Key: (ts_rank(search_vector, '''solutio'' & ''arbitro'' & ''est'' & ''univers'''::tsquery)) DESC
  Sort Method: quicksort  Memory: 30kB
  ->  Bitmap Heap Scan on articles  (cost=223.98..235.99 rows=3 width=1014) (actual time=8.246..8.270 rows=5 loops=1)
        Recheck Cond: (search_vector @@ '''solutio'' & ''arbitro'' & ''est'' & ''univers'''::tsquery)
        Heap Blocks: exact=5
        ->  Bitmap Index Scan on idx_search_vector  (cost=0.00..223.98 rows=3 width=0) (actual time=8.230..8.232 rows=5 loops=1)
              Index Cond: (search_vector @@ '''solutio'' & ''arbitro'' & ''est'' & ''univers'''::tsquery)
Planning Time: 0.243 ms
Execution Time: 8.341 ms

Look at that execution time: 8.341 ms. Down from 4 seconds to 8 milliseconds. That’s a 500x improvement.

What’s happening here:

  • Bitmap Index Scan: Postgres uses the GIN index to quickly find matching rows
  • Heap Blocks: exact=5: Only 5 data blocks were touched (instead of scanning the entire table)
  • Sort Method: quicksort Memory: 30kB: Results are sorted in memory, super fast

The Ranking Magic

The ts_rank function scores how well each result matches the query. It considers:

  • How often the query terms appear in the document (more occurrences, higher score)
  • The weight labels we assigned with setweight (so a title hit counts more than a content hit)
  • An optional normalization flag that can penalize very long documents

(Its sibling, ts_rank_cd, ranks by “cover density,” which cares about how close together the matched words are.)

This means relevant results bubble to the top automatically.
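
If you’d rather see the scores than just trust the ordering, ts_rank is an ordinary function you can select. A minimal sketch against the same articles table (‘quick fox’ is just a placeholder query):

SELECT title,
       ts_rank(search_vector, query) AS rank
FROM articles,
     plainto_tsquery('english', 'quick fox') AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;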

Keeping It Fresh

Manually updating the search vector was a pain, so I added a trigger:

CREATE FUNCTION articles_fts_trigger() RETURNS trigger AS $$
BEGIN
  NEW.search_vector :=
    setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(NEW.content, '')), 'B');
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

CREATE TRIGGER articles_fts_update
BEFORE INSERT OR UPDATE ON articles
FOR EACH ROW EXECUTE FUNCTION articles_fts_trigger();

Now every time I insert or update an article, the search vector updates automatically. No more manual maintenance.

The Language Problem

PostgreSQL’s default English configuration is smart—it stems words and drops common ones. But sometimes it’s too smart.

I ran into issues with:

  • Brand names getting conflated with ordinary words (the “Go” language vs. the verb “go”)
  • Technical terms getting lowercased and stemmed into odd tokens (“GraphQL” becomes “graphql”, “Postgres” becomes “postgr”)
  • Exact matches being impossible to guarantee

The solution? Use the ‘simple’ configuration when you need exact matching:

SELECT to_tsvector('simple', 'Elasticsearch is fast');
'elasticsearch':1 'fast':3 'is':2

Compare with ‘english’:

SELECT to_tsvector('english', 'Elasticsearch is fast');
'elasticsearch':1 'fast':3

Notice how “is” disappears in English mode but stays in simple mode.
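
One way to get both behaviors, and this is a sketch rather than what I actually shipped, is to fold both configurations into the same search vector so stemmed matches and exact tokens can both hit:

UPDATE articles
SET search_vector =
  setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
  setweight(to_tsvector('simple', coalesce(title, '')), 'B') ||
  setweight(to_tsvector('english', coalesce(content, '')), 'C');

You then need the same treatment on the query side: OR together a tsquery built with ‘english’ and one built with ‘simple’, otherwise the exact tokens never get asked for.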

The Bottom Line on FTS

PostgreSQL full-text search is powerful. It’s fast, it’s flexible, and it’s built into your database. For most use cases, it’s all you need.

But I had a problem: users make typos. A lot of typos.


Chapter 3: The Typo Wars

The Typo Problem

Users don’t type perfectly. They search for “seach” instead of “search,” “postgres” instead of “postgresql,” “react” instead of “reactjs.”

My beautiful full-text search was useless against typos. If someone searched for “seach,” they got zero results, even though “search” was everywhere.

That’s when I discovered the pg_trgm extension.

What Are Trigrams?

Trigrams break strings into overlapping three-character chunks. It sounds weird, but it’s surprisingly effective.

Take the word “search”:

  • Padded: "  search " (pg_trgm pads with two spaces in front and one space behind)
  • Trigrams: ["  s", " se", "sea", "ear", "arc", "rch", "ch "]

Now take the typo “seach”:

  • Padded: "  seach "
  • Trigrams: ["  s", " se", "sea", "eac", "ach", "ch "]

The similarity score is the number of shared trigrams divided by the total number of distinct trigrams across both words:

  • Shared: "  s", " se", "sea", "ch " (4 trigrams)
  • Distinct trigrams across both words: 9
  • Similarity: 4/9 ≈ 0.44

Since 0.44 > 0.3 (the default threshold), “seach” still matches “search”.
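
If you don’t trust my arithmetic, pg_trgm ships helper functions you can use to check it yourself once the extension is enabled (which we’ll do in a second). Output is roughly:

SELECT show_trgm('search');
{"  s"," se","arc","ch ","ear","rch","sea"}

SELECT similarity('search', 'seach');
0.44444445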

First, enable the extension:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

Then add a trigram index on the title column:

CREATE INDEX idx_articles_title_trgm ON articles USING gin (title gin_trgm_ops);

Now I could do fuzzy searches:

SELECT * FROM articles
WHERE title % 'sreach'
ORDER BY similarity(title, 'sreach') DESC
LIMIT 10;

The % operator means “similar to,” and it’s magic for catching typos.
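
That 0.3 threshold is tunable, by the way. If % matches too much junk, or misses obvious typos, you can adjust it per session (this setting exists in reasonably modern PostgreSQL; tune to taste):

SHOW pg_trgm.similarity_threshold;
0.3

SET pg_trgm.similarity_threshold = 0.4;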

The Performance Reality Check

Fuzzy search isn’t free. Here’s what happened when I searched for “spiritus”:

EXPLAIN ANALYZE
SELECT * FROM articles
WHERE title % 'spiritus'
ORDER BY similarity(title, 'spiritus') DESC
LIMIT 10;
Limit  (cost=742.30..742.32 rows=10 width=1014) (actual time=2887.228..2887.275 rows=10 loops=1)
  ->  Sort  (cost=742.30..742.55 rows=100 width=1014) (actual time=2887.225..2887.245 rows=10 loops=1)
        Sort Key: (similarity(title, 'spiritus'::text)) DESC
        Sort Method: quicksort  Memory: 47kB
        ->  Bitmap Heap Scan on articles  (cost=345.67..740.14 rows=100 width=1014) (actual time=160.919..2887.139 rows=15 loops=1)
              Recheck Cond: (title % 'spiritus'::text)
              Rows Removed by Index Recheck: 559525
              Heap Blocks: exact=38080 lossy=66033
              ->  Bitmap Index Scan on idx_articles_title_trgm  (cost=0.00..345.64 rows=100 width=0) (actual time=72.039..72.040 rows=181721 loops=1)
                    Index Cond: (title % 'spiritus'::text)
Planning Time: 0.799 ms
Execution Time: 2887.332 ms

Almost 3 seconds. Why so slow?

  • 181,721 index hits: The trigram index found 181k potential matches
  • 559,525 rows removed: Most of those matches were false positives
  • Heap blocks: Had to check over 100k data blocks

Trigrams are great for catching typos, but they’re not very selective. They find a lot of “close enough” matches that aren’t actually relevant.

The Hybrid Approach

I realized I didn’t have to choose between relevance (full-text search) and typo tolerance (trigrams). I could have both.

Here’s my hybrid search strategy:

SELECT *
FROM articles
WHERE
  search_vector @@ plainto_tsquery('english', 'id spirirus antiquus')
  OR title % 'id spirirus antiquus'
ORDER BY
  GREATEST(
    ts_rank(search_vector, plainto_tsquery('english', 'id spirirus antiquus')),
    similarity(title, 'id spirirus antiquus')
  ) DESC
LIMIT 10;

This query:

  1. Finds relevant matches using full-text search
  2. Finds typo-tolerant matches using trigrams
  3. Ranks by whichever method scored higher

The GREATEST() function picks the better score from each method.

Performance of the Hybrid

Limit  (cost=1372.53..1372.55 rows=10 width=1014) (actual time=284.411..284.463 rows=10 loops=1)
  -> Sort  (cost=1372.53..1372.80 rows=110 width=1014) (actual time=284.406..284.431 rows=10 loops=1)
       Sort Key: (GREATEST(ts_rank(search_vector, '''id'' & ''spirirus'' & ''antiquus'''::tsquery), similarity(title, 'id spirirus antiquus'::text))) DESC
       Sort Method: quicksort  Memory: 42kB
       -> Bitmap Heap Scan on articles  (cost=935.79..1370.15 rows=110 width=1014) (actual time=80.736..284.358 rows=10 loops=1)
            Recheck Cond: ((search_vector @@ '''id'' & ''spirirus'' & ''antiquus'''::tsquery) OR (title % 'id spirirus antiquus'::text))
            Rows Removed by Index Recheck: 22435
            Heap Blocks: exact=20855
            -> BitmapOr  (cost=935.79..935.79 rows=110 width=0) (actual time=77.735..77.742 rows=0 loops=1)
                 -> Bitmap Index Scan on idx_search_vector  (cost=0.00..170.08 rows=10 width=0) (actual time=0.113..0.114 rows=0 loops=1)
                      Index Cond: (search_vector @@ '''id'' & ''spirirus'' & ''antiquus'''::tsquery)
                 -> Bitmap Index Scan on idx_articles_title_trgm  (cost=0.00..765.65 rows=100 width=0) (actual time=77.617..77.618 rows=22445 loops=1)
                      Index Cond: (title % 'id spirirus antiquus'::text)
Planning Time: 1.579 ms
Execution Time: 284.543 ms

284ms—much better than 3 seconds, but still not great. The trigram part is still expensive.

When to Use Trigrams

Trigrams are great for:

  • Short text fields (titles, usernames, tags)
  • High typo rates
  • Low write volume (indexes are expensive to maintain)

They’re not great for:

  • Long text content
  • High write volume
  • When you need exact precision

I use them sparingly—only on fields where typos are common and the performance hit is acceptable.


Chapter 4: The Breaking Point

The Product Team Strikes Back

I was feeling pretty good about my search system. It was fast, it handled typos, and users were happy. Then the product team came knocking.

“Can we add filters to the search?” they asked.

Sure, I thought. Just add some WHERE clauses, right?

SELECT * FROM articles 
WHERE search_vector @@ plainto_tsquery('english', 'search term')
  AND category = 'technology'
  AND published_date > '2022-01-01'
  AND author_id IN (1, 2, 3)
ORDER BY ts_rank(search_vector, plainto_tsquery('english', 'search term')) DESC;

But then they wanted more: “What about faceted search? And can we boost certain articles? Oh, and we need to search across multiple tables…”

Suddenly my beautiful, fast Postgres queries were turning into monsters. The execution plans looked like they were designed by a sadist. And the performance? Let’s just say users started asking if the site was broken again.

Why Postgres Started Struggling

Here’s what happened when I tried to scale:

Complex Filtering: Every time I added a filter, Postgres had to decide whether to use the search index or the filter indexes. Sometimes it made good choices, sometimes it didn’t. And when it didn’t? Full table scans. Ouch.

Faceted Search: Want to show “10 results in Technology, 5 in Science”? That meant running multiple queries or complex aggregations. Either way, performance tanked.
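
To make that concrete, here’s roughly what a single facet costs you in Postgres, sketched against the same articles table. Multiply by however many facets the UI shows, on top of the query that fetches the actual results:

SELECT category, COUNT(*) AS result_count
FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'search term')
GROUP BY category
ORDER BY result_count DESC;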

Cross-Table Search: Articles, comments, user profiles—all searchable. Postgres can do this, but the queries get ugly fast. And forget about relevance scoring across different content types.

Scoring Complexity: I wanted to boost articles by popularity, recency, author reputation. Postgres can do this, but it’s not pretty:

SELECT *,
  ts_rank(search_vector, query)
    * (1 + (views / 1000.0))                                         -- popularity boost
    / (1 + (EXTRACT(EPOCH FROM (NOW() - published_date)) / 86400.0)) -- recency decay: age in days
    AS final_score
FROM articles, plainto_tsquery('english', 'search term') AS query
WHERE search_vector @@ query
ORDER BY final_score DESC;

This works, but it’s fragile. Add one more factor and it becomes unreadable.

The Elasticsearch Epiphany

I was drowning in complex queries when a colleague mentioned Elasticsearch. “It’s built for this stuff,” they said.

I was skeptical. Another service to manage? More infrastructure? But I was desperate enough to try.

The first time I saw Elasticsearch in action, it was like watching someone solve a Rubik’s cube in 10 seconds. Everything I’d been struggling with in Postgres was just… easy.

Setting Up Elasticsearch (The Painful Part)

Let me be honest—getting Elasticsearch running wasn’t fun. But once it was up, the magic started happening.

First, I defined my mapping (think of it as a schema, but smarter):

{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "boost": 3.0
      },
      "content": {
        "type": "text", 
        "analyzer": "english"
      },
      "category": {
        "type": "keyword"
      },
      "author": {
        "type": "keyword"
      },
      "published_date": {
        "type": "date"
      },
      "views": {
        "type": "integer"
      },
      "tags": {
        "type": "keyword"
      }
    }
  }
}

Notice the boost: 3.0 on title? That’s Elasticsearch saying “titles are 3x more important than content.” Simple, clean, powerful. (One caveat: recent Elasticsearch versions deprecate index-time boosts in the mapping; query-time boosting, the title^3 you’ll see in the search query below, is the recommended approach now.)

The Search Query That Changed Everything

Here’s what my complex Postgres query became in Elasticsearch:

{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "search term",
            "fields": ["title^3", "content"],
            "fuzziness": "AUTO"
          }
        }
      ],
      "filter": [
        {"term": {"category": "technology"}},
        {"range": {"published_date": {"gte": "2022-01-01"}}},
        {"terms": {"author": ["author1", "author2", "author3"]}}
      ]
    }
  },
  "aggs": {
    "categories": {
      "terms": {"field": "category"}
    },
    "authors": {
      "terms": {"field": "author"}
    }
  },
  "highlight": {
    "fields": {
      "title": {},
      "content": {"fragment_size": 150}
    }
  }
}

This single query:

  • Searches across multiple fields with different weights
  • Applies filters (fast, cached)
  • Returns faceted aggregations
  • Highlights matching terms
  • Handles typos automatically

And it’s fast. Like, really fast.

The Performance Difference

I ran the same search on both systems. Here’s what happened:

Postgres (with all my optimizations):

  • Query time: ~800ms
  • Memory usage: High (complex joins and sorts)
  • CPU: Spiked during execution
  • Scalability: Started struggling at 2M+ documents

Elasticsearch:

  • Query time: ~50ms
  • Memory usage: Consistent (dedicated heap)
  • CPU: Stable
  • Scalability: Handled 10M+ documents without breaking a sweat

The difference was night and day. Users stopped complaining about slow search. The product team could add new features without me having a panic attack.

But Wait—There’s a Catch

Elasticsearch isn’t magic. It comes with its own set of challenges:

Data Synchronization: How do you keep Postgres and Elasticsearch in sync? I tried triggers, but they’re fragile. I tried message queues, but that’s another service to manage. Eventually, I settled on a hybrid approach—Postgres for writes, Elasticsearch for reads.

Complexity: Elasticsearch has its own query language, its own concepts, its own gotchas. The learning curve is steep.

Resource Usage: Elasticsearch is hungry. It wants RAM, it wants CPU, it wants disk I/O. My cloud bill went up.

Operational Overhead: Another service to monitor, backup, and maintain. Another potential point of failure.

The Hybrid Architecture That Actually Works

After wrestling with both systems, I landed on a hybrid approach that plays to each system’s strengths:

Postgres: Primary data store, transactions, complex relationships
Elasticsearch: Search, analytics, faceted navigation

Here’s how it works:

  1. Write Path: Data goes to Postgres first (for ACID compliance), then gets indexed in Elasticsearch asynchronously
  2. Read Path: Search queries hit Elasticsearch, but I can still use Postgres for complex business logic
  3. Sync Strategy: I use a simple job that runs every few minutes to sync changes

# Simplified version of my sync job
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # connection details depend on your setup

def sync_to_elasticsearch():
    # Get articles changed since the last run (an app-specific query against Postgres)
    recent_articles = get_recent_articles_from_postgres()

    for article in recent_articles:
        # Transform the Postgres row into the Elasticsearch document shape
        es_doc = transform_article_for_es(article)

        # Index (or overwrite) the document, keyed by the Postgres primary key
        es.index(index='articles', id=article.id, body=es_doc)

This gives me the best of both worlds: Postgres’s reliability and Elasticsearch’s search power.

What I’d Do Differently Next Time

If I had to build this again, here’s what I’d change:

Start with Elasticsearch earlier: I spent too much time optimizing Postgres when I should have just bitten the bullet and added Elasticsearch from the start.

Use a proper sync strategy: Instead of ad-hoc jobs, I’d use something like Debezium or a proper CDC (Change Data Capture) solution.

Plan for scale from day one: I’d design the data model with search in mind, not retrofit it later.

Invest in monitoring: Elasticsearch needs monitoring. I’d set up proper alerting for cluster health, query performance, and sync lag.

The Real Cost (Time and Money)

Let’s talk numbers, because that’s what matters in the real world:

Development Time:

  • Postgres-only approach: 2 weeks to get basic search working
  • Adding trigrams: 1 more week
  • Complex filtering: 2 more weeks
  • Elasticsearch migration: 3 weeks
  • Total: 8 weeks of development time

Infrastructure Costs (monthly):

  • Postgres-only: $200/month (managed database)
  • With Elasticsearch: $500/month (managed Elasticsearch + increased Postgres)
  • Additional cost: $300/month

Maintenance Overhead:

  • Postgres-only: 2-3 hours/month
  • With Elasticsearch: 5-6 hours/month
  • Additional time: 3 hours/month

Was it worth it? Absolutely. The improved user experience, faster development of new features, and reduced support tickets more than paid for the additional cost.


Epilogue: The Lessons Learned

The Bottom Line

Here’s what I learned: search is one of those features that seems simple until it isn’t. You can start with LIKE queries, but you’ll outgrow them fast. Postgres full-text search is powerful, but it has limits. Elasticsearch is overkill for simple cases, but when you need it, you really need it.

The key is knowing when to make the jump. For me, it was when I needed:

  • Complex filtering and faceting
  • Cross-table search
  • Sophisticated relevance scoring
  • Scale beyond a few million documents

If you’re building search and you’re not sure which approach to take, start simple. Use Postgres full-text search with trigrams. It’ll get you far. But keep Elasticsearch in your back pocket—you’ll probably need it sooner than you think.

And remember: the best search system is the one your users don’t notice. When search just works, that’s when you know you’ve got it right.

The Journey’s End

Building a real-world search system taught me that the devil is in the details. What starts as a simple LIKE query can quickly become a complex distributed system. But with the right tools and the right approach, you can build something that scales.

The journey from naive SQL to a hybrid Postgres/Elasticsearch system was frustrating, enlightening, and ultimately rewarding. I hope this post helps you avoid some of the mistakes I made and gives you a roadmap for building search that doesn’t suck.

Because let’s face it—life’s too short for slow search.