Click Once Measure Twice
Feb 13, 2012

I have a fairly simple table with 8+ million rows. I want to do a postfix wildcard searches against this table of the type

SELECT a, b, c FROM t WHERE Lower( a ) LIKE 'fooba%'

Postgres allows one to build and use an index like below for carrying out searches such as above quickly and efficiently.

CREATE INDEX idx_a ON t (Lower( a ) varchar_pattern_ops);

I get the search term in $q as below

$q = param 'q';

# Postfix wildcard on the search term
$q .= '%';

Following conventional wisdom about good practices, I prepared the statement with a bind value to be supplied during execution

## query 1
my $sql = qq{
    SELECT a, b, c
    FROM t 
    WHERE Lower( a ) LIKE '$q'
my $sth1 = $dbh->prepare($sql);

The above query takes around 2 to 4 seconds depending on the search term. Quite by accident, I had another version of the script where I had the search term inserted inline in the statement to be prepared like so

## query 2
my $sth2 = $dbh->prepare(qq{
    SELECT a, b, c
    FROM t  
    WHERE Lower( a ) LIKE ?

This second version is an order of magnitude faster, taking 200 to 400 ms. Hmmmm… After much querying on Stackoverflow, Perlmonks, and on the DBI list, this is what I discovered: the Postgres query planner sees the first query and already knows that is has a string with a postfix wildcard, so it figures it can use the index. In the case of the second query, however, the Postgres query planner has no idea what term might be supplied, so it decides to do a sequential scan, which is rather expensive on a table with 8+ million rows.

Identical queries from my point of view, but seen very differently by Postgres. Now, it is possible to turn on a switch in DBI and prevent server prepared queries. But, moral of the story -- click once, but do measure twice.

⬅︎ Why File When You Can Full-Text Search
➡︎ Dictionary in a Webpage