PostgreSQL 7.2.1 Documentation
Chapter 4. Functions and Operators

4.6. Pattern Matching

There are two separate approaches to pattern matching provided by PostgreSQL: the SQL LIKE operator and POSIX-style regular expressions.

Tip: If you have pattern matching needs that go beyond this, or want to make pattern-driven substitutions or translations, consider writing a user-defined function in Perl or Tcl.

4.6.1. Pattern Matching with `LIKE`

string LIKE pattern [ ESCAPE escape-character ]
string NOT LIKE pattern [ ESCAPE escape-character ]

Every pattern defines a set of strings. The LIKE expression returns true if the string is contained in the set of strings represented by pattern. (As expected, the NOT LIKE expression returns false if LIKE returns true, and vice versa. An equivalent expression is NOT (string LIKE pattern).)

If pattern does not contain percent signs or underscores, then the pattern represents only the string itself; in that case LIKE acts like the equals operator. An underscore (_) in pattern stands for (matches) any single character; a percent sign (%) matches any string of zero or more characters.

Some examples:

'abc' LIKE 'abc'    true
'abc' LIKE 'a%'     true
'abc' LIKE '_b_'    true
'abc' LIKE 'c'      false

LIKE pattern matches always cover the entire string. To match a pattern anywhere within a string, the pattern must therefore start and end with a percent sign.
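As a small illustration of whole-string matching, the following queries can be tried in psql. The literals are arbitrary sample values, and the results noted in the comments follow from the rules described above.

```sql
-- The pattern must account for the entire string.
SELECT 'abc' LIKE 'b';       -- false: 'b' alone does not cover 'abc'
SELECT 'abc' LIKE '%b%';     -- true: each % absorbs the text on one side
SELECT 'abc' LIKE 'ab';      -- false: the trailing 'c' is not matched
SELECT 'abc' LIKE 'ab_';     -- true: _ matches the single character 'c'
SELECT 'abc' NOT LIKE 'b';   -- true: the negation of the first query
```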

To match a literal underscore or percent sign without matching other characters, the respective character in pattern must be preceded by the escape character. The default escape character is the backslash but a different one may be selected by using the ESCAPE clause. To match the escape character itself, write two escape characters.

Note that the backslash already has a special meaning in string literals, so to write a pattern constant that contains a backslash you must write two backslashes in the query. Thus, writing a pattern that actually matches a literal backslash means writing four backslashes in the query. You can avoid this by selecting a different escape character with ESCAPE; then backslash is not special to LIKE anymore. (But it is still special to the string literal parser, so you still need two of them.)

It's also possible to select no escape character by writing ESCAPE ''. In this case there is no way to turn off the special meaning of underscore and percent signs in the pattern.
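The escape rules can be sketched with a few queries. The choice of # as the escape character is only an assumption made for this example; it sidesteps the question of how many backslashes the string literal parser consumes.

```sql
-- With ESCAPE '#', a # in the pattern makes the next character literal.
SELECT '10%' LIKE '10#%' ESCAPE '#';    -- true: #% matches a literal %
SELECT '100' LIKE '10#%' ESCAPE '#';    -- false: % is no longer a wildcard
SELECT 'a_c' LIKE 'a#_c' ESCAPE '#';    -- true: #_ matches a literal _
SELECT 'abc' LIKE 'a#_c' ESCAPE '#';    -- false: _ is no longer a wildcard
SELECT 'a#b' LIKE 'a##b' ESCAPE '#';    -- true: ## matches a literal #

-- With no escape character, _ and % always act as wildcards.
SELECT 'abc' LIKE 'a_c' ESCAPE '';      -- true
```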

The keyword ILIKE can be used instead of LIKE to make the match case insensitive according to the active locale. This is not in the SQL standard but is a PostgreSQL extension.

The operator ~~ is equivalent to LIKE, and ~~* corresponds to ILIKE. There are also !~~ and !~~* operators that represent NOT LIKE and NOT ILIKE. All of these operators are PostgreSQL-specific.
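For illustration, the keyword and operator spellings can be compared side by side. The sample strings use only ASCII letters, so the results should not depend on the active locale.

```sql
SELECT 'ABC' LIKE 'abc';     -- false: LIKE is case sensitive
SELECT 'ABC' ILIKE 'abc';    -- true: ILIKE ignores case

-- Operator spellings of the same tests.
SELECT 'abc' ~~ 'a%';        -- true: same as 'abc' LIKE 'a%'
SELECT 'ABC' ~~* 'a%';       -- true: same as 'ABC' ILIKE 'a%'
SELECT 'abc' !~~ 'a%';       -- false: same as 'abc' NOT LIKE 'a%'
SELECT 'ABC' !~~* 'a%';      -- false: same as 'ABC' NOT ILIKE 'a%'
```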

4.6.2. POSIX Regular Expressions

Table 4-10. Regular Expression Match Operators

Operator	Description	Example
`~`	Matches regular expression, case sensitive	`'thomas' ~ '.*thomas.*'`
`~*`	Matches regular expression, case insensitive	`'thomas' ~* '.*Thomas.*'`
`!~`	Does not match regular expression, case sensitive	`'thomas' !~ '.*Thomas.*'`
`!~*`	Does not match regular expression, case insensitive	`'thomas' !~* '.*vadim.*'`

POSIX regular expressions provide a more powerful means of pattern matching than the LIKE operator. Many Unix tools such as egrep, sed, or awk use a pattern matching language that is similar to the one described here.

A regular expression is a character sequence that is an abbreviated definition of a set of strings (a regular set). A string is said to match a regular expression if it is a member of the regular set described by the regular expression. As with LIKE, pattern characters match string characters exactly unless they are special characters in the regular expression language; regular expressions, however, use different special characters than LIKE does. Unlike LIKE patterns, a regular expression is allowed to match anywhere within a string, unless the regular expression is explicitly anchored to the beginning or end of the string.
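The difference in anchoring between the two approaches can be seen in a short comparison. The strings are arbitrary sample values chosen for this sketch.

```sql
-- A regular expression may match anywhere within the string.
SELECT 'abc' ~ 'b';        -- true: 'b' occurs inside 'abc'
SELECT 'abc' LIKE 'b';     -- false: LIKE must cover the whole string

-- Anchors tie the match to the beginning (^) or end ($) of the string.
SELECT 'abc' ~ '^b';       -- false: the string does not begin with 'b'
SELECT 'abc' ~ 'c$';       -- true: the string ends with 'c'
SELECT 'abc' ~ '^abc$';    -- true: anchored at both ends
```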

Regular expressions ("RE"s), as defined in POSIX 1003.2, come in two forms: modern REs (roughly those of egrep; 1003.2 calls these "extended" REs) and obsolete REs (roughly those of ed; 1003.2 "basic" REs). PostgreSQL implements the modern form.

A (modern) RE is one or more non-empty branches, separated by |. It matches anything that matches one of the branches.

A branch is one or more pieces, concatenated. It matches a match for the first, followed by a match for the second, etc.
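A brief sketch of branches, using made-up sample words: each side of | is a branch, and the expression matches if either branch matches somewhere in the string.

```sql
SELECT 'dog' ~ 'cat|dog';         -- true: the second branch matches
SELECT 'cow' ~ 'cat|dog';         -- false: neither branch matches
SELECT 'hotdogs' ~ 'cat|dog';     -- true: unanchored, so 'dog' may occur anywhere
```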

A piece is an atom possibly followed by a single *, +, ?, or bound. An atom followed by * matches a sequence of 0 or more matches of the atom. An atom followed by + matches a sequence of 1 or more matches of the atom. An atom followed by ? matches a sequence of 0 or 1 matches of the atom.
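The three single-character repetition operators can be compared on the same atom. In this sketch the patterns are anchored with ^ and $ so that the repetition count decides the result.

```sql
-- * allows zero or more occurrences of the preceding atom.
SELECT 'ac'   ~ '^ab*c$';    -- true: zero b's
SELECT 'abbc' ~ '^ab*c$';    -- true: two b's

-- + requires at least one occurrence.
SELECT 'ac'   ~ '^ab+c$';    -- false: zero b's
SELECT 'abbc' ~ '^ab+c$';    -- true

-- ? allows at most one occurrence.
SELECT 'abc'  ~ '^ab?c$';    -- true: one b
SELECT 'abbc' ~ '^ab?c$';    -- false: two b's
```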

A bound is { followed by an unsigned decimal integer, possibly followed by a comma (,), which is in turn possibly followed by another unsigned decimal integer, always followed by }. The integers must lie between 0 and RE_DUP_MAX (255) inclusive, and if there are two of them, the first may not exceed the second. An atom followed by a bound containing one integer i and no comma matches a sequence of exactly i matches of the atom. An atom followed by a bound containing one integer i and a comma matches a sequence of i or more matches of the atom. An atom followed by a bound containing two integers i and j matches a sequence of i through j (inclusive) matches of the atom.
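The three forms of a bound can be illustrated as follows. The patterns are anchored so that only the number of repetitions matters; the sample strings are arbitrary.

```sql
-- {i}: exactly i matches of the atom.
SELECT 'aaa'  ~ '^a{3}$';      -- true
SELECT 'aa'   ~ '^a{3}$';      -- false

-- {i,}: i or more matches.
SELECT 'aaaa' ~ '^a{2,}$';     -- true
SELECT 'a'    ~ '^a{2,}$';     -- false

-- {i,j}: from i through j matches, inclusive.
SELECT 'aa'   ~ '^a{1,3}$';    -- true
SELECT 'aaaa' ~ '^a{1,3}$';    -- false
```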

Note: A repetition operator (?, *, +, or bounds) cannot follow another repetition operator. A repetition operator cannot begin an expression or subexpression or follow ^ or |.

An atom is a regular expression enclosed in () (matching a match for the regular expression), an empty set of () (matching the null string), a bracket expression (see below), . (matching any single character), ^ (matching the null string at the beginning of the input string), $ (matching the null string at the end of the input string), a \ followed by one of the characters ^.[$()|*+?{\ (matching that character taken as an ordinary character), a \ followed by any other character (matching that character taken as an ordinary character, as if the \ had not been present), or a single character with no other significance (matching that character). A { followed by a character other than a digit is an ordinary character, not the beginning of a bound. It is illegal to end an RE with \.
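A few of the atom forms can be sketched with simple queries. The backslash forms are left out of this sketch because the number of backslashes to write depends on the string literal parser.

```sql
-- . matches any single character.
SELECT 'abc'  ~ '^a.c$';       -- true
SELECT 'ac'   ~ '^a.c$';       -- false: there is no character for . to match

-- A parenthesized regular expression is an atom, so it can be repeated.
SELECT 'abab' ~ '^(ab)+$';     -- true: 'ab' occurs twice
SELECT 'aba'  ~ '^(ab)+$';     -- false: the final 'a' is left over

-- ^ and $ match the null string at the start and end of the input.
SELECT 'abc'  ~ '^abc$';       -- true
SELECT 'xabc' ~ '^abc$';       -- false
```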