Using Regex for Pattern Matching in PostgreSQL
PostgreSQL Regex: Advanced Pattern Matching
In PostgreSQL, regex (regular expressions) enables powerful pattern matching for string data, useful for filtering, searching, and manipulating text. Regex allows you to identify specific patterns within text fields, making it ideal for data validation, cleaning, and advanced searches within your database. PostgreSQL supports regex-based functions like ~ (match), ~* (case-insensitive match), !~ (does not match), and !~* (case-insensitive does not match), offering flexibility for handling various matching needs.
Syntax:
1. Pattern Matching:
column_name ~ 'regex_pattern'
2. Case-Insensitive Matching:
column_name ~* 'regex_pattern'
3. Negation:
!~ for "does not match" (case-sensitive). !~* for "does not match" (case-insensitive).
Example 1: Basic Regex Pattern Matching
This example searches for rows in the users table where the email column contains email addresses from a specific domain (e.g., @gmail.com).
Code:
-- Find all users with a Gmail email address
SELECT * FROM users
WHERE email ~ '@gmail\.com$';
Explanation:
- ~ '@gmail\.com$': Uses the ~ operator to match emails ending with @gmail.com.
- The \. escapes the dot character, and $ signifies the end of the string.
Example 2: Case-Insensitive Regex Matching
This example retrieves data for users whose first names start with "a" or "A".
Code:
-- Find users whose first name starts with "a" or "A"
SELECT * FROM users
WHERE first_name ~* '^a';
Explanation:
- ~* '^a': The ~* operator performs a case-insensitive match for names starting with "a".
- ^a: The caret ^ denotes the start of the string.
Example 3: Regex to Exclude Patterns
In this case, we select records where the username column does not contain any digits.
Code:
-- Select users with usernames that have no digits
SELECT * FROM users
WHERE username !~ '[0-9]';
Explanation:
- !~ '[0-9]': The !~ operator filters out rows containing any numeric characters in username.
- [0-9]: Matches any digit, and !~ negates the match, selecting usernames without numbers.
Example 4: Using Regex for Validation
Validate phone numbers in the contacts table to ensure they follow a specific format, e.g., (XXX) XXX-XXXX.
Code:
-- Select valid phone numbers in the format (XXX) XXX-XXXX
SELECT * FROM contacts
WHERE phone ~ '^\(\d{3}\) \d{3}-\d{4}$';
Explanation:
- ~ '^\(\d{3}\) \d{3}-\d{4}$': This regex pattern validates phone numbers in the format (123) 456-7890.
- \d{3}: Matches three digits; \( and \) match the parentheses, while - matches the hyphen.
Additional PostgreSQL Regex Functions:
1. regexp_replace(): Replaces occurrences of a regex pattern within a string.
Code:
SELECT regexp_replace('abc123', '[0-9]', '', 'g');
-- Result: 'abc'
2. regexp_matches(): Returns an array of text matching a regex pattern.
Code:
SELECT regexp_matches('hello world', '\w+');
-- Result: {hello, world}
3. regexp_split_to_table(): Splits a string based on a regex pattern into table rows.
Code:
SELECT regexp_split_to_table('apple,banana,cherry', ',');
-- Result: apple, banana, cherry
Summary:
PostgreSQL's regex support allows for sophisticated string pattern matching and manipulation, ideal for data validation and complex text-based queries. By leveraging regex functions, you can efficiently filter, search, and clean data to meet various application requirements.
All PostgreSQL Questions, Answers, and Code Snippets Collection.
It will be nice if you may share this link in any developer community or anywhere else, from where other developers may find this content. Thanks.
https://w3resource.com/PostgreSQL/snippets/postgresql-regex.php
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics