Understanding Python Regular Expressions

Learn how to use regular expressions in Python effectively. Our comprehensive guide covers Python regex syntax, examples, and best practices. Everything you need to know about regex in Python.
Edtoks | 6:02 min read

Understanding the re module

Python's regular expression support lives in the re module, which is part of the standard library, so there is nothing to install.

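First, we need to understand the pattern. A 10-digit phone number can be matched with the pattern r"\d\d\d\d\d\d\d\d\d\d". Here \d (a backslash followed by d) matches a single digit character, so the pattern repeats it ten times. Because it is the same element repeated, you can shorten this to r"\d{10}". The {10} is called a quantifier: it says how many times the preceding element must repeat. The r prefix makes the pattern a raw string, so Python passes the backslashes to the regex engine unchanged.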
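As a quick check, the sketch below confirms that the spelled-out pattern and the quantified form accept exactly the same strings. It uses re.fullmatch, which succeeds only when the entire string matches the pattern. The sample strings are made up for illustration.

```python
import re

# Two spellings of "exactly ten digits"; raw strings keep the backslashes intact
long_form = r"\d\d\d\d\d\d\d\d\d\d"
short_form = r"\d{10}"

# Made-up samples: ten digits, nine digits, and ten characters containing a letter
samples = ["8888777666", "888877766", "88887776a6"]

results = []
for s in samples:
    # fullmatch returns a Match object only if the whole string matches
    long_ok = re.fullmatch(long_form, s) is not None
    short_ok = re.fullmatch(short_form, s) is not None
    results.append((s, long_ok, short_ok))
    print(s, long_ok, short_ok)
```

Both columns agree for every sample: only the string made of exactly ten digits matches.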

The code below uses re.search to look for the pattern in a customer record:

import re

string = 'Customer phone number is 8888777666'
pattern = r"\d{10}"  # raw string: the backslash reaches the regex engine as-is
match = re.search(pattern, string)
print(match)
# The literal text "pattern" does not occur in the string, so this search fails
match2 = re.search("pattern", string)
print(match2)

For a successful search, re.search returns a Match object that shows the matched text and its position (span) in the string. When there is no match, it returns None.

Output for the above code:

<re.Match object; span=(25, 35), match='8888777666'>
None
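Because re.search can return None, it is worth checking the result before using it. The sketch below, on a sample string like the one above, shows the common `if match:` idiom and the Match methods that return the matched text and its position.

```python
import re

string = 'Customer phone number is 8888777666'
match = re.search(r"\d{10}", string)

# search returns None when nothing matches, so check before using the result
if match:
    print(match.group())               # the matched text: 8888777666
    print(match.start(), match.end())  # start index and end index (exclusive): 25 35
    print(match.span())                # the same two indices as a tuple: (25, 35)
else:
    print("No phone number found")
```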

re.search returns only the first match. To find all matches, use re.findall:

import re

string = 'Customer phone number is 8888777666 and alternate is 2233445566'
pattern = r"\d{10}"
matches = re.findall(pattern, string)
print(matches)
# No match: findall returns an empty list rather than None
matches2 = re.findall("pattern", string)
print(matches2)

 

findall returns a list of all matched strings, or an empty list when nothing matches. Output for the above code is

['8888777666', '2233445566']
[]
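One behavior worth knowing: when the pattern contains capture groups (covered later in this article), findall returns the captured groups rather than the whole match, and it returns tuples when there is more than one group. A minimal sketch with a made-up string:

```python
import re

text = 'Call 8888-777-666 or 2233-445-566'

# No groups: each item is the whole match
whole = re.findall(r'\d{4}-\d{3}-\d{3}', text)
print(whole)   # ['8888-777-666', '2233-445-566']

# One group: each item is just that group's text
first = re.findall(r'(\d{4})-\d{3}-\d{3}', text)
print(first)   # ['8888', '2233']

# Several groups: each item is a tuple of the groups
parts = re.findall(r'(\d{4})-(\d{3})-(\d{3})', text)
print(parts)   # [('8888', '777', '666'), ('2233', '445', '566')]
```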

 

Similarly, re.finditer finds every match, but instead of strings it returns an iterator that yields a Match object for each one.

import re

string = 'Customer phone number is 8888777666 and alternate is 2233445566'
pattern = r"\d{10}"
for match in re.finditer(pattern, string):
    print(match)          # the Match object
    print(match.start())  # index where the match begins

Output is

<re.Match object; span=(25, 35), match='8888777666'>
25
<re.Match object; span=(53, 63), match='2233445566'>
53
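The positions that finditer reports are useful when you need to act on each match in place. As a hypothetical example, the sketch below masks all but the last four digits of each phone number; the masking rule is our own illustration, not something the re module provides.

```python
import re

string = 'Customer phone number is 8888777666 and alternate is 2233445566'

# finditer yields one Match object per match; keep the text and its position
found = []
for m in re.finditer(r"\d{10}", string):
    found.append((m.group(), m.start(), m.end()))
print(found)   # [('8888777666', 25, 35), ('2233445566', 53, 63)]

# Hypothetical use of the positions: hide all but the last 4 digits.
# Work right to left so that earlier indices stay valid after each edit.
masked = string
for number, start, end in reversed(found):
    masked = masked[:start] + '*' * 6 + number[-4:] + masked[end:]
print(masked)  # Customer phone number is ******7666 and alternate is ******5566
```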

We have now covered the basic re functions for finding patterns in a string: search, findall, and finditer. Next, we go through more complex patterns.

Patterns

Here is a list of commonly used pattern elements:

| Pattern | Description | Example pattern | Example match |
|---|---|---|---|
| \d | A digit | file_\d\d | file_66 |
| \D | A non-digit | file_\D | file_x |
| \w | A word character (letter, digit, or underscore) | \w+ | Hello123 |
| \W | A non-word character | \W+ | #$+ (in 22#$+3) |
| \s | A whitespace character | \s+ | the space in Hello World |
| \S | A non-whitespace character | \S+ | HelloWorld |
| . | Any character except newline | py...n | python, py123n |
| ^ | Start of a string | ^Hello | Hello, Hello World |
| $ | End of a string | World$ | Hello World |
| [abc] | Any one of a, b, or c | [aeiou] | e, o |
| [0-9] | Any digit from 0 to 9 | [0-9]+ | 123, 456 |
| [^0-9] | Any character except digits | [^0-9]+ | abcXYZ |
| (abc) | A group (captures the match) | (\d{2}) | 12 (captured) |
| a* | Zero or more 'a's | a* | '', 'a', 'aa' |
| a+ | One or more 'a's | a+ | 'a', 'aa' |
| a? | Zero or one 'a' | a? | '', 'a' |
| a{3} | Exactly 3 'a's | a{3} | 'aaa' |
| a{3,5} | Between 3 and 5 'a's | a{3,5} | 'aaa', 'aaaaa' |
| a{3,} | 3 or more 'a's | a{3,} | 'aaa', 'aaaaaa' |
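The quick checks below exercise several rows of the table. Each pair is a pattern and a made-up test string; re.search looks for the pattern anywhere in the string, and group() shows what it matched (None means no match). Note that a{3,5} takes five characters when more are available, because quantifiers are greedy by default.

```python
import re

# (pattern, text) pairs based on rows of the table; the texts are made up
checks = [
    (r"file_\d\d", "file_66"),
    (r"\W+", "22#$+3"),
    (r"\s+", "Hello World"),
    (r"py...n", "python"),
    (r"py...n", "py123n"),
    (r"^Hello", "Hello World"),
    (r"World$", "Hello World"),
    (r"^Hello", "Say Hello"),    # fails: Hello is not at the start
    (r"[^0-9]+", "abcXYZ123"),
    (r"a{3,5}", "aaaaaaa"),      # greedy: takes 5 of the 7 a's
]

results = []
for pattern, text in checks:
    m = re.search(pattern, text)
    found = m.group() if m else None
    results.append(found)
    print(f"{pattern:10} {text!r:14} -> {found!r}")
```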

 

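You can group parts of a pattern with parentheses, like (\d{3}), and read each captured group from the match with group(n). The example below also precompiles the pattern with re.compile, which is useful when you reuse the same pattern many times.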

import re

string = 'Customer phone number is 8888-777-666 and alternate number is 2233445566'
# Three groups: four digits, three digits, three digits, separated by hyphens
pattern = re.compile(r'(\d{4})-(\d{3})-(\d{3})')
match = pattern.search(string)
print(match)
print(match.group(1))
print(match.group(2))
print(match.group(3))

Output is

<re.Match object; span=(25, 37), match='8888-777-666'>
8888
777
666
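Numbering groups works, but with several groups it can be easier to read them all at once or give them names. The sketch below uses match.groups() and Python's named-group syntax (?P<name>...); the names area, prefix, and line are our own hypothetical labels for the three parts of the number.

```python
import re

string = 'Customer phone number is 8888-777-666'
# Named groups use the (?P<name>...) syntax; these names are illustrative only
pattern = re.compile(r'(?P<area>\d{4})-(?P<prefix>\d{3})-(?P<line>\d{3})')

match = pattern.search(string)
if match:
    print(match.groups())       # all groups as a tuple: ('8888', '777', '666')
    print(match.group('area'))  # one group by name: 8888
    print(match.groupdict())    # {'area': '8888', 'prefix': '777', 'line': '666'}
```

Named groups are still numbered, so match.group(1) and match.group('area') return the same text.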

More Regular Expressions

Alternation (OR) using |

The | operator performs a logical OR: the pattern matches either the expression on its left or the one on its right. For example, John|George matches John or George.

import re
print(re.search('John|George', 'John and George came yesterday'))
print(re.search('John|George', 'John alone came yesterday'))
print(re.search('John|George', 'George alone came yesterday'))
print(re.search('John|George', 'None came yesterday'))

Output is

<re.Match object; span=(0, 4), match='John'>
<re.Match object; span=(0, 4), match='John'>
<re.Match object; span=(0, 6), match='George'>
None
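Since search stops at the first match, the first example above reports only John. The sketch below uses findall to collect both names, shows that alternatives are tried from left to right, and limits the alternation to part of a pattern with a non-capturing group (?:...). The strings are made up for illustration.

```python
import re

text = 'John and George came yesterday'

# findall returns every match, left to right
names = re.findall(r'John|George', text)
print(names)                  # ['John', 'George']

# Alternatives are tried left to right and the first one that fits wins,
# even if a later alternative would match more text
jo_first = re.search(r'Jo|John', 'John').group()
john_first = re.search(r'John|Jo', 'John').group()
print(jo_first, john_first)   # Jo John

# A non-capturing group keeps the | from splitting the whole pattern
came = re.findall(r'(?:John|George) came', text)
print(came)                   # ['George came']
```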

Wildcards and quantifiers (., *, +)

import re

string = 'Customer phone number is 8888-777-666 and alternate number is 2233445566'
# Without a quantifier: only the literal text 'er'
print(re.findall('er', string))
# With \w+ (one or more word characters) in front of 'er'
print(re.findall(r'\w+er', string))

Compare the output with and without \w+. The match ends at the 'er', so alternate produces alter:

['er', 'er', 'er', 'er']
['Customer', 'number', 'alter', 'number']
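The example above only uses +. The sketch below, on made-up strings, contrasts + with * and shows the . wildcard on its own and combined with * as .*, which matches any run of characters.

```python
import re

# + means "one or more" of the preceding item
plus = re.findall(r'ab+', 'a ab abb')
# * means "zero or more", so a lone 'a' also counts
star = re.findall(r'ab*', 'a ab abb')
# . matches any single character except a newline
dot = re.findall(r'a.c', 'abc a-c ac')
# .* is "any run of characters"; it is greedy, so it takes as much as it can
rest = re.search(r'is .*', 'Customer phone number is 8888-777-666').group()

print(plus)  # ['ab', 'abb']
print(star)  # ['a', 'ab', 'abb']
print(dot)   # ['abc', 'a-c']
print(rest)  # is 8888-777-666
```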

Here are some more complex patterns:

import re

string = 'Customer phone number is 8888-777-666 and alternate number is 2233445566'
# [^\d] matches any single character that is not a digit
print(re.findall(r'[^\d]', string))
# Each match is one character, so the result is a long list of characters.
# Adding the + quantifier joins consecutive non-digits into longer runs.
print(re.findall(r'[^\d]+', string))

Output is

['C', 'u', 's', 't', 'o', 'm', 'e', 'r', ' ', 'p', 'h', 'o', 'n', 'e', ' ', 'n', 'u', 'm', 'b', 'e', 'r', ' ', 'i', 's', ' ', '-', '-', ' ', 'a', 'n', 'd', ' ', 'a', 'l', 't', 'e', 'r', 'n', 'a', 't', 'e', ' ', 'n', 'u', 'm', 'b', 'e', 'r', ' ', 'i', 's', ' ']
['Customer phone number is ', '-', '-', ' and alternate number is ']
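The negated class [^\d] is common enough that regex provides the shorthand \D for it (see the table above). The sketch below checks that the two give the same result on a sample string, and that dropping the negation returns the digit runs instead.

```python
import re

string = 'Customer phone number is 8888-777-666 and alternate number is 2233445566'

# \D is shorthand for [^\d]: any single character that is not a digit
negated_class = re.findall(r'[^\d]+', string)
shorthand = re.findall(r'\D+', string)
print(negated_class == shorthand)  # True

# Dropping the negation gives the opposite result: runs of digits
digits = re.findall(r'\d+', string)
print(digits)  # ['8888', '777', '666', '2233445566']
```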
 

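Another example of exclusion: the negated class [^!.?,] matches any character except !, ., ?, and a comma, so [^!.?,]+ splits text into sentence fragments.

import re
print(re.findall('[^!.?,]+', 'Hi! How are you? I am doing good.'))

Output is

['Hi', ' How are you', ' I am doing good']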
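Two details are easy to miss here: inside square brackets, . and ? lose their special meaning and match themselves, and each fragment keeps the space that followed the punctuation. The sketch below, on the same sample sentence, strips that leading whitespace with str.strip().

```python
import re

text = 'Hi! How are you? I am doing good.'

# Inside [...], . and ? are literal characters, not wildcard or quantifier
fragments = re.findall(r'[^!.?,]+', text)
print(fragments)  # ['Hi', ' How are you', ' I am doing good']

# Each fragment after the first starts with a space; strip it off
cleaned = [f.strip() for f in fragments]
print(cleaned)    # ['Hi', 'How are you', 'I am doing good']
```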

 
