Understanding Python Regular Expressions

Understanding the re module

The regular expression module re is part of Python's standard library, so there is nothing to install.

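First, we need to understand the pattern. A 10-digit phone number can be written as the pattern r"\d\d\d\d\d\d\d\d\d\d". Here \d is a special sequence that matches a single digit: the backslash tells re that d is not the literal letter d but stands for "any digit". Since the pattern is one digit repeated ten times, you can shorten it to r"\d{10}", where {10} means "exactly 10 times". This is called a quantifier. The r prefix makes the pattern a raw string, so Python passes the backslash to re unchanged.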
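To see that the long form and the quantified form are interchangeable, here is a small sketch; the sample text is made up for illustration. The last line shows what the r prefix does: r"\d" is the same two-character string as "\\d".

```python
import re

text = "Customer phone number is 8888777666"

# Ten explicit \d tokens, one per digit
long_form = r"\d\d\d\d\d\d\d\d\d\d"
# The same pattern written with the {10} quantifier
short_form = r"\d{10}"

long_match = re.search(long_form, text)
short_match = re.search(short_form, text)
print(long_match.group())   # 8888777666
print(short_match.group())  # 8888777666

# The r prefix keeps the backslash: r"\d" is the two characters \ and d
print(r"\d" == "\\d")  # True
```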

The following code uses re.search:

import re
string = 'Customer phone number is 8888777666'
pattern = r"\d{10}"  # raw string, so the backslash reaches re unchanged
match = re.search(pattern, string)
print(match)
# The literal text "pattern" does not occur in string, so there is no match
match2 = re.search("pattern", string)
print(match2)

When the pattern is found, re.search returns a Match object; printing it shows the matched text and its location (span) in the string. When there is no match, re.search returns None.

Output of the code above:

<re.Match object; span=(25, 35), match='8888777666'>
None
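Because re.search returns None when nothing matches, calling .group() on the result without checking first would raise an AttributeError. A minimal sketch of the usual guard, using an illustrative phone-number string, along with the Match methods that report the location:

```python
import re

text = "Customer phone number is 8888777666"
match = re.search(r"\d{10}", text)

# re.search returns None when nothing matches, so check before using the result
if match:
    print(match.group())               # 8888777666
    print(match.span())                # (25, 35)
    print(match.start(), match.end())  # 25 35
else:
    print("No phone number found")
```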

re.search returns only the first match. To find every match, use re.findall:

import re
string = 'Customer phone number is 8888777666 and alternate is 2233445566'
pattern = r"\d{10}"
match = re.findall(pattern, string)
print(match)
# No match, so findall returns an empty list
match2 = re.findall("pattern", string)
print(match2)

re.findall returns a list of the matched strings, or an empty list when nothing matches. The output of the code above is:

['8888777666', '2233445566']
[]

Similarly, re.finditer returns an iterator of Match objects (one per match) rather than a list of strings, so each match also carries its position.

import re
string = 'Customer phone number is 8888777666 and alternate is 2233445566'
pattern = r"\d{10}"
for match in re.finditer(pattern, string):
    print(match)
    print(match.start())  # index where this match begins

Output is

<re.Match object; span=(25, 35), match='8888777666'>
25
<re.Match object; span=(53, 63), match='2233445566'>
53
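Because finditer hands you Match objects, you can collect the text and position of every match in one pass, for example with a list comprehension. A small sketch on an illustrative string; it also shows that the result is an iterator, so next() pulls one match at a time.

```python
import re

text = "Customer phone number is 8888777666 and alternate is 2233445566"

# Collect (matched text, start index, end index) for every match
results = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d{10}", text)]
print(results)  # [('8888777666', 25, 35), ('2233445566', 53, 63)]

# finditer is lazy: next() takes just the first Match object
matches = re.finditer(r"\d{10}", text)
first = next(matches)
print(first.group())  # 8888777666
```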

We have now covered the three basic re functions for finding a pattern in a string: search, findall and finditer. Next, let's go through more complex patterns.

Patterns

Here is a list of the most commonly used patterns:

Pattern	Description	Example Pattern Code	Example Match
\d	A digit	file_\d\d	file_66
\D	A non-digit	file_\D	file_x
\w	A word character (letter, digit, or underscore)	\w+	Hello123
\W	A non-word character	\W+	#$+ (in 22#$+3)
\s	A whitespace character	\s+	the space in Hello World
\S	A non-whitespace character	\S+	HelloWorld
.	Any character except newline	py...n	python, py123n
^	Start of a string	^Hello	Hello, Hello World
$	End of a string	World$	Hello World
[abc]	Any one of a, b, or c	[aeiou]	e, o
[0-9]	Any digit from 0 to 9	[0-9]+	123, 456
[^0-9]	Any character except digits	[^0-9]+	abcXYZ
(abc)	A group (captures)	(\d{2})	12 (captured)
a*	Zero or more 'a's	a*	'', 'a', 'aa'
a+	One or more 'a's	a+	'a', 'aa'
a?	Zero or one 'a'	a?	'', 'a'
a{3}	Exactly 3 'a's	a{3}	'aaa'
a{3,5}	Between 3 and 5 'a's	a{3,5}	'aaa', 'aaaaa'
a{3,}	3 or more 'a's	a{3,}	'aaa', 'aaaaaa'
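A quick way to check a table row is to run it. This sketch tests a few rows with re.findall and re.fullmatch (a standard re function that succeeds only if the whole string fits the pattern); the sample strings are illustrative. Note that each . stands for exactly one character, so matching the six-letter word python takes py...n with three dots.

```python
import re

# \W+ skips the word characters 2, 2 and 3 and keeps the run of symbols
symbols = re.findall(r"\W+", "22#$+3")
# \s+ matches only the space; \S+ matches the runs on either side of it
spaces = re.findall(r"\s+", "Hello World")
words = re.findall(r"\S+", "Hello World")
# Each . is one character: p, y, three characters, then n
dots = re.findall(r"py...n", "python py123n")
vowels = re.findall(r"[aeiou]", "hello")
# fullmatch requires the whole string to fit a{3,5}
four_a = re.fullmatch(r"a{3,5}", "aaaa") is not None
six_a = re.fullmatch(r"a{3,5}", "aaaaaa") is not None

print(symbols)        # ['#$+']
print(spaces)         # [' ']
print(words)          # ['Hello', 'World']
print(dots)           # ['python', 'py123n']
print(vowels)         # ['e', 'o']
print(four_a, six_a)  # True False
```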

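Parentheses () turn part of a pattern into a capturing group, such as (\d{3}). After a successful match, match.group(n) returns the text captured by the n-th group. The example below also uses re.compile to build a reusable pattern object; compiling is optional, but handy when the same pattern is used several times.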

import re
string = 'Customer phone number is 8888-777-666 and alternate number is 2233445566'
# Three capturing groups: 4, 3 and 3 digits, separated by dashes
pattern = re.compile(r'(\d{4})-(\d{3})-(\d{3})')
match = pattern.search(string)
print(match)
print(match.group(1))
print(match.group(2))
print(match.group(3))

Output is

<re.Match object; span=(25, 37), match='8888-777-666'>
8888
777
666
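Two related Match and pattern features are useful with groups: match.groups() returns all captured groups as a tuple, and when a pattern contains groups, findall returns one tuple of groups per match instead of the whole matched text. A sketch on a hypothetical string in which the alternate number is also written with dashes:

```python
import re

# Hypothetical text: both numbers use the dashed format
text = "Customer phone number is 8888-777-666 and alternate number is 2233-445-566"
pattern = re.compile(r"(\d{4})-(\d{3})-(\d{3})")

match = pattern.search(text)
print(match.group(0))  # 8888-777-666 (group 0 is the whole match)
print(match.groups())  # ('8888', '777', '666')

# With groups in the pattern, findall returns a tuple of groups per match
print(pattern.findall(text))  # [('8888', '777', '666'), ('2233', '445', '566')]
```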

More Regular Expressions

OR-ing using |

The | operator works as a logical OR (alternation). For example, John|George matches either John or George. Note that re.search still returns only the first match it finds.

import re
print(re.search('John|George', 'John and George came yesterday'))
print(re.search('John|George', 'John alone came yesterday'))
print(re.search('John|George', 'George alone came yesterday'))
print(re.search('John|George', 'None came yesterday'))

Output is

<re.Match object; span=(0, 4), match='John'>
<re.Match object; span=(0, 4), match='John'>
<re.Match object; span=(0, 6), match='George'>
None
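Two details of | are easier to see in code. First, re.findall returns every alternative that matches, not just the first. Second, | has the lowest precedence of the regex operators, so it splits the entire pattern unless the alternatives are wrapped in parentheses. The sample sentences are made up for illustration.

```python
import re

# findall returns every alternative that matches, in order
names = re.findall(r"John|George", "John and George came yesterday")
print(names)  # ['John', 'George']

# | splits the whole pattern: this means "John" OR "George came"
loose = re.search(r"John|George came", "John alone came yesterday")
print(loose.group())  # John

# Parentheses restrict the alternation to the names only
strict = re.search(r"(John|George) came", "John alone came yesterday")
print(strict)  # None
both = re.search(r"(John|George) came", "George came yesterday")
print(both.group())  # George came
```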

Wildcards and repetition (., *, +)

import re
string = 'Customer phone number is 8888-777-666 and alternate number is 2233445566'
# Without a wildcard: only the literal text 'er'
print(re.findall('er', string))
# With \w+ (one or more word characters) before 'er'
print(re.findall(r'\w+er', string))

See the difference in output with and without \w+. Note that alternate yields alter, because the match ends at its er.

['er', 'er', 'er', 'er']
['Customer', 'number', 'alter', 'number']
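The example above relies on \w+, while the heading also names . and *. A minimal sketch of all three on made-up strings: . matches any single character except a newline, * repeats the previous item zero or more times, and + repeats it one or more times.

```python
import re

# '.' matches any single character except a newline
dot_matches = re.findall(r"p.one", "phone pXone pone")
print(dot_matches)  # ['phone', 'pXone']

# '*' allows zero or more b's after the a
star_matches = re.findall(r"ab*", "a ab abbb")
print(star_matches)  # ['a', 'ab', 'abbb']

# '+' requires at least one b after the a
plus_matches = re.findall(r"ab+", "a ab abbb")
print(plus_matches)  # ['ab', 'abbb']
```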

Here are some more complex patterns:

import re
string = 'Customer phone number is 8888-777-666 and alternate number is 2233445566'
# [^\d] matches any single character that is NOT a digit
print(re.findall(r'[^\d]', string))
# The pattern above matches one character at a time, hence the long list of single characters.
# Adding the + quantifier collects consecutive non-digit characters into runs
print(re.findall(r'[^\d]+', string))

Output is

['C', 'u', 's', 't', 'o', 'm', 'e', 'r', ' ', 'p', 'h', 'o', 'n', 'e', ' ', 'n', 'u', 'm', 'b', 'e', 'r', ' ', 'i', 's', ' ', '-', '-', ' ', 'a', 'n', 'd', ' ', 'a', 'l', 't', 'e', 'r', 'n', 'a', 't', 'e', ' ', 'n', 'u', 'm', 'b', 'e', 'r', ' ', 'i', 's', ' ']
['Customer phone number is ', '-', '-', ' and alternate number is ']
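[^\d] is the negated form of \d, and re also provides the shorthand \D for exactly that set (see the \D row in the table). A small sketch confirming that both spellings give the same runs on an illustrative string:

```python
import re

text = "Customer phone number is 8888-777-666 and alternate number is 2233445566"

# Negated character class vs. the \D shorthand for "non-digit"
negated_class = re.findall(r"[^\d]+", text)
shorthand = re.findall(r"\D+", text)
print(negated_class == shorthand)  # True
print(shorthand[0])                # 'Customer phone number is '
```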

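Another example: to exclude punctuation, use a negated character class such as [^!.?,], which matches any character except !, ., ? and the comma. Adding + returns the pieces of text between the punctuation marks.

import re
print(re.findall(r'[^!.?,]+', 'Hi! How are you? I am doing good.'))

Output is

['Hi', ' How are you', ' I am doing good']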
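The pieces returned by [^!.?,]+ keep the space that follows each punctuation mark. A sketch of two ways to drop it: strip each piece afterwards, or require the first character to be neither punctuation nor whitespace (\s) and let the rest of the run continue with [^!.?,]*. Both are illustrative choices, not the only options.

```python
import re

text = "Hi! How are you? I am doing good."

# Runs of characters that are not !, ., ? or a comma (leading spaces included)
parts = re.findall(r"[^!.?,]+", text)

# Option 1: strip the surrounding whitespace afterwards
stripped = [p.strip() for p in parts]
print(stripped)  # ['Hi', 'How are you', 'I am doing good']

# Option 2: the first character may not be punctuation or whitespace
cleaned = re.findall(r"[^!.?,\s][^!.?,]*", text)
print(cleaned)  # ['Hi', 'How are you', 'I am doing good']
```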


This article is also part of

Learning Python: A Comprehensive Beginner Level Tutorial Series 🚀
