Introduction
Sets and ranges are core building blocks of regular expressions: they let a single position in a pattern match any one character from a group you specify. In this tutorial, you will learn how to use sets and ranges in Python to search text for specific characters, digits, and symbols. You'll also see how they keep patterns compact, since one set can stand in for a long list of alternatives. So, let's dive in and learn the power of regex sets and ranges in Python!
Table of Contents :
- Sets in Regex
- Ranges in Regex
- Excluding Sets & Ranges
Sets in Regex :
- Sets are used in Python regular expressions to match any single character from a given set of characters.
- Sets are created using square brackets [].
- For example, [abc] will match any one of the characters 'a', 'b', or 'c'.
- Sets can also include ranges of characters, such as [a-z], which will match any lowercase letter from 'a' to 'z'.
- Sets can also be negated, matching any character NOT in the set, by placing the '^' character immediately after the opening bracket.
- For example, [^abc] will match any character that is NOT 'a', 'b', or 'c'.
- Code Sample :
import re
text = "abcdefg"
pattern = r"[a-c]"
result = re.findall(pattern, text)
print(result) # Output: ['a', 'b', 'c']
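The bullets above also mention literal sets and negated sets, which the sample does not exercise. As a minimal sketch, the snippet below contrasts [abc] with [^abc] on the same input; the sample string and the variable names are assumptions chosen for illustration.

```python
import re

# Illustrative sample string (an assumption for this sketch)
text = "cabbage"

# [abc] matches each single character that is 'a', 'b', or 'c'
in_set = re.findall(r"[abc]", text)

# [^abc] matches each single character that is NOT 'a', 'b', or 'c'
not_in_set = re.findall(r"[^abc]", text)

print(in_set)      # Output: ['c', 'a', 'b', 'b', 'a']
print(not_in_set)  # Output: ['g', 'e']
```

Between them, the two patterns account for every character of the string exactly once, which is a quick way to check that a negated set is the complement you intended.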
Ranges in Regex :
- Ranges match any single character within a specified range.
- Ranges are specified inside a set using the '-' character between two characters.
- For example, [a-z] will match any lowercase letter from 'a' to 'z'.
- Ranges can also be combined to match multiple ranges or individual characters.
- For example, [a-zA-Z0-9] will match any ASCII letter or digit.
- Code Sample :
# Ranges example
import re
text = "The quick brown fox jumps over the lazy dog 123"
pattern = r"[a-z]"  # lowercase letters only; the uppercase 'T', spaces, and digits are skipped
result = re.findall(pattern, text)
print(result)
# Output: ['h', 'e', 'q', 'u', 'i', 'c', 'k', 'b', 'r', 'o', 'w', 'n', 'f', 'o', 'x', 'j', 'u', 'm', 'p', 's', 'o', 'v', 'e', 'r', 't', 'h', 'e', 'l', 'a', 'z', 'y', 'd', 'o', 'g']
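The combined set [a-zA-Z0-9] described above can be sketched the same way. The sample string below is an assumption for illustration, and the second pattern adds the + quantifier so that consecutive matches are returned as whole runs rather than single characters.

```python
import re

# Illustrative sample string (an assumption for this sketch)
text = "Order #A7-b42 shipped!"

# Three ranges in one set: lowercase letters, uppercase letters, digits
chars = re.findall(r"[a-zA-Z0-9]", text)

# With +, each match is a run of one or more characters from the set
runs = re.findall(r"[a-zA-Z0-9]+", text)

print("".join(chars))  # Output: OrderA7b42shipped
print(runs)            # Output: ['Order', 'A7', 'b42', 'shipped']
```

The space, '#', '-', and '!' are not in any of the three ranges, so they are skipped and act as separators between the runs.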
Excluding Sets & Ranges :
- You can also exclude characters or ranges from a set by placing the ^ character at the beginning of the set.
- For example, [^a-z] will match any character that is not a lowercase letter from 'a' to 'z'.
- Code Sample :
# Excluding sets example
import re
text = "The quick brown fox jumps over the lazy dog 123"
pattern = r"[^a-zA-Z ]"  # any character that is not a letter or a space
result = re.findall(pattern, text)
print(result)
# Output: ['1', '2', '3']
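A common use of an excluded range is stripping unwanted characters from a string. The sketch below is one possible illustration, not the only approach: the sample string and variable names are assumptions, and re.sub is the standard-library function that replaces every match with a given string. The last lines show that ^ only negates when it comes first inside the brackets.

```python
import re

# Illustrative sample string (an assumption for this sketch)
text = "Call 555-0199 now"

# [^0-9] matches every character that is not a digit
non_digits = re.findall(r"[^0-9]", text)

# Replacing each non-digit with an empty string keeps only the digits
digits_only = re.sub(r"[^0-9]", "", text)

print("".join(non_digits))  # Output: Call - now
print(digits_only)          # Output: 5550199

# When ^ is not the first character in the set, it is a literal caret
literal_caret = re.findall(r"[a^]", "a^b")
print(literal_caret)        # Output: ['a', '^']
```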