
Regular Expressions - Python

Raw Strings


It is customary in Python to pass regular expression patterns as raw strings (r'pattern'), so that backslashes in the pattern reach the regex engine unchanged and do not have to be doubled.

A raw string suppresses the usual meaning of escape sequences. The syntax for raw strings is exactly the same as for normal strings, except for the prefix: the letter "r", placed immediately before the opening quotation mark. The "r" can be lowercase (r) or uppercase (R).

The backslash (\) character is used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. String literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and use different rules for interpreting backslash escape sequences. A prefix of 'u' or 'U' makes the string a Unicode string. Unicode strings use the Unicode character set as defined by the Unicode Consortium and ISO 10646. Some additional escape sequences, described below, are available in Unicode strings. A prefix of 'b' or 'B' is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A 'u' or 'b' prefix may be followed by an 'r' prefix.
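The difference between the two kinds of literal is easiest to see by comparing lengths. The following is a minimal sketch that assumes Python 3 (where print is a function); the variable names are illustrative and do not come from the text above.

```python
import re

# In a normal string literal, "\n" is a single newline character.
normal = "\n"
# In a raw string literal the backslash is kept: two characters, "\" and "n".
raw = r"\n"

print(len(normal))  # 1
print(len(raw))     # 2

# To match one literal backslash, the regex engine must receive the pattern \\ .
# Without a raw string, each of those backslashes has to be doubled again.
path = "C:\\temp"   # the seven-character string C:\temp
print(re.search(r"\\", path) is not None)   # True
print(re.search("\\\\", path) is not None)  # True, the same pattern
print(r"\\" == "\\\\")                      # True
```

Both searches use the same two-character pattern; the raw string form is simply easier to read.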

Character Classes

Element Description (for regex with default flags)
. Matches any character except newline (\n)
\d Matches any decimal digit; equivalent to the class [0-9]
\D Matches any non-digit character; equivalent to the class [^0-9]
\s Matches any whitespace character; equivalent to the class [ \t\n\r\f\v]
\S Matches any non-whitespace character; equivalent to the class [^ \t\n\r\f\v]
\w Matches any alphanumeric character or underscore; equivalent to the class [a-zA-Z0-9_]
\W Matches any character that is not alphanumeric or an underscore; equivalent to the class [^a-zA-Z0-9_]
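The classes above can be tried out with re.findall. This is a short sketch, assuming Python 3 and a made-up sample string. Note that for Python 3 str patterns these classes also match non-ASCII digits, whitespace and letters unless the re.ASCII flag is given; for plain ASCII input such as this one, the equivalences in the table hold.

```python
import re

sample = "Room 42, floor 7\tok_1"  # hypothetical sample text

digits = re.findall(r"\d", sample)          # each decimal digit
words = re.findall(r"\w+", sample)          # runs of letters, digits, underscores
spaces = re.findall(r"\s", sample)          # each whitespace character
non_digits = re.findall(r"\D+", "a1b22c")   # runs of non-digit characters

print(digits)      # ['4', '2', '7', '1']
print(words)       # ['Room', '42', 'floor', '7', 'ok_1']
print(spaces)      # [' ', ' ', ' ', '\t']
print(non_digits)  # ['a', 'b', 'c']
```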

The following interactive session shows the . element, which matches any character except newline (\n):

{source}
<pre class="brush:py;">
Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import re
>>> hello1 = "Hello, world." # string hello1
>>> if re.search(r".", hello1):  # any single character: matches "H"
    print "Match " +\
        "contained in " + hello1
else:
    print "NO Match " +\
        "contained in " + hello1

Match contained in Hello, world.
>>> if re.search(r"e.", hello1):  # "e" followed by any character: matches "el"
    print "Match " +\
        "contained in " + hello1
else:
    print "NO Match " +\
        "contained in " + hello1

Match contained in Hello, world.
>>> if re.search(r"a.", hello1):  # there is no "a" in hello1
    print "Match " +\
        "contained in " + hello1
else:
    print "NO Match " +\
        "contained in " + hello1

NO Match contained in Hello, world.
>>> if re.search(r"...l", hello1):  # any three characters, then "l": matches "Hell"
    print "Match " +\
        "contained in " + hello1
else:
    print "NO Match " +\
        "contained in " + hello1

Match contained in Hello, world.
>>> if re.search(r"...z", hello1):  # there is no "z" in hello1
    print "Match " +\
        "contained in " + hello1
else:
    print "NO Match " +\
        "contained in " + hello1

NO Match contained in Hello, world.
>>>
</pre>

{/source}

Negated Character Sets

Element Description
[ Opens a set of characters
^ Negates the set when it is the first character inside it: matches any character not listed after it
\/ Matches a / character
\\ Matches a \ character
] Closes the set
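Assuming the elements above are meant to be read together as the set [^\/\\] (any character that is neither a forward slash nor a backslash), a minimal Python 3 sketch looks like this. The sample path and variable names are made up for illustration.

```python
import re

# [^\/\\] matches any single character that is neither / nor \ .
# The trailing + collects runs of such characters.
not_a_slash = re.compile(r"[^\/\\]+")

mixed_path = r"C:\Users/me"  # hypothetical sample input
parts = not_a_slash.findall(mixed_path)
print(parts)  # ['C:', 'Users', 'me']

# Anywhere but the first position, ^ is an ordinary character inside a set.
caret_literal = re.findall(r"[a^]", "a^b")
print(caret_literal)  # ['a', '^']
```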

Alternation
We have just learned how to match a single character from a set of characters.
Now we turn to a broader approach: matching against a set of alternative
regular expressions. This is done with the pipe symbol |.

Element Description
| Matches either the expression on its left or the expression on its right (pipe symbol)

{source}
<pre class="brush:py;">
Python 2.7.6 (default, Nov 10 2013, 19:24:24) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import re
>>> hello1 = "Hello, world." # string hello1
>>> if re.search(r"(Mel|No|Two)", hello1):  # none of the alternatives occurs in hello1
    print "At least one of Mel, No, or Two is " +\
        "contained in " + hello1
else:
    print "None of Mel, No, or Two is " +\
        "contained in " + hello1

None of Mel, No, or Two is contained in Hello, world.
>>> if re.search(r"(Hello|No|Two)", hello1):  # the alternative "Hello" matches
    print "At least one of Hello, No, or Two is " +\
        "contained in " + hello1
else:
    print "None of Hello, No, or Two is " +\
        "contained in " + hello1

At least one of Hello, No, or Two is contained in Hello, world.
>>>

</pre>

{/source}
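To see which alternative actually matched, the match object can be inspected. The sketch below assumes Python 3 and reuses the hello1 string from the session above; the other variable names are illustrative.

```python
import re

hello1 = "Hello, world."

# None of the alternatives occurs, so re.search returns None.
no_match = re.search(r"Mel|No|Two", hello1)
print(no_match)  # None

# group() returns the text matched by whichever alternative succeeded.
found = re.search(r"Hello|No|Two", hello1)
if found:
    print(found.group())  # Hello

# At each position the alternatives are tried left to right,
# and the first one that succeeds is used, even if a later one is longer.
longer_first = re.search(r"world|wor", hello1).group()
shorter_first = re.search(r"wor|world", hello1).group()
print(longer_first)   # world
print(shorter_first)  # wor
```

When one alternative is a prefix of another, putting the longer one first avoids a shorter match than intended.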

Quantifiers