An Extension for Python re: regex

One thing I dislike about Python is how painful it is to use regular expressions with capture groups. For example, to find words in quotes, I would write:

quotedWordPattern = re.compile(r'"([^"]+)"(.*)')
match = quotedWordPattern.search(text)

while match:
    print(match.group(1))
    text  = match.group(2)
    match = quotedWordPattern.search(text)

This is a lot clunkier than the equivalent Perl:

while ($text =~ /"([^"]+)"(.*)/) {
    print $1;
    $text = $2;
}

To get around this, I would ideally subclass Python’s compiled regular expression type, but it turns out that, even in Python 3, it can’t be used as a base class. So instead, I created a wrapper class called Regex, in a module named regex. Using it, I can write:

quotedWordPattern = regex.compile(r'"([^"]+)"(.*)')

while quotedWordPattern.search(text):
    print(quotedWordPattern.last_match.group(1))
    text = quotedWordPattern.last_match.group(2)
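For comparison, on Python 3.8 or later an assignment expression gives a similar loop shape with the standard re module alone. This is only a sketch of that alternative, not part of regex.py; the function name quoted_words and the sample sentence are made up for illustration.

```python
import re

quoted_word_pattern = re.compile(r'"([^"]+)"(.*)')

def quoted_words(text):
    """Hypothetical helper: collect quoted words with the same consume-the-rest loop."""
    words = []
    # The assignment expression binds the match and tests it in one step.
    while (match := quoted_word_pattern.search(text)):
        words.append(match.group(1))
        text = match.group(2)  # keep scanning the remainder of the string
    return words

print(quoted_words('say "hello" to the "world"'))
```

The wrapper still has one advantage here: it works on Python versions that predate assignment expressions, and the match stays available on the pattern object after the test.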

Here’s the code of regex.py:

import re

class Regex:
    """Wraps a compiled pattern and remembers the most recent match."""

    def __init__(self, pattern, flags=0):
        self.re         = re.compile(pattern, flags)
        self.last_match = None

    # endpos is only forwarded when given, because the compiled pattern
    # does not accept None for it.
    def search(self, string, pos=0, endpos=None):
        self.last_match = (self.re.search(string, pos, endpos)
                           if endpos is not None else self.re.search(string, pos))
        return self.last_match

    def match(self, string, pos=0, endpos=None):
        self.last_match = (self.re.match(string, pos, endpos)
                           if endpos is not None else self.re.match(string, pos))
        return self.last_match

    def fullmatch(self, string, pos=0, endpos=None):
        self.last_match = (self.re.fullmatch(string, pos, endpos)
                           if endpos is not None else self.re.fullmatch(string, pos))
        return self.last_match

def compile(pattern, flags=0):
    return Regex(pattern, flags)
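To see the wrapper in action without saving regex.py first, here is a small self-contained sketch. It uses a trimmed copy of the class with only search, and the helper name extract_quoted and the sample sentence are assumptions made for the demo.

```python
import re

class Regex:
    """Trimmed copy of the wrapper: search() only, so the demo runs on its own."""

    def __init__(self, pattern, flags=0):
        self.re = re.compile(pattern, flags)
        self.last_match = None

    def search(self, string, pos=0):
        self.last_match = self.re.search(string, pos)
        return self.last_match

def extract_quoted(text):
    """Hypothetical helper: return every quoted word found in text."""
    pattern = Regex(r'"([^"]+)"(.*)')
    words = []
    # The loop test stores the match; the body reads it back from last_match.
    while pattern.search(text):
        words.append(pattern.last_match.group(1))
        text = pattern.last_match.group(2)
    return words

print(extract_quoted('He said "yes" and then "no"'))
```

Note that last_match is overwritten on every call, so after the loop ends it holds None from the final failed search.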

Note that all other operations (sub, split, findall, and so on) remain available through the wrapper’s re member, and flags can be imported from re as needed. For example:

import re
import regex
pattern = regex.compile(r'\bJim\b', re.UNICODE)

if pattern.match(name):
    bio = pattern.re.sub("James", bio)
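If reaching through the re member feels awkward, one possible variation is to forward unknown attributes to the compiled pattern. The sketch below is not how regex.py works; the class name DelegatingRegex and the sample name and bio are hypothetical.

```python
import re

class DelegatingRegex:
    """Hypothetical variant: unknown attributes fall through to the compiled pattern."""

    def __init__(self, pattern, flags=0):
        self.re = re.compile(pattern, flags)
        self.last_match = None

    def match(self, string, pos=0):
        self.last_match = self.re.match(string, pos)
        return self.last_match

    def __getattr__(self, name):
        # Only called when normal lookup fails, so match and last_match are unaffected.
        if name == "re":
            # Guard against infinite recursion if self.re has not been set yet.
            raise AttributeError(name)
        return getattr(self.re, name)

pattern = DelegatingRegex(r'\bJim\b')
name = "Jim Hawkins"
bio = "Jim writes software."

if pattern.match(name):
    bio = pattern.sub("James", bio)  # forwarded to the compiled pattern's sub()

print(bio)
```

The trade-off is that forwarded methods such as sub or findall do not update last_match, so only the methods defined on the wrapper record their result.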

 

About jimbelton

I'm a software developer, and a writer of both fiction and non-fiction, and I blog about movies, books, and philosophy. My interest in religious philosophy and the search for the truth inspires much of my writing.
