If you're a regex guru and you know why you came here, you can go straight to the brief explanation. If not, just keep reading.
I found a workaround for Python bug 1519638. It most definitely will not solve every puzzle out there, but it stops the sub method from breaking when the replacement string uses backrefs.
If you would like to replace this:
And you’re not sure whether the <small> tag is there, you would group the chars “<small>” and add a question mark to make them optional. BTW, running a replace on just “Name” is not an option because it would mess up other parts of the file in question.
Example updated. Thanx dbr!
Using a compiled pattern, a solution might look like this:
    import re

    # re.VERBOSE is left out: it would make the pattern ignore the literal space in 'label for'
    reg = re.compile(r'(<label for="author">)(<small>)?(Name)', re.MULTILINE | re.DOTALL)
    replace = r'\g<1>\g<2>\g<3>'
    search = reg.sub(replace, data)  # data holds the contents of the file
Here the replacement string uses backreferences to the groups, i.e. the subexpressions inside the parentheses of the search pattern.
However, if the “<small>” tag is not there, the call to sub raises an exception.
    $ python regex.py
    Traceback (most recent call last):
      File "regex.py", line 14, in <module>
        search = reg.sub(replace, data)
      File "/usr/lib/python2.5/re.py", line 274, in filter
        return sre_parse.expand_template(template, match)
      File "/usr/lib/python2.5/sre_parse.py", line 793, in expand_template
        raise error, "unmatched group"
    sre_constants.error: unmatched group
This happens because the second group, referenced as “\g<2>” in the replacement string, did not take part in the match and so yields a “None” instead of an empty string. That is (or seems to be) the bug.
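To see the “None” for yourself, here is a minimal sketch. The sample string `text` is made up for illustration. The try/except is only there because, as far as I know, Python 3.5 and later substitute an empty string for an unmatched group instead of raising, so the error itself shows up on older versions only.

```python
import re

# Same pattern as in the post, without flags.
# `text` is a made-up sample that lacks the <small> tag.
reg = re.compile(r'(<label for="author">)(<small>)?(Name)')
text = '<label for="author">Name</label>'

match = reg.search(text)
print(match.group(2))  # the optional group did not take part in the match

try:
    result = reg.sub(r'\g<1>\g<2>\g<3>', text)
except re.error as exc:
    # Older Pythons end up here with "unmatched group".
    result = 'error: %s' % exc
print(result)
```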
Solving the oops
This can be resolved by replacing the optional notation “(<small>)?” with an alternation “(|<small>)”. When the “<small>” tag is absent, the group matches on the empty subexpression, so it actually returns an empty string and the call to sub won’t raise the exception.
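A small sketch of the workaround, under a few assumptions: both sample strings and the replacement text “Your name” are hypothetical, picked only so the effect of the substitution is visible. The pattern is the one from this post with the optional group swapped for the alternation.

```python
import re

# Alternation with an empty branch instead of the optional group.
reg = re.compile(r'(<label for="author">)(|<small>)(Name)')

# Hypothetical replacement: keep groups 1 and 2, swap "Name" for new text.
replace = r'\g<1>\g<2>Your name'

# Made-up sample data, once with and once without the <small> tag.
with_tag = '<label for="author"><small>Name</small></label>'
without_tag = '<label for="author">Name</label>'

out_with = reg.sub(replace, with_tag)
out_without = reg.sub(replace, without_tag)

print(out_with)
print(out_without)
```

Because the empty branch comes first, the engine tries it first and only falls back to “<small>” when the following “(Name)” fails to match, so both inputs are handled by the same pattern.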
In other words …
When doing a search and replace with sub, replace the group written as optional with a group written as an alternation that has one empty subexpression. So instead of this “(.+?)?” use this “(|.+?)” (without the double quotes).
If nothing else is matched by this group, the empty subexpression matches. An empty string is then returned instead of a None, and the sub method runs normally instead of raising the “unmatched group” error.
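The general rule can be checked with a toy pattern. Everything here is illustrative: the literals “a” and “b” around the group and the bracketed replacement are assumptions added just to have something to match against.

```python
import re

optional = re.compile(r'a(.+?)?b')     # group is skipped when nothing sits between a and b
alternation = re.compile(r'a(|.+?)b')  # the empty branch matches instead

skipped = optional.match('ab').group(1)     # group did not participate
empty = alternation.match('ab').group(1)    # group matched the empty branch
filled = alternation.match('axyb').group(1) # group matched real text

# The backreference is always a string with the alternation form.
replaced = alternation.sub(r'[\g<1>]', 'ab axyb')

print(repr(skipped), repr(empty), repr(filled))
print(replaced)
```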
That’s all folks …