Differences between revisions 19 and 20
Revision 19 as of 2009-04-15 14:41:14
Size: 5279
Editor: 82
Comment: Adding link to cheat sheet
Revision 20 as of 2009-04-15 14:45:24
Size: 5167
Editor: 82
Comment: Move cheat sheet to "see also" section
 * [[http://www.bitcetera.com/en/techblog/2008/04/01/regex-in-a-nutshell/|Regex in a Nutshell]] cheat sheet

=== Cheat Sheet ===

Here's the colorful [[http://www.bitcetera.com/en/techblog/2008/04/01/regex-in-a-nutshell/|Regex in a Nutshell]] cheat sheet which may come in handy for your daily work with regular expressions.
 

http://wiki.python.org/moin/RegularExpression?action=AttachFile&do=get&target=regex_characters.png

Flags when compiling:

http://taoriver.net/img/for_pi/regex_flags.png

(All images PD, released by author, LionKimbro.)
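The flags image is not reproduced in text here. As a rough sketch of how flags are passed when compiling, the following uses three standard re flags; the patterns and sample strings are made up for illustration and are not taken from the image.

```python
import re

# re.IGNORECASE: letters in the pattern match regardless of case.
tag_re = re.compile(r"<h1>(.*?)</h1>", re.IGNORECASE)
ignorecase_groups = tag_re.match("<H1>robot</H1>").groups()

# re.VERBOSE: whitespace and "#" comments inside the pattern are ignored.
# Several flags can be combined with the | operator.
date_re = re.compile(r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    """, re.VERBOSE | re.IGNORECASE)
date_groups = date_re.match("2009-04").groups()

# re.MULTILINE: ^ and $ also match at the start and end of each line.
line_starts = re.findall(r"^\w+", "one two\nthree four", re.MULTILINE)

print(ignorecase_groups)
print(date_groups)
print(line_starts)
```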

Searching & Matching

You can search or match.

  • search -- find the pattern anywhere in the string, and return a match object (or None if nothing matches)

  • match -- find the pattern only at the beginning of the string, and return a match object (or None if nothing matches)
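A minimal sketch of the difference between the two; the sample text and variable names are only illustrative.

```python
import re

text = "say hello world"

# search scans the string and returns the first match found anywhere in it.
found = re.search(r"hello", text)

# match only succeeds when the pattern matches starting at position 0.
at_start = re.match(r"hello", text)
anchored = re.match(r"say", text)

print(found.group(), found.start())
print(at_start)
print(anchored.group())
```

Both functions return None when nothing matches, so it is worth checking the result before calling group() on it.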

You can also split on a pattern.

For example:

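   import re
   split_up = re.split(r"(\(\([^)]+\)\))",
                       "This is a ((test)) of the ((emergency broadcasting station.))")

...which produces the following. The capturing group keeps the delimiters in the result, and the trailing empty string appears because the input ends with a delimiter:

["This is a ", "((test))", " of the ", "((emergency broadcasting station.))", ""]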
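For comparison, here is a small sketch of the same split with and without the capturing group, plus one way to drop empty strings from the result. The variable names are only illustrative.

```python
import re

text = "This is a ((test)) of the ((emergency broadcasting station.))"

# Without a capturing group, the delimiters are dropped from the result.
without_delims = re.split(r"\(\([^)]+\)\)", text)

# With a capturing group, each matched delimiter is kept as its own item.
with_delims = re.split(r"(\(\([^)]+\)\))", text)

# Filter out empty strings, such as the one left after a trailing delimiter.
pieces = [piece for piece in with_delims if piece]

print(without_delims)
print(pieces)
```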

Compiling

If you use a regex a lot, compile it first.

Consider:

   import re
   match_obj = re.match(r"<(.*?)>(.*?)</(.*?)>", "<h1>robot</h1>")
   print(match_obj.groups())

...which outputs: ('h1', 'robot', 'h1')

If you were going to do that match a lot, you could compile it, like so:

   import re
   match_re = re.compile(r"<(.*?)>(.*?)</(.*?)>")
   match_obj = match_re.match("<h1>robot</h1>")
   print(match_obj.groups())

...which yields the same result.

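I don't know how much faster compiled forms are than non-compiled forms. The re module caches recently used patterns, so for a program that uses only a few patterns the difference is probably small.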
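One way to find out on your own machine is to time both forms with timeit. This is only a sketch: the repetition count is arbitrary, and the numbers will vary by machine and Python version.

```python
import re
import timeit

PATTERN = r"<(.*?)>(.*?)</(.*?)>"
TEXT = "<h1>robot</h1>"
compiled = re.compile(PATTERN)

def uncompiled_match():
    # Module-level function: looks the pattern up (or compiles it) on each call.
    return re.match(PATTERN, TEXT).groups()

def compiled_match():
    # Precompiled pattern object: skips the lookup step.
    return compiled.match(TEXT).groups()

uncompiled_seconds = timeit.timeit(uncompiled_match, number=10000)
compiled_seconds = timeit.timeit(compiled_match, number=10000)

print("module-level re.match:", uncompiled_seconds)
print("precompiled pattern:  ", compiled_seconds)
```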

See Also

Discussion

Requests

  • documentation on using re with Unicode ..?

Problem?

The following features do not seem to work in Python.

For example, ICU regular expressions provide the following patterns:

  • \N{UNICODE CHARACTER NAME} Matches the named character.
  • \p{UNICODE PROPERTY NAME} Matches a character that has the specified Unicode property.
  • \P{UNICODE PROPERTY NAME} Matches a character that does not have the specified Unicode property.
  • \s Matches a separator character. A separator is defined as [\t\n\f\r\p{Z}].
  • \uhhhh Matches the character whose hex value is hhhh.
  • \Uhhhhhhhh Matches the character whose hex value is hhhhhhhh. Exactly eight hex digits must be provided, even though the largest Unicode code point is \U0010ffff.

-- anonymous

I don't understand the problem. -- LionKimbro 2006-03-25 16:31:35
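For what it is worth, here is a rough sketch of how some of these can be approximated with the standard library. It assumes Python 3, where ordinary (non-raw) string literals expand \N{...} and \uhhhh themselves. The standard re module has no \p{...}, so the property test below falls back on unicodedata; the helper chars_in_category is a made-up name for illustration. (The third-party regex module does support \p{...}, if installing a package is an option.)

```python
import re
import unicodedata

text = "caract\u00e8re nomm\u00e9 \u00e0"

# In a non-raw string literal, Python expands \N{...} and \uhhhh before
# the re module ever sees the pattern.
named_re = re.compile("\N{LATIN SMALL LETTER E WITH ACUTE}|\u00e0")
named_hits = named_re.findall(text)

def chars_in_category(s, prefix):
    """Return the characters of s whose Unicode general category starts with prefix."""
    return [ch for ch in s if unicodedata.category(ch).startswith(prefix)]

# Roughly what \p{Z} would select: characters in the separator categories.
separators = chars_in_category(text, "Z")

print(named_hits)
print(separators)
```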

Visualization

  • I like the Venn diagram in this image. However, one part of the image is confusing: it refers to Python strings, "regex strings" (which are actually Python "raw" strings), and something called "match strings". What are these "match strings"? -- JimD 2004-12-30 20:03:54

The image isn't meant to be explanatory; it is meant as reference and refresher material.

That said: The "match string" is the final product of either of the two above expressions. It is what the above two expressions will literally match. If you have a better phrase, or would like to correct "raw" to "regex," feel free to download the SVG, edit the text, place an image on the web, and link it from here. (The damn LongImageIncorporationProcess strikes again.) I may eventually get around to it myself one day, but it seems there are higher priorities, and the diagram is "good enough."

That said, I appreciate the correction. -- LionKimbro 2005-01-01 00:22:14

Image Hosting

At the top of this page there were two images/schemes about re. Is it possible to redraw them here somehow? That server is not working; maybe someone has them downloaded locally. Thanks a lot. -- PavelKosina

I've uploaded one as an attachment, still need to upload the other... And, the source...

(Anyone can do this, though, when the computers are online.)

-- LionKimbro 2006-03-25 16:31:35

RegularExpression (last edited 2015-08-30 10:11:18 by pythonguru)
