Differences between revisions 9 and 10
Revision 9 as of 2005-05-14 05:31:23
Size: 3946
Editor: aaron
Comment: David Mertz's regex reference
Revision 10 as of 2005-07-19 23:34:55
Size: 4352
Editor: aaron
Comment: redemo is awesomeness

Reference Diagrams

http://taoriver.net/img/for_pi/regex_characters.png

flags when compiling:

http://taoriver.net/img/for_pi/regex_flags.png
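
As a rough sketch of how flags are passed (this does not reproduce the diagram; the flags chosen and the sample strings are my own illustration), here are four standard re-module flags in use:

```python
import re

# IGNORECASE: letters match regardless of case.
case_hits = re.findall(r"robot", "Robot ROBOT robot", re.IGNORECASE)
print(case_hits)

# MULTILINE: ^ and $ also match at the start and end of each line.
line_starts = re.findall(r"^\w+", "first line\nsecond line", re.MULTILINE)
print(line_starts)

# DOTALL: "." also matches a newline.
dot_hit = re.search(r"a.b", "a\nb", re.DOTALL) is not None
print(dot_hit)

# VERBOSE: whitespace and # comments inside the pattern are ignored.
tag_re = re.compile(r"""
    <(\w+)>      # opening tag name
    (.*?)        # contents, non-greedy
    </\1>        # closing tag with the same name
""", re.VERBOSE)
tag_groups = tag_re.match("<h1>robot</h1>").groups()
print(tag_groups)
```

Flags can be combined with |, for example re.IGNORECASE | re.MULTILINE.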

(All images PD, released by me, author, LionKimbro.)

DavidMertz has also created [http://gnosis.cx/TPiP/regex_patterns.gif a regular expression reference.]

Searching & Matching

You can search or match.

  • search -- find the pattern anywhere in the string, and return a match object (or None if nothing is found)

  • match -- find the pattern only at the beginning of the string, and return a match object (or None if it is not there)
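
A minimal sketch of the difference between the two (the sample text and variable names are made up for illustration):

```python
import re

text = "say hello to the robot"

# search scans the whole string and stops at the first place the pattern fits.
found = re.search(r"robot", text)
print(found.group())

# match only succeeds when the pattern fits at position 0 of the string.
at_start_miss = re.match(r"robot", text)   # "robot" is not at the start
at_start_hit = re.match(r"say", text)      # "say" is at the start
print(at_start_miss)
print(at_start_hit.group())
```

Both functions hand back None when nothing is found, so check the result before calling .group() or .groups() on it.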

You can also split on a pattern.

For example:

    import re
    split_up = re.split(r"(\(\([^)]+\)\))",
                        "This is a ((test)) of the ((emergency broadcasting station.))")

...which produces:

["This is a ", "((test))", " of the ", "((emergency broadcasting station.))", ""]

(The trailing empty string is there because the text ends with a match.)
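
The delimiters show up in that result because the whole pattern sits inside a capturing group. A small sketch of the difference, using a made-up sample string:

```python
import re

text = "one, two;three"

# No capturing group: the separators are thrown away.
without_group = re.split(r"[,;]\s*", text)
print(without_group)

# Capturing group around the separator: the separators are kept in the list.
with_group = re.split(r"([,;]\s*)", text)
print(with_group)
```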

Compiling

If you use a regex a lot, compile it first.

Consider:

    import re
    match_obj = re.match(r"<(.*?)>(.*?)</(.*?)>", "<h1>robot</h1>")
    print(match_obj.groups())

...which outputs: ('h1', 'robot', 'h1')

If you were going to do that match a lot, you could compile it, like so:

    import re
    match_re = re.compile(r"<(.*?)>(.*?)</(.*?)>")
    match_obj = match_re.match("<h1>robot</h1>")
    print(match_obj.groups())

...which yields the same result.

I don't know how much faster compiled forms are than non-compiled forms.
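
One way to find out on your own machine is to time both forms with timeit. This is only a sketch: the function names and the iteration count are arbitrary choices of mine, and the numbers will vary from run to run. The re module also keeps a cache of recently compiled patterns, so the gap may well be small.

```python
import re
import timeit

PATTERN = r"<(.*?)>(.*?)</(.*?)>"
TEXT = "<h1>robot</h1>"
compiled = re.compile(PATTERN)

def use_module_function():
    # re.match looks the pattern up (or compiles it) on every call.
    return re.match(PATTERN, TEXT).groups()

def use_compiled_pattern():
    # The pattern object is built once, outside the timed call.
    return compiled.match(TEXT).groups()

module_time = timeit.timeit(use_module_function, number=100000)
compiled_time = timeit.timeit(use_compiled_pattern, number=100000)
print("re.match:       %.3f s" % module_time)
print("compiled.match: %.3f s" % compiled_time)
```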

Links

If you are interested in visualization, you may also like a [http://www.ozonehouse.com/mark/blog/code/PeriodicTable.pdf periodic table of Perl operators.]

Discussion

I've made a couple of diagrams, which I've linked at the top of the page here.

I have SVG links as well; [http://taoriver.net/img/for_pi/regex_characters.svg first] and [http://taoriver.net/img/for_pi/regex_flags.svg second.]

Damn the [http://visual.wiki.taoriver.net/moin.cgi/LongImageIncorporationProcess LongImageIncorporationProcess!] Damn it to hell! We'd have a billion pretty pictures here, if tablets were cheap, and we had protocols and implementations for saving and loading straight to and from the wiki.

I have another diagram I'd like to make and place: it's the pattern (RegexObject) and match (MatchObject) API, visualized and densely arranged. We'll see if I get around to drawing it, but it looks like no. Too much else to do.

-- LionKimbro DateTime(2004-12-28T08:38:12Z)

  • I like the Venn diagram in this image. However, one part of the image is confusing. It refers to Python strings, "regex strings" (which are actually Python "raw" strings), and something called "match strings". What are these "match strings"? -- JimD DateTime(2004-12-30T20:03:54Z)

The image isn't meant to be explanatory; it is meant to be reference and refresher material.

That said: The "match string" is the final product of either of the two above expressions. It is what the above two expressions will literally match. If you have a better phrase, or would like to correct "raw" to "regex," feel free to download the SVG, edit the text, place an image on the web, and link it from here. (The damn [http://visual.wiki.taoriver.net/moin.cgi/LongImageIncorporationProcess LongImageIncorporationProcess] strikes again.) I may eventually get around to it myself one day, but it seems there are higher priorities, and the diagram is "good enough."

That said, I appreciate the correction. -- LionKimbro DateTime(2005-01-01T00:22:14Z)

I should mention: PythonCard comes with "redemo," which is simply indispensable if you are doing a bunch of regex work.

It lets you type in a regular expression, and highlights matches. You can edit either the regex or the text, and the highlights adjust in real time. Then it prints out how the .groups() are identified in code. It's terribly useful. -- LionKimbro DateTime(2005-07-19T23:34:53Z)

RegularExpression (last edited 2015-08-30 10:11:18 by pythonguru)
