Differences between revisions 3 and 4
Revision 3 as of 2005-12-02 16:38:21
Size: 1035
Editor: DavidGoodger
Comment:
Revision 4 as of 2005-12-03 19:30:24
Size: 1128
Editor: DavidGoodger
Comment:
Deletions are marked with "-". Additions are marked with "+".
Line 52:
-   - "real" parsers
+   - "real" parsers (including XML)
Line 55:
- Trainer: `David Goodger <http://python.net/~goodger>`_ (`email <goodger@python.org>`_)
+ Please send feedback & ideas for further specific topics to the trainer, David Goodger
+ (`email <goodger@python.org>`_, `home page <http://python.net/~goodger>`_).

Intended Audience

Beginning to intermediate programmers. A basic working knowledge of Python is assumed.

Summary

This tutorial introduces the Python tools & techniques most useful for text and data processing. Topics include regular expressions, filtering data with generators, and parsing.

Outline

  • Common data sources needing processing:
    • log files
    • CSV
    • tabular data
    • email
    • XML
  • Tools & techniques:
    • lists & dicts
    • s.join(list) instead of accumulating
    • for line in file
    • filters, large data sources: generators
    • decorate-sort-undecorate
    • StringIO
  • Regular expressions:
    • pattern matching
    • filtering
    • substitution
    • splitting
  • Parsing:
    • s.split()
    • s.find()
    • regular expressions
    • "real" parsers (including XML)
    • state machines
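
As a rough illustration of how several of the outlined techniques fit together, here is a minimal sketch and not material from the tutorial itself. The sample log format, the names SAMPLE_LOG and LINE_RE, and all function names are assumptions invented for this example. It combines StringIO, "for line in file", generator filters, a regular expression, s.split(), decorate-sort-undecorate, and s.join(list).

```python
import re
from io import StringIO

# Hypothetical sample data standing in for a log file; the format is invented.
SAMPLE_LOG = """\
2005-12-02 16:38:21 INFO start
2005-12-02 16:38:25 ERROR disk full
2005-12-03 19:30:24 WARNING low memory
2005-12-03 19:31:00 ERROR timeout
"""

# Assumed line layout: date, time, level, free-form message.
LINE_RE = re.compile(r"(\S+) (\S+) (\w+) (.*)")


def parse_with_split(line):
    """Parse one line with s.split(): at most 3 splits, so the message stays whole."""
    return tuple(line.rstrip("\n").split(" ", 3))


def parse_lines(lines):
    """Generator: yield (date, time, level, message) for each line the regex matches."""
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            yield match.groups()


def only_level(records, level):
    """Generator filter: pass through only the records with the given level."""
    for record in records:
        if record[2] == level:
            yield record


def summarize(text):
    """Return the ERROR messages as one string, longest message first."""
    source = StringIO(text)  # file-like object, so "for line in file" works
    errors = only_level(parse_lines(source), "ERROR")
    # Decorate-sort-undecorate: pair each message with its sort key, sort, strip the key.
    decorated = [(len(message), message) for (_, _, _, message) in errors]
    decorated.sort(reverse=True)
    messages = [message for (_, message) in decorated]
    # Build the result with join() instead of accumulating a string piece by piece.
    return "; ".join(messages)


print(summarize(SAMPLE_LOG))
```

Because parse_lines and only_level are generators, lines are read and filtered one at a time, so the same pipeline would work on an open file that is too large to hold in memory. Only the final sort needs the filtered records collected in a list.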

Please send feedback & ideas for further specific topics to the trainer, David Goodger (email: goodger@python.org; home page: http://python.net/~goodger).

PyCon2006/Tutorials/TextProcessing (last edited 2008-11-15 14:01:15 by localhost)
