Intended Audience
Beginning to intermediate programmers. A basic working knowledge of Python is assumed.
Summary
This tutorial introduces beginning to intermediate programmers to useful Python tools and techniques for text and data processing. Topics include regular expressions, filtering data with generators, and parsing.
Outline
- Common data sources needing processing:
  - log files
  - CSV
  - tabular data
  - XML
- Tools & techniques:
  - lists & dictionaries
  - s.join(list) instead of accumulating
  - for line in file
  - filters, large data sources: generators
  - decorate-sort-undecorate
  - StringIO
- Regular expressions:
  - pattern matching
  - filtering
  - substitution
  - splitting
- Parsing:
  - text.split()
  - text.find()
  - regular expressions
  - "real" parsers (including XML)
  - state machines
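
As a rough illustration of how several of the outlined techniques might fit together, here is a minimal sketch. It is not taken from the tutorial materials. The log format, the sample data, and the names LOG_LINE, parse_lines, errors_only, and report are all hypothetical. It assumes Python 3, where StringIO lives in the io module.

```python
import io
import re

# Hypothetical log format (an assumption): "<date> <LEVEL> <message>"
LOG_LINE = re.compile(r"(\d{4}-\d{2}-\d{2}) (\w+) (.*)")

# Made-up sample data, used only for this illustration.
SAMPLE = """\
2024-01-03 ERROR disk full
2024-01-01 INFO started
2024-01-02 ERROR timeout
not a log line
"""

def parse_lines(lines):
    """Generator: yield (date, level, message) for each line that matches."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            yield match.groups()

def errors_only(records):
    """Generator filter: pass through only the ERROR records."""
    return (rec for rec in records if rec[1] == "ERROR")

def report(source):
    # "for line in file" works on any file-like object, including StringIO.
    records = errors_only(parse_lines(source))
    # Tuples compare element by element, so this sorts by date first.
    ordered = sorted(records)
    # Build the result with one join instead of repeated concatenation.
    return "\n".join(f"{date}: {message}" for date, level, message in ordered)

print(report(io.StringIO(SAMPLE)))
```

Because parse_lines and errors_only are generators, lines are read and filtered one at a time. Only the sorting step holds the surviving records in memory.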
Please send feedback & ideas for further specific topics to the trainer, David Goodger (email, home page).