Your search query "linkto%3A%22TextProcessingInPython%22" didn't return any results. Please change some terms and refer to HelpOnSearching for more information.
(!) Consider performing a full-text search with your search terms.

Clear message

Text Processing in Python is a book by DavidMertz.

The free text is available at:

Buy the dead-trees version at:

A review by Danny Yee at:


I'm uploading some notes I kept from section 1.1, on his combinatorial HigherOrderFunctions, in the thought that someone might gain some use or insight out of it. -- LionKimbro

Chapter 1, Section 1



Question 3

I took a particular interest in question 3. Question 3 asks: "Why is ident() important?"

ident() is critical because it's a pass-through function.

You really have to look at what the whole "higher order function" (HOF) thing is all about, in order to really get it.

Outline of explanation:

HOF's Contribute Functionality

A lot of times, we look at these higher-order-functions, and we think to ourselves, "Why, that's easy. That's so basic. That's not really doing anything. It's just this little flim-flam wrapper."

So, let's look for a second at one of these flim-flam wrappers. Let's look at the function: "and_".

and_ = lambda f,g: lambda x, f=f, g=g: f(x) and g(x)

Look at it! Look at it! It's rediculous! There's hardly anything there. It's arguably more wrapper, than functionality.

So, we think of this as being "nothing."

But, look more carefully.

In Question 2, right before Question 3, David was starting to point this out to us- making us think about "shortcutting." Let's look at this a liitle more deeply, shall we?

if operator.truth(f(x)):
    if operator.truth(g(x)):
        return True
return False

What I've done here is expanded out the functionality of "and", in long form, to show a bit more of the structure.

Does it still appear flimsy? Well, let's talk about this. Let's look at the functionality, qualitatively.

Functionality and structure that we can find in here:

So, we see, that there's actually some "stuff" that's going on in here; There's some stuff happening in that little "and" expression.

And, in fact, for the HOF's in general, there's "stuff" going on. It is often hidden; especially within these more simple atomic functions.

How was it hidden?

It was hidden in the language. It was hidden in the operators. We just assume when we see and, to think that it will do these various things. But we rarely think of it as having 4 attributes of functionality, or 3 attributes, or whatever. We just think "oh, this is very simple and basic."

But, it is a mistake. And when we see these HOFs in, and think that because they're all one-liners, that there's "not much going on in there," we're making this mistake. We fail to see the structure, the operations being performed, and so on.

So we end up with this simplistic idea of what HOF's are all about. And then when we're asked: "Why is ident() so important," we're struck dumb, because we can't see it.

There is Space for Custom Functionality

HOFs accept or return functionality.

Let's just consider those HOFs that accept functionality, right now. (Regardless of whether or not they return functionality.) We're talking about the ident() function. Arguably, it could be returned from "foo(0)", but that's just a distraction at this point; We're here to understand the value of ident(). (You can come back and think about this later, after you've read the full article.)

So considering HOFs that accept functionality,... What's happening here?

We've got a HOF, a HOF that is itself a complex array of function calls and structure, and we're being given the opportunity to supply some function calls of our own, into this system.

The image to hold in your mind is of a gigantic clockwork. You've got a HOF. By now, you should have ejected the image in your mind of a simple flimsy thing. Instead, I want you to see the HOF as a mighty machine, a gigantic clockwork.

And now you have this opportunity, to plug in your own clockwork, into the system.

It's got some sockets, you see, where you can say, "Oh, I'm going to plug in my own custom machine here."

This is what the whole "Higher Order Function" thing is about, you see: It's about working with machinery itself, and not just data.

So, in a HOF, (one that accepts functionality, at least,) you have all these opportunities for inserting your own, custom, functionality.

Ignoring Custom Functionality

But sometimes, you don't WANT to put in custom functionality!

Sometimes, you want to just say, "I'll pass."

Why would you do that?

That's what our question is all about, isn't it?

Why would you do that?

Framed this way, the answer is clear:

You'd do that, because you care about the rest of the functionality.

Remember: The HOF is a gigantic clockwork. It's a "mighty machine."

You might not care about whatever little piece of machinery you can put in there; You might only want to make use of exactly whatever is in the gigantic clockwork.

So if you want what is in the gigantic clockwork, but you don't care about what's in your little contribution to the final machine, then you just pass "ident()".

Ident is None, NULL, 0, Pass-Through

ident() is "None." ident() is "NULL." ident() is 0 (zero).

ident(), in higher-order-function land, is "pass-through," or "pass."

"No thank you."

So the answer to the question "Why ident()?", ... "Because I wanted the functionality that was in the HOF, unaltered."

The reason we were confused, is because we thought that HOF's didn't "do anything" on their own. But they do, and it can be adventageous to us to take advantage of it.

Question 3 was nice, because it kindly pointed towards what this whole "higher order function" thing is all about.

Frankly, I don't know what the "hint" was about.

      Hint: Suppose you have a list of lines of text, where some
      of the lines may be empty strings.  What filter can you
      apply to find all the lines that start with a '#'?

Perhaps I just don't have a very good answer to question 3. Perhaps I've missed some essential point.

But, here was my answer to the hint:

   1 return filter(lambda x:x.startswith("#"), lines)

Frankly, I didn't see any insight there.

If anyone knows, please let me know. (Perhaps attach a comment here, or something.)

Remainder of the Answer

From here, I'll show a couple examples, of what we learned here. That is, I'll show places where you might want to use ident(), in order to take advantage of the unadulterated functionality embodied within the HOF.


The first example is validation.

We can perform validation on an argument, making sure that it is non-Null, before performing a test on the argument with another function, to see if it's true or not.

So, instead of writing:

if x is not None:
    if operator.truth(f(x)):  # f(x) will break, if x is None

...over and over, we can construct a general function through HOF magic,...

general = and_(ident, f)

...and then:

if general(x):

You've got two benefits here:


Just to see something a little different, we'll look at using ident() to generate a copy of input, in the output.

apply_each(funcs, args) is a function that applies the arguments to a series of functions, and gives you back the results in a list.

   1 apply_each = lambda fns, args=[]: map(apply, fns, [args]*len(fns))

Or, in the preferred Python list-comprehension form:

   1 apply_each = lambda fns, args=[]: [f(*args) for f in fns]

So, if you wanted to make a function that constructs: [f(x), x]

...for whatever, reason, you can:

   1 my_func = apply_each(f, ident)


So, just to remind you: Why is ident() useful?

It's useful because it makes it possible to take advantage of the raw functionality of combinatorial higher order functions.

Unable to edit the page? See the FrontPage for instructions.