Top-Down vs. Bottom-Up: The Battle to Understand Speech

Gael Hannan
March 20, 2018

by Kathi Mestayer


Top-down grabs wheel, runs into ditch

There are two different ways in which we make sense out of speech – “top-down” and “bottom-up.”

I first read about this in The Language Instinct by Steven Pinker, in which he describes how we take sound input and make sense of it as speech. Let’s start with the top-down system, since it’s the one we’re more aware of.

Top-down auditory processing is more, shall we say, thoughtful, using tools like context (dinner table, board meeting, classroom), expectations (past experience, person speaking), and nonverbal cues (facial expressions, body language).  It considers those factors, along with the speech sounds, and does its best to interpret what was said. 

The bottom-up system, on the other hand, makes a lightning-fast best guess based on the raw sound data. Period. No consideration of context or those other complicating, time-consuming factors. As a result, bottom-up attempts can be comically wrong, like the misheard lyrics of “The Battle Hymn of the Republic”: “he is trampling out the vintage where the great giraffes are stored.” A thoughtful, deliberative system like top-down would not report those lyrics, especially if you’ve heard that song a thousand times. But bottom-up’s job is to get you an interpretation really, really fast. No editor, no proofreader…a hip-shot.

But sometimes a fast reaction is what’s needed, because we’re busy using up top-down capacity on things like multitasking or making a tough decision. In those cases, our brains opt for the bottom-up mode and hope for the best.

So, who makes the decision about delegating tasks to the slow road or the express lane?  Our brains do, usually without consulting us, and that’s how we end up on the receiving end of ‘great giraffes’ or ‘national pelvic’ radio.


Top-down and bottom-up, toe-to-toe

Because it has a few more milliseconds to work with, it’s natural that top-down, the more deliberative process, is correct far more often.  But not always. 

Take the other night in a noisy restaurant, when my brain handed the task to top-down, assuming that it would be in a better position to tell us what was being said. I was sitting with two friends who were chatting away, when the waitress came up to me and asked, “Are you ready to order?”

“Yes,” I answered.  Then she turned and walked away.

Hmm, what just happened?  I sat there for a while, puzzled.

A few minutes later, the waitress came back to our table, and said, “Are you ready to order now?”

“Didn’t you ask us that the last time you were here?” 

“No, I asked if you needed a little more time.”

At that point, top-down started whirring away, figuring things out. Putting the pieces back together, I see that my top-down system took over as the head interpreter, elbowed bottom-up out of the way, and made the call based on what it expected the waitress to say as she approached the table – were we ready to order?  Nice try.

In so doing, top-down completely ignored what speech sounds were available in that noisy space (which, in fairness, were pretty garbled).  Bottom-up would have given me, “did you betty the border?”  And bottom-up plus top-down together would probably have gotten it right.  But top-down, in this case, was about as helpful as the great giraffes. Of course, bottom-up gets a kick out of this. He’s usually the one who gets things wrong. It’s not as amusing as hearing canned spinach instead of the king’s speech, but it’s a new kind of lapse. 

Good thing I’m not a control freak. Now, I have two different kinds of mistakes to look out for.  I’m just batting clean-up.  Who’s on first? 



Kathi Mestayer writes for Hearing Health Magazine and Be Hear Now, and serves on the Board of the Virginia Department for the Deaf and Hard-of-Hearing.  In this photo she is using her iPhone with a neckloop, audio jack, and t-coils, which connect her to FaceTime, VoiceOver, turn-by-turn navigation, stereo music and movies, and output from third-party apps, including games, audiobooks, and educational programs.


  1. I love church hymns such as “Lead on oh kinky turtle” and “Get your niece acquainted with the cold and rocky ground”.

  2. Hi Kathi, I love this blog – it describes the two ways we process sound very well. Couple that to a Dutch brain and you can understand that I have come up with some really weird interpretations in conversations where both languages are being used. 🙂

  3. Steven Pinker writes about the misheard-song phenomenon in “The Language Instinct,” which is utterly fascinating. There’s a term for it: “mondegreen.” An old song lyric went, “they took his body out and LAID HIM ON THE GREEN.” Heard: “Lady Mondegreen.”

  4. I have been working with auditory processing disorders for over 10 years and I find some of your comments perplexing. Auditory processing is totally a bottom up anatomical event. Speech is not bottom up. It is totally top down. Speech cannot occur until all the auditory information has been delivered to the speech cortex. The difference in timing that you talk about would have to occur because of changes in the speech cortex routing or executive function involvement. It cannot anatomically or physiologically be due to a bottom up process.

  5. I think it’s always risky to get “hardening of the categories” as we consider the labels we use to describe neuro-cognitive events. “Bottom-Up” and “Top-Down” are like post-it notes to help us grasp some of the basic streams that we hypothesize to occur in the brain. It is critical to not compartmentalize brain function based on our own “props,” however. These are not separate events, but are operating concurrently, just as afferent and efferent streams create a feedback loop. While we love to break things down into neat categories, this minimalism does not serve the concepts described very well. Neurological functions are like a “pop-up” book–they have to be unfolded and carefully investigated to appreciate –to the extent we can–the fascinating integrative complex which is created. We tend to dumb down what we don’t really understand.
