The Hidden Orchestra: How Ancient Systems Underlie Speech

Exploring the sensorimotor foundations of human speech and language.

Posted October 23, 2025 | Reviewed by Margaret Foley

This post is Part 2 of a series.

Although speech and language ability is relatively new in our evolutionary history—our ape cousins can’t speak—it may be built on a foundation that nature has used for millions of years: the basic architecture of sensorimotor control. This insight is reshaping our understanding of how language works in the brain and why certain types of brain damage produce specific patterns of speech difficulties.

Building on Ancient Foundations

Evolution rarely invents entirely new solutions. Instead, it tinkers with existing systems, repurposing and refining them for new functions. Before our ancestors developed language, they already possessed sophisticated systems for controlling movement—reaching for objects, grasping tools, navigating space. These systems shared a common architecture: a motor planning component in the frontal lobes, a sensory target component in the temporal and parietal lobes, and a translation system connecting the two.

The revolutionary idea is that speech production may have evolved by duplicating and adapting this basic sensorimotor template, stacking multiple layers on top of each other. At the lowest level, this architecture controls the fine details of articulation—the precise movements of your tongue, lips, and jaw. At higher levels, it manages phonological patterns (the sound structure of words) and, at the highest level, even grammatical structure. Each level maintains the same basic organization: motor plans, sensory targets, and a system for translating between them.

This doesn't mean that language is "just" motor control, any more than a bat's wing is "just" a modified forelimb. Both evolved from common ancestral structures but developed specialized features for entirely different purposes. Human speech systems have evolved abstract, uniquely linguistic capabilities while still maintaining the fundamental sensorimotor architecture.

Two Routes to the Same Destination

One of the most puzzling observations in the study of brain damage and language is that injury to different brain regions produces distinctly different speech and language problems at the same linguistic level, such as phonology or syntax. For example, damage to either frontal or temporal lobe areas can disrupt syntactic ability, but the resulting symptoms are very different. Frontal lobe damage causes agrammatism: halting, effortful speech lacking grammatical morphemes. Temporal lobe damage causes paragrammatism: fluent, syntactically rich speech that is full of grammatical errors, or so-called "confused sentence monsters." A similar split between frontal (nonfluent) and temporal (fluent) error patterns holds at the level of phonology.

The integrated model explains this pattern elegantly. Each level of linguistic planning involves two systems: one in the temporal lobe that codes sensory-like phonological or syntactic targets for speaking, and another in the frontal lobe that codes the motor-like plans, at the phonological or syntactic level, for hitting those temporal lobe targets. At each level, a translation system between the two lets them communicate and serves to detect and correct errors in planning. Damage to the frontal lobe systems disrupts the plans (sequences of morphemes or syllables), causing nonfluencies at that level. Damage to the temporal lobe systems disrupts either the targets (syntactic hierarchies or phonological sound patterns) or the translation systems, allowing the frontal planning system to run amok fluently.

How does your brain catch errors before you make them? The answer involves a sophisticated prediction system. When you prepare to speak a word, your brain simultaneously activates both the auditory-phonological target (what the word should sound like) and the motor-phonological plan (the sequences needed to produce it). The plan system then sends an inhibitory signal to the target system—essentially a prediction of what you're about to say.

If your plan is correct, this inhibitory signal cancels out the target, and speech proceeds smoothly. But if your plan is wrong—if you're about to say "cat" when you meant "cap"—the inhibitory signal goes to the wrong target representation. The correct target remains active and sends a correction signal to activate the right plan. This internal feedback loop allows you to catch and fix errors before they reach your lips.
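The cancellation logic described above can be illustrated with a toy program. This is a minimal sketch under loud assumptions, not the author's model: the two-word vocabulary, the all-or-nothing activations of 0 and 1, and the function names `residual_target` and `check_and_correct` are all hypothetical, and each target or plan is reduced to a single labeled unit.

```python
# Toy sketch of the plan-to-target inhibitory prediction loop.
# Hypothetical simplification: each word is one unit; activations are 0 or 1.

VOCABULARY = ["cat", "cap"]  # hypothetical two-word lexicon


def residual_target(intended: str, planned: str) -> dict[str, float]:
    """Return the target activation left over after the plan's prediction."""
    # The auditory-phonological target for the intended word is switched on.
    target = {w: (1.0 if w == intended else 0.0) for w in VOCABULARY}
    # The motor-phonological plan inhibits the target it predicts it will hit.
    inhibition = {w: (1.0 if w == planned else 0.0) for w in VOCABULARY}
    # Subtract the prediction; clip at zero so activation never goes negative.
    return {w: max(0.0, target[w] - inhibition[w]) for w in VOCABULARY}


def check_and_correct(intended: str, planned: str) -> str:
    """Return the plan that is finally sent on to articulation."""
    residual = residual_target(intended, planned)
    still_active = [w for w, a in residual.items() if a > 0.0]
    if not still_active:
        # The prediction cancelled the target: the plan was right, so proceed.
        return planned
    # An uncancelled target signals an error and activates the matching plan.
    return still_active[0]


if __name__ == "__main__":
    print(check_and_correct(intended="cap", planned="cap"))  # cap
    print(check_and_correct(intended="cap", planned="cat"))  # cap (corrected)
```

Clipping at zero keeps the wrongly inhibited "cat" target from going negative, so the only thing left active after a mismatch is the intended target, which is what drives the correction.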

This system explains a specific type of aphasia called conduction aphasia, typically caused by damage to the translation system between auditory and motor phonological systems. People with this condition speak fluently but make frequent sound-based errors. Crucially, they recognize their mistakes immediately when they hear themselves speak, because their target system remains intact—they just can't use it for error correction before speaking.
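The conduction aphasia pattern can be mimicked in a toy version of this loop by adding a switch for the translation system. This is a hypothetical sketch, not a clinical model: the `speak` function, the `translation_intact` flag, and the single-unit words are illustrative assumptions, and real damage is graded rather than all-or-nothing. With the switch off, an erroneous plan is spoken fluently, but the intact target still registers the mismatch once the speaker hears the output.

```python
# Toy sketch: internal (pre-speech) versus external (post-speech) error checking.
# Hypothetical simplification: the translation system is either on or off.

from typing import NamedTuple


class Utterance(NamedTuple):
    produced: str        # what actually gets said
    noticed_error: bool  # whether the speaker hears a mismatch afterward


def speak(intended: str, planned: str, translation_intact: bool = True) -> Utterance:
    if translation_intact and planned != intended:
        # Internal loop (stand-in for target cancellation): the still-active
        # intended target reaches the plan system and swaps in the right plan.
        planned = intended
    produced = planned
    # External loop: the intact auditory target is compared with what is heard.
    noticed_error = produced != intended
    return Utterance(produced, noticed_error)


if __name__ == "__main__":
    print(speak("cap", "cat"))                            # corrected before speaking
    print(speak("cap", "cat", translation_intact=False))  # error spoken, then noticed
```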

Theoretical elegance is one thing, but does this integrated model actually explain the data better than traditional approaches? To find out, my colleague, Grant Walker, and I built a computational model that simulates how people name objects and compared two architectures: a standard psycholinguistic model with semantic, word, and phonological levels, versus an integrated sensorimotor-like model that splits the phonological level into separate plan and target components.

Both models were tested on their ability to reproduce the specific patterns of naming errors made by people with aphasia. The integrated model outperformed the traditional architecture, particularly for people with conduction aphasia—exactly the population whose deficits most directly implicate the connection between auditory and motor phonological systems.

Clinical Implications

Understanding speech production as a hierarchical sensorimotor system has practical implications for treating communication disorders. Different types of brain damage affect different components of this architecture, producing characteristic patterns of impairment. Frontal lobe damage disrupts motor planning, causing effortful, nonfluent speech. Temporal lobe damage impairs auditory targets and error monitoring, causing fluent but error-prone speech. Damage to the connections between these systems, as in conduction aphasia, specifically disrupts the ability to translate between auditory targets and motor plans.

This framework suggests that rehabilitation approaches might be tailored to strengthen specific components of the system or to help patients develop compensatory strategies using intact pathways. For example, someone with damage to the auditory-motor translation system might benefit from exercises that strengthen direct lexical-to-motor connections, bypassing the damaged pathway.
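The idea of bypassing a damaged pathway can be sketched as a route search over a small graph of processing layers. This is an illustrative assumption, not the connectivity or weights of the published model: the layer names are borrowed from the SLAM title in the references (semantic, lexical, auditory, motor), and the four links are a hypothetical reading of the text, in which a word activates both its auditory target and its motor plan and an auditory-motor translation system joins the two. Cutting the auditory-motor link leaves only the direct lexical-to-motor route.

```python
# Toy sketch: routes from the word (lexical) layer to motor plans, before and
# after a lesion to the auditory-motor translation link.

from collections.abc import Iterator
from typing import Optional

Graph = dict[str, set[str]]

# Hypothetical layer names and links, loosely following the text above.
EDGES = [
    ("semantic", "lexical"),
    ("lexical", "auditory"),  # word activates its auditory target
    ("lexical", "motor"),     # word activates its motor plan directly
    ("auditory", "motor"),    # auditory-motor translation system
]


def build_graph(edges: list[tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from a list of links."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def lesion(edges: list[tuple[str, str]], cut: tuple[str, str]) -> list[tuple[str, str]]:
    """Return the links that survive after removing one connection."""
    return [e for e in edges if set(e) != set(cut)]


def simple_paths(graph: Graph, start: str, goal: str,
                 path: Optional[list[str]] = None) -> Iterator[list[str]]:
    """Yield every route from start to goal that never revisits a layer."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for nxt in sorted(graph.get(start, set())):
        if nxt not in path:
            yield from simple_paths(graph, nxt, goal, path)


if __name__ == "__main__":
    intact = build_graph(EDGES)
    damaged = build_graph(lesion(EDGES, ("auditory", "motor")))
    print(list(simple_paths(intact, "lexical", "motor")))
    # [['lexical', 'auditory', 'motor'], ['lexical', 'motor']]
    print(list(simple_paths(damaged, "lexical", "motor")))
    # [['lexical', 'motor']]
```

In this toy graph, a therapy that "strengthens direct lexical-to-motor connections" would correspond to relying on the one route that survives the lesion.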

The Evolution of Understanding

The history of neuroscience is filled with debates between competing theories, each capturing part of the truth. The study of speech production has been no exception, with psycholinguistic and motor control approaches often seeming incompatible. The emerging integrated view doesn't declare a winner in this debate. Instead, it reveals how both perspectives describe different aspects of the same underlying system—a system built on ancient evolutionary foundations but elaborated to support humanity's most distinctive capability: language.

Excerpted and adapted from Wired for Words: The Neural Architecture of Language by Gregory Hickok, published by MIT Press.

Hickok, G. (2012). Computational neuroanatomy of speech production. Nature Reviews Neuroscience, 13(2), 135-145. doi:10.1038/nrn3158

Hickok, G. (2014). Toward an integrated psycholinguistic, neurolinguistic, sensorimotor framework for speech production. Lang Cogn Process, 29(1), 52-59. doi:10.1080/01690965.2013.852907

Hickok, G., Houde, J., & Rong, F. (2011). Sensorimotor integration in speech processing: Computational basis and neural organization. Neuron, 69(3), 407-422. doi:10.1016/j.neuron.2011.01.019

Walker, G. M., & Hickok, G. (2016). Bridging computational approaches to speech production: The semantic-lexical-auditory-motor model (SLAM). Psychon Bull Rev, 23(2), 339-352. doi:10.3758/s13423-015-0903-7

Greg Hickok, Ph.D., is Distinguished Professor at the University of California, Irvine, and the author of the forthcoming book Wired for Words.

This article is part of the Bringwise Psychology Journal — daily insights on human behavior, mental health, and personal growth.