Can the Way You Talk Reveal How Your Brain Is Aging?
Apr 15, 2026
I spend a lot of time thinking about the brain, specifically about why some people's brains age faster than others, and whether we can tell from the outside before anything goes wrong.
The standard way to measure brain aging requires an MRI scanner, a radiologist, and usually a good reason to be there in the first place. What if we could skip all of that? What if something as simple as how you talk (how many content words you use per minute, how diverse your vocabulary is, how many verbs you pack into a sentence) could tell us something real about the biological age of your brain?
That is the question our lab (the Aphasia Lab) has been working on. And what I found after running the analyses surprised me.
A bit of background
We worked with 300+ neurologically healthy adults. For each person, we had two things: a structural MRI scan, and a battery of behavioral measurements (spontaneous speech tasks, motor tests, blood work, hearing, balance, questionnaires about sleep and mood and daily life).
From the MRI, we computed something called the Brain Age Gap (BAG). This is the difference between how old your brain looks, estimated by a machine learning model trained on brain volumes, and how old you actually are. A positive BAG means your brain appears older than your chronological age. A negative BAG means it looks younger. We did this not just for the whole brain, but for 12 specific regions, including regions involved in language processing.
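The BAG idea can be sketched in a few lines. This is a toy illustration with synthetic data and a simple ridge regression standing in for the brain-age model; the variable names, model choice, and numbers are mine, not the study's pipeline. The one essential detail it does show is using out-of-sample predictions, so the model cannot trivially memorize each person's age.

```python
# Minimal sketch of the Brain Age Gap (BAG): predict age from regional brain
# volumes, then subtract chronological age. Synthetic data, illustrative only.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
n_subjects, n_regions = 300, 12

age = rng.uniform(20, 80, n_subjects)                  # chronological age
# Toy regional volumes that shrink with age, plus individual variation
volumes = 100 - 0.3 * age[:, None] + rng.normal(0, 3, (n_subjects, n_regions))

# Out-of-sample age predictions (5-fold CV) to avoid leakage
predicted_age = cross_val_predict(Ridge(alpha=1.0), volumes, age, cv=5)

# BAG: positive -> the brain looks older than its chronological age
bag = predicted_age - age
```

In the study this was done per region (12 regional models rather than one), but the gap itself is computed the same way.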
Then we asked: after removing the obvious part, the fact that older people tend to have older-looking brains, what behavioral measures explain who ends up with an accelerated or decelerated aging trajectory?
What the data said
The answer, more than anything else, was speech.
How many content-rich words a person produces per minute. How lexically diverse their spontaneous speech is. The proportion of verbs versus nouns in their connected speech. These features, extracted from a short naturalistic speech task where people were simply asked to describe a picture or tell a story, predicted to a statistically meaningful degree how biologically old a person's language network appeared, independent of their chronological age.
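To make the three features concrete, here is a toy computation on a short hand-tagged transcript. Real pipelines use an automatic POS tagger and timed audio; the tags, the tag set, and the duration below are made up for the example.

```python
# Toy versions of the three speech features: content-word rate, lexical
# diversity (type-token ratio), and verb/noun ratio. Hand-tagged example only.
tagged = [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"), ("quickly", "ADV"),
          ("and", "CONJ"), ("chases", "VERB"), ("a", "DET"), ("ball", "NOUN")]
duration_min = 0.05                    # hypothetical: 3 seconds of speech

content_tags = {"NOUN", "VERB", "ADJ", "ADV"}
content = [w for w, t in tagged if t in content_tags]

content_rate = len(content) / duration_min             # content words / minute
ttr = len({w for w, _ in tagged}) / len(tagged)        # type-token ratio
verbs = sum(t == "VERB" for _, t in tagged)
nouns = sum(t == "NOUN" for _, t in tagged)
verb_noun_ratio = verbs / nouns
```

On a real 10-minute sample these numbers become stable enough to compare across people, which is what makes such a short task usable at all.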
The regions showing the strongest effects were the domain-general network (a set of frontal and parietal areas engaged by cognitively demanding language tasks) and the right temporal cortex. In both, speech features explained around 15-16% of the variance in age-independent brain aging. The language-specific network, the classical perisylvian regions you might associate with Broca's and Wernicke's areas, was also significant.
People who produced more information-dense, verb-rich, lexically diverse speech tended to have younger-appearing language networks. People whose speech was more noun-heavy and less varied tended to show the opposite pattern.
Why this is not as straightforward as it sounds
I want to be honest about what these numbers mean and what they don't.
The explained variance (15%) sounds small. And in absolute terms, it is. But there are good reasons for this. We removed chronological age from the brain aging measure before running any analysis, which means we were modeling only the residual biological variation, the part of brain aging that age itself cannot explain. That is the hardest part to predict, and predicting 15% of it from a brief speech sample is not nothing.
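The residualization step is worth seeing explicitly. Below is a sketch with synthetic data: regress BAG on chronological age, keep only the residual, then relate that residual to a single behavioral predictor. The real analysis used stepwise regression over many features; the effect sizes here are invented to be roughly in the reported range.

```python
# Sketch of residualizing BAG against age, then asking how much of the
# leftover (age-independent) variation one speech feature explains.
import numpy as np

rng = np.random.default_rng(1)
n = 300
age = rng.uniform(20, 80, n)
speech = rng.normal(0, 1, n)                     # toy speech feature (z-scored)
bag = 0.05 * age - 0.4 * speech + rng.normal(0, 1, n)   # toy BAG

# Remove the linear effect of age from BAG (ordinary least squares)
slope, intercept = np.polyfit(age, bag, 1)
bag_resid = bag - (slope * age + intercept)

# Variance in the residual explained by the speech feature alone
r = np.corrcoef(speech, bag_resid)[0, 1]
r_squared = r ** 2
```

Because the age component is removed first, `r_squared` measures signal in exactly the part of brain aging that age itself cannot predict, which is why even modest values are informative.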
We also ran this in a healthy sample with no neurological disease. Healthy adults are, by design, compressed in their brain-behavior variability. In a clinical sample (people with aphasia, or early cognitive decline, or post-stroke recovery) the same speech features would likely show much larger effects. What we are establishing here is the baseline: what the relationship between speech and brain aging looks like before anything goes wrong.
The honest part: the autoencoder did not work
Science rarely goes in a straight line, and ours didn't either.
Before settling on the stepwise regression approach, we built a supervised autoencoder, a deep learning model designed to compress all 119 behavioral variables into just two latent factors, then use those factors to predict brain aging across all 12 regions simultaneously. The idea was elegant: maybe there are two fundamental axes of behavioral aging that explain most of what is happening in the brain.
It did not find them. The model's predictions were not better than chance, and the permutation test confirmed the null result. This is a finding too, even if it is not the one we hoped for. It tells us that whatever signal exists in behavioral measures for predicting age-independent brain aging is not large enough for a compressed deep learning representation to recover at this sample size. Stepwise regression, which makes fewer assumptions, did find real but modest associations. The signal is subtle.
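A permutation test of this kind can be sketched simply: score the model on the real target, then repeatedly shuffle the target and re-score, and ask how often the shuffled runs do at least as well. The toy below uses a plain linear fit on pure noise as a stand-in for the autoencoder; the data and scoring function are illustrative, not the study's.

```python
# Minimal permutation test: compare a model's score against the distribution
# of scores obtained after shuffling the target. Synthetic noise data.
import numpy as np

rng = np.random.default_rng(2)
n = 300
X = rng.normal(0, 1, (n, 5))
y = rng.normal(0, 1, n)            # pure noise: no real signal to find

def score(X, y):
    """In-sample R^2 of an ordinary least-squares fit (illustration only)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

observed = score(X, y)
null_scores = [score(X, rng.permutation(y)) for _ in range(500)]

# Fraction of permuted scores at least as good as the observed one
p_value = (1 + sum(s >= observed for s in null_scores)) / (1 + len(null_scores))
```

When the target carries no recoverable signal, the observed score sits comfortably inside the null distribution, which is exactly what our permutation test showed for the autoencoder.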
What I think this means
We spend enormous resources on neuroimaging to understand brain health. MRI is powerful, but it is expensive, inaccessible, and produces a single snapshot in time. The possibility that a 10-minute spontaneous speech sample carries information about biological aging in the language network (not in the same ballpark as MRI, but meaningfully correlated) opens a different kind of question.
Can speech replace MRI? It cannot, not yet, possibly not ever for clinical diagnosis. But: can speech serve as a low-cost, repeatable behavioral signal that tracks how a person's language network is aging over time? Can changes in how you talk, becoming less informationally dense, relying more on nouns and less on verbs, serve as an early flag, years before imaging would show anything clinical?
That is the question I want to answer next. And it starts with data like these.
This research was conducted as part of the ABC BHI project in the Aphasia Lab. Code and analysis scripts are available on my GitHub.