Unfolding shared semantic representations in vision and language

Simkova, Katerina Marie ORCID: 0000-0002-2702-6396 (2025). Unfolding shared semantic representations in vision and language. University of Birmingham. Ph.D.

Simkova2025PhD.pdf
Text - Accepted Version
Restricted to Repository staff only until 31 July 2026.
Available under License All rights reserved.

Abstract

When we interact with language, the brain effortlessly draws vivid mental images that resemble representations of directly perceived visual scenes. Similarly, research shows that the visual system encodes representations of visual scenes in a manner akin to how embeddings from large language models encode the meaning of a sentence. This implies that vision and language draw on a shared representational system for semantic encoding, but the precise nature of this visuo-linguistic overlap in the brain, as well as its implications for behaviour, remains unknown. The present thesis leveraged similarity judgements of natural scene images and their corresponding sentence captions to explore how vision and language converge in their ability to recognise semantic regularities of the visual world. We found that visual and linguistic similarity judgements not only converge behaviourally but also both predict a remarkably similar network of visually evoked patterns across mid- and high-level visual regions. Furthermore, we established the behavioural relevance of this convergence by showing that perceived dissimilarities in both vision and language effectively predict response times to a visual target primed by a sentence. Having observed that linguistic similarity judgements exhibit higher interindividual variability, we used the participants’ own perceived dissimilarities to predict their priming effects. This revealed that idiosyncratic linguistic dissimilarity is the strongest predictor of participants’ response times. The findings of the present thesis demonstrate that vision and language project their representations onto a shared similarity space, most likely embedded in the representational structure of the visual brain.
Along with the observation that language plays a distinct role in eliciting individually relevant representations, these findings may profoundly impact our understanding of the visual system by placing greater emphasis on semantic idiosyncrasies.

Type of Work:

Thesis (Doctorates > Ph.D.)

Award Type:

Doctorates > Ph.D.

Supervisor(s):