Simkova, Katerina Marie
ORCID: 0000-0002-2702-6396
(2025).
Unfolding shared semantic representations in vision and language.
University of Birmingham.
Ph.D.
Simkova2025PhD.pdf
Text (Accepted Version, 9MB). Restricted to Repository staff only until 31 July 2026. Available under License: All rights reserved.
Abstract
When we interact with language, the brain effortlessly generates vivid mental images that resemble representations of directly perceived visual scenes. Similarly, research shows that the visual system encodes visual scenes in a manner akin to how embeddings from large language models encode the meaning of a sentence. This implies that vision and language draw on a shared representational system for semantic encoding, but the precise nature of this visuo-linguistic overlap in the brain, as well as its implications for behaviour, remains unknown. The present thesis used similarity judgements of natural scene images and their corresponding sentence captions to explore how vision and language converge in their ability to recognise semantic regularities of the visual world. We found that visual and linguistic similarity judgements not only converge behaviourally but also both predict a remarkably similar network of visually evoked patterns across mid- and high-level visual regions. Furthermore, we established the behavioural relevance of these judgements by showing that perceived dissimilarities in both vision and language predict response times to a visual target primed by a sentence. Having observed that linguistic similarity judgements exhibit higher interindividual variability, we used each participant's own perceived dissimilarities to predict their priming effect. This revealed that idiosyncratic linguistic dissimilarity is the strongest predictor of participants' response times. The findings of the present thesis demonstrate that vision and language project their representations onto a shared similarity space, most likely embedded in the representational structure of the visual brain. Along with the observation that language plays a distinct role in eliciting individually relevant representations, these findings may profoundly impact our understanding of the visual system by placing greater emphasis on semantic idiosyncrasies.
| Type of Work: | Thesis (Doctorates > Ph.D.) |
|---|---|
| Award Type: | Doctorates > Ph.D. |
| Supervisor(s): | |
| Licence: | All rights reserved |
| College/Faculty: | Colleges > College of Life & Environmental Sciences |
| School or Department: | School of Psychology |
| Funders: | European Research Council |
| Subjects: | B Philosophy. Psychology. Religion > BF Psychology; Q Science > Q Science (General) |
| URI: | http://etheses.bham.ac.uk/id/eprint/15720 |
Available Versions of this Item
- Unfolding shared semantic representations in vision and language. (deposited 11 Mar 2026 15:21)