Five years on from the first Covid-19 lockdown in the UK, we asked Justyna Robinson (“Mass Observing Covid-19” project academic advisor) how researchers can harness the knowledge held in the Covid-19 collection. The collection includes texts discussing life during the Covid-19 pandemic as well as rich metadata, which consists of biographic information about the Writers-Observers. This data can be explored in various ways. In this blog post we outline one possible approach to reading the data, namely distant reading.
Who are the Observers of pandemic life?
Let’s focus on the day diaries collected on the 12th May 2020. The biographic information about the Diarists is presented in Figure 1. Women are the main contributors to that collection. This is not an unusual finding, as a similar gender distribution is seen across the Mass Observation Archive (MOA). When it comes to decade of birth, male Diarists tend to be younger than their female counterparts. The wider MOA also indicates a bias towards South-East-based Observers and those from higher socioeconomic backgrounds. Thus, conclusions drawn from trends observed in the content of the diaries need to take into account the universe of speakers who are represented in the collection.

Figure 1: The May 2020 diarists by gender and decade of birth
What are the key terms of the Covid-19 collection?
The arrival of the pandemic brought a dramatic increase in the usage of many words and phrases, such as lockdown, home schooling, and social distancing. Many of those terms were picked up by the media and discussed as terms encapsulating pandemic reality, such as covidiot (see here). But what characterised pandemic life for everyday people? The Covid-19 collection can reveal the world view of Observers through the key terms they used. In order to identify the key terms of the 12th May 2020 diaries, we compare the language of those diaries against the language of the 12th May diaries produced in 2010-2019.[1] The results of this exercise are presented in Table 1, where individual keywords and key multi-word terms are ranked in separate columns.
Table 1: The key words and multi-word terms from the May 2020 diaries
| Rank | Keywords | Key multi-word terms |
| --- | --- | --- |
| 1 | Lockdown | Social distancing |
| 2 | Zoom | Face mask |
| 3 | Virus | Zoom meeting |
| 4 | Pandemic | Video call |
| 5 | Covid-19 | Boris Johnson |
| 6 | Coronavirus | Daily exercise |
| 7 | Covid | Key worker |
| 8 | Mask | Daily walk |
| 9 | Distancing | School work |
| 10 | Furlough | VE day |
| 11 | Boris | Corona virus |
| 12 | Restriction | Social distance |
| 13 | Metre | Current situation |
| 14 | Johnson | Covid-19 pandemic |
| 15 | Isolate | Furlough scheme |
| 16 | Isolation | Stay alert |
| 17 | Shield | Strange time |
| 18 | Corona | Zoom call |
| 19 | Covid19 | Coronavirus pandemic |
| 20 | Quarantine | Normal people |
The words and phrases in Table 1 speak to the ‘aboutness’ of the May 2020 diaries. Terms such as mask, furlough, isolate, and social distancing reflect salient themes of pandemic life and serve to characterise that period. Additionally, the Diarists provide their unique perspective on how they experienced the pandemic. The newness of the situation is evident in the naming of SARS-CoV-2, as the Diarists used a range of terms for the virus, such as Covid, Covid-19, Corona, and Coronavirus (pandemic). This linguistic behaviour typically signals a new phenomenon for which language users are still negotiating a name.
The key terms also show that traditionally mundane activities such as daily exercise, daily walk, and school work became highlights of the day, as did calls, whether video calls or zoom calls. A sense of nowness and of an uncertain future emerges through terms such as stay alert and current situation. Making sense of the pandemic reality proves difficult, as Observers try to separate experiences that seem normal to them from those that do not; see the terms normal people vs strange time.
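For readers who would like to try a comparison of this kind on their own texts, the following is a minimal sketch of a keyword analysis. It assumes log-likelihood as the keyness statistic, which is a common choice in corpus linguistics but is not necessarily the measure behind Table 1 (see footnote [1] for the processes actually used). The function names, the simple tokeniser, and the `top_n` parameter are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (keeps forms like covid-19)."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score (an assumed statistic, not the post's own).

    a, b: frequency of the word in the target and the reference corpus.
    c, d: total number of tokens in the target and the reference corpus.
    """
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_target)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


def keywords(target_texts, reference_texts, top_n=20):
    """Rank words that are relatively more frequent in the target texts."""
    target = Counter(t for text in target_texts for t in tokenize(text))
    reference = Counter(t for text in reference_texts for t in tokenize(text))
    c, d = sum(target.values()), sum(reference.values())
    if c == 0 or d == 0:
        raise ValueError("both corpora must contain at least one token")
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words over-represented in the target corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]
```

Calling `keywords(diaries_2020, diaries_2010_2019)` on two lists of diary texts (both names hypothetical) would return word and score pairs, highest score first. Multi-word terms would need an additional step, such as counting adjacent word pairs before scoring.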
What do these key terms mean for Covid-19 Observers?
Further insight into the lockdown diaries can be gleaned from exploring the meaning of these terms through their collocates, i.e. words that they typically co-occur with. For example, the collocates of zoom presented in Figure 2 tell us how and in which contexts people used zoom during this period.

Figure 2: The terms that modify zoom in the May 2020 diaries
Diarists used zoom in the context of work, e.g. meeting and conference, as well as education, e.g. tutorial, lesson, and class. The use of zoom for recreational and social purposes is also evident through the collocates rehearsal, pilate(s), quiz, get-together, and party.
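The idea of a collocate can be illustrated with a short sketch that counts the words appearing within a few tokens of a chosen term. This is a simplified, window-based stand-in: Figure 2 shows terms that modify zoom, which implies a grammatical analysis that this sketch does not attempt. The function names and the `window` and `min_count` parameters are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into word tokens (keeps forms like get-together)."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def collocates(texts, node, window=2, min_count=1):
    """Count words occurring within `window` tokens either side of `node`."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token != node:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    # Most frequent co-occurring words first, filtered by a minimum count.
    return [(word, n) for word, n in counts.most_common() if n >= min_count]
```

Raw co-occurrence counts favour very frequent words such as the or a, so in practice an association measure or a stop-word filter is usually applied on top of counts like these.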
Another question one can ask about the Covid-19 data is how the collective meaning of these diaries differs from the collective meaning of the diaries from the previous decade. In order to answer this question we move from the analysis of individual words to concepts, that is, clusters of words that share a core meaning. At Concept Analytics Lab, based at the University of Sussex, we developed software that identifies the key themes of a text. The software extracts and visualises the conceptual profile of a text against a reference text. Figure 3 illustrates the conceptual profile of the responses to Mass Observation’s 12th May diaries from 2020 in comparison with the diaries from 2010-2019. The size of each node represents the raw frequency of a concept, while its colour represents how distinctive each concept is of the data set relative to the past diaries, with darker colours representing greater distinctiveness.

Figure 3: The conceptual profile for the May 2020 diaries
The conceptual profile serves as a compass for exploring the data, pointing the researcher in the direction of conceptual distinctiveness at the cost of the readability of less distinctive concepts. Figure 3 shows that, relative to the previous ten years, Diarists in 2020 were less likely to discuss physical entities and more likely to discuss abstractions. Particularly salient abstractions include government, lockdown, rule, exercise, walk, health, and delivery. There are specific examples of physical entities that buck this trend. For example, garden, home, call, and virus are physical entities that are distinctive of the 2020 diaries. These findings are testimony to the pandemic reality as experienced by the Diarists. The distinctive themes generated by the software allow researchers to navigate their analysis in a time-efficient and empirical way.
If you want to apply these solutions to your data, contact Justyna Robinson and/or consider joining an NCRM training workshop on
“Meaning extraction from large text data: Thematic analysis via corpus linguistics”
Date: 9 June 2025 via zoom
Registration: https://www.ncrm.ac.uk/training/show.php?article=14155
Biography
Dr Justyna Robinson is a Director of Concept Analytics Lab at the University of Sussex. She researches meaning in language and is interested in methods of analysing meaning empirically. Her publications focus on ways of researching meaning from historical perspectives (2012), from cognitive angles (2014), using socio-demographic information and other text metadata (2012, 2022), and using corpus and statistical methods (2014, 2022). She researches meaning represented by words (2010) and by concepts and themes (2017, 2023). With the research team at Concept Analytics Lab, she has delivered a range of projects investigating current meanings of loneliness, aging, UK trade deals post Brexit, political manifestos, recycling practices, and post-covid behaviour changes. Contact: justyna.robinson@sussex.ac.uk
[1] The statistical processes for determining the key terms are discussed in the blog post here.