Researching MO: Mass Observation Covid-19 collection: What does it mean for researchers?

As we are now five years on from the first Covid-19 lockdown in the UK, we asked Justyna Robinson (“Mass Observing Covid-19” project academic advisor) how researchers can harness the knowledge contained in the Covid-19 collection. The collection includes texts discussing life during the Covid-19 pandemic as well as rich metadata, which consists of biographical information about the Writers-Observers. This data can be explored in various ways. In this blog post we outline one possible approach to reading the data, namely distant reading.

Who are the Observers of pandemic life?

Let’s focus on the day diaries collected on 12th May 2020. The biographical information about the Diarists is presented in Figure 1. Women are the main contributors to that collection. This is not an unusual finding, as a similar gender distribution is seen across the Mass Observation Archive (MOA). When it comes to decade of birth, male Diarists tend to be younger than their female counterparts. The wider MOA also indicates a bias towards South-East-based Observers and those from higher socioeconomic backgrounds. Thus, conclusions drawn from trends observed in the content of the diaries need to consider the universe of speakers who are represented in the collection.

Figure 1: The May 2020 diarists by gender and decade of birth

What are the key terms of the Covid-19 collection?

The arrival of the pandemic brought a dramatic increase in the usage of many words and phrases, such as lockdown, home schooling, and social distancing. Many of those terms, such as covidiot (see here), were picked up by the media and discussed as encapsulating pandemic reality. But what characterised pandemic life for everyday people? The Covid-19 collection can reveal the world view of Observers through the key terms they used. To identify the key terms of 12th May 2020, we compare the language of those diaries against the language of the equivalent diaries produced in 2010-2019.[1] The results of this exercise are presented in Table 1, where individual keywords and key multi-word terms are ranked in separate columns.

Table 1: The key words and multi-word terms from the May 2020 diaries

Rank  Keywords     Key multi-word terms
1     Lockdown     Social distancing
2     Zoom         Face mask
3     Virus        Zoom meeting
4     Pandemic     Video call
5     Covid-19     Boris Johnson
6     Coronavirus  Daily exercise
7     Covid        Key worker
8     Mask         Daily walk
9     Distancing   School work
10    Furlough     VE day
11    Boris        Corona virus
12    Restriction  Social distance
13    Metre        Current situation
14    Johnson      Covid-19 pandemic
15    Isolate      Furlough scheme
16    Isolation    Stay alert
17    Shield       Strange time
18    Corona       Zoom call
19    Covid19      Coronavirus pandemic
20    Quarantine   Normal people
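The comparison behind a table like Table 1 can be sketched in a few lines of Python. The statistic actually used for the diaries is described in the post referenced in footnote [1]; the sketch below assumes log-likelihood, a common keyness measure in corpus linguistics, purely for illustration. The function names and the two toy texts are hypothetical and are not taken from the collection.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(".,;:!?()\"'") for w in text.lower().split())
    return [w for w in words if w]


def log_likelihood(a, b, n_target, n_reference):
    """Log-likelihood score for a word seen `a` times in a target corpus of
    `n_target` tokens and `b` times in a reference corpus of `n_reference`."""
    total = n_target + n_reference
    expected_target = n_target * (a + b) / total
    expected_reference = n_reference * (a + b) / total
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_target)
    if b > 0:
        score += b * math.log(b / expected_reference)
    return 2 * score


def keywords(target_text, reference_text, top_n=5):
    """Rank words that are relatively more frequent in the target text."""
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    n_target = sum(target.values())
    n_reference = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words over-represented in the target (positive keywords).
        if a / n_target > b / n_reference:
            scored.append((word, log_likelihood(a, b, n_target, n_reference)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Toy stand-ins for the 2020 diaries and the 2010-2019 reference diaries.
diaries_2020 = "lockdown again today. zoom call with family, then a walk. lockdown feels strange"
diaries_past = "went to work today. walk with family, then dinner. work was busy"

top_keywords = keywords(diaries_2020, diaries_past, top_n=3)
for word, score in top_keywords:
    print(word, round(score, 2))
```

A real analysis would use a proper tokeniser, lemmatise the text (Table 1 lists lemma-like forms such as isolate and restriction), and handle multi-word terms separately.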

The words and phrases in Table 1 speak to the ‘aboutness’ of the May 2020 diaries. The results reflect salient themes of pandemic life, such as mask, furlough, isolate, and social distancing, and serve to characterise that period. Additionally, the Diarists provide their unique perspective on how they experienced the pandemic. The newness of the situation is evident in the naming of SARS‑CoV‑2: the Diarists used a range of terms for the virus, such as Covid, Covid-19, Corona, and Coronavirus (pandemic). This linguistic behaviour typically signals a new phenomenon for which language users are still negotiating a name.

The key terms also show that traditionally mundane activities such as daily exercise, daily walk, and school work became a highlight of the day, as did calls, whether video or Zoom calls. There is a sense of nowness and of an uncertain future emerging through terms such as stay alert and current situation. Making sense of the pandemic reality proves difficult as Observers try to distinguish experiences that seem normal to them from those that do not; see the terms normal people vs strange time.

What do these key terms mean for Covid-19 Observers?

Further insight into the lockdown diaries can be gleaned from exploring the meaning of these terms through their collocates, i.e. words that they typically co-occur with. For example, the collocates of zoom presented in Figure 2 tell us how and in which contexts people used zoom during this period.

Figure 2: The terms that modify zoom in the May 2020 diaries

Diarists used zoom in the context of work, e.g. meeting and conference, as well as education, e.g. tutorial, lesson, and class. The use of zoom for recreational and social purposes is also evident through the collocates rehearsal, pilate(s), quiz, get-together, and party.
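As a rough illustration of how collocates can be found, the sketch below counts the words that appear within a fixed window around a node word. It is a minimal example under stated assumptions, not the method used to produce Figure 2: the function name, the window size, and the sample sentence are all hypothetical.

```python
from collections import Counter


def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens either side of `node`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts


# Invented sample text, already lowercased and split into tokens.
sample = "we had a zoom meeting then a zoom quiz and later a zoom meeting with friends".split()

zoom_collocates = collocates(sample, "zoom", window=1)
print(zoom_collocates.most_common(3))
```

Raw counts favour very frequent words such as "a", so in practice one would usually filter out function words or rank the candidates with an association measure rather than by frequency alone.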

Another question one can ask about the Covid-19 data is how the collective meaning of these diaries differs from the collective meaning of the diaries from the previous decade. To answer this question we move from the analysis of individual words to concepts, that is, clusters of words that share a core meaning. At Concept Analytics Lab, based at the University of Sussex, we developed software that identifies the key themes of a text. The software extracts and visualises the conceptual profile of a text against a reference text. Figure 3 illustrates the conceptual profile of the responses to the Mass Observation’s 12th May diaries from 2020 in comparison with the diaries from 2010-2019. The size of each node represents the raw frequency of a concept, while its colour represents how distinctive of the data set each concept is, relative to the past diaries, with darker colours representing greater distinctiveness.

Figure 3: The conceptual profile for the May 2020 diaries

The conceptual profile serves as a compass for exploring the data, pointing the researcher in the direction of conceptual distinctiveness at the cost of the readability of less distinctive concepts. Figure 3 shows that, relative to the previous ten years, Diarists in 2020 were less likely to discuss physical entities and more likely to discuss abstractions. Particularly salient abstractions include government, lockdown, rule, exercise, walk, health, and delivery. There are specific examples of physical entities that buck this trend. For example, garden, home, call, and virus are physical entities that are distinctive of the 2020 diaries. These findings are testimony to the pandemic reality as experienced by the Diarists. The distinctive themes generated by the software allow researchers to navigate their analysis in a time-efficient and empirical way.
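The two quantities encoded in a conceptual profile, raw frequency (node size) and distinctiveness relative to a reference text (node colour), can be approximated with a short sketch. This is not the Concept Analytics Lab software. The word-to-concept lexicon, the function name, the add-one smoothing, and the toy token lists are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical lexicon mapping words to the concept they express.
CONCEPTS = {
    "lockdown": "restriction", "rule": "restriction", "quarantine": "restriction",
    "walk": "exercise", "run": "exercise",
    "office": "workplace", "commute": "workplace",
}


def concept_profile(target_tokens, reference_tokens, lexicon):
    """Return raw frequency and a distinctiveness ratio for each concept."""
    target = Counter(lexicon[t] for t in target_tokens if t in lexicon)
    reference = Counter(lexicon[t] for t in reference_tokens if t in lexicon)
    n_target, n_reference = len(target_tokens), len(reference_tokens)
    profile = {}
    for concept, freq in target.items():
        target_rate = freq / n_target
        # Add-one smoothing (an assumption) avoids dividing by zero when a
        # concept never occurs in the reference text.
        reference_rate = (reference.get(concept, 0) + 1) / (n_reference + 1)
        profile[concept] = {
            "frequency": freq,  # would drive node size
            "distinctiveness": target_rate / reference_rate,  # would drive colour
        }
    return profile


# Invented token lists standing in for the 2020 and 2010-2019 diaries.
tokens_2020 = "lockdown rule walk walk office lockdown".split()
tokens_past = "office commute walk office run commute".split()

profile = concept_profile(tokens_2020, tokens_past, CONCEPTS)
for concept, values in sorted(profile.items(), key=lambda kv: -kv[1]["distinctiveness"]):
    print(concept, values["frequency"], round(values["distinctiveness"], 2))
```

A ratio above 1 marks a concept as over-represented in the target text. A simple ratio is used here only to keep the example short; a significance-based measure would be the more usual choice for real data.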

If you want to apply these solutions to your data, contact Justyna Robinson and/or consider joining an NCRM training workshop on

“Meaning extraction from large text data: Thematic analysis via corpus linguistics”

Date: 9 June 2025 via Zoom

Registration: https://www.ncrm.ac.uk/training/show.php?article=14155

Biography

Dr Justyna Robinson is a Director of Concept Analytics Lab at the University of Sussex. She researches meaning in language and is interested in methods of analysing meaning empirically. Her publications focus on ways of researching meaning from historical perspectives (2012), from cognitive angles (2014), using socio-demographic information and other text metadata (2012, 2022), and using corpus and statistical methods (2014, 2022). She researches meaning represented by words (2010) and by concepts and themes (2017, 2023). With the research team at Concept Analytics Lab, she has delivered a range of projects investigating current meanings of loneliness, aging, UK trade deals post Brexit, political manifestos, recycling practices, and post-Covid behaviour changes. Contact: justyna.robinson@sussex.ac.uk


[1] The statistical processes for determining the key terms are discussed in the blog post here.
