Chapter 6 Preregistering qualitative research

This chapter covers items that are commonly included in (pre)registration forms for qualitative research. (Pre)registration is the process of freezing your project, along with relevant documentation about it, at one or more points in time: before, during, and/or after the project.

Open science platforms, such as the Open Science Framework (https://osf.io), generally freeze all files in your project repository when you register your project. This preserves the state of your project, along with what you entered in the (pre)registration form, for future reference. In a way, you’re making it possible to play archeologist with your own project later on.

6.1 Types of registration

6.1.1 Preregistration

The most common type of registration is preregistration, where you freeze your plans and expectations beforehand. This has a number of benefits. One is that it helps you avoid fooling yourself later on. For example, the process of collecting and interacting with the data shapes and changes how you see (part of) the world. Because human memory is fallible, and a far cry from a video recording or a computer’s hard disk (which usually allow perfect retrieval of stored data), these changes in how you see the world also bias your memories of what you expected beforehand.

It is easy to become erroneously convinced that you had already expected some findings, or conversely (and perhaps worse), that patterns you observed in the data represent new findings rather than a reflection of your prior expectations. In qualitative research (and, for that matter, in quantitative research), the choices you make during (the design of) your data collection and (the design of) your analyses are partly influenced by your expectations. Being up-front about those expectations before your data collection starts helps you get a grip on that influence.

6.1.2 Repeated registration

Because expectations and insights change during a project, and because it is often necessary or beneficial to change your approach along the way, it can be helpful to register your project repeatedly. A straightforward example is to freeze a registration when you submit your request for ethical approval; to freeze it again after you have made changes in response to comments or questions from the ethics committee; and to freeze it once more just before you start data collection.

Another example is freezing a registration halfway through data collection to document changes in data collection procedures (e.g. you might add questions to the interview scheme). Yet another is freezing a registration when you submit a manuscript to a journal, and again when you resubmit the revised manuscript, which may involve additional or different analyses.

This process allows you to keep track of your changing expectations, representations, and analytical choices, and it also provides insight into how such changes unfold across multiple projects.
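When a registration includes plain-text materials such as an interview scheme, the differences between two frozen versions can be listed automatically, which makes it easier to document what changed between registrations. The sketch below uses Python’s standard `difflib` module and assumes each frozen version is available as a list of lines; the interview questions, the version labels, and the helper name `changes_between` are hypothetical.

```python
import difflib


def changes_between(old, new, old_name, new_name):
    """Return the unified diff between two frozen versions as a list of lines."""
    # lineterm="" because the input lines carry no trailing newlines
    return list(difflib.unified_diff(old, new, fromfile=old_name,
                                     tofile=new_name, lineterm=""))


# Hypothetical interview scheme as frozen in two successive registrations
registration_1 = [
    "1. How do you usually start your working day?",
    "2. What hinders you in getting your work done?",
]
registration_2 = [
    "1. How do you usually start your working day?",
    "2. What hinders you in getting your work done?",
    "3. Who do you turn to when you get stuck?",
]

for line in changes_between(registration_1, registration_2,
                            "before data collection",
                            "halfway through data collection"):
    print(line)
```

Lines starting with `+` were added in the later registration and lines starting with `-` were removed, so the output can be pasted into the later registration form as a record of the change.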

6.2 Common registration form items

6.2.1 Prior expectations and/or initial code structure

Qualitative data is usually coded by humans because, as yet, computers cannot reliably process language and extract meaning. Language is messy and often ambiguous, and context is often vital for correct interpretation: how much context is taken into account can change the meaning of an expression. Humans are quite good at this, partly because our psychological machinery excels at pattern recognition and categorization. This skill comes at a price, though: humans categorize too enthusiastically, often seeing patterns where none exist. Using human coders is therefore both inevitable and risky.

Before coders start coding data, they often have expectations and mental representations of the bits of the world that are captured in the data. For example, coders will often be aware of the research questions in the project. Those research questions did not come about in a vacuum, but have scientific, cultural, societal, political, and subjective provenance and context. Moreover, coders are human, and as such have representations of the world that often manifest as predispositions to categorize parts of the world in a certain manner. For researchers, those predispositions will often be shaped by the theories they are familiar with, even if those theories are not focal in the study they code data for. These predispositions shape how coders code the data: they are the lens through which the coders view it.

Not all humans excel at metacognitive skills such as introspection, so these predispositions can be hard to identify and accurately describe in a (pre)registration. One approach that facilitates this process is to use an initial coding structure with clear coding instructions accompanying each code. The advantage of this approach is that it provides a vehicle for explicitly discussing the framework that will be used for coding, simultaneously aligning the coders’ lenses and making the resulting ‘shared lens’ explicit.
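As an illustration, an initial coding structure could be written down as a small tree in which every code carries its own coding instruction, and then rendered as plain text for the registration form. The sketch below is a minimal Python example under that assumption; the `Code` class, the `walk` and `to_text` functions, and the codes themselves are hypothetical and not taken from any particular coding tool.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Code:
    """One code in an initial coding structure, with its coding instruction."""
    identifier: str
    label: str
    instruction: str  # describes when coders should (and should not) apply this code
    children: List["Code"] = field(default_factory=list)


def walk(code: Code, depth: int = 0) -> Iterator[Tuple[int, Code]]:
    """Yield (depth, code) pairs, visiting the coding tree depth-first."""
    yield depth, code
    for child in code.children:
        yield from walk(child, depth + 1)


def to_text(root: Code) -> str:
    """Render the coding tree as indented plain text, e.g. for a registration form."""
    return "\n".join(
        f"{'  ' * depth}- {code.identifier} ({code.label}): {code.instruction}"
        for depth, code in walk(root)
    )


# Hypothetical codes, for illustration only
tree = Code(
    "barriers", "Barriers",
    "Apply to fragments where a participant describes something that hinders them.",
    children=[
        Code("barriers_time", "Lack of time",
             "Apply when the participant mentions not having enough time."),
        Code("barriers_cost", "Costs",
             "Apply when financial costs are described as a hindrance."),
    ],
)
print(to_text(tree))
```

Keeping the instruction next to each code means the registered text documents not only which codes were expected, but also how coders were told to apply them.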

However, not all projects lend themselves to the development of such an initial coding structure. It is possible that there are no expectations at all. This is exceedingly rare: given how hard metacognition can be, and given that humans automatically categorize the world, researchers who think they have no prior expectations may well simply be unaware of their implicit expectations. Still, it can happen, for example if the coders are not researchers, have neither theoretical knowledge nor naive mental models of the subject matter to be coded, and are unaware of the goals of the study and its research questions.

A more common scenario is that the project was designed to use multiple coders who code inductively. This can be useful to explore variation in the coding structures developed by the independent coders, and to benefit from the discussions that serve to reconcile those differences into one unifying coding structure (which is then often better understood and justified, provided the discussions are comprehensively documented).

In that scenario, however, each coder still has prior expectations and categorization tendencies that shape how they code the data. These are likely to manifest in the coding structure each coder produces. Patterns in those coding structures can then easily be mistaken for characteristics of the data, whereas in fact they represent expectations that existed prior to, and independent of, data collection and coding. If the coders share the same background, the risk is even greater: convergences in their coding structures can easily be misinterpreted as evidence of the ‘truthfulness’ of the patterns represented by those parts of the coding structure.

This risk of bias can be reduced by having each coder make their expectations, and if possible their initial coding structure, explicit individually and independently. Ideally, they also include coding instructions, just as you would for an initial coding tree in deductive coding, so that other coders could apply the codes in the same way.^[One could argue that, given the inevitability of pre-existing expectations and categorization tendencies, coding by humans is never truly inductive. Acknowledging its deductive aspects is therefore a powerful tool to avoid fooling yourself during analysis and interpretation of the data.] This approach requires a project member who will not engage in coding themselves to compile these initial expectations or coding structures and add them to the preregistration form just before it is publicly frozen. This safeguards the independence of the coders (if these initial expectations or coding structures were shared between coders, the coders would no longer be independent).
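The compilation step can be sketched as follows, assuming each coder hands in their expectations as a separate text to the compiling project member, who combines them only once all coders are done. The function name `compile_expectations` and the names of the coders and the compiler are hypothetical; the check that the compiler is not also a coder mirrors the independence requirement described above.

```python
from typing import Dict


def compile_expectations(compiler: str, submissions: Dict[str, str]) -> str:
    """Combine independently written coder expectations into one registration section.

    compiler: the project member who compiles the section and does not code.
    submissions: maps each coder's name to the expectations and/or initial
        coding structure that coder wrote down on their own.
    """
    # The compiler must not be one of the coders, so no coder sees another's input.
    if compiler in submissions:
        raise ValueError(f"{compiler} is a coder and should not compile the submissions")
    if not submissions:
        raise ValueError("no coder submissions to compile")

    parts = [f"Prior expectations of the coders (compiled by {compiler})"]
    # Sort by name so the order does not depend on who submitted first.
    for coder in sorted(submissions):
        parts.append(f"Coder: {coder}\n{submissions[coder].strip()}")
    return "\n\n".join(parts)


# Hypothetical submissions from two coders who worked separately
section = compile_expectations(
    "Project member C",
    {
        "Coder B": "I expect participants to mention lack of time as the main barrier.",
        "Coder A": "Barriers\n  - Costs\n  - Social pressure",
    },
)
print(section)
```

The compiled text is what goes into the preregistration form right before freezing; the individual submissions are not passed back to the coders.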