Chapter 20 Using the ROCK for Decentralized Construct Taxonomies

20.1 The importance of clear coding instructions

When studying humans, one must deal with the somewhat challenging fact of life that one often does not study natural kinds. The objects of study are generally variables that are assumed to exist in people’s psychology, usually called constructs. Those constructs are not assumed to exist as more or less modular, discrete entities (Peters & Crutzen, 2017). Instead, these constructs concern definitions that enable consistent measurement and consistent manipulation of certain aspects of human psychology, without the pretense that the constructs are somehow clearly distinguished from other constructs.

As a consequence, data collection and analysis in research with humans differs fundamentally from data collection in sciences that do deal with natural kinds. Specifically regarding qualitative data, this lack of natural kinds further complicates the challenges that come with having humans code rich, messy data. Human perception and processing are flawed enough as it is: humans are, fortunately, not robots or artificial intelligences. The same capability for reading meaning and context in qualitative data that makes humans indispensable in coding also comes with risks. Without the existence of discrete, modular, objectively existing entities to code, the coding instructions become the only tangible foothold coders can rely on.

Therefore, being able to engage in the scientific endeavour with any degree of consistency over studies requires unequivocal communication about the constructs under study (Eronen & Bringmann, 2021). However, many theories do not provide sufficiently explicit definitions of the described constructs. Instead, there is often much room for interpretation: room that manifests as heterogeneity in constructs’ definitions, operationalizations, and instructions for coding the constructs.

In the broader sense, it has been argued that this heterogeneity is a feature, not a bug (Devezer et al., 2019; Muthukrishna & Henrich, 2016; Ogden, 2016; Zollman, 2010). Heterogeneity in construct definitions and instructions for coding is not problematic in itself. To a degree, it is inevitable that different people have different definitions of constructs and therefore code data differently.

However, if a group of researchers collaborates on a study, or if different groups of researchers aim to contribute to a cumulative knowledge base about a topic, such heterogeneity does become problematic if it remains unknown. The problem is illusory agreement about what exactly is being studied. If researchers fail to explicitly discuss their definitions and the corresponding coding instructions, it is very easy to remain under the impression that everybody has the same definitions and coding parameters in mind. Such illusory agreement is problematic within a group of collaborators doing a study and prevents knowledge accumulation across multiple studies.

For a lot of qualitative research, therefore, comprehensive coding instructions that accompany comprehensive definitions of the constructs being coded (and often, corresponding procedures for obtaining qualitative data relating to those constructs) are among the most important parts of the study. Without the ability to unequivocally refer to specific construct definitions and the corresponding coding instructions, there is no guarantee that all involved researchers are coding the same constructs, even if everybody uses the same names.

20.2 Introduction to Decentralized Construct Taxonomies

To facilitate unequivocal references to specific definitions of constructs, combined with coherent instructions for operationalisation and coding, Decentralized Construct Taxonomy specifications (DCTs) were developed. DCTs are simple plain text files in the YAML format that specify, for one or more constructs:

  • A unique identifier for the construct, the Unique Construct Identifier (UCID);
  • A human-readable label (title / name) for the construct (which doesn’t need to be unique, as the identifier is already unique);
  • An exact definition of the construct;
  • Instructions for developing a measurement instrument to measure the construct;
  • Instructions for coding measurement instruments as measurement instruments that measure this construct;
  • Instructions for generating qualitative data pertaining to this construct;
  • Instructions for identifying when qualitative data pertains to this construct and then coding it as such.

20.2.1 Consistency over studies

DCT specifications can easily be re-used in different studies, for example in all studies in the same lab, in the same faculty, or organisation. To this end, a decentralized repository has been created. One instance that contains a number of DCT specifications for psychological constructs is available at https://psycore.one (this stands for Psychological Construct Repository). You can look at the coding instructions for the constructs available in that repository at https://psycore.one/coding-qualitative-data. Each of the DCTs in this repository has a unique URL with the construct’s definition and corresponding instructions, for example https://psycore.one/construct/?ucid=expAttitude_evaluation_73dnt5z2.

Any study can re-use these DCTs by listing their Unique Construct Identifiers (UCIDs; for this last example, expAttitude_evaluation_73dnt5z2) and including the associated specifications with articles about the study (see below for the explanation of the file format). For constructs that have already been published in a repository, the unique URLs can also be included to provide a more user-friendly interface (note that since the repositories may not be persistent, it remains important to include the files with the DCT specifications).
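
As a minimal sketch of how such references could be assembled, the following R snippet builds the repository URL for each UCID used in a study. The URL pattern is taken from the example above; the helper name ucid_to_url and the object names are hypothetical and not part of the psyverse or rock packages.

```r
# Hypothetical helper (not part of psyverse or rock): build the repository
# URL for one or more UCIDs, following the URL pattern of the example above.
ucid_to_url <- function(ucid, base_url = "https://psycore.one/construct/") {
  paste0(base_url, "?ucid=", ucid)
}

# UCIDs of the constructs used in a study
study_ucids <- c("expAttitude_evaluation_73dnt5z2")

# A small table that could accompany the DCT files in a study's materials
study_constructs <- data.frame(
  ucid = study_ucids,
  url = ucid_to_url(study_ucids),
  stringsAsFactors = FALSE
)
print(study_constructs)
```

Because repositories may not be persistent, a table like this supplements the DCT files rather than replacing them.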

20.3 Creating a DCT

20.3.1 Thinking about constructs

Creating a DCT requires knowing which construct you want to describe and what exactly the construct is and is not. This seems trivial - most researchers working with constructs (e.g., psychologists) rely on the assumption that they have sufficient tacit knowledge of the constructs they work with. However, because this knowledge never has to be made explicit, this assumption is never tested. Producing a DCT for a construct confronts one with exactly how much one knows about a construct. Based on our experience, this is usually depressingly little.

The reason for this is that theories and the textbooks describing them usually do not provide clear definitions, either. In fact, that is one of the causes of the heterogeneity that exists. To a degree this is inevitable because constructs are not directly observable, and often do not represent natural kinds. But to a degree it can be remedied - by being very explicit about a construct’s definition, by producing a DCT. Thus, while producing a DCT may not necessarily be easy, it is definitely worthwhile.

When creating DCTs, it is important to keep in mind that there are no objectively wrong or right “answers”. After all, the constructs do not correspond to natural kinds. Various definitions can co-exist without any of them being wrong or right. In fact, since the constructs do not correspond to more or less discrete or modular entities anyway, one could argue that they are all ‘wrong’ (or are all ‘right’). Given that at present, most constructs lack clear, explicit definitions, any step towards making a definition explicit is progress. And DCTs can always be updated or adjusted by updating their UCID. If you end up iterating through several versions, that’s clear evidence that there was room for improvement in your original, implicit, definitions.

When creating a DCT, it doesn’t matter where you start. If you have a pretty clear idea about the construct’s definition, you can start by making that explicit. But it’s possible that while there are a number of measurement instruments for the construct (e.g. questionnaires), there is no clear definition available. In that case, you can start with the measurement instruments instead, and first complete the instruction for developing measurement instruments by deriving common principles from the measurement instruments you have.

In any case, the process will be iterative. Eventually, you will complete the definition of the construct, and probably at least two of the instructions (either the instructions for developing measurement instruments and for coding measurement instruments; or for developing manipulations and for coding manipulations; or for eliciting (‘developing’) qualitative data and for coding qualitative data). As you complete these sections, you will probably need to update other sections to make sure everything stays coherent.

On the surface, producing a DCT just consists of putting stuff in words. After all, you just need to type in the construct’s name, definition, and add the instructions that allow you (and others) to work with the construct. This can be done within an hour. Most time is not spent on specifying the DCT in a file, but on arriving at definitions and instructions that you and your colleagues agree on. However, that is time well-spent.

By discussing the constructs you work with and the varying definitions that everybody uses, you achieve consensus. If you don’t manage to achieve consensus about a given construct, that’s fine of course - simply create two DCTs for two different constructs. You can even give them the same name - as long as they have different identifiers (UCIDs). If after these discussions, all researchers and their supervised students within your lab use the DCTs you produced, all research will be consistent. Of course, researchers without DCTs will often assume such consistency as well. And if they are right, the process of producing DCTs should be effortless. If the process proves more cumbersome, clearly it was necessary.

20.3.2 Description of edge cases

Clear definitions are most valuable when edge cases are encountered. For example, most people will have little difficulty in identifying ‘chairs’ and agreeing whether an object is a chair even without first explicitly communicating about and calibrating the definitions they use. It is with edge cases such as seating furniture with one, two, or three legs, or furniture that seats two or three people, where unclear definitions become problematic.

For example, a definition of a chair could be “A piece of furniture designed to support a sitting human”. In this case, a bicycle would fall under this definition, and in a qualitative study, would therefore be coded as a [[chair]]. This example is easily solved by updating the definition to “A piece of static furniture designed to support a sitting human”. However, in this definition, a bar stool with one leg would also be coded as [[chair]], which in this case might fall beyond the intended definition. Describing all specific edge cases explicitly in the definition may make the definition unwieldy.

Therefore, the specific instructions in a DCT normally discuss edge cases explicitly, referring the user to alternative codes where appropriate. For example, the coding instructions for coding a piece of qualitative data as [[chair]] could include the sentence “Note that furniture without back and arm support and having three legs or fewer should not be coded as [[chair]] but instead as [[stool]].”
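
To illustrate how such an instruction removes ambiguity, the edge-case rule quoted above can be written out as an explicit decision rule. This is only a sketch in R: the function code_seat() and its arguments are hypothetical, and actual coding remains a human judgment guided by the instruction text.

```r
# Hypothetical illustration of the edge-case instruction quoted above:
# furniture without back and arm support and with three legs or fewer
# is coded as [[stool]] rather than [[chair]].
code_seat <- function(has_back_or_arm_support, n_legs) {
  ifelse(!has_back_or_arm_support & n_legs <= 3, "[[stool]]", "[[chair]]")
}

# An ordinary chair with a back and four legs
print(code_seat(has_back_or_arm_support = TRUE, n_legs = 4))

# A one-legged bar stool without back or arm support
print(code_seat(has_back_or_arm_support = FALSE, n_legs = 1))

# A backless seat with four legs is not covered by the exception,
# so under this instruction it would still be coded as [[chair]]
print(code_seat(has_back_or_arm_support = FALSE, n_legs = 4))
```

Writing the rule out this way also exposes remaining edge cases, such as the backless four-legged seat in the last call, that the instruction may still need to address.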

Thus, coding instructions are often most useful if they do not only describe the core of a construct, but if they pay special attention to the periphery of a construct’s definition. Coding errors often concern ambiguity, and coding instructions should not add to this ambiguity.

20.3.3 Creating a DCT file

To create a DCT file, you can use any software that can create plain text files, such as Notepad, TextEdit, Notepad++, BBEdit, Vim, Nano, or the RStudio IDE. A DCT file contains one or more DCT specifications, delimited by a line containing only three dashes (“---”). This is an example of an extremely simple DCT specification:

---
dct:
  dctVersion: 0.1.0
  version: 1
  id: chair_75vl264q
  label: "Chair"
  definition:
    definition: "A piece of furniture designed to support a sitting human."
  measure_dev:
    instruction: ""
  measure_code:
    instruction: ""
  aspect_dev:
    instruction: ""
  aspect_code:
    instruction: "Objects that have legs and a surface that was designed for humans to sit on. Note that if the object is in use, the surface's height should be such that most humans can put their feet flat on the ground while sitting on the object."
  rel:
    id: furniture_75vl25k8
    type: "semantic_type_of"
---

This example only specifies the UCID, name (label), definition, and instructions for coding, as well as one relationship to another construct with UCID “furniture_75vl25k8” that this construct is apparently a type of. These relationships are parsed when the rock package reads a set of DCT specifications, and they are used to build a hierarchical tree of constructs (i.e. a deductive coding structure). You could omit these relationships of course, if you will not need to collapse codes or fragments based on higher levels in the hierarchy.
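
The idea of deriving a hierarchy from these relationships can be sketched in base R as follows. This does not show how the rock package itself parses DCT files; the in-memory list below is a hypothetical stand-in for parsed specifications, and the "stool" entry with its identifier is made up for the illustration.

```r
# Hypothetical stand-ins for parsed DCT specifications; only the fields
# needed for the hierarchy are included. The "stool" entry and its
# identifier are invented for this illustration.
dct_specs <- list(
  list(id = "furniture_75vl25k8", label = "Furniture", rel = list()),
  list(id = "chair_75vl264q", label = "Chair",
       rel = list(list(id = "furniture_75vl25k8", type = "semantic_type_of"))),
  list(id = "stool_example1", label = "Stool",
       rel = list(list(id = "furniture_75vl25k8", type = "semantic_type_of")))
)

# Turn every "semantic_type_of" relationship into a child -> parent edge
edges <- do.call(rbind, lapply(dct_specs, function(spec) {
  do.call(rbind, lapply(spec$rel, function(r) {
    if (identical(r$type, "semantic_type_of")) {
      data.frame(child = spec$id, parent = r$id, stringsAsFactors = FALSE)
    }
  }))
}))

# List the children of each parent: one level of the deductive code tree
children_of <- split(edges$child, edges$parent)
print(children_of)
```

With such a structure, fragments coded with either child construct could be collapsed to the parent construct by looking up the parent of each code.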

Instead of writing the DCT file by hand, you can also use the R package psyverse, specifically the functions psyverse::dct_object() and psyverse::save_to_yaml(), to create and save a DCT specification. For example, the above DCT specification can also be created using these two commands (after you have installed the psyverse package):

# Create the DCT specification as an R object; the prefix is used when
# generating the construct's unique identifier (UCID)
myDCTspec <-
  psyverse::dct_object(
    prefix = "chair",
    label = "Chair",
    definition = "A piece of furniture designed to support a sitting human.",
    aspect_code = "Objects that have legs and a surface that was designed for humans to sit on. Note that if the object is in use, the surface's height should be such that most humans can put their feet flat on the ground while sitting on the object.",
    rel = list(list(id = "furniture_75vl25k8",
                    type = "semantic_type_of"))
  )

# Save the specification as YAML (here to a temporary file; supply your
# own path to keep the file)
myDCTspec_asYAML <-
  psyverse::save_to_yaml(
    myDCTspec,
    file = tempfile()
  )

20.4 Coding with DCTs

When coding with DCTs, you code slightly differently than when you code without DCTs. Regular codes are simply enclosed in double square brackets, e.g. [[chair]]. However, if you use DCTs, you specify this in the code: [[dct:chair_75vl264q]]. You can still combine this with inductive coding, for example to indicate that thrones are an important subtype of chairs: [[dct:chair_75vl264q>throne]]. Like normal inductive codes, you can keep on nesting such subcodes infinitely to indicate ever more precise subconstructs, if need be (although one level will usually suffice).
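
To make the structure of these codes concrete, the following base R sketch extracts the codes from a coded fragment and separates each DCT code into its UCID and any nested subcodes. It is an illustration only and not the rock package's own parser; the example fragment and the [[comfort]] code are made up.

```r
# A made-up coded fragment for illustration
fragment <- paste(
  "She sat down on the gilded seat [[dct:chair_75vl264q>throne]]",
  "next to a plain one [[dct:chair_75vl264q]] [[comfort]]"
)

# Extract everything enclosed in double square brackets
matches <- regmatches(fragment, gregexpr("\\[\\[[^]]+\\]\\]", fragment))[[1]]
codes <- gsub("^\\[\\[|\\]\\]$", "", matches)

# Keep only the DCT codes and drop the "dct:" prefix
dct_codes <- sub("^dct:", "", codes[startsWith(codes, "dct:")])

# Split each code on ">" into the UCID and any nested inductive subcodes
parts <- strsplit(dct_codes, ">", fixed = TRUE)
ucids <- vapply(parts, function(p) p[1], character(1))
subcodes <- vapply(parts, function(p) paste(p[-1], collapse = ">"),
                   character(1))

print(data.frame(ucid = ucids, subcode = subcodes, stringsAsFactors = FALSE))
```

Both DCT codes resolve to the same UCID, so fragments coded with the inductive subcode can still be grouped under the construct itself.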

20.5 Analysing DCT-coded sources

References

Devezer, B., Nardin, L. G., Baumgaertner, B., & Buzbas, E. O. (2019). Scientific discovery in a model-centric framework: Reproducibility, innovation, and epistemic diversity. PLOS ONE, 14(5), e0216125. https://doi.org/gf86cs
Eronen, M. I., & Bringmann, L. F. (2021). The Theory Crisis in Psychology: How to Move Forward. Perspectives on Psychological Science, 1745691620970586. https://doi.org/ghw2x3
Muthukrishna, M., & Henrich, J. (2016). Innovation in the collective brain. Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1690), 20150192. https://doi.org/gfzkmd
Ogden, J. (2016). Celebrating variability and a call to limit systematisation: The example of the Behaviour Change Technique Taxonomy and the Behaviour Change Wheel. Health Psychology Review, 10(3), 245–250. https://doi.org/gh25r5
Peters, G.-J. Y., & Crutzen, R. (2017). Pragmatic nihilism: How a Theory of Nothing can help health psychology progress. Health Psychology Review, 11(2). https://doi.org/10.1080/17437199.2017.1284015
Zollman, K. J. S. (2010). The Epistemic Benefit of Transient Diversity. Erkenntnis, 72(1), 17–35. https://doi.org/dnjk7f