Chapter 7 Coding

Coding is the process of attaching codes to data fragments. This process is a tool to organize the patterns you identify in the data. On the one hand, attaching codes to data fragments enables you to organize the data by means of those codes, for example by collecting all data fragments coded with a given code. On the other hand, attaching codes to data fragments produces a representation of the patterns you identified.
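To make the idea of organizing data by code concrete, here is a minimal sketch in Python. The fragments, the code names, and the function collect_by_code are hypothetical illustrations and not part of any particular coding tool.

```python
from collections import defaultdict

# Hypothetical coded fragments: (fragment text, codes attached to it).
coded_fragments = [
    ("I usually skip breakfast.", ["habit"]),
    ("My doctor told me to eat earlier.", ["advice", "habit"]),
    ("The weather was nice that day.", []),  # no codes attached
]


def collect_by_code(fragments):
    """Group fragment texts under every code attached to them."""
    by_code = defaultdict(list)
    for text, codes in fragments:
        for code in codes:
            by_code[code].append(text)
    return dict(by_code)


fragments_by_code = collect_by_code(coded_fragments)
```

Looking up fragments_by_code["habit"] then returns every fragment coded with that code, which is the kind of selective reading the text refers to.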

In addition, an advantage of coding is that it allows you to arrive at your conclusions in a transparent, reproducible manner. Without coding, after having conducted interviews, watched or listened to recordings, or otherwise processed qualitative data, researchers often also get an idea of what these data convey about the concept(s) they are studying.

However, as discussed in the chapter about developing initial code sets, humans are very bad at identifying patterns, because humans are very good at identifying patterns. Without a systematic procedure, it is very hard to separate cases where a researcher’s claims mainly reflect their biases from cases where their claims reflect patterns in the data.

Like developing initial code sets, a formal coding phase is a tool to try to avoid fooling yourself. In general, this is achieved by constraining when you are allowed to draw conclusions about your research questions, and coding achieves this restriction of your researcher degrees of freedom. It does this especially effectively if you make the most of the opportunities it offers to justify your decisions. This will be discussed more in depth in the Justifying section below.

7.1 Coding

Because codes are attached to data fragments, coding first requires segmentation: segmenting the data into some smallest codable unit, for example sentences, letters, seconds, exchanges, posts, or words. Segmenting is a process that shares many characteristics with coding, but has vastly different methodological implications. It is therefore discussed more in depth in its own Segmenting section below. For now, we will focus on the process of attaching codes to data fragments.

In this chapter, we will assume you code textual data, so that we can just use the verb “to read” instead of having to repeat “reading / listening to / watching” every time (and still missing modalities, since qualitative data can in principle also be olfactory or tactile).

The process of coding, then, starts with reading through the data. In this phase, you will read the data extensively, both linearly (the way you will probably start), and increasingly, non-linearly (once you have coded a set of data fragments and you want to selectively look at data coded with specific codes).

Depending on a number of factors such as the type of data, the complexity of the phenomena you’re studying, and whether the data represents a registration of events that you yourself were present at, you may want to refrain from applying codes the first time (or even first few times) around. The easier your coding task, the earlier you can start applying codes, and in very straightforward situations (e.g. where you only code deductively with comprehensive coding instructions) you may even be able to apply codes on your first reading. Usually, however, you’ll want to develop an understanding of the breadth of the data before starting to make coding decisions.

The act of coding typically consists of taking a great number of successive decisions, each of which is a choice between the following alternatives, that have been carefully selected to all start with an “A”:

  • Abstain from coding
  • Accommodate a data fragment
  • Assimilate a data fragment
  • Add a code
  • Adapt the code structure

You can also decide to do several of these things in response to your having read a given data fragment. In addition, in consecutive coding iterations, you can decide different things for the same fragment as your views on the data as well as your code structure, code descriptions, and corresponding coding instructions change.

7.1.1 Abstaining from coding a data fragment

This is the simplest decision: you decide that the data fragment you just read is not relevant given the phenomena you’re interested in. In this case you simply move right along, not applying any codes. However, depending on how trivial your decision was, it can be a good idea to document your reasoning.

If it was an edge case (i.e. you can also see why you would code the fragment), this is especially important. Documenting your reasoning makes your process transparent to your collaborators, other researchers, and future you. After all, the decision to not code a data fragment can constitute further elaboration of how you delineate the definitions of the existing codes as well as a substantive decision that some expressions are not a manifestation of the phenomena you are studying.

7.1.2 Assimilating a data fragment

If you do decide to code a data fragment, it will rarely exactly match the code description and coding instructions for a given code. However, deviations are often minor and well within the allowances you are willing to make given the ambiguity inherent in, for example, language. If you then decide to apply that code, you effectively assimilate the data fragment.

If you decide to assimilate, your code structure stays unchanged: you don’t update code labels, descriptions, or coding instructions, and you don’t restructure your code structure. The coded fragment only becomes a part of the results of your analysis because by being coded with a code it becomes an example of the manifestation of the concepts captured by that code.

Typically, the decision to assimilate a data fragment enacts the coding instructions for the corresponding code. As such, assimilation is often a relatively uneventful procedure, and one of the few cases where documenting the justification for that decision is redundant.

7.1.3 Accommodating a data fragment

Alternatively, you may decide that what is expressed in a data fragment is cause to revise one or more of your codes. The data may help you refine the label, description, or coding instruction for one or more codes. For example, it may express an edge case that helps you to sharpen a code’s boundaries, or even the boundaries between two or more codes.

If you assimilate a data fragment, you decide to disregard some of its idiosyncratic aspects: you effectively make the data fragment fit your code structure. Conversely, if you accommodate a data fragment, you make your code structure fit the data fragment.
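The contrast between the two decisions can be sketched in a few lines of Python. The Code class, its fields, and the example texts are assumptions made for illustration only: assimilating only adds a coded fragment, whereas accommodating also revises the code itself.

```python
from dataclasses import dataclass, field


@dataclass
class Code:
    # Hypothetical representation of one code in a code structure.
    identifier: str
    label: str
    description: str
    instructions: str
    fragments: list = field(default_factory=list)


def assimilate(code, fragment):
    """Apply the code as it is: only the list of coded fragments grows."""
    code.fragments.append(fragment)


def accommodate(code, fragment, new_description):
    """Revise the code so that it fits the fragment, then apply it."""
    code.description = new_description
    code.fragments.append(fragment)


code = Code(
    identifier="habit",
    label="Habit",
    description="Recurring behaviour the participant reports.",
    instructions="Apply when a behaviour is described as routine.",
)

assimilate(code, "I usually skip breakfast.")
description_after_assimilation = code.description  # unchanged by assimilation

accommodate(
    code,
    "I used to skip breakfast every day.",
    "Recurring behaviour the participant reports, including past routines.",
)
```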

Especially if you aim to code mostly inductively, accommodation is an important part of your analysis. These decisions contribute to shaping the essence of your codes, and so the concepts they capture, and thereby directly form your results. Because of this, the justifications of your accommodation decisions are important to document. Why exactly you decide that an edge case falls within one code or another conveys important information in addition to what can be documented in the code structure, descriptions, and coding instructions alone.

7.1.4 Adding a code

If you do decide to code a data fragment, sometimes you decide it represents something that is important for understanding the phenomenon you are studying, but that is not captured by the codes you have (given their descriptions and coding instructions). Accommodation can only go so far: accommodating a fragment may require stretching a code too far beyond the concept you intend it to capture.

In such cases you create an entirely new code. You document its label, identifier, and description, and draft a first version of the coding instructions. This will often also present an opportunity to fine-tune the coding instructions for existing codes, so that you can locate the new code in the conceptual space of your code structure as explicitly as possible.

Code addition alters your code structure in a very tangible, salient way. Codes you add will shape the overview of your results and how you will present these to others. Each added code clearly represents your judgment that the data fragment that first gave rise to it contains something that, in an important way, does not match any of the existing codes.

Therefore, as when you accommodate a data fragment in your code structure, it is important to record your considerations to make them available as context to your results, both to others and to future you.

7.1.5 Adapting the code structure

It is also possible that you encounter a data fragment that prompts you to reconsider how to organize the code structure to reflect the phenomena you are studying. This will often involve relocating some codes, and sometimes necessitates redefining them or changing the way they are described to better fit their new position.

Reorganization of a code structure is a relatively profound procedure, having a great effect on your conceptualization of the patterns expressed in the data, and as a consequence, on your results. Therefore, it is very important to comprehensively document the justification for your decision. Explicating your reasons, how you weighed the arguments for and against the decision, and which considerations ultimately led you to restructure preserves important contextual information.

7.1.6 Combining decisions

In practice, it is common that you combine multiple decisions. Often, if a data fragment prompts you to restructure your code structure, you will also change the code descriptions and coding instructions. If you add a code, you will often also change some other codes to describe edge cases with the new code, similar to how you change code descriptions and coding instructions if you accommodate a data fragment in your code structure.

In the end, then, the five decisions described above are an oversimplification of the decisions and processes during coding. However, they provide a useful vocabulary to describe coding decisions and considerations, as well as a concrete framework for engaging in coding.
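As an illustration of how this vocabulary could be used when keeping track of coding decisions, the sketch below records one or more decisions per fragment together with a justification. The names Decision, LogEntry, and missing_justifications are hypothetical, and the rule that accommodating, adding, and adapting always call for a justification is a simplification of the guidance above (abstaining on an edge case may deserve one too).

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ABSTAIN = "abstain"
    ASSIMILATE = "assimilate"
    ACCOMMODATE = "accommodate"
    ADD = "add"
    ADAPT = "adapt"


# Decisions that change codes or the code structure (assumed here to
# always require a documented justification).
STRUCTURE_CHANGING = {Decision.ACCOMMODATE, Decision.ADD, Decision.ADAPT}


@dataclass
class LogEntry:
    fragment_id: str
    decisions: set  # several decisions can be combined for one fragment
    justification: str = ""


def missing_justifications(log):
    """Return ids of entries that changed the code structure without a note."""
    return [
        entry.fragment_id
        for entry in log
        if entry.decisions & STRUCTURE_CHANGING
        and not entry.justification.strip()
    ]


log = [
    LogEntry("f1", {Decision.ASSIMILATE}),
    LogEntry("f2", {Decision.ADD, Decision.ACCOMMODATE},
             "New code needed; narrowed an existing code to exclude this edge case."),
    LogEntry("f3", {Decision.ADAPT}),
]
unjustified = missing_justifications(log)
```

A check like this cannot judge the quality of a justification; it only flags entries where one appears to be missing.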

7.1.7 Pre-coding: indexing

Sometimes, before you code data, you want to organize the data. This is often called indexing. Here, you don’t change your actual code structure; instead, you attach codes to data to indicate which data fragments contain information about which parts of your study. Indexing, therefore, is very superficial. You don’t make any decisions about the implications of what is expressed in the data fragments regarding the phenomena you’re interested in. Instead, you organize the data to facilitate your actual coding process.

For example, if your data consists of interview transcripts, you may want to first read and code the data question by question, reading the transcripts for each interview for a given question before moving on to the next question. Another example is if the data may frequently contain irrelevant fragments, for example in observation data where video recordings are made, and you aim to look at interactions between people, but half of the time nobody is present.

In the ROCK, for indexing you normally don’t use regular code identifiers, but class instance identifiers. You then consider indexing a class, and each class instance is a category you want to index. In the first example (indexing based on interview question), each question would get its own class instance identifier; and in the second example, you would have two class instance identifiers, for example “codable data” versus “discarded data.” How to attach these identifiers to your sources is discussed in the section about the ROCK itself.
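The two indexing examples can be combined in a short Python sketch. It deliberately uses plain dictionaries instead of the ROCK's own notation for class instance identifiers; the field names and the function reading_order are assumptions made for illustration.

```python
# Hypothetical index: each fragment carries a question identifier and a flag
# indicating whether it was indexed as codable or as discarded.
indexed_fragments = [
    {"source": "interview_2", "question": "q1", "codable": True,
     "text": "I moved around a lot as a child."},
    {"source": "interview_1", "question": "q2", "codable": True,
     "text": "I cycle to work."},
    {"source": "interview_1", "question": "q1", "codable": True,
     "text": "I grew up in a small town."},
    {"source": "interview_1", "question": "q1", "codable": False,
     "text": "(recording interrupted)"},
]


def reading_order(fragments):
    """Keep codable fragments and order them question by question."""
    codable = [f for f in fragments if f["codable"]]
    return sorted(codable, key=lambda f: (f["question"], f["source"]))


ordered = reading_order(indexed_fragments)
```

Note that the index only determines what is read and in which order; no decision about the meaning of a fragment is involved.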

7.1.8 A step-by-step instruction

Coding is an iterative process, so the idea of step-by-step instructions is a bit artificial. However, to get started a concrete starting point can be very useful. Just remember that this is not “the way to code”: these steps are useful to follow, but in many situations, deviating from these steps, or following a different approach altogether, can fit better. We tried to make some straightforward occasions for deviation explicit.

  1. Read through the data (or, as we explained earlier, watch it, listen to it, feel it, smell it, or taste it). In some cases (e.g., when coding more deductively) you may want to skip this step; in others (e.g. when coding more inductively) you may want to repeat it multiple times. In this first step, you don’t make any decisions.

  2. Optionally, index the data. This is mostly relevant if there exists some superficial categorization that you can apply that doesn’t have direct bearing on the implications of what is expressed in the data for the phenomena you are interested in.

  3. Read through the data, and for each successive fragment you read (or for each segment you read, if you use segmented data), take the time to evaluate it in relation to your code structure. Decide on what you want to do. You can use the five decisions outlined above to guide this process.

  4. Once you have worked through the entire data set, repeat this process as often as you feel is useful. Making sure there is some time in between each round can help to see things in a fresh light.

  5. Then, selectively collect the fragments coded with your codes and read them code by code. This provides a new, focused perspective that allows you to critically examine whether your code structure and the descriptions accurately capture the phenomena that you mean them to capture.

  6. When you are

7.2 Segmenting

Segmenting is the process of dividing data into segments. Segments can have different definitions, but once defined and applied, each segment binds together data fragments that exhibit some coherence in terms of the corresponding segment definition.

A relatively simple segment definition is a sentence. For sources that are interview transcripts, another example is an interviewee’s response to each question. In discussions or dialogues, segments can be divided by turns of talk (i.e. when another participant starts speaking).
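The sketch below applies two of these segment definitions to a small, made-up transcript. It assumes each turn of talk sits on its own line with a speaker label, and it uses a naive punctuation rule for sentences; real transcripts will usually need more careful handling.

```python
import re

# Hypothetical transcript: one turn of talk per line, prefixed by the speaker.
transcript = """Interviewer: How do you get to work?
Participant: I cycle. It takes twenty minutes.
Interviewer: And in winter?
Participant: Then I take the bus."""


def segment_by_turn(text):
    """One segment per turn of talk (here: one non-empty line per turn)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def segment_by_sentence(text):
    """One segment per sentence, split after '.', '!' or '?'."""
    segments = []
    for turn in segment_by_turn(text):
        speaker, utterance = turn.split(":", 1)
        for sentence in re.split(r"(?<=[.!?])\s+", utterance.strip()):
            if sentence:
                segments.append((speaker, sentence))
    return segments


turns = segment_by_turn(transcript)
sentences = segment_by_sentence(transcript)
```

The same data yields four segments under the turn-based definition and five under the sentence-based one, which already hints at how much the definition matters for anything counted per segment.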

However, segment definitions can be less straightforward, too. For example, they can be based on the transition from one topic in a discourse to another topic; or they can be defined by which stimuli are attended to by a subject. In such cases, where exactly one segment ends and the other begins is much more ambiguous. Such ambiguity is similar to the ambiguity present when determining whether to apply a given code: those also often lack bright-line boundaries. However, when it comes to segmentation, the methodological implications are much more profound.

If segmentation is applied in a project, that is usually because the resulting segments fulfill a role in the analyses. For example, the segments may be required for counting code occurrences or co-occurrences in each segment because the aim is to conduct Epistemic Network Analysis. This requires the segmentation definitions that are used to closely align with the theory.

For example, if code co-occurrences are taken to imply some association between codes, which proximity of code co-occurrence does and does not count toward this inference is determined by the segmentation definition. If co-occurrence of two codes is taken to imply, for example, psychological proximity of the concepts the codes are meant to capture, this places constraints on the required proximity between the codes. If a person expresses two things in the same sentence, it is much more credible that this reveals something about how closely those two things are associated in that person’s internal representation of the world than if the two expressions are sentences or even paragraphs apart.
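To show how the segment definition enters such an analysis, here is a minimal sketch that counts, for every pair of codes, the number of segments in which both occur. The code names and the function count_cooccurrences are hypothetical; this is a plain tally, not an implementation of Epistemic Network Analysis.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(segments):
    """Count, per pair of codes, the number of segments containing both."""
    counts = Counter()
    for codes in segments:
        # Sort so that each pair is always stored in the same order.
        for pair in combinations(sorted(set(codes)), 2):
            counts[pair] += 1
    return counts


# Hypothetical coded segments: each inner list holds the codes in one segment.
sentence_segments = [
    ["stress", "sleep"],
    ["work"],
    ["stress", "work", "sleep"],
    [],
]
cooccurrences = count_cooccurrences(sentence_segments)
```

Whatever counts as "one segment" in the input fully determines which pairs are tallied, so the inference drawn from these counts is only as credible as the segment definition behind them.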

Whichever segment definitions are used, if those play a role in a project’s analysis and eventually play a part in the inferences that are drawn, it becomes important to pay close attention to your segment definitions. In some projects, using sentence-long segments may be justifiable but paragraphs are not; in other projects, paragraphs may be fine but if more than 15 minutes elapsed between two codes, considering them a co-occurrence may be untenable. What is reasonable will heavily depend on the substantive domain in which you aim to draw inferences and what is already known about the relevant mechanics. For example, when studying humans, research on short-term memory and attention processes can guide such decisions.

Because segment definitions play such a pivotal role in the inferential chain, it is worthwhile to consider how you will separate analysis results that reflect the phenomena you are interested in from analysis results that are an artefact of the segment definition you chose. For example, when analysing co-occurrences, if your segment definitions result in segments that are too large, many “co-occurrences” will not be indicative of the phenomena you mean to study.

This can also happen the other way around: if your segment definition is such that the segments are too small, expressions of your phenomena of interest will not manifest in code co-occurrences. You may still see only a part of the picture, but you may also fail to see anything of interest.

In both those situations, the inferential chain on which you rely when interpreting code frequencies and co-occurrences is broken. It is not clear whether your analysis results pertain to the phenomena of interest, or mostly reflect the artefacts of the model you applied.

You can explore how much effect your segment definitions have on your analysis results by conducting sensitivity analyses: comparing different segmentations and inspecting how this impacts your results. This will be discussed more in depth in the Sensitivity Analyses chapter. In any case, if you plan to use segmentation in your analyses, given the impact of your segment definition on the validity of your inferences, it is even more important to document your justifications well than it is for coding decisions.
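A very simple form of such a sensitivity analysis is sketched below: the same coded units are merged into segments of one, two, or three units, and the co-occurrence counts are recomputed for each segmentation. The data and the function names are hypothetical, and merging fixed numbers of consecutive units is only one of many possible ways to vary a segment definition.

```python
from collections import Counter
from itertools import combinations


def merge_segments(units, size):
    """Merge consecutive smallest units into segments of `size` units."""
    return [
        set().union(*units[i:i + size])
        for i in range(0, len(units), size)
    ]


def cooccurrence_counts(segments):
    """Count, per pair of codes, the number of segments containing both."""
    counts = Counter()
    for codes in segments:
        counts.update(combinations(sorted(codes), 2))
    return counts


# Hypothetical codes per smallest unit (for example, per sentence).
units = [{"stress"}, {"sleep"}, {"work"}, {"stress", "sleep"}, set(), {"work"}]

# Recompute the counts for segments of 1, 2, and 3 units.
sensitivity = {
    size: cooccurrence_counts(merge_segments(units, size))
    for size in (1, 2, 3)
}
```

In this toy example the pair "stress" and "work" never co-occurs with one-unit segments but does once segments grow, illustrating how results can shift with the segmentation alone.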

7.3 Justifying

The hardest coding decisions are the most interesting coding decisions. These decisions, where you take a decision about edge cases, are where you are challenged to thoroughly interrogate your understanding of your subject matter. To retain these insights for the benefit of other researchers as well as future you, it is important to document the justifications of your decisions.

Documenting the justifications for your decisions is especially important if you aim to work more inductively. The less your decisions are guided by a set of codes, descriptions, and coding instructions that was elaborated in advance, the more those are created during the coding phase. This means that the coding decisions are an important part of your analysis, and therefore need to be accessible to others if you aim to be transparent as a researcher (an imperative that is explicitly part of the UNESCO Open Science recommendation as well as codes of conduct for scientific research, such as the Dutch Code of Conduct for Scientific Research, which mandates Honesty, Transparency, Diligence, Independence, and Responsibility).