Chapter 10 The ROCK vocabulary
Across the many qualitative approaches, many different terms exist, and their meanings often partially overlap. The ROCK standard uses the terms listed below. After that list, we include definitions of ENA terms.
10.1 ROCK terms
- attribute
- A property or characteristic of a class instance (e.g. a participant), for example demographic variables such as an interviewee’s age, gender, or education level. Attributes can also be characteristics of instances of classes that are not persons, such as the interview venue (an attribute could be whether it was crowded or not) or the interviewer (an attribute could be the interviewer’s age).
- case
- A data provider, such as a participant in a study. Cases are a class, with every individual case being a class instance.
- class instance
- A data provider, context, or other description of data collection. In interview studies, a class instance is usually a specific person. Assigning utterances to class instances is a means to efficiently associate attributes with many utterances in one go. Class instances can also be used to associate other information with many utterances, such as the interviewer, the place where an interview took place, or the time of day. Examples of class instances are “participant 4” to identify a person, “14:00” to identify the time of an interview, and “meeting room B” to identify the location of an interview.
- class instance identifier
- The unique identifier for a class instance. Examples of class instance identifiers in the ROCK are [[cid:participant_A]] and [[timeId:morning]], where the first part denotes the class (e.g. cid stands for ‘case identifier’) and the second part identifies the instance of that class (e.g. participant_A can refer to a participant in a study). Class instance identifiers can be ephemeral or persistent: if persistent, they apply not only to the utterance where they occur, but also to all following utterances in the same source until a new class instance identifier for the same class is encountered.
- code
- A symbol that represents data fragments that are somehow similar, that similarity being the essence of the relevant code. Such a code usually represents a concept. Codes can vary from simple descriptions, for example to denote that the coded fragment concerns a topic such as “leisure activities,” to complex constructs, for example to denote that the coded fragment likely expresses psychological aspects that fall within the definition of a construct called “perceived autonomy.” Codes are represented in a source using a code identifier. In addition to that (machine-readable) identifier, codes commonly have a (human-readable) short label, a longer description, and coding instructions, ideally including examples.
- comment
- In the ROCK standard, all utterances starting with a hash, #, are considered comments and are ignored. Note that the hash may not be preceded by whitespace: it has to be the first character in the utterance (usually, given the default utterance marker, this means the first character on the line).
- code delimiter
- A combination of two text strings that enclose code identifiers in coded sources to indicate which utterances are coded with which codes. In the ROCK standard (and in the default rock R package settings), the code delimiters are pairs of square brackets: [[ and ]].
- code identifier
- A brief identifier that uniquely identifies a code. Code identifiers are used to represent the corresponding code when coding sources: they are then enclosed in the code delimiters ([[ and ]] by default, e.g. [[code]]). Note the difference with coder identifiers, which represent coders instead of codes; and remember that like all identifiers, code identifiers may only contain Latin letters, Arabic numerals, and underscores (and have to start with a letter).
- code structure
- A set of codes in a given organizational mode. Three common organizational modes are a flat code structure (i.e. no structure), a hierarchy, and a network.
- code tree
- If a hierarchical organizational mode is used, the code tree represents the hierarchy of codes used to code one or more sources.
- code value
- A code value is a way to efficiently attach values to codes. Code values consist of a (potentially hierarchically marked) code identifier immediately followed by two pipe characters (||) and the relevant value, delimited with the code delimiters. For example, code values can be used to code the intensity of a statement using [[intensity||low]], or the valence of a statement using [[valence||neutral]]. When parsing the coded sources into the qualitative data table, where regular codes yield 1s denoting that a given utterance was coded with that code, code values yield the specified value in the corresponding column. Note that this functionality is somewhat similar to using attributes to efficiently attach information to utterances. The difference is that attributes are applied to all utterances coded with the corresponding class instance identifier, and the most commonly used class instance identifiers (case identifiers, coder identifiers, and item identifiers) are set as persistent identifiers, which means they are normally applied automatically to many utterances. Code values, on the other hand, can be used to attach values to single utterances.
- coder identifier
- A unique identifier for each coder. These are typically used when working with multiple independent coders. By default, when used in a source, they are delimited using the code delimiters ([[ and ]] in the ROCK standard) and are preceded by coderId=. For example, the coder identifier “coder_1” would be used in a source as [[coderId=coder_1]]. Note the difference with code identifiers, which represent codes instead of coders; and remember that like all identifiers, coder identifiers may only contain Latin letters, Arabic numerals, and underscores (and have to start with a letter).
- data fragment
- A part of a source (one or more consecutive characters, such as one or more words, sentences, or paragraphs). A data fragment can be considered somewhat akin to a ‘data point’ in a tabular data set, except that data fragments have no set size, and in that sense are recursive (i.e. data fragments contain other data fragments, e.g. paragraphs contain sentences). Data fragments are useful to refer to parts of a qualitative dataset completely independently of any segmentation that may have been applied.
- hierarchy marker
- A symbol that represents hierarchical levels and is used to separate code identifiers when they are added to a source. In the ROCK standard (and the default R rock package settings), the hierarchy marker is the greater-than symbol, >.
- identifier
- A character sequence that uniquely identifies something. Identifiers always have to start with a lowercase or uppercase Latin letter (a-z or A-Z) and may only contain Latin letters, Arabic numerals (0-9), and underscores (_). The rock R package will often use identifiers as variable names: for example, code identifiers become variable names in the qualitative data table. Examples of well-known identifiers are Uniform Resource Locators (URLs, commonly used for websites); Digital Object Identifiers (DOIs, commonly used for scientific articles); and the International Standard Book Number (ISBN, commonly used for books). The ROCK implements a way to generate and specify identifiers for utterances and a way to add other identifiers to a source, such as for codes and for class instances. In the ROCK, when the term “identifier” is used without specifying what type of identifier is meant, usually class instance identifiers are meant, such as identifiers for cases, coders, or items.
- nesting marker
- A symbol that represents nesting (or threading) in a source. In the ROCK standard (and the default R rock package settings), the nesting marker is the tilde, ~, and has to be placed as the first character in an utterance (disregarding whitespace and the utterance identifier). For example, to denote that an utterance is a response to the preceding utterance, start the second utterance with a tilde (e.g. [[uid:xxx]] ~ This is a response). If the following utterance also starts with a tilde, it is considered to also be a response to the first utterance; if it starts with two tildes, the nesting deepens and that third utterance is considered to be a response to the second utterance (e.g. [[uid:xxx]] ~~ This is a response to the response).
- persistent identifiers
- Whereas regular identifiers are only applied to the utterance where they occur, persistent identifiers are automatically applied to all following utterances until the end of the source or until a persistent identifier for another instance of the same class is encountered. For example, by default, case identifiers, coder identifiers, and item identifiers are configured to be persistent. This means that if the case identifier for a given participant is applied to an utterance, all subsequent utterances will be considered to belong to that same participant until another case identifier is encountered.
- section
- A delimited fragment of a source, also called segment or stanza.
- section break
- A break between two sections. In other words, section breaks split up sources into sections. The ROCK standard allows parallel use of multiple types of section breaks: for example, one type of section break can indicate paragraph breaks, whereas another type can indicate where an interviewer asks a new question, and yet another type can indicate where there is a turn of talk between participants in a discussion. In the R rock package, by default anything matching the regular expression ---<<[a-zA-Z0-9_]+>>--- is considered a section break, where the sequence between the less-than signs and the greater-than signs means “one or more Latin letters, Arabic numerals, or underscores” (i.e. the same characters that are allowed in identifiers).
- section break identifier
- A sequence of characters that represents a section break.
- segment
- See section.
- source
- A plain text file that describes or captures a bit of reality. The most common sources in research with humans (e.g. anthropology and psychology) are interview transcripts, but sources can also be internet content, archive materials, meeting minutes, descriptions of photographs, or timestamped descriptions of video material. Note that a source does not necessarily correspond one-to-one to any sensible delineation of reality. Often, each separate source will hold the data from one interview, but this is not necessary. Many interviews can also be combined in one source, and one interview can also be distributed over many sources. To distinguish between, for example, interviews, use class instances. A source is simply the text file holding one or more utterances.
- utterance
- The shortest codable fragment of a source. Utterances are separated by utterance markers. In the ROCK standard (and the default R rock package settings), these are line breaks (“\n”), which means that each utterance is on a line of its own in a source. Utterances will often (but not necessarily) correspond to sentences. Other examples of utterances are words, clauses, phrases, or other constituents, paragraphs, or social media posts.
- utterance marker
- The sequence of characters that delimits utterances. In the ROCK standard (and the default R rock package settings), these are line breaks (“\n”).
- utterance identifier
- A unique identifier for an utterance. These are used to match utterances when multiple independent coders are used. By default, when used in a source, they are delimited using the code delimiters ([[ and ]] in the ROCK standard) and are preceded by uid=. For example, the utterance identifier (“UID”) “7fgglz2n” would be used in a source as [[uid=7fgglz2n]]. The R rock package has functions to automatically attach UIDs to sources. UIDs are normally placed at the beginning of each utterance.
- whitespace
- Invisible characters such as spaces and tabs.
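Several of the conventions defined above (identifiers, comments, code delimiters, code values, nesting markers, persistent class instance identifiers, and section breaks) can be illustrated with a small parser. The following is a minimal sketch in Python, not the rock R package's implementation. The function name parse_source, the PERSISTENT set with its class names, and the example source are hypothetical. It assumes the default delimiters, accepts either a colon or an equals sign between class and instance (both forms appear in the examples above), and skips empty lines.

```python
import re

# Identifiers: start with a Latin letter, then letters, numerals, underscores.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# Default section break pattern as quoted in the glossary.
SECTION_BREAK = re.compile(r"---<<[a-zA-Z0-9_]+>>---")
# Anything enclosed in the default code delimiters, [[ and ]].
DELIMITED = re.compile(r"\[\[(.*?)\]\]")
# Assumed class names for case, coder, and item identifiers (hypothetical).
PERSISTENT = {"cid", "coderId", "itemId"}


def parse_source(text):
    """Turn a coded source into one dict per utterance (illustrative only)."""
    rows = []
    persistent = {}  # most recent instance seen for each persistent class
    for line in text.split("\n"):  # default utterance marker: line break
        # Comments: the hash must be the very first character.
        # Skipping empty lines is an assumption of this sketch.
        if line.startswith("#") or not line.strip():
            continue
        row = {"uid": None, "codes": [], "values": {},
               "classes": dict(persistent)}
        for content in DELIMITED.findall(line):
            if "||" in content:  # code value, e.g. [[intensity||low]]
                code, value = content.split("||", 1)
                row["values"][code] = value
            elif ":" in content or "=" in content:  # class instance identifier
                cls, instance = re.split(r"[:=]", content, maxsplit=1)
                if cls == "uid":
                    row["uid"] = instance
                else:
                    row["classes"][cls] = instance
                    if cls in PERSISTENT:
                        persistent[cls] = instance  # carries over to later rows
            else:  # regular code, e.g. [[leisure]]
                row["codes"].append(content)
        # Nesting level: number of leading tildes once identifiers are removed.
        text_only = DELIMITED.sub("", line).strip()
        without_tildes = text_only.lstrip("~")
        row["nesting"] = len(text_only) - len(without_tildes)
        row["text"] = without_tildes.strip()
        rows.append(row)
    return rows


example_source = "\n".join([
    "# A comment, which is ignored",
    "[[cid:participant_A]]",
    "[[uid:aaa]] I like hiking. [[leisure]]",
    "[[uid:bbb]] ~ Me too! [[leisure]] [[intensity||low]]",
    "[[cid:participant_B]] [[uid:ccc]] I prefer reading.",
])
rows = parse_source(example_source)
```

In this example, the utterance with UID bbb inherits the persistent case identifier participant_A, has nesting level 1, and carries the code value low for intensity; the last utterance switches to participant_B. IDENTIFIER.fullmatch("coder_1") succeeds whereas IDENTIFIER.fullmatch("1coder") does not, and SECTION_BREAK.fullmatch("---<<paragraph_break>>---") matches.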
10.2 ENA terms
- co-occurrence
- The occurrence of two or more codes attached to the same utterance.
- edge
- In an ENA network, an edge represents the normalized co-occurrence of codes.
- node
- In an ENA network, a node represents a code.
- normalization
- Normalization refers to the division of a vector of co-occurrences by the number of utterances … ? Or does it?
- vector
- A vector is a sequence of one or more data points (e.g. numbers), for example a series of 0s and 1s denoting whether each of a series of utterances was coded with a given code (in which case the vector length is equal to the number of utterances), or a series of frequencies denoting the number of co-occurrences of two codes in each of a set of sources (in which case the vector length is equal to the number of sources).
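The two kinds of vector described above, and the notion of co-occurrence they rely on, can be made concrete with a short example. This is a minimal sketch in Python with made-up data; the function names code_vector and cooccurrence_vector are hypothetical and not part of any ENA or ROCK software. Normalization is deliberately not shown.

```python
# Hypothetical coded utterances: (source name, codes attached to the utterance).
utterances = [
    ("interview_1", {"leisure", "autonomy"}),
    ("interview_1", {"leisure"}),
    ("interview_1", {"leisure", "autonomy", "work"}),
    ("interview_2", {"autonomy", "work"}),
]


def code_vector(code, utterances):
    """One 0/1 entry per utterance: was it coded with `code`?"""
    return [int(code in codes) for _, codes in utterances]


def cooccurrence_vector(code_a, code_b, utterances):
    """One count per source: utterances coded with both codes."""
    sources = sorted({source for source, _ in utterances})
    return [
        sum(1 for source, codes in utterances
            if source == name and code_a in codes and code_b in codes)
        for name in sources
    ]


leisure = code_vector("leisure", utterances)
leisure_autonomy = cooccurrence_vector("leisure", "autonomy", utterances)
```

Here leisure has one entry per utterance (four in total), while leisure_autonomy has one entry per source (two in total): the two codes co-occur in two utterances of the first interview and in none of the second.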