Chapter 16 A Beginner’s Guide to the Reproducible Open Coding Kit
16.1 Welcome! Step-by-step, no prior knowledge of R needed!
Hello! If you are here, it probably means that you are interested in conducting transparent and rigorous qualitative research and/or you would like to prepare your data for Epistemic Network Analysis. The Reproducible Open Coding Kit (ROCK) can help you with that! If you have looked into what the ROCK is, you might be intimidated by the terms/expressions it uses or the software and processes it requires. Have no fear, the ROCK can be easily used by people with no prior knowledge of R, and we will prove it to you!
The following is a step-by-step account of how to work with the ROCK for a qualitative or a quantitative ethnographic research project using the rock R package. We disclose these steps assuming that you would like to make your process as transparent as possible, facilitating making your project public once (and if) you are ready to do so. Facilitating Open Science (OS) is something we care about deeply: making our processes public (including, but not limited to operationalization, data, full results, etc.) benefits science in general (as we can learn much from each other’s best practices and challenges) and increases reliability, reproducibility, and interoperability (read more about OS here). OS is not a dichotomy (open or closed), but a spectrum; so, if your project requires the data to remain private or your analysis cannot be fully disclosed, the ROCK package can still help you perform various tasks within your project and disclose as much as ethically possible. (Read more about ROCK functionality here).
Using the ROCK necessitates some preliminary steps, which will be covered in the first section, Setting up the Basics, requiring you to download some software and create a repository. The section on Data Preparation will help you specify your directory, add your sources and attributes, and designate your persistent identifiers. The Coding and Segmentation section will guide you through the process of coding and segmentation with iROCK and with automated coding. The last section is Aggregation and Analyses, which helps you aggregate coded data and perform various analyses.
For the purposes of this tutorial, we assume that you have already decided on a research question, have agreed on a research design, and have specified the operationalization of your project. We urge you to fill out a preregistration form for your project before you begin this tutorial (QE-specific preregistration form draft coming soon!). Even if you do not submit it, this form will help you consider vital aspects of your initiative and help you discuss those with team members. In the tutorial, we presume you have the following aspects of your project operationalized/collected/developed and ready to employ:
- Sources (raw data in one or more files, anonymized and ready to be coded)
- Metadata/Attributes (data about your data; that is, collected variables on your data providers or sources)
- Codes (concepts or expressions with which you will be coding your data; these should already be developed provided you are using deductive coding to code your data)
- Segmentation (if you are segmenting your sources, we assume you have ideas on how you will be designating meaningful segments in your data)
If you are not familiar with basic ROCK terminology, please refer to this chapter of the ROCK Book before you begin this step-by-step process. Please note that visual aids in this tutorial were generated from different projects; these will be updated with a more coherent set of visuals later. We would like to thank Judit Nyirő for helping us create this tutorial and checking our instructions for clarity at every step.
Okay, if you are ready, let’s begin setting things up!
16.2 Setting up the Basics
16.2.1 Install R stuff
Let’s begin setting up the basics by downloading and installing R and RStudio! You will need both of these to use the ROCK and perform various operations.
16.2.1.2 Download RStudio
What is R? R is a programming language and software environment for statistical computing and graphics. Many programmers around the world use R to create software packages, which are then made freely available for common use, mainly via a platform called CRAN, the Comprehensive R Archive Network. CRAN is a network of servers around the world that store identical, up-to-date versions of R code and documentation. If you want to think of all this metaphorically, you can imagine R as a natural, spoken language like English. People around the world use words in English (elements in the R code) to create shorter or longer strings of sentences (pieces of software). These sentences can be used by researchers like you to e.g. perform analyses on their data, and they can be used by other programmers to modify, expand, and build on these sentences to create new pieces of “writing” in R (more complex programs).
What is RStudio? RStudio wraps around R and makes life easier. It provides one interface where you can interact with R, work with analysis scripts, view plots and other results, and interact with a version control system called Git that we’ll discuss later. RStudio also introduces the concept of projects, which are a useful way to bind together a set of files.
When you start using RStudio on MacOS, you may see these windows pop up. Feel free to click on “install” in the bottom window to get all the tools you need. The top window is a personal preference you should consider and then hit “No” or “Yes” accordingly.
16.2.2 Create a Gitlab account and project
We assume you will be using some kind of repository to store and share your materials. We will demonstrate the process through GitLab here.
Git is an extremely powerful and relatively user-friendly version control system. You can think of it as the “tracked changes” functionality in word processor software such as Microsoft Word or LibreOffice Writer, but on steroids, and for all files in your project. It was originally developed to facilitate collaborating with many people on a set of files, simultaneously, without running the risk of accidentally losing anything. Like R, Git itself is quite bare, and so many people interact with it through tools that augment it. One such tool is RStudio. Another class of tools is online Git repository managers.
Two popular examples of such online repository managers are GitHub and GitLab. The latter is Open Source, while the former is not; GitLab, therefore, is consistent with the Open Science foundations upon which the ROCK was built, and is what we will use in this tutorial.
The Git integration in RStudio means that you will be able to make modifications to the files in your directory through RStudio (e.g. perform analyses), which are then easily “pushed” to the central GitLab repository from RStudio. You are in essence performing actions on your PC, which can then be “synced” with your (public) repository on GitLab.
First, please create a GitLab account, if you don’t already have one. You can sign up at https://gitlab.com/users/sign_up and then go to “Projects” and “Your projects”, and click on “New project”.
Fill in the details as instructed by GitLab and according to your preferences, and select the option “Initialize repository with a README”. Where you see “visibility level”, please note that the default is “Private”; switch this to “Public” (unless you don’t want to publish the repository, or want to publish it later on). Then click “Create project”.
16.2.3 Create an R project
Open RStudio, click on “File” and then “New project”. Click on “Version control” and then “Git”.
Now you will see a field at the top in which you can enter the Git HTTPS cloning URL of your Git project. To find it, you will need to go to your GitLab project’s page.
Go to GitLab, find your project, and click on “Clone”. Click on the “copy” icon to the right of “Clone with HTTPS” or just highlight the URL and press Ctrl+C to copy it.
Take this URL back to RStudio and enter it into the field “Repository URL” (you can just hit Ctrl+V). The field below it, called “Project directory name”, will be autofilled, either upon pasting the URL, or as you hit Tab or click the next field. The last thing you need to do is designate the main directory in which this particular project should become a subdirectory. Note that Git will create a directory with the “Project directory name”. We suggest you have a main directory for all your projects (e.g. “research” or “studies”); in that case, select that directory now and RStudio will create a subdirectory within it. When finished, click “Create project”.
Cloning is always possible for public projects, but for private projects, you will need to authenticate to the GitLab server. RStudio will usually present a dialog requesting the necessary information. It will look something like this:
It will ask you for your GitLab username and password.
If you are not prompted, you may receive an error message of some kind, asking you to specify these. In this case, fill in your information in the two commands below, then copy-paste them into the “Terminal” tab in RStudio (next to the “Console” tab; these are Git commands, not R commands) and hit enter:
git config --global user.name "YourUserNameHere"
git config --global user.email "YourEmail@Here"
This should prompt RStudio to ask for your password if you retry. If it does not, you may have specified an incorrect password and will need to reset this. The way this works depends on which credential manager you use – see https://psy-ops.com/git-basics#git-unsetting-password for some pointers.
Congratulations, you now have an R Project, which is mostly empty right now, but we will start populating it soon!
16.2.4 Copy the Empty ROCK project and personalize it
The easiest way to prepare for using the ROCK is by copying the “Empty ROCK project” repository.
Go here: https://gitlab.com/psy-ops/empty-rock-project
Download as a zip file to anywhere on your PC, and then unzip it.
Place the contents of the unzipped folder into the directory you chose for the R project that you created when you cloned your GitLab repository. Please note that if you ever move this directory, you will need to open the project once from its new location (e.g. by double-clicking the RStudio project file) before you can select it again from within RStudio. When copying these files, your computer will probably warn you that one or more files already exist. You can safely skip these. For example, you already created an empty “README.md” file for your project, so you won’t want to overwrite that with the Empty ROCK project README file.
Below is the main directory of the Empty ROCK project; feel free to delete the “empty-rock-project.RProj” file (the RStudio project; as you have your own RStudio project, completed in the previous step).
The “data” directory will be where your raw and coded data go; the “results” directory is for your intermediate and final results. The directory called “scripts” contains the R script(s) you will be using and modifying throughout your project. The “README.md” file contains the content that appears on the main page of the project in GitLab (if you scroll down below the overview with all the files in your repo). The three files in red all beginning with “git” will be discussed a bit later (xx).
In the “scripts” directory, rename the file according to your preferences. We suggest naming it the same as or resembling the entire RStudio project.
Now, in the main directory, double click on the RStudio project: this will open it in RStudio. Take a look at the RStudio interface.
The pane on the left is called the “console”; it enables you to communicate with the R software environment. All panes are multifunctional: for example, in the bottom-right pane you can see there are tabs, usually with the “Files” tab opened. This shows you where you currently are (“D:/Git projects/onco comm ena/scripts”; this is where we have placed our Git repo, and in it, the Empty ROCK project contents). In the bottom-right pane, if we click on the “scripts” directory and then on the so-called R Markdown file (with extension .Rmd; note that depending on your computer’s settings, file extensions may be hidden from you), which contains our R script, it will open in a pane above the console.
The script (Rmd file) contains some explanations and commands (instructions for R to do things with your data) that you will most likely need for your project. If you scroll down or drag the divider between the script and the console down, you will see more of the script file.
An R Markdown file combines chunks of R code (normally stored in R script files with the extension “.R”) with text in the markdown format (normally stored in plain text files with the extension “.md”). In this file, the text that is already present is formatted using that markdown format and supplies you with background and instructions on the ROCK, and is interspersed with R chunks that contain useful commands. The markdown format uses hashtags (“#”) for headings: one hashtag indicates a level 1 heading, two hashtags a level 2 heading, and so on. R commands look different; they appear in a grey box, as shown below (between lines 35-41). Also notice that there is a small green “play” sign in the top right corner of the grey box; clicking on this executes all the commands in that particular grey chunk. We will come back to more features of the script and RStudio later.
16.2.5 Change the basic info in the script
As a final step for now, rewrite the basic information at the top to reflect the details of your project. The title and author can be changed at any time, so do not worry about making these final. This front-matter is formatted in a language called YAML, which has a special format, aspects of which we will be addressing later on. For now, it is important to remember that spaces, hashtags, line breaks, etc. are there for a reason. Try to only re-write or delete characters that you are sure you will not need in the future, and avoid changing the indentation. RStudio uses colors to designate field names and contents, and so should hint at accidentally introduced errors.
This is basically what we will be doing for the rest of the project as well. This basic script contains all the instructions and commands you will need to use the ROCK. You will have to change some things, though. These changes will have to be made directly in this script file by writing over some things within commands. Don’t worry, we will go into detail on this for each command. For now, just familiarize yourself with how the interface looks and what the different panes contain.
Click “Save”; it is at the top left corner of the pane.
Just to summarize, this R Markdown file contains text in three different languages:
- The front-matter uses “YAML”;
- The text uses Markdown;
- The code chunks use R
It is important to realize that both these languages themselves and the “costs” of deviating from those languages’ conventions differ for each type: for example, errors in the YAML will not interfere with running the R code chunks, but will prevent you from “rendering” the complete file (we will get back to this later); errors in the markdown will in most cases simply ruin the layout of your text, but have no serious consequences; and errors in R code will prevent RStudio from executing your analyses.
In the remainder of this tutorial, we will explain the bare minimum of what you will need to know of these conventions, but it is easy to find a host of tutorials on each of these three (e.g. see xx) should you wish to learn more.
16.2.6 Download the ROCK R Package
Open the R project and click on the script file in the lower right pane. You might notice that RStudio has a keen eye for R packages that are in a script and yet not installed on your computer. RStudio will most likely tell you this with a thin yellow ribbon that pops up above the opened script. We suggest you click on “Install” and let RStudio install the necessary packages for you.
Generally speaking, certain actions require certain packages, and packages usually build on other packages. RStudio is very smart and will let you know if you need to install a package that you haven’t yet, so whenever you see the yellow ribbon, trust it, click install.
Next, we will install the R package rock. Depending on which version you want to install, copy one of the following three commands:
install.packages("rock");
remotes::install_gitlab("r-packages/rock");
remotes::install_gitlab("r-packages/rock@dev");
(Commands and their context can be found here; note that the latter two require you to have the remotes package installed, which you can install using the install.packages() function used on the first of these three lines.)
Then paste the command into the console in RStudio (bottom left pane) and hit enter.
You might see a list of updates. Choose your preferences to update any or all packages shown below. Then hit enter again.
16.2.7 Update the ROCK R Package
Make sure you have the latest versions of the necessary packages. To do that, find “## Basic setup in R” (here, i.e. in the figure below, on lines 58-82) in the script and run the chunk by clicking on the little green play button on the right. Alternatively, you can highlight the commands and hit Ctrl+Enter. This will run the updates.
After running the chunk, you should see this in the console:
Great, you are now up-to-date! Please note that since the ROCK is being developed continuously, you may need to update it every once in a while, especially when using a new command.
16.2.8 First “push” to GitLab
Open your RStudio project, if you do not presently have it opened. We have already made some changes at the top of our script (title, author). Whenever you make changes on your PC, whether through RStudio or any other program, RStudio will see which files have been modified and what those modifications were exactly; these will appear in the top right pane in the “Git” tab. For example:
Each time you make changes, the names of the files that were modified will appear in the top right pane (here: “OncoComm_ena” in folder “scripts”). Remember, RStudio communicates with the GitLab repository. In order to allow these changes to appear in the repository, you first need to “Commit” these changes, then “Push” them to GitLab. To put it another way: there is now a channel between RStudio (your PC) and GitLab (your repository), but the floodgate is closed for your benefit. Once you are sure the modifications you made to your files are ready, you can open the floodgate by “pushing” to GitLab. This ensures that you have control over each version of your files.
Make sure you are on the “Git” tab of the upper right pane. Click the small box directly to the left of the name of the modified file, then click “Commit”. A window will pop up, letting you review the changes that you’ve made to each file. When ready, enter a short message in the “Commit message” box on the right, then click “Commit”.
Following this, you will see a record of the changes that you have accepted appearing in a new, overlaid pop-up window. Click “Close” and then, on the “Review Changes” pop-up window (this should be directly underneath the “Git Commit” pop-up that you just closed), click on “Push”. This will actually push the changes you have accepted to your repository; don’t worry if there is nothing in that window, you still need to click on “Push”. You can now click “Close” for both pop-up windows and return to the main RStudio interface.
Congratulations! You have now pushed your changes to your repository! You can go to GitLab, refresh the page, and make sure your changes are actually there.
16.2.9 Download Notepad++
You will need a text editor, so we suggest you download Notepad++, which works very well with multiple file types and encodings. And its logo is super cute. Download and install the most recent version here: https://notepad-plus-plus.org/downloads/
This is what the basic interface looks like. For now, you do not need to know any of its complex functions, just be able to use it for viewing and basic editing.
MAC USERS: Notepad++ only runs on Windows, so it is not available for MacOS. Here are some alternatives for you:
- Visual Studio Code: https://code.visualstudio.com/docs/?dv=osx
- Atom: https://atom.io/
- Coda: https://panic.com/coda/
This is what the Visual Studio Code interface looks like:
Great, you are all set in terms of basics!
16.3 Data Preparation
16.3.1 Take a closer look at your directory
Open your semi-personalized Empty ROCK directory, which should now have an R project file customized according to your preferences. Please feel free to add any directories you already know you will need during the project, e.g. “conceptualization”, “operationalization”, “code development”, “manuscripts”.
Now click on the Data folder. You will see that it already contains sub-folders and those contain files.
The term “sources” signifies the files you plan to code (or have already coded), and by “code” here we mean qualitative (inductive or deductive) coding of text files for research purposes. The “Raw sources” folder in the directory “Data” is to house your raw data, e.g. interview transcripts, focus group transcripts, transcribed audio diaries.
You will notice that we have put some sample text files here for you. Feel free to delete those as you replace the folder content with your own raw data (see step: “Place your raw data into the appropriate directory”). You will also see a file called “.gitkeep”. Each folder in your main directory (including the main directory) can be told to sync to your Git repository or not. Git does not track empty folders, so a placeholder file called “.gitkeep” in a folder makes sure that the folder itself is synced to GitLab when you “push” in RStudio (see: xx), even while it holds nothing else. You may not want everything on your PC to sync to GitLab, though. If you do not want a folder to sync to your repository, then open the file called “.gitignore” in the main directory and add the name of the folder you wish to exclude from syncing.
Add the folder name and then a forward slash (/) to indicate that Git should ignore the entire folder. Here we are telling Git to not sync folders called “manuscripts”, “private”, and “waste”. This way, even when we have set our GitLab repository visibility level to “Public”, these three folders will not appear in the repository at all (but you will be able to see and access them on your PC).
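As a minimal sketch of this step, the R snippet below appends three ignore rules (folder name plus a trailing forward slash) to a “.gitignore” file. It works in a temporary directory, so nothing in your real project is touched. The helper name add_ignore_rules is hypothetical and not part of the rock package; typing the same three lines into “.gitignore” by hand achieves exactly the same result.

```r
# Hypothetical helper: append folder rules to a .gitignore file,
# skipping rules that are already present.
add_ignore_rules <- function(gitignore_path, folders) {
  existing <- if (file.exists(gitignore_path)) {
    readLines(gitignore_path)
  } else {
    character(0)
  }
  rules <- paste0(folders, "/")   # trailing forward slash = the whole folder
  new_rules <- setdiff(rules, existing)
  writeLines(c(existing, new_rules), gitignore_path)
  invisible(new_rules)
}

# Demonstration in a temporary directory (not your real project)
demo_dir <- tempfile("rock-demo-")
dir.create(demo_dir)
gitignore <- file.path(demo_dir, ".gitignore")

add_ignore_rules(gitignore, c("manuscripts", "private", "waste"))
print(readLines(gitignore))
```

Running the helper a second time with the same folder names leaves the file unchanged, so it is safe to re-run.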
MAC USERS: Some Mac users may not see the files whose names begin with “.git”; don’t worry, they are still there. You can either download a piece of software that helps your OS see these files, or use RStudio to handle them. If you look at your directory in RStudio, you should be able to see the .git files as well.
Example of directory contents on MacOS:
16.3.2 Place your raw data into the appropriate directory
Replace the files in “Raw sources” with the files that constitute the sources in your project. Make sure to save these as plain text files with the .txt extension.
If you have your transcripts in Word, then you can just “Save as” a plain text file. As you convert, depending on the language of the transcript, you might have to change the encoding. Choose “Unicode (UTF-8)” in the “Other encoding” section. After double-clicking on that, you should be good to go.
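If a transcript has already been saved as plain text in another encoding, it can also be re-saved as UTF-8 from R. The sketch below is only an illustration: the helper name convert_to_utf8 is hypothetical, and you have to supply the encoding the file is currently in (here assumed to be “latin1”), because R cannot reliably guess it. The demonstration writes a tiny latin1 file to a temporary location rather than touching your data.

```r
# Hypothetical helper: re-save a plain text file as UTF-8.
# 'from' is the encoding the file is currently stored in (an assumption
# that you must supply yourself).
convert_to_utf8 <- function(input_path, output_path, from = "latin1") {
  bytes <- readBin(input_path, what = "raw", n = file.size(input_path))
  text <- rawToChar(bytes)
  utf8 <- iconv(text, from = from, to = "UTF-8")
  if (is.na(utf8)) {
    stop("Conversion failed; is '", from, "' the right encoding?")
  }
  writeBin(charToRaw(utf8), output_path)
  invisible(output_path)
}

# Demonstration with temporary files: the word "café" plus a newline,
# stored as latin1 bytes (the accented letter is the single byte e9)
src <- tempfile(fileext = ".txt")
dst <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), src)

convert_to_utf8(src, dst, from = "latin1")
print(readBin(dst, what = "raw", n = 10))  # the accented letter is now two bytes: c3 a9
```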
You will have the option to code your sources manually (i.e. code texts by hand or with iROCK) or autocode them (automated coding with ROCK); note that the corresponding folders are already set up for you. Automated coding may have several rounds; the directory structure is set up so that coded sources from multiple rounds can be placed into separate directories. See section “xx” for more on autocoding.
Please note that a file called “attributes” is also present in your “Data” folder. This file will contain your metadata and attributes, characteristics about your sources (files to code, or coded files), your cases (data providers), or any other aspect of your data that is relevant to your research project. See step “Designate your attributes” for more on attributes.
16.3.3 Designate your attributes
Open the file called “attributes.rock” in the “Data” directory with Notepad++ (right click on file and “Edit with Notepad++”).
The top line explains that you can have your attributes listed in the source they pertain to or have them all in one file. We will go through the latter version here. (Please note that because the top line begins with a hashtag, it is regarded as a comment in R and thus can stay in the document without interfering with any future operations.)
After you have compiled the attributes (or metadata) you wish to include in your analyses, you will need to convert that information so that the rock package can understand it. For this conversion, it is crucial that you keep the formatting you see in the “attributes.rock” document you have opened.
You can see that the format begins with three dashes, then a line break, the expression “ROCK_attributes:”, then a line break. Next, we have one dash after two spaces; in the next line we begin the actual list of attributes, which, in the example, are: caseId, artistName, songName, and year. Each attribute is preceded by four spaces. Two spaces followed by a dash separate cases, and after the last case, the list of attributes is finalized with three dashes.
Note that the values for artistName and songName are in quotation marks, while numbers (e.g. year) are not. Please keep this format, while entering your own attributes and their values. Also, please remember that attributes must correspond across cases if you want to aggregate and compare them.
Here is another example with the same attributes but different values:
The dash separating cases can receive a line to itself or be on the same line as the subsequent case. The picture on the right shows you characters that are usually not displayed but are there nonetheless (spaces and line breaks) to help you see the formatting in entirety.
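To make the layout described in this step concrete, here is a minimal sketch in R that prints an attributes block in that format (dash on its own line, four spaces before each attribute, text values in quotation marks, numbers without). The helper name format_rock_attributes is hypothetical and not part of the rock package, and the artist and song values are made-up placeholders, not data from any project.

```r
# Hypothetical helper: format a list of cases as a ROCK attributes block.
format_rock_attributes <- function(cases) {
  # Numbers are written as-is; text values are wrapped in quotation marks
  format_value <- function(v) {
    if (is.numeric(v)) as.character(v) else paste0('"', v, '"')
  }
  case_lines <- unlist(lapply(cases, function(case) {
    c("  -",                                   # two spaces and a dash start a case
      paste0("    ", names(case), ": ",        # four spaces before each attribute
             vapply(case, format_value, character(1))))
  }))
  c("---", "ROCK_attributes:", case_lines, "---")
}

# Made-up placeholder values, for illustration only
cases <- list(
  list(caseId = 1, artistName = "Example Artist A",
       songName = "Example Song A", year = 2001),
  list(caseId = 2, artistName = "Example Artist B",
       songName = "Example Song B", year = 2002)
)

attribute_lines <- format_rock_attributes(cases)
cat(attribute_lines, sep = "\n")
```

The printed text can be compared against your own “attributes.rock” file to check the dashes and indentation.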
16.3.4 Designate your persistent IDs
In this step, we will identify persistent IDs in your project and let ROCK know about them. Persistent identifiers are codes that repeat for every utterance until told otherwise: once they are applied to an utterance in a source, the rock package will automatically apply them to all subsequent utterances in that source. A good example of a persistent ID is “case”, which for an interview conducted with a single participant would persist over the entire transcript; for focus groups, case IDs would indicate when a speaker changes.
Let’s take the example of working with individual interviews; we want to indicate for every utterance in the transcript that it was spoken by the participant. Instead of having to code every single utterance in an interview with e.g. “caseID: 2”, you can tell ROCK to consider this code as repeating up until when it encounters another such code. You may have several cases contributing to one source, such as in a focus group discussion. Persistent IDs in this instance would be useful as well, because you only need to indicate the participant (caseID) when there is a change in speaker.
Most commonly, persistent IDs will be caseIDs and various forms of mid-level segmentation.
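The idea of “persisting” can be sketched in a few lines of base R. This is only an illustration of the concept, not how the rock package implements it; the function name carry_forward and the example values are hypothetical. Each element stands for one utterance, and NA means that no case ID was coded on that utterance.

```r
# Hypothetical illustration of persistent identifiers: an ID applies to
# every following utterance until a new ID is encountered.
carry_forward <- function(ids) {
  current <- NA_character_
  for (i in seq_along(ids)) {
    if (is.na(ids[i])) {
      ids[i] <- current      # no ID coded here: reuse the last one seen
    } else {
      current <- ids[i]      # a new ID was coded: remember it
    }
  }
  ids
}

# One entry per utterance in a (made-up) focus group transcript
coded_ids <- c("1", NA, NA, "2", NA, "1", NA)
filled_ids <- carry_forward(coded_ids)
print(filled_ids)
```

Only three utterances carry an explicit ID change, yet every utterance ends up attributed to a speaker.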
Open the R project and specifically, your script. Locate the section called “Configuring the ROCK package” (here, i.e. in the figure below, on lines 114-125).
Locate the code in the grey box (here: beginning on line 92), and while keeping the format as is, switch the items you wish to include and delete the items you wish to exclude.
In this example, we have designated “caseId” and “coderId” (lines 96-97) as persistent IDs because we have one case per source (e.g. one participant per interview) and we have multiple raters and wanted to keep track of who is coding which source.
We have also designated two forms of segmentation (lines 98-99), one is a way to indicate when a paragraph ends and another begins (paragraph-break), and the other form of segmentation is for indicating a change in topic (topiclist-switch). According to the ROCK standard, your segmentation labels should follow this format:
---<<your_chosen_label>>---
Don’t forget to keep the symbols on either side of the actual label because those symbols are needed to let ROCK know this is a form of segmentation. To provide a full example of how to switch your segmentation labels in the code:
Note that here we have reduced persistent IDs and section regexes to one item each. To do this, all we did was take out from the parentheses what we did not need, making sure to keep the parentheses themselves and the commas at the end of the lines originally containing commas. After you have made modifications and have run this command, you should see this in the console:
16.3.5 Clean your sources
Sources are usually messy, i.e. they are not neatly transcribed and necessitate formatting. The main purpose of “cleaning” in the rock package is making sure that utterances are split along “utterance markers”. Utterances are the smallest meaningful unit of data in your study; coding occurs on this level. Utterances are separated by utterance markers; the default in the ROCK is a newline character (a character that indicates a new line) and the rock package can insert those for you.
Go to your console in RStudio, copy the command ?rock::clean_sources into it, and hit enter. On the right you will see a list of default actions the rock package will perform if you use the Clean Sources command. As you can see, the description says: “Cleaning consists of two operations: splitting the source at utterance markers, and conducting search and replaces using regular expressions.” For now, we will not look at the extra search and replacements you can ask the rock package to do. We will discuss that here (xx).
The rock package is set to consider sentences as utterances and insert newline characters between all sentences. These are the actions the rock package will perform to clean up your data:
- Double periods (..) will be replaced with single periods (.)
- Four or more periods (.... or .....) will be replaced with three periods
- Three or more newline characters will be replaced by one newline character (which will become more, if the sentence before that character marks the end of an utterance)
- All sentences will become separate utterances (in a semi-smart manner; specifically, breaks in speaking, if represented by three periods, are not considered sentence ends, whereas ellipses (“…” or unicode 2026, see the example) are).
- If there are commas without a space following them, a space will be inserted.
If your utterances are not defined as sentences, please see xx.
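To get a feel for what these replacements do, the list above can be approximated with a few regular expressions in base R. This is a rough sketch under our own assumptions, not the code the rock package runs: the function name clean_text_sketch and the example text are hypothetical, the regular expressions are simplified, and the unicode ellipsis character is not handled here.

```r
# Illustrative approximation of the default cleaning steps (not rock's own code).
clean_text_sketch <- function(text) {
  # Exactly two periods become one (runs of three or more are left alone)
  text <- gsub("(?<!\\.)\\.\\.(?!\\.)", ".", text, perl = TRUE)
  # Four or more periods become exactly three
  text <- gsub("\\.{4,}", "...", text)
  # Three or more newline characters become one
  text <- gsub("\n{3,}", "\n", text)
  # A comma directly followed by a non-space character gets a space after it
  text <- gsub(",(?=\\S)", ", ", text, perl = TRUE)
  # Split into utterances after sentence-ending punctuation (but not after
  # three periods, which mark a break in speaking) and at existing newlines
  utterances <- strsplit(text, "(?<!\\.\\.\\.)(?<=[.!?])\\s+|\n+", perl = TRUE)[[1]]
  utterances[nzchar(utterances)]
}

# Made-up messy text, for illustration only
messy <- "Hello,world.. This is.... well... a test. Second sentence!\n\n\n\nThird one."
cleaned <- clean_text_sketch(messy)
print(cleaned)
```

Each element of the result corresponds to one line (one utterance) in a cleaned source.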
Now, locate the section titled “## Preparing and cleaning sources” in your script within your R project. Locate the command in the grey box. Make sure your sources are placed into your “01--raw-sources” directory and are in text files (files with a .txt extension). Now click on the green “play” button within the command’s grey box.
This action will place your cleaned sources into the folder called “02--cleaned-sources”. Open that directory and check your sources to make sure that each sentence is on a separate line. You may also notice that the rock package has changed the extension of your files to .rock to make them recognizable. They remain plain-text files, and so will still be editable with your text editor.
16.3.6 Add UIDs to your sources
Now that your utterances have received a separate line in your sources, the next step is to give each utterance a unique identifier. This identifier is what all your codes and attributes will be “collapsed” onto when you aggregate your data (see: xx). Utterance identifiers clarify which utterances (e.g. sentences) belong to which case (e.g. participant), what attributes those cases have, and what discourse codes utterances were given. This enables you to group your coded utterances according to participant (or any of their attributes) and investigate code co-occurrences on the level of utterance or higher levels of segmentation, such as stanza (i.e. sets of utterances).
Open RStudio and locate the section “## Prepending utterance identifiers (UIDs)” in your script within your R project. The command for prepending unique utterance identifiers (UIDs) is in the grey box under this section heading. Running this command will ask the ROCK to take the files in the folder “02--cleaned-sources”, add a UID to each line (e.g. sentence), and put the files with UIDs into the directory “10--sources-with-UIDs”. Note that these files will contain exactly the same text; the only difference is that a combination of letters and numbers is added to the front of each line. This unique combination is enclosed in double square brackets.
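As a rough illustration of what "prepending" means, the Python sketch below adds an identifier in double square brackets to the front of every non-empty line. Everything specific here is an assumption for the sake of the example: the function name `prepend_uids`, the `[[uid=...]]` marker, and the simple counter-based identifiers. The rock package generates its own identifiers, so your files will look different.

```python
def prepend_uids(lines, prefix="utt"):
    """Prefix each non-empty line with a hypothetical [[uid=...]] marker."""
    result = []
    counter = 0
    for line in lines:
        if line.strip() == "":
            result.append(line)  # leave blank lines untouched
            continue
        counter += 1
        # Counter-based identifier, chosen only for illustration.
        result.append(f"[[uid={prefix}{counter:04d}]] {line}")
    return result

for line in prepend_uids(["I moved here in May.", "", "It was hard at first."]):
    print(line)
```

The point of the sketch is only that the text of each utterance stays untouched while a unique marker is placed in front of it.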
Congratulations, your sources are now ready to be coded!
16.4 Coding and Segmentation
16.4.1 Designate your codes and section breaks
This section is for researchers coding their data deductively (using predetermined codes for coding); if you are working inductively, please see section xx. We suggest that you use iROCK to perform coding; this interface allows you to upload the sources you want to code (and segment), as well as your codes (and section breaks), and then drag and drop the codes and section breaks into the source. Please see the next section on how to use iROCK.
If you are ready with your codes, you will need to make a separate text file that contains the codes according to the ROCK standard. This format is the following:
[[your_code_identifier_here]]
If you are working with hierarchical codes, you have to use the hierarchy marker to indicate a code’s parent codes, e.g.:
[[Info>source>media]]
Here, the parent code is “Info”, the child code is “source” and the grandchild code (what we will actually use in coding) is called “media”. Below is an illustration for a list of codes in a text file:
Please note that only sequences of these characters will be considered a code, provided they fall between double square brackets: [a-zA-Z0-9_>]+. Any other sequence of characters, regardless of whether it’s between [[ ]], will not be seen as a code identifier. Also, make sure that none of your low-level code labels are identical: even if they belong to different parents, the ROCK will not be able to parse them as different codes.
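If you would like to check a code list against these rules before you start coding, a small script can help. The following Python sketch is our own illustration, not part of the rock package; the names `CODE_PATTERN`, `parse_codes`, and `duplicate_leaf_labels` are hypothetical. It uses the character set given above to find codes, splits them on the hierarchy marker, and reports lowest-level labels that appear under more than one parent.

```python
import re

# Double square brackets around one or more of the allowed characters.
CODE_PATTERN = re.compile(r"\[\[([a-zA-Z0-9_>]+)\]\]")

def parse_codes(line):
    """Return each code on a line as a list of hierarchy levels (parent first)."""
    return [match.split(">") for match in CODE_PATTERN.findall(line)]

def duplicate_leaf_labels(code_paths):
    """Find lowest-level labels that occur under more than one path."""
    seen = {}
    for path in code_paths:
        seen.setdefault(path[-1], set()).add(">".join(path))
    return sorted(label for label, paths in seen.items() if len(paths) > 1)

print(parse_codes("[[Info>source>media]]"))  # three hierarchy levels
print(parse_codes("[[my-code]]"))            # hyphen is not allowed, so nothing is found
print(duplicate_leaf_labels([["Info", "source", "media"], ["Harm", "media"]]))
```

In this sketch, the label “media” is flagged because it is the lowest-level label of two different paths, which is exactly the situation the warning above asks you to avoid.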
Several raters might be coding the same corpus with different codes in your project. In this case, we suggest creating separate text files for each code cluster (or parent code) so that the code list only contains codes that a certain rater would use. Of course, if multiple raters are using the same codes, then the text document should be shared among them.
If you also intend to segment your transcripts (e.g. according to question-response or topic), you should create a text document with your section breaks as well. Here is an illustration of three section breaks in one file:
Do not use hyphens in section break labels! As with code labels, they should consist of these characters: [a-zA-Z0-9_>]+
Once you have created these documents (which can be placed e.g. in their own separate folder), you can begin coding your sources! We suggest you use iROCK for this purpose; see the next step for details!
16.4.2 Using iROCK to code your sources
This section is dedicated to familiarizing you with the interface for ROCK, or iROCK, which was created to ease manual coding of data (if you want to automate your coding process, please see section xx). First, let’s go through the preparations involved in coding and segmentation. Please click on this link: https://i.rock.science to reach the iROCK interface. This is what you will see:
Going from left to right, by clicking on the buttons at the top you will be able to upload your Sources, your Codes, and your Section breaks (provided you are segmenting your data). The buttons on the right pane allow you to add your codes and section breaks inductively by typing in their label and hitting enter.
If you are using deductive coding, we suggest you upload your text file containing your codes, because it may be tedious to type in the list every time you begin coding a new source. Your codes will appear in the pane on the right and you will be able to drag and drop them on the utterance you want to code (i.e. tag an utterance with a code). Section breaks work in a similar manner, except those are to be positioned in between two utterances (i.e. slice your text into fragments). When you have uploaded these components and started coding, the interface will look something like this:
Please note that segmentation cannot be nested at this point in the ROCK, that is, a section break operates as a persistent identifier: it repeats at each utterance until told otherwise. In the figure above, there are three text fragments delimited by two “stanza delimiters”; thus, every utterance before the first delimiter will be labeled with “0”, every utterance between the first and second delimiter will be labeled with “1”, and all utterances after the second delimiter will be labeled with “2” until a new section break is added to the text. We are working on a way to create threaded data, which can be employed as nested segmentation also.
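The numbering logic described here can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `label_segments` is hypothetical, the delimiter string is a made-up placeholder rather than the ROCK's actual section break notation, and the sketch assumes each section break sits on its own line.

```python
def label_segments(lines, delimiter):
    """Pair each utterance with the number of section breaks seen before it."""
    labelled = []
    segment = 0
    for line in lines:
        if line.strip() == delimiter:
            segment += 1  # a break starts a new segment; it is not an utterance itself
            continue
        labelled.append((segment, line))
    return labelled

source = [
    "First utterance.",
    "HYPOTHETICAL_BREAK",
    "Second utterance.",
    "Third utterance.",
    "HYPOTHETICAL_BREAK",
    "Fourth utterance.",
]
for number, utterance in label_segments(source, "HYPOTHETICAL_BREAK"):
    print(number, utterance)
```

Because the counter only ever increases, the sketch also shows why this kind of segmentation cannot be nested: every utterance belongs to exactly one numbered fragment.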
Multiple section breaks ARE allowed on one line.
When you have finished coding, just click on the “Download” button in the lower right corner and place the coded source into your “data” directory, within the “20--manually-coded-sources” folder.
If you want to work on a source iteratively, or you want to pass it on to another rater and want them to be able to see your coded version, you can download the (partially) coded source and then just upload it next time you want to work on it. When you download a file from iROCK, it saves all information in a text file with the .rock extension. This file can be re-loaded into iROCK at any time and you can manually modify the file in any text editor.
16.4.4 Add identifiers to your sources
Often, qualitative sources have a set of characteristics that are independent of coding. In the ROCK, these are called “attributes”, and they are assigned to sources using “identifiers”. An identifier is a special code that can be used to identify sources. For example, identifiers can be used to code which participant was interviewed; who the interviewer was; what the location of the interviews was; or whether the interview was conducted during the morning, afternoon, or evening.
These identifiers can then be used to associate attributes with utterances. By default, the ROCK uses the identifier “caseId” to designate data providers. It is configured as a so-called “persistent” identifier, which means that it applies not only to the coded utterance, but also to all following utterances until another “caseId” identifier is encountered.
These identifiers are ideally added during or just after transcription, but they can also be added around the time of (manual) coding. The default ROCK case identifier format is:
[[cid=EnterLabelHere]]
You can use e.g., numbers or pseudonyms (of course, never use participants’ real names!). Here are a couple of examples:
[[cid=25]]
[[cid=Patient_7]]
[[cid=Larry1997]]
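To show what “persistent” means in practice, here is a minimal Python sketch of how a case identifier carries over to the utterances that follow it. It is our own illustration, not the rock package's parser; the names `CID_PATTERN` and `assign_case_ids` are hypothetical, and the set of characters allowed in the label is an assumption based on the examples above.

```python
import re

# Assumed label characters: letters, digits, and underscores, as in the examples above.
CID_PATTERN = re.compile(r"\[\[cid=([a-zA-Z0-9_]+)\]\]")

def assign_case_ids(lines):
    """Pair each utterance with the most recent case identifier seen so far."""
    current = None
    assigned = []
    for line in lines:
        match = CID_PATTERN.search(line)
        if match:
            current = match.group(1)  # a new cid replaces the previous one
        text = CID_PATTERN.sub("", line).strip()
        if text:  # skip lines that contained only the identifier
            assigned.append((current, text))
    return assigned

source = [
    "[[cid=25]]",
    "I went home early.",
    "It was already late.",
    "[[cid=Patient_7]] I stayed a bit longer.",
    "So did I.",
]
for case_id, utterance in assign_case_ids(source):
    print(case_id, "|", utterance)
```

In this sketch, the first two utterances are attributed to case 25 and the last two to Patient_7, even though the identifier appears only once for each of them.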
When you have created a caseId for all your participants, you still need to let the ROCK know which case contributed to which source. The caseId of your data provider should be placed into every source they contributed to. If only one case contributed to a source, you can just copy-paste the cid to the top of the (coded) text file. If your sources contain utterances from more than one participant….
more coming soon!