Chapter 2 Software Basics

This chapter introduces the software packages included in Rosetta Stats. In addition to providing a brief introduction, some required basics will be discussed. Some terms that you will encounter in this chapter are the following:

Console
A console is an interface for sending commands and receiving responses. Until graphical user interfaces, such as those introduced by Windows and macOS, became commonplace, consoles were pretty much the only way people interacted with personal computers, and they remain ubiquitous today. Consoles use plain text for communication: users type commands, which are sent to an interpreter and processed, and the results are presented to the user as text printed to the console.
FLOSS
FLOSS stands for Free/Libre Open Source Software. This is software that is available to everybody at no cost, that can freely be copied and distributed, and whose source code is publicly available for others to view and edit. The benefits of FLOSS over proprietary software (i.e. software owned by a corporation or individual) include that errors can easily be spotted and corrected, that the software can be extended by anybody who wants to, and that the software can be downloaded and used by anybody, which removes funding as a barrier.
Graphical User Interface
A graphical user interface (GUI) is an interface for sending commands and receiving responses. Unlike consoles, GUIs do not rely on plain text but instead use graphics. Most users will be familiar with GUIs: the operating systems of smartphones and personal computers all use GUIs.
GUI
See Graphical User Interface.
Open Science
Open Science is a collection of principles and practices that aim to enhance transparency, accessibility, and diversity both within science and of scientific products. These principles include full disclosure (i.e. making all materials, stimuli, datasets, analysis scripts, and output public), preregistration, preprinting, choosing open standards and FLOSS over proprietary standards and software, and making articles, books and chapters open access. In short, Open Science is doing one’s best to avoid making choices that introduce any kind of barrier that is not absolutely necessary, so that scientific products and participation in science are not gatekept.

2.1 File and variable name conventions

When working with data and analysis scripts, it is important to strive for platform- and software-independence. This results in a number of naming conventions that you should follow:

  • Only use characters in filenames and variable names that are platform- and software-independent
  • As much as possible, use the English language when naming files and variables (English is both the most widely understood language in the world and the de facto language of science)
  • Make sure filenames and variable names are as self-explanatory as possible: always prioritize clarity over brevity.

The characters that are safe to use in filenames are lower and upper case latin characters (a-z and A-Z), arabic digits (0-9), underscores (_), dashes (-), and periods (.; although periods are also used to distinguish file extensions from the filename itself and are therefore discouraged).

The characters that are safe to use in variable names are lower and upper case latin characters (a-z and A-Z), arabic digits (0-9), underscores (_), and periods (.).

This means that the characters that are always safe to use are lower and upper case latin characters (a-z and A-Z), arabic digits (0-9), and underscores (_).

If you want to indicate word boundaries, you can use underscores (_) or camelCase, where the letter following an (omitted) space is capitalized.

A useful convention is to use camelCase to join words that belong together, and to reserve underscores for separating the distinct parts of a variable name. For example: preTest_selfEfficacy_item1.
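
The character rules above can be checked mechanically. The following is a minimal sketch in base R; the function name isSafeName() and the example names are made up for illustration and are not part of any package used in this book.

```r
# Check whether names only contain the characters listed above as always
# safe: a-z, A-Z, 0-9, and underscores. isSafeName() is a hypothetical
# helper, not a function from rosetta or base R.
isSafeName <- function(x) {
  # perl = TRUE makes the character ranges independent of the locale
  grepl("^[A-Za-z0-9_]+$", x, perl = TRUE)
}

candidateNames <- c("preTest_selfEfficacy_item1", "pre test", "item-1")
safe <- isSafeName(candidateNames)
print(data.frame(name = candidateNames, safe = safe))
```

Note that this only checks the character set. It says nothing about whether a name is self-explanatory, which remains a matter of judgment.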

2.2 jamovi

The jamovi³ project was founded to develop a free and open statistical platform that is intuitive to use and can provide the latest developments in statistical methodology. At the core of the jamovi philosophy is the idea that scientific software should be “community driven,” where anyone can develop and publish analyses and make them available to a wide audience. jamovi is available for Windows, macOS, Linux and ChromeOS from https://jamovi.org.

Figure 2.1: A fresh install of jamovi.

A nice feature of jamovi is that when you save a project, it stores the data, analyses, and output in the same file. This makes both collaboration and supervision much more straightforward. Another nice feature is that consistent with the jamovi philosophy, everybody can contribute modules. This means that users can install additional functionality from the jamovi library. It also means that the jamovi ecosystem will slowly keep growing over time.

If you just installed and opened jamovi, you will see something similar to what’s shown in Figure 2.1. This book is accompanied by a jamovi module that contains some of the analyses we refer to as well as the datasets we listed in Chapter 1.

2.2.1 Installing jamovi modules

To install a new module in jamovi, click the button with the blue “plus” labelled “Modules” in the top-right corner. This will open the menu shown in Figure 2.2.

Figure 2.2: The modules menu in jamovi.

Figure 2.3: The jamovi library.

From this menu, select “jamovi library,” which will open the jamovi library with an overview of all available modules as shown in Figure 2.3. Scroll down to the “rosetta” module, which is called “Parallel Use of Statistical Packages in Teaching” and click the corresponding “Install” button to start the installation. Once the module is installed, it appears in the menu bar as shown in Figure 2.4.

Figure 2.4: The jamovi library.

Installing the “rosetta” module will make the example datasets used in this book available. Chapter 3 explains how to load these datasets (or other datasets) into jamovi.

2.3 R

R is a very powerful and extensible FLOSS package for statistical analyses. Because it is readily extensible through contributed packages, it has grown into a tool that can help you with pretty much anything: there are packages for multilevel analysis and structural equation modeling, but also for visualising geographical maps, rendering three-dimensional objects, doing text mining, working with big data, directly interacting with databases, advanced data visualisation, qualitative analysis, and the list goes on.

Figure 2.5: R.

R can be downloaded from https://cloud.r-project.org/. When you download R, you mostly get the statistical package itself. Most of this is the R language, which you can’t see. Instead, as a user, you see a rudimentary console interface (see Figure 2.5). There are a lot of ways to make working with R more fun and efficient, such as the popular FLOSS package RStudio.

2.3.1 RStudio

A very popular environment is RStudio. RStudio is FLOSS and can be downloaded from https://rstudio.com/products/rstudio/ (choose RStudio Desktop). See Figure 2.6 for an illustration of how RStudio looks.

Figure 2.6: RStudio. Note that you can choose how RStudio looks by installing different themes: this theme can be downloaded from https://gitlab.com/snippets/1846452/raw?inline=false.

RStudio is an interface that uses four tabbed panes to facilitate interactions with R. The top-left pane contains the analysis script, or multiple scripts, you are working on. The bottom-left pane contains the R console. The top-right pane contains an overview of all datasets and other objects you have loaded, as well as a history of the commands you provided to R. Finally, the bottom-right pane provides easy access to an overview of all files in your project, the plots you produced, the R packages you have installed and loaded, and the online manual of R and your packages.

2.3.2 Packages

R comes installed with many basic functions, commonly referred to as Base R. Researchers create organized collections of R functions, known as packages, to facilitate the use of various statistical methodologies and analyses. These packages are free and produce useful output for users. Anyone can create an R package. It is common to cite the packages you use in the references section of your research.

Packages typically live in one of two places. The first is the Comprehensive R Archive Network (CRAN for short), which performs some quality control on packages. By default, R only looks in the CRAN repositories when you tell it to install a package. The second is Git repositories, where many package developers also offer their packages. These versions are often less polished than CRAN packages, but also more cutting edge. We will explain how to install “non-CRAN packages” below.

One package that you may want to install if you use this book is the rosetta package. We created this to facilitate parallel use of, and transition between, different statistical packages. It contains a number of functions designed to behave similarly to their counterparts in other software packages. This means we strive to use similar names for the functions and arguments, similar default settings, and similar output. We will use this package as an example in the following sections.

2.3.2.1 Installing CRAN packages

To install a package, you can use the following command:

install.packages('rosetta');

Replace “rosetta” with the name of the package you want to install. The name of the package is case sensitive and must go in single (') or double (") quotes. Once the package is installed, you will never need to install it again (although you may be asked to update it).⁴

If you use RStudio, you can also install packages using the Packages tab in the bottom-right pane of RStudio by clicking on “Packages” and “Install” and typing the name of the package you want to install (see Figure 2.7).

Figure 2.7: How to install a package using the RStudio GUI.

2.3.2.2 Installing non-CRAN packages

If you want to install a package that is not on CRAN, or a newer version of a package than the version on CRAN, you can install packages directly from Git repositories. This requires the package remotes. Once installed, you can use the functions remotes::install_gitlab() and remotes::install_github() to install packages from repositories hosted by GitLab and GitHub, respectively.

To do this, first find the URL to the repository (this will usually have the form of “https://gitlab.com/xxx/yyy” or “https://github.com/xxx/yyy”), and remember the bit after the domain name (so “xxx/yyy”). Then pass this as an argument to the remotes::install_gitlab() or remotes::install_github() function.

For example, to install the current development version of rosetta (which has more features but is usually less stable), use the following two commands (you can skip the first one if you already installed the remotes package):

install.packages('remotes');
remotes::install_gitlab('r-packages/rosetta');
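
If you prefer not to extract the “xxx/yyy” part by hand, it can be derived from the repository URL. This is a minimal base R sketch; repoPathFromUrl() is a hypothetical helper written for this illustration and is not part of the remotes package.

```r
# Strip the scheme and domain name from a repository URL, leaving the
# "xxx/yyy" part that remotes::install_gitlab() and
# remotes::install_github() expect. Hypothetical helper for illustration.
repoPathFromUrl <- function(url) {
  sub("^https?://[^/]+/", "", url)
}

repoPath <- repoPathFromUrl("https://gitlab.com/r-packages/rosetta")
print(repoPath)

# The result could then be passed on (this step needs internet access):
# remotes::install_gitlab(repoPath)
```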

2.3.2.3 Calling functions from packages

Once a package is installed, you can run any function in the package by prefixing the function name with “package::”. For example, if you installed the rosetta package, you can run all functions in the Rosetta Stats book that start with rosetta::. Note that if a function call starts with somethingelse::, you need to install the package somethingelse. For example, advanced data visualisations require the ggplot2 package, and functions from ggplot2 all start with ggplot2::. The rosetta package also uses the ggplot2 package in the background, so you will not have to install that separately.

If you try to use a function but you forget to tell R from which package it should get the function, R will give you an error. For example, if we try to use the varView() function from the rosetta package to show the variable view for the built-in Orange dataframe like this:

varView(Orange);

R will throw the following error:

Error in varView(Orange) : could not find function "varView"

This is because we forgot to specify that we want to use the varView function from the rosetta package (and not from another package) by using the rosetta:: prefix, so R can’t find the object (the varView() function): it doesn’t know where to look.⁵ If we do include the package name, R happily produces the variable view:

rosetta::varView(Orange);
## Variable view for 'Orange':
##
##                 index  values                                     level       valids  NAs  class
## Tree               1  3 (1), 1 (2), 5 (3), 2 (4) & 4 (5)         ordinal     35      0    ordered & factor
## age                2  7 unique values ranging from 118 to 1582.  continuous  35      0    numeric
## circumference      3  30 unique values ranging from 30 to 214.   continuous  35      0    numeric
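
The same behaviour can be reproduced without installing anything. The sketch below assumes a fresh R session and uses toTitleCase() from the tools package, which ships with R but is normally not on the search path; it is chosen here purely as a stand-in for rosetta::varView().

```r
# Without the package prefix, R normally cannot find toTitleCase(),
# because the 'tools' package is not attached in a fresh session.
# tryCatch() captures the error message instead of halting the script.
unqualified <- tryCatch(
  toTitleCase("software basics"),
  error = function(e) conditionMessage(e)
)
print(unqualified)

# With the package prefix, R knows exactly where to look.
qualified <- tools::toTitleCase("software basics")
print(qualified)
```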

2.3.2.4 Loading packages in memory

Sometimes you may want to tell R to always search for objects (such as functions) in a certain package. In that case, you can attach the package to R’s “search path” for the duration of your session. You can conceptualize this as “loading” the package. Although you only need to install a package once, if you want to load it, you will have to repeat that command every time you open a new R session.

You can load a package (i.e. attach it to R’s search path) with this command:

library("rosetta")
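
To see what attaching does, you can inspect the search path before and after. This is a minimal sketch that uses the tools package (which ships with R) instead of rosetta, so that it runs on any installation.

```r
# search() lists everything R currently looks through to find objects.
attachedBefore <- "package:tools" %in% search()

# Attach the package to the search path for this session
library("tools")
attachedAfter <- "package:tools" %in% search()

print(c(before = attachedBefore, after = attachedAfter))

# Detach again to leave the session as it was; closing R has the same effect
detach("package:tools")
```

In a fresh session the first value is normally FALSE and the second TRUE, which illustrates why library() has to be repeated in every new session.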

If you use RStudio to work with R, you can load packages by opening the Packages tab in the bottom-right pane and clicking on the package you want to load (see Figure 2.8). If the package does not appear in your list of packages, it is because it has not been installed (see Section ??).

Figure 2.8: How to load a package using RStudio.

Note that it pays off to make a habit out of always explicitly specifying the package name using the :: operator. Because R does not enforce unique function names across packages, many packages have functions with the same names (such as recode() in the Hmisc, car, and rosetta packages and filter() in both base R and dplyr). If you load multiple packages and use such a function, it is not always intuitive which version of the function you will “get.”
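
When in doubt, you can ask R where a function name is defined. The sketch below uses only packages that ship with R; if you have also loaded dplyr, the first result would be expected to list that package as well.

```r
# List every attached package (or environment) that defines 'filter'
whereDefined <- utils::find("filter")
print(whereDefined)

# Being explicit removes the ambiguity: this always calls the version
# from the stats package, here to compute a centred three-point moving
# average of the numbers 1 to 5.
movingAverage <- stats::filter(1:5, rep(1 / 3, 3))
print(as.numeric(movingAverage))
```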

Because of this, debugging is much more efficient if you explicitly specify the package along with the function in every function call. When working with R (or other statistical packages), most time is spent not writing code but debugging it, so this quickly pays off despite the slight inefficiency of always having to type the package names as well.

A second reason to explicitly specify the packages when calling functions is that it makes R code more portable. When you rely on loaded packages, if you copy fragments of R code to another script, the code will throw errors until you remember to also load those packages at the top of your code. Identifying which packages you need to load to be able to run a fragment of code is hard unless you happen to know by heart which functions come from which packages.

A third closely related reason is that explicitly specifying packages in function calls makes your code easier to read for others. They don’t have to scroll up and down to try and figure out which functions exactly you’re using.

Finally, a fourth reason applies to more advanced users: if you develop R packages, it is also best practice to always call functions using the :: operator. Therefore, it’s best to make this a habit from the get-go.

If you want to check for the existence of a package at the top of your code, there is a function called checkPkgs() in the ufs package.
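
A comparable check can also be sketched in base R. The helper below is not ufs::checkPkgs() and makes no claim about how that function works; missingPackages() is a hypothetical name, and the package list is a placeholder you would replace with the packages your script needs.

```r
# Return the names of the packages in 'pkgs' that are not installed.
# Hypothetical helper for illustration only.
missingPackages <- function(pkgs) {
  isInstalled <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!isInstalled]
}

# Placeholder list: both packages ship with R, so nothing is missing here
requiredPackages <- c("stats", "utils")
stillMissing <- missingPackages(requiredPackages)

# Stop early with an informative message if something is missing
if (length(stillMissing) > 0) {
  stop("Please install these packages first: ",
       paste(stillMissing, collapse = ", "))
}
```

Placing such a check at the top of a script means a missing package is reported immediately, rather than halfway through an analysis.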

2.4 SPSS

SPSS is a popular proprietary data analysis package owned by IBM. SPSS offers both a GUI and a Syntax option for analysis, although people predominantly use SPSS for the simplicity of its GUI. SPSS comes with a fixed set of analytic options in its GUI, and users cannot modify them.

As of January 2020, SPSS starts at $99 per user per month. Universities and businesses may have existing SPSS contracts in place that provide access to the software for their employees, faculty, staff, or students. Student discounts are also available for students who do not otherwise have access.

Figure 2.9: A fresh install of SPSS version 26

As proprietary software, SPSS does not have any packages or modules that can be created or added by users. This makes it straightforward and simple, but much more limited than FLOSS. This tutorial uses SPSS version 26.

2.4.1 The active dataset

In SPSS, each dataset will open in a separate window. Only one dataset can be interacted with at any one time. The user therefore manually has to switch between datasets.

When using the GUI, a dataset can be activated by clicking its window. As such, it is important to make sure you are operating in the dataset you intend to use.

To specify which dataset should be activated using SPSS Syntax, use the command DATASET ACTIVATE followed by the dataset name. For example, to activate a dataset named dat, one can use:

DATASET ACTIVATE dat.

Note that if you only have one dataset opened, that dataset is active, and so you will not have to activate it.


  3. Yes, the preferred spelling is without a capital.

  4. Unless you update to a new version of R, e.g. from 4.0 to 4.1, or of course if you get a new computer.

  5. Formally, the rosetta package isn’t on R’s so-called “search path.” Note that it’s also possible to add a package to R’s search path: see the next section.