Chapter 2 Software Basics
This chapter introduces the software packages included in Rosetta Stats. In addition to providing a brief introduction, some required basics will be discussed. Some terms that you will encounter in this chapter are the following:
- Console
- A console is an interface for sending commands and receiving responses. Until graphical user interfaces such as those introduced by Windows and macOS became commonplace, consoles were pretty much how people interacted with personal computers. They remain ubiquitous, however. Consoles use plain text for communication: users type commands, which are sent to an interpreter and processed, and the results are presented to the user as text printed to the console.
- FLOSS
- FLOSS stands for Free/Libre Open Source Software. This is software that is available to everybody at no cost, that can freely be copied and distributed, and whose source code is publicly available for others to view and edit. The benefits of FLOSS over proprietary software (i.e. software owned by a corporation or individual) include that errors can more easily be spotted and corrected, that the software can be extended by anybody who wants to, and that the software can be downloaded and used by anybody, which removes funding as a barrier.
- Graphical User Interface
- A graphical user interface (GUI) is an interface for sending commands and receiving responses. Unlike consoles, GUIs do not rely on plain text but instead use graphics. Most users will be familiar with GUIs: the operating systems of smartphones and personal computers all use GUIs.
- GUI
- See Graphical User Interface.
- Open Science
- Open Science is a collection of principles and practices that aim to enhance transparency, accessibility, and diversity both within science and of scientific products. These principles include full disclosure (i.e. making all materials, stimuli, datasets, analysis scripts, and output public), preregistration, preprinting, choosing open standards and FLOSS over proprietary standards and software, and making articles, books and chapters open access. In short, Open Science is doing one’s best to avoid making choices that introduce any kind of barrier that is not absolutely necessary, so that scientific products and participation in science are not gatekept.
2.1 jamovi
The jamovi³ project was founded to develop a free and open statistical platform that is intuitive to use and can provide the latest developments in statistical methodology. At the core of the jamovi philosophy is the idea that scientific software should be “community driven”, where anyone can develop and publish analyses and make them available to a wide audience. jamovi is available for Windows, macOS, Linux and ChromeOS from https://jamovi.org.
A nice feature of jamovi is that when you save a project, it stores the data, analyses, and output in the same file. This makes both collaboration and supervision much more straightforward. Another nice feature is that, consistent with the jamovi philosophy, everybody can contribute modules. This means that users can install additional functionality from the jamovi library. It also means that the jamovi ecosystem will keep growing over time.
If you just installed and opened jamovi, you will see something similar to what’s shown in Figure 2.1. This book is accompanied by a jamovi module that contains some of the analyses we refer to as well as the datasets we listed in Chapter 1.
2.1.1 Installing jamovi modules
To install a new module in jamovi, click the button with the blue “plus” icon, labelled “Modules”, in the top-right corner. This will open the menu shown in Figure 2.2.
From this menu, select “jamovi library”, which will open the jamovi library with an overview of all available modules, as shown in Figure 2.3. Scroll down to the “rosetta” module, which is called “Parallel Use of Statistical Packages in Teaching”, and click the corresponding “Install” button to start the installation. Once the module is installed, it appears in the menu bar as shown in Figure 2.4.
Installing the “rosetta” module will make the example datasets used in this book available. Chapter 3 explains how to load these datasets (or other datasets) into jamovi.
2.2 R
R is a very powerful and extensible FLOSS package for statistical analyses. Because it is readily extensible through contributed packages, it has grown into a tool that can help you with pretty much anything: there are packages for multilevel analysis and structural equation modeling, but also for visualising geographical maps, rendering three-dimensional objects, doing text mining, working with big data, directly interacting with databases, advanced data visualisation, qualitative analysis, and the list goes on.
R can be downloaded from https://cloud.r-project.org/. When you download R, you mostly get the statistical package itself. Most of this is the R language, which you can’t see. Instead, as a user, you see a rudimentary console interface (see Figure 2.5). There are a lot of ways to make working with R more fun and efficient, such as the popular FLOSS package RStudio.
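To give a rough idea of what interacting with the console looks like, here is a minimal sketch. The numbers and the object name scores are made up for illustration. Each line is typed at the prompt, and R prints the result as plain text.

```r
# Store three made-up numbers in an object called scores (a hypothetical name)
scores <- c(2, 4, 6)

# Ask R for their mean; the console prints the result as text
mean(scores)
```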
2.2.1 RStudio
A very popular environment is RStudio. RStudio is FLOSS and can be downloaded from https://rstudio.com/products/rstudio/ (choose RStudio Desktop). See Figure 2.6 for an illustration of how RStudio looks.
RStudio is an interface that uses four tabbed panes to facilitate interactions with R. The top-left pane contains the analysis script, or multiple scripts, you are working on. The bottom-left pane contains the R console. The top-right pane contains an overview of all datasets and other objects you have loaded, as well as a history of the commands you provided to R. Finally, the bottom-right pane provides easy access to an overview of all files in your project, the plots you produced, the R packages you have installed and loaded, and the online manual of R and your packages.
2.2.2 The rosetta package
To facilitate parallel use of, and transition between, different statistical packages, we created the rosetta R package. This package contains a number of functions designed to behave similarly to their counterparts in other software packages. This means we strive to use similar names for the functions and arguments, similar default settings, and similar output.
This package can be installed in R using the following command:
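A minimal sketch of that command, assuming the package is distributed through CRAN under the name rosetta. The repos argument is optional and simply points R at the CRAN download site.

```r
# Download and install the rosetta package (assumes it is available on CRAN)
install.packages("rosetta", repos = "https://cloud.r-project.org")
```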
Alternatively, if you want to install the current development version (which has more features but is usually less stable), you can do so using the remotes package, which you will then first have to install:
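A sketch of the first step, installing remotes from CRAN. The text here does not state where the development version of rosetta is hosted, so the second call is left commented out with a placeholder path that you would need to replace with the actual repository location. The choice of install_gitlab is likewise an assumption; remotes offers several installers, and which one applies depends on the hosting site.

```r
# Step 1: install the remotes package from CRAN
install.packages("remotes", repos = "https://cloud.r-project.org")

# Step 2 (placeholder, not runnable as written): install the development
# version from its repository. "namespace/rosetta" is a hypothetical path,
# and the hosting site (GitLab) is an assumption.
# remotes::install_gitlab("namespace/rosetta")
```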
Once the rosetta package is installed, you can run all functions in the Rosetta Stats book that start with rosetta::. Note that if a function call starts with somethingelse::, you need to install the package somethingelse. For example, advanced data visualisations require the ggplot2 package, and functions from ggplot2 all start with ggplot2::. The rosetta package also uses the ggplot2 package in the background, so you will not have to install that separately.
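The package::function notation can be tried with a package that ships with R. The sketch below uses stats, chosen only because it needs no installation; it is not one of the book's examples.

```r
# Call the median function from the stats package explicitly
stats::median(c(1, 3, 5))

# Without the prefix, R finds the same function because stats is attached by default
median(c(1, 3, 5))
```

Both calls run the same function; the prefix only makes explicit which package it comes from.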
2.2.3 Installing packages
R comes installed with many basic functions, commonly referred to as Base R. Researchers create organized collections of R functions, known as packages, to facilitate the use of various statistical methodologies and analyses. These packages are free and produce useful output for users. Anyone can create an R package. It is common to cite the packages you use in the references section of your research.
To install any package in R, use the following command:
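As a sketch, with ggplot2 standing in for the package you want to install. The repos argument is optional and is included here only so that the command also works when no download site has been chosen yet.

```r
# Install a package by name; the name is passed as a quoted string
install.packages("ggplot2", repos = "https://cloud.r-project.org")
```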
The name of the package is case sensitive and must go in quotes. Once the package is installed, you will not need to install it again on that computer (although you may be asked to update it). You can also install packages through the “Packages” tab in the bottom-right pane of RStudio: click “Packages”, then “Install”, and type the name of the package you want to install (see Figure 2.7).
2.2.4 Loading packages
Although you only need to install a package once, you will need to load it every time you open a new R session. To do this, use the command:
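A minimal sketch using stats4, a package that ships with R and so can be loaded without installing anything first. The same pattern applies to any installed package, such as rosetta.

```r
# Load the stats4 package for this session
library(stats4)

# The same pattern for rosetta (commented out here because it must be installed first)
# library(rosetta)
```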
Note that there are no quotes around the package name in the library command. Alternatively, you can do this using the R GUI by clicking on the package you want to load (see Figure 2.8). If the package does not appear in your list of packages, it is because it has not been installed (see Section 2.2.3).
In this book, we will primarily use the rosetta package.
2.3 SPSS
SPSS is a popular proprietary data analysis package owned by IBM. SPSS offers both a GUI and a syntax option for analysis, although people predominantly use SPSS for the simplicity of its GUI. SPSS comes with a fixed set of analytic options in its GUI, and these cannot be modified by users.
As of January 2020, SPSS starts at $99 per user per month. Universities and businesses may have existing SPSS contracts in place that provide access to the software for their employees, faculty, staff, or students. Discounts are also available for students who do not otherwise have access.
As proprietary software, SPSS does not have any packages or modules that can be created or added by users. This makes it straightforward and simple, but much more limited than FLOSS alternatives. This tutorial uses SPSS version 26.
2.3.1 The active dataset
In SPSS, each dataset opens in a separate window. Only one dataset can be active at any one time, so the user has to switch between datasets manually.
When using the GUI, a dataset can be activated by clicking its window. As such, it is important to make sure you are operating in the dataset you intend to use.
To specify which dataset should be activated using SPSS Syntax, use the command DATASET ACTIVATE followed by the dataset name. For example, to activate a dataset named dat, one can use:
DATASET ACTIVATE dat.
Note that if you only have one dataset opened, that dataset is active, and so you will not have to activate it.
³ Yes, the preferred spelling is without a capital.