Chapter 5 Aggregating data: sum

5.1 Intro

A common task is aggregating multiple variables (columns in a dataset) into one new variable (column). For example, you may want to compute the sum of the items of a questionnaire.

When naming the new variable, follow the conventions for variable names (see the section on file and variable name conventions).

5.1.1 Example dataset

This example uses the Rosetta Stats example dataset “pp15” (see Chapter 1 for information about the datasets and Chapter 3 for an explanation of how to load datasets).

5.1.2 Variable(s)

From this dataset, this example uses the variables highDose_AttGeneral_good, highDose_AttGeneral_prettig, highDose_AttGeneral_slim, highDose_AttGeneral_gezond, and highDose_AttGeneral_spannend.

We will aggregate these into the new variable highdose_attitude.

5.2 Input: jamovi

In the “Data” tab, click the “Compute” button as shown in Figure 5.1.

Figure 5.1: Aggregating in jamovi: opening Compute menu

Type the new variable name in the text field at the top, labelled “COMPUTED VARIABLE”. Then click the function button, marked \(f_x\), and select the SUM function from the box labelled “Functions”. In the box labelled “Variables”, double-click each variable you want to include in the sum, typing a comma between consecutive variable names, as shown in Figure 5.2.

Figure 5.2: Aggregating in jamovi: using the function menu to specify a computation

Alternatively, you can type the function name and list of variables directly without using the function (\(f_x\)) dialog as shown in Figure 5.3.

Figure 5.3: Aggregating in jamovi: directly typing in a computation

5.3 Input: R

In R, there are roughly three approaches. Many analyses can be done with base R, without installing additional packages. The rosetta package accompanies this book and aims to provide output similar to jamovi and SPSS with simple commands. Finally, the tidyverse is a popular collection of packages that try to work together consistently but implement a different underlying logic than base R (and therefore than the rosetta package).

5.3.1 R: base R

# rowSums() adds the selected columns within each row (participant)
dat$highdose_attitude <-
  rowSums(
    dat[
      ,
      c(
        'highDose_AttGeneral_good',
        'highDose_AttGeneral_prettig',
        'highDose_AttGeneral_slim',
        'highDose_AttGeneral_gezond',
        'highDose_AttGeneral_spannend'
      )
    ]
  );
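
To illustrate what summing items per row looks like in base R, the following minimal sketch uses a small made-up data frame. The data frame `toy` and its columns `item1` to `item3` are hypothetical and not part of the pp15 dataset. The sketch also shows how `rowSums()` treats missing values: by default, a row with any `NA` yields `NA`, whereas `na.rm = TRUE` sums the values that are present.

```r
# Hypothetical toy data, not part of the pp15 dataset
toy <- data.frame(
  item1 = c(1, 2, 3),
  item2 = c(4, NA, 6),
  item3 = c(7, 8, 9)
);

items <- c('item1', 'item2', 'item3');

# One sum per row (participant); NA if any item is missing
toy$sum_strict <- rowSums(toy[, items]);

# One sum per row, ignoring missing items
toy$sum_lenient <- rowSums(toy[, items], na.rm = TRUE);

print(toy);
```

Whether missing items should be ignored depends on the questionnaire: a sum over fewer items is not directly comparable to a sum over all items, so the stricter default is often the safer choice.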

5.3.2 R: rosetta

dat$highdose_attitude <-
  rosetta::sums(
    data = dat,
    'highDose_AttGeneral_good',
    'highDose_AttGeneral_prettig',
    'highDose_AttGeneral_slim',
    'highDose_AttGeneral_gezond',
    'highDose_AttGeneral_spannend'
  );

5.4 Input: SPSS

For SPSS, there are two approaches: using the Graphical User Interface (GUI) or specifying an analysis script, which SPSS calls “syntax”.

5.4.1 SPSS: GUI

First activate the dat dataset (see 2.4.1).

Figure 5.4: A screenshot placeholder

5.4.2 SPSS: Syntax

COMPUTE highdose_attitude =
  SUM(
    highDose_AttGeneral_good,
    highDose_AttGeneral_prettig,
    highDose_AttGeneral_slim,
    highDose_AttGeneral_gezond,
    highDose_AttGeneral_spannend
  ).

5.5 Output

Aggregating variables is not an analysis, and as such, does not produce output. You can inspect the newly created variable to ensure it has been created properly.
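
In R, for instance, one way to inspect a newly created sum variable is to view the first rows of the dataset and a summary of the new column. The sketch below is only an illustration: the data frame `toy` and the columns `item1`, `item2`, and `item_sum` are made up and stand in for the real dataset.

```r
# Hypothetical toy data standing in for the real dataset
toy <- data.frame(
  item1 = c(1, 2, 3),
  item2 = c(4, 5, 6)
);
toy$item_sum <- rowSums(toy[, c('item1', 'item2')]);

# Show the first rows, including the new column
print(head(toy));

# Minimum, quartiles, mean, and maximum of the new variable
print(summary(toy$item_sum));

# The new variable should have exactly one value per row
print(length(toy$item_sum) == nrow(toy));
```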