Chapter 4 Aggregating data: mean
A common task is aggregating multiple variables (columns in a dataset) into one new variable (column). For example, you may want to compute the average score on the items of a questionnaire.
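In formula form, the mean score for one participant \(i\) who answered \(k\) items with scores \(x_{i1}, \dots, x_{ik}\) can be sketched as follows. The symbols are introduced here only for illustration:

\[ \bar{x}_i = \frac{1}{k} \sum_{j=1}^{k} x_{ij} \]

When some answers are missing and missing values are allowed, both the sum and the divisor \(k\) run only over the items that the participant actually answered.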
Note that when creating new variable names, it is important to follow the conventions for variable names (see the section on file and variable name conventions).
4.1.1 Example dataset
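From this dataset, this example uses the variables highDose_AttGeneral_good, highDose_AttGeneral_prettig, highDose_AttGeneral_slim, highDose_AttGeneral_gezond, and highDose_AttGeneral_spannend.
We will aggregate these into the variable highDose_attitude. Note that this variable already exists in the dataset, and that the existing variable is also the mean of those five variables.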
4.2 Input: jamovi
In the “Data” tab, click the “Compute” button as shown in Figure 4.1.
Type the new variable name in the text field at the top, labelled “COMPUTED VARIABLE”. Then click the function button, marked \(f_x\), and select the MEAN function from the box labelled “Functions”. Next, in the box labelled “Variables”, double-click each variable you want to include in the mean, typing a comma between variable names, as shown in Figure 4.1.
Alternatively, you can type the function name and list of variables directly without using the function (\(f_x\)) dialog as shown in Figure 4.2.
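If you want to allow missing values, you can add the ignore_missing=1 argument. In that case, you would type:
MEAN(highDose_AttGeneral_good, highDose_AttGeneral_prettig, highDose_AttGeneral_slim, highDose_AttGeneral_gezond, highDose_AttGeneral_spannend, ignore_missing=1)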
At the time of writing, jamovi does not let you specify how many valid values are required. Either no missing values are allowed at all, or any number of missing values is accepted.
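The difference between these two options can be illustrated with base R's mean() function. This is a minimal sketch with invented answers for one hypothetical participant. It assumes that jamovi's two settings behave like na.rm = FALSE (no missing values allowed) and na.rm = TRUE (any number of missing values accepted).

```r
# One hypothetical participant's answers on five items; NA marks a missing answer.
# These values are invented for illustration only.
answers <- c(4, 4, NA, 4, 3)

# "No missing values allowed": a single NA makes the whole mean NA.
strict <- mean(answers)

# "Any number of missing values accepted": average only the values present.
lenient <- mean(answers, na.rm = TRUE)

print(strict)   # NA
print(lenient)  # 3.75, i.e. (4 + 4 + 4 + 3) / 4
```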
4.3 Input: R
In R, there are roughly three approaches. Many analyses can be done with base R, without installing additional packages. The rosetta package accompanies this book and aims to provide output similar to that of jamovi and SPSS with simple commands. Finally, the tidyverse is a popular collection of packages that are designed to work together consistently, but they implement a different underlying logic than base R (and so, the
4.3.1 R: base R
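A minimal base R sketch uses rowMeans(), which averages across columns within each row. This sketch assumes the data are in a data frame called dat (the name used in the SPSS section) that contains the five items. Because the real dataset is not loaded here, a tiny invented dat stands in for it. The new column name highDose_attitude_new is hypothetical, chosen so that the existing highDose_attitude is not overwritten.

```r
# Hypothetical stand-in for the book's dataset: in practice `dat` is the loaded
# data frame. This tiny invented version only keeps the sketch runnable.
dat <- data.frame(
  highDose_AttGeneral_good     = c(5, 4),
  highDose_AttGeneral_prettig  = c(6, 4),
  highDose_AttGeneral_slim     = c(4, 2),
  highDose_AttGeneral_gezond   = c(5, 4),
  highDose_AttGeneral_spannend = c(5, 3)
)

# Names of the items to aggregate (as listed in this chapter's SPSS syntax).
attitudeItems <- c(
  "highDose_AttGeneral_good", "highDose_AttGeneral_prettig",
  "highDose_AttGeneral_slim", "highDose_AttGeneral_gezond",
  "highDose_AttGeneral_spannend"
)

# rowMeans() computes one mean per row (participant) across the selected columns;
# na.rm = TRUE averages whatever values are present in each row.
dat$highDose_attitude_new <- rowMeans(dat[, attitudeItems], na.rm = TRUE)

print(dat$highDose_attitude_new)  # 5.0 3.4
```

Selecting the columns by a character vector of names keeps the item list in one place, so the same vector can be reused for other checks on the same scale.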
4.3.2 R: Rosetta
To indicate that a certain number of values must be valid (i.e., “non-missing”), the argument requiredValidValues can be passed. For example, to require four valid values (instead of only one valid value, the default), use:
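The logic behind a "minimum number of valid values" rule can be sketched in base R. The helper meanWithMinValid() below is hypothetical and is not part of rosetta or base R. It only illustrates the rule: compute each row's mean over its valid values, but return NA when fewer than minValid values are valid. The item names and values are invented.

```r
# Hypothetical helper (not rosetta's implementation): row means that require
# at least `minValid` non-missing values; rows with fewer valid values get NA.
meanWithMinValid <- function(data, minValid = 1) {
  nValid <- rowSums(!is.na(data))          # number of valid values per row
  means  <- rowMeans(data, na.rm = TRUE)   # mean over the valid values only
  means[nValid < minValid] <- NA           # too few valid values: set to NA
  means
}

# Invented toy data: row 1 is complete, row 2 has one missing value,
# row 3 has two missing values.
items <- data.frame(
  item1 = c(5, 4, NA),
  item2 = c(6, 4, 3),
  item3 = c(4, NA, NA),
  item4 = c(5, 4, 2),
  item5 = c(5, 3, 4)
)

print(meanWithMinValid(items))                # 5.00 3.75 3.00
print(meanWithMinValid(items, minValid = 4))  # 5.00 3.75   NA
```

With the default minValid = 1, only rows with no valid values at all become NA. Raising it to 4 drops row 3, which has just three valid answers.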
4.4 Input: SPSS
For SPSS, there are two approaches: using the graphical user interface (GUI) or specifying an analysis script, which in SPSS is called “syntax”.
4.4.1 SPSS: GUI
First activate the dat dataset (see 2.4.1).
4.4.2 SPSS: Syntax
COMPUTE highdose_attitude = MEAN( highDose_AttGeneral_good, highDose_AttGeneral_prettig, highDose_AttGeneral_slim, highDose_AttGeneral_gezond, highDose_AttGeneral_spannend ).
To indicate that a certain number of values must be valid (i.e., “non-missing”), append a period and the number of required valid values to the MEAN function. For example, to require four valid values, use:
COMPUTE highdose_attitude = MEAN.4( highDose_AttGeneral_good, highDose_AttGeneral_prettig, highDose_AttGeneral_slim, highDose_AttGeneral_gezond, highDose_AttGeneral_spannend ).
Aggregating variables is not an analysis, and as such, does not produce output. You can inspect the newly created variable to ensure it has been created properly.
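Because the dataset already contains highDose_attitude as the mean of the same five items, one way to inspect a newly computed variable is to compare it with that existing variable. The sketch below does this in R. The helper scoresMatch() is hypothetical, and the two score vectors are invented stand-ins for the new variable and the existing dat$highDose_attitude.

```r
# Hypothetical helper: does a newly computed score match an existing one?
# all.equal() tolerates tiny floating-point differences, and NAs must occur
# in the same positions for the comparison to succeed.
scoresMatch <- function(newScore, existingScore) {
  isTRUE(all.equal(newScore, existingScore, check.attributes = FALSE))
}

newScore      <- c(5, 3.4, NA)  # invented values, e.g. produced by rowMeans()
existingScore <- c(5, 3.4, NA)  # invented stand-in for dat$highDose_attitude

print(summary(newScore))                     # quick look at range and NA count
print(scoresMatch(newScore, existingScore))  # TRUE
```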