Chapter 21 Correlation
21.1 Intro
A correlation coefficient is an estimate of the degree to which two variables are linearly associated. Its absolute value is the square root of the proportion of variance that each variable explains in the other (this proportion is the same in both directions).
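To make the link between the coefficient and explained variance concrete, the sketch below uses simulated data. The variables x and y are made up for illustration and are not taken from the book's datasets. It computes Pearson's \(r\) from its definition (the covariance divided by the product of the standard deviations) and checks that \(r^2\) equals the \(R^2\) of a simple linear regression, whichever of the two variables is used as the predictor.

```r
# Simulated data for illustration only (not the pp15 dataset)
set.seed(42)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

# Pearson's r from its definition: covariance / (sd_x * sd_y)
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

# Squared correlation = proportion of variance explained by a simple regression
r_squared <- summary(lm(y ~ x))$r.squared
# The same proportion results when x is regressed on y instead
r_squared_reversed <- summary(lm(x ~ y))$r.squared

print(c(r = r_builtin, r_squared = r_squared))
```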
21.1.1 Example dataset
This example uses the Rosetta Stats example dataset “pp15” (see Chapter 1 for information about the datasets and Chapter 3 for an explanation of how to load datasets).
21.1.2 Variable(s)
From this dataset, this example uses the variables highDose_AttDesirable_long, highDose_AttDesirable_intens, highDose_AttDesirable_intoxicated, highDose_AttDesirable_energy, and highDose_AttDesirable_euphoria; these express which effects people prefer when using MDMA (see Chapter 1).
21.2 Input: jamovi
In jamovi, open the ‘Regression’ menu, choose ‘Correlation matrix’, and select the variables you want to include.
To obtain confidence intervals, check the checkbox for confidence intervals.
21.3 Input: R
Many analyses can be done with base R without installing additional packages. The rosetta package accompanies this book and aims to provide output similar to jamovi and SPSS with simple commands.
21.3.1 R: base R
A basic correlation matrix can be produced with cor(), passing the argument use = "complete.obs" if the dataset contains missing values (otherwise, any correlation involving a variable with missing values is itself missing, i.e. NA).
cor(
  dat[, c(
    'highDose_AttDesirable_long',
    'highDose_AttDesirable_intens',
    'highDose_AttDesirable_intoxicated',
    'highDose_AttDesirable_energy',
    'highDose_AttDesirable_euphoria'
  )],
  use = "complete.obs"
);
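The choice of use matters when more than two variables are involved. With use = "complete.obs", cor() applies listwise deletion: a row with a missing value in any selected column is dropped for every coefficient. The alternative use = "pairwise.complete.obs" drops rows only for the pairs in which they are missing, so different coefficients can be based on different sample sizes. The small made-up data frame below (with hypothetical columns a, b and c) illustrates the difference, including the default behaviour (use = "everything").

```r
# Hypothetical toy data: one missing value, in column c only
d <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 1, 4, 3, 6, 5),
  c = c(5, 3, NA, 2, 4, 1)
)

# Default (use = "everything"): any correlation involving c is NA
r_default <- cor(d)

# Listwise deletion: row 3 is dropped for every pair, including a-b
r_listwise <- cor(d, use = "complete.obs")

# Pairwise deletion: row 3 is dropped only for pairs involving c
r_pairwise <- cor(d, use = "pairwise.complete.obs")

print(round(r_listwise["a", "b"], 4))
print(round(r_pairwise["a", "b"], 4))
```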
To obtain a confidence interval for a correlation, cor.test() can be used. However, this function only tests a single pair of variables at a time.
cor.test(
  dat$highDose_AttDesirable_long,
  dat$highDose_AttDesirable_intens
);
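Because cor.test() handles one pair at a time, obtaining confidence intervals for a whole set of variables requires looping over all pairs. The sketch below does this with combn(). The helper name pairwiseCorTests is hypothetical (it is not part of base R, rosetta or ufs), and it is applied to simulated data rather than the pp15 dataset. Note that cor.test() drops incomplete cases separately for each pair, and that these \(p\)-values are not corrected for multiple testing.

```r
# Hypothetical helper: run cor.test() for every pair of the given columns
pairwiseCorTests <- function(data, vars, conf.level = 0.95) {
  pairs <- combn(vars, 2)  # each column of 'pairs' holds one pair of names
  rows <- lapply(seq_len(ncol(pairs)), function(i) {
    v1 <- pairs[1, i]
    v2 <- pairs[2, i]
    test <- cor.test(data[[v1]], data[[v2]], conf.level = conf.level)
    data.frame(
      var1 = v1,
      var2 = v2,
      r = unname(test$estimate),
      ci.lo = test$conf.int[1],
      ci.hi = test$conf.int[2],
      p = test$p.value
    )
  })
  do.call(rbind, rows)  # one row per pair
}

# Simulated example data (not the pp15 dataset)
set.seed(1)
sim <- data.frame(v1 = rnorm(50))
sim$v2 <- sim$v1 + rnorm(50)
sim$v3 <- rnorm(50)

results <- pairwiseCorTests(sim, c("v1", "v2", "v3"))
print(results)
```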
21.3.2 R: rosetta (ufs)
A correlation matrix function has not yet been made available in the rosetta package, but it is available in the ufs package, which is installed along with rosetta. Therefore, if you have rosetta installed, you can use the following command.
ufs::associationMatrix(
  dat,
  x = c(
    'highDose_AttDesirable_long',
    'highDose_AttDesirable_intens',
    'highDose_AttDesirable_intoxicated',
    'highDose_AttDesirable_energy',
    'highDose_AttDesirable_euphoria'
  )
);
This function provides the confidence intervals (the confidence level, \(95\%\) by default, can be set with the argument conf.level) as well as the point estimates and associated \(p\)-values. The \(p\)-values are corrected for multiple testing, by default using the false discovery rate approach; the method can be set with the correction argument (for example, pass correction="none" to leave the \(p\)-values uncorrected). Sample sizes are printed as well if they differ between comparisons, and omitted if they are the same for all correlation coefficients.
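The false discovery rate correction is the Benjamini-Hochberg procedure, which base R provides through p.adjust() (method "fdr" is an alias of "BH"). That ufs performs its correction through p.adjust() with these method names is an assumption here, suggested by the argument values described above. The sketch applies the correction to four made-up \(p\)-values and reproduces it by hand: the \(p\)-values are sorted, each is multiplied by the number of tests divided by its rank, and a running minimum keeps the adjusted values monotone.

```r
# Made-up p-values for four hypothetical correlation tests
p <- c(0.01, 0.04, 0.03, 0.20)

# Benjamini-Hochberg (false discovery rate) adjustment in base R
p_fdr <- p.adjust(p, method = "fdr")

# The same adjustment by hand, to show the procedure
bh_by_hand <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)      # largest p-value first
  rank <- n:1                           # ascending rank of each sorted p-value
  adjusted <- cummin(n / rank * p[o])   # running minimum keeps values monotone
  pmin(1, adjusted)[order(o)]           # cap at 1, restore the original order
}

# method = "none" leaves the p-values unchanged
p_none <- p.adjust(p, method = "none")

print(round(p_fdr, 4))
```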
21.4 Input: SPSS
For SPSS, there are two approaches: using the Graphical User Interface (GUI) or specifying an analysis script, which SPSS calls “syntax”.
21.6 Output: R
21.6.1 R: base
A correlation matrix (note: the variable names have been manually shortened, and the resulting correlations have been rounded to four decimal places, to make this example fit in the book):
long intens intoxi energy euphor
long 1.0000 0.5724 0.3737 0.3885 0.4663
intens 0.5724 1.0000 0.5843 0.3476 0.3441
intoxi 0.3737 0.5843 1.0000 0.3519 0.1474
energy 0.3885 0.3476 0.3519 1.0000 0.4772
euphor 0.4663 0.3441 0.1474 0.4772 1.0000
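The shortened names and rounding above were applied by hand, but the same presentation can be produced programmatically. In the sketch below, the data are simulated (random values under the same column names, not the pp15 data); the shared prefix is removed with sub(), names are truncated to six characters with substr(), and the matrix is rounded with round().

```r
# Simulated data with the same column names as the example
# (values are random, not the pp15 data)
set.seed(123)
vars <- c('highDose_AttDesirable_long', 'highDose_AttDesirable_intens',
          'highDose_AttDesirable_intoxicated', 'highDose_AttDesirable_energy',
          'highDose_AttDesirable_euphoria')
sim <- as.data.frame(matrix(rnorm(50 * length(vars)), ncol = length(vars),
                            dimnames = list(NULL, vars)))

corMatrix <- cor(sim, use = "complete.obs")

# Drop the shared prefix and keep at most six characters of each name
shortNames <- substr(sub("^highDose_AttDesirable_", "", vars), 1, 6)
dimnames(corMatrix) <- list(shortNames, shortNames)

# Round to four decimal places for display
print(round(corMatrix, 4))
```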
The results of cor.test(), including the confidence interval:
Pearson's product-moment correlation
data: dat$highDose_AttDesirable_long and dat$highDose_AttDesirable_intens
t = 10.068, df = 208, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.4737307 0.6568901
sample estimates:
cor
0.5724077
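The test statistic and confidence interval in this output can be reconstructed from the estimate alone. Since cor.test() reports \(df = n - 2\), the 208 degrees of freedom imply \(n = 210\) complete pairs. The statistic is \(t = r\sqrt{n-2}/\sqrt{1-r^2}\), and the confidence interval is computed on Fisher's \(z\) scale, \(z = \operatorname{atanh}(r)\) with standard error \(1/\sqrt{n-3}\), and then transformed back with \(\tanh\). The sketch below reproduces the printed values from \(r = 0.5724077\).

```r
# Values taken from the cor.test() output above
r <- 0.5724077
n <- 208 + 2  # cor.test() reports df = n - 2

# t statistic and two-sided p-value for H0: rho = 0
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via Fisher's z transformation
z <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

print(round(c(t = t_stat, lower = ci[1], upper = ci[2]), 4))
```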
21.6.2 R: rosetta (ufs)
Note: the variable names in the first column have been adjusted to make the table fit and to make the labels consistent with those in Chapter 22.
| | | | | | |
|---|---|---|---|---|---|
| Prefer long effects | | | | | |
| Prefer intense effects | r=[0.47; 0.66], r=0.57, p<.001 | | | | |
| Prefer more intoxication | r=[0.25; 0.48], r=0.37, p<.001 | r=[0.49; 0.67], r=0.58, p<.001 | | | |
| Prefer more energy | r=[0.27; 0.5], r=0.39, p<.001 | r=[0.22; 0.46], r=0.35, p<.001 | r=[0.23; 0.47], r=0.35, p<.001 | | |
| Prefer more euphoria | r=[0.35; 0.57], r=0.47, p<.001 | r=[0.22; 0.46], r=0.34, p<.001 | r=[0.01; 0.28], r=0.15, p=.033 | r=[0.37; 0.58], r=0.48, p<.001 | |
21.8 Read more
If you would like more background on this topic, you can read more in these sources:
- More options for creating scatterplot matrices in R are available here: http://www.sthda.com/english/wiki/scatter-plot-matrices-r-base-graphs