The effect of random noise on correlation

Changes in correlation when random noise is added

Adding random noise to variables shrinks the absolute correlation between these variables towards zero. More precisely, let \(X\) and \(Y\) be correlated with \(\rho\). Suppose now that we can only observe \(X' = X + \epsilon_X\) and \(Y' = Y + \epsilon_Y\), with \(\epsilon_X\) and \(\epsilon_Y\) unsystematic errors, i.e. errors that are uncorrelated with \(X\), with \(Y\) and with each other. Then it can be shown that \(|\rho(X,Y)| \geq |\rho(X',Y')|\). This effect is sometimes called `attenuation bias'; see for instance \cite{Johnston1997}.
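
The inequality can be checked with a small simulation. The sketch below is only an illustration, not part of the original analysis: the sample size, the true correlation of 0.8, the error standard deviations and the seed are all arbitrary assumptions. If the errors are uncorrelated with \(X\), with \(Y\) and with each other, the covariance is unchanged while the variances grow, which gives \(\rho(X',Y') = \rho(X,Y) / \sqrt{(1 + \sigma^2_{\epsilon_X}/\sigma^2_X)(1 + \sigma^2_{\epsilon_Y}/\sigma^2_Y)}\); the code compares this value with the simulated one.

```r
set.seed(42)   # arbitrary seed, for reproducibility only
n   <- 1e5     # sample size (illustrative assumption)
rho <- 0.8     # true correlation (illustrative assumption)

## X and Y: standard normal with correlation rho
X <- rnorm(n)
Y <- rho * X + sqrt(1 - rho^2) * rnorm(n)

## independent errors with standard deviations 0.5 and 1
eX <- rnorm(n, sd = 0.5)
eY <- rnorm(n, sd = 1)

## correlation of the noisy observations
observed <- cor(X + eX, Y + eY)

## attenuation formula; X and Y have unit variance
predicted <- rho / sqrt((1 + 0.5^2) * (1 + 1^2))

c(true = cor(X, Y), observed = observed, predicted = predicted)
```

With these settings the observed correlation should lie clearly below the true one and close to the predicted value.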

As an example, we look at the returns of the S&P 500 from 1 January 2009 until today. We simulate a portfolio manager who chooses a random portfolio weight every day. The weight is drawn uniformly from a given range. For each range, we simulate 1000 paths and compute the correlation of the portfolio's returns with those of the S&P 500. The graphic shows the results. It is apparent that varying the weight while staying safely in a long-only range affects correlation only marginally. Correlation drops noticeably only once the weight is allowed to go to zero or even become negative.

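library("zoo")  # for coredata()

runs <- 1000                        # simulated paths per weight range
res <- array(0, dim = c(runs, 7))   # one column per weight range
lower <- 0.6                        # both bounds are shifted down by 0.1 before
upper <- 1.1                        # first use, so the first range is 0.5 to 1

## daily returns of the S&P 500; 'sp' is assumed to be a zoo series of
## index levels that has been loaded beforehand
index <- c(PMwR::returns(coredata(sp)))

labels <- character(ncol(res))
for (j in seq_len(ncol(res))) {
    lower <- lower - 0.1
    upper <- upper - 0.1
    labels[j] <- paste0(round(lower, 1), "\nto ", round(upper, 1))
    for (i in seq_len(runs)) {
        ## a fresh random weight for every day
        fund_ <- index * runif(length(index), min = lower, max = upper)
        res[i, j] <- cor(index, fund_)
    }
}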

## one boxplot per weight range, labelled with the range
par(mar = c(3,3,1,1), bty = "n", las = 1, mgp = c(3,1,0), tck = 0.01)
boxplot(res, xaxt = "n", pars = list(boxwex = 0.5))
axis(1, at = seq_len(ncol(res)), labels = labels, tck = 0.01, lwd = 0)
[Figure: correlation-with-noise.png. Boxplots of the simulated correlations, one box per weight range.]
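
The pattern in the simulation can be made plausible with a rough approximation. This is a sketch under stated assumptions, not a result from the original analysis: if the weight \(w\) is drawn independently of the index return \(r\), and the mean daily return is treated as zero, then \(\rho(r, wr) \approx E[w] / \sqrt{E[w^2]}\). The helper `approx_cor` below is a hypothetical name introduced for illustration; it evaluates this expression for a uniform weight on the seven ranges used above.

```r
## Approximate correlation between r and w * r when w ~ U(lower, upper)
## is independent of r and the mean of r is treated as zero:
##   cor ~ E[w] / sqrt(E[w^2])
approx_cor <- function(lower, upper) {
    m <- (lower + upper) / 2       # mean of U(lower, upper)
    v <- (upper - lower)^2 / 12    # variance of U(lower, upper)
    m / sqrt(m^2 + v)              # E[w^2] = m^2 + v
}

lower <- seq(0.5, -0.1, by = -0.1) # lower bounds of the seven ranges
upper <- lower + 0.5               # every range has width 0.5
approx <- approx_cor(lower, upper)
round(approx, 2)
```

As long as the mean weight is large relative to the spread of the weights, the ratio stays close to one; it falls off quickly only when the mean weight approaches zero, which matches the behaviour described above.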