# Sampling with a rank-deficient variance–covariance matrix

We wish to create *m* random-samples of *n* Gaussian variates that
follow a specific variance–covariance matrix Σ. The canonical
way to do that (and, unfortunately, often the only one that is
described in textbooks) is to

- create an m × n matrix
*X*of standard Gaussian variates without any specific correlation, - compute the Cholesky factor of the desired variance–covariance matrix, and
- post-multiply
*X*by this factor.

If Σ is rank-deficient, we can use a factorisation of Σ that does not require full-rank, such as the eigenvalue decomposition (see GMS).

The downside of an alternative decomposition is that it might be expensive. Also, since we want to sample, we would create more random numbers than we actually need.

Let us start with an example. We call our original data-matrix *M*.

nc <- 3 nr <- 10 M <- array(rnorm(nr*nc), dim = c(nr, nc)) C <- array(0.5, dim = c(nc,nc)) diag(C) <- 1 M <- M %*% chol(C) M <- M[ ,c(1,1,1,2,3)] head(M, 3)

[,1] [,2] [,3] [,4] [,5] [1,] -0.694 -0.694 -0.694 -0.261 0.446 [2,] -1.808 -1.808 -1.808 1.381 1.581 [3,] 0.687 0.687 0.687 1.316 2.186

*M* is rank-deficient by construction, since columns 2 and 3 are
actually copies of column 1. If we were to compute a correlation
matrix from these columns, we would only need a three by three matrix.

The function `colSubset`

will do all the work (this function is
incluced in the NMOF package).

colSubset <- function(x) { nr <- dim(x)[1L] nc <- dim(x)[2L] QR <- qr(x) if ((qrank <- QR$rank) == nc) { ans <- list(columns = seq_len(nc), multiplier = diag(nc)) } else { cols <- QR$pivot[seq_len(qrank)] cols_ <- QR$pivot[(qrank + 1L):nc] D <- array(0, dim = c(qrank, nc)) D[cbind(seq_along(cols), cols)] <- 1 D[ ,cols_] <- qr.solve(x[ ,cols], x[ ,cols_]) ans <- list(columns = cols, multiplier = D) } ans }

```
(tmp <- colSubset(M))
```

$columns [1] 1 4 5 $multiplier [,1] [,2] [,3] [,4] [,5] [1,] 1 1.00e+00 1.00e+00 0 0 [2,] 0 -1.22e-17 -1.22e-17 1 0 [3,] 0 -5.83e-17 -5.83e-17 0 1

It returns a list of two components: a full-rank subset of columns
that spans the column space of the original matrix and a matrix
`multiplier`

.

To create random numbers that match the correlation structure of *M*,
we create random numbers as in the textbook version, but we use a
correlation matrix that uses only the column subset. Then we
postmultiply by `multiplier`

.

C <- cor(M[ ,tmp$columns]) nc <- ncol(C) nr <- 100 X <- array(rnorm(nr*nc), dim = c(nr, nc)) X <- X %*% chol(C) X <- X %*% tmp$multiplier round(cor(X), 1)

[,1] [,2] [,3] [,4] [,5] [1,] 1.0 1.0 1.0 -0.1 0.3 [2,] 1.0 1.0 1.0 -0.1 0.3 [3,] 1.0 1.0 1.0 -0.1 0.3 [4,] -0.1 -0.1 -0.1 1.0 0.3 [5,] 0.3 0.3 0.3 0.3 1.0