Backtests with time-varying asset universes
In this note we'll see how to deal with a particular case of missing values: when certain assets are available only at certain times.
We first get some data: time-series of industry portfolios from Kenneth French's website at https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ The dataset comprises 30 series of daily data, and we use a subset that starts in January 1990.
library("NMOF")
library("zoo")
P <- French(dest.dir = tempdir(),
            "30_Industry_Portfolios_daily_CSV.zip",
            price.series = TRUE,
            na.rm = TRUE)
P <- zoo(P, as.Date(row.names(P)))
P <- window(P, start = as.Date("1990-1-1"))
str(P)
‘zoo’ series from 1990-01-02 to 2020-08-31
  Data: num [1:7727, 1:30] 808 803 797 791 791 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:7727] "1990-01-02" "1990-01-03" "1990-01-04" ...
  ..$ : chr [1:30] "Food" "Beer" "Smoke" "Games" ...
  Index:  Date[1:7727], format: "1990-01-02" "1990-01-03" ...
Actually, the data are complete: there are no missing values.
any(is.na(P))
[1] FALSE
So let us make them incomplete: in series 16 to 30, we remove all data before January 2000.
window(P[, 16:30], end = as.Date("1999-12-31")) <- NA
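To confirm what such a removal does, one can look up the first date with a non-missing price for each series. The following is a minimal sketch in base R on a made-up price matrix; the names X, dates and first.obs are purely illustrative and not part of the original data.

```r
## toy price matrix: 6 dates, 3 assets (made-up numbers)
dates <- as.Date("1999-12-29") + 0:5
X <- matrix(c(100, 101, 102, 103, 104, 105,
               50,  51,  52,  53,  54,  55,
               20,  21,  22,  23,  24,  25),
            nrow = 6,
            dimnames = list(as.character(dates), c("A", "B", "C")))

## remove all data of asset C before January 2000
X[dates < as.Date("2000-01-01"), "C"] <- NA

## first date with a non-missing price, per asset
first.row <- apply(!is.na(X), 2, function(x) which(x)[1])
first.obs <- dates[first.row]
names(first.obs) <- colnames(X)
first.obs
```

Applied to the real data, the same lookup should report a first date in January 2000 for series 16 to 30.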
The key feature of btest to handle such data is this: if an asset is not selected (i.e. has a zero position), it is not required for valuing the position, and so it can be missing. Suppose we wanted to simulate a 50/50 investment in only the first two series (which, we know, are complete). With btest, we could do it as follows.
library("PMwR")
bt <- btest(prices = list(coredata(P)),
            timestamp = index(P),
            signal = function() {
                w <- numeric(ncol(Close()))
                w[1:2] <- c(0.5, 0.5)
                w
            },
            do.signal = "lastofquarter",
            convert.weights = TRUE,
            initial.cash = 100)
head(journal(bt), n = 10, by = FALSE)
As you can see, the function does not complain. If you check the journal, you'll find that all transactions have been in Food and Beer, the first two industries.
   instrument   timestamp         amount      price
1        Food  1990-03-30   0.0659531693   759.8573
2        Beer  1990-03-30   0.0335054119  1481.5517
3        Food  1990-06-29   0.0026870029   843.8351
4        Beer  1990-06-29  -0.0011305346  1775.1047
5        Food  1990-09-28  -0.0014578071   775.9197
6        Beer  1990-09-28   0.0007077629  1575.3859
7        Food  1990-12-31   0.0008239410   882.5049
8        Beer  1990-12-31  -0.0003957095  1824.9844
9        Food  1991-03-28  -0.0004120411  1081.1665
10       Beer  1991-03-28   0.0001984854  2237.6230

10 transactions
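Why a zero position makes a missing price harmless can be seen in a few lines of base R. This is only an illustration of the idea with made-up numbers; it does not describe how btest values positions internally.

```r
## made-up prices on one day; asset "C" has no price yet
price    <- c(A = 100, B = 50, C = NA)
position <- c(A = 2,   B = 4,  C = 0)

## naive valuation: NA * 0 is NA in R, so the sum is NA
naive <- sum(price * position)
naive

## value only the assets that are actually held
held  <- position != 0
value <- sum(price[held] * position[held])
value
```

The naive sum is NA, while the valuation restricted to held assets gives 400.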
Now we can start the actual example. The aim in this exercise is to compute a minimum-variance portfolio over all available assets. We begin by defining when certain assets were available, and placing this information in a data frame active.
active <- data.frame(instrument = colnames(P),
                     start = c(rep(as.Date("1990-1-1"), 15),
                               rep(as.Date("2001-1-1"), 15)),
                     end = tail(index(P), 1))
active
   instrument      start        end
1        Food 1990-01-01 2020-08-31
2        Beer 1990-01-01 2020-08-31
3       Smoke 1990-01-01 2020-08-31
4       Games 1990-01-01 2020-08-31
5       Books 1990-01-01 2020-08-31
6       Hshld 1990-01-01 2020-08-31
7       Clths 1990-01-01 2020-08-31
8        Hlth 1990-01-01 2020-08-31
9       Chems 1990-01-01 2020-08-31
10      Txtls 1990-01-01 2020-08-31
11      Cnstr 1990-01-01 2020-08-31
12      Steel 1990-01-01 2020-08-31
13      FabPr 1990-01-01 2020-08-31
14      ElcEq 1990-01-01 2020-08-31
15      Autos 1990-01-01 2020-08-31
16      Carry 2001-01-01 2020-08-31
17      Mines 2001-01-01 2020-08-31
18       Coal 2001-01-01 2020-08-31
19        Oil 2001-01-01 2020-08-31
20       Util 2001-01-01 2020-08-31
21      Telcm 2001-01-01 2020-08-31
22      Servs 2001-01-01 2020-08-31
23      BusEq 2001-01-01 2020-08-31
24      Paper 2001-01-01 2020-08-31
25      Trans 2001-01-01 2020-08-31
26      Whlsl 2001-01-01 2020-08-31
27      Rtail 2001-01-01 2020-08-31
28      Meals 2001-01-01 2020-08-31
29        Fin 2001-01-01 2020-08-31
30      Other 2001-01-01 2020-08-31
Note that we set start to 2001, not 2000. You'll see why shortly.
Now for the signal function. It receives active as an argument.
mv <- function(active) {

    ## find those assets that are active
    ## ==> 'j' is a logical vector that
    ##     indicates the active assets
    j <- Timestamp() >= active[["start"]] &
         Timestamp() <= active[["end"]]

    ## get last 260 prices of active assets and compute
    ## variance--covariance matrix
    P.j <- Close(n = 260)[, j]
    R.j <- returns(P.j)
    S <- cov(R.j)

    ## compute minimum-variance weights
    w.j <- NMOF::minvar(S, wmin = 0, wmax = 0.10)

    ## create a zero-vector with length equal to number
    ## of total assets and assign the weights at
    ## appropriate positions
    w <- numeric(length(j))
    w[j] <- w.j
    w
}
Now you see why we used 2001 as the start date for series 16 to 30: we'll use one year of historical data to compute the variance-covariance matrix. (Note that there are better ways to come up with forecasts of the variance-covariance matrix, e.g. methods that apply shrinkage. But the purpose of this note is to show how to handle missing values in btest, not to discuss empirical methods.)
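The selection logic inside mv can be tried in isolation. The sketch below uses a hypothetical three-asset table active.toy, a fixed date today in place of Timestamp(), and equal weights as a stand-in for the optimiser; none of these names appear in the original example.

```r
## hypothetical availability table for three assets
active.toy <- data.frame(
    instrument = c("A", "B", "C"),
    start = as.Date(c("1990-01-01", "1990-01-01", "2001-01-01")),
    end   = as.Date("2020-08-31"))

## stand-in for Timestamp(): the date at which the signal is computed
today <- as.Date("2000-06-30")

## logical vector that marks the active assets
j <- today >= active.toy[["start"]] &
     today <= active.toy[["end"]]

## stand-in for the optimiser: equal weights among active assets
w.j <- rep(1 / sum(j), sum(j))

## full-length weight vector, zero for inactive assets
w <- numeric(length(j))
w[j] <- w.j
w
```

In mid-2000 only A and B are active, so they receive a weight of 0.5 each and C stays at zero. An asset with zero weight is never held, which is exactly the condition under which its prices may be missing.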
We call btest.
bt.mv <- btest(prices = list(coredata(P)),
               timestamp = index(P),
               signal = mv,
               do.signal = "lastofquarter",
               convert.weights = TRUE,
               initial.cash = 100,
               active = active,
               b = 260)
bt.mv
initial wealth 100  =>  final wealth  1652.74
Total return   1552.7%
The backtest runs without problems. As an example, let us check trades in industry Oil.
head(journal(bt.mv)["Oil"], 5)
  instrument   timestamp         amount     price
1        Oil  2001-03-30   0.0104934366  2656.871
2        Oil  2001-06-29  -0.0003607878  2709.119
3        Oil  2001-09-28   0.0011873853  2383.685
4        Oil  2001-12-31  -0.0043576713  2549.018
5        Oil  2002-03-28  -0.0037902744  2807.207

5 transactions
As expected, the first trades occur only in 2001.
A final remark: we would not have needed to prepare active upfront. Instead, we could have checked for missing values in the signal function.
mv_with_NA_check <- function() {

    ## fetch data and check for missing values
    P <- Close(n = 260)
    j <- !apply(P, 2, anyNA)

    ## keep the last 260 prices of the active assets
    ## and compute the variance--covariance matrix
    P.j <- P[, j]
    R.j <- returns(P.j)
    S <- cov(R.j)

    ## compute minimum-variance weights
    w.j <- NMOF::minvar(S, wmin = 0, wmax = 0.10)

    ## create a zero-vector with length equal to number
    ## of total assets and assign the weights at
    ## appropriate positions
    w <- numeric(length(j))
    w[j] <- w.j
    w
}
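The column test used in mv_with_NA_check can be checked on a small matrix. The sketch below uses a made-up window W of four rows; j2 shows an equivalent formulation with colSums, included here only for comparison.

```r
## toy window of prices: 4 rows, 3 assets; "C" starts late
W <- cbind(A = c(10, 11, 12, 13),
           B = c(20, 21, 22, 23),
           C = c(NA, NA, 30, 31))

## TRUE for columns without any missing value
j  <- !apply(W, 2, anyNA)

## equivalent test: count missing values per column
j2 <- colSums(is.na(W)) == 0

j
```

A single missing value anywhere in the window is enough to exclude an asset, so a series only becomes eligible once it has a full window of prices.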
bt.mv2 <- btest(prices = list(coredata(P)),
                timestamp = index(P),
                signal = mv_with_NA_check,
                do.signal = "lastofquarter",
                convert.weights = TRUE,
                initial.cash = 100,
                b = 260)
bt.mv2
## inspect the journal of the second backtest
head(journal(bt.mv2)["Oil"], 5)
initial wealth 100  =>  final wealth  1652.74
Total return   1552.7%

  instrument   timestamp         amount     price
1        Oil  2001-03-30   0.0104934366  2656.871
2        Oil  2001-06-29  -0.0003607878  2709.119
3        Oil  2001-09-28   0.0011873853  2383.685
4        Oil  2001-12-31  -0.0043576713  2549.018
5        Oil  2002-03-28  -0.0037902744  2807.207

5 transactions
We get the same results. But defining an explicit list is more, well, explicit, which is often a good thing when analysing data, notably because it sets an expectation that those active time-series don't have missing values.
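That expectation can itself be tested before running a backtest. The following is a minimal sketch in base R, assuming a price matrix X with one column per instrument, a vector dates with one entry per row, and an availability table active.toy; all three are made up for illustration.

```r
## made-up prices: 6 dates, 2 assets; "C" has no data in 2000
dates <- as.Date("2000-12-28") + 0:5
X <- cbind(A = c(1, 2, 3, 4, 5, 6),
           C = c(NA, NA, NA, NA, 7, 8))

## hypothetical availability table
active.toy <- data.frame(
    instrument = c("A", "C"),
    start = as.Date(c("2000-12-28", "2001-01-01")),
    end   = as.Date("2001-01-02"))

## for each instrument, check that there are no missing
## prices between its start and end date
ok <- vapply(seq_len(nrow(active.toy)), function(i) {
    in.window <- dates >= active.toy$start[i] &
                 dates <= active.toy$end[i]
    !anyNA(X[in.window, as.character(active.toy$instrument[i])])
}, logical(1))
names(ok) <- as.character(active.toy$instrument)
ok
```

Any FALSE in ok would flag an instrument whose declared window contains missing prices.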