Backtest a strategy on different datasets

Suppose you wanted to test a strategy on more than one dataset. The function btest in package PMwR provides a convenient way to do this.

Start with the data. We create two datasets (though there could be more): daily industry indices from Kenneth French's Data Library, fetched with function French from package NMOF. Both are multivariate zoo objects, which we call prices.17 and prices.48.

library("NMOF")
library("zoo")

prices.17 <- French("~/Downloads/French",
                    "17_Industry_Portfolios_daily_CSV.zip",
                    price.series = TRUE)
prices.17 <- window(zoo(prices.17, as.Date(row.names(prices.17))),
                    start = as.Date("2000-1-1"))


prices.48 <- French("~/Downloads/French",
                    "48_Industry_Portfolios_daily_CSV.zip",
                    price.series = TRUE, na.rm = TRUE)
prices.48 <- window(zoo(prices.48, as.Date(row.names(prices.48))),
                    start = as.Date("2000-1-1"))

Suppose that we wanted to see the performance of equally weighted portfolios, rebalanced quarterly. The signal function, which btest requires, could be written as follows.

ew <- function() {
    k <- ncol(Close())
    rep(1/k, k)
}

In fact, the function could do less work: we know the number of assets in the portfolios (17 and 48), so there would be no need to compute it from the data. Instead, we could pass it as an argument. But we want the code to be as simple as possible, and the speedup would be minuscule.
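Because convert.weights is set to TRUE in the calls that follow, the values returned by ew are treated as weights and translated into positions. The base-R sketch below illustrates that arithmetic under the assumption that a position is computed as weight times wealth divided by price. The inputs are made up, and this is an illustration of the idea, not the code that btest runs.

```r
## Hypothetical inputs: three assets and a current wealth of 100
wealth <- 100
price  <- c(A = 10, B = 20, C = 50)

## Equal weights, as the signal function 'ew' would return them
k <- length(price)
w <- rep(1/k, k)

## Assumed conversion: the amount w * wealth is invested in each
## asset, so the number of units held is w * wealth / price
position <- w * wealth / price
position

## The invested amounts add up to total wealth
sum(position * price)
```

Every asset receives the same amount of money, so cheaper assets end up with larger positions.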

It is easy enough now to call btest twice, once for each dataset.

library("PMwR")
bt.17 <- btest(list(coredata(prices.17)),
               signal = ew,
               initial.cash = 100,
               convert.weights = TRUE,
               timestamp = index(prices.17),
               do.signal = "lastofquarter")

bt.48 <- btest(list(coredata(prices.48)),
               signal = ew,
               initial.cash = 100,
               convert.weights = TRUE,
               timestamp = index(prices.48),
               do.signal = "lastofquarter")

bt.17
bt.48
plot(bt.48, col = "darkgreen")
lines(bt.17, col = "blue")

initial wealth 100  =>  final wealth  464.62 
Total return   364.6%

initial wealth 100  =>  final wealth  536.67 
Total return   436.7%

It would be even easier to call btest only once, in particular if we had more than two datasets.

prices <- list(list(coredata(prices.17)),
               list(coredata(prices.48)))

bt <- btest(signal = ew,
            initial.cash = 100,
            convert.weights = TRUE,
            do.signal = "lastofquarter",
            timestamp = index(prices.17),
            variations = list(prices = prices),
            variations.settings = list(labels = c("sec17", "sec48")))
bt

$sec17
initial wealth 100  =>  final wealth  464.62 
Total return   364.6%

$sec48
initial wealth 100  =>  final wealth  536.67 
Total return   436.7%

All we had to do was package both datasets together into one list, and then place that list into a named list: this latter list, in turn, we passed to argument variations.

str(list(prices = prices))
List of 1
 $ prices:List of 2
  ..$ :List of 1
  .. ..$ : num [1:4967, 1:17] 2920 2871 2894 2932 3031 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:4967] "2000-01-03" "2000-01-04" "2000-01-05" ...
  .. .. .. ..$ : chr [1:17] "Food" "Mines" "Oil" "Clths" ...
  ..$ :List of 1
  .. ..$ : num [1:4967, 1:48] 388 378 389 390 401 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:4967] "2000-01-03" "2000-01-04" "2000-01-05" ...
  .. .. .. ..$ : chr [1:48] "Agric" "Food" "Soda" "Beer" ...
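With more than two datasets, typing the nested list by hand becomes tedious. The following sketch shows one way it could be built with lapply. It uses small made-up matrices instead of the French data; with zoo objects, each element would additionally be passed through coredata, as above.

```r
## Made-up price matrices standing in for the real datasets
datasets <- list(sec2 = matrix(100 + 1:6,  nrow = 3),
                 sec3 = matrix(100 + 1:9,  nrow = 3),
                 sec4 = matrix(100 + 1:12, nrow = 3))

## Wrap every matrix into a list of length one, which gives the
## same list-of-lists structure as shown above
prices.list <- lapply(unname(datasets), list)

## The names of 'datasets' can serve as labels for the variations
labels <- names(datasets)

str(prices.list)
```

The objects prices.list and labels could then take the place of prices and the hand-typed labels in the call.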

This list-of-lists structure may seem complicated at first. But it is actually simple: variations expects a named list as input. The names of this list are matched against the argument names of btest. In our case, the list has a single element named prices. Now, btest loops over all elements in prices, in effect calling

btest(prices = variations$prices[[1]], ...)
btest(prices = variations$prices[[2]], ...)

and so on.
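The mechanism can be imitated in a few lines of base R. The sketch below is not the code that btest uses: run_one is a hypothetical stand-in for a single backtest, and the price matrices are made up. It only shows how the name of an element in variations can decide which argument receives the data.

```r
## Hypothetical stand-in for one backtest: takes 'prices' (a list
## holding one matrix) and returns the final wealth of a
## buy-and-hold investment split equally across the columns
run_one <- function(prices, initial.cash = 100) {
    P <- prices[[1]]
    k <- ncol(P)
    sum(initial.cash / k * P[nrow(P), ] / P[1, ])
}

variations <- list(prices = list(list(cbind(c(1, 2), c(1, 1))),
                                 list(cbind(c(1, 3), c(2, 2), c(1, 1)))))

## Loop over the elements of variations$prices; the name "prices"
## determines which argument of run_one receives the element
results <- lapply(variations$prices, function(p) {
    args <- list(p, initial.cash = 100)
    names(args)[1] <- names(variations)[1]
    do.call(run_one, args)
})
results
```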

Looping is the default way in which btest evaluates variations. Alternatively, we could instruct the function to run the backtests in parallel.

bt <- btest(signal = ew,
            initial.cash = 100,
            convert.weights = TRUE,
            timestamp = index(prices.17),
            do.signal = "lastofquarter",
            variations = list(prices = prices),
            variations.settings = list(labels = c("sec17", "sec48"),
                                       method = "multicore"))
bt

$sec17
initial wealth 100  =>  final wealth  464.62 
Total return   364.6%

$sec48
initial wealth 100  =>  final wealth  536.67 
Total return   436.7%

If you run the examples, you will find that they are computed quickly, so distributing the computations does not offer much of an advantage. But for more expensive models, running the backtests in parallel can save quite some time.
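As a rough illustration of this trade-off, the following sketch runs the same toy computation once with lapply and once with mclapply from package parallel. It assumes that method "multicore" relies on forking of the kind mclapply provides, which is an assumption about the implementation. The function slow_task is a made-up placeholder for an expensive backtest. Forking is not available on Windows, so only one core is used there.

```r
library("parallel")

## Made-up placeholder for an expensive backtest
slow_task <- function(i) {
    Sys.sleep(0.2)
    i^2
}

## Forking is not available on Windows
cores <- if (.Platform$OS.type == "windows") 1L else 2L

t.loop <- system.time(res.loop <- lapply(1:4, slow_task))
t.par  <- system.time(res.par  <- mclapply(1:4, slow_task,
                                           mc.cores = cores))

## Both approaches return the same results; only the elapsed
## time may differ
identical(res.loop, res.par)
t.loop[["elapsed"]]
t.par[["elapsed"]]
```

For tasks that take only a few milliseconds each, the overhead of starting worker processes can outweigh any gain.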

When you look at the example closely, you will find that we passed the same timestamp information for both datasets. For these particular datasets, this is OK because the timestamps are indeed identical.

all.equal(index(prices.17), index(prices.48))
[1] TRUE
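With more than two datasets, the same check can be applied to all of them at once. The sketch below uses made-up Date vectors in place of the zoo indices; with real data, the list would hold index(prices.17), index(prices.48) and so on. The helper same.timestamps is a hypothetical name introduced only for this illustration.

```r
## Made-up timestamps standing in for index(prices.17) etc.
t1 <- as.Date("2000-01-03") + 0:4
t2 <- t1
t3 <- t1[-5]   ## one observation short

same.timestamps <- function(timestamps) {
    ## compare every element with the first one
    all(vapply(timestamps,
               function(x) identical(x, timestamps[[1]]),
               logical(1)))
}

same.timestamps(list(t1, t2))       ## TRUE
same.timestamps(list(t1, t2, t3))   ## FALSE
```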

A longer but safer version would have passed the timestamps as variations, too:

btest(signal = ew,
      initial.cash = 100,
      convert.weights = TRUE,
      do.signal = "lastofquarter",
      variations = list(prices = prices,
                        timestamp = list(index(prices.17),
                                         index(prices.48))),
      variations.settings = list(labels = c("sec17", "sec48"),
                                 expand.grid = FALSE))

$sec17
initial wealth 100  =>  final wealth  464.62 
Total return   364.6%

$sec48
initial wealth 100  =>  final wealth  536.67 
Total return   436.7%

This version would be equivalent to calling

btest(prices    = variations$prices   [[1]],
      timestamp = variations$timestamp[[1]], ...)
btest(prices    = variations$prices   [[2]],
      timestamp = variations$timestamp[[2]], ...)

But note that we have set variations.settings$expand.grid to FALSE. If we hadn't, btest would have computed all combinations of prices and timestamps, that is, four backtests instead of two.
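The difference between the two settings can be seen with the function expand.grid in base R. The sketch uses labels in place of the actual datasets and does not reproduce what btest does internally; the object names are made up.

```r
prices.labels    <- c("prices.17", "prices.48")
timestamp.labels <- c("timestamp.17", "timestamp.48")

## All combinations: 2 x 2 = 4 backtests, two of which would pair
## a dataset with the timestamps of the other dataset
all.combinations <- expand.grid(prices = prices.labels,
                                timestamp = timestamp.labels,
                                stringsAsFactors = FALSE)
all.combinations

## Element-wise pairing: 2 backtests
paired <- data.frame(prices = prices.labels,
                     timestamp = timestamp.labels,
                     stringsAsFactors = FALSE)
paired
```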