Backtest a strategy on different datasets
Suppose you wanted to test a strategy on more than one dataset. The function btest in package PMwR provides a convenient way to do this.
Start with the data. We create two datasets (but it could be more than two): industry indices from Kenneth French's Data Library. Both are multivariate zoo objects, and we call them prices.17 and prices.48.
library("NMOF")
library("zoo")

## 17 industry portfolios as price series, from 2000 onwards
prices.17 <- French("~/Downloads/French",
                    "17_Industry_Portfolios_daily_CSV.zip",
                    price.series = TRUE)
prices.17 <- window(zoo(prices.17, as.Date(row.names(prices.17))),
                    start = as.Date("2000-1-1"))

## 48 industry portfolios as price series, from 2000 onwards
prices.48 <- French("~/Downloads/French",
                    "48_Industry_Portfolios_daily_CSV.zip",
                    price.series = TRUE, na.rm = TRUE)
prices.48 <- window(zoo(prices.48, as.Date(row.names(prices.48))),
                    start = as.Date("2000-1-1"))
Suppose that we wanted to see the performance of equally-weighted portfolios, rebalanced quarterly. The signal function, which btest requires, could be written as follows.
ew <- function() {
    k <- ncol(Close())  ## number of assets in the current dataset
    rep(1/k, k)         ## equal weight for every asset
}
In fact, the function could be simpler: we know the number of assets in each portfolio (17 and 48), so there would be no need to compute it from the data. Instead, we could pass it as an argument. But we want the code to be as simple as possible, and the speedup would be minuscule.
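To make the alternative mentioned above concrete, here is a minimal sketch in base R. The helper make_ew is hypothetical (it is not part of PMwR): it fixes the number of assets once and returns a signal function that no longer looks at the data.

```r
## Hypothetical helper, not part of PMwR: build an equal-weight
## signal function for a known number of assets k.
make_ew <- function(k) {
    weights <- rep(1/k, k)  ## computed once, when the function is created
    function() weights      ## the returned function takes no arguments
}

ew.17 <- make_ew(17)
ew.48 <- make_ew(48)

length(ew.17())
sum(ew.48())
```

With this approach, every dataset needs its own signal function, which is one more reason to prefer the data-driven ew above.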
It is easy enough to call btest two times now.
library("PMwR")

bt.17 <- btest(list(coredata(prices.17)),
               signal = ew,
               initial.cash = 100,
               convert.weights = TRUE,
               timestamp = index(prices.17),
               do.signal = "lastofquarter")

bt.48 <- btest(list(coredata(prices.48)),
               signal = ew,
               initial.cash = 100,
               convert.weights = TRUE,
               timestamp = index(prices.48),
               do.signal = "lastofquarter")

bt.17
bt.48

plot(bt.48, col = "darkgreen")
lines(bt.17, col = "blue")
initial wealth 100 => final wealth 464.62
Total return 364.6%

initial wealth 100 => final wealth 536.67
Total return 436.7%
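The keyword "lastofquarter" passed to do.signal asks for a new signal on the last timestamp of each quarter. The following base-R sketch illustrates the idea of picking such dates from a vector of business days; it is only an illustration under that reading of the keyword, not PMwR's implementation, and the variable names are made up for the example.

```r
## All weekdays in 2000, starting with the first trading day
dates <- seq(as.Date("2000-01-03"), as.Date("2000-12-29"), by = "day")
wday <- as.POSIXlt(dates)$wday          ## 0 = Sunday, 6 = Saturday
dates <- dates[!(wday %in% c(0, 6))]

## Label every date with its year and quarter, e.g. "2000 Q1"
quarter <- paste(format(dates, "%Y"), quarters(dates))

## Keep the last date within each quarter (dates are sorted)
last.of.quarter <- dates[!duplicated(quarter, fromLast = TRUE)]
last.of.quarter
```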
Even easier would be to call btest only once. (In particular if we had more than two datasets.)
prices <- list(list(coredata(prices.17)),
               list(coredata(prices.48)))

bt <- btest(signal = ew,
            initial.cash = 100,
            convert.weights = TRUE,
            do.signal = "lastofquarter",
            timestamp = index(prices.17),
            variations = list(prices = prices),
            variations.settings = list(labels = c("sec17", "sec48")))
bt
$sec17
initial wealth 100 => final wealth 464.62
Total return 364.6%

$sec48
initial wealth 100 => final wealth 536.67
Total return 436.7%
All we had to do was package both datasets together into one list, and then place that list into a named list: this latter list, in turn, we passed to argument variations.
str(list(prices = prices))
List of 1
 $ prices:List of 2
  ..$ :List of 1
  .. ..$ : num [1:4967, 1:17] 2920 2871 2894 2932 3031 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:4967] "2000-01-03" "2000-01-04" "2000-01-05" ...
  .. .. .. ..$ : chr [1:17] "Food" "Mines" "Oil" "Clths" ...
  ..$ :List of 1
  .. ..$ : num [1:4967, 1:48] 388 378 389 390 401 ...
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:4967] "2000-01-03" "2000-01-04" "2000-01-05" ...
  .. .. .. ..$ : chr [1:48] "Agric" "Food" "Soda" "Beer" ...
This list-of-lists structure may seem complicated at first. But it's simple, actually: variations expects as input a named list. The names of this list are matched against the argument names of btest. In our case, variations has an element named prices. Now, btest loops over all elements in prices, in effect calling
btest(prices = variations$prices[[1]], ...)
btest(prices = variations$prices[[2]], ...)
and so on.
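The mechanism can be mimicked in a few lines of base R. The sketch below is a toy illustration only: run is a hypothetical stand-in for btest, and the loop shows the idea of matching list names against argument names, not how PMwR actually implements it.

```r
## Hypothetical stand-in for btest: final wealth of a buy-and-hold
## investment in a single price series
run <- function(prices, initial.cash) {
    initial.cash * prices[length(prices)] / prices[1]
}

## One varied argument ('prices') and one fixed argument
variations <- list(prices = list(c(100, 110, 121),
                                 c( 50,  45,  60)))
fixed <- list(initial.cash = 100)

## Loop over the elements of 'prices'; each element is passed
## to the argument of the same name
results <- lapply(seq_along(variations$prices), function(i) {
    args <- fixed
    args[["prices"]] <- variations$prices[[i]]
    do.call(run, args)
})
names(results) <- c("first", "second")
results
```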
Looping is the default way in which btest evaluates variations. Alternatively, we could instruct the function to run the backtests in parallel.
bt <- btest(signal = ew,
            initial.cash = 100,
            convert.weights = TRUE,
            timestamp = index(prices.17),
            do.signal = "lastofquarter",
            variations = list(prices = prices),
            variations.settings = list(labels = c("sec17", "sec48"),
                                       method = "multicore"))
bt
$sec17
initial wealth 100 => final wealth 464.62
Total return 364.6%

$sec48
initial wealth 100 => final wealth 536.67
Total return 436.7%
If you run the examples, you will find that they compute quickly, so distributing them does not offer much of an advantage. But for more expensive models, running them in parallel can save considerable time.
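To get a feeling for when distribution pays off, one can compare a serial loop with a forked one on an artificially slow function. The sketch below uses mclapply from the parallel package that ships with R; that btest's "multicore" method behaves like this is an assumption, and slow_square is a made-up stand-in for an expensive backtest.

```r
library("parallel")

## Made-up stand-in for an expensive computation
slow_square <- function(x) {
    Sys.sleep(0.1)
    x^2
}

## Forking is not available on Windows, so fall back to one core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

serial <- lapply(1:4, slow_square)
forked <- mclapply(1:4, slow_square, mc.cores = cores)

## Both approaches give the same results; only the run time differs
identical(serial, forked)
```

Wrapping the two calls in system.time shows the difference in elapsed time on a machine with more than one core.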
When you look at the example closely, you will find that we passed the same timestamp information for both datasets. For these particular datasets, this is OK because the timestamps are indeed identical.
all.equal(index(prices.17), index(prices.48))
[1] TRUE
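With more than two datasets, pairwise comparisons become tedious. A small helper can compare all indices against the first one. The function same_index is hypothetical, and the example uses plain Date vectors in place of the zoo indices.

```r
## Hypothetical helper: TRUE if all supplied indices equal the first one
same_index <- function(...) {
    idx <- list(...)
    all(vapply(idx[-1],
               function(i) isTRUE(all.equal(idx[[1]], i)),
               logical(1)))
}

t1 <- as.Date("2000-01-03") + 0:4
t2 <- t1
t3 <- t1[-5]            ## one timestamp missing

same_index(t1, t2)      ## indices agree
same_index(t1, t2, t3)  ## t3 differs from t1
```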
A longer, but safer version would have been this:
btest(signal = ew,
      initial.cash = 100,
      convert.weights = TRUE,
      do.signal = "lastofquarter",
      variations = list(prices = prices,
                        timestamp = list(index(prices.17),
                                         index(prices.48))),
      variations.settings = list(labels = c("sec17", "sec48"),
                                 expand.grid = FALSE))
$sec17
initial wealth 100 => final wealth 464.62
Total return 364.6%

$sec48
initial wealth 100 => final wealth 536.67
Total return 436.7%
This version would be equivalent to calling
btest(prices    = variations$prices   [[1]],
      timestamp = variations$timestamp[[1]], ...)
btest(prices    = variations$prices   [[2]],
      timestamp = variations$timestamp[[2]], ...)
But note that we have set variations.settings$expand.grid to FALSE. If we hadn't, btest would have computed all combinations of prices and timestamp.
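The difference between the two settings can be seen with base R's expand.grid. In this sketch, short labels stand in for the actual datasets and timestamps; it illustrates the two ways of combining arguments and makes no claim about btest's internals.

```r
## Labels standing in for the actual data
variations <- list(prices    = c("p17", "p48"),
                   timestamp = c("t17", "t48"))

## All combinations: every prices element with every timestamp element
all.combinations <- expand.grid(variations, stringsAsFactors = FALSE)
nrow(all.combinations)

## Element-by-element pairing: first with first, second with second
paired <- data.frame(variations, stringsAsFactors = FALSE)
nrow(paired)
```

The full grid has four rows, including the mismatched pairs of 17-industry prices with 48-industry timestamps and vice versa; the paired version has only the two intended rows.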