# Remarks on 'A comparison of some heuristic optimization methods'

This note is a reply to http://www.portfolioprobe.com/2012/07/23/a-comparison-of-some-heuristic-optimization-methods/

## Principles

Pat's example showed that there is apparently a sizeable
difference between the different heuristics. Typically, the
specific settings and choices (such as how the solution is
represented, or for how many iterations an algorithm is run)
matter a lot for such methods. Thus, we could be tempted to
argue that the authors of the used packages should have 'tried
harder' to incorporate better default settings. (Note that I am
one of these authors.) But I think this is illusory. Much
better would be to **educate** people how to use such methods
properly.

Two principles should guide the use of heuristics: (i) the application matters, and (ii) go experiment. (Actually these principles are valid for any optimisation technique.)

(i) The application matters in the sense that how a good a
solution is needs to be judged with respect to what we actually
want to achieve. Quantities need to be measured in meaningful
units. The example is small and artifial, so it is difficult to
evaluate the importance of these errors. Nevertheless, we need
constant reminding that we cannot say whether an error of 0.2
matters as long we do not know what we actually measure. (One
may argue that the post was only about numerical aspects, and
so this principle is off-topic. But I think it is just as
off-topic as Morgenstern's 'On the Accuracy of Economic
Observations' is off-topic to what econometricians do.)^{1}

(ii) There is no way to configure a **general-purpose heuristic**
such that it works out-of-the-box for all cases. (Which is not
true for a specialised implementation.) The second principle
states that this is not a problem: users simply need to give up
the idea that they can run a method once with default settings
and are done. Rather, users need to run small-scale experiments
to see what settings are appropriate.

## Default settings

When I had started working on the NMOF package, I talked to Manfred Gilli about whether the methods should have default settings. My preference was No; but Manfred suggested to add 'some numbers' to make it easier for people to get started with these methods. But clearly, to get started, not to use them 'as is'. It is easier to modify – to play around with – a programme that actually runs.

The trouble with heuristics is that it is difficult to come up with generic settings that 'simply work'. This difficulty is worst in the old methods, like Simulated Annealing or Tabu Search, for which the most critical part is the neighbourhood. It is a lesser problem for more–precisely-defined techniques like Differential Evolution (DE). For instance with DE, there is mainly one problem: how long to run the algorithm?

We cannot use the rules of classical optimisation. Even if there was no improvement in the best objective function value for quite some time, we could not be sure that the algorithm should be stopped: heuristics were specially designed to be able to walk away from local minima, and this may take time.

(Well, there are at least two exceptional cases that I can
think of. First, if the objective function is some kind of
distance for which the best possible value is zero, we can stop
the algorithm when we are at zero. Or rather 'sufficiently
close to zero'. What is sufficiently close? That totally
depends on the problem. Second, sometimes the mechanics of a
technique tell us that we can stop the algorithm. For instance,
in the standard DE, new solutions are linear combinations of
old solutions. Thus, if the population has converged such that
all members are the same, we can stop. Again: 'the same' means
'roughly the same'; and what that means depends on your
problem.^{2})

## So how long to run the algorithm?

Actually, it's quite simple. We set up the problem, fix the
number of iterations to I_{1}, and run the method, say, 20
times. For each run, we store the solutions (most easily, we
look at the objective function values). Now we increase the
iterations to I_{2}, and again run DE 20 times. For each setting,
we check the obtained solutions (eg, look at the distribution
function of the objective function values). This is not a
formal method; but it gives us an idea what the algorithm gives
us for specific settings, and thus we can easily find settings
that are good enough.

John Tukey is often quoted for having said something like 'the best thing is when the data give you so clear a message that you cannot ignore it'. In my experience, this is exactly what happens when you run such experiments; furthermore, the obtained settings work typically well for the given model even when used with new data.

## Solving Pat's example

### The exact solution

Pat's example is to maximise a linear combination of portfolio return and portfolio variance, subject to a cardinality constraint. For such a small model, we can write down all possibilities and solve exactly.

We will try all possible combinations of assets, and for each use the
`quadprog`

package.

require("quadprog") require("NMOF") ## 'qp' solves the problem for a given vector ## of return forecasts and a given matrix of ## variance forecasts qp <- function(mv, cm) { d <- mv D <- 2*cm A <- rbind(1, diag(length(d))) bvec <- c(1, rep(0, length(d))) solve.QP(D, d, t(A), bvec, meq = 1L)$solution } ## 'OF' gives the objective function value for a ## portfolio 'w' OF <- function(w, mv, cm) sum(w * mv) - w %*% cm %*% w

We look at the `A10s`

example which seems to be trickier.

ofVals <- numeric(0L) ## storing the results assets <- NULL; weights <- NULL mv <- A10s ## mean return forecasts cm <- V10 ## covariance matrix nA <- 10L ## number of assets card <- 1L; sel <- combn(nA, card) tmpA <- split(sel, rep(seq_len(ncol(sel)), each = nrow(sel))) tmpW <- vector("list", length = ncol(sel)) names(tmpW) <- names(tmpA) <- paste("c", card, "_", seq_len(ncol(sel)), sep = "") tmpV <- numeric(ncol(sel)) for (i in seq_along(tmpV)) { ai <- sel[ ,i] tmpW[[i]] <- 1 tmpV[i] <- OF(tmpW[[i]], mv = as.vector(mv[ai]), cm = cm[ai,ai]) } cards <- 2:5 for (card in cards) { sel <- combn(nA, card) tmpA <- split(sel, rep(seq_len(ncol(sel)), each = nrow(sel))) tmpW <- vector("list", length = ncol(sel)) names(tmpW) <- names(tmpA) <- paste("c", card, "_", seq_len(ncol(sel)), sep = "") tmpV <- numeric(ncol(sel)) for (i in seq_along(tmpV)) { ai <- sel[ ,i] tmpW[[i]] <- qp(mv = as.vector(mv[ai]), cm = cm[ai,ai]) tmpV[i] <- OF(tmpW[[i]], mv = as.vector(mv[ai]), cm = cm[ai,ai]) } assets <- c(assets, tmpA) ofVals <- c(ofVals, tmpV) weights <- c(weights, tmpW) }

The best objective function value.

round(max(ofVals), 4)

[1] -1.3897

The selected assets:

```
assets[[which.max(ofVals)]]
```

[1] 1 2 3 8 10

The weights:

round(weights[[which.max(ofVals)]], 2)

[1] 0.17 0.29 0.27 0.20 0.08

Thus, we have a benchmark solution. Now we can see how other methods fare.

### Using Threshold Accepting

We use Threshold Accepting. First, we collect all pieces of information in a list to have them in one place (which makes it easier to pass them explicitly to the optimisation function; it is bad practice to rely on a function finding objects in the global environment).

Data <- list(wmax = 1, ## the maximal weight Kmax = 5, ## max cardinality eps = 0.5/100, ## the step size resample = function(x, ...) x[sample.int(length(x), ...)], na = 10L, cm = cm, mv = mv)

Since there are 10 assets, we represent the solution as a
numeric vector of length 10. We define a *neighbourhood*
function: it changes a given portfolio slightly; but the new
portfolio will remain feasible.

neighbour <- function(w, Data) { tol <- 1e-12 J <- sum(w > tol) if (J == Data$Kmax) toBuy <- w > tol & w < Data$wmax else toBuy <- w < Data$wmax toSell <- w > tol i <- Data$resample(which(toSell), size = 1L) j <- Data$resample(which(toBuy), size = 1L) eps <- runif(1) * Data$eps eps <- min(w[i], Data$wmax - w[j], eps) w[i] <- w[i] - eps w[j] <- w[j] + eps w }

We create a random feasible solution.

makex0 <- function() { k <- sample.int(5,1) x0 <- numeric(10) x0[sample(1:10, k)] <- 1/k x0 } x0 <- makex0()

As an example, we move from a random `x0`

to a random `x1`

.

x1 <- neighbour(x0, Data) x2 <- neighbour(x1, Data) data.frame(x0 = round(x0, 4), x1 = round(x1, 4), x2 = round(x2, 4))

x0 x1 x2 1 0 0.0000 0.0027 2 0 0.0000 0.0000 3 0 0.0000 0.0000 4 0 0.0000 0.0000 5 0 0.0000 0.0000 6 0 0.0000 0.0000 7 1 0.9952 0.9925 8 0 0.0048 0.0048 9 0 0.0000 0.0000 10 0 0.0000 0.0000

We run `TAopt`

. Note that we redefine the objective function so
that it takes `Data`

as an argument and that it switches the
sign (ie, we minimise). It takes less than a second.

TAsettings <- list(neighbour = neighbour, x0 = makex0, q = 0.9, printBar = FALSE, printDetail = FALSE) OFTA <- function (w, Data) -sum(w * Data$mv) + w %*% Data$cm %*% w system.time(sol <- TAopt(OFTA, algo = TAsettings, Data = Data))

user system elapsed 0.64 0.00 0.64

The final value.

round(sol$OFvalue,4)

1.3897

We run the algorithm 100 times. (To speed up the process, we distribute these restarts.)

restarts <- restartOpt(TAopt, n = 100, OFTA, algo = TAsettings, Data = Data, method = "snow", cl = 4) summary(sapply(restarts, `[[`, "OFvalue"))

Min. 1st Qu. Median Mean 3rd Qu. Max. 1.390 1.390 1.390 1.422 1.468 1.608

Indeed, the solutions differ (even though more than half the runs return the optimum).

As said initially, heuristics can walk away from local minima; so we increase the number of iterations.

TAsettings$nS <- 2000 ## default is 1000

restarts <- restartOpt(TAopt, n = 100, OFTA, algo = TAsettings, Data = Data, method = "snow", cl = 4) summary(sapply(restarts, `[[`, "OFvalue"))

Min. 1st Qu. Median Mean 3rd Qu. Max. 1.390 1.390 1.390 1.403 1.390 1.502

That looks better already. In particular, look at the mean value.

TAsettings$nS <- 4000 ## default is 1000 restarts <- restartOpt(TAopt, n = 100, OFTA, algo = TAsettings, Data = Data, method = "snow", cl = 4) summary( sapply(restarts, `[[`, "OFvalue")) quantile(sapply(restarts, `[[`, "OFvalue"), c(0.5, 0.90))

Min. 1st Qu. Median Mean 3rd Qu. Max. 1.390 1.390 1.390 1.390 1.390 1.401 50% 90% 1.389657 1.389657

Now more than 90% of all runs return the optimum. We could increase the iterations further, but better would be to restart the algorithm a few times and keep the best solution; see Gilli/Maringer/Schumann, 2011, for a discussion and many more details.

Such a best-of strategy can be easily used by setting the
`best.only`

argument of `restartOpt`

to `TRUE`

.

TAsettings$nS <- 1000 ## default is 1000 bestof3 <- restartOpt(TAopt, n = 3, OFTA, algo = TAsettings, Data = Data, best.only = TRUE) bestof3$OFvalue

[,1] [1,] 1.389657

## Other stuff

### DEopt

`DEopt`

implements box constraints (you need to set
`minmaxConstr`

to `TRUE`

). But there is a difference how the
population is initialised, and how we constrain a solution.

### Measuring time

I don't like Windows either `:-)`

(but the performance of
sorting algorithms typically varies strongly with the
'sortedness' of the input).

## References

- Gilli, M., Maringer, D., Schumann, E. (2011) . Numerical Methods and Optimization in Finance. Elsevier/Academic Press. http://nmof.net
- the tangled R-code

## Footnotes:

^{1} What would be wrong with unnecessary numeric precision?
Two things: it will typically cost us in terms of computing
time or time-to-develop; worse still, too much precision may
give us a sense of exactitude that is simply not warranted by
financial models.

^{2} I do not talk about numeric precision here, but the
precision that is adequate for a given problem. I am not aware
of a financial model that needs to be solved to six decimals.