Evolutionary Strategies

2025-07-22

In lieu of a hello-world blog post, I'm sharing an interactive app I built a few years ago to illustrate Evolution Strategies (ES) optimization, as part of a seminar I prepared during my PhD.

This interactive demo implements a simple Evolution Strategies (ES) optimizer on a 2D objective function. ES is a black‑box optimization method: instead of backpropagating analytic gradients, it perturbs a search distribution, observes how rewards change, and moves the distribution’s mean in a direction that on average improves reward. It’s a great mental model for optimization as exploration + averaging (Hansen, 2006; Wierstra et al., 2008).

Controls and interaction:

  • Click anywhere on the canvas to set the initial mean $\mu$; particles explore from there.
  • speed: sets $\eta$ (the learning rate).
  • particles: sets $n$ (the population size, via a $\sqrt{n}$ slider).
  • zoom: magnifies the view; particles and objective remain aligned in world coordinates.

Click on the canvas below to try it:


How it works

We maintain a Gaussian search distribution over inputs with mean $\mu \in \mathbb{R}^2$ and fixed standard deviation $\sigma$:

$$x_i = \mu + \sigma\,\varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, I), \quad i = 1, \dots, n.$$

Each sample is evaluated on the objective $f(x)$ to obtain rewards $r_i = f(x_i)$. To stabilize updates, we z-score the rewards

$$\tilde r_i = \frac{r_i - \bar r}{s_r},$$

and form a Natural Evolution Strategies–style gradient estimate for the mean:

$$\hat g_{\mu} \;\propto\; \frac{1}{n\,\sigma} \sum_{i=1}^{n} \tilde r_i\,\varepsilon_i.$$

The mean is updated with learning rate (speed) $\eta$:

$$\mu \leftarrow \mu + \eta\,\hat g_{\mu}.$$
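The sample–score–update cycle above can be sketched in a few lines of NumPy. The toy quadratic objective, the seed, and the hyperparameter values below are my own assumptions for illustration, not the demo's defaults:

```python
import numpy as np

rng = np.random.default_rng(0)

def es_step(f, mu, sigma=0.1, n=50, eta=0.02):
    """One ES mean update, mirroring the equations above."""
    eps = rng.standard_normal((n, mu.size))         # eps_i ~ N(0, I)
    r = np.array([f(mu + sigma * e) for e in eps])  # r_i = f(mu + sigma * eps_i)
    r_tilde = (r - r.mean()) / (r.std() + 1e-8)     # z-scored rewards
    g_hat = (r_tilde[:, None] * eps).sum(axis=0) / (n * sigma)
    return mu + eta * g_hat                         # mu <- mu + eta * g_hat

# Toy check (an assumed objective, not the demo's): maximizing
# f(x) = -||x||^2 should drift the mean toward the origin.
mu = np.array([1.0, -1.0])
for _ in range(300):
    mu = es_step(lambda x: -np.sum(x**2), mu)
```

The small constant added to the reward standard deviation guards against division by zero when the population lands on a flat region of the objective.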

Objective surface used in this demo

We optimize a synthetic multimodal function (sum/difference of Gaussians):

$$
\begin{aligned}
f(x, y) &= \exp\!\Big(-\tfrac{(x-0.3)^2 + (y+0.3)^2}{2\cdot 4^2}\Big) + \exp\!\Big(-\tfrac{(x+0.3)^2 + (y-0.3)^2}{2\cdot 2^2}\Big) \\
&\quad - \exp\!\Big(-\tfrac{(x-0.6)^2 + (y-0.6)^2}{2\cdot 2^2}\Big) - \exp\!\Big(-\tfrac{(x+0.4)^2 + (y+0.2)^2}{2\cdot 3^2}\Big).
\end{aligned}
$$
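For reference, this surface is a one-liner per bump in Python; the helper name `bump` is just an illustrative choice:

```python
import numpy as np

def f(x, y):
    """Sum/difference of four isotropic Gaussian bumps, as defined above."""
    def bump(cx, cy, s):
        # Isotropic Gaussian centered at (cx, cy) with std s.
        return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * s ** 2))
    return (bump(0.3, -0.3, 4) + bump(-0.3, 0.3, 2)
            - bump(0.6, 0.6, 2) - bump(-0.4, -0.2, 3))
```

Because each term is expressed with NumPy ufuncs, the same function evaluates a scalar point or a whole grid of points for plotting.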

Algorithm (with inputs):

$$
\begin{aligned}
\textbf{Inputs: } &\; \mu \in \mathbb{R}^2 \ (\text{search center}),\ \sigma > 0 \ (\text{exploration std}), \\
&\; n \ (\text{population size}),\ \eta \ (\text{step size / learning rate}) \\
\textbf{Loop (each frame): } & \\
\text{1. } &\; \varepsilon_i \sim \mathcal{N}(0, I),\ i = 1, \dots, n \\
\text{2. } &\; x_i \leftarrow \mu + \sigma\,\varepsilon_i \\
\text{3. } &\; r_i \leftarrow f(x_i) \\
\text{4. } &\; \tilde r_i \leftarrow (r_i - \bar r)/s_r \quad (\text{z-score}) \\
\text{5. } &\; \hat g_{\mu} \leftarrow \tfrac{1}{n\sigma} \sum_i \tilde r_i\,\varepsilon_i \\
\text{6. } &\; \mu \leftarrow \mu + \eta\,\hat g_{\mu}
\end{aligned}
$$
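Putting the loop together on this objective, a self-contained sketch might look like the following. The starting point, hyperparameters, and iteration count are assumed values for illustration, not the demo's settings:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(p):
    """The demo's objective (sum/difference of Gaussians), vectorized over points."""
    x, y = p[..., 0], p[..., 1]
    g = lambda cx, cy, s: np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * s ** 2))
    return g(0.3, -0.3, 4) + g(-0.3, 0.3, 2) - g(0.6, 0.6, 2) - g(-0.4, -0.2, 3)

mu = np.array([1.5, -1.0])    # search center, e.g. set by a canvas click
sigma, n, eta = 0.3, 100, 0.05

for _ in range(500):                                   # loop (each frame)
    eps = rng.standard_normal((n, 2))                  # 1. eps_i ~ N(0, I)
    x = mu + sigma * eps                               # 2. x_i = mu + sigma * eps_i
    r = f(x)                                           # 3. r_i = f(x_i)
    r_tilde = (r - r.mean()) / (r.std() + 1e-8)        # 4. z-score
    g_hat = (r_tilde[:, None] * eps).sum(0) / (n * sigma)  # 5. gradient estimate
    mu = mu + eta * g_hat                              # 6. update the mean
```

Run long enough, the mean settles into a noisy orbit around a local maximum of the surface; which basin it finds depends on the starting point and the exploration std $\sigma$.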

Why this works

Sampling symmetrically around $\mu$ means that directions that tend to increase $f$ will, on average, receive larger positive scores. After normalization, the weighted average of perturbations points toward higher reward. The step size $\eta$ trades off stability and speed.
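A quick numeric sanity check of this intuition: on a linear objective $f(x) = a^\top x$, the score-weighted average of perturbations should align with $a$, the true ascent direction. A sketch with an arbitrary choice of $a$ and a large population to average out noise:

```python
import numpy as np

rng = np.random.default_rng(2)
a = np.array([2.0, -1.0])          # assumed linear objective f(x) = a . x
mu = np.zeros(2)
sigma, n = 0.1, 20_000             # large n so the estimate is nearly noise-free

eps = rng.standard_normal((n, 2))              # symmetric perturbations
r = (mu + sigma * eps) @ a                     # rewards r_i = f(x_i)
r_tilde = (r - r.mean()) / r.std()             # z-score
g_hat = (r_tilde[:, None] * eps).sum(0) / (n * sigma)

# Cosine similarity between the ES estimate and the true gradient direction.
cos = g_hat @ a / (np.linalg.norm(g_hat) * np.linalg.norm(a))
```

With this many samples the cosine similarity is close to 1, i.e. the estimator recovers the ascent direction without ever computing a derivative.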

References

Hansen, N. (2006). The CMA Evolution Strategy: A Comparing Review. Towards a New Evolutionary Computation, 75–102.
Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., & Schmidhuber, J. (2008). Natural Evolution Strategies. Proceedings of the 2008 IEEE Congress on Evolutionary Computation.