Evolutionary Strategies

2025-07-22

In lieu of a hello-world blog post, I'm sharing an interactive app I built a few years ago to illustrate Evolution Strategies (ES) optimization, as part of a seminar I prepared during my PhD.

This interactive demo implements a simple Evolution Strategies (ES) optimizer on a 2D objective function. ES is a black‑box optimization method: instead of backpropagating analytic gradients, it perturbs a search distribution, observes how rewards change, and moves the distribution’s mean in a direction that on average improves reward. It’s a great mental model for optimization as exploration + averaging (Hansen, 2006; Wierstra et al., 2008).

Controls and interaction:

  • Click anywhere on the canvas to set the initial mean $\mu$; particles explore from there.
  • speed: sets $\eta$ (the learning rate).
  • particles: sets $n$ (the population size, via a $\sqrt{n}$ slider).
  • zoom: magnifies the view; particles and objective remain aligned in world coordinates.

Click on the canvas below to try it:


How it works

We maintain a Gaussian search distribution over inputs with mean $\mu \in \mathbb{R}^2$ and fixed standard deviation $\sigma$:

$$x_i = \mu + \sigma\,\varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, I), \quad i = 1, \dots, n.$$

Each sample is evaluated on the objective $f(x)$ to obtain rewards $r_i = f(x_i)$. To stabilize updates, we z-score the rewards

$$\tilde r_i = \frac{r_i - \bar r}{s_r},$$

and form a Natural Evolution Strategies–style gradient estimate for the mean:

$$\hat g_{\mu} \;\propto\; \frac{1}{n\,\sigma} \sum_{i=1}^{n} \tilde r_i\,\varepsilon_i.$$

The mean is updated with learning rate (speed) $\eta$:

$$\mu \leftarrow \mu + \eta\,\hat g_{\mu}.$$
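The sample–score–update cycle above can be sketched in a few lines of NumPy. The toy quadratic objective, the seed, and the hyperparameter values below are my own assumptions for illustration, not the demo's defaults:

```python
import numpy as np

rng = np.random.default_rng(0)

def es_step(f, mu, sigma=0.1, n=50, eta=0.02):
    """One ES mean update, mirroring the equations above."""
    eps = rng.standard_normal((n, mu.size))         # eps_i ~ N(0, I)
    r = np.array([f(mu + sigma * e) for e in eps])  # r_i = f(mu + sigma * eps_i)
    r_tilde = (r - r.mean()) / (r.std() + 1e-8)     # z-scored rewards
    g_hat = (r_tilde[:, None] * eps).sum(axis=0) / (n * sigma)
    return mu + eta * g_hat                         # mu <- mu + eta * g_hat

# Toy check (an assumed objective, not the demo's): maximizing
# f(x) = -||x||^2 should drift the mean toward the origin.
mu = np.array([1.0, -1.0])
for _ in range(300):
    mu = es_step(lambda x: -np.sum(x**2), mu)
```

The small constant added to the reward standard deviation guards against division by zero when the population lands on a flat region of the objective.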

Objective surface used in this demo

We optimize a synthetic multimodal function (sum/difference of Gaussians):

$$
\begin{aligned}
f(x, y) &= \exp\!\Big(-\tfrac{(x-0.3)^2 + (y+0.3)^2}{2\cdot 4^2}\Big) + \exp\!\Big(-\tfrac{(x+0.3)^2 + (y-0.3)^2}{2\cdot 2^2}\Big) \\
&\quad - \exp\!\Big(-\tfrac{(x-0.6)^2 + (y-0.6)^2}{2\cdot 2^2}\Big) - \exp\!\Big(-\tfrac{(x+0.4)^2 + (y+0.2)^2}{2\cdot 3^2}\Big).
\end{aligned}
$$
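For reference, this surface is a one-liner per bump in Python; the helper name `bump` is just an illustrative choice:

```python
import numpy as np

def f(x, y):
    """Sum/difference of four isotropic Gaussian bumps, as defined above."""
    def bump(cx, cy, s):
        # Isotropic Gaussian centered at (cx, cy) with std s.
        return np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * s ** 2))
    return (bump(0.3, -0.3, 4) + bump(-0.3, 0.3, 2)
            - bump(0.6, 0.6, 2) - bump(-0.4, -0.2, 3))
```

Because each term is expressed with NumPy ufuncs, the same function evaluates a scalar point or a whole grid of points for plotting.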

Algorithm (with inputs):

$$
\begin{aligned}
\textbf{Inputs: } &\; \mu \in \mathbb{R}^2 \ (\text{search center}),\ \sigma > 0 \ (\text{exploration std}), \\
&\; n \ (\text{population size}),\ \eta \ (\text{step size / learning rate}) \\
\textbf{Loop (each frame): } & \\
\text{1. } &\; \varepsilon_i \sim \mathcal{N}(0, I),\ i = 1, \dots, n \\
\text{2. } &\; x_i \leftarrow \mu + \sigma\,\varepsilon_i \\
\text{3. } &\; r_i \leftarrow f(x_i) \\
\text{4. } &\; \tilde r_i \leftarrow (r_i - \bar r)/s_r \quad (\text{z-score}) \\
\text{5. } &\; \hat g_{\mu} \leftarrow \tfrac{1}{n\sigma} \sum_i \tilde r_i\,\varepsilon_i \\
\text{6. } &\; \mu \leftarrow \mu + \eta\,\hat g_{\mu}
\end{aligned}
$$
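Putting the loop together on this objective, a self-contained sketch might look like the following. The starting point, hyperparameters, and iteration count are assumed values for illustration, not the demo's settings:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(p):
    """The demo's objective (sum/difference of Gaussians), vectorized over points."""
    x, y = p[..., 0], p[..., 1]
    g = lambda cx, cy, s: np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * s ** 2))
    return g(0.3, -0.3, 4) + g(-0.3, 0.3, 2) - g(0.6, 0.6, 2) - g(-0.4, -0.2, 3)

mu = np.array([1.5, -1.0])    # search center, e.g. set by a canvas click
sigma, n, eta = 0.3, 100, 0.05

for _ in range(500):                                   # loop (each frame)
    eps = rng.standard_normal((n, 2))                  # 1. eps_i ~ N(0, I)
    x = mu + sigma * eps                               # 2. x_i = mu + sigma * eps_i
    r = f(x)                                           # 3. r_i = f(x_i)
    r_tilde = (r - r.mean()) / (r.std() + 1e-8)        # 4. z-score
    g_hat = (r_tilde[:, None] * eps).sum(0) / (n * sigma)  # 5. gradient estimate
    mu = mu + eta * g_hat                              # 6. update the mean
```

Run long enough, the mean settles into a noisy orbit around a local maximum of the surface; which basin it finds depends on the starting point and the exploration std $\sigma$.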

Why this works

Sampling symmetrically around $\mu$ means that directions that tend to increase $f$ will, on average, receive larger positive scores. After normalization, the weighted average of perturbations points toward higher reward. The step size $\eta$ trades off stability and speed.
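A quick numeric sanity check of this intuition: on a linear objective $f(x) = a^\top x$, the score-weighted average of perturbations should align with $a$, the true ascent direction. A sketch with an arbitrary choice of $a$ and a large population to average out noise:

```python
import numpy as np

rng = np.random.default_rng(2)
a = np.array([2.0, -1.0])          # assumed linear objective f(x) = a . x
mu = np.zeros(2)
sigma, n = 0.1, 20_000             # large n so the estimate is nearly noise-free

eps = rng.standard_normal((n, 2))              # symmetric perturbations
r = (mu + sigma * eps) @ a                     # rewards r_i = f(x_i)
r_tilde = (r - r.mean()) / r.std()             # z-score
g_hat = (r_tilde[:, None] * eps).sum(0) / (n * sigma)

# Cosine similarity between the ES estimate and the true gradient direction.
cos = g_hat @ a / (np.linalg.norm(g_hat) * np.linalg.norm(a))
```

With this many samples the cosine similarity is close to 1, i.e. the estimator recovers the ascent direction without ever computing a derivative.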

References

Hansen, N. (2006). The CMA Evolution Strategy: A Comparing Review. Towards a New Evolutionary Computation, 75–102.
Wierstra, D., Schaul, T., Glasmachers, T., Sun, Y., Peters, J., & Schmidhuber, J. (2008). Natural Evolution Strategies. Proceedings of the 2008 IEEE Congress on Evolutionary Computation.