Lecture 6: Selection and Reproduction

Selection

Selection is not itself a search operator, but it significantly influences the search performance of our algorithms.

Selection is usually performed before the variation operators, allowing the fittest individuals to breed.

The process of selection emphasises exploiting better solutions in a population: one or more copies of good solutions are selected, while inferior solutions are selected with a much lower (but non-zero) probability.

We continue to allow the selection of inferior solutions to prevent the search from getting stuck in a local minimum/maximum.

There are multiple selection schemes:

We can group these schemes based on whether they use the raw fitness values directly (fitness proportional selection) or only the relative ordering of individuals (ranking, truncation and tournament selection).

Fitness Proportional Selection

Fitness Proportional Selection can be represented using a roulette wheel

The probability of selecting individual $i$ is:

$$p_i = \frac{f_i}{\sum_{j=1}^{M} f_j}$$

where $f_i$ is the fitness value of individual $i$ and $M$ is the number of individuals.

This scheme does not allow for negative fitness values. Individuals with higher fitness values are more likely to be selected, but there is still a chance that they might be eliminated.

Individuals with low fitness may still survive selection; such weak individuals can help the search escape local optima.
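A minimal sketch of fitness proportional (roulette wheel) selection, assuming non-negative fitness values and a population given as a plain list; `random.choices` performs the weighted draw with replacement:

```python
import random

def fitness_proportional_select(population, fitnesses, n):
    """Roulette-wheel selection: pick n individuals with probability
    proportional to their (non-negative) fitness."""
    # random.choices draws with replacement, weighted by fitness
    return random.choices(population, weights=fitnesses, k=n)

# Example: 'd' has fitness 4, so it is four times as likely
# to be selected as 'a' (p_d = 4/10, p_a = 1/10)
pop = ['a', 'b', 'c', 'd']
fit = [1.0, 2.0, 3.0, 4.0]
parents = fitness_proportional_select(pop, fit, 6)
```

Note that even the worst individual 'a' still has probability 1/10 of being selected on each draw, which is exactly the "non-zero probability for inferior solutions" property described above.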

Scaling

We notice that in early generations a sub-population of "super individuals" with very high fitness values may emerge.
Conversely, in later generations, no such sub-population may remain.

These "super individuals" can result in premature convergence to a local optimum
The lack of separation between fitness values, seen in later generations, can lead to slow convergence

How can we maintain the same selection pressure throughout the run?

We can achieve this by replacing the raw fitness values with scaled fitness values $f_i'$

Linear scaling takes the form:

$$f_i' = a + b \cdot f_i$$

where $a$ and $b$ are constants, usually $a = \max(\mathbf{f})$ and $b = \min(\mathbf{f})/M < 1$, where $\mathbf{f} = (f_1, f_2, \ldots, f_M)$
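A quick sketch of the linear scaling rule, using the suggested constants $a = \max(\mathbf{f})$ and $b = \min(\mathbf{f})/M$ (other choices of $a$ and $b$ are possible; this is just one instantiation):

```python
def linear_scale(fitnesses):
    """Replace raw fitnesses f_i with f_i' = a + b * f_i,
    using a = max(f) and b = min(f)/M as suggested above."""
    M = len(fitnesses)
    a = max(fitnesses)
    b = min(fitnesses) / M
    return [a + b * f for f in fitnesses]

# Scaling compresses relative differences: a "super individual" with
# raw fitness 100 no longer dominates the roulette wheel so strongly.
raw = [1.0, 2.0, 3.0, 100.0]
scaled = linear_scale(raw)  # [100.25, 100.5, 100.75, 125.0]
```

After scaling, the best-to-worst fitness ratio drops from 100:1 to roughly 1.25:1, so the selection pressure exerted by fitness proportional selection is far less extreme in early generations.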

Fitness proportional selection is not widely used now; instead, people opt for other schemes such as:

Ranking Selection

In ranking selection we sort our population of size $M$ from best to worst by fitness value:

$$x_{(M-1)}', x_{(M-2)}', \ldots, x_{(0)}'$$

From this, select the individual of rank $\gamma$, $x_{(\gamma)}'$, with probability $p(\gamma)$, where $\gamma$ is the rank and $p(\gamma)$ is a ranking function such as:

Linear Ranking Function:

$$p(\gamma) = \frac{\alpha + (\beta - \alpha) \cdot \frac{\gamma}{M-1}}{M}$$

where $\sum_{\gamma=0}^{M-1} p(\gamma) = 1$ implies $\alpha + \beta = 2$, with $1 \leq \beta \leq 2$

This gives us the expectation that, over $M$ selections, the best individual receives $\beta$ copies on average and the worst receives $\alpha = 2 - \beta$ copies on average.
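The linear ranking probabilities can be computed directly; this sketch (taking $\alpha = 2 - \beta$, so $\beta$ is the only free parameter) also lets us check that they sum to one:

```python
def linear_ranking_probs(M, beta):
    """p(rank) = (alpha + (beta - alpha) * rank/(M-1)) / M for ranks 0..M-1,
    where rank M-1 is the best individual and alpha = 2 - beta."""
    alpha = 2.0 - beta
    return [(alpha + (beta - alpha) * g / (M - 1)) / M for g in range(M)]

# For M=5, beta=1.5: the best rank gets beta/M = 0.3,
# the worst gets alpha/M = 0.1, and the probabilities sum to 1
probs = linear_ranking_probs(M=5, beta=1.5)
```

Because the probabilities depend only on rank, not on the raw fitness values, ranking selection is insensitive to "super individuals" and needs no fitness scaling.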

Other Ranking Functions

Power Ranking Function

$$p(\gamma) = \frac{\alpha + (\beta - \alpha) \cdot \left(\frac{\gamma}{M-1}\right)^k}{C}$$

Geometric ranking function

$$p(\gamma) = \frac{\alpha \cdot (1-\alpha)^{M-1-\gamma}}{C}$$

Exponential ranking function

$$p(\gamma) = \frac{1 - e^{-\gamma}}{C}$$

where $C$ is a normalising factor and $0 < \alpha < \beta$

Truncated Selection

In this scheme, first rank the individuals by fitness value, then select from the top proportion $k$ of ranked individuals. Usually $k = 0.5$ or $k = 0.3$
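A minimal truncation selection sketch; it assumes (as is common, though the notes above do not specify it) that selection within the truncated pool is uniform with replacement:

```python
import random

def truncation_select(population, fitnesses, k, n):
    """Keep only the top fraction k of the population (by fitness),
    then draw n individuals uniformly from that truncated pool."""
    ranked = sorted(zip(population, fitnesses), key=lambda t: t[1], reverse=True)
    cutoff = max(1, int(k * len(population)))     # size of the surviving pool
    pool = [ind for ind, _ in ranked[:cutoff]]
    return [random.choice(pool) for _ in range(n)]

# With k=0.5 on four individuals, only the top two ('c' and 'd')
# can ever be selected
chosen = truncation_select(['a', 'b', 'c', 'd'], [1, 2, 3, 4], k=0.5, n=10)
```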

Tournament Selection

In tournament selection you set a tournament size, $k$, before:

  1. Randomly sampling a subset $P'$ of $k$ individuals from the population $P$
  2. Selecting the individual in $P'$ with the highest fitness
  3. Repeating steps 1 and 2 until enough offspring are selected

This is one of the most popular selection methods in genetic algorithms, with binary tournament selection ($k = 2$) being the most popular flavour.
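The three steps above can be sketched as follows; `k=2` gives binary tournament selection:

```python
import random

def tournament_select(population, fitnesses, k, n):
    """Repeat: sample k distinct individuals uniformly at random and
    keep the fittest, until n winners have been selected."""
    winners = []
    for _ in range(n):
        contenders = random.sample(range(len(population)), k)   # step 1
        best = max(contenders, key=lambda i: fitnesses[i])      # step 2
        winners.append(population[best])
    return winners                                              # step 3

# Binary tournament (k=2): the worst individual can never win a
# tournament, since any other contender beats it
parents = tournament_select(['a', 'b', 'c', 'd'], [1, 2, 3, 4], k=2, n=6)
```

Tournament selection, like ranking selection, depends only on fitness comparisons, so it works unchanged with negative fitness values and needs no scaling; the tournament size $k$ directly tunes the selection pressure.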

$(\mu + \lambda)$ and $(\mu, \lambda)$ Selection

$(\mu + \lambda)$ selection: generate $\lambda$ offspring from $\mu$ parents, then choose the $\mu$ best individuals from the combined pool of parents and offspring

$(\mu, \lambda)$ selection, where $\lambda > \mu$: generate $\lambda$ offspring from $\mu$ parents, then choose the $\mu$ best individuals from the offspring only, discarding all parents
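A sketch of both schemes, assuming individuals are represented as (genome, fitness) pairs:

```python
def plus_selection(parents, offspring, mu):
    """(mu + lambda): next generation = mu best from parents AND offspring."""
    pool = parents + offspring
    return sorted(pool, key=lambda ind: ind[1], reverse=True)[:mu]

def comma_selection(parents, offspring, mu):
    """(mu, lambda): next generation = mu best of the offspring only;
    parents are always discarded (requires lambda > mu)."""
    assert len(offspring) > mu
    return sorted(offspring, key=lambda ind: ind[1], reverse=True)[:mu]

parents = [('p1', 9.0), ('p2', 5.0)]
offspring = [('o1', 7.0), ('o2', 6.0), ('o3', 2.0), ('o4', 1.0)]
# (2+4): the strong parent p1 survives; (2,4): it is discarded
```

The design trade-off: $(\mu + \lambda)$ is elitist and never loses the best solution found so far, while $(\mu, \lambda)$ can forget even a very fit parent, which helps the search escape local optima.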

Selection Pressure

Selection pressure is the degree to which selection promotes fitter individuals

Selection Pressure on exploration vs exploitation

A higher selection pressure leads to more exploitation and faster, potentially premature, convergence to a local optimum

A low selection pressure promotes exploration and a slow convergence

Measuring Selection Pressure

We measure the selection pressure of a scheme with a value called the "take-over time", $\tau^*$

Assume we have a population of size $M$ whose initial population contains one unique fittest individual $x^*$, and apply selection repeatedly with no other operations. $\tau^*$ is the number of generations until the population consists entirely of copies of $x^*$

Higher values of take-over time mean lower selection pressure

| Selection Function | $\tau^* \simeq$ | Note |
| --- | --- | --- |
| Fitness Proportional | $\frac{M \ln M}{c}$ | assuming a power-law objective function $f(x) = x^c$ |
| Linear Ranking | $\frac{2 \ln(M-1)}{\beta - 1}$ | $1 \leq \beta \leq 2$ |
| Truncation | $\ln M$ | |
| Tournament | $\frac{\ln M}{\ln k}$ | tournament size $k$ |
| $(\mu + \lambda)$ | $\frac{\ln \lambda}{\ln(\lambda / \mu)}$ | |
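Take-over time can also be estimated empirically. This rough sketch (an illustration, not part of the notes) simulates pure tournament selection on a population with a single copy of the best individual and counts generations until take-over; since the lone best individual can also be lost by chance, the sketch simply restarts in that case:

```python
import random

def takeover_time(M, k, seed=0):
    """Generations of repeated tournament selection (size k) until the
    single initial copy of the best individual fills the population."""
    rng = random.Random(seed)
    pop = [1] + [0] * (M - 1)   # fitness 1 marks the unique best x*
    gens = 0
    while sum(pop) < M:
        # one generation: M independent tournaments of size k
        pop = [max(rng.sample(pop, k)) for _ in range(M)]
        if sum(pop) == 0:        # best individual lost by drift: restart
            pop = [1] + [0] * (M - 1)
        gens += 1
    return gens
```

For $M = 100$ and $k = 2$ the table predicts $\tau^* \simeq \ln 100 / \ln 2 \approx 6.6$ generations, and simulated runs land in that vicinity up to stochastic noise.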

Reproduction

Reproduction is the control of how a genetic algorithm creates the next generation.

Generational vs steady state: a generational GA replaces the entire population each generation, whereas a steady-state GA replaces only a few individuals at a time

$n$-elitism: the process by which we always copy the $n$ best individuals to the next generation unchanged
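A sketch of $n$-elitism applied when forming the next generation; here `offspring` stands for whatever the variation operators produced, and individuals are assumed to be (genome, fitness) pairs:

```python
def apply_elitism(old_population, offspring, n):
    """Copy the n best individuals of the old generation into the new
    one, replacing its n worst members (population size is preserved)."""
    elites = sorted(old_population, key=lambda ind: ind[1], reverse=True)[:n]
    survivors = sorted(offspring, key=lambda ind: ind[1],
                       reverse=True)[:len(offspring) - n]
    return elites + survivors

old = [('a', 10.0), ('b', 3.0), ('c', 1.0)]
new = [('x', 4.0), ('y', 2.0), ('z', 0.5)]
# With n=1, 'a' is guaranteed to survive into the next generation
next_gen = apply_elitism(old, new, 1)
```

Elitism guarantees that the best solution found so far is never lost, at the cost of slightly higher selection pressure.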

Generational gap: the overlap between old and new generations. I.e. the individuals who do not go through any variation operators between generations