Selection is not a search operation but influences the search performance of our algorithms significantly.
Selection is usually performed before variation operations, allowing for the breeding of the fittest individuals.
The process of selection emphasises exploiting better solutions in a population by selecting one or more copies of good solutions, and selecting inferior solutions with a much lower (but non-zero) probability.
We continue to allow for the selection of inferior solutions to prevent the search from getting 'stuck' in a local minimum/maximum.
There are multiple selection schemes: fitness proportional, ranking, truncation, and tournament selection.
We can group these schemes by how strongly they favour the fittest individuals, i.e. by their selection pressure.
Fitness Proportional Selection can be represented using a roulette wheel
Selecting individual $i$ has probability:

$$p_i = \frac{f_i}{\sum_{j=1}^{N} f_j}$$

where $f_i$ is the fitness value of individual $i$ and $N$ is the number of individuals.
This does not allow for negative fitness values. Individuals with higher fitness values will be more likely to be selected, but there is still a chance that they might be eliminated.
Individuals with a low fitness may still survive selection; keeping some weak individuals can help the population escape local optima.
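The roulette-wheel view translates directly into code. Below is a minimal sketch (the function name `roulette_wheel_select` is my own), assuming non-negative fitness values:

```python
import random

def roulette_wheel_select(population, fitnesses):
    """Fitness-proportional selection: pick individual i with
    probability f_i / sum(f).  Assumes all fitnesses are non-negative."""
    total = sum(fitnesses)
    # Spin the wheel: a uniform point on [0, total) falls in the
    # slice belonging to exactly one individual.
    spin = random.uniform(0, total)
    cumulative = 0.0
    for individual, fitness in zip(population, fitnesses):
        cumulative += fitness
        if spin < cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off
```

Note the fitter an individual, the wider its slice of the wheel, but even the weakest non-zero-fitness individual retains a slice.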
We notice that in early generations a sub-population of "super individuals" with very high fitness values may emerge.
Conversely, in later generations fitness values tend to be close together, so no such sub-population exists.
These "super individuals" can result in premature convergence to a local optimum.
The lack of separation between fitness values in later generations can lead to slow convergence.
How can we maintain the same selection pressure throughout training?
We can achieve this by replacing raw fitness values with scaled fitness values
Linear scaling takes the form:

$$f'(x) = a \cdot f(x) + b$$

where $a$ and $b$ are constants, usually chosen so that $\overline{f'} = \overline{f}$ and $f'_{\max} = C \cdot \overline{f}$, where $C$ is typically between 1.2 and 2.
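One way to solve for $a$ and $b$ under those two constraints is sketched below; `linear_scale` is a hypothetical helper name, and the clamp-at-zero step is my own addition to keep scaled fitnesses usable by fitness-proportional selection:

```python
def linear_scale(fitnesses, C=2.0):
    """Linear fitness scaling f' = a*f + b, with a and b chosen so the
    average fitness is preserved and the best individual's scaled
    fitness is C times the average (C typically in [1.2, 2])."""
    n = len(fitnesses)
    f_avg = sum(fitnesses) / n
    f_max = max(fitnesses)
    if f_max == f_avg:               # all fitnesses equal: scaling is a no-op
        return list(fitnesses)
    # Solving a*f_avg + b = f_avg and a*f_max + b = C*f_avg:
    a = (C - 1.0) * f_avg / (f_max - f_avg)
    b = f_avg * (1.0 - a)
    # Clamp at zero so no individual ends up with negative scaled fitness.
    return [max(0.0, a * f + b) for f in fitnesses]
```

For example, fitnesses `[1, 2, 3]` with `C=2` scale to `[0, 2, 4]`: the average (2) is preserved and the best is stretched to twice the average, maintaining pressure even when raw values bunch together.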
Fitness Proportional Selection is not widely used any more; instead people opt for other schemes such as ranking, truncation, and tournament selection.
In ranking selection we sort our population of size $N$ from best to worst by their fitness values.
From this we select the $\gamma$-ranked individual with probability $P(\gamma)$, where $\gamma$ is the rank ($\gamma = 0$ being the best) and $P$ is a ranking function such as linear ranking:

$$P(\gamma) = \frac{1}{N}\left(\beta - (\beta - \alpha)\frac{\gamma}{N-1}\right)$$

where $\sum_{\gamma=0}^{N-1} P(\gamma) = 1$ implies $\alpha + \beta = 2$ and $1 \le \beta \le 2$.
This gives us the expectation that the individual of rank $\gamma$ receives $N \cdot P(\gamma)$ copies, so the best individual expects $\beta$ copies and the worst expects $\alpha$.
Another example is exponential ranking:

$$P(\gamma) = \frac{e^{-\gamma}}{C}$$

where $C$ is a normalising factor and $C = \sum_{\gamma=0}^{N-1} e^{-\gamma}$.
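The linear ranking probabilities can be computed directly; `linear_ranking_probs` is my own name, and the code assumes rank 0 is the best individual and $N \ge 2$:

```python
def linear_ranking_probs(N, beta=1.5):
    """Selection probabilities for linear ranking.  Rank 0 is the best
    individual; alpha = 2 - beta guarantees the probabilities sum to 1."""
    alpha = 2.0 - beta
    return [(beta - (beta - alpha) * gamma / (N - 1)) / N
            for gamma in range(N)]
```

These weights can be fed straight into `random.choices(sorted_population, weights=probs)` to draw parents from the fitness-sorted population.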
In truncation selection we first rank individuals by fitness value, then select uniformly from some proportion $k$ of the top-ranked individuals. Usually $k = 50\%$ or $k = 30\%$.
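A sketch of truncation selection (the name `truncation_select` and the uniform-sampling-with-replacement detail are my assumptions):

```python
import random

def truncation_select(population, fitnesses, k=0.5):
    """Keep only the top fraction k of the population by fitness,
    then sample a parent uniformly from that truncated pool."""
    ranked = sorted(zip(population, fitnesses),
                    key=lambda pair: pair[1], reverse=True)
    cutoff = max(1, int(len(ranked) * k))   # always keep at least one
    pool = [individual for individual, _ in ranked[:cutoff]]
    return random.choice(pool)
```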
In tournament selection you set a tournament size, $k$, before: 1) sampling $k$ individuals uniformly at random from the population, and 2) selecting the fittest of those $k$ as a parent.
This is one of the most popular selection methods in genetic algorithms, with binary tournament selection ($k = 2$) being the most popular flavour.
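Tournament selection is short enough to sketch in full; `tournament_select` is an illustrative name, and sampling here is done with replacement:

```python
import random

def tournament_select(population, fitness, k=2):
    """Sample k individuals uniformly at random (with replacement)
    and return the fittest of them; k=2 is binary tournament."""
    contestants = [random.choice(population) for _ in range(k)]
    return max(contestants, key=fitness)
```

Note that only relative fitness order matters here, so tournament selection works even with negative fitness values, unlike fitness-proportional selection.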
Selection pressure is the degree to which selection promotes fitter individuals
A higher selection pressure leads to more exploitation and faster, potentially premature, convergence to a local optimum.
A lower selection pressure promotes exploration at the cost of slower convergence.
We measure the selection pressure of a scheme with a value called "take-over time", $\tau^*$.
Assume we have a population of size $N$ whose initial population contains exactly one copy of the unique fittest individual $x^*$, and we apply selection repeatedly with no other operations. $\tau^*$ is the number of generations until the population consists entirely of copies of $x^*$.
Higher values of take-over time mean lower selection pressure
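The take-over time definition can be checked empirically. The sketch below (my own setup) applies only tournament selection to a population holding a single copy of the fittest genotype, and counts generations until take-over; it returns `None` on runs where the single best copy drifts out of the population before taking over:

```python
import random

def takeover_time(N=100, k=2, seed=0):
    """Measure take-over time empirically: one copy of the fittest
    genotype (encoded as 1) among N-1 copies of a weaker one (0),
    evolved by size-k tournament selection alone, no variation."""
    rng = random.Random(seed)
    pop = [1] + [0] * (N - 1)
    generations = 0
    while True:
        # Each slot of the next generation is the winner of a
        # k-way tournament (max fitness) drawn with replacement.
        pop = [max(rng.choices(pop, k=k)) for _ in range(N)]
        generations += 1
        if all(x == 1 for x in pop):
            return generations          # take-over complete
        if all(x == 0 for x in pop):
            return None                 # best genotype lost to drift
```

For $N = 100$ and binary tournaments, successful runs typically finish in roughly $\log_2 N + \log_2(\ln N) \approx 9$ generations, consistent with the low take-over time (high pressure) of tournament selection.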
| Selection Function | Take-over time $\tau^*$ | Note |
|---|---|---|
| Fitness Prop | $\tau^* = \frac{N \ln N}{c}$ | Assuming a power-law objective function $f(x) = x^c$ |
| Linear Ranking | $\tau^* \approx \log_2 N + \log_2(\ln N)$ | With $\beta = 2$ |
| Truncation | $\tau^* \approx \frac{\ln N}{\ln(1/k)}$ | $k$ is the truncation proportion |
| Tournament | $\tau^* \approx \frac{\ln N + \ln(\ln N)}{\ln k}$ | $k$ is the tournament size |
Reproduction is the control of how a genetic algorithm creates the next generation.
Generational vs steady state: a generational GA replaces the entire population each generation, whereas a steady-state GA replaces only a few individuals at a time.
$n$-elitism: the process by which we always copy the $n$ best individuals to the next generation unchanged.
Generation gap: the overlap between old and new generations, i.e. the proportion of individuals who do not go through any variation operators between generations.
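A generational step with elitism can be sketched as follows; `next_generation` is an illustrative name, and the use of binary tournament selection for the non-elite slots is an arbitrary choice here:

```python
import random

def next_generation(population, fitness, n_elite=1, vary=None):
    """Generational replacement with n-elitism: the n_elite best
    individuals are copied over unchanged, and the remaining slots are
    filled by binary tournament selection plus an optional variation
    operator `vary` (e.g. mutation); vary=None means pure selection."""
    N = len(population)
    ranked = sorted(population, key=fitness, reverse=True)
    new_pop = ranked[:n_elite]                    # elites survive as-is
    while len(new_pop) < N:
        a, b = random.choice(population), random.choice(population)
        winner = a if fitness(a) >= fitness(b) else b
        new_pop.append(vary(winner) if vary else winner)
    return new_pop
```

With `n_elite > 0` the best-so-far fitness can never decrease between generations, at the cost of slightly higher selection pressure.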