A niche refers to the formation of groups of individuals in a population. Individuals in these groups are similar.
This is useful in maintaining population diversity, allowing better exploration of the search space.
It is also useful for:
There are different techniques for niching including (fitness) sharing and crowding.
Initially introduced by Goldberg and Richardson in 1987. It alters only the fitness assignment stage of a GA, it must be the last applied before selection.
Fitness sharing transforms the fitness of an individual into a shared group fitness. It relies on the idea that fitness is a finite resource within each niche.
Essentially, the number of individuals residing near a given peak will be proportional to the height of that peak.
Sharing radius defines the niche size. Individuals within this radius will be regarded as being similar to eachother, needing to share fitness.
The similarity between two individuals is dependant on the distance between them. The similarity between two binary strings can be defined by their Hamming distance.
The sharing function can be defined as:
where is the distance between individuals and
determines how sharp or smooth the edge of the sharing radius is. As the edge gets more blurred as
is set to a value small enough to allow discrimination between desired peaks.
The shared fitness of individual can be defined as:
where is the population size
Population size can be set roughly as a multiple of the number of peaks the user wishes to locate. Sharing is best run for a number of generations, often recommended around . This heuristic comes from shortening the expected convergence time of a GA that uses fitness-proportionate selection.
Sharing can be done at the genotypic or phenotypic levels.
At the genotypic level use the hamming distance to distinguish individuals. This approach is typically applied as a last resort when there is no phenotypic option available.
At the phenotypic level the distance function is defined using problem-specific knowledge of the phenotype. For a set of variables, the most common function is the euclidean distance. However, for a classification problem the distance between two rules can be defined based upon the examples to which they both apply.
Note: A GA under fitness sharing will not converge individuals to the tops of the peaks it locates Instead run another algorithm such as hill climbing afterwards.
Fitness sharing can be implemented with any selection method, however the choice may affect the stability of the algorithm.
The major drawback of fitness sharing is the additional time required to iterate over the population to compute the shared fitness.
In order for fitness sharing to work, raw fitness scaling is often required, defined as followed:
where is a scaling factor
However, this approach has issues:
Solutions include:
The idea for implicit fitness sharing stems from the immune system, the antibody that best matches the invading antigen receives the payoff
For each test case to be solve do the following times:
It has been shown that, theoretically , implicit and explicit fitness sharing share the same basis. A larger value for leads to better performance but longer run-time.
Implicit fitness sharing covers optima more comprehensively even in cases where the population size allows for entire species to form at each optimum.
Explicit fitness sharing is able to find optima where the population size is not great enough to cover all optima.
Note: There is technically a distinction between niching and speciation. Niching is concerned with locating peaks in a search space whereas speciation is more focused on converging to the peaks.