The genetic algorithm is a computer approximation of how evolution performs research, which involves making changes to the parent genomes in their offspring and thus producing new individuals with different abilities. In this article, I will walk you through how to build a genetic algorithm with Python by solving a real-time case study.

## How does a Genetic Algorithm work?

Like other mathematical models such as neurons, a genetic algorithm tries to ignore everything except the important parts that we need to understand what evolution does. From this principle, the elements we need to model a simple genetic algorithm inside a computer and solve problems with it are:

**Also, Read – What is BigQuery in Data Science?**

- a method of representing problems in the form of chromosomes.
- a way to calculate the adequacy of a solution.
- a selection method for choosing parents.
- a way to generate offspring by raising parents.

I hope you now have understood what a genetic algorithm is. Now let’s head over to a case study to get into a situation where we can build our genetic algorithm.

## Case Study Based on a Genetic Algorithm:

You have [X] MP3 files on your computer hard drive. Unfortunately, the hard drive started making noise and you decide that you had better save the MP3s. Likewise, unfortunately, you can only burn CDs, not DVDs, on your computer. You need to minimize the number of CDs you use, so you decide to design a genetic algorithm to choose which MP3s to put on each CD to fill each CD as completely as possible. Design a genetic algorithm to solve the problem. You will need to think about how you would encode the entries, the appropriate genetic operators, and how you would make the genetic algorithm handle the fact that you have multiple CDs, not just one CD.

The first thing we need to look at is how to represent the problem in the form of chromosomes. As we are having [X] number of files, we should build a binary chromosome where 1 means a particular MP3 is taken from the chromosome while 0 means it is not. The position of this bit will indicate which MP3 we will start with.

## Building a Genetic Algorithm with Python

generateParents is our parent generation function. Since this is the easiest to do, we’ll do it first. The population will contain the total number of descendants that each generation will have and for the first generation, this will be the total number of randomly generated parents. size contains the current total number of MP3s being processed.

totalSize is a simple function that measures the total space used by all selected MP3s in that particular chromosome (data). size contains the current total number of MP3s being processed. For each bit of the chromosome, we check if there is one and if so, we increment s by the size of the MP3 file corresponding to that bit.

reduceSize is the function we use to randomly mutate the chromosome in a way that reduces the total size of the MP3 files described by that chromosome to fit on a CD. So as long as the total chromosome size exceeds 700, we choose a random bit and if it is 1, we change it to 0.

mutate is our mutating function. Mutations happen in real life when new descendants are created, usually, mutations would have a mutation rate, but we are omitting that in this tutorial for simplicity. Our mutation function selects a bit at random and changes its value.

fixChromosomes will take the data and apply it to the reduceSize function to the chromosomes as needed. It also applies the fitness function and adds it to the generation data so that each chromosome has a corresponding fitness. This is also sorted so that those above have the best physical shape.

The crossover function takes 2 parents and performs a random cross between their chromosomes. He chooses a random index i and divides mum and dad on the i-th index then crosses combines them both to generate 2 children. These children are then mutated by the mutation function.

The newGeneration function takes the current generation and produces the next generation. This is done by taking the 4 best parents in terms of fitness and crossing each pair of them to generate new offspring.

However, due to the particular configuration of the problem, we are very likely to have a near-optimal solution straight out of the first generation due to the randomness and our reduction feature. For this reason, we can add the top 2 parents to the new generation to ensure that we don’t lose the optimal results created in previous generations.

## Train Our Algorithm

The train function is where everything comes together. mp3Cnt contains the total number of MP3 files that have not yet been classified on a CD. Every time we process the generation cycle we find a chromosome that best matches the current CD that we are working on.

So once that is done and we are happy with the result, we can produce the required CD and remove all those MP3s from the mp3 list and update mp3Cnt accordingly.

CD1: MP3 Count:17 Size: [699.73941204] CD2: MP3 Count:16 Size: [699.40378924] CD3: MP3 Count:15 Size: [699.6834483] CD4: MP3 Count:10 Size: [699.88691372] CD5: MP3 Count:15 Size: [699.36230913] CD6: MP3 Count:13 Size: [699.58756269] CD7: MP3 Count:12 Size: [699.4258647] CD8: MP3 Count:2 Size: [40.9390979]

**Also, Read – Scaling and Normalization in Machine Learning.**

So this is how we can build a Genetic algorithm with Python. I hope you liked this article on building a genetic algorithm with python. Feel free to ask your valuable questions in the comments section below. You can also follow me on **Medium** to learn every topic of Python and Machine Learning.