Create Dummy Data using Python

If you are learning Data Science and need a dataset to practice on, you can either download one from Kaggle or create fake data yourself. In this article, I will take you through how to create dummy data using Python in just a few lines of code.

Create Dummy Data using Python

To create dummy data using Python, we can use the faker library, which generates random but realistic-looking values such as names, addresses, and text. If you have never used this library before, you can install it by running the pip command below in your command prompt or terminal:

  • pip install faker

Now let’s look at some examples of this library before creating a dummy dataset. The code below prints a randomly generated name, address, and block of text:

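from faker import Faker

# Create a generator; every method call returns a new random value
fake = Faker()
print(fake.name())     # a full name
print(fake.address())  # a postal address (may span several lines)
print(fake.text())     # a short paragraph of filler text

Output:

Sean Obrien
2606 Mackenzie Tunnel Apt. 215
East Ericfurt, CO 88091
Building job station sometimes what language money. Able air really it study suffer health. Body why approach difference case notice choose.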

Every time you run this code, you will get different results because the values are generated randomly. Now let’s see how to use this fake data to create a dummy dataset using Python.

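The profile() method of a Faker object returns a dictionary describing one fake person, with 13 fields ranging from job to birthdate. Collecting many of these dictionaries in a list and passing that list to pandas gives a DataFrame with 13 columns. So below is how you can create a dummy dataset using Python: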

from faker import Faker
import pandas as pd

fake = Faker()

# Generate 50 fake profiles; each profile is a dictionary
data = [fake.profile() for _ in range(50)]

# Each dictionary becomes a row, and its keys become the columns
data = pd.DataFrame(data)
print(data.head())

Output:

                                     job  ...   birthdate
0  Engineer, control and instrumentation  ...  1949-06-13
1                     Editor, film/video  ...  1959-07-23
2                           Chiropractor  ...  1927-12-12
3                           Nurse, adult  ...  1996-11-02
4                      Personnel officer  ...  1953-08-19

[5 rows x 13 columns]

You can learn more about creating fake data using the faker library from its official documentation.

Summary

So this is how you can create a fake or dummy dataset using the Python programming language. If you want to work with more realistic datasets, I recommend visiting Kaggle. I hope you liked this article on creating dummy data using Python. Feel free to ask your questions in the comments section below.

Aman Kharwal

Data Strategist at Statso. My aim is to decode data science for the real world in the most simple words.
