Data science combines statistics, data mining, and computer science to extract insights from data. Data is at the centre of every data science task, so as a data science professional, you need to know what kind of data you are working with before you can solve a problem. In this article, I will introduce you to the types of data in Data Science that you need to know.
Types of Data in Data Science
There are four major types of data we deal with as Data Science professionals:
- Numerical data
- Categorical data
- Time series data
- Textual data
Let’s go through all the data types one by one.
Numerical Data
Numerical data is data expressed as numbers. It can be discrete (countable values, usually whole numbers) or continuous (measurements that can take any value within a range). Most machine learning algorithms operate on numbers, so if the data is not numerical, we need to convert it into numerical values before training a model.
Height and weight are examples of continuous numerical data, while the number of matches played and the runs scored are examples of discrete numerical data.
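To make the difference between discrete and continuous features concrete, here is a minimal sketch in plain Python. The function name `numeric_kind` and the sample columns are hypothetical, and the rule it uses (all integers means discrete, any float means continuous) is only a rough heuristic based on the Python type of the values, not a formal definition.

```python
def numeric_kind(values):
    """Classify a numerical column as 'discrete' or 'continuous'.

    Heuristic (an assumption for illustration): a column of only
    integers is treated as discrete; a column that contains floats
    is treated as continuous.
    """
    # bool is a subclass of int in Python, so exclude it explicitly
    def is_int(v):
        return isinstance(v, int) and not isinstance(v, bool)

    def is_number(v):
        return isinstance(v, (int, float)) and not isinstance(v, bool)

    if all(is_int(v) for v in values):
        return "discrete"
    if all(is_number(v) for v in values):
        return "continuous"
    raise TypeError("column contains non-numeric values")


# Hypothetical sample columns
runs_scored = [45, 102, 0, 78]        # counts
heights_cm = [172.5, 180.0, 165.2]    # measurements

print(numeric_kind(runs_scored))
print(numeric_kind(heights_cm))
```

A column that mixes in strings, such as `["45", 102]`, raises a `TypeError`, which is a useful signal that the feature still needs to be converted before training.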
Categorical Data
Categorical data takes its values from a fixed set of two or more categories (labels) rather than from measured quantities. A categorical feature is often the target that a classification model learns to predict, and categorical features are also useful for grouping and summarizing data.
The income group of a person, gender of the person, and nationality of the person are some examples of categorical data.
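Since categories have to be converted into numbers before training a model, one common approach is one-hot encoding, where each category becomes its own 0/1 column. The sketch below uses only plain Python; the function name `one_hot_encode` and the sample values are hypothetical, and in practice you would likely use a library encoder instead.

```python
def one_hot_encode(values):
    """Return (categories, rows), where each row is a 0/1 vector
    with a single 1 in the position of that value's category."""
    # Sort the categories so the column order is deterministic
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}

    rows = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1
        rows.append(row)
    return categories, rows


# Hypothetical income-group feature
income_group = ["low", "high", "medium", "low"]
categories, encoded = one_hot_encode(income_group)

print(categories)
for row in encoded:
    print(row)
```

One-hot encoding avoids implying an order between categories, which makes it a reasonable default for features such as nationality. For ordered categories such as income group, mapping each category to an integer rank is another option.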
Time Series Data
Time series data is a sequence of observations recorded at successive points in time. The interval between observations can be years, months, days, hours, minutes, or even seconds. Time series datasets help us analyze how a value changes over time.
Stock price data, monthly sales data, and daily website traffic data are some examples of time-series data.
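As a small illustration of analyzing change over time, the sketch below takes a few daily website traffic readings, computes the day-over-day change, and aggregates them into monthly totals. The numbers and the helper names `day_over_day_change` and `monthly_totals` are made up for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily website visits as (date, visits) pairs
daily_visits = [
    (date(2023, 1, 30), 120),
    (date(2023, 1, 31), 135),
    (date(2023, 2, 1), 150),
    (date(2023, 2, 2), 160),
]


def day_over_day_change(series):
    """Return (date, change) pairs comparing each observation
    with the one recorded immediately before it."""
    ordered = sorted(series)  # time series must be in time order
    return [
        (current_day, current_value - previous_value)
        for (_, previous_value), (current_day, current_value)
        in zip(ordered, ordered[1:])
    ]


def monthly_totals(series):
    """Aggregate daily values into totals keyed by (year, month)."""
    totals = defaultdict(int)
    for day, value in series:
        totals[(day.year, day.month)] += value
    return dict(sorted(totals.items()))


print(day_over_day_change(daily_visits))
print(monthly_totals(daily_visits))
```

Unlike the other data types, the order of the rows matters here, which is why the series is sorted by date before any differences are computed.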
Textual Data
Textual data is a collection of textual information such as words, phrases, and sentences. The source can be any piece of text. Such datasets are used in Natural Language Processing (NLP), where we train systems to understand human language.
Tweets, comments, reviews, and the text of a book are some examples of textual data.
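Text also has to be turned into numbers before a model can use it. One simple technique is a bag-of-words representation, where each document becomes a vector of word counts over a shared vocabulary. The sketch below is a minimal version in plain Python; the function names, the tokenization rule, and the sample reviews are assumptions for illustration only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, vectors), where each vector holds the
    count of every vocabulary word in the matching document."""
    tokenized = [tokenize(document) for document in documents]
    # Shared, sorted vocabulary so every vector has the same columns
    vocabulary = sorted({token for tokens in tokenized for token in tokens})

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Hypothetical product reviews
reviews = ["Great phone, great battery", "Battery died fast"]
vocabulary, vectors = bag_of_words(reviews)

print(vocabulary)
for vector in vectors:
    print(vector)
```

Bag of words ignores word order, so it is only a starting point; it is shown here because it is the easiest way to see text become numerical features.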
Summary
Numerical, categorical, time series, and textual data are the main types of data you should know as a Data Science professional. Knowing which type you are working with tells you how the data needs to be prepared before you can solve a problem with it. I hope you liked this article on the types of data in Data Science you should know. Feel free to ask questions in the comments section below.
Since we cannot train an ML model with data that isn't numeric, how can we convert the different data types into numerical values so that they become suitable for training ML models?
Converting categorical to numerical: https://thecleverprogrammer.com/2020/08/27/one-hot-encoding-in-machine-learning/
Converting text to numerical: https://thecleverprogrammer.com/2021/04/17/convert-text-into-numerical-data-using-python/
Converting image to array: https://thecleverprogrammer.com/2021/06/08/convert-image-to-array-using-python/