Difference Between Variance, Covariance, and Correlation

Variance, covariance, and correlation are statistical measures that describe how data is spread out and how variables in a dataset relate to each other. If you are learning data science, you need to understand these terms. In this article, I will explain the difference between variance, covariance, and correlation.

Variance

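Variance measures how spread out the data points in a dataset are. It is the average of the squared distances of the data points from the mean, so a larger variance means the values are scattered more widely around the mean. The formula for calculating the sample variance is: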

  • s² = Σ(x – x̄)² / (n – 1)

In the above formula for calculating variance:

  1. s² = sample variance
  2. Σ = sum of
  3. x = value of a data point
  4. x̄ = mean of all the data points
  5. n = total number of observations
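As a minimal sketch of the calculation described above, the function below computes the sample variance step by step and compares it with Python's built-in statistics.variance. The function name sample_variance and the data values are illustrative assumptions, not part of the original article.

```python
import statistics


def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


# Illustrative data (assumed for this example)
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_variance(data))      # 32 / 7, about 4.571
print(statistics.variance(data))  # the standard library also divides by n - 1
```

Dividing by n – 1 instead of n gives the sample variance; statistics.pvariance divides by n and is meant for data that covers the whole population.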

Covariance

Covariance finds the direction of the linear relationship between two variables. A positive covariance means the two variables tend to increase together, while a negative covariance means one tends to decrease as the other increases. The formula for calculating the sample covariance is:

  • Cov(x,y) = Σ(xi – x̄)(yi – ȳ) / (n – 1)

In the above formula for calculating covariance:

  1. Cov(x, y) = covariance between x and y
  2. xi = i-th value of x
  3. yi = i-th value of y
  4. x̄ = mean of x
  5. ȳ = mean of y
  6. n = total number of paired observations
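The covariance calculation can be sketched in plain Python as shown below. The function name sample_covariance and the hours/scores data are assumptions made up for illustration. Reversing one of the lists flips the sign of the result, which is how covariance indicates direction.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of products of paired deviations, divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("sample covariance needs at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


# Illustrative data (assumed): hours studied and exam scores
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]
reversed_scores = scores[::-1]

print(sample_covariance(hours, scores))           # 13.75, the variables rise together
print(sample_covariance(hours, reversed_scores))  # -13.75, one falls as the other rises
```

The sign is easy to interpret, but the magnitude (13.75 here) depends on the units of both variables, so on its own it does not say how strong the relationship is.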

Correlation

Correlation measures both the direction and the strength of the linear relationship between two variables. It is often confused with covariance. Covariance only tells you whether the two variables tend to move in the same or in opposite directions, and its size depends on the units of the variables. Correlation rescales covariance by the standard deviations of the two variables, so it always lies between -1 and +1, and its magnitude shows how strong the relationship is. The formula for calculating the correlation is:

  • r = Cov(x,y) / (σx × σy)

In the above formula for calculating correlation:

  1. Cov(x,y) = covariance between x and y
  2. σx = standard deviation of x
  3. σy = standard deviation of y
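Here is a minimal sketch of the correlation calculation, dividing the covariance by the product of the two standard deviations. The function names and the data are illustrative assumptions, and the sample versions (dividing by n – 1) are used throughout. The example also converts hours to minutes to show that covariance changes with the units while correlation does not.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs, ys):
    """Correlation coefficient: covariance divided by the product of the standard deviations."""
    # The covariance of a variable with itself is its variance.
    sd_x = math.sqrt(sample_covariance(xs, xs))
    sd_y = math.sqrt(sample_covariance(ys, ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sample_covariance(xs, ys) / (sd_x * sd_y)


# Illustrative data (assumed): hours studied and exam scores
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]
minutes = [h * 60 for h in hours]  # same information, different unit

print(sample_covariance(hours, scores))    # 13.75
print(sample_covariance(minutes, scores))  # 60 times larger: depends on units
print(pearson_r(hours, scores))            # roughly 0.98
print(pearson_r(minutes, scores))          # same value: unit-free
```

Because r is unit-free and bounded, values close to +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.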

Summary

Here are the key points about variance, covariance, and correlation that you should take away:

  1. Variance measures how spread out the values of a single variable are around their mean.
  2. Covariance shows whether two variables tend to move in the same direction or in opposite directions.
  3. Correlation shows both the direction and the strength of the relationship between two variables, on a fixed scale from -1 to +1.

I hope you liked this article on the difference between variance, covariance, and correlation. Feel free to ask valuable questions in the comments section below.

Aman Kharwal

Data Strategist at Statso. My aim is to decode data science for the real world in the most simple words.
