Imagine, you are given a dataset consisting of variables having more than 30% missing values. Let’s say, out of 50 variables, 8 variables have missing values, which is higher than 30%. How will you deal with them?

To deal with the missing values, we will do the following:

  • We will specify a different class for the missing values.
  • Now, we will check the distribution of values, and we would hold those missing values that are defining a pattern.
  • Then, we will charge these into a yet another class, while eliminating others.