Machine Learning Interview Questions

With the growing demand for Machine Learning experts, more and more candidates are researching the questions they are likely to face in interviews. In this article, I'm going to introduce you to some very common machine learning interview questions, collected by me and by other machine learning practitioners I know who were asked them when they applied for jobs.

I recently wrote an article on how to prepare a Data Science resume. If you are planning to interview, I recommend following the steps in that article to build a good resume before diving into these machine learning interview questions. It will help you make a good impression and improve your chances of getting the job.

Common Machine Learning Interview Questions

If your model performs well on training data but generalizes poorly to new instances, what happens? Can you name three possible solutions?

If a model performs well on training data but generalizes poorly to new instances, the model is probably overfitting the training data (or we were very lucky on the training data). Possible solutions to overfitting are obtaining more data, simplifying the model (selecting a simpler algorithm, reducing the number of parameters or features used, or regularizing the model), or reducing noise in the training data.

Suppose you are using polynomial regression. You plot the learning curves and you notice that there is a big gap between the training error and the validation error. What is happening? What are three ways to solve this problem?

If the validation error is much higher than the training error, this is most likely because your model is overfitting the training set. One way to try to solve this problem is to reduce the polynomial degree: a model with fewer degrees of freedom is less likely to overfit. Another thing you can try is to regularize the model, for example by adding an ℓ2 (Ridge) penalty or an ℓ1 (Lasso) penalty to the cost function. This will also reduce the degrees of freedom of the model. Finally, you can try increasing the size of the training set.
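As a rough illustration of the first two remedies, here is a minimal Scikit-Learn sketch. The synthetic quadratic dataset, the polynomial degrees (10 and 2), and the Ridge strength alpha=1.0 are assumptions chosen for the demo, not values taken from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Assumed toy data: a noisy quadratic with only 60 samples.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(0, 1, size=60)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=42
)

def poly_model(degree, regressor):
    # Expand to polynomial features, scale them, then fit a linear model.
    return make_pipeline(
        PolynomialFeatures(degree, include_bias=False),
        StandardScaler(),
        regressor,
    )

models = {
    "degree 10, unregularized": poly_model(10, LinearRegression()),
    "degree 2, unregularized": poly_model(2, LinearRegression()),
    "degree 10, ridge": poly_model(10, Ridge(alpha=1.0)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    results[name] = (train_mse, val_mse)
    print(f"{name}: train MSE={train_mse:.3f}, validation MSE={val_mse:.3f}")
```

Comparing the gap between the two numbers on each line shows how the lower degree and the penalty affect overfitting on this particular dataset. The third remedy, more training data, can be tried by raising the sample count (60 here) and rerunning.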

Can an SVM classifier generate a confidence score when it classifies an instance? And a probability?

An SVM classifier can output the distance between the test instance and the decision boundary, and you can use this as a confidence score. However, this score cannot be directly converted into an estimate of the class probability. If you set probability=True when creating an SVM in Scikit-Learn, then after training it will calibrate the probabilities using logistic regression on the SVM's scores (trained with an additional five-fold cross-validation on the training data). This adds the predict_proba() and predict_log_proba() methods to the SVM.
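A minimal sketch of both options with Scikit-Learn's SVC follows. The synthetic dataset and the RBF kernel are assumptions for the demo; any classification data would do.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Assumed toy binary classification problem.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Confidence scores: signed values whose sign gives the predicted side of
# the decision boundary and whose magnitude grows with the distance from it.
svm_scores = SVC(kernel="rbf", random_state=0).fit(X, y)
scores = svm_scores.decision_function(X[:3])
print("decision scores:", scores)

# Calibrated probabilities: probability=True triggers the extra
# cross-validated calibration step during fit, so training is slower.
svm_proba = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)
proba = svm_proba.predict_proba(X[:3])
print("class probabilities:", proba)
```

Note that the probabilities come from a separate calibration step, so in rare cases the most probable class may disagree with the class returned by predict().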

If a decision tree underfits the training set, is it a good idea to try scaling the input features?

Decision trees don't care whether the training data is scaled or centred; that's one of the good things about them. So if a decision tree underfits the training set, scaling the input features will just be a waste of time.
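This can be checked with a short experiment. The sketch below assumes the Iris dataset and a deliberately shallow tree (max_depth=2) so that it underfits; both choices are mine, made only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Same shallow tree, trained once on raw features and once on scaled ones.
tree_raw = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
tree_scaled = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_scaled, y)

acc_raw = tree_raw.score(X, y)
acc_scaled = tree_scaled.score(X_scaled, y)
print(f"training accuracy, raw features:    {acc_raw:.3f}")
print(f"training accuracy, scaled features: {acc_scaled:.3f}")
```

Because scaling preserves the ordering of values within each feature, the tree finds the same partitions of the training set either way. To address underfitting, increase the tree's capacity instead (for example, a larger max_depth).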

What is the difference between hard and soft voting classifiers?

A hard voting classifier simply counts the votes of each classifier in the ensemble and chooses the class that gets the most votes. A soft voting classifier computes the average estimated class probability for each class and selects the class with the highest probability. This gives high-confidence votes more weight and often performs better, but it only works if every classifier can estimate class probabilities.
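Here is a minimal sketch of both modes using Scikit-Learn's VotingClassifier. The moons dataset and the three base classifiers are assumptions picked for the demo; all three can estimate class probabilities, which soft voting requires.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Assumed toy dataset.
X, y = make_moons(n_samples=500, noise=0.3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

def make_estimators():
    # Fresh, unfitted estimators for each ensemble.
    return [
        ("lr", LogisticRegression(random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=42)),
        ("nb", GaussianNB()),
    ]

# Hard voting: majority vote over the predicted class labels.
hard = VotingClassifier(make_estimators(), voting="hard").fit(X_train, y_train)
# Soft voting: average the predicted probabilities, then take the argmax.
soft = VotingClassifier(make_estimators(), voting="soft").fit(X_train, y_train)

hard_acc = hard.score(X_test, y_test)
soft_acc = soft.score(X_test, y_test)
soft_proba = soft.predict_proba(X_test)
print(f"hard voting accuracy: {hard_acc:.3f}")
print(f"soft voting accuracy: {soft_acc:.3f}")
```

Which mode scores higher depends on the data and the base models, so it is worth measuring both on your own validation set.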

How can you evaluate the performance of a dimensionality reduction algorithm on your dataset?

Intuitively, a dimensionality reduction algorithm works well if it eliminates a large number of dimensions from the dataset without losing too much information. One way to measure this is to apply the inverse transform and measure the reconstruction error. 

However, not all dimensionality reduction algorithms provide an inverse transformation. Alternatively, if you use dimensionality reduction as a preprocessing step before another machine learning algorithm (for example, a Random Forest classifier), you can simply measure the performance of this second algorithm; if the dimensionality reduction has not lost too much information, then the algorithm should work as well as when using the original dataset.
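The reconstruction-error approach can be sketched with PCA, which does provide an inverse transform. The digits dataset and the 95% explained-variance target are assumptions for the demo.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # 64 features per image

# A float n_components keeps just enough components to explain
# at least that fraction of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

# Map back to the original space and measure what was lost.
X_recovered = pca.inverse_transform(X_reduced)
reconstruction_mse = np.mean((X - X_recovered) ** 2)

print(f"dimensions: {X.shape[1]} -> {X_reduced.shape[1]}")
print(f"reconstruction MSE: {reconstruction_mse:.4f}")
```

A low reconstruction error relative to the variance of the data, combined with a large drop in dimensions, suggests the reduction worked well for this dataset.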

What is the difference between anomaly detection and novelty detection?

Many people use the terms anomaly detection and novelty detection interchangeably, but they are not the same. In anomaly detection, the algorithm is trained on a dataset that may contain outliers, and the goal is usually to identify those outliers (within the training set), as well as outliers among new instances.

In novelty detection, the algorithm is trained on a dataset that is presumed to be "clean", and the goal is to detect novelties strictly among new instances. Some algorithms work better for anomaly detection (e.g., Isolation Forest), while others are better suited for novelty detection (e.g., one-class SVM).
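The two setups can be contrasted in a short sketch. The synthetic 2D data, the contamination rate, and the nu value are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
inliers = rng.normal(0, 1, size=(200, 2))    # assumed "normal" data
outliers = rng.uniform(6, 8, size=(10, 2))   # assumed far-away outliers

# Anomaly detection: the training set itself is contaminated, and the
# model flags suspicious instances within it (-1 = outlier, 1 = inlier).
contaminated = np.vstack([inliers, outliers])
iso = IsolationForest(contamination=0.05, random_state=0).fit(contaminated)
train_flags = iso.predict(contaminated)
print("flagged in training set:", int((train_flags == -1).sum()))

# Novelty detection: train on clean data only, then judge new instances.
ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(inliers)
new_instances = np.array([[0.0, 0.0], [7.0, 7.0]])
novelty_flags = ocsvm.predict(new_instances)
print("new instance flags:", novelty_flags)
```

The difference is in what the training data is assumed to contain, not in the prediction API: both estimators return -1 for instances they consider abnormal.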

Why is it generally better to use a logistic regression classifier rather than a classic Perceptron? How can you modify a Perceptron to make it equivalent to a logistic regression classifier?

A classic Perceptron will converge only if the dataset is linearly separable, and it will not be able to estimate class probabilities. In contrast, a logistic regression classifier will converge to a good solution even if the dataset is not linearly separable, and it will produce class probabilities.

If you change the Perceptron's activation function to the logistic activation function (or to the softmax activation function if there are multiple neurons), and if you train it using Gradient Descent (or some other optimization algorithm that minimizes the cost function, typically cross-entropy), then it becomes equivalent to a logistic regression classifier.
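To make the equivalence concrete, here is a minimal NumPy sketch of a single neuron with a logistic activation, trained by batch Gradient Descent on the cross-entropy loss. The dataset, learning rate, and epoch count are assumptions, and train_logistic_neuron is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic_neuron(X, y, lr=0.1, n_epochs=2000):
    """Single neuron, logistic activation, batch Gradient Descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_epochs):
        p = sigmoid(X @ w + b)             # estimated P(y = 1 | x)
        grad_w = X.T @ (p - y) / n_samples  # gradient of mean cross-entropy
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Assumed toy data with label noise, so it is not linearly separable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)

w, b = train_logistic_neuron(X, y)
proba = sigmoid(X @ w + b)
accuracy = np.mean((proba >= 0.5) == y)
print(f"weights={w}, bias={b:.3f}, training accuracy={accuracy:.3f}")
```

Unlike the classic Perceptron update rule, this training loop keeps making progress on non-separable data, and the neuron's output can be read as a class probability.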

Is TensorFlow a drop-in replacement for NumPy? What are the main differences between the two?

While TensorFlow offers most of the functionality that NumPy provides, it is not a drop-in replacement, for several reasons. First, the names of functions are not always the same (for example, tf.reduce_sum() versus np.sum()). Second, some functions do not behave the same way (for example, tf.transpose() creates a transposed copy of a tensor, while NumPy's T attribute creates a transposed view, without actually copying any data). Finally, NumPy arrays are mutable, unlike TensorFlow tensors.
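The NumPy side of these differences is easy to verify. The sketch below uses only NumPy so that it runs without TensorFlow installed; the TensorFlow counterparts are mentioned in comments and are not executed here.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# NumPy's T attribute returns a view: no data is copied.
# (The TensorFlow counterpart, tf.transpose(), returns a new tensor.)
a_t = a.T
shares = np.shares_memory(a, a_t)
print("transpose shares memory with original:", shares)

# NumPy arrays are mutable, and the view sees the change.
# (A tf.Tensor does not support item assignment like this.)
a[0, 1] = 99
print("a_t[1, 0] after modifying a[0, 1]:", a_t[1, 0])

# Naming differs too: np.sum() here versus tf.reduce_sum() in TensorFlow.
total = np.sum(a)
print("sum:", total)
```

If you need to update values in place in TensorFlow, tf.Variable with its assign() method is the usual tool.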

If your GPU runs out of memory while training a CNN, what are five things you could try to solve the problem? 

If your GPU is running low on memory while training a CNN, here are five things you can try to fix the problem (other than buying a GPU with more RAM):

  • Reduce the mini-batch size.
  • Reduce dimensionality by using a larger stride in one or more layers.
  • Remove one or more layers.
  • Use 16-bit floats instead of 32-bit floats.
  • Distribute the CNN across multiple devices.
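To see why the mini-batch size, the stride, and the float width matter, here is a back-of-the-envelope estimate of the memory taken by one convolutional layer's output. The layer dimensions are hypothetical, conv_output_bytes is a helper name of my own, and "same" padding is assumed. Real memory usage also includes weights, gradients, and framework overhead.

```python
import math

def conv_output_bytes(batch_size, height, width, n_filters,
                      stride=1, bytes_per_value=4):
    """Bytes needed to store one conv layer's output feature maps.

    Assumes "same" padding, so each spatial size is ceil(input / stride).
    """
    out_height = math.ceil(height / stride)
    out_width = math.ceil(width / stride)
    return batch_size * out_height * out_width * n_filters * bytes_per_value

# Hypothetical layer: 224x224 input, 64 filters, batch of 32, 32-bit floats.
baseline = conv_output_bytes(32, 224, 224, 64)
smaller_batch = conv_output_bytes(16, 224, 224, 64)
larger_stride = conv_output_bytes(32, 224, 224, 64, stride=2)
half_precision = conv_output_bytes(32, 224, 224, 64, bytes_per_value=2)

for name, size in [("baseline", baseline),
                   ("batch size 16", smaller_batch),
                   ("stride 2", larger_stride),
                   ("16-bit floats", half_precision)]:
    print(f"{name}: {size / 2**20:.0f} MiB")
```

Halving the batch size or the float width halves this layer's activation memory, while doubling the stride cuts it to a quarter because both spatial dimensions shrink.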

So these were some very common machine learning interview questions. I hope they help you crack your machine learning interviews, and I hope you liked this article. Feel free to ask your questions in the comments section below. You can also follow me on Medium to learn more about Machine Learning.
