In this article, I will take you through how the matching algorithms behind Tinder and other dating sites work, and I will solve a Tinder-style case study to predict matches with machine learning.
Before getting started with this task, I want you to read the case study below so that you can understand how I am going to set up the algorithm to predict Tinder matches.
Case Study: Predict Tinder Matches
My friend Hellen has used some online dating sites to find different people to date. She realized that despite the site’s recommendations, she didn’t like everyone she was matched with. After some soul-searching, she realized that there were three types of people she was dating:
- People she didn’t like
- People she liked in small doses
- People she liked in large doses
Even after noticing this pattern, Hellen couldn’t figure out what made a person fall into one of these categories; they were all recommended to her by the dating site. The people she liked in small doses were good to see Monday through Friday, but on weekends she preferred spending time with the people she liked in large doses. Hellen asked us to help her filter future matches into these categories. She has also collected some data that the dating site does not record, but which she finds useful in selecting whom to date.
Solution: Predict Tinder Matches
The data Hellen collects is in a text file called datingTestSet.txt. Hellen has been collecting this data for some time and has 1,000 entries. A new sample is on each line and Hellen recorded the following characteristics:
- Number of frequent flier miles earned per year
- Percentage of time spent playing video games
- Liters of ice cream consumed per year
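The data file is tab-delimited, one person per line, with the three feature values followed by a numeric class label. As a quick illustration, here is how one such line would be parsed in Python 3 (the feature values shown are made up):

```python
# Hypothetical tab-separated line in the style of datingTestSet.txt:
line = "40920\t8.326976\t0.953952\t3"

fields = line.strip().split('\t')          # split on tabs
features = [float(x) for x in fields[0:3]] # first three columns are features
label = int(fields[-1])                    # last column is the class label

print(features, label)  # [40920.0, 8.326976, 0.953952] 3
```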
Before we can use this data in our classifier, we need to change it to the format accepted by our classifier. To do this, we’ll add a new function to our Python file called file2matrix. This function takes a filename string and generates two things: an array of training examples and a vector of class labels.
from numpy import zeros

def file2matrix(filename):
    fr = open(filename)
    numberOfLines = len(fr.readlines())
    returnMat = zeros((numberOfLines, 3))  # one row per sample, three features
    classLabelVector = []
    fr = open(filename)                    # reopen to read from the top
    index = 0
    for line in fr.readlines():
        listFromLine = line.strip().split('\t')
        returnMat[index, :] = listFromLine[0:3]
        classLabelVector.append(int(listFromLine[-1]))
        index += 1
    return returnMat, classLabelVector
The code above simply processes the text with Python. To use it, type the following at the Python prompt:
reload(kNN)
datingDataMat, datingLabels = kNN.file2matrix('datingTestSet.txt')
Make sure the datingTestSet.txt file is in the directory you are working in. Note that before running the function, I reloaded the kNN.py module (the name of my Python file). When you modify a module, you must reload it or you will keep using the old version. Now let’s examine the data we just imported:
array([[  7.29170000e+04,   7.10627300e+00,   2.23600000e-01],
       [  1.42830000e+04,   2.44186700e+00,   1.90838000e-01],
       [  7.34750000e+04,   8.31018900e+00,   8.52795000e-01],
       ...,
       [  1.24290000e+04,   4.43233100e+00,   9.24649000e-01],
       [  2.52880000e+04,   1.31899030e+01,   1.05013800e+00],
       [  4.91800000e+03,   3.01112400e+00,   1.90663000e-01]])
['didntLike', 'smallDoses', 'didntLike', 'largeDoses', 'smallDoses', 'smallDoses', 'didntLike', 'smallDoses', 'didntLike', 'didntLike', 'largeDoses', 'largeDoses', 'largeDoses', 'didntLike', 'didntLike', 'smallDoses', 'smallDoses', 'didntLike', 'smallDoses', 'didntLike']
When features lie in very different numeric ranges, it is common to normalize them, typically to the range 0 to 1 or -1 to 1. To scale everything to the range 0 to 1, use the formula below:
newValue = (oldValue-min)/(max-min)
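To see the formula at work, here is a small Python 3 sketch using NumPy (the feature values are made up); the subtraction and division apply column by column thanks to broadcasting:

```python
import numpy as np

# Hypothetical raw features for three people: miles, game-time %, ice cream
data = np.array([[40920.0, 8.3, 0.95],
                 [14488.0, 7.2, 1.67],
                 [26052.0, 1.4, 0.80]])

min_vals = data.min(axis=0)                       # per-column minimums
max_vals = data.max(axis=0)                       # per-column maximums
norm = (data - min_vals) / (max_vals - min_vals)  # newValue formula, per column

print(norm.min(axis=0))  # [0. 0. 0.]
print(norm.max(axis=0))  # [1. 1. 1.]
```

After scaling, every column spans exactly 0 to 1, so no single feature dominates the distance calculation.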
In the normalization procedure, min and max are the smallest and largest values of each feature in the dataset. This scaling adds some complexity to our classifier, but it’s worth it for better results. Let’s create a new function called autoNorm() to normalize the data automatically:
def autoNorm(dataSet):
    minVals = dataSet.min(0)     # column-wise minimums
    maxVals = dataSet.max(0)     # column-wise maximums
    ranges = maxVals - minVals
    m = dataSet.shape[0]         # number of samples
    normDataSet = dataSet - tile(minVals, (m, 1))
    normDataSet = normDataSet / tile(ranges, (m, 1))  # element-wise division
    return normDataSet, ranges, minVals
Now let’s try out the autoNorm() function:
reload(kNN)
normMat, ranges, minVals = kNN.autoNorm(datingDataMat)
normMat
array([[ 0.33060119,  0.58918886,  0.69043973],
       [ 0.49199139,  0.50262471,  0.13468257],
       [ 0.34858782,  0.68886842,  0.59540619],
       ...,
       [ 0.93077422,  0.52696233,  0.58885466],
       [ 0.76626481,  0.44109859,  0.88192528],
       [ 0.0975718 ,  0.02096883,  0.02443895]])
You could have returned only normMat, but you also need the ranges and minimum values to normalize test data. You will see this in action next.
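To see why those return values matter, here is a Python 3 sketch (the statistics and feature values are made up) of scaling one new sample with the training set’s minimums and ranges rather than its own:

```python
import numpy as np

# Suppose autoNorm() on the training data returned these (hypothetical values):
min_vals = np.array([0.0, 0.0, 0.001])
ranges = np.array([91273.0, 20.9, 1.69])

# A brand-new person to classify, given as raw feature values
new_person = np.array([10000.0, 10.0, 0.5])

# Scale the new sample with the *training* min and range, never its own,
# so its features land on the same 0-to-1 scale as the training matrix
scaled = (new_person - min_vals) / ranges
print(scaled)
```

A single sample has no meaningful min or max of its own, so reusing the training statistics is the only way to put it on the same scale as the data the classifier was built from.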
Testing the Classifier To Predict Tinder Matches
Now that the data is in a format you can use, you are ready to test the classifier. After testing it, you can give it to our friend Hellen for her to use. One of the common tasks in machine learning is assessing an algorithm’s accuracy.
One way to use the existing data is to take some of it, say 90%, to train the classifier. Then you will take the remaining 10% to test the classifier and see how accurate it is. There are more advanced ways to do this, which we’ll cover later, but for now, let’s use this method.
The 10% held out should be chosen at random. Our data is not stored in any particular order, so you can take the first 10% or the last 10% without upsetting the statistics professors.
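If your data were ordered, you would shuffle it before splitting. Here is a quick Python 3 sketch with NumPy on toy data (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)   # seeded so the shuffle is repeatable

X = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features each
y = np.array([0, 1] * 5)          # toy labels

perm = rng.permutation(len(X))    # a random ordering of the row indices
X, y = X[perm], y[perm]           # shuffle features and labels together

n_test = int(len(X) * 0.10)       # hold out 10% for testing
X_test, y_test = X[:n_test], y[:n_test]
X_train, y_train = X[n_test:], y[n_test:]
print(len(X_train), len(X_test))  # 9 1
```

Shuffling the indices (rather than the arrays separately) keeps each sample paired with its label.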
To test the classifier to predict tinder matches, I will create a function called datingClassTest:
def datingClassTest():
    hoRatio = 0.10    # hold out 10% of the data for testing
    datingDataMat, datingLabels = file2matrix('datingTestSet.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m * hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        classifierResult = classify0(normMat[i, :], normMat[numTestVecs:m, :],
                                     datingLabels[numTestVecs:m], 3)
        print "the classifier came back with: %d, the real answer is: %d" \
              % (classifierResult, datingLabels[i])
        if classifierResult != datingLabels[i]:
            errorCount += 1.0
    print "the total error rate is: %f" % (errorCount / float(numTestVecs))
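datingClassTest() leans on classify0(), the k-nearest-neighbors classifier from kNN.py, which this article doesn’t reproduce. A minimal Python 3 sketch with the same signature (query point, training matrix, labels, k) might look like this:

```python
import numpy as np
from collections import Counter

def classify0(in_x, data_set, labels, k):
    # Euclidean distance from the query point to every training row
    diff = data_set - in_x                     # broadcasting: (m, n) - (n,)
    distances = np.sqrt((diff ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]          # indices of the k closest rows
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]          # majority label wins

# Tiny sanity check on made-up two-class data
train = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify0(np.array([0.1, 0.1]), train, labels, 3))  # B
```

The query point near the origin is closest to the two 'B' samples, so the 3-nearest vote comes back 'B'.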
Now let’s test our function:
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
.
.
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 3, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the total error rate is: 0.024000
The total error rate for this classifier, on this dataset and with these settings, is 2.4%. Not bad. The next step is to put the whole program together as a system that predicts Tinder matches.
Putting Everything Together
Now that we have tested the model on our data, let’s use it on Hellen’s inputs to predict Tinder matches for her:
def classifyPerson():
    resultList = ['not at all', 'in small doses', 'in large doses']
    percentTats = float(raw_input("percentage of time spent playing video games?"))
    ffMiles = float(raw_input("frequent flier miles earned per year?"))
    iceCream = float(raw_input("liters of ice cream consumed per year?"))
    datingDataMat, datingLabels = file2matrix('datingTestSet.txt')
    normMat, ranges, minVals = autoNorm(datingDataMat)
    inArr = array([ffMiles, percentTats, iceCream])
    # Normalize the new sample with the training min and range before classifying
    classifierResult = classify0((inArr - minVals) / ranges, normMat, datingLabels, 3)
    print "You will probably like this person: ", resultList[classifierResult - 1]

kNN.classifyPerson()
percentage of time spent playing video games?10
frequent flier miles earned per year?10000
liters of ice cream consumed per year?0.5
You will probably like this person: in small doses
So this is how Tinder and other dating sites work. I hope you liked this article on predicting Tinder matches with machine learning. Feel free to ask your valuable questions in the comments section below.