You just bought the ultimate fragrance; your abs are not perfectly sculpted, but the beard is perfect. Still, the only match you get on Tinder is with the fake profile made by your mate.
You start thinking that you have a problem.
Your “pick up strategy” is not working, obviously.
You decide to turn to your friend, the pickup wiz, the king of pickup back in the MySpace and Netlog era, known to your friends as "The Driller".
After a short chat and a beer, The Driller decides to help you, but only if you agree to buy him drinks as a success fee.
He immediately realizes that your selection of profile pics is not suitable: the one flaunting your bare chest in discount underwear and the out-of-focus one with the stoned face, titled "Ibiza 2k12", must go.
A shopping spree at Primark, a new photo shot by your photographer friend with a reflex camera in auto mode, and a random poetic quote.
After the updates, the good matches start coming in and you can't believe it.
But you don't want to buy drinks for The Driller, so you tell yourself it's just a coincidence, that you simply got lucky.
With the old profile pics: 100 tries -> 1 match
With the new profile pics: 100 tries -> 10 matches
You look at The Driller and say: "I think it's just a coincidence; with the new profile pics I was just lucky."
The Driller looks you in the eye; he can't believe it, and he wants you to buy him his drinks. He replies calmly: "OK, let's suppose it was just a coincidence. If that's true, there is no difference between "before the changes" and "after the changes", and we have 200 tries in total, right?"
The Driller: "So now let's run some simulations. We take 200 leaflets, and on each one we write the name of the girl and whether it was a match or not: 1 if you had success, 0 if you didn't."
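The leaflet setup can be sketched in code. This is just an illustration using NumPy arrays, with the match counts taken from the story:

```python
import numpy as np

# Encode the 200 leaflets as 0/1 outcomes (1 = match, 0 = no match),
# using the counts from the story: 1 match with the old pics, 10 with the new.
old_pic = np.array([1] * 1 + [0] * 99)   # Case A: 100 tries, 1 match
new_pic = np.array([1] * 10 + [0] * 90)  # Case B: 100 tries, 10 matches

leaflets = np.concatenate((old_pic, new_pic))
print(leaflets.sum(), "matches out of", len(leaflets), "tries")
```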
The request is strange, but you fill out these 200 leaflets.
The Driller: "Now we shuffle these 200 leaflets, associate the first 100 with the old condition (Case A') and the last 100 with the new condition (Case B'). Once that's done, we compute the difference in match rates between the new Case B' and the new Case A'; we call this value the "Pick Up Delta".
Do you remember that in the original, unshuffled case this difference was 0.09?" (10 matches / 100 tries - 1 match / 100 tries = 0.09)
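That 0.09 is just the difference between the two match rates; a quick check (rounded to sidestep floating-point noise):

```python
# Observed "Pick Up Delta": new match rate minus old match rate
delta = 10 / 100 - 1 / 100
print(round(delta, 2))  # 0.09
```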
The Driller: "Once we have determined the "Pick Up Delta" for the shuffled case, we shuffle the leaflets again and repeat this operation many times" (a number of times n, with n very large).
If what you said is true, a shuffled "Pick Up Delta" greater than or equal to the original one should be fairly common, because we assumed the difference is just a coincidence.
You: ”Yes, it makes sense”
The Driller: "We can evaluate this by dividing the number of times the "Pick Up Delta" was greater than or equal to the original one by the number of times we shuffled our leaflets." (This ratio will be our p-value.)
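In numbers, using illustrative counts that match this story's outcome (10 extreme shuffles out of 1000):

```python
# p-value = (# shuffles with delta >= observed delta) / (# shuffles)
# The counts here are illustrative, matching the story's result.
extreme_shuffles = 10
total_shuffles = 1000
p_value = extreme_shuffles / total_shuffles
print("p-value =", p_value)  # p-value = 0.01
```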
The Driller: "If this ratio is high, your hypothesis is probably true; but if this ratio is small, your hypothesis is probably false."
You: "How small should it be?"
The Driller: "If we want to reject the hypothesis with 95% confidence, this value must be smaller than 0.05."
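The decision rule The Driller describes, as a minimal sketch (0.05 is the conventional significance threshold for 95% confidence):

```python
alpha = 0.05     # significance level for 95% confidence
p_value = 0.01   # the value found in the story

if p_value < alpha:
    verdict = "Reject the hypothesis: probably not just a coincidence"
else:
    verdict = "Cannot reject the hypothesis"
print(verdict)
```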
You and the Driller discovered that:
- The shuffled "Pick Up Delta" reached the original one only about 1 time in 100, so the p-value was 0.01
- The hypothesis "it was just a coincidence" was false
- You have to buy drinks
A/B tests are extremely common, especially in digital marketing, but evaluating them is not easy.
This article, with the attached script, is only a friendly introduction; I simplified many of the hypotheses.
For a rigorous discussion, I always recommend Ross's book on probability and statistics.
Moreover, we must estimate the cost of experimentation: the improvement from test A or test B must be not only statistically significant but also economically significant.
Economically significant means that the experimentation costs are justified by the improvements produced, a really tough point that is sometimes neglected.
Thanks for reading the article!
If you liked it, share this post with others
Ping me on Twitter for further discussion
import numpy as np

# Represent our two analysis cases (A and B) with two arrays
old_pic = np.array([True] * 1 + [False] * 99)
new_pic = np.array([True] * 10 + [False] * 90)

# We define our analysis statistic:
# the absolute difference between the success rate with the new photos
# and the success rate with the old photos
def frac_abs_success(a, b):
    afrac = np.sum(a) / len(a)
    bfrac = np.sum(b) / len(b)
    return abs(afrac - bfrac)

def permutation_sample(data1, data2, func):
    """Generate one permutation replicate of the analysis statistic."""
    # We concatenate the two datasets: data
    data = np.concatenate((data1, data2))
    # We permute the pooled data: permuted_data
    permuted_data = np.random.permutation(data)
    # We divide the permuted array into two sub-arrays: perm_sample_1, perm_sample_2
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]
    return func(perm_sample_1, perm_sample_2)

# We realize n permutations on our two datasets A* and B*;
# for each permutation we evaluate the analysis statistic
def draw_rep_stat(data, data2, func, size):
    """Draw permutation replicates."""
    # Initialize the array of replicates: stat_replicates
    stat_replicates = np.empty(size)
    for i in range(size):
        stat_replicates[i] = permutation_sample(data, data2, func)
    return stat_replicates

n = 1000
test_set = draw_rep_stat(old_pic, new_pic, frac_abs_success, n)
print(len(test_set))

# We evaluate the p-value: the fraction of permutations whose statistic
# is greater than or equal to the observed one
p = np.sum(test_set >= frac_abs_success(old_pic, new_pic)) / len(test_set)
print('p-value =', p)
1000
p-value = 0.01