How to build a Marketing Mix Model in Python with Google Lightweight MMM – Tutorial with (conscious) Mistakes


Building a Marketing Mix Model is always a strategic choice for a company, and there are several benefits in doing so.

If you are not familiar with Marketing Mix Models, I suggest reading this very quick article first; you will understand what a Marketing Mix Model is and which other libraries and tools are available.

If you want to jump straight to the code, click here.

What is Google Lightweight MMM?

Google Lightweight MMM is an open-source Python library for building marketing mix models based on Bayesian statistics.

It is an unofficial Google product that became quite popular among econometricians because it is:

  1. Quick to train
  2. Runnable on Google Colab
  3. Supported by a vibrant community on GitHub

If you want to build a marketing mix model with Google Lightweight MMM and cover its technical aspects, you are in the right place.

Why is Google Lightweight MMM useful?

Google Lightweight MMM helps to optimise the marketing budget.

Quoting the Harvard Business Review: “Marketing Mix Models are able to produce dependable measurements — and insight — purely from natural variation in aggregate data, and don’t require user-level data”

Being able to run on Google Colab and load the media data from Google Sheets is another good point that is important to stress.
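For example, a public Google Sheet can be read straight into pandas from Colab. Here is a minimal sketch, assuming the sheet is shared as “anyone with the link can view”; the sheet ID and tab name are hypothetical placeholders.

import pandas as pd

SHEET_ID = "your-sheet-id"    # hypothetical placeholder
SHEET_NAME = "media_data"     # hypothetical placeholder
# Google Sheets exposes any public tab as CSV through this URL pattern
url = (f"https://docs.google.com/spreadsheets/d/{SHEET_ID}"
       f"/gviz/tq?tqx=out:csv&sheet={SHEET_NAME}")
df_media = pd.read_csv(url)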

What strong hypotheses are behind Google Lightweight MMM?

Google Lightweight MMM is theoretically based on Bayesian statistics, so some strong consequences are:

  • There is no statistical significance of coefficients, but rather a credible interval
  • Because it is based on Bayesian inference, you will not get the p-values and significance of the variables that the frequentist approach provides
  • You can’t know the coefficients’ importance (how reliable was the YouTube activity contribution? How reliable was the Facebook activity contribution?)
  • Instead of saying the media channel impact simply has one (unknown) true value, a Bayesian method treats the media impact as fixed but chosen from some probability distribution — known as the prior probability distribution.

Moreover, in Google Lightweight MMM it is not easy to manually modify the parameters of the stochastic functions behind the sampling from the prior probability distributions.
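To make the “prior probability distribution” idea concrete, here is a minimal numpyro sketch; the HalfNormal choice and its scale are illustrative, not the library’s defaults.

import jax
import numpyro.distributions as dist

rng_key = jax.random.PRNGKey(0)
# Prior belief about a media coefficient: positive, most mass below ~2
media_coef_prior = dist.HalfNormal(scale=1.0)
# Five plausible coefficient values, drawn before seeing any data
print(media_coef_prior.sample(rng_key, sample_shape=(5,)))

Recent versions of the library do expose a custom_priors argument on fit() that accepts a dictionary of numpyro distributions, but tuning it correctly is not trivial.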

Input Variables

This marketing mix model will use the following input variables (a dummy sample of the weekly layout is sketched right after the list):

  • Weekly Price Variables
    • Price Per Unit
    • Black Friday (as binary variable)
  • Weekly Media Variables as impressions
    • Google Shopping ad Impressions – all campaigns 21 22
    • YT ad Impressions – all campaigns 21 22
    • Google Search ad Impressions – all campaigns 21 22
    • FB/IG ad Impressions – all campaigns 21 22
    • Amz Ads Impressions
  • Weekly Media Budget Variables
    • Google Shopping ad Spend (Shopping) 21 22
    • YT ad Spend (YouTube) 21 22
    • Google Search ad Spend (Search) 21 22
    • FB/IG Spend – USD 21 22
    • Amz Ads Spend
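
Before diving into the code, this is roughly how the weekly input data is laid out: one row per week, one column per media channel (impressions and spend), plus the extra features. The numbers below are made up purely for illustration, and only a subset of the columns is shown.

import pandas as pd

df_example = pd.DataFrame({
    'Dates': pd.to_datetime(['2021-01-04', '2021-01-11']),
    'SKU1 Total Value Sales': [12500.0, 11800.0],
    'SKU1 Price Per Unit': [9.99, 9.99],
    'YT ad Impressions - all campaigns 21 22': [150000, 90000],
    'YT ad Spend (YouTube) 21 22': [1200.0, 700.0],
    'black_friday': [0, 0],
})
print(df_example)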

Building a Marketing Mix Model in Python

The code is not very Pythonic; I will improve it in the future.

# Installing Google Lightweight MMM
!pip install --upgrade git+https://github.com/google/lightweight_mmm.git
# Import pandas, jax.numpy and any other library we might need.
import pandas as pd
import jax.numpy as jnp
import numpyro


# Import the relevant modules of the library
from lightweight_mmm import lightweight_mmm
from lightweight_mmm import optimize_media
from lightweight_mmm import plot
from lightweight_mmm import preprocessing
from lightweight_mmm import utils
# Reading the data
df = pd.read_excel(ANDREA_PATH,
                   sheet_name='Sheet1',
                   skiprows=4,
                   engine='openpyxl')
# Removing rows where every value is null
df = df.dropna(axis=0,
               how='all')
# Adding the Black Friday flag to the data
# (this must happen before selecting the columns below,
# otherwise 'black_friday' does not exist yet)
df['black_friday'] = 0
df.loc[df['Dates'] == '2021-11-22', 'black_friday'] = 1
df.loc[df['Dates'] == '2022-11-21', 'black_friday'] = 1
# Defining the impressions and budget DataFrame
# (.copy() avoids pandas' SettingWithCopyWarning later on)
df_impression = df[['Dates',
         'SKU1 Total Value Sales',
         'SKU1 Price Per Unit',
         'Google Shopping ad Impressions - all campaigns 21 22',
         'YT ad Impressions - all campaigns 21 22',
         'Google Search ad Impressions - all campaigns 21 22',
         'FB/IG ad Impressions - all campaigns 21 22',
         'Amz Ads Impressions',
         'black_friday',
         'Google Shopping ad Spend (Shopping) 21 22',
         'YT ad Spend (YouTube) 21 22',
         'Google Search ad Spend (Search) 21 22',
         'FB/IG Spend (Awareness/Prospecting/Conversion) - USD 21 22',
         'Amz Ads Spend']].copy()
# Media columns
media_cols = ['Google Shopping ad Impressions - all campaigns 21 22',
              'YT ad Impressions - all campaigns 21 22',
              'Google Search ad Impressions - all campaigns 21 22',
              'FB/IG ad Impressions - all campaigns 21 22',
              'Amz Ads Impressions']
# Cost columns
cost_cols = ['Google Shopping ad Spend (Shopping) 21 22',
             'YT ad Spend (YouTube) 21 22',
             'Google Search ad Spend (Search) 21 22',
             'FB/IG Spend (Awareness/Prospecting/Conversion) - USD 21 22',
             'Amz Ads Spend']

“Price per unit” and “Black Friday” are modelled as extra features in our Marketing Mix Model. They highly influence our output, “Sales Value”, and have a strong sales contribution, but they can’t be part of the optimization process (not in a strict sense).

Excluding these two variables would be a big mistake, because the model would attribute their sales contribution to the media activities.

# extra features
control_vars = ['SKU1 Price Per Unit','black_friday']
# start modelling from 2021
df_impression['Dates'] = pd.to_datetime(df_impression['Dates'])
df_impression = df_impression[df_impression['Dates'].dt.year > 2020]
df_impression.reset_index(inplace=True,
                          drop=True)

In the following section, again in a non-Pythonic way, I am evaluating the total spend for each media channel.

tot_spend_g_shopping_ad = df_impression['Google Shopping ad Spend (Shopping) 21 22'].sum().round()

tot_spend_yt_ad = df_impression['YT ad Spend (YouTube) 21 22'].sum().round()

tot_spend_g_search_ad = df_impression['Google Search ad Spend (Search) 21 22'].sum().round()

tot_spend_fb_ig_ad = df_impression['FB/IG Spend (Awareness/Prospecting/Conversion) - USD 21 22'].sum().round()

tot_spend_amz_ads = df_impression['Amz Ads Spend'].sum().round()

total_spend = [tot_spend_g_shopping_ad,
               tot_spend_yt_ad,
               tot_spend_g_search_ad,
               tot_spend_fb_ig_ad,
               tot_spend_amz_ads]

print("Total Spend in 2021-22 for each channel Google Shopping, YouTube, FB, Amazon Ads")
print(total_spend)
[Removed for privacy reasons]
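
Since cost_cols is already defined, the same per-channel totals can be computed in a single, more Pythonic line (the channel order matches total_spend above):

# Equivalent one-liner using the cost_cols list defined earlier
total_spend = df_impression[cost_cols].sum().round().tolist()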

In the following section, I am evaluating the total impressions for each media channel.

tot_impr_g_shop = df_impression['Google Shopping ad Impressions - all campaigns 21 22'].sum().round()

tot_impr_yt = df_impression['YT ad Impressions - all campaigns 21 22'].sum().round()

tot_impr_g_search = df_impression['Google Search ad Impressions - all campaigns 21 22'].sum().round()

tot_impr_fb_ig = df_impression['FB/IG ad Impressions - all campaigns 21 22'].sum().round()

tot_impr_amz = df_impression['Amz Ads Impressions'].sum().round()

tot_impr = [tot_impr_g_shop,
            tot_impr_yt,
            tot_impr_g_search,
            tot_impr_fb_ig,
            tot_impr_amz
]
print("Total Impressions in 2021-22 for each channel Google Shopping, YouTube, FB, Amazon Ads")
print(tot_impr)
Total Impressions in 2021-22 for each channel Google Shopping, YouTube, FB, Amazon Ads
[Removed for privacy reasons]

Here we are evaluating the price of each media channel as the total media cost divided by the total media impressions (i.e. the cost per impression).

Then we convert the array to a JAX array.

price_google_shopping_ad = df_impression['Google Shopping ad Spend (Shopping) 21 22'].sum()/df_impression['Google Shopping ad Impressions - all campaigns 21 22'].sum()

price_yt_ad = df_impression['YT ad Spend (YouTube) 21 22'].sum()/df_impression['YT ad Impressions - all campaigns 21 22'].sum()

price_google_search_ad = df_impression['Google Search ad Spend (Search) 21 22'].sum()/df_impression['Google Search ad Impressions - all campaigns 21 22'].sum()

price_fb_ig_ad = df_impression['FB/IG Spend (Awareness/Prospecting/Conversion) - USD 21 22'].sum()/df_impression['FB/IG ad Impressions - all campaigns 21 22'].sum()

price_amz_ads = df_impression['Amz Ads Spend'].sum()/df_impression['Amz Ads Impressions'].sum()

prices_unscaled = jnp.asarray([price_google_shopping_ad,
                               price_yt_ad,
                               price_google_search_ad,
                               price_fb_ig_ad,
                               price_amz_ads])
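
The same cost-per-impression vector can be built in a vectorized way; this sketch assumes media_cols and cost_cols list the channels in the same order (which they do above):

# Vectorized equivalent: total spend / total impressions, channel by channel
prices_unscaled = jnp.asarray(
    df_impression[cost_cols].to_numpy().sum(axis=0)
    / df_impression[media_cols].to_numpy().sum(axis=0))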

Based on the previous data, I am calculating the average marketing budget for one year.

budget_unscaled = [tot_impr[i] * prices_unscaled[i] for i in range(len(tot_impr))]

# average weekly budget multiplied by 52 weeks
tot_bud_unscaled = round(sum(budget_unscaled) / len(df_impression)) * 52

print(tot_bud_unscaled)

Now we are going to convert to NumPy all the input variables that we will need to evaluate the media activity contribution and the potential optimization.

In Marketing Mix Modelling I am not a huge fan of splitting the data into training and testing sets as I used to do in classic machine learning models.

The reason is that I am not trying to make forecasts, but to understand in a reasonable way what happened in the past and to build possible scenarios for the future.

Anyway, I broke my rule and did it in this specific example.

SEED = 105
data_size = len(df_impression)

n_media_channels = len(media_cols)

n_extra_features = len(control_vars)
media_data = df_impression[media_cols].to_numpy()
extra_features = df_impression[control_vars].to_numpy()
target = df_impression['SKU1 Total Value Sales'].to_numpy()
costs = df_impression[cost_cols].sum().to_numpy()

# Split and scale data.
test_data_period_size = 12
split_point = data_size - test_data_period_size
# Media data
media_data_train_uns = media_data[:split_point, ...]
media_data_test = media_data[split_point:, ...]
# Extra features
extra_features_train = extra_features[:split_point, ...]
extra_features_test  = extra_features[split_point:, ...]
# Target
target_train = target[:split_point]
# Preprocessing: we need to scale our variables
media_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
extra_features_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
target_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
cost_scaler = preprocessing.CustomScaler(divide_operation=jnp.mean, multiply_by=0.15)

# print("Before Scaling")
# print(media_data_train)
media_data_train = media_scaler.fit_transform(media_data_train_uns)

# print("After Scaling")
# print(media_data_train)
extra_features_train = extra_features_scaler.fit_transform(extra_features_train)
target_train = target_scaler.fit_transform(target_train)
costs = cost_scaler.fit_transform(costs)
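
A quick sanity check of what CustomScaler(divide_operation=jnp.mean) does, assuming the divide operation is applied column-wise: each column is divided by its own mean, so scaled values hover around 1. The multiply_by=0.15 used for the costs additionally multiplies the result by that factor.

x = jnp.array([[10., 100.],
               [30., 300.]])
scaler = preprocessing.CustomScaler(divide_operation=jnp.mean)
print(scaler.fit_transform(x))  # [[0.5 0.5], [1.5 1.5]]
print(x / x.mean(axis=0))       # the same result, written by hand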

The model

lightweight_mmm.LightweightMMM(model_name="hill_adstock")

takes the model_name argument; you can provide three possible values:

  • hill_adstock
  • adstock
  • carryover

I am not a huge supporter of this choice, and I explained my concern here on GitHub.

In the marketing industry, hill_adstock, adstock, and carryover effects refer to the same key concept.

I wrote here about adstock and carryover effects.
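
For readers who want the intuition in code, here is a minimal sketch of geometric adstock, where part of this week's media effect carries over from previous weeks; the decay value is illustrative, not a fitted parameter.

def geometric_adstock(x, decay=0.5):
    """Carry over a decaying fraction of past media pressure."""
    out, carried = [], 0.0
    for spend in x:
        carried = spend + decay * carried
        out.append(carried)
    return out

print(geometric_adstock([100, 0, 0, 0]))  # [100.0, 50.0, 25.0, 12.5]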

# Model training
mmm = lightweight_mmm.LightweightMMM(model_name="hill_adstock")

mmm.fit(media=media_data_train,
        media_prior=costs,
        target=target_train,
        extra_features=extra_features_train,
        number_warmup=1000,
        number_samples=1000,
        media_names=media_cols,
        seed=SEED)
mmm.print_summary()
Attribution model output: as you can see, the price coefficient has the biggest mean absolute value and, in some ways, describes the product's price elasticity.

Now we are going to plot the posterior distributions of our media variables.

# posterior distributions of the media effects.
plot.plot_media_channel_posteriors(media_mix_model=mmm,
                                   channel_names=media_cols)
plot.plot_model_fit(mmm,
                    target_scaler=target_scaler)

What to do when the R2 is low in a marketing mix model?

As you can see, the R2 is very low. This can happen quite often, but why?

For example, the modeller may not be aware of all the price promotions that happened during the analysis period, as in this case.

Another frequent cause is strong changes in the product's weighted distribution.

The best thing to do is to speak with the client, discuss it together, and ask:

"Is any activity missing in this specific period? This difference between the modelled values and the actuals looks very strange to me."

Let’s continue our tutorial with the Google Lightweight MMM library.

media_contribution, roi_hat = mmm.get_posterior_metrics(target_scaler=target_scaler,
                                                        cost_scaler=cost_scaler)

plot.plot_media_baseline_contribution_area_plot(media_mix_model=mmm,
                                                target_scaler=target_scaler,
                                                fig_size=(30,10),
                                                channel_names = media_cols
                                                )

Plotting Media Contribution to Sales Uplift

This is one of the most important plots because it describes, in a visual way, the contribution of each activity to sales.

Now we will plot the ROI of each media activity and its credible interval.

plot.plot_bars_media_metrics(metric=roi_hat,
                             metric_name="ROI hat",
                             channel_names=media_cols)
Marketing ROI Plot
This Marketing ROI Plot can be formatted in a tidier way, but this is the quickest way to get it with the Lightweight MMM library.

Marketing Media Budget Optimization

Media budget optimization is based on the theory of diminishing returns and on the model's attribution.

Based on the contribution of each media channel and its diminishing-returns curve, what is the best way to allocate the media budget?
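
To make the diminishing-returns idea concrete, here is a minimal sketch of a Hill-type saturation curve; half_sat and slope are illustrative values, not the parameters fitted by the model.

def hill_saturation(spend, half_sat=1.0, slope=1.0):
    """Response grows with spend but flattens past the half-saturation point."""
    return spend**slope / (spend**slope + half_sat**slope)

for s in [0.5, 1.0, 2.0, 4.0]:
    # doubling the spend less than doubles the response
    print(s, round(hill_saturation(s), 3))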

tot_impr_rev = [(total_spend[i] * prices_unscaled[i]).round() for i in range(len(total_spend))]
print(tot_impr_rev)
print(tot_impr)
n_time_periods = 52

avg_media = media_data.mean(axis=0)

print(avg_media)
print(prices_unscaled)

weekly_budget = avg_media*prices_unscaled
yearly_budget = weekly_budget * n_time_periods*2
print(weekly_budget)
print(yearly_budget)
# budget = jnp.sum(jnp.dot(prices, media_data.mean(axis=0)))* n_time_periods
yearly_budget.sum()
prices_unscaled
bounds_ext = optimize_media._get_lower_and_upper_bounds(mmm.media,
                                                        n_time_periods =52,
                                                        lower_pct = jnp.repeat(a=0.2, repeats=len(prices_unscaled)),
                                                        upper_pct = jnp.repeat(a=0.2, repeats=len(prices_unscaled)),
                                                        media_scaler = media_scaler)
lower_budget = [(bounds_ext.lb[i] * prices_unscaled[i]).round() for i in range(len(bounds_ext.lb))]
upper_budget = [(bounds_ext.ub[i] * prices_unscaled[i]).round() for i in range(len(bounds_ext.ub))]
solution, kpi_without_optim, previous_media_allocation = optimize_media.find_optimal_budgets(n_time_periods=52,
                                                                                             media_mix_model=mmm,
                                                                                             budget=150000,
                                                                                             prices=prices_unscaled,
                                                                                            #  extra_features = extra_features_train,
                                                                                             target_scaler = target_scaler,
                                                                                             media_scaler = media_scaler)


impression_opt_budget = solution.x * prices_unscaled
print("Budget for each channel")
print(impression_opt_budget)

In the last step, “solution.x * prices_unscaled”, you get the ideal budget for each channel based on the previous results.
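
To see what the optimizer changed, here is a hedged comparison sketch: it assumes previous_media_allocation, returned by find_optimal_budgets, is expressed in the same media units as solution.x, so both can be converted to money with the per-impression prices.

import numpy as np

optimal_budget = np.asarray(solution.x) * np.asarray(prices_unscaled)
previous_budget = np.asarray(previous_media_allocation) * np.asarray(prices_unscaled)

for name, prev, opt in zip(media_cols, previous_budget, optimal_budget):
    print(f"{name}: {prev:,.0f} -> {opt:,.0f}")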

Conclusions

Google Lightweight MMM can be a quick way to build an MMM.

If you are not experienced with Bayesian statistics and Marketing Mix Modelling, it's very easy to make big mistakes and provide the wrong insights to the marketing department.

If you want additional info about Bayesian statistics, please check some of the discussions on the Cross Validated forum:

How to interpret the importance for a regression coefficient in Bayesian regression from its posterior density?

Can we talk about statistical significance using Bayesian Inference?

What’s the difference between a confidence interval and a credible interval?

If you want to get in touch with me, please fill in the contact form or leave a comment on this article.
