Challenges and Pitfalls in Marketing Mix Models


In search and optimisation theory there is a theorem that I love: the “No Free Lunch Theorem”.*

It basically states that:

“All algorithms that search for an extremum of a cost function perform exactly the same when averaged over all possible cost functions. So, for any search/optimization algorithm, any elevated performance over one class of problems is exactly paid for in performance over another class.”

The implication of this theorem is that there is no single best optimisation algorithm for all situations or, more colloquially, there is no “one-size-fits-all” algorithm.

Marketing Mix Models Optimising Marketing Budget

This limit, or simply this fact, makes me think of all the limits that a Data Scientist can encounter when building a Marketing Mix Model.

The two things are not strictly related, but in my mind they converge on one point: we should see Machine Learning Models and Marketing Mix Models as tools that approximate business reality and are useful for accomplishing specific tasks.

It may seem a naive statement, but it would be a huge mistake to treat them as a granular causal description of reality itself.

So, after this introduction, what are the challenges and downsides of building a Marketing Mix Model (and why, even with these limits, is it still massively better for your marketing department to have one than not)?

Drawing also on the paper by David Chan and Michael Perry, I would summarise them in two critical aspects:

  • Data Limitations
  • Selection Bias 

Data Limitations

A marketing mix model is built on observational data. Usually sales is the dependent variable, while price, distribution and media activities are the independent ones.

Until now I have always worked with weekly data, which means that in one year I can gather only 52 data points and, if I am very lucky, 156 in three years.

When I enrolled in 2016 in the free Machine Learning course by Stanford Professor Andrew Ng, he gave in one of his videos a rule of thumb for modelling that I still remember:

“We need 10 data points for each parameter we are going to use as a descriptive variable”

To be more practical this is just a very short list of all the parameters that can be used to model sales data: 

  • Distribution
  • Seasonality
  • Weather
  • Covid & Lockdown
  • Price per unit
  • TV Media Activity
  • Facebook IG Media Activity 
  • Facebook Audience network
  • YouTube Activity
  • In-Store Shopper Activity
  • PR and Newspaper
  • TikTok
  • Radio/Spotify Ads

Just with this set of 13 variables we need 130 sales data points, which is two and a half years of weekly data collection.
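The arithmetic behind this rule of thumb can be sketched in a few lines of Python. The function names and the default of 10 points per parameter are only illustrative assumptions taken from the rule quoted above, not a formal statistical requirement.

```python
def required_data_points(n_parameters, points_per_parameter=10):
    """Rule-of-thumb sample size: a fixed number of observations per parameter."""
    return n_parameters * points_per_parameter


def weeks_to_years(n_weeks, weeks_per_year=52):
    """Convert a number of weekly observations into years of data collection."""
    return n_weeks / weeks_per_year


n_points = required_data_points(13)   # the 13 variables listed above
n_years = weeks_to_years(n_points)

print(f"Data points needed: {n_points}")      # 13 * 10 = 130
print(f"Years of weekly data: {n_years:.1f}")  # 130 / 52 = 2.5
```

Every extra variable added to the model costs, under this rule, another ten weeks of history.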

Another problem related to the data limitations is the correlation among variables. 

For example, if a YouTube campaign is running in parallel with Facebook and Instagram campaigns, it is not easy to isolate the contribution of each of these activities.

As a Data Scientist, you would like to study the impact of these media separately, but when advertisers allocate their spend across channels in order to maximise effectiveness, the campaigns will run in parallel.

The consequence of fitting a linear regression model with highly correlated variables is the risk of misattributing the sales contribution of the parameters.
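To make this risk concrete, here is a minimal simulation sketch. It is built entirely on assumptions: two hypothetical channels with standardised spend, a true effect of 1.0 each, and Gaussian noise. It refits an ordinary least squares model on many simulated datasets and compares how much the estimated coefficient of the first channel moves around when the two channels are independent versus when they run almost in parallel.

```python
import numpy as np


def coefficient_spread(correlation, n_weeks=156, n_sims=300, seed=0):
    """Standard deviation of the first channel's estimated effect across simulations."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, correlation], [correlation, 1.0]])
    estimates = []
    for _ in range(n_sims):
        # Hypothetical standardised weekly spend for two channels
        spend = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_weeks)
        noise = rng.normal(scale=1.0, size=n_weeks)
        # Assumed ground truth: each channel contributes 1.0 per unit of spend
        sales = 1.0 * spend[:, 0] + 1.0 * spend[:, 1] + noise
        X = np.column_stack([np.ones(n_weeks), spend])
        beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
        estimates.append(beta[1])
    return float(np.std(estimates))


spread_independent = coefficient_spread(correlation=0.0)
spread_parallel = coefficient_spread(correlation=0.95)

print(f"Spread with independent channels: {spread_independent:.3f}")
print(f"Spread with parallel channels:    {spread_parallel:.3f}")
```

The true effects are identical in both settings; only the correlation between the channels changes. The wider spread in the parallel case is what makes any single fitted attribution unreliable.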

A practical example

Let’s think about this situation: the latest marketing campaign led to an increase in sales of 1000 units.

The marketing campaign was conducted on Facebook, YouTube and TikTok in the same period.

In this context, multiple sales attribution hypotheses can be reasonable.

Total sales uplift = 400 units from Facebook + 300 units from YouTube + 300 units from TikTok

Or, as a second hypothesis:

Total sales uplift = 200 units from Facebook + 600 units from YouTube + 200 units from TikTok. 

A priori and without additional information we have two reasonable scenarios. 
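A small sketch shows why the data alone cannot choose between the two hypotheses. It assumes a hypothetical timeline of six periods in which all three channels are live only in the fourth one, so the three media columns of the design matrix are identical.

```python
import numpy as np

# Hypothetical on/off flags: the three channels were live in the same single period
campaign_on = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0])

# Columns: Facebook, YouTube, TikTok (identical because the campaigns fully overlap)
X = np.column_stack([campaign_on, campaign_on, campaign_on])

hypothesis_a = np.array([400.0, 300.0, 300.0])  # units attributed per channel
hypothesis_b = np.array([200.0, 600.0, 200.0])

uplift_a = X @ hypothesis_a
uplift_b = X @ hypothesis_b

print("Rank of the media matrix:", np.linalg.matrix_rank(X))
print("Same predicted uplift in every period:", np.allclose(uplift_a, uplift_b))
```

With three identical columns the matrix carries the information of a single variable, so any split that sums to 1000 units fits the observed sales equally well.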

What can help here is avoiding big media bursts, trying to create isolated campaigns, and understanding the cost and audience dimensions of each specific media channel.

As I work in the FMCG industry, dealing with sales from physical stores, I will not discuss the last-click attribution model (Last-click attribution model – Search Ads 360 Help (google.com)).

The other issue related to the data quality is the limited range of data. 

A complex system

When in civil engineering we study the response curve of a concrete sample to different strains, the tension goes from zero all the way up to the cracking point.

Thanks to these incremental steps we are able to describe precisely the material reaction. 

Marketing is different because you don’t study how customers react to different levels of budget; you choose a budget that suits the business needs, and from that you collect sales data points.

When you have only a limited range of ad spend and you want to calculate the ROAS of a particular channel from sales, multiple curves with practically the same R² value can fit the collected data.

As shown in this video, for example, you can use a linear, a quadratic or a cubic function to fit your data.

I also created a notebook in Python where I simulated how, with a marketing budget of 20,000-40,000 € per week and a ROAS of 1.5-2.5, you can have different curves fitting the same data with almost the same R².


For the three fitted curves, the R² values are:

  • 0.5748375576936966
  • 0.5769298746650778
  • 0.5656465670979127

These near-identical values are a consequence of the limited variance in the data. Under this condition, very different curves can reach practically the same value of a performance metric such as R² on the same dataset.
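The following is not the original notebook, only a minimal sketch of the same idea under my own assumptions: weekly spend drawn uniformly between 20 and 40 thousand euros, a ROAS drawn uniformly between 1.5 and 2.5 for each week, three years of weekly data, and polynomial fits of degree 1, 2 and 3 done with NumPy.

```python
import numpy as np


def simulate_weekly_data(n_weeks=156, seed=0):
    """Simulate weekly spend and sales, both in thousands of euros."""
    rng = np.random.default_rng(seed)
    spend_k = rng.uniform(20.0, 40.0, size=n_weeks)  # narrow budget range
    roas = rng.uniform(1.5, 2.5, size=n_weeks)       # assumed weekly ROAS
    sales_k = roas * spend_k
    return spend_k, sales_k


def r_squared(y, y_hat):
    """Coefficient of determination of fitted values y_hat against y."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_polynomials(spend_k, sales_k, degrees=(1, 2, 3)):
    """Fit one polynomial per degree and return its coefficients and in-sample R²."""
    results = {}
    for degree in degrees:
        coefs = np.polyfit(spend_k, sales_k, deg=degree)
        results[degree] = (coefs, r_squared(sales_k, np.polyval(coefs, spend_k)))
    return results


spend_k, sales_k = simulate_weekly_data()
fits = fit_polynomials(spend_k, sales_k)

for degree, (coefs, r2) in fits.items():
    # 80k is far outside the observed 20-40k range, so this is pure extrapolation
    forecast = np.polyval(coefs, 80.0)
    print(f"degree {degree}: R2 = {r2:.4f}, predicted sales at 80k spend = {forecast:.1f}k")
```

Inside the observed budget range the three curves are hard to tell apart by R² alone. The extrapolated forecast at 80k is printed only to show how differently the same curves can behave once you leave that range.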

Selection Bias

Another issue that we face in the modelling process is selection bias.

Selection bias is a distortion of a statistical analysis caused by the way the samples are collected.

I showed an example while discussing the 4th chapter of How Brands Grow on how light and heavy buyers regress toward the mean and how the definition of new customer changes based on the selection of a specific time window. 

Selection bias is not only related to time-window selection. 

Specifically for Marketing Mix Models, it occurs when an input media variable is correlated with an unobserved demand variable, which may be the key driver of the sales uplift.

When we ignore that variable (consciously or unconsciously), we over-attribute sales to the media activity that is correlated with the real sales driver.

There are several specific sources of this bias, such as ad targeting or a mismodelled funnel, but for me the most frequent and common one is seasonality misspecification.

Some products and services are influenced by demand seasonality. This is how the term “Diet” is searched in the UK:

As you can see there is a clear spike in January with the advent of the new year. 

What does it mean?

From both a campaign planning and a marketing modelling perspective, it means that we need to take these cyclical fluctuations into account, otherwise our results and conclusions will be flawed.
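A minimal simulation sketch of seasonality misspecification follows. All numbers are assumptions for illustration: a yearly demand cycle peaking at the start of the year, media spend that follows the same cycle, and a true media effect of 2.0 sales units per unit of spend. The same sales series is fitted twice, once ignoring seasonality and once including it.

```python
import numpy as np

rng = np.random.default_rng(42)
n_weeks = 156
week = np.arange(n_weeks)

# Assumed yearly demand cycle, peaking in the first week of each year
season = 1.0 + np.cos(2.0 * np.pi * week / 52.0)

# Media spend is planned around the seasonal peak, so it correlates with demand
media = 10.0 + 5.0 * season + rng.normal(scale=2.0, size=n_weeks)

true_media_effect = 2.0
sales = (100.0 + true_media_effect * media + 30.0 * season
         + rng.normal(scale=5.0, size=n_weeks))


def ols(columns, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


naive_effect = ols([media], sales)[1]             # seasonality ignored
adjusted_effect = ols([media, season], sales)[1]  # seasonality included

print(f"True media effect:          {true_media_effect:.2f}")
print(f"Estimate without seasonality: {naive_effect:.2f}")
print(f"Estimate with seasonality:    {adjusted_effect:.2f}")
```

When the seasonal term is left out, the media coefficient absorbs part of the seasonal demand and is overstated. In a real project the seasonal driver is not handed to you like this, which is exactly why it is so easy to miss.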

Marketing Mix Modelling Benefits

As you can understand from the above, modelling marketing effects is definitely not an easy road, but, in my opinion, it is one that is worth travelling. 

Implementing a Marketing Mix Model not only helps the marketing department to optimise the media spend, but it nudges all the stakeholders to improve their data literacy. 

I saw this improvement with my clients: after designing and deploying a marketing mix model project, they were able to spot new opportunities and threats thanks to the rigorous data collection that media mix modelling requires.

In the end, I have no doubt about the benefits, in terms of better data quality and data evaluation, that a marketing mix model project can deliver to the marketing department, and that’s why I highly recommend starting to use this methodology.

*(there is also a “No Free Lunch Theorem” in machine learning applied to supervised learning). 

**No Free Lunch Theorems (no-free-lunch.org) 
