I'm analyzing a year's worth of monthly sales data for our e-commerce platform, trying to see if there's a predictable relationship between our marketing spend on social media ads and the number of new customer acquisitions. I've plotted the data and it looks vaguely linear, so I'm planning to run a simple linear regression. My main hang-up is whether to use total spend or cost-per-click as the independent variable, and how to properly account for seasonal spikes we see in November and December. I'm using Python with statsmodels, but I'm more concerned about the model specification than the code itself. Has anyone else tackled a similar analysis for digital marketing ROI?
Total spend vs. CPC: it depends on the data. CPC can be pretty noisy month to month, so I’d start with total spend to gauge scale, then test CPC as a separate predictor with a lag or two.
I tackled this with monthly data last year. I log-transformed both spend and acquisitions, added month dummies for seasonality (especially Nov/Dec), and included a 1-month lag on spend. Result: spend helps, but there were diminishing returns; CPC was less predictive once seasonality and lag were in, to my surprise.
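To make that spec concrete, here's a rough sketch of a log-log OLS with month dummies and a 1-month spend lag in statsmodels. The data is synthetic and the column names (`spend`, `new_customers`) are placeholders I made up, so treat it as a template rather than my actual model. It assumes about three years of monthly data; with only 12 months, a full set of month dummies would eat nearly all your degrees of freedom.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for real data: 36 months, hypothetical column names.
rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=36, freq="MS")
spend = rng.uniform(5_000, 20_000, size=36)
holiday = np.isin(idx.month, [11, 12]).astype(float)
new_customers = np.exp(
    1.0 + 0.6 * np.log(spend) + 0.4 * holiday + rng.normal(0, 0.1, size=36)
)

df = pd.DataFrame({"spend": spend, "new_customers": new_customers}, index=idx)
df["month"] = df.index.month

# Log-transform both sides so the spend coefficient reads as an elasticity.
df["log_spend"] = np.log(df["spend"])
df["log_acq"] = np.log(df["new_customers"])

# 1-month lag on spend; the first row has no lag value and is dropped.
df["log_spend_lag1"] = df["log_spend"].shift(1)
df = df.dropna()

# C(month) expands into month-of-year dummies (January is the baseline).
model = smf.ols("log_acq ~ log_spend + log_spend_lag1 + C(month)", data=df).fit()
print(model.params[["log_spend", "log_spend_lag1"]])
```

In a log-log model, a `log_spend` coefficient below 1 means acquisitions grow less than proportionally with spend, which is how the diminishing returns show up.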
Another thing: check for autocorrelation. If the residuals are serially correlated, consider an ARIMAX or SARIMAX approach with spend/CPC as exogenous variables. In practice I found that simple OLS with year and month fixed effects (separate dummies, not one per year-month, which would use up every observation) plus a few lags worked OK, but you’ll want to test the assumptions.
Keep an eye on attribution. If acquisitions are attributed to different windows or platforms, the regression will mix signals. You might need a marketing mix modeling approach or at least clearly define the attribution window and include other channels as controls if they exist.
One alternative framework is to model CAC/ROAS rather than raw acquisitions. If your goal is efficiency, a two-stage model (first predict clicks, then conversions), a logistic model for per-click conversion, or a Poisson model for monthly conversion counts might fit better than straight linear regression.
Do you have monthly data for a couple of years? Any promotions or seasonality beyond Nov/Dec? If you want, I can sketch a minimal specification you can test in statsmodels.