Confusing documentation on scaling of control variables #1469

kb-open · 2025-02-05T05:06:38Z

Discussed in #1462

Originally posted by kb-open February 3, 2025
Documentation located at https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_example.html says that "input scaling of channel spends or control features" and "inverse scaling back to target domain" at the time of out-of-sample predictions are taken care of automatically by MMM.

However, documentation located at https://www.pymc-marketing.io/en/0.6.0/api/generated/pymc_marketing.mmm.delayed_saturated_mmm.DelayedSaturatedMMM.html says that "If control variables are present, we do not scale them! If needed please do it before passing the data to the model."

When these statements are viewed together, it becomes confusing for control variables. Does the user need to scale the control variables? Or does MMM take care of scaling control variables internally?

Kindly clarify.

juanitorduz · 2025-02-05T09:06:32Z

Hey @kb-open, thank you for the feedback; indeed, it is confusing, so I will fix the mmm example notebook soon. Specifically,

"If control variables are present, we do not scale them! If needed, please do it before passing the data to the model."

So we do not scale the control variables. This can be seen from the code

pymc-marketing/pymc_marketing/mmm/mmm.py

Lines 869 to 874 in 542a85b

    
    class MMM(
        MaxAbsScaleTarget,
        MaxAbsScaleChannels,
        ValidateControlColumns,
        BaseMMM,
    ):

Because control variables can come from so many different sources (continuous, one-hot encoding, ordinal encoding), we let the user pre-process them. We recommend applying some scaling procedure because the rest of the variables (e.g. media) and the Fourier modes for yearly seasonality are usually between -1 and 1.
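As a minimal sketch of what "scale them before passing the data to the model" could look like, the snippet below max-abs scales only the continuous control columns and leaves a one-hot column untouched. The column names and values are hypothetical, and plain Python is used so the example runs without dependencies; scikit-learn's `MaxAbsScaler` performs the same division. Note that the scale factors are computed on the training data and reused for new data, since the model will not do this for controls.

```python
def fit_max_abs(values):
    """Return the largest absolute value in a column (the scale factor)."""
    scale = max(abs(v) for v in values)
    # Fall back to 1.0 so an all-zero column is left unchanged.
    return scale if scale != 0 else 1.0


def apply_scale(values, scale):
    """Divide every entry by a previously fitted scale factor."""
    return [v / scale for v in values]


# Hypothetical control columns (names and values are made up for illustration).
controls_train = {
    "price_index": [98.0, 102.0, 110.0, 105.0],
    "temperature_anomaly": [-3.0, 1.5, 6.0, -1.5],
    "holiday": [0, 1, 0, 0],  # one-hot encoded, already in [0, 1]
}
continuous = ["price_index", "temperature_anomaly"]

# Fit the scale factors on the training data only.
scales = {name: fit_max_abs(controls_train[name]) for name in continuous}

controls_train_scaled = dict(controls_train)
for name in continuous:
    controls_train_scaled[name] = apply_scale(controls_train[name], scales[name])

# Out-of-sample data must reuse the training scale factors.
controls_new = {"price_index": [121.0], "temperature_anomaly": [3.0], "holiday": [1]}
controls_new_scaled = dict(controls_new)
for name in continuous:
    controls_new_scaled[name] = apply_scale(controls_new[name], scales[name])
```

With this choice, scaled training controls lie in [-1, 1], while out-of-sample values can fall slightly outside that range if they exceed the training maximum.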

I hope this helps :)

kb-open · 2025-02-05T16:52:38Z

Thanks @juanitorduz. If I understood you correctly, the target variable is scaled using MaxAbsScaler, which puts the target variable between 0 and 1, which is fine. But you also mentioned in your comment above that media variables are scaled between -1 and 1, which confuses me again, because as per the documentation, media variables are also scaled using MaxAbsScaler, which should put the media variables between 0 and 1, and not between -1 and 1. Or am I missing something? Kindly clarify.

Kindly also clarify the preferred range for control variables after the user scales them. Should the control variables be in the range from -1 to 1, or in the range from 0 to 1?

juanitorduz · 2025-02-05T17:00:33Z

Sorry about not being 100% precise! Indeed, media variables are between 0 and 1. But the Fourier ones can be between -1 and 1. This is what I meant (or tried to mean) by "because the rest of the variables (e.g. media) and the Fourier modes for yearly seasonality are usually between -1 and 1."
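The two ranges can be checked with a small sketch. It assumes non-negative spend values (made up here) and the standard sine/cosine form of Fourier features with a 365.25-day period; it is an illustration of the ranges, not a copy of the library's implementation.

```python
import math


def max_abs_scale(values):
    """Divide by the largest absolute value, as a max-abs scaler does."""
    scale = max(abs(v) for v in values)
    return [v / scale for v in values]


# Hypothetical, non-negative channel spend.
spend = [0.0, 250.0, 1000.0, 400.0]
scaled_spend = max_abs_scale(spend)  # stays within [0, 1] because spend >= 0

# Standard yearly Fourier modes: sin and cos terms for each order k.
n_order = 2          # assumed number of Fourier orders
period = 365.25      # assumed yearly period in days
days = range(366)

fourier = []
for k in range(1, n_order + 1):
    fourier.append([math.sin(2 * math.pi * k * d / period) for d in days])
    fourier.append([math.cos(2 * math.pi * k * d / period) for d in days])

fourier_min = min(min(col) for col in fourier)
fourier_max = max(max(col) for col in fourier)

print("scaled spend range:", min(scaled_spend), max(scaled_spend))
print("Fourier range:", fourier_min, fourier_max)
```

Max-abs scaling only maps data into [0, 1] when the inputs are non-negative; a control column with negative values would land in [-1, 1], the same range as the Fourier modes.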

github-actions bot added the Needs Triage label Feb 5, 2025

wd60622 added MMM and removed Needs Triage labels Feb 5, 2025

juanitorduz added the docs Improvements or additions to documentation label Feb 5, 2025
