Confusing documentation on scaling of control variables #1469

Open
kb-open opened this issue Feb 5, 2025 · 3 comments
Labels
docs Improvements or additions to documentation MMM

Comments

@kb-open

kb-open commented Feb 5, 2025

Discussed in #1462

Originally posted by kb-open February 3, 2025
Documentation located at https://www.pymc-marketing.io/en/stable/notebooks/mmm/mmm_example.html says that "input scaling of channel spends or control features" and "inverse scaling back to target domain" at the time of out-of-sample predictions are taken care of automatically by MMM.

However, documentation located at https://www.pymc-marketing.io/en/0.6.0/api/generated/pymc_marketing.mmm.delayed_saturated_mmm.DelayedSaturatedMMM.html says that "If control variables are present, we do not scale them! If needed please do it before passing the data to the model."

When these statements are viewed together, it becomes confusing for control variables. Does the user need to scale the control variables? Or does MMM take care of scaling control variables internally?

Kindly clarify.

@juanitorduz
Collaborator

Hey @kb-open, thank you for the feedback; indeed, it is confusing, so I will fix the mmm example notebook soon. Specifically,

If control variables are present, we do not scale them! If needed, please do it before passing the data to the model.

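So we do not scale the control variables. This can be seen from the class definition, which mixes in max-abs scalers for the target and the channels but only a validator for the control columns: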

class MMM(
    MaxAbsScaleTarget,
    MaxAbsScaleChannels,
    ValidateControlColumns,
    BaseMMM,
):

Because control variables can come from so many different sources (continuous, one-hot encoding, ordinal encoding), we let the user pre-process them. We recommend applying some scaling procedure because the rest of the variables (e.g. media) and the Fourier modes for yearly seasonality are usually between -1 and 1.

I hope this helps :)
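
As an illustration of the pre-processing described above, here is a minimal sketch of max-abs scaling a continuous control column before passing the data to the model. The column names, the values, and the `max_abs_scale` helper are hypothetical and not part of pymc-marketing; in practice something like scikit-learn's `MaxAbsScaler` could play the same role.

```python
def max_abs_scale(values):
    """Divide every value by the largest absolute value in the column."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)  # all-zero column: nothing to scale
    return [v / max_abs for v in values]


# Hypothetical control columns (names and values are made up for illustration).
controls = {
    "temperature": [-5.0, 0.0, 10.0, 20.0],  # continuous, may be negative
    "is_holiday": [0, 1, 0, 0],              # binary indicator, already in [0, 1]
}

# Scale only the continuous column; the binary indicator is left untouched.
scaled_controls = dict(controls)
scaled_controls["temperature"] = max_abs_scale(controls["temperature"])

print(scaled_controls["temperature"])  # [-0.25, 0.0, 0.5, 1.0]
```

Whether a given control should be scaled at all (for example, one-hot columns usually need no scaling) remains a modelling decision for the user.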

@juanitorduz juanitorduz added the docs Improvements or additions to documentation label Feb 5, 2025
@kb-open
Author

kb-open commented Feb 5, 2025

Thanks @juanitorduz. If I understood you correctly, the target variable is scaled using MaxAbsScaler, which puts it between 0 and 1, which is fine. But you also mentioned in your comment above that media variables are scaled between -1 and 1, which confuses me again: as per the documentation, media variables are also scaled using MaxAbsScaler, which should put the media variables between 0 and 1, not between -1 and 1. Or am I missing something? Kindly clarify.

Kindly also clarify the preferred range for control variables after the user scales them. Should the control variables be in the range from -1 to 1, or in the range from 0 to 1?

@juanitorduz
Collaborator

Sorry about not being 100% precise! Indeed, media variables are between 0 and 1, but the Fourier modes can be between -1 and 1. This is what I meant (or tried to mean) by "because the rest of the variables (e.g. media) and the Fourier modes for yearly seasonality are usually between -1 and 1."
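
To make the two ranges concrete, here is a small sketch comparing max-abs scaling of non-negative spend with a pair of yearly Fourier modes. The spend values are made up, and the 365.25-day period is an assumption for illustration rather than something stated in this thread.

```python
import math

# Hypothetical non-negative channel spend.
spend = [0.0, 50.0, 200.0, 100.0]

# Max-abs scaling: divide by the largest absolute value.
peak = max(abs(x) for x in spend)
scaled_spend = [x / peak for x in spend]

# One pair of yearly Fourier modes (assumed period of 365.25 days).
period = 365.25
days = range(366)
sin_mode = [math.sin(2 * math.pi * d / period) for d in days]
cos_mode = [math.cos(2 * math.pi * d / period) for d in days]

# Non-negative inputs land in [0, 1] after max-abs scaling ...
print(min(scaled_spend), max(scaled_spend))  # 0.0 1.0
# ... while sine and cosine features take both signs within [-1, 1].
print(min(sin_mode) < 0 < max(sin_mode))  # True
```

By the same logic, a control variable that takes negative values ends up in [-1, 1] after max-abs scaling, while a non-negative one ends up in [0, 1].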
