
add: support for ignore_pretraining_limits to PHE #12

Merged: 1 commit into main on Jan 15, 2025

Conversation

LennartPurucker
Collaborator

See #11

@LennartPurucker
Collaborator Author

Closes #11

@LeoGrin
Collaborator

LeoGrin commented Jan 15, 2025

Thanks Lennart!
I'm thinking we might actually also pass other TabPFN parameters that are not in the HPO search space.
Looking at TabPFNClassifier's parameters:

  • n_estimators: ✅ add this? Not sure about this one, people might use it too much for their own good
  • categorical_features_indices: Already in
  • softmax_temperature: In search space
  • balance_probabilities: ✅ add this
  • average_before_softmax: In search space
  • model_path: Might be worth including something like that, but it's more work; could be another PR
  • device: Already in
  • ignore_pretraining_limits: ✅ add this
  • inference_precision: ✅ add this?
  • fit_mode: We might support this but it shouldn't be passed directly to the intermediary models, so maybe in another PR
  • memory_saving_mode: ✅ maybe add this?
  • random_state: Already in
  • n_jobs: ✅ maybe add this?
  • inference_config: In search space

WDYT? There may be some interaction with PHE that I'm not aware of. I'm pretty confident we should add ignore_pretraining_limits and balance_probabilities; the other ones I'm less sure about.
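
For concreteness, a sketch of what the pass-through could look like from the user side; the wrapper import path and keywords below are my assumption of how we'd expose it, not necessarily the current API:

```python
# Sketch only: assumes the PHE wrapper exposes these constructor keywords
# and forwards them to every TabPFN base model it builds; the import path
# and signature are assumptions, not the verified tabpfn-extensions API.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

from tabpfn_extensions.post_hoc_ensembles.sklearn_interface import AutoTabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = AutoTabPFNClassifier(
    ignore_pretraining_limits=True,  # what this PR adds
    # balance_probabilities=True,    # candidate pass-through discussed above
)
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)
```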

@LennartPurucker
Collaborator Author

Mhm, it depends on how complicated you want to make this. Another approach would be to let users pass the models they want to use in code instead; PHE already supports this: https://github.com/PriorLabs/tabpfn-extensions/blob/main/src/tabpfn_extensions/post_hoc_ensembles/pfn_phe.py#L133

That way we can detach the model HPs from the PHE HPs, but this would be a bit much for me, IMO.
ignore_pretraining_limits was a very good exception, as it was definitely needed.
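
Roughly, that approach would look like this; the PHE constructor call and its keyword are placeholders here, the actual parameter is at the linked line:

```python
# Illustrative sketch: pre-configure the base models yourself and hand them
# to the PHE, decoupling model HPs from PHE HPs. The constructor keyword
# (`custom_base_models`) is a placeholder, not the actual API; see
# pfn_phe.py#L133 for the real parameter.
from tabpfn import TabPFNClassifier

custom_base_models = [
    TabPFNClassifier(ignore_pretraining_limits=True),
    TabPFNClassifier(ignore_pretraining_limits=True, balance_probabilities=True),
    TabPFNClassifier(ignore_pretraining_limits=True, softmax_temperature=0.8),
]
# phe = ...(base_models=custom_base_models, ...)  # in place of the default search space
```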

@LeoGrin
Collaborator

LeoGrin commented Jan 15, 2025

Makes sense. Do you think balance_probabilities might also be an exception? I think it could be very useful if you want to apply this option to all models in the PHE, which would in any case be annoying to do with custom models.
Otherwise LGTM, thanks! Happy to merge with only ignore_pretraining_limits, or with both ignore_pretraining_limits and balance_probabilities, depending on what you think, and look at the rest later.

@LennartPurucker
Collaborator Author

I think balance_probabilities is unnecessary if you have a post hoc ensemble. The PHE will optimize for the target metric; if this metric is balanced, the final predictions will be balanced.

What do you think about this line of reasoning?

@noahho
Contributor

noahho commented Jan 15, 2025

I thought it's two separate optimizations, and the outer optimization relies on the inner one being aligned with it. Suppose the inner optimization pointed in the opposite direction from the outer one; then it would be very hard to optimize the outer ensemble.

But I realize that maybe that's not the point you meant. Should we rather add a parameter that is passed to TabPFN so that it also optimizes the chosen metric internally?

@LeoGrin
Collaborator

LeoGrin commented Jan 15, 2025

Hmm, yeah, I think that's fair. We might then think about adding some more metrics, for instance balanced_log_loss, but that's a different issue. Approving and merging, then, thanks!
EDIT: I'm not very familiar with the PHE optimization, but I'm actually wondering whether some of the ensemble members are expected to work well for metrics like balanced_log_loss. Maybe adding balance_probabilities to the search space would make sense in that case?
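
For reference, one plausible definition of such a metric; this is just my sketch (class-weighted log loss via sklearn), the PHE's target-metric machinery might define it differently:

```python
import numpy as np
from sklearn.metrics import log_loss
from sklearn.utils.class_weight import compute_sample_weight

def balanced_log_loss(y_true, y_proba):
    """Log loss with samples re-weighted so each class contributes equally."""
    weights = compute_sample_weight(class_weight="balanced", y=y_true)
    return log_loss(y_true, y_proba, sample_weight=weights)

# Toy check: a 3:1 class skew gets re-balanced in the weighting.
y_true = np.array([0, 0, 0, 1])
y_proba = np.array([[0.8, 0.2], [0.8, 0.2], [0.8, 0.2], [0.3, 0.7]])
print(balanced_log_loss(y_true, y_proba))
```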

@LeoGrin merged commit c0125a0 into main on Jan 15, 2025
@LennartPurucker
Collaborator Author

LennartPurucker commented Jan 15, 2025

It is two separate optimizations, but for ensembling you do not want to add such a bias to the base models, as it is likely to be problematic when aggregating the base models' predictions. For example, if you can, you want to avoid temperature-scaling the base models of an ensemble and instead scale the ensemble's predictions, as done in AutoGluon.

You could add a metric to the base model and check whether balancing predictions helps, but the overhead would be too large: this would require validation data, which is exactly what the PHE already uses.

Adding it to the search space is a good idea for sure!

(One more small ensembling intuition: it is very easy for a weighted average to approximate balanced probabilities given a few different predictions from base models)
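
A toy numeric version of that intuition, with made-up probabilities:

```python
import numpy as np

# Made-up numbers: two base models whose class-1 probabilities are skewed
# in opposite directions on the same three samples.
p1 = np.array([0.2, 0.3, 0.1])  # under-predicts class 1
p2 = np.array([0.8, 0.9, 0.7])  # over-predicts class 1

# Sweep ensemble weights and pick the one whose mean prediction hits the
# balanced target rate of 0.5 -- the kind of correction a weighted average
# makes without modifying any base model.
weights = np.linspace(0, 1, 101)
best_w = min(weights, key=lambda w: abs((w * p1 + (1 - w) * p2).mean() - 0.5))
print(best_w)                                    # 0.5
print((best_w * p1 + (1 - best_w) * p2).mean())  # 0.5
```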
