Finetuning for domain-specific data due to recommended input size limitations #166

fif911 · 2025-02-02T17:27:42Z

Hey Prior Labs team

First, I want to thank you for open-sourcing this. It's a great foundation model. I have not found any scripts for the finetuning process, so my question is:

Does it make sense to fine-tune TabPFN on domain data? I have a dataset with ~70,000 rows, and because the model was trained on synthetic data, I am curious if fine-tuning it on domain data would make sense. Also, it seems that the compute required for such a process is not too big, as you used only two weeks of 8x2080 GPUs for training.

If so, how would you recommend approaching the task?

Additionally, I am curious if TabPFN can leverage textual data. I saw somewhere that AutoGluon tries to fuse textual and tabular data, but I'm not sure how effective that would be. If so, what is the optimal text size based on your experience?

Best regards, Oleksandr

LeoGrin · 2025-02-04T09:11:27Z

Hey @fif911 !
Thanks for the kind words! For finetuning on one dataset, there is this script: https://github.com/LennartPurucker/finetune_tabpfn_v2
For finetuning on a set of related datasets from the same domain, we don't have anything public for now, but we want to release this quite soon.
Concerning text data, TabPFN v2 leverages text features in the API (https://github.com/PriorLabs/tabpfn-client)!
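
Separately from finetuning, one common workaround when a dataset (such as the ~70,000 rows mentioned above) is larger than the context a model is comfortable with is to subsample the training rows while keeping class proportions roughly intact. The snippet below is a minimal, standard-library-only sketch of that idea. The helper `stratified_subsample` and the budget of 10,000 rows are assumptions made for illustration, not part of the TabPFN package or the linked finetuning script; check the package documentation for the actual recommended limits.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, max_rows, seed=0):
    """Return sorted row indices (at most max_rows) with roughly the original class mix.

    Hypothetical helper for illustration; not part of TabPFN.
    """
    n = len(labels)
    if n <= max_rows:
        return list(range(n))

    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    chosen = []
    for idx in by_class.values():
        # Proportional share of the budget, but keep at least one row per class.
        k = max(1, len(idx) * max_rows // n)
        chosen.extend(rng.sample(idx, k))

    # The "at least one row" rule can overshoot when there are many tiny classes.
    if len(chosen) > max_rows:
        chosen = rng.sample(chosen, max_rows)
    return sorted(chosen)


# Toy labels: 70,000 rows with a 90/10 class split.
labels = [0] * 63_000 + [1] * 7_000
ASSUMED_MAX_ROWS = 10_000  # assumed budget, not an official figure
keep = stratified_subsample(labels, ASSUMED_MAX_ROWS)
minority_kept = sum(1 for i in keep if labels[i] == 1)
```

The returned indices can then be used to select rows from the feature matrix and label vector before fitting. Sampling with several different seeds and averaging predictions is another option, at the cost of extra inference time.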

fif911 · 2025-02-04T16:15:45Z

Thanks @LeoGrin ! That helps a lot.

Just so I fully understand: the model served via the API supports text features, but the open-source one does not, right?

Cheers, Oleksandr

LeoGrin · 2025-02-04T16:29:55Z

The local package supports text features in the sense that it doesn't break if the input contains some, but it treats them as categorical for now, which means performance should be considerably lower than with the API if you have rich text features.
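
To see why categorical treatment hurts on rich text, here is a small illustration of what such an encoding does in general. This is a sketch of the idea only, not TabPFN's actual preprocessing; `encode_as_categorical` is a hypothetical helper written for this example.

```python
def encode_as_categorical(values):
    """Map each distinct string to an integer code, in order of first appearance."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
        encoded.append(codes[v])
    return encoded, codes


# Low-cardinality column: repeated values share a code, so the signal survives.
colors, color_codes = encode_as_categorical(["red", "blue", "red", "green"])

# Free-text column: every value is distinct, so each row gets its own code and
# the obvious similarity between the first two reviews is lost.
reviews, review_codes = encode_as_categorical([
    "battery died after two days",
    "battery died after three days",
    "great screen, fast shipping",
])
```

With short, repeated labels the codes still group related rows together. With free text, almost every row ends up in its own category, so the column carries little usable information unless the text is embedded or otherwise featurized first.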
