Hey Prior Labs team,

First, I want to thank you for open-sourcing this. It's a great foundation model. I haven't found any scripts for the fine-tuning process, so my question is:
Does it make sense to fine-tune TabPFN on domain data? I have a dataset with ~70,000 rows, and since the model was trained on synthetic data, I am curious whether fine-tuning it on domain data would help. The compute required also seems modest, since you trained for only two weeks on 8x RTX 2080 GPUs.
If so, how would you recommend approaching the task? A rough sketch of what I had in mind is below.
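Since no official fine-tuning script exists, everything in this sketch is an assumption: `model` stands in for the underlying `torch.nn.Module`, and the forward signature `(X_context, y_context, X_query) -> logits` is made up for illustration. The idea is to mimic the meta-learning setup by sampling many random (context, query) splits from the domain dataset and minimizing cross-entropy on the query labels:

```python
# Hypothetical sketch of fine-tuning a TabPFN-style in-context model on
# domain data. The forward signature used here is assumed, not the real API.
import numpy as np
import torch
import torch.nn.functional as F

def sample_task(X, y, ctx_size=512, qry_size=128, rng=None):
    """Draw a random (context, query) split from the domain dataset."""
    rng = rng or np.random.default_rng()
    idx = rng.permutation(len(X))[: ctx_size + qry_size]
    ctx, qry = idx[:ctx_size], idx[ctx_size:]
    return X[ctx], y[ctx], X[qry], y[qry]

def finetune(model, X, y, steps=1000, lr=1e-5, device="cuda"):
    model.to(device).train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for step in range(steps):
        xc, yc, xq, yq = sample_task(X, y)
        xc = torch.as_tensor(xc, dtype=torch.float32, device=device)
        yc = torch.as_tensor(yc, dtype=torch.long, device=device)
        xq = torch.as_tensor(xq, dtype=torch.float32, device=device)
        yq = torch.as_tensor(yq, dtype=torch.long, device=device)
        logits = model(xc, yc, xq)  # assumed signature, see note above
        loss = F.cross_entropy(logits, yq)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 100 == 0:
            print(f"step {step}: loss {loss.item():.4f}")
    return model
```

With ~70,000 rows, a small learning rate and early stopping would presumably matter, since the pretrained weights encode the synthetic prior you'd want to preserve.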
Additionally, I am curious whether TabPFN can leverage textual data. I saw that AutoGluon tries to fuse textual and tabular data, but I'm not sure how effective that is. If TabPFN can use text, what text length works best in your experience? One fusion approach I was considering is sketched below.
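This is one common fusion pattern (not necessarily what AutoGluon does): embed the text column with a sentence encoder, compress the embeddings, and concatenate them with the numeric features. The model name and the 32-dimension choice are arbitrary assumptions:

```python
# Hedged sketch: text-tabular fusion via sentence embeddings + PCA.
import numpy as np
from sklearn.decomposition import PCA
from sentence_transformers import SentenceTransformer

def fuse_text_and_tabular(texts, X_num, n_text_dims=32):
    emb = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)  # (n, 384)
    # Compress so the total feature count stays small; in a real pipeline,
    # fit the PCA on training rows only to avoid leakage.
    emb = PCA(n_components=n_text_dims).fit_transform(emb)
    return np.hstack([np.asarray(X_num, dtype=np.float32), emb])

# Usage (hypothetical column names):
# X_fused = fuse_text_and_tabular(df["review_text"].tolist(), df[["price", "age"]])
# TabPFNClassifier().fit(X_fused[train_idx], y[train_idx])
```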
Best regards, Oleksandr
The local package supports text features in the sense that it doesn't break if the input contains some, but for now it treats them as categorical, which means performance will likely be noticeably lower than with the API if you have rich text features.
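Concretely, that means something like the following runs without error, but the string column only contributes as category codes, so the semantic content of the text is lost (the column names and data here are made up for illustration):

```python
# Minimal illustration of the current local behavior.
import pandas as pd
from tabpfn import TabPFNClassifier

X = pd.DataFrame({
    "price": [9.99, 4.50, 12.00, 3.25],
    "review": ["great product", "terrible", "great product", "ok"],
})
y = [1, 0, 1, 0]

clf = TabPFNClassifier()
clf.fit(X, y)  # doesn't break; "review" is treated as a category, not text
print(clf.predict(X))
```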