
Image only #4

Open
ZhangFfF1 opened this issue Jan 15, 2025 · 3 comments

Comments

@ZhangFfF1

Hi, thanks for sharing this great work!
One thing is unclear to me: I would like to know how you processed the image-only datasets to generate data for training. That part of the code doesn't seem to be in your repository.
I hope you'll get back to me.
Once again, thank you for your contribution!

@muzairkhattak
Member

muzairkhattak commented Jan 18, 2025

Hi @ZhangFfF1!

Thank you for showing interest in UniMed-CLIP!

Regarding your question: we convert the existing image-only datasets into a multi-modal format by generating captions with LLMs (our Label-to-Template Caption Generation technique), as described in Sec. 3.2 of our main paper.

In this repository, we provide the model training code and the dataset-assembly code, alongside preprocessed image-text annotations for the image-only datasets. You can curate the exact same UniMed dataset by following the instructions in UniMed-DATA.md.

To reconstruct the captions from scratch for class labels, you can prompt an LLM with the message given in Fig. 4 of the main paper.
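To give a rough idea of the label-to-caption step, here is a minimal, hypothetical sketch that expands class labels into caption text with fixed templates. The template strings below are placeholders I made up; the actual UniMed pipeline prompts an LLM with the message shown in Fig. 4 of the paper rather than using hard-coded templates.

```python
# Hypothetical sketch: expand a class label into candidate captions.
# The templates here are illustrative placeholders, NOT the ones used
# in the UniMed paper (which generates captions by prompting an LLM).
TEMPLATES = [
    "a medical image showing {label}",
    "an imaging scan with findings of {label}",
]

def label_to_captions(label: str) -> list[str]:
    """Return one caption per template for the given class label."""
    return [t.format(label=label) for t in TEMPLATES]

print(label_to_captions("pneumonia")[0])
# → a medical image showing pneumonia
```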

Please let me know if this is helpful. Thank you and kind regards!

@ZhangFfF1
Author

ZhangFfF1 commented Jan 23, 2025

@muzairkhattak Thank you for your reply!
In my case, I only have image data. If I use text descriptions generated by an LLM, can they be used as part of the training set?
Hoping for your reply!
Thanks again for your excellent work!

@muzairkhattak
Member

Hi @ZhangFfF1,

Regarding your question: yes, you can use your custom image data and add it to the UniMed training set for training.
To do so, you need to convert your image-text dataset into the webdataset format; refer to the webdataset documentation for additional info.
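As a sketch of what the conversion involves: a webdataset shard is just a tar archive where each sample's files share a key prefix (e.g. `sample_000.jpg` and `sample_000.txt` for an image-caption pair). The snippet below builds one such shard with Python's standard `tarfile` module; the file names and the placeholder image bytes are illustrative, and in practice you would use real encoded JPEGs (or the `webdataset` library's `ShardWriter` helper).

```python
import io
import tarfile

def write_webdataset_shard(samples, shard_path):
    """Write (key, image_bytes, caption) samples into a WebDataset-style
    tar shard: each sample becomes a <key>.jpg and <key>.txt entry."""
    with tarfile.open(shard_path, "w") as tar:
        for key, image_bytes, caption in samples:
            for suffix, payload in ((".jpg", image_bytes),
                                    (".txt", caption.encode("utf-8"))):
                info = tarfile.TarInfo(name=key + suffix)
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))

# Toy example with placeholder bytes standing in for an encoded JPEG.
write_webdataset_shard(
    [("sample_000", b"\xff\xd8fake-jpeg", "a chest x-ray with findings")],
    "shard-000000.tar",
)
```

With shards written this way (typically named `shard-000000.tar`, `shard-000001.tar`, …), the whole set can usually be referenced with webdataset brace notation, e.g. `/path/to/shards/shard-{000000..000009}.tar`.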

Once your dataset is in webdataset format, you can simply use the new data by adding its directory path to the train_data variable in the config file.

I hope this is helpful. Let us know if there is still any problem. Thank you!
