When I train AdaFace on different GPU servers with different datasets, the training stage runs at normal speed. But when the two tasks enter the validation stage at the same time, CPU utilization drops very low and the "validation dataloader" step takes a very long time. Specifically, with only one task running, the "validation dataloader" step takes about 10 minutes or less after a training epoch. With two tasks running, it takes several hours. What is the cause of this issue, and how can I solve it? Looking forward to your reply!
Hello, I also had the same issue, which I temporarily mitigated by creating a separate copy of the validation set for each simultaneous training run. I think it has something to do with the use of numpy memmap (maybe we should change the mode to "r" in the read_memmap util function), but I have not had a closer look yet. Have you?
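For reference, here is a minimal sketch of what that change might look like. This assumes the repo's `read_memmap` util opens the validation file with a writable mode such as `"r+"`; opening with `mode="r"` maps the file read-only, which lets multiple training jobs share the same validation file without write-back or locking contention. The function signature, dtype, and shape below are illustrative, not the repo's actual code:

```python
import os
import tempfile
import numpy as np

# Create a small stand-in file for the validation set.
path = os.path.join(tempfile.mkdtemp(), "val_data.dat")
np.arange(12, dtype=np.float32).tofile(path)

def read_memmap(mem_file_name, dtype=np.float32, shape=(3, 4)):
    """Hypothetical sketch of the read_memmap util.

    mode="r" maps the file strictly read-only, so concurrent
    readers (e.g. two validation loops) do not contend on
    dirty-page write-back the way mode="r+" can.
    """
    return np.memmap(mem_file_name, dtype=dtype, mode="r", shape=shape)

arr = read_memmap(path)
print(arr.shape)        # (3, 4)
print(float(arr[0, 1])) # 1.0
```

A read-only map also guards against one job accidentally mutating the shared validation data, so it seems like the safer default regardless of the performance question.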