Hi, I'm trying to train the EDM model with a simpler 35.7M-parameter UNet (the one proposed in the original DDPM paper) and compare the results with DDPM/DDIM.
I notice that $S_{churn} = 0$ gives deterministic sampling, and since the official sampler caps the per-step churn at $\gamma_i = \min(S_{churn}/N, \sqrt{2}-1)$, $\gamma_i = \sqrt{2}-1$ is the "max" stochastic setting. So I introduce a parameter $\eta = \frac{S_{churn}/N}{\sqrt{2}-1}$ to interpolate the stochasticity, i.e. $\gamma_i = (\sqrt{2}-1)\,\eta$. As in DDIM, $\eta = 0$ means deterministic and $\eta = 1$ means "max" stochastic.
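For concreteness, here is a minimal sketch of how I wire $\eta$ into the EDM Heun sampler (Algorithm 2 in the paper). `denoise(x, sigma)` is a placeholder for the preconditioned network, and I drop the $S_{min}$/$S_{max}$ gating and the $S_{noise}$ factor for brevity:

```python
import torch

def edm_sampler_eta(denoise, latents, num_steps=18,
                    sigma_min=0.002, sigma_max=80.0, rho=7.0, eta=0.0):
    # EDM noise-level schedule: rho-spaced between sigma_max and sigma_min,
    # with an appended final level t_N = 0.
    i = torch.arange(num_steps, dtype=torch.float64)
    t_steps = (sigma_max ** (1 / rho) + i / (num_steps - 1)
               * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
    t_steps = torch.cat([t_steps, torch.zeros(1, dtype=torch.float64)])

    gamma = (2 ** 0.5 - 1) * eta  # eta=0: deterministic, eta=1: "max" churn
    x = latents.to(torch.float64) * t_steps[0]
    for t_cur, t_next in zip(t_steps[:-1], t_steps[1:]):
        # Churn: temporarily raise the noise level from t_cur to t_hat.
        t_hat = t_cur * (1 + gamma)
        x = x + (t_hat ** 2 - t_cur ** 2).sqrt() * torch.randn_like(x)
        # Heun (2nd-order) step from t_hat down to t_next.
        d_cur = (x - denoise(x, t_hat)) / t_hat
        x_next = x + (t_next - t_hat) * d_cur
        if t_next > 0:  # plain Euler only at the final step (t_next = 0)
            d_next = (x_next - denoise(x_next, t_next)) / t_next
            x_next = x + (t_next - t_hat) * (d_cur + d_next) / 2
        x = x_next
    return x
```

With `eta=0` this reduces exactly to the deterministic Heun sampler; with `eta=1`, every step uses the maximum churn $\gamma_i = \sqrt{2}-1$.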
I sweep $\eta$ and the number of sampling steps and measure FID:
| $\eta$/steps | steps=18 | steps=50 | steps=100 |
|---|---|---|---|
| $\eta=0.0$ | 3.39 | 3.64 | 3.68 |
| $\eta=0.5$ | 3.10 | 2.95 | 2.93 |
| $\eta=1.0$ | 3.12 | 2.84 | 2.97 |
FID is supposed to decrease as the number of sampling steps increases, right? So why does it get worse for deterministic sampling? It behaves as expected at $\eta=0.5$, yet at $\eta=1.0$ it increases again from 50 to 100 steps. Why is the behavior so unstable and unpredictable?
To confirm it's not a bug in my code, I also trained a model with your official codebase under a simpler setting close to DDPM (`duration=100, augment=None, xflip=True; channel_mult=[1,2,2,2], num_blocks=2`; a sketch of the corresponding training command is given after the results). The results are:
| $\eta$/steps | steps=18 | steps=50 |
|---|---|---|
| $\eta=0.0$ | 2.94 | 3.09 |
| $\eta=0.5$ | 2.80 | 2.75 |
| $\eta=1.0$ | 2.95 | 2.78 |
For deterministic sampling, FID still gets worse with more steps, whereas for $\eta > 0$ it improves slightly as the step count increases.
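For reference, the training invocation was roughly the following (a sketch from memory against the official `train.py`; please double-check the flag names against the repo):

```bash
# Rough sketch; flag names as I recall them from NVlabs/edm train.py.
# num_blocks=2 is not exposed on the CLI as far as I can tell and may
# need a small edit to the SongUNet defaults in training/networks.py.
torchrun --standalone --nproc_per_node=8 train.py \
    --outdir=training-runs --data=datasets/cifar10-32x32.zip \
    --arch=ddpmpp --duration=100 --augment=0 --xflip=1 \
    --cres=1,2,2,2
```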
If the hyper-parameter settings and the resulting performance are not consistently predictable, how can one obtain a good model on a new dataset? Only by brute-force grid search?
Could you please provide some explanation and thoughts?
Thanks a lot!