
SVRG update at full gradient too late? #34

Open
KrisThielemans opened this issue May 2, 2024 · 3 comments

Comments

@KrisThielemans
Contributor

KrisThielemans commented May 2, 2024

@epapoutsellis and I saw differences between this SVRG implementation and a test implementation of my own. Checking the code:

if self.svrg_iter == 0 or update_flag:
    # update memory
    self.update_memory(x)
    if self.svrg_iter == 0:
        # initialise data passes by one due to full gradient computation from update memory
        self.data_passes[0] = 1.
    else:
        # increment by 1, since full gradient is computed again
        self.data_passes.append(round(self.data_passes[-1] + 1., 4))
    # allocate memory for the difference between the gradient of selected function at iterate
    # and the gradient at snapshot
    # this allocates at the beginning of every update_flag: is wrong
    # self.stochastic_grad_difference = x * 0.
else:
    # implements the (L)SVRG inner loop
    self.functions[function_num].gradient(x, out=self.stoch_grad_at_iterate)
    if self.store_gradients is True:
        self.stoch_grad_at_iterate.sapyb(1., self.list_stored_gradients[function_num], -1., out=self.stochastic_grad_difference)
    else:
        self.stoch_grad_at_iterate.sapyb(1., self.functions[function_num].gradient(self.snapshot), -1., out=self.stochastic_grad_difference)
    # only one randomly selected gradient is seen and appended to data_passes
    self.data_passes.append(round(self.data_passes[-1] + 1./self.num_functions, 4))
# full gradient is added to the stochastic grad difference
if out is None:
    return self.stochastic_grad_difference.sapyb(self.num_functions, self.full_gradient_at_snapshot, 1.)
else:
    self.stochastic_grad_difference.sapyb(self.num_functions, self.full_gradient_at_snapshot, 1., out=out)

I believe that at the "update full gradient" stage, the full gradient is not actually used to update the image. self.stochastic_grad_difference is not changed in this case, and since it is used in the return value, the returned estimate is actually full_gradient + previous_grad_diff * N, which is incorrect.

(I think the current code only works at the first iteration because self.stochastic_grad_difference is initialised to zero.)
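To make the point concrete: the SVRG estimate is N * (grad_i(x) - grad_i(snapshot)) + full_gradient(snapshot), so at a snapshot iteration (where the snapshot is taken at the current x) the difference term should be zero and the estimate should reduce to the full gradient. A small NumPy sketch (toy names, plain arrays standing in for the DataContainers in the code above) of what the current update_flag=True path returns versus what it should:

```python
import numpy as np

rng = np.random.default_rng(0)
num_functions = 5

# stand-ins for the per-subset gradients at the new snapshot point (taken at the current x)
subset_grads = [rng.standard_normal(3) for _ in range(num_functions)]
full_gradient_at_snapshot = np.sum(subset_grads, axis=0)

# stale difference left over from the previous inner-loop iteration
previous_grad_difference = rng.standard_normal(3)

# what the current update_flag=True path returns: stochastic_grad_difference is untouched,
# so the stale difference gets scaled by N and added to the full gradient
buggy_estimate = num_functions * previous_grad_difference + full_gradient_at_snapshot

# what an SVRG snapshot iteration should return: the full gradient itself
print(np.allclose(buggy_estimate, full_gradient_at_snapshot))  # False
```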

Instead, since at a snapshot iteration the iterate coincides with the new snapshot and the variance-reduction term vanishes, I think the update_flag=True case needs to have

        if out is not None:
            out.fill(self.full_gradient_at_snapshot)
            return out
        return self.full_gradient_at_snapshot
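As a sanity check, here is a self-contained toy version of the estimator with this change (NumPy stand-ins and made-up names, not the CIL code): at a snapshot iteration it returns exactly the full gradient.

```python
import numpy as np

def svrg_estimate(grad_funcs, x, snapshot, full_grad_at_snapshot, function_num, at_snapshot):
    """Toy SVRG gradient estimate with the proposed fix (hypothetical names)."""
    n = len(grad_funcs)
    if at_snapshot:
        # snapshot iteration: the estimate is simply the full gradient at the (new) snapshot
        return full_grad_at_snapshot.copy()
    # inner loop: n * (grad_i(x) - grad_i(snapshot)) + full gradient at snapshot
    diff = grad_funcs[function_num](x) - grad_funcs[function_num](snapshot)
    return n * diff + full_grad_at_snapshot

# quadratic subsets f_i(x) = 0.5 * ||A_i x - b_i||^2, so grad_i(x) = A_i^T (A_i x - b_i)
rng = np.random.default_rng(1)
As = [rng.standard_normal((4, 3)) for _ in range(5)]
bs = [rng.standard_normal(4) for _ in range(5)]
grad_funcs = [lambda x, A=A, b=b: A.T @ (A @ x - b) for A, b in zip(As, bs)]

x = rng.standard_normal(3)
snapshot = x.copy()                              # snapshot just taken at x
full_grad = sum(g(x) for g in grad_funcs)

est = svrg_estimate(grad_funcs, x, snapshot, full_grad, function_num=2, at_snapshot=True)
print(np.allclose(est, full_grad))               # True: snapshot step returns the exact full gradient
```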

Pinging @jakobsj @zeljkozeljko @MargaretDuff for confirmation.

@MargaretDuff

@KrisThielemans - do you want to try https://github.com/MargaretDuff/CIL-margaret/tree/svrg? This is the latest branch and the one we will (hopefully) be merging into CIL.

@KrisThielemans
Contributor Author

That one doesn't have the above logic problem as far as I can see. I have a few minor comments on that function. Where/how do you want me to comment on it?

@epapoutsellis
Owner
