🐛 Bug Report: Refreshing OAuth token... forever and stop working #150
Comments
Weird. The refresh code is literally just logging in again, which is what happens at launch (lines 102 to 108 in 6c2579c).
I suppose it could have something to do with the header cache not being cleared. Can you reproduce this consistently? If so, can you try running off of this branch? https://github.com/redlib-org/redlib/tree/try_fix_oauth_refresh
Not sure if this is the same issue as #81
On the latest commit (fix_oauth_ratelimit), the instance stops responding (no error log); the last log line is:
I think token retrieval needs a timeout. Restarting fixed it.
Could be an issue with the atomic/concurrent nature of the token retrieval. Would appreciate it if you could test after #158 to see if it makes any difference.
Wow! Quick fix. Deployed.
@sigaloid It hung again.
Maybe you should not use the async RwLock, and use the std RwLock instead.
I don't think it's a synchronization issue; it syncs on mine when I simulate high load. I don't know what exactly the problem is. You're saying it's hanging precisely after the line "TRACE redlib::oauth > Rolling over refresh token. Current rate limit: 99"? Nothing else? If so, it sounds like the automatic refresh cannot retrieve new tokens for some reason. Which makes no sense: it generates one every time you start it, and a refresh should be no different from restarting the program.
Here's an example of my log output:
Aside from the occasional response taking longer to retrieve, and the very low rate-limit values right before a refresh sometimes overwriting the rate limit, my refreshing works just fine.
Yes, it's occasional. And it hangs exactly there.
I just wanted to say that maybe we should have better logs to diagnose whether it is caused by the async RwLock or a token-retrieval timeout.
I just added more logging and it's stuck here (line 74 in 3bd8b51).
And restarting fixed it. |
I'll try with a timeout.

```rust
trace!("Sending OAuth token request...");
let work = client.request(request);
let resp = match tokio::time::timeout(Duration::from_secs(10), work).await {
    Ok(result) => result.ok()?,
    Err(_) => {
        error!("Timeout: no response in 10 seconds.");
        return None;
    }
};
trace!("Received response from OAuth token request...");
```
It looks like some kind of deadlock.
Caught two more:
Even on success, there are two refreshes happening at the same time.
That shouldn't be a problem, though. Independent threads can of course reach Reddit simultaneously; I don't know why it would block in this case. There's nothing wrong with overwriting the token with a new one twice in succession. Pushing a PR that should avoid the simultaneous-refresh case, but I'm unsure if this will fix the problem. It rolls over on my machine under heavy load, but then again, so does the current code.
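(For reference, a common way to avoid two tasks refreshing at once is a double-checked lock: re-verify expiry after acquiring the write lock, so only the first waiter does the refresh. A minimal sketch, with a hypothetical `Oauth` struct and `refresh_token` helper standing in for redlib's real login code:)

```rust
use std::time::{Duration, Instant};
use tokio::sync::RwLock;

// Hypothetical token state; field names are illustrative, not redlib's.
struct Oauth {
    token: String,
    expires_at: Instant,
}

// Hypothetical stand-in for the real login round trip.
async fn refresh_token() -> Oauth {
    Oauth {
        token: "fresh-token".into(),
        expires_at: Instant::now() + Duration::from_secs(86400),
    }
}

async fn get_token(lock: &RwLock<Oauth>) -> String {
    {
        // Fast path: a read lock suffices while the token is still valid.
        let oauth = lock.read().await;
        if oauth.expires_at > Instant::now() {
            return oauth.token.clone();
        }
    } // read guard dropped here, before we request the write lock

    let mut oauth = lock.write().await;
    // Double-check: another task may have refreshed while we waited
    // for the write lock, so don't refresh a second time.
    if oauth.expires_at <= Instant::now() {
        *oauth = refresh_token().await;
    }
    oauth.token.clone()
}

#[tokio::main]
async fn main() {
    let lock = RwLock::new(refresh_token().await);
    println!("{}", get_token(&lock).await);
}
```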
@freedit-dev @ButteredCats I know you both experienced this issue, please let me know if #160 helps.
Then it goes back to normal.
@sigaloid So far so good. But I want to point out something from the tokio docs: https://docs.rs/tokio/latest/tokio/sync/struct.RwLock.html#method.read
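(The note at that link concerns tokio's write-preferring fairness: once a writer is queued, new read locks are not granted, so a task that already holds a read guard and tries to take a second one can deadlock. A minimal repro of that failure mode, not redlib's actual code; run it and it hangs by design:)

```rust
use std::{sync::Arc, time::Duration};
use tokio::sync::RwLock;

#[tokio::main]
async fn main() {
    let lock = Arc::new(RwLock::new(0u32));

    let reader_lock = lock.clone();
    let reader = tokio::spawn(async move {
        let first = reader_lock.read().await; // take a read lock...
        tokio::time::sleep(Duration::from_millis(100)).await; // ...and hold it across an await

        // A writer is queued by now. Tokio's RwLock is write-preferring,
        // so this second read waits behind the writer -- which in turn
        // waits for `first` to drop. Neither can proceed: deadlock.
        let second = reader_lock.read().await;
        println!("{} {}", *first, *second); // never reached
    });

    tokio::time::sleep(Duration::from_millis(10)).await;
    let mut writer = lock.write().await; // queued behind `first`, hangs forever
    *writer += 1;

    reader.await.unwrap();
}
```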
Hm. There's something I could possibly do to fix that: create a new Oauth struct and only call write to do an atomic swap, so the write lock is held for far less time (just the time to do the swap, not the entire refresh call). Is that the kind of thing you mean? Otherwise I'm not sure of a better solution besides a resource pool, and that's a lot of overhead IMO vs. just fixing the in-place solution until it's performant.
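(A sketch of that narrow-write idea, again with a hypothetical `Oauth` struct and refresh helper: do the slow network refresh with no lock held, then take the write lock only for the assignment itself:)

```rust
use std::time::{Duration, Instant};
use tokio::sync::RwLock;

// Hypothetical token state, as in the earlier sketch.
struct Oauth {
    token: String,
    expires_at: Instant,
}

// Hypothetical stand-in for the full login round trip.
async fn build_fresh_oauth() -> Oauth {
    Oauth {
        token: "fresh-token".into(),
        expires_at: Instant::now() + Duration::from_secs(86400),
    }
}

async fn refresh(lock: &RwLock<Oauth>) {
    // Slow part: no lock held, so readers keep using the old token.
    let fresh = build_fresh_oauth().await;

    // Fast part: the write lock is held only for the swap itself,
    // not for the whole network round trip.
    *lock.write().await = fresh;
}

#[tokio::main]
async fn main() {
    let lock = RwLock::new(build_fresh_oauth().await);
    refresh(&lock).await;
    println!("token: {}", lock.read().await.token);
}
```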
#160 does not work.
It doesn't roll over the tokens? The logging order doesn't really matter; that's not necessarily indicative of an issue. But you're saying it hangs again?
@sigaloid Right, it hangs again even on the latest commit. Maybe we should look at these again:
Even the timeout doesn't work.
I updated and mine hung again within a minute; it didn't seem to last any longer, unlike freedit's. When I can, I'll enable the extra debugging and see if I have any logs to add, but that won't be until tonight, and I doubt there's anything different from what freedit has sent.
Restarting always fixes it, so my instance auto-restarts every 20 minutes.
Yeah, the issue is that the rollover doesn't work once the rate limit is hit, even though new token generation works at startup. @freedit-dev, you mentioned arc-swap. I should have time tonight to push out a commit that switches to that. I assumed an RwLock around the OAuth client would be the best solution, but honestly I'm not sure it is anymore. I don't know why it's not working, but I seriously doubt it's because of performance; CPU-based atomics are plenty to handle this workload of a few dozen requests a minute. On that note, what is your CPU architecture?
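(For reference, a minimal sketch of what the arc-swap approach looks like; names are hypothetical, not the actual diff. Readers do a cheap `load()`, and the refresher publishes new state with a single atomic pointer swap, so there is no lock left to deadlock on:)

```rust
use std::sync::Arc;
use std::time::{Duration, Instant};
use arc_swap::ArcSwap; // from the arc-swap crate

// Hypothetical token state; redlib's real struct will differ.
struct Oauth {
    token: String,
    expires_at: Instant,
}

// Hypothetical stand-in for the real login round trip.
async fn build_fresh_oauth() -> Oauth {
    Oauth {
        token: "fresh-token".into(),
        expires_at: Instant::now() + Duration::from_secs(86400),
    }
}

#[tokio::main]
async fn main() {
    let oauth = Arc::new(ArcSwap::from_pointee(build_fresh_oauth().await));

    // A reader grabs a cheap snapshot; no lock, nothing held across awaits.
    let snapshot = oauth.load();
    println!("using token {}", snapshot.token);

    // The refresher builds the new state off to the side, then publishes
    // it with one atomic pointer swap. Readers and the refresher can
    // never block each other.
    oauth.store(Arc::new(build_fresh_oauth().await));
    println!("now using token {}", oauth.load().token);
}
```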
Interesting. Check out #162, let me know here if it helps.
Seems to have fixed it for me!
Splendid to hear!!
So far so good. Thanks!
Amazing. Glad I was finally able to fix it on the third go-around; huge thanks to @freedit-dev for recommending ArcSwap. I bet it was exactly the deadlock you mentioned. 🤔 I did limit the scope to as small as possible, but ArcSwap is the better long-term solution. Thanks again @ButteredCats and @freedit-dev for helping me debug this, as it really only showed up under heavy load! Closing :)
It has happened several times that my instance stops working, and the last log line is always:
INFO redlib::oauth > [⌛] 86279s Elapsed! Refreshing OAuth token...
Restarting redlib always fixes it. Maybe there should be a timeout for refreshing the token.
Thanks for your great work.