
Federation issues on v4.3.0-alpha.1+glitch #2609

Closed
Crashdoom opened this issue Feb 6, 2024 · 9 comments

@Crashdoom

Steps to reproduce the problem

  1. Attempt to follow a remote user that does not have follow approvals required
  2. UI will initially show "Unfollow"
  3. Refreshing the page will show "Cancel Follow"

Expected behaviour

User should be followed

Actual behaviour

Follow appears pending forever

Detailed description

I'm honestly unsure how to troubleshoot this, as mastodon-web is the only service that actually spits out anything about the follow attempt, logging the follow request to my inbox with a 202 Accepted.

I've checked Sidekiq and there appear to be a lot of LinkCrawlWorker errors with Aws::S3::Errors::InternalError: We encountered an internal error. Please try again. (We're using Cloudflare R2.) I'm unsure whether that's related or a separate issue.

Happy to provide / search for additional debug info as needed!

(As an aside, we're on v4.3.0 alpha as I wasn't sure how to update to a given version with Glitch-SOC, so any info on that for future reference would also be really appreciated!)

Mastodon instance

furry.engineer, pawb.fun

Mastodon version

v4.3.0-alpha.1+glitch

Technical details

If this is happening on your own Mastodon server, please fill these out:

  • Ruby version: ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [x86_64-linux]
  • Node.js version: v20.11.0

Sidekiq Setup:

  • 2x Ingress
  • 2x Push / Pull
  • 1x Default, Mailer, Scheduler
Crashdoom added the bug label Feb 6, 2024
@Crashdoom (author)

Additional troubleshooting seems to imply follows are going through, at least between my accounts on furry.engineer and meow.social.

When following from furry.engineer to meow.social:

  • meow.social immediately reflects my furry.engineer account as a follower and increases the follower count
  • furry.engineer appears to reflect the follow and increases the follower count, but on refresh shows "Cancel Follow" implying it didn't go through

@ClearlyClaire

This means outgoing communication from furry.engineer to meow.social works fine, but for whatever reason, the Accept activity from meow.social to furry.engineer does not get processed appropriately.
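To make the distinction concrete, here is a minimal toy model of the follower's side of the ActivityPub Follow/Accept handshake. It is a sketch under stated assumptions, not Mastodon's code: the class, method names, and id scheme are hypothetical. Sending a Follow only records a pending request; the request becomes a real follow once a matching Accept from the target account is processed. If that Accept is never processed, the follow stays pending, which is roughly the "Cancel Follow" state described in the report.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class LocalInstance:
    """Toy model (hypothetical) of the follower's side of the Follow/Accept handshake."""

    pending: dict = field(default_factory=dict)  # Follow activity id -> (follower, target)
    follows: set = field(default_factory=set)    # confirmed (follower, target) pairs
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def send_follow(self, follower: str, target: str) -> dict:
        # Outgoing Follow: delivered to the remote inbox, recorded locally as pending.
        activity = {
            "id": f"https://local.example/follows/{next(self._ids)}",  # hypothetical id scheme
            "type": "Follow",
            "actor": follower,
            "object": target,
        }
        self.pending[activity["id"]] = (follower, target)
        return activity

    def receive(self, activity: dict) -> None:
        # Incoming Accept: its object is our Follow, either embedded or referenced by id.
        if activity.get("type") != "Accept":
            return
        obj = activity.get("object")
        follow_id = obj.get("id") if isinstance(obj, dict) else obj
        pair = self.pending.get(follow_id)
        # Only the account that was asked to be followed may accept the request.
        if pair is not None and activity.get("actor") == pair[1]:
            del self.pending[follow_id]
            self.follows.add(pair)

    def state(self, follower: str, target: str) -> str:
        if (follower, target) in self.follows:
            return "following"
        if (follower, target) in self.pending.values():
            return "requested"  # still waiting for an Accept to be processed
        return "none"


alice = "https://local.example/users/alice"
bob = "https://remote.example/users/bob"
inst = LocalInstance()
follow = inst.send_follow(alice, bob)
print(inst.state(alice, bob))  # requested
inst.receive({"type": "Accept", "actor": bob, "object": follow})
print(inst.state(alice, bob))  # following
```

In this model, the remote side (meow.social in the report) can already list the follower as soon as it receives the Follow, while the local side stays at "requested" until it handles the Accept.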

LinkCrawlWorker errors are in themselves not an issue, but I guess they may point at a common underlying issue, although that seems unlikely. Are there other error logs in sidekiq? What's the state of the queues, especially the ingress queue?

What did you update from?

> (As an aside, we're on v4.3.0 alpha as I wasn't sure how to update to a given version with Glitch-SOC, so any info on that for future reference would also be really appreciated!)

There are no specific glitch-soc versions, it's a rolling release, as I do not have the ability to maintain multiple branches.

@Crashdoom (author) commented Feb 7, 2024

> LinkCrawlWorker errors are in themselves not an issue, but I guess they may point at a common underlying issue, although that seems unlikely. Are there other error logs in sidekiq? What's the state of the queues, especially the ingress queue?

All of the queues are completely empty, but dead jobs have piled up with those LinkCrawlWorker errors, maxing out our dead jobs queue on both instances. I haven't seen any errors in the journalctl log for any of the workers. Apart from LinkCrawlWorker and, now, RedownloadMediaWorker, the dead / retry jobs show the usual issues: rate limits, instances being down / offline, etc.

> What did you update from?

We were previously on v4.2.0+glitch and updated directly. I checked the Mastodon change logs to see what I needed to do for each update to make sure there wasn't anything that stood out, and I'm beginning to wonder if I missed something...

> There are no specific glitch-soc versions, it's a rolling release, as I do not have the ability to maintain multiple branches.

Aah, yep, I figured something like that since I can imagine it would be a complete mess trying to organize that!

@ClearlyClaire

Nothing immediately comes to mind. Does that occur with one account in particular, or can you reproduce this with multiple accounts? Does it occur with other kinds of activities?

@Crashdoom (author)

We're able to reproduce it with our admin accounts, and several users also reported it. Doesn't seem to be limited to meow.social either, trying to follow Gargron on mastodon.social did the same thing just now.

I've verified that incoming and outgoing posts work fine, incoming follows also seem to work fine, as do outgoing follow request approvals.

@ClearlyClaire

Hm, I'm not sure what could be happening there. Can you check whether the person-to-be-followed appears in your follows list? It could be that the relationship cache is not correctly updated but the follow has been correctly processed.

@Crashdoom (author)

Yep, they appear in my follows list on my end and in their followers list on their end (e.g. https://techhub.social/@Raccoon/followers).


@ClearlyClaire

Can you make sure you have properly restarted all Mastodon processes? That is, are you sure you're not running old sidekiq processes on some queue?

A possible explanation for what you're seeing is that we have changed how we cache relationship data so that cache invalidation is much more efficient. But if you are running two different versions of the code at once, one version will use one cache key, while another will use a different cache key.
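A toy illustration of that failure mode follows. It is a hypothetical sketch, not Mastodon's actual cache layout: the two key formats are invented, and a plain dict stands in for the shared cache. The web process (new code) caches a "requested" relationship under the new key; a leftover worker (old code) stores the follow correctly but invalidates only the key it knows about, so the web process keeps serving the stale entry.

```python
def key_old(follower: str, target: str) -> str:
    # Hypothetical cache key format used by the OLD code version.
    return f"relationship:{follower}:{target}"


def key_new(follower: str, target: str) -> str:
    # Hypothetical cache key format used by the NEW code version.
    return f"relationships:v2:{follower}:{target}"


class Instance:
    def __init__(self):
        self.follows = set()  # persisted follows (stand-in for the database)
        self.cache = {}       # shared cache (stand-in for Redis)

    def relationship(self, follower, target, key_fn):
        # Read-through cache. A pending follow request is assumed to exist,
        # so anything not yet in `follows` is reported as "requested".
        key = key_fn(follower, target)
        if key not in self.cache:
            self.cache[key] = "following" if (follower, target) in self.follows else "requested"
        return self.cache[key]

    def process_accept(self, follower, target, key_fn):
        # Persist the follow, then invalidate the cache entry this code version knows about.
        self.follows.add((follower, target))
        self.cache.pop(key_fn(follower, target), None)


def simulate(worker_key_fn):
    """Web runs NEW code; the worker processing the Accept uses worker_key_fn."""
    inst = Instance()
    a, b = "alice", "bob"
    inst.relationship(a, b, key_new)          # web caches "requested" under the new key
    inst.process_accept(a, b, worker_key_fn)  # worker stores the follow and invalidates
    return (a, b) in inst.follows, inst.relationship(a, b, key_new)


print(simulate(key_old))  # (True, 'requested'): follow stored, UI still shows it as pending
print(simulate(key_new))  # (True, 'following'): every process runs the same version
```

In this model, bringing all processes onto the same version fixes new follows, but entries that were already cached stale are not invalidated by the restart itself.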

@Crashdoom (author)

@ClearlyClaire I tried restarting and still had the same issue, so I completely stopped all services and cleanly brought them back up, and that appears to have worked!

It doesn't back-fix the follows that were already broken, but new outgoing follow attempts appear to be working again, and that's all that matters!

Thank you for helping troubleshoot -- Closing!


2 participants