Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Channel is already publishing-- revert late acknowledgment #3968

Closed
3 tasks done
sentry-io bot opened this issue Mar 3, 2023 · 3 comments · Fixed by #3984
Closed
3 tasks done

Channel is already publishing-- revert late acknowledgment #3968

sentry-io bot opened this issue Mar 3, 2023 · 3 comments · Fixed by #3984
Assignees
Labels
DEV: backend P0 - critical Priority: Release blocker or regression

Comments

@sentry-io
Copy link

sentry-io bot commented Mar 3, 2023

Summary

If async tasks aren't acknowledged within a certain amount of time, celery will redeliver them to a worker, which means more than one worker could be attempting to process channel changes (like publishing) at a time, resulting in the error below. Currently, tasks are configured to acknowledge 'late' meaning they're acknowledged once completed. This is a helpful feature if workers crash, go offline, or are interrupted by a release, since it means those tasks will be redelivered. But if a task takes a long time, then those could cause inadvertent problems from the concurrency.

Tasks

Preview Give feedback

Sentry Issue: STUDIO-FC5

ValidationError: [ErrorDetail(string='Channel is already publishing', code='invalid')]
  File "contentcuration/viewsets/channel.py", line 467, in publish_from_changes
    self.publish(
  File "contentcuration/viewsets/channel.py", line 484, in publish
    raise ValidationError("Channel is already publishing")
@bjester bjester added DEV: backend P0 - critical Priority: Release blocker or regression labels Mar 3, 2023
@bjester bjester added this to the Studio: next major release milestone Mar 3, 2023
@vkWeb
Copy link
Member

vkWeb commented Mar 6, 2023

@bjester sir, I would like to work on this. I'll need some background understanding of this. Let us connect on Slack about this sir.

@vkWeb
Copy link
Member

vkWeb commented Mar 6, 2023

My specific question is:
celery will redeliver them to a worker

When the task is under processing, celery should know that the task is under processing since the worker has not acknowledged its completion/failure...? so why does it redeliver to worker for duplicate processing?

@bjester
Copy link
Member

bjester commented Mar 6, 2023

Celery supports many different backends, both for queuing and result storage, so its underlying behavior is somewhat more generic and follows a 'message queue' architecture. In simple terms, the mechanics of this 'message queue' fall into a few categories: a message, a producer, a queue, and a consumer. This structure decouples message consumption from the the queue handling and the producer logic. Since a task's status is a part of the consumer's role, that can't be relied upon in the mechanics of the message queue (and out of the box, celery supports saving task results, not pending tasks-- the existing of pending tasks in the database is something we added on ourselves). Message acknowledgement is the primary mechanism for interacting between the different parts, which removes the message from the queue. This allows celery to support services like RabbitMQ, whose docs have some additional info.

When the task is under processing, celery should know that the task is under processing since the worker has not acknowledged its completion/failure...?

Technically, we configured celery this way, @vkWeb! It's currently configured to acknowledge messages after completion or failure. That allows us to take advantage of features (without extra development from ourselves) to ensure tasks are processed in the event of unexpected situations like the worker crashes, gets killed for some reason, or support delivery to another worker if another worker becomes available when all others are busy. That in itself is implementation specific to what the tasks actually do as well. The downside is that tasks need to be performant in order to prevent re-delivery if they were to take too long.

So because we're making acknowledgement early, that means we'll need to do that 'extra development' in order to get that protection from worker issues. Making acknowledgement early is as simple as flipping a boolean: https://github.com/learningequality/studio/blob/unstable/contentcuration/contentcuration/utils/celery/tasks.py#L108

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DEV: backend P0 - critical Priority: Release blocker or regression
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants