Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

better sync #55

Closed
jerch opened this issue Jul 18, 2020 · 5 comments
Closed

better sync #55

jerch opened this issue Jul 18, 2020 · 5 comments
Labels
backlog enhancement New feature or request

Comments

@jerch
Copy link
Collaborator

jerch commented Jul 18, 2020

Investigate on options for better sync management.

Problem:
We only have transactions on CF updates currently, which is a rather weak sync guard and only works reliable in single access scenarios (very unlikely for any Django project). For concurrent access we have basically nothing, the sync state is undetermined and CF values are open for race conditions.

Ideas:
Django has not much to offer out of the box beside select_for_update regarding this. It still might be enough to get some sync management done with the help of additional lock tables. Further things to consider are single writer vs. multiple writer, optional read locks.

@jerch jerch added the enhancement New feature or request label Jul 18, 2020
@jerch
Copy link
Collaborator Author

jerch commented Jul 19, 2020

Marking this as bug, as it is currently unreliable behavior for normal usage scenarios.

@jerch jerch added bug Something isn't working and removed enhancement New feature or request labels Jul 19, 2020
@jerch
Copy link
Collaborator Author

jerch commented Jul 19, 2020

Rough idea to tackle this:

  • introduce a @computed flag locked_update (default false)
  • eval locked_update for a CF in _queryset_for_update --> pull dep source rows with select_for_update
  • filter on locked objects/qs
  • move _queryset_for_update call under transaction in update_dependent
  • pull target qs with select_for_update, also including neighboring CFs (how to document that?)
  • descend normal (still under the transaction guard)
  • release transactions during ascent

Issues:
How to avoid deadlocks if someone places a cycle on record level? This would be an error anyway, but cannot be spotted in advance yet. Also later updates of another CF on the same record would not work anymore (currently supported through straight forward TR in the graph) - needs some workaround/transaction juggling?

@mobiware
Copy link
Contributor

Not sure this is a bug, after all the whole Django ORM paradigm kind of operates under the assumption of single threaded access: you get a model object instance from a Query Set, modify it by changing attributes on the instance, and then persist it using save(). This is inherently not meant for concurrent access

So I would rather label that an improvement.

Regarding the deadlock issue, if locking is performed on a field by field basis rather than model instance basis, wouldn't that ensure that if there is no dependency cycle, then there's no possible deadlock ?

Also, regarding the locking mechanism ideally we should support distributed lock solutions like Redis-based distributed locks (not all concurrent accesses might originate from the same process or even machine)

@jerch
Copy link
Collaborator Author

jerch commented Jul 21, 2020

Not sure this is a bug, after all the whole Django ORM paradigm kind of operates under the assumption of single threaded access: you get a model object instance from a Query Set, modify it by changing attributes on the instance, and then persist it using save(). This is inherently not meant for concurrent access

Yes, thats a culprit of all ORMs more or less. We cannot fix that here, as it is an inherent issue with object mappers itself, but I think we should at least offer some of the measures to secure the access. I marked it as bug, since ppl might get into multi connection scenarios pretty quick by using a multi-process HTTP server, where they will see those hard to track down race conditions.

Regarding the deadlock issue, if locking is performed on a field by field basis rather than model instance basis, wouldn't that ensure that if there is no dependency cycle, then there's no possible deadlock ?

Yes field locking would solve that with proper TR in place, but imho django does not expose any field locking mechanism, select_for_update is the only thing exposed and locks on table record level per transaction. Field locking could be implemented with select_for_update and the help of an additional lock table as an access guard (as simple as that - any certain CF write access joins on the lock table entry with select_for_update, thus blocking any other write attempt). But thats not enough - we would have to lock all fields in depends, as they must not be altered by other transactions during the CF update.

Also, regarding the locking mechanism ideally we should support distributed lock solutions like Redis-based distributed locks (not all concurrent accesses might originate from the same process or even machine)

Wouldn't that already be handled by the DBMS, if it supports ACID conform transactions?

@jerch jerch added enhancement New feature or request and removed bug Something isn't working labels Jul 21, 2020
@jerch
Copy link
Collaborator Author

jerch commented Jul 21, 2020

@mobiware Changed the flag back to enhancement, bug is way too exaggerating.

@jerch jerch added the backlog label Apr 12, 2022
@jerch jerch closed this as completed Apr 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlog enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants