better sync #55

jerch · 2020-07-18T08:40:21Z

Investigate on options for better sync management.

Problem:
We only have transactions on CF updates currently, which is a rather weak sync guard and only works reliable in single access scenarios (very unlikely for any Django project). For concurrent access we have basically nothing, the sync state is undetermined and CF values are open for race conditions.

Ideas:
Django has not much to offer out of the box beside select_for_update regarding this. It still might be enough to get some sync management done with the help of additional lock tables. Further things to consider are single writer vs. multiple writer, optional read locks.

The text was updated successfully, but these errors were encountered:

jerch · 2020-07-19T12:44:27Z

Marking this as bug, as it is currently unreliable behavior for normal usage scenarios.

jerch · 2020-07-19T20:05:28Z

Rough idea to tackle this:

introduce a @computed flag locked_update (default false)
eval locked_update for a CF in _queryset_for_update --> pull dep source rows with select_for_update
filter on locked objects/qs
move _queryset_for_update call under transaction in update_dependent
pull target qs with select_for_update, also including neighboring CFs (how to document that?)
descend normal (still under the transaction guard)
release transactions during ascent

Issues:
How to avoid deadlocks if someone places a cycle on record level? This would be an error anyway, but cannot be spotted in advance yet. Also later updates of another CF on the same record would not work anymore (currently supported through straight forward TR in the graph) - needs some workaround/transaction juggling?

mobiware · 2020-07-20T21:36:27Z

Not sure this is a bug, after all the whole Django ORM paradigm kind of operates under the assumption of single threaded access: you get a model object instance from a Query Set, modify it by changing attributes on the instance, and then persist it using save(). This is inherently not meant for concurrent access

So I would rather label that an improvement.

Regarding the deadlock issue, if locking is performed on a field by field basis rather than model instance basis, wouldn't that ensure that if there is no dependency cycle, then there's no possible deadlock ?

Also, regarding the locking mechanism ideally we should support distributed lock solutions like Redis-based distributed locks (not all concurrent accesses might originate from the same process or even machine)

jerch · 2020-07-21T09:16:09Z

Not sure this is a bug, after all the whole Django ORM paradigm kind of operates under the assumption of single threaded access: you get a model object instance from a Query Set, modify it by changing attributes on the instance, and then persist it using save(). This is inherently not meant for concurrent access

Yes, thats a culprit of all ORMs more or less. We cannot fix that here, as it is an inherent issue with object mappers itself, but I think we should at least offer some of the measures to secure the access. I marked it as bug, since ppl might get into multi connection scenarios pretty quick by using a multi-process HTTP server, where they will see those hard to track down race conditions.

Regarding the deadlock issue, if locking is performed on a field by field basis rather than model instance basis, wouldn't that ensure that if there is no dependency cycle, then there's no possible deadlock ?

Yes field locking would solve that with proper TR in place, but imho django does not expose any field locking mechanism, select_for_update is the only thing exposed and locks on table record level per transaction. Field locking could be implemented with select_for_update and the help of an additional lock table as an access guard (as simple as that - any certain CF write access joins on the lock table entry with select_for_update, thus blocking any other write attempt). But thats not enough - we would have to lock all fields in depends, as they must not be altered by other transactions during the CF update.

Also, regarding the locking mechanism ideally we should support distributed lock solutions like Redis-based distributed locks (not all concurrent accesses might originate from the same process or even machine)

Wouldn't that already be handled by the DBMS, if it supports ACID conform transactions?

jerch · 2020-07-21T10:24:52Z

@mobiware Changed the flag back to enhancement, bug is way too exaggerating.

jerch added the enhancement New feature or request label Jul 18, 2020

This was referenced Jul 18, 2020

Add support for Django 2.1 #49

Closed

notifications from resolver CF updates #56

Open

jerch added bug Something isn't working and removed enhancement New feature or request labels Jul 19, 2020

jerch added enhancement New feature or request and removed bug Something isn't working labels Jul 21, 2020

jerch added the backlog label Apr 12, 2022

jerch closed this as completed Apr 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

better sync #55

better sync #55

jerch commented Jul 18, 2020

jerch commented Jul 19, 2020 •

edited

Loading

jerch commented Jul 19, 2020

mobiware commented Jul 20, 2020

jerch commented Jul 21, 2020

jerch commented Jul 21, 2020 •

edited

Loading

better sync #55

better sync #55

Comments

jerch commented Jul 18, 2020

jerch commented Jul 19, 2020 • edited Loading

jerch commented Jul 19, 2020

mobiware commented Jul 20, 2020

jerch commented Jul 21, 2020

jerch commented Jul 21, 2020 • edited Loading

jerch commented Jul 19, 2020 •

edited

Loading

jerch commented Jul 21, 2020 •

edited

Loading