-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GAIAPLAT-1608] Fix gaia_ptr_t initialization in gaia_ptr_t::auto_connect_to_parent() #1043
Conversation
Yikes, another reason to make locators/offsets/IDs strongly typed! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this sort of bug is likely to recur, what do you think of removing all public constructors for gaia_ptr_t
and replacing them with explicit factory methods like from_locator()
?
@@ -562,7 +562,7 @@ void gaia_ptr_t::auto_connect_to_parent( | |||
// record, we need to disconnect them. | |||
if (!has_connected) | |||
{ | |||
gaia_ptr_t child_ptr(child_id); | |||
gaia_ptr_t child_ptr = gaia_ptr_t::open(child_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weren't these lines equivalent? I would have expected them to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, they aren't:
gaia_ptr_t::gaia_ptr_t(locator)
gaia_ptr_t::open(gaia_id)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is a difference, we should try to eliminate it, otherwise this is a weakness of the API and will lead to further such issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. @senderista had a suggestion. I would crate a follow-up jira for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should disable constructor initialization and have explicit open methods only: open_locator()
and open_gaia_id()
. Then at least the intention is clear and one could check more easily whether the expected parameter is passed in.
Feel free to do this in a followup PR if it's too much of a change for now, but do add a JIRA iterm if you plan to address it this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -121,10 +119,10 @@ void persistent_store_manager::prepare_wal_for_write(gaia::db::txn_log_t* log, c | |||
// The key_count variable represents the number of puts + deletes. | |||
size_t key_count = 0; | |||
// Obtain RocksDB transaction object. | |||
auto txn = m_rdb_internal->get_txn_by_name(txn_name); | |||
rocksdb::Transaction* txn = m_rdb_internal->get_txn_by_name(txn_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing these. I don't like auto when I can't tell what's the underlying type.
I agree. |
@@ -562,7 +562,7 @@ void gaia_ptr_t::auto_connect_to_parent( | |||
// record, we need to disconnect them. | |||
if (!has_connected) | |||
{ | |||
gaia_ptr_t child_ptr(child_id); | |||
gaia_ptr_t child_ptr = gaia_ptr_t::open(child_id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really strange. Do we use to have a constructor that takes gaia_id_t
similar to what open
does here? I am wondering otherwise how this can work in the first place. We do have a unit test (auto_connect_test.child_update_disconnect
) for this scenario.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use to have a constructor that takes gaia_id_t similar to what open does here?
Not sure about it. Created a follow up task here: https://gaiaplatform.atlassian.net/browse/GAIAPLAT-1635
Agree, I do not understand how this worked in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Seems like a regression from: d764f7c
I am still not sure why the unit test can pass though... 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we use to have a constructor that takes gaia_id_t similar to what open does here?
Maybe, but we can only have a single constructor if the underlying type is the same integer type. This is why I was proposing using struct-based types instead of typedef-based types. See #801
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that because both gaia_id_t
and gaia_locator_t
are allocated sequentially in the current implementation, in some simple tests the 2 sequences advanced in lockstep and locator/gaia_id happened to coincide for the objects being tested.
There are 2 takeaways here, I think, one short-term and one long-term: in the short term we need to audit for and remove all public constructors that take types that are typedef-equivalent (like gaia_id_t
/gaia_locator_t
), and in the long term we need to make these types strongly compile-typed (like enum class
). The latter goal has been discussed for a while but seems to have some difficult edge cases (at least one involving std::atomic
IIRC). Here are 2 examples of this approach:
https://github.com/foonathan/type_safe
https://github.com/rollbear/strong_type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I attached the debugger. The locator and id in the unit test are indeed the same. Both were 84 in my debugging session.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's tempting to randomize gaia_ids to flush out these bugs, but we can't really do that right now because we allow up to 2^32 locators, and if we generate 2^32 64-bit gaia_ids then we have ~50% probability of at least one collision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left one suggestion for further improvement, to do now or later.
gaia_ptr_t::auto_connect_to_parent()
was initializinggaia_ptr_t
by passing thegaia_id
to the constructor. The constructor though expects alocator
. This lead to sporadic failures.persistent_store_manager.cpp
where the type cannot be easily inferred by the rhs.direct_access
example.event_manager.cpp