Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GAIAPLAT-1608] Fix gaia_ptr_t initialization in gaia_ptr_t::auto_connect_to_parent() #1043

Merged
merged 3 commits into from
Oct 27, 2021

Conversation

simone-gaia
Copy link
Contributor

gaia_ptr_t::auto_connect_to_parent() was initializing gaia_ptr_t by passing the gaia_id to the constructor. The constructor though expects a locator. This lead to sporadic failures.

  • Remove auto in persistent_store_manager.cpp where the type cannot be easily inferred by the rhs.
  • Minor improvements to the direct_access example.
  • Fix warning in event_manager.cpp

@simone-gaia simone-gaia changed the title [GAIAPLAT-1608] Fix wrong gaia_ptr_t handling in gaia_ptr_t::auto_connect_to_parent() [GAIAPLAT-1608] Fix gaia_ptr_t initialization in gaia_ptr_t::auto_connect_to_parent() Oct 27, 2021
@senderista
Copy link
Contributor

Yikes, another reason to make locators/offsets/IDs strongly typed!

Copy link
Contributor

@senderista senderista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this sort of bug is likely to recur, what do you think of removing all public constructors for gaia_ptr_t and replacing them with explicit factory methods like from_locator()?

production/examples/direct_access/hospital.cpp Outdated Show resolved Hide resolved
@@ -562,7 +562,7 @@ void gaia_ptr_t::auto_connect_to_parent(
// record, we need to disconnect them.
if (!has_connected)
{
gaia_ptr_t child_ptr(child_id);
gaia_ptr_t child_ptr = gaia_ptr_t::open(child_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weren't these lines equivalent? I would have expected them to be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they aren't:

gaia_ptr_t::gaia_ptr_t(locator)
gaia_ptr_t::open(gaia_id)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a difference, we should try to eliminate it, otherwise this is a weakness of the API and will lead to further such issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. @senderista had a suggestion. I would crate a follow-up jira for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should disable constructor initialization and have explicit open methods only: open_locator() and open_gaia_id(). Then at least the intention is clear and one could check more easily whether the expected parameter is passed in.

Feel free to do this in a followup PR if it's too much of a change for now, but do add a JIRA iterm if you plan to address it this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -121,10 +119,10 @@ void persistent_store_manager::prepare_wal_for_write(gaia::db::txn_log_t* log, c
// The key_count variable represents the number of puts + deletes.
size_t key_count = 0;
// Obtain RocksDB transaction object.
auto txn = m_rdb_internal->get_txn_by_name(txn_name);
rocksdb::Transaction* txn = m_rdb_internal->get_txn_by_name(txn_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing these. I don't like auto when I can't tell what's the underlying type.

@simone-gaia
Copy link
Contributor Author

Given that this sort of bug is likely to recur, what do you think of removing all public constructors for gaia_ptr_t and replacing them with explicit factory methods like from_locator()?

I agree.

@@ -562,7 +562,7 @@ void gaia_ptr_t::auto_connect_to_parent(
// record, we need to disconnect them.
if (!has_connected)
{
gaia_ptr_t child_ptr(child_id);
gaia_ptr_t child_ptr = gaia_ptr_t::open(child_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really strange. Do we use to have a constructor that takes gaia_id_t similar to what open does here? I am wondering otherwise how this can work in the first place. We do have a unit test (auto_connect_test.child_update_disconnect) for this scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use to have a constructor that takes gaia_id_t similar to what open does here?

Not sure about it. Created a follow up task here: https://gaiaplatform.atlassian.net/browse/GAIAPLAT-1635

Agree, I do not understand how this worked in the first place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Seems like a regression from: d764f7c
I am still not sure why the unit test can pass though... 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use to have a constructor that takes gaia_id_t similar to what open does here?

Maybe, but we can only have a single constructor if the underlying type is the same integer type. This is why I was proposing using struct-based types instead of typedef-based types. See #801

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that because both gaia_id_t and gaia_locator_t are allocated sequentially in the current implementation, in some simple tests the 2 sequences advanced in lockstep and locator/gaia_id happened to coincide for the objects being tested.

There are 2 takeaways here, I think, one short-term and one long-term: in the short term we need to audit for and remove all public constructors that take types that are typedef-equivalent (like gaia_id_t/gaia_locator_t), and in the long term we need to make these types strongly compile-typed (like enum class). The latter goal has been discussed for a while but seems to have some difficult edge cases (at least one involving std::atomic IIRC). Here are 2 examples of this approach:

https://github.com/foonathan/type_safe
https://github.com/rollbear/strong_type

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I attached the debugger. The locator and id in the unit test are indeed the same. Both were 84 in my debugging session.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's tempting to randomize gaia_ids to flush out these bugs, but we can't really do that right now because we allow up to 2^32 locators, and if we generate 2^32 64-bit gaia_ids then we have ~50% probability of at least one collision.

Copy link
Contributor

@LaurentiuCristofor LaurentiuCristofor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left one suggestion for further improvement, to do now or later.

@simone-gaia simone-gaia merged commit 1cb580c into master Oct 27, 2021
@simone-gaia simone-gaia deleted the rondelli-fix-autoconnect-to-parent branch October 27, 2021 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants