Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++ BMI Module Integration #343

Merged
merged 21 commits into from
Mar 9, 2022
Merged

Conversation

mattw-nws
Copy link
Contributor

@mattw-nws mattw-nws commented Nov 22, 2021

Support for C++ BMI modules loaded from shared libraries.

Additions

  • C++ module loading code (Adapter, Formulation, Formulation_Constructor...)
  • Test harness C++ BMI model (which mirrors C test model implementation)
  • Test code (largely duplicated from C BMI test code)
  • Documentation updates

Removals

Changes

Testing

  • Added test runner tasks to the workflows for C++ models on Ubuntu and macOS--these tests pass.

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist (automated report can be put here)

Target Environment support

  • Linux

@mattw-nws mattw-nws force-pushed the integrate-freezethaw branch from e3e68cd to 1d5f076 Compare November 26, 2021 19:09
@mattw-nws mattw-nws marked this pull request as ready for review November 29, 2021 15:17
* `bmi_c`
* `bmi_fortran`
* `bmi_python`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notably, the docs don't cover this yet... though it's supported... should it be removed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those (bmi_fortran, bmi_python, bmi_multi) should have been added previously, so adding them now is appropriate. A section on details of Python module usage is probably also still needed, but can be added separately.

bmi_model = std::shared_ptr<Cpp_Bmi>(
dynamic_creator(),
[](Cpp_Bmi* p) {
// DO NOTHING.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking for feedback on the below. It may make the most sense to simply go back to classic pointers... this is what I evolved to. ???

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing me a bit, just walking through the life cycle of the bmi_model in this adapter.

  1. Adapter is created
    a) BMI dll handle is created
    b) BMI Model pointer created, associated with ONE adapter (and ONE dll handle)
  2. Adapter proxy's calls to BMI pointer
  3. Adapter is no longer needed, gets deleteed.
    a) The way this code is right now, the only custom cleanup happens in AbstractCLibBmiAdapter::~AbstractCLibBmiAdapter(), since this class' destructor is empty (it gets executed, then the parent class destructor will get executed)
    b) Next finalizeForLibAbstraction() is called. Since there is no override for this virutal function in this subclass, the parent implmentation is called.
    c) FInally, this triggers the dlclose
    c++ void finalizeForLibAbstraction() { // close the dynamically loaded library if (dyn_lib_handle != nullptr) { dlclose(dyn_lib_handle); } }
    As I look through this implementation, I don't see how Finalize() is ever actually called. In the BMI C lib apdater, Finalize() on the BMI model is triggered by an external call to Finalize() (which we don't use) or implicitlyby the Adapter destructure (which overrides and points to the same function that the adapter Finalize calls...).

So alltogether, I don't see how any finalization and cleanup is going to happen with this implementation, and I'm concerned that if you were seeing issues with ref-counting this shared pointer and that causing the order of deletion to be affected (the Adapter and its base were destroyed before the shared pointer) then I also wonder if this pointer really needs to be shared (in the OWNERSHIP perspective) or if it should truely be a unique_ptr and we simply share REFERENCES to the pointer as needed.

This may require some additional disscussion.

Copy link
Contributor Author

@mattw-nws mattw-nws Feb 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I may be replying too quick here, but... Yes, that's correct... my assumption was that we treat every adapter as if it were a plain BMI C++ model... that means that there must be a step # 2.(n-1) where Finalize() is called on the adapter (which should be proxied to the underlying model object in the adapter's Finalize() method, but was missed above, per my comment there). Are we not calling Finalize() on the adapters before deleting them in step 3? If not... I might start to argue that we should be. Otherwise we're implementing the BMI interface according to the .hpp file but are not using the interface according to its semantics (I'm assuming those semantics require or imply that Finalize() should be called before delete).

Copy link
Contributor

@robertbartel robertbartel Feb 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few things:

  • finalizeForLibAbstraction() is deliberately not virtual, so the logic needed for Finalize() could be called from the destructor more safely (and then reused more easily by Finalize() itself)
  • See earlier comment again about how this type's destructor probably needs to be calling Finalize() or something equivalent
  • I agree with the earlier comment, and the one above, that the module's own Finalize() doesn't seem to be getting called as part of this object's Finalize(), but should be
  • Yes, we are implementing the BMI interface, but we are not limiting ourselves to exclusively using the semantics of the interface
    • It makes sense to have an adapter implement the interface, as this aligns it intuitively for its role as a proxy
    • It should not be necessary to restrict a BMI implementation to only work (internally) via the BMI interface functions
    • It doesn't make sense to restrict the framework to only working with an adapter via BMI, because we have the code (e.g., why require explicitly calling Finalize() when it makes sense to have this happen during the destructor?)
    • However, adapters must be written to work in a consistent manner whether we follow the more strict BMI workflow (e.g., explicitly call Finalize() somewhere) or not (e.g., expect the destructor to also handling Finalize())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It doesn't make sense to restrict the framework to only working with an adapter via BMI, because we have the code (e.g., why require explicitly calling Finalize() when it makes sense to have this happen during the destructor?)

Well, I'd argue It does make sense to restrict ourselves, at least (or especially) if we ever expected to treat adapters and non-adapter C++ BMI models interchangeably...for which we might not have the code. It's not an explicit part of the interface, but is probably implicit in the name--that if you don't call it explicitly then memory may not be freed properly... at the least, we couldn't assume that the non-adapter model would call Finalize on itself in its destructor. It becomes a LSP thing at that point.

A related question...are we considering it safe to call Finalize() more than once? I would generally think we'd assume that was not safe, but I think what's being discussed makes that possible or even likely. Again, the BMI docs are not helpful here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hellkite500 On another part of your first comment above... The bmi_model property is defined in the parent class Bmi_Adapter<T> as shared_ptr<T>; I'm not sure if that was for a reason of needing to hand out the underlying model for some reason--to me it probably makes sense for it to be unique_ptr instead. But, that would require refactoring all the adapters to some degree.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattw-nws, FWIW, we are actually getting the underlying C struct and Python model in certain tests. It may still be more appropriate to change to unique_ptr and do something differently in the tests, but that'd be another aspect to any changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring the bmi_model property seems out of scope. Other topics surrounding Finalize() are resolved in d69094a. Resolve this conversation, @hellkite500 and @robertbartel ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean this commit? d73eee3

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extern/test_bmi_cpp/include/bmi.hpp Outdated Show resolved Hide resolved
extern/test_bmi_cpp/src/test_bmi_cpp.cpp Outdated Show resolved Hide resolved
extern/test_bmi_cpp/src/test_bmi_cpp.cpp Outdated Show resolved Hide resolved
extern/test_bmi_cpp/src/test_bmi_cpp.cpp Outdated Show resolved Hide resolved
extern/test_bmi_cpp/src/test_bmi_cpp.cpp Outdated Show resolved Hide resolved
test/realizations/catchments/Bmi_Cpp_Adapter_Test.cpp Outdated Show resolved Hide resolved
test/realizations/catchments/Bmi_Cpp_Adapter_Test.cpp Outdated Show resolved Hide resolved
adapter->Finalize();
}

/** Test that the update function works for a single update. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it really work if it doesn't get the right values back? Why have Update_0_a and Update_0_b tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dunno! Ask him. (@robertbartel) Maybe now you can guess what //Everything below this line is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually...I'm inferring that this is based on an approach to keeping unit tests to testing a single thing... the first test tries to catch exceptions or other unexpected failures...if that passes, then we can worry about whether it actually succeeded (assuming they run in order).

I'm more wondering why we are naming our human-readable test names with zero-based indexing...Stockholm Syndrome?

Copy link
Contributor

@donaldwj donaldwj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to me like everything is in order. We may want to formally discuss the register/create issue and how what start point different languages should expect from a DLL.

utils::StreamHandler output, bool do_initialization)
: AbstractCLibBmiAdapter<Cpp_Bmi>(type_name, library_file_path, std::move(bmi_init_config), std::move(forcing_file_path), allow_exceed_end,
has_fixed_time_step, creator_func, output)
//TODO: We are passing creator_func as registration_func because AbstractCLibBmiAdapter expects it to exist, but are not using it the same way...may be okay but we may want to generalize that assumption out!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general Finalize() should only clean up memory(). Setup for a new run should probably be Finalize() followed by Init()

Copy link
Contributor

@hellkite500 hellkite500 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to see the inline comments about the life cycle of a Cpp_BMI pointer/object updated to keep track of exactly what is going on with that complex component, but I won't hold this up if we want to address that in a follow up PR.

dynamic_creator = (ModelCreator) symbol;
bmi_model = std::shared_ptr<Cpp_Bmi>(
dynamic_creator(),
[](Cpp_Bmi* p) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The class/destruction heiarchy looks like bmi::BMI -> BMI_Adapter -> AbstrarctCLibBmiAdapter -> Bmi_Cpp_adapter

Possibilities to consider:

  1. shared_ptr ref count goes to 0
    1. only happens if BMI_Adapter drops its reference??? But no guarantee of this...
  2. BMI_adapter is deleted
    1. Delete Bmi_cpp_adapter
    2. delete AbstarctCLibBmiAdapter
      1. calls dl_close
    3. delete BMI_Adapter
      1. delete shared_ptr, ref_count == 0, delete on pointer
      2. error, dl_close happened already, cannot call destroy function...

We should protect the resource behind the shared pointer...I think it works now, but for the wrong reasons...

Scenario 1 won't likely happen right now....but the cas when that becomes problematic is in the future when something decides to use get_bmi_model() and gets a shared_ptr (ref count up...) and they HOLD that pointer until after the adapter is deleted (for whatever reason...) then you have a very problematic state...

A future PR to refactor these BMI pointer resources to unique_ptr should help with ensuring that ownership and cleanup of these abstract resources is the sole responsibility of the adapter. What we have here should work for the moment...

The comment in these delete should probably also better reflect this overral life cycle (will help explain why the seg fault occurs if destrory() is called there....)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants