-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Add ability to determine whether an inducing path exists between two nodes #78
Conversation
Signed-off-by: Aryan Roy <[email protected]>
The tetrad implementation for reference. |
@adam2392 one more question:
|
You can query the networkx subgraph. Bidirected edges are stored using a |
Signed-off-by: Aryan Roy <[email protected]>
Codecov Report
@@ Coverage Diff @@
## main #78 +/- ##
==========================================
+ Coverage 80.02% 80.11% +0.09%
==========================================
Files 39 39
Lines 2838 2902 +64
Branches 759 775 +16
==========================================
+ Hits 2271 2325 +54
- Misses 408 412 +4
- Partials 159 165 +6
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
@adam2392 so it turns out the "parents" method doesn't work on the bidirected sub-graph. I am using the neighbors method.(since in a bidirected graph a neighbor is also a parent.) |
can you add test cases as well? You can just construct various graphs on paper that follow the definition or not to test. |
Signed-off-by: Aryan Roy <[email protected]>
@adam2392 is this an appropriate amount of tests? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I would say enough tests to convince the reviewer your implementation is correct ;)
I would split this into two tests.
One where L and S are possibly defined. And another where they are the empty set.
You can then have an inducing path if
X-> Y and X<->Y
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
@adam2392 The implementation seems to be correct and all the tests are passing as well. Want to review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great progress! I added some high-level comments to the tests and also the function API.
Re docstrings: please use numpydoc style
I will review more in-depth once those tests pass because rn upon looking at the tests, I am not 100% convinced it will work as expected yet.
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
@adam2392 I think I added all the tests you asked for. They worked without changing anything in the code. Can you take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments. This is a good attempt, but I think this is not matching the Java implementation. I also point out a counter-example and I'm not 100% confident that the code logic follows the logic of the inducing path definition. I think making the abstractions simpler and using proper names may make the whole thing a bit easier to implement in a modular fashion.
Here is a breakdown of pseudo-code for the inducing path algorithm in my head:
Consider starting node X and ending node Y. Add all adjacent neighbors of node X to the stack.
If we encounter a node that we have not seen yet, then we ignore it if:
- it is not on a path to Y not through X (possibly a naive thing to do would just be check the converted undirected graph for a path): If it's not even connected to Y, except through X, then this is not a path, let alone an inducing path.
Now the new node is forsure at least on a path (directed, or bidirected, or both) to Y. The new node is a candidate for adding to the path if:
- if node is in L
- if node is a collider, check that it is an ancestor of X, Y, or some member of S.
If any of the above two conditions, add the node to the path and also add the node to the stack, else, it is not a candidate for the path, and we move on. Continue until Y is found and return the path, or until stack is empty. Note: I don't think using a stack is necessarily the right move, but this is just for illustration purposes. We can either return all possible inducing paths, or just return 1 when found.
Looking at the Java implementation, this is at a high-level pretty similar. https://github.com/cmu-phil/tetrad/blob/2fee3e19bfd79964d872a87845da52714880b8a1/tetrad-lib/src/main/java/edu/cmu/tetrad/graph/Paths.java#L722
Comparing the tetrad and Python implementation, there are a few key differences already that seem to make all this harder:
- collider check is not defined in your implementation correctly
- checking ancestors can be done on the directed subgraph using: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.dag.ancestors.html
Probably more, but hopefully you get the idea.
@aryan26roy thanks for iterating on this! Feel free to come to the Discord OH, or other time periods and ping the channel so we know we should discuss this if you have any questions. Inducing paths are somewhat abstract at first tbh, so once the definition is clear, I think this will come out cleanly. |
Co-authored-by: Adam Li <[email protected]> Signed-off-by: Aryan Roy <[email protected]>
Co-authored-by: Adam Li <[email protected]> Signed-off-by: Aryan Roy <[email protected]>
Co-authored-by: Adam Li <[email protected]> Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking better! Thanks for your hard work! Do you mind cleaning up the PR a bit? As of now some of the variable names, function names, etc. are a bit incongruent.
It makes it harder to read and debug on the fly (for me and future developers and future you).
Yeah, patching the bugs has wreaked havoc on my naming system. Will make everything better in my next pass. |
Co-authored-by: Adam Li <[email protected]> Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
@adam2392 The docs are not being built correctly, what am I doing wrong? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CI error comes cuz there isn't a references section in the docstring of inducing path.
Signed-off-by: Aryan Roy <[email protected]>
@adam2392 do you want me to fix the networkx CI issue in another PR? I can bump up the CI from [3.8, 3.10] to [3.9,3.10]. |
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
@adam2392 I have removed the example file. Do you want to merge now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM once CI passes
Thanks @aryan26roy ! |
Closes #70
Changes proposed in this pull request:
Before submitting
section of the
CONTRIBUTING
docs.Writing docstrings section of the
CONTRIBUTING
docs.After submitting