Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Add ability to determine whether an inducing path exists between two nodes #78

Merged
merged 36 commits into from
Jun 21, 2023

Conversation

aryan26roy
Copy link
Collaborator

@aryan26roy aryan26roy commented May 22, 2023

Closes #70

Changes proposed in this pull request:

  • Adds a function to determine whether an inducing path exists between two nodes

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

@aryan26roy
Copy link
Collaborator Author

The tetrad implementation for reference.

@aryan26roy
Copy link
Collaborator Author

@adam2392 one more question:

  • Neither the children method nor the parents method finds nodes connected through a bi-directed edge. Is there a function that finds child and parent nodes connected through bi-directed edges as well?

@adam2392
Copy link
Collaborator

@adam2392 one more question:

  • Neither the children method nor the parents method finds nodes connected through a bi-directed edge. Is there a function that finds child and parent nodes connected through bi-directed edges as well?

You can query the networkx subgraph. Bidirected edges are stored using a nx.Graph.

@codecov-commenter
Copy link

codecov-commenter commented May 24, 2023

Codecov Report

Merging #78 (b23a6bd) into main (2d6f1d3) will increase coverage by 0.09%.
The diff coverage is 84.61%.

@@            Coverage Diff             @@
##             main      #78      +/-   ##
==========================================
+ Coverage   80.02%   80.11%   +0.09%     
==========================================
  Files          39       39              
  Lines        2838     2902      +64     
  Branches      759      775      +16     
==========================================
+ Hits         2271     2325      +54     
- Misses        408      412       +4     
- Partials      159      165       +6     
Impacted Files Coverage Δ
pywhy_graphs/algorithms/generic.py 85.11% <84.61%> (-0.46%) ⬇️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@aryan26roy
Copy link
Collaborator Author

@adam2392 so it turns out the "parents" method doesn't work on the bidirected sub-graph. I am using the neighbors method.(since in a bidirected graph a neighbor is also a parent.)

@adam2392
Copy link
Collaborator

can you add test cases as well?

You can just construct various graphs on paper that follow the definition or not to test.

@aryan26roy
Copy link
Collaborator Author

@adam2392 is this an appropriate amount of tests?

Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I would say enough tests to convince the reviewer your implementation is correct ;)

I would split this into two tests.

One where L and S are possibly defined. And another where they are the empty set.

You can then have an inducing path if

X-> Y and X<->Y

aryan26roy and others added 5 commits May 26, 2023 13:34
@aryan26roy
Copy link
Collaborator Author

@adam2392 The implementation seems to be correct and all the tests are passing as well. Want to review?

Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great progress! I added some high-level comments to the tests and also the function API.

Re docstrings: please use numpydoc style

I will review more in-depth once those tests pass because rn upon looking at the tests, I am not 100% convinced it will work as expected yet.

pywhy_graphs/algorithms/generic.py Outdated Show resolved Hide resolved
pywhy_graphs/algorithms/generic.py Outdated Show resolved Hide resolved
pywhy_graphs/algorithms/tests/test_generic.py Show resolved Hide resolved
pywhy_graphs/algorithms/tests/test_generic.py Show resolved Hide resolved
@aryan26roy
Copy link
Collaborator Author

@adam2392 I think I added all the tests you asked for. They worked without changing anything in the code. Can you take a look?

Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. This is a good attempt, but I think this is not matching the Java implementation. I also point out a counter-example and I'm not 100% confident that the code logic follows the logic of the inducing path definition. I think making the abstractions simpler and using proper names may make the whole thing a bit easier to implement in a modular fashion.

Here is a breakdown of pseudo-code for the inducing path algorithm in my head:

Consider starting node X and ending node Y. Add all adjacent neighbors of node X to the stack.

If we encounter a node that we have not seen yet, then we ignore it if:

  1. it is not on a path to Y not through X (possibly a naive thing to do would just be check the converted undirected graph for a path): If it's not even connected to Y, except through X, then this is not a path, let alone an inducing path.

Now the new node is forsure at least on a path (directed, or bidirected, or both) to Y. The new node is a candidate for adding to the path if:

  1. if node is in L
  2. if node is a collider, check that it is an ancestor of X, Y, or some member of S.

If any of the above two conditions, add the node to the path and also add the node to the stack, else, it is not a candidate for the path, and we move on. Continue until Y is found and return the path, or until stack is empty. Note: I don't think using a stack is necessarily the right move, but this is just for illustration purposes. We can either return all possible inducing paths, or just return 1 when found.

Looking at the Java implementation, this is at a high-level pretty similar. https://github.com/cmu-phil/tetrad/blob/2fee3e19bfd79964d872a87845da52714880b8a1/tetrad-lib/src/main/java/edu/cmu/tetrad/graph/Paths.java#L722

Comparing the tetrad and Python implementation, there are a few key differences already that seem to make all this harder:

  1. collider check is not defined in your implementation correctly
  2. checking ancestors can be done on the directed subgraph using: https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.dag.ancestors.html

Probably more, but hopefully you get the idea.

pywhy_graphs/algorithms/generic.py Outdated Show resolved Hide resolved
pywhy_graphs/algorithms/generic.py Outdated Show resolved Hide resolved
pywhy_graphs/algorithms/generic.py Outdated Show resolved Hide resolved
pywhy_graphs/algorithms/generic.py Outdated Show resolved Hide resolved
pywhy_graphs/algorithms/generic.py Show resolved Hide resolved
pywhy_graphs/algorithms/generic.py Outdated Show resolved Hide resolved
pywhy_graphs/algorithms/generic.py Outdated Show resolved Hide resolved
pywhy_graphs/algorithms/generic.py Outdated Show resolved Hide resolved
pywhy_graphs/algorithms/generic.py Outdated Show resolved Hide resolved
pywhy_graphs/algorithms/generic.py Outdated Show resolved Hide resolved
@adam2392
Copy link
Collaborator

@aryan26roy thanks for iterating on this! Feel free to come to the Discord OH, or other time periods and ping the channel so we know we should discuss this if you have any questions.

Inducing paths are somewhat abstract at first tbh, so once the definition is clear, I think this will come out cleanly.

aryan26roy and others added 6 commits June 1, 2023 21:17
Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking better! Thanks for your hard work! Do you mind cleaning up the PR a bit? As of now some of the variable names, function names, etc. are a bit incongruent.

It makes it harder to read and debug on the fly (for me and future developers and future you).

pywhy_graphs/algorithms/generic.py Show resolved Hide resolved
@adam2392 adam2392 requested review from robertness and removed request for robertness June 12, 2023 15:02
@aryan26roy
Copy link
Collaborator Author

Yeah, patching the bugs has wreaked havoc on my naming system. Will make everything better in my next pass.

aryan26roy and others added 2 commits June 13, 2023 19:47
Co-authored-by: Adam Li <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
Signed-off-by: Aryan Roy <[email protected]>
@aryan26roy
Copy link
Collaborator Author

@adam2392 The docs are not being built correctly, what am I doing wrong?

Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI error comes cuz there isn't a references section in the docstring of inducing path.

pywhy_graphs/algorithms/generic.py Show resolved Hide resolved
@aryan26roy
Copy link
Collaborator Author

@adam2392 do you want me to fix the networkx CI issue in another PR? I can bump up the CI from [3.8, 3.10] to [3.9,3.10].

@aryan26roy
Copy link
Collaborator Author

@adam2392 I have removed the example file. Do you want to merge now?

Copy link
Collaborator

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once CI passes

@adam2392 adam2392 merged commit b71f7ee into py-why:main Jun 21, 2023
@adam2392
Copy link
Collaborator

Thanks @aryan26roy !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A function to determine whether an inducing path exists between two nodes
3 participants