Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Export and import graph classes from other common interfaces #60

Merged
merged 10 commits into from
Mar 7, 2023

Conversation

adam2392
Copy link
Collaborator

@adam2392 adam2392 commented Feb 27, 2023

Towards #57

Changes proposed in this pull request:

  • numpy export
  • pcalg export
  • tetrad export
  • causal learn export now tested
  • I removed "integration tests", since pywhy-graphs is more of an upstream package and we want to integrate support for exporting/importing other graph packages

Now need to add manual tests using what is present in: https://github.com/py-why/causal-learn/blob/main/tests/fci-test-data/result-py-FCI-8.txt for tetrad.

Before submitting

  • I've read and followed all steps in the Making a pull request
    section of the CONTRIBUTING docs.
  • I've updated or added any relevant docstrings following the syntax described in the
    Writing docstrings section of the CONTRIBUTING docs.
  • If this PR fixes a bug, I've added a test that will fail without my fix.
  • If this PR adds a new feature, I've added tests that sufficiently cover my new functionality.

After submitting

  • All GitHub Actions jobs for my pull request have passed.

Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
@adam2392 adam2392 marked this pull request as draft February 27, 2023 18:47
Signed-off-by: Adam Li <[email protected]>
@jaron-lee
Copy link
Collaborator

At some point it would be great if you could pick one of causallearn.py, tests/test_causallearn.py or causal_learn.py, tests/test_causal_learn.py, it's ruining my Vim projectionist heuristics when I switch back and forth 😂

@jaron-lee
Copy link
Collaborator

So I notice that tigramite has a hard dependency on exactly one patch version of scipy (1.8.0) - so happens that ananke requires ^1.8.1. In this particular case I could go and decrement the patch version in ananke, but is there another way to set this up in poetry? The version conflict here is not that bad given that we don't actually need both of these packages at the same time per se.

@jaron-lee
Copy link
Collaborator

jaron-lee commented Mar 5, 2023

Ran into another issue - when I try and run unit tests python is picking up networkx.py as the networkx package (instead of whatever comes from pip). Can I suggest that we have a different naming convention (one that does not involve the actual package name?)

EDIT: I guess I can just work from another folder, never mind!

@adam2392
Copy link
Collaborator Author

adam2392 commented Mar 5, 2023

What's the issue? The export submodule or the networkx submodule for mixed edge graphs?

@adam2392
Copy link
Collaborator Author

adam2392 commented Mar 5, 2023

So I notice that tigramite has a hard dependency on exactly one patch version of scipy (1.8.0) - so happens that ananke requires ^1.8.1. In this particular case I could go and decrement the patch version in ananke, but is there another way to set this up in poetry? The version conflict here is not that bad given that we don't actually need both of these packages at the same time per se.

Ah yeah that's an issue. If it's easy to do it for ananke, then that would be great!

I'll also submit an issue to tigramite to make the dependency not hard. I know the maintainer.

@jaron-lee
Copy link
Collaborator

What's the issue? The export submodule or the networkx submodule for mixed edge graphs?

I think the export submodule - basically running tests from that folder would get python confused about which networkx to import. Might just be a weird issue on my test runner's end tho.

@adam2392
Copy link
Collaborator Author

adam2392 commented Mar 5, 2023

Ah I see. We can def rename things if it's an issue.

You can just run pytest from the root of the package folder. E.g.

Pytest ./pywhy_graphs/...

And it should work.

adam2392 added 2 commits March 5, 2023 13:08
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
@adam2392 adam2392 marked this pull request as ready for review March 5, 2023 18:09
@adam2392 adam2392 requested review from jaron-lee and robertness March 5, 2023 18:10
@adam2392
Copy link
Collaborator Author

adam2392 commented Mar 5, 2023

This is somewhat of a big PR, and happy to break it up, but just wanted to get a high level review of the code from you @jaron-lee and @robertness to make sure the API looks good to you?

@adam2392 adam2392 added this to the Release v0.1 milestone Mar 5, 2023
adam2392 added 2 commits March 5, 2023 13:45
Signed-off-by: Adam Li <[email protected]>
@codecov-commenter
Copy link

codecov-commenter commented Mar 5, 2023

Codecov Report

Merging #60 (0b8d23b) into main (c387f8a) will increase coverage by 0.22%.
The diff coverage is 66.38%.

@@            Coverage Diff             @@
##             main      #60      +/-   ##
==========================================
+ Coverage   84.03%   84.26%   +0.22%     
==========================================
  Files          29       33       +4     
  Lines        2255     2510     +255     
  Branches      578      671      +93     
==========================================
+ Hits         1895     2115     +220     
+ Misses        254      250       -4     
- Partials      106      145      +39     
Impacted Files Coverage Δ
pywhy_graphs/export/tigramite.py 0.00% <0.00%> (ø)
pywhy_graphs/testing.py 50.00% <50.00%> (ø)
pywhy_graphs/export/pcalg.py 55.04% <55.04%> (ø)
pywhy_graphs/export/tetrad.py 68.75% <68.75%> (ø)
pywhy_graphs/export/numpy.py 72.85% <72.85%> (ø)
pywhy_graphs/networkx/classes/mixededge.py 91.34% <75.00%> (+1.34%) ⬆️
pywhy_graphs/export/causallearn.py 79.22% <80.95%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Signed-off-by: Adam Li <[email protected]>
@jaron-lee
Copy link
Collaborator

Overall I think the API looks good, but have a few comments, mostly from the perspective of the Ananke API development.

In the <package>_to_graph function we have this argument graph_type. Should this be a mandatory argument? Sometimes (e.g. in Ananke) the graph already carries a type which has a corresponding equivalent in pywhy_graphs, so graph_type can be easily inferred.

Additionally the graph_to_<package> function is typed for MixedEdgeGraph input. Do we expect users to actually pass in MixedEdgeGraph, or will we expect them to pass a child class like ADMG? Or should we expect both? This also has implications for exporting to Ananke.

@adam2392
Copy link
Collaborator Author

adam2392 commented Mar 7, 2023

In the <package>_to_graph function we have this argument graph_type. Should this be a mandatory argument? Sometimes (e.g. in Ananke) the graph already carries a type which has a corresponding equivalent in pywhy_graphs, so graph_type can be easily inferred.

The graph_type parameter is just used to make sure the user explicitly spells out which graph to convert to because sometimes it is not clear. For example, pcalg encodes CPDAG and PAG as a numpy array, but there 1s and 0s mean different things, so we need to have different logic. If ananke can let us know easily which graph type to use, then I would say the graph_type parameter is not needed.

Note: not all graph types are supported in each export function. E.g. pcalg. We can change that tho, but I avoided the extra work.

Additionally the graph_to_<package> function is typed for MixedEdgeGraph input. Do we expect users to actually pass in MixedEdgeGraph, or will we expect them to pass a child class like ADMG? Or should we expect both? This also has implications for exporting to Ananke.

graph_to_<package> can usually eat any graph because we have semantic separation of all types of graphs. The ADMG subclasses a MixedEdgeGraph but with very specific structure.

Lmk how that sounds. If there's anything you want me to change for downstream ananke PR, lmk!

tetrad format.
"""
# instantiate the type of causal graph
if graph_type == "dag":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have seen different ways of dealing with DAGs - should we use nx.DiGraph or send it to pywhy_graphs? ADMG here seems like a hack. Maybe an in-house pywhy_graphs DAG class is needed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True. Perhaps nx.DiGraph with some added semantics to ensure construction is acyclic... But it'd be too expensive to check acyclicity every time someone adds a node.

On one hand it seems overkill to recreate a DAG class when nx.DiGraph works. On the other hand, the mental switching is a bit weird.

Tbh not sure exactly which way to go.

import networkx as nx


def assert_mixed_edge_graphs_isomorphic(G1, G2):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good, I presume all roundtrip tests should use this testing function ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep!

FYI I realized there's a lot of LOC scattered in the codebase doing this.



def test_to_tetrad(tmp_path):
G = tetrad_to_graph(filename, graph_type="pag")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if a user has graph_type="dag" instead here? Will this function detect that there are o-> edges and prevent that? Clearly the imported file does not represent a DAG so we shouldn't let the user make this mistake. Or we should have clear tests to show what happens when you import a graph under a different graph assumption than the one it was created with

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh good point... I guess I can add a check for valid "edge types" present in each graph. I should probably add this to tetrad, pcalg and causallearn.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to leave this for another PR.

@jaron-lee
Copy link
Collaborator

In the <package>_to_graph function we have this argument graph_type. Should this be a mandatory argument? Sometimes (e.g. in Ananke) the graph already carries a type which has a corresponding equivalent in pywhy_graphs, so graph_type can be easily inferred.

The graph_type parameter is just used to make sure the user explicitly spells out which graph to convert to because sometimes it is not clear. For example, pcalg encodes CPDAG and PAG as a numpy array, but there 1s and 0s mean different things, so we need to have different logic. If ananke can let us know easily which graph type to use, then I would say the graph_type parameter is not needed.

Note: not all graph types are supported in each export function. E.g. pcalg. We can change that tho, but I avoided the extra work.

Additionally the graph_to_<package> function is typed for MixedEdgeGraph input. Do we expect users to actually pass in MixedEdgeGraph, or will we expect them to pass a child class like ADMG? Or should we expect both? This also has implications for exporting to Ananke.

graph_to_<package> can usually eat any graph because we have semantic separation of all types of graphs. The ADMG subclasses a MixedEdgeGraph but with very specific structure.

Lmk how that sounds. If there's anything you want me to change for downstream ananke PR, lmk!

This all looks good. What I'll do is if a user passes in something of type MixedEdgeGraph then I'll just handle it equivalently on the Ananke side (Graph) without doing any class inferring.

@jaron-lee jaron-lee self-requested a review March 7, 2023 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants