Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding triadic analysis functions #3742

Merged
merged 17 commits into from
Dec 31, 2019
Merged

Adding triadic analysis functions #3742

merged 17 commits into from
Dec 31, 2019

Conversation

bakerwho
Copy link
Contributor

@bakerwho bakerwho commented Dec 7, 2019

Re: #3727
Methods added:
in networkx/algorithms/triads.py:

  • all_triplets()
  • all_triads()
  • triad_type()
  • triads_by_type()

Tests added in networkx/algorithms/tests/test_triads.py for all the above

Notes

  • triad_type() implements a (perhaps inefficient) algorithm to identify the triadic type of a triad (an order-3 DiGraph) (good resource on this here).
  • all_triplets and all_triads are generator functions. As the number of possible triads in a graph grows like O(n^3), I was wondering if I should print a warning for larger than some number of nodes (say 100) in the DiGraph.

TODO
Empty functions that I will soon write are:

  • random_triad()
  • triadic_closures()
  • focal_closures()
  • balanced_triads()

@ericmjl @dschult @jarrodmillman @MridulS happy to hear your feedback and suggestions.

@pep8speaks
Copy link

pep8speaks commented Dec 7, 2019

Hello @bakerwho! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-30 21:39:08 UTC

@bakerwho
Copy link
Contributor Author

bakerwho commented Dec 7, 2019

Not sure what went wrong with this AppVeyor build (relevant console output below).

utils/tests/test_misc.py::test_make_str_with_bytes
22034utils/tests/test_misc.py::test_make_str_with_unicode
22035  c:\python36-x64\lib\site-packages\networkx\utils\misc.py:109: DeprecationWarning: make_str is deprecated and will be removed in 2.6. Use str instead.
22036    warnings.warn(msg, DeprecationWarning)
22037
22038-- Docs: https://docs.pytest.org/en/latest/warnings.html

Also unable to diagnose the Travis CI build failure. Can anyone help?

@@ -5,6 +5,8 @@
"""Functions for analyzing triads of a graph."""

from networkx.utils import not_implemented_for
from itertools import combinations, permutations
from collections import defaultdict

__all__ = ['triadic_census']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Travis fails because the __all__ var isn't updated with the new functions in the file.
should look something like this

__all__ = ['triadic_census', 'all_triplets', 'all_triads', 'triads_by_type', 'triad_type']

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will fix.

Copy link
Contributor

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bakerwho I found a sliver of time to look through the docs, would you be kind enough to let me know what you think about the comment?

@@ -54,7 +56,7 @@ def triadic_census(G):
Returns
-------
census : dict
Dictionary with triad names as keys and number of occurrences as values.
Dictionary with triad type as keys and number of occurrences as values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bakerwho I'm wondering if there might be a good opportunity for educating end-users using the docs by adding a very brief summary of what the 16 possible types are? To be clear, I don't mean a complete elucidation of them, but more along the lines of the semantic meaning of each of those triad types? Providing this in the docs might be helpful for newcomers to network science, and make NetworkX an even better resource for learning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ericmjl this is a great idea. As linked in the original PR, Tom Snijders's slides (here) on this are a great resource.

Here's a TLDR version:
There can be 6 unique edges in a triad (order-3 DiGraph) (so 2^^6=64 unique triads given 3 nodes). These 64 triads each display exactly 1 of 16 topologies of triads (topologies can be permuted). These topologies are identified by the following notation:
{m}{a}{n}{type} (for example: 111D, 210, 102), where

  • {m} = number of mutual ties (takes 0, 1, 2, 3); a mutual tie is (0,1) AND (1,0)
  • {a} = number of assymmetric ties (takes 0, 1, 2, 3); an assymmetric tie is (0,1) BUT NOT (1,0) or vice versa
  • {n} = number of null ties (takes 0, 1, 2, 3); a null tie is NEITHER (0,1) NOR (1,0)
  • {type} = a letter (takes U, D, C, T) corresponding to up, down, cyclical and transitive. This is only used for topologies that can have more than one form (eg: 021D and 021U). Usage is better understood from the following diagram (taken from Snijders' slides):

Screen Shot 2019-12-07 at 6 01 27 PM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful! I think the tl;dr you provided is a very nice and concise description. What do you think about adding it in, then? I propose we go for it! 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll add it in the docstring!

@ericmjl
Copy link
Contributor

ericmjl commented Dec 9, 2019

--@dschult sorry to bother you about this, but I'm finding it hard to debug why the CI builds have been failing. Do you have any insight here?--

@bakerwho here's a paste of the currently failing test that I found digging through the CI logs. I realized how hard it is to scroll to it on the CI system, thought this might make it a bit easier on you

_____________________________ test_triads_by_type ______________________________
1358
1359    def test_triads_by_type():
1360        """Tests the all_triplets function."""
1361        G = nx.DiGraph()
1362        G.add_edges_from(['01', '02', '03', '04', '05', '12', '16', '51', '56',
1363                          '65'])
1364        all_triads = nx.all_triads(G)
1365        expected = {}
1366        for triad in all_triads:
1367            name = nx.triad_type(triad)
1368>           expected[name].append(triad)
1369E           KeyError: '030T'
1370
1371algorithms/tests/test_triads.py:96: KeyError
1372

@bakerwho
Copy link
Contributor Author

bakerwho commented Dec 9, 2019

@ericmjl thanks for that! I clearly forgot to initialize expected as a defaultdict instead of a normal dict! Will fix and try again.

Copy link
Contributor

@ericmjl ericmjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see anything more that ought to be done, this looks like a good PR to start with!

@dschult, is there anything that should be done w.r.t. the code coverage?

expected_Gs = expected[tri_type]
for a in actual_Gs:
assert any(nx.is_isomorphic(a, e) for e in expected_Gs)
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bakerwho can you remove this dangling instance of pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@dschult
Copy link
Member

dschult commented Dec 9, 2019

Clicking on the codecov /patch details and then on the "Diff" tab, shows the code with each line colored by whether that line was run during a test. Most lines are green (tested), some are red (never ran during a test) and some are yellow (conditional statements that ran with one outcome but not all possible outcomes).

It looks like it wants you to add tests for bad inputs. For example, test what happens when the input graph is not a valid triad in triad_type(). You can use the pytest function raises() to test that a function call raises an exception.

I think you should also comment out the functions that aren't implemented yet (and remove them from __all__). That will improve the code coverage, but more importantly won't lead to confusion for people using the latest version. You can uncomment them when you implement them.

I would suggest you fix any errors and get docs as you like, then we merge this PR before you start working on the other functions. Ask again if I didn't explain clearly.

@dschult
Copy link
Member

dschult commented Dec 9, 2019

One other comment -- you have long strings of if statements but they don't cover all cases. I mean, they might cover all the possible cases, but the code doesn't use "else"... it finishes with elif. What should the code do if it falls through all the elif clauses? A brief look at the code without thinking hard makes it look like it could return None.

@bakerwho
Copy link
Contributor Author

bakerwho commented Dec 9, 2019

@dschult thanks for the comments. Addressing them in order:

  • "test what happens when the input graph is not a valid triad in triad_type()"
    Any order-3 DiGraph will necessarily be one of the 16 triad types. The only way this fails is if the DiGraph does not have 3 nodes (ie it is not a triad). Currently, I'm using an assert statement in triad_type() to ensure that this is the case. How do you suggest I do it instead? If I'm raising an exception, which one should I raise (I'm guessing ValueError?).

  • "comment out the functions that aren't implemented yet (and remove them from all)"
    This makes perfect sense, doing it right away.

  • "you have long strings of if statements but they don't cover all cases."
    As far as I have checked, they do cover all cases. Firstly, the number of edges in a triad can take {0, 1, 2.., 5, 6} and no other number. So the elif should be appropriate outside it.
    Secondly, I don't have an more efficient algorithm for detecting triad types than matching topologies. So, for 3 or more edges, I have to check all permutations of the edgeset, eg. (e1, e2, e3), (e1, e3, e2) and so on. If for any of these permutations, any of the conditions holds - that would slot the triad immediately into that triad type.

Consider the code below:

    elif num_edges == 3:
        for (e1, e2, e3) in permutations(G.edges(), 3):
            if set(e1) == set(e2):
                if e3[0] in e1:
                    return "111U"
                if e3[1] in e1:
                    return "111D"
            elif set(e1).symmetric_difference(set(e2)) == set(e3):
                if {e1[0], e2[0], e3[0]} == {e1[0], e2[0],
                                             e3[0]} == set(G.nodes()):
                    return "030C"
                if e3 == (e1[0], e2[1]) and e2 == (e1[1], e3[1]):
                    return "030T"

This is pseudocode for:

for 3 edges
   for any permutation of edges
      if first and second are symmetric
          if third has a source in first: return '111U'
          if third has a sink in first: return '111D'
      else if third edge is made of nodes that are the symmetric difference of the first and second: 
          if cyclical: return "030C"
          if transitive: return "030T"

As you can see, each of the nested ifs has to be evaluated for each permutation. Evaluating to True implies necessary and sufficient conditions for exit.

I am fairly confident that my implementation and logic are correct. However, getting None as a return value would be a sure sign of an error somewhere.

@ericmjl
Copy link
Contributor

ericmjl commented Dec 10, 2019

@bakerwho I think I have an idea for this one:

Any order-3 DiGraph will necessarily be one of the 16 triad types. The only way this fails is if the DiGraph does not have 3 nodes (ie it is not a triad). Currently, I'm using an assert statement in triad_type() to ensure that this is the case. How do you suggest I do it instead? If I'm raising an exception, which one should I raise (I'm guessing ValueError?).

Yes, instead of an assert statement in triad_type(), I would use an if/raise check, briefly:

if not some_condition:
    raise SomeError()

It's better than asserts, because asserts can be disabled in Python at the command line (though I haven't run into any real-world scenario yet). In any case, it's idiomatic to use the if/raise pattern.

Perhaps raising one of the errors defined here might be good? One proposal I have here is NetworkXAlgorithmError, what do you think?

@dschult I just looked through the details of the triad_type() function once more, and I think it's correct as well. At least, in test_triads.py, all 16 possible cases. It might not be a fruitful avenue to test further, except maybe for checking that an appropriate error is raised if an invalid graph is entered? I can think of at least the following:

  • Graph that doesn't have 3 nodes.
  • Graph with self-loops.

We're getting there 🎉!

@dschult
Copy link
Member

dschult commented Dec 10, 2019

Great! I figured it was something like that.
To make it clear to future coders looking at your stuff.
Can you remove the last if statement in each of these cases?
Maybe leave a comment as to which case it is.

for (e1, e2, e3) in permutations(G.edges(), 3):
            if set(e1) == set(e2):
                if e3[0] in e1:
                    return "111U"
                if e3[1] in e1:
                    return "111D"

becomes

for (e1, e2, e3) in permutations(G.edges(), 3):
            if set(e1) == set(e2):
                if e3[0] in e1:
                    return "111U"
                # e3[1] in e1:
                return "111D"

@dschult dschult added this to the networkx-2.5 milestone Dec 10, 2019
@bakerwho
Copy link
Contributor Author

@ericmjl @dschult Happy holidays! Got some time to work on this, and the new commit implements:

  1. is_triad() to check if an input G is a triad
  2. random_triad() to generate a random triad from a DiGraph G
  3. Error handling (NetworkXAlgorithmError raised when a triadic algo is called on a non-triad)
  4. Tests for functions in 1 and 2
  5. Style fixes (such as commenting out the last if-else parts of triad_type)

An input to is_triad() needs to satisfy 3 conditions:

  • is an instance of nx.Graph
  • is a directed graph containing 3 nodes
  • contains no self edges (technically, it could contain these, but those self-edges would have to be ignored)

@ericmjl
Copy link
Contributor

ericmjl commented Dec 27, 2019

@dschult - just marked approval on the PR. I think it's comprehensively tested enough, and the docs are definitely well-written and clear.

@bakerwho if we get further questions or issues raised about triadic analysis, can we tag you in them?

@bakerwho
Copy link
Contributor Author

@ericmjl of course! Happy to help! And thanks for all the help and feedback!

@dschult
Copy link
Member

dschult commented Dec 27, 2019

The code as written doesn't run. Can you change the imports at the top to something like:

"""Functions for analyzing triads of a graph."""

from itertools import combinations, permutations
from collections import defaultdict
from random import sample

import networkx as nx
from networkx.utils import not_implemented_for

__all__ = ['triadic_census', 'is_triad', 'all_triplets', 'all_triads',
           'triads_by_type', 'triad_type', 'random_triad']

Then using pytest tests/test_triads.py shows a number of errors many due to checking is_triad on the input graph which doesn't have to be a triad! Simple fixes I believe... :)
Thanks!

@dschult
Copy link
Member

dschult commented Dec 31, 2019

Thanks for this and Thanks @ericmjl for working with @bakerwho

@dschult dschult merged commit 3bb4048 into networkx:master Dec 31, 2019
kazimierz-256 pushed a commit to kazimierz-256/networkx that referenced this pull request Mar 20, 2020
* empty functions for triadic analysis

* add triad functionality

* fixes to triads functions

* pep8 fixes

* add functions to __all__

* rename to fix mismatch

* delete unused variable

* delete unused variable

* minor fixes

* fix Travis failure and add explanation of triads with citation

* fix with defaultdict

* added is_triads, error handling for non-triads, and tests

* remove unnecessary is_triad() checks and rearrange imports
kiryph pushed a commit to kiryph/networkx that referenced this pull request Jul 21, 2020
* empty functions for triadic analysis

* add triad functionality

* fixes to triads functions

* pep8 fixes

* add functions to __all__

* rename to fix mismatch

* delete unused variable

* delete unused variable

* minor fixes

* fix Travis failure and add explanation of triads with citation

* fix with defaultdict

* added is_triads, error handling for non-triads, and tests

* remove unnecessary is_triad() checks and rearrange imports
MridulS pushed a commit to MridulS/networkx that referenced this pull request Feb 4, 2023
* empty functions for triadic analysis

* add triad functionality

* fixes to triads functions

* pep8 fixes

* add functions to __all__

* rename to fix mismatch

* delete unused variable

* delete unused variable

* minor fixes

* fix Travis failure and add explanation of triads with citation

* fix with defaultdict

* added is_triads, error handling for non-triads, and tests

* remove unnecessary is_triad() checks and rearrange imports
jnke2016 pushed a commit to jnke2016/networkx that referenced this pull request Jul 31, 2023
This PR builds on top of rapidsai/cugraph#3734

Merge after rapidsai/cugraph#3734

Todo tests for:  
- [x] _get_renumber_map
- [x]  _get_tensor_d_from_sampled_df
- [x] create_homogeneous_sampled_graphs_from_dataframe
- [x] pytests pass
- [x] dataset_from_disk_cudf.ipynb (`obgn_products_sampling/`)
- [x] Verify training benchmark

Authors:
  - Vibhu Jawa (https://github.com/VibhuJawa)
  - Seunghwa Kang (https://github.com/seunghwak)
  - Chuck Hastings (https://github.com/ChuckHastings)
  - Alex Barghi (https://github.com/alexbarghi-nv)
  - Rick Ratzel (https://github.com/rlratzel)

Approvers:
  - Alex Barghi (https://github.com/alexbarghi-nv)

URL: rapidsai/cugraph#3742
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants