[REVIEW] python 2D shuffling #1133

Merged 10 commits into rapidsai:branch-0.16 on Sep 25, 2020

Conversation

@Iroy30 Iroy30 commented Sep 3, 2020

No description provided.

@Iroy30 Iroy30 requested a review from a team as a code owner September 3, 2020 14:41
@Iroy30 Iroy30 changed the title [WIP] python 2D pipeline [WIP][skip-ci] python 2D pipeline Sep 3, 2020
@GPUtester
Contributor

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

@BradReesWork BradReesWork added this to the 0.16 milestone Sep 9, 2020
Comment on lines +11 to +15
def get_2D_div(ngpus):
    pcols = int(math.sqrt(ngpus))
    while ngpus % pcols != 0:
        pcols = pcols - 1
    return int(ngpus/pcols), pcols
Contributor Author

Python code to find prows and pcols
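
To make the behaviour of the helper concrete, here is a minimal standalone sketch of the same search, runnable without cuGraph or Dask. It follows the snippet above, except that it uses integer division (`//`) instead of `int(ngpus/pcols)`; that substitution is an assumption of this sketch, not part of the PR.

```python
import math


def get_2D_div(ngpus):
    # Start at floor(sqrt(ngpus)) and walk down to the nearest divisor,
    # so the grid is as close to square as the GPU count allows.
    pcols = int(math.sqrt(ngpus))
    while ngpus % pcols != 0:
        pcols = pcols - 1
    # Integer division keeps the result exact (sketch-only change).
    return ngpus // pcols, pcols


if __name__ == "__main__":
    for n in (1, 2, 4, 6, 7, 8, 16):
        prows, pcols = get_2D_div(n)
        print(f"{n} GPUs -> prows={prows}, pcols={pcols}")
```

For a prime GPU count such as 7, the loop falls all the way to `pcols = 1`, which gives a 7 x 1 grid.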

@Iroy30 Iroy30 changed the title [WIP][skip-ci] python 2D pipeline [REVIEW] python 2D shuffling Sep 17, 2020
Collaborator

@ChuckHastings ChuckHastings left a comment

We don't have MG CI resources, but could we have a unit test that developers could execute locally if they have > 1 GPU?
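
One way such a locally runnable test could be gated is sketched below with the standard `unittest` module. Everything here is an assumption for illustration: `visible_gpu_count` is a hypothetical helper that only counts entries in the `CUDA_VISIBLE_DEVICES` environment variable, and the test body is a placeholder rather than the PR's actual test. If the variable is unset, the helper reports zero and the test is skipped, even on a multi-GPU machine.

```python
import os
import unittest


def visible_gpu_count(env=None):
    # Hypothetical helper: count the device entries listed in
    # CUDA_VISIBLE_DEVICES. An unset or empty variable counts as zero.
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    return len([d for d in value.split(",") if d.strip()])


@unittest.skipUnless(visible_gpu_count() > 1,
                     "requires more than one visible GPU")
class TestShuffleMultiGPU(unittest.TestCase):
    def test_placeholder(self):
        # Placeholder only: a real test would build a distributed graph
        # and call shuffle on it; that part is not sketched here.
        self.assertGreater(visible_gpu_count(), 1)
```

Developers with several GPUs could then run the file with `python -m unittest` after exporting `CUDA_VISIBLE_DEVICES`, while single-GPU and CI machines would report the test as skipped.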

    return partitions


def shuffle(dg):
Collaborator

Suggest adding prows and pcols as parameters to the shuffle function, defaulting to None.

One of the easiest tuning options we want to give experienced users is the ability to configure prows and pcols, and this seems like an easy way to provide it.

If prows and pcols are both specified, we should use them; otherwise we should compute them. If the user specifies only one, the other is computable (ngpus / 'whichever one is specified'). If the user specifies neither, we can compute both as the current code does.
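
The resolution rule described in this comment could look roughly like the sketch below. The names `resolve_grid` and `default_2d_div` are hypothetical and chosen for illustration; the PR's actual `shuffle` signature is not shown in this conversation, so this is a sketch under assumptions rather than the merged code.

```python
import math


def default_2d_div(ngpus):
    # Same search as get_2D_div in the PR: the largest divisor of ngpus
    # that does not exceed sqrt(ngpus) becomes pcols.
    pcols = int(math.sqrt(ngpus))
    while ngpus % pcols != 0:
        pcols -= 1
    return ngpus // pcols, pcols


def resolve_grid(ngpus, prows=None, pcols=None):
    # Neither value given: fall back to the computed near-square grid.
    if prows is None and pcols is None:
        return default_2d_div(ngpus)
    # Only pcols given: derive prows, which requires an exact division.
    if prows is None:
        if ngpus % pcols != 0:
            raise ValueError(f"pcols={pcols} does not divide ngpus={ngpus}")
        return ngpus // pcols, pcols
    # Only prows given: derive pcols the same way.
    if pcols is None:
        if ngpus % prows != 0:
            raise ValueError(f"prows={prows} does not divide ngpus={ngpus}")
        return prows, ngpus // prows
    # Both given: pass them through unchanged; checking their product
    # against ngpus is a separate validation step.
    return prows, pcols
```

With 8 GPUs, `resolve_grid(8)` yields `(4, 2)`, while `resolve_grid(8, prows=2)` yields `(2, 4)`.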

Contributor Author

added

def shuffle(dg):
    ddf = dg.edgelist.edgelist_df
    ngpus = get_n_workers()
    prows, pcols = get_2D_div(ngpus)
Collaborator

We should probably add a check that prows * pcols == ngpus. The implementation of get_2D_div guarantees this, but it is no longer guaranteed once the user can specify the values for tuning.
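
A minimal sketch of such a check follows. The function name `check_grid` and the choice of `ValueError` are assumptions made for illustration; the conversation does not show how the merged code reports the error.

```python
def check_grid(ngpus, prows, pcols):
    # Hypothetical validation helper: user-supplied grid dimensions must
    # tile the available GPUs exactly.
    if prows * pcols != ngpus:
        raise ValueError(
            f"prows * pcols must equal ngpus, got "
            f"{prows} * {pcols} = {prows * pcols} for ngpus={ngpus}"
        )
```

Calling `check_grid(8, 4, 2)` returns silently, whereas `check_grid(8, 3, 2)` raises.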

Contributor Author

added


def get_2D_div(ngpus):
    pcols = int(math.sqrt(ngpus))
    while ngpus % pcols != 0:
Collaborator

Too lazy today to sit down and do math, but we should be able to directly compute this without a while loop.
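
As a hedged illustration of what a loop-free version might look like: for an arbitrary ngpus, finding the largest divisor not exceeding the square root still amounts to a search over candidates, so the first function below only rewrites the loop as a single expression. A direct computation is possible under the added assumption that ngpus is a power of two, which is what the second function sketches. Both function names are hypothetical, and `math.isqrt` requires Python 3.8 or later.

```python
import math


def get_2D_div_search(ngpus):
    # Still a search over divisors, expressed without an explicit while loop.
    pcols = max(d for d in range(1, math.isqrt(ngpus) + 1) if ngpus % d == 0)
    return ngpus // pcols, pcols


def get_2D_div_pow2(ngpus):
    # Assumption: ngpus is a power of two, ngpus = 2**k.
    if ngpus < 1 or ngpus & (ngpus - 1):
        raise ValueError("this sketch only handles powers of two")
    k = ngpus.bit_length() - 1
    # Split the exponent as evenly as possible: pcols = 2**(k // 2).
    pcols = 1 << (k // 2)
    return ngpus // pcols, pcols
```

For power-of-two inputs the two functions agree, for example both return `(8, 4)` for 32 GPUs.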

def _set_partitions_pre(df, num_verts, prows, pcols):
    rows_per_div = math.ceil(num_verts/prows)
    cols_per_div = math.ceil(num_verts/pcols)
    partitions = df['src'].floordiv(rows_per_div) * pcols + df['dst'].floordiv(cols_per_div)
Contributor

Can we do this? If I am not mistaken, we apply a hash function to the edge source and destination vertices to determine the partition each edge belongs to. So each partition will have roughly the same number of rows or columns, but the counts will not be identical (and can't be determined in this way). Am I missing something?
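
For reference, the range-based assignment in the snippet under review can be traced with plain integers. The sketch below mirrors the arithmetic of `_set_partitions_pre` for a single edge; the function name `range_partition` is hypothetical, and the sketch deliberately does not model any hash-based scheme, since the conversation does not show one.

```python
import math


def range_partition(src, dst, num_verts, prows, pcols):
    # Each grid row owns a contiguous block of source vertex ids and each
    # grid column a contiguous block of destination vertex ids.
    rows_per_div = math.ceil(num_verts / prows)
    cols_per_div = math.ceil(num_verts / pcols)
    # Row-major partition id: (grid row) * pcols + (grid column).
    return (src // rows_per_div) * pcols + dst // cols_per_div


if __name__ == "__main__":
    # 8 vertices on a 4 x 2 grid: 2 source ids per row, 4 destination ids per column.
    for edge in [(0, 0), (2, 5), (5, 1), (7, 7)]:
        print(edge, "->", range_partition(*edge, num_verts=8, prows=4, pcols=2))
```

This scheme only balances partitions when vertex ids are spread evenly over contiguous ranges, which is the concern the comment above raises.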

Contributor Author

updated

@afender
Member

afender commented Sep 24, 2020

rerun tests

@BradReesWork
Member

rerun tests

@BradReesWork
Member

rerun tests

    return partitions


def shuffle(dg):
Member

A few documentation lines would be a good addition

Contributor Author

added

@codecov-commenter

codecov-commenter commented Sep 25, 2020

Codecov Report

Merging #1133 into branch-0.16 will decrease coverage by 1.25%.
The diff coverage is 30.98%.

@@               Coverage Diff               @@
##           branch-0.16    #1133      +/-   ##
===============================================
- Coverage        73.44%   72.19%   -1.26%     
===============================================
  Files               60       61       +1     
  Lines             2335     2406      +71     
===============================================
+ Hits              1715     1737      +22     
- Misses             620      669      +49     
Impacted Files                          Coverage Δ
python/cugraph/structure/shuffle.py     12.96% <12.96%> (ø)
python/cugraph/structure/graph.py       80.04% <87.50%> (+0.26%) ⬆️
python/cugraph/structure/__init__.py    100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 85ec558...baaec00.

@BradReesWork BradReesWork merged commit b0455a6 into rapidsai:branch-0.16 Sep 25, 2020
@BradReesWork BradReesWork linked an issue Sep 28, 2020 that may be closed by this pull request
Successfully merging this pull request may close these issues.

[ENH] MNMG Shuffle
8 participants