
support multi node in heterps #31102

Merged
7 commits merged into PaddlePaddle:develop on Feb 24, 2021

Conversation

@Thunderbrook (Contributor) commented Feb 22, 2021

PR types

New features

PR changes

Others

Describe

Support multi-node training in heterps mode.

@paddle-bot-old

Thanks for your contribution!
Please wait for the result of CI first. See the Paddle CI Manual for details.

@paddle-bot-old (bot) commented Feb 22, 2021

✅ This PR's description meets the template requirements!
Please wait for other CI results.

@Thunderbrook changed the title from "Multi node" to "support multi node in heterps" on Feb 22, 2021
@@ -386,3 +386,27 @@ def __init__(self):
def _transpile_startup_program(self):
block = self.startup_program.global_block()
block.append_op(type='c_comm_init_all', attrs={'ring_id': 0})


class MultiThread(GradAllReduce):
Contributor review comment:

The use of MultiThread needs to be added in minimize.

@@ -111,6 +173,12 @@ class HeterComm {
CustomGradMerger merger_;
int topo_aware_{1};
std::vector<std::vector<Path>> path_;
std::vector<LocalStorage> storage_;
int feanum_{1800 * 2048};
int multi_node_{1};
Contributor review comment:

Write this in a configurable form rather than hardcoding it.
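
A minimal sketch of what "configurable" could look like here, assuming a gflags-style flag is acceptable (Paddle links gflags); the flag name heterps_multi_node and the wrapper class below are illustrative assumptions, not the actual Paddle code:

#include <gflags/gflags.h>

// Hypothetical flag; name and default are assumptions for illustration.
DEFINE_int32(heterps_multi_node, 1,
             "Run HeterComm in multi-node mode (1) or single-node mode (0).");

class HeterCommConfigSketch {
 public:
  // Read the flag once at construction instead of hardcoding the member.
  HeterCommConfigSketch() : multi_node_(FLAGS_heterps_multi_node) {}
  int multi_node() const { return multi_node_; }

 private:
  int multi_node_;
};

The same pattern would apply to feanum_, replacing the hardcoded 1800 * 2048 with a flag or constructor parameter.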

@@ -54,7 +54,14 @@ void HeterPs::show_one_table(int gpu_num) { comm_->show_one_table(gpu_num); }

void HeterPs::push_sparse(int num, FeatureKey* d_keys,
FeaturePushValue* d_grads, size_t len) {
comm_->push_sparse(num, d_keys, d_grads, len, opt_);
// comm_->push_sparse(num, d_keys, d_grads, len, opt_);
comm_->push_sparse_multi_node(num, d_keys, d_grads, len, opt_);
Contributor review comment:

A single-node vs. multi-node check needs to be added here, routing to push_sparse or push_sparse_multi_node accordingly.
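
One way to act on this comment is a runtime branch on the multi-node flag. A minimal sketch, assuming HeterPs can see the flag (shown here as a hypothetical multi_node_ member; in the diff the flag lives in HeterComm):

void HeterPs::push_sparse(int num, FeatureKey* d_keys,
                          FeaturePushValue* d_grads, size_t len) {
  if (multi_node_) {
    // Multi-node path: aggregate sparse gradients across machines as well as GPUs.
    comm_->push_sparse_multi_node(num, d_keys, d_grads, len, opt_);
  } else {
    // Single-node path: the original per-machine push.
    comm_->push_sparse(num, d_keys, d_grads, len, opt_);
  }
}

Both calls are the ones already shown in the diff; only the if/else dispatch is new.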

@Thunderbrook Thunderbrook merged commit c4f279f into PaddlePaddle:develop Feb 24, 2021
Thunderbrook added a commit to Thunderbrook/Paddle that referenced this pull request Mar 1, 2021
* push multi node

* multi node

* MultiThread

* remove log

* solve bug in 30829
fuyinno4 pushed a commit that referenced this pull request Mar 1, 2021
* solve build gpu task core (#30626)

* build gpu task core

* format

* dump to cpu (#30750)

* dump to cpu

* format

* format

* format

* support multi node in heterps (#31102)

* push multi node

* multi node

* MultiThread

* remove log

* solve bug in 30829

* optimizer