-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will make the CPU user more convenience to avoid tuning how much thread is used for data loading.
n_parsed_ = 0; | ||
overflow = false; | ||
rnd_.seed(kRandMagic + record_param_.seed); | ||
int maxthread, threadget; | ||
#pragma omp parallel | ||
{ | ||
// be conservative, set number of real cores | ||
maxthread = std::max(omp_get_num_procs() / 2 - 1, 1); | ||
maxthread = std::max(omp_get_num_procs(), 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this get the # of logic cores?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think we should let user to make the decision, just as normal operator. Before this change, data loader will only use n/2 cores for export OMP_NUM_THREADS=n
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since now the iterator is pushed to engine, will this omp threading make troubles?
@wkcn could you help take a review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for your contribution!
Could you please provide a performance comparision?
@szha to confirm the API enhancement. |
@mxnet-label-bot add [Data-loading] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Wait a moment to see if there are other comments; otherwise, I will merge this PR in 24 hours.
@xinyu-intel please help rebase the code and paste the performance data as the request from @wkcn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the wrapper looks neet.
* cpu optimized data loader * Fix CI * Fix CI * Fix ci * Fix doc
* cpu optimized data loader * Fix CI * Fix CI * Fix ci * Fix doc
Description
This PR brings below changes to ImageRecordIter:
dtype
, making ImageRecordIter(dtype='uint8') equivalent to ImageRecordUInt8Iter.ctx
, which indicates the device context used for. Whenctx='cpu'
is specified, a CPU backend optimized data loader will be used.@pengzhao-intel @TaoLv @xinyu-intel @anirudh2290
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
Comments