hw07 done! #1
base: main
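Thanks for being the first to submit the homework!
- Basic homework requirements completed: 42/50 points
- Explained in your own words in ANSWER.md: 23/25 points
- Code is well formatted and cross-platform: 4/5 points
- Original ideas of your own: 11/20 points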
    TOCK(matrix_multiply);
}

// Compute R^T A R
static void matrix_RtAR(Matrix &RtAR, Matrix const &R, Matrix const &A) {
    TICK(matrix_RtAR);
    // These two are temporaries; what can be optimized here? (5 points)
    Matrix Rt, RtA;
    // ans: make them static variables and allocate the storage up front.
    static Matrix Rt(std::array<std::size_t, 2>{1024, 1024}), RtA(std::array<std::size_t, 2>{1024, 1024});
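Suggested change:
- static Matrix Rt(std::array<std::size_t, 2>{1024, 1024}), RtA(std::array<std::size_t, 2>{1024, 1024});
+ static thread_local Matrix Rt, RtA;
I think it is fine for them to start out empty. thread_local ensures that nothing goes wrong if several threads call this function at the same time.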
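To illustrate the point, here is a minimal sketch of a per-thread scratch buffer that starts empty and is resized on use. `Mat`, `reshape`, and `transpose_into_scratch` are hypothetical stand-ins, not the homework's actual `Matrix` API; the sketch only assumes that x is the contiguous axis.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the homework's Matrix; x is the contiguous axis.
struct Mat {
    std::size_t nx = 0, ny = 0;
    std::vector<float> data;

    void reshape(std::size_t new_nx, std::size_t new_ny) {
        nx = new_nx;
        ny = new_ny;
        data.resize(nx * ny);  // no reallocation when the capacity already suffices
    }
    float &operator()(std::size_t x, std::size_t y) {
        assert(x < nx && y < ny);
        return data[x + y * nx];
    }
    float operator()(std::size_t x, std::size_t y) const {
        assert(x < nx && y < ny);
        return data[x + y * nx];
    }
};

// Writes the transpose of `in` into a scratch buffer owned by the calling thread.
Mat const &transpose_into_scratch(Mat const &in) {
    static thread_local Mat scratch;  // starts empty; one instance per thread
    scratch.reshape(in.ny, in.nx);    // allocates on first use, reuses storage afterwards
    for (std::size_t y = 0; y < in.ny; y++) {
        for (std::size_t x = 0; x < in.nx; x++) {
            scratch(y, x) = in(x, y);
        }
    }
    return scratch;
}
```

Repeated calls with the same shape reuse the same allocation, and because each thread gets its own `scratch`, concurrent callers do not overwrite each other's temporaries. The returned reference is only valid until the same thread calls the function again.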
for(int i=0; i<nx; i+=32){
    for(int t=0; t<nt; t++){
        for(int i_block=i; i_block<i+32; i_block++){
            out(i,j) += lhs(i_block, t) * rhs(t, j);
Suggested change:
- out(i,j) += lhs(i_block, t) * rhs(t, j);
+ out(i_block,j) += lhs(i_block, t) * rhs(t, j);
Did you miss one here?
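For reference, a minimal sketch of the blocked loop with the index corrected. `Mat` and `matmul_blocked` are hypothetical names used only for illustration, and the end-of-block clamp with `std::min` is an addition for sizes that are not a multiple of 32; it is not part of the submitted code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the homework's Matrix; x is the contiguous axis.
struct Mat {
    std::size_t nx = 0, ny = 0;
    std::vector<float> data;

    Mat() = default;
    Mat(std::size_t nx_, std::size_t ny_) : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0f) {}

    float &operator()(std::size_t x, std::size_t y) { return data[x + y * nx]; }
    float operator()(std::size_t x, std::size_t y) const { return data[x + y * nx]; }
};

// out(i, j) = sum over t of lhs(i, t) * rhs(t, j), blocked along i.
void matmul_blocked(Mat &out, Mat const &lhs, Mat const &rhs) {
    assert(lhs.ny == rhs.nx);
    std::size_t const nx = lhs.nx, nt = lhs.ny, ny = rhs.ny;
    constexpr std::size_t block = 32;
    out = Mat(nx, ny);  // zero-initialized accumulator
    for (std::size_t j = 0; j < ny; j++) {
        for (std::size_t i = 0; i < nx; i += block) {
            std::size_t const i_end = std::min(i + block, nx);
            for (std::size_t t = 0; t < nt; t++) {
                // Innermost loop walks the contiguous axis of both out and lhs.
                for (std::size_t i_block = i; i_block < i_end; i_block++) {
                    out(i_block, j) += lhs(i_block, t) * rhs(t, j);
                }
            }
        }
    }
}
```

With `out(i, j)` on the left-hand side, every iteration of the innermost loop would accumulate into the first element of the block, so only one row per block would receive a (wrong) value.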
for (int y = 0; y < ny; y++) {
    float val = wangsrng(x, y).next_float();
    out(x, y) = val;
    // ans: the matrix is stored contiguously along the x axis, but the inner loop runs over y, so the accesses are strided, which is bad for the cache;
10
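A minimal sketch of the reordered loop, with x as the inner loop so that consecutive writes touch adjacent memory. `Mat`, `value_at`, and `fill_x_inner` are hypothetical; `value_at` merely stands in for the homework's random number source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the homework's Matrix; x is the contiguous axis.
struct Mat {
    std::size_t nx = 0, ny = 0;
    std::vector<float> data;

    Mat(std::size_t nx_, std::size_t ny_) : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0f) {}

    float &operator()(std::size_t x, std::size_t y) { return data[x + y * nx]; }
    float operator()(std::size_t x, std::size_t y) const { return data[x + y * nx]; }
};

// Hypothetical deterministic generator used in place of a random source.
float value_at(std::size_t x, std::size_t y) {
    return float(x) + 0.5f * float(y);
}

// Fills the matrix with y as the outer loop and x as the inner loop,
// so the innermost iterations write to consecutive addresses.
void fill_x_inner(Mat &out) {
    for (std::size_t y = 0; y < out.ny; y++) {
        for (std::size_t x = 0; x < out.nx; x++) {
            out(x, y) = value_at(x, y);
        }
    }
}
```

Because each element depends only on its own `(x, y)`, swapping the loop order changes the memory access pattern but not the result.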
        out(y, x) = in(x, y);
    }
}
// ans: out is accessed contiguously but in is accessed with a stride, and the working set does not fit in the cache. This should be changed to a blocked transpose.
15
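The blocked transpose mentioned in the answer could look roughly like the sketch below. `Mat` and `transpose_blocked` are hypothetical names, and the 32x32 block size is an assumption rather than a tuned value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the homework's Matrix; x is the contiguous axis.
struct Mat {
    std::size_t nx = 0, ny = 0;
    std::vector<float> data;

    Mat() = default;
    Mat(std::size_t nx_, std::size_t ny_) : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0f) {}

    float &operator()(std::size_t x, std::size_t y) { return data[x + y * nx]; }
    float operator()(std::size_t x, std::size_t y) const { return data[x + y * nx]; }
};

// Transposes `in` into `out` one square block at a time, so the strided
// accesses stay within a small region that can remain in the cache.
void transpose_blocked(Mat &out, Mat const &in) {
    assert(&out != &in);  // in-place transposition is not supported by this sketch
    constexpr std::size_t block = 32;
    out = Mat(in.ny, in.nx);
    for (std::size_t yb = 0; yb < in.ny; yb += block) {
        std::size_t const y_end = std::min(yb + block, in.ny);
        for (std::size_t xb = 0; xb < in.nx; xb += block) {
            std::size_t const x_end = std::min(xb + block, in.nx);
            for (std::size_t x = xb; x < x_end; x++) {
                // Inner loop walks the contiguous axis of out.
                for (std::size_t y = yb; y < y_end; y++) {
                    out(y, x) = in(x, y);
                }
            }
        }
    }
}
```

Within one block, the strided reads from `in` cover at most 32 rows of 32 elements each, so the cache lines they pull in are reused before being evicted.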
out(x, y) = 0; // Is manual initialization necessary? (5 points)
for (int t = 0; t < nt; t++) {
    out(x, y) += lhs(x, t) * rhs(t, y);
    // ans: lhs is accessed with a stride, rhs is accessed contiguously, and out stays fixed, which prevents vectorization.
9. You missed one change.
    TOCK(matrix_multiply);
}

// Compute R^T A R
static void matrix_RtAR(Matrix &RtAR, Matrix const &R, Matrix const &A) {
    TICK(matrix_RtAR);
    // These two are temporaries; what can be optimized here? (5 points)
    Matrix Rt, RtA;
    // ans: make them static variables and allocate the storage up front.
3. These should be thread_local, and there is no need to initialize their size.
// #pragma omp parallel for collapse(2)
// for (int y = 0; y < ny; y++) {
//     for (int x = 0; x < nx; x++) {
//         out(x, y) = 0; // Is manual initialization necessary? (5 points)
5
See ANSWER.md.