hw07 done! #1

Open
wants to merge 1 commit into
base: main

yangyueren

Please see ANSWER.md.

@archibate left a comment

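Thanks for being the first to submit the homework!

  • Meets the basic homework requirements: 42/50 points
  • Explains the answers in your own words in ANSWER.md: 23/25 points
  • Code is well formatted and cross-platform: 4/5 points
  • Has original ideas of your own: 11/20 points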

TOCK(matrix_multiply);
}

// Compute R^T A R
static void matrix_RtAR(Matrix &RtAR, Matrix const &R, Matrix const &A) {
TICK(matrix_RtAR);
// These two are temporaries; what can be optimized here? 5 points
Matrix Rt, RtA;
// ans: Make them static variables and preallocate their storage.
static Matrix Rt(std::array<std::size_t, 2>{1024, 1024}), RtA(std::array<std::size_t, 2>{1024, 1024});

Suggested change
static Matrix Rt(std::array<std::size_t, 2>{1024, 1024}), RtA(std::array<std::size_t, 2>{1024, 1024});
static thread_local Matrix Rt, RtA;

I think it is fine for them to start out empty. thread_local ensures nothing goes wrong if multiple threads call this function.
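To illustrate the suggestion, here is a minimal sketch of a `static thread_local` temporary that starts empty and reuses its allocation across calls. `Scratch` and `sum_of_squares` are hypothetical stand-ins, not the assignment's `Matrix` or any function from the homework.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the assignment's Matrix: a flat buffer whose
// allocation is kept when it is reshaped to the same or a smaller size.
struct Scratch {
    std::vector<float> data;
    void reshape(std::size_t nx, std::size_t ny) { data.resize(nx * ny); }
};

// Returns the sum of squares of `in`, going through a reusable temporary.
float sum_of_squares(std::vector<float> const &in, std::size_t nx, std::size_t ny) {
    // One instance per thread, constructed empty on first use in that thread.
    // The first call allocates; later calls with the same shape reuse the memory.
    static thread_local Scratch tmp;
    tmp.reshape(nx, ny);
    for (std::size_t i = 0; i < nx * ny; i++) tmp.data[i] = in[i] * in[i];
    float sum = 0;
    for (std::size_t i = 0; i < nx * ny; i++) sum += tmp.data[i];
    return sum;
}
```

A plain `static` temporary would be shared by every thread calling the function, so concurrent calls could overwrite each other's data; `thread_local` gives each thread its own copy at the cost of one allocation per thread.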

for(int i=0; i<nx; i+=32){
for(int t=0; t<nt; t++){
for(int i_block=i; i_block<i+32; i_block++){
out(i,j) += lhs(i_block, t) * rhs(t, j);

Suggested change
out(i,j) += lhs(i_block, t) * rhs(t, j);
out(i_block,j) += lhs(i_block, t) * rhs(t, j);

Missed one here?
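For reference, a self-contained sketch of a multiply blocked over `i`, with the block index used on both `out` and `lhs`. The `Mat` type is a hypothetical stand-in that assumes element `(x, y)` is stored at `x + y * nx`; the outer `j` loop and the bounds handling are assumptions too, since the diff excerpt does not show them.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical matrix type: the first index is the contiguous one.
struct Mat {
    std::size_t nx, ny;
    std::vector<float> data;
    Mat(std::size_t nx_, std::size_t ny_) : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0f) {}
    float &operator()(std::size_t x, std::size_t y) { return data[x + y * nx]; }
    float operator()(std::size_t x, std::size_t y) const { return data[x + y * nx]; }
};

// Accumulates lhs * rhs into out; out must be zero-filled with shape (lhs.nx, rhs.ny).
void matmul_blocked(Mat &out, Mat const &lhs, Mat const &rhs) {
    constexpr std::size_t B = 32;  // block size along i, as in the excerpt
    std::size_t nx = lhs.nx, nt = lhs.ny, ny = rhs.ny;
    for (std::size_t j = 0; j < ny; j++) {
        for (std::size_t i = 0; i < nx; i += B) {
            std::size_t i_end = std::min(i + B, nx);  // handle nx not divisible by B
            for (std::size_t t = 0; t < nt; t++) {
                float r = rhs(t, j);  // invariant in the innermost loop
                for (std::size_t i_block = i; i_block < i_end; i_block++) {
                    // i_block indexes both out and lhs, so the innermost loop
                    // walks contiguous memory in both.
                    out(i_block, j) += lhs(i_block, t) * r;
                }
            }
        }
    }
}
```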

for (int y = 0; y < ny; y++) {
float val = wangsrng(x, y).next_float();
out(x, y) = val;
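// ans: The matrix is stored contiguously along the x axis, but the inner loop runs over y, so consecutive accesses jump through memory, which is bad for the cache;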

10
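The fix this answer points toward is a loop interchange. A minimal sketch, assuming element `(x, y)` lives at `x + y * nx` in a flat buffer; `fill_x_inner` and the fill value are hypothetical and stand in for the random generator in the assignment.

```cpp
#include <cstddef>
#include <vector>

// Fills an nx-by-ny matrix stored with x as the contiguous axis.
void fill_x_inner(std::vector<float> &out, std::size_t nx, std::size_t ny) {
    out.resize(nx * ny);
    for (std::size_t y = 0; y < ny; y++) {      // outer loop: strided axis
        for (std::size_t x = 0; x < nx; x++) {  // inner loop: contiguous axis
            // Placeholder value; consecutive iterations write adjacent floats.
            out[x + y * nx] = static_cast<float>(x) + 10.0f * static_cast<float>(y);
        }
    }
}
```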

out(y, x) = in(x, y);
}
}
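// ans: out is accessed contiguously but in is accessed with a stride, and the data does not fit in cache. This should be changed to a blocked transpose.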

15
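A blocked transpose along the lines of this answer could look like the sketch below. It assumes flat buffers with `in(x, y)` at `x + y * nx` and `out(y, x)` at `y + x * ny`; the function name and the tile size of 32 are illustrative choices, not taken from the submission.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Transposes an nx-by-ny matrix tile by tile so that each tile of `in` and
// `out` stays in cache while it is being read and written.
void transpose_tiled(std::vector<float> &out, std::vector<float> const &in,
                     std::size_t nx, std::size_t ny) {
    constexpr std::size_t B = 32;  // tile edge length (assumed, tune per machine)
    out.resize(nx * ny);
    for (std::size_t xb = 0; xb < nx; xb += B) {
        for (std::size_t yb = 0; yb < ny; yb += B) {
            std::size_t x_end = std::min(xb + B, nx);  // clamp partial tiles
            std::size_t y_end = std::min(yb + B, ny);
            for (std::size_t x = xb; x < x_end; x++) {
                for (std::size_t y = yb; y < y_end; y++) {
                    out[y + x * ny] = in[x + y * nx];
                }
            }
        }
    }
}
```

Within a tile the strided reads from `in` touch at most 32 distinct rows, so the cache lines loaded for one `x` are still resident when the next `x` needs them.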

out(x, y) = 0; // Is manual initialization necessary? 5 points
for (int t = 0; t < nt; t++) {
out(x, y) += lhs(x, t) * rhs(t, y);
// ans: lhs is accessed with a stride, rhs is accessed contiguously, and out stays fixed, so the loop cannot be vectorized.

9, one index was missed in the change.

TOCK(matrix_multiply);
}

// Compute R^T A R
static void matrix_RtAR(Matrix &RtAR, Matrix const &R, Matrix const &A) {
TICK(matrix_RtAR);
// These two are temporaries; what can be optimized here? 5 points
Matrix Rt, RtA;
// ans: Make them static variables and preallocate their storage.

3, thread_local should be added, and there is no need to initialize the size.

// #pragma omp parallel for collapse(2)
// for (int y = 0; y < ny; y++) {
// for (int x = 0; x < nx; x++) {
// out(x, y) = 0; // Is manual initialization necessary? 5 points

5
