hw07 done! #1

yangyueren · 2022-01-27T17:12:50Z

Please see ANSWER.md.

archibate

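Thanks for being the first to submit the homework!

Completed the basic homework requirements: 42/50 points
Able to explain the answers in your own words in ANSWER.md: 23/25 points
Code is well formatted and cross-platform: 4/5 points
Has unique innovations of your own: 11/20 points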

archibate · 2022-01-29T14:37:35Z

main.cpp

    TOCK(matrix_multiply);
 }

 // Compute R^T A R
 static void matrix_RtAR(Matrix &RtAR, Matrix const &R, Matrix const &A) {
    TICK(matrix_RtAR);
    // These two are temporary variables; what can be optimized? 5 points
-    Matrix Rt, RtA;
+    // ans: make them static variables and preallocate their storage.
+    static Matrix Rt(std::array<std::size_t, 2>{1024, 1024}), RtA(std::array<std::size_t, 2>{1024, 1024});


Suggested change

-    static Matrix Rt(std::array<std::size_t, 2>{1024, 1024}), RtA(std::array<std::size_t, 2>{1024, 1024});
+    static thread_local Matrix Rt, RtA;

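I think it is fine for them to start out empty. thread_local gives each thread its own copy, so nothing goes wrong if several threads call this function at once.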
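
A minimal sketch of the idea behind the suggestion above, not the homework's actual code. It uses a plain `std::vector<float>` as a stand-in for `Matrix`, and `sum_doubled` is a hypothetical helper invented for illustration. The scratch buffer starts empty, each thread gets its own instance, and the capacity is kept between calls so repeated calls do not reallocate.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: doubles every element through a scratch buffer
// and returns the sum of the doubled values.
float sum_doubled(const std::vector<float> &in) {
    assert(!in.empty());  // this sketch expects a non-empty input
    // One buffer per thread. It is constructed empty on first use in each
    // thread and keeps its capacity between calls, so only the first call
    // (or a larger input) has to allocate.
    static thread_local std::vector<float> tmp;
    tmp.resize(in.size());
    for (std::size_t i = 0; i < in.size(); i++) {
        tmp[i] = 2.0f * in[i];
    }
    float sum = 0.0f;
    for (float v : tmp) {
        sum += v;
    }
    return sum;
}
```

Because the buffer is sized inside the function, there is no need to hard-code 1024 x 1024 up front.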

archibate · 2022-01-29T14:38:22Z

main.cpp

+        for(int i=0; i<nx; i+=32){
+            for(int t=0; t<nt; t++){
+                for(int i_block=i; i_block<i+32; i_block++){
+                    out(i,j) += lhs(i_block, t) *  rhs(t, j);


Suggested change

-                    out(i,j) += lhs(i_block, t) * rhs(t, j);
+                    out(i_block,j) += lhs(i_block, t) * rhs(t, j);

Looks like one index was left unchanged here?
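
For reference, a self-contained sketch of the blocked loop with the corrected index. `Mat` is a hypothetical stand-in for the homework's `Matrix` (assumed here to store the first index contiguously), and the block size of 32 is taken from the diff. The bounds clamp for sizes that are not a multiple of 32 is an addition of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the homework's Matrix: first index is contiguous.
struct Mat {
    std::size_t nx, ny;
    std::vector<float> data;
    Mat(std::size_t nx_, std::size_t ny_) : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0f) {}
    float &operator()(std::size_t x, std::size_t y) { return data[y * nx + x]; }
    float operator()(std::size_t x, std::size_t y) const { return data[y * nx + x]; }
};

// out(i, j) = sum over t of lhs(i, t) * rhs(t, j)
void matmul_blocked(Mat &out, const Mat &lhs, const Mat &rhs) {
    const std::size_t nx = lhs.nx, nt = lhs.ny, ny = rhs.ny;
    assert(rhs.nx == nt && out.nx == nx && out.ny == ny);
    const std::size_t block = 32;  // block size assumed from the diff
    for (std::size_t j = 0; j < ny; j++) {
        for (std::size_t i = 0; i < nx; i++) out(i, j) = 0.0f;
        for (std::size_t i = 0; i < nx; i += block) {
            // Clamp the block so sizes that are not a multiple of 32 still work.
            const std::size_t i_end = (i + block < nx) ? i + block : nx;
            for (std::size_t t = 0; t < nt; t++) {
                const float r = rhs(t, j);  // constant across the inner loop
                for (std::size_t i_block = i; i_block < i_end; i_block++) {
                    // Row index is i_block, not i: each row of the block
                    // accumulates into its own output element.
                    out(i_block, j) += lhs(i_block, t) * r;
                }
            }
        }
    }
}
```

With `out(i, j)` in the innermost statement, every row of a block would accumulate into the first row of that block, which is the bug the suggestion fixes.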

archibate · 2022-01-29T14:39:21Z

main.cpp

-        for (int y = 0; y < ny; y++) {
-            float val = wangsrng(x, y).next_float();
-            out(x, y) = val;
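+    // ans: the matrix is stored contiguously along the x axis, but the inner loop runs over y, so memory accesses are strided, which is bad for the cache;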
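
A small sketch of the loop order this answer points toward, assuming a flat buffer where element (x, y) lives at `y * nx + x`. `fill_contiguous` and the placeholder value `x + 10 * y` are hypothetical; the homework fills from `wangsrng` instead.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fills an nx-by-ny buffer so that the inner loop walks the contiguous x axis.
void fill_contiguous(std::vector<float> &out, std::size_t nx, std::size_t ny) {
    assert(out.size() == nx * ny);
    for (std::size_t y = 0; y < ny; y++) {      // outer loop: strided axis
        for (std::size_t x = 0; x < nx; x++) {  // inner loop: contiguous axis
            // Placeholder value standing in for the random number.
            out[y * nx + x] = static_cast<float>(x + 10 * y);
        }
    }
}
```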


archibate · 2022-01-29T14:39:37Z

main.cpp

-            out(y, x) = in(x, y);
-        }
-    }
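+    // ans: out is accessed contiguously but in is accessed with a stride, so the data touched does not fit in the cache. It should be changed to a blocked transpose.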
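
A sketch of what a blocked transpose could look like, under the same assumption that the first index is contiguous. `transpose_blocked` and the tile edge of 32 are illustrative choices, not the submitted code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// out(y, x) = in(x, y). in is nx-by-ny with x contiguous,
// out is ny-by-nx with y contiguous.
void transpose_blocked(std::vector<float> &out, const std::vector<float> &in,
                       std::size_t nx, std::size_t ny) {
    assert(in.size() == nx * ny && out.size() == nx * ny);
    const std::size_t block = 32;  // assumed tile edge
    for (std::size_t xb = 0; xb < nx; xb += block) {
        for (std::size_t yb = 0; yb < ny; yb += block) {
            const std::size_t x_end = std::min(xb + block, nx);
            const std::size_t y_end = std::min(yb + block, ny);
            // Within one tile both the reads and the writes stay inside a
            // small region, so strided accesses hit lines already in cache.
            for (std::size_t x = xb; x < x_end; x++) {
                for (std::size_t y = yb; y < y_end; y++) {
                    out[x * ny + y] = in[y * nx + x];
                }
            }
        }
    }
}
```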


archibate · 2022-01-29T14:40:01Z

main.cpp

-            out(x, y) = 0;  // Is manual initialization necessary? 5 points
-            for (int t = 0; t < nt; t++) {
-                out(x, y) += lhs(x, t) * rhs(t, y);
+    // ans: lhs is accessed with a stride, rhs is accessed contiguously, and out stays fixed, which prevents vectorization.


9, one change was missed.

archibate · 2022-01-29T14:40:41Z

main.cpp

    TOCK(matrix_multiply);
 }

 // Compute R^T A R
 static void matrix_RtAR(Matrix &RtAR, Matrix const &R, Matrix const &A) {
    TICK(matrix_RtAR);
    // These two are temporary variables; what can be optimized? 5 points
-    Matrix Rt, RtA;
+    // ans: make them static variables and preallocate their storage.


3, thread_local should be added, and there is no need to initialize the size.

archibate · 2022-01-29T14:41:17Z

main.cpp

+// #pragma omp parallel for collapse(2)
+//     for (int y = 0; y < ny; y++) {
+//         for (int x = 0; x < nx; x++) {
+//             out(x, y) = 0;  // Is manual initialization necessary? 5 points


hw07

db96a1e

archibate reviewed Jan 29, 2022
