Potential Early Computation Issue in compute_q_matmul_k Function #1

qhy991 · 2023-11-29T08:53:17Z

Thank you for your excellent job on HLS.
I've noticed a potential issue in the compute_q_matmul_k function in attention.cpp file. It appears that during the initial stages of computation, many elements within q_blocks are involved in calculations before they have been fully read in. This could potentially lead to inaccuracies in the computed results. Could you please explain the rationale behind this approach?

jimmy-adams · 2024-03-25T03:23:58Z

Thank you for your excellent job on HLS. I've noticed a potential issue in the compute_q_matmul_k function in attention.cpp file. It appears that during the initial stages of computation, many elements within q_blocks are involved in calculations before they have been fully read in. This could potentially lead to inaccuracies in the computed results. Could you please explain the rationale behind this approach?

Hello,
In which version of Vitis and Vivado did you implement this design?

ArkaneMoose · 2024-03-25T20:39:50Z

Hi @jimmy-adams, we used Vitis HLS 2021.1 to synthesize the HLS design and used the resulting IP in Vivado 2022.1 for implementation. Let me know if you have any further questions.

ArkaneMoose · 2024-03-25T21:00:17Z

Thank you for your excellent job on HLS. I've noticed a potential issue in the compute_q_matmul_k function in attention.cpp file. It appears that during the initial stages of computation, many elements within q_blocks are involved in calculations before they have been fully read in. This could potentially lead to inaccuracies in the computed results. Could you please explain the rationale behind this approach?

Hi @qhy991, thanks for your question and apologies for the late reply. Essentially, the algorithm in attention.cpp implements the algorithm described in Section IV.A and depicted in Figure 5 of the paper. Each block streamed out in compute_q_matmul_k corresponds to the outputs of a row in Figure 5, which indeed may contain uninitialized or garbage data for the first and last iterations (represented by the blank spaces in the figure). However, since we know the exact indices at which these garbage data exist, they can and are ignored by the downstream tasks (e.g., here, where we are explicitly doing this check, and here, where we are implicitly discarding the garbage data from the start when v_patch < attn_patch_offset).

I hope this helps. Let me know if you have any further questions.

jimmy-adams · 2024-03-27T10:35:26Z

Hi @jimmy-adams, we used Vitis HLS 2021.1 to synthesize the HLS design and used the resulting IP in Vivado 2022.1 for implementation. Let me know if you have any further questions.
Hello !
Thank you for your kind reply. I follow your given Vitis configuration to try run tcl script. But the processing runs into the error of gmp.h missing. Is that related to my OS configuration?

INFO: [HLS 200-10] Analyzing design file 'src/ViT_compute.cpp' ...
ERROR: [HLS 207-812] 'gmp.h' file not found: src/../include/kernel.hpp:5:10
INFO: [HLS 200-111] Finished Command csynth_design CPU user time: 0.35 seconds. CPU system time: 0.09 seconds. Elapsed time: 0.21 seconds; current allocated memory: 141.787 MB.
command 'ap_source' returned error code
while executing
"source vitis_hls.tcl"
("uplevel" body line 1)
invoked from within
"uplevel #0 [list source $arg] "

ArkaneMoose · 2024-03-27T20:19:44Z

@jimmy-adams Please try commenting out or removing the following lines from include/kernel.hpp:

Edge-MoE/include/kernel.hpp

Lines 4 to 6 in 9d6dd16

    
           // https://support.xilinx.com/s/question/0D52E00006iHkfp/vivado-20153-hls-bug-gmph?language=en_US 
        
           #include <gmp.h> 
        
           #define __gmp_const const

These lines were a workaround for an issue described in this support.xilinx.com thread. They were necessary on our system but may not be necessary for yours. Removing them should not cause any issues.

Let me know if you have any further questions.

qqxx0011 · 2024-11-07T08:15:23Z

I learned from the paper that you use pynq to control the IP core. How do you implement it? Is there any relevant code?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential Early Computation Issue in compute_q_matmul_k Function #1

Potential Early Computation Issue in compute_q_matmul_k Function #1

qhy991 commented Nov 29, 2023

jimmy-adams commented Mar 25, 2024

ArkaneMoose commented Mar 25, 2024

ArkaneMoose commented Mar 25, 2024

jimmy-adams commented Mar 27, 2024

ArkaneMoose commented Mar 27, 2024

qqxx0011 commented Nov 7, 2024

Potential Early Computation Issue in compute_q_matmul_k Function #1

Potential Early Computation Issue in compute_q_matmul_k Function #1

Comments

qhy991 commented Nov 29, 2023

jimmy-adams commented Mar 25, 2024

ArkaneMoose commented Mar 25, 2024

ArkaneMoose commented Mar 25, 2024

jimmy-adams commented Mar 27, 2024

ArkaneMoose commented Mar 27, 2024

qqxx0011 commented Nov 7, 2024