Commit

Merge pull request #214 from ngc92/trimul
added triangular matrix multiplication kernel
karpathy authored Apr 22, 2024
2 parents 6984e83 + 732a8b4 commit 7830cf6
Showing 2 changed files with 579 additions and 2 deletions.
4 changes: 2 additions & 2 deletions dev/cuda/common.h
@@ -86,8 +86,8 @@ void validate_result(T* device_result, const T* cpu_reference, const char* name,
         if (i < 5) {
             printf("%f %f\n", cpu_reference[i], out_gpu[i]);
         }
-        // ensure correctness for all elements
-        if (fabs(cpu_reference[i] - out_gpu[i]) > tolerance) {
+        // ensure correctness for all elements. We can set an "ignore" mask by writing NaN
+        if (fabs(cpu_reference[i] - out_gpu[i]) > tolerance && !isnan(cpu_reference[i])) {
             printf("Mismatch of %s at %d: CPU_ref: %f vs GPU: %f\n", name, i, cpu_reference[i], out_gpu[i]);
             nfaults ++;
             if (nfaults >= 10) {
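The changed comment in `validate_result` describes an "ignore" mask: writing NaN into the CPU reference marks an element as not to be checked. The following is a minimal host-only sketch of that idea, not the repository's code. The helper name `count_mismatches` and its signature are hypothetical; only the comparison condition mirrors the diff above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of dev/cuda/common.h): counts elements whose
// result differs from the reference by more than `tolerance`.
// Reference entries set to NaN act as an "ignore" mask and are never counted.
int count_mismatches(const float* reference, const float* result, int n, float tolerance) {
    assert(n >= 0);
    int nfaults = 0;
    for (int i = 0; i < n; i++) {
        // Same condition as in the diff: a mismatch only counts
        // when the reference value at this index is not NaN.
        if (std::fabs(reference[i] - result[i]) > tolerance && !std::isnan(reference[i])) {
            nfaults++;
        }
    }
    return nfaults;
}
```

A caller could, for example, fill the untouched half of a triangular output with NaN in the reference so that only the computed half is compared. Whether the trimul kernel test does exactly this is an assumption; the test file is not shown here.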
