Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BugFix] Fix Crash Cases Caused by "__tvm_meta__ = None" #16634

Closed
wants to merge 2 commits into from

Conversation

Johnson9009
Copy link
Contributor

Fix #16633

The broken tests are below items of "tests/scripts/task_python_unittest.sh"

"tir-transform"
"tir-usmp"
The crashed cases:
tests/python/tir-transform/test_tir_transform_inject_rolling_buffer.py
tests/python/tir-usmp/test_tir_usmp_algo.py
tests/python/tir-usmp/test_tir_usmp_analysis_extract_bufferinfo.py
tests/python/tir-usmp/test_tir_usmp_transform_convert_pool_allocations_to_offsets.py
tests/python/tir-usmp/test_tir_usmp_utils.py

other failed cases:
tests/python/tir-transform/test_tir_transform_pointer_value_type_rewrite.py::TestRewriteToShuffle::test_compare
tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2
tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4
tests/python/tir-transform/test_transform_default_gpu_schedule.py::test_add_on_metal

@Johnson9009
Copy link
Contributor Author

Run these test cases in local with this patch, below 3 test cases still will be failed because of TIR structure isn't equal, @Hzfengsy @tqchen can you help to see them? Thanks.

tir-transform:
FAILED tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2
FAILED tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4
FAILED tests/python/tir-transform/test_transform_default_gpu_schedule.py::test_add_on_metal

@Johnson9009
Copy link
Contributor Author

The reason of why the crashed test cases won't be caught by CI is the pytest exist code is 5, it means no tests is collected, in file "tests/scripts/setup-pytest-env.sh" we haven't treat the exist code of 5 as error.
image

I found another test item "tests/python/usmp" have the same exist code 5, because now there isn't the directory "tests/python/usmp", so pytest can't collect any test cases, I think "usmp" is a typo from #16110, so I delete it.

@Hzfengsy
Copy link
Member

cc @tqchen

@slyubomirsky
Copy link
Contributor

I had this issue come up in #16569. I think these tests probably were not being executed before. I am also pretty sure it's safe to just remove those lines, as __tvm_meta__ is not actually read anywhere.

In my PR, I added a check in the parser to make sure that decl_function would only be called with function definitions, which raises a readable error message and prevents a segfault.

@slyubomirsky
Copy link
Contributor

Run these test cases in local with this patch, below 3 test cases still will be failed because of TIR structure isn't equal, @Hzfengsy @tqchen can you help to see them? Thanks.

tir-transform: FAILED tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2 FAILED tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4 FAILED tests/python/tir-transform/test_transform_default_gpu_schedule.py::test_add_on_metal

I am also having this error in my PR 😂 For test_tir_transform_force_narrow_index_to_i32, I think it's a real bug (the thread binding in the loop is not being updated). Don't know what's going on with the other two.

@slyubomirsky
Copy link
Contributor

slyubomirsky commented Mar 1, 2024

I've figured out, thanks to @Lunderberg's comments in my PR, that the failure in tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2 is due to line 354 in scr/script/ir_builder/tir/ir.cc. If you change

    IterVar iter_var(Range(nullptr), Var("iter", dtype), IterVarType::kThreadIndex, thread);

to

    IterVar iter_var(Range(nullptr), Var("iter", DataType::Int(32)), IterVarType::kThreadIndex, thread);

(what it was before the merge commit that Eric pointed out), the test will pass. However, I'm not sure we want to manually fix the data type like that.

This probably means that we should update the index narrowing pass to deal with the thread index itervar accordingly.

@slyubomirsky
Copy link
Contributor

I did a git bisect for test_tir_transform_hoist_if.py: eb15d04c3bff76062e26d5647fb8af0323de1bed is the first bad commit for tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4. I will investigate to see if there's a fix hinted in there.

@slyubomirsky
Copy link
Contributor

Bisect for test_transform_default_gpu_schedule.py::test_add_on_metal: First failing commit is 1c35c392648e4336fc5e00ab91abb37af997cd59. It looks to have made changes to test cases, but I'm not sure what exactly is causing the failure. Now that there are commits isolated, it should make it easier to debug.

@slyubomirsky
Copy link
Contributor

slyubomirsky commented Mar 5, 2024

Fix for the add_on_metal bug: You need to update IsScheduledOnGPU() in default_gpu_schedule.cc to also include Metal in its definition of GPU.

bool IsScheduledOnGPU(const BaseFunc& func) {
  // the target from context.
  tvm::Target target = tvm::Target::Current();
  // the Target in kTarget attribute of PrimFunc
  Optional<tvm::Target> func_target = func->attrs.GetAttr<tvm::Target>(tvm::attr::kTarget);
  if (func_target.defined()) {
    target = func_target.value();
  }

  if (target.defined()) {
    int dev_type = target->GetTargetDeviceType();
    // Change is here. I don't know if web GPU is quite correct
    if (!(dev_type == kDLCUDA || dev_type == kDLMetal || dev_type == kDLROCM ||
          dev_type == kDLWebGPU)) {
      return false;
    }
  }
  return true;
}

Note that this issue was mentioned in the code review for the PR. I will include this fix in my PR.

@Lunderberg
Copy link
Contributor

You need to update IsScheduledOnGPU() in default_gpu_schedule.cc to also include Metal in its definition of GPU.

Should we instead check whether the target tags include "gpu"? That would avoid needing to list out each target kind individually.

@slyubomirsky
Copy link
Contributor

slyubomirsky commented Mar 12, 2024

I'm not sure all of them actually contain the string "gpu." I will double check that.

Edit: Indeed, Metal is simply called "Metal"

@slyubomirsky
Copy link
Contributor

These bugs should all be fixed now that #16569 was merged. I recommend you close unless there's a change in here that wasn't covered there.

@slyubomirsky
Copy link
Contributor

#16770 The sage continues...

@ysh329
Copy link
Contributor

ysh329 commented Mar 26, 2024

#16770 The sage continues...

Hi Steven, I'm little confused. Does these three cases been fixed?

I tried main with commit: 240326, b2204ae and update status as below. It seems still two cases failed:

  • still failed: tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2
  • passed: tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4
  • still failed: tests/python/tir-transform/test_transform_default_gpu_schedule.py::test_add_on_metal

@ysh329
Copy link
Contributor

ysh329 commented Apr 9, 2024

#16770 The sage continues...

Hi Steven, I'm little confused. Does these three cases been fixed?

I tried main with commit: 240326, b2204ae and update status as below. It seems still two cases failed:

  • still failed: tests/python/tir-transform/test_tir_transform_force_narrow_index_to_i32.py::test_thread_axis2
  • passed: tests/python/tir-transform/test_tir_transform_hoist_if.py::test_hoisting_block_scope_4
  • still failed: tests/python/tir-transform/test_transform_default_gpu_schedule.py::test_add_on_metal

Test with tvm.main.20240407.81a850693, 3 tests all passed.

@tqchen tqchen closed this Feb 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Broken Tests Haven't Been Caught by CI
6 participants