[Bug] MetaScheduler Literal value exceeds maximum of int32 #15987

Open
malixian opened this issue Oct 26, 2023 · 4 comments
Labels
needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it type: bug

Comments

@malixian

Expected behavior

I am trying to use MetaSchedule to tune a matmul with matrix dimensions m=8192, n=14336, k=8192.
When n=8192, everything works, but as soon as m or n equals 14336, the following error occurs: RuntimeError: parallel_for_dynamic error with [02:23:57] /home/malixian/repos/tensorir/tvm/src/ir/expr.cc:88: InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32. By the way, k=14336 works fine.
Based on the error message, I commented out the ICHECK in the IntImm function in expr.cc, and tuning ran normally again.
I think the data types used in TIR should be widened to handle this case.
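
For reference, the comparison quoted in the error message can be reproduced in plain Python. This is only a sketch of the arithmetic, not TVM code: `below_signed_max` is a hypothetical helper that mirrors the `value < 1LL << (dtype.bits() - 1)` expression from the message, and it says nothing about where in the tuning pipeline the literal is created.

```python
def below_signed_max(value: int, bits: int) -> bool:
    """Hypothetical helper mirroring the check text in the error message:
    value < 1LL << (dtype.bits() - 1)."""
    return value < (1 << (bits - 1))


literal = 8589934591  # the value reported in the error message

# The literal is exactly 2**33 - 1.
print(literal == 2**33 - 1)

# The int32 bound from the message is 1 << 31 == 2147483648.
print(1 << 31)

# The literal fails the 32-bit comparison but would pass a 64-bit one.
print(below_signed_max(literal, 32))
print(below_signed_max(literal, 64))
```

So the reported literal needs at least 34 bits as a signed integer, which is why it would fit in an int64 but not in an int32.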

Actual behavior

Tuning fails with: RuntimeError: parallel_for_dynamic error with [02:23:57] /home/malixian/repos/tensorir/tvm/src/ir/expr.cc:88: InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32

Environment

TVM version is '0.15.dev0'

Steps to reproduce

import tempfile

import tvm
from tvm import meta_schedule as ms
from tvm import te
from tvm.meta_schedule.builder import LocalBuilder
from tvm.target import Target


def matmul_fp16(M: int, N: int, K: int, in_dtype: str, out_dtype: str):
    # C[i, j] = sum_k X[i, k] * Y[k, j], accumulated in out_dtype
    x = te.placeholder((M, K), name="X", dtype=in_dtype)
    y = te.placeholder((K, N), name="Y", dtype=in_dtype)
    k = te.reduce_axis((0, K), name="k")
    c = te.compute(  # pylint: disable=invalid-name
        (M, N),
        lambda i, j: te.sum(x[i][k].astype(out_dtype) * y[k][j].astype(out_dtype), axis=[k]),
        name="C",
    )
    return (x, y, c)


def tune(in_dtype, out_dtype):
    target = Target("nvidia/nvidia-a100")
    # The error appears when M or N is 14336; N=8192 tunes without problems.
    M, N, K = 8192, 14336, 8192
    func = te.create_prim_func(
        matmul_fp16(M=M, N=N, K=K, in_dtype=in_dtype, out_dtype=out_dtype)
    ).with_attr({"global_symbol": "main"})

    space = ms.space_generator.PostOrderApply(
        sch_rules="cuda-tensorcore",
        postprocs="cuda-tensorcore",
        mutator_probs="cuda-tensorcore",
    )

    mod = tvm.IRModule({"main": func})
    with tempfile.TemporaryDirectory() as work_dir:
        db = ms.tir_integration.tune_tir(
            mod=mod,
            target=target,
            work_dir=work_dir,
            max_trials_global=32,
            # NOTE: `initializer` is not defined in this report; it must be
            # supplied by the caller before running this function.
            builder=LocalBuilder(
                f_build="meta_schedule.builder.async_build", initializer=initializer
            ),
            space=space,
        )
        sch = db.query_schedule(mod, target=target, workload_name="main")
        with tvm.transform.PassContext(config={"tir.use_async_copy": 1}):
            rt_mod = tvm.build(sch.mod, target=target)
@malixian malixian added needs-triage PRs or issues that need to be investigated by maintainers to find the right assignees to address it type: bug labels Oct 26, 2023
@malixian
Author

Hi @wrongtest-intellif, I saw that you submitted a related PR before. Could you give me some suggestions for fixing this?

@MasterJianxing

I ran into the same problem. Did you manage to solve it in the end?

@malixian
Author

malixian commented Feb 5, 2024

I tried to comment out the ICHECK code of the function IntImm in expr.cc and it worked normally.

@MasterJianxing

> I tried to comment out the ICHECK code of the function IntImm in expr.cc and it worked normally.

Thanks, but that doesn't seem like a safe way to solve it, haha.
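
To illustrate the safety concern, here is a small sketch of what happens to the reported literal if it ever gets stored in 32 bits. This is an assumption for illustration only: nothing in this thread confirms that TVM truncates the value on this code path, and `wrap_to_int32` is a hypothetical helper, not a TVM function.

```python
def wrap_to_int32(value: int) -> int:
    """Hypothetical helper: two's-complement wraparound of an integer to 32 bits."""
    low = value & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the low 32 bits as a signed value.
    return low - (1 << 32) if low >= (1 << 31) else low


literal = 8589934591  # 2**33 - 1, the value from the error message

# The low 32 bits are all ones, so the wrapped value is -1.
print(wrap_to_int32(literal))
```

If a value like this were silently narrowed somewhere after the check is removed, the result would be -1 rather than the intended number, so a run that "works" with the ICHECK commented out is not necessarily producing correct results.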


2 participants