[Bug] "TVM Internal Error" CodeFuse-Deepseek Compilation #1911

Closed
BuildBackBuehler opened this issue Mar 9, 2024 · 8 comments
Labels
bug Confirmed bugs

Comments

@BuildBackBuehler

BuildBackBuehler commented Mar 9, 2024

🐛 Bug

I have a feeling this error comes from my disregarding the differences between this model and CodeLlama. (I'm too much of a novice to know which nuts and bolts need changing to mirror CodeLlama's terms so that the conv-template conversion of the tokenizer config goes through seamlessly.)

Immediate Error:

tvm.error.InternalError: Traceback (most recent call last):
  File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/ir/expr.cc", line 88
InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32

I have the full one below.
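For reference, the numbers in that message can be checked by hand. The sketch below re-implements only the comparison quoted in the error text (`value < 1LL << (dtype.bits() - 1)`) in plain Python; the helper names are made up for illustration and are not TVM APIs.

```python
def int_imm_upper_bound(bits: int) -> int:
    """Exclusive upper bound from the quoted check: 1 << (bits - 1)."""
    return 1 << (bits - 1)


def fits(value: int, bits: int) -> bool:
    """True when `value` would pass the quoted check for a signed `bits`-wide int."""
    return value < int_imm_upper_bound(bits)


value = 8589934591  # the literal reported in the error

print(value == 2**33 - 1)       # the literal is 2^33 - 1
print(int_imm_upper_bound(32))  # 2147483648, the right-hand side in the message
print(fits(value, 32))          # False: too large for int32
print(fits(value, 64))          # True: would fit in int64
```

So the literal is a 34-bit quantity being materialised with a 32-bit dtype; the check itself is behaving as written.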

To Reproduce

https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B/tree/main (this is DeepSeekCoder's base model but, IIRC, it's beefed up like an Instruct model, and that's how I intended to use it)

Steps to reproduce the behavior:

  1. Convert Model weights
  2. Generate the pre-compilation config.
  3. Compile!
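As a rough sketch, the three steps correspond to three `mlc_chat` subcommands. The snippet only assembles and prints the argument lists. The `gen_config` and `compile` arguments follow the logs in this report; the `convert_weight` line is an assumption about how step 1 was run, since that command is not shown here. `build_commands` is a hypothetical helper, and the paths are the reporter's.

```python
import shlex
from typing import List


def build_commands(model_dir: str, out_dir: str, quant: str, lib_path: str) -> List[List[str]]:
    """Return argv lists for the three steps (nothing is executed here)."""
    return [
        # Step 1 (assumed invocation; not shown in the report).
        ["mlc_chat", "convert_weight", model_dir, "--quantization", quant, "-o", out_dir],
        # Step 2, as in the gen_config log.
        ["mlc_chat", "gen_config", model_dir, "--quantization", quant,
         "--conv-template", "codellama_instruct", "-o", out_dir],
        # Step 3, as in the compile log.
        ["mlc_chat", "compile", f"{out_dir}/mlc-chat-config.json",
         "--device", "metal", "-o", lib_path],
    ]


commands = build_commands("./CFDS", "./Codefuse-Deepseek", "q4f16_1",
                          "../lib/CodeFuse-Deepseek.so")
for argv in commands:
    print(shlex.join(argv))
```

Passing each list to `subprocess.run(argv, check=True)` would run the steps in order and stop at the first failure.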

So, for context, here is the last step prior to the error


`~/local/gitrepos/mlc-llm/dist/models main +9 !4 ?31 > mlc_chat gen_config ./CFDS --quantization q4f16_1 --conv-template codellama_instruct -o ./Codefuse-Deepseek

[2024-03-08 20:17:46] INFO auto_config.py:115: Found model configuration: CFDS/config.json
[2024-03-08 20:17:46] INFO auto_config.py:153: Found model type: llama. Use --model-type to override.
[2024-03-08 20:17:46] INFO llama_model.py:52: context_window_size not found in config.json. Falling back to max_position_embeddings (16384)
[2024-03-08 20:17:46] INFO llama_model.py:72: prefill_chunk_size defaults to context_window_size (16384)
[2024-03-08 20:17:46] INFO config.py:106: Overriding max_batch_size from 1 to 80
[2024-03-08 20:17:46] INFO gen_config.py:121: [generation_config.json] Setting bos_token_id: 32013
[2024-03-08 20:17:46] INFO gen_config.py:121: [generation_config.json] Setting eos_token_id: 32014
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/tokenizer.model
[2024-03-08 20:17:46] INFO gen_config.py:133: Found tokenizer config: CFDS/tokenizer.json. Copying to Codefuse-Deepseek/tokenizer.json
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/vocab.json
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/merges.txt
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/added_tokens.json
[2024-03-08 20:17:46] INFO gen_config.py:133: Found tokenizer config: CFDS/tokenizer_config.json. Copying to Codefuse-Deepseek/tokenizer_config.json
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting pad_token_id: 0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting temperature: 0.7
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting presence_penalty: 0.0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting frequency_penalty: 0.0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting repetition_penalty: 1.0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting top_p: 0.95
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting mean_gen_len: 128
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting max_gen_len: 512
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting shift_fill_factor: 0.3
[2024-03-08 20:17:46] INFO gen_config.py:186: Dumping configuration file to: Codefuse-Deepseek/mlc-chat-config.json`

Expected behavior

Environment

MacOS, Macbook Pro, M1 Max (Silicon/Metal/MPS)

  • How you installed MLC-LLM (conda, source): Conda
  • How you installed TVM-Unity (pip, source): pip
  • Useragent: conda/23.11.0 requests/2.31.0 CPython/3.10.13 Darwin/23.4.0 OSX/14.4 solver/libmamba conda-libmamba-solver/23.12.0 libmambapy/1.5.5
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):

USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU:
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: f06d486b4a1a27f0bbb072688a5fc41e7b15323c
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-03-08 02:04:22 -0500
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 15.0.7
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_CUBLAS: OFF
USE_METAL: ON
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON

  • Any other relevant information:

Full
mlc_chat compile ./CodeFuse-Deepseek/mlc-chat-config.json --device metal -o ../lib/CodeFuse-Deepseek.so
[2024-03-08 20:23:36] INFO auto_config.py:69: Found model configuration: CodeFuse-Deepseek/mlc-chat-config.json
[2024-03-08 20:23:36] INFO auto_device.py:76: Found device: metal:0
[2024-03-08 20:23:36] INFO auto_target.py:70: Found configuration of target device "metal:0": {"thread_warp_size": 32, "max_threads_per_block": 1024, "max_function_args": 31, "max_num_threads": 256, "kind": "metal", "max_shared_memory_per_block": 32768, "tag": "", "keys": ["metal", "gpu"]}
[2024-03-08 20:23:36] INFO auto_target.py:102: Found host LLVM triple: arm64-apple-darwin23.4.0
[2024-03-08 20:23:36] INFO auto_target.py:103: Found host LLVM CPU: apple-m1
[2024-03-08 20:23:36] INFO auto_config.py:153: Found model type: llama. Use --model-type to override.
Compiling with arguments:
  --config LlamaConfig(hidden_size=7168, intermediate_size=19200, num_attention_heads=56, num_hidden_layers=62, rms_norm_eps=1e-06, vocab_size=32256, position_embedding_base=100000, context_window_size=16384, prefill_chunk_size=16384, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, max_batch_size=80, kwargs={})
  --quantization GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7)
  --model-type llama
  --target {"thread_warp_size": 32, "host": {"mtriple": "arm64-apple-darwin23.4.0", "tag": "", "kind": "llvm", "mcpu": "apple-m1", "keys": ["arm_cpu", "cpu"]}, "max_threads_per_block": 1024, "max_function_args": 31, "max_num_threads": 256, "kind": "metal", "max_shared_memory_per_block": 32768, "tag": "", "keys": ["metal", "gpu"]}
  --opt flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0
  --system-lib-prefix ""
  --output ../lib/CodeFuse-Deepseek.so
  --overrides context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None
[2024-03-08 20:23:36] INFO compile.py:136: Creating model from: LlamaConfig(hidden_size=7168, intermediate_size=19200, num_attention_heads=56, num_hidden_layers=62, rms_norm_eps=1e-06, vocab_size=32256, position_embedding_base=100000, context_window_size=16384, prefill_chunk_size=16384, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, max_batch_size=80, kwargs={})
[2024-03-08 20:23:37] INFO compile.py:155: Exporting the model to TVM Unity compiler
[2024-03-08 20:23:38] INFO compile.py:161: Running optimizations using TVM Unity
[2024-03-08 20:23:38] INFO compile.py:174: Registering metadata: {'model_type': 'llama', 'quantization': 'q4f16_1', 'context_window_size': 16384, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 16384, 'tensor_parallel_shards': 1, 'kv_cache_bytes': 0}
[2024-03-08 20:23:38] INFO pipeline.py:46: Running TVM Relax graph-level optimizations
[2024-03-08 20:27:11] INFO pipeline.py:46: Lowering to TVM TIR kernels
[2024-03-08 20:27:16] INFO pipeline.py:46: Running TVM TIR-level optimizations
[2024-03-08 20:27:38] INFO pipeline.py:46: Running TVM Dlight low-level optimizations
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/bin/mlc_chat", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/__main__.py", line 24, in main
    cli.main(sys.argv[2:])
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/cli/compile.py", line 131, in main
    compile(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 230, in compile
    _compile(args, model_config)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 177, in _compile
    args.build_func(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/support/auto_target.py", line 242, in build
    relax.build(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/relax/vm_build.py", line 335, in build
    mod = pipeline(mod)
          ^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/compiler_pass/pipeline.py", line 159, in _pipeline
    mod = seq(mod)
          ^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/ir/transform.py", line 307, in _pass_func
    return inst.transform_module(mod, ctx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/dlight/base/transform.py", line 64, in transform_module
    sch = _apply_rules(func, target, self.rules, tunable=False)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/dlight/base/transform.py", line 80, in _apply_rules
    space = rule.apply(func, target, tunable)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/dlight/gpu/fallback.py", line 77, in apply
    bx, tx = sch.split( # pylint: disable=invalid-name
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/tir/schedule/_type_checker.py", line 340, in wrap
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/tir/schedule/schedule.py", line 811, in split
    _ffi_api.ScheduleSplit( # type: ignore # pylint: disable=no-member
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 277, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm.error.InternalError: Traceback (most recent call last):
  File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/ir/expr.cc", line 88
InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32

Additional context

Based on what I've read elsewhere, I imagine I'm running into issues because I did not build from source, particularly TVM. My other thought is that seeing int32 is odd to me; I'm wondering if it is supposed to be int64 and whether there is just some env flag I need. I also saw someone mention the ability to use --sep-embed (as it is a llama model). I don't know if that was phased out, but it didn't work for me.
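On the int32 question, a quick way to see how sizes in this configuration can outgrow a 32-bit index is to multiply a few of the values from the compile log. This is only an arithmetic illustration, not a diagnosis: the report does not identify which tensor or loop produces the offending literal, and the `(batch, tokens, hidden)` shape below is a hypothetical example.

```python
import math

INT32_MAX = 2**31 - 1


def element_count(*dims: int) -> int:
    """Number of elements in a tensor with the given dimensions."""
    return math.prod(dims)


def fits_int32(*dims: int) -> bool:
    """True when a flat index over all elements fits in a signed 32-bit int."""
    return element_count(*dims) <= INT32_MAX


# Values taken from the LlamaConfig printed in the compile log.
hidden_size = 7168
prefill_chunk_size = 16384
max_batch_size = 80

# Hypothetical activation shapes built from those values.
single = (1, prefill_chunk_size, hidden_size)
batched = (max_batch_size, prefill_chunk_size, hidden_size)

for shape in (single, batched):
    print(shape, element_count(*shape), fits_int32(*shape))
```

With one sequence the count stays under the int32 limit; with `max_batch_size=80` it does not, which is the kind of situation where 32-bit loop extents stop being enough.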

Thanks for any help!

@BuildBackBuehler BuildBackBuehler added the bug Confirmed bugs label Mar 9, 2024
@MasterJH5574
Member

Thank you for reporting! We will take a look at that.

@MasterJH5574
Member

BTW, --sep-embed is a flag from our old compilation flow. I don't think it is meant to address the issue here.

@BuildBackBuehler
Author

Not a problem! Now that I went ahead and did it through the source-built TVM-Unity, it seems things are no better, but this error elucidates more, so I am including it.

[2024-03-10 20:19:53] INFO pipeline.py:47: Running TVM Dlight low-level optimizations
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/bin/mlc_chat", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.11/site-packages/mlc_chat/__main__.py", line 24, in main
    cli.main(sys.argv[2:])
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.11/site-packages/mlc_chat/cli/compile.py", line 131, in main
    compile(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.11/site-packages/mlc_chat/interface/compile.py", line 230, in compile
    _compile(args, model_config)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.11/site-packages/mlc_chat/interface/compile.py", line 177, in _compile
    args.build_func(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.11/site-packages/mlc_chat/support/auto_target.py", line 242, in build
    relax.build(
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/python/tvm/relax/vm_build.py", line 335, in build
    mod = pipeline(mod)
          ^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/python/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
    raise_last_ffi_error()
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.11/site-packages/mlc_chat/compiler_pass/pipeline.py", line 161, in _pipeline
    mod = seq(mod)
          ^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/python/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/python/tvm/ir/transform.py", line 307, in _pass_func
    return inst.transform_module(mod, ctx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/python/tvm/dlight/base/transform.py", line 64, in transform_module
    sch = _apply_rules(func, target, self.rules, tunable=False)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/python/tvm/dlight/base/transform.py", line 80, in _apply_rules
    space = rule.apply(func, target, tunable)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/python/tvm/dlight/gpu/fallback.py", line 77, in apply
    bx, tx = sch.split( # pylint: disable=invalid-name
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/python/tvm/tir/schedule/_type_checker.py", line 340, in wrap
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/python/tvm/tir/schedule/schedule.py", line 811, in split
    _ffi_api.ScheduleSplit( # type: ignore # pylint: disable=no-member
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/tir/schedule/traced_schedule.cc", line 230, in tvm::tir::TracedScheduleNode::Split(tvm::tir::LoopRV const&, tvm::runtime::Array<tvm::runtime::Optional<tvm::PrimExpr>, void> const&, bool)
    Array<LoopRV> results = ConcreteScheduleNode::Split(loop_rv, factor_rvs, preserve_unit_iters);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/tir/schedule/concrete_schedule.cc", line 505, in tvm::tir::ConcreteScheduleNode::Split(tvm::tir::LoopRV const&, tvm::runtime::Array<tvm::runtime::Optional<tvm::PrimExpr>, void> const&, bool)
    results = tir::Split(state_, loop_sref, factors, preserve_unit_iters);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/tir/schedule/primitive/loop_transformation.cc", line 436, in tvm::tir::Split(tvm::tir::ScheduleState, tvm::tir::StmtSRef const&, tvm::runtime::Array<tvm::PrimExpr, void> const&, bool)
    if (!analyzer.CanProve(predicate, arith::ProofStrength::kSymbolicBound)) {
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/analyzer.cc", line 194, in tvm::arith::Analyzer::CanProve(tvm::PrimExpr const&, tvm::arith::ProofStrength)
    PrimExpr simplified = Simplify(expr);
    ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/analyzer.cc", line 238, in tvm::arith::Analyzer::Simplify(tvm::PrimExpr const&, int)
    res = this->canonical_simplify(res);
    ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc", line 1440, in tvm::arith::CanonicalSimplifier::operator()(tvm::PrimExpr const&)
    return impl_->CanonicalSimplify(expr);
    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc", line 587, in tvm::arith::CanonicalSimplifier::Impl::CanonicalSimplify(tvm::PrimExpr)
    expr = operator()(expr);
    ^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc", line 593, in tvm::arith::CanonicalSimplifier::Impl::VisitExpr(tvm::PrimExpr const&)
    auto expr = Rewriter::VisitExpr(input_expr);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc", line 1423, in tvm::arith::CanonicalSimplifier::Impl::VisitExpr_(tvm::tir::LTNode const*)
    return Rewriter::VisitExpr(divisible->Normalize() < make_zero(dtype));
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc", line 1432, in tvm::arith::CanonicalSimplifier::Impl::VisitExpr_(tvm::tir::LTNode const*)
    return Rewriter::VisitExpr(divisible->Normalize() + extra_expr < make_zero(dtype));
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc", line 1412, in tvm::arith::CanonicalSimplifier::Impl::VisitExpr_(tvm::tir::LTNode const*)
    return Rewriter::VisitExpr_(op);
    ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/rewrite_simplify.cc", line 1614, in tvm::arith::RewriteSimplifier::Impl::VisitExpr_(tvm::tir::LTNode const*)
    LT node = Downcast<LT>(IRMutatorWithAnalyzer::VisitExpr_(op));
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc", line 594, in tvm::arith::CanonicalSimplifier::Impl::VisitExpr(tvm::PrimExpr const&)
    return Normalize(expr);
    ^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc", line 666, in tvm::arith::CanonicalSimplifier::Impl::Normalize(tvm::PrimExpr)
    return op->Normalize();
    ^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc", line 258, in tvm::arith::SumExprNode::Normalize() const
    return Normalize_(this->dtype, SimplifySplitExprs(args), base);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc", line 509, in tvm::arith::SumExprNode::Normalize_(tvm::runtime::DataType, std::__1::vector<tvm::arith::SplitExpr, std::__1::allocator<tvm::arith::SplitExpr>> const&, long long)
    res = res + args[i]->Normalize();
    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc", line 149, in tvm::arith::SplitExprNode::Normalize() const
    PrimExpr Normalize() const final { return NormalizeWithScale(1); }
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc", line 136, in tvm::arith::SplitExprNode::NormalizeWithScale(long long) const
    res = ModImpl(res, make_const(dtype, this->upper_factor), div_mode);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/include/tvm/tir/op.h", line 963, in tvm::PrimExpr tvm::tir::make_const<long long, void>(tvm::runtime::DataType, long long, tvm::Span)
    return MakeConstScalar(t, value, span);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/include/tvm/tir/op.h", line 927, in tvm::PrimExpr tvm::tir::MakeConstScalar<long long>(tvm::runtime::DataType, long long, tvm::Span)
    if (t.is_int()) return IntImm(t, static_cast<int64_t>(value), span);
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/ir/expr.cc", line 88, in tvm::IntImm::IntImm(tvm::runtime::DataType, long long, tvm::Span)
    ICHECK_LT(value, 1LL << (dtype.bits() - 1))
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
tvm.error.InternalError: Traceback (most recent call last):
  20: tvm::tir::TracedScheduleNode::Split(tvm::tir::LoopRV const&, tvm::runtime::Array<tvm::runtime::Optional<tvm::PrimExpr>, void> const&, bool)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/tir/schedule/traced_schedule.cc:230
  19: tvm::tir::ConcreteScheduleNode::Split(tvm::tir::LoopRV const&, tvm::runtime::Array<tvm::runtime::Optional<tvm::PrimExpr>, void> const&, bool)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/tir/schedule/concrete_schedule.cc:505
  18: tvm::tir::Split(tvm::tir::ScheduleState, tvm::tir::StmtSRef const&, tvm::runtime::Array<tvm::PrimExpr, void> const&, bool)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/tir/schedule/primitive/loop_transformation.cc:436
  17: tvm::arith::Analyzer::CanProve(tvm::PrimExpr const&, tvm::arith::ProofStrength)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/analyzer.cc:194
  16: tvm::arith::Analyzer::Simplify(tvm::PrimExpr const&, int)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/analyzer.cc:238
  15: tvm::arith::CanonicalSimplifier::operator()(tvm::PrimExpr const&)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc:1440
  14: tvm::arith::CanonicalSimplifier::Impl::CanonicalSimplify(tvm::PrimExpr)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc:587
  13: tvm::arith::CanonicalSimplifier::Impl::VisitExpr(tvm::PrimExpr const&)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc:593
  12: tvm::arith::CanonicalSimplifier::Impl::VisitExpr_(tvm::tir::LTNode const*)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc:1423
  11: tvm::arith::CanonicalSimplifier::Impl::VisitExpr_(tvm::tir::LTNode const*)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc:1432
  10: tvm::arith::CanonicalSimplifier::Impl::VisitExpr_(tvm::tir::LTNode const*)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc:1412
  9: tvm::arith::RewriteSimplifier::Impl::VisitExpr_(tvm::tir::LTNode const*)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/rewrite_simplify.cc:1614
  8: tvm::arith::CanonicalSimplifier::Impl::VisitExpr(tvm::PrimExpr const&)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc:594
  7: tvm::arith::CanonicalSimplifier::Impl::Normalize(tvm::PrimExpr)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc:666
  6: tvm::arith::SumExprNode::Normalize() const
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc:258
  5: tvm::arith::SumExprNode::Normalize_(tvm::runtime::DataType, std::__1::vector<tvm::arith::SplitExpr, std::__1::allocator<tvm::arith::SplitExpr>> const&, long long)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc:509
  4: tvm::arith::SplitExprNode::Normalize() const
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc:149
  3: tvm::arith::SplitExprNode::NormalizeWithScale(long long) const
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/arith/canonical_simplify.cc:136
  2: tvm::PrimExpr tvm::tir::make_const<long long, void>(tvm::runtime::DataType, long long, tvm::Span)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/include/tvm/tir/op.h:963
  1: tvm::PrimExpr tvm::tir::MakeConstScalar<long long>(tvm::runtime::DataType, long long, tvm::Span)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/include/tvm/tir/op.h:927
  0: tvm::IntImm::IntImm(tvm::runtime::DataType, long long, tvm::Span)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/ir/expr.cc:88
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/ir/expr.cc", line 88
InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32
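The numbered C++ frames at the end of this trace follow a regular `N: symbol at path:line` shape, so they can be pulled apart mechanically when comparing traces across reports. A minimal sketch, assuming that shape holds; `Frame`, `FRAME_RE` and `parse_frames` are hypothetical names, and the sample text is the last two frames copied from the trace above.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    index: int
    symbol: str
    path: str
    line: int


# "<N>: <symbol> at <absolute path>:<line>"; whitespace before "at" may be a
# space (flattened paste) or a newline plus indentation (original layout).
FRAME_RE = re.compile(r"(?<!\S)(\d+): (.+?)\s+at\s+(/\S+):(\d+)(?!\S)", re.DOTALL)


def parse_frames(text: str) -> List[Frame]:
    """Extract the numbered C++ frames from a pasted TVM traceback."""
    return [Frame(int(i), sym, path, int(line))
            for i, sym, path, line in FRAME_RE.findall(text)]


SAMPLE = (
    "1: tvm::PrimExpr tvm::tir::MakeConstScalar<long long>"
    "(tvm::runtime::DataType, long long, tvm::Span) "
    "at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/include/tvm/tir/op.h:927 "
    "0: tvm::IntImm::IntImm(tvm::runtime::DataType, long long, tvm::Span) "
    "at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/ir/expr.cc:88"
)

frames = parse_frames(SAMPLE)
for frame in frames:
    print(frame.index, frame.path.rsplit("/", 1)[-1], frame.line)
```

Frame 0 is where the check fires (`expr.cc:88`); the higher-numbered frames show the path that led there, starting from the schedule `Split` call.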

@alphaarea

#1919
It looks like we're getting the same error:
InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32

@BuildBackBuehler
Author

BuildBackBuehler commented Mar 11, 2024

#1919 It looks like we're getting the same error: InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32

Out of curiosity, which model and which versions of CMake/tvmc? I realized I didn't attempt to re-version TVM; wish I had.

CMake 3.28.3, TVM 0.16dev0. That second error is from the same model as before. I'd like to see what happens with a smaller model or a different version of CMake/TVM when I get a chance.

Only thing that comes up when I try to search the error...
apache/tvm#15987
I don't know where to do that, or if I want to bother...

Edit 2: This seems like a critical issue: apache/tvm#16661
No? I figured so because it sounds like TVM-Unity, as I sourced it, came from TVM-Relax, and this potentially means it was unable to reconcile with the main TVM-Unity as intended?

@BuildBackBuehler
Author

BuildBackBuehler commented Mar 12, 2024

Had a success today with an MLC-HF model. https://huggingface.co/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC/tree/main

Have to wonder if it is in the lack of overrides.

Registering metadata: {'model_type': 'mixtral', 'quantization': 'q4f16_1', 'context_window_size': -1, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': -1, 'tensor_parallel_shards': 1, 'kv_cache_bytes': 0}
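To make the comparison concrete, the two "Registering metadata" dicts in this thread (the failing CodeFuse compile and this Mixtral one) can be diffed directly. The sketch below only compares the logged values; it does not establish that the differing fields cause the failure. `diff_metadata` is a hypothetical helper.

```python
import ast
from typing import Any, Dict, Tuple

# Metadata as logged during the failing CodeFuse-DeepSeek compile.
codefuse_meta = ast.literal_eval(
    "{'model_type': 'llama', 'quantization': 'q4f16_1', "
    "'context_window_size': 16384, 'sliding_window_size': -1, "
    "'attention_sink_size': -1, 'prefill_chunk_size': 16384, "
    "'tensor_parallel_shards': 1, 'kv_cache_bytes': 0}"
)

# Metadata as logged for the Mixtral model.
mixtral_meta = ast.literal_eval(
    "{'model_type': 'mixtral', 'quantization': 'q4f16_1', "
    "'context_window_size': -1, 'sliding_window_size': -1, "
    "'attention_sink_size': -1, 'prefill_chunk_size': -1, "
    "'tensor_parallel_shards': 1, 'kv_cache_bytes': 0}"
)


def diff_metadata(a: Dict[str, Any], b: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    return {k: (a.get(k), b.get(k))
            for k in sorted(a.keys() | b.keys())
            if a.get(k) != b.get(k)}


for key, (left, right) in diff_metadata(codefuse_meta, mixtral_meta).items():
    print(f"{key}: {left!r} vs {right!r}")
```

Apart from the model type, the only differences are `context_window_size` and `prefill_chunk_size`, which are 16384 in the failing compile and -1 here.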

But that'll be for another day. Just happy to have something under the belt.

Edit: Sigh. Seems same bug is hitting me. And it looks like Apache changed TVM's code as of 3 days ago with ICHECK, in particular. apache/tvm@48992a4

I am attempting to run the Mixtral model, but this is what I am seeing. It's odd because mlc_chat wouldn't recognize my binary-lib.dylib file, so I duplicated it as an .so and it had no problems. I'm wondering if TVM changed a flag I need to be using for Metal/MPS and that's what's led to these breaks? (I don't know anything, just desperate for answers!)

Snippet...
tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/paged_kv_cache.cc:377
  3: tvm::runtime::NDArray::Empty(tvm::runtime::ShapeTuple, DLDataType, DLDevice, tvm::runtime::Optional<tvm::runtime::String>)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/ndarray.cc:227
  2: tvm::runtime::metal::MetalWorkspace::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/metal/metal_device_api.mm:179
  1: void tvm::runtime::metal::AutoReleasePoolWrapper::operator<<<tvm::runtime::metal::MetalWorkspace::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)::$_1>(tvm::runtime::metal::MetalWorkspace::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)::$_1 const&)
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/metal/metal_common.h:89
  0: tvm::runtime::metal::MetalWorkspace::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)::$_1::operator()() const
        at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/metal/metal_device_api.mm:191
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/metal/metal_device_api.mm", line 191
InternalError: Check failed: (buf != nil) is false:
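The frames here run from `paged_kv_cache.cc` through `NDArray::Empty` into Metal's `AllocDataSpace`, so the buffer that could not be allocated is plausibly part of the KV cache. A back-of-the-envelope size estimate can help judge whether a request is realistic for the machine. The sketch uses the usual "keys plus values, per layer, per KV head" formula with float16 elements; Mixtral's configuration is not shown in this thread, so the worked numbers are the CodeFuse `LlamaConfig` values from the earlier compile log, and `kv_cache_bytes` is a hypothetical helper rather than an MLC function.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to hold keys and values for `num_tokens` tokens."""
    # Factor 2 accounts for storing both K and V.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


GIB = 2**30

# LlamaConfig values from the compile log earlier in this thread.
per_sequence = kv_cache_bytes(num_layers=62, num_kv_heads=8, head_dim=128,
                              num_tokens=16384)
print(f"one 16384-token sequence: {per_sequence / GIB:.3f} GiB")

# Same estimate scaled by the logged max_batch_size of 80.
print(f"80 such sequences: {80 * per_sequence / GIB:.3f} GiB")
```

Numbers of this size suggest checking the context window and batch size the runtime is actually using before looking for a Metal-specific flag.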

Full Scope...

Details

Traceback (most recent call last):
File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/bin/mlc_chat", line 8, in <module>
sys.exit(main())
^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.11/site-packages/mlc_chat/__main__.py", line 36, in main
cli.main(sys.argv[2:])
File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.11/site-packages/mlc_chat/cli/chat.py", line 41, in main
chat(
File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.11/site-packages/mlc_chat/interface/chat.py", line 133, in chat
cm = ChatModule(model, device, chat_config=config, model_lib_path=model_lib_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.11/site-packages/mlc_chat/chat_module.py", line 783, in __init__
self._reload(self.model_lib_path, self.model_path, user_chat_config_json_str)
File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.11/site-packages/mlc_chat/chat_module.py", line 1002, in _reload
self._reload_func(lib, model_path, app_config_json)
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/python/tvm/_ffi/_ctypes/packed_func.py", line 239, in __call__
raise_last_ffi_error()
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/python/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
raise py_err
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc", line 935, in tvm::runtime::relax_vm::VirtualMachineImpl::_LookupFunction(tvm::runtime::String const&)::$_4::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
self->InvokeClosurePacked(clo, args, rv);
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc", line 529, in tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
clo->impl.CallPacked(TVMArgs(values.data(), tcodes.data(), args.size() + 1), rv);
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc", line 603, in tvm::runtime::relax_vm::VirtualMachineImpl::GetClosureInternal(tvm::runtime::String const&, bool)::$_2::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
*rv = static_cast<VirtualMachineImpl*>(ctx_ptr)->InvokeBytecode(gf_idx, inputs);
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc", line 660, in tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long long, std::__1::vector<tvm::runtime::TVMRetValue, std::__1::allocator<tvm::runtime::TVMRetValue>> const&)
RunLoop();

File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc", line 785, in tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
this->RunInstrCall(curr_frame, instr);
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc", line 738, in tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)
this->InvokeClosurePacked(func_pool_[instr.func_idx], args, &ret);
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc", line 511, in tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
packed->CallPacked(args, rv);
^^^^^^^^^^^^^^^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/paged_kv_cache.cc", line 1215, in tvm::runtime::relax_vm::$_1::operator()(tvm::runtime::ShapeTuple, long long, long long, long long, long long, int, double, double, tvm::runtime::NDArray, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>) const
ObjectPtr n = make_object(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/include/tvm/runtime/memory.h", line 196, in tvm::runtime::ObjectPtr<tvm::runtime::relax_vm::PagedAttentionKVCacheObj> tvm::runtime::make_object<tvm::runtime::relax_vm::PagedAttentionKVCacheObj, long long&, long long&, long long&, long long&, long long&, long long&, long long&, long long&, tvm::runtime::relax_vm::RoPEMode, double&, double&, DLDataType const&, DLDevice const&, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>>(long long&, long long&, long long&, long long&, long long&, long long&, long long&, long long&, tvm::runtime::relax_vm::RoPEMode&&, double&, double&, DLDataType const&, DLDevice const&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::Optional<tvm::runtime::PackedFunc>&&)
return SimpleObjAllocator().make_object(std::forward(args)...);
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/include/tvm/runtime/memory.h", line 72, in tvm::runtime::ObjectPtr<tvm::runtime::relax_vm::PagedAttentionKVCacheObj> tvm::runtime::ObjAllocatorBase<tvm::runtime::SimpleObjAllocator>::make_object<tvm::runtime::relax_vm::PagedAttentionKVCacheObj, long long&, long long&, long long&, long long&, long long&, long long&, long long&, long long&, tvm::runtime::relax_vm::RoPEMode, double&, double&, DLDataType const&, DLDevice const&, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>>(long long&, long long&, long long&, long long&, long long&, long long&, long long&, long long&, tvm::runtime::relax_vm::RoPEMode&&, double&, double&, DLDataType const&, DLDevice const&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::Optional<tvm::runtime::PackedFunc>&&)
T* ptr = Handler::New(static_cast<Derived*>(this), std::forward(args)...);
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/include/tvm/runtime/memory.h", line 122, in tvm::runtime::relax_vm::PagedAttentionKVCacheObj* tvm::runtime::SimpleObjAllocator::Handler<tvm::runtime::relax_vm::PagedAttentionKVCacheObj>::New<long long&, long long&, long long&, long long&, long long&, long long&, long long&, long long&, tvm::runtime::relax_vm::RoPEMode, double&, double&, DLDataType const&, DLDevice const&, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>>(tvm::runtime::SimpleObjAllocator*, long long&, long long&, long long&, long long&, long long&, long long&, long long&, long long&, tvm::runtime::relax_vm::RoPEMode&&, double&, double&, DLDataType const&, DLDevice const&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::Optional<tvm::runtime::PackedFunc>&&)
new (data) T(std::forward(args)...);
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/paged_kv_cache.cc", line 373, in tvm::runtime::relax_vm::PagedAttentionKVCacheObj::PagedAttentionKVCacheObj(long long, long long, long long, long long, long long, long long, long long, long long, tvm::runtime::relax_vm::RoPEMode, double, double, DLDataType, DLDevice, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>)
f_debug_get_kv_(std::move(f_debug_get_kv)) {
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/paged_kv_cache.cc", line 377, in tvm::runtime::relax_vm::PagedAttentionKVCacheObj::PagedAttentionKVCacheObj(long long, long long, long long, long long, long long, long long, long long, long long, tvm::runtime::relax_vm::RoPEMode, double, double, DLDataType, DLDevice, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>)
NDArray::Empty({num_total_pages, 2, num_kv_heads, page_size, head_dim}, dtype, device));
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/ndarray.cc", line 227, in tvm::runtime::NDArray::Empty(tvm::runtime::ShapeTuple, DLDataType, DLDevice, tvm::runtime::Optional<tvm::runtime::String>)
->AllocDataSpace(ret->device, shape.size(), shape.data(), ret->dtype, mem_scope);
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/metal/metal_device_api.mm", line 179, in tvm::runtime::metal::MetalWorkspace::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)
AUTORELEASEPOOL {
^^
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/metal/metal_common.h", line 89, in void tvm::runtime::metal::AutoReleasePoolWrapper::operator<<<tvm::runtime::metal::MetalWorkspace::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)::$_1>(tvm::runtime::metal::MetalWorkspace::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)::$_1 const&)
f();

File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/metal/metal_device_api.mm", line 191, in tvm::runtime::metal::MetalWorkspace::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)::$_1::operator()() const
ICHECK(buf != nil);
^^^^^^
tvm.error.InternalError: Traceback (most recent call last):
19:
18:
17:
16: tvm::runtime::relax_vm::VirtualMachineImpl::_LookupFunction(tvm::runtime::String const&)::$_4::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc:935
15: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc:529
14: tvm::runtime::relax_vm::VirtualMachineImpl::GetClosureInternal(tvm::runtime::String const&, bool)::$_2::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc:603
13: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long long, std::__1::vector<tvm::runtime::TVMRetValue, std::__1::allocator<tvm::runtime::TVMRetValue>> const&)
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc:660
12: tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc:785
11: tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc:738
10: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc:511
9: tvm::runtime::relax_vm::$_1::operator()(tvm::runtime::ShapeTuple, long long, long long, long long, long long, int, double, double, tvm::runtime::NDArray, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>) const
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/paged_kv_cache.cc:1215
8: tvm::runtime::ObjectPtr<tvm::runtime::relax_vm::PagedAttentionKVCacheObj> tvm::runtime::make_object<tvm::runtime::relax_vm::PagedAttentionKVCacheObj, long long&, long long&, long long&, long long&, long long&, long long&, long long&, long long&, tvm::runtime::relax_vm::RoPEMode, double&, double&, DLDataType const&, DLDevice const&, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>>(long long&, long long&, long long&, long long&, long long&, long long&, long long&, long long&, tvm::runtime::relax_vm::RoPEMode&&, double&, double&, DLDataType const&, DLDevice const&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::Optional<tvm::runtime::PackedFunc>&&)
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/include/tvm/runtime/memory.h:196
7: tvm::runtime::ObjectPtr<tvm::runtime::relax_vm::PagedAttentionKVCacheObj> tvm::runtime::ObjAllocatorBase<tvm::runtime::SimpleObjAllocator>::make_object<tvm::runtime::relax_vm::PagedAttentionKVCacheObj, long long&, long long&, long long&, long long&, long long&, long long&, long long&, long long&, tvm::runtime::relax_vm::RoPEMode, double&, double&, DLDataType const&, DLDevice const&, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>>(long long&, long long&, long long&, long long&, long long&, long long&, long long&, long long&, tvm::runtime::relax_vm::RoPEMode&&, double&, double&, DLDataType const&, DLDevice const&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::Optional<tvm::runtime::PackedFunc>&&)
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/include/tvm/runtime/memory.h:72
6: tvm::runtime::relax_vm::PagedAttentionKVCacheObj* tvm::runtime::SimpleObjAllocator::Handler<tvm::runtime::relax_vm::PagedAttentionKVCacheObj>::New<long long&, long long&, long long&, long long&, long long&, long long&, long long&, long long&, tvm::runtime::relax_vm::RoPEMode, double&, double&, DLDataType const&, DLDevice const&, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>>(tvm::runtime::SimpleObjAllocator*, long long&, long long&, long long&, long long&, long long&, long long&, long long&, long long&, tvm::runtime::relax_vm::RoPEMode&&, double&, double&, DLDataType const&, DLDevice const&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::NullOptType const&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::PackedFunc&&, tvm::runtime::Optional<tvm::runtime::PackedFunc>&&)
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/include/tvm/runtime/memory.h:122
5: tvm::runtime::relax_vm::PagedAttentionKVCacheObj::PagedAttentionKVCacheObj(long long, long long, long long, long long, long long, long long, long long, long long, tvm::runtime::relax_vm::RoPEMode, double, double, DLDataType, DLDevice, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>)
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/paged_kv_cache.cc:373
4: tvm::runtime::relax_vm::PagedAttentionKVCacheObj::PagedAttentionKVCacheObj(long long, long long, long long, long long, long long, long long, long long, long long, tvm::runtime::relax_vm::RoPEMode, double, double, DLDataType, DLDevice, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::Optional<tvm::runtime::PackedFunc>, tvm::runtime::PackedFunc, tvm::runtime::PackedFunc, tvm::runtime::Optional<tvm::runtime::PackedFunc>)
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/paged_kv_cache.cc:377
3: tvm::runtime::NDArray::Empty(tvm::runtime::ShapeTuple, DLDataType, DLDevice, tvm::runtime::Optional<tvm::runtime::String>)
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/ndarray.cc:227
2: tvm::runtime::metal::MetalWorkspace::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/metal/metal_device_api.mm:179
1: void tvm::runtime::metal::AutoReleasePoolWrapper::operator<<<tvm::runtime::metal::MetalWorkspace::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)::$_1>(tvm::runtime::metal::MetalWorkspace::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)::$_1 const&)
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/metal/metal_common.h:89
0: tvm::runtime::metal::MetalWorkspace::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)::$_1::operator()() const
at /Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/metal/metal_device_api.mm:191
File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/metal/metal_device_api.mm", line 191
InternalError: Check failed: (buf != nil) is false:

Also, JIT is on... idk if that matters. Going to put it on redo/readonly and see what happens.
My KV cache appears to be 0 ('kv_cache_bytes': 0 in the registered metadata)! I imagine that is the issue?
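
For anyone who wants to sanity-check the KV-cache question, here is a rough sketch of how large a single allocation of the shape shown in the traceback, {num_total_pages, 2, num_kv_heads, page_size, head_dim}, would be. The function name and all the numbers below are made up for illustration (they are not read from this model's config), and 16-bit elements are an assumption based on the q4f16_1 setting. It only does the arithmetic and does not call into TVM or Metal.

```python
from math import prod


def kv_pages_nbytes(num_total_pages, num_kv_heads, page_size, head_dim, dtype_bits=16):
    """Bytes for one array of shape (num_total_pages, 2, num_kv_heads, page_size, head_dim).

    Hypothetical helper: it mirrors the shape passed to NDArray::Empty in the
    traceback, but it is not part of mlc_chat or TVM.
    """
    shape = (num_total_pages, 2, num_kv_heads, page_size, head_dim)
    if any(dim < 0 for dim in shape):
        # A leftover -1 placeholder would make the size meaningless.
        raise ValueError(f"negative dimension in shape {shape}")
    return prod(shape) * dtype_bits // 8


# Illustrative values only (assumptions, not taken from the issue).
nbytes = kv_pages_nbytes(num_total_pages=2048, num_kv_heads=8, page_size=16, head_dim=128)
print(f"{nbytes / 2**20:.1f} MiB")  # 128.0 MiB
```

If the computed size comes out as zero or far beyond what the GPU can hand out in one buffer, that would be one plausible reason for an allocation to come back nil, though nothing in this thread confirms which case applies here.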

@MasterJH5574
Member

MasterJH5574 commented Mar 12, 2024

Hello everyone @BuildBackBuehler @alphaarea, just want to give an update here. We have identified the cause of the failure and submitted a fix in apache/tvm#16704. The wheel build takes some time, so please wait 1-2 days and then update our pip package. I will also report back when it's fixed in the wheel package.

@MasterJH5574
Member

The package update should be finished. Closing this issue.
