[Bug] "TVM Internal Error" CodeFuse-Deepseek Compilation #1911
Comments
Thank you for reporting! We will take a look at that.
BTW the
Not a problem! Now that I've gone ahead and done it through the source-built TVM-Unity, things seem no better, but this error is more illuminating, so I am including it.
#1919
What model/which versions of CMake/tvmc, out of curiosity? Realized I didn't attempt to revert TVM; wish I had. CMake 3.28.3, TVM 0.16dev0. That 2nd error is the same model as before. I'd like to see what happens with a smaller model and a different version of CMake/TVM when I get a chance. This is the only thing that comes up when I try to search the error... Edit 2: This seems like a critical issue: apache/tvm#16661
Had a success today with an MLC-HF model (https://huggingface.co/junrushao/Mixtral-8x7B-Instruct-v0.1-q4f16_1-MLC/tree/main). Have to wonder if it comes down to the lack of overrides.
But that'll be for another day. Just happy to have something under the belt. Edit: Sigh, it seems the same bug is hitting me. It also looks like Apache changed TVM's ICHECK code as of 3 days ago: apache/tvm@48992a4. I am attempting to run the Mixtral model, but this is what I am seeing. It's odd because mlc_chat wouldn't recognize my binary-lib.dylib file, so I duplicated it as an .so and it had no problem with that. Wondering if TVM changed a flag I need to be using for Metal/MPS and that's what's led to these breaks? (I don't know anything, just desperate for answers!)

Snippet:
```
Traceback (most recent call last):
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/relax_vm/vm.cc", line 785, in tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
  File "/Users/zack/.home/local/gitrepos/mlc-llm/tvm-unity/src/runtime/metal/metal_device_api.mm", line 191, in tvm::runtime::metal::MetalWorkspace::AllocDataSpace(DLDevice, unsigned long, unsigned long, DLDataType)::$_1::operator()() const
```

Also, JIT is on... idk if that matters. Going to put it on redo/readonly and see what happens.
Hello everyone @BuildBackBuehler @alphaarea, just want to give an update here. We have identified the failure reason and submitted a fix: apache/tvm#16704. Please wait for 1-2 days and then update the pip package, given that the wheel build takes some time. I will also report back once the fix is in the wheel package.
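In the meantime, a quick way to confirm the wheel you end up with actually contains the new commit is to check the build info from Python. This is only a sketch; the nightly package names below are an assumption and may differ depending on how you installed:

```python
# Sketch: check installed package versions and the TVM commit baked into the wheel.
from importlib import metadata

import tvm

for pkg in ("mlc-ai-nightly", "mlc-chat-nightly"):  # assumed names; adjust to your install
    try:
        print(pkg, metadata.version(pkg))
    except metadata.PackageNotFoundError:
        print(pkg, "not installed")

# Same call as in the Environment section; the commit hash/time should advance
# once the wheel containing the fix is published.
for key, value in tvm.support.libinfo().items():
    if key in ("GIT_COMMIT_HASH", "GIT_COMMIT_TIME"):
        print(f"{key}: {value}")
```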
The package update should be finished. Closing this issue.
🐛 Bug
I have a feeling this error comes from disregarding the potential differences between this model and CodeLlama's (I'm too much of a novice to know which nuts & bolts need changing to mirror CodeLlama's terms for a seamless conv-template conversion of the tokenizer config).
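For what it's worth, here is the minimal sketch I use to sanity-check what gen_config actually recorded for the conversation template. I'm assuming the generated file is mlc-chat-config.json in the output directory and that the field is named conv_template; the lookups use .get() in case the key names differ:

```python
# Sketch: inspect the conv template and token settings gen_config wrote out.
import json
from pathlib import Path

# Output directory from the gen_config command in the repro steps below.
cfg_path = Path("Codefuse-Deepseek") / "mlc-chat-config.json"
cfg = json.loads(cfg_path.read_text())

print("conv_template:", cfg.get("conv_template"))
print("tokenizer_files:", cfg.get("tokenizer_files"))
print("bos/eos:", cfg.get("bos_token_id"), cfg.get("eos_token_id"))
```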
Immediate Error:
```
tvm.error.InternalError: Traceback (most recent call last):
  File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/ir/expr.cc", line 88
InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32
```
I have the full one below.
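For reference, the check that fires here is just a signed-range test on an integer literal; a minimal Python sketch of the same comparison (nothing TVM-specific, it only restates the numbers from the error):

```python
# Restating the ICHECK from src/ir/expr.cc: value < 1LL << (dtype.bits() - 1)
value = 8589934591  # the literal from the error; equals 2**33 - 1


def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is below the signed upper bound of a `bits`-bit integer."""
    return value < (1 << (bits - 1))


print(fits_signed(value, 32))  # False -> "exceeds maximum of int32" (bound is 2147483648)
print(fits_signed(value, 64))  # True  -> the same literal would be fine as an int64
```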
To Reproduce
https://huggingface.co/codefuse-ai/CodeFuse-DeepSeek-33B/tree/main (this is DeepSeek Coder's base model, but IIRC it's beefed up like an Instruct variant, and that's how I intended to use it)
Steps to reproduce the behavior:
So, for context, here is the last step prior to the error:
Details
```
~/local/gitrepos/mlc-llm/dist/models main +9 !4 ?31 > mlc_chat gen_config ./CFDS --quantization q4f16_1 --conv-template codellama_instruct -o ./Codefuse-Deepseek
[2024-03-08 20:17:46] INFO auto_config.py:115: Found model configuration: CFDS/config.json
[2024-03-08 20:17:46] INFO auto_config.py:153: Found model type: llama. Use --model-type to override.
[2024-03-08 20:17:46] INFO llama_model.py:52: context_window_size not found in config.json. Falling back to max_position_embeddings (16384)
[2024-03-08 20:17:46] INFO llama_model.py:72: prefill_chunk_size defaults to context_window_size (16384)
[2024-03-08 20:17:46] INFO config.py:106: Overriding max_batch_size from 1 to 80
[2024-03-08 20:17:46] INFO gen_config.py:121: [generation_config.json] Setting bos_token_id: 32013
[2024-03-08 20:17:46] INFO gen_config.py:121: [generation_config.json] Setting eos_token_id: 32014
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/tokenizer.model
[2024-03-08 20:17:46] INFO gen_config.py:133: Found tokenizer config: CFDS/tokenizer.json. Copying to Codefuse-Deepseek/tokenizer.json
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/vocab.json
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/merges.txt
[2024-03-08 20:17:46] INFO gen_config.py:135: Not found tokenizer config: CFDS/added_tokens.json
[2024-03-08 20:17:46] INFO gen_config.py:133: Found tokenizer config: CFDS/tokenizer_config.json. Copying to Codefuse-Deepseek/tokenizer_config.json
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting pad_token_id: 0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting temperature: 0.7
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting presence_penalty: 0.0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting frequency_penalty: 0.0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting repetition_penalty: 1.0
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting top_p: 0.95
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting mean_gen_len: 128
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting max_gen_len: 512
[2024-03-08 20:17:46] INFO gen_config.py:74: [System default] Setting shift_fill_factor: 0.3
[2024-03-08 20:17:46] INFO gen_config.py:186: Dumping configuration file to: Codefuse-Deepseek/mlc-chat-config.json
```
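One thing that stands out in the log above: both context_window_size and prefill_chunk_size fall back to 16384, so flattened buffer extents get large fast. The overflowing literal is exactly 2**33 - 1; the product below is only my guess at where a number of that size could come from, not the actual expression TVM built:

```python
# Hedged arithmetic only: 16384 * 16384 * 32 == 2**33, and the failing literal
# (8589934591) is that value minus one, i.e. an "extent - 1"-style bound that
# no longer fits in a signed 32-bit integer.
context_window_size = 16384
prefill_chunk_size = 16384

candidate = prefill_chunk_size * context_window_size * 32
print(candidate)              # 8589934592
print(candidate - 1)          # 8589934591, the literal from the error
print(candidate - 1 < 2**31)  # False: exceeds the int32 bound (2147483648)
```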
Expected behavior
Compilation completes and produces ../lib/CodeFuse-Deepseek.so without the internal error.
Environment
- macOS, MacBook Pro, M1 Max (Apple Silicon/Metal/MPS)
- How you installed MLC-LLM (conda, source): Conda
- How you installed TVM-Unity (pip, source): pip
- TVM Unity Hash Tag (`python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))"`, applicable if you compile models):

Details
```
USE_NVTX: OFF
USE_GTEST: AUTO
SUMMARIZE: OFF
USE_IOS_RPC: OFF
USE_MSC: OFF
USE_ETHOSU:
CUDA_VERSION: NOT-FOUND
USE_LIBBACKTRACE: AUTO
DLPACK_PATH: 3rdparty/dlpack/include
USE_TENSORRT_CODEGEN: OFF
USE_THRUST: OFF
USE_TARGET_ONNX: OFF
USE_AOT_EXECUTOR: ON
BUILD_DUMMY_LIBTVM: OFF
USE_CUDNN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_CCACHE: AUTO
USE_ARM_COMPUTE_LIB: OFF
USE_CPP_RTVM:
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_MKL: OFF
USE_PT_TVMDSOOP: OFF
MLIR_VERSION: NOT-FOUND
USE_CLML: OFF
USE_STACKVM_RUNTIME: OFF
USE_GRAPH_EXECUTOR_CUDA_GRAPH: OFF
ROCM_PATH: /opt/rocm
USE_DNNL: OFF
USE_VITIS_AI: OFF
USE_MLIR: OFF
USE_RCCL: OFF
USE_LLVM: llvm-config --link-static
USE_VERILATOR: OFF
USE_TF_TVMDSOOP: OFF
USE_THREADS: ON
USE_MSVC_MT: OFF
BACKTRACE_ON_SEGFAULT: OFF
USE_GRAPH_EXECUTOR: ON
USE_NCCL: OFF
USE_ROCBLAS: OFF
GIT_COMMIT_HASH: f06d486b4a1a27f0bbb072688a5fc41e7b15323c
USE_VULKAN: OFF
USE_RUST_EXT: OFF
USE_CUTLASS: OFF
USE_CPP_RPC: OFF
USE_HEXAGON: OFF
USE_CUSTOM_LOGGING: OFF
USE_UMA: OFF
USE_FALLBACK_STL_MAP: OFF
USE_SORT: ON
USE_RTTI: ON
GIT_COMMIT_TIME: 2024-03-08 02:04:22 -0500
USE_HEXAGON_SDK: /path/to/sdk
USE_BLAS: none
USE_ETHOSN: OFF
USE_LIBTORCH: OFF
USE_RANDOM: ON
USE_CUDA: OFF
USE_COREML: OFF
USE_AMX: OFF
BUILD_STATIC_RUNTIME: OFF
USE_CMSISNN: OFF
USE_KHRONOS_SPIRV: OFF
USE_CLML_GRAPH_EXECUTOR: OFF
USE_TFLITE: OFF
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
PICOJSON_PATH: 3rdparty/picojson
USE_OPENCL_ENABLE_HOST_PTR: OFF
INSTALL_DEV: OFF
USE_PROFILER: ON
USE_NNPACK: OFF
LLVM_VERSION: 15.0.7
USE_MRVL: OFF
USE_OPENCL: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
RANG_PATH: 3rdparty/rang/include
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_OPENMP: OFF
USE_BNNS: OFF
USE_CUBLAS: OFF
USE_METAL: ON
USE_MICRO_STANDALONE_RUNTIME: OFF
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_ALTERNATIVE_LINKER: AUTO
USE_BYODT_POSIT: OFF
USE_HEXAGON_RPC: OFF
USE_MICRO: OFF
DMLC_PATH: 3rdparty/dmlc-core/include
INDEX_DEFAULT_I64: ON
USE_RELAY_DEBUG: OFF
USE_RPC: ON
USE_TENSORFLOW_PATH: none
TVM_CLML_VERSION:
USE_MIOPEN: OFF
USE_ROCM: OFF
USE_PAPI: OFF
USE_CURAND: OFF
TVM_CXX_COMPILER_PATH: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++
HIDE_PRIVATE_SYMBOLS: ON
```
Full
```
mlc_chat compile ./CodeFuse-Deepseek/mlc-chat-config.json --device metal -o ../lib/CodeFuse-Deepseek.so
[2024-03-08 20:23:36] INFO auto_config.py:69: Found model configuration: CodeFuse-Deepseek/mlc-chat-config.json
[2024-03-08 20:23:36] INFO auto_device.py:76: Found device: metal:0
[2024-03-08 20:23:36] INFO auto_target.py:70: Found configuration of target device "metal:0": {"thread_warp_size": 32, "max_threads_per_block": 1024, "max_function_args": 31, "max_num_threads": 256, "kind": "metal", "max_shared_memory_per_block": 32768, "tag": "", "keys": ["metal", "gpu"]}
[2024-03-08 20:23:36] INFO auto_target.py:102: Found host LLVM triple: arm64-apple-darwin23.4.0
[2024-03-08 20:23:36] INFO auto_target.py:103: Found host LLVM CPU: apple-m1
[2024-03-08 20:23:36] INFO auto_config.py:153: Found model type: llama. Use --model-type to override.
Compiling with arguments:
  --config LlamaConfig(hidden_size=7168, intermediate_size=19200, num_attention_heads=56, num_hidden_layers=62, rms_norm_eps=1e-06, vocab_size=32256, position_embedding_base=100000, context_window_size=16384, prefill_chunk_size=16384, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, max_batch_size=80, kwargs={})
  --quantization GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7)
  --model-type llama
  --target {"thread_warp_size": 32, "host": {"mtriple": "arm64-apple-darwin23.4.0", "tag": "", "kind": "llvm", "mcpu": "apple-m1", "keys": ["arm_cpu", "cpu"]}, "max_threads_per_block": 1024, "max_function_args": 31, "max_num_threads": 256, "kind": "metal", "max_shared_memory_per_block": 32768, "tag": "", "keys": ["metal", "gpu"]}
  --opt flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0
  --system-lib-prefix ""
  --output ../lib/CodeFuse-Deepseek.so
  --overrides context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None
[2024-03-08 20:23:36] INFO compile.py:136: Creating model from: LlamaConfig(hidden_size=7168, intermediate_size=19200, num_attention_heads=56, num_hidden_layers=62, rms_norm_eps=1e-06, vocab_size=32256, position_embedding_base=100000, context_window_size=16384, prefill_chunk_size=16384, num_key_value_heads=8, head_dim=128, tensor_parallel_shards=1, max_batch_size=80, kwargs={})
[2024-03-08 20:23:37] INFO compile.py:155: Exporting the model to TVM Unity compiler
[2024-03-08 20:23:38] INFO compile.py:161: Running optimizations using TVM Unity
[2024-03-08 20:23:38] INFO compile.py:174: Registering metadata: {'model_type': 'llama', 'quantization': 'q4f16_1', 'context_window_size': 16384, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 16384, 'tensor_parallel_shards': 1, 'kv_cache_bytes': 0}
[2024-03-08 20:23:38] INFO pipeline.py:46: Running TVM Relax graph-level optimizations
[2024-03-08 20:27:11] INFO pipeline.py:46: Lowering to TVM TIR kernels
[2024-03-08 20:27:16] INFO pipeline.py:46: Running TVM TIR-level optimizations
[2024-03-08 20:27:38] INFO pipeline.py:46: Running TVM Dlight low-level optimizations
Traceback (most recent call last):
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/bin/mlc_chat", line 8, in <module>
    sys.exit(main())
    ^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/__main__.py", line 24, in main
    cli.main(sys.argv[2:])
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/cli/compile.py", line 131, in main
    compile(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 230, in compile
    _compile(args, model_config)
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/interface/compile.py", line 177, in _compile
    args.build_func(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/support/auto_target.py", line 242, in build
    relax.build(
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/relax/vm_build.py", line 335, in build
    mod = pipeline(mod)
          ^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/mlc_chat/compiler_pass/pipeline.py", line 159, in _pipeline
    mod = seq(mod)
          ^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/ir/transform.py", line 307, in _pass_func
    return inst.transform_module(mod, ctx)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/dlight/base/transform.py", line 64, in transform_module
    sch = _apply_rules(func, target, self.rules, tunable=False)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/dlight/base/transform.py", line 80, in _apply_rules
    space = rule.apply(func, target, tunable)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/dlight/gpu/fallback.py", line 77, in apply
    bx, tx = sch.split(  # pylint: disable=invalid-name
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/tir/schedule/_type_checker.py", line 340, in wrap
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/tir/schedule/schedule.py", line 811, in split
    _ffi_api.ScheduleSplit(  # type: ignore # pylint: disable=no-member
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 277, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/opt/homebrew/Caskroom/miniforge/base/envs/mlc-llm/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
tvm.error.InternalError: Traceback (most recent call last):
  File "/Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/ir/expr.cc", line 88
InternalError: Check failed: value < 1LL << (dtype.bits() - 1) (8589934591 vs. 2147483648) : ValueError: Literal value 8589934591 exceeds maximum of int32
```
Additional context
Based on what I've read elsewhere, I imagine I'm running into issues because I did not build from source, particularly TVM. My other thought: seeing int32 is odd to me; I'm wondering if that is supposed to be int64 and it may just be some environment flag I need. I also saw someone mention the ability to use --sep-embed (as it is a llama model); I don't know if that was phased out, but it didn't work for me.
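In case it is useful, this is how I have been checking whether the installed TVM was built with 64-bit indices and which commit it came from. It is the same tvm.support.libinfo() call as in the Environment section; I am only assuming these are the relevant keys:

```python
# Sketch: print a few compile-time flags of the installed TVM wheel.
import tvm

interesting = {"INDEX_DEFAULT_I64", "USE_METAL", "USE_LLVM", "GIT_COMMIT_HASH", "LLVM_VERSION"}
for key, value in tvm.support.libinfo().items():
    if key in interesting:
        print(f"{key}: {value}")

# The dump above already shows INDEX_DEFAULT_I64: ON, so a missing build flag
# does not look like the culprit to me; the int32 literal seems to come from the
# generated IR itself.
```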
Thanks for any help!