[Core] Skip PrimExpr index int32 downcasting for batching
This PR makes the `ForceNarrowIndexToInt32` pass skip its application when batching is enabled. This is because the flattened index in the KV cache append function may exceed the int32 range when the cache is large. For example, in Llama-7b, when a KV cache supports more than 8192 tokens, the total cache size is at least

```
8192 * 2 (K/V) * 32 (layers) * 4096 = 2147483648
```

elements, which already exceeds the maximum int32 value (2^31 - 1 = 2147483647).
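The arithmetic can be checked with a small Python sketch. The constants mirror the Llama-7b numbers quoted above. The helpers `kv_cache_size`, `fits_in_int32` and `wrap_int32` are hypothetical and are not part of the codebase. The sketch also assumes a dense flattened layout of tokens × K/V × layers × hidden size. Under that assumption, at 8192 tokens the largest flattened index is exactly INT32_MAX. Any larger cache produces indices that would wrap to negative values if narrowed to int32.

```python
# Constants taken from the Llama-7b example in the commit message.
NUM_TOKENS = 8192
KV = 2            # one K tensor and one V tensor
NUM_LAYERS = 32
HIDDEN_SIZE = 4096

INT32_MAX = 2**31 - 1  # 2147483647


def kv_cache_size(num_tokens: int) -> int:
    """Total element count of a flattened KV cache (assumed dense layout)."""
    return num_tokens * KV * NUM_LAYERS * HIDDEN_SIZE


def fits_in_int32(num_tokens: int) -> bool:
    """True if every flattened index 0..size-1 is representable as int32."""
    return kv_cache_size(num_tokens) - 1 <= INT32_MAX


def wrap_int32(x: int) -> int:
    """Simulate two's-complement truncation of a Python int to int32."""
    return (x + 2**31) % 2**32 - 2**31


print(kv_cache_size(NUM_TOKENS))          # 2147483648
print(fits_in_int32(8192))                # True: largest index is exactly INT32_MAX
print(fits_in_int32(8193))                # False: indices overflow int32
print(wrap_int32(kv_cache_size(8192)))    # -2147483648: first out-of-range index wraps
```

With int64 indices, an index such as 2147483648 stays positive. An int32 index of the same value silently wraps to -2147483648, which is why narrowing is unsafe for large batched caches.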