@Syhong330 It is simply because you are asking the kernel to load the entire model weights into RAM at once, instead of lazy-loading them. In other words, the extra time is just how long it takes your computer to move the data from storage into RAM.
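To make the lazy vs. up-front distinction concrete, here is a minimal Python sketch. It is not llama.cpp code, and Python's standard library has no mlock wrapper, so it does not lock anything. It only shows where the cost lands. Memory-mapping a file is nearly instant because the OS just reserves address space. The time to move data into RAM is paid when each page is first touched. Locking the whole mapping forces all of those page loads to happen at load time instead of spreading them out. The temp file stands in for a model file, and its size is an arbitrary assumption.

```python
import mmap
import os
import tempfile
import time

PAGE = mmap.PAGESIZE  # OS page size (commonly 4096 bytes)


def touch_all_pages(mm: mmap.mmap) -> int:
    """Read one byte from every page, forcing the OS to bring each page into RAM.

    This mimics the eager, up-front loading that locking the weights implies.
    """
    total = 0
    for offset in range(0, len(mm), PAGE):
        total += mm[offset]
    return total


def demo(size_bytes: int = 64 * 1024 * 1024) -> tuple[float, float]:
    """Return (time to map the file, time to touch every page) in seconds."""
    # Hypothetical stand-in for a model file: a zero-filled temp file.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.truncate(size_bytes)
        path = f.name
    try:
        with open(path, "rb") as f:
            t0 = time.perf_counter()
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            t_map = time.perf_counter() - t0  # lazy: no data read yet

            t0 = time.perf_counter()
            touch_all_pages(mm)
            t_touch = time.perf_counter() - t0  # eager: every page faulted in
            mm.close()
        return t_map, t_touch
    finally:
        os.remove(path)
```

On a large file, `t_map` typically stays near zero while `t_touch` grows with the file size. Exact numbers depend on the OS, the disk, and the file cache. A freshly written file is likely still cached, so this understates the cost of reading a model from disk.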
Hello, I'm using llama.cpp to run llama2 on Windows.
When I turn on the '--mlock' option, the load time increases by about 2 seconds. As I understand it, the model is then kept in the committed area of RAM, but I'd like to know why the load time increases so much.