-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wording sentencepiece.cpp #1435
Conversation
. before newline. Reformat file name to make clear . is not part of filename
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1435
Note: Links to docs will display an error until the docs builds have been completed. ⏳ No Failures, 4 PendingAs of commit 8455d58 with merge base 019f76f (): This comment was automatically generated by Dr. CI and updates every 15 minutes. |
List tokenizer file to make sure it's present
perform ls for debug only when loading tokenizer model fails
@@ -38,7 +40,13 @@ void SPTokenizer::load(const std::string& tokenizer_path) { | |||
// read in the file | |||
const auto status = _processor->Load(tokenizer_path); | |||
if (!status.ok()) { | |||
fprintf(stderr, "couldn't load %s\n. If this tokenizer artifact is for llama3, please pass `-l 3`.", tokenizer_path.c_str()); | |||
// Execute 'ls -al' on the tokenizer path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the print, great for debugging
Looks like the ls
is spitting out the root torchchat directory instead of tokenize path which is curious
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would explain why the tokenizer can't be loaded, and the AOTI tests keep failing. #1429
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added set -x to the command to echo the ls to make absolutely sure the path is not getting corrupted somehow (not sure how it would, but belts and suspenders)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neutral signal- looks like the arg is not being picked up by ls (which would explain why it just shows PWD)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is sooooo weird! Want to add a print of command and rerun? Maybe there’s some magic character that causes indigestion for the shell running La; and the tokenizer model load?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added print of command before execution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split C style strong conversion and c++ const as convert followed by append
add `set -x` to debug output to get command with tokenizer path echoed
Add print
Fix typo.
Explícitly Convert c style string constant to std::string
Update to C++11 ABI for AOTI, similar to ET
Thanks again for helping with the debug. @larryliu0820 was able to get his tokenizer changes back in so we can either rebase or close this one |
Awesome. I suggest we close this one. |
. before newline. Reformat file name to make clear . is not part of filename