BERT model loading not working with pytorch 1.3.1-eia container #87
@rucha3
@tracycxw Thanks for your response. So, I also tried saving it as a traced model with
@rucha3 https://github.com/aws-samples/amazon-sagemaker-bert-pytorch/blob/master/bert-sm-python-SDK.ipynb I had some trouble reproducing the error. If you still see it, would you be willing to share more details about your setup and the code you use to train, save, and load the model? You can send the info to "[email protected]".
@tracycxw
To make sure I was doing everything right, this time I followed exactly the notebook you shared: https://github.com/aws-samples/amazon-sagemaker-bert-pytorch/blob/master/bert-sm-python-SDK.ipynb. I used a […] Here is my notebook: https://github.com/rucha3/pytorch-bert-eia-demo/blob/main/bert_example.ipynb
@rucha3 Also, if you run into issues with inference on the accelerator, check your torch version: for now, you need to make sure you trace the model using torch 1.3.1.
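Since the model has to be traced with torch 1.3.1, a small guard in the tracing script can fail fast when it runs under a different version. This is a minimal sketch; `parse_torch_version` and `assert_trace_version` are hypothetical helpers, and the check only compares the numeric `major.minor.patch` prefix, ignoring local suffixes such as `+cu101`.

```python
import re

# Version the EI container expects the model to be traced with (from the advice above).
REQUIRED_TRACE_VERSION = (1, 3, 1)


def parse_torch_version(version):
    """Turn strings like '1.3.1' or '1.5.0+cu101' into a tuple such as (1, 3, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised torch version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def assert_trace_version(version, required=REQUIRED_TRACE_VERSION):
    """Raise before tracing if the running torch does not match the required version."""
    found = parse_torch_version(version)
    if found != required:
        expected = ".".join(str(part) for part in required)
        raise RuntimeError(
            f"tracing with torch {version}, but the EI container expects {expected}"
        )
```

In the tracing script this would be called as `assert_trace_version(torch.__version__)` right before `torch.jit.trace(...)`.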
@tracycxw
I have a custom Python inference script in which I implement the functions `model_fn`, `input_fn`, `predict_fn` and `output_fn`. I saved the model as TorchScript using `torch.jit.trace` and `torch.jit.save`, and I load it using `torch.jit.load`. The `model_fn` implementation is as follows:

This implementation works perfectly in the container with PyTorch 1.5. But in the container with torch 1.3.1, the worker exits abruptly while loading the pretrained model, without writing any logs. The only line I see in the logs is:

The worker dies and tries to restart, and the process repeats until I stop the container.
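The `model_fn` code itself is not reproduced here, so the following is only a minimal sketch of the four handler functions the SageMaker PyTorch serving container looks for, not the author's implementation. Assumptions: the archive file name `traced_bert.pt` is hypothetical, requests are JSON of the hypothetical form `{"input_ids": [[...]]}`, and the traced model takes only `input_ids`. The log lines around `torch.jit.load` show whether the worker dies during loading. Any EI-specific context for routing inference to the accelerator is left out.

```python
import json
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical name; use the file name the model was saved under with torch.jit.save.
MODEL_FILENAME = "traced_bert.pt"


def model_fn(model_dir):
    """Load the TorchScript model once per worker process."""
    import torch  # imported here so the JSON helpers below work without torch installed

    path = os.path.join(model_dir, MODEL_FILENAME)
    logger.info("loading TorchScript model from %s", path)
    model = torch.jit.load(path, map_location=torch.device("cpu"))
    model.eval()
    logger.info("TorchScript model loaded")
    return model


def input_fn(request_body, request_content_type):
    """Parse a JSON request into a nested list of token ids (assumed payload shape)."""
    if request_content_type != "application/json":
        raise ValueError(f"unsupported content type: {request_content_type}")
    return json.loads(request_body)["input_ids"]


def predict_fn(input_data, model):
    """Run the traced model without autograd and return plain Python lists."""
    import torch

    with torch.no_grad():
        output = model(torch.tensor(input_data, dtype=torch.long))
    # Traced BERT models often return a tuple; keep the first element (assumption).
    if isinstance(output, (tuple, list)):
        output = output[0]
    return output.tolist()


def output_fn(prediction, content_type):
    """Serialize the prediction as JSON."""
    if content_type != "application/json":
        raise ValueError(f"unsupported accept type: {content_type}")
    return json.dumps(prediction)
```

If the "loading" line appears in the container logs but "loaded" never does, the crash happens inside `torch.jit.load` rather than in the request-handling functions.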
The model I am using was trained with PyTorch 1.5, but since Elastic Inference (EI) is only supported up to PyTorch 1.3.1, I am using this container.
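One thing worth ruling out when a model saved under PyTorch 1.5 is loaded by 1.3.1: files written by `torch.jit.save` are zip archives that carry a small `version` record, and an older runtime is not guaranteed to read archives produced by a newer one. Whether that explains the silent crash here is not established in this thread. The sketch below reads the record with the standard library so a 1.5-saved file can be compared with one saved under 1.3.1; the assumed layout (`<top-level folder>/version`) matches recent PyTorch archives but is an assumption, and `torchscript_archive_version` is a hypothetical helper.

```python
import zipfile


def torchscript_archive_version(path):
    """Return the serialization format version recorded in a TorchScript zip archive."""
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Only the record directly under the top-level folder, e.g. "traced_bert/version";
            # deeper entries such as "traced_bert/.data/version" are skipped.
            if name.count("/") == 1 and name.endswith("/version"):
                return int(archive.read(name).decode("ascii").strip())
    raise ValueError(f"no top-level version record found in {path}")
```

Running it on the archive traced under 1.5 and on one traced under 1.3.1 shows whether the two differ in format version.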
Things I have tried:

- `debug` and `notset` log levels. Didn't get any more information about why model loading fails.
- `PyTorchModel`'s `deploy()` function with `framework_version` set to 1.3.1.
- The 1.3.1 container without `eia`.

The behaviour is the same in every case. Am I doing something wrong or missing something crucial from the documentation? Any help would be much appreciated.
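A worker that exits with no traceback at any log level may be crashing in native code (for example a segmentation fault inside the C++ loader), which Python logging cannot report. A minimal sketch, assuming it sits at the top of the inference script: the standard-library `faulthandler` module dumps the Python stack of every thread to stderr when the process receives a fatal signal, and the hypothetical `logged_step` helper brackets a call with log lines so the last "starting" message marks where the process died.

```python
import faulthandler
import logging
import sys

# On a fatal signal (SIGSEGV, SIGABRT, ...), write the Python stack of all threads to stderr.
try:
    faulthandler.enable(file=sys.stderr, all_threads=True)
except (AttributeError, ValueError, OSError):
    # stderr is missing or has no real file descriptor in this environment.
    pass

# No-op if the model server has already configured the root logger.
logging.basicConfig(
    stream=sys.stderr,
    level=logging.DEBUG,
    format="%(asctime)s pid=%(process)d %(levelname)s %(message)s",
)
logger = logging.getLogger("model_loading")


def logged_step(name, fn, *args, **kwargs):
    """Run fn(*args, **kwargs) between 'starting' and 'finished' log records."""
    # StreamHandler flushes after every record, so this line is written before fn runs.
    logger.debug("starting: %s", name)
    result = fn(*args, **kwargs)
    logger.debug("finished: %s", name)
    return result
```

Inside `model_fn` this would wrap the load, e.g. `model = logged_step("torch.jit.load", torch.jit.load, path)`. Setting the `PYTHONFAULTHANDLER` environment variable in the container enables the same crash dump without code changes.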
**Logs for container with torch 1.3.1-eia**