[Bug] TGI doesn't start due to permission denied #861
Comments
Looks like TGI is trying to access more locations after TDX is enabled.
That is a good workaround. The TGI service downloaded all the files it needs and started successfully.
In the 1.0 release, we enabled securityContext by default (#258). This runs the pod as a non-root user with a read-only root file system. When the TGI pod starts, it downloads the required model to an emptyDir volume mounted at /data. An easy fix would be to disable the securityContext, but whether that is a good way to go needs discussion. Until there is an official fix, you can use the above workaround while enabling TDX/kata.
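To narrow down which location is being refused, a small check like the one below could be run inside the TGI container. This is only a sketch: `find_unwritable_dirs` is a hypothetical helper, and apart from /data the candidate paths (/tmp and the home directory) are guesses, not locations TGI is confirmed to use.

```python
import os
import tempfile


def find_unwritable_dirs(candidates):
    """Return the candidate directories in which a temp file cannot be created."""
    blocked = []
    for path in candidates:
        try:
            # Creating a real file also catches read-only file systems and
            # missing directories, not just permission bits.
            with tempfile.NamedTemporaryFile(dir=path):
                pass
        except OSError:
            blocked.append(path)
    return blocked


if __name__ == "__main__":
    # /data comes from the issue; the other paths are assumptions.
    candidates = ["/data", "/tmp", os.path.expanduser("~")]
    for path in find_unwritable_dirs(candidates):
        print(f"cannot write to {path}")
```

Any path it reports is a candidate for an extra writable volume, which would leave the securityContext in place.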
Thanks a lot. The workaround works for me.
@ksandowi Could you track down which exact part of the (I would assume it to be either
Priority
P1-Stopper
OS type
Ubuntu
Hardware type
Xeon-other (Please let us know in description)
Installation method
Deploy method
Running nodes
Single Node
What's the version?
tag v1.0
Description
The issue affects Xeon on both SPR and EMR.
After modifying chatqna.yaml to run all services in a TD (protected by TDX), all services run successfully except the TGI service, which fails during model download. It worked fine in previous versions (v0.9 and v0.8).
Reproduce steps
On a platform with TDX enabled, modify ~/GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna.yaml so that it runs all services in a TD:
Raw log