Adds Huggingface GenAI container build specs into pytorch/Dockerfile and pytorch/docker-compose.yaml #146

Merged

@HarshaRamayanam HarshaRamayanam commented Jun 14, 2024

Description

This PR adds a new entry to pytorch/Dockerfile for building a Hugging Face GenAI container that can run a typical generative AI model from Hugging Face (for example, the run_clm.py script from transformers).

It also adds a new service entry named hf-genai (can be renamed in the future) to pytorch/docker-compose.yaml to build the container.

Additional files added are:

  • pytorch/generate_ssh_keys.sh
  • pytorch/hf-genai-requirements.txt

Related Issue

None

Changes Made

  • pytorch/Dockerfile (Modified)

  • pytorch/docker-compose.yaml (Modified)

  • pytorch/generate_ssh_keys.sh (New file)

  • pytorch/hf-genai-requirements.txt (New file)

  • The code follows the project's coding standards.

  • No Intel Internal IP is present within the changes.

  • The documentation has been updated to reflect any changes in functionality.

Validation

  • I have tested any changes in container groups locally with test_runner.py with all existing tests passing, and I have added new tests where applicable.

github-actions bot commented Jun 14, 2024

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
  • ✅ 0 package(s) with incompatible licenses
  • ✅ 0 package(s) with invalid SPDX license definitions
  • ⚠️ 1 package(s) with unknown licenses.
See the Details below.

License Issues

pytorch/hf-genai-requirements.txt

| Package | Version | License | Issue Type |
| --- | --- | --- | --- |
| rouge_score | 0.1.2 | Null | Unknown License |
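
An unknown-license finding like the one above can be cross-checked locally against the metadata a package ships with. The following is a minimal sketch, assuming the package is installed in the current environment; the helper names `license_fields` and `has_declared_license` are hypothetical, and the result only reflects what the installed metadata declares, which may differ from what the dependency review reads.

```python
from importlib import metadata


def license_fields(dist_name: str) -> dict:
    """Read the license-related metadata of an installed distribution."""
    meta = metadata.metadata(dist_name)  # raises PackageNotFoundError if absent
    classifiers = [
        value
        for value in (meta.get_all("Classifier") or [])
        if value.startswith("License ::")
    ]
    return {
        "license": meta.get("License"),
        "license_expression": meta.get("License-Expression"),
        "classifiers": classifiers,
    }


def has_declared_license(fields: dict) -> bool:
    """Return True if any license field carries a usable value."""
    expression = (fields.get("license_expression") or "").strip()
    text = (fields.get("license") or "").strip()
    if expression:
        return True
    # Some older packages write the literal placeholder "UNKNOWN".
    if text and text.upper() != "UNKNOWN":
        return True
    return bool(fields.get("classifiers"))


if __name__ == "__main__":
    try:
        fields = license_fields("rouge_score")
    except metadata.PackageNotFoundError:
        print("rouge_score is not installed in this environment")
    else:
        print(fields, has_declared_license(fields))
```

If the metadata declares nothing, the license usually has to be confirmed by hand from the upstream repository.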

OpenSSF Scorecard

Scorecard details
Package | Version | Score | Details
pip/SentencePiece 0.2.0 🟢 7.5
Details
| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 5 | Found 6/11 approved changesets -- score normalized to 5 |
| Maintained | 🟢 10 | 22 commit(s) and 16 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Signed-Releases | 🟢 4 | 2 out of the last 5 releases have a total of 2 signed artifacts. |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Token-Permissions | 🟢 10 | GitHub workflow tokens follow principle of least privilege |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| SAST | 🟢 5 | SAST tool is not run on all commits -- score normalized to 5 |
| Fuzzing | 🟢 10 | project is fuzzed |
| Security-Policy | 🟢 10 | security policy file detected |
| Pinned-Dependencies | 🟢 8 | dependency not pinned by hash detected -- score normalized to 8 |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
pip/accelerate 0.28.0 🟢 6.2
Details
| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 9 | Found 28/30 approved changesets -- score normalized to 9 |
| Maintained | 🟢 10 | 30 commit(s) and 16 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| Signed-Releases | ⚠️ -1 | no releases found |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Security-Policy | ⚠️ 0 | security policy file not detected |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
| Packaging | 🟢 10 | packaging workflow detected |
| SAST | 🟢 3 | SAST tool is not run on all commits -- score normalized to 3 |
pip/datasets 2.19.0 🟢 6
Details
| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 5 | Found 16/30 approved changesets -- score normalized to 5 |
| Maintained | 🟢 10 | 30 commit(s) and 9 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Security-Policy | 🟢 10 | security policy file detected |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |
pip/einops 0.7.0 🟢 5
Details
| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | ⚠️ 2 | Found 4/20 approved changesets -- score normalized to 2 |
| Maintained | 🟢 10 | 8 commit(s) and 8 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Signed-Releases | ⚠️ -1 | no releases found |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Security-Policy | ⚠️ 0 | security policy file not detected |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |
pip/evaluate 0.4.1 🟢 5.4
Details
| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 9 | Found 29/30 approved changesets -- score normalized to 9 |
| Maintained | 🟢 5 | 6 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 5 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Security-Policy | ⚠️ 0 | security policy file not detected |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
| SAST | 🟢 3 | SAST tool is not run on all commits -- score normalized to 3 |
pip/nltk 3.8.1 🟢 4.9
Details
| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 10 | all changesets reviewed |
| Maintained | ⚠️ 0 | 1 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Security-Policy | 🟢 9 | security policy file detected |
| Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |
pip/onnxruntime 1.17.3 🟢 6.8
Details
| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 10 | all last 30 commits are reviewed through GitHub |
| Maintained | 🟢 10 | 30 commit(s) out of 30 and 8 issue activity out of 30 found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no badge detected |
| Vulnerabilities | 🟢 10 | no vulnerabilities detected |
| Signed-Releases | ⚠️ 0 | 0 out of 5 artifacts are signed or have provenance |
| Branch-Protection | 🟢 8 | branch protection is not maximal on development and all release branches |
| Security-Policy | 🟢 10 | security policy file detected |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Packaging | ⚠️ -1 | no published package detected |
| License | 🟢 10 | license file detected |
| Token-Permissions | ⚠️ 0 | non read-only tokens detected in GitHub workflows |
| Dependency-Update-Tool | 🟢 10 | update tool detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
pip/onnxruntime-extensions 0.10.1 🟢 6.1
Details
| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 9 | Found 29/30 approved changesets -- score normalized to 9 |
| Maintained | 🟢 10 | 30 commit(s) and 11 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Security-Policy | 🟢 10 | security policy file detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
| Binary-Artifacts | 🟢 7 | binaries present in source code |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
pip/peft 0.10.0 Unknown Unknown
pip/protobuf 4.24.4 🟢 7.1
Details
| Check | Score | Reason |
| --- | --- | --- |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| CI-Tests | 🟢 10 | 23 out of 23 merged PRs checked by a CI test -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| Code-Review | ⚠️ 2 | found 24 unreviewed changesets out of 30 -- score normalized to 2 |
| Contributors | 🟢 10 | 13 different organizations found -- score normalized to 10 |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Dependency-Update-Tool | 🟢 10 | update tool detected |
| Fuzzing | 🟢 10 | project is fuzzed |
| License | 🟢 9 | license file detected |
| Maintained | 🟢 10 | 30 commit(s) out of 30 and 2 issue activity out of 30 found in the last 90 days -- score normalized to 10 |
| Packaging | ⚠️ -1 | no published package detected |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| SAST | 🟢 3 | SAST tool is not run on all commits -- score normalized to 3 |
| Security-Policy | 🟢 10 | security policy file detected |
| Signed-Releases | ⚠️ 0 | 0 out of 5 artifacts are signed or have provenance |
| Token-Permissions | 🟢 10 | GitHub workflow tokens follow principle of least privilege |
| Vulnerabilities | 🟢 7 | 3 existing vulnerabilities detected |
pip/py-cpuinfo 9.0.0 🟢 3.8
Details
| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 4 | Found 7/17 approved changesets -- score normalized to 4 |
| Maintained | ⚠️ 0 | 0 commit(s) and 1 issue activity found in the last 90 days -- score normalized to 0 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Signed-Releases | ⚠️ -1 | no releases found |
| Branch-Protection | ⚠️ 0 | branch protection not enabled on development/release branches |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Security-Policy | ⚠️ 0 | security policy file not detected |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |
pip/rouge_score 0.1.2 Unknown Unknown
pip/scikit-learn 1.5.0 🟢 9.5
Details
| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 10 | all changesets reviewed |
| Maintained | 🟢 10 | 30 commit(s) and 23 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| Signed-Releases | ⚠️ -1 | no releases found |
| Dangerous-Workflow | ⚠️ -1 | no workflows found |
| Security-Policy | 🟢 10 | security policy file detected |
| Packaging | ⚠️ -1 | packaging workflow not detected |
| Token-Permissions | ⚠️ -1 | No tokens found |
| Vulnerabilities | 🟢 10 | 0 existing vulnerabilities detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Pinned-Dependencies | ⚠️ -1 | no dependencies found |
| Fuzzing | 🟢 10 | project is fuzzed |
| SAST | 🟢 10 | SAST tool is run on all commits |
pip/tokenizers 0.19.1 🟢 5.5
Details
| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 7 | Found 20/28 approved changesets -- score normalized to 7 |
| Maintained | 🟢 10 | 18 commit(s) and 24 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| Signed-Releases | ⚠️ -1 | no releases found |
| Dangerous-Workflow | 🟢 10 | no dangerous workflow patterns detected |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Security-Policy | ⚠️ 0 | security policy file not detected |
| Packaging | 🟢 10 | packaging workflow detected |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |
| Vulnerabilities | 🟢 6 | 4 existing vulnerabilities detected |
pip/transformers 4.41.2 🟢 4.5
Details
| Check | Score | Reason |
| --- | --- | --- |
| Code-Review | 🟢 10 | all changesets reviewed |
| Maintained | 🟢 10 | 30 commit(s) and 17 issue activity found in the last 90 days -- score normalized to 10 |
| CII-Best-Practices | ⚠️ 0 | no effort to earn an OpenSSF best practices badge detected |
| License | 🟢 10 | license file detected |
| Branch-Protection | ⚠️ -1 | internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration |
| Signed-Releases | ⚠️ -1 | no releases found |
| Security-Policy | 🟢 10 | security policy file detected |
| Dangerous-Workflow | ⚠️ 0 | dangerous workflow patterns detected |
| Token-Permissions | ⚠️ 0 | detected GitHub workflow tokens with excessive permissions |
| Binary-Artifacts | 🟢 10 | no binaries found in the repo |
| Fuzzing | ⚠️ 0 | project is not fuzzed |
| Packaging | 🟢 10 | packaging workflow detected |
| SAST | ⚠️ 0 | SAST tool is not run on all commits -- score normalized to 0 |
| Pinned-Dependencies | ⚠️ 0 | dependency not pinned by hash detected -- score normalized to 0 |
| Vulnerabilities | ⚠️ 0 | 465 existing vulnerabilities detected |
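
The per-check scores above are easier to act on when grouped. The sketch below is one possible triage, not part of the Scorecard tooling: it uses a handful of rows copied from the pip/transformers 4.41.2 table, treats a score of -1 as "could not be evaluated" (which is how those rows read here), and uses a threshold of 5 that is purely an assumption.

```python
from typing import NamedTuple


class Check(NamedTuple):
    name: str
    score: int
    reason: str


# A subset of rows copied from the pip/transformers 4.41.2 details.
TRANSFORMERS_CHECKS = [
    Check("Code-Review", 10, "all changesets reviewed"),
    Check("Signed-Releases", -1, "no releases found"),
    Check("Dangerous-Workflow", 0, "dangerous workflow patterns detected"),
    Check("Packaging", 10, "packaging workflow detected"),
    Check("Vulnerabilities", 0, "465 existing vulnerabilities detected"),
    Check("Fuzzing", 0, "project is not fuzzed"),
]


def triage(checks, threshold=5):
    """Group checks into inconclusive (-1), failing (< threshold), and passing."""
    groups = {"inconclusive": [], "failing": [], "passing": []}
    for check in checks:
        if check.score < 0:
            groups["inconclusive"].append(check)
        elif check.score < threshold:
            groups["failing"].append(check)
        else:
            groups["passing"].append(check)
    return groups


if __name__ == "__main__":
    for group, members in triage(TRANSFORMERS_CHECKS).items():
        print(group, [check.name for check in members])
```

Keeping inconclusive checks separate avoids counting an internal error or a missing release as a failure of the package itself.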

Scanned Manifest Files

pytorch/hf-genai-requirements.txt

@tylertitsworth tylertitsworth left a comment

  • What is the public name of this container going to be? intel/intel-optimized-pytorch:2.3.0-pip-hf-genai?
  • What happens to workflows/charts/huggingface-llm? Does that container go away in favor of this one or are they different?

Please update the README.md file appropriately; this will go to Docker Hub.

@dmsuehir
dmsuehir commented Jun 17, 2024

> • What happens to workflows/charts/huggingface-llm? Does that container go away in favor of this one or are they different?

@tylertitsworth The huggingface-llm container will switch to use this Gen AI container as its base, and I think the only thing it'll need is a copy of the fine-tuning Python script.

@tylertitsworth

> @tylertitsworth The huggingface-llm container will switch to use this Gen AI container as its base, and I think the only thing it'll need is a copy of the fine-tuning Python script.

@dmsuehir Good to know, then the only potential conflict with this PR is #124.

@HarshaRamayanam

> • intel/intel-optimized-pytorch:2.3.0-pip-hf-genai

@ashahba @dmsuehir What do you think about the public name Tyler mentioned? I'm okay with it.

@dmsuehir

> • intel/intel-optimized-pytorch:2.3.0-pip-hf-genai
>
> @ashahba @dmsuehir What do you think about the public name Tyler mentioned? I'm okay with it.

I'm OK with this, or with also including the HF version. Either way works.

@tylertitsworth tylertitsworth left a comment

Recipe looks good, there are a few kinks to work out that have been helpfully suggested by your coworkers. Please write some tests in pytorch/tests/tests.yaml that validate this solution. Make sure no test runs for more than 5-10 minutes.

Next part is the docs, which you can choose to split into another PR if you want. Let me know what you decide.

Most important thing is that all of the CI is green.
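
For the time limit mentioned above, a local check can wrap any test command in a hard timeout. This is a minimal standalone sketch, not the project's test_runner.py and not the tests.yaml format; the helper name `run_with_limit` and the 600-second default (the upper end of the 5-10 minute guidance) are assumptions.

```python
import subprocess
import sys
import time


def run_with_limit(cmd, limit_seconds=600):
    """Run cmd, stopping it if it exceeds limit_seconds; report the outcome."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=limit_seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return {
            "ok": False,
            "timed_out": True,
            "seconds": time.monotonic() - start,
            "output": "",
        }
    return {
        "ok": proc.returncode == 0,
        "timed_out": False,
        "seconds": time.monotonic() - start,
        "output": proc.stdout,
    }


if __name__ == "__main__":
    # Stand-in command; a real check would invoke the container instead.
    result = run_with_limit([sys.executable, "-c", "print('ok')"], limit_seconds=60)
    print(result)
```

The same wrapper could be pointed at a container command for the hf-genai service; the exact command line is left out here because it depends on how the service is finally defined.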

@tylertitsworth tylertitsworth added the WIP Work in Progress label Jun 18, 2024
@HarshaRamayanam

> Recipe looks good, there are a few kinks to work out that have been helpfully suggested by your coworkers. Please write some tests in pytorch/tests/tests.yaml that validate this solution. Make sure no test runs for more than 5-10 minutes.
>
> Next part is the docs, which you can choose to split into another PR if you want. Let me know what you decide.
>
> Most important thing is that all of the CI is green.

I'll do the docs in this PR too. I'll keep you updated.

tylertitsworth previously approved these changes Jun 27, 2024

@tylertitsworth tylertitsworth left a comment

All of your changes look fine, I have a couple comments, but since this is dependent on #124 we need to wait on merging this.

Signed-off-by: tylertitsworth <[email protected]>
@tylertitsworth tylertitsworth force-pushed the hramayan/hf-genai-container branch from 3bb90f9 to c16761f Compare July 3, 2024 23:21

@tylertitsworth tylertitsworth left a comment

@HarshaRamayanam I had to squash your commits because the history was getting complex, feel free to re-contribute those commits however you want.

@tylertitsworth tylertitsworth left a comment

A few minor things, I have locally validated your image and if you're happy with it we can merge today.

HarshaRamayanam and others added 4 commits July 10, 2024 09:20
Co-authored-by: Tyler Titsworth <[email protected]>
Signed-off-by: Harsha Ramayanam <[email protected]>
Co-authored-by: Tyler Titsworth <[email protected]>
Signed-off-by: Harsha Ramayanam <[email protected]>
Co-authored-by: Tyler Titsworth <[email protected]>
Signed-off-by: Harsha Ramayanam <[email protected]>
@tylertitsworth tylertitsworth merged commit b7d2fb8 into intel:main Jul 10, 2024
4 of 6 checks passed
dmsuehir pushed a commit that referenced this pull request Jul 12, 2024
…and pytorch/docker-compose.yaml (#146)

Signed-off-by: tylertitsworth <[email protected]>
Signed-off-by: Harsha Ramayanam <[email protected]>
Co-authored-by: tylertitsworth <[email protected]>
Co-authored-by: Tyler Titsworth <[email protected]>
Signed-off-by: Dina Suehiro Jones <[email protected]>
4 participants