Sync branch #4718

Merged: 59 commits, Jun 5, 2024
c4ac480
fix: mainline alt config parsing (#4602)
Captainia Apr 23, 2024
30c9bf6
Add Triton v24.03 URI (#4605)
nikhil-sk Apr 23, 2024
fe32d79
feature: support session tag chaining for training job (#4596)
jessicazhu3 Apr 24, 2024
8984d92
prepare release v2.217.0
Apr 24, 2024
ed390dd
update development version to v2.217.1.dev0
Apr 24, 2024
2a52478
fix: properly close files in lineage queries and tests (#4587)
jmahlik Apr 25, 2024
72e0c97
feature: set default allow_pickle param to False (#4557)
akrishna1995 Apr 29, 2024
b17d332
Fix:invalid component error with new metadata (#4634)
Captainia Apr 30, 2024
15094ee
prepare release v2.218.0
May 1, 2024
7c49f5d
update development version to v2.218.1.dev0
May 1, 2024
45e3192
chore: update skipped flaky tests (#4644)
Captainia May 2, 2024
c751dbd
chore: release tgi 2.0.1 (#4642)
haixiw May 2, 2024
0f7e678
fix: Fix UserAgent logging in Python SDK (#4647)
knikure May 3, 2024
fa1a8bf
prepare release v2.218.1
May 3, 2024
0075fb3
update development version to v2.218.2.dev0
May 3, 2024
49e09c3
feature: allow choosing js payload by alias in private method
keerthanvasist May 2, 2024
3ae20b7
Updates for SMP v2.3.1 (#4660)
SuhitK May 8, 2024
b1823cb
chore(deps): bump jinja2 from 3.1.3 to 3.1.4 in /doc (#4655)
dependabot[bot] May 8, 2024
2058912
chore(deps): bump tqdm from 4.66.2 to 4.66.3 in /tests/data/serve_res…
dependabot[bot] May 8, 2024
533f30a
chore(deps): bump jinja2 from 3.1.3 to 3.1.4 in /requirements/extras …
dependabot[bot] May 8, 2024
80d24f3
prepare release v2.219.0
May 8, 2024
2a902cd
update development version to v2.219.1.dev0
May 8, 2024
6fbb6f8
fix: skip flakey tests pending investigation (#4667)
Captainia May 8, 2024
d549e7d
change: update image_uri_configs 05-09-2024 07:17:41 PST
sagemaker-bot May 9, 2024
a5c6229
Add tensorflow_serving support for mlflow models and enable lineage t…
jiapinw May 9, 2024
1279620
fix: model builder race condition on sagemaker session (#4673)
makungaj1 May 10, 2024
2f2be05
feat: Add telemetry support for mlflow models (#4674)
jiapinw May 13, 2024
2bd30c4
feat: add new images for HF TGI release (#4677)
Captainia May 14, 2024
6647c23
feature: AutoGluon 1.1.0 image_uris update (#4679)
prateekdesai04 May 14, 2024
e488c90
change: add debug logs to workflow container dist creation (#4682)
mufaddal-rohawala May 15, 2024
e50ce32
prepare release v2.220.0
May 15, 2024
d4f3c91
update development version to v2.220.1.dev0
May 15, 2024
65cc586
fix: Image URI should take precedence for HF models (#4684)
samruds May 15, 2024
63a9ac3
feat: onboard tei image config to pysdk (#4681)
haixiw May 15, 2024
8002d7f
fix: model builder limited container support for endpoint mode. (#4683)
makungaj1 May 16, 2024
9faa8be
change: Add more debuging (#4687)
mufaddal-rohawala May 16, 2024
6280d81
change: cover tei with image_uris.retrieve API (#4689)
haixiw May 16, 2024
06e6f9d
fix: JS Model with non-TGI/non-DJL deployment failure (#4688)
makungaj1 May 16, 2024
c9b55a4
Feat: Pull latest tei container for sentence similiarity models on Hu…
samruds May 17, 2024
9b7874b
Fix: Add Image URI overrides for transformers models (#4693)
samruds May 20, 2024
a26224a
prepare release v2.221.0
May 20, 2024
4e83cce
update development version to v2.221.1.dev0
May 20, 2024
828cdc3
Add tei cpu image (#4695)
haixiw May 21, 2024
18e76c7
Feat: Add TEI support for ModelBuilder (#4694)
makungaj1 May 21, 2024
6196b75
Convert pytorchddp distribution to smdistributed distribution (#4698)
tombousso May 22, 2024
d4e42db
prepare release v2.221.1
May 22, 2024
3676d3e
update development version to v2.221.2.dev0
May 22, 2024
e60c488
Update: SM Endpoint Routing Strategy Support. (#4702)
makungaj1 May 24, 2024
cbbbb32
change: update image_uri_configs 05-29-2024 07:17:35 PST
sagemaker-bot May 29, 2024
c48d7c8
Making project name in workflow files dynamic (#4708)
zhaoqizqwang May 29, 2024
b68a810
fix: Fix ci unit-tests (#4713)
knikure Jun 3, 2024
dc8e9cd
chore(deps): bump requests from 2.31.0 to 2.32.2 in /tests/data/serve…
dependabot[bot] Jun 3, 2024
c29ca55
chore(deps): bump apache-airflow from 2.9.0 to 2.9.1 in /requirements…
dependabot[bot] Jun 3, 2024
864eb71
chore(deps): bump mlflow from 2.10.2 to 2.12.1 in /tests/data/serve_r…
dependabot[bot] Jun 3, 2024
a6a6cfd
chore(deps): bump mlflow from 2.11.1 to 2.12.1 in /tests/data/serve_r…
dependabot[bot] Jun 3, 2024
2b35717
chore(deps): bump mlflow from 2.11.1 to 2.12.1 in /tests/data/serve_r…
dependabot[bot] Jun 3, 2024
cfe98a9
change: Updates for DJL 0.28.0 release (#4701)
tosterberg Jun 5, 2024
9720b7c
Sync
Jun 5, 2024
73ae14a
Sync Branch
Jun 5, 2024
6 changes: 3 additions & 3 deletions .github/workflows/codebuild-ci.yml
@@ -55,7 +55,7 @@ jobs:
- name: Run Codestyle & Doc Tests
uses: aws-actions/aws-codebuild-run-build@v1
with:
-project-name: sagemaker-python-sdk-ci-codestyle-doc-tests
+project-name: ${{ github.event.repository.name }}-ci-codestyle-doc-tests
source-version-override: 'refs/pull/${{ github.event.pull_request.number }}/head^{${{ github.event.pull_request.head.sha }}}'
unit-tests:
runs-on: ubuntu-latest
@@ -74,7 +74,7 @@ jobs:
- name: Run Unit Tests
uses: aws-actions/aws-codebuild-run-build@v1
with:
-project-name: sagemaker-python-sdk-ci-unit-tests
+project-name: ${{ github.event.repository.name }}-ci-unit-tests
source-version-override: 'refs/pull/${{ github.event.pull_request.number }}/head^{${{ github.event.pull_request.head.sha }}}'
env-vars-for-codebuild: |
PY_VERSION
@@ -93,5 +93,5 @@ jobs:
- name: Run Integ Tests
uses: aws-actions/aws-codebuild-run-build@v1
with:
-project-name: sagemaker-python-sdk-ci-integ-tests
+project-name: ${{ github.event.repository.name }}-ci-integ-tests
source-version-override: 'refs/pull/${{ github.event.pull_request.number }}/head^{${{ github.event.pull_request.head.sha }}}'
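The three hunks above swap a hard-coded CodeBuild project name for one derived from the repository name. A minimal Python sketch of the resulting naming pattern follows. The helper `codebuild_project_name` is hypothetical and only mirrors the string template visible in the workflow; it is not part of the repository, and the repository name `sagemaker-python-sdk` is assumed from the old hard-coded values.

```python
def codebuild_project_name(repository_name: str, suite: str) -> str:
    """Mirror the workflow template `${{ github.event.repository.name }}-ci-<suite>`."""
    return f"{repository_name}-ci-{suite}"


# Assumption: the upstream repository is named "sagemaker-python-sdk", in which
# case the derived names equal the previously hard-coded ones.
names = [
    codebuild_project_name("sagemaker-python-sdk", suite)
    for suite in ("codestyle-doc-tests", "unit-tests", "integ-tests")
]
```

Under that assumption the change leaves upstream project names unchanged, while a fork with a different repository name would resolve to its own CodeBuild projects.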
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,26 @@
# Changelog

## v2.221.1 (2024-05-22)

### Bug Fixes and Other Changes

* Convert pytorchddp distribution to smdistributed distribution
* Add tei cpu image

## v2.221.0 (2024-05-20)

### Features

* onboard tei image config to pysdk

### Bug Fixes and Other Changes

* JS Model with non-TGI/non-DJL deployment failure
* cover tei with image_uris.retrieve API
* Add more debuging
* model builder limited container support for endpoint mode.
* Image URI should take precedence for HF models

## v2.220.0 (2024-05-15)

### Features
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-2.220.1.dev0
+2.221.2.dev0
2 changes: 1 addition & 1 deletion requirements/extras/test_requirements.txt
@@ -12,7 +12,7 @@ awslogs==0.14.0
black==24.3.0
stopit==1.1.2
# Update tox.ini to have correct version of airflow constraints file
-apache-airflow==2.9.0
+apache-airflow==2.9.1
apache-airflow-providers-amazon==7.2.1
attrs>=23.1.0,<24
fabric==2.6.0
12 changes: 12 additions & 0 deletions src/sagemaker/enums.py
@@ -28,3 +28,15 @@ class EndpointType(Enum):
INFERENCE_COMPONENT_BASED = (
"InferenceComponentBased" # Amazon SageMaker Inference Component Based Endpoint
)


class RoutingStrategy(Enum):
"""Strategy for routing https traffics."""

RANDOM = "RANDOM"
"""The endpoint routes each request to a randomly chosen instance.
"""
LEAST_OUTSTANDING_REQUESTS = "LEAST_OUTSTANDING_REQUESTS"
"""The endpoint routes requests to the specific instances that have
more capacity to process them.
"""
99 changes: 2 additions & 97 deletions src/sagemaker/fw_utils.py
@@ -145,22 +145,6 @@
],
}

-PYTORCHDDP_SUPPORTED_FRAMEWORK_VERSIONS = [
-"1.10",
-"1.10.0",
-"1.10.2",
-"1.11",
-"1.11.0",
-"1.12",
-"1.12.0",
-"1.12.1",
-"1.13.1",
-"2.0.0",
-"2.0.1",
-"2.1.0",
-"2.2.0",
-]
-
TORCH_DISTRIBUTED_GPU_SUPPORTED_FRAMEWORK_VERSIONS = [
"1.13.1",
"2.0.0",
@@ -795,7 +779,6 @@ def _validate_smdataparallel_args(

Raises:
ValueError: if
-(`instance_type` is not in SM_DATAPARALLEL_SUPPORTED_INSTANCE_TYPES or
`py_version` is not python3 or
`framework_version` is not in SM_DATAPARALLEL_SUPPORTED_FRAMEWORK_VERSION
"""
@@ -806,17 +789,10 @@ def _validate_smdataparallel_args(
if not smdataparallel_enabled:
return

-is_instance_type_supported = instance_type in SM_DATAPARALLEL_SUPPORTED_INSTANCE_TYPES
-
err_msg = ""

-if not is_instance_type_supported:
-# instance_type is required
-err_msg += (
-f"Provided instance_type {instance_type} is not supported by smdataparallel.\n"
-"Please specify one of the supported instance types:"
-f"{SM_DATAPARALLEL_SUPPORTED_INSTANCE_TYPES}\n"
-)
+if not instance_type:
+err_msg += "Please specify an instance_type for smdataparallel.\n"

if not image_uri:
# ignore framework_version & py_version if image_uri is set
@@ -928,13 +904,6 @@ def validate_distribution(
)
if framework_name and framework_name == "pytorch":
# We need to validate only for PyTorch framework
-validate_pytorch_distribution(
-distribution=validated_distribution,
-framework_name=framework_name,
-framework_version=framework_version,
-py_version=py_version,
-image_uri=image_uri,
-)
validate_torch_distributed_distribution(
instance_type=instance_type,
distribution=validated_distribution,
@@ -968,13 +937,6 @@ def validate_distribution(
)
if framework_name and framework_name == "pytorch":
# We need to validate only for PyTorch framework
-validate_pytorch_distribution(
-distribution=validated_distribution,
-framework_name=framework_name,
-framework_version=framework_version,
-py_version=py_version,
-image_uri=image_uri,
-)
validate_torch_distributed_distribution(
instance_type=instance_type,
distribution=validated_distribution,
@@ -1023,63 +985,6 @@ def validate_distribution_for_instance_type(instance_type, distribution):
raise ValueError(err_msg)


-def validate_pytorch_distribution(
-distribution, framework_name, framework_version, py_version, image_uri
-):
-"""Check if pytorch distribution strategy is correctly invoked by the user.
-
-Args:
-distribution (dict): A dictionary with information to enable distributed training.
-(Defaults to None if distributed training is not enabled.) For example:
-
-.. code:: python
-
-{
-"pytorchddp": {
-"enabled": True
-}
-}
-framework_name (str): A string representing the name of framework selected.
-framework_version (str): A string representing the framework version selected.
-py_version (str): A string representing the python version selected.
-image_uri (str): A string representing a Docker image URI.
-
-Raises:
-ValueError: if
-`py_version` is not python3 or
-`framework_version` is not in PYTORCHDDP_SUPPORTED_FRAMEWORK_VERSIONS
-"""
-if framework_name and framework_name != "pytorch":
-# We need to validate only for PyTorch framework
-return
-
-pytorch_ddp_enabled = False
-if "pytorchddp" in distribution:
-pytorch_ddp_enabled = distribution.get("pytorchddp").get("enabled", False)
-if not pytorch_ddp_enabled:
-# Distribution strategy other than pytorchddp is selected
-return
-
-err_msg = ""
-if not image_uri:
-# ignore framework_version and py_version if image_uri is set
-# in case image_uri is not set, then both are mandatory
-if framework_version not in PYTORCHDDP_SUPPORTED_FRAMEWORK_VERSIONS:
-err_msg += (
-f"Provided framework_version {framework_version} is not supported by"
-" pytorchddp.\n"
-"Please specify one of the supported framework versions:"
-f" {PYTORCHDDP_SUPPORTED_FRAMEWORK_VERSIONS} \n"
-)
-if "py3" not in py_version:
-err_msg += (
-f"Provided py_version {py_version} is not supported by pytorchddp.\n"
-"Please specify py_version>=py3"
-)
-if err_msg:
-raise ValueError(err_msg)
-
-
def validate_torch_distributed_distribution(
instance_type,
distribution,
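In the fw_utils.py hunks above, the smdataparallel instance check moves from an allow-list membership test to a simple presence test. A minimal sketch of the new behavior, isolated from the rest of `_validate_smdataparallel_args`, might look like the following. The function name `check_smdataparallel_instance_type` is hypothetical; the real function accumulates further checks into the same `err_msg` and raises `ValueError` at the end, which is omitted here.

```python
from typing import Optional


def check_smdataparallel_instance_type(instance_type: Optional[str]) -> str:
    """Return an error message when instance_type is missing, else an empty string."""
    err_msg = ""
    if not instance_type:
        # Only presence is checked; membership in an allow-list is no longer tested.
        err_msg += "Please specify an instance_type for smdataparallel.\n"
    return err_msg


missing = check_smdataparallel_instance_type(None)
present = check_smdataparallel_instance_type("ml.m5.large")
```

Any non-empty instance type now passes this particular check, so an instance type that smdataparallel cannot actually use would presumably surface as an error elsewhere rather than in this validator.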
14 changes: 14 additions & 0 deletions src/sagemaker/huggingface/llm_utils.py
@@ -65,6 +65,20 @@ def get_huggingface_llm_image_uri(
image_scope="inference",
inference_tool="neuronx",
)
if backend == "huggingface-tei":
return image_uris.retrieve(
"huggingface-tei",
region=region,
version=version,
image_scope="inference",
)
if backend == "huggingface-tei-cpu":
return image_uris.retrieve(
"huggingface-tei-cpu",
region=region,
version=version,
image_scope="inference",
)
if backend == "lmi":
version = version or "0.24.0"
return image_uris.retrieve(framework="djl-deepspeed", region=region, version=version)
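The two new branches differ only in the framework key, which equals the backend name in both cases. The sketch below illustrates that dispatch in isolation so it can run without the SDK installed. `resolve_tei_image` and `fake_retrieve` are hypothetical names: `fake_retrieve` stands in for `image_uris.retrieve` and returns a readable placeholder rather than a real image URI, and the version string is a placeholder too.

```python
from typing import Callable, Optional

# Backends from the hunk above whose framework key equals the backend name.
TEI_BACKENDS = ("huggingface-tei", "huggingface-tei-cpu")


def resolve_tei_image(
    backend: str,
    region: str,
    version: Optional[str],
    retrieve: Callable[..., str],
) -> Optional[str]:
    """Hypothetical helper: forward TEI backends to a retrieve-style callable."""
    if backend in TEI_BACKENDS:
        return retrieve(backend, region=region, version=version, image_scope="inference")
    return None  # other backends are handled by the surrounding function


def fake_retrieve(framework: str, **kwargs) -> str:
    """Stand-in for image_uris.retrieve; returns a placeholder, not a real URI."""
    return f"{framework}|{kwargs['region']}|{kwargs['version']}|{kwargs['image_scope']}"


# "1.2.3" is a placeholder version, not a statement about available TEI releases.
result = resolve_tei_image("huggingface-tei-cpu", "us-west-2", "1.2.3", fake_retrieve)
```

The PR keeps the two branches explicit, which matches the style of the neighboring backends; the lookup form is just a compact way to see that both share one call shape.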
1 change: 1 addition & 0 deletions src/sagemaker/huggingface/model.py
@@ -334,6 +334,7 @@ def deploy(
endpoint_type=kwargs.get("endpoint_type", None),
resources=kwargs.get("resources", None),
managed_instance_scaling=kwargs.get("managed_instance_scaling", None),
routing_config=kwargs.get("routing_config", None),
)
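The `routing_config` keyword forwarded here pairs with the `RoutingStrategy` enum added in src/sagemaker/enums.py. A small sketch of how a caller might build such a value is given below. The enum is re-declared locally so the snippet runs without the SDK; the helper `build_routing_config` and the `"RoutingStrategy"` dictionary key are assumptions for illustration, since this diff only shows the argument being passed through.

```python
from enum import Enum
from typing import Dict, Union


class RoutingStrategy(Enum):
    """Local copy of the enum added in src/sagemaker/enums.py."""

    RANDOM = "RANDOM"
    LEAST_OUTSTANDING_REQUESTS = "LEAST_OUTSTANDING_REQUESTS"


def build_routing_config(strategy: Union[RoutingStrategy, str]) -> Dict[str, str]:
    """Hypothetical helper: normalize a strategy to its string value."""
    if isinstance(strategy, str):
        # Enum lookup by name raises KeyError for unknown strategies.
        strategy = RoutingStrategy[strategy.upper()]
    # Assumption: the config is a dict keyed by "RoutingStrategy".
    return {"RoutingStrategy": strategy.value}


config = build_routing_config("least_outstanding_requests")
```

A dictionary like `config` would then be passed as `routing_config=...` to `deploy`, assuming the downstream code accepts that shape.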

def register(
39 changes: 39 additions & 0 deletions src/sagemaker/image_uri_config/djl-lmi.json
@@ -0,0 +1,39 @@
{
"scope": [
"inference"
],
"versions": {
"0.28.0": {
"registries": {
"af-south-1": "626614931356",
"il-central-1": "780543022126",
"ap-east-1": "871362719292",
"ap-northeast-1": "763104351884",
"ap-northeast-2": "763104351884",
"ap-northeast-3": "364406365360",
"ap-south-1": "763104351884",
"ap-southeast-1": "763104351884",
"ap-southeast-2": "763104351884",
"ap-southeast-3": "907027046896",
"ca-central-1": "763104351884",
"cn-north-1": "727897471807",
"cn-northwest-1": "727897471807",
"eu-central-1": "763104351884",
"eu-north-1": "763104351884",
"eu-west-1": "763104351884",
"eu-west-2": "763104351884",
"eu-west-3": "763104351884",
"eu-south-1": "692866216735",
"me-south-1": "217643126080",
"sa-east-1": "763104351884",
"us-east-1": "763104351884",
"us-east-2": "763104351884",
"us-west-1": "763104351884",
"us-west-2": "763104351884",
"ca-west-1": "204538143572"
},
"repository": "djl-inference",
"tag_prefix": "0.28.0-lmi10.0.0-cu124"
}
}
}
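To make the role of `registries`, `repository`, and `tag_prefix` concrete, the sketch below composes an ECR image URI from a trimmed copy of this config. The helper `ecr_image_uri` is hypothetical, and it assumes the image tag is exactly `tag_prefix`; the SDK's own resolution logic may add or validate other components.

```python
import json

# Trimmed copy of the config above (two regions only).
CONFIG = json.loads("""
{
  "versions": {
    "0.28.0": {
      "registries": {"us-west-2": "763104351884", "cn-north-1": "727897471807"},
      "repository": "djl-inference",
      "tag_prefix": "0.28.0-lmi10.0.0-cu124"
    }
  }
}
""")


def ecr_image_uri(config: dict, version: str, region: str) -> str:
    """Hypothetical helper: build an ECR URI from one version entry."""
    entry = config["versions"][version]
    account = entry["registries"][region]  # KeyError if the region is not listed
    # China regions use the amazonaws.com.cn domain suffix.
    domain = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    # Assumption: the tag is exactly tag_prefix.
    return f"{account}.dkr.ecr.{region}.{domain}/{entry['repository']}:{entry['tag_prefix']}"


uri = ecr_image_uri(CONFIG, "0.28.0", "us-west-2")
```

The per-region account IDs explain why the registry map is spelled out in full: most regions share one account, but several (for example the China and opt-in regions) use their own.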
18 changes: 18 additions & 0 deletions src/sagemaker/image_uri_config/djl-neuronx.json
@@ -3,6 +3,24 @@
"inference"
],
"versions": {
"0.28.0": {
"registries": {
"ap-northeast-1": "763104351884",
"ap-south-1": "763104351884",
"ap-southeast-1": "763104351884",
"ap-southeast-2": "763104351884",
"eu-central-1": "763104351884",
"eu-west-1": "763104351884",
"eu-west-3": "763104351884",
"sa-east-1": "763104351884",
"us-east-1": "763104351884",
"us-east-2": "763104351884",
"us-west-2": "763104351884",
"ca-west-1": "204538143572"
},
"repository": "djl-inference",
"tag_prefix": "0.28.0-neuronx-sdk2.18.2"
},
"0.27.0": {
"registries": {
"ap-northeast-1": "763104351884",
32 changes: 32 additions & 0 deletions src/sagemaker/image_uri_config/djl-tensorrtllm.json
@@ -3,6 +3,38 @@
"inference"
],
"versions": {
"0.28.0": {
"registries": {
"af-south-1": "626614931356",
"il-central-1": "780543022126",
"ap-east-1": "871362719292",
"ap-northeast-1": "763104351884",
"ap-northeast-2": "763104351884",
"ap-northeast-3": "364406365360",
"ap-south-1": "763104351884",
"ap-southeast-1": "763104351884",
"ap-southeast-2": "763104351884",
"ap-southeast-3": "907027046896",
"ca-central-1": "763104351884",
"cn-north-1": "727897471807",
"cn-northwest-1": "727897471807",
"eu-central-1": "763104351884",
"eu-north-1": "763104351884",
"eu-west-1": "763104351884",
"eu-west-2": "763104351884",
"eu-west-3": "763104351884",
"eu-south-1": "692866216735",
"me-south-1": "217643126080",
"sa-east-1": "763104351884",
"us-east-1": "763104351884",
"us-east-2": "763104351884",
"us-west-1": "763104351884",
"us-west-2": "763104351884",
"ca-west-1": "204538143572"
},
"repository": "djl-inference",
"tag_prefix": "0.28.0-tensorrtllm0.9.0-cu122"
},
"0.27.0": {
"registries": {
"af-south-1": "626614931356",