Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix empty path segments in Data Path transformations #453

Open
wants to merge 37 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 34 commits
Commits
Show all changes
37 commits
Select commit Hold shift + click to select a range
6aeb241
Moving callable session from storage S3 to AWSClient and adding a mar…
Andre-Lx-Costa Jul 5, 2024
d040095
Adding docstring for IConnectionDetails
Andre-Lx-Costa Jul 5, 2024
b943218
Adding missing S3Path annotation
Andre-Lx-Costa Jul 8, 2024
d3909c2
Merge branch 'refs/heads/main' into Move-callable-session-from-S3-Sto…
Andre-Lx-Costa Jul 9, 2024
91015cd
Some more improvements
Andre-Lx-Costa Jul 9, 2024
7e8e55f
Merge branch 'refs/heads/main' into Move-callable-session-from-S3-Sto…
Andre-Lx-Costa Jul 16, 2024
2ce1169
Small bugfix after testing with Azure client and rebase fix
Andre-Lx-Costa Jul 16, 2024
f847a23
Addressing lint
Andre-Lx-Costa Jul 16, 2024
8e68394
Merge branch 'refs/heads/main' into Move-callable-session-from-S3-Sto…
Andre-Lx-Costa Jul 17, 2024
85afee0
Fixing readme file
Andre-Lx-Costa Jul 17, 2024
7a17a79
Small fixes
Andre-Lx-Costa Jul 17, 2024
9a571ff
Formating
Andre-Lx-Costa Jul 17, 2024
4ed076d
Bug fix now
Andre-Lx-Costa Jul 17, 2024
bb012f8
Reducing scope in the branch
Andre-Lx-Costa Jul 17, 2024
2fa69e8
Missed a file
Andre-Lx-Costa Jul 17, 2024
def765b
Added unit tests
Andre-Lx-Costa Jul 17, 2024
5a459da
Removing duplicates
Andre-Lx-Costa Jul 17, 2024
8d6b230
Merge branch 'refs/heads/main' into Bug-fix-empty-path-segment
Andre-Lx-Costa Jul 18, 2024
f48d1ea
Formatting
Andre-Lx-Costa Jul 18, 2024
2c3d77e
Improving some unittests
Andre-Lx-Costa Jul 18, 2024
2283dcb
Merge branch 'main' into Bug-fix-empty-path-segment
Andre-Lx-Costa Jul 22, 2024
01ff833
Merge branch 'main' into Bug-fix-empty-path-segment
Andre-Lx-Costa Jul 22, 2024
bb9b1bb
Merge branch 'main' into Bug-fix-empty-path-segment
Andre-Lx-Costa Jul 24, 2024
73464d7
Merge branch 'main' into Bug-fix-empty-path-segment
Andre-Lx-Costa Jul 26, 2024
cbf15e9
Addressing PR comments
Andre-Lx-Costa Jul 29, 2024
820f8a0
Removing no redundant checks
Andre-Lx-Costa Jul 29, 2024
8c3bcea
Small improvement to the unitests
Andre-Lx-Costa Jul 29, 2024
3c0746b
Small improvement to the unitests naming descriptions
Andre-Lx-Costa Jul 29, 2024
c5770be
Let's not change this
Andre-Lx-Costa Jul 29, 2024
0f71abc
Let's not change this
Andre-Lx-Costa Jul 29, 2024
f4c4610
Merge remote-tracking branch 'origin/Bug-fix-empty-path-segment' into…
Andre-Lx-Costa Jul 29, 2024
917199b
Merge branch 'main' into Bug-fix-empty-path-segment
Andre-Lx-Costa Jul 31, 2024
3e0f229
Merge branch 'main' into Bug-fix-empty-path-segment
Andre-Lx-Costa Aug 1, 2024
5e19b0e
Merge branch 'main' into Bug-fix-empty-path-segment
Andre-Lx-Costa Aug 13, 2024
80c9717
Merge remote-tracking branch 'origin/main' into Bug-fix-empty-path-se…
Andre-Lx-Costa Oct 28, 2024
2183be5
Addressing PR comments
Andre-Lx-Costa Oct 28, 2024
fb9ba51
Fix Linting
Andre-Lx-Costa Oct 28, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion adapta/ml/mlflow/_client.py
Original file line number Diff line number Diff line change
Expand Up @@ -129,7 +129,7 @@ def load_model_by_uri(model_uri: str) -> PyFuncModel:

- ``/Users/me/path/to/local/model``
- ``relative/path/to/local/model``
- ``s3://my_bucket/path/to/model``
- ``s3a://my_bucket/path/to/model``
- ``runs:/<mlflow_run_id>/run-relative/path/to/model``
- ``models:/<model_name>/<model_version>``
- ``models:/<model_name>/<stage>``
Expand Down
23 changes: 14 additions & 9 deletions adapta/storage/models/aws.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@
# limitations under the License.
#

import re

from dataclasses import dataclass
from urllib.parse import urlparse

Expand All @@ -33,19 +35,13 @@ def to_uri(self) -> str:
Converts the S3Path to a URI.
:return: URI path
"""
if not self.bucket or not self.path:
raise ValueError("Bucket and path must be defined")

return f"s3://{self.bucket}/{self.path}"

def base_uri(self) -> str:
"""
Returns the base URI of the S3Path.
:return: URI path
"""
if not self.bucket:
raise ValueError("Bucket must be defined")

return f"https://{self.bucket}.s3.amazonaws.com"

@classmethod
Expand All @@ -62,6 +58,18 @@ def from_uri(cls, url: str) -> "S3Path":
path: str
protocol: str = DataProtocols.S3.value

def __post_init__(self):
if not self.bucket:
raise ValueError("Bucket must be defined")

path_regex = r"^(?![0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$)[a-z0-9]([a-z0-9\-]{1,61}[a-z0-9])?(\/(?!.*(\/\/|\\))([^\/].{0,1022}\/?)?)?$"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simplify this regex to just check for //


s3_path_without_prefix = f"{self.bucket}/{self.path}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regex should only be applied to self.path, bucket is checked on line 62, we do not aim to validate bucket name with this PR, out of scope

match = re.match(path_regex, s3_path_without_prefix)

if not match:
raise ValueError(f"Invalid S3Path provided, must comply with : {path_regex}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do not put regex in error message, explain in human language :)


@classmethod
def from_hdfs_path(cls, hdfs_path: str) -> "S3Path":
"""
Expand All @@ -78,9 +86,6 @@ def to_hdfs_path(self) -> str:
Converts the S3Path to an HDFS compatible path.
:return: HDFS path
"""
if not self.bucket or not self.path:
raise ValueError("Bucket and path must be defined")

return f"s3a://{self.bucket}/{self.path}"

def to_delta_rs_path(self) -> str:
Expand Down
52 changes: 51 additions & 1 deletion tests/test_s3_storage_client.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,10 +12,46 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
import pytest

from adapta.storage.blob.s3_storage_client import S3StorageClient
from adapta.storage.models.aws import S3Path
from unittest.mock import patch, MagicMock
from unittest.mock import patch


def test_valid_s3_datapath():
valid_s3_datapaths = [
lambda: S3Path(bucket="bucket", path=""),
lambda: S3Path(bucket="bucket", path="path"),
lambda: S3Path(bucket="bucket", path="path/"),
lambda: S3Path(bucket="bucket", path="path/path_segment"),
lambda: S3Path(bucket="bucket", path="path/path_segment/path_segment"),
]

for s3_data_path in valid_s3_datapaths:
try:
s3_data_path()
except Exception as e:
pytest.fail(f"S3Path creation raised an exception: {e}")


def test_invalid_s3_datapath():
malformed_s3_datapaths = [
lambda: S3Path(bucket="/bucket/", path="path"),
lambda: S3Path(bucket="/bucket", path="path"),
lambda: S3Path(bucket="bucket", path="/path"),
lambda: S3Path(bucket="bucket", path="path//path_segment"),
lambda: S3Path(bucket="bucket", path="path/path_segment//path_segment"),
]

for s3_data_path in malformed_s3_datapaths:
with pytest.raises(ValueError, match=r"Invalid S3Path provided, must comply with : .*"):
s3_data_path()


def test_base_uri():
path = S3Path(bucket="bucket", path="nested/key")
assert path.base_uri() == "https://bucket.s3.amazonaws.com"


def test_from_hdfs_path():
Expand All @@ -24,6 +60,20 @@ def test_from_hdfs_path():
assert path.path == "nested/key"


def test_to_uri():
bucket_name = "bucket"
path = "nested/key"
path_instance = S3Path(bucket=bucket_name, path=path)
assert path_instance.to_uri() == f"s3://{bucket_name}/{path}"


def test_to_delta_rs_path():
bucket_name = "bucket"
path = "nested/key"
path_instance = S3Path(bucket=bucket_name, path=path)
assert path_instance.to_delta_rs_path() == f"s3a://bucket/nested/key"


def test_to_hdfs_path():
path = S3Path("bucket", "nested/key").to_hdfs_path()
assert path == "s3a://bucket/nested/key"
Expand Down
Loading