Commit

Merge branch 'main' into 38-allow-pdf-as-a-valid-readme-format
ntlhui committed Jan 11, 2025
2 parents cad6bbc + dd9a076 commit f930165
Showing 9 changed files with 219 additions and 16 deletions.
58 changes: 58 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,53 @@
# CHANGELOG

## v0.2.0 (2025-01-11)

### Chore

* chore: Fixes naming ([`379e0a1`](https://github.com/UCSD-E4E/e4e-data-management/commit/379e0a157375a3bc25765cb3fbae50fc2bb2c85d))

### Feature

* feat: Adds zip ([`6cdc42b`](https://github.com/UCSD-E4E/e4e-data-management/commit/6cdc42b7e60853efa9a50aa2fc383bb043740279))

### Style

* style: Fixes styling ([`c3bead9`](https://github.com/UCSD-E4E/e4e-data-management/commit/c3bead9a2cec33096e4f593b37611da35aceb727))

* style: Fixes spaces and unused variables ([`7c920b1`](https://github.com/UCSD-E4E/e4e-data-management/commit/7c920b1fc836cb04455f5567fb6436fd3d777f86))

### Unknown

* Merge pull request #92 from UCSD-E4E/37-zip-command

37 zip command ([`24b24d5`](https://github.com/UCSD-E4E/e4e-data-management/commit/24b24d53d3985147f1d71278d80b0c683aeba620))

* Merge branch 'main' into 37-zip-command ([`1b573b5`](https://github.com/UCSD-E4E/e4e-data-management/commit/1b573b59113fc1f2d89d1a3c1f1c990fd4e9b879))

* Merge branch 'main' into 37-zip-command ([`f49afac`](https://github.com/UCSD-E4E/e4e-data-management/commit/f49afac204ce134df63b3def37d9e19a21e0fa9a))

* Merge branch '34-remove-dataset-after-pushing-to-server' into 37-zip-command ([`751363a`](https://github.com/UCSD-E4E/e4e-data-management/commit/751363a888875952b7903b265855a055f2e4a0c3))

* Merge branch 'main' into 37-zip-command ([`6bb26e7`](https://github.com/UCSD-E4E/e4e-data-management/commit/6bb26e7463fb440bb7a130236f0706258b529453))

* wip: Creates zip file ([`705346f`](https://github.com/UCSD-E4E/e4e-data-management/commit/705346f284b00954536212dc57f184aaf0915068))

* Merge pull request #47 from UCSD-E4E/42-implement-e4edm-validate

42 implement e4edm validate ([`e5b4340`](https://github.com/UCSD-E4E/e4e-data-management/commit/e5b43401462353d02a0928cd0c5111ba5ab609ff))

* Merge branch 'main' into 42-implement-e4edm-validate ([`b114488`](https://github.com/UCSD-E4E/e4e-data-management/commit/b1144886ed295299613a75bd4d6c0130533f7e88))

* Merge pull request #83 from UCSD-E4E/60-chore-datamangercli-vs-datamanagercli

chore: Fixes naming ([`95533b5`](https://github.com/UCSD-E4E/e4e-data-management/commit/95533b56a7068b682d63bb3e89d38daa0603d668))

* Merge branch 'main' into 60-chore-datamangercli-vs-datamanagercli ([`1500fca`](https://github.com/UCSD-E4E/e4e-data-management/commit/1500fca5d06f3eeeca290539b11ea0417464c3da))

* Merge branch 'main' into 60-chore-datamangercli-vs-datamanagercli ([`6aa2b72`](https://github.com/UCSD-E4E/e4e-data-management/commit/6aa2b72bdfb17d68635f7fa2f3f6ad522bc377db))

* Merge remote-tracking branch 'origin/main' into 42-implement-e4edm-validate ([`836df5a`](https://github.com/UCSD-E4E/e4e-data-management/commit/836df5adbe135b0f28ed4f5cabed4e25a972618c))

## v0.1.5 (2025-01-11)

### Ci
@@ -32,6 +80,8 @@ fix: Adds exception logging to main invocation ([`cb9c732`](https://github.com/U

34 remove dataset after pushing to server ([`8a811be`](https://github.com/UCSD-E4E/e4e-data-management/commit/8a811bec27556f1dfba5044f3875cbecdc9b38e4))

* Merge branch 'main' into 42-implement-e4edm-validate ([`e09c335`](https://github.com/UCSD-E4E/e4e-data-management/commit/e09c3353ae6ce15f3161dcd644ba17d13bc7ff03))

## v0.1.4 (2024-11-04)

### Fix
@@ -140,6 +190,10 @@ Added self.save() call to dataset activation ([`cb5168d`](https://github.com/UCS

* added save ([`f9fddfb`](https://github.com/UCSD-E4E/e4e-data-management/commit/f9fddfb44911be81afab8ed8697f0d5d91cc7c15))

* Update version ([`7e0e9fb`](https://github.com/UCSD-E4E/e4e-data-management/commit/7e0e9fb88a7e097389e0ef9b532e83725c002879))

* Merge remote-tracking branch 'origin/main' into 42-implement-e4edm-validate ([`23a95b9`](https://github.com/UCSD-E4E/e4e-data-management/commit/23a95b96eeb332a6365ead70e4f3451aefe50de4))

* Merge pull request #46 from UCSD-E4E/43-include-full-date-in-dataset

43 include full date in dataset ([`e034866`](https://github.com/UCSD-E4E/e4e-data-management/commit/e03486620bfa968b6d5604ea677a2aa6858e6d3c))
@@ -154,6 +208,10 @@ Added self.save() call to dataset activation ([`cb5168d`](https://github.com/UCS

* Updated Tests ([`b0663e6`](https://github.com/UCSD-E4E/e4e-data-management/commit/b0663e6a544a85c1ef885424f5e4bb231875b640))

* Added remaining logic ([`3223992`](https://github.com/UCSD-E4E/e4e-data-management/commit/32239925c01f1ad06714528d2f741170911df7b9))

* Adding command changes ([`bc12b34`](https://github.com/UCSD-E4E/e4e-data-management/commit/bc12b34fa0a5803087f6d033e7c5c9d572dcff48))

* Merge pull request #36 from UCSD-E4E/34-remove-dataset-after-pushing-to-server

34 remove dataset after pushing to server ([`aad84c9`](https://github.com/UCSD-E4E/e4e-data-management/commit/aad84c96ebc1129ed9a4300d783b6c1ff021b02a))
2 changes: 1 addition & 1 deletion e4e_data_management/__init__.py
@@ -1,3 +1,3 @@
'''E4E Data Management Tools
'''
-__version__ = '0.1.5'
+__version__ = '0.2.0'
23 changes: 19 additions & 4 deletions e4e_data_management/cli.py
@@ -14,7 +14,7 @@
from e4e_data_management import __version__
from e4e_data_management.core import DataManager
from e4e_data_management.metadata import Metadata

from e4e_data_management.data import Dataset
T = TypeVar('T')
@dataclass
class Parameter:
@@ -28,7 +28,7 @@ class Parameter:
validator: Callable[[T], bool]


-class DataMangerCLI:
+class DataManagerCLI:
"""Data Manager Command Line Interface
"""
def __init__(self):
@@ -88,7 +88,7 @@ def __init__(self):
self.__configure_config_parser(parsers['config'])
self.__configure_activate_parser(parsers['activate'])
self.__configure_ls_parser(parsers['ls'])
-# self.__configure_validate_parser(parsers['validate'])
+self.__configure_validate_parser(parsers['validate'])
# self.__configure_zip_parser(parsers['zip'])
# self.__configure_unzip_parser(parsers['unzip'])

@@ -98,6 +98,21 @@ def __init__(self):
self._log.exception('Exception during application load/configuration')
raise exc

    def __configure_validate_parser(self, parser: argparse.ArgumentParser):
        parser.add_argument('root_dir', nargs='?', default=None, type=Path)
        parser.set_defaults(func=self.__external_validate)

    def __external_validate(self, root_dir: Optional[Path]):
        if root_dir is None:
            dataset = self.app.active_dataset
        else:
            dataset = Dataset.load(path=root_dir)

        if not dataset.validate():
            print('Dataset validation failed')
        else:
            print('Dataset valid')

def __configure_logging(self) -> None:
log_dir = Path(DataManager.dirs.user_log_dir).resolve()
log_dir.mkdir(parents=True, exist_ok=True)
@@ -406,7 +421,7 @@ def __configure_init_dataset_parser(self, parser: argparse.ArgumentParser):
def main():
"""Main bootstrap
"""
-DataMangerCLI().main()
+DataManagerCLI().main()

if __name__ == '__main__':
main()
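
Note on the `validate` wiring in this file: it combines an optional positional argument (`nargs='?'` with `default=None`) with `set_defaults(func=...)` dispatch. The following is a minimal standalone sketch of that argparse pattern, not the project's code. The names `validate_command`, `build_parser`, and `dispatch` are hypothetical, and the handler only reports which dataset would be checked instead of calling `Dataset.load` or `validate`.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def validate_command(root_dir: Optional[Path]) -> str:
    """Hypothetical handler: reports which dataset would be validated."""
    if root_dir is None:
        # No path given, so fall back to whatever dataset is currently active
        return 'active dataset'
    return f'dataset at {root_dir.as_posix()}'


def build_parser() -> argparse.ArgumentParser:
    """Builds a parser with a single 'validate' subcommand."""
    parser = argparse.ArgumentParser(prog='e4edm')
    subparsers = parser.add_subparsers(dest='command', required=True)
    validate = subparsers.add_parser('validate')
    # nargs='?' makes the positional optional; None means "not supplied"
    validate.add_argument('root_dir', nargs='?', default=None, type=Path)
    # Attach the handler so the caller can dispatch without a lookup table
    validate.set_defaults(func=validate_command)
    return parser


def dispatch(argv: List[str]) -> str:
    """Parses argv and calls the handler with the remaining arguments."""
    args = vars(build_parser().parse_args(argv))
    func = args.pop('func')
    args.pop('command')
    return func(**args)
```

Calling `dispatch(['validate'])` takes the active-dataset branch, while `dispatch(['validate', 'some/dir'])` takes the explicit-path branch. Because `type=Path` is only applied to values supplied on the command line, the `None` default reaches the handler unchanged.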
15 changes: 9 additions & 6 deletions e4e_data_management/core.py
@@ -3,10 +3,8 @@
from __future__ import annotations

import datetime as dt
-import fnmatch
import logging
import pickle
-import re
from pathlib import Path
from shutil import copy2, rmtree
from typing import Dict, Iterable, List, Optional, Set
@@ -282,10 +280,7 @@ def push(self, path: Path) -> None:
Args:
path (Path): Destination to push completed dataset to
"""
-if any(len(mission.staged_files) != 0
-for mission in self.active_dataset.missions.values()) or \
-len(self.active_dataset.staged_files) != 0:
-raise RuntimeError('Files still in staging')
+self.active_dataset.check_complete()

# Check that the README is present
readmes = [file
@@ -317,6 +312,14 @@ def zip(self, output_path: Path) -> None:
        Args:
            output_path (Path): Output path
        """
        if output_path.suffix.lower() != '.zip':
            output_path = output_path.joinpath(
                self.active_dataset.name + '.zip')

        output_path.parent.mkdir(parents=True, exist_ok=True)
        self.active_dataset.check_complete()

        self.active_dataset.create_zip(output_path)

def unzip(self, input_file: Path, output_path: Path) -> None:
"""This will unzip the archived dataset to the specified root
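
Note on `zip` in this file: any output path without a `.zip` suffix is treated as a directory, and the archive is named after the dataset inside it. The sketch below illustrates that rule together with an archive layout that nests every member under a folder named for the dataset. It is a standalone approximation under stated assumptions: `resolve_zip_path` and `write_archive` are hypothetical helpers, and the explicit list of relative file names stands in for the project's manifest.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def resolve_zip_path(output_path: Path, dataset_name: str) -> Path:
    """Treats a path without a .zip suffix as a directory for the archive."""
    if output_path.suffix.lower() != '.zip':
        output_path = output_path / f'{dataset_name}.zip'
    return output_path


def write_archive(root: Path,
                  dataset_name: str,
                  rel_files: Iterable[str],
                  output_path: Path) -> Path:
    """Writes rel_files (relative to root) into a .zip and returns its path."""
    zip_path = resolve_zip_path(output_path, dataset_name)
    zip_path.parent.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, mode='w') as handle:
        for rel in rel_files:
            # Prefix each member with the dataset name so that extraction
            # produces a single top-level folder
            arcname = (Path(dataset_name) / rel).as_posix()
            handle.write(filename=root / rel, arcname=arcname)
    return zip_path
```

One consequence of the suffix test is that a directory whose own name ends in `.zip` would be taken for the archive file itself, so callers may prefer to pass either a plain directory or a full `.zip` file path.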
52 changes: 52 additions & 0 deletions e4e_data_management/data.py
@@ -3,16 +3,23 @@
from __future__ import annotations

import datetime as dt
import fnmatch
import json
import logging
import pickle
import re
import zipfile
from dataclasses import dataclass
from hashlib import sha256
from pathlib import Path
from shutil import copy2
from typing import (Callable, Dict, Generator, Iterable, List, Optional, Set,
                    Union)

from e4e_data_management.exception import (CorruptedDataset,
                                           MissionFilesInStaging,
                                           ReadmeFilesInStaging,
                                           ReadmeNotFound)
from e4e_data_management.metadata import Metadata


@@ -515,3 +522,48 @@ def commit(self) -> List[Path]:
committed_files.extend(new_files)
self.staged_files = []
return committed_files

    def create_zip(self, zip_path: Path) -> None:
        """Creates a .zip archive of this Dataset at the specified location

        Args:
            zip_path (Path): Path to .zip archive
        """
        if zip_path.suffix.lower() != '.zip':
            raise RuntimeError('Invalid suffix')

        with zipfile.ZipFile(file=zip_path, mode='w') as handle:
            manifest = self.manifest.get_dict()
            for file in manifest:
                src_path = self.root.joinpath(file)
                dest = Path(self.name) / file
                handle.write(filename=src_path, arcname=dest)

    def check_complete(self) -> None:
        """Checks if the dataset is complete

        Raises:
            MissionFilesInStaging: Mission files remain in staging
            ReadmeFilesInStaging: Readme files remain in staging
            ReadmeNotFound: Readme files not found
            ReadmeNotFound: Readme files with acceptable extension not found
            CorruptedDataset: Dataset checksum validation failed
        """
        staged_mission_files = (mission.staged_files
                                for mission in self.missions.values())
        if any(len(staged) for staged in staged_mission_files):
            raise MissionFilesInStaging
        if len(self.staged_files) != 0:
            raise ReadmeFilesInStaging

        readmes = [file for file in self.root.glob('*')
                   if re.match(fnmatch.translate('readme.*'), file.name, re.IGNORECASE)]
        if len(readmes) == 0:
            raise ReadmeNotFound

        acceptable_exts = ['.md', '.docx']
        if not any(readme.suffix.lower() in acceptable_exts for readme in readmes):
            raise ReadmeNotFound('Acceptable extension not found')

        if not self.validate():
            raise CorruptedDataset
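
Note on `check_complete`: it finds readmes with a case-insensitive glob (`fnmatch.translate` compiled with `re.IGNORECASE`) and signals each failure with a subclass of `Incomplete`, so a caller can handle all "not ready" cases with one `except` clause. The sketch below reproduces only the readme portion on plain file names. The exception classes are local stand-ins for the ones this commit adds in `exception.py`, and `find_readmes`, `check_readme`, and `readiness` are hypothetical helpers.

```python
import fnmatch
import re
from pathlib import Path
from typing import Iterable, List, Sequence

ACCEPTABLE_EXTS = ('.md', '.docx')  # same extensions as the list in this commit
README_PATTERN = re.compile(fnmatch.translate('readme.*'), re.IGNORECASE)


class Incomplete(Exception):
    """Local stand-in for the project's Incomplete base class"""


class ReadmeFilesInStaging(Incomplete):
    """Local stand-in: readme files are still staged"""


class ReadmeNotFound(Incomplete):
    """Local stand-in: no usable readme was found"""


def find_readmes(names: Iterable[str]) -> List[str]:
    """Returns the names that match readme.* regardless of case."""
    return [name for name in names if README_PATTERN.match(name)]


def check_readme(names: Iterable[str], staged: Sequence[str] = ()) -> None:
    """Raises an Incomplete subclass for the first check that fails."""
    if len(staged) != 0:
        raise ReadmeFilesInStaging
    readmes = find_readmes(names)
    if not readmes:
        raise ReadmeNotFound
    if not any(Path(name).suffix.lower() in ACCEPTABLE_EXTS
               for name in readmes):
        raise ReadmeNotFound('Acceptable extension not found')


def readiness(names: Iterable[str], staged: Sequence[str] = ()) -> str:
    """Catches the shared base class, so one except clause covers all cases."""
    try:
        check_readme(names, staged)
    except Incomplete as exc:
        return f'incomplete: {type(exc).__name__}'
    return 'complete'
```

With this layout, a file named `README.pdf` is found by the pattern but rejected by the extension check, which matches the behavior of the diff above, where only `.md` and `.docx` are accepted.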
28 changes: 28 additions & 0 deletions e4e_data_management/exception.py
@@ -0,0 +1,28 @@
'''E4E Data Management Exceptions
'''
from abc import ABC


class Incomplete(Exception, ABC):
    """Dataset not complete
    """


class MissionFilesInStaging(Incomplete):
    """Mission files still in staging area
    """


class ReadmeFilesInStaging(Incomplete):
    """Readme files still in staging area
    """


class ReadmeNotFound(Incomplete):
    """Readme files not found
    """


class CorruptedDataset(Exception):
    """Corrupted Dataset
    """
8 changes: 4 additions & 4 deletions poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "e4e-data-management"
-version = "0.1.5"
+version = "0.2.0"
description = "E4E Data Management Tool (Python)"
authors = [
"Nathan Hui <[email protected]>",
47 changes: 47 additions & 0 deletions tests/test_zip.py
@@ -0,0 +1,47 @@
'''Tests zipping
'''
from pathlib import Path
from tempfile import TemporaryDirectory
from typing import Tuple
from unittest.mock import Mock
import zipfile
from e4e_data_management.core import DataManager

SingleMissionFixture = Tuple[Tuple[Mock,
                                   DataManager, Path], Tuple[Path, int, int]]


def test_zip_to_dir(single_mission_data: SingleMissionFixture,
                    test_readme: Path):
    """Tests zipping data

    Args:
        single_mission_data (SingleMissionFixture): Single Mission test fixture
        test_readme (Path): Test Readme
    """
    test_app, _ = single_mission_data
    _, app, _ = test_app

    app.add([test_readme], readme=True)
    app.commit(readme=True)
    with TemporaryDirectory() as target_dir:
        zip_path = Path(target_dir)
        app.zip(zip_path)

        final_path = zip_path.joinpath(app.active_dataset.name + '.zip')
        assert final_path.is_file()

        with zipfile.ZipFile(file=final_path, mode='r') as handle:
            assert handle.testzip() is None
            manifest = app.active_dataset.manifest.get_dict()
            for name in handle.filelist:
                ar_name = Path(name.filename).relative_to(
                    app.active_dataset.name)
                assert ar_name.as_posix() in manifest

            handle.extractall(target_dir)

        app.active_dataset.manifest.validate(
            manifest=manifest,
            files=Path(app.active_dataset.name).rglob('*')
        )
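
Note on the round trip this test exercises: archive the dataset, run `testzip()`, extract, then compare the extracted files against recorded checksums. The sketch below shows the same idea with only the standard library. It assumes SHA-256 digests keyed by POSIX-style relative paths as a stand-in for the project's manifest, and `sha256_of` and `round_trip` are hypothetical names. Extracted paths are resolved against the extraction directory rather than the current working directory.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path
from typing import Dict


def sha256_of(path: Path) -> str:
    """Returns the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def round_trip(files: Dict[str, bytes], dataset_name: str) -> bool:
    """Archives files, extracts them, and compares digests before and after."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / 'src'
        out = Path(tmp) / 'out'
        archive = Path(tmp) / f'{dataset_name}.zip'

        # Write the source files and record the expected digests
        expected = {}
        for rel, payload in files.items():
            target = src / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(payload)
            expected[rel] = sha256_of(target)

        # Store every member under a top-level folder named for the dataset
        with zipfile.ZipFile(archive, mode='w') as handle:
            for rel in files:
                handle.write(src / rel, arcname=f'{dataset_name}/{rel}')

        with zipfile.ZipFile(archive, mode='r') as handle:
            # testzip() returns the first corrupt member name, or None
            if handle.testzip() is not None:
                return False
            handle.extractall(out)

        # Walk the extracted tree relative to the extraction directory
        extracted_root = out / dataset_name
        actual = {
            path.relative_to(extracted_root).as_posix(): sha256_of(path)
            for path in extracted_root.rglob('*') if path.is_file()
        }
        return actual == expected
```

For example, `round_trip({'README.md': b'hello', 'mission/data.bin': b'\x00\x01'}, 'ds')` builds a two-file dataset, archives it, and reports whether the extracted copy matches the recorded digests.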
