Releases · awslabs/aws-serverless-data-lake-framework

22 May 14:36

cnfait

2.0.0-beta.0

c8f1f33

Serverless Data Lake Framework 2.0.0-beta.0

Pre-release

Work is ongoing on a new major version of the Serverless Data Lake Framework. This is a pre-release, not ready for production workloads.

We welcome your feedback!

What’s New

SDLF components are now CloudFormation modules
- there is one module per component: foundations, team, pipeline, stageA, stageB, dataset.
- datalakeLibrary and pipLibrary are used to build Lambda layers; they’re not CloudFormation modules.
- deploy.sh takes care of deploying the CICD infrastructure used to build these modules and to register them in the private CloudFormation registry of each account. Modules are updated whenever there is a change to their source repository.
SDLF CICD pipelines now live in the Shared DevOps account
- CloudFormation stacks are created in child accounts through cross-account IAM roles.
SDLF can deploy to an arbitrary number of child accounts, all driven from a single DevOps account.
- pDomain (which defaults to datalake) can be provided when deploying foundations.
- each domain can have the usual three environments (dev, test, prod).
Deploying foundations and teams is now done from a new repository called sdlf-main.
- this repository is created during the initial setup with deploy.sh.
- foundations deployment happens in foundations-{domain}-{env}.yaml and teams in teams-{domain}-{env}.yaml.
- sdlf-main works the same way as everything else in SDLF: master, test and dev branches are expected.
- it is easier to see which teams have been created, and to remove them, since they don’t share the same set of parameters in parameters-{env}.json.
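The file naming convention above can be sketched in a few lines of Python. This is only an illustration of the pattern: the helper `main_repo_files` is hypothetical and not part of SDLF, and the only values taken from these notes are the default domain (`datalake`) and the three environments.

```python
# Hypothetical helper (not part of SDLF): builds the template file names
# that sdlf-main is described as using for a given domain and environment.
ENVIRONMENTS = ("dev", "test", "prod")


def main_repo_files(domain: str = "datalake", env: str = "dev") -> dict:
    """Return the foundations and teams file names for one domain/env pair."""
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {env!r}")
    return {
        "foundations": f"foundations-{domain}-{env}.yaml",
        "teams": f"teams-{domain}-{env}.yaml",
    }


if __name__ == "__main__":
    for env in ENVIRONMENTS:
        print(main_repo_files("datalake", env))
```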
Deploying pipelines and datasets is now done from a new repository called sdlf-{domain}-{team name}-main.
- this repository is created when a new team is created.
- pipelines deployment happens in pipelines-{env}.yaml and datasets in datasets-{env}.yaml.
- sdlf-{domain}-{team name}-main works the same way as everything else in SDLF: master, test and dev branches are expected.
- it is easier to see which pipelines and datasets have been created, and to remove them, since they don’t share the same set of parameters in parameters-{env}.json.
Mappings between datasets and transforms in stageB are now defined directly when defining a dataset.
- this mapping used to be done by a CodeBuild project and a script in sdlf-datalakeLibrary. They are no longer needed and have been removed.
- it is now defined through the pPipelineDetails parameter when defining a dataset in sdlf-dataset. This parameter goes even further and can be used to store more information that stages can use. These details are stored in the Datasets DynamoDB table (as was already the case in SDLFv1).
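As a rough sketch of the idea, the mapping could be expressed as a nested structure serialized to a string, since CloudFormation parameter values are strings. The exact schema of pPipelineDetails is not described in these notes: the keys below (`main`, `B`, `transform`) and the transform name are assumptions made for illustration only.

```python
import json

# Hypothetical structure: pipeline name -> stage name -> details for that stage.
# None of these keys or values are confirmed by the release notes.
pipeline_details = {
    "main": {
        "B": {"transform": "example_heavy_transform"},
    }
}

# CloudFormation parameter values are strings, so serialize before passing.
p_pipeline_details = json.dumps(pipeline_details, sort_keys=True)


def transform_for(details_json: str, pipeline: str, stage: str):
    """Look up the transform mapped to a pipeline stage, or None if unset."""
    details = json.loads(details_json)
    return details.get(pipeline, {}).get(stage, {}).get("transform")


if __name__ == "__main__":
    print(transform_for(p_pipeline_details, "main", "B"))
```

A stage reading the corresponding item from the Datasets DynamoDB table could apply the same lookup to decide which transform to run.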
Stages in a pipeline are now driven by EventBridge rules exclusively.
- the rule can be an event pattern or a schedule (cron expression).
- stageA no longer sends messages to a queue for stageB to process. stageB is configured with an event pattern that listens for stageA runs (pEventPattern in the example) and then processes these events on a schedule (pSchedule).
- it is now easier to build single-stage pipelines, pipelines with dependent stages, and more complex pipelines in general than in SDLFv1, as long as there is an event pattern to listen for.
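To make the event pattern idea concrete, here is a minimal sketch of what pEventPattern and pSchedule values might look like, along with a simplified matcher. The pattern assumes stageA is a Step Functions state machine whose completion events are being listened for; the notes do not confirm which events stageA emits. The matcher only handles exact-value lists and nested objects, which is a small subset of what EventBridge supports.

```python
# Assumed values for illustration: the notes do not specify the actual pattern.
stage_b_event_pattern = {
    "source": ["aws.states"],
    "detail-type": ["Step Functions Execution Status Change"],
    "detail": {"status": ["SUCCEEDED"]},
}

# EventBridge cron expressions have six fields:
# minutes, hours, day-of-month, month, day-of-week, year.
stage_b_schedule = "cron(0/5 * * * ? *)"  # every five minutes


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: exact-value lists and nested objects only."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


if __name__ == "__main__":
    event = {
        "source": "aws.states",
        "detail-type": "Step Functions Execution Status Change",
        "detail": {"status": "SUCCEEDED"},
    }
    print(matches(stage_b_event_pattern, event))
```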
New optional component: sdlf-monitoring, with CloudTrail, ELK and SNS.
- in SDLFv1 CloudTrail is optional but enabled by default. Here it is optional and stays disabled unless sdlf-monitoring is deployed.
New optional stage: sdlf-stage-dataquality
- deequ is now entirely optional. While it wasn’t enabled by default in SDLFv1, dedicated infrastructure was still created while deploying sdlf-foundations. This is no longer the case.
- sdlf-stage-dataquality can now be used as an example of how to add a third stage to the default stageA and stageB pipeline.
Outside the initial deploy.sh, there are no more shell scripts.

Full Changelog: 1.5.2...2.0.0-beta.0


15 May 10:55

cnfait

1.5.2

ba7857c

Serverless Data Lake Framework 1.5.2

What's Changed

enable versioning on ELK stack bucket by @cnfait in #139

Full Changelog: 1.5.1...1.5.2

Contributors

cnfait


09 May 21:41

cnfait

1.5.1

a7696eb

Serverless Data Lake Framework 1.5.1

Bug Fix

Create a role for Lake Formation data access by @cnfait in #138

Full Changelog: 1.5.0...1.5.1

Thanks

We thank @Druizm128 for raising the issue!

Contributors

Druizm128 and cnfait


04 May 15:40

cnfait

1.5.0

e9a2a30

Serverless Data Lake Framework 1.5.0

Features & Enhancements

ELK Update by @cnfait in #136
rework sdlf-cicd rCodeBuildRole IAM role to avoid using wildcards by @cnfait in #130
avoid wildcards in sdlf-lakeformation-admin role permissions by @cnfait in #132
avoid wildcards in data quality lambda permissions by @cnfait in #131
disable cfn_nag W11 on CodeCommit roles by @cnfait in #133
update awswrangler (aws sdk for pandas) to the latest 2.x version by @ntlohi in #134

Full Changelog: 1.4.0...1.5.0

Thanks

We thank all the contributors/users for their work on this release, in particular @ntlohi.

Contributors

cnfait and ntlohi


23 Mar 14:32

cnfait

1.4.0

80608ae

Serverless Data Lake Framework 1.4.0

Noteworthy

AWS Partition Support by @cnfait in #128
- SDLF can now be deployed on GovCloud (us-gov-west-1)
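Partition support mostly comes down to not hardcoding `arn:aws:` in resource ARNs. In CloudFormation templates this is commonly done with the `AWS::Partition` pseudo parameter; the Python sketch below shows the same idea. The helper names and the bucket name are hypothetical, and this is not taken from the SDLF code.

```python
def partition_for_region(region: str) -> str:
    """Map a region name to its AWS partition (commercial, GovCloud, China)."""
    if region.startswith("us-gov-"):
        return "aws-us-gov"
    if region.startswith("cn-"):
        return "aws-cn"
    return "aws"


def bucket_arn(region: str, bucket: str) -> str:
    """Build an S3 bucket ARN with the partition derived from the region."""
    return f"arn:{partition_for_region(region)}:s3:::{bucket}"


if __name__ == "__main__":
    # "example-bucket" is a placeholder name.
    print(bucket_arn("us-gov-west-1", "example-bucket"))
    print(bucket_arn("eu-west-1", "example-bucket"))
```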

Features & Enhancements

update codebuild image from standard:4.0 to amazonlinux2-x86_64-standard:4.0 by @cnfait in #113
validate.sh: replace flake8, isort with ruff by @cnfait in #126
Support for specifying glue arguments in dynamodb dataset table by @cnfait in #127
add emr tagging permissions by @cnfait in #129

Full Changelog: 1.3.1...1.4.0

Thanks

We thank all the contributors/users for their work on this release.

Contributors

cnfait


20 Feb 09:12

cnfait

1.3.1

7b88499

Serverless Data Lake Framework 1.3.1

Bug Fixes

fix pipeline stage dynamodb entry creation by @cnfait in 675517a
use deequ 1.1.0 instead of 1.2.2 as it breaks glue jobs by @cnfait and @piers-walter-ibm in 5065cd5
deploy failed when you deploy a new environment (missing sns permission) by @YuliemAlavez in #118

Minor Changes

gitlab support: readme file by @cnfait in b456258
remove executable bit from json files by @cnfait in #112
fix minor typo by @cnfait in #114

Features & Enhancements

add cfn-lint to validate.sh by @cnfait in #115

Full Changelog: 1.3.0...1.3.1

Thanks

We thank all the contributors/users for their work on this release, in particular @YuliemAlavez and @piers-walter-ibm.

Contributors

YuliemAlavez, piers-walter-ibm, and cnfait


11 Jan 15:54

cnfait

1.3.0

f576f30

Serverless Data Lake Framework 1.3.0

Noteworthy

Third-party SCM support (mirroring to CodeCommit): GitLab🔥
As of version 1.1.0, released on December 7th, 2022, there is now a public roadmap.

Features & Enhancements

third-party scm support: gitlab by @cnfait in #104
enable versioning on central/raw/stage/analytics buckets by @cnfait in #106
add security configuration to sdlf-dataset glue crawler by @cnfait in #107
encrypt cloudtrail logs when using externally-provided bucket by @cnfait in #108

Full Changelog: 1.2.0...1.3.0

Thanks

We thank all the contributors/users for their work on this release.

Contributors

cnfait


02 Jan 10:51

cnfait

1.2.0

54b0bd9

Serverless Data Lake Framework 1.2.0

Noteworthy

As of version 1.1.0, released on December 7th, 2022, there is now a public roadmap.
As of version 1.1.0, released on December 7th, 2022, the main branch of the repository has been renamed from master to main. This is in line with the other projects the team is working on. master is still available with the same content as main to avoid breaking existing workflows. However, only master is currently supported by the SDLF CICD infrastructure.
As of version 1.1.0, released on December 7th, 2022, Semantic Versioning is used for SDLF releases. This is in line with other projects from the same team.
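For readers scripting against release tags, the sketch below shows how Semantic Versioning precedence orders versions, including pre-release tags. It is a simplified illustration rather than a full parser: it ignores build metadata, does not validate input, and will not accept tags that are not in `MAJOR.MINOR.PATCH` form.

```python
def semver_key(version: str) -> tuple:
    """Sort key following Semantic Versioning precedence (no build metadata)."""
    core, _, prerelease = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not prerelease:
        # A normal release sorts after any of its pre-releases.
        return (major, minor, patch, 1, ())
    # Numeric identifiers compare as integers and sort before alphanumeric ones.
    identifiers = tuple(
        (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
        for ident in prerelease.split(".")
    )
    return (major, minor, patch, 0, identifiers)


if __name__ == "__main__":
    versions = ["2.0.0-beta.0", "1.5.2", "2.0.0", "1.5.0"]
    print(sorted(versions, key=semver_key))
```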

Bug Fixes

correct and clean manifests and cloudfront examples by @mariandumitrascu-p in #71
fix bitbucket team pipeline when checking repositories by @cnfait in #103 - Thanks @YuliemAlavez!

Features & Enhancements

Python 3.9 as default for Lambda functions, Lambda layers and CodeBuild runtimes by @cnfait in #93
Align GlueVersion to 2.0 for all Glue jobs by @cnfait in #94
Update Deequ from 1.0.X to Deequ 1.2.2-spark2.4 by @cnfait in #95
Update ElasticSearch domain from 6.3 to 6.8 by @cnfait in #96
Add simple shell script and configuration files to help improve code quality by @cnfait in #97
isort by @cnfait in #98
black by @cnfait in #99
flake8 by @cnfait in #100
shellcheck by @cnfait in #101

Full Changelog: 1.1.0...1.2.0

Thanks

We thank all the contributors/users for their work on this release.

Contributors

YuliemAlavez, cnfait, and mariandumitrascu-p


06 Dec 23:29

cnfait

1.1.0

f903bc8

Serverless Data Lake Framework 1.1.0

Noteworthy

This release is just a snapshot of the repository as of December 7th, 2022. There are no new features or changes if you have already pulled the code from the main branch.
There is now a public roadmap.
The main branch of the repository has been renamed from master to main. This is in line with the other projects the team is working on. master is still available with the same content as main to avoid breaking existing workflows.
Semantic Versioning is now used for SDLF releases. This is to be in line with other projects from the same team.

Features & Enhancements

Added bucket policies to enforce in transit encryption for s3 buckets #14
Update catalog lambda to handle S3 multipart upload events #19
Update catalog lambda to support DeleteMarkerCreated events #24
3rd party SCM providers - Azure DevOps integration #22
Bumping Wrangler to 2.3.0 and removing ListBucket condition
3rd party SCM providers - Bitbucket integration #26
Enable python 3.8 runtime for non-default lambda layers #29
Add alias option for target e-mail #32
Enable Manifest Based Processing in SDLF #30
Adding Glue Jobs Deployer utility #34
Feature to add pre-existing whl files without having to build them #39
Adding deploy mode for datasets #40
Enable NodeToNodeEncryptionOptions (CFN_Nag W85) #43
Add update stack logic for cross-account team role stack #44
Adding Data Lake testing #45
Enable tracing for step functions #49
Lambda cloudwatch log encryption retention #46
Add template protection function #48
Update key and bucket retention policies #50
Adding PutLifecycleConfiguration permission
Adding in a CloudFormation template that sets up automated testing for CodeCommit Pull Requests #47
Datalake Workload Management #52
Point-in-time recovery (PITR) enabled for DynamoDB tables #53
Modifying user agent
Adding few more examples and public references #58
Sqoop ingestion extension #57
Reducing size policy #62
Removing slf4j logger calls
EMR security configuration #59
Python runtime updated #67

Bug Fixes

Adding missing sdlf-utils and reinstating PubRef
Correct typo of Glue Job's name #33
Deleting additional Images, fixing README and parameters-dev errors #42
Fixing Topic Modelling Example
Sqoop ingestion minor fixes #66
Fix unsupported resource arn format on rXXBucketLakeFormationS3Registration resources #77
Fix S3 buckets ARN - Lakeformation integration #75

Documentation

Adjusting Contributing file to latest template
Adjusting workshop URLs to support i18n
Better documentation for new service connection strategy #25

Thanks

We thank all the contributors/users for their work on this release.

Full Changelog: v1.0.4.0...1.1.0
