-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: 0060 bazel #61
RFC: 0060 bazel #61
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,137 @@ | ||||||
--- | ||||||
feature name: bazel | ||||||
start date: 2020-01-16 | ||||||
rfc pr: (leave this empty) | ||||||
related issue: 0060 | ||||||
--- | ||||||
|
||||||
# Summary | ||||||
|
||||||
This proposal is for migrating the build system to [Bazel](https://bazel.build/), as it solves | ||||||
many if not all of the deficiencies of the current system. This involves adding Bazel resources and rules to the repository to manage | ||||||
build and test workflows, and bringing the team up to speed on how to use them. | ||||||
|
||||||
# Motivation | ||||||
|
||||||
The CDK currently uses a combination of [Lerna](https://lerna.js.org/), [Yarn Workspaces](https://yarnpkg.com/lang/en/docs/workspaces/), | ||||||
and Bash scripts to orchestrate build and test workflows. This has several advantages for managing a monorepo structure, but can also | ||||||
be very slow, difficult to work with and extend, and hard to parallelize amongst the team (ie when one unrelated resource changes, the | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the current thing fells not difficult to extend/parallelize to me. I'd love having some examples, or maybe a little tighter wording (I may simply be interpreting it wrong!) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This was meant to be more about how Bazel is literally "plug-any-play" with any rule. An output from one rule can easily be used and transformed by another. And yes, while you can do that with the current system as well, there are less guardrails and tooling in place to ensure that it works well out-of-the-box. Maybe not a dealbreaker, but it was noticeable to me. I can try to add an example to illustrate this, but I may just remove the verbiage too if it's too misleading. |
||||||
whole project still needs to be rebuilt). | ||||||
|
||||||
# Basic Example | ||||||
|
||||||
Instead of running `bash build.sh` or `npm run build` on a subpackage, an engineer would simply run `bazel build <path-to-resource>` or | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What is a "resource"? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about use cases like:
How do they map to Bazel? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A resource in this case could be any build/test target in a subpackage. For instance, you could have one resource as the Lambdas that support a package, another for the API layer for that package, and another for the test suite that brings them together. I'm not entirely sure what you mean by buildup/builddown and foreach. Can you give me an example of both? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||
`bazel test <path-to-resource`. Running a specific Bazel rule (for instance orchestrating a release) is also straightforward, with | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
`bazel run <path-to-resource>.<action>`. | ||||||
|
||||||
# Design Summary | ||||||
|
||||||
In order to setup Bazel, the Bazel-specific rules and resources need to be added to the repository. This means adding `BUILD.bazel` files | ||||||
to each subpackage, and then building custom rules around some of the internals of the project (e.g. CFN to TS). | ||||||
|
||||||
# Detailed Design | ||||||
|
||||||
The structure of adding Bazel to a project is quite simple. Represented below is a tree of the files | ||||||
needed for a standard project | ||||||
|
||||||
``` | ||||||
BAZEL.build | ||||||
package.json | ||||||
|packages| | ||||||
BUILD.bazel | ||||||
|apigateway| | ||||||
BUILD.bazel | ||||||
package.json | ||||||
Comment on lines
+43
to
+44
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there duplication between BUILD.bazel and package.json? For example, where do we specify the module's dependencies all the jsii information? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We would do that in the BUILD.bazel, similarly to how package.json handles it. There is work underway right now to pull the dependencies automatically from the package.json, but this would not extend to In the meantime it would be duplicated, which I cop to is increased work, but the dependencies of packages change so infrequently, and if it's work that's missed it is also easily detectable. |
||||||
apigateway.ts | ||||||
index.ts | ||||||
WORKSPACE <-- Bazel project definition, only needed once | ||||||
yarn.lock | ||||||
``` | ||||||
|
||||||
The `BUILD` files themselves are quite simple, since Bazel has rules built-in for compiling a TypeScript library and for | ||||||
running JavaScript tests. We would need to write custom rules in the | ||||||
Comment on lines
+51
to
+52
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The CDK is built with the Also, what are the implications on all the tools in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Each of those tools would get wrapped with Bazel rules. This would "supercharge" them with the benefits of Bazel with minimal added work to migrate the existing build system. It also gives a great template for adding more scripts to the The implications for |
||||||
[Skylark language](https://docs.bazel.build/versions/master/skylark/language.html) to perform custom operations like | ||||||
building TypeScript definitions from CloudFormation definition files. This is relatively straightforward since the tooling | ||||||
for this exists in TypeScript already, and Skylark is a subset of Python. | ||||||
|
||||||
Example `BUILD.bazel` file, assuming `cdk_library` is a Bazel rule that calls `ts_library` with another rule that performs | ||||||
TS transformation: | ||||||
``` | ||||||
load("//tools:defaults.bzl", "cdk_library", "pkg_npm") | ||||||
|
||||||
package(default_visibility = ["//visibility:public"]) | ||||||
|
||||||
cdk_library( | ||||||
name = "aws-amazonmq", | ||||||
scope = ["AWS::AmazonMQ"], | ||||||
outputs = ["lib/amazonmq.generated.ts"], | ||||||
srcs = glob(["lib/**/*.ts"]), | ||||||
deps = [ | ||||||
"//packages/@aws-cdk/cdk", | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It feels like this information is duplicated from the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As mentioned in Elad's comment above, while this would be duplicated for now, it would not be duplicated in the long run as the Bazel team is working on auto-extraction of dependencies from package.json files. However, I think it's worth biting the bullet for now, because it is just the short term, and the risks are minimal in my opinion. For instance, when was the last time you updated the dependencies of a long-standing CDK package? And even if you did, if you skipped adding it to the Bazel file, it wouldn't build, and if you skipped adding it to the package.json file, the integration tests wouldn't run. |
||||||
], | ||||||
) | ||||||
|
||||||
pkg_npm( | ||||||
name = "npm_package", | ||||||
srcs = [ | ||||||
"package.json", | ||||||
], | ||||||
tags = [ | ||||||
"release-with-framework", | ||||||
], | ||||||
deps = [ | ||||||
":aws-amazonmq", | ||||||
], | ||||||
) | ||||||
``` | ||||||
|
||||||
The `pkg_npm` rule is also built-in to Bazel, and bundles a package based on given sources, along with other needed files like a package.json. | ||||||
Note that the `cdk_library` rule is only needed for packages that have a CFN to TS transformation. If that's not needed, it's simply a `ts_library` rule. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not really because we use jsii There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Even better then! |
||||||
|
||||||
# Drawbacks | ||||||
|
||||||
There are two main drawbacks to migrating to Bazel: | ||||||
|
||||||
1. Lack of familiarity with Bazel amongst the team | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ...and contributors There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair enough, but like what the Angular team does, we could include a short reference guide, linked to from the contributor's guide. If your goal is to simply develop for the CDK, and not the tooling behind it, the commands to enable contributions are simple and easy to use. |
||||||
1. Relative newness of Bazel in the JavaScript/TypeScript ecosystem | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are there examples of large typescript projects that use bazel? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Angular and RxJS come to mind, I'll try and find a few others. |
||||||
|
||||||
For the first point, the current system has the advantage of being a glorified monorepo wrapper around NPM. Bazel, | ||||||
on the other hand, has a completely different semantic structure. While using it is straightforward for the most part, | ||||||
understanding how it works and more importantly how to extend it (by writing custom rules, for example) is not. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would like this statement to be more substantiated. I think we're doing stuff in a way that is very much "how you do it in the node ecosystem"; but I'm also no expert on the subject (I mean, AWS CDK is the first ever There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I'm not saying that Bazel is "the way to do things" in the JavaScript world (not yet, anyways), but it certainly has its advantages for a sprawling monorepo like the CDK. This would be very different if the CDK were a single module, but it's not. Bazel was built to solve the issue of tons of modules in various states across developers. Lerna was build to solve the issue of a developer with a couple related libraries housed in one repository. |
||||||
|
||||||
For the second point, while Bazel has had its 1.0 release, that release was relatively recent, and there is still much | ||||||
work to be done in the space. For instance, Bazel does not currently supported nested package.json dependencies when | ||||||
constructing its own dependency tree. While the nested files can exist, their dependencies would have to be deduped to | ||||||
the root package.json file in order for everything to be setup properly by Bazel. | ||||||
|
||||||
# Rationale and Alternatives | ||||||
|
||||||
## Rationale | ||||||
* Bazel is incremental, meaning that only the changed aspects of the project and its dependencies need to be rebuilt | ||||||
* Bazel caches all of its builds, making it much faster than the current solution | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fast is desirable. Safe is required. Cache often introduces safety issues, and I think it'd be interesting to add some wording around how Bazel makes sure it's caching mechanism is exempt of invalidation bugs. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this something I can add in a references section? |
||||||
* Bazel allows for hermiticity and reproducability since it runs in a sandbox | ||||||
* Bazel is highly extensible, meaning custom rules can be written solely in TypeScript/JavaScript as opposed to Bash, but | ||||||
still incorporated easily into the toolchain | ||||||
Comment on lines
+113
to
+114
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A drawback that has not been mentioned earlier is that Bazel introduces yet another dependency in the mix that you need to install (and later maintain) in order to contribute to CDK. I believe it can be installed from There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It can be installed from NPM, and in that way doesn't introduce that much overhead. But definitely worth a mention, you're totally right. |
||||||
|
||||||
## Alternatives | ||||||
* Remain with the current build system, but incorporate [lerno](https://github.com/aws/aws-cdk/pull/5359) | ||||||
|
||||||
# Adoption Strategy | ||||||
|
||||||
The Bazel rules can be added in parallel with current efforts by the team. None of the resources involved would | ||||||
overwrite or interfere with the current project structure, so adding files would not interfere with developer flow. | ||||||
Switching over to Bazel could also be opt-in as people want to try it, with the legacy system in place to manage | ||||||
releases and for developers to fallback on if needed. | ||||||
Comment on lines
+121
to
+124
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But in the transition we will have to maintain two sets of build definitions, no? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "Maintain" implies that both are ready for use, but with the adoption of any new system there is always a migratory period. How long that takes is up to the team, but while the new system is being built, there is only one system that builds the project, which needs to be maintained. The new one is being built, which would have to be done anyway. In the transition, it could be made clear that the legacy system is not supported during the migration, or that it will only be supported for a short time. |
||||||
|
||||||
The rules for each subpackage would need to be added, along with the general rules for the project. All of the Bazel | ||||||
dependencies would be added to the top-level package.json, in addition to the nested dependencies from all of the | ||||||
subpackages. There should not be too many, since most of the "dependencies" are simply other CDK subpackages. | ||||||
Comment on lines
+127
to
+128
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds ripe for automation/generation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As mentioned above, it's in progress, but there's no committed timeline as of yet. |
||||||
|
||||||
# Unresolved questions | ||||||
|
||||||
* Will the project structure of the CDK remain stable over time, or will it move to a singular import? | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Funny you'd ask that 🤣 we've been talking about it quite a lot lately... But we're not quite sure just yet whether this is the right move or not... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While the move to Bazel helps the current setup a lot, it would probably help a single-module setup a little bit less. It can certainly be used in both cases though. But just a matter of how much of a commitment we want to make now. |
||||||
|
||||||
# Future Possibilities | ||||||
|
||||||
* We could setup Remote Build Execution (RBE) using a fleet on EC2 or ECS, further speeding up the team's builds | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What does that mean? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It means that instead of using your local machine to execute build and test tasks, your Bazel instance would "outsource" the job to a team-available fleet (only accessible by trusted contributors, of course). Since Bazel is parallelizable, the work for a large and complex build task like the CDK could be spread out over 3 or 4 EC2 hosts and return the result to you locally in a fraction of the time it would otherwise take. |
||||||
* We could enable remote caching for our CI to speed up our CI builds, in addition to local builds |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here - or in motivations - I would like to have more verbiage around what "all the deficiencies of the current system" are. Because as much as I'm willing to believe they exist, not explicitly listing them out means: