Create AWS deployment infrastructure #366
Comments
#368 is related |
#135 is also related |
#50 might be related as well... |
Another related one: 2i2c-org/farallon-image#28 |
OK, I have looked into the pilot-hubs and pangeo-hubs codebases. I have also looked at other issues referred here.
I will continue my comments in a subsequent message, otherwise, it gets pretty long... |
Correct! No docs, just #275. I can walk you through this if you would like, but it is a mess.
The intent of that repo is to only host infrastructure that's org-wide - so manage project access, terraform state workspaces, maybe a centralized grafana (if we get there), etc. Not for per-project terraform. I also continue to find automating terraform deployments terrifyingly complex and super easy to get wrong. |
I opened #369 fixing an issue (but it makes the code scarier!), and adding some docs on the current terraform workspaces in use. |
Assuming I know the answer to some of the questions I raised above, I think the plan for the item on the list ("Adapt our deploy scripts to support AWS as well") should be:
General thoughts? 😜 |
On AWS, EFS is indeed used. On GCP, Filestore is extremely expensive - unlike EFS, there's a minimum disk size of 1TB. Easily a few hundred dollars a month just on that. EFS is much closer to true pay-per-use.
Yep! For
So right now, that's the suggestion for cases when our users are building the docker image themselves. I think for us, we should try to get the image into ECR. It definitely makes node spin-up time much faster, and this is super important with dask.
I just mount all EFS as NFS. Just setting something like https://github.com/2i2c-org/pangeo-hubs/blob/staging/deployments/farallon/config/common.yaml#L4 as our |
OK, thanks for the clarification!
Yep, I am kind of getting that as I read about it... |
I suspected that was the case... thanks for confirming it! |
Filestore is also just NFSv3, and in general doesn't have a lot of the features of EFS that make it so desirable. SIGH |
Pangeo hubs have a `PANGEO_SCRATCH` env variable that points to a GCS bucket, used to share data between users. We implement that here too, but with a more generic `SCRATCH_BUCKET` env var (`PANGEO_SCRATCH` is also set for backwards compat). pangeo-data/pangeo-cloud-federation#610 has some more info on the use cases for `PANGEO_SCRATCH`. Right now, we use Google Config Connector (https://cloud.google.com/config-connector/docs/overview) to set this up. We create Kubernetes CRDs, and the connector creates appropriate cloud resources to match them. We use this to provision a GCP Service Account and a Storage bucket for each hub. Since these are GCP-specific, running them on AWS fails. This PR puts them behind a switch, so we can work on getting things to AWS. Eventually, it should also support AWS resources via the AWS Service Broker (https://aws.amazon.com/partners/servicebroker/). Ref 2i2c-org#366
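For anyone writing notebook code against these hubs, here is a rough sketch of how user code could pick up the scratch location, preferring the generic `SCRATCH_BUCKET` and falling back to `PANGEO_SCRATCH`. The function name `scratch_bucket` and the example bucket URLs are made up for illustration; only the two env var names come from the description above.

```python
import os


def scratch_bucket(env=None):
    """Return the scratch bucket URL, or None if neither variable is set.

    Hypothetical helper: prefers SCRATCH_BUCKET, then falls back to
    PANGEO_SCRATCH (kept on the hubs for backwards compatibility).
    """
    env = os.environ if env is None else env
    for name in ("SCRATCH_BUCKET", "PANGEO_SCRATCH"):
        value = env.get(name)
        if value:  # skip unset or empty values
            return value
    return None


# Example with made-up values; on a real hub, call scratch_bucket() with no args.
example_env = {"PANGEO_SCRATCH": "gs://example-scratch/some-user"}
print(scratch_bucket(example_env))
```

Passing the environment in as a dict keeps the helper easy to test outside a hub.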
#374 puts some GCP specific stuff behind a feature flag |
#379 (by @yuvipanda) collects several PRs toward this goal. |
hey all -- just chiming in here to say that we're super interested in these developments as we're looking to set up a new Pangeo-like hub on AWS in the near future. If there's anything we can do to help move things along, just let me know. |
https://pilot-hubs.2i2c.org/en/latest/topic/storage-layer.html has more info on the nfs-share-creator. This PR adds support for setting baseSharePath to `/`, which is sometimes needed on EFS. Ref 2i2c-org#366
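To illustrate why a base share path of `/` needs to be handled, here is a small sketch of joining a base path with a per-hub directory. This is not the actual nfs-share-creator code: the helper `hub_share_path` and the example paths are hypothetical, and the assumption is only that each hub gets a subdirectory under the base path.

```python
import posixpath


def hub_share_path(base_share_path, hub_name):
    """Join a base share path and a hub name into one absolute POSIX path.

    Hypothetical helper: normalizes the base so that "/", "" and
    "/some/dir/" all behave sensibly, without producing "//hub".
    """
    base = posixpath.normpath("/" + base_share_path.strip("/"))
    return posixpath.join(base, hub_name)


print(hub_share_path("/", "farallon"))
print(hub_share_path("/export/homes/", "farallon"))
```

With the base set to `/`, the hub directory sits directly at the root of the share.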
Update: I have a kops-based cluster (mimicking the Farallon one) already deployed in OpenScapes AWS land. |
Also, #379 (supporting hubs deployment in AWS land from the pilot-hub repo) was merged today! |
#453 deals with replication/validation + documentation of the current deployment story. |
#453 was merged, so ticking the last item in the first message of this thread and finally closing this one!! Btw, there could be some other remaining things to be done but those are described and captured in follow-up issues. |
Summary
We currently focus much of our deployment infrastructure around Google Cloud rather than AWS or Azure. We also have a few clients that would like their hubs working on AWS. We should improve our AWS deployment infrastructure and use these use-cases as forcing functions.
The two use-cases are:
Given that two groups wish to use this infrastructure now, and that AWS is extremely popular and will likely be a commonly-requested provider, I think we should prioritize this one.
Acceptance criteria
We should be able to spin up an AWS Pangeo-style hub with the same ease that we currently have with GKE.
Tasks to complete
A few that came to mind...
ping to @jhamman as well as @consideRatio who may be interested in tracking this (or helping out!) as well