Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split backends into independent repos #432

Closed
geospatial-jeff opened this issue Jul 27, 2022 · 25 comments · Fixed by #555
Closed

Split backends into independent repos #432

geospatial-jeff opened this issue Jul 27, 2022 · 25 comments · Fixed by #555
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@geospatial-jeff
Copy link
Collaborator

Discussion started here: #409 (comment).

Including backends in this repo was convenient when stac-fastapi was first created but the pros are now outweighed by the cons.

Migration plan:

  1. Merge all open PRs.
  2. Release a new version, this will be the last version with a monorepo.
  3. Create two new repos in stac-utils org, one for each backend.
  4. Copy backend submodules over to new repos.
  5. Merge the api, extensions, and types submodules into a single stac-fastapi package, keep import paths the same to avoid breaking.
  6. Cut over to the new stac-fastapi package in the backend repos.
  7. Test / validate backend repos before we merge all of the code.
  8. Move backend specific issues to the appropriate repo.
@duckontheweb
Copy link
Contributor

I'm a huge +1 for this and I'm happy to help with part of it if you want.

5. Merge the api, extensions, and types submodules into a single stac-fastapi package, keep import paths the same to avoid breaking.

I'm a big fan of this. I've been using poetry a bunch lately, but it doesn't support installing from GitHub subdirectories, which has been a blocker for trying to install the latest unstable version of this library. This would solve that issue.

@geospatial-jeff
Copy link
Collaborator Author

geospatial-jeff commented Jul 27, 2022

I'm a huge +1 for this and I'm happy to help with part of it if you want.

That would be great! I'm going to start getting all of the open PRs resolved, not exactly sure how long that will take. But I imagine it will be easier to merge all the packages once this is done. It looks like the relevant PRs to api/extensions/types modules are:

@duckontheweb
Copy link
Contributor

Sounds good. I can create new repos for the backends (stac-fastapi-sqlalchemy and stac-fastapi-pgstac). We will probably want to resolve the following PRs before we port the code over since they all make changes to one or both of the backends:

@geospatial-jeff
Copy link
Collaborator Author

geospatial-jeff commented Aug 1, 2022

@duckontheweb I can start moving code into stac-fastapi-sqlalchemy. I'll try to keep the repo structure as similar as possible.

@duckontheweb
Copy link
Contributor

@duckontheweb I can start moving code into stac-fastapi-sqlalchemy. I'll try to keep the repo structure as similar as possible.

Sounds good. I’m wondering if we should actually fetch from this repo into those ones in order to preserve the Git history. I think setting up this repo as a remote and then doing a “git reset —hard” to the master branch of this repo would do the trick

@duckontheweb
Copy link
Contributor

I'm going to move the pgstac backend over to stac-fastapi-pgstac and preserve the Git history. I'll try to keep the repo structure as similar as possible, too.

I will also set up the default branch for that repo to be main instead of master. We may want to consider making the same change in stac-fastapi-sqlalchemy and in this repo while we are doing this overhaul.

@geospatial-jeff Let me know if you want me to work on consolidating this repo into a single Python package.

@captaincoordinates
Copy link
Contributor

is it part of the plan to release 2.4.0 after the outstanding PRs are merged, and before the repo is broken apart? There seem to be a lot of unpublished changes against the current repo structure.

@duckontheweb
Copy link
Contributor

is it part of the plan to release 2.4.0 after the outstanding PRs are merged, and before the repo is broken apart? There seem to be a lot of unpublished changes against the current repo structure.

I think this would probably be a good idea. The intention is to not make any breaking changes as part of the new repo structure, but just to be safe we might want to cut a release first.

@geospatial-jeff
Copy link
Collaborator Author

Yes I can create a release after #441 is merged.

@duckontheweb
Copy link
Contributor

I ported the code to stac-utils/stac-fastapi-pgstac and opened stac-utils/stac-fastapi-pgstac#1 to start trimming that down to just the PgSTAC backend.

@geospatial-jeff
Copy link
Collaborator Author

I'll have the 2.4.0 release out sometime tomorrow.

@geospatial-jeff
Copy link
Collaborator Author

I've added labels to each issue to help figure out what to move to different repos. It's not perfect, and there is often some overlap but hopefully it helps.

@duckontheweb
Copy link
Contributor

I've added labels to each issue to help figure out what to move to different repos.

I'll open issues on the new stac-fastapi-pgstac repo for anything that is purely related to that backend and close the related issue here.

@duckontheweb
Copy link
Contributor

I can take a stab at trimming this repo down to just the core modules and distributing everything under a single package.

@geospatial-jeff
Copy link
Collaborator Author

geospatial-jeff commented Aug 9, 2022

Thanks @duckontheweb, I will review #450 and stac-utils/stac-fastapi-pgstac#1 tomorrow morning!

@duckontheweb
Copy link
Contributor

@geospatial-jeff There may be some issues around separation of concerns between this repo and the backend repos that we want to work out in v2.5.0-a* releases.

These came up as I started to work on #401 using #450 and stac-utils/stac-fastapi-pgstac#1 as a base. That issue pertains to adding Queryables links to the /, /collections, and /collections/{collection_id} responses. None of the changes require information from a specific backend database (the links are static for each endpoint), but because of the way the libraries are set up it required making changes in both this repo and stac-fastapi-pgstac in order to full implement it. The crux of the problem (at least from my perspective) is that "backend" packages like stac-fastapi.pgstac and stac-fastapi.sqlalchemy end up handling too much of the API layer logic like constructing responses.

The AsyncBaseCoreClient.landing_page method does a pretty good job of implementing the logic for constructing the response in this library and delegating fetching of Collections to AsyncBaseCoreClient.all_collections, which is implemented by the backend packages (although that method does do some of the response formatting by putting the list of collections into the collections property of a dictionary).

The /collections endpoint, however, is just using AsyncBaseCoreClient.all_collections as a handler directly. This means that changes to the links property of that response need to happen in the backend repo even if they are not backend-specific. If we change the CRUD client classes to have something more like the following structure, we might be able to get a better separation of concerns and minimize the places where we need to update multiple repos in order to implement a change:

  • AsyncBaseCoreClient.fetch_all_collections - New abstract method in this repo (implemented in backend repos). Returns a simple list of Collections.
  • AsyncBaseCoreClient.all_collections - Implemented in this repo, calls AsyncBaseCoreClient.fetch_all_collections and constructs the response using that list, including adding top-level links and manipulating links on each Collection as necessary.
  • AsyncBaseCoreClient.fetch_collection - New abstract method in this repo (implemented in backend repos). Returns a Collection.
  • AsyncBaseCoreClient.get_collection - Implemented in this repo, calls AsyncBaseCoreClient.fetch_collection and constructs the response, including manipulating Collection links as necessary.

If this seems like a reasonable approach I can put in an draft PR with these changes and then we can see where else we might want to tease apart some of the logic.

@geospatial-jeff
Copy link
Collaborator Author

This means that changes to the links property of that response need to happen in the backend repo even if they are not backend-specific.

This makes sense, definitely something we need to fix for good separation of concerns.

If this seems like a reasonable approach I can put in an draft PR with these changes and then we can see where else we might want to tease apart some of the logic.

This makes sense to me! I imagine its mostly just adding/resolving links which is already relatively standard across backends using the stac_fastapi.types.links module so seems better to do that once in the API layer.

Do you think we have to do the same for item endpoints?

@duckontheweb
Copy link
Contributor

Do you think we have to do the same for item endpoints?

I haven't looked at these in detail, but I imagine we will. I can start a PR with the collections endpoints and then we can build off of that.

@geospatial-jeff
Copy link
Collaborator Author

@duckontheweb Could you please give me push access to https://github.com/stac-utils/stac-fastapi-sqlalchemy?
Getting 403s on push.

@duckontheweb
Copy link
Contributor

@duckontheweb Could you please give me push access to https://github.com/stac-utils/stac-fastapi-sqlalchemy?
Getting 403s on push.

Oops, should be all set now

@geospatial-jeff
Copy link
Collaborator Author

The sqlalchemy PR is here stac-utils/stac-fastapi-sqlalchemy#1, I have a few TODOs listed on the PR but it's in pretty good shape.

@gadomski
Copy link
Member

gadomski commented Nov 10, 2022

@geospatial-jeff @duckontheweb checking in on this one — has this stalled out? I’m going to be picking up some dev time for stac-fastapi and I’d like to get across the current state of play before I dive in too deep.

@geospatial-jeff
Copy link
Collaborator Author

Hey @gadomski that sounds great! This has stalled out but I'm going to be picking it back up. I have some time set aside tomorrow and Saturday to continue working on this issue. It's been a while since we started so I also need to do some accounting to figure out exactly what we've done and how much is left to do. I will follow up tomorrow with a more detailed comment on the status of this issue.

@geospatial-jeff
Copy link
Collaborator Author

At a high level:

  1. Update Remove backend packages #450 with main, there are some changes that have been released since we started working on this issue.
  2. Splitting out the sqlalchemy/pgstac backends into independent repos has revealed some issues with separation of concerns between the API layer (stac_fastapi.api, stac_fastapi.types, stac_fastapi.extensions) and the backends. The main issue is with link generation and how the current implementation is passing the starlette.requests.Request object around. [WIP] Generate links in api layer #472 moves link generation to the API layer, this PR needs to be finished and merged into Remove backend packages #450.
  3. Update https://github.com/stac-utils/stac-fastapi-sqlalchemy and https://github.com/stac-utils/stac-fastapi-pgstac with main to preserve changes when we cut over.
  4. Update https://github.com/stac-utils/stac-fastapi-sqlalchemy and https://github.com/stac-utils/stac-fastapi-pgstac with Remove backend packages #450. Main challenge here will be to implement the new link generation introduced in [WIP] Generate links in api layer #472.
  5. Do a final review of https://github.com/stac-utils/stac-fastapi-sqlalchemy and https://github.com/stac-utils/stac-fastapi-pgstac to make sure the code looks good and everything is working as expected. There are a few open PRs in each repo that need to be finished up and merged (although some of them look finished, just not merged yet). Also confirm that separation of concerns is better; this may require some iteration.

Some additional steps to cut over to the new repo structure cleanly. We did many of these a while ago but they are worth revisiting now as new code has been pushed to stac-fastapi since then.

  1. Move backend specific issues over to the appropriate backend repo.
  2. Merge any open PRs to prevent merge conflicts.
  3. Release a new stac-fastapi version before we do the cutover.

@gadomski
Copy link
Member

Great, thanks for the summary. If I'm understanding correctly, #472 is the biggest blocker at the moment -- the rest of the TODOs seem to be more general housekeeping?

I can start on some of the housekeeping tasks, after which I can bring an extra set of hands/eyes to #472, with the caveat that I'm still learning the code base so will be a little slow/ignorant :-).

@gadomski gadomski linked a pull request Jan 18, 2023 that will close this issue
5 tasks
@gadomski gadomski mentioned this issue Apr 3, 2023
5 tasks
@gadomski gadomski added this to the 2.4.6 milestone May 11, 2023
@gadomski gadomski self-assigned this May 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants