
Cache RegistryEntries for OSS and Cloud connector versions #26878

Closed
bnchrch opened this issue Jun 1, 2023 · 8 comments · Fixed by #27766
@bnchrch (Contributor) commented Jun 1, 2023

Problem

For any OLD connector version, how do we get the registry entry that was in the catalog but no longer is?

Solution

Let's update our registry generator to cache connector entries.

For example, if you are looking for the last registry entry for destination-bigquery:1.2.19 on Cloud, you would go to:

https://connectors.airbyte.com/files/metadata/airbyte/destination-bigquery/1.2.19/cloud.json
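For illustration, a minimal sketch of how a consumer could fetch one of these cached per-version entries. The URL pattern is taken from the example above; the helper name and the `dockerImageTag` field read at the end are assumptions about the entry's shape, not a confirmed schema.

```python
import json
from urllib.request import urlopen

# Base path for the proposed per-version files (pattern from the example above).
BASE_URL = "https://connectors.airbyte.com/files/metadata/airbyte"

def fetch_registry_entry(connector: str, version: str, registry: str) -> dict:
    """Fetch the cached RegistryEntry for one connector version.

    `registry` is "oss" or "cloud"; returns the parsed JSON document.
    """
    url = f"{BASE_URL}/{connector}/{version}/{registry}.json"
    with urlopen(url) as response:
        return json.loads(response.read())

# e.g. the last Cloud entry for destination-bigquery:1.2.19
entry = fetch_registry_entry("destination-bigquery", "1.2.19", "cloud")
print(entry.get("dockerImageTag"))  # assumed field name, for illustration only
```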


OLD CONVERSATION BELOW


The Current System

  1. Metadata files only allow for registry-specific overrides.
  2. The connector registries can only have one connector version at a time (OSS can have source-s3:1.2.3 while Cloud has source-s3:2.0.0); see the sketch after this list.
  3. The Platform can only have one default version of an Actor Definition at a time (this directly relates to the version in the registry).
  4. The Platform now allows individual Actors to reference Actor Definition Versions other than the default, e.g. source-s3:eds-prerelease-123.
  5. We want to extend Actor Definition Versions so that we can allow users to remain on old versions for a set window of time.
  6. Populating Actor Definition Versions, at this moment, is an awkward combination of current metadata, the version-specific spec cache, and whatever data was ingested for that version if it ever originally appeared in the Registry.
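The sketch referenced in point 2: a single metadata file with per-registry overrides resolves to at most one entry per registry. The field names below are hypothetical stand-ins, not the actual metadata schema.

```python
# Hypothetical shapes, not the real metadata schema: one metadata file carries
# default values plus optional per-registry overrides, and each registry ends
# up with exactly one version of the connector.
metadata = {
    "dockerRepository": "airbyte/source-s3",
    "dockerImageTag": "1.2.3",
    "registries": {
        "oss": {},                              # no override: use the defaults
        "cloud": {"dockerImageTag": "2.0.0"},   # cloud pins a different version
    },
}

def resolve_entry(metadata: dict, registry: str) -> dict:
    """Merge the registry-specific override on top of the defaults."""
    entry = {k: v for k, v in metadata.items() if k != "registries"}
    entry.update(metadata["registries"].get(registry, {}))
    return entry

print(resolve_entry(metadata, "oss"))    # source-s3 at 1.2.3
print(resolve_entry(metadata, "cloud"))  # source-s3 at 2.0.0
```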

Questions for the team

The 6th point is the awkward one, and we need to answer the following questions:

  1. If we move a connector on [oss|cloud] from v1 to v2 with a 6-month deprecation window, how do we ensure we freeze v1's metadata-related settings in time?
  2. If we add a connector + version to [oss|cloud] that the platform has never seen, how do we populate its metadata-related values?

Related Conversations

Normalization Overrides: https://airbytehq-team.slack.com/archives/C03VDJ4FMJB/p1685552842556829
Rollbacks: https://airbytehq-team.slack.com/archives/C03VDJ4FMJB/p1685567791734259

@bnchrch bnchrch changed the title For Launch Darkly Overrides how do we determine which metadata information to use For Launch Darkly Overrides / Connector Versions how do we determine which metadata information to use Jun 1, 2023
@erohmensing (Contributor):

This goes for any place in which we don't already have an actor definition - the /update endpoints where OSS users can choose any version they like also apply here

@erohmensing (Contributor):

If we move a connector on [oss|cloud] from v1 to v2 with a 6 month deprecation window, how do we ensure we freeze v1s metadata related settings in time

If everything lives on the ADV and not the actor definition, the breaking changes phase 0 should handle this correctly. What is currently not handling this correctly is the fact that we upsert the ADVs. There are two issues here:

  • We shouldn't upsert the ADVs - they're versioned info, and therefore we should only need the values that were initially inserted. We do not expect these values to change without a new version being shipped.
  • We should, for due diligence, make sure that the metadata also reports the right versioned info (e.g. in the case of a rollback). However, if we do the above point, I don't think this is strictly necessary to get it working.
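A minimal sketch of the "never upsert" rule, assuming a simple key of (actor definition, docker image tag); the store and field names are placeholders, not the platform's actual schema.

```python
# Placeholder in-memory store; the real platform persists ADVs in its database.
# Keyed by (actor_definition_id, docker_image_tag).
actor_definition_versions: dict[tuple[str, str], dict] = {}

def write_adv(actor_definition_id: str, docker_image_tag: str, fields: dict) -> dict:
    """Insert-if-absent: versioned info is written once and never overwritten.

    Later writes for the same (definition, tag) return the originally inserted
    values, so a rollback or re-publish cannot mutate them.
    """
    key = (actor_definition_id, docker_image_tag)
    return actor_definition_versions.setdefault(key, fields)
```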

If we add a connector + version to [oss|cloud] that the platform has never seen, how do we populate its metadata related values?

If we do this via override or /update endpoint, I think we need to request this info from the metadata service.

@evantahler (Contributor) commented Jun 6, 2023

Grooming:

ADV Properties in Question:

  • Release Stage (could affect billing...)
  • Spec (can be re-discovered locally)
  • Protocol Version
  • Normalization Stuff (going away, but matters for now)
  • Suggested Streams
  • Allowed Hosts (but not in use yet)

Options:

  1. All the above are also needed in LD feature flags
    • "The LD flags are 'registry checkin catalogs #3'" - which implies that these flags can contain all the ADV properties needed. "dockerTag" is just one of many bits of info that an ADV needs.
    • This means the LD feature flags are going to get more complex.
    • Old Tag -> v1.2.3, New Tag -> {dockerTag: "v1.2.3", releaseStage: "generally_available"} - now in LD you provide JSON (see the sketch after this list)
  2. Pre-releases publish metadata file, and platform consumes these per-version files and builds ADV entry
    • Then, dagster bundles pre-releases into a pre-release registry file (tagged by commit or the docker pre-release tag - oss_registry-GIT_BRANCH)
    • The reason to use the git branch is because you might change more than one connector
    • The registry is what you pick in LD now, not the tag of the docker image.
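To make option 1 concrete: the LD flag value would go from a bare docker tag string to a JSON object carrying the extra ADV properties. The exact keys and values below are illustrative, not a settled schema.

```python
import json

# Today's flag value: just a docker tag.
old_flag_value = "v1.2.3"

# Option 1's flag value: a JSON object with the ADV properties the platform
# would otherwise have to look up. Keys are illustrative, not a fixed schema.
new_flag_value = json.dumps({
    "dockerTag": "v1.2.3",
    "releaseStage": "generally_available",
    "protocolVersion": "0.2.0",
    "suggestedStreams": ["users", "orders"],
})
```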

TODO:

  • Don't update ADVs. Ever.
  • Double check with Ed. Ask if we are still changing NormalizationTag for new destinations. If we always leave it enabled, and the new destination ignores it... is that OK?

--> SOMEONE WRITE A SMALL TECH SPEC PICKING WHAT WE SHOULD DO

@evantahler (Contributor):

#27077 for the "not updating ADVs" part

@pedroslopez (Contributor):

In my mind, the "entry point" for a connector version from our registry should be its metadata - and the docker image just happens to be how we run that connector. So, to have a version override that says "v1.2.3" means go grab everything for this version (not just a "docker image override").

In the platform today when processing overrides we go to the spec cache bucket to grab the spec for the given version. I think this should be replaced with looking at our registry and pulling all the version-specific information from our versioned metadata.

If this were an API, I would imagine something like /connectors/source-faker/versions/1.2.3 and getting that info. With the current model of serving versioned metadata in files, this is sort of already available in YAML form at https://connectors.airbyte.com/files/metadata/airbyte/source-faker/2.1.0/metadata.yaml - though what's a bit weird is this isn't the exact contents that would have been returned in the registry served at https://connectors.airbyte.com/files/registries/v0/cloud_registry.json. There's some processing that happens and we fill in the spec and I'm not sure what else.

In any case, I don't think we need a full registry here (i.e. a list) or any seeding that we have to do platform-side. I think what we need is more a way to request a "registry entry" for a specific version on demand.

The logic on the platform should follow the same as what happens for spec today, just filling in more information. As a refresher, when an override is found in LaunchDarkly, we:

  1. Check to see if it's already in the db. If it is, we use that.
  2. If it's not, we:
    i. Pull the spec from the spec cache bucket
    ii. Construct a new ADV with the above spec and the requested image tag, filling in the rest of the fields from the default version
    iii. Persist the new version and use that.

So, we should replace step 2i with something that provides the rest of the fields (the metadata) and the rest should just work.
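A rough sketch of that replacement, assuming hypothetical helpers for the persistence layer and the per-version registry entry lookup described earlier; none of these function names are the platform's actual API.

```python
# Hypothetical sketch of the override path described above; the helpers are
# placeholders for the platform's persistence layer and the proposed
# per-version registry entry lookup, not real APIs.

def find_adv(definition_id: str, tag: str) -> dict | None:
    return None  # placeholder: look up an existing ActorDefinitionVersion in the db

def fetch_versioned_registry_entry(definition_id: str, tag: str, registry: str) -> dict:
    return {}  # placeholder: GET the cached per-version registry entry (spec, release stage, ...)

def persist_adv(adv: dict) -> dict:
    return adv  # placeholder: insert-if-absent into the configs database

def resolve_override(definition_id: str, tag: str, registry: str) -> dict:
    # 1. Already in the db? Use that.
    existing = find_adv(definition_id, tag)
    if existing is not None:
        return existing
    # 2. Build the new ADV from the versioned registry entry (the metadata),
    #    instead of only pulling the spec from the spec cache bucket.
    entry = fetch_versioned_registry_entry(definition_id, tag, registry)
    adv = {"dockerImageTag": tag, **entry}
    # 3. Persist the new version and use it.
    return persist_adv(adv)
```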

@erohmensing (Contributor) commented Jun 12, 2023

I have a similar worldview to the above! It would make the platform rely on our service a lot more, but I think that's probably an inevitability?

We should do the same if we get an OSS /update to a version that doesn't exist already in the DB (e.g. prereleases) - the user would have to be online to grab the prerelease metadata, but they'd have to be online to get the docker image too. If they can hit the API when they pull the docker image, we could find a way to make that work.

@bnchrch (Contributor, Author) commented Jun 13, 2023

Grooming notes:

  • Someone needs to write a tech spec.
  • Let's sync up today to discuss.

@bnchrch bnchrch self-assigned this Jun 13, 2023
@bnchrch bnchrch changed the title For Launch Darkly Overrides / Connector Versions how do we determine which metadata information to use Cache Registry entries for OSS and Cloud connector versions Jun 13, 2023
@evantahler evantahler changed the title Cache Registry entries for OSS and Cloud connector versions Cache RegistryEntries for OSS and Cloud connector versions Jun 20, 2023
@evantahler (Contributor):

Grooming:

  • Make the small RegistryEntry files for each connector version.
  • As part of this, we could make the generation of the full registries use these partial RegistryEntries (see the sketch after this list). This is better because one bad registry entry doesn't crash the downstream registries. This needs a new sensor pointing at the RegistryEntries (30s). Net, this would add ~90s to the generation time (most of which is spinning up containers).
  • You can choose to make this 2 PRs and 2 issues, if you want
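A rough sketch of the bundling idea, assuming each connector version's RegistryEntry lives in its own small JSON file; the directory layout and output shape are assumptions, not the generator's actual structure, and the Dagster sensor itself is omitted.

```python
import json
from pathlib import Path

def build_full_registry(entries_dir: Path, registry_name: str) -> dict:
    """Assemble a full registry from per-version RegistryEntry files.

    A malformed entry is skipped rather than crashing the downstream build,
    which is the main benefit called out in the grooming notes above. In the
    real pipeline a new sensor (~30s) watching these files would trigger this.
    """
    entries, skipped = [], []
    for entry_file in sorted(entries_dir.glob(f"**/{registry_name}.json")):
        try:
            entries.append(json.loads(entry_file.read_text()))
        except json.JSONDecodeError:
            skipped.append(str(entry_file))  # one bad entry doesn't fail the rest
    return {"registry": registry_name, "entries": entries, "skippedEntries": skipped}
```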
