Proposal: enhance API sources by adding API category/usecase metadata #17893

Closed
swyxio opened this issue Oct 12, 2022 · 18 comments

Comments

@swyxio
Contributor

swyxio commented Oct 12, 2022

Tell us about the problem you're trying to solve

in https://github.com/airbytehq/airbyte/blob/master/airbyte-config/init/src/main/resources/seed/source_definitions.yaml , we only indicate APIs vs Databases. However, we're at a scale where this is starting to be less informative and doesn't map to specific usecases that our users have.

We have a manually maintained, nonauthoritative categorization here: https://airbyte.com/connectors
which is pretty useful, but not tied in to our code as source of truth

this info should be driving both our UX and analytics

was discussing with @ChristopheDuong

Describe the solution you’d like

  • consensus/agreement that we should do this
  • doing this
  • implementing this as a multiple-tag system rather than shoehorning each connector into a single category, because it's likely that some connectors will fit multiple use cases

sample

- name: AWS CloudTrail
  sourceDefinitionId: 6ff047c0-f5d5-4ce5-8c81-204a830fa7e1
  dockerRepository: airbyte/source-aws-cloudtrail
  dockerImageTag: 0.1.4
  documentationUrl: https://docs.airbyte.com/integrations/sources/aws-cloudtrail
  icon: awscloudtrail.svg
  sourceType: api
  releaseStage: alpha
  usecaseTags:
    - event streaming
    - observability
- name: Amazon Ads
  sourceDefinitionId: c6b0a29e-1da9-4512-9002-7bfd0cba2246
  dockerRepository: airbyte/source-amazon-ads
  dockerImageTag: 0.1.22
  documentationUrl: https://docs.airbyte.com/integrations/sources/amazon-ads
  icon: amazonads.svg
  sourceType: api
  releaseStage: generally_available
  usecaseTags:
    - marketing
- name: Amazon Seller Partner
  sourceDefinitionId: e55879a8-0ef8-4557-abcf-ab34c53ec460
  dockerRepository: airbyte/source-amazon-seller-partner
  dockerImageTag: 0.2.27
  sourceType: api
  documentationUrl: https://docs.airbyte.com/integrations/sources/amazon-seller-partner
  icon: amazonsellerpartner.svg
  releaseStage: alpha
  usecaseTags:
    - ecommerce
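
A minimal sketch of how a multi-tag field could be consumed, assuming the definitions have already been parsed into plain dictionaries. The `usecaseTags` key comes from the sample above; the helper name `index_by_usecase` is hypothetical and not part of the Airbyte codebase.

```python
from collections import defaultdict

# Hand-written stand-in for the parsed YAML sample (only the fields used here).
SOURCE_DEFINITIONS = [
    {"name": "AWS CloudTrail", "sourceType": "api",
     "usecaseTags": ["event streaming", "observability"]},
    {"name": "Amazon Ads", "sourceType": "api", "usecaseTags": ["marketing"]},
    {"name": "Amazon Seller Partner", "sourceType": "api", "usecaseTags": ["ecommerce"]},
]


def index_by_usecase(definitions):
    """Map each tag to the sorted names of the connectors carrying it.

    A connector with several tags appears under each of them; a connector
    without a `usecaseTags` key is simply skipped.
    """
    index = defaultdict(list)
    for definition in definitions:
        for tag in definition.get("usecaseTags", []):
            index[tag].append(definition["name"])
    return {tag: sorted(names) for tag, names in index.items()}


usecase_index = index_by_usecase(SOURCE_DEFINITIONS)
```

With a single-category field the same connector could only land in one bucket; here AWS CloudTrail shows up under both of its tags.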

Describe the alternative you’ve considered or used

maintaining an external google sheet that does this job

Are you willing to submit a PR?

yea

@evantahler
Contributor

evantahler commented Oct 12, 2022

Love it! Unifying the data in these files and the data that only exists in our marketing site(s) was one of the goals of the Connector Metadata Service which the @airbytehq/connector-operations team will be working on next year.

In the short term, I would love to move all connector information out of webflow and into these files. Then, webflow could use them as an API. A GitHub Action that converts YAML to JSON could be used to prep the files for webflow's consumption.
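
The conversion step could be quite small. The sketch below assumes the YAML has already been loaded into a list of dictionaries (for example by a YAML parser running inside the GitHub Action). The function name `to_webflow_json` and the choice to default missing tags to an empty list are assumptions for illustration, not anything webflow requires.

```python
import json


def to_webflow_json(definitions):
    """Serialize parsed connector definitions to a stable JSON string.

    Hypothetical helper: fills in an empty `usecaseTags` list when the key is
    missing, so downstream consumers always see the same shape.
    """
    normalized = [
        {**definition, "usecaseTags": list(definition.get("usecaseTags", []))}
        for definition in definitions
    ]
    # Sort entries by name and keys alphabetically so diffs between runs stay small.
    normalized.sort(key=lambda d: d["name"])
    return json.dumps(normalized, indent=2, sort_keys=True)


example = to_webflow_json([
    {"name": "Amazon Ads", "sourceType": "api", "usecaseTags": ["marketing"]},
    {"name": "AWS CloudTrail", "sourceType": "api"},
])
```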

@swyxio
Contributor Author

swyxio commented Oct 12, 2022

i wouldn't mind doing a one-time pass on this. it'd be a pain but i bet the webflow data is extremely out of date anyway

and yes, can't wait for the metadata service. this work will have to be done in some form sooner or later anyway.

do we agree on the schema? many-to-one rather than one-to-one/MECE? any other metadata to add or remove while we're doing this anyway?

@evantahler
Contributor

I think the first step is to make a doc that talks about what the schema would look like. In addition to those tags, there are also a few types of descriptions we might want to bring in.

A long time ago as part of a hack day I took a pass at mocking the future service up. As part of that I made Typescript types for the metadata which might be helpful as a starting point https://github.com/airbytehq/connectors-service/blob/main/src/types/Connector.ts

@swyxio
Contributor Author

swyxio commented Oct 12, 2022

for logos we might want to make/offer a logo api service? for our needs but that others might use
https://brandfetch.com/developers/demo

https://clearbit.com/logo

@swyxio
Contributor Author

swyxio commented Oct 12, 2022

@evantahler mind expanding a bit about the intended difference between

  • about: string;
  • description: string;
  • headline?: string;
  • copy?: ConnectorDescriptionCopy[];

probably some examples of each would make it ultra clear

some of them we'll want to try to impose length constraints
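
A rough sketch of what length constraints on these fields might look like. The limits below are invented purely for illustration, and `copy` is left out because the shape of `ConnectorDescriptionCopy` is not shown in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical limits, chosen only to illustrate the idea.
MAX_LENGTHS = {"about": 280, "description": 1000, "headline": 80}


@dataclass
class ConnectorDescription:
    about: str
    description: str
    headline: Optional[str] = None

    def length_violations(self) -> List[str]:
        """Return the names of fields that exceed their assumed limit."""
        violations = []
        for name, limit in MAX_LENGTHS.items():
            value = getattr(self, name)
            if value is not None and len(value) > limit:
                violations.append(name)
        return violations


ok = ConnectorDescription(about="Sync Amazon Ads data.", description="Longer text.")
too_long = ConnectorDescription(about="x" * 300, description="fine", headline="y" * 81)
```

Returning the offending field names, rather than raising on the first one, would let a data-entry pass report every problem in a row at once.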

@evantahler
Contributor

evantahler commented Oct 12, 2022

It's coarse, but I was trying to come up with a way to represent all the data on the page:

Screen Shot 2022-10-12 at 10 48 54 AM

And copy is further down on the page. I wanted all of that data loaded into the CMS so we could use it for search, show it in the product, etc.

@swyxio
Contributor Author

swyxio commented Oct 12, 2022

only 2 ways to make money in software: putting content in webflow and taking content out of webflow

@alafanechere
Contributor

alafanechere commented Oct 17, 2022

I'd love to suggest these tags (+ the release stage) not be declared in source_definitions.yaml but to be directly "hardcoded" in the connector itself and could be exposed:

  • with a new command
  • in a connector-specific file
  • with Docker image labels

I suggest this because I can think of a couple of use cases that might need this metadata without having to parse this YAML file:

  • I'd love source acceptance tests to be aware of the release stage of a connector, without parsing the connector catalog
  • We might want the CDK to behave differently according to these tags
  • Migrating away from source_definitions.yaml to the Connector Metadata service will be easier
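
One way to picture the "connector-specific file" option: each connector directory carries a small metadata file that tools such as the acceptance tests read directly. The file name `metadata.json`, its keys, and the function below are all hypothetical; nothing here reflects an existing Airbyte convention.

```python
import json
import tempfile
from pathlib import Path


def read_connector_metadata(connector_dir):
    """Load release stage and tags from a connector-local file (hypothetical layout)."""
    path = Path(connector_dir) / "metadata.json"
    raw = json.loads(path.read_text(encoding="utf-8"))
    return {
        "releaseStage": raw["releaseStage"],
        "usecaseTags": list(raw.get("usecaseTags", [])),
    }


# Demo: write a throwaway metadata file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "metadata.json").write_text(
        json.dumps({"releaseStage": "alpha", "usecaseTags": ["ecommerce"]}),
        encoding="utf-8",
    )
    metadata = read_connector_metadata(tmp)
```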

What do you think @evantahler ?

@evantahler
Contributor

evantahler commented Oct 17, 2022

I'd love to suggest these tags (+ the release stage) not be declared in source_definitions.yaml but to be directly "hardcoded" in the connector itself and could be exposed:

I think I'm neutral on where the "source of truth" is for this data, for now, with the following notes:

  1. We do need this data to be baked into a single file eventually so that it can be consumed by the webflow site and other consumers. Whether that's the YAMLs directly, or the output of this new command baked into a new file like source_specs.yaml, I'm 🤷. We need to get it into JSON anyway. source_specs.yaml is basically a cache of the output of these docker commands, so it could be kind of the same thing.
  2. Can't the source acceptance tests test against the source_specs.yaml anyway?

Migrating away from source_definitions.yaml to the Connector Metadata service will be easier

I actually think the opposite is true. Any path that (1) moves connector data into the codebase which (2) stores that data /outside/ of the docker image is a step toward a future API. If you are searching for a connector, you aren't going to download them all and run a docker command.

Either way, @sw-yx please start gathering/creating the metadata for all of the connectors! Keep it in a google sheet for now, and we'll help figure out where the final home for the info should be. I think the hard part for this story is data entry / data curation - don't let us stop you!

@swyxio
Contributor Author

swyxio commented Oct 17, 2022

i think this is gonna be a few dozen lines of web scraping + gpt3 code

image

@swyxio
Contributor Author

swyxio commented Oct 18, 2022

yeah lol

image

image

image

@swyxio
Contributor Author

swyxio commented Oct 18, 2022

image

image

@swyxio
Contributor Author

swyxio commented Oct 18, 2022

ok @evantahler here's the first 20: https://docs.google.com/spreadsheets/d/1kzjp5rBQm_eczzfqZ58Hs9KVF8E_bp0E0TCNBbgZ6ZU/edit?usp=sharing

some cleanup needed of course. but i could run this for all 160 sources pretty easily

https://gist.github.com/sw-yx/6cce0e05d32610c3595d5480d7af695d

@alafanechere
Contributor

alafanechere commented Oct 18, 2022

Can't the source acceptance tests test against the source_specs.yaml anyway?
My concern is that source_specs.yaml or source_definitions.yaml are global artifacts updated on the connector release.

If you are working on a new connector, or you want to run SAT against a targeted release stage, you'd need to manually modify these files, which sit outside the scope of any single connector.

@swyxio
Contributor Author

swyxio commented Oct 19, 2022

btw i just realized there are a LOT of connectors not included in source_definitions.yaml

https://docs.airbyte.com/integrations/

is this on purpose? known?

@swyxio
Contributor Author

swyxio commented Nov 1, 2022

and we have icons that don't exist...
image
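
A gap like this could be caught mechanically. The sketch below assumes the definitions are already parsed into dictionaries and that the set of icon file names actually present is supplied by the caller; the function name `missing_icons` and the third sample entry are hypothetical.

```python
def missing_icons(definitions, available_icons):
    """Return names of connectors whose `icon` is set but not among the available files."""
    available = set(available_icons)
    return sorted(
        definition["name"]
        for definition in definitions
        if definition.get("icon") and definition["icon"] not in available
    )


definitions = [
    {"name": "Amazon Ads", "icon": "amazonads.svg"},
    {"name": "AWS CloudTrail", "icon": "awscloudtrail.svg"},
    {"name": "No Icon Connector"},  # hypothetical entry without an icon field
]
# Pretend only one of the referenced icon files exists.
broken = missing_icons(definitions, {"amazonads.svg"})
```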

@swyxio
Contributor Author

swyxio commented Nov 2, 2022

made a little mvp of connector metadata service for use in the new docs landing page #18752

image

@swyxio
Contributor Author

swyxio commented Apr 28, 2023

I've given up on this.

swyxio closed this as completed on Apr 28, 2023