Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add neptune proxy #204

Merged

Conversation

friendtocephalopods
Copy link
Contributor

@friendtocephalopods friendtocephalopods commented Sep 24, 2020

Summary of Changes

add a bunch of test only proxy endpoints to support roundtrip tests.
Use Stat instead of Statistics from latest common.
pin specific common version to reduce surprises (and update to latest common).

Tests

Added abstract, gremlin, neptune tests.
Modified some tests to use Stat instead of Statistics (noop change)

Documentation

Added documentation for how to run roundtrip tests, assuming proper database configuration exists.

CheckList

Make sure you have checked all steps below to ensure a timely review.

  • PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  • PR includes a summary of changes.
  • PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All the public functions and the classes in the PR contain docstrings that explain what it does
  • PR passes make test

@friendtocephalopods friendtocephalopods force-pushed the abstract-proxy-tests branch 5 times, most recently from 139f798 to edea983 Compare September 25, 2020 16:54
@friendtocephalopods friendtocephalopods marked this pull request as ready for review September 25, 2020 16:57
@alran
Copy link
Contributor

alran commented Sep 25, 2020

Do any of these files need to be updated / removed now that we have upstreamed with amundsen-gremlin (and since we have changed a lot since Jan)? https://github.com/amundsen-io/amundsenmetadatalibrary/pull/96/files

@friendtocephalopods
Copy link
Contributor Author

friendtocephalopods commented Sep 25, 2020

Do any of these files need to be updated / removed now that we have upstreamed with amundsen-gremlin (and since we have changed a lot since Jan)? https://github.com/amundsen-io/amundsenmetadatalibrary/pull/96/files

Good question! I think we can rip out some of the aws4auth stuff.

edit: No, I guess it's used by the janus graph tests. Oh well!
edit 2: On further reflection, that makes 0 sense. Removing the dependency here.

@friendtocephalopods friendtocephalopods force-pushed the abstract-proxy-tests branch 2 times, most recently from e4a8ddb to 31a2605 Compare September 28, 2020 20:11
@feng-tao
Copy link
Member

will take a look this week

Copy link
Member

@feng-tao feng-tao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I look at in brief, but it is hard for me to review without much context :( I have a few high level questions:

  1. do you think we could a doc to describe the gremlin in a bit more detail?
  2. How do we ingest metadata for gremlin as I recall square doesn't use databuilder?
  3. Are all the put methods in proxy solely for testing purposes?
  4. what are the features supported for gremlin so far(dataset, user, dashboard with what metadata)?
  5. any todos or issues we keep an eye and how we could extend databuilder to build metadata for neptune?
  6. +1 on @alran 's comment, any cleanup for previous neptune/gremlin proxy?

metadata_service/proxy/base_proxy.py Outdated Show resolved Hide resolved
metadata_service/proxy/gremlin_proxy.py Outdated Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
@@ -77,7 +80,8 @@ class Config:
PROGRAMMATIC_DESCRIPTIONS_EXCLUDE_FILTERS = [] # type: list


class LocalConfig(Config):
# NB: If you're using the gremlin proxy, the appropriate GremlinConfig must be added to any other configs
class LocalConfig(LocalGremlinConfig, Config):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is subclass GremlinConfig for testing purpose? If that's the case, could we do sth like https://github.com/amundsen-io/amundsenfrontendlibrary/blob/ccfd2d6b82957fef347e956b243e4048c191fc0d/amundsen_application/config.py#L136 which create a test config in the file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no, it inherits GremlinConfig so we don't need to litter the metadata service and databuilder config files with our gremlin endpoint information. The alternative is duplicating our gremlin-specific config in 3 different repos, which is unpleasant.

@feng-tao
Copy link
Member

also @AndrewCiambrone if you could take a look and review based on your experience with your gremlin pOC, that will be great!

@friendtocephalopods
Copy link
Contributor Author

  • do you think we could a doc to describe the gremlin in a bit more detail?
  • How do we ingest metadata for gremlin as I recall square doesn't use databuilder?
  • Are all the put methods in proxy solely for testing purposes?
  • what are the features supported for gremlin so far(dataset, user, dashboard with what metadata)?
  • any todos or issues we keep an eye and how we could extend databuilder to build metadata for neptune?
  • +1 on @alran 's comment, any cleanup for previous neptune/gremlin proxy?

Thanks for the review, @feng-tao!

I added some docs in the amundsengremlin readme. The AWS configuration is maybe the most interesting bit of that still pending, but the example code should help folks get started if they knew how to configure AWS and wanted to try the proxy. Is there some additional information you'd like documented here?

We basically have databuilder as an app with cron jobs that fetches data from various sources and then pipes it into the bulk loader using the amundsen-gremlin.

Yep, the put methods in the proxy exist just to test the read code in the proxy along with the write code in amundsengremlin and make sure they agree. Moving them into a separate test class was a good idea, thanks!

Our proxy supports tables, users, but not currently dashboards.

I think one could pretty easily build a publisher for databuilder that uses the amundsengremlin features to lob data into AWS instead of neo4j, we just have a different architecture so we didn't have one built yet! The example code should make it pretty easy if anyone wants to try =)

setup.py Outdated Show resolved Hide resolved
@feng-tao
Copy link
Member

feng-tao commented Oct 1, 2020

also, could you rebase with master? There are some small conflict files.

add a bunch of test only proxy endpoints to support roundtrip tests.
pin specific common version to reduce surprises.

Signed-off-by: Joshua Hoskins <[email protected]>
Per comment, remove deprecated aws4auth code.

Signed-off-by: Joshua Hoskins <[email protected]>
Remove typing from requirements
Move put methods to a roundtrip test class. Fix super() so this works in roundtrip tests with multiple inheritance.

Signed-off-by: Joshua Hoskins <[email protected]>
@feng-tao
Copy link
Member

feng-tao commented Oct 1, 2020

thanks, I will do a quick sanity test with our staging with your pr and get back to you.

Copy link
Member

@feng-tao feng-tao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @friendtocephalopods and everyone who contributed the gremlin integration! lgtm with one nit. And I also do a quick sanity test with the change with our staging as well.

requirements.txt Outdated
# A common package that holds the models deifnition and schemas that are used
# accross different amundsen repositories.
amundsen-common>=0.3.6,<1.0
amundsen-common==0.5.3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we still make it a range?

Copy link
Contributor Author

@friendtocephalopods friendtocephalopods Oct 1, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, sure, we could make it >=0.5.3. This will need to be bumped anytime someone wants to use an updated common object (changes in 0.5.3 meant I had to change this in my PR from 0.5.2!), but we maybe don't care about all minor changes.

  • >=

requirements.txt Outdated
# A common package that holds the models deifnition and schemas that are used
# accross different amundsen repositories.
amundsen-common>=0.3.6,<1.0
amundsen-common==0.5.3
amundsen-gremlin==0.0.4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

LOGGER = logging.getLogger(__name__)


class RoundtripGremlinProxy(GenericGremlinProxy, RoundtripBaseProxy):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, we could file a ticket to ask for help and see if the community could help to implement the round trip test for neo4j/atlas.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds great! I'd definitely suggest taking the approach we did with Neptune -- move the neo4j publishing code from databuilder into a shared lib so we can use the same code both in databuilder to load data in production and in metadata service to validate read/write in the roundtrip tests.

Signed-off-by: Joshua Hoskins <[email protected]>
@feng-tao
Copy link
Member

feng-tao commented Oct 1, 2020

thanks, merge it now, may worth sharing in the slack channel for the 3rd option for metadata backend!

@feng-tao feng-tao merged commit 09845d4 into amundsen-io:master Oct 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants