-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add neptune proxy #204
feat: add neptune proxy #204
Conversation
139f798
to
edea983
Compare
edea983
to
c239a1f
Compare
Do any of these files need to be updated / removed now that we have upstreamed with amundsen-gremlin (and since we have changed a lot since Jan)? https://github.com/amundsen-io/amundsenmetadatalibrary/pull/96/files |
Good question! I think we can rip out some of the aws4auth stuff. edit: No, I guess it's used by the janus graph tests. Oh well! |
e4a8ddb
to
31a2605
Compare
will take a look this week |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I look at in brief, but it is hard for me to review without much context :( I have a few high level questions:
- do you think we could a doc to describe the gremlin in a bit more detail?
- How do we ingest metadata for gremlin as I recall square doesn't use databuilder?
- Are all the put methods in proxy solely for testing purposes?
- what are the features supported for gremlin so far(dataset, user, dashboard with what metadata)?
- any todos or issues we keep an eye and how we could extend databuilder to build metadata for neptune?
- +1 on @alran 's comment, any cleanup for previous neptune/gremlin proxy?
@@ -77,7 +80,8 @@ class Config: | |||
PROGRAMMATIC_DESCRIPTIONS_EXCLUDE_FILTERS = [] # type: list | |||
|
|||
|
|||
class LocalConfig(Config): | |||
# NB: If you're using the gremlin proxy, the appropriate GremlinConfig must be added to any other configs | |||
class LocalConfig(LocalGremlinConfig, Config): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is subclass GremlinConfig for testing purpose? If that's the case, could we do sth like https://github.com/amundsen-io/amundsenfrontendlibrary/blob/ccfd2d6b82957fef347e956b243e4048c191fc0d/amundsen_application/config.py#L136 which create a test config in the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, no, it inherits GremlinConfig so we don't need to litter the metadata service and databuilder config files with our gremlin endpoint information. The alternative is duplicating our gremlin-specific config in 3 different repos, which is unpleasant.
also @AndrewCiambrone if you could take a look and review based on your experience with your gremlin pOC, that will be great! |
Thanks for the review, @feng-tao! I added some docs in the amundsengremlin readme. The AWS configuration is maybe the most interesting bit of that still pending, but the example code should help folks get started if they knew how to configure AWS and wanted to try the proxy. Is there some additional information you'd like documented here? We basically have databuilder as an app with cron jobs that fetches data from various sources and then pipes it into the bulk loader using the amundsen-gremlin. Yep, the put methods in the proxy exist just to test the read code in the proxy along with the write code in amundsengremlin and make sure they agree. Moving them into a separate test class was a good idea, thanks! Our proxy supports tables, users, but not currently dashboards. I think one could pretty easily build a publisher for databuilder that uses the amundsengremlin features to lob data into AWS instead of neo4j, we just have a different architecture so we didn't have one built yet! The example code should make it pretty easy if anyone wants to try =) |
6d6c7ab
to
0f58963
Compare
also, could you rebase with master? There are some small conflict files. |
add a bunch of test only proxy endpoints to support roundtrip tests. pin specific common version to reduce surprises. Signed-off-by: Joshua Hoskins <[email protected]>
Per comment, remove deprecated aws4auth code. Signed-off-by: Joshua Hoskins <[email protected]>
0f58963
to
00f37fa
Compare
Remove typing from requirements Move put methods to a roundtrip test class. Fix super() so this works in roundtrip tests with multiple inheritance. Signed-off-by: Joshua Hoskins <[email protected]>
00f37fa
to
a0e9014
Compare
thanks, I will do a quick sanity test with our staging with your pr and get back to you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @friendtocephalopods and everyone who contributed the gremlin integration! lgtm with one nit. And I also do a quick sanity test with the change with our staging as well.
requirements.txt
Outdated
# A common package that holds the models deifnition and schemas that are used | ||
# accross different amundsen repositories. | ||
amundsen-common>=0.3.6,<1.0 | ||
amundsen-common==0.5.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we still make it a range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, sure, we could make it >=0.5.3
. This will need to be bumped anytime someone wants to use an updated common object (changes in 0.5.3 meant I had to change this in my PR from 0.5.2!), but we maybe don't care about all minor changes.
- >=
requirements.txt
Outdated
# A common package that holds the models deifnition and schemas that are used | ||
# accross different amundsen repositories. | ||
amundsen-common>=0.3.6,<1.0 | ||
amundsen-common==0.5.3 | ||
amundsen-gremlin==0.0.4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
LOGGER = logging.getLogger(__name__) | ||
|
||
|
||
class RoundtripGremlinProxy(GenericGremlinProxy, RoundtripBaseProxy): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, we could file a ticket to ask for help and see if the community could help to implement the round trip test for neo4j/atlas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds great! I'd definitely suggest taking the approach we did with Neptune -- move the neo4j publishing code from databuilder into a shared lib so we can use the same code both in databuilder to load data in production and in metadata service to validate read/write in the roundtrip tests.
Signed-off-by: Joshua Hoskins <[email protected]>
thanks, merge it now, may worth sharing in the slack channel for the 3rd option for metadata backend! |
Summary of Changes
add a bunch of test only proxy endpoints to support roundtrip tests.
Use Stat instead of Statistics from latest common.
pin specific common version to reduce surprises (and update to latest common).
Tests
Added abstract, gremlin, neptune tests.
Modified some tests to use Stat instead of Statistics (noop change)
Documentation
Added documentation for how to run roundtrip tests, assuming proper database configuration exists.
CheckList
Make sure you have checked all steps below to ensure a timely review.
make test