Adds the basic structure of Apache Atlas Proxy (#14)
verdan authored and Hans Adriaans committed Jun 30, 2022
1 parent 861ddbb commit 2640350
Showing 5 changed files with 121 additions and 20 deletions.
19 changes: 1 addition & 18 deletions metadata/README.md
@@ -65,21 +65,4 @@ This way Metadata service will use production config in production environment.
 - Typing hints: Amundsen Metadata service also utilizes [Typing hint](https://docs.python.org/3/library/typing.html "Typing hint") for better readability.

 ## Code structure
-Amundsen metadata service consists of three packages, API, Entity, and Proxy.
-
-### [API package](https://github.com/lyft/amundsenmetadatalibrary/tree/master/metadata_service/api "API package")
-A package that contains [Flask Restful resources](https://flask-restful.readthedocs.io/en/latest/api.html#flask_restful.Resource "Flask Restful resources") that serves Restful API request.
-The [routing of API](https://flask-restful.readthedocs.io/en/latest/quickstart.html#resourceful-routing "routing of API") is being registered [here](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/__init__.py#L67 "here").
-
-### [Proxy package](https://github.com/lyft/amundsenmetadatalibrary/tree/master/metadata_service/proxy "Proxy package")
-Proxy package contains proxy modules that talks dependencies of Metadata service. There are currently two modules in Proxy package, [Neo4j](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/proxy/neo4j_proxy.py "Neo4j") and [Statsd](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/proxy/statsd_utilities.py "Statsd").
-
-##### [Neo4j proxy module](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/proxy/neo4j_proxy.py "Neo4j proxy module")
-[Neo4j](https://neo4j.com/docs/ "Neo4j") proxy module serves various use case of getting metadata or updating metadata from or into Neo4j. Most of the methods have [Cypher query](https://neo4j.com/developer/cypher/ "Cypher query") for the use case, execute the query and transform into [entity](https://github.com/lyft/amundsenmetadatalibrary/tree/master/metadata_service/entity "entity").
-
-##### [Statsd utilities module](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/proxy/statsd_utilities.py "Statsd utilities module")
-[Statsd](https://github.com/etsy/statsd/wiki "Statsd") utilities module has methods / functions to support statsd to publish metrics. By default, statsd integration is disabled and you can turn in on from [Metadata service configuration](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/config.py "Metadata service configuration").
-For specific configuration related to statsd, you can configure it through [environment variable.](https://statsd.readthedocs.io/en/latest/configure.html#from-the-environment "environment variable.")
-
-### [Entity package](https://github.com/lyft/amundsenmetadatalibrary/tree/master/metadata_service/entity "Entity package")
-Entity package contains many modules where each module has many Python classes in it. These Python classes are being used as a schema and a data holder. All data exchange within Amundsen Metadata service use classes in Entity to ensure validity of itself and improve readability and mainatability.
+Please visit [Code Structure](docs/structure.md) to read how different modules are structured in Amundsen Metadata service.
1 change: 1 addition & 0 deletions metadata/docs/proxy/atlas_proxy.md
@@ -0,0 +1 @@
[TBD]
28 changes: 28 additions & 0 deletions metadata/docs/structure.md
@@ -0,0 +1,28 @@
The Amundsen metadata service consists of three packages: API, Entity, and Proxy.

### [API package](https://github.com/lyft/amundsenmetadatalibrary/tree/master/metadata_service/api "API package")
A package that contains [Flask Restful resources](https://flask-restful.readthedocs.io/en/latest/api.html#flask_restful.Resource "Flask Restful resources") that serve RESTful API requests.
The [API routing](https://flask-restful.readthedocs.io/en/latest/quickstart.html#resourceful-routing "routing of API") is registered [here](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/__init__.py#L67 "here").

### [Proxy package](https://github.com/lyft/amundsenmetadatalibrary/tree/master/metadata_service/proxy "Proxy package")
The Proxy package contains proxy modules that talk to the dependencies of the Metadata service. There are currently three modules in the Proxy package:
[Neo4j](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/proxy/neo4j_proxy.py "Neo4j"),
[Statsd](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/proxy/statsd_utilities.py "Statsd")
and [[WIP] Atlas](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/proxy/atlas_proxy.py "Atlas").

The proxy to use (Neo4j or Atlas) is selected with the `PROXY_CLIENT` config variable, which takes the fully qualified path of the proxy class.
The available class paths are listed in `PROXY_CLIENTS` [here](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/config.py#L11).
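This commit does not show how the `PROXY_CLIENT` path is turned into a proxy class. A minimal sketch, assuming the value is a dotted `module.ClassName` string like the entries in `PROXY_CLIENTS`, resolves it with the standard `importlib` module. `load_proxy_class` is a hypothetical helper name, not part of the Metadata service.

```python
import importlib
from typing import Any, Dict

# Entries copied from PROXY_CLIENTS in metadata_service/config.py (this commit).
PROXY_CLIENTS: Dict[str, str] = {
    'NEO4J': 'metadata_service.proxy.neo4j_proxy.Neo4jProxy',
    'ATLAS': 'metadata_service.proxy.atlas_proxy.AtlasProxy',
}


def load_proxy_class(dotted_path: str) -> Any:
    """Hypothetical helper: import 'package.module.ClassName' and return the class."""
    module_path, _, class_name = dotted_path.rpartition('.')
    if not module_path:
        raise ValueError(f'expected "module.ClassName", got {dotted_path!r}')
    module = importlib.import_module(module_path)
    return getattr(module, class_name)
```

With the service installed, `load_proxy_class(PROXY_CLIENTS['ATLAS'])` would return `AtlasProxy`, which is then instantiated with keyword-only arguments such as `host=` and `port=`, matching the `AtlasProxy.__init__` signature added in this commit.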

##### [Neo4j proxy module](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/proxy/neo4j_proxy.py "Neo4j proxy module")
The [Neo4j](https://neo4j.com/docs/ "Neo4j") proxy module serves the various use cases of reading metadata from and writing metadata to Neo4j. Most of its methods hold a [Cypher query](https://neo4j.com/developer/cypher/ "Cypher query") for their use case, execute the query, and transform the result into an [entity](https://github.com/lyft/amundsenmetadatalibrary/tree/master/metadata_service/entity "entity").
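The query-then-transform pattern can be sketched as follows. The Cypher query, the `Column` class and `transform_records` are hypothetical illustrations, not the queries or entity classes in `neo4j_proxy.py`; the rows are plain dicts so the sketch runs without a database.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Hypothetical Cypher query; the real queries in neo4j_proxy.py may differ.
TABLE_COLUMNS_QUERY = """
MATCH (tbl:Table {key: $tbl_key})-[:COLUMN]->(col:Column)
RETURN col.name AS col_name, col.type AS col_type
"""


@dataclass
class Column:
    """Hypothetical stand-in for an entity class from metadata_service.entity."""
    name: str
    col_type: str


def transform_records(records: List[Dict[str, Any]]) -> List[Column]:
    """Turn query result rows (as dicts) into entity objects."""
    return [Column(name=r['col_name'], col_type=r['col_type']) for r in records]
```

With the official `neo4j` Python driver, the rows would come from something like `session.run(TABLE_COLUMNS_QUERY, tbl_key=...)`, with each record converted to a dict via `record.data()` before being passed to `transform_records`.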

##### [[WIP] Apache Atlas proxy module](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/proxy/atlas_proxy.py "Apache Atlas proxy module")
The [Apache Atlas](https://atlas.apache.org/ "Apache Atlas") proxy module serves metadata from Apache Atlas using [atlasclient](https://atlasclient.readthedocs.io/en/latest/readme.html).
More information on how to set up Apache Atlas to work with Amundsen can be found [here](proxy/atlas_proxy.md).
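Combining the `ATLAS` entry in `PROXY_CLIENTS` with the `PROXY_HOST` and `PROXY_PORT` settings seen in `LocalConfig`, a configuration that selects the Atlas proxy might look like the sketch below. `AtlasConfig` and `proxy_kwargs` are hypothetical, the base `Config` is a placeholder so the snippet runs standalone, and port 21000 is Apache Atlas's usual default rather than a value this commit sets. It is also an assumption that atlasclient expects a bare host name instead of a URI.

```python
from typing import Any, Dict


class Config:
    """Placeholder for metadata_service.config.Config (values are illustrative)."""
    PROXY_CLIENT = 'metadata_service.proxy.neo4j_proxy.Neo4jProxy'
    PROXY_HOST = 'bolt://localhost'
    PROXY_PORT = 7687


class AtlasConfig(Config):
    """Hypothetical config that switches the service to the Atlas proxy."""
    PROXY_CLIENT = 'metadata_service.proxy.atlas_proxy.AtlasProxy'
    # Assumption: atlasclient takes a plain host name, not a bolt:// URI.
    PROXY_HOST = 'localhost'
    # Usual Apache Atlas port; adjust for your deployment.
    PROXY_PORT = 21000


def proxy_kwargs(cfg: type) -> Dict[str, Any]:
    """Keyword arguments matching AtlasProxy.__init__(*, host, port, ...)."""
    return {'host': cfg.PROXY_HOST, 'port': cfg.PROXY_PORT}
```

`user` and `password` fall back to the `'admin'` and `''` defaults in `AtlasProxy.__init__` when not passed.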

##### [Statsd utilities module](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/proxy/statsd_utilities.py "Statsd utilities module")
The [Statsd](https://github.com/etsy/statsd/wiki "Statsd") utilities module contains functions that support publishing metrics to statsd. Statsd integration is disabled by default; you can turn it on in the [Metadata service configuration](https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/config.py "Metadata service configuration").
Statsd-specific settings can be configured through [environment variables](https://statsd.readthedocs.io/en/latest/configure.html#from-the-environment "environment variable.").
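To illustrate what publishing a metric involves, the sketch below reads the `STATSD_HOST`, `STATSD_PORT` and `STATSD_PREFIX` environment variables (with the conventional `localhost` and `8125` defaults) and sends a counter in the plain-text statsd line format `name:value|c` over UDP. It uses only the standard library and is not the code in `statsd_utilities.py`; `statsd_settings`, `format_counter` and `incr` are hypothetical names.

```python
import os
import socket
from typing import Mapping, Optional, Tuple


def statsd_settings(env: Mapping[str, str] = os.environ) -> Tuple[str, int, Optional[str]]:
    """Read statsd host, port and prefix from environment variables."""
    host = env.get('STATSD_HOST', 'localhost')
    port = int(env.get('STATSD_PORT', '8125'))
    prefix = env.get('STATSD_PREFIX')  # None means no prefix
    return host, port, prefix


def format_counter(name: str, value: int = 1, prefix: Optional[str] = None) -> bytes:
    """Encode a counter in the statsd line format '<name>:<value>|c'."""
    full_name = f'{prefix}.{name}' if prefix else name
    return f'{full_name}:{value}|c'.encode('ascii')


def incr(name: str, value: int = 1) -> None:
    """Send a counter over UDP; statsd never replies, so this is fire-and-forget."""
    host, port, prefix = statsd_settings()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_counter(name, value, prefix), (host, port))
```

UDP keeps metric publishing cheap and non-blocking: if no statsd daemon is listening, the packet is simply dropped and the service keeps running.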

### [Entity package](https://github.com/lyft/amundsenmetadatalibrary/tree/master/metadata_service/entity "Entity package")
The Entity package contains many modules, each defining several Python classes. These classes serve as both a schema and a data holder. All data exchanged within the Amundsen Metadata service uses Entity classes, which helps ensure its validity and improves readability and maintainability.
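One way a class can act as both schema and data holder is a typed dataclass that validates its fields on construction. `TableSummary` below is a hypothetical example, not one of the actual Entity classes, whose fields and validation may differ.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TableSummary:
    """Hypothetical entity: the typed fields double as the schema."""
    database: str
    cluster: str
    schema: str
    name: str
    description: Optional[str] = None
    tags: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Reject obviously invalid data as soon as the object is built.
        for attr in ('database', 'cluster', 'schema', 'name'):
            if not getattr(self, attr):
                raise ValueError(f'{attr} must be a non-empty string')
```

Because the object is frozen and validated up front, code that receives a `TableSummary` can rely on its required fields being present without re-checking them.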
4 changes: 2 additions & 2 deletions metadata/metadata_service/config.py
@@ -9,7 +9,8 @@


 PROXY_CLIENTS = {
-    'NEO4J': 'metadata_service.proxy.neo4j_proxy.Neo4jProxy'
+    'NEO4J': 'metadata_service.proxy.neo4j_proxy.Neo4jProxy',
+    'ATLAS': 'metadata_service.proxy.atlas_proxy.AtlasProxy'
 }

 IS_STATSD_ON = 'IS_STATSD_ON'
@@ -32,7 +33,6 @@ class LocalConfig(Config):
     TESTING = False
     LOG_LEVEL = 'DEBUG'
     LOCAL_HOST = '0.0.0.0'
-    NEO4J_ENDPOINT = 'bolt://{LOCAL_HOST}:7687'.format(LOCAL_HOST=LOCAL_HOST)

     PROXY_HOST = f'bolt://{LOCAL_HOST}'
     PROXY_PORT = 7687
89 changes: 89 additions & 0 deletions metadata/metadata_service/proxy/atlas_proxy.py
@@ -0,0 +1,89 @@
from typing import Union, List, Dict, Any

from atlasclient.client import Atlas

from metadata_service.entity.popular_table import PopularTable
from metadata_service.entity.user_detail import User as UserEntity
from metadata_service.entity.table_detail import Table
from metadata_service.proxy import BaseProxy
from metadata_service.util import UserResourceRel


class AtlasProxy(BaseProxy):
    """
    Atlas Proxy client for the amundsen metadata
    """

    def __init__(self, *,
                 host: str,
                 port: int,
                 user: str = 'admin',
                 password: str = '') -> None:
        """
        Initiate the Apache Atlas client with the provided credentials
        """
        self._driver = Atlas(host=host, port=port, username=user, password=password)

    def get_user_detail(self, *, user_id: str) -> Union[UserEntity, None]:
        pass

    def get_table(self, *, table_uri: str) -> Table:
        pass

    def delete_owner(self, *, table_uri: str, owner: str) -> None:
        pass

    def add_owner(self, *, table_uri: str, owner: str) -> None:
        pass

    def get_table_description(self, *,
                              table_uri: str) -> Union[str, None]:
        pass

    def put_table_description(self, *,
                              table_uri: str,
                              description: str) -> None:
        pass

    def add_tag(self, *, table_uri: str, tag: str) -> None:
        pass

    def delete_tag(self, *, table_uri: str, tag: str) -> None:
        pass

    def put_column_description(self, *,
                               table_uri: str,
                               column_name: str,
                               description: str) -> None:
        pass

    def get_column_description(self, *,
                               table_uri: str,
                               column_name: str) -> Union[str, None]:
        pass

    def get_popular_tables(self, *,
                           num_entries: int = 10) -> List[PopularTable]:
        return []

    def get_latest_updated_ts(self) -> int:
        pass

    def get_tags(self) -> List:
        pass

    def get_table_by_user_relation(self, *, user_email: str,
                                   relation_type: UserResourceRel) -> Dict[str, Any]:
        pass

    def add_table_relation_by_user(self, *,
                                   table_uri: str,
                                   user_email: str,
                                   relation_type: UserResourceRel) -> None:
        pass

    def delete_table_relation_by_user(self, *,
                                      table_uri: str,
                                      user_email: str,
                                      relation_type: UserResourceRel) -> None:
        pass
