Collects grants from tables and views.
This Python script provides a class InventoryCollector that scans a Spark environment for database objects and access control grants. It stores the results in two tables: db_objects and grant_statements.
Note: this has only been tested on one particular client and the sets of objects they have there. Notably it does NOT handle migration of functions, or automatic determination of when to migrate external tables to managed tables, though that can be manually invoked on the class provided.
To use the DbInventoryCollector, first instantiate the class with 3 arguments:
- spark -- the spark object
- inventory_catalog -- What catalog to use to save inventory results to. Must already exist.
- inventory_database -- What database to save inventory results to. Will be created.
scanner = InventoryCollector(spark, 'INVENTORY_CATALOG', 'INVENTORY_DATABASE')
To scan all databases in the Spark environment:
scanner.scan_all_databases()
scanner.scan_all_databases(rescan=True)
To scan a specific database for objects and grants:
database_name = "example_database"
# Scan database objects
(object_exec_id, object_df) = scanner.scan_database_objects(database_name)
# Scan database grants
(grant_exec_id, grant_df) = scanner.scan_database_grants(database_name)
Lists one row per database with summarized counts of scan results.
summary_df = scanner.get_database_inventory_summary()
display(summary_df)
All data is stored in append-only delta tables. This means if you re-run multiple times, there will be multiple scan results. Results are indexed by a data_type
and a execution_id
. The data_type
s are 'object' and 'grant' and are stored in two different tables.
This table stores the inventory of database objects. Each row represents an object (table or view) in a specific database.
Column Name | Data Type | Description |
---|---|---|
objectType | string | The type of the database object (table or view). |
objectKey | string | The full name of the database object, including database name and table/view name. |
database | string | The name of the database the object belongs to. |
tableName | string | The name of the table or view. |
isTemporary | boolean | Indicates if the table or view is temporary. |
inventory_execution_id | string | A unique identifier for the inventory execution. |
execution_time | timestamp | The time when the inventory execution took place. |
source_database | string | The name of the source database where the object is located. |
This table stores the access control grant statements. Each row represents a grant statement for a specific database object or the database itself.
Column Name | Data Type | Description |
---|---|---|
Principal | string | The principal (user or role) to which the grant applies. |
ActionType | string | The action type (e.g., SELECT, INSERT, UPDATE , DELETE) granted by the statement. |
ObjectType | string | The type of the object (table, view, or database ) the grant applies to. |
ObjectKey | string | The full name of the object (including database name and table/view name) or the database name the grant applies to. |
inventory_execution_id | string | A unique identifier for the inventory execution. |
execution_time | timestamp | The time when the inventory execution took place. |
source_database | string | The name of the source database where the grant is located. |
grant_statement | string | The grant statement represented as a string . |
The DbInventoryAccessControlScanner class also provides methods to retrieve execution histories and summaries, as well as to filter results by execution ID.
get_results_by_execution_id(execution_id, data_type=None)
get_last_execution(data_type="grants", database_name=None)
get_all_latest_executions(data_type="grants")
get_last_results(*args)
get_execution_history(data_type=None)
get_database_inventory_summary()