This connector extracts BigQuery lineage information from a Google Cloud project using the Python Client for Cloud Logging. It computes dataset lineage from JobChange events in the BigQuery audit logs.
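To illustrate the idea, here is a minimal sketch of deriving one lineage edge from a single JobChange audit log entry. It is not the connector's actual code. The function name `extract_lineage` and the sample entry are hypothetical, the field paths follow the BigQueryAuditMetadata layout as commonly documented, and the handling of `include_self_lineage` is an assumption about what that option means.

```python
from typing import Any, Dict, List, Optional, Tuple


def extract_lineage(
    entry: Dict[str, Any], include_self_lineage: bool = True
) -> Optional[Tuple[str, List[str]]]:
    """Return (destination_table, source_tables) for one log entry, or None.

    Hypothetical helper: assumes the entry is a JSON-like dict whose
    protoPayload.metadata holds a jobChange event.
    """
    job = (
        entry.get("protoPayload", {})
        .get("metadata", {})
        .get("jobChange", {})
        .get("job", {})
    )
    destination = (
        job.get("jobConfig", {}).get("queryConfig", {}).get("destinationTable")
    )
    sources = set(
        job.get("jobStats", {}).get("queryStats", {}).get("referencedTables", [])
    )
    # Assumption: "self lineage" means the destination also appears as a source.
    if not include_self_lineage:
        sources.discard(destination)
    if not destination or not sources:
        return None
    return destination, sorted(sources)


# Made-up entry for illustration only; real entries carry many more fields.
sample_entry = {
    "protoPayload": {
        "metadata": {
            "jobChange": {
                "job": {
                    "jobConfig": {
                        "queryConfig": {
                            "destinationTable": "projects/p/datasets/d/tables/target"
                        }
                    },
                    "jobStats": {
                        "queryStats": {
                            "referencedTables": [
                                "projects/p/datasets/d/tables/target",
                                "projects/p/datasets/d/tables/a",
                            ]
                        }
                    },
                }
            }
        }
    }
}

with_self = extract_lineage(sample_entry)
without_self = extract_lineage(sample_entry, include_self_lineage=False)
```

Entries that lack a destination table or referenced tables yield `None`, so jobs that are not queries drop out without special handling.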
Create a service account based on the Setup guide for the general BigQuery connector.
See Access Control for more information.
The config file inherits all the required and optional fields from the general BigQuery connector Config File. In addition, you can specify the following configurations:
# (Optional) Whether to enable parsing view definition to build view lineage, default True
enable_view_lineage: <boolean>
# (Optional) Whether to enable parsing audit log to find table lineage information, default True
enable_lineage_from_log: <boolean>
# (Optional) Whether to include self-referencing loops in lineage, default True
include_self_lineage: <boolean>
# (Optional) Number of days of logs to extract for lineage analysis, default 7
lookback_days: <days>
# (Optional) Number of logs fetched per batch, default 1000; value must be in the range 0 - 1000
batch_size: <batch_size>
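As a rough illustration of how these options fit together, the sketch below models them as a Python dataclass with the defaults listed above. The class name `LineageOptions` and the method `log_start_time` are hypothetical and not part of the connector. Reading `lookback_days` as "now minus N days" is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class LineageOptions:
    """Hypothetical container mirroring the optional fields listed above."""

    enable_view_lineage: bool = True
    enable_lineage_from_log: bool = True
    include_self_lineage: bool = True
    lookback_days: int = 7
    batch_size: int = 1000

    def __post_init__(self) -> None:
        # The documented constraint: batch_size must be in range 0 - 1000.
        if not 0 <= self.batch_size <= 1000:
            raise ValueError("batch_size must be in range 0 - 1000")

    def log_start_time(self, now: datetime) -> datetime:
        # Assumption: the lookback window ends at `now` and spans lookback_days.
        return now - timedelta(days=self.lookback_days)


defaults = LineageOptions()
start = defaults.log_start_time(datetime(2024, 1, 8, tzinfo=timezone.utc))

try:
    LineageOptions(batch_size=1001)
    rejected = False
except ValueError:
    rejected = True
```

Validating in `__post_init__` means an out-of-range `batch_size` fails at load time rather than partway through a log fetch.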
Follow the Installation instructions to install metaphor-connectors in your environment (or virtualenv). Make sure to include either the all or the bigquery extra.
Run the following command to test the connector locally:
metaphor bigquery.lineage <config_file>
Manually verify the output after the run finishes.