This connector extracts technical metadata from a Snowflake account using Snowflake Connector.
We recommend creating a dedicated Snowflake user with limited permissions for the connector to use by running the following statements using ACCOUNTADMIN
role:
use role ACCOUNTADMIN;
set warehouse = '<warehouse>';
set role = 'metaphor_role';
set user = 'metaphor_user';
set password = '<password>';
-- Create metaphor_user
create user identifier($user)
password = $password
default_warehouse = $warehouse
default_role = $role
comment ='User for Metaphor crawler';
-- Create metaphor_role
create role identifier($role) comment ='Limited access role for Metaphor connector';
grant role identifier($role) to user identifier($user);
grant usage on warehouse identifier($warehouse) to role identifier($role);
-- Grant privilege to access Snowflake Account Usage views:
grant imported privileges on database snowflake to role identifier($role);
For each database, including the inbound shared databases, run the following statements to grant the required privileges:
set db = '<database>';
-- Grant usage & references privileges to query information_schema
grant usage on warehouse identifier($warehouse) to role identifier($role);
grant usage on database identifier($db) to role identifier($role);
grant usage on all schemas in database identifier($db) to role identifier($role);
grant usage on future schemas in database identifier($db) to role identifier($role);
grant references on all tables in database identifier($db) to role identifier($role);
grant references on future tables in database identifier($db) to role identifier($role);
grant references on all views in database identifier($db) to role identifier($role);
grant references on future views in database identifier($db) to role identifier($role);
grant references on all materialized views in database identifier($db) to role identifier($role);
grant references on future materialized views in database identifier($db) to role identifier($role);
-- (Optional) Grant privilege to "show streams"
grant select on all streams in database identifier($db) to role identifier($role);
grant select on future streams in database identifier($db) to role identifier($role);
-- (Optional) Grant privilege to "show iceberg tables"
grant select on all iceberg tables in database identifier($db) to role identifier($role);
grant select on future iceberg tables in database identifier($db) to role identifier($role);
If you intend to use key pair authentication instead of password, follow the Snowflake instruction to generate a key pair. After that, assign the public key to the user using the following command:
alter user identifier($user) set rsa_public_key='<public_key_content>';
Create a YAML config file based on the following template.
If using user password authentication:
account: <snowflake_account>
user: <snowflake_username>
password: <snowflake_password>
role: <snowflake_role> # Optional. Will use default role if not specified.
default_database: <default_database_for_connections>
If using key pair authentication:
account: <snowflake_account>
user: <snowflake_username>
private_key:
key_file: <private_key_file>
passphrase: <private_key_encoding_passphrase>
role: <snowflake_role> # Optional. Will use default role if not specified.
default_database: <default_database_for_connections>
The private_key.passphrase
is only needed if using encrypted version of the private key. Otherwise, it can be omitted from the config.
You can also specify the content of the private key file directly in the config like this:
private_key:
key_data: |
-----BEGIN ENCRYPTED PRIVATE KEY-----
...
-----END ENCRYPTED PRIVATE KEY-----
See Output Config for more information.
See Filter Config for more information on the optional filter
config.
By default, the connector will read the QUERY_HISTORY, ACCESS_HISTORY, and TAG_REFERENCES views from SNOWFLAKE.ACCOUNT_USAGE schema. If you do not wish to grant read access to the entire SNOWFLAKE database, you can mirror these views to a different schema and ask the connector to read from it instead:
account_usage_schema: <db_name>.<schema_name>
See Tag Matcher Config for more information on the optional tag_matcher
config.
To stop the crawler from collecting platform tags from Snowflake, set collect_tags
to False
:
collect_tags: false # Default is true.
By default, the snowflake connector will fetch a full day's query logs from yesterday, to be analyzed for additional metadata, such as dataset usage and lineage information. To backfill log data, one can set lookback_days
to the desired value. To turn off query log fetching, set lookback_days
to 0.
query_log:
# (Optional) Number of days of query logs to fetch. Default to 1. If 0, the no query logs will be fetched.
lookback_days: <days>
# (Optional) A list of users whose queries will be excluded from the log fetching.
excluded_usernames:
- <user_name1>
- <user_name2>
# (Optional) The number of query logs to fetch from Snowflake in one batch. Default to 100000.
fetch_size: <number_of_logs>
See Process Query for more information on the optional process_query_config
config.
The max number of concurrent queries to the snowflake database can be configured as follows,
max_concurrency: <max_number_of_queries> # Default to 10
Each query issued by snowflake connectors can be tagged with a query tag. It can be configured as follows,
query_tag: <query_tag> # Default to 'MetaphorData'
Follow the Installation instructions to install metaphor-connectors
in your environment (or virtualenv). Make sure to include either all
or snowflake
extra.
Run the following command to test the connector locally:
metaphor snowflake <config_file>
Manually verify the output after the run finishes.