-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SIP-137] Can we limit a user to only read certain tables in the database connection? #29171
Comments
Delegating permissions to users who can read specific tables in the database is quite important. I've worked at 3 companies that use superset for data mining, and all have raised the same issue. |
It might be worth filling out the other parts of the SIP template that are missing here. Otherwise, let me know if you need any help kicking off the [DISCUSS] thread on the Superset dev mailing list to move it forward. |
@yousoph I'm aware of that feature, but it is not useful for determining which tables a user can access when querying in SQL Lab |
This is something that should be managed at the db level (using different users), which I know you said was something you didn't want to do. Otherwise data source permissions is the only thing that remotely fits. Overall I think if you are going to give people permission to use sqllab, they need to be trusted with whatever data that user can access in the database. There's really no difference between sqllab and any other SQL IDE/workbench. |
@andy-clapson This is not quite right. For querying on the IDE, we use personal accounts to query the database. But on Superset, we query the database and create datasets to draw dashboards. Therefore, we cannot use personal-account for each dashboard when querying the database. We need to use a service account to represent Superset in querying the database. And I also believe that the dashboards on Superset are a production environment. We are not allowed to use personal-account for the dashboards, and we cannot rely on the assumption that people never make mistakes. We need to avoid situations of human error. |
I think my point still applies then - if you are letting people query production databases, they should be trusted with that data (to the extent of what that particular db user allows them to access). If this doesn't fit, I think you are stuck using data sources as your control layer in some way, and SQLlab access is off-limits. |
I think it's safe to say we need more detail in the proposal... we need to better understand the exact implementation here... how it's manifested in the UI, the default access, how it interacts with datasource permissions, etc. If this happens to be something you already have working on a fork, a Draft PR might help move the conversation forward as well. |
Hi there. Checking in. If we don't get more details or a complete template and technical proposal, this SIP is likely to be closed as incomplete/inactive. |
Hello, I'm having the same problem using apache pinot in superset. In this case, we have an Internet accessible solution, but I don't want my database to have the same access. Solution : As it is, I can't use superset for my project, even though the solution meets my needs perfectly. Thank you advance, |
@rusackas we have another case study, let's examine it |
Pinging @dpgaspar for input here... we still need more details on the technical approach here, but if there's a security issue at hand, he should be aware, and we have the right to close/wipe this issue from public view. Please note that all security concerns should not be made public, but should be reported directly to [email protected]. |
@inctl please report to [email protected] with further details, follow https://github.com/apache/superset/security/policy |
@dpgaspar should we close this or leave it open? |
This SIP hasn't moved forward in terms of ASF/SIP process (i.e. a Discussion email thread) and might be closed in time, so I'm wondering how to carry the idea forward. If there are requests for increased specificity of the permissions in the security model, I would propose that we close this SIP, and instead carry the requirement over to the discussion on [SIP-131] Superset Security Model Redesign. Would that seem appropriate? |
Hi guys, I have sketched out a basic idea of a role-based model on Superset and am looking for the best way to describe it clearly. I will send it to you in the next two weeks. Should I post it on this GitHub issue or through another channel? Please suggest. |
@ttdpro98 that depends on how big of a change it is. @mistercrunch is proposing a new permissions model over here. Maybe you want to add your thoughts and use cases over there? |
Some comments here, first I think the main topic here, given that we do have the permission model and logic to support dataset restrictions in Explore/Dashboard, is something more like "enforcing schema-level and physical dataset-level access in SQL Lab". Currently - and I don't know if it's well documented today - SQL Lab data access-level policy is limited to "Database connection", meaning that if you have access to the database connection AND have SQL Lab access, you can run whatever queries you want. We may need to clarify this in the docs. Now, enforcing the rules we have around schema AND dataset-level access in SQL Lab is tricky because it requires parsing table/view names out of SQL Lab user-defined SQL, and checking against the perms. Note that we do have methods that are pretty well tested in the codebase to do that, but anything that relies on parsing SQL (in this case to extract "relations" (ie views and tables) names and sometimes guessing their schema) is a potential security risk. Today it'd be fairly easy to add something that uses methods we have already to extracts the relations name and checks against schema-level and relations-level perms on the SQL Lab side of the house, but it's not impossible someone could find a way to work around that. Maybe they could achieve that with obfuscated SQL, through some database-specific UDTFs-like constructs, or by creating views (if they're allowed to do that) in a schema they have access to that would be pointing to another schema they don't have access to. Now, given that it'd be possible to build something that solves for this but isn't bulletproof, my recommendation would be to put this behind a feature flag, call it "DANGEROUS__TRY_TO_ENFORCE_DATA_ACCESS_POLICY_IN_SQL_LAB", with the proper disclaimer, something like:
|
If my interpretation of the use case is right, we should rename the sip to "enforcing schema-level and physical dataset-level access in SQL Lab" or similar |
About the current state of things and whether it needs clearer documentation, doing a bit more thinking, I think today, for someone to use SQL Lab they need access to the database connection, which implies that they have access to everything inside that database connection. Now it's possible for someone to give access to a db connection and define schema-level AND dataset-level access, which I believe is disregarded since the database-access-level permission overrides/superseeds that. |
Pointers to relevant code ->
|
Very much my $0.02 here. I always thought of SQLLab as "cool, there is an IDE here I can use, that's nice to have". |
Yeah, unclear if a SQL IDE should even attempt to do this, though I can see the use case. Like i have a set of say 3 roles:
These people all know how to write SQL and need to write SQL as part of their job, and I want for that schema-access enforcement carries through Superset. Now today what you'd do is probably create 3 different database connections, each with a different service account with schema-level restrictions, and give access to each role independently. It does kind of get messy where you may have to use different connections for explore if people want to share/collab on certain datasets. While the physical dataset may be the same, Superset would see them as different datasets as they use different connections. For that, you'd probably need a 4th database connection that like "MAIN CONNECTION - with data access policy" on top of the "DATA SCIENTIST DATABASE CONNECTOIN FOR SQL LAB". It gets kind of messy/confusing quick, but in theory would work. |
Please make sure you are familiar with the SIP process documented
here. The SIP will be numbered by a committer upon acceptance.
[SIP-137] Proposal for limit a user to only read certain tables in the database connection
Motivation
We are allowing a number of users to create dashboards and query data on SQL Lab with the database connections that have been pre-configured by the Superset admin.
However, once a user is allowed access to the database connection, it means they can read all the data in that database, which we do not want. Can we limit a user to only read a specific schema or certain tables directly in Superset without having to create multiple connections to one database with different permissions?
Proposed Change
We can grant read permissions to users for individual schemas or tables directly on Superset. I believe this can be done if Superset can fetch schema metadata from the databases.
The text was updated successfully, but these errors were encountered: