Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SIP-137] Can we limit a user to only read certain tables in the database connection? #29171

Open
ttdpro98 opened this issue Jun 11, 2024 · 25 comments
Assignees
Labels
authentication:access-control Rlated to access control sip Superset Improvement Proposal

Comments

@ttdpro98
Copy link

ttdpro98 commented Jun 11, 2024

Please make sure you are familiar with the SIP process documented
here. The SIP will be numbered by a committer upon acceptance.

[SIP-137] Proposal for limit a user to only read certain tables in the database connection

Motivation

We are allowing a number of users to create dashboards and query data on SQL Lab with the database connections that have been pre-configured by the Superset admin.

However, once a user is allowed access to the database connection, it means they can read all the data in that database, which we do not want. Can we limit a user to only read a specific schema or certain tables directly in Superset without having to create multiple connections to one database with different permissions?

Proposed Change

We can grant read permissions to users for individual schemas or tables directly on Superset. I believe this can be done if Superset can fetch schema metadata from the databases.

@ttdpro98 ttdpro98 added the sip Superset Improvement Proposal label Jun 11, 2024
@ttdpro98 ttdpro98 changed the title [SIP] Can we limit a user to only read certain tables in the database connection? [SIP-137] Can we limit a user to only read certain tables in the database connection? Jun 11, 2024
@dosubot dosubot bot added the authentication:access-control Rlated to access control label Jun 11, 2024
@ttdpro98
Copy link
Author

Delegating permissions to users who can read specific tables in the database is quite important. I've worked at 3 companies that use superset for data mining, and all have raised the same issue.
I believe this improvement will make Superset a much more powerful data mining tool.

@rusackas
Copy link
Member

It might be worth filling out the other parts of the SIP template that are missing here. Otherwise, let me know if you need any help kicking off the [DISCUSS] thread on the Superset dev mailing list to move it forward.

@ttdpro98
Copy link
Author

@rusackas as I read in 5602, I'm missing descriptions of New or Changed Public Interfaces, New dependencies, Migration Plan and Compatibility and Rejected Alternatives. However, I find these descriptions unnecessary. Could you tell me what information we should add to make it clearer?

@rusackas
Copy link
Member

CC @dpgaspar @yousoph for comments/consideration.

@yousoph
Copy link
Member

yousoph commented Jun 18, 2024

Hi there!

If you take a look at the List Roles page, you should be able to set up data permissions for schemas and datasources:
Image

There's some additional information here and here as well - does that address what you're looking to achieve?

@ttdpro98
Copy link
Author

@yousoph I'm aware of that feature, but it is not useful for determining which tables a user can access when querying in SQL Lab

@andy-clapson
Copy link
Contributor

This is something that should be managed at the db level (using different users), which I know you said was something you didn't want to do. Otherwise data source permissions is the only thing that remotely fits.

Overall I think if you are going to give people permission to use sqllab, they need to be trusted with whatever data that user can access in the database. There's really no difference between sqllab and any other SQL IDE/workbench.

@ttdpro98
Copy link
Author

@andy-clapson This is not quite right. For querying on the IDE, we use personal accounts to query the database.

But on Superset, we query the database and create datasets to draw dashboards. Therefore, we cannot use personal-account for each dashboard when querying the database. We need to use a service account to represent Superset in querying the database.

And I also believe that the dashboards on Superset are a production environment. We are not allowed to use personal-account for the dashboards, and we cannot rely on the assumption that people never make mistakes. We need to avoid situations of human error.

@andy-clapson
Copy link
Contributor

I think my point still applies then - if you are letting people query production databases, they should be trusted with that data (to the extent of what that particular db user allows them to access). If this doesn't fit, I think you are stuck using data sources as your control layer in some way, and SQLlab access is off-limits.

@rusackas
Copy link
Member

rusackas commented Jul 2, 2024

I think it's safe to say we need more detail in the proposal... we need to better understand the exact implementation here... how it's manifested in the UI, the default access, how it interacts with datasource permissions, etc. If this happens to be something you already have working on a fork, a Draft PR might help move the conversation forward as well.

@rusackas
Copy link
Member

Hi there. Checking in. If we don't get more details or a complete template and technical proposal, this SIP is likely to be closed as incomplete/inactive.

@inctl
Copy link

inctl commented Sep 5, 2024

Hello,

I'm having the same problem using apache pinot in superset.
Case study:
A hundred clients with different rights access SuperSet. Superset rights management is used to limit access to the view.
But as soon as the client looks at the network requests, they can retrieve the URL, account name and password from the database to reuse it and gain full access to the data.

In this case, we have an Internet accessible solution, but I don't want my database to have the same access.

Solution :
Pass requests to the superset backend before reaching the databases + add a security layer based on superset roles.
The databases will then not be accessible directly.

As it is, I can't use superset for my project, even though the solution meets my needs perfectly.

Thank you advance,

@ttdpro98
Copy link
Author

ttdpro98 commented Sep 5, 2024

@rusackas we have another case study, let's examine it

@rusackas
Copy link
Member

Pinging @dpgaspar for input here... we still need more details on the technical approach here, but if there's a security issue at hand, he should be aware, and we have the right to close/wipe this issue from public view. Please note that all security concerns should not be made public, but should be reported directly to [email protected].

@dpgaspar
Copy link
Member

@inctl please report to [email protected] with further details, follow https://github.com/apache/superset/security/policy

@rusackas
Copy link
Member

rusackas commented Nov 6, 2024

@dpgaspar should we close this or leave it open?

@rusackas
Copy link
Member

rusackas commented Dec 4, 2024

This SIP hasn't moved forward in terms of ASF/SIP process (i.e. a Discussion email thread) and might be closed in time, so I'm wondering how to carry the idea forward. If there are requests for increased specificity of the permissions in the security model, I would propose that we close this SIP, and instead carry the requirement over to the discussion on [SIP-131] Superset Security Model Redesign. Would that seem appropriate?

@ttdpro98
Copy link
Author

Hi guys, I have sketched out a basic idea of a role-based model on Superset and am looking for the best way to describe it clearly. I will send it to you in the next two weeks. Should I post it on this GitHub issue or through another channel? Please suggest.

@rusackas

@rusackas
Copy link
Member

@ttdpro98 that depends on how big of a change it is. @mistercrunch is proposing a new permissions model over here. Maybe you want to add your thoughts and use cases over there?

@mistercrunch
Copy link
Member

mistercrunch commented Dec 13, 2024

Some comments here, first I think the main topic here, given that we do have the permission model and logic to support dataset restrictions in Explore/Dashboard, is something more like "enforcing schema-level and physical dataset-level access in SQL Lab". Currently - and I don't know if it's well documented today - SQL Lab data access-level policy is limited to "Database connection", meaning that if you have access to the database connection AND have SQL Lab access, you can run whatever queries you want. We may need to clarify this in the docs.

Now, enforcing the rules we have around schema AND dataset-level access in SQL Lab is tricky because it requires parsing table/view names out of SQL Lab user-defined SQL, and checking against the perms. Note that we do have methods that are pretty well tested in the codebase to do that, but anything that relies on parsing SQL (in this case to extract "relations" (ie views and tables) names and sometimes guessing their schema) is a potential security risk.

Today it'd be fairly easy to add something that uses methods we have already to extracts the relations name and checks against schema-level and relations-level perms on the SQL Lab side of the house, but it's not impossible someone could find a way to work around that. Maybe they could achieve that with obfuscated SQL, through some database-specific UDTFs-like constructs, or by creating views (if they're allowed to do that) in a schema they have access to that would be pointing to another schema they don't have access to.

Now, given that it'd be possible to build something that solves for this but isn't bulletproof, my recommendation would be to put this behind a feature flag, call it "DANGEROUS__TRY_TO_ENFORCE_DATA_ACCESS_POLICY_IN_SQL_LAB", with the proper disclaimer, something like:

# This feature flag tries to enforce schema-level and physical-dataset access defined for Explore/Dashboard in SQL LAB
# this is done by parsing the user-defined SQL and extracting/infereing schema and table/view names out of that SQL using 
# sqlglot/sqlparse. While this should just work in most cases, it may be possible to use this as an attack vector
# using some SQL trickery. For instance if a user has DDL access, they could run a CREATE VIEW in a schema 
# they have full access to against another schema they don't have access to to bypass the policies in place
DANGEROUS__ENFORCE_DATA_ACCESS_POLICY_IN_SQL_LAB: False,

@mistercrunch
Copy link
Member

If my interpretation of the use case is right, we should rename the sip to "enforcing schema-level and physical dataset-level access in SQL Lab" or similar

@mistercrunch
Copy link
Member

mistercrunch commented Dec 13, 2024

About the current state of things and whether it needs clearer documentation, doing a bit more thinking, I think today, for someone to use SQL Lab they need access to the database connection, which implies that they have access to everything inside that database connection. Now it's possible for someone to give access to a db connection and define schema-level AND dataset-level access, which I believe is disregarded since the database-access-level permission overrides/superseeds that.

@mistercrunch
Copy link
Member

Pointers to relevant code ->

@andy-clapson
Copy link
Contributor

About the current state of things and whether it needs clearer documentation, doing a bit more thinking, I think today, for someone to use SQL Lab they need access to the database connection, which implies that they have access to everything inside that database connection. Now it's possible for someone to give access to a db connection and define schema-level AND dataset-level access, which I believe is disregarded since the database-access-level permission overrides that.

Very much my $0.02 here.
Docs could always be better for everything, everywhere, of course. But IMO it feels weird to spend time/effort expanding Superset's features to include controls for a SQL IDE, which is really just a user connecting to a DB, and up to the DBA and whatever permissions model is implemented at the organization. Feels outside the scope of a bi tool, certainly.

I always thought of SQLLab as "cool, there is an IDE here I can use, that's nice to have".
If the use case was to allow certain users ONLY access to existing Superset datasets, that would make a bit more sense.

@mistercrunch
Copy link
Member

Yeah, unclear if a SQL IDE should even attempt to do this, though I can see the use case. Like i have a set of say 3 roles:

  • data engineer - all access
  • data scientist - subset of schemas
  • data consumers - another subset of schemas

These people all know how to write SQL and need to write SQL as part of their job, and I want for that schema-access enforcement carries through Superset.

Now today what you'd do is probably create 3 different database connections, each with a different service account with schema-level restrictions, and give access to each role independently. It does kind of get messy where you may have to use different connections for explore if people want to share/collab on certain datasets. While the physical dataset may be the same, Superset would see them as different datasets as they use different connections. For that, you'd probably need a 4th database connection that like "MAIN CONNECTION - with data access policy" on top of the "DATA SCIENTIST DATABASE CONNECTOIN FOR SQL LAB". It gets kind of messy/confusing quick, but in theory would work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
authentication:access-control Rlated to access control sip Superset Improvement Proposal
Projects
Development

No branches or pull requests

7 participants