diff --git a/docs-website/sidebars.js b/docs-website/sidebars.js index 42de4571eee23..ea98153080cc3 100644 --- a/docs-website/sidebars.js +++ b/docs-website/sidebars.js @@ -273,7 +273,6 @@ module.exports = { "docs/deploy/gcp", "docker/README", "docs/deploy/kubernetes", - "docs/how/updating-datahub", { Authentication: [ "docs/authentication/README", @@ -318,6 +317,7 @@ module.exports = { "docs/advanced/no-code-upgrade", ], }, + "docs/how/updating-datahub", ], "Developer Guides": [ // The purpose of this section is to provide developers & technical users with diff --git a/docs/authentication/README.md b/docs/authentication/README.md index 9d2594e9fe238..4034cb15cfd22 100644 --- a/docs/authentication/README.md +++ b/docs/authentication/README.md @@ -1,41 +1,55 @@ # Overview -Authentication is the process of verifying the identity of a user or service. In DataHub this can be split into 2 main components: - - How to login into DataHub. - - How to make some action withing DataHub on **behalf** of a user/service. +Authentication is the process of verifying the identity of a user or service. There are two +places where Authentication occurs inside DataHub: -:::note +1. DataHub frontend service when a user attempts to log in to the DataHub application. +2. DataHub backend service when making API requests to DataHub. -Authentication in DataHub does not necessarily mean that the user/service being authenticated will be part of the metadata graph within DataHub itself other concepts like Datasets or Dashboards. -In other words, a user called `john.smith` logging into DataHub does not mean that john.smith appears as a CorpUser Entity within DataHub. +In this document, we'll take a closer look at both. -For a quick video on that subject, have a look at our video on [DataHub Basics — Users, Groups, & Authentication 101 -](https://youtu.be/8Osw6p9vDYY) +### Authentication in the Frontend -::: +Authentication of normal users of DataHub takes place in two phases.
-### Authentication in the Frontend +At login time, authentication is performed by either DataHub itself (via username / password entry) or a third-party Identity Provider. Once the identity +of the user has been established, and credentials validated, a persistent session token is generated for the user and stored +in a browser-side session cookie. + +DataHub provides 3 mechanisms for authentication at login time: + +- **Native Authentication** which uses username and password combinations natively stored and managed by DataHub, with users invited via an invite link. +- [Single Sign-On with OpenID Connect](guides/sso/configure-oidc-react.md) to delegate authentication responsibility to third party systems like Okta or Google/Azure Authentication. This is the recommended approach for production systems. +- [JaaS Authentication](guides/jaas.md) for simple deployments where authenticated users are part of some known list or invited as a [Native DataHub User](guides/add-users.md). + +In subsequent requests, the session token is used to represent the authenticated identity of the user, and is validated by DataHub's backend service (discussed below). +Eventually, the session token is expired (24 hours by default), at which point the end user is required to log in again. + +### Authentication in the Backend (Metadata Service) -Authentication in DataHub happens at 2 possible moments, if enabled. +When a user makes a request for Data within DataHub, the request is authenticated by DataHub's Backend (Metadata Service) via a JSON Web Token. This applies to both requests originating from the DataHub application, +and programmatic calls to DataHub APIs. There are two types of tokens that are important: -The first happens in the **DataHub Frontend** component when you access the UI. -You will be prompted with a login screen, upon which you must supply a username/password combo or OIDC login to access DataHub's UI. -This is typical scenario for a human interacting with DataHub. 
+1. **Session Tokens**: Generated for users of the DataHub web application. By default, they have a duration of 24 hours. +These tokens are encoded and stored inside browser-side session cookies. +2. **Personal Access Tokens**: These are tokens generated via the DataHub settings panel useful for interacting +with DataHub APIs. They can be used to automate processes like enriching documentation, ownership, tags, and more on DataHub. Learn +more about Personal Access Tokens [here](personal-access-tokens.md). -DataHub provides 2 methods of authentication: - - [JaaS Authentication](guides/jaas.md) for simple deployments where authenticated users are part of some known list or invited as a [Native DataHub User](guides/add-users.md). - - [OIDC Authentication](guides/sso/configure-oidc-react.md) to delegate authentication responsibility to third party systems like Okta or Google/Azure Authentication. This is the recommended approach for production systems. +To learn more about DataHub's backend authentication, check out [Introducing Metadata Service Authentication](introducing-metadata-service-authentication.md). -Upon validation of a user's credentials through one of these authentication systems, DataHub will generate a session token with which all subsequent requests will be made. +Credentials must be provided as Bearer Tokens inside of the **Authorization** header in any request made to DataHub's API layer: -### Authentication in the Backend +```shell +Authorization: Bearer +``` -The second way in which authentication occurs, is within DataHub's Backend (Metadata Service) when a user makes a request either through the UI or through APIs. -In this case DataHub makes use of Personal Access Tokens or session HTTP headers to apply actions on behalf of some user. -To learn more about DataHub's backend authentication have a look at our docs on [Introducing Metadata Service Authentication](introducing-metadata-service-authentication.md).
+Note that in DataHub local quickstarts, Authentication at the backend layer is disabled for convenience. This leaves the backend +vulnerable to unauthenticated requests and should not be used in production. To enable +backend (token-based) authentication, simply set the `METADATA_SERVICE_AUTH_ENABLED=true` environment variable +for the datahub-gms container or pod. -Note, while authentication can happen on both the frontend or backend components of DataHub, they are separate, related processes. -The first is to authenticate users/services by a third party system (Open-ID connect or Java based authentication) and the latter to only permit identified requests to be accepted by DataHub via access tokens or bearer cookies. +### References -If you only want some users to interact with DataHub's UI, enable authentication in the Frontend and manage who is allowed either through JaaS or OIDC login methods. -If you want users to be able to access DataHub's backend directly without going through the UI in an authenticated manner, then enable authentication in the backend and generate access tokens for them. +For a quick video on the topic of users and groups within DataHub, have a look at [DataHub Basics — Users, Groups, & Authentication 101 +](https://youtu.be/8Osw6p9vDYY) \ No newline at end of file diff --git a/docs/authentication/guides/add-users.md b/docs/authentication/guides/add-users.md index c25c1543f77f2..cdf3c762b89ab 100644 --- a/docs/authentication/guides/add-users.md +++ b/docs/authentication/guides/add-users.md @@ -1,75 +1,109 @@ -# Adding Users to DataHub +# Onboarding Users to DataHub -Users can log into DataHub in 3 ways: +New user accounts can be provisioned on DataHub in 3 ways: -1. Invite users via the UI -2. Static credentials -3. Single Sign-On via [OpenID Connect](https://www.google.com/search?q=openid+connect&oq=openid+connect&aqs=chrome.0.0i131i433i512j0i512l4j69i60l2j69i61.1468j0j7&sourceid=chrome&ie=UTF-8) (For Production Use) +1. 
Shared Invite Links +2. Single Sign-On using [OpenID Connect](https://www.google.com/search?q=openid+connect&oq=openid+connect&aqs=chrome.0.0i131i433i512j0i512l4j69i60l2j69i61.1468j0j7&sourceid=chrome&ie=UTF-8) +3. Static Credential Configuration File (Self-Hosted Only) -which can be enabled simultaneously. Options 1 and 2 are useful for running proof-of-concept exercises, or just getting DataHub up & running quickly. Option 3 is highly recommended for deploying DataHub in production. +The first option is the easiest to get started with. The second is recommended for deploying DataHub in production. The third should +be reserved for special circumstances where access must be closely monitored and controlled, and is only relevant for Self-Hosted instances. -# Method 1: Inviting users via the DataHub UI +# Shared Invite Links -## Send prospective users an invite link +### Generating an Invite Link -With the right permissions (`MANAGE_USER_CREDENTIALS`), you can invite new users to your deployed DataHub instance from the UI. It's as simple as sending a link! +If you have the `Manage User Credentials` [Platform Privilege](../../authorization/access-policies-guide.md), you can invite new users to DataHub by sharing an invite link. -First navigate, to the Users and Groups tab (under Access) on the Settings page. You'll then see an `Invite Users` button. Note that this will only be clickable -if you have the correct permissions. +To do so, navigate to the **Users & Groups** section inside of Settings page. Here you can generate a shareable invite link by clicking the `Invite Users` button. If you +do not have the correct privileges to invite users, this button will be disabled. -![](../../imgs/invite-users-button.png) +

+ +

-If you click on this button, you'll see a pop-up where you can copy an invite link to send to users, or generate a fresh one. +To invite new users, simply share the link with others inside your organization. -![](../../imgs/invite-users-popup.png) +

+ +

-When a new user visits the link, they will be directed to a sign up screen. Note that if a new link has since been regenerated, the new user won't be able to sign up! +When a new user visits the link, they will be directed to a sign up screen where they can create their DataHub account. -![](../../imgs/user-sign-up-screen.png) +### Resetting User Passwords -## Reset password for native users +To reset a user's password, navigate to the Users & Groups tab, find the user who needs their password reset, +and click **Reset user password** inside the menu dropdown on the right hand side. Note that a user must have the +`Manage User Credentials` [Platform Privilege](../../authorization/access-policies-guide.md) in order to reset passwords. -If a user forgets their password, an admin user with the `MANAGE_USER_CREDENTIALS` privilege can go to the Users and Groups tab and click on the respective user's -`Reset user password` button. +

+ +

-![](../../imgs/reset-user-password-button.png) +To reset the password, simply share the password reset link with the user who needs to change their password. Password reset links expire after 24 hours. -Similar to the invite link, you can generate a new reset link and send a link to that user which they can use to reset their credentials. +

+ +

-![](../../imgs/reset-user-password-popup.png) -When that user visits the link, they will be direct to a screen where they can reset their credentials. If the link is older than 24 hours or another link has since -been generated, they won't be able to reset their credentials! +# Configuring Single Sign-On with OpenID Connect -![](../../imgs/reset-credentials-screen.png) +Setting up Single Sign-On via OpenID Connect enables your organization's users to login to DataHub via a central Identity Provider such as -# Method 2: Configuring static credentials +- Azure AD +- Okta +- Keycloak +- Ping! +- Google Identity -## Changing the default 'datahub' user +and many more. -The 'datahub' admin user is created for you by default. To override that user please follow these steps. This is due to the way the authentication setup is working - we support a "default" user.props containing the root datahub user and a separate custom file, which does not overwrite the first. +This option is strongly recommended for production deployments of DataHub. -However, it's still possible to change the password for the default `datahub user`. To change it, follow these steps: +### Managed DataHub -1. Update the `docker-compose.yaml` to mount your default user.props file to the following location inside the `datahub-frontend-react` container using a volume: -`/datahub-frontend/conf/user.props` - -2. Restart the datahub containers to pick up the new configs - -If you're deploying using the CLI quickstart, you can simply download a copy of the [docker-compose file used in quickstart](https://github.com/datahub-project/datahub/blob/master/docker/quickstart/docker-compose.quickstart.yml), -and modify the `datahub-frontend-react` block to contain the extra volume mount. Then simply run +Single Sign-On can be configured and enabled by navigating to **Settings** > **SSO** > **OIDC**. 
Note +that a user must have the **Manage Platform Settings** [Platform Privilege](../../authorization/access-policies-guide.md) +in order to configure SSO settings. -``` -datahub docker quickstart —quickstart-compose-file .yml -``` +To complete the integration, you'll need the following: + +1. **Client ID** - A unique identifier for your application with the identity provider. +2. **Client Secret** - A shared secret to use for exchange between you and your identity provider. +3. **Discovery URL** - A URL where the OpenID settings for your identity provider can be discovered. + +These values can be obtained from your Identity Provider by following Step 1 on the [OpenID Connect Authentication](sso/configure-oidc-react.md) Guide. + +### Self-Hosted DataHub + +For information about configuring Self-Hosted DataHub to use OpenID Connect (OIDC) to +perform authentication, check out [OIDC Authentication](sso/configure-oidc-react.md). -## Create a user.props file to add new users +> **A note about user URNs**: User URNs are unique identifiers for users on DataHub. The username received from an Identity Provider +> when a user logs into DataHub via OIDC is used to construct a unique identifier for the user on DataHub. The urn is computed as: +> `urn:li:corpuser:` +> +> By default, the email address will be the username extracted from the Identity Provider. For information about customizing +> which claim should be treated as the username in DataHub, check out the [OIDC Authentication](sso/configure-oidc-react.md) documentation. -To define a set of username / password combinations that should be allowed to log in to DataHub, create a new file called `user.props` at the file path `${HOME}/.datahub/plugins/frontend/auth/user.props`. -This file should contain username:password combinations, with 1 user per line.
For example, to create 2 new users, +# Static Credential Configuration File (Self-Hosted Only) + +User credentials can be managed via a [JaaS Authentication](./jaas.md) configuration file containing +static username and password combinations. By default, the credentials for the root 'datahub' user are configured +using this mechanism. It is highly recommended that admins change or remove the default credentials for this user. + +## Adding new users using a user.props file + +To define a set of username / password combinations that should be allowed to log in to DataHub (in addition to the root 'datahub' user), +create a new file called `user.props` at the file path `${HOME}/.datahub/plugins/frontend/auth/user.props` within the `datahub-frontend-react` container +or pod. + +This file should contain username:password specifications, with one on each line. For example, to create 2 new users, with usernames "janesmith" and "johndoe", we would define the following file: ``` +// custom user.props janesmith:janespassword johndoe:johnspassword ``` @@ -81,14 +115,14 @@ To change or remove existing login credentials, edit and save the `user.props` f If you want to customize the location of the `user.props` file, or if you're deploying DataHub via Helm, proceed to Step 2. -## (Advanced) Mount custom user.props file to container +### (Advanced) Mount custom user.props file to container This step is only required when mounting custom credentials into a Kubernetes pod (e.g. Helm) **or** if you want to change the default filesystem location from which DataHub mounts a custom `user.props` file (`${HOME}/.datahub/plugins/frontend/auth/user.props)`. If you are deploying with `datahub docker quickstart`, or running using Docker Compose, you can most likely skip this step.
-### Docker Compose +#### Docker Compose You'll need to modify the `docker-compose.yml` file to mount a container volume mapping your custom user.props to the standard location inside the container (`/etc/datahub/plugins/frontend/auth/user.props`). @@ -111,7 +145,7 @@ For example, to mount a user.props file that is stored on my local filesystem at Once you've made this change, restarting DataHub enable authentication for the configured users. -### Helm +#### Helm You'll need to create a Kubernetes secret, then mount the file as a volume to the `datahub-frontend` pod. @@ -143,55 +177,62 @@ Note that if you update the secret you will need to restart the `datahub-fronten kubectl create secret generic datahub-users-secret --from-file=user.props=./ -o yaml --dry-run=client | kubectl apply -f - ``` -## URNs +> A note on user URNs: User URNs are unique identifiers for users of DataHub. The usernames defined in the `user.props` file will be used to generate the DataHub user "urn", which uniquely identifies +> the user on DataHub. The urn is computed as `urn:li:corpuser:{username}`, where "username" is defined inside your user.props file. + +## Changing the default 'datahub' user credentials (Recommended) + +The 'datahub' root user is created for you by default. This user is controlled via a user.props file which [JaaS Authentication](./jaas.md) is configured to use. -URNs are identifiers that uniquely identify an Entity on DataHub. The usernames defined in the `user.props` file will be used to generate the DataHub user "urn", which uniquely identifies -the user on DataHub. The urn is computed as: +By default, the credential file looks like this for each and every self-hosted DataHub deployment: ``` -urn:li:corpuser:{username} +// default user.props +datahub:datahub ``` -## Caveats +Obviously, this is not ideal from a security perspective. It is highly recommended that this file +is changed *prior* to deploying DataHub to production at your organization.
-### Adding User Details +To change the default password for this user, or remove it altogether: -If you add a new username / password to the `user.props` file, no other information about the user will exist -about the user in DataHub (full name, email, bio, etc). This means that you will not be able to search to find the user. +1. **Create a new config file**: Create a new version of `user.props` which defines the updated password for the datahub user. +To remove this user, simply omit the username 'datahub' from the new file. For example, to change the +password for the DataHub root user to 'newpassword', your file would contain the following: -In order for the user to become searchable, simply navigate to the new user's profile page (top-right corner) and click -**Edit Profile**. Add some details like a display name, an email, and more. Then click **Save**. Now you should be able -to find the user via search. + ``` + // new user.props + datahub:newpassword + ``` -> You can also use our Python Emitter SDK to produce custom information about the new user via the CorpUser metadata entity. +2. **Mount the updated config file**: Change the `docker-compose.yaml` to mount an updated user.props file to the following location inside the `datahub-frontend-react` container using a volume: + `/datahub-frontend/conf/user.props` -For a more comprehensive overview of how users & groups are managed within DataHub, check out [this video](https://www.youtube.com/watch?v=8Osw6p9vDYY). +2. 
**Restart DataHub**: Restart the DataHub containers or pods to pick up the new configs. -# Method 3: Configuring SSO via OpenID Connect -Setting up SSO via OpenID Connect means that users will be able to login to DataHub via a central Identity Provider such as +If you're deploying using the CLI quickstart, you can simply download a copy of the [docker-compose file used in quickstart](https://github.com/datahub-project/datahub/blob/master/docker/quickstart/docker-compose.quickstart.yml), +and modify the `datahub-frontend-react` block to contain the extra volume mount. Then run -- Azure AD -- Okta -- Keycloak -- Ping! -- Google Identity +``` +datahub docker quickstart --quickstart-compose-file .yml +``` -and more. -This option is recommended for production deployments of DataHub. For detailed information about configuring DataHub to use OIDC to -perform authentication, check out [OIDC Authentication](sso/configure-oidc-react.md). +## Caveats -## URNs +### Adding User Details -URNs are identifiers that uniquely identify an Entity on DataHub. The username received from an Identity Provider -when a user logs into DataHub via OIDC is used to construct a unique identifier for the user on DataHub. The urn is computed as: +If you add a new username / password to the `user.props` file, no other information about the user will exist +in DataHub (full name, email, bio, etc). This means that you will not be able to search to find the user. -``` -urn:li:corpuser: -``` +In order for the user to become searchable, simply navigate to the new user's profile page (top-right corner) and click +**Edit Profile**. Add some details like a display name, an email, and more. Then click **Save**. Now you should be able +to find the user via search. -For information about configuring which OIDC claim should be used as the username for Datahub, check out the [OIDC Authentication](sso/configure-oidc-react.md) doc.
+> You can also use our Python Emitter SDK to produce custom information about the new user via the CorpUser metadata entity. + +For a more comprehensive overview of how users & groups are managed within DataHub, check out [this video](https://www.youtube.com/watch?v=8Osw6p9vDYY). ## FAQ @@ -199,7 +240,7 @@ For information about configuring which OIDC claim should be used as the usernam 1. Can I enable OIDC and username / password (JaaS) authentication at the same time? YES! If you have not explicitly disabled JaaS via an environment variable on the datahub-frontend container (AUTH_JAAS_ENABLED), -then you can _always_ access the standard login flow at `http://your-datahub-url.com/login`. +then you can always access the standard login flow at `http://your-datahub-url.com/login`. ## Feedback / Questions / Concerns diff --git a/docs/authentication/personal-access-tokens.md b/docs/authentication/personal-access-tokens.md index 5d56af7dce33f..0188aab49444e 100644 --- a/docs/authentication/personal-access-tokens.md +++ b/docs/authentication/personal-access-tokens.md @@ -4,7 +4,7 @@ import FeatureAvailability from '@site/src/components/FeatureAvailability'; -Personal Access Tokens or PATs for short, allow users to represent themselves in code and programmatically use DataHub's APIs in deployments where security is a concern. +Personal Access Tokens, or PATs for short, allow users to represent themselves in code and programmatically use DataHub's APIs in deployments where security is a concern. Used along-side with [authentication-enabled metadata service](introducing-metadata-service-authentication.md), PATs add a layer of protection to DataHub where only authorized users are able to perform actions in an automated way. 
diff --git a/docs/authorization/access-policies-guide.md b/docs/authorization/access-policies-guide.md index a659132d90036..563405c087df4 100644 --- a/docs/authorization/access-policies-guide.md +++ b/docs/authorization/access-policies-guide.md @@ -12,7 +12,7 @@ There are 2 types of Access Policy within DataHub: 2. **Metadata** Policies

- +

**Platform** Policies determine who has platform-level Privileges on DataHub. These include: @@ -68,7 +68,7 @@ Policies can be created by first navigating to **Settings > Permissions > Polici To begin building a new Policy, click **Create new Policy**.

- +

### Creating a Platform Policy @@ -88,7 +88,7 @@ You can optionally provide a text description to add richer details about the pu In the second step, we can simply select the Privileges that this Platform Policy will grant.

- +

**Platform** Privileges most often provide access to perform administrative functions on the Platform. These include: @@ -118,13 +118,13 @@ In this step, we can select the actors who should be granted Privileges appearin To do so, simply search and select the Users or Groups that the Policy should apply to.

- +

**Assigning a Policy to a User**

- +

**Assigning a Policy to a Group** @@ -155,7 +155,7 @@ For example, if we only want to grant access for `Datasets` on DataHub, we can s `Datasets`.

- +

Next, we can search for specific Entities of the that the Policy should grant privileges on. @@ -165,7 +165,7 @@ For example, if we only want to grant access for a specific sample dataset, we c select it directly.

- +

We can also limit the scope of the Policy to assets that live in a specific **Domain**. If left blank, @@ -175,14 +175,14 @@ For example, if we only want to grant access for assets part of a "Marketing" Do select it.

- +

Finally, we will choose the Privileges to grant when the selected entities fall into the defined scope.

- +

**Metadata** Privileges grant access to change specific *entities* (i.e. data assets) on DataHub. @@ -228,18 +228,18 @@ can target specific Users & Groups, or the *owners* of the Entities that are inc To do so, simply search and select the Users or Groups that the Policy should apply to.

- +

- +

We can also grant the Privileges to the *owners* of Entities (or *Resources*) that are in scope for the Policy. This advanced functionality allows of Admins of DataHub to closely control which actions can or cannot be performed by owners.

- +

### Updating an Existing Policy @@ -247,7 +247,7 @@ This advanced functionality allows of Admins of DataHub to closely control which To update an existing Policy, simply click the **Edit** on the Policy you wish to change.

- +

Then, make the changes required and click **Save**. When you save a Policy, it may take up to 2 minutes for changes @@ -271,7 +271,7 @@ To deactivate a Policy, simply click the **Deactivate** button on the Policy you the state of a Policy, it may take up to 2 minutes for the changes to be reflected.

- +

After deactivating, you can re-enable a Policy by clicking **Activate**.