docs(adding users): Refreshing the docs for adding new DataHub Users #6879

Merged

2 changes: 1 addition & 1 deletion docs-website/sidebars.js
@@ -273,7 +273,6 @@ module.exports = {
"docs/deploy/gcp",
"docker/README",
"docs/deploy/kubernetes",
- "docs/how/updating-datahub",
{
Authentication: [
"docs/authentication/README",
@@ -318,6 +317,7 @@ module.exports = {
"docs/advanced/no-code-upgrade",
],
},
+ "docs/how/updating-datahub",
],
"Developer Guides": [
// The purpose of this section is to provide developers & technical users with

66 changes: 40 additions & 26 deletions docs/authentication/README.md
@@ -1,41 +1,55 @@
# Overview

- Authentication is the process of verifying the identity of a user or service. In DataHub this can be split into 2 main components:
- - How to login into DataHub.
- - How to make some action withing DataHub on **behalf** of a user/service.
+ Authentication is the process of verifying the identity of a user or service. There are two
+ places where authentication occurs inside DataHub:

- :::note
+ 1. DataHub frontend service when a user attempts to log in to the DataHub application.
+ 2. DataHub backend service when making API requests to DataHub.

- Authentication in DataHub does not necessarily mean that the user/service being authenticated will be part of the metadata graph within DataHub itself other concepts like Datasets or Dashboards.
- In other words, a user called `john.smith` logging into DataHub does not mean that john.smith appears as a CorpUser Entity within DataHub.
+ In this document, we'll take a closer look at both.

- For a quick video on that subject, have a look at our video on [DataHub Basics — Users, Groups, & Authentication 101
- ](https://youtu.be/8Osw6p9vDYY)
+ ### Authentication in the Frontend

- :::
+ Authentication of normal users of DataHub takes place in two phases.

- ### Authentication in the Frontend
+ At login time, authentication is performed by either DataHub itself (via username / password entry) or a third-party Identity Provider. Once the identity
+ of the user has been established and their credentials validated, a persistent session token is generated for the user and stored
+ in a browser-side session cookie.
+
+ DataHub provides 3 mechanisms for authentication at login time:
+
+ - **Native Authentication**, which uses username and password combinations natively stored and managed by DataHub, with users invited via an invite link.
+ - [Single Sign-On with OpenID Connect](guides/sso/configure-oidc-react.md) to delegate authentication responsibility to third-party systems like Okta or Google/Azure Authentication. This is the recommended approach for production systems.
+ - [JaaS Authentication](guides/jaas.md) for simple deployments where authenticated users are part of some known list or invited as a [Native DataHub User](guides/add-users.md).
+
+ In subsequent requests, the session token represents the authenticated identity of the user and is validated by DataHub's backend service (discussed below).
+ Eventually the session token expires (24 hours by default), at which point the end user is required to log in again.
+
+ ### Authentication in the Backend (Metadata Service)

- Authentication in DataHub happens at 2 possible moments, if enabled.
+ When a user makes a request for data within DataHub, the request is authenticated by DataHub's Backend (Metadata Service) via a JSON Web Token. This applies both to requests originating from the DataHub application
+ and to programmatic calls to DataHub APIs. Two types of token are important:

- The first happens in the **DataHub Frontend** component when you access the UI.
- You will be prompted with a login screen, upon which you must supply a username/password combo or OIDC login to access DataHub's UI.
- This is typical scenario for a human interacting with DataHub.
+ 1. **Session Tokens**: Generated for users of the DataHub web application, with a default duration of 24 hours.
+ These tokens are encoded and stored inside browser-side session cookies.
+ 2. **Personal Access Tokens**: Generated via the DataHub settings panel and useful for interacting
+ with DataHub APIs. They can be used to automate processes like enriching documentation, ownership, tags, and more on DataHub. Learn
+ more about Personal Access Tokens [here](personal-access-tokens.md).
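
The new Session Tokens entry says web-app tokens last 24 hours by default, and the backend section says requests are authenticated via a JSON Web Token. Assuming a token is a standard three-part JWT with a numeric `exp` claim (an assumption; this README does not spell out the token layout), a client script can decode the payload to see whether the token has already expired. The sketch below is illustrative JavaScript for Node 15.7+ (needed for `base64url`); `isTokenExpired` and the demo claims are hypothetical, and the function only decodes the token without verifying its signature, which remains the backend's job.

```javascript
// Hypothetical helper: decodes (does NOT verify) a JWT and checks its `exp` claim.
// ASSUMPTION: the token is header.payload.signature with base64url segments and a
// numeric `exp` in seconds since the Unix epoch, as in standard JWTs.
function isTokenExpired(token, nowMs = Date.now()) {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error("Expected a JWT of the form header.payload.signature");
  }
  // Decode the payload segment; the "base64url" encoding requires Node 15.7+.
  const claims = JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
  if (typeof claims.exp !== "number") {
    return false; // no expiry claim, so there is nothing to compare against
  }
  return nowMs >= claims.exp * 1000; // `exp` is in seconds, Date.now() in ms
}

// Usage with a hand-built, UNSIGNED demo token (hypothetical claims).
const DAY_S = 24 * 60 * 60; // 24 hours, the documented default session duration
const iat = 1700000000; // arbitrary issue time, in seconds
const demoClaims = { sub: "demo-user", iat, exp: iat + DAY_S };
const demoToken = [
  "e30", // base64url of "{}", a placeholder header
  Buffer.from(JSON.stringify(demoClaims)).toString("base64url"),
  "unsigned", // placeholder signature for the demo only
].join(".");

console.log(isTokenExpired(demoToken, (iat + 60) * 1000)); // false: one minute after issue
console.log(isTokenExpired(demoToken, (iat + DAY_S) * 1000)); // true: exactly 24 hours later
```

A client-side `exp` check only helps a script log in again or re-issue a token proactively; authoritative validation still happens in DataHub's backend.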

- DataHub provides 2 methods of authentication:
- - [JaaS Authentication](guides/jaas.md) for simple deployments where authenticated users are part of some known list or invited as a [Native DataHub User](guides/add-users.md).
- - [OIDC Authentication](guides/sso/configure-oidc-react.md) to delegate authentication responsibility to third party systems like Okta or Google/Azure Authentication. This is the recommended approach for production systems.
+ To learn more about DataHub's backend authentication, check out [Introducing Metadata Service Authentication](introducing-metadata-service-authentication.md).

- Upon validation of a user's credentials through one of these authentication systems, DataHub will generate a session token with which all subsequent requests will be made.
+ Credentials must be provided as Bearer Tokens inside of the **Authorization** header in any request made to DataHub's API layer, in the following form:

- ### Authentication in the Backend
+ ```shell
+ Authorization: Bearer <your-token>
+ ```
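
The updated README requires every API request to carry its token in an `Authorization: Bearer <your-token>` header. A minimal JavaScript sketch of building that header follows. `buildAuthHeaders` and `whoAmI` are hypothetical helpers, and the request in `whoAmI` rests on assumptions not stated in this README: a GMS instance at `http://localhost:8080`, a GraphQL endpoint at `/api/graphql`, the illustrative query shown, a `DATAHUB_TOKEN` environment variable, and Node 18+ for the global `fetch`. Adjust all of these for your deployment.

```javascript
// Hypothetical helper: builds request headers carrying a DataHub token
// (a session token or a Personal Access Token) as a Bearer credential.
function buildAuthHeaders(token) {
  if (typeof token !== "string" || token.trim() === "") {
    throw new Error("A non-empty DataHub token is required");
  }
  return {
    Authorization: `Bearer ${token}`, // the header form the README requires
    "Content-Type": "application/json",
  };
}

// ASSUMPTIONS: GMS address, GraphQL path, and query are illustrative only.
// Requires Node 18+ (global fetch) and a DATAHUB_TOKEN environment variable.
async function whoAmI(gmsUrl = "http://localhost:8080") {
  const res = await fetch(`${gmsUrl}/api/graphql`, {
    method: "POST",
    headers: buildAuthHeaders(process.env.DATAHUB_TOKEN),
    body: JSON.stringify({ query: "{ me { corpUser { urn } } }" }),
  });
  if (!res.ok) {
    throw new Error(`Request failed with HTTP ${res.status}`);
  }
  return res.json();
}

console.log(buildAuthHeaders("my-token").Authorization); // Bearer my-token
```

Keeping header construction in one helper means the same code path serves session tokens and Personal Access Tokens, since the README requires both to be sent the same way.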

- The second way in which authentication occurs, is within DataHub's Backend (Metadata Service) when a user makes a request either through the UI or through APIs.
- In this case DataHub makes use of Personal Access Tokens or session HTTP headers to apply actions on behalf of some user.
- To learn more about DataHub's backend authentication have a look at our docs on [Introducing Metadata Service Authentication](introducing-metadata-service-authentication.md).
+ Note that in DataHub local quickstarts, authentication at the backend layer is disabled for convenience. This leaves the backend
+ vulnerable to unauthenticated requests, so this configuration should not be used in production. To enable
+ backend (token-based) authentication, set the `METADATA_SERVICE_AUTH_ENABLED=true` environment variable
+ for the datahub-gms container or pod.

- Note, while authentication can happen on both the frontend or backend components of DataHub, they are separate, related processes.
- The first is to authenticate users/services by a third party system (Open-ID connect or Java based authentication) and the latter to only permit identified requests to be accepted by DataHub via access tokens or bearer cookies.
+ ### References

- If you only want some users to interact with DataHub's UI, enable authentication in the Frontend and manage who is allowed either through JaaS or OIDC login methods.
- If you want users to be able to access DataHub's backend directly without going through the UI in an authenticated manner, then enable authentication in the backend and generate access tokens for them.
+ For a quick video on the topic of users and groups within DataHub, have a look at [DataHub Basics — Users, Groups, & Authentication 101
+ ](https://youtu.be/8Osw6p9vDYY)