feat(docs) Add documentation on authorization & authentication (datah…
pedro93 authored and maggiehays committed Aug 1, 2022
1 parent 6d21b57 commit 6e2bb99
Showing 33 changed files with 302 additions and 205 deletions.
2 changes: 1 addition & 1 deletion datahub-frontend/README.md
@@ -62,7 +62,7 @@ WHZ-Authentication {

### Authentication in React
The React app supports both JAAS, as described above, and OIDC authentication. To learn about configuring OIDC for React,
see the [OIDC in React](../docs/how/auth/sso/configure-oidc-react.md) document.
see the [OIDC in React](../docs/authentication/guides/sso/configure-oidc-react.md) document.


### API Debugging
59 changes: 48 additions & 11 deletions docs-website/sidebars.js
@@ -61,6 +61,54 @@ module.exports = {
"releases",
],
"Getting Started": ["docs/quickstart", "docs/debugging"],
Authentication: [
{
type: "doc",
id: "docs/authentication/README",
label: "Overview",
},
{
type: "doc",
id: "docs/authentication/concepts",
label: "Concepts",
},
{
"Frontend Authentication": [
"docs/authentication/guides/jaas",
{
"OIDC Authentication": [
"docs/authentication/guides/sso/configure-oidc-react",
"docs/authentication/guides/sso/configure-oidc-react-google",
"docs/authentication/guides/sso/configure-oidc-react-okta",
"docs/authentication/guides/sso/configure-oidc-react-azure",
],
},
"docs/authentication/guides/add-users",
],
},
{
type: "doc",
id: "docs/authentication/introducing-metadata-service-authentication",
label: "Metadata Service Authentication",
},
{
type: "doc",
id: "docs/authentication/personal-access-tokens",
label: "Personal Access Tokens",
},
],
Authorization: [
{
type: "doc",
id: "docs/authorization/README",
label: "Overview",
},
{
type: "doc",
id: "docs/authorization/policies",
label: "Access Policies",
},
],
Ingestion: [
// add a custom label since the default is 'Metadata Ingestion'
// note that we also have to add the path to this file in sidebarsjs_hardcoded_titles in generateDocsDir.ts
@@ -274,13 +322,11 @@ module.exports = {
},
],
"Usage Guides": [
"docs/policies",
"docs/domains",
"docs/ui-ingestion",
"docs/tags",
"docs/schema-history",
"docs/how/search",
"docs/how/auth/add-users",
"docs/how/ui-tabs-guide",
"docs/how/business-glossary-guide",
],
@@ -297,15 +343,6 @@ module.exports = {
//"docs/how/build-metadata-service",
//"docs/how/graph-onboarding",
//"docs/demo/graph-onboarding",
{
Authentication: [
"docs/how/auth/jaas",
"docs/how/auth/sso/configure-oidc-react",
"docs/how/auth/sso/configure-oidc-react-google",
"docs/how/auth/sso/configure-oidc-react-okta",
"docs/how/auth/sso/configure-oidc-react-azure",
],
},
"docs/what/mxe",
"docs/how/restore-indices",
"docs/dev-guides/timeline",
2 changes: 1 addition & 1 deletion docs/api/graphql/getting-started.md
@@ -24,7 +24,7 @@ Today, DataHub's GraphQL endpoint is available for use in multiple places. The o

1. **Metadata Service**: The DataHub Metadata Service (backend) is the source of truth for the GraphQL endpoint. The endpoint is located at the `/api/graphql` path of the DNS address
where your instance of the `datahub-gms` container is deployed. For example, in local deployments it is typically located at `http://localhost:8080/api/graphql`. By default,
the Metadata Service has no explicit authentication checks. However, it does have *Authorization checks*. DataHub [Access Policies](../../../docs/policies.md) will be enforced by the GraphQL API. This means you'll need to provide an actor identity when querying the GraphQL API.
the Metadata Service has no explicit authentication checks. However, it does have *Authorization checks*. DataHub [Access Policies](../../authorization/policies.md) will be enforced by the GraphQL API. This means you'll need to provide an actor identity when querying the GraphQL API.
To do so, include the `X-DataHub-Actor` header with an Authorized Corp User URN as the value in your request. Because anyone is able to set the value of this header, we recommend using this endpoint only in trusted environments, either by administrators themselves or programs that they own directly.

2. **Frontend Proxy**: The DataHub Frontend Proxy Service (frontend) is a basic web server & reverse proxy to the Metadata Service. As such, the
2 changes: 1 addition & 1 deletion docs/api/graphql/querying-entities.md
@@ -122,7 +122,7 @@ DataHub provides the following GraphQL mutations for updating entities in your M

### Authorization

Mutations which change Entity metadata are subject to [DataHub Access Policies](../../../docs/policies.md). This means that DataHub's server
Mutations which change Entity metadata are subject to [DataHub Access Policies](../../authorization/policies.md). This means that DataHub's server
will check whether the requesting actor is authorized to perform the action. If you're querying the GraphQL endpoint via the DataHub
Proxy Server, which is discussed more in [Getting Started](./getting-started.md), then the Session Cookie provided will carry the actor information.
If you're querying the Metadata Service API directly, then you'll have to provide this via a special `X-DataHub-Actor` HTTP header, which should
41 changes: 41 additions & 0 deletions docs/authentication/README.md
@@ -0,0 +1,41 @@
# Overview

Authentication is the process of verifying the identity of a user or service. In DataHub this can be split into 2 main components:
- How to log in to DataHub.
- How to perform an action within DataHub on **behalf** of a user/service.

:::note

Authentication in DataHub does not necessarily mean that the user/service being authenticated will be part of the metadata graph within DataHub itself, like other concepts such as Datasets or Dashboards.
In other words, a user called `john.smith` logging into DataHub does not mean that `john.smith` appears as a CorpUser Entity within DataHub.

For a quick introduction to the subject, have a look at our video [DataHub Basics — Users, Groups, & Authentication 101](https://youtu.be/8Osw6p9vDYY).

:::

### Authentication in the Frontend

Authentication in DataHub happens at 2 possible moments, if enabled.

The first happens in the **DataHub Frontend** component when you access the UI.
You will be prompted with a login screen, where you must supply a username/password combination or log in through OIDC to access DataHub's UI.
This is the typical scenario for a human interacting with DataHub.

DataHub provides 2 methods of authentication:
- [JAAS Authentication](guides/jaas.md) for simple deployments where authenticated users are part of some known list or invited as a [Native DataHub User](guides/add-users.md).
- [OIDC Authentication](guides/sso/configure-oidc-react.md) to delegate authentication responsibility to third-party systems like Okta or Google/Azure Authentication. This is the recommended approach for production systems.

Upon validation of a user's credentials through one of these authentication systems, DataHub will generate a session token with which all subsequent requests will be made.

### Authentication in the Backend

The second happens within DataHub's Backend (Metadata Service) when a user makes a request, either through the UI or through APIs.
In this case DataHub makes use of Personal Access Tokens or session HTTP headers to apply actions on behalf of some user.
To learn more about DataHub's backend authentication have a look at our docs on [Introducing Metadata Service Authentication](introducing-metadata-service-authentication.md).

Note that while authentication can happen in both the frontend and backend components of DataHub, these are separate but related processes.
The former authenticates users/services through a third-party system (OpenID Connect or Java-based authentication); the latter ensures that DataHub accepts only identified requests, via access tokens or bearer cookies.

If you only want some users to interact with DataHub's UI, enable authentication in the Frontend and manage who is allowed through either the JAAS or OIDC login methods.
If you want users to be able to access DataHub's backend directly without going through the UI in an authenticated manner, then enable authentication in the backend and generate access tokens for them.
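
As a rough illustration of the backend case, the sketch below builds (without sending) a request that presents an access token to the Metadata Service. The URL is the local-deployment GraphQL endpoint mentioned in the GraphQL docs; the `Bearer` scheme, the placeholder token, and the helper name `build_graphql_request` are assumptions made for the example, and `{ __typename }` is simply the standard GraphQL meta field used as a stand-in query.

```python
import json
import urllib.request

# Local-deployment endpoint from the GraphQL docs; adjust for your deployment.
GMS_GRAPHQL_URL = "http://localhost:8080/api/graphql"


def build_graphql_request(access_token: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request carrying an access token."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GMS_GRAPHQL_URL,
        data=body,
        headers={
            # Assumption: the token is presented using the Bearer scheme.
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "<my-access-token>" is a placeholder, not a real token.
request = build_graphql_request("<my-access-token>", "{ __typename }")
# urllib.request.urlopen(request) would send it to a running Metadata Service.
```

Building the request separately from sending it keeps the token handling easy to inspect before anything reaches the network.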
123 changes: 123 additions & 0 deletions docs/authentication/concepts.md
@@ -0,0 +1,123 @@
# Concepts & Key Components

We introduced a few important concepts to the Metadata Service to make authentication work:

1. Actor
2. Authenticator
3. AuthenticatorChain
4. AuthenticationFilter
5. DataHub Access Token
6. DataHub Token Service

In the following sections, we'll take a closer look at each individually.

![](../imgs/metadata-service-auth.png)
*High level overview of Metadata Service Authentication*

## What is an Actor?

An **Actor** is a concept within the new Authentication subsystem to represent a unique identity / principal that is initiating actions (e.g. read & write requests)
on the platform.

An actor can be characterized by 2 attributes:

1. **Type**: The "type" of the actor making a request. Its purpose is to distinguish between, for example, a "user" and a "service" actor. Currently, the "user" actor type is the only one
formally supported.
2. **Id**: A unique identifier for the actor within DataHub. This is commonly known as a "principal" in other systems. In the case of users, this
represents a unique "username". This username is in turn used when converting from the "Actor" concept into a Metadata Entity Urn (e.g. CorpUserUrn).

For example, the root "datahub" super user would have the following attributes:

```
{
  "type": "USER",
  "id": "datahub"
}
```

Which is mapped to the CorpUser urn:

```
urn:li:corpuser:datahub
```

for Metadata retrieval.
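
The mapping from Actor to urn can be sketched in a few lines. This is an illustration only: the `Actor` and `ActorType` classes and the `to_urn` method below are hypothetical names written for this example, not the Metadata Service's actual types.

```python
from dataclasses import dataclass
from enum import Enum


class ActorType(Enum):
    USER = "USER"  # the only actor type formally supported today


@dataclass(frozen=True)
class Actor:
    """Hypothetical stand-in for the Actor concept: a type plus a unique id."""

    type: ActorType
    id: str

    def to_urn(self) -> str:
        """Convert the actor into its Metadata Entity urn."""
        if self.type is ActorType.USER:
            return f"urn:li:corpuser:{self.id}"
        raise ValueError(f"Unsupported actor type: {self.type}")


root = Actor(type=ActorType.USER, id="datahub")
print(root.to_urn())  # urn:li:corpuser:datahub
```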

## What is an Authenticator?

An **Authenticator** is a pluggable component inside the Metadata Service that is responsible for authenticating an inbound request, given context about the request (currently, the request headers).
Authentication boils down to successfully resolving an **Actor** to associate with the inbound request.

There can be many types of Authenticator. For example, there can be Authenticators that

- Verify the authenticity of access tokens (e.g. issued by either DataHub itself or a 3rd-party IdP)
- Authenticate username / password credentials against a remote database (e.g. LDAP)

and more! A key goal of the abstraction is *extensibility*: a custom Authenticator can be developed to authenticate requests
based on an organization's unique needs.

DataHub ships with 2 Authenticators by default:

- **DataHubSystemAuthenticator**: Verifies that inbound requests have originated from inside DataHub itself using a shared system identifier
and secret. This authenticator is always present.

- **DataHubTokenAuthenticator**: Verifies that inbound requests contain a DataHub-issued Access Token (discussed further in the "DataHub Access Token" section below) in their
'Authorization' header. This authenticator is required if Metadata Service Authentication is enabled.

## What is an AuthenticatorChain?

An **AuthenticatorChain** is a series of **Authenticators** that are configured to run one-after-another. This allows
for configuring multiple ways to authenticate a given request, for example via LDAP or via a local key file.

Only if each Authenticator within the chain fails to authenticate a request will it be rejected.
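
The "reject only if every Authenticator fails" rule can be sketched as follows. Everything here is hypothetical and simplified for illustration: the function-based `Authenticator` type, the two toy authenticators, the made-up `X-Demo-User` header, and the choice to stop at the first Authenticator that resolves an actor are assumptions, not the Metadata Service's actual interfaces.

```python
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical shape: an authenticator receives the request headers and returns
# the resolved actor id, or None when it cannot authenticate the request.
Headers = Mapping[str, str]
Authenticator = Callable[[Headers], Optional[str]]


def authenticate(chain: Sequence[Authenticator], headers: Headers) -> Optional[str]:
    """Run the authenticators one after another; return the first resolved actor."""
    for authenticator in chain:
        actor_id = authenticator(headers)
        if actor_id is not None:
            return actor_id
    return None  # every authenticator failed, so the request is rejected


def token_authenticator(headers: Headers) -> Optional[str]:
    # Toy stand-in: accepts a single hard-coded token.
    if headers.get("Authorization") == "Bearer valid-token":
        return "datahub"
    return None


def header_authenticator(headers: Headers) -> Optional[str]:
    # Toy stand-in: trusts a made-up header that names the user.
    return headers.get("X-Demo-User")


chain = [token_authenticator, header_authenticator]
print(authenticate(chain, {"X-Demo-User": "john.smith"}))  # john.smith
print(authenticate(chain, {}))  # None
```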

The Authenticator Chain can be configured in the `application.yml` file under `authentication.authenticators`:

```
authentication:
  ....
  authenticators:
    # Configure the Authenticators in the chain
    - type: com.datahub.authentication.Authenticator1
      ...
    - type: com.datahub.authentication.Authenticator2
    ....
```

## What is the AuthenticationFilter?

The **AuthenticationFilter** is a [servlet filter](http://tutorials.jenkov.com/java-servlets/servlet-filters.html) that authenticates each and every request to the Metadata Service.
It does so by constructing and invoking an **AuthenticatorChain**, described above.

If the AuthenticatorChain cannot resolve an Actor, the filter returns a 401 Unauthorized response.


## What is a DataHub Token Service? What are Access Tokens?

Along with Metadata Service Authentication comes an important new component called the **DataHub Token Service**. The purpose of this
component is twofold:

1. Generate Access Tokens that grant access to the Metadata Service
2. Verify the validity of Access Tokens presented to the Metadata Service

**Access Tokens** granted by the Token Service take the form of [JSON Web Tokens](https://jwt.io/introduction), a type of stateless token which
has a finite lifespan & is verified using a unique signature. JWTs can also contain a set of claims embedded within them. Tokens issued by the Token
Service contain the following claims:

- exp: the expiration time of the token
- version: version of the DataHub Access Token for purposes of evolvability (currently 1)
- type: The type of token, currently SESSION (used for UI-based sessions) or PERSONAL (used for personal access tokens)
- actorType: The type of the **Actor** associated with the token. Currently, USER is the only type supported.
- actorId: The id of the **Actor** associated with the token.
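
To make the token format concrete, here is a minimal sketch of issuing and verifying a JWT that carries these claims, signed with `HS256` (the signing method this document attributes to the Token Service). It uses only the Python standard library; the signing key, the one-hour lifetime, the claim values, and the helper names `issue_token` and `verify_token` are assumptions for illustration, not the Token Service's implementation.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims: dict, secret: bytes) -> str:
    """Serialize header and claims, then sign them with HMAC-SHA256 (HS256)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes) -> dict:
    """Check the signature and expiry; return the claims if both are valid."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(_b64url_decode(signature), expected):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims


secret = b"example-signing-key"  # made-up key for the example
claims = {
    "exp": int(time.time()) + 3600,  # assumed lifetime: one hour from now
    "version": "1",
    "type": "PERSONAL",
    "actorType": "USER",
    "actorId": "datahub",
}
token = issue_token(claims, secret)
print(verify_token(token, secret)["actorId"])  # datahub
```

Because the signature is symmetric, the same key both issues and verifies tokens, so anything holding the key can mint valid tokens.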

Today, Access Tokens are granted by the Token Service under two scenarios:

1. **UI Login**: When a user logs into the DataHub UI, for example via [JAAS](guides/jaas.md) or
   [OIDC](guides/sso/configure-oidc-react.md), the `datahub-frontend` service issues a
   request to the Metadata Service to generate a SESSION token *on behalf of* the user logging in. (Only the frontend service is authorized to perform this action.)
2. **Generating Personal Access Tokens**: When a user requests to generate a Personal Access Token (described below) from the UI.

> At present, the Token Service supports the symmetric signing method `HS256` to generate and verify tokens.

Now that we're familiar with the concepts, we will talk concretely about what new capabilities have been built on top
of Metadata Service Authentication.
@@ -182,7 +182,7 @@ Setting up SSO via OpenID Connect means that users will be able to login to Data
and more.
This option is recommended for production deployments of DataHub. For detailed information about configuring DataHub to use OIDC to
perform authentication, check out [OIDC Authentication](./sso/configure-oidc-react.md).
perform authentication, check out [OIDC Authentication](sso/configure-oidc-react.md).
## URNs
@@ -193,7 +193,7 @@ when a user logs into DataHub via OIDC is used to construct a unique identifier
urn:li:corpuser:<extracted-username>
```
For information about configuring which OIDC claim should be used as the username for Datahub, check out the [OIDC Authentication](./sso/configure-oidc-react.md) doc.
For information about configuring which OIDC claim should be used as the username for Datahub, check out the [OIDC Authentication](sso/configure-oidc-react.md) doc.
## FAQ
File renamed without changes.
@@ -1,4 +1,4 @@
# OIDC Authentication
# Overview

The DataHub React application supports OIDC authentication built on top of the [Pac4j Play](https://github.com/pac4j/play-pac4j) library.
This enables operators of DataHub to integrate with 3rd party identity providers like Okta, Google, Keycloak, & more to authenticate their users.
@@ -188,5 +188,4 @@ A brief summary of the steps that occur when the user navigates to the React app
Even if OIDC is configured, the root user can still log in without OIDC by going
to the `/login` URL endpoint. We recommend replacing the default
credentials by mounting a different file in the frontend container. To do this
please see [jaas](https://datahubproject.io/docs/how/auth/jaas/#mount-a-custom-userprops-file-docker-compose) -
"Mount a custom user.props file".
please see how to mount a custom user.props file for a JAAS authenticated deployment.
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes